ABSTRACT
Pipelining has become a common technique for increasing the throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers. Branch instructions disrupt the flow of instructions through the pipeline, which increases their overall execution cost. Three schemes for reducing the cost of branches are presented in the context of a general pipeline model. Ten realistic Unix domain programs are used to compare the cost and performance of the three schemes directly, and the results favor the software-based scheme. For a highly pipelined processor (11-stage pipeline), the software-based scheme costs 1.65 cycles/branch, compared with 1.68 cycles/branch for the best hardware scheme. For a moderately pipelined processor (5-stage pipeline), the corresponding figures are 1.19 and 1.23 cycles/branch.
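The abstract reports branch cost in cycles per branch. As an illustration only, the sketch below uses a common first-order model that is an assumption here and not the paper's general pipeline model: every branch costs one cycle, and each mispredicted branch adds a hypothetical refill `penalty` that grows with pipeline depth. The function names `cycles_per_branch` and `relative_saving` are hypothetical. The sketch also computes, from the abstract's own figures, how much the software-based scheme saves relative to the best hardware scheme.

```python
"""Hypothetical first-order branch-cost model (illustrative assumption only)."""


def cycles_per_branch(accuracy: float, penalty: int) -> float:
    """Average cycles per branch under an ASSUMED model.

    Assumption: a correctly predicted branch costs 1 cycle; a mispredicted
    branch costs 1 cycle plus `penalty` cycles to refill the pipeline.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    return 1.0 + (1.0 - accuracy) * penalty


def relative_saving(software: float, hardware: float) -> float:
    """Fractional reduction in cycles/branch of `software` relative to `hardware`."""
    return (hardware - software) / hardware


# Figures reported in the abstract, in cycles/branch:
# (software-based scheme, best hardware scheme).
REPORTED = {
    "11-stage": (1.65, 1.68),
    "5-stage": (1.19, 1.23),
}

if __name__ == "__main__":
    # The same hypothetical 95% prediction accuracy costs more in a deeper
    # pipeline, because each misprediction discards more in-flight work.
    for penalty in (4, 10):  # hypothetical refill penalties, in cycles
        print(f"penalty={penalty}: {cycles_per_branch(0.95, penalty):.2f} cycles/branch")
    for depth, (sw, hw) in REPORTED.items():
        print(f"{depth}: software scheme saves {relative_saving(sw, hw):.1%}")
```

Under this assumed model, the saving computed from the reported numbers is about 1.8% of the hardware scheme's cost for the 11-stage pipeline and about 3.3% for the 5-stage pipeline. The model is deliberately simple; it ignores effects such as delay slots or target-fetch latency, which a fuller pipeline model would account for.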
Comparing software and hardware schemes for reducing the cost of branches
Published in the Special Issue: Proceedings of the 16th Annual International Symposium on Computer Architecture.