ABSTRACT
Thread-level speculative execution is a technique that enables a wider range of single-threaded applications to exploit the processing resources of a chip multiprocessor. We consider module-level speculation, i.e., speculative threads that execute the code following a module (procedure, function, or method) call. Unfortunately, previous studies have shown that indiscriminate module-level speculation incurs significant overhead, mainly due to frequent misspeculations. Besides hurting performance, excessive overhead is harmful from a resource-usage and energy-efficiency standpoint. We show that the overhead of spawning speculative threads for all module continuations is on average three times the time spent on useful execution on our baseline 8-way chip multiprocessor. In this paper, we present and evaluate in detail a technique that aims to reduce the overhead associated with misspeculations. History-based prediction is used to prevent speculative threads from being spawned when they are expected to cause misspeculations. We find that this reduces the overhead by a factor of six on average compared to indiscriminate speculation. The impact on speedup is small for most applications, and in several cases speedup is slightly improved.
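The abstract's history-based spawn decision can be sketched as follows. This is a minimal illustrative model, not the paper's exact mechanism: it assumes a per-call-site two-bit saturating counter (a standard prediction structure) that tracks whether past speculative threads spawned at that site committed or were squashed, and suppresses spawning at sites with a poor history. The class and threshold names are hypothetical.

```python
class SpawnPredictor:
    """Hypothetical sketch: per-call-site two-bit saturating counters
    deciding whether a speculative thread should be spawned at a
    module continuation."""

    def __init__(self, threshold=2):
        # call site -> counter in [0, 3]; unseen sites start at the
        # threshold so speculation is tried at least once.
        self.counters = {}
        self.threshold = threshold

    def should_spawn(self, call_site):
        # Spawn only if past speculation at this site mostly succeeded.
        return self.counters.get(call_site, self.threshold) >= self.threshold

    def update(self, call_site, committed):
        # Strengthen the counter on a commit, weaken it on a squash.
        c = self.counters.get(call_site, self.threshold)
        self.counters[call_site] = min(3, c + 1) if committed else max(0, c - 1)
```

Under this model, a call site that repeatedly causes squashes is quickly filtered out (two consecutive squashes drop it below the threshold), while a later run of commits lets it re-qualify for speculation.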
Title: Reducing misspeculation overhead for module-level speculative execution