
A systematic approach for optimized bypass configurations for application-specific embedded processors

Published: 30 September 2013

Abstract

The diversity of today's mobile applications requires embedded processor cores with high resource efficiency: the devices should deliver high performance at low area and power cost. The fine-grained parallelism provided by the multiple functional units of VLIW architectures offers high throughput at reasonably low clock frequencies compared to single-core RISC processors. To utilize the processor pipeline efficiently, common system architectures have to cope with data hazards caused by data dependencies between consecutive operations. On the one hand, such hazards can be resolved by complex forwarding circuits (i.e., a pipeline bypass), which forward intermediate results to a subsequent instruction. On the other hand, the pipeline bypass can strongly affect or even dominate the total resource requirements and degrade the maximum clock frequency. In this work, the CoreVA VLIW architecture is used to develop and analyze application-specific bypass configurations. It is shown that many paths of a comprehensive bypass system are rarely used and may not be required for certain applications. For this reason, several strategies have been implemented to enhance the efficiency of the total system by introducing application-specific bypass configurations. The configuration can be carried out statically, by implementing only the required paths, or at runtime, by dynamically reconfiguring the hardware. An algorithm is proposed that derives an optimized configuration by iteratively disabling single bypass paths. Adopting these application-specific bypass configurations reduces the critical path by 26%. As a result, the execution time and energy requirements could be reduced by up to 21.5%. Combining Dynamic Frequency Scaling (DFS) with dynamic deactivation and reactivation of bypass paths allows the bypass system to be reconfigured at runtime. This ensures the highest efficiency while processing varying applications.
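The abstract does not spell out the algorithm or its cost model, so the following is only a minimal sketch of the general idea of iteratively disabling single bypass paths. It assumes a deliberately simple toy model that is not taken from the paper: each enabled path adds a fixed delay to the clock period, each disabled path adds a fixed number of stall cycles for the application at hand, and execution time is the product of cycle count and clock period. The names `BypassPath`, `execution_time_ns`, `optimize_bypass`, the path names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BypassPath:
    """Hypothetical description of one bypass path (toy model, not from the paper)."""
    name: str
    delay_ns: float    # assumed contribution to the clock period while the path is enabled
    stall_cycles: int  # assumed extra cycles for this application if the path is disabled


def execution_time_ns(enabled, all_paths, base_period_ns, base_cycles):
    """Execution time = cycle count x clock period under the toy model."""
    period = base_period_ns + sum(p.delay_ns for p in all_paths if p.name in enabled)
    cycles = base_cycles + sum(p.stall_cycles for p in all_paths if p.name not in enabled)
    return period * cycles


def optimize_bypass(all_paths, base_period_ns, base_cycles):
    """Greedily disable one path per iteration while execution time keeps improving."""
    enabled = {p.name for p in all_paths}
    best = execution_time_ns(enabled, all_paths, base_period_ns, base_cycles)
    while True:
        candidate = None
        # Try removing each enabled path on its own and remember the best removal.
        for name in sorted(enabled):
            t = execution_time_ns(enabled - {name}, all_paths, base_period_ns, base_cycles)
            if t < best:
                best, candidate = t, name
        if candidate is None:
            # No single removal helps any more: keep the current configuration.
            return enabled, best
        enabled.remove(candidate)


# Hypothetical example: one heavily used path, one rarely used path, one unused path.
paths = [
    BypassPath("ex_to_ex", delay_ns=0.5, stall_cycles=400),
    BypassPath("mem_to_ex", delay_ns=0.5, stall_cycles=10),
    BypassPath("wb_to_ex", delay_ns=0.25, stall_cycles=0),
]
enabled, best_time = optimize_bypass(paths, base_period_ns=4.0, base_cycles=1000)
print(sorted(enabled), best_time)
```

In this toy setting the rarely used and unused paths are dropped because the shorter clock period outweighs the added stalls, while the heavily used path is kept. A real flow would obtain cycle counts from simulation and the clock period from synthesis or timing analysis, since path delays generally do not add up linearly as assumed here.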

    • Published in

      ACM Transactions on Embedded Computing Systems  Volume 13, Issue 2
      Special issue on application-specific processors
      September 2013
      254 pages
      ISSN: 1539-9087
      EISSN: 1558-3465
      DOI: 10.1145/2514641

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 September 2013
      • Accepted: 1 May 2012
      • Revised: 1 February 2012
      • Received: 1 February 2011
      Qualifiers

      • research-article
      • Research
      • Refereed
