Abstract
Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by the ongoing miniaturization of chip manufacturing technologies following Moore's Law. This miniaturization, however, increases the circuit elements' susceptibility to failure. Research on fault-tolerant Networks-on-Chip aims to mitigate partial failures and their effect on network performance and reliability by exploiting various forms of redundancy at suitable network layers. This article reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, network, and transport layers. The most important results are summarized, and open research problems and challenges are highlighted to guide future research on this topic.
- Wu, J. and Wang, D. 2002. Fault-tolerant and deadlock-free routing in 2-d meshes using rectilinear-monotone polygonal fault blocks. In Proceedings of the International Conference on Parallel Processing. 247--254. Google ScholarDigital Library
- Xinming, D. and Xuemei, S. 2010. Fault-tolerant routing in a prdt(2,1)-based noc. In Proceedings of the 2nd International Computer Engineering and Technology Conference (ICCET'10).Google Scholar
- Yaghini, P. M., Eghbal, A., Pedram, H., and Zarandi, H. R. 2011. Investigation of transient fault effects in synchronous and asynchronous network on chip router. J. Syst. Archit. 57, 1, 61--68. Google ScholarDigital Library
- Yang, Y. 2010. Issues of esd protection in nano-scale cmso. Ph.D. thesis, George Mason University, Fairfax, Virginia, USA.Google Scholar
- Yu, A. J. and Lemieux, G. G. 2005. Fpga defect tolerance: Impact of granularity. In Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT'05). 189--196.Google Scholar
- Yu, Q. and Ampadu, P. 2008. Adaptive error control for noc switch-to-switch links in a variable noise environment. In Proceedings of the IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems (DFTVS'08). 352--360. Google ScholarDigital Library
- Yu, Q. and Ampadu, P. 2010. Transient and permanent error co-management method for reliable networks-on-chip. In Proceedings of the 4th ACM/IEEE International Networks-on-Chip Symposium (NOCS'10). 145--154. Google ScholarDigital Library
- Yu, Q. and Ampadu, P. 2011. A dual-layer method for transient and permanent error co-management in noc links. IEEE Trans. Circ. Syst. II: Express Briefs 58, 1, 36--40.Google ScholarCross Ref
- Yu, Q. and Ampadu, P. 2012. Dual-layer adaptive error control for network-on-chip links. IEEE Trans. VLSI. Syst. 20, 7, 1304--1317. Google ScholarDigital Library
- Yu, Q., Cano, J., Flich, J., and Ampadu, P. 2012. Transient and permanent error control for high-end multiprocessor systems-on- chip. In Proceedings of the 6th IEEE/ACM International Symposium on Networks on Chip (NoCS'12). 169--176. Google ScholarDigital Library
- Yu, Q., Zhang, B., Li, Y., and Ampadu, P. 2010. Error control integration scheme for reliable noc. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'10). 3893--3896.Google Scholar
- Yu, Q., Zhang, M., and Ampadu, P. 2011. Exploiting inherent information redundancy to manage transient errors in noc routing arbitration. In Proceedings of the IEEE Network on Chip Symposium (NoCS'11). Google ScholarDigital Library
- Zhang, B. and Orshansky, M. 2008. Modeling of nbti-induced pmos degradation under arbitrary dynamic temperature variation. In Proceedings of the 9th International Symposium on Quality Electronic Design (ISQED'08). 774--779. Google ScholarDigital Library
- Zhang, M. and Shanbhag, N. 2006. Soft-error-rate-analysis (sera) methodology. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 25, 10, 2140--2155. Google ScholarDigital Library
- Zhang, Y. and Jiang, J. 2008. Bibliographical review on reconfigurable fault-tolerant control systems. Ann. Rev. Control 32, 229--252.Google ScholarCross Ref
- Zhang, Y., Li, H., and Li, X. 2009. Selected crosstalk avoidance code for reliable network-on-chip. J. Comput. Sci. Technol. 24, 6, 1074--1085.Google ScholarCross Ref
- Zhang, Y., Parikh, D., Sankaranarayanan, K., Skadron, K., and Stan, M. 2003. Hotleakage: A temperature-aware model of subthreshold and gate leakage for architects. Tech. rep. CS-2003--05, University of Virgiania, Department of Computer Science. March.Google Scholar
- Zhang, Z., Greiner, A., and Taktak, S. 2008. A reconfigurable routing algorithm for a fault-tolerant 2d-mesh network-on-chip. In Proceedings of the Design Automation Conference (DAC'08). 441--446. Google ScholarDigital Library
- Zimmer, H. and Jantsch, A. 2003. A fault model notation and error-control scheme for switch-to-switch buses in a network-onchip. In Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. 188--193. Google ScholarDigital Library
Index Terms
- Methods for fault tolerance in networks-on-chip