Abstract
A single statistical framework, comprising power-law distributions and scale-free networks, seems to fit a wide variety of phenomena. There is evidence that power laws appear in software at the class and function level. We show that distributions with long, fat tails in software are much more pervasive than previously established, appearing at various levels of abstraction, in diverse systems and languages. The implications of this phenomenon span many aspects of software engineering research and practice.
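The snippet below shows what fitting a fat-tailed distribution involves. It is a minimal sketch and not the method used in this article. It implements the standard continuous maximum-likelihood estimator for the exponent of a power-law tail, α̂ = 1 + n / Σ ln(x_i / x_min), where the sum runs over the n observations x_i ≥ x_min. The function names, the synthetic data, and the fixed choice of x_min are illustrative assumptions. Real software metrics, such as class in-degrees or function call counts, are discrete, and x_min must be chosen carefully.

```python
import math
import random


def powerlaw_alpha_mle(values, x_min):
    """Estimate the exponent alpha of a continuous power-law tail p(x) ~ x^(-alpha).

    Uses the maximum-likelihood estimator alpha = 1 + n / sum(ln(x / x_min)),
    computed only over observations x >= x_min (the assumed start of the tail).
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_pareto(alpha, x_min, n, seed=0):
    """Draw n synthetic samples from a continuous power law with exponent alpha > 1.

    Inverse-transform sampling: the CDF is F(x) = 1 - (x / x_min)^(-(alpha - 1)),
    so x = x_min * (1 - u)^(-1 / (alpha - 1)) for u uniform on [0, 1).
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical data: 100,000 draws from a power law with alpha = 2.5.
    data = sample_pareto(alpha=2.5, x_min=1.0, n=100_000, seed=42)
    print("estimated alpha:", round(powerlaw_alpha_mle(data, x_min=1.0), 3))
```

On the synthetic sample the estimate should land close to the true value of 2.5. A good exponent estimate alone does not establish a power law. Lognormal and stretched-exponential distributions can produce similar-looking fat tails, so a careful analysis would also apply a goodness-of-fit test and compare against those alternatives.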
Index Terms
- Power laws in software