What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Security has long been a popular and critical topic, and the rapid development of information technology keeps it in the public eye. Because the field has a long history, it spans a wide and shifting range of topics, from classic cryptography to the more recently popular mobile security. Security-related topics and trends therefore need to be investigated, so that the findings can guide security researchers, educators, and practitioners. To address this need, we conduct a large-scale study of security-related questions on Stack Overflow, a popular online question-and-answer site where software developers communicate, collaborate, and share information. Among the many topics covered by the questions posted on Stack Overflow, security-related questions make up a large proportion and hold an important position. We first use two heuristics, based on the tags of the posts, to extract the security-related questions from the dataset. We then use an advanced topic model, Latent Dirichlet Allocation (LDA) tuned with a Genetic Algorithm (GA), to cluster the security-related questions based on their text. After obtaining the topics, we analyze them using the questions' metadata. We group the topics into five main categories and investigate the popularity and difficulty of each topic. Based on the results of our study, we draw several implications for researchers, educators, and practitioners.
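The tag-based extraction step can be illustrated with a minimal sketch. The abstract does not spell out the two heuristics, so the code below assumes a common form of tag filtering: starting from a seed tag, a candidate tag is kept when it is both relevant (a large share of its posts carry the seed tag) and significant (it appears in a large share of the seed-tagged posts). The function names, the seed tag, and the threshold values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_security_tags(posts, seed_tag="security",
                         min_relevance=0.5, min_significance=0.5):
    """Pick tags that co-occur strongly with the seed tag.

    posts: list of sets, each set holding the tags of one question.
    The two ratios and their thresholds are illustrative assumptions.
    """
    seed_posts = [tags for tags in posts if seed_tag in tags]
    if not seed_posts:
        return set()

    # How often each tag appears overall and within the seed-tagged posts.
    all_counts = Counter(tag for tags in posts for tag in tags)
    seed_counts = Counter(tag for tags in seed_posts for tag in tags)

    selected = set()
    for tag, in_seed in seed_counts.items():
        relevance = in_seed / all_counts[tag]      # assumed heuristic 1
        significance = in_seed / len(seed_posts)   # assumed heuristic 2
        if relevance >= min_relevance and significance >= min_significance:
            selected.add(tag)
    return selected


def filter_posts(posts, tags):
    """Keep the posts that carry at least one of the selected tags."""
    return [post for post in posts if post & tags]


if __name__ == "__main__":
    sample = [
        {"security", "encryption"},
        {"security", "java"},
        {"java", "swing"},
        {"java", "spring"},
        {"encryption", "aes"},
        {"python"},
    ]
    tags = select_security_tags(sample)
    print(sorted(tags))
    print(len(filter_posts(sample, tags)))
```

In this toy data, "java" is dropped because most of its posts are unrelated to the seed tag, while a post tagged only "encryption" and "aes" is still picked up through the expanded tag set. The resulting questions would then be passed to the topic-modeling step.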



Author information

Corresponding author

Correspondence to Xin Xia.

About this article

Cite this article

Yang, XL., Lo, D., Xia, X. et al. What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts. J. Comput. Sci. Technol. 31, 910–924 (2016). https://doi.org/10.1007/s11390-016-1672-0

  • DOI: https://doi.org/10.1007/s11390-016-1672-0
