DOI: 10.1145/3084226.3084285
short-paper

Preliminary Study on Applying Semi-Supervised Learning to App Store Analysis

Published: 15 June 2017

ABSTRACT

Semi-Supervised Learning (SSL) is a data mining technique that sits between supervised and unsupervised learning. It is useful when only a small number of instances in a dataset are labelled but a large amount of unlabelled data is also available. This is the case with user reviews in application stores such as the Apple App Store or Google Play, where a vast number of reviews are available but classifying them into categories such as bug-related review or feature request is expensive, or at least labor-intensive. SSL techniques are well suited to this problem, as manually classifying reviews not only takes time and effort but may also be unnecessary. In this work, we analyse SSL techniques to show their viability and capabilities on a dataset of reviews collected from the App Store, evaluating both transductive performance (predicting the labels of existing unlabelled instances during training) and inductive performance (predicting labels on unseen future data).
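
The difference between transductive and inductive use can be illustrated with a minimal self-training sketch. This is not the method evaluated in the paper: self-training is only one common family of SSL techniques, the word-count classifier is a deliberately simple stand-in, and the reviews, the function names (`train`, `predict`, `self_train`) and the confidence threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labelled):
    """Build per-class word counts from (text, label) pairs."""
    counts = {}
    for text, label in labelled:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def predict(model, text):
    """Return (label, confidence) from normalised word-overlap scores."""
    tokens = tokenize(text)
    scores = {}
    for label, counter in model.items():
        total = sum(counter.values()) or 1
        scores[label] = sum(counter[t] for t in tokens) / total
    best = max(scores, key=scores.get)
    norm = sum(scores.values())
    confidence = scores[best] / norm if norm else 0.0
    return best, confidence

def self_train(labelled, unlabelled, threshold=0.7, max_rounds=10):
    """Repeatedly move confidently predicted pool items into the labelled set."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(model, text)
            (confident if conf >= threshold else remaining).append((text, label))
        if not confident:
            break  # nothing new was labelled, so further rounds change nothing
        labelled.extend(confident)
        pool = [text for text, _ in remaining]
    return train(labelled), labelled

# Hypothetical toy data: two labelled reviews and a pool of unlabelled ones.
labelled = [("app crashes on startup", "bug"),
            ("please add dark mode", "feature")]
unlabelled = ["crashes when I open the app",
              "add a dark theme please",
              "app crashes after update"]

model, labelled_all = self_train(labelled, unlabelled)

# Transductive result: labels assigned to the existing unlabelled pool.
transductive = labelled_all[len(labelled):]

# Inductive result: label predicted for a review not seen during training.
inductive_label, _ = predict(model, "the app crashes when I open it")
```

Here `transductive` holds the labels inferred for the pool that was available during training, while `inductive_label` is a prediction for a review the model never saw, which mirrors the two evaluation settings named in the abstract.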

Published in

EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
June 2017
405 pages
ISBN: 9781450348041
DOI: 10.1145/3084226

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

short-paper, Research, Refereed limited

Acceptance Rates

Overall acceptance rate: 71 of 232 submissions, 31%
