short-paper

Preliminary Study on Applying Semi-Supervised Learning to App Store Analysis

Authors:
Roger Deocadez, School of Technology, Oxford Brookes University, Oxford, UK
Rachel Harrison, School of Technology, Oxford Brookes University, Oxford, UK
Daniel Rodriguez, Dept of Comp Science, University of Alcala, Alcalá de Henares, Spain

EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, June 2017, Pages 320–323
https://doi.org/10.1145/3084226.3084285

Published: 15 June 2017

ABSTRACT

Semi-Supervised Learning (SSL) is a data mining technique that sits between supervised and unsupervised learning. It is useful when only a small number of instances in a dataset are labelled but a large amount of unlabelled data is also available. This is the case with user reviews in application stores such as the Apple App Store or Google Play, where a vast number of reviews are available but classifying them into categories such as bug-related review or feature request is expensive, or at least labor-intensive. SSL techniques are well suited to this problem, as manually classifying every review not only takes time and effort but may also be unnecessary. In this work, we analyse SSL techniques to show their viability and capabilities on a dataset of reviews collected from the App Store, evaluating both transductive performance (predicting the labels of the unlabelled instances seen during training) and inductive performance (predicting labels for unseen future data).
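
To make the transductive/inductive distinction concrete, the following is a minimal sketch of self-training, one common family of SSL techniques. It is not the setup evaluated in the paper: the toy reviews, the small Naive Bayes base classifier, the whitespace tokenizer, the `self_train` helper and the 0.6 confidence threshold are all assumptions chosen for illustration. Only the two category names ("bug" and "feature") echo the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into normalised probabilities.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def self_train(labelled, unlabelled, threshold=0.6, max_rounds=10):
    """Hypothetical helper: pseudo-label confident reviews, then retrain."""
    texts = [t for t, _ in labelled]
    labels = [y for _, y in labelled]
    pool = list(unlabelled)
    pseudo = {}  # transductive result: labels given to the unlabelled pool
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        confident = []
        for text in pool:
            proba = model.predict_proba(text)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((text, label))
        if not confident:
            break  # nothing left that the model is sure about
        for text, label in confident:
            pseudo[text] = label
            texts.append(text)
            labels.append(label)
            pool.remove(text)
        model = NaiveBayes().fit(texts, labels)
    return model, pseudo


# Toy data (invented for this sketch): two labelled reviews, three unlabelled.
labelled = [
    ("app crashes on startup", "bug"),
    ("please add dark mode", "feature"),
]
unlabelled = [
    "crashes when i open the camera",
    "add an option to export data please",
    "the camera screen freezes",
]

model, pseudo = self_train(labelled, unlabelled)
print(pseudo)                                       # transductive labels
print(model.predict("camera freezes and crashes"))  # inductive prediction
```

In this sketch the `pseudo` dictionary corresponds to transductive use (labels for the unlabelled reviews available during training), while calling `model.predict` on a review that was never in the pool corresponds to inductive use. The third unlabelled review shares no words with the initial labelled set, so it can only be labelled after the first round of pseudo-labels has enlarged the training data, which is the effect self-training relies on.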

Published in
EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
June 2017
405 pages
ISBN: 9781450348041
DOI: 10.1145/3084226
Conference Chair: Emilia Mendes
Program Chairs: Steve Counsell, Kai Petersen
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Apps reviews
Mobile apps
Semi-supervised Learning
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 71 of 232 submissions, 31%