DOI: 10.1145/2901739.2903508
short-paper

AndroZoo: collecting millions of Android apps for the research community

Published: 14 May 2016

ABSTRACT

We present a growing collection of Android Applications collected from several sources, including the official Google Play app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has been analysed by tens of different Antivirus products to know which applications are detected as Malware. We provide this dataset to contribute to ongoing research efforts, as well as to enable new potential research topics on Android Apps. By releasing our dataset to the research community, we also aim at encouraging our fellow researchers to engage in reproducible experiments.

        Reviews

        James Harold Davenport

        This is basically a data collection paper. How did the authors collect more than three million free Android apps (more than 20 terabytes)? The answer: it's somewhat more delicate than one might have thought. In particular, one should avoid triggering the source's defenses. Deduplication is also a problem, as is distinguishing a source with no changes from a source that has changed in such a way that we don't detect the changes. They also give some statistics: 60 percent of the apps are from Google Play, and two Chinese markets account for a little less than 20 percent each. While 22 percent of the Google Play apps trigger at least one of the antivirus products at VirusTotal, less than one percent trigger ten or more of them. This is in marked contrast to the Chinese stores (where 33 percent and 17 percent, respectively, trigger at least ten) or another store where 100 percent do. The dataset is available, though the authors have some important caveats based on "the lack of a clear, universal copyright exemption for research." The authors use this tool in their own research, for which it is good to have the data collection methodology so clearly described.

        • Published in

          MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
          May 2016
          544 pages
          ISBN:9781450341868
          DOI:10.1145/2901739

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States
