AsterixDB: a scalable, open source BDMS

Published: 01 October 2014

Abstract

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.

Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, on tasks that both AsterixDB and the alternative technology can perform. The paper also briefly describes some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.



Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 14
October 2014
244 pages

Publisher: VLDB Endowment

