Abstract
AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications such as web data warehousing, social data storage and analysis, and other Big Data use cases. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for data ingestion; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to an initial open source release in mid-2013. This paper is the first complete description of the resulting open source AsterixDB system. It covers the system's data model, its query language, and its software architecture. It also summarizes the current status of the project and offers a first glimpse into how AsterixDB performs compared with alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, on tasks that AsterixDB and each alternative can both perform. Finally, it briefly describes some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.