research-article

AsterixDB: a scalable, open source BDMS

Authors:
Sattam Alsubaiee (University of California, Irvine)
Yasser Altowim (University of California, Irvine)
Hotham Altwaijry (University of California, Irvine)
Alexander Behm (Cloudera Inc.)
Vinayak Borkar (University of California, Irvine)
Yingyi Bu (University of California, Irvine)
Michael Carey (University of California, Irvine)
Inci Cetindil (University of California, Irvine)
Madhusudan Cheelangi (Google)
Khurram Faraaz (IBM)
Eugenia Gabrielova (University of California, Irvine)
Raman Grover (University of California, Irvine)
Zachary Heilbron (University of California, Irvine)
Young-Seok Kim (University of California, Irvine)
Chen Li (University of California, Irvine)
Guangqiang Li (MarkLogic Corp.)
Ji Mahn Ok (University of California, Irvine)
Nicola Onose (Pivotal Inc.)
Pouria Pirzadeh (University of California, Irvine)
Vassilis Tsotras (University of California, Riverside)
Rares Vernica (HP Labs)
Jian Wen (Oracle Labs)
Till Westmann (Oracle Labs)

Proceedings of the VLDB Endowment, Volume 7, Issue 14, pp. 1905–1916. https://doi.org/10.14778/2733085.2733096

Published: 01 October 2014


Abstract

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B⁺-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.

Development of AsterixDB began in 2009 and led to an initial open source release in mid-2013. This paper is the first complete description of the resulting open source AsterixDB system. It covers the system's data model, its query language, and its software architecture. It also summarizes the current status of the project and gives a first glimpse of how AsterixDB performs compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, on tasks that both AsterixDB and the alternative in question can perform. Finally, it briefly describes some initial trials that the system has undergone and the lessons learned (and plans laid) from those early "customer" engagements.

Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 14
October 2014
244 pages
ISSN: 2150-8097
Publisher: VLDB Endowment