A Framework to Benchmark NoSQL Data Stores for Large-Scale Model Persistence

Shah, Seyyed M.; Wei, Ran; Kolovos, Dimitrios S.; Rose, Louis M.; Paige, Richard F.; Barmpis, Konstantinos

doi:10.1007/978-3-319-11653-2_36

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8767)

Included in the following conference series:

International Conference on Model Driven Engineering Languages and Systems

Abstract

We present a framework and methodology for benchmarking NoSQL stores for large-scale model persistence. NoSQL technologies can potentially improve the performance of some applications and provide schema-less data structures, so they are particularly suited to persisting large, heterogeneous models. Recent studies consider only a narrow set of NoSQL stores for large-scale modelling. Benchmarking many technologies requires substantial effort because each store provides a different interface. Our experiments compare a broad range of NoSQL stores in terms of processor time and disc space used. The framework and methodology are evaluated through a case study that involves persisting large reverse-engineered models of open-source projects. The results give tool engineers and practitioners a basis for selecting a store to persist large models.
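
The idea of hiding disparate store interfaces behind one benchmarking interface, and measuring processor time and disc space, can be sketched as follows. This is a minimal illustration and not the framework described in the paper: the names `StoreAdapter`, `ShelveAdapter`, `benchmark` and `run_demo` are hypothetical, and the standard-library `shelve` module stands in for a real NoSQL store.

```python
import os
import shelve
import tempfile
import time
from abc import ABC, abstractmethod


class StoreAdapter(ABC):
    """Hypothetical uniform interface that hides each store's native API."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def disk_bytes(self) -> int: ...


class ShelveAdapter(StoreAdapter):
    """Stand-in store backed by the standard-library shelve module."""

    def __init__(self, directory: str) -> None:
        self._directory = directory
        self._db = shelve.open(os.path.join(directory, "model"))

    def put(self, key: str, value: dict) -> None:
        self._db[key] = value

    def close(self) -> None:
        self._db.close()

    def disk_bytes(self) -> int:
        # Sum every file the backend created; shelve may create one or several.
        return sum(
            os.path.getsize(os.path.join(self._directory, name))
            for name in os.listdir(self._directory)
        )


def benchmark(adapter: StoreAdapter, elements) -> dict:
    """Persist all elements, then report processor time and space on disc."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    for key, value in elements:
        adapter.put(key, value)
    adapter.close()  # flush to disc before measuring size
    cpu_seconds = time.process_time() - start
    return {"cpu_seconds": cpu_seconds, "disk_bytes": adapter.disk_bytes()}


def run_demo(n: int = 1000) -> dict:
    """Benchmark the stand-in store on n synthetic model elements."""
    elements = [(f"element-{i}", {"type": "Class", "name": f"C{i}"}) for i in range(n)]
    with tempfile.TemporaryDirectory() as directory:
        return benchmark(ShelveAdapter(directory), elements)
```

Under these assumptions, supporting another store means writing one more `StoreAdapter` subclass, while `benchmark` stays unchanged, so the same measurements are taken for every store.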

Author information

Authors and Affiliations

Department of Computer Science, University of York, UK
Seyyed M. Shah, Ran Wei, Dimitrios S. Kolovos, Louis M. Rose, Richard F. Paige & Konstantinos Barmpis

Editor information

Editors and Affiliations

Queen’s University, Kingston, ON, Canada
Juergen Dingel
Microsoft Research, Redmond, WA, USA
Wolfram Schulte
Universitat Politècnica de València, Spain
Isidro Ramos, Silvia Abrahão & Emilio Insfran

About this paper

Cite this paper

Shah, S.M., Wei, R., Kolovos, D.S., Rose, L.M., Paige, R.F., Barmpis, K. (2014). A Framework to Benchmark NoSQL Data Stores for Large-Scale Model Persistence. In: Dingel, J., Schulte, W., Ramos, I., Abrahão, S., Insfran, E. (eds) Model-Driven Engineering Languages and Systems. MODELS 2014. Lecture Notes in Computer Science, vol 8767. Springer, Cham. https://doi.org/10.1007/978-3-319-11653-2_36

DOI: https://doi.org/10.1007/978-3-319-11653-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11652-5
Online ISBN: 978-3-319-11653-2
eBook Packages: Computer Science, Computer Science (R0)
