Techniques and guidelines for effective migration from RDBMS to NoSQL

The Journal of Supercomputing

Abstract

Migration from RDBMS to NoSQL has become an important topic in the big data era. This paper provides comprehensive techniques and guidelines for effective migration from RDBMS to NoSQL. We discuss the challenges faced in translating SQL queries; the effects of denormalization, column families, secondary indexes, join algorithms, and column name length; and decision support for the migration. We focus on HBase, a column-oriented NoSQL database, because it is widely used by many Internet enterprises such as Facebook, Twitter, and LinkedIn. Because HBase does not support SQL, we use Apache Phoenix as an SQL layer on top of HBase. Experimental results using TPC-H show that column-level denormalization with atomicity and the grouping of columns into column families significantly improve query performance; that secondary indexes on foreign keys are not as effective as in RDBMSs; that the query optimizer of Phoenix is not very sophisticated; that shortened column names significantly reduce the database size and improve query performance; and that an SVM classifier can predict whether migration improves query performance. Important open problems in NoSQL research are supporting complex SQL queries, automatic index selection, and optimizing SQL queries for NoSQL.
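
The effect of column name length on database size can be illustrated with a rough size estimate. The sketch below is not the paper's measurement method; it assumes the standard HBase KeyValue layout without tags, in which every cell stores its row key, column family, and column qualifier alongside the value, and it assumes the SQL column name is stored verbatim as the qualifier. Compression and data block encoding are ignored. The family name `cf`, the 8-byte row key and value, and the shortened name `EP` are hypothetical choices for illustration; `L_EXTENDEDPRICE` is a TPC-H column name used only as an example of a long qualifier.

```python
# Fixed per-cell bytes in the HBase KeyValue layout (no tags):
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1).
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size in bytes of a single cell."""
    return (
        KEYVALUE_FIXED_OVERHEAD
        + len(row_key)
        + len(family)
        + len(qualifier)
        + len(value)
    )


def column_size(num_rows: int, row_key: bytes, family: bytes,
                qualifier: bytes, value: bytes) -> int:
    """Approximate size of one column across num_rows rows of equal shape."""
    return num_rows * keyvalue_size(row_key, family, qualifier, value)


if __name__ == "__main__":
    row_key = b"\x00" * 8   # hypothetical 8-byte row key
    value = b"\x00" * 8     # hypothetical 8-byte value
    family = b"cf"          # hypothetical column family name

    long_name = column_size(1_000_000, row_key, family, b"L_EXTENDEDPRICE", value)
    short_name = column_size(1_000_000, row_key, family, b"EP", value)

    print(f"long qualifier:  {long_name} bytes")
    print(f"short qualifier: {short_name} bytes")
    print(f"saving:          {1 - short_name / long_name:.1%}")
```

Under these assumptions a cell shrinks from 53 to 40 bytes when the 15-character qualifier is replaced by a 2-character one, because the qualifier is repeated in every cell rather than stored once per table as in an RDBMS.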

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. NRF-2015R1C1A1A02036517). The present research was conducted with the Research Grant of Kwangwoon University in 2017.

Author information

Corresponding author

Correspondence to Ki-Hoon Lee.

Cite this article

Kim, HJ., Ko, EJ., Jeon, YH. et al. Techniques and guidelines for effective migration from RDBMS to NoSQL. J Supercomput 76, 7936–7950 (2020). https://doi.org/10.1007/s11227-018-2361-2

  • DOI: https://doi.org/10.1007/s11227-018-2361-2