research-article

Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory

Published: 01 January 2021

Abstract

On-line decision augmentation (OLDA) is considered a promising paradigm for real-time decision making powered by Artificial Intelligence (AI), and it is widely used in applications such as real-time fraud detection and personalized recommendation. On-line inference feeds real-time features, extracted over multiple time windows, into a pre-trained model to evaluate new data and support decision making. Feature extraction is usually the most time-consuming operation in many OLDA data pipelines. In this work, we first study how existing in-memory databases can be leveraged to support such real-time feature extraction efficiently. However, we find that existing in-memory databases take hundreds or even thousands of milliseconds to perform these extractions, which is unacceptable for OLDA applications with strict real-time constraints. We therefore propose FEDB (Feature Engineering Database), a distributed in-memory database system designed to efficiently support on-line feature extraction. Our experimental results show that FEDB can be one to two orders of magnitude faster than state-of-the-art in-memory databases on real-time feature extraction. Furthermore, we explore using the Intel Optane DC Persistent Memory Module (PMEM) to make FEDB more cost-effective. Compared with FEDB using DRAM+SSD, PMEM-based FEDB with the proposed PMEM-optimized persistent skiplist can shorten the tail latency by up to 19.7%, reduce recovery time by up to 99.7%, and save up to 58.4% of the total cost of a real OLDA pipeline.
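To make "features extracted from multiple time windows" concrete, here is a minimal Python sketch of multi-window feature extraction over a time-ordered, in-memory event store. It is an illustration under stated assumptions, not FEDB's implementation or API: the class name `WindowFeatureStore`, the key `card_42`, the window sizes, the chosen aggregates (count, sum, average, max), and the sample data are all hypothetical. Each key's events are kept sorted by timestamp so that every window boundary can be located by binary search.

```python
from bisect import bisect_right
from collections import defaultdict


class WindowFeatureStore:
    """Hypothetical in-memory store: each key's events are kept sorted by timestamp."""

    def __init__(self):
        self._ts = defaultdict(list)    # key -> sorted event timestamps (ms)
        self._vals = defaultdict(list)  # key -> event values, aligned with self._ts

    def insert(self, key, ts_ms, value):
        ts_list, vals = self._ts[key], self._vals[key]
        # Keep timestamps sorted; equal timestamps stay in arrival order.
        # list.insert is O(n); an ordered index such as a skiplist avoids that shift.
        i = bisect_right(ts_list, ts_ms)
        ts_list.insert(i, ts_ms)
        vals.insert(i, value)

    def extract(self, key, now_ms, windows_ms):
        """Aggregate values in each window (now_ms - w, now_ms] for every w in windows_ms."""
        ts_list = self._ts.get(key, [])
        vals = self._vals.get(key, [])
        hi = bisect_right(ts_list, now_ms)  # events stamped after now_ms are excluded
        features = {}
        for w in windows_ms:
            lo = bisect_right(ts_list, now_ms - w)  # first event strictly inside the window
            window = vals[lo:hi]
            total = sum(window)
            features[w] = {
                "count": len(window),
                "sum": total,
                "avg": total / len(window) if window else 0.0,
                "max": max(window) if window else None,
            }
        return features


# Usage: made-up transaction amounts for one card, then features for a request at t = 60 s.
store = WindowFeatureStore()
for ts, amount in [(1_000, 20.0), (50_000, 5.0), (58_000, 100.0), (59_500, 7.0)]:
    store.insert("card_42", ts, amount)

feats = store.extract("card_42", now_ms=60_000, windows_ms=[10_000, 60_000])
# feats[10_000] -> {'count': 2, 'sum': 107.0, 'avg': 53.5, 'max': 100.0}
# feats[60_000] -> {'count': 4, 'sum': 132.0, 'avg': 33.0, 'max': 100.0}
```

Because the events are time-ordered, each window costs two binary searches plus a scan of only the events inside it, rather than a pass over the key's full history. The abstract states that PMEM-based FEDB uses a persistent skiplist; the sorted Python lists here are only a simple stand-in for such an ordered index.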

Published in

Proceedings of the VLDB Endowment, Volume 14, Issue 5, January 2021 (142 pages)
ISSN: 2150-8097

Publisher

VLDB Endowment
