Abstract
Understanding micro-architectural behavior is important for efficiently using hardware resources. Recent work has shown that in-memory online transaction processing (OLTP) systems severely underutilize their core micro-architecture resources [29]. Online analytical processing (OLAP) workloads, in contrast, exhibit a completely different computing pattern. OLAP workloads are read-only, bandwidth-intensive, and include various data access patterns. With the rise of column-stores, they run on high-performance engines that are tightly optimized for modern hardware. Consequently, the micro-architectural behavior of modern OLAP systems remains unclear.
This work presents a micro-architectural analysis of a set of OLAP systems. The results show that traditional commercial OLAP systems suffer from their long instruction footprint, which results in high response times. High-performance column-stores execute tight instruction streams; however, they spend 25% to 82% of their CPU cycles on stalls for both sequential- and random-access-heavy workloads. Concurrent query execution can improve utilization, but it creates interference in the shared resources, which results in sub-optimal performance.
- D. Abadi, P. Boncz, and S. Harizopoulos. The Design and Implementation of Modern Column-Oriented Database Systems. Now Publishers Inc., 2013.
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on A Modern Processor: Where Does Time Go? In VLDB, pages 266--277, 1999.
- A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade. Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server. In BDCloud, pages 1--8, 2015.
- A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade. Micro-Architectural Characterization of Apache Spark on Batch and Stream Processing Workloads. In BDCloud, pages 59--66, 2016.
- P. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, and J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by A Relational Engine. In SIGMOD, pages 479--490, 2006.
- P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In TPCTC, pages 61--76, 2013.
- M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi. Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware. In ASPLOS, pages 37--48, 2012.
- N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database Servers on Chip Multiprocessors: Limitations and Opportunities. In CIDR, pages 79--87, 2007.
- S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Engineering Bulletin, 35(1):40--45, 2012.
- Intel. Disclosure of Hardware Prefetcher Control on Some Intel Processors. https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors.
- Intel. Intel Memory Latency Checker. https://software.intel.com/en-us/articles/intelr-memory-latency-checker.
- Intel. Understanding How General Exploration Works in Intel VTune Amplifier, 2018. https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe.
- Intel. Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, 2019.
- C. Jonathan, U. F. Minhas, J. Hunter, J. Levandoski, and G. Nishanov. Exploiting Coroutines to Attack the "Killer Nanoseconds". PVLDB, 11(11):1702--1714, 2018.
- S. Kanev, J. P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G. Wei, and D. Brooks. Profiling A Warehouse-scale Computer. In ISCA, pages 158--169, 2015.
- M. Karpathiotakis, I. Alagiannis, and A. Ailamaki. Fast Queries over Heterogeneous Data Through Engine Customization. PVLDB, 9(12):972--983, 2016.
- A. Kemper and T. Neumann. HyPer: A Hybrid OLTP OLAP Main Memory Database System Based on Virtual Memory Snapshots. In ICDE, pages 195--206, 2011.
- T. Kersten, V. Leis, A. Kemper, T. Neumann, A. Pavlo, and P. Boncz. Everything You Always Wanted to Know About Compiled and Vectorized Queries but Were Afraid to Ask. PVLDB, 11(13):2209--2222, 2018.
- T. Lahiri, S. Chavan, M. Colgan, D. Das, A. Ganesh, M. Gleeson, S. Hase, A. Holloway, J. Kamp, T. Lee, J. Loaiza, N. Macnaughton, V. Marwah, N. Mukherjee, A. Mullick, S. Muthulingam, V. Raja, M. Roth, E. Soylemez, and M. Zait. Oracle Database In-Memory: A Dual Format In-memory Database. In ICDE, pages 1253--1258, 2015.
- P.-A. Larson, C. Clinciu, E. N. Hanson, A. Oks, S. L. Price, S. Rangarajan, A. Surna, and Q. Zhou. SQL Server Column Store Indexes. In SIGMOD, pages 1177--1184, 2011.
- V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. In SIGMOD, pages 743--754, 2014.
- D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis. Heracles: Improving Resource Efficiency at Scale. In ISCA, pages 450--462, 2015.
- S. Manegold, P. A. Boncz, and M. L. Kersten. Optimizing Main-Memory Join on Modern Hardware. IEEE Trans. Knowl. Data Eng., 14(4):709--730, 2002.
- J. M. Patel, H. Deshmukh, J. Zhu, N. Potti, Z. Zhang, M. Spehlmann, H. Memisoglu, and S. Saurabh. Quickstep: A Data Platform Based on the Scaling-Up Approach. PVLDB, 11(6):663--676, 2018.
- G. Psaropoulos, T. Legler, N. May, and A. Ailamaki. Interleaving with Coroutines: A Practical Approach for Robust Index Joins. PVLDB, 11(2):230--242, 2017.
- G. Psaropoulos, T. Legler, N. May, and A. Ailamaki. Interleaving with Coroutines: A Systematic and Practical Approach to Hide Memory Latency in Index Joins. The VLDB Journal, Dec 2018.
- G. Psaropoulos, I. Oukid, T. Legler, N. May, and A. Ailamaki. Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations. In DAMON, pages 13:1--13:8, 2019.
- V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, T. Malkemus, R. Mueller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle, A. Storm, and L. Zhang. DB2 with BLU Acceleration: So Much More Than Just a Column Store. PVLDB, 6(11):1080--1091, 2013.
- U. Sirin, P. Tözün, D. Porobic, and A. Ailamaki. Micro-architectural Analysis of In-memory OLTP. In SIGMOD, pages 387--402, 2016.
- U. Sirin, A. Yasin, and A. Ailamaki. A Methodology for OLTP Micro-architectural Analysis. In DAMON, pages 1:1--1:10, 2017.
- J. Sompolski, M. Zukowski, and P. A. Boncz. Vectorization vs. Compilation in Query Execution. In DAMON, pages 33--40, 2011.
- S. Sridharan and J. M. Patel. Profiling R on A Contemporary Processor. PVLDB, 8(2):173--184, 2014.
- P. Tözün, B. Gold, and A. Ailamaki. OLTP in Wonderland: Where Do Cache Misses Come From in Major OLTP Components? In DAMON, page 8, 2013.
- P. Tözün, I. Pandis, C. Kaynak, D. Jevdjic, and A. Ailamaki. From A to E: Analyzing TPC's OLTP Benchmarks: The Obsolete, The Ubiquitous, The Unexplored. In EDBT, pages 17--28, 2013.
- TPC. Transaction Processing Performance Council. http://www.tpc.org/.
- A. Yasin. A Top-Down Method for Performance Analysis and Counters Architecture. In ISPASS, pages 35--44, 2014.
- A. Yasin, Y. Ben-Asher, and A. Mendelson. Deep-dive Analysis of The Data Analytics Workload in CloudSuite. In IISWC, pages 202--211, 2014.
Index Terms
- Micro-architectural analysis of OLAP: limitations and opportunities