ABSTRACT
A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that, far from being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
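To make the two requirements concrete, the following is a minimal in-memory sketch of a wide, sparse table. It is an illustration under stated assumptions, not the techniques proposed in the paper: the `SparseTable` class, its method names, and the product data are all hypothetical. Each row stores only its non-null attributes, a per-attribute index records which rows define that attribute, `find_attributes` stands in for helping a user locate columns among many, and `select` evaluates a predicate by visiting only the rows where the attribute is non-null.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide table (hypothetical): each row keeps only its non-null attributes."""

    def __init__(self):
        self.rows = {}                       # row id -> {attribute: value}
        self.attr_index = defaultdict(set)   # attribute -> ids of rows where it is non-null

    def insert(self, row_id, **attrs):
        # Drop nulls so that storage grows with the non-null values only.
        non_null = {name: value for name, value in attrs.items() if value is not None}
        self.rows[row_id] = non_null
        for name in non_null:
            self.attr_index[name].add(row_id)

    def find_attributes(self, keyword):
        # Assumed aid for query building: look up attribute names by substring.
        return sorted(a for a in self.attr_index if keyword.lower() in a.lower())

    def select(self, attr, predicate):
        # Visit only rows where attr is non-null instead of scanning every row.
        candidates = self.attr_index.get(attr, ())
        return sorted(rid for rid in candidates if predicate(self.rows[rid][attr]))


# Hypothetical product catalogue: every item uses a different handful of attributes.
table = SparseTable()
table.insert(1, name="laptop", screen_inches=15.6, ram_gb=16)
table.insert(2, name="camera", megapixels=24, zoom=None)
table.insert(3, name="tablet", screen_inches=10.1)

matching_attrs = table.find_attributes("screen")
large_screens = table.select("screen_inches", lambda inches: inches > 12)
```

In this sketch the user still sees one logical table, while the cost of a query depends on how many rows define the queried attribute rather than on the total number of attributes.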
Index Terms
- The case for a wide-table approach to manage sparse relational data sets