
The case for a wide-table approach to manage sparse relational data sets

Authors:
Eric Chu, University of Wisconsin-Madison, Madison, WI
Jennifer Beckmann, Microsoft Corporation, Redmond, WA
Jeffrey Naughton, University of Wisconsin-Madison, Madison, WI

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 2007, pages 821–832. https://doi.org/10.1145/1247480.1247571

Published: 11 June 2007


ABSTRACT

A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
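The following Python sketch makes the storage side of the single-table argument concrete. It is a minimal illustration under stated assumptions and not the authors' technique.

It assumes a row format in which each object stores only its non-null attributes as name/value pairs. Storage then grows with the number of non-null values rather than with the width of the schema, and an absent attribute reads back as NULL. The class `SparseWideTable` and its methods `insert`, `get`, `select`, and `find_attributes` are hypothetical, as is the sample product data. `find_attributes` is a naive substring search. It only stands in for the much richer attribute-discovery support the abstract calls for when users build ad hoc queries over thousands of attributes.

```python
from typing import Any, Dict, List, Optional, Set


class SparseWideTable:
    """Hypothetical sketch of a wide, sparse table: each row keeps only its non-null attributes."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        self.attributes: Set[str] = set()  # logical (wide) schema: every attribute seen so far

    def insert(self, **values: Any) -> int:
        # Drop NULLs (None) so a row costs space only for the values it actually has.
        row = {name: v for name, v in values.items() if v is not None}
        self.attributes.update(row)
        self.rows.append(row)
        return len(self.rows) - 1  # row id

    def get(self, row_id: int, attr: str) -> Optional[Any]:
        # An attribute the row does not store reads back as NULL (None).
        return self.rows[row_id].get(attr)

    def select(self, **predicates: Any) -> List[int]:
        # Conjunctive equality predicates; a row lacking an attribute never satisfies a predicate on it.
        # Note: this is a full scan; nothing here addresses efficient evaluation.
        return [
            row_id
            for row_id, row in enumerate(self.rows)
            if all(attr in row and row[attr] == v for attr, v in predicates.items())
        ]

    def find_attributes(self, keyword: str) -> List[str]:
        # Naive stand-in for attribute discovery: case-insensitive substring match on names.
        needle = keyword.lower()
        return sorted(a for a in self.attributes if needle in a.lower())


if __name__ == "__main__":
    t = SparseWideTable()
    t.insert(name="camera", megapixels=12, optical_zoom=None)  # optical_zoom is NULL, not stored
    t.insert(name="laptop", ram_gb=16, screen_in=14)
    t.insert(name="phone", megapixels=48, ram_gb=8)
    print(sorted(t.attributes))      # ['megapixels', 'name', 'ram_gb', 'screen_in']
    print(t.select(ram_gb=8))        # [2]
    print(t.get(1, "megapixels"))    # None
    print(t.find_attributes("RAM"))  # ['ram_gb']
```

Dropping NULLs avoids paying for thousands of mostly empty columns. The sketch also shows why storage is only part of the problem. `select` still scans every row, and the query writer must already know exact attribute names. These are the two gaps the abstract names: building ad hoc queries and evaluating them efficiently.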

Index Terms

- Information systems
  - Data management systems

Published in
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007, 1210 pages
ISBN: 9781595936868
DOI: 10.1145/1247480
General Chairs: Lizhu Zhou (Tsinghua University, China); Tok Wang Ling (National University of Singapore, Singapore)
Program Chair: Beng Chin Ooi (National University of Singapore, Singapore)
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
relational
sparse data
wide table
Qualifiers
- Article
Conference

Acceptance Rates
Overall acceptance rate: 785 of 4,003 submissions, 20%

Article Metrics
- Total citations: 40
- Total downloads: 51
- Downloads (last 12 months): 9
- Downloads (last 6 weeks): 3