DOI: 10.1145/1247480.1247571
Article

The case for a wide-table approach to manage sparse relational data sets

Published: 11 June 2007

ABSTRACT

A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
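The single-table idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `SparseWideTable` class that stores each row as a dictionary of its non-null attributes only (in the spirit of the interpreted attribute storage format studied by Beckmann et al., ICDE 2006), keeps a hypothetical per-attribute set of rows with non-null values, offers a keyword lookup over attribute names as a crude stand-in for ad hoc query building, and evaluates conjunctive equality queries by intersecting those per-attribute sets. None of these names come from the paper; this is a sketch of the general idea under those assumptions, not the authors' techniques.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class SparseWideTable:
    """Hypothetical wide, sparse table (illustration only, not the paper's code).

    Each row keeps only its non-null attributes, so an object with 5 of
    5,000 possible attributes costs 5 entries rather than 5,000 slots.
    """
    rows: List[Dict[str, Any]] = field(default_factory=list)
    # attribute name -> ids of rows whose value for that attribute is non-null
    nonnull_index: Dict[str, Set[int]] = field(default_factory=dict)

    def insert(self, row: Dict[str, Any]) -> int:
        # Drop nulls (None) so they take no space in the stored row.
        stored = {attr: val for attr, val in row.items() if val is not None}
        rid = len(self.rows)
        self.rows.append(stored)
        for attr in stored:
            self.nonnull_index.setdefault(attr, set()).add(rid)
        return rid

    def find_attributes(self, keyword: str) -> List[str]:
        # Case-insensitive substring match over known attribute names, a crude
        # aid for writing ad hoc queries when there are thousands of columns.
        kw = keyword.lower()
        return sorted(a for a in self.nonnull_index if kw in a.lower())

    def select(self, predicates: Dict[str, Any]) -> List[int]:
        """Row ids satisfying every equality predicate attr == value.

        A null never satisfies an equality predicate, so only rows that are
        non-null on all queried attributes can qualify.
        """
        if not predicates:
            return list(range(len(self.rows)))
        # Intersect starting from the rarest attribute to keep sets small.
        attrs = sorted(predicates, key=lambda a: len(self.nonnull_index.get(a, ())))
        candidates = set(self.nonnull_index.get(attrs[0], ()))
        for attr in attrs[1:]:
            candidates &= self.nonnull_index.get(attr, set())
            if not candidates:
                break
        # Check the actual values only for the surviving candidates.
        return sorted(
            rid for rid in candidates
            if all(self.rows[rid][a] == v for a, v in predicates.items())
        )


if __name__ == "__main__":
    t = SparseWideTable()
    t.insert({"type": "camera", "megapixels": 8, "zoom": None})
    t.insert({"type": "laptop", "ram_gb": 2, "screen_in": 14})
    t.insert({"type": "camera", "megapixels": 10, "zoom": 3})
    print(t.find_attributes("mega"))                  # ['megapixels']
    print(t.select({"type": "camera", "zoom": 3}))    # [2]
```

In the example, the query on `type` and `zoom` starts from the single row whose `zoom` is non-null instead of scanning every row. Starting from the smallest non-null set is just one plausible design choice for cheap intersections; a real RDBMS would rely on its own storage layer and indexing (for instance partial or sparse indexes) rather than in-memory Python sets.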


Published in: SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data (ACM Conferences), June 2007, 1210 pages
ISBN: 9781595936868
DOI: 10.1145/1247480
General Chairs: Lizhu Zhou, Tok Wang Ling
Program Chair: Beng Chin Ooi

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 785 of 4,003 submissions, 20%
