research-article

Approximate denial constraints

Published: 01 June 2020

Abstract

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and to detect rules that are generally correct but may have a few exceptions. It also allows us to avoid overfitting and to obtain constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs; rather, it allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
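The abstract deliberately leaves the approximation function open. To make "almost satisfied" concrete, here is a minimal sketch under stated assumptions: a DC is represented as a list of predicates, each comparing an attribute of a tuple t with an attribute of another tuple t', and the example measure is the fraction of ordered pairs of distinct tuples that violate the DC. This measure is only one illustrative choice, not necessarily one the paper adopts, and the names `violates`, `violation_fraction`, `approximately_holds`, the threshold `epsilon`, and the zip/city data are all hypothetical.

```python
import operator
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Hypothetical representation of a predicate: (attribute of t, comparison, attribute of t').
Predicate = Tuple[str, Callable[[object, object], bool], str]


def violates(dc: List[Predicate], t: Row, s: Row) -> bool:
    """A DC forbids the conjunction of its predicates, so the pair (t, s)
    violates it exactly when every predicate holds."""
    return all(op(t[a], s[b]) for a, op, b in dc)


def violation_fraction(dc: List[Predicate], rows: List[Row]) -> float:
    """Hypothetical example measure: share of ordered pairs of distinct tuples
    that violate the DC. Brute force over all n * (n - 1) ordered pairs."""
    n = len(rows)
    if n < 2:
        return 0.0
    bad = sum(1 for t, s in permutations(rows, 2) if violates(dc, t, s))
    return bad / (n * (n - 1))


def approximately_holds(dc: List[Predicate], rows: List[Row], epsilon: float) -> bool:
    """The DC is treated as 'almost' satisfied when the measure is at most epsilon."""
    return violation_fraction(dc, rows) <= epsilon


# The FD zip -> city written as a DC: not (t.zip = t'.zip and t.city != t'.city).
fd_zip_city: List[Predicate] = [("zip", operator.eq, "zip"), ("city", operator.ne, "city")]

# Toy data with one inconsistent spelling ("NYC").
rows: List[Row] = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94103", "city": "San Francisco"},
    {"zip": "94103", "city": "San Francisco"},
]

print(violation_fraction(fd_zip_city, rows))         # 0.2 (4 of 20 ordered pairs)
print(approximately_holds(fd_zip_city, rows, 0.0))   # False: the exact DC fails
print(approximately_holds(fd_zip_city, rows, 0.25))  # True: the approximate DC holds
```

This check only evaluates one given DC; it is not ADCMiner, which mines DCs from data and, per the abstract, accepts any approximation function satisfying the paper's axioms.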

  • Published in

    Proceedings of the VLDB Endowment, Volume 13, Issue 10
    June 2020
    193 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment
