Approximate denial constraints

Authors:
Ester Livshits (Technion), Alireza Heidari (University of Waterloo), Ihab F. Ilyas (University of Waterloo), Benny Kimelfeld (Technion)

Proceedings of the VLDB Endowment, Volume 13, Issue 10, pp. 1682–1695. https://doi.org/10.14778/3401960.3401966

Published: 01 June 2020

Abstract

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and to detect rules that are generally correct but may have a few exceptions. It also allows us to avoid overfitting and to obtain constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs, but rather allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
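To make the notion of an "almost" satisfied DC concrete, the following is a minimal Python sketch, not the ADCMiner algorithm itself. It assumes one particular approximation function, the fraction of ordered tuple pairs that violate the constraint, which is only one of the functions the abstract's axioms might admit. The toy relation, the example constraint, the threshold `epsilon`, and all function names are hypothetical and chosen purely for illustration.

```python
import random
from itertools import permutations

# Hypothetical toy relation; attribute names and values are illustrative only.
ROWS = [
    {"state": "NY", "salary": 50_000, "tax": 0.10},
    {"state": "NY", "salary": 60_000, "tax": 0.12},
    {"state": "NY", "salary": 70_000, "tax": 0.11},  # exception: earns more than the row above, taxed less
    {"state": "CA", "salary": 55_000, "tax": 0.09},
]


def violates(t, s):
    """True if the ordered pair (t, s) violates the example DC:
    not (t.state == s.state and t.salary > s.salary and t.tax < s.tax)."""
    return (
        t["state"] == s["state"]
        and t["salary"] > s["salary"]
        and t["tax"] < s["tax"]
    )


def violation_fraction(rows, violates):
    """Assumed approximation function: share of ordered tuple pairs violating the DC."""
    pairs = list(permutations(rows, 2))
    if not pairs:
        return 0.0
    return sum(violates(t, s) for t, s in pairs) / len(pairs)


def is_approximate_dc(rows, violates, epsilon):
    """The DC counts as approximately satisfied if the violation share is at most epsilon."""
    return violation_fraction(rows, violates) <= epsilon


def sampled_violation_fraction(rows, violates, k, seed=0):
    """Estimate the violation share on a uniform sample of at most k tuples."""
    sample = random.Random(seed).sample(rows, min(k, len(rows)))
    return violation_fraction(sample, violates)
```

On the toy relation, exactly one of the 12 ordered pairs violates the constraint, so the violation fraction is 1/12. The DC therefore fails as an exact constraint but is accepted for `epsilon = 0.1`. The sampled variant only hints at the idea of combining mining with sampling; it says nothing about the accuracy guarantees discussed in the paper.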

Published in
Proceedings of the VLDB Endowment, Volume 13, Issue 10
June 2020
193 pages
ISSN: 2150-8097
Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland, Australia)
Publisher: VLDB Endowment
