ABSTRACT
We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution P on Σ^n and a tree T on n nodes, we say T is an ε-approximate tree for P if there is a T-structured distribution Q such that D(P || Q) is at most ε more than min_{Q'} D(P || Q'), where the minimum ranges over all tree-structured distributions Q'. We show that if P itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information, using O(|Σ|^3 n ε^{-1}) i.i.d. samples, outputs an ε-approximate tree for P with constant probability. In contrast, for a general P (which may not be tree-structured), Ω(n^2 ε^{-2}) samples are necessary to find an ε-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables X, Y, Z, each over Σ, testing whether I(X; Y | Z) is 0 or ≥ ε is possible with O(|Σ|^3/ε) samples. Finally, we show that for a specific tree T, with O(|Σ|^2 n ε^{-1}) samples from a distribution P over Σ^n, one can efficiently learn the closest T-structured distribution in KL divergence by applying the add-1 estimator at each node.
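As an illustration of the procedure the abstract analyzes, the following is a minimal Python sketch of the classical Chow-Liu algorithm with the plug-in mutual-information estimator: estimate I(X_i; X_j) for every pair of coordinates from the samples, then return a maximum-weight spanning tree under those weights. Function names are our own; this is a sketch of the textbook algorithm, not the paper's exact analysis or its conditional-independence tester.

```python
# Sketch of Chow-Liu with the plug-in (empirical) mutual-information
# estimator. Samples are i.i.d. tuples over a finite alphabet.
from collections import Counter
from itertools import combinations
from math import log

def plugin_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) from samples (tuples over Σ^n)."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab * log(p_ab / (p_a * p_b)), with counts rearranged
        mi += p_ab * log(p_ab * n * n / (marg_i[a] * marg_j[b]))
    return mi

def chow_liu_tree(samples, n_vars):
    """Maximum-weight spanning tree on empirical pairwise MI (Kruskal)."""
    edges = sorted(((plugin_mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)),
                   reverse=True)
    parent = list(range(n_vars))        # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:               # greedily add heaviest safe edge
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, on samples over {0,1}^3 in which X_0 = X_1 always and X_2 is empirically independent of both, the edge (0, 1) carries mutual information ln 2 and must appear in the recovered tree.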
Supplemental Material: Appendix (available for download).
- Pieter Abbeel, Daphne Koller, and Andrew Y. Ng. 2006. Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7, 1743–1788.
- Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. 2015. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems.
- Anima Anandkumar, Daniel J. Hsu, Furong Huang, and Sham M. Kakade. 2012. Learning mixtures of tree graphical models. In Advances in Neural Information Processing Systems, 1052–1060.
- András Antos and Ioannis Kontoyiannis. 2001. Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3–4), 163–193.
- Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. 2001. Testing Random Variables for Independence and Identity. In 42nd Annual Symposium on Foundations of Computer Science.
- Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, and N. V. Vinodchandran. 2020. Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning. CoRR, abs/2002.05378. arXiv:2002.05378
- Guy Bresler. 2015. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, 771–782.
- Guy Bresler and Mina Karzand. 2020. Learning a tree-structured Ising model in order to make predictions. Annals of Statistics, 48(2), 713–737. https://doi.org/10.1214/19-AOS1808
- Guy Bresler and Mina Karzand. 2020. Minimax Prediction in Tree Ising Models. In 2020 IEEE International Symposium on Information Theory (ISIT), 1325–1330. https://doi.org/10.1109/ISIT44484.2020.9174341
- Guy Bresler, Elchanan Mossel, and Allan Sly. 2013. Reconstruction of Markov random fields from samples: Some observations and algorithms. SIAM Journal on Computing, 42(2), 563–578.
- Johannes Brustle, Yang Cai, and Constantinos Daskalakis. 2020. Multi-item mechanisms without item-independence: Learnability via robustness. In Proceedings of the 21st ACM Conference on Economics and Computation, 715–761.
- Clément L. Canonne. 2015. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), 22, 63. http://eccc.hpi-web.de/report/2015/063
- Clément L. Canonne. 2020.
- Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2018. Testing conditional independence of discrete distributions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018). ACM, 735–748. https://doi.org/10.1145/3188745.3188756
- Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2020. Testing Bayesian Networks. IEEE Transactions on Information Theory, 66(5), 3132–3170. https://doi.org/10.1109/TIT.2020.2971625
- Anton Chechetka and Carlos Guestrin. 2008. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems, 273–280.
- David Maxwell Chickering. 1995. Learning Bayesian Networks is NP-Complete. In Learning from Data - Fifth International Workshop on Artificial Intelligence and Statistics, AISTATS.
- C. Chow and T. Wagner. 1973. Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory, 19(3), 369–371.
- C. K. Chow and C. N. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462–467. https://doi.org/10.1109/TIT.1968.1054142
- Paul Dagum and Michael Luby. 1997. An optimal approximation algorithm for Bayesian inference. Artificial Intelligence, 93(1), 1–28.
- Sanjoy Dasgupta. 1997. The Sample Complexity of Learning Fixed-Structure Bayesian Networks. Machine Learning, 29(2–3), 165–180. https://doi.org/10.1023/A:1007417612269
- Sanjoy Dasgupta. 2013. Learning Polytrees. CoRR, abs/1301.6688. arXiv:1301.6688
- Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. 2019. Testing Ising Models. IEEE Transactions on Information Theory, 65(11), 6829–6852. https://doi.org/10.1109/TIT.2019.2932255
- Constantinos Daskalakis and Qinxuan Pan. 2017. Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing. In Proceedings of the 30th Conference on Learning Theory, COLT.
- Constantinos Daskalakis and Qinxuan Pan. 2020. Tree-structured Ising models can be learned efficiently. CoRR, abs/2010.14864. arXiv:2010.14864
- Luc Devroye, Abbas Mehrabian, and Tommy Reddad. 2020. The minimax learning rates of normal and Ising undirected graphical models. Electronic Journal of Statistics, 14(1), 2338–2361.
- Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, John Peebles, and Eric Price. 2021. Optimal Testing of Discrete Distributions with High Probability. In STOC 2021.
- Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. 2018. Sample-optimal identity testing with high probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018).
- Ilias Diakonikolas and Daniel M. Kane. 2016. A New Approach for Testing Properties of Discrete Distributions. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS.
- Surbhi Goel. 2020. Learning Ising and Potts Models with Latent Variables. In Proceedings of Machine Learning Research, vol. 108. PMLR, 3557–3566.
- Oded Goldreich. 2017. Introduction to Property Testing. Cambridge University Press. ISBN 978-1-107-19405-2. https://doi.org/10.1017/9781108135252
- Oded Goldreich and Dana Ron. 2011. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography: Miscellanea on the Interplay between Randomness and Computation. Springer, 68–75.
- Klaus-U. Höffgen. 1993. Learning and robust learning of product distributions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, 77–83.
- Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, and Ananda Theertha Suresh. 2015. On Learning Distributions from their Samples. In Proceedings of the 28th Conference on Learning Theory, COLT 2015.
- David R. Karger and Nathan Srebro. 2001. Learning Markov networks: maximum bounded tree-width graphs. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms (SODA 2001). ACM/SIAM, 392–401. http://dl.acm.org/citation.cfm?id=365411.365486
- Michael J. Kearns and Robert E. Schapire. 1994. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3), 464–497.
- Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. 1994. Toward efficient agnostic learning. Machine Learning, 17(2–3), 115–141.
- Adam R. Klivans and Raghu Meka. 2017. Learning Graphical Models Using Multiplicative Weights. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS.
- Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
- Frank R. Kschischang, Brendan J. Frey, and H.-A. Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
- Pierre-Simon Laplace. 1995. Philosophical Essay on Probabilities. Translated by A. I. Dale from the fifth French edition of 1825.
- Steffen L. Lauritzen. 1996. Graphical Models. Oxford Statistical Science Series, vol. 17. Clarendon Press.
- Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, and Larry Wasserman. 2011. Forest density estimation. Journal of Machine Learning Research, 12, 907–951.
- Christopher Meek. 2001. Finding a path is harder than finding a tree. Journal of Artificial Intelligence Research, 15, 383–389.
- Marina Meila. 1999. An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High-Dimensional Sparse Data. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), Bled, Slovenia, June 27-30, 1999. Morgan Kaufmann, 249–257.
- Marina Meila and Michael I. Jordan. 2000. Learning with mixtures of trees. Journal of Machine Learning Research, 1, 1–48.
- Mukund Narasimhan and Jeff Bilmes. 2004. PAC-learning bounded tree-width graphical models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 410–417.
- Liam Paninski. 2003. Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253.
- Ronitt Rubinfeld. 2012. Taming big probability distributions. ACM Crossroads, 19(1), 24–28. https://doi.org/10.1145/2331042.2331052
- Nathan Srebro. 2003. Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence, 143(1), 123–138.
- Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong, and Alan S. Willsky. 2011. A large-deviation analysis of the maximum-likelihood learning of Markov tree structures. IEEE Transactions on Information Theory, 57(3), 1714–1735.
- Anshoo Tandon, Vincent Y. F. Tan, and Shiyao Zhu. 2020. Exact Asymptotics for Learning Tree-Structured Graphical Models with Side Information: Noiseless and Noisy Samples. arXiv preprint arXiv:2005.04354.
- Leslie G. Valiant. 1984. A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
- Thomas Verma and Judea Pearl. 1990. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI '90). Elsevier, 255–270. https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1918&proceeding_id=1006
- Martin J. Wainwright. 2006. Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting. Journal of Machine Learning Research, 7, 1829–1859.
- Martin J. Wainwright and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Now Publishers.
- Rui Wu, R. Srikant, and Jian Ni. 2013. Learning loosely connected Markov random fields. Stochastic Systems, 3(2), 362–404.
- Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. 2019. Sparse logistic regression learns all discrete pairwise graphical models. In Advances in Neural Information Processing Systems, 8071–8081.