ABSTRACT
We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution P on Σ^n and a tree T on n nodes, we say T is an ε-approximate tree for P if there is a T-structured distribution Q such that D(P || Q) is at most ε more than min_{Q'} D(P || Q'), where the minimum ranges over all tree-structured distributions Q'. We show that if P itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information, using O(|Σ|^3 n ε^{-1}) i.i.d. samples, outputs an ε-approximate tree for P with constant probability. In contrast, for a general P (which may not be tree-structured), Ω(n^2 ε^{-2}) samples are necessary to find an ε-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables X, Y, Z, each over Σ, testing whether I(X; Y | Z) is 0 or ≥ ε is possible with O(|Σ|^3/ε) samples. Finally, we show that for a specific tree T, with O(|Σ|^2 n ε^{-1}) samples from a distribution P over Σ^n, one can efficiently learn the closest T-structured distribution in KL divergence by applying the add-1 estimator at each node.
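As an illustration of the procedure the abstract analyzes, the following is a minimal Python sketch of the classical Chow-Liu algorithm with the plug-in mutual-information estimator: estimate I(X_i; X_j) for every pair of coordinates from the samples, then return a maximum-weight spanning tree under those weights. Function names are our own; this is a sketch of the textbook algorithm, not the paper's exact analysis or its conditional-independence tester.

```python
# Sketch of Chow-Liu with the plug-in (empirical) mutual-information
# estimator. Samples are i.i.d. tuples over a finite alphabet.
from collections import Counter
from itertools import combinations
from math import log

def plugin_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) from samples (tuples over Σ^n)."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab * log(p_ab / (p_a * p_b)), with counts rearranged
        mi += p_ab * log(p_ab * n * n / (marg_i[a] * marg_j[b]))
    return mi

def chow_liu_tree(samples, n_vars):
    """Maximum-weight spanning tree on empirical pairwise MI (Kruskal)."""
    edges = sorted(((plugin_mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)),
                   reverse=True)
    parent = list(range(n_vars))        # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:               # greedily add heaviest safe edge
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, on samples over {0,1}^3 in which X_0 = X_1 always and X_2 is empirically independent of both, the edge (0, 1) carries mutual information ln 2 and must appear in the recovered tree.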
Supplemental Material: Appendix (available for download).
- Pieter Abbeel, Daphne Koller, and Andrew Y. Ng. 2006. Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7, 1743–1788.
- Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. 2015. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems.
- Anima Anandkumar, Daniel J. Hsu, Furong Huang, and Sham M. Kakade. 2012. Learning mixtures of tree graphical models. In Advances in Neural Information Processing Systems, 1052–1060.
- András Antos and Ioannis Kontoyiannis. 2001. Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3–4), 163–193.
- Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. 2001. Testing Random Variables for Independence and Identity. In 42nd Annual Symposium on Foundations of Computer Science.
- Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, and N. V. Vinodchandran. 2020. Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning. CoRR, abs/2002.05378. arXiv:2002.05378
- Guy Bresler. 2015. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, 771–782.
- Guy Bresler and Mina Karzand. 2020. Learning a tree-structured Ising model in order to make predictions. Annals of Statistics, 48(2), 713–737. https://doi.org/10.1214/19-AOS1808
- Guy Bresler and Mina Karzand. 2020. Minimax Prediction in Tree Ising Models. In 2020 IEEE International Symposium on Information Theory (ISIT), 1325–1330. https://doi.org/10.1109/ISIT44484.2020.9174341
- Guy Bresler, Elchanan Mossel, and Allan Sly. 2013. Reconstruction of Markov random fields from samples: Some observations and algorithms. SIAM Journal on Computing, 42(2), 563–578.
- Johannes Brustle, Yang Cai, and Constantinos Daskalakis. 2020. Multi-item mechanisms without item-independence: Learnability via robustness. In Proceedings of the 21st ACM Conference on Economics and Computation, 715–761.
- Clément L. Canonne. 2015. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), 22, 63. http://eccc.hpi-web.de/report/2015/063
- Clément L. Canonne. 2020.
- Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2018. Testing conditional independence of discrete distributions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018). ACM, 735–748. https://doi.org/10.1145/3188745.3188756
- Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2020. Testing Bayesian Networks. IEEE Transactions on Information Theory, 66(5), 3132–3170. https://doi.org/10.1109/TIT.2020.2971625
- Anton Chechetka and Carlos Guestrin. 2008. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems, 273–280.
- David Maxwell Chickering. 1995. Learning Bayesian Networks is NP-Complete. In Learning from Data - Fifth International Workshop on Artificial Intelligence and Statistics, AISTATS.
- C. Chow and T. Wagner. 1973. Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory, 19(3), 369–371.
- C. K. Chow and C. N. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462–467. https://doi.org/10.1109/TIT.1968.1054142
- Paul Dagum and Michael Luby. 1997. An optimal approximation algorithm for Bayesian inference. Artificial Intelligence, 93(1), 1–28.
- Sanjoy Dasgupta. 1997. The Sample Complexity of Learning Fixed-Structure Bayesian Networks. Machine Learning, 29(2–3), 165–180. https://doi.org/10.1023/A:1007417612269
- Sanjoy Dasgupta. 2013. Learning Polytrees. CoRR, abs/1301.6688. arXiv:1301.6688
- Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. 2019. Testing Ising Models. IEEE Transactions on Information Theory, 65(11), 6829–6852. https://doi.org/10.1109/TIT.2019.2932255
- Constantinos Daskalakis and Qinxuan Pan. 2017. Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing. In Proceedings of the 30th Conference on Learning Theory, COLT.
- Constantinos Daskalakis and Qinxuan Pan. 2020. Tree-structured Ising models can be learned efficiently. CoRR, abs/2010.14864. arXiv:2010.14864
- Luc Devroye, Abbas Mehrabian, and Tommy Reddad. 2020. The minimax learning rates of normal and Ising undirected graphical models. Electronic Journal of Statistics, 14(1), 2338–2361.
- Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, John Peebles, and Eric Price. 2021. Optimal Testing of Discrete Distributions with High Probability. In STOC 2021.
- Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. 2018. Sample-optimal identity testing with high probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018).
- Ilias Diakonikolas and Daniel M. Kane. 2016. A New Approach for Testing Properties of Discrete Distributions. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS.
- Surbhi Goel. 2020. Learning Ising and Potts Models with Latent Variables. In Proceedings of Machine Learning Research, vol. 108. PMLR, 3557–3566.
- Oded Goldreich. 2017. Introduction to Property Testing. Cambridge University Press. ISBN 978-1-107-19405-2. https://doi.org/10.1017/9781108135252
- Oded Goldreich and Dana Ron. 2011. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography: Miscellanea on the Interplay between Randomness and Computation. Springer, 68–75.
- Klaus-U. Höffgen. 1993. Learning and robust learning of product distributions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, 77–83.
- Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, and Ananda Theertha Suresh. 2015. On Learning Distributions from their Samples. In Proceedings of the 28th Conference on Learning Theory, COLT 2015.
- David R. Karger and Nathan Srebro. 2001. Learning Markov networks: maximum bounded tree-width graphs. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms (SODA 2001). ACM/SIAM, 392–401. http://dl.acm.org/citation.cfm?id=365411.365486
- Michael J. Kearns and Robert E. Schapire. 1994. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3), 464–497.
- Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. 1994. Toward efficient agnostic learning. Machine Learning, 17(2–3), 115–141.
- Adam R. Klivans and Raghu Meka. 2017. Learning Graphical Models Using Multiplicative Weights. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS.
- Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
- Frank R. Kschischang, Brendan J. Frey, and H.-A. Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
- Pierre-Simon Laplace. 1995. Philosophical Essay on Probabilities. Translated by A. I. Dale from the fifth French edition of 1825.
- Steffen L. Lauritzen. 1996. Graphical Models. Oxford Statistical Science Series, vol. 17. Clarendon Press.
- Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, and Larry Wasserman. 2011. Forest density estimation. Journal of Machine Learning Research, 12, 907–951.
- Christopher Meek. 2001. Finding a path is harder than finding a tree. Journal of Artificial Intelligence Research, 15, 383–389.
- Marina Meila. 1999. An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High-Dimensional Sparse Data. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), Bled, Slovenia, June 27-30, 1999. Morgan Kaufmann, 249–257.
- Marina Meila and Michael I. Jordan. 2000. Learning with mixtures of trees. Journal of Machine Learning Research, 1, 1–48.
- Mukund Narasimhan and Jeff Bilmes. 2004. PAC-learning bounded tree-width graphical models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 410–417.
- Liam Paninski. 2003. Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253.
- Ronitt Rubinfeld. 2012. Taming big probability distributions. ACM Crossroads, 19(1), 24–28. https://doi.org/10.1145/2331042.2331052
- Nathan Srebro. 2003. Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence, 143(1), 123–138.
- Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong, and Alan S. Willsky. 2011. A large-deviation analysis of the maximum-likelihood learning of Markov tree structures. IEEE Transactions on Information Theory, 57(3), 1714–1735.
- Anshoo Tandon, Vincent Y. F. Tan, and Shiyao Zhu. 2020. Exact Asymptotics for Learning Tree-Structured Graphical Models with Side Information: Noiseless and Noisy Samples. arXiv preprint arXiv:2005.04354.
- Leslie G. Valiant. 1984. A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
- Thomas Verma and Judea Pearl. 1990. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI '90). Elsevier, 255–270. https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1918&proceeding_id=1006
- Martin J. Wainwright. 2006. Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting. Journal of Machine Learning Research, 7, 1829–1859.
- Martin J. Wainwright and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Now Publishers.
- Rui Wu, R. Srikant, and Jian Ni. 2013. Learning loosely connected Markov random fields. Stochastic Systems, 3(2), 362–404.
- Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. 2019. Sparse logistic regression learns all discrete pairwise graphical models. In Advances in Neural Information Processing Systems, 8071–8081.