DOI: 10.1145/3406325.3451066
STOC 2021 · Research Article · Open Access

Near-optimal learning of tree-structured distributions by Chow-Liu

Published: 15 June 2021

ABSTRACT

We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution P on Σ^n and a tree T on n nodes, we say T is an ε-approximate tree for P if there is a T-structured distribution Q such that D(P || Q) is at most ε more than the best achievable by any tree-structured distribution for P. We show that if P itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information, given O(|Σ|^3 n/ε) i.i.d. samples, outputs an ε-approximate tree for P with constant probability. In contrast, for a general P (which may not be tree-structured), Ω(n^2/ε^2) samples are necessary to find an ε-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables X, Y, Z each over Σ, testing whether I(X; Y | Z) is 0 or ≥ ε is possible with O(|Σ|^3/ε) samples. Finally, we show that for a specific tree T, with O(|Σ|^2 n/ε) samples from a distribution P over Σ^n, one can efficiently learn the closest T-structured distribution in KL divergence by applying the add-1 estimator at each node.
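
To make the abstract's pipeline concrete, here is a minimal Python sketch, assuming samples arrive as tuples over a finite alphabet. It is an illustration, not the paper's implementation or analysis: the function names (`plugin_mutual_information`, `chow_liu_tree`, `add1_conditional`) are assumptions, and the sample-complexity machinery that drives the paper's guarantees is omitted.

```python
# A minimal sketch (assumed names, not the paper's code) of the Chow-Liu
# pipeline from the abstract: plug-in mutual information, a maximum-weight
# spanning tree, and add-1 (Laplace) conditional estimates for a fixed tree.
import math
from collections import Counter
from itertools import combinations


def plugin_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j): empirical marginals and the
    empirical joint are substituted into the mutual-information formula."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    left = Counter(s[i] for s in samples)
    right = Counter(s[j] for s in samples)
    # I(X; Y) = sum_{a,b} p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


def chow_liu_tree(samples, num_vars):
    """Chow-Liu step: maximum-weight spanning tree over plug-in mutual
    informations, via Kruskal's algorithm with a union-find structure."""
    weight = {e: plugin_mutual_information(samples, *e)
              for e in combinations(range(num_vars), 2)}
    parent = list(range(num_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for (i, j) in sorted(weight, key=weight.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


def add1_conditional(samples, child, par, alphabet):
    """Add-1 (Laplace) estimate of P(X_child = a | X_par = b), the
    per-node estimator used to fit a fixed tree structure T."""
    pair = Counter((s[par], s[child]) for s in samples)
    marg = Counter(s[par] for s in samples)
    k = len(alphabet)
    return {(b, a): (pair[(b, a)] + 1) / (marg[b] + k)
            for b in alphabet for a in alphabet}
```

For instance, on samples = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)], chow_liu_tree(samples, 3) places the edge (0, 1) in the tree, since the first two coordinates are perfectly correlated while the third is empirically independent of them. The paper's conditional independence tester, which decides whether I(X; Y | Z) is 0 or at least ε from O(|Σ|^3/ε) samples, is the one ingredient this sketch makes no attempt to reproduce.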

References

  1. Pieter Abbeel, Daphne Koller, and Andrew Y. Ng. 2006. Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7, 2006. Pages 1743–1788.
  2. Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. 2015. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems.
  3. Anima Anandkumar, Daniel J. Hsu, Furong Huang, and Sham M. Kakade. 2012. Learning mixtures of tree graphical models. In Advances in Neural Information Processing Systems. Pages 1052–1060.
  4. András Antos and Ioannis Kontoyiannis. 2001. Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19, 3-4, 2001. Pages 163–193.
  5. Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. 2001. Testing Random Variables for Independence and Identity. In 42nd Annual Symposium on Foundations of Computer Science.
  6. Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, and N. V. Vinodchandran. 2020. Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning. CoRR, abs/2002.05378, 2020. arXiv:2002.05378
  7. Guy Bresler. 2015. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing. Pages 771–782.
  8. Guy Bresler and Mina Karzand. 2020. Learning a tree-structured Ising model in order to make predictions. Annals of Statistics, 48, 2, 2020. Pages 713–737. https://doi.org/10.1214/19-AOS1808
  9. Guy Bresler and Mina Karzand. 2020. Minimax Prediction in Tree Ising Models. In 2020 IEEE International Symposium on Information Theory (ISIT). Pages 1325–1330. https://doi.org/10.1109/ISIT44484.2020.9174341
  10. Guy Bresler, Elchanan Mossel, and Allan Sly. 2013. Reconstruction of Markov random fields from samples: Some observations and algorithms. SIAM Journal on Computing, 42, 2, 2013. Pages 563–578.
  11. Johannes Brustle, Yang Cai, and Constantinos Daskalakis. 2020. Multi-item mechanisms without item-independence: Learnability via robustness. In Proceedings of the 21st ACM Conference on Economics and Computation. Pages 715–761.
  12. Clément L. Canonne. 2015. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), Report No. 63, 2015. http://eccc.hpi-web.de/report/2015/063
  13. Clément L. Canonne. 2020.
  14. Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2018. Testing conditional independence of discrete distributions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018). ACM. Pages 735–748. https://doi.org/10.1145/3188745.3188756
  15. Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. 2020. Testing Bayesian Networks. IEEE Transactions on Information Theory, 66, 5, 2020. Pages 3132–3170. https://doi.org/10.1109/TIT.2020.2971625
  16. Anton Chechetka and Carlos Guestrin. 2008. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems. Pages 273–280.
  17. David Maxwell Chickering. 1995. Learning Bayesian Networks is NP-Complete. In Learning from Data: Fifth International Workshop on Artificial Intelligence and Statistics (AISTATS).
  18. C. K. Chow and T. Wagner. 1973. Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory, 19, 3, 1973. Pages 369–371.
  19. C. K. Chow and C. N. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14, 3, 1968. Pages 462–467. https://doi.org/10.1109/TIT.1968.1054142
  20. Paul Dagum and Michael Luby. 1997. An optimal approximation algorithm for Bayesian inference. Artificial Intelligence, 93, 1, 1997. Pages 1–28.
  21. Sanjoy Dasgupta. 1997. The Sample Complexity of Learning Fixed-Structure Bayesian Networks. Machine Learning, 29, 2-3, 1997. Pages 165–180. https://doi.org/10.1023/A:1007417612269
  22. Sanjoy Dasgupta. 2013. Learning Polytrees. CoRR, abs/1301.6688, 2013. arXiv:1301.6688
  23. Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. 2019. Testing Ising Models. IEEE Transactions on Information Theory, 65, 11, 2019. Pages 6829–6852. https://doi.org/10.1109/TIT.2019.2932255
  24. Constantinos Daskalakis and Qinxuan Pan. 2017. Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing. In Proceedings of the 30th Conference on Learning Theory (COLT).
  25. Constantinos Daskalakis and Qinxuan Pan. 2020. Tree-structured Ising models can be learned efficiently. CoRR, abs/2010.14864, 2020. arXiv:2010.14864
  26. Luc Devroye, Abbas Mehrabian, and Tommy Reddad. 2020. The minimax learning rates of normal and Ising undirected graphical models. Electronic Journal of Statistics, 14, 1, 2020. Pages 2338–2361.
  27. Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, John Peebles, and Eric Price. 2021. Optimal Testing of Discrete Distributions with High Probability. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC 2021).
  28. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. 2018. Sample-optimal identity testing with high probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018).
  29. Ilias Diakonikolas and Daniel M. Kane. 2016. A New Approach for Testing Properties of Discrete Distributions. In IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).
  30. Surbhi Goel. 2020. Learning Ising and Potts Models with Latent Variables. In Proceedings of Machine Learning Research, 108. PMLR. Pages 3557–3566.
  31. Oded Goldreich. 2017. Introduction to Property Testing. Cambridge University Press. ISBN 978-1-107-19405-2. https://doi.org/10.1017/9781108135252
  32. Oded Goldreich and Dana Ron. 2011. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography: Miscellanea on the Interplay between Randomness and Computation. Springer. Pages 68–75.
  33. Klaus-U. Höffgen. 1993. Learning and robust learning of product distributions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory. Pages 77–83.
  34. Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, and Ananda Theertha Suresh. 2015. On Learning Distributions from their Samples. In Proceedings of the 28th Conference on Learning Theory (COLT 2015).
  35. David R. Karger and Nathan Srebro. 2001. Learning Markov networks: maximum bounded tree-width graphs. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2001). ACM/SIAM. Pages 392–401. http://dl.acm.org/citation.cfm?id=365411.365486
  36. Michael J. Kearns and Robert E. Schapire. 1994. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 3, 1994. Pages 464–497.
  37. Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. 1994. Toward efficient agnostic learning. Machine Learning, 17, 2-3, 1994. Pages 115–141.
  38. Adam R. Klivans and Raghu Meka. 2017. Learning Graphical Models Using Multiplicative Weights. In 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS).
  39. Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
  40. Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47, 2, 2001. Pages 498–519.
  41. P. S. Laplace. 1995. Philosophical Essays on Probabilities. Translated by A. I. Dale from the 5th French edition (1825).
  42. Steffen L. Lauritzen. 1996. Graphical Models. Vol. 17. Clarendon Press.
  43. Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, and Larry Wasserman. 2011. Forest density estimation. Journal of Machine Learning Research, 12, 2011. Pages 907–951.
  44. Christopher Meek. 2001. Finding a path is harder than finding a tree. Journal of Artificial Intelligence Research, 15, 2001. Pages 383–389.
  45. Marina Meila. 1999. An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High-Dimensional Sparse Data. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999). Morgan Kaufmann. Pages 249–257.
  46. Marina Meila and Michael I. Jordan. 2000. Learning with mixtures of trees. Journal of Machine Learning Research, 1, 2000. Pages 1–48.
  47. Mukund Narasimhan and Jeff Bilmes. 2004. PAC-learning bounded tree-width graphical models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. Pages 410–417.
  48. Liam Paninski. 2003. Estimation of entropy and mutual information. Neural Computation, 15, 6, 2003. Pages 1191–1253.
  49. Ronitt Rubinfeld. 2012. Taming big probability distributions. ACM Crossroads, 19, 1, 2012. Pages 24–28. https://doi.org/10.1145/2331042.2331052
  50. Nathan Srebro. 2003. Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence, 143, 1, 2003. Pages 123–138.
  51. Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong, and Alan S. Willsky. 2011. A large-deviation analysis of the maximum-likelihood learning of Markov tree structures. IEEE Transactions on Information Theory, 57, 3, 2011. Pages 1714–1735.
  52. Anshoo Tandon, Vincent Y. F. Tan, and Shiyao Zhu. 2020. Exact Asymptotics for Learning Tree-Structured Graphical Models with Side Information: Noiseless and Noisy Samples. arXiv preprint arXiv:2005.04354, 2020.
  53. Leslie G. Valiant. 1984. A theory of the learnable. Communications of the ACM, 27, 11, 1984. Pages 1134–1142.
  54. Thomas Verma and Judea Pearl. 1990. Equivalence and synthesis of causal models. In UAI '90: Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence. Elsevier. Pages 255–270. https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1918&proceeding_id=1006
  55. Martin J. Wainwright. 2006. Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting. Journal of Machine Learning Research, 7, 2006. Pages 1829–1859.
  56. Martin J. Wainwright and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Now Publishers.
  57. Rui Wu, R. Srikant, and Jian Ni. 2013. Learning loosely connected Markov random fields. Stochastic Systems, 3, 2, 2013. Pages 362–404.
  58. Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. 2019. Sparse logistic regression learns all discrete pairwise graphical models. In Advances in Neural Information Processing Systems. Pages 8071–8081.
