
Statistical Algorithms and a Lower Bound for Detecting Planted Cliques

Published: 15 April 2017

Abstract

We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, for example, moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing, can be implemented in this framework. Our framework is based on, and generalizes, the statistical query model in learning theory [Kearns 1998].
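
The restricted access described above can be illustrated with a minimal sketch of a statistical query oracle. The sketch assumes the standard tolerance model from the statistical query literature [Kearns 1998]: the oracle may return any value within `tau` of the true expectation of the queried function. The names `make_stat_oracle`, `stat`, `tau`, and the `biased_bit` distribution are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import random
from typing import Callable, Dict, Hashable


def make_stat_oracle(dist: Dict[Hashable, float], tau: float, seed: int = 0):
    """Simulate a statistical query oracle for a finite distribution.

    dist maps each outcome to its probability; tau is the assumed tolerance.
    """
    rng = random.Random(seed)

    def stat(h: Callable[[Hashable], float]) -> float:
        # Exact expectation of h under dist; the algorithm never sees this value.
        expectation = sum(p * h(x) for x, p in dist.items())
        # Any answer within tau of the expectation is a legal response,
        # so the simulation perturbs the exact value by at most tau.
        return expectation + rng.uniform(-tau, tau)

    return stat


# Hypothetical example: a biased bit that equals 1 with probability 0.7.
biased_bit = {0: 0.3, 1: 0.7}
stat = make_stat_oracle(biased_bit, tau=0.01)

# Query: "how often is the bit equal to 1?"
estimate = stat(lambda x: float(x))
print(estimate)
```

An algorithm in this model interacts with the data only through calls such as `stat(h)`; it never receives individual samples, which is the restriction the lower bounds exploit.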

Our main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^{1/2 − δ}) for any constant δ > 0. The assumed hardness of variants of these problems has been used to prove hardness of several other problems and as a guarantee for security in cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
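
The abstract does not define the planted bipartite clique distribution, so the following sketch rests on an assumed formulation: each sample is a vector in {0,1}^n; with probability k/n the k coordinates of a hidden set are forced to 1 and the rest are uniform, and otherwise the whole vector is uniform. Under that assumption a planted coordinate has mean 1/2 + k/(2n), while any other coordinate has mean 1/2. The names `sample_planted`, `coordinate_mean`, and the example parameters are hypothetical.

```python
import random
from typing import List, Set


def sample_planted(n: int, planted: Set[int], rng: random.Random) -> List[int]:
    """Draw one vector in {0,1}^n from the assumed planted distribution."""
    k = len(planted)
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform background bits
    if rng.random() < k / n:                   # with probability k/n ...
        for i in planted:                      # ... force planted coordinates to 1
            x[i] = 1
    return x


def coordinate_mean(n: int, k: int, in_planted: bool) -> float:
    """Exact E[x_i] under the assumed model."""
    return 0.5 + k / (2 * n) if in_planted else 0.5


rng = random.Random(0)
n, planted = 100, {3, 17, 42, 58, 91}
draws = [sample_planted(n, planted, rng) for _ in range(2000)]

# Empirical frequency of a planted coordinate versus an unplanted one.
freq_in = sum(x[3] for x in draws) / len(draws)
freq_out = sum(x[0] for x in draws) / len(draws)
print(freq_in, coordinate_mean(n, len(planted), True))
print(freq_out, coordinate_mean(n, len(planted), False))
```

In this assumed model the per-coordinate signal is only k/(2n), which gives some intuition for why an oracle that answers with limited accuracy may struggle when k is small relative to n.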

References

  1. N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. 2007. Testing k-wise and almost k-wise independence. In STOC. 496--505.
  2. Noga Alon, Michael Krivelevich, and Benny Sudakov. 1998. Finding a large hidden clique in a random graph. In SODA. 594--598.
  3. Brendan P. W. Ames and Stephen A. Vavasis. 2011. Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129, 1 (2011), 69--89.
  4. Benny Applebaum, Boaz Barak, and Avi Wigderson. 2010. Public-key cryptography from different assumptions. In STOC. 171--180.
  5. Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge. 2010. Computational complexity and information asymmetry in financial products (extended abstract). In ICS. 49--65.
  6. P. Bartlett and S. Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 (2002), 463--482.
  7. Alexandre Belloni, Robert M. Freund, and Santosh Vempala. 2009. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res. 34, 3 (2009), 621--641.
  8. Shai Ben-David and Eli Dichterman. 1998. Learning with restricted focus of attention. J. Comput. Syst. Sci. 56, 3 (1998), 277--298.
  9. Quentin Berthet and Philippe Rigollet. 2013. Complexity theoretic lower bounds for sparse principal component detection. In COLT. 1046--1066.
  10. Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. 2010. Detecting high log-densities: An O(n^{1/4}) approximation for densest k-subgraph. In STOC. 201--210.
  11. Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In SODA. 388--405.
  12. A. Blum, C. Dwork, F. McSherry, and K. Nissim. 2005. Practical privacy: The SuLQ framework. In PODS. 128--138.
  13. Avrim Blum, Alan M. Frieze, Ravi Kannan, and Santosh Vempala. 1998. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica 22, 1/2 (1998), 35--52.
  14. Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. 1994. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC. 253--262.
  15. Guy Bresler, David Gamarnik, and Devavrat Shah. 2014. Structure learning of antiferromagnetic Ising models. In NIPS. 2852--2860.
  16. S. Brubaker and S. Vempala. 2009. Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Vol. 5687. 406--419.
  17. T. T. Cai, T. Liang, and A. Rakhlin. 2015. Computational and statistical boundaries for submatrix localization in a large noisy matrix. ArXiv E-prints (Feb. 2015).
  18. C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. 2006. Map-reduce for machine learning on multicore. In NIPS. 281--288.
  19. Amin Coja-Oghlan. 2010. Graph partitioning via adaptive spectral techniques. Combin. Probab. Comput. 19, 2 (2010), 227--284.
  20. Y. Dekel, O. Gurel-Gurevich, and Y. Peres. 2011. Finding hidden cliques in linear time with high probability. In ANALCO. 67--75.
  21. A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1 (1977), 1--38.
  22. Yash Deshpande and Andrea Montanari. 2015a. Finding hidden cliques of size √(N/e) in nearly linear time. Found. Comput. Math. 15, 4 (Aug. 2015), 1069--1128.
  23. Yash Deshpande and Andrea Montanari. 2015b. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems. In COLT. 523--562.
  24. Shaddin Dughmi. 2014. On the hardness of signaling. In FOCS. 354--363.
  25. John Dunagan and Santosh Vempala. 2008. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program. 114, 1 (2008), 101--114.
  26. Uriel Feige. 2002. Relations between average case complexity and approximation complexity. In IEEE Conference on Computational Complexity. 5.
  27. U. Feige and R. Krauthgamer. 2000. Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algor. 16, 2 (2000), 195--208.
  28. Uriel Feige and Robert Krauthgamer. 2003. The probable value of the Lovász--Schrijver relaxations for maximum independent set. SIAM J. Comput. 32, 2 (2003), 345--370.
  29. U. Feige and D. Ron. 2010. Finding hidden cliques in linear time. In AofA. 189--204.
  30. V. Feldman. 2008. Evolvability from learning algorithms. In STOC. 619--628.
  31. V. Feldman. 2012. A complete characterization of statistical query learning with applications to evolvability. J. Comput. Syst. Sci. 78, 5 (2012), 1444--1459.
  32. Vitaly Feldman. 2014. Open problem: The statistical query complexity of learning sparse halfspaces. In COLT. 1283--1289.
  33. Vitaly Feldman. 2016. A general characterization of the statistical query complexity. CoRR abs/1608.02198 (2016). Retrieved from http://arxiv.org/abs/1608.02198.
  34. Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. 2015. Statistical query algorithms for stochastic convex optimization. CoRR abs/1512.09170 (2015). Extended abstract in SODA 2017.
  35. Vitaly Feldman, Will Perkins, and Santosh Vempala. 2013. On the complexity of random satisfiability problems with planted solutions. CoRR abs/1311.4821 (2013). Extended abstract in STOC 2015.
  36. Alan M. Frieze and Ravi Kannan. 2008. A new approach to the planted clique problem. In FSTTCS. 187--198.
  37. C. Gao, Z. Ma, and H. H. Zhou. 2014. Sparse CCA: Adaptive estimation and computational barriers. ArXiv E-prints (Sept. 2014).
  38. A. E. Gelfand and A. F. M. Smith. 1990. Sampling based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85 (1990), 398--409.
  39. Bruce E. Hajek, Yihong Wu, and Jiaming Xu. 2015. Computational lower bounds for community detection on random graphs. In COLT. 899--928. Retrieved from http://jmlr.org/proceedings/papers/v40/Hajek15.html.
  40. Johan Håstad. 2001. Some optimal inapproximability results. J. ACM 48, 4 (July 2001), 798--859.
  41. W. K. Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1 (1970), 97--109.
  42. Elad Hazan and Robert Krauthgamer. 2011. How hard is it to approximate the best Nash equilibrium? SIAM J. Comput. 40, 1 (2011), 79--91.
  43. Mark Jerrum. 1992. Large cliques elude the Metropolis process. Random Struct. Algor. 3, 4 (1992), 347--360.
  44. Ari Juels and Marcus Peinado. 2000. Hiding cliques for cryptographic security. Des. Codes Cryptogr. 20, 3 (2000), 269--280.
  45. Ravi Kannan. 2008. Personal communication.
  46. R. Karp. 1979. Probabilistic analysis of graph-theoretic algorithms. In Proceedings of Computer Science and Statistics 12th Annual Symposium on the Interface. 173.
  47. Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2011. What can we learn privately? SIAM J. Comput. 40, 3 (June 2011), 793--826.
  48. M. Kearns. 1998. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 983--1006.
  49. Subhash Khot. 2004. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS. 136--145.
  50. Scott Kirkpatrick, D. Gelatt Jr., and Mario P. Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671--680.
  51. Ludek Kucera. 1995. Expected complexity of graph partitioning problems. Discr. Appl. Math. 57, 2--3 (1995), 193--212.
  52. Zongming Ma and Yihong Wu. 2015. Computational barriers in minimax submatrix detection. Annals of Statistics 43, 3 (2015), 1089--1116.
  53. F. McSherry. 2001. Spectral partitioning of random graphs. In FOCS. 529--537.
  54. R. Meka, A. Potechin, and A. Wigderson. 2015. Sum-of-squares lower bounds for planted clique. In STOC. 87--96.
  55. Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. 21 (1953), 1087--1092.
  56. L. Minder and D. Vilenchik. 2009. Small clique detection and approximate Nash equilibria. 5687 (2009), 673--685.
  57. K. Pearson. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. Ser. 5 50, 302 (1900), 157--175.
  58. Bart Selman, Henry Kautz, and Bram Cohen. 1995. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 521--532.
  59. R. Servedio. 2000. Computational sample complexity and attribute-efficient learning. J. Comput. Syst. Sci. 60, 1 (2000), 161--178.
  60. Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564--1587. Retrieved from http://jmlr.org/proceedings/papers/v40/Steinhardt15.html.
  61. J. Steinhardt, G. Valiant, and S. Wager. 2016. Memory, communication, and statistical queries. In COLT. 1490--1516.
  62. Balázs Szörényi. 2009. Characterizing statistical query learning: Simplified notions and proofs. In ALT. 186--200.
  63. M. Tanner and W. Wong. 1987. The calculation of posterior distributions by data augmentation (with discussion). J. Am. Stat. Assoc. 82 (1987), 528--550.
  64. Leslie G. Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11 (1984), 1134--1142.
  65. V. Vapnik and A. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16, 2 (1971), 264--280.
  66. V. Černý. 1985. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optim. Theory Appl. 45, 1 (Jan. 1985), 41--51.
  67. T. Wang, Q. Berthet, and R. J. Samworth. 2014. Statistical and computational trade-offs in estimation of sparse principal components. ArXiv E-prints (Aug. 2014).
  68. Ke Yang. 2001. On learning correlated Boolean functions using statistical queries. In ALT. 59--76.
  69. Ke Yang. 2005. New lower bounds for statistical query learning. J. Comput. Syst. Sci. 70, 4 (2005), 485--509.
  70. Andrew Yao. 1977. Probabilistic computations: Toward a unified measure of complexity. In FOCS. 222--227.
  71. Yuchen Zhang, John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In NIPS. 2328--2336.


          • Published in

            Journal of the ACM, Volume 64, Issue 2
            April 2017
            277 pages
            ISSN:0004-5411
            EISSN:1557-735X
            DOI:10.1145/3080497

            Copyright © 2017 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 15 April 2017
            • Accepted: 1 January 2017
            • Revised: 1 August 2016
            • Received: 1 June 2015

            Qualifiers

            • research-article
            • Research
            • Refereed
