research-article
DOI: 10.1145/2254756.2254795

Beyond Random Walk and Metropolis-Hastings Samplers: Why You Should Not Backtrack for Unbiased Graph Sampling

Published: 11 June 2012

ABSTRACT

Graph sampling via crawling has been widely studied as a generic tool for collecting uniform node samples, which in turn allow various characteristics of complex networks to be estimated consistently. The simple random walk with re-weighting (SRW-rw) and the Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, an unavoidable downside of their core random walks, namely slow diffusion over the state space, can cause poor estimation accuracy. In this paper, we propose the non-backtracking random walk with re-weighting (NBRW-rw) and the MH algorithm with delayed acceptance (MHDA). Both are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than SRW-rw and the MH algorithm, respectively. In particular, a remarkable feature of MHDA is that, like the MH algorithm, it applies to any non-uniform node sampling, while ensuring better sampling efficiency than the MH algorithm. We also provide simulation results that confirm our theoretical findings.
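To make the samplers named in the abstract concrete, here is a minimal Python sketch, not the authors' code. It assumes the standard constructions: SRW moves to a uniformly chosen neighbor; NBRW does the same but excludes the node it just came from unless that node is the only neighbor; both are corrected with 1/degree weights because they visit nodes in proportion to degree; and MH for uniform sampling accepts a proposed neighbor with probability min(1, deg(current)/deg(proposal)). The function names, the toy graph `ADJ`, and the step count are all hypothetical choices for illustration. MHDA is omitted because the abstract does not specify its transition rule.

```python
import random

def srw(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adj[path[-1]]))
    return path

def nbrw(adj, start, steps, rng):
    """Non-backtracking walk: never return to the previous node,
    unless the current node has no other neighbor (degree 1)."""
    path = [start]
    prev = None
    for _ in range(steps):
        cur = path[-1]
        nbrs = adj[cur]
        if prev is not None and len(nbrs) > 1:
            candidates = [v for v in nbrs if v != prev]
        else:
            candidates = nbrs
        prev = cur
        path.append(rng.choice(candidates))
    return path

def mh_uniform(adj, start, steps, rng):
    """Metropolis-Hastings walk targeting the uniform node distribution."""
    path = [start]
    for _ in range(steps):
        cur = path[-1]
        proposal = rng.choice(adj[cur])
        accept = min(1.0, len(adj[cur]) / len(adj[proposal]))
        path.append(proposal if rng.random() < accept else cur)
    return path

def reweighted_mean(adj, path, f):
    """Estimate the uniform average of f from a degree-biased walk
    by weighting each visited node with 1/degree."""
    num = sum(f(v) / len(adj[v]) for v in path)
    den = sum(1.0 / len(adj[v]) for v in path)
    return num / den

# Hypothetical toy graph (undirected, connected, simple); true mean degree is 2.4.
ADJ = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}

def degree(v):
    return len(ADJ[v])

rng = random.Random(0)
est_srw = reweighted_mean(ADJ, srw(ADJ, 0, 50_000, rng), degree)
nb_path = nbrw(ADJ, 0, 50_000, rng)
est_nbrw = reweighted_mean(ADJ, nb_path, degree)
mh_path = mh_uniform(ADJ, 0, 50_000, rng)
est_mh = sum(degree(v) for v in mh_path) / len(mh_path)  # no re-weighting needed
print(est_srw, est_nbrw, est_mh)
```

All three estimates should approach the true average degree as the walk length grows. The sketch only illustrates unbiasedness; it does not by itself demonstrate the variance ordering claimed in the abstract, which would require repeated runs and a comparison of estimator variance.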


Published in

SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
June 2012, 450 pages
ISBN: 9781450310970
DOI: 10.1145/2254756

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1
June 2012, 433 pages
ISSN: 0163-5999
DOI: 10.1145/2318857

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Overall acceptance rate: 459 of 2,691 submissions (17%)
