Beyond random walk and Metropolis-Hastings samplers: why you should not backtrack for unbiased graph sampling

Authors:
Chul-Ho Lee, North Carolina State University, Raleigh, NC, USA
Xin Xu, North Carolina State University, Raleigh, NC, USA
Do Young Eun, North Carolina State University, Raleigh, NC, USA

SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
June 2012, Pages 319–330
https://doi.org/10.1145/2254756.2254795

Published: 11 June 2012

ABSTRACT

Graph sampling via crawling has been actively studied as a generic and important tool for collecting uniform node samples in order to consistently estimate and uncover various characteristics of complex networks. The simple random walk with re-weighting (SRW-rw) and the Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, an unavoidable downside of their core random walks, namely slow diffusion over the space, can cause poor estimation accuracy. In this paper, we propose the non-backtracking random walk with re-weighting (NBRW-rw) and the MH algorithm with delayed acceptance (MHDA), which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than the SRW-rw and the MH algorithm, respectively. In particular, a remarkable feature of the MHDA is that, like the MH algorithm, it applies to any non-uniform node sampling, while ensuring better sampling efficiency than the MH algorithm. We also provide simulation results to confirm our theoretical findings.
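The abstract names NBRW-rw without giving its steps, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an undirected, connected graph stored as an adjacency dictionary, a walk that avoids returning to the node it just left unless the current node has degree 1, and a stationary node distribution proportional to degree, so that weighting each sample by 1/degree targets the uniform node average. The names `nbrw_walk` and `reweighted_mean` and the toy graph are hypothetical.

```python
import random


def nbrw_walk(adj, start, steps, rng):
    """Return the nodes visited by a non-backtracking random walk.

    adj maps each node to a list of its neighbors (undirected graph).
    The walk picks uniformly among neighbors other than the previous
    node; at a degree-1 node it has no choice and steps back.
    """
    prev, cur = None, start
    path = [cur]
    for _ in range(steps):
        nbrs = adj[cur]
        if prev is not None and len(nbrs) > 1:
            choices = [v for v in nbrs if v != prev]
        else:
            choices = nbrs  # first step, or forced backtrack at a leaf
        prev, cur = cur, rng.choice(choices)
        path.append(cur)
    return path


def reweighted_mean(adj, path, f):
    """Estimate the uniform node average of f from a degree-biased walk.

    Assumes visits are asymptotically proportional to degree, so each
    sample is weighted by 1/degree (a ratio estimator).
    """
    num = sum(f(v) / len(adj[v]) for v in path)
    den = sum(1.0 / len(adj[v]) for v in path)
    return num / den


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 3-4.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
path = nbrw_walk(adj, start=0, steps=5000, rng=random.Random(0))
avg_degree = reweighted_mean(adj, path, lambda v: float(len(adj[v])))
print(f"estimated average degree: {avg_degree:.3f} (true value 2.4)")
```

Because only the previous node has to be remembered, the sketch needs no more neighbor queries per step than a simple random walk, which matches the abstract's "almost no additional cost" remark in spirit.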

Index Terms

1. Mathematics of computing
  1. Probability and statistics

Published in
SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
June 2012
450 pages
ISBN: 9781450310970
DOI: 10.1145/2254756
General Chair: Peter Harrison (Imperial College London, United Kingdom)
Program Chairs: Martin Arlitt (HP Labs, USA and University of Calgary, Canada), Giuliano Casale (Imperial College London, United Kingdom)

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1
June 2012
433 pages
ISSN: 0163-5999
DOI: 10.1145/2318857
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
asymptotic variance
non-reversible markov chains
random walks
unbiased graph sampling
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 459 of 2,691 submissions, 17%