ABSTRACT
Graph sampling via crawling is widely used as a generic tool for collecting uniform node samples in order to consistently estimate and uncover various characteristics of complex networks. Simple random walk with re-weighting (SRW-rw) and the Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, an unavoidable downside of their core random walks, slow diffusion over the space, can cause poor estimation accuracy. In this paper, we propose non-backtracking random walk with re-weighting (NBRW-rw) and the MH algorithm with delayed acceptance (MHDA), which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than SRW-rw and the MH algorithm, respectively. In particular, a notable feature of MHDA is that, like the MH algorithm, it applies to any non-uniform node sampling, while ensuring better sampling efficiency than the MH algorithm. We also provide simulation results to confirm our theoretical findings.
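As a rough illustration of the two walk-based estimators named above, the following Python sketch runs either a simple random walk or a non-backtracking one and re-weights each visited node by the inverse of its degree to estimate a uniform node average. The adjacency-dictionary representation, the function names `walk` and `reweighted_mean`, and the rule of backtracking only at degree-1 nodes are assumptions made for this sketch rather than details given in the abstract. MHDA is not shown.

```python
import random


def walk(adj, start, steps, non_backtracking, rng):
    """Yield the nodes visited by a simple or non-backtracking random walk.

    adj maps each node to a list of its neighbors (undirected, no self-loops).
    """
    prev, cur = None, start
    for _ in range(steps):
        yield cur
        nbrs = adj[cur]
        if non_backtracking and prev is not None and len(nbrs) > 1:
            # Assumed rule: never return to the previous node unless the
            # current node is a dead end (degree 1).
            choices = [v for v in nbrs if v != prev]
        else:
            choices = nbrs
        prev, cur = cur, rng.choice(choices)


def reweighted_mean(adj, f, start, steps, non_backtracking=False, seed=0):
    """Estimate the uniform average of f over nodes from a degree-biased walk.

    Each sample is weighted by 1/degree to offset the walk's tendency to
    visit high-degree nodes more often.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for v in walk(adj, start, steps, non_backtracking, rng):
        w = 1.0 / len(adj[v])
        num += w * f(v)
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical 4-node graph used only for demonstration.
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    degree = lambda v: len(adj[v])
    print("SRW-rw :", reweighted_mean(adj, degree, 0, 20000))
    print("NBRW-rw:", reweighted_mean(adj, degree, 0, 20000, non_backtracking=True))
```

Both calls estimate the average node degree of the toy graph. The two variants differ only in how the next node is chosen, which is consistent with the abstract's claim that the non-backtracking version comes at almost no additional cost.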
Beyond random walk and Metropolis-Hastings samplers: why you should not backtrack for unbiased graph sampling
Performance Evaluation Review