Efficient crawling through URL ordering

doi:10.1016/S0169-7552(98)00108-1

Computer Networks and ISDN Systems

Volume 30, Issues 1–7, April 1998, Pages 161–172

https://doi.org/10.1016/S0169-7552(98)00108-1

Under a Creative Commons license

open archive

Abstract

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more “important” pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.
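The idea of visiting "important" URLs first can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes one particular importance metric, the number of links to a URL seen so far from already-crawled pages, which the abstract does not name. The function `crawl_order`, the in-memory `graph` that stands in for fetching real pages, and the page names are all hypothetical.

```python
def crawl_order(graph, seed, ordered=True):
    """Return the order in which pages are visited, starting from seed.

    graph: dict mapping each URL to the list of URLs it links to
           (a hypothetical stand-in for fetching and parsing a page).
    ordered=True:  visit the frontier URL with the most known backlinks next.
    ordered=False: visit frontier URLs in discovery order (breadth-first).
    """
    backlinks = {seed: 0}   # URL -> links seen so far from crawled pages
    frontier = [seed]       # discovered but not yet visited, in discovery order
    visited = []

    while frontier:
        if ordered:
            # max() returns the first maximal element, so ties fall back
            # to discovery order.
            url = max(frontier, key=lambda u: backlinks[u])
        else:
            url = frontier[0]
        frontier.remove(url)
        visited.append(url)

        for target in graph.get(url, []):
            backlinks[target] = backlinks.get(target, 0) + 1
            if target not in visited and target not in frontier:
                frontier.append(target)
    return visited


# Hypothetical toy site: "hub" is linked from three pages.
graph = {
    "home": ["a", "b", "c"],
    "a": ["x", "hub"],
    "b": ["y", "hub"],
    "c": ["hub"],
    "hub": [],
    "x": [],
    "y": [],
}

with_ordering = crawl_order(graph, "home", ordered=True)
without_ordering = crawl_order(graph, "home", ordered=False)
```

On this toy graph the ordered crawl reaches "hub" fourth, while the breadth-first crawl reaches it sixth. A real crawler would use a priority queue rather than a linear scan of the frontier, and the backlink counts are only estimates because they reflect the pages crawled so far.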

Keywords

Crawling

URL ordering