ABSTRACT
Search engines are a vital part of the Web and thus of the Internet infrastructure. Understanding the behavior of users searching the Web therefore gives insights into trends and enables enhancements of future search capabilities. Web search behavior can be studied using either server-side logs or client-side logs. Unfortunately, server-side logs are currently hard to obtain because search engine operators consider them proprietary. In this paper, we therefore present a methodology for extracting client-side logs from the traffic exchanged between a large user group and the Internet. An added benefit of our methodology is that we extract not only the search terms, the query sequences, and the search results of each individual user, but also the full clickstream, i.e., the result pages users view and the hyperlinked pages they subsequently visit. We propose a finite-state Markov model that captures users' Web searching and browsing behavior and allows us to deduce their prevalent search patterns. To our knowledge, this is the first such detailed client-side analysis of clickstreams.
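The abstract does not list the states of the Markov model. The following is a minimal sketch that assumes a hypothetical set of five states (query, results, click, browse, end) and shows how first-order transition probabilities could be estimated from per-user clickstream sequences by counting observed transitions. A greedy walk along the most probable transitions is added as one simple way to read off a prevalent pattern. The state names, the functions `estimate_transitions` and `most_likely_path`, and the example sessions are illustrative assumptions, not the authors' model or data.

```python
from collections import Counter, defaultdict

# Hypothetical state labels (an assumption, not the paper's actual state set):
#   "query"   - user submits a search query
#   "results" - user views a search result page
#   "click"   - user follows a link from the result page
#   "browse"  - user visits a further hyperlinked page
#   "end"     - the session ends


def estimate_transitions(sessions):
    """Maximum-likelihood estimate of first-order Markov transition probabilities.

    Each session is a list of state labels in the order they were observed.
    Returns a dict of the form {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for states in sessions:
        # Count each consecutive pair of states as one observed transition.
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        probs[src] = {dst: n / total for dst, n in successors.items()}
    return probs


def most_likely_path(probs, start, max_steps=10):
    """Greedily follow the most probable outgoing transition until "end".

    max_steps bounds the walk so that cycles (e.g. repeated query
    reformulation) cannot make it run forever.
    """
    path = [start]
    state = start
    for _ in range(max_steps):
        successors = probs.get(state)
        if not successors:
            break
        state = max(successors, key=successors.get)
        path.append(state)
        if state == "end":
            break
    return path


# Hypothetical example sessions, one list of states per user session.
example_sessions = [
    ["query", "results", "click", "browse", "end"],
    ["query", "results", "query", "results", "click", "end"],
    ["query", "results", "click", "end"],
]

transition_probs = estimate_transitions(example_sessions)
print(transition_probs["results"])
print(most_likely_path(transition_probs, "query"))
```

In this toy data, three of the four transitions out of "results" go to "click", so that edge gets probability 0.75, and the greedy walk from "query" yields query, results, click, end. A real analysis would more likely inspect the full transition matrix (or stationary behavior) rather than a single greedy path.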
Index Terms
- Web search clickstreams