ABSTRACT
Web link analysis has proven to be a significant enhancement for quality-based web search. Most existing links can be classified into two categories: intra-type links (e.g., web hyperlinks), which represent relationships among data objects within a single homogeneous data type (web pages), and inter-type links (e.g., user browsing logs), which represent relationships among data objects across different data types (users and web pages). Unfortunately, most link analysis research considers only one type of link. In this paper, we propose a unified link analysis framework, called "link fusion", which considers both the inter-type and intra-type link structure among multi-type interrelated data objects and brings order to the objects of each data type at the same time. The PageRank and HITS algorithms are shown to be special cases of this unified framework. Experiments on an instantiation of the framework that uses the user data and web pages extracted from a proxy log show that the proposed algorithm improves search effectiveness over the HITS and DirectHit algorithms by 24.6% and 38.2%, respectively.
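To make the idea of combining intra-type and inter-type links concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes two object types (pages and users), a simple linear mix controlled by a hypothetical weight `alpha`, and score propagation split in proportion to link weight. The names `propagate`, `link_fusion_sketch`, `page_links`, and `user_visits` are illustrative assumptions.

```python
def normalize(vec):
    """Scale a score vector so that it sums to 1 (left unchanged if all zero)."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)


def propagate(links, scores, n_targets):
    """Push each source's score along its outgoing links, in proportion to link weight."""
    out = [0.0] * n_targets
    for src, targets in links.items():
        weight_sum = sum(targets.values())
        if weight_sum == 0:
            continue
        for dst, weight in targets.items():
            out[dst] += scores[src] * weight / weight_sum
    return out


def link_fusion_sketch(page_links, user_visits, n_pages, n_users, alpha=0.5, iters=50):
    """Iteratively score pages and users together (illustrative assumption only).

    page_links:  intra-type links, {page: {linked_page: weight}}
    user_visits: inter-type links, {user: {visited_page: weight}}
    alpha:       hypothetical mixing weight between intra- and inter-type evidence
    """
    pages = [1.0 / n_pages] * n_pages
    users = [1.0 / n_users] * n_users

    # Reverse the visit links so that pages can reinforce the users who visit them.
    page_to_user = {}
    for user, visited in user_visits.items():
        for page, weight in visited.items():
            page_to_user.setdefault(page, {})[user] = weight

    for _ in range(iters):
        intra = propagate(page_links, pages, n_pages)   # page -> page (hyperlinks)
        inter = propagate(user_visits, users, n_pages)  # user -> page (browsing)
        new_pages = normalize([alpha * a + (1 - alpha) * b for a, b in zip(intra, inter)])
        new_users = normalize(propagate(page_to_user, pages, n_users))
        pages, users = new_pages, new_users
    return pages, users


# Toy data: three pages linked in a cycle, two users who mostly visit page 2.
page_links = {0: {1: 1}, 1: {2: 1}, 2: {0: 1}}
user_visits = {0: {2: 3, 1: 1}, 1: {2: 1}}
page_scores, user_scores = link_fusion_sketch(page_links, user_visits, n_pages=3, n_users=2)
print(page_scores, user_scores)
```

In this toy setup the hyperlink cycle alone cannot distinguish the three pages, so any difference in the page scores comes from the browsing links. With `alpha=1.0` the page update uses hyperlinks only, which resembles an undamped PageRank-style propagation; that correspondence is a property of this sketch, not a restatement of the paper's proof.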
Index Terms
- Link fusion: a unified link analysis framework for multi-type interrelated data objects