ABSTRACT
Social bookmarking is a recent phenomenon that has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us. We contribute a characterization of posts to del.icio.us: how many bookmarks exist (about 115 million), how fast that number is growing, and how active the URLs being posted about are (quite active). We also contribute a characterization of the tags used by bookmarkers. We found that certain tags tend to gravitate towards certain domains, and vice versa. We also found that tags occur in over 50 percent of the pages that they annotate, and that in only 20 percent of cases do they appear in none of the page text, backlink page text, or forward link page text of the pages they annotate. We conclude that social bookmarking can provide search data not currently provided by other sources, though it may currently lack the size and distribution of tags necessary to make a significant impact on web search.
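The tag-occurrence figures above depend on how an "occurrence" is counted. Below is a minimal sketch of one way to compute the two fractions: tags found in the annotated page's own text, and tags found in none of the page, backlink, or forward-link texts. It is not the authors' pipeline. The `AnnotatedPage` record and the `tokens` and `tag_coverage` helpers are hypothetical. The sketch also assumes that a tag "occurs" only when it matches a lowercase alphanumeric word token exactly, with no stemming and no splitting of multi-word tags.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AnnotatedPage:
    # Hypothetical record: one (URL, tag) annotation plus the fetched texts.
    tag: str
    page_text: str
    backlink_texts: list[str] = field(default_factory=list)
    forward_link_texts: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens. Assumption: exact token match only,
    # so tags such as "web-search" or "toread" are never split or stemmed.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def tag_coverage(records: list[AnnotatedPage]) -> tuple[float, float]:
    """Return (fraction of annotations whose tag occurs in the page text,
    fraction whose tag occurs in none of page, backlink, or forward-link text)."""
    if not records:
        return 0.0, 0.0
    in_page = 0
    in_none = 0
    for r in records:
        tag = r.tag.lower()
        if tag in tokens(r.page_text):
            in_page += 1
            continue
        linked = r.backlink_texts + r.forward_link_texts
        if not any(tag in tokens(t) for t in linked):
            in_none += 1
    n = len(records)
    return in_page / n, in_none / n


if __name__ == "__main__":
    records = [
        AnnotatedPage("python", "Python tutorial for beginners"),  # in page text
        AnnotatedPage("recipes", "Chocolate cake",
                      backlink_texts=["Great recipes here"]),      # only in a backlink
        AnnotatedPage("toread", "A long essay on history"),        # nowhere
    ]
    print(tag_coverage(records))
```

Toy data like this only illustrates the bookkeeping. The percentages reported in the abstract come from the del.icio.us dataset and the authors' own matching rules, which this sketch does not reproduce.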