ABSTRACT
Microblog services typically consist of very short documents (e.g., tweets) that comment on the latest news and events. Because of their personal and ephemeral nature, many of these documents are uninformative or carry very little content. Providing effective retrieval in a microblog service therefore requires distinguishing high-quality, informative documents from the rest. Recent work has focused on finding features that indicate the quality of microblog documents, but the impact of these quality features on retrieval remains unclear. In this paper, we propose a low-cost quality model that uses surrogate judgments derived from user behavior (i.e., retweets), which can be collected automatically. We first analyze the relationship between document informativeness and relevance judgments for microblog retrieval. We then show that our behavior-based quality metric correlates highly with manual judgments. Finally, we conduct experiments to study the impact of the quality model on microblog retrieval. Results on the TREC Microblog track show that the proposed quality model, combined with a variety of retrieval models, can improve retrieval performance and is competitive with a model trained on manual relevance judgments.
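The abstract does not specify the form of the quality model, so the following is only a minimal sketch of one plausible reading, under loudly stated assumptions. It treats retweeted tweets (retweet count > 0) as surrogate positive labels for a logistic-regression quality classifier over hypothetical features (link presence, normalized length, hashtag rate). It then adds the log of the resulting quality probability, weighted by `lam`, as a document prior to a Dirichlet-smoothed query-likelihood score. The feature set, the function names (`train_quality_model`, `quality_prob`, `rank`), and the parameters `mu` and `lam` are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

# HYPOTHETICAL quality features (the abstract does not list any):
# bias, contains a link, normalized length, fraction of hashtag tokens.
def features(tokens):
    n = max(len(tokens), 1)
    return [
        1.0,
        1.0 if any(t.startswith("http") for t in tokens) else 0.0,
        min(len(tokens), 30) / 30.0,
        sum(t.startswith("#") for t in tokens) / n,
    ]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_quality_model(docs, retweet_counts, epochs=500, lr=0.5):
    """Batch gradient-descent logistic regression on surrogate labels.

    Assumption: a tweet counts as high quality (label 1) if it was retweeted
    at least once; no manual judgments are used.
    """
    X = [features(d) for d in docs]
    y = [1.0 if c > 0 else 0.0 for c in retweet_counts]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def quality_prob(w, tokens):
    """Estimated P(high quality | document) under the hypothetical model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(tokens))))

def query_log_likelihood(query, tokens, coll_counts, coll_len, mu=100.0):
    """Dirichlet-smoothed log P(Q | D)."""
    tf = Counter(tokens)
    vocab = len(coll_counts)
    score = 0.0
    for q in query:
        # Add-one smoothed collection model (assumption) so unseen terms avoid log(0).
        p_c = (coll_counts.get(q, 0) + 1) / (coll_len + vocab)
        score += math.log((tf[q] + mu * p_c) / (len(tokens) + mu))
    return score

def rank(query, docs, w, lam=1.0, mu=100.0):
    """Rank by log P(Q|D) + lam * log P(quality|D); lam=0 disables the prior."""
    coll_counts = Counter(t for d in docs for t in d)
    coll_len = sum(coll_counts.values())
    scored = []
    for i, d in enumerate(docs):
        s = query_log_likelihood(query, d, coll_counts, coll_len, mu)
        s += lam * math.log(quality_prob(w, d))
        scored.append((s, i))
    return [i for _, i in sorted(scored, reverse=True)]

if __name__ == "__main__":
    # Toy, made-up tweets and retweet counts for illustration only.
    docs = [
        ["earthquake", "japan", "http://news", "rescue", "teams", "deployed"],
        ["earthquake", "lol"],
        ["breaking", "news", "http://x", "storm", "warning", "issued"],
        ["bored", "lol"],
    ]
    w = train_quality_model(docs, retweet_counts=[5, 0, 3, 0])
    print("without prior:", rank(["earthquake"], docs, w, lam=0.0))
    print("with prior:   ", rank(["earthquake"], docs, w, lam=1.0))
```

On the toy data, query likelihood alone ranks the short tweet `["earthquake", "lol"]` first, because shorter documents receive a larger smoothed term probability. Adding the retweet-trained prior moves the link-bearing tweet ahead of it. Combining the prior additively in log space is a common way to attach a document prior to query likelihood; `lam` trades off topical match against estimated quality and would need tuning on real data.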
Index Terms
- Quality models for microblog retrieval