ABSTRACT
We consider the problem of identifying authoritative users in Yahoo! Answers. A common approach is to use link analysis techniques to produce a list of users ranked by their degree of authority. A major problem with this approach is deciding how many users from the ranked list should be considered authoritative. To address this problem, we propose a method for the automatic identification of authoritative actors. We model the authority scores of users as a mixture of gamma distributions. The number of components in the mixture is estimated with the Bayesian Information Criterion (BIC), while the parameters of each component are estimated with the Expectation-Maximization (EM) algorithm. This method allows us to automatically discriminate between authoritative and non-authoritative users. The suitability of our proposal is demonstrated in an empirical study using datasets from Yahoo! Answers.
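The pipeline described in the abstract (gamma mixture, EM for the parameters, BIC for the number of components) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the function names are hypothetical, the components are initialized by moment-matching equal-size chunks of the sorted scores, the shape update uses a standard closed-form approximation to the gamma maximum-likelihood estimate instead of an exact M-step, BIC is written in the "lower is better" convention with 3k - 1 free parameters, and the component with the largest mean is taken to be the authoritative one.

```python
import math

def gamma_logpdf(x, shape, scale):
    """Log density of Gamma(shape, scale) at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def e_step(xs, weights, shapes, scales):
    """Return (responsibilities, log-likelihood) under the current parameters."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [math.log(max(w, 1e-300)) + gamma_logpdf(x, a, b)
                for w, a, b in zip(weights, shapes, scales)]
        top = max(logs)
        total = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        resp.append([math.exp(v - total) for v in logs])
        loglik += total
    return resp, loglik

def fit_gamma_mixture(xs, k, iters=200):
    """Fit a k-component gamma mixture by EM (assumes len(xs) >= k, all xs > 0).

    Returns (weights, shapes, scales, log-likelihood).
    """
    n, srt = len(xs), sorted(xs)
    weights, shapes, scales = [], [], []
    for j in range(k):
        # Assumed initialization: moment-match k equal-size chunks of sorted scores.
        chunk = srt[j * n // k:(j + 1) * n // k]
        m = sum(chunk) / len(chunk)
        v = max(sum((x - m) ** 2 for x in chunk) / len(chunk), 1e-9)
        weights.append(1.0 / k)
        shapes.append(m * m / v)
        scales.append(v / m)
    for _ in range(iters):
        resp, _ = e_step(xs, weights, shapes, scales)
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = max(sum(r[j] * x for r, x in zip(resp, xs)) / nj, 1e-300)
            mean_log = sum(r[j] * math.log(x) for r, x in zip(resp, xs)) / nj
            s = max(math.log(mean) - mean_log, 1e-8)
            # Closed-form approximation to the shape MLE (not an exact M-step).
            shapes[j] = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
            scales[j] = mean / shapes[j]
            weights[j] = nj / n
    _, loglik = e_step(xs, weights, shapes, scales)
    return weights, shapes, scales, loglik

def bic(loglik, k, n):
    """BIC, lower is better; k shapes + k scales + (k - 1) free weights."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)

def identify_authorities(scores, max_k=4):
    """Pick k by BIC, then return (k, indices of users in the top-mean component)."""
    best = None
    for k in range(1, max_k + 1):
        fit = fit_gamma_mixture(scores, k)
        value = bic(fit[3], k, len(scores))
        if best is None or value < best[0]:
            best = (value, k, fit)
    _, k, (weights, shapes, scales, _) = best
    top = max(range(k), key=lambda j: shapes[j] * scales[j])  # mean = shape * scale
    resp, _ = e_step(scores, weights, shapes, scales)
    return k, [i for i, r in enumerate(resp)
               if max(range(k), key=lambda j: r[j]) == top]
```

Calling `identify_authorities(scores)` on a list of positive authority scores (for example, scores produced by a link analysis algorithm) returns the selected number of components and the positions of the users assigned to the highest-mean component. If BIC selects a single component, every user falls in that component, which signals that the scores show no separable authoritative group under this model.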