Abstract
We address the problem of collusive opinion fraud in online review portals, where groups of people work together to post deceptive reviews that manipulate the reputations of targeted items. Such collusive fraud is considered much harder to defend against, since the participants (or colluders) can evade detection by collectively shaping their behaviors so as not to appear suspicious. To alleviate this problem, countermeasures have been proposed that leverage the collective behaviors of colluders. The motivation stems from the observation that colluders typically act in a highly synchronized way, as they are instructed by the same campaigns, with common items to target and schedules to follow. However, the collective behaviors examined in existing solutions focus mostly on the external appearance of fraud campaigns, such as the campaign size and the size of the targeted item set. These signals can become ineffective once colluders collectively change their behaviors. Moreover, the detection algorithms used in existing approaches can only infer collusion on the input data; they cannot learn predictive models from the data that could be deployed to detect emerging fraud. In this article, to complement existing studies on collusive opinion fraud characterization and detection, we explore more subtle behavioral trails in collusive fraud practice. In particular, we propose a suite of homogeneity-based measures to capture the interrelationships among colluders within campaigns. Moreover, we propose a novel statistical model to further characterize, recognize, and predict collusive fraud in online reviews. The proposed model is fully unsupervised and flexible enough to incorporate any effective measures available for better modeling and prediction. Through experiments on two real-world datasets, we show that our method outperforms the state of the art in both characterization and detection.
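The abstract does not specify the paper's homogeneity-based measures, but the intuition — colluders in a campaign behave far more uniformly than organic reviewers — can be sketched with a simple hypothetical measure: one minus the normalized Shannon entropy of the ratings a group posts on a common target. The function below is an illustrative assumption, not the measure defined in the article.

```python
from collections import Counter
from math import log

def rating_homogeneity(ratings):
    """Score in [0, 1]: 1.0 when all ratings agree, lower as they diverge.

    Defined here (for illustration only) as 1 minus the Shannon entropy of
    the empirical rating distribution, normalized by the maximum possible
    entropy for the group size.
    """
    n = len(ratings)
    if n <= 1:
        return 1.0
    counts = Counter(ratings)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    max_entropy = log(n)  # attained when every rating is distinct
    return 1.0 - entropy / max_entropy

# A tightly coordinated campaign: every colluder posts the same 5-star rating.
print(rating_homogeneity([5, 5, 5, 5]))  # -> 1.0
# Organic reviewers tend to disagree; four distinct ratings score near 0.
print(rating_homogeneity([1, 3, 4, 5]))
```

A detector in the spirit of the paper would compute several such per-campaign scores (over ratings, timestamps, targeted items) and feed them into a statistical model as features, rather than thresholding any single one.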
- Leman Akoglu, Rishi Chandy, and Christos Faloutsos. 2013. Opinion fraud detection in online reviews by network effects. In Proceedings of the 7th International AAAI Conference on Web and Social Media (ICWSM’13). 2--11.
- Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). 119--130.
- Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
- Chester I. Bliss. 1934. The method of probits. Science 79, 2037 (1934), 38--39.
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1 (1977), 1--38.
- Geli Fei, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh. 2013. Exploiting burstiness in reviews for review spammer detection. In Proceedings of the 7th International AAAI Conference on Web and Social Media (ICWSM’13). 175--184.
- Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. 2003. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4 (2003), 933--969.
- Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. 2012. Understanding and combating link farming in the Twitter social network. In Proceedings of the 21st International Conference on World Wide Web (WWW’12). 61--70.
- Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM’08). 219--230.
- Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02). 133--142.
- Andrew Y. Ng and Michael I. Jordan. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Adv. Neur. Inform. Process. Syst. 14 (2002), 841.
- Julia A. Lasserre, Christopher M. Bishop, and Thomas P. Minka. 2006. Principled hybrids of generative and discriminative models. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 1. IEEE, 87--94.
- Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady W. Lauw. 2010. Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10). 939--948.
- Michael Luca and Georgios Zervas. 2013. Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud. Harvard Business School NOM Unit Working Paper 14-006.
- Arash Molavi Kakhki, Chloe Kliman-Silver, and Alan Mislove. 2013. Iolaus: Securing online content rating systems. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). 919--930.
- Arjun Mukherjee, Abhinav Kumar, Bing Liu, Junhui Wang, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh. 2013. Spotting opinion spammers using behavioral footprints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). 632--640.
- Arjun Mukherjee, Bing Liu, and Natalie Glance. 2012. Spotting fake reviewer groups in consumer reviews. In Proceedings of the 21st International Conference on World Wide Web (WWW’12). 191--200.
- Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Natalie S. Glance. 2013. What Yelp fake review filter might be doing? In Proceedings of the 7th International AAAI Conference on Web and Social Media (ICWSM’13).
- Myle Ott, Claire Cardie, and Jeffrey T. Hancock. 2012. Estimating the prevalence of deception in online review communities. In Proceedings of the 21st International Conference on World Wide Web (WWW’12). 201--210.
- Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT’11), Vol. 1. 309--319.
- Mahmudur Rahman, Bogdan Carbunar, Jaime Ballesteros, George Burri, and Duen Horng Polo Chau. 2014. Turning the tide: Curbing deceptive Yelp behaviors. In Proceedings of the 2014 SIAM International Conference on Data Mining (SDM’14). 244--252.
- Shebuti Rayana and Leman Akoglu. 2015. Collective opinion spam detection: Bridging review networks and metadata. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15). 985--994.
- Huan Sun, Alex Morales, and Xifeng Yan. 2013. Synthetic review spamming and defense. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). 1088--1096.
- Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu. 2012. Identify online store review spammers via social review graph. ACM Trans. Intell. Syst. Technol. 3, 4 (2012), Article 61.
- Chang Xu and Jie Zhang. 2015. Towards collusive fraud detection in online reviews. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). 1051--1056.
- Chang Xu, Jie Zhang, Kuiyu Chang, and Chong Long. 2013. Uncovering collusive spammers in Chinese review websites. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). 979--988.
- Chao Yang, Robert Harkreader, Jialong Zhang, Seungwon Shin, and Guofei Gu. 2012. Analyzing spammers’ social networks for fun and profit: A case study of cyber criminal ecosystem on Twitter. In Proceedings of the 21st International Conference on World Wide Web (WWW’12). 71--80.
- Junting Ye and Leman Akoglu. 2015. Discovering opinion spammer groups by network footprints. In Machine Learning and Knowledge Discovery in Databases. 267--282.
Collusive Opinion Fraud Detection in Online Reviews: A Probabilistic Modeling Approach