
Deceptive review detection using labeled and unlabeled data

Multimedia Tools and Applications

Abstract

The availability of millions of products and services on e-commerce sites makes it difficult to find the most suitable product for a given set of requirements, because so many alternatives exist. The most popular and useful way around this is to read the reviews that others who have already tried the product have posted on opinion-sharing social media. Almost all e-commerce sites let users share their views on and experience of the products and services they have used. Customer reviews are increasingly used by individuals, manufacturers, and retailers for purchase and business decisions. As reviews are not scrutinized, anybody can write anything anonymously, which ultimately leads to review spam. Moreover, driven by the desire for profit and/or publicity, spammers produce synthesized reviews to promote some products or brands and to demote competitors' products or brands. Deceptive review spam has grown considerably over time. In this work, we apply supervised as well as unsupervised techniques to identify review spam. The most effective feature sets are assembled for model building, and sentiment analysis is incorporated into the detection process. To obtain the best performance, several well-known classifiers are applied to a labeled dataset. For the unlabeled data, clustering is used for spam detection after the desired attributes have been computed. Additionally, there is a high chance that spam reviewers are also responsible for content pollution in multimedia social networks, because many users now post reviews using their social network logins. Finally, the work can be extended to find suspicious accounts responsible for posting fake multimedia content to the respective social networks.
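
The two-track idea described above (a classifier for labeled reviews, clustering for unlabeled ones) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the choice of multinomial Naive Bayes as the classifier, two-cluster k-means as the clustering method, the toy reviews and their labels, and the hypothetical attribute "reviews posted per day by the reviewer". The abstract does not say which classifiers, clustering method, or attributes the authors used.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_nb(labeled):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training reviews
    vocab = set()
    for text, label in labeled:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on one numeric attribute.

    Centroids start at the minimum and maximum value; cluster 1 is the
    high-valued group. Returns one cluster index per input value.
    """
    low, high = min(values), max(values)
    assign = []
    for _ in range(iterations):
        assign = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, a in zip(values, assign) if a == 0]
        group1 = [v for v, a in zip(values, assign) if a == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return assign

# Toy labeled data (invented for this sketch).
labeled = [
    ("amazing best hotel ever must stay", "deceptive"),
    ("best experience ever amazing amazing", "deceptive"),
    ("room was clean but breakfast ended early", "truthful"),
    ("parking was tight and the room was small", "truthful"),
]
model = train_nb(labeled)
print(predict_nb(model, "amazing hotel best ever"))

# Toy unlabeled data: hypothetical "reviews per day" for six reviewers.
print(kmeans_1d([1, 2, 1, 9, 10, 1]))
```

In this sketch the labeled track learns word statistics per class, while the unlabeled track only separates reviewers with unusually high activity from the rest. Whether the high-activity cluster really corresponds to spam would have to be checked by other means.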



Acknowledgments

The work presented in this article is partially funded by the following two projects:

1. Information Security Education & Awareness Project (Phase II), Ministry of Communications and Information Technology, Government of India, and

2. Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (FIST) Program, Department of Science and Technology, Government of India.

Author information


Corresponding author

Correspondence to Sambit Bakshi.


About this article


Cite this article

Rout, J.K., Singh, S., Jena, S.K. et al. Deceptive review detection using labeled and unlabeled data. Multimed Tools Appl 76, 3187–3211 (2017). https://doi.org/10.1007/s11042-016-3819-y


  • DOI: https://doi.org/10.1007/s11042-016-3819-y
