Deceptive review detection using labeled and unlabeled data

Rout, Jitendra Kumar; Singh, Smriti; Jena, Sanjay Kumar; Bakshi, Sambit

doi:10.1007/s11042-016-3819-y

Published: 19 August 2016

Multimedia Tools and Applications, volume 76, pages 3187–3211 (2017)


Abstract

The availability of millions of products and services on e-commerce sites makes it difficult to find the product that best suits one's requirements, because so many alternatives exist. The most popular and useful way to cope with this is to follow the reviews that others who have already tried the products post on opinionated social media. Almost all e-commerce sites let users share their views on, and experience of, the products and services they have used. Customer reviews are increasingly used by individuals, manufacturers, and retailers for purchase and business decisions. Because the reviews received are not scrutinized, anybody can write anything anonymously, which ultimately leads to review spam. Moreover, driven by the desire for profit and/or publicity, spammers produce synthesized reviews to promote some products or brands and to demote competitors' products or brands. Deceptive review spam has grown considerably over time. In this work, we have applied supervised as well as unsupervised techniques to identify review spam. The most effective feature sets have been assembled for model building. Sentiment analysis has also been incorporated into the detection process. To get the best performance, some well-known classifiers were applied to the labeled dataset. For the unlabeled data, clustering is used after the desired attributes have been computed for spam detection. Additionally, there is a high chance that spam reviewers are also responsible for content pollution in multimedia social networks, because many users now post reviews using their social network logins. Finally, the work can be extended to find suspicious accounts responsible for posting fake multimedia content to the respective social networks.
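The two-track setup described in the abstract (classifiers for labeled reviews, clustering over computed attributes for unlabeled ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract names neither the classifiers nor the clustering algorithm, so multinomial Naive Bayes and k-means are stand-ins, and the toy reviews and the two attributes (rating deviation, reviews per day) are hypothetical rather than taken from the article.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and trim basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def train_naive_bayes(labeled_reviews):
    """Count words per label; stand-in for the 'well-known classifiers'."""
    word_counts, doc_counts = {}, Counter()
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def predict_naive_bayes(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def kmeans(points, k=2, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

# Hypothetical labeled reviews (toy data, not from the article).
LABELED_REVIEWS = [
    ("Best hotel ever, amazing amazing stay!", "deceptive"),
    ("My husband and I loved this amazing luxury hotel", "deceptive"),
    ("Room was small but clean, location near the station", "truthful"),
    ("Check-in took ten minutes, bathroom was clean", "truthful"),
]

# Hypothetical attributes per unlabeled review:
# (rating deviation from product average, reviews posted per day by the author).
UNLABELED_ATTRIBUTES = [(0.2, 1.0), (3.5, 9.0), (0.4, 2.0), (3.8, 8.0), (0.1, 1.0)]

model = train_naive_bayes(LABELED_REVIEWS)
print(predict_naive_bayes(model, "amazing luxury hotel"))
print(kmeans(UNLABELED_ATTRIBUTES))
```

In this sketch the cluster indices carry no meaning on their own; deciding which cluster corresponds to spam would still require inspecting the attribute values of its members.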

Acknowledgments

The work presented in this article is partially funded by the following two projects:

1. Information Security Education & Awareness Project (Phase II), Ministry of Communications and Information Technology, Government of India, and

2. Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (FIST) Program, Department of Science and Technology, Government of India.

Author information

Authors and Affiliations

National Institute of Technology, Rourkela, Odisha, 769 008, India
Jitendra Kumar Rout, Sanjay Kumar Jena & Sambit Bakshi
Teradata Corporation, Telangana, 500 081, India
Smriti Singh

Corresponding author

Correspondence to Sambit Bakshi.

About this article

Cite this article

Rout, J.K., Singh, S., Jena, S.K. et al. Deceptive review detection using labeled and unlabeled data. Multimed Tools Appl 76, 3187–3211 (2017). https://doi.org/10.1007/s11042-016-3819-y

Received: 12 January 2016
Revised: 18 July 2016
Accepted: 28 July 2016
Published: 19 August 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11042-016-3819-y
