Abstract
External plagiarism detection systems compare suspicious texts against a reference collection to identify the source document(s) from which text was taken. The suspicious text may not contain verbatim copies of text from the reference collection, since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the internet, it is not practical to compare the suspicious text with every document in the reference collection. Consequently, many approaches to plagiarism detection begin by identifying a set of candidate documents from the reference collection. We report an IR-based approach to the candidate document selection problem that uses query expansion to identify candidate source documents even when the text taken from them has been altered. The reported system outperforms a previously reported approach and is also robust to changes in the reference collection text.
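The idea of candidate selection with query expansion can be illustrated with a minimal sketch. This is not the system reported in the paper: the TF-IDF scoring, the hand-written `SYNONYMS` table, and the function names `expand_query`, `build_index` and `rank_candidates` are all assumptions made for illustration. The sketch treats the suspicious text as a query, optionally adds related terms to it, and ranks reference documents by a simple weighted term overlap.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for a real lexical resource.
SYNONYMS = {
    "car": ["automobile"],
    "fast": ["quick", "rapid"],
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def expand_query(tokens, synonyms):
    """Append the listed synonyms of each query token to the query."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(synonyms.get(token, []))
    return expanded

def build_index(docs):
    """Return per-document term frequencies and a smoothed IDF table."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(docs)
    idf = {term: math.log(1 + n / count) for term, count in df.items()}
    return tfs, idf

def rank_candidates(query_text, tfs, idf, synonyms=None, top_k=2):
    """Rank reference documents against the (optionally expanded) query."""
    tokens = tokenize(query_text)
    if synonyms:
        tokens = expand_query(tokens, synonyms)
    query = Counter(tokens)
    scores = {}
    for doc_id, tf in tfs.items():
        # TF-IDF dot product, normalised by the document vector length.
        dot = sum(query[t] * tf[t] * idf[t] ** 2 for t in query if t in tf)
        norm = math.sqrt(sum((c * idf[t]) ** 2 for t, c in tf.items()))
        scores[doc_id] = dot / norm if norm else 0.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only documents that share at least one term with the query.
    return [doc_id for doc_id, score in ranked[:top_k] if score > 0]

# Toy reference collection (invented for this sketch).
docs = {
    "d1": "rapid automobile",
    "d2": "cats sleep all day",
}
tfs, idf = build_index(docs)

print(rank_candidates("a fast car", tfs, idf))
print(rank_candidates("a fast car", tfs, idf, synonyms=SYNONYMS))
```

In this toy collection the suspicious text "a fast car" shares no terms with either document, so the unexpanded query retrieves no candidates, while the expanded query matches the reworded document `d1`. A real system would draw expansion terms from a proper lexical resource or from feedback documents rather than a fixed table.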
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nawab, R.M.A., Stevenson, M., Clough, P. (2012). Retrieving Candidate Plagiarised Documents Using Query Expansion. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_18
DOI: https://doi.org/10.1007/978-3-642-28997-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science (R0)