Outlier-Based Approaches for Intrinsic and External Plagiarism Detection

  • Conference paper
Knowledge-Based and Intelligent Information and Engineering Systems (KES 2011)

Abstract

Plagiarism detection, one of the main problems that educational institutions have faced since the massification of the Internet, can be treated as a classification problem that uses both self-based information and text processing algorithms, whose computational complexity is intractable without search space reduction. First, self-based information algorithms treat intrinsic plagiarism detection as an outlier detection problem in which the classifier must decide whether a passage is plagiarized using only the text of the given document. Second, external plagiarism detection uses text matching algorithms, for which it is fundamental to reduce the matching space with search space reduction techniques; this reduction can itself be posed as another outlier detection problem. The main contribution of this work is the inclusion of text outlier detection methodologies to enhance both intrinsic and external plagiarism detection. Results show that our approach is highly competitive with those of the leading research teams in plagiarism detection.
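To make the intrinsic case concrete, the following is a minimal sketch of how outlier detection over a single document might look. It is not the method of this paper, which the abstract does not specify. It assumes character trigram frequency profiles as the style feature, an L1 distance between each sliding window and the whole document, and a z-score cutoff as the outlier rule. The function names (`ngram_profile`, `dissimilarity`, `flag_outlier_segments`) and all parameter defaults are hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams in text (lowercased)."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def dissimilarity(segment_profile, doc_profile):
    """L1 distance between two profiles: 0 means identical, 2 means disjoint."""
    grams = set(segment_profile) | set(doc_profile)
    return sum(abs(segment_profile.get(g, 0.0) - doc_profile.get(g, 0.0))
               for g in grams)


def flag_outlier_segments(text, window=200, step=100, n=3, z_threshold=2.0):
    """Return (start, end) offsets of windows whose style deviates from the document.

    Assumption for this sketch: a window is an outlier when its distance to the
    whole-document profile lies more than z_threshold standard deviations above
    the mean distance of all windows.
    """
    doc_profile = ngram_profile(text, n)
    # Window start offsets; text after the last full step is not covered.
    starts = range(0, max(len(text) - window, 0) + 1, step)
    spans = [(s, min(s + window, len(text))) for s in starts]
    scores = [dissimilarity(ngram_profile(text[a:b], n), doc_profile)
              for a, b in spans]
    if len(scores) < 2:
        return []  # too little text to estimate a spread
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # every window looks the same, nothing stands out
    return [span for span, s in zip(spans, scores)
            if (s - mu) / sigma > z_threshold]
```

Called on a document, `flag_outlier_segments(text)` returns the character ranges that a reviewer would inspect first. No reference corpus is involved, which is the defining constraint of the intrinsic setting. The external setting described above would instead compare the document against candidate sources after narrowing that candidate set.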

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oberreuter, G., L’Huillier, G., Ríos, S.A., Velásquez, J.D. (2011). Outlier-Based Approaches for Intrinsic and External Plagiarism Detection. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23863-5_2

  • DOI: https://doi.org/10.1007/978-3-642-23863-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23862-8

  • Online ISBN: 978-3-642-23863-5

  • eBook Packages: Computer Science, Computer Science (R0)
