DOI: 10.1145/1835449.1835617
poster

Crowdsourcing a Wikipedia vandalism corpus

Published: 19 July 2010

ABSTRACT

We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus comprises 32452 edits on 28468 Wikipedia articles, among which 2391 vandalism edits have been identified. A total of 753 human annotators cast 193022 votes on the edits, so that each edit was reviewed by at least 3 annotators, and the achieved level of agreement was analyzed in order to label each edit as "regular" or "vandalism." The corpus is available free of charge.
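The abstract does not state the exact agreement rule, so the following is only a minimal sketch of how per-edit votes might be turned into a label. The function name `label_edit` and the thresholds `min_votes=3` and `min_agreement=2/3` are assumptions for illustration, not the authors' procedure. The corpus figures are taken from the abstract and used only to derive the share of vandalism edits and the average number of votes per edit.

```python
from collections import Counter

# Figures reported in the abstract.
EDITS = 32452
VANDALISM_EDITS = 2391
VOTES = 193022


def vandalism_share_percent():
    """Percentage of edits in the corpus identified as vandalism."""
    return VANDALISM_EDITS / EDITS * 100


def mean_votes_per_edit():
    """Average number of annotator votes cast per edit."""
    return VOTES / EDITS


def label_edit(votes, min_votes=3, min_agreement=2 / 3):
    """Return 'regular', 'vandalism', or None if the edit stays undecided.

    votes: list of strings, each either 'regular' or 'vandalism'.
    min_votes and min_agreement are hypothetical thresholds chosen for
    this sketch; the abstract only says each edit got at least 3 reviews.
    """
    if len(votes) < min_votes:
        return None  # not enough reviews yet
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # annotators disagree too much; would need more votes


if __name__ == "__main__":
    print(round(vandalism_share_percent(), 1))
    print(round(mean_votes_per_edit(), 1))
    print(label_edit(["vandalism", "vandalism", "regular"]))
```

Under these assumed thresholds, an edit with an even split of votes stays unlabeled and would need further review before it could be assigned to either class.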


    Published in

      SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
      July 2010
      944 pages
      ISBN: 9781450301534
      DOI: 10.1145/1835449

      Copyright © 2010 Copyright is held by the owner/author(s)

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      SIGIR '10 paper acceptance rate: 87 of 520 submissions, 17%
      Overall acceptance rate: 792 of 3,983 submissions, 20%
