The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation in the Biomedical Domain

Handbook of Linguistic Annotation

Abstract

The Colorado Richly Annotated Full Text (CRAFT) corpus consists of full-text journal articles. The primary motivation for the annotation project was the accumulating evidence that the bodies of journal articles contain much information that is not present in the abstracts, and that the textual and structural characteristics of article bodies differ from those of abstracts. The development of CRAFT was characterized by a “multi-model” annotation task. The sample population was all journal articles that had been used by the Mouse Genome Informatics group as evidence for at least one Gene Ontology or Mouse Phenotype Ontology “annotation.” The linguistic annotation is represented in the widely known Penn Treebank format (Marcus et al., 1993) [50], with the addition of a small number of tags and phrasal categories to accommodate the idiosyncrasies of the domain.

Notes

  1. By linguistic annotation, we mean part-of-speech, syntactic, structural (e.g., sentence boundaries and tokenization), and coreference annotation. This contrasts with named entity annotation, which we refer to more broadly as ‘semantic’ annotation when it targets broad semantic categories, such as Sequence Ontology concepts or NCBI Taxonomy entities.

References

  1. Abacha, A.B., Zweigenbaum, P.: Annotation et interrogation sémantiques de textes médicaux. Atelier Web Sémantique Médical, IC (2010)

  2. Agarwal, S., Yu, H.: Automatically classifying sentences in full-text biomedical articles into introduction, methods, results and discussion. Bioinformatics 25(23), 3174–3180 (2009)

  3. Albright, D., Lanfranchi, A., Fredriksen, A., Styler, W.F., Warner, C., Hwang, J.D., Choi, J.D., Dligach, D., Nielsen, R.D., Martin, J., et al.: Towards comprehensive syntactic and semantic annotations of the clinical narrative. J. Am. Med. Inform. Assoc. (2013)

  4. Ambert, K.H., Cohen, A.M., Burns, G.A., Boudreau, E., Sonmez, K.: Virk: an active learning-based system for bootstrapping knowledge base development in the neurosciences. Front. Neuroinform. 7 (2013)

  5. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)

  6. Bada, M., Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D., Jr., W.A.B., Cohen, K.B., Verspoor, K., Blake, J.A., Hunter, L.E.: Concept annotation in the CRAFT corpus. BMC Bioinform. 13(161) (2012)

  7. Bethard, S., Finan, S., Palmer, M., Pradhan, S., de Groen, P.C., Erickson, B., Miller, T., Lin, C., Savova, G., Pustejovsky, J.: Temporal annotation in the clinical domain. In: Proceedings of the Association for Computational Linguistics, pp. 143–154 (2014)

  8. Blaschke, C., Valencia, A.: Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study. Comp. Funct. Genomics 2(4), 196–206 (2001)

  9. Boguraev, B., Ide, N., Meyers, A., Nariyama, S., Stede, M., Wiebe, J., Wilcock, G. (eds.): Proceedings of the Linguistic Annotation Workshop. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-15

  10. Castro, L.G., McLaughlin, C., Garcia, A.: Biotea: RDFizing PubMed Central in support for the paper as an interface to the web of data. J. Biomed. Semant. 4(Suppl 1), S5 (2013)

  11. Chinchor, N., Robinson, P.: MUC-7 named entity task definition. In: Proceedings of the 7th Conference on Message Understanding, p. 29 (1997)

  12. Cohen, K.B.: BioNLP: biomedical text mining. In: N. Indurkhya, F.J. Damerau (eds.) Handbook of Natural Language Processing, 2nd edn. (2010)

  13. Cohen, K.B., Johnson, H.L., Verspoor, K., Roeder, C., Hunter, L.E.: The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinform. 11(492) (2010)

  14. Cohen, K.B., Lanfranchi, A., Corvey, W., Jr., W.A.B., Roeder, C., Ogren, P.V., Palmer, M., Hunter, L.E.: Annotation of all coreference in biomedical text: guideline selection and adaptation. In: BioTxtM 2010: 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining, pp. 37–41 (2010)

  15. Cohen, K.B., Roeder, C., Jr., W.A.B., Hunter, L., Verspoor, K.: Test suite design for biomedical ontology concept recognition systems. In: Proceedings of the Language Resources and Evaluation Conference (2010)

  16. Collier, N., Tran, M.V., Le, H.Q., Ha, Q.T., Oellrich, A., Rebholz-Schuhmann, D.: Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking. PLoS ONE 8(10), e72965 (2013)

  17. Collier, N., Paster, F., Campus, H., Tran, A.M.V.: The impact of near domain transfer on biomedical named entity recognition. In: Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)@ EACL, pp. 11–20 (2014)

  18. Corney, D.P., Buxton, B.F., Langdon, W.B., Jones, D.T.: BioRAT: extracting biological information from full-length papers. Bioinformatics 20(17), 3206–3213 (2004)

  19. Dai, H.J., Wu, J.C.Y., Tsai, R.T.H.: Collective instance-level gene normalization on the IGN corpus. PLoS ONE 8(11), e79517 (2013)

  20. Doğan, R.I., Lu, Z.: An improved corpus of disease mentions in PubMed citations. In: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, pp. 91–99. Association for Computational Linguistics (2012)

  21. Doğan, R.I., Comeau, D.C., Yeganova, L., Wilbur, W.J.: Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora. Database 2014, bau044 (2014)

  22. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inf. 47, 1–10 (2014)

  23. Doğan, R.I., Wilbur, W.J., Comeau, D.C.: BioC and simplified use of the PMC open access dataset for biomedical text mining. In: Proceedings of the 2014 Workshop on Biomedical Text Mining, Language Resources And Evaluation Conference (2014)

  24. Fort, K., Nazarenko, A., Rosset, S.: Modeling the complexity of manual annotation tasks: a grid of analysis. In: Proceedings of the International Conference on Computational Linguistics (COLING 2012), pp. 895–910 (2012)

  25. Fox, L.M., Williams, L.A., Hunter, L., Roeder, C.: Negotiating a text mining license for faculty researchers. Inform. Technol. Libr. 33(3), 5–21 (2014)

  26. Friedman, C., Kra, P., Yu, H., Krauthammer, M., Rzhetsky, A.: GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17(Suppl. 1), S74–S82 (2001)

  27. Gautama: Nyaaya Suutras (150 CE)

  28. Ginn, R., Pimpalkhute, P., Nikfarjam, A., Patki, A., Karen, O., Sarker, A., Smith, K., Gonzalez, G.: Mining Twitter for adverse drug reaction mentions: a corpus and classification benchmark. In: Evaluating Resources for Health and Biomedical Text Processing (BioTxtM2014). Reykjavik, Iceland (2014). http://www.nactem.ac.uk/biotxtm2014/programme.php

  29. Golik, W., Warnier, P., Nédellec, C.: Corpus-based extension of termino-ontology by linguistic analysis: a use case in biomedical event extraction. In: Proceedings of the 9th International Conference. Terminology and Artificial Intelligence (TIA 2011), pp. 37–39 (2011)

  30. Grishman, R., Sundheim, B.: Message understanding conference-6: A brief history. COLING 96, 466–471 (1996)

  31. Grouin, C., Rosset, S., Zweigenbaum, P., Fort, K., Galibert, O., Quintard, L.: Proposal for an extension of traditional named entities: from guidelines to evaluation, an overview. In: Proceedings of the 5th Linguistic Annotation Workshop, pp. 92–100. Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/W11-0411. (Poster)

  32. Gurulingappa, H., Rajput, A.M., Roberts, A., Fluck, J., Hofmann-Apitius, M., Toldo, L.: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. J. Biomed. Inform. 45(5), 885–892 (2012). doi:10.1016/j.jbi.2012.04.008

  33. Haverinen, K., Ginter, F., Laippala, V., Viljanen, T., Salakoski, T.: Dependency-based propbanking of clinical Finnish. In: Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV), pp. 137–141. ACL (2010)

  34. Hersh, W., Kalpathy-Cramer, J., Müller, H.: The ImageCLEFmed medical image retrieval task test collection. J. Digit. Imaging 22, 648–655 (2009)

  35. Hirschman, L., Robinson, P., Burger, J., Vilain, M.: Automating coreference: the role of annotated training data. In: Proceedings of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 118–121 (1997)

  36. Hripcsak, G., Rothschild, A.S.: Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inf. Assoc. 12(3), 296–298 (2005)

  37. Ide, N., Xia, F. (eds.): Proceedings of the Sixth Linguistic Annotation Workshop. Association for Computational Linguistics, Jeju, Republic of Korea (2012). http://www.aclweb.org/anthology/W12-36

  38. Ide, N., Meyers, A., Pradhan, S., Tomanek, K. (eds.): Proceedings of the 5th Linguistic Annotation Workshop. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/W11-04

  39. Kedzia, P., Piasecki, M., Maziarz, M., Marcińczuk, M.: Recognising compositionality of multi-word expressions in the wordnet oriented perspective. In: Advances in Artificial Intelligence and its Applications, pp. 240–251. Springer, Berlin (2013)

  40. Kilicoglu, H., Rosemblat, G., Fiszman, M., Rindflesch, T.C.: Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinf. 12(1), 486 (2011)

  41. Kim, J.D.: A generalized LCS algorithm and its application to corpus alignment. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, pp. 14–18 (2013)

  42. Kim, J.D.: Sharing reference texts for interoperability of literature annotation. In: Proceedings of the 5th International Symposium on Languages in Biology and Medicine, pp. 57–61 (2013)

  43. Kim, J.D., Wang, Y.: PubAnnotation: a persistent and sharable corpus and annotation repository. In: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, pp. 202–205. Association for Computational Linguistics (2012)

  44. Kim, J.D., Ohta, T., Tateisi, Y., Mima, H., Tsujii, J.: XML-based linguistic annotation of corpus. In: Proceedings of The First NLP and XML Workshop, pp. 47–53 (2001)

  45. Kim, J.D., Ohta, T., Tateisi, Y., Tsujii, J.: Genia corpus–a semantically annotated corpus for bio-textmining. Bioinformatics 19(Suppl. 1), 180–182 (2003)

  46. Lee, H.J., Shim, S.H., Song, M.R., Lee, H., Park, J.C.: CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations. BMC Bioinf. 14(1), 323 (2013)

  47. Levin, L., Stede, M. (eds.): Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop. Association for Computational Linguistics and Dublin City University, Dublin, Ireland (2014). http://www.aclweb.org/anthology/W14-49

  48. Lin, J.: Is searching full text more effective than searching abstracts? BMC Bioinf. 10(46) (2009)

  49. Lu, Z., Kao, H.Y., Wei, C.H., Huang, M., Liu, J., Kuo, C.J., Hsu, C.N., Tsai, R.T., Dai, H.J., Okazaki, N., et al.: The gene normalization task in BioCreative III. BMC Bioinf. 12(Suppl 8), S2 (2011)

  50. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)

  51. McIntosh, T., Curran, J.R.: Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinf. 10(311) (2009)

  52. Mihăilă, C., Ohta, T., Pyysalo, S., Ananiadou, S.: BioCause: annotating and analysing causality in the biomedical domain. BMC Bioinf. 14(1), 2 (2013)

  53. Mitchell, A., Strassel, S., Huang, S., Zakhary, R.: ACE 2004 Multilingual Training Corpus. Linguistic Data Consortium, Philadelphia (2005)

  54. Molla, D., Santiago-Martinez, M.E.: Development of a corpus for evidence based medicine summarisation. In: Proceedings of the Australasian Language Technology Association Workshop, pp. 86–94 (2011)

  55. Morgan, A.A., Hirschman, L., Colosimo, M., Yeh, A.S., Colombe, J.B.: Gene name identification and normalization using a model organism database. J. Biomed. Inf. 37(6), 396–410 (2004). doi:10.1016/j.jbi.2004.08.010

  56. Morgan, A.A., Lu, Z., Wang, X., Cohen, A.M., Fluck, J., Ruch, P., Divoli, A., Fundel, K., Leaman, R., Hakenberg, J., et al.: Overview of BioCreative II gene normalization. Genome Biology 9(Suppl 2), S3 (2008)

  57. Névéol, A., Grouin, C., Leixa, J., Rosset, S., Zweigenbaum, P.: The Quaero French Medical Corpus: a resource for medical entity recognition and normalization. In: Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (2014)

  58. Neves, M.: An analysis on the entity annotations in biological corpora. F1000Res. 3(96) (2014)

  59. Nobata, C., Dobson, P.D., Iqbal, S.A., Mendes, P., Tsujii, J., Kell, D.B., Ananiadou, S.: Mining metabolites: extracting the yeast metabolome from the literature. Metabolomics 7(1), 94–101 (2011)

  60. Nunes, T., Campos, D., Matos, S., Oliveira, J.L.: BeCAS: biomedical concept recognition services and visualization. Bioinformatics 29, 1915–1916 (2013)

  61. Ogren, P.: Knowtator: a Protege plugin for annotated corpus construction. In: HLT-NAACL 2006 Companion Volume (2006)

  62. Ogren, P.: Knowtator: a plug-in for creating training and evaluation data sets for biomedical natural language systems. In: The International Protege conference, pp. 73–76 (2006)

  63. Ohta, T., Kim, J.D., Pyysalo, S., Wang, Y., Tsujii, J.: Incorporating GENETAG-style annotation to GENIA corpus. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp. 106–107. Association for Computational Linguistics (2009)

  64. Ohta, T., Pyysalo, S., Tsujii, J., Ananiadou, S.: Open-domain anatomical entity mention detection. In: Proceedings of the Workshop on Detecting Structure in Scholarly Discourse, pp. 27–36. Association for Computational Linguistics (2012)

  65. Ohta, T., Tateisi, Y., Kim, J.D., Mima, H., Tsujii, J.: The GENIA corpus: an annotated corpus in molecular biology. In: Proceedings of the Human Language Technology Conference (2002)

  66. Pareja-Lora, A., Liakata, M., Dipper, S. (eds.): Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. Association for Computational Linguistics, Sofia, Bulgaria (2013). http://www.aclweb.org/anthology/W13-23

  67. Peñas, A., Hovy, E., Forner, P., Rodrigo, Á., Sutcliffe, R., Morante, R.: QA4MRE 2011–2013: overview of question answering for machine reading evaluation. In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization, pp. 303–320. Springer, Berlin (2013)

  68. Pradhan, S., Elhadad, N., South, B., Martinez, D., Christensen, L., Vogel, A., Suominen, H., Chapman, W., Savova, G.: Task 1: ShARe, CLEF eHealth evaluation lab: Online Working Notes of CLEF. CLEF 230 (2013)

  69. Pradhan, S., Elhadad, N., South, B., Martinez, D., Christensen, L., Vogel, A., Suominen, H., Chapman, W.W., Savova, G.: Evaluating the State of the Art in Disorder Recognition and Normalization of the Clinical Narrative

  70. Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., Xue, N.: CoNLL-2011 shared task: Modeling unrestricted coreference in OntoNotes. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–27. Association for Computational Linguistics (2011)

  71. Pradhan, S.S., Ramshaw, L., Weischedel, R., MacBride, J., Micciulla, L.: Unrestricted coreference: Identifying entities and events in OntoNotes. In: International Conference on Semantic Computing, 2007. ICSC 2007, pp. 446–453. IEEE, New York (2007)

  72. Prasad, R., McRoy, S., Frid, N., Joshi, A., Yu, H.: The biomedical discourse relation bank. BMC Bioinf. 12(88) (2011)

  73. Pustejovsky, J., Stubbs, A.: Natural language annotation for machine learning. O’Reilly Media, Newton (2012)

  74. Pyysalo, S., Ananiadou, S.: Anatomical entity mention recognition at literature scale. Bioinformatics (2013)

  75. Pyysalo, S., Ohta, T., Miwa, M., Cho, H.C., Tsujii, J., Ananiadou, S.: Event extraction across multiple levels of biological organization. Bioinformatics 28(18), i575–i581 (2012)

  76. Pyysalo, S., Ohta, T., Rak, R., Sullivan, D., Mao, C., Wang, C., Sobral, B., Tsujii, J., Ananiadou, S.: Overview of the infectious diseases (ID) task of BioNLP Shared Task 2011. In: Proceedings of the BioNLP Shared Task 2011 Workshop, pp. 26–35. Association for Computational Linguistics (2011)

  77. Raghavan, P., Fosler-Lussier, E., Lai, A.M.: Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. In: AMIA Annual Symposium Proceedings, vol. 2012, p. 1366. American Medical Informatics Association (2012)

  78. Ramanan, S., Nathan, P.S.: Adapting Cocoa, A Multi-class Entity Detector, for the CHEMDNER Task of BioCreative IV (2013)

  79. Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y., Roberts, I., Setzer, A.: Building a semantically annotated corpus of clinical texts. J. Biomed. Inf. 42(5), 950–966 (2009)

  80. Roberts, K., Harabagiu, S.M., Skinner, M.A.: Structuring operative notes using active learning. In: Proceedings of the 2014 BioNLP Workshop, pp. 68–76 (2014)

  81. Roberts, K., Masterton, K., Fiszman, M., Kilicoglu, H., Demner-Fushman, D.: Annotating question decomposition on complex medical questions. In: Language Resources and Evaluation Conference (2014)

  82. Roberts, K., Masterton, K., Fiszman, M., Kilicoglu, H., Demner-Fushman, D.: Annotating question types for consumer health questions. In: Proceedings of the Fourth LREC Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (2014)

  83. Savova, G., Pradhan, S., Palmer, M., Styler, W., Chapman, W., Elhadad, N.: Annotating the clinical text - MiPACQ, ShARe, SHARPn and THYME corpora. In: Ide, N., Pustejovsky, J. (eds.) This volume. Springer, Berlin (2015)

  84. Shah, P.K., Perez-Iratxeta, C., Bork, P., Andrade, M.A.: Information extraction from full text scientific articles: where are the keywords? BMC Bioinf. 4(1) (2003). doi:10.1186/1471-2105-4-20

  85. Smith, B., Ceusters, W.: Ontological realism: a methodology for coordinated evolution of scientific ontologies. Appl. Ontol. 5(3), 139–188 (2010)

  86. Stede, M., Huang, C.R., Ide, N., Meyers, A. (eds.): Proceedings of the Third Linguistic Annotation Workshop. Association for Computational Linguistics, Suntec, Singapore (2009). http://www.aclweb.org/anthology/W09-30

  87. Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., Tsujii, J.: BRAT: a web-based tool for NLP-assisted text annotation. In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 102–107. Association for Computational Linguistics (2012)

  88. Stubbs, A.: A methodology for using professional knowledge in corpus annotation. Ph.D. thesis, Brandeis University (2013)

  89. Stubbs, A., Uzuner, O.: De-identification of medical records through annotation. In: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation. Springer, Berlin (2015)

  90. Tanabe, L., Wilbur, W.J.: Tagging gene and protein names in full text articles. In: Natural Language Processing in the Biomedical Domain, pp. 9–13 (2002)

  91. Tateisi, Y., Yakushiji, A., Ohta, T., Tsujii, J.: Syntax annotation for the GENIA corpus. In: Second International Joint Conference on Natural Language Processing: Companion Volume, pp. 220–225 (2005)

  92. Temnikova, I.P., Cohen, K.B.: Recognizing sublanguages in scientific journal articles through closure properties. In: Proceedings of BioNLP 2013 (2013)

  93. Thompson, P., Iqbal, S.A., McNaught, J., Ananiadou, S.: Construction of an annotated corpus to support biomedical information extraction. BMC Bioinf. 10(1), 349 (2009)

  94. Thompson, P., Nawaz, R., McNaught, J., Ananiadou, S.: Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinf. 12(1), 393 (2011)

  95. Van Auken, K., Schaeffer, M.L., McQuilton, P., Laulederkind, S.J., Li, D., Wang, S.J., Hayman, G.T., Tweedie, S., Arighi, C.N., Done, J., et al.: BC4GO: A Full-text Corpus for the BioCreative IV GO Task. Database 2014

  96. Van Mulligen, E.M., Fourrier-Reglat, A., Gurwitz, D., Molokhia, M., Nieto, A., Trifiro, G., Kors, J.A., Furlong, L.I.: The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J. Biomed. Inf. 45(5), 879–884 (2012)

  97. Verspoor, K., Cohen, K.B., Hunter, L.: The textual characteristics of traditional and open access scientific journals are similar. BMC Bioinf. 10 (2009)

  98. Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., Xue, N., Jr., W.A.B., Bada, M., Palmer, M., Hunter, L.E.: A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinf. 13(207) (2012)

  99. Verspoor, K., Yepes, A.J., Cavedon, L., McIntosh, T., Herten-Crabb, A., Thomas, Z., Plazzer, J.P.: Annotating the biomedical literature for the human variome. Database J. Biol. Databases Curation (2013)

  100. Xue, N., Poesio, M. (eds.): Proceedings of the Fourth Linguistic Annotation Workshop. Association for Computational Linguistics, Uppsala, Sweden (2010). http://www.aclweb.org/anthology/W10-18

Acknowledgements

The authors gratefully acknowledge the contributions to this work of the annotators, especially lead annotator Arrick Lanfranchi; Colin Warner for help with reconstructing the quality assurance approach; Amber Stubbs for discussion of multi-model and light annotation tasks; Paul Foster for help with Devanagari; and BBN for use of their coreference annotation guidelines.

Corresponding author

Correspondence to K. Bretonnel Cohen.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Cohen, K.B. et al. (2017). The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation in the Biomedical Domain. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_53

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_53

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
