The Linguistic Annotation Framework: a standard for annotation interchange and merging

  • Original Paper
Language Resources and Evaluation

Abstract

This paper gives an overview of the International Standards Organization–Linguistic Annotation Framework (ISO–LAF) developed in ISO TC37 SC4. We describe the XML serialization of ISO–LAF, the Graph Annotation Format (GrAF), and discuss the rationale behind the decisions made in determining the standard. We describe the structure of the GrAF headers in detail and provide multiple examples of GrAF representation for text and multi-media. Finally, we discuss the next steps for standardization of interchange formats for linguistic annotations.

Notes

  1. The Lexical Markup Framework (LMF) (Francopoulo 2013).

  2. http://www.anc.org/OANC.

  3. http://www.cs.vassar.edu/CES/CES1.html.

  4. AG was subsequently augmented with ad hoc mechanisms to accommodate hierarchical relations, but these were never part of the underlying AG data model.

  5. Annotation Graphs allow nodes to be associated with locations in primary data, but not with other nodes in the graphs defined over the data.

  6. http://www.uml.org.

  7. See Neumann et al. (2013) for a description of the query and visualization tool ANNIS, which enables such queries over MASC data.

  8. The term “document” is applied broadly here to include physical artifacts other than text, and to allow for the possibility that a logical unit of primary data is distributed over multiple computer files.

  9. http://www.cs.vassar.edu/CES/CES1-3.html.

  10. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html.

  11. The @attribute-name notation is used for XML attributes throughout the paper.

  12. Note that all anchor types are associated with one or more media, but a medium is not necessarily associated with an anchor type—in particular, media types associated with documents other than primary data documents (notably, annotation documents) are not associated with an anchor type.

  13. XPath is the XML Path Language defined by W3C; see http://www.w3.org/TR/xpath/.

  14. The annotation documentation would be referenced in the annotation type declaration in the resource header.

  15. Note that the @type attribute on the region element specifies the anchor type and not the region type.

  16. Note that anchors into character data refer to locations between characters, not to the position of the characters themselves.

  17. Sentences may also be represented as annotations defined over tokens, but for some purposes it is less desirable to consider a sentence as an ordered set of tokens than as a single span of characters.

  18. Some detail concerning the html display has been omitted for brevity.

  19. http://www.lat-mpi.eu/tools/elan/.

  20. http://www.graphviz.org/.

  21. http://www.anc.org/data/oanc.

  22. http://www.anc.org/data/masc/.

  23. http://www.anc.org/software/anc2go/.

  24. http://nltk.org.

  25. http://ifarm.nl/signll/conll/.

  26. http://www.graphviz.org.

  27. http://www.clarin.eu.

  28. https://pypi.python.org/pypi/graf-python/0.3.0.

  29. https://poio-api.readthedocs.org/en/latest/.

  30. http://www.sfb632.uni-potsdam.de/annis/.

  31. The ANNIS implementation for accessing MASC annotations is available from http://www.anc.org/software/annis.

  32. Note that the names of the object and features are much less important than the types of the objects and associated features.

  33. http://lapps.anc.org.

  34. http://langrid.nict.go.jp/.

  35. http://www.panacea-lr.eu/.

  36. http://www.clarin.eu.

  37. For more information see http://lapps.anc.org/web-service-exchange-vocabulary/.

  38. http://www.kyoto-project.eu/.

  39. http://www.ausnc.org.au/.
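
The convention in note 16, that anchors into character data name locations between characters, can be illustrated with a minimal Python sketch. The function name `region_text` and the sample string are assumptions made for illustration and do not come from the paper or from any GrAF tool; the sketch relies only on the fact that Python slice indices also denote positions between characters.

```python
def region_text(primary_data: str, start: int, end: int) -> str:
    """Return the characters lying between two anchors.

    Anchor k names the position just before character k, so anchor 0 sits
    before the first character and anchor len(primary_data) sits after the
    last one. (Hypothetical helper, not part of any GrAF API.)
    """
    if not 0 <= start <= end <= len(primary_data):
        raise ValueError("anchors must satisfy 0 <= start <= end <= len(data)")
    # Python slices use the same between-characters convention.
    return primary_data[start:end]


text = "Time flies"                 # illustrative primary data
token_1 = region_text(text, 0, 4)   # anchors 0 and 4 bound the first word
token_2 = region_text(text, 5, 10)  # anchors 5 and 10 bound the second word
empty = region_text(text, 4, 4)     # two equal anchors bound an empty region
```

Under this convention a region bounded by two equal anchors is empty, and two adjacent regions can share an anchor without sharing a character.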

References

  • Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In COLING-ACL ’98: Proceedings of the conference (pp. 86–90).

  • Bird, S., & Liberman, M. (2001). A formal framework for linguistic annotation. Speech Communication, 33(1–2), 23–60.

  • Blumtritt, J., Bouda, P., & Rau, F. (2013). Poio API and GraF-XML: A radical stand-off approach in language documentation and language typology. In Proceedings of Balisage: The markup conference 2013, Montreal, Canada, Balisage Series on Markup Technologies (vol. 10). doi:10.4242/BalisageVol10.Bouda01.

  • Cunningham, H., Maynard, D., Bontcheva, K., & Tablan, V. (2002). GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of ACL’02.

  • Ferrucci, D., & Lally, A. (2004). UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3–4), 327–348.

  • Francopoulo, G. (Ed.). (2013). LMF: Lexical Markup Framework. London: Wiley-ISTE.

  • Ide, N., & Bunt, H. (2010). Anatomy of annotation schemes: Mapping to GrAF. In Proceedings of the fourth linguistic annotation workshop (LAW IV) (pp. 247–255). Uppsala: Association for Computational Linguistics.

  • Ide, N., & Romary, L. (2001). Standards for language resources. In Proceedings of IRCS workshop on linguistic databases.

  • Ide, N., & Romary, L. (2003). Outline of the International Standard Linguistic Annotation Framework. In Proceedings of ACL’03 workshop on linguistic annotation: Getting the model right (pp. 1–5).

  • Ide, N., & Romary, L. (2004a). A registry of standard data categories for linguistic annotation. In Proceedings of the fourth international language resources and evaluation conference (LREC’04), Lisbon, Portugal (pp. 135–138).

  • Ide, N., & Romary, L. (2004b). International Standard for a Linguistic Annotation Framework. Journal of Natural Language Engineering, 10(3–4), 211–225.

  • Ide, N., & Romary, L. (2007). Towards international standards for language resources. In L. Dybkjaer, H. Hemsen, & W. Minker (Eds.), Evaluation of text and speech systems (pp. 263–284). Berlin: Springer.

  • Ide, N., & Suderman, K. (2007). GrAF: A graph-based format for linguistic annotations. In Proceedings of the linguistic annotation workshop (LAW), association for computational linguistics (pp. 1–8).

  • Ide, N., Bonhomme, P., & Romary, L. (2000). XCES: An XML-based encoding standard for linguistic corpora. In Proceedings of the second international language resources and evaluation conference (LREC’00).

  • Ide, N., Baker, C., Fellbaum, C., & Passonneau, R. (2010a). The manually annotated sub-corpus: A community resource for and by the people. In Proceedings of the ACL 2010 conference short papers (pp. 68–73). Uppsala: Association for Computational Linguistics.

  • Ide, N., Suderman, K., & Simms, B. (2010b). ANC2Go: A web application for customized corpus creation. In Proceedings of the seventh international conference on language resources and evaluation (LREC), Valletta, Malta.

  • Ide, N., Prasad, R., & Joshi, A. (2011). Towards interoperability for the Penn discourse treebank. In Proceedings of the sixth joint ISO–ACL SIGSEM workshop on interoperable semantic annotation (pp. 49–55).

  • ISO. (2005). Language Resource Management–Feature Structures, Part 1: Feature structure representation. ISO Document ISO/DIS 24610–1.

  • ISO. (2012). Language Resource Management–Linguistic Annotation Framework. ISO 24612.

  • Kemps-Snijders, M., Windhouwer, M., Wittenburg, P., & Wright, S. E. (2009). ISOCat: Remodelling metadata for language resources. International Journal of Metadata, Semantics and Ontologies, 4, 261–276.

  • Kipp, M. (2001). ANVIL: A generic annotation tool for multimodal dialogue. In INTERSPEECH’01 (pp. 1367–1370).

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Neumann, A., Ide, N., & Stede, M. (2013). Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF. In Proceedings of the seventh linguistic annotation workshop (LAW) (pp. 98–102). Sofia, Bulgaria.

  • Pustejovsky, J., Lee, K., Bunt, H., & Romary, L. (2010). ISO-TimeML: An international standard for semantic annotation. In Proceedings of the seventh international language resources and evaluation conference (LREC’10).

  • Thompson, H. S., & McKelvie, D. (1997). Hyperlink semantics for standoff markup of read-only documents. In Proceedings of SGML Europe 97: The next decade-pushing the envelope (pp. 227–229).

  • Zeldes, A., Ritz, J., Lüdeling, A., & Chiarcos, C. (2009). ANNIS: A search tool for multi-layer annotated corpora. In Proceedings of corpus linguistics.

Acknowledgments

This work was supported by National Science Foundation Grant INT-0753069.

Author information

Corresponding author

Correspondence to Nancy Ide.

About this article

Cite this article

Ide, N., Suderman, K. The Linguistic Annotation Framework: a standard for annotation interchange and merging. Lang Resources & Evaluation 48, 395–418 (2014). https://doi.org/10.1007/s10579-014-9268-1

  • DOI: https://doi.org/10.1007/s10579-014-9268-1
