
SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns

  • Original Paper
Language Resources and Evaluation

Abstract

In this paper, we present SFU ReviewSP-NEG, the first freely available, wide-coverage Spanish corpus annotated with negation. We describe the methodology applied in annotating the corpus, including the tagset, the linguistic criteria and the inter-annotator agreement tests. We also include a complete typology of negation patterns in Spanish. This typology has the advantage of being easy to express as a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they provide wide coverage (i.e. they resolved all the cases occurring in the corpus). The annotations are built on top of the SFU ReviewSP corpus. The resulting corpus consists of 400 reviews, 221,866 words and 9455 sentences, of which 3022 contain at least one negation structure.
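The counts quoted in the abstract imply a few simple ratios, for example that roughly a third of the sentences contain a negation structure. The following is a minimal sketch of that arithmetic only; the function name `corpus_ratios` and the dictionary keys are illustrative assumptions and are not part of the corpus distribution.

```python
# Counts quoted in the abstract.
REVIEWS = 400
WORDS = 221_866
SENTENCES = 9_455
NEGATED_SENTENCES = 3_022


def corpus_ratios(reviews, words, sentences, negated):
    """Return simple ratios derived from the corpus counts (hypothetical helper)."""
    return {
        "negated_sentence_share": negated / sentences,
        "sentences_per_review": sentences / reviews,
        "words_per_sentence": words / sentences,
    }


ratios = corpus_ratios(REVIEWS, WORDS, SENTENCES, NEGATED_SENTENCES)
for name, value in ratios.items():
    # Print each ratio rounded to four decimal places.
    print(f"{name}: {value:.4f}")
```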


Notes

  1. Version 1.0.0: http://sinai.ujaen.es/sfu-review-sp-neg-2/.

  2. These sentences were extracted from 730 documents from the DrugBank database (Wishart et al. 2008).

  3. http://www.coli.uni-saarland.de/projects/semeval2010_FG/.

  4. The works are The Hound of the Baskervilles and The Adventure of Wisteria Lodge.

  5. The corpus is available at: http://www.clips.ua.ac.be/BiographTA/corpora.html.

  6. In BioScope, a very fine-grained analysis of the scope is applied, taking into account very specific cases for the medical domain.

  7. Unified Medical Language System.

  8. NP, PP and ADJP stand for nominal, prepositional and adjectival phrase, respectively.

  9. We use the terms negation particle and negation marker interchangeably.

  10. In oral language, the focus is often marked with specific intonation (pitch) and intensity (volume).

  11. In the examples, we use brackets to indicate the scope and we underline the event.

  12. As a result of the annotation of the corpus, we built up a lexicon containing the collected set of negation markers and cues.

  13. Sentence extracted from the review no_2_20.txt (domain: hotels), SFU ReviewSP-NEG.

  14. d = determiner, n = noun, v = verb, a = adjective, r = adverb, c = conjunction, s = preposition, f = punctuation mark.

  15. For instance, <postype=“article”> indicates that the determiner is an article and <complex=“no”> indicates that the preposition is not complex.

  16. For a complete description of the morphological tags, see http://clic.ub.edu/corpus/ancora-documentacio.

  17. http://nlp.lsi.upc.edu/freeling/node/30.

  18. http://annotation.exmaralda.org/index.php/AnCoraPipe.

    User’s guide: http://clic.ub.edu/ca/ancorapipe.

  19. 95.82% of the 4329 negation structures have a semantic value different from “noneg” (Table 5).
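The category letters listed in note 14 can be sketched as a small lookup table. This is only an illustration: it assumes that the category letter is the first character of a morphological tag, and the function name `coarse_pos` and the sample tags are hypothetical rather than taken from the corpus.

```python
# Category letters as listed in note 14.
POS_CATEGORIES = {
    "d": "determiner",
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "r": "adverb",
    "c": "conjunction",
    "s": "preposition",
    "f": "punctuation mark",
}


def coarse_pos(tag):
    """Map a tag to its coarse category.

    Assumption: the category letter is the first character of the tag.
    """
    if not tag:
        return "unknown"
    return POS_CATEGORIES.get(tag[0].lower(), "unknown")


# Sample tags are illustrative only, not drawn from the corpus.
for sample in ["da0ms0", "ncms000", "vmip3s0", "xyz"]:
    print(sample, "->", coarse_pos(sample))
```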

References

  • Afzal, Z., Pons, E., Kang, N., Sturkenboom, M. C., Schuemie, M. J., & Kors, J. A. (2014). ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus. BMC Bioinformatics, 15(1), 1.

  • Banjade, R., & Rus, V. (2016). DT-Neg: Tutorial dialogues annotated for negation scope and focus in context. In N. Calzolari (Conference Chair), K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the tenth international conference on language resources and evaluation (LREC 2016), European Language Resources Association (ELRA), Paris, France.

  • Blanco, E., & Moldovan, D. (2014). Retrieving implicit positive meaning from negated statements. Natural Language Engineering, 20(04), 501–535.


  • Bokharaeian, B., Diaz, A., Neves, M., & Francisco, V. (2014). Exploring negation annotations in the DrugDDI corpus. In Fourth workshop on building and evaluating resources for health and biomedical text processing (BIOTxtM 2014), Citeseer.

  • Councill, I. G., McDonald, R., & Velikovich, L. (2010). What’s great and what’s not: Learning to classify the scope of negation for improved sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, Association for Computational Linguistics (pp. 51–59).

  • Demonte, V., & Bosque, I. (1999). Gramática descriptiva de la lengua española. Espasa Calpe.

  • Erk, K., & Pado, S. (2004). A powerful and versatile XML format for representing role-semantic annotation. In LREC, Citeseer.

  • Española, R. A. (2009). Nueva gramática de la lengua española.

  • Herrero-Zazo, M., Segura-Bedmar, I., Martínez, P., & Declerck, T. (2013). The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions. Journal of Biomedical Informatics, 46(5), 914–920.

  • Huddleston, R., Pullum, G. K., et al. (2002). The Cambridge grammar of the English language. Cambridge: Cambridge University Press.

  • Jiménez-Zafra, S. M., Martín-Valdivia, M. T., Ureña-López, L. A., Martí, M. A., & Taulé, M. (2016). Problematic cases in the annotation of negation in Spanish. ExProM 2016, p. 42.

  • Kim, J. D., Ohta, T., & Tsujii, J. (2008). Corpus annotation for mining biomedical events from literature. BMC Bioinformatics, 9(1), 1.


  • Konstantinova, N., & De Sousa, S. C. (2011). Annotating negation and speculation: The case of the review domain. In RANLP student research workshop (pp. 139–144).

  • Konstantinova, N., De Sousa, S.C., Díaz, N.P.C., López, M.J.M., Taboada, M., & Mitkov, R. (2012). A review corpus annotated for negation, speculation and their scope. In LREC (pp. 3190–3195).

  • Martí, M. A., Martín-Valdivia, M. T., Taulé, M., Jiménez-Zafra, S. M., Nofre, M., & Marsó, L. (2016). La negación en español: Análisis y tipología de patrones de negación. Procesamiento del Lenguaje Natural, 57, 41–48.

  • Morante, R., & Daelemans, W. (2012). ConanDoyle-neg: Annotation of negation in Conan Doyle stories. In Proceedings of the eighth international conference on language resources and evaluation, Istanbul, Citeseer.

  • Morante, R., & Sporleder, C. (2012). Modality and negation: An introduction to the special issue. Computational Linguistics, 38(2), 223–260.


  • Moreno, A., López, S., Sánchez, F., & Grishman, R. (2003). Developing a syntactic annotation scheme and tools for a Spanish treebank. In Treebanks (pp. 149–163). Springer, New York.

  • Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.


  • Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing-Volume 10, Association for Computational Linguistics (pp. 79–86).

  • Payne, T. E. (1997). Describing morphosyntax: A guide for field linguists. Cambridge: Cambridge University Press.


  • Polanyi, L., & Zaenen, A. (2006). Contextual valence shifters. In Computing attitude and affect in text: Theory and applications (pp. 1–10). Springer, New York.

  • Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., AL-Smadi, M., Al-Ayyoub, M., Zhao, Y., & Qin, B. (2016). SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of SemEval-2016 (pp. 19–30).

  • Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., et al. (2007). BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(1), 1.

  • Ruppenhofer, J., Sporleder, C., Morante, R., Baker, C., & Palmer, M. (2010). SemEval-2010 task 10: Linking events and their participants in discourse. In Proceedings of the 5th international workshop on semantic evaluation, Association for Computational Linguistics (pp. 45–50).

  • Sandoval, A. M., & Salazar, M. G. (2013). La anotación de la negación en un corpus escrito etiquetado sintácticamente [Annotation of negation in a written treebank]. Revista Iberoamericana de Lingüística, 8.

  • Segura Bedmar, I., Martínez, P., & Herrero Zazo, M. (2013). SemEval-2013 task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). Association for Computational Linguistics.

  • Taboada, M., Anthony, C., & Voll, K. (2006). Methods for creating semantic orientation dictionaries. In Proceedings of the 5th conference on language resources and evaluation (LREC’06) (pp. 427–432).

  • Vincze, V. (2010). Speculation and negation annotation in natural language texts: What the case of BioScope might (not) reveal. In Proceedings of the workshop on negation and speculation in natural language processing, Association for Computational Linguistics (pp. 28–31).

  • Vincze, V., Szarvas, G., Farkas, R., Móra, G., & Csirik, J. (2008). The BioScope corpus: Biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(11), 1.

  • Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, Association for Computational Linguistics (pp. 60–68).

  • Wishart, D. S., Knox, C., Guo, A. C., Cheng, D., Shrivastava, S., Tzur, D., et al. (2008). DrugBank: A knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research, 36(suppl 1), D901–D906.

  • Zou, B., Zhou, G., & Zhu, Q. (2016). Research on Chinese negation and speculation: Corpus annotation and identification. Frontiers of Computer Science (pp. 1–13).


Acknowledgements

This work has been partially supported by a grant from the Ministerio de Educación, Cultura y Deporte (MECD–scholarship FPU014/00983), Fondo Europeo de Desarrollo Regional (FEDER), and the projects REDES (TIN2015-65136-C2-1-R) and SOMEMBED-SLANG (TIN2015-71147-C2-2), which receive financial support from the Spanish Ministerio de Economía y Competitividad. We would like to thank Maite Taboada and her team for sharing the SFU resource with the research community. We would also like to express our gratitude to the three anonymous reviewers for their comments and suggestions for improving this article.

Author information


Corresponding author

Correspondence to Salud María Jiménez-Zafra.


About this article


Cite this article

Jiménez-Zafra, S.M., Taulé, M., Martín-Valdivia, M.T. et al. SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Lang Resources & Evaluation 52, 533–569 (2018). https://doi.org/10.1007/s10579-017-9391-x


  • DOI: https://doi.org/10.1007/s10579-017-9391-x
