research-article

Detecting Misogyny and Xenophobia in Spanish Tweets Using Language Technologies

Published: 14 March 2020

Abstract

Today, misogyny and xenophobia are among the most pressing social problems. With the growing use of social media, hatred toward women and immigrants can be expressed more easily, and it can therefore have harmful effects on social media users. For this reason, it is important to develop systems capable of detecting hateful comments automatically. In this article, we analyze hate speech against women and immigrants in Spanish tweets by conducting classification experiments with different approaches. Moreover, we create appropriate language resources for hate speech detection in Spanish.

Published in

ACM Transactions on Internet Technology, Volume 20, Issue 2
Special Section on Emotions in Conflictual Social Interactions and Regular Papers
May 2020
256 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3386441
Editor: Ling Liu

Copyright © 2020 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 14 March 2020
• Accepted: 1 October 2019
• Revised: 1 August 2019
• Received: 1 March 2019

Qualifiers

• research-article
• Research
• Refereed
