
Detecting Misogyny and Xenophobia in Spanish Tweets Using Language Technologies

Authors:
Flor-Miriam Plaza-Del-Arco (ORCID 0000-0002-3020-5512), Advanced Studies Center in ICT (CEATIC), Jaén, Spain
M. Dolores Molina-González (ORCID 0000-0002-8348-7154), Advanced Studies Center in ICT (CEATIC), Jaén, Spain
L. Alfonso Ureña-López, Advanced Studies Center in ICT (CEATIC), Jaén, Spain
M. Teresa Martín-Valdivia (ORCID 0000-0002-2874-0401), Advanced Studies Center in ICT (CEATIC), Jaén, Spain

ACM Transactions on Internet Technology, Volume 20, Issue 2, Article No. 12, pp. 1–19. https://doi.org/10.1145/3369869

Published: 14 March 2020

Abstract

Misogyny and xenophobia are among the most important social problems today. With the growing use of social media, hatred toward women and immigrants can be expressed more easily, and it can therefore have harmful effects on social media users. For this reason, it is important to develop systems capable of detecting hateful comments automatically. In this article, we analyze hate speech against women and immigrants in Spanish tweets by conducting classification experiments with different approaches. Moreover, we create appropriate language resources for hate speech detection in Spanish.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Environment-specific retrieval
        1. Web and social media search

Published in
ACM Transactions on Internet Technology, Volume 20, Issue 2
Special Section on Emotions in Conflictual Social Interactions and Regular Papers
May 2020, 256 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3386441
Editor: Ling Liu, Georgia Institute of Technology, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 March 2020
- Accepted: 1 October 2019
- Revised: 1 August 2019
- Received: 1 March 2019
Author Tags
Misogyny detection
classifier ensemble
hate speech classification
lexicon
machine learning
social media
text mining
xenophobia detection
Qualifiers
- research-article
- Research
- Refereed