Abstract
Misogyny and xenophobia are among today's most serious social problems. As social media use grows, hatred toward women and immigrants can be expressed more easily, and it can therefore have harmful effects on social media users. For this reason, it is important to develop systems capable of detecting hateful comments automatically. In this article, we analyze hate speech against women and immigrants in Spanish tweets by conducting classification experiments with different approaches. Moreover, we create appropriate language resources for hate speech detection in Spanish.
Index Terms
- Detecting Misogyny and Xenophobia in Spanish Tweets Using Language Technologies