
A flexible text analyzer based on ontologies: an application for detecting discriminatory language

Original Paper, published in Language Resources and Evaluation.

Abstract

Language can be used to marginalize certain groups, since it may reflect negative attitudes rooted in prejudice or historical inertia. To discourage such misuse, several organizations have campaigned against discriminatory language, criticizing the use of particular terms and phrases. However, automatically detecting discriminatory text in documents remains difficult, because language is highly flexible and often relies on implicit features and relations. Moreover, the approaches and methodologies proposed in the literature for text analysis are hard to adapt, since they are typically too rigid to serve purposes other than those for which they were originally designed. This paper presents a methodology for building flexible text analyzers. Its main novelty is the use of ontologies to implement the rules applied by the text analyzer, which both simplifies the development of new analyzers and exploits the inference capabilities of ontologies. A set of rules for detecting discriminatory language concerning gender and people with disabilities is also presented, showing how the analyzer's functionality can be extended to different areas of discriminatory text.
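To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): each token becomes an individual linked to its successor by a hasNext property and annotated with lexical classes, and a detection rule is a class expression evaluated over those individuals.

```python
# Hypothetical sketch: tokens as ontology-style individuals with class
# memberships and a hasNext relation; a rule is a predicate over tokens.

def build_token_graph(tagged):
    """tagged: list of (word, classes) pairs; links each token to the next."""
    tokens = [{"word": w, "classes": set(c), "next": None} for w, c in tagged]
    for cur, nxt in zip(tokens, tokens[1:]):
        cur["next"] = nxt
    return tokens

def extra_visibility(tok):
    """The Extra-visibility rule from Appendix 1: a group marker
    (DisabledPeople, RacePeople, ReligionPeople or Sex) directly
    followed by a noun, as in 'female engineer'."""
    marked = {"DisabledPeople", "RacePeople", "ReligionPeople", "Sex"}
    nxt = tok["next"]
    return bool(tok["classes"] & marked) and nxt is not None \
        and "Noun" in nxt["classes"]

sentence = build_token_graph([
    ("female", {"Sex", "Adjective"}),
    ("engineer", {"Noun"}),
])
hits = [t["word"] for t in sentence if extra_visibility(t)]
# hits == ["female"]
```

In the paper's approach the class memberships and the hasNext links live in an ontology, so a reasoner can classify tokens against the rule expressions instead of hand-written predicates like the one above.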


Notes

  1. http://www.w3.org/TR/sparql11-query.

  2. http://ontotext.com/.

  3. https://gate.ac.uk.

  4. https://www.w3.org/2004/02/skos/.

  5. https://github.com/mark-watson/fasttag_v2.

  6. https://jena.apache.org.

  7. A Weka-ready version of the data set is available at https://sourceforge.net/p/disclangeditor/.

  8. Concepts and properties of the ontology.

  9. https://sourceforge.net/p/disclangeditor/.

  10. http://sinbad2.ujaen.es/text-mining-dist.


Acknowledgements

This contribution has been supported by the Andalusian Institute of Women, Junta de Andalucía, Spain (Grant No. UNIVER09/2009/23/00).

Author information


Corresponding author

Correspondence to Alberto Salguero.

Appendices

Appendix 1: Relevant class descriptions for discriminative language detection

Extra-visibility ≡ (DisabledPeople ⊔ RacePeople ⊔ ReligionPeople ⊔ Sex) ⊓ ∃hasNext.Noun

InappropriateTitles ≡ ((Dr ⊔ Mr) ⊓ (∃hasNext.(∃hasNext.(Feminine ⊓ ProperNoun)) ⊔ ∃hasNext.(∃hasNext.(∃hasNext.(Feminine ⊓ ProperNoun))) ⊔ ∃hasPrevious.(∃hasPrevious.(Feminine ⊓ ProperNoun)) ⊔ ∃hasPrevious.(∃hasPrevious.(∃hasPrevious.(Feminine ⊓ ProperNoun))))) ⊔ ((Mrs ⊔ Ms) ⊓ (∃hasNext.(∃hasNext.(Masculine ⊓ ProperNoun)) ⊔ ∃hasNext.(∃hasNext.(∃hasNext.(Masculine ⊓ ProperNoun))) ⊔ ∃hasPrevious.(∃hasPrevious.(Masculine ⊓ ProperNoun)) ⊔ ∃hasPrevious.(∃hasPrevious.(∃hasPrevious.(Masculine ⊓ ProperNoun)))))

ManAsVerb ≡ (Man ⊔ Manning) ⊓ ∃hasNext.The

ManPrecededByForInOf ≡ Man ⊓ ∃hasNext.(For ⊔ In ⊔ Of)

ManPrecededByForInOf ⊑ ManAlternative

MenWomenOrder ≡ (He ⊓ ∃hasNext.(∃hasNext.She)) ⊔ (Him ⊓ ∃hasNext.(∃hasNext.Her)) ⊔ (His ⊓ ∃hasNext.(∃hasNext.Hers)) ⊔ (Men ⊓ ∃hasNext.(∃hasNext.Women)) ⊔ (Sir ⊓ ∃hasNext.(∃hasNext.Madam))

NeutralMasculinePronoun ≡ Masculine ⊓ Pronoun ⊓ ∃isPrecededBy.ProperNoun

NeutralMasculinePronoun ≡ Masculine ⊓ Pronoun ⊓ ∃hasNext.(∃hasNext.(Feminine ⊓ Pronoun))

SexistDescription ≡ Adjective ⊓ ∃hasNext.Women ⊓ ∃hasPrevious.(And ⊓ ∃hasPrevious.(Men ⊓ ∃hasPrevious.Adjective))

Stereotyping ≡ (Sufferer ⊔ Victim) ⊓ (∃isFollowedBy.Illness ⊔ ∃isPrecededBy.Illness)
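In these axioms, hasNext and hasPrevious link consecutive tokens, so a nested ∃hasNext.(∃hasNext.C) reaches the token two positions ahead. A minimal, hypothetical evaluator for the MenWomenOrder pattern (illustrative names and data structures, not the authors' code) could read:

```python
# Hypothetical evaluator for the nested ∃hasNext restrictions above.
# Tokens are (word, classes) pairs; position i+1 is token i's hasNext.

PAIRS = [("He", "She"), ("Him", "Her"), ("His", "Hers"),
         ("Men", "Women"), ("Sir", "Madam")]

def has_class(tokens, i, cls):
    """True if position i exists and its token belongs to class cls."""
    return 0 <= i < len(tokens) and cls in tokens[i][1]

def men_women_order(tokens, i):
    """MenWomenOrder: a masculine term two tokens before its feminine
    counterpart, as in 'men and women' or 'Sir or Madam'."""
    return any(has_class(tokens, i, m) and has_class(tokens, i + 2, f)
               for m, f in PAIRS)

tokens = [("men", {"Men", "Noun"}), ("and", {"And"}),
          ("women", {"Women", "Noun"})]
matches = [i for i in range(len(tokens)) if men_women_order(tokens, i)]
# matches == [0]
```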

Appendix 2: Rules for detecting discriminative language

2.1. Extra-visibility — ✓
It is quite unnecessary to mention a person's sex, race, ethnic background, religion or disability.
Examples: male nurse; female engineer; Muslim student; Black police officer.

3.1.1. Invisibility — ✓
Women are often rendered invisible in language by the use of the masculine pronouns 'he', 'him' and 'his' to refer to both men and women, and by the use of 'man' as a noun, verb or adjective.
Examples: mankind; man made.

3.1.2. Inferiority — ✓
Unnecessary mention of gender suggests that in certain roles women are inferior to men. The use of 'feminine' suffixes such as 'ette', 'ess', 'ienne' and 'trix' is unnecessary.
Examples: female engineer; woman academic; actress.

3.2.1. Use alternatives for 'man' — ✓
Examples: mankind; the best man for the job; the man in the street; man of letters; men of science; manpower; manmade.

3.2.2. Avoid the use of 'man' as a verb — ✓
Examples: we need someone to man the desk; manning the office; she will man the phones.

3.2.4. Find alternatives to 'he' and 'his' — ✓
The pronouns 'he', 'his' and 'him' are frequently used as generic pronouns. As this use is both ambiguous and excludes women, try to find alternatives.
Example: the student may exercise his right to appeal.

3.2.7. Use alternatives for sex-specific occupation terms — ✓
Avoid the impression that these positions are male-exclusive, and avoid occupational titles containing the 'feminine' suffixes -ess, -ette, -trix and -ienne.
Examples: chairman; headmaster; headmistress; policeman; businessman; layman; groundsman; actress; executrix; authoress; comedienne.

3.2.8. Use appropriate titles and other modes of address — Partially
The inappropriate use of names, titles, salutations and endearments creates the impression that women merit less respect or less serious consideration than men do. Ensure that people's qualifications are accurately reflected in their titles, and that women's and men's academic titles are used in a parallel fashion.
Examples: Albert Einstein and Mrs Mead; Ms Clark and John Howard; Judy Smith and Dr Nguyen.

3.2.9. Use of Ms, Mrs, Miss, Mr — ✓
The use of 'Ms' is recommended for all women when the parallel 'Mr' is applicable, and 'Ms' should be used when a woman's title of preference is unknown.

3.2.10. Avoid patronising expressions — ✓
Use the words 'man'/'woman', 'girl'/'boy' and 'gentleman'/'lady' in a parallel manner.
Examples: the girls in the office; ladies; my girl will take care of that immediately.

3.2.12. Avoid sexist descriptions — ✓
Avoid stereotyped generalisations about men's and women's characters and patterns of behaviour.
Examples: strong men and domineering women; assertive men and aggressive women; angry men and hysterical women.

4.1.1. Derogatory labelling — ✓
Derogatory labels are still used and should be avoided. Acceptable alternatives include 'person with Down's Syndrome' and 'person with an intellectual disability'.
Examples: cripple; mongoloid; deaf and dumb; retarded.

4.1.2. Depersonalising or impersonal reference — ✓
People with a disability are often referred to collectively as the disabled, the handicapped, the mentally retarded, the blind, the deaf, or as paraplegics, spastics, epileptics, etc.
Examples: the disabled; the handicapped; disabled people; the physically handicapped; a paraplegic; paraplegics; an epileptic; the deaf.

4.1.3. Stereotyping — ✓
Never use the terms 'victim' or 'sufferer' to refer to a person who has or has had an illness, disease or disability.
Examples: victim of AIDS; AIDS sufferer; polio victim.
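As a rough illustration of how a rule such as 4.1.3 (Stereotyping) might be checked, here is a hypothetical sketch that approximates the isFollowedBy/isPrecededBy roles of Appendix 1 as co-occurrence within a sentence; the trigger words come from the rule, but the illness lexicon is an assumption for illustration only.

```python
# Hypothetical sketch of rule 4.1.3 (Stereotyping): 'victim' or 'sufferer'
# occurring in the same sentence as an illness term. The ILLNESS set is an
# assumed toy lexicon, not taken from the paper.

TRIGGERS = {"victim", "sufferer"}
ILLNESS = {"aids", "polio"}  # assumed lexicon

def stereotyping(sentence):
    """Flag a sentence that pairs a trigger word with an illness term."""
    words = {w.strip(".,;").lower() for w in sentence.split()}
    return bool(words & TRIGGERS) and bool(words & ILLNESS)

examples = ["a polio victim spoke", "people who have had polio spoke"]
flagged = [s for s in examples if stereotyping(s)]
# flagged == ["a polio victim spoke"]
```

The ontology-based version is stricter than this sketch, since the isFollowedBy/isPrecededBy roles constrain the relative order of the trigger and the illness term.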


Cite this article

Salguero, A., Espinilla, M. A flexible text analyzer based on ontologies: an application for detecting discriminatory language. Lang Resources & Evaluation 52, 185–215 (2018). https://doi.org/10.1007/s10579-017-9387-6
