Ontology refinement for improved information retrieval

https://doi.org/10.1016/j.ipm.2009.05.008

Abstract

Ontologies are frequently used in information retrieval; their main applications are query expansion, semantic indexing of documents, and the organization of search results. Ontologies provide lexical items, allow conceptual normalization, and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze how useful ontologies are for performing document searches effectively. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks, with preliminary positive results.

Introduction

Ontologies and terminological resources have been used in information retrieval (IR) to provide query expansion terms, to perform semantic indexing of documents, or to produce a better organization of retrieved documents. However, these ontologies are usually not optimized for IR tasks.

In this paper, we rely on language modeling (Ponte & Croft, 1998), as it provides a formal probabilistic background and good retrieval performance. In language modeling, documents are ranked by the probability that the query is generated by the language model of the document. In our work, we rank documents by combining their models with a query model that is based on the topology of the ontology and on a selection of concepts from the ontology (i.e., the query). We then study mechanisms to improve ontologies and make them more effective in IR.
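To make the ranking principle concrete, the following is a minimal sketch of plain query-likelihood ranking, in which each document is scored by the log-probability that its unigram language model generates the query. It is not the ontology query model of this paper. The linear interpolation with a collection model, the weight `lam`, the function names and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, lam=0.5):
    """Return log P(query | document model) for a unigram model.

    The document model is interpolated with the collection model
    (an assumed smoothing choice) so unseen words do not zero the score.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for word in query:
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_coll = coll_counts[word] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The word occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank document ids by decreasing query log-likelihood."""
    coll_counts = Counter(w for tokens in docs.values() for w in tokens)
    coll_len = sum(coll_counts.values())
    scored = [
        (doc_id, query_log_likelihood(query, tokens, coll_counts, coll_len, lam))
        for doc_id, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, already tokenized.
docs = {
    "d1": "breast cancer treatment with tamoxifen".split(),
    "d2": "lung cancer screening study".split(),
    "d3": "protein folding simulation".split(),
}
ranking = rank(["breast", "cancer"], docs)
print([doc_id for doc_id, _ in ranking])
```

In this toy collection the document containing both query words is ranked first, followed by the one containing only one of them.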

In the next section, we present related work. Section 3 introduces the ontology query model. Section 4 shows results of this query model. Section 5 presents lexicon cleansing proposals to enhance the quality of our ontology. Section 6 introduces our ontology refinement algorithm and shows the results using the algorithm. Finally, Section 7 presents conclusions and future work.

Related work

The contribution of this paper is related to ontology refinement, to the IR heuristics that might be useful for ontology refinement, and to the usage of ontologies in information retrieval. The following sections review the main approaches to these topics.

Ontology query model

The main aim of the ontology query model (OQM) (Jimeno-Yepes, Berlanga-Llavori, & Rebholz-Schuhmann, 2009) is to produce an IR query from a set of concepts C selected by a user browsing the ontology.

We define the following sets and functions. W is the set of words in the lexicon, and T is the set of terms in the ontology. LexW(T) returns the set of words in W for a given term T. For example, the term breast cancer in T is represented by the words breast and cancer in LexW. Terms are grouped
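The mapping from a term to its words can be sketched as follows. This is only an illustration of the definition above: the lowercasing and whitespace tokenization, the name `lex_w` and the sample terms are assumptions, and the paper's actual lexicon construction may differ.

```python
def lex_w(term):
    """Return the set of words that make up an ontology term.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    return set(term.lower().split())


# Hypothetical terms T; the word set W is the union of their words.
ontology_terms = ["breast cancer", "lung cancer"]
lexicon_words = set().union(*(lex_w(t) for t in ontology_terms))

print(sorted(lex_w("breast cancer")))
print(sorted(lexicon_words))
```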

Performance of OQM

In this section we describe the experiments carried out to show the effectiveness of queries generated from a domain ontology.

Lexicon cleansing

This section presents several heuristics that we have evaluated extensively in (Jimeno-Yepes et al., 2009); see Table 5. The first heuristic (Corpus) consists of removing terms from the lexicon that are not found in the document collection, i.e., Medline. This heuristic is query independent and removes redundant terms, which reduces space and noise in the lexicon. The second strategy is aimed at finding the specific contexts in which a concept is labeled with a term.
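The Corpus heuristic can be illustrated with a small sketch that keeps only the terms occurring in at least one document. The exact matching procedure is not given in this excerpt, so the case-insensitive match on contiguous tokens, the function names, the concept identifiers and the sample data are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def occurs_in(term_tokens, doc_tokens):
    """Check whether the term appears as a contiguous token sequence."""
    n = len(term_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == term_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def corpus_filter(lexicon, documents):
    """Drop every term that is not found in any document of the collection."""
    tokenized_docs = [tokenize(doc) for doc in documents]
    return {
        concept: [
            term for term in terms
            if any(occurs_in(tokenize(term), doc) for doc in tokenized_docs)
        ]
        for concept, terms in lexicon.items()
    }


# Hypothetical lexicon (concept id -> terms) and document collection.
lexicon = {
    "C1": ["breast cancer", "mammary carcinoma"],
    "C2": ["neoplasm of lung", "lung cancer"],
}
documents = [
    "Breast cancer risk factors.",
    "Screening for lung cancer in smokers.",
]
cleaned = corpus_filter(lexicon, documents)
print(cleaned)
```

Because the filter depends only on the lexicon and the collection, it can be run once offline, which matches the query-independent nature of the heuristic.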

Ontology refinement

In this section, we explain our refinement algorithm. The section is split into three subsections. In the first, we present our refinement approach. We then present the information extraction implementation used in our work. Finally, we show the results of applying the algorithm to the data sets.

Conclusions

The ontology query model shows interesting performance, and we plan to investigate different parameters and document models to further improve retrieval effectiveness.

The results show that our method has identified knowledge missing from the ontology that is relevant to IR tasks. Thus, the represented knowledge linked to a concept has to be selected carefully to avoid query drift. This outcome was identified earlier in the literature (Voorhees, 1994). The main contribution of

Acknowledgement

This work was funded by the EC within the BOOTStrep (FP6-028099) Project and by the Spanish National Research Program Project TIN2008-01825/TIN.

References (32)

  • Kiryakov, A., et al. (2004). Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web.
  • Bai, J., Song, D., Bruza, P., Nie, J., & Cao, G. (2005). Query expansion using term relationships in language models...
  • Berger, A., & Lafferty, J. (1999). Information retrieval as statistical translation. In Proceedings of the 22nd annual...
  • Buckley, C., & Salton, G. (1995). Optimization of relevance feedback weights. In Proceedings of the 18th annual...
  • Buckley, C., Salton, G., Allan, J., & Singhal, A. (2004). Automatic query expansion using SMART: TREC 3. In Text...
  • Cao, G., Nie, J., & Bai, J. (2005). Integrating word relationships into language models. In Proceedings of the 28th...
  • Castells, P., et al. (2007). An adaptation of the vector-space model for ontology-based information retrieval. IEEE Transactions on Knowledge and Data Engineering.
  • Cimiano, P., Handschuh, S., & Staab, S. (2004). Towards the self-annotating web. In Proceedings of the 13th...
  • Deerwester, S., et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science.
  • Divoli, A., et al. (2005). BioIE: Extracting informative sentences from the biomedical literature. Bioinformatics.
  • Efthimiadis, E. (1996). Query expansion. In Martha E. Williams (Ed.), Annual review of information systems and...
  • Faatz, A., & Steinmetz, R. (2002). Ontology enrichment with texts from the www. In Semantic web mining,...
  • Hahn, U., & Schnattinger, K. (1998). Towards text knowledge engineering. In Proceedings of the fifteenth national/tenth...
  • Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. Technical Report...
  • Hearst, M. (1998). Automated discovery of wordnet relations. In M. Press (Ed.), WordNet: An electronic lexical database...
  • Jimeno-Yepes, A., & Berlanga-Llavori, R. (2008). Study of named entity recognition in biomedicine: Towards the...