Ontology refinement for improved information retrieval
Introduction
Ontologies and terminological resources have been used in information retrieval (IR) to provide query expansion terms, to perform semantic indexing of documents, or to produce a better organization of retrieved documents. However, these ontologies are usually not optimized for IR tasks.
In this paper we rely on language modeling (Ponte & Croft, 1998), as it provides a formal probabilistic background and good retrieval performance. In language modeling, documents are ranked by the probability that the query is generated by the language model of the document. In our work, we rank documents by combining their models with a query model based on the topology of the ontology and a selection of concepts from the ontology (i.e. the query). We study mechanisms for improving ontologies so that they become more effective in IR.
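The ranking principle described above can be illustrated with a minimal query-likelihood sketch. It is not the ontology query model of this paper: the function names, the toy documents, and the choice of Jelinek-Mercer smoothing with `lam=0.5` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Log-probability that the document's language model generates the query.

    Jelinek-Mercer smoothing with the collection model is an assumption of
    this sketch; the source does not state which smoothing is used.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for word in query:
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_coll = collection_counts[word] / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # word never seen anywhere in the collection
        score += math.log(p)
    return score


def rank(query, docs):
    """Return document indices sorted by decreasing query likelihood."""
    collection_counts = Counter(w for d in docs for w in d)
    collection_len = sum(collection_counts.values())
    scores = [
        query_log_likelihood(query, d, collection_counts, collection_len)
        for d in docs
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy collection (hypothetical), already tokenized into words.
docs = [
    "breast cancer risk increases with age".split(),
    "lung cancer and smoking".split(),
    "protein folding in yeast".split(),
]
ranking = rank(["breast", "cancer"], docs)
```

Here the document that contains both query words is ranked first, the one that contains only one of them second, and the unrelated document last.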
In the next section, we present related work. Section 3 introduces the ontology query model. Section 4 shows results of this query model. Section 5 presents lexicon cleansing proposals to enhance the quality of our ontology. Section 6 introduces our ontology refinement algorithm and shows the results using the algorithm. Finally, Section 7 presents conclusions and future work.
Related work
The contribution of this paper relates to three topics: ontology refinement, IR heuristics that may be useful for ontology refinement, and the use of ontologies in information retrieval. The following sections review the main approaches to each of these topics.
Ontology query model
The main aim of the ontology query model (OQM) (Jimeno-Yepes, Berlanga-Llavori, & Rebholz-Schuhmann, 2009) is to produce an IR query from a set of concepts selected by a user browsing the ontology.
We define the following sets and functions. W is the set of words in the lexicon, and T is the set of terms in the ontology. LexW returns the set of words in W for a given term in T. This means that the term breast cancer in T will be represented as the words breast and cancer in LexW. Terms are grouped
Performance of OQM
In this section we describe the experiments carried out to show the effectiveness of queries generated from a domain ontology.
Lexicon cleansing
This section presents several heuristics that we have evaluated extensively in (Jimeno-Yepes et al., 2009), see Table 5. The first heuristic (Corpus) consists of removing terms from the lexicon that are not found in the document collection, i.e. Medline. This heuristic is query independent and removes redundant terms, thereby reducing the size of the lexicon and the noise in it. The second strategy is aimed at finding the specific contexts in which a concept is labeled with a term.
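The Corpus heuristic can be sketched as a simple filter over the lexicon. This is only an illustration under stated assumptions: the lexicon is represented here as a hypothetical mapping from concept identifiers to term lists, a term counts as "found" when it occurs as a whitespace-delimited phrase in some document, and the names `normalize` and `corpus_filter` are invented for the example.

```python
def normalize(text):
    """Lowercase and collapse whitespace (no punctuation handling: a simplification)."""
    return " ".join(text.lower().split())


def corpus_filter(lexicon, documents):
    """Drop every term that does not occur as a phrase in any document.

    `lexicon` maps a concept id to its list of terms (hypothetical layout).
    The filter is query independent: it only looks at the collection.
    """
    # Pad with spaces so that phrase matching respects word boundaries.
    padded = [f" {normalize(d)} " for d in documents]
    return {
        concept: [
            t for t in terms
            if any(f" {normalize(t)} " in d for d in padded)
        ]
        for concept, terms in lexicon.items()
    }


# Toy data (hypothetical); a real run would use the Medline collection.
lexicon = {
    "C1": ["breast cancer", "mammary carcinoma"],
    "C2": ["cancer of the lung"],
}
documents = [
    "Breast cancer screening in older women.",
    "Lung cancer and smoking.",
]
cleaned = corpus_filter(lexicon, documents)
```

In this toy run only "breast cancer" survives, since the other two terms never appear in the collection. Scanning every document for every term is quadratic; for a collection the size of Medline an inverted index or phrase index would be the more realistic choice.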
Ontology refinement
In this section, we explain our refinement algorithm. The section is split into three parts. In the first, we present our refinement approach. We then describe the information extraction implementation used in our work. Finally, we show the results of the algorithm applied to the data sets.
Conclusions
The ontology query model shows good retrieval performance, and we plan to investigate different parameters and document models to further improve retrieval effectiveness.
The results show that our method has identified missing knowledge in the ontology that is relevant to IR tasks. Thus, the knowledge linked to a concept has to be selected carefully to avoid query drift. This outcome was identified earlier in the literature (Voorhees, 1994). The main contribution of
Acknowledgement
This work was funded by the EC within the BOOTStrep (FP6-028099) Project and by the Spanish National Research Program Project TIN2008-01825/TIN.
References (32)
- et al. Semantic annotation indexing and retrieval. Web Semantics: Science Services and Agents on the World Wide Web (2004).
- Bai, J., Song, D., Bruza, P., Nie, J., & Cao, G. (2005). Query expansion using term relationships in language models...
- Berger, A., & Lafferty, J. (1999). Information retrieval as statistical translation. In Proceedings of the 22nd annual...
- Buckley, C., & Salton, G. (1995). Optimization of relevance feedback weights. In Proceedings of the 18th annual...
- Buckley, C., Salton, G., Allan, J., & Singhal, A. (2004). Automatic query expansion using SMART: TREC 3. In Text...
- Cao, G., Nie, J., & Bai, J. (2005). Integrating word relationships into language models. In Proceedings of the 28th...
- et al. An adaptation of the vector-space model for ontology-based information retrieval. IEEE Transactions on Knowledge and Data Engineering (2007).
- Cimiano, P., Handschuh, S., & Staab, S. (2004). Towards the self-annotating web. In Proceedings of the 13th...
- et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990).
- et al. BioIE: Extracting informative sentences from the biomedical literature. Bioinformatics (2005).