An efficient concept-based retrieval model for enhancing text retrieval quality

Shehata, Shady; Karray, Fakhri; Kamel, Mohamed S.

doi:10.1007/s10115-012-0504-y

Regular Paper
Published: 12 June 2012

Knowledge and Information Systems, volume 35, pages 411–434 (2013)

Abstract

Most common text retrieval techniques are based on the statistical analysis of terms (words or phrases). Statistical analysis of term frequency captures the importance of a term within a document only. To achieve a more accurate analysis, the underlying model should identify terms that capture the semantics of the text. Such a model can capture the terms that represent the concepts of each sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed model consists of a conceptual ontological graph (COG) representation and a concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence, and all terms are then placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting scheme analyzes terms at both the sentence and document levels, unlike the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that represent the concepts capturing the sentence meaning. Experiments on text retrieval are conducted with the proposed model on different data sets, comparing traditional approaches with the concept-based retrieval model obtained by combining the conceptual ontological graph and the concept-based weighting scheme. The results are evaluated using three quality measures: the preference measure (bpref), precision at 10 documents retrieved (P(10)), and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based retrieval model is used, confirming that such a model enhances the quality of text retrieval.
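
The three quality measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the toy document identifiers are hypothetical, and bpref follows the commonly cited original formulation (evaluation tools such as trec_eval use a slightly different denominator), so the values may differ from those reported by the authors' tooling.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for a single query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
    rankings: Dict[str, Sequence[str]], judgments: Dict[str, Set[str]]
) -> float:
    """MAP: the mean of average precision over all queries."""
    if not rankings:
        return 0.0
    scores = [average_precision(rankings[q], judgments.get(q, set())) for q in rankings]
    return sum(scores) / len(scores)


def bpref(ranking: Sequence[str], relevant: Set[str], nonrelevant: Set[str]) -> float:
    """bpref for a single query; unjudged documents are ignored."""
    r = len(relevant)
    if r == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            # Penalise by the judged non-relevant documents ranked above,
            # counting at most the first r of them.
            total += 1.0 - min(nonrel_seen, r) / r
    return total / r


if __name__ == "__main__":
    # Toy data (hypothetical): d4 is unjudged, d6 is relevant but not retrieved.
    ranking = ["d1", "d2", "d3", "d4", "d5"]
    relevant = {"d1", "d3", "d6"}
    nonrelevant = {"d2", "d5"}
    print("P(5): ", precision_at_k(ranking, relevant, k=5))
    print("AP:   ", average_precision(ranking, relevant))
    print("bpref:", bpref(ranking, relevant, nonrelevant))
```

Unlike P(10) and MAP, bpref uses only judged documents, which makes it a natural complement when relevance judgments are incomplete.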

Author information

Authors and Affiliations

Desire2Learn Incorporated, Kitchener, Ontario, Canada
Shady Shehata
Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Fakhri Karray & Mohamed S. Kamel

Corresponding author

Correspondence to Shady Shehata.

About this article

Cite this article

Shehata, S., Karray, F. & Kamel, M.S. An efficient concept-based retrieval model for enhancing text retrieval quality. Knowl Inf Syst 35, 411–434 (2013). https://doi.org/10.1007/s10115-012-0504-y

Received: 21 May 2008
Revised: 09 April 2011
Accepted: 25 July 2011
Published: 12 June 2012
Issue Date: May 2013
DOI: https://doi.org/10.1007/s10115-012-0504-y
