ABSTRACT
Most existing document and web search engines rely on keyword-based queries. To find matches, these queries are processed using retrieval algorithms that rely on word frequencies, topic recency, document authority, and (in some cases) available ontologies. In this paper, we propose an approach to exploring text collections using a novel keywords-by-concepts (KbC) graph, which supports navigation using domain-specific concepts as well as keywords that characterize the text corpus. The KbC graph is a weighted graph, created by tightly integrating keywords extracted from documents and concepts obtained from domain taxonomies. Documents in the corpus are associated with the nodes of the graph based on evidence supporting contextual relevance; thus, the KbC graph supports contextually informed access to these documents. We also present the CoSeNa (Context-based Search and Navigation) system, which leverages the KbC model as the basis for document exploration and retrieval as well as contextually informed media integration.
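As a rough illustration of the kind of structure described above, the sketch below models a weighted graph whose nodes are either keywords or concepts, with documents attached to nodes by a relevance score. It is not the construction used in the paper: the class name `KbCGraph`, its methods, the node names, and all weights and scores are hypothetical, and the abstract does not specify how edge weights or document associations are computed.

```python
from dataclasses import dataclass, field


@dataclass
class KbCGraph:
    """Toy keywords-by-concepts graph (illustrative assumption, not the paper's model)."""
    kinds: dict = field(default_factory=dict)   # node -> "keyword" or "concept"
    edges: dict = field(default_factory=dict)   # node -> {neighbor: weight}
    docs: dict = field(default_factory=dict)    # node -> {doc_id: relevance}

    def add_node(self, name, kind):
        if kind not in ("keyword", "concept"):
            raise ValueError("kind must be 'keyword' or 'concept'")
        self.kinds[name] = kind

    def link(self, a, b, weight):
        # Undirected weighted edge, stored in both directions.
        self.edges.setdefault(a, {})[b] = weight
        self.edges.setdefault(b, {})[a] = weight

    def attach(self, node, doc_id, relevance):
        # Associate a document with a node, with a made-up relevance score.
        self.docs.setdefault(node, {})[doc_id] = relevance

    def neighbors(self, node):
        # Navigation step: adjacent nodes, strongest edge first.
        return sorted(self.edges.get(node, {}).items(), key=lambda kv: -kv[1])

    def documents(self, node):
        # Retrieval step: documents on this node, most relevant first.
        return sorted(self.docs.get(node, {}).items(), key=lambda kv: -kv[1])


# Hypothetical data: one taxonomy concept linked to two corpus keywords.
g = KbCGraph()
g.add_node("mammal", "concept")
g.add_node("whale", "keyword")
g.add_node("ocean", "keyword")
g.link("mammal", "whale", 0.8)
g.link("mammal", "ocean", 0.3)
g.attach("whale", "doc1", 0.9)
g.attach("whale", "doc2", 0.4)
```

Under these assumptions, a user starting at the concept `mammal` would be offered `whale` before `ocean`, and selecting `whale` would list `doc1` ahead of `doc2`.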
Index Terms
- CoSeNa: a context-based search and navigation system