Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts

Srinivas, P. R.; Wei, Shang-Heng; Cristianini, Nello; Jones, E. G.; Gorin, F. A.

doi:10.1385/NI:3:2:115

Original Article
Published: June 2005

Volume 3, pages 115–131, (2005)

Neuroinformatics

P. R. Srinivas¹,
Shang-Heng Wei¹,
Nello Cristianini³,
E. G. Jones¹ &
…
F. A. Gorin¹,²

Abstract

Generating informational thesauri that classify, cross-reference, and retrieve diverse and highly detailed neuroscientific information requires identifying related neuroanatomical terms and acronyms within and between species (Gorin et al., 2001). Because manual construction of such thesauri is laborious, we describe the implementation and evaluation of a neuroanatomical term and acronym reconciliation (NTAR) system to assist domain experts with this task. NTAR is composed of two modules. The neuroanatomical term extraction (NTE) module employs a hidden Markov model (HMM) in conjunction with lexical rules to extract neuroanatomical terms (NT) and acronyms (NA) from textual material. The output of the NTE module is formatted into collections of term- or acronym-indexed documents composed of sentences and word phrases extracted from that material. The second module, for information retrieval (IR), uses a vector space model (VSM) and includes a novel, automated relevance feedback algorithm. The IR module retrieves statistically related neuroanatomical terms and acronyms in response to queried neuroanatomical terms and acronyms. Retrieval of neuroanatomical terms and acronyms from term-based queries was compared with (1) term retrieval that included automated relevance feedback and (2) term retrieval using “document-to-document” comparisons (context-based VSM). The retrieval of synonymous and similar primate and macaque thalamic terms and acronyms in response to a query list of human thalamic terminology by these three IR approaches was compared against a previously published, manually constructed concordance table of homologous cross-species terms and acronyms. Term-based VSM with automated relevance feedback retrieved 70% and 80%, respectively, of the primate and macaque terms and acronyms listed in the concordance table.
The automated feedback algorithm correctly identified 87% of the macaque terms and acronyms that were independently selected by a domain expert as being appropriate for manual relevance feedback. Context-based VSM correctly retrieved 97% and 98%, respectively, of the primate and macaque terms and acronyms listed in the term homology table. These results indicate that the NTAR system could assist neuroscientists with thesauri creation for closely related, highly detailed neuroanatomical domains.
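
The three retrieval approaches compared in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy term-indexed documents in `DOCS`, the TF-IDF weighting, and the Rocchio-style update with pseudo-relevant top results, which stands in for the article's automated relevance feedback algorithm because the abstract does not specify that algorithm. It is not the NTAR implementation.

```python
import math
from collections import Counter

# Hypothetical term-indexed documents: each term or acronym indexes a bag of
# context words. These toy contexts are invented for this sketch.
DOCS = {
    "VPL": ["somatosensory", "relay", "medial", "lemniscus", "ventral"],
    "ventral posterolateral nucleus": [
        "ventral", "posterolateral", "somatosensory", "lemniscus", "nucleus"],
    "LGN": ["visual", "relay", "optic", "tract"],
    "lateral geniculate nucleus": [
        "lateral", "geniculate", "visual", "optic", "nucleus"],
}

def tfidf_vectors(docs):
    """Map each indexed term to a sparse {word: tf * idf} vector."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {
        name: {w: c * math.log(1 + n / df[w]) for w, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }

def query_vector(tokens):
    """Term-based query: raw word counts of the query string."""
    return {w: float(c) for w, c in Counter(tokens).items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query_vec, vectors, exclude=()):
    """Return (score, indexed term) pairs, best match first."""
    scored = [(cosine(query_vec, vec), name)
              for name, vec in vectors.items() if name not in exclude]
    return sorted(scored, reverse=True)

def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
    """Rocchio-style update: move the query toward the mean relevant vector."""
    updated = {t: alpha * w for t, w in query_vec.items()}
    for vec in relevant_vecs:
        for t, w in vec.items():
            updated[t] = updated.get(t, 0.0) + beta * w / len(relevant_vecs)
    return updated

if __name__ == "__main__":
    vectors = tfidf_vectors(DOCS)

    # (a) Term-based VSM: the query is the words of the term itself.
    query = query_vector(["ventral", "posterolateral"])
    first_pass = rank(query, vectors)
    print("term-based:", first_pass)

    # (b) Automated feedback (assumed here: treat the top hit as relevant).
    expanded = rocchio(query, [vectors[first_pass[0][1]]])
    print("with feedback:", rank(expanded, vectors))

    # (c) Context-based VSM: compare the query term's document to the others.
    print("context-based:", rank(vectors["VPL"], vectors, exclude={"VPL"}))
```

In this sketch the term-based query can only match documents that share the query's own words, whereas feedback and document-to-document comparison also draw on shared context words. That is one plausible reading of why the context-based approach could retrieve more of the concordance table, though the abstract itself does not give an explanation.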

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402.
American Heritage Dictionary of the English Language, Fourth Edn., 2000, Houghton-Mifflin, Boston, MA.
Baeza-Yates, R. and Ribeiro-Neto, B. (1999) Modern Information Retrieval. Addison-Wesley (ACM Press series).
Bowden, D.M. and Dubach, M.F. (2003) Neuronames. Neuroinformatics 1, 43–60.
Berrois, C.D., Cucina, J.R., and Fagan, M.L. (2002) Methods from Semi-automated indexing for High Precision Information Retrieval. JAMA 9(6).
Crestani, F., Neural Relevance Feedback for Information Retrieval. In: Uncertainty in Intelligent and Information Systems, Bouchon-Meunier, B., Yager, R.R., and Zadeh, L.A. (eds.), World Scientific, Singapore.
Cristianini, N., Kandola, J., Vinokourov, A., and Shawe-Taylor, J. Kernel methods for text processing. In: Advances in Learning Theory: Methods, Models and Applications, NATO-ASI series in Computer and System Sciences, Suykens, J.A.K., Horvath, G., Basu, S., Micchelli, C., and Vandewalle, J. (eds.) IOS, Amsterdam.
Frakes, W.B. and Baeza-Yates, R. (eds.) (1992) Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ.
Gorin, F., Hogarth, M., and Gertz, M. (2001) The challenges and rewards of integrating diverse neuroscience information. Neuroscientist 7, 18.
Grefenstette, G. (1994) Explorations in Automatic Thesaurus Discovery, Kluwer Academic, Boston.
Gusfield, D. (1997) Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Press Syndicate of the University of Cambridge, Cambridge, UK.
Hassler, R. (1959) Anatomy of the Thalamus: Introduction to the Stereotaxis with an Atlas of the Human Brain, Thieme, New York.
Homayouni, R., Heinrich, K., Wei, L., Cui, Y., Zhou, M., and Berry, M. (2003) Mining the Bibliome to Identify Functional Relationships Between Genes, UT-ORNL Bioinformatics Summit.
Jones, E.G. (1998) The thalamus of primates, in Handbook of Chemical Neuroanatomy, Vol. 14. Elsevier, Amsterdam.
Lamping, J. and Rao, R. (1994) “Laying out and Visualizing Large Trees Using a Hyperbolic Space.” Proceedings of UIST’94, November: pp. 13–14.
Leroy, G., Chen, H., and Jesse, D.M. (2003) A shallow parser based on closed-class words to capture relations in biomedical text. J. Biomed. Informatics 36, 145.
Liu, H., Johnson, S.B., and Friedman, C. (2002) Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS. J. Am. Med. Inform. Assoc. 9(6), 621–636.
Liu, H., Teller, V., and Friedman, C. (2004) A multi-aspect comparison study of supervised word sense disambiguation. J. Am. Med. Inform. Assoc. 11(4), 320–331.
Mao, W. and Chu, W.W. (2002) Free text medical document retrieval via phrase-based vector space model. Proceedings of AMIA Annual Symposium.
Magnus, S. (2001) Vector-based semantic analysis: representing word meanings based on random labels. Semantic Knowledge Acquisition and Categorization Workshop, ESSLLI, Helsinki, Finland.
Nenadic, G., Spasic, I., and Ananiadou, S. (2002) Automatic discovery of term similarities using pattern mining. Proceedings of Second International Workshop on Computational Terminology-Computer, Taipei, Taiwan.
Olszewski, J. (1952) The Thalamus of the Macaca Mulatta, An Atlas for Use With the Stereotaxic Instrument. Karger, Basel.
Patwardhan, S., Banerjee, S., and Pedersen, T. (2003) Using measures of semantic relatedness for word sense disambiguation. Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City.
Porter, M.F. (1980) An algorithm for suffix stripping. Program 14(3), 130–137.
Pustejovsky, J., Castano, J., Cochran, B., Kotecki, M., Morrell, M., and Rumshisky, A. (2001) Linguistic Knowledge Extraction from Medline: Automatic Construction of an Acronym Database. Medinfo.
Qtag v 3.01, Portable POS Tagger. Oliver Mason, Department of English, School of Humanities, University of Birmingham, UK. http://web.bham.ac.uk/O.Mason/
Raghavan, V.V., Jung, G.S., and Bollmann, P. (1989) A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Info Systems 7(3), 205–229.
Ratnaparkhi, A. (1997) MXTERMINATOR.
Rindflesch, T., Hunter, L., and Aronson, A. (1999) Mining molecular binding terminology from biomedical text. Proceedings of AMIA Annual Symposium.
Rindflesch, T., Rajan, J., and Hunter, L. (2000) Extracting molecular binding relationships from biomedical text. Proceedings of the 6th Applied Natural Language Processing Conference, pp. 188–195.
Salton, G., Wong, A., and Yang, C.S. (1975) A vector space model for automatic indexing, in Communications of the ACM, Vol. 18, p. 613.
Salton, G. and McGill, M.J. (1983) In: Introduction to Modern Information Retrieval, Stewart, C.E. and Vastyan, J.E. (eds.) McGraw-Hill, NY.
Salton, G. (1971) The SMART Retrieval System. Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, NJ.
Salton, G. and Buckley, C. (1990) Improving retrieval performance by relevance feedback. J. Am. Soc. Info Sci. 41(4), 288–297.
SPECIALIST Lexicon. National Library of Medicine, Unified Medical Language System (UMLS) Project.
Srinivas, P.R., Gusfield, D., Mason, O., et al. (2002) Neuroanatomical term generation and comparison between two terminologies. Neuroinformatics 1, 177.
Manning, C.D. and Schutze, H. (2000) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
Yao, D. et al. Pathway Finder: paving the way towards automatic pathway extraction, in ACM International Conference Proceeding Series Archive. Proceedings of the Second Conference on Asia-Pacific Bioinformatics, 29, pp. 53–62.

Author information

Authors and Affiliations

Center for Neuroscience, UC Davis, Davis, CA
P. R. Srinivas, Shang-Heng Wei, E. G. Jones & F. A. Gorin
Department of Neurology, School of Medicine, UC Davis, Davis, CA
F. A. Gorin
Department of Statistics, UC Davis, Davis, CA
Nello Cristianini

Corresponding author

Correspondence to F. A. Gorin.

About this article

Cite this article

Srinivas, P.R., Wei, SH., Cristianini, N. et al. Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts. Neuroinform 3, 115–131 (2005). https://doi.org/10.1385/NI:3:2:115

Issue Date: June 2005
DOI: https://doi.org/10.1385/NI:3:2:115
