tutorial

Measuring Relatedness Between Scientific Entities in Annotation Datasets

Authors:
Guillermo Palma, Universidad Simón Bolívar, Caracas, Venezuela
Maria-Esther Vidal, Universidad Simón Bolívar, Caracas, Venezuela
Eric Haag, University of Maryland, College Park, USA
Louiqa Raschid, University of Maryland, College Park, USA
Andreas Thor, University of Leipzig, Germany

BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, September 2013, Pages 367–376
https://doi.org/10.1145/2506583.2506651

Published: 22 September 2013

ABSTRACT

Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric AnnSim that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.
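
The matching formulation in the abstract can be illustrated with a minimal sketch. It assumes that the score is the total similarity of the best 1-to-1 matching between the two annotation sets, normalized as 2 * total / (|A1| + |A2|); this normalization is an assumption of the sketch and is not stated in the abstract. The brute-force search stands in for the efficient solvers the abstract mentions and is only practical for very small sets. The names `max_weight_matching`, `annsim`, `toy_sim`, and the similarity table `TOY_SIM` are hypothetical and chosen for illustration.

```python
from itertools import permutations

# Hypothetical pairwise similarities between annotation terms (illustration only).
TOY_SIM = {
    ("fever", "pyrexia"): 0.9,
    ("cough", "dyspnea"): 0.5,
    ("fever", "dyspnea"): 0.2,
}


def toy_sim(x, y):
    """Symmetric lookup: identical terms score 1.0, unlisted pairs score 0.0."""
    if x == y:
        return 1.0
    return TOY_SIM.get((x, y), TOY_SIM.get((y, x), 0.0))


def max_weight_matching(left, right, sim):
    """Brute-force 1-to-1 maximum weight bipartite matching.

    Returns (total_weight, pairs), where each pair is (left_term, right_term).
    Enumerates all assignments, so it is suitable for small inputs only.
    """
    # Iterate over assignments of the shorter side into the longer side.
    swapped = len(left) > len(right)
    small, large = (right, left) if swapped else (left, right)
    best_score, best_pairs = 0.0, []
    for perm in permutations(large, len(small)):
        pairs = list(zip(small, perm))
        if swapped:
            # Restore (left_term, right_term) orientation.
            pairs = [(b, a) for a, b in pairs]
        score = sum(sim(a, b) for a, b in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs


def annsim(annotations_a, annotations_b, sim):
    """Annotation similarity of two entities under the assumed normalization."""
    a, b = list(annotations_a), list(annotations_b)
    if not a and not b:
        return 0.0
    score, _ = max_weight_matching(a, b, sim)
    return 2.0 * score / (len(a) + len(b))


if __name__ == "__main__":
    drug_1 = ["fever", "cough"]
    drug_2 = ["pyrexia", "dyspnea", "rash"]
    print(max_weight_matching(drug_1, drug_2, toy_sim))
    print(annsim(drug_1, drug_2, toy_sim))
```

With this normalization, two entities with identical annotation sets score 1.0, and unmatched annotations on either side lower the score. For realistic set sizes, a polynomial-time assignment solver would replace the permutation search.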

Index Terms

Measuring Relatedness Between Scientific Entities in Annotation Datasets
1. Information systems
  1. Information systems applications

Published in

BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
September 2013
987 pages
ISBN:9781450324342
DOI:10.1145/2506583

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Annotation datasets
Annotation similarity
topological distance
weighted bipartite match
Qualifiers
- tutorial
- Research
- Refereed limited
Acceptance Rates
- BCB'13 Paper Acceptance Rate: 43 of 148 submissions, 29%
- Overall Acceptance Rate: 254 of 885 submissions, 29%
Article Metrics
- Total Citations: 12
- Total Downloads: 144
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 4