DOI: 10.1145/2506583.2506651
tutorial

Measuring Relatedness Between Scientific Entities in Annotation Datasets

Published: 22 September 2013

ABSTRACT

Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric AnnSim that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.
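The matching idea in the abstract can be illustrated with a minimal sketch. It assumes each entity is given as a list of annotation terms and that a pairwise term similarity function is available. The names `best_matching`, `ann_sim`, `toy_sim`, and the toy scores are hypothetical. The normalization (twice the matching weight divided by the total number of annotations) is an assumption that the abstract does not state. The matching is found by exhaustive search, which is only practical for very small annotation sets; it stands in for the efficient solvers the abstract refers to.

```python
from itertools import permutations


def best_matching(terms_a, terms_b, sim):
    """Exhaustive 1-to-1 maximum weighted bipartite matching (small inputs only)."""
    # Enumerate assignments of the smaller side into the larger side.
    swapped = len(terms_a) > len(terms_b)
    small, large = (terms_b, terms_a) if swapped else (terms_a, terms_b)
    best_weight, best_pairs = 0.0, []
    for chosen in permutations(large, len(small)):
        # Keep each pair oriented as (term from A, term from B).
        pairs = [(l, s) if swapped else (s, l) for s, l in zip(small, chosen)]
        weight = sum(sim(x, y) for x, y in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs


def ann_sim(terms_a, terms_b, sim):
    """Assumed normalization: 2 * matching weight / (|A| + |B|)."""
    if not terms_a or not terms_b:
        return 0.0
    weight, _ = best_matching(list(terms_a), list(terms_b), sim)
    return 2.0 * weight / (len(terms_a) + len(terms_b))


# Hypothetical pairwise term similarities, for illustration only.
TOY_SIM = {
    ("t1", "u1"): 0.9,
    ("t1", "u2"): 0.8,
    ("t2", "u1"): 0.85,
    ("t2", "u2"): 0.1,
}


def toy_sim(x, y):
    return TOY_SIM.get((x, y), 0.0)


if __name__ == "__main__":
    weight, pairs = best_matching(["t1", "t2"], ["u1", "u2"], toy_sim)
    print(pairs, round(weight, 3))
    print(round(ann_sim(["t1", "t2"], ["u1", "u2"], toy_sim), 3))
```

In this toy data, greedily pairing the single most similar terms first (t1 with u1) leaves t2 with u2 for a total of 1.0, while the optimal matching pairs t1 with u2 and t2 with u1 for a higher total. That gap is the reason for treating the problem as a global matching rather than a sequence of local choices.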

    • Published in
      BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
      September 2013
      987 pages
      ISBN:9781450324342
      DOI:10.1145/2506583

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • tutorial
      • Research
      • Refereed limited

      Acceptance Rates

      BCB'13 Paper Acceptance Rate: 43 of 148 submissions, 29%
      Overall Acceptance Rate: 254 of 885 submissions, 29%
