DOI: 10.1145/2872427.2882999
research-article

From Diversity-based Prediction to Better Ontology & Schema Matching

Published: 11 April 2016

ABSTRACT

Ontology & schema matching predictors assess the quality of matchers in the absence of an exact match. We propose MCD (Match Competitor Deviation), a new diversity-based predictor that compares the strength of a matcher's confidence in the correspondence of a concept pair with that of other correspondences involving either concept. We also propose using MCD as a regulator to optimally control the balance between Precision and Recall, and we apply it to 1:1 matching by combining it with a similarity measure based on solving a maximum weight bipartite graph matching (MWBM) problem. Optimizing the combined measure is known to be NP-hard. We therefore propose CEM, which approximates an optimal match by efficiently scanning multiple possible matches using rare event estimation. In a thorough empirical study over several real-world benchmark datasets, we show that MCD outperforms other state-of-the-art predictors and that CEM significantly outperforms existing matchers.
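
The two ingredients named in the abstract can be illustrated with a small sketch. The abstract does not give the MCD formula, so the `competitor_deviation` function below is only an assumed stand-in for the general idea: it scores each concept pair by how far its similarity lies above the mean similarity of the competing pairs that share its row or column. The `best_one_to_one` function finds a maximum weight 1:1 match by brute force, which is workable only for tiny matrices. It is not the CEM method. The function names and the example matrix are hypothetical.

```python
from itertools import permutations


def competitor_deviation(sim):
    """Score each cell by its gap to the mean of its competitors.

    Competitors of cell (i, j) are all other cells in row i and column j.
    This is an illustrative assumption, not the paper's MCD definition.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    dev = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            competitors = [sim[i][k] for k in range(n_cols) if k != j]
            competitors += [sim[k][j] for k in range(n_rows) if k != i]
            mean = sum(competitors) / len(competitors) if competitors else 0.0
            dev[i][j] = sim[i][j] - mean
    return dev


def best_one_to_one(weights):
    """Brute-force maximum weight 1:1 matching for a small square matrix."""
    n = len(weights)
    best_perm = max(
        permutations(range(n)),
        key=lambda p: sum(weights[i][p[i]] for i in range(n)),
    )
    return [(i, best_perm[i]) for i in range(n)]


# Hypothetical similarity matrix: rows are concepts of one schema,
# columns are concepts of the other.
sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.1, 0.4, 0.6],
]

dev = competitor_deviation(sim)
match = best_one_to_one(sim)
print(dev[0][0], dev[1][0])
print(match)
```

In this example the pairs (0, 0) and (1, 0) have close raw similarities (0.9 and 0.8), but the first stands out much more clearly from its competitors, which is the kind of signal a diversity-based predictor looks for.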


Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Copyright © 2016 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '16 paper acceptance rate: 115 of 727 submissions, 16%. Overall acceptance rate: 1,899 of 8,196 submissions, 23%.
