ABSTRACT
Ontology & schema matching predictors assess the quality of matchers in the absence of an exact match. We propose MCD (Match Competitor Deviation), a new diversity-based predictor that compares the strength of a matcher's confidence in the correspondence of a concept pair with its confidence in other correspondences that involve either concept. We also propose using MCD as a regulator to optimally control the balance between Precision and Recall. We apply it to 1:1 matching by combining it with a similarity measure based on solving a maximum weight bipartite graph matching (MWBM). Optimizing the combined measure is known to be an NP-hard problem. Therefore, we propose CEM, which approximates an optimal match by efficiently scanning multiple possible matches using rare event estimation. Through a thorough empirical study over several real-world benchmark datasets, we show that MCD outperforms other state-of-the-art predictors and that CEM significantly outperforms existing matchers.
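The abstract describes MCD only informally: a pair's confidence is compared with the confidences of its "competitors", meaning the other correspondences that share either concept. The sketch below is one way to make that idea concrete. It is an illustration under stated assumptions and not the paper's exact formula. It assumes the matcher's output is an n×m similarity matrix `sim`, that the competitors of pair (i, j) are all other entries in row i and column j, and that per-pair deviations are aggregated by root-mean-square. Both function names are hypothetical.

```python
import numpy as np


def competitor_deviation(sim: np.ndarray) -> np.ndarray:
    """Per-pair deviation of each confidence from the mean of its competitors.

    Assumed definition (illustrative only): the competitors of pair (i, j)
    are all other entries in row i and in column j of the similarity matrix.
    """
    n, m = sim.shape
    row_sums = sim.sum(axis=1, keepdims=True)  # shape (n, 1)
    col_sums = sim.sum(axis=0, keepdims=True)  # shape (1, m)
    # Exclude the pair's own score, which appears once in each sum.
    competitor_total = row_sums + col_sums - 2 * sim
    competitor_count = (m - 1) + (n - 1)
    if competitor_count == 0:  # a 1x1 matrix has no competitors
        return np.zeros_like(sim)
    return sim - competitor_total / competitor_count


def mcd_sketch(sim: np.ndarray) -> float:
    """Aggregate the per-pair deviations by root-mean-square (assumed)."""
    dev = competitor_deviation(sim)
    return float(np.sqrt(np.mean(dev ** 2)))
```

Under these assumptions, a decisive matrix such as `np.eye(2)` scores 1.0, while a flat matrix such as `np.full((2, 2), 0.5)` scores 0.0. This matches the intuition that a matcher whose confidences stand out from their competitors is more likely to be reliable.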
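CEM is described as approximating an optimal 1:1 match by scanning many candidate matches with rare event estimation, the setting of the cross-entropy method. The following is a generic cross-entropy sketch for 1:1 matching and not the authors' CEM. It keeps a matrix `p` of sampling probabilities and draws candidate matches from it. It scores each candidate, keeps the top `elite_frac` fraction, and moves `p` toward the elite pairs' frequencies with smoothing factor `alpha`. All names and parameter values here are hypothetical. The objective is pluggable through `score`. The default `total_weight` covers only the MWBM part (summed confidence), not the combined MWBM and MCD measure. The sketch also assumes the matrix has no more rows than columns.

```python
import numpy as np


def sample_match(p: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw a 1:1 match (row i -> column match[i]) from probabilities p,
    visiting rows in order and renormalising over still-free columns.
    Assumes p.shape[0] <= p.shape[1]."""
    n, m = p.shape
    free = np.ones(m, dtype=bool)
    match = np.empty(n, dtype=int)
    for i in range(n):
        w = p[i] * free
        total = w.sum()
        probs = w / total if total > 0 else free / free.sum()
        j = rng.choice(m, p=probs)
        match[i] = j
        free[j] = False  # each column may be used at most once
    return match


def total_weight(sim: np.ndarray, match: np.ndarray) -> float:
    """MWBM-style objective: summed confidence of the chosen pairs."""
    return float(sim[np.arange(len(match)), match].sum())


def cross_entropy_match(sim, score=total_weight, n_samples=200,
                        elite_frac=0.1, alpha=0.7, iters=30, seed=0):
    """Generic cross-entropy search for a 1:1 match maximising `score`."""
    rng = np.random.default_rng(seed)
    n, m = sim.shape
    p = np.full((n, m), 1.0 / m)  # start from uniform sampling
    n_elite = max(1, int(elite_frac * n_samples))
    best_match, best_val = None, -np.inf
    for _ in range(iters):
        samples = [sample_match(p, rng) for _ in range(n_samples)]
        values = np.array([score(sim, s) for s in samples])
        elite = np.argsort(values)[::-1][:n_elite]  # highest-scoring samples
        if values[elite[0]] > best_val:
            best_val, best_match = float(values[elite[0]]), samples[elite[0]]
        freq = np.zeros((n, m))
        for k in elite:
            freq[np.arange(n), samples[k]] += 1.0
        # Smoothed update toward the empirical pair frequencies of the elite.
        p = alpha * (freq / n_elite) + (1 - alpha) * p
    return best_match, best_val
```

For a small 3×3 matrix whose diagonal dominates, such as `[[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]]`, the search recovers the diagonal match with total weight 2.4. The smoothing factor keeps every pair's sampling probability positive, so the search does not lock onto a single match after one iteration.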
Index Terms
- From Diversity-based Prediction to Better Ontology & Schema Matching