ABSTRACT
We propose a method for classifying XML documents and extracting an XML schema from them by inductive inference based on constraint logic programming. The goal of this work is to type a large collection of XML documents approximately but efficiently. The method can also process XML written against different schemas, or XML with no schema at all. Our approach is intended to identify documents by both their syntax and their semantics, through information extraction using an ontology, and to support retrieval and data management. The approach has three steps. The first step converts XML into predicates. The second step compares the predicates and classifies structures that express similar meanings in different forms. The last step turns the predicates into rules by using the ontology and maintains the XML Schema. We evaluate the similarity of data types and data ranges by using an ontology dictionary, and the XML Schema is built from the results of the second and last steps.
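As a rough illustration of the first two steps only, the sketch below flattens XML into predicate-like facts and scores how much two documents overlap. It is not the authors' implementation: it uses no constraint logic programming and no ontology. The names `xml_to_predicates` and `similarity`, the three-way type guess, and the Jaccard measure are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def xml_to_predicates(xml_text):
    """Flatten an XML document into a set of (parent, child, kind) facts.

    Hypothetical stand-in for the "XML to predicates" step; the fact shape
    is an assumption, not the representation used in the paper.
    """
    root = ET.fromstring(xml_text)
    facts = set()

    def visit(node):
        for child in node:
            text = (child.text or "").strip()
            if len(child) > 0:
                kind = "complex"      # element with nested elements
            elif text.lstrip("-").isdigit():
                kind = "integer"      # crude data-type guess
            else:
                kind = "string"
            facts.add((node.tag, child.tag, kind))
            visit(child)

    visit(root)
    return facts


def similarity(facts_a, facts_b):
    """Jaccard overlap of two fact sets (assumed measure, not the paper's)."""
    if not facts_a and not facts_b:
        return 1.0
    return len(facts_a & facts_b) / len(facts_a | facts_b)


doc_a = "<book><title>XML</title><year>2007</year></book>"
doc_b = "<book><title>CLP</title><year>1995</year><pages>33</pages></book>"

facts_a = xml_to_predicates(doc_a)
facts_b = xml_to_predicates(doc_b)
score = similarity(facts_a, facts_b)
```

Here the two documents share two of three distinct facts, so a threshold on `score` could group them as the same document type. Matching elements whose names differ but whose meanings agree would require the ontology dictionary described above, which this sketch leaves out.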
Index Terms
- Extracting XML schema from multiple implicit xml documents based on inductive reasoning