XML data partitioning strategies to improve parallelism in parallel holistic twig joins

Authors:
Imam Machdi, University of Tsukuba, Japan
Toshiyuki Amagasa, University of Tsukuba, Japan
Hiroyuki Kitagawa, University of Tsukuba, Japan

ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, February 2009, Pages 471–480
https://doi.org/10.1145/1516241.1516322

Published: 15 February 2009

ABSTRACT

Parallel XML query processing systems that process numerous queries over large heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization: conventional partitioning strategies do not serve state-of-the-art query processing algorithms, such as holistic twig joins, well. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster system has become an intricate problem for maintaining good query performance. In this paper, we propose XML data partitioning strategies that alleviate the performance degradation caused by workload imbalance, especially for parallel holistic twig join processing. The proposed strategies aim to improve workload balance under both static and dynamic data distribution. In the first strategy, for static data distribution, we refine a high-cost XML partition through a series of partition refinements at successively finer levels of granularity: document, query, subquery, and finally node streams. The granularity level selected for refining a high-cost partition depends on the overall workload balance in the system. In the second strategy, for dynamic data distribution, we handle low system utilization at run time when many nodes in the system are idle. We propose an XML data redistribution approach that partitions XML data on the fly at node-stream granularity.
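
The static refinement idea can be illustrated with a small sketch. This is not the authors' algorithm: the cost values, the max-to-mean imbalance measure, the greedy assignment, the `threshold` and `fanout` parameters, and the assumption that a split divides cost evenly are all hypothetical choices made for illustration. Only the four granularity levels come from the abstract.

```python
from dataclasses import dataclass

# Granularity levels named in the abstract, coarsest first.
LEVELS = ["document", "query", "subquery", "node_stream"]

@dataclass
class Partition:
    name: str
    cost: float     # hypothetical estimated processing cost
    level: int = 0  # index into LEVELS

def assign(partitions, n_nodes):
    """Greedy assignment (largest cost first, to the least-loaded node).
    Returns the resulting per-node loads."""
    loads = [0.0] * n_nodes
    for p in sorted(partitions, key=lambda p: p.cost, reverse=True):
        i = loads.index(min(loads))
        loads[i] += p.cost
    return loads

def imbalance(loads):
    """Ratio of maximum load to mean load; 1.0 means perfect balance."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0

def refine(partitions, n_nodes, threshold=1.1, fanout=2):
    """Split the costliest refinable partition one level finer until the
    assignment is balanced or nothing can be refined further."""
    parts = list(partitions)
    while imbalance(assign(parts, n_nodes)) > threshold:
        candidates = [p for p in parts if p.level < len(LEVELS) - 1]
        if not candidates:
            break  # every partition is already at node-stream granularity
        worst = max(candidates, key=lambda p: p.cost)
        parts.remove(worst)
        # Assumption: a split divides the cost evenly among the pieces.
        parts.extend(
            Partition(f"{worst.name}.{k}", worst.cost / fanout, worst.level + 1)
            for k in range(fanout)
        )
    return parts

if __name__ == "__main__":
    docs = [Partition("d1", 8.0), Partition("d2", 1.0), Partition("d3", 1.0)]
    for p in refine(docs, n_nodes=2):
        print(p.name, LEVELS[p.level], p.cost)
```

In this sketch a partition is refined only while the overall assignment remains unbalanced, which mirrors the abstract's point that the choice of granularity depends on the system-wide workload balance rather than on the partition alone.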

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines

Published in
ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
February 2009, 704 pages
ISBN: 9781605584058
DOI: 10.1145/1516241
General Chairs: Won Kim, Hyung Jin Choi, Dongho Won (Sungkyunkwan University, Korea)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
XML data partition
holistic twig joins
inter query parallelism
intra query parallelism
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 251 of 941 submissions, 27%