An effective quality analysis of XML web data using hybrid clustering and classification approach

  • Methodologies and Application
Soft Computing

Abstract

This paper proposes a hybrid clustering and classification approach for analyzing the quality of XML web data. As XML becomes a standard for data representation, supporting keyword search over XML databases is increasingly attractive. A keyword search looks for the query words anywhere in a record, and it has emerged as a leading paradigm for finding information on the web. The most important requirement for keyword search is to rank the results of a query so that the most relevant results appear first. In the proposed method, a collection of XML documents is gathered first, and features are then extracted from them. Because the extracted features include both relevant and irrelevant ones, the irrelevant features must be filtered out; a probability-based feature selection method is used to select the relevant features. The relevant features are then clustered on the basis of keywords using a weighted fuzzy c-means clustering algorithm. To assess XML data quality, an optimal neural network (ONN) classifier is used, whose weights are selected by the whale optimization algorithm. In this way, the web pages are effectively ranked. The efficiency of the proposed method is assessed using clustering and classification accuracy, RMSE, and search time. The proposed method is implemented in Java.
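The abstract does not specify how the fuzzy c-means weighting is defined, so the following is only a minimal sketch of the clustering and evaluation steps under stated assumptions. It implements the textbook fuzzy c-means updates, with a hypothetical per-feature weight vector `w` in the distance (all ones gives plain fuzzy c-means), plus the usual RMSE formula. The class name, method names, fuzzifier `m = 2`, and the toy data are illustrative and are not taken from the paper.

```java
import java.util.Arrays;

final class FuzzyCMeansSketch {

    // Squared Euclidean distance with hypothetical per-feature weights w (assumption).
    static double dist2(double[] x, double[] c, double[] w) {
        double s = 0.0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - c[k];
            s += w[k] * d * d;
        }
        return s;
    }

    // Standard membership update: u[i][j] = 1 / sum_l (d2_ij / d2_il)^(1/(m-1)).
    static double[][] memberships(double[][] x, double[][] centers, double[] w, double m) {
        int n = x.length, c = centers.length;
        double[][] u = new double[n][c];
        double exp = 1.0 / (m - 1.0);
        for (int i = 0; i < n; i++) {
            double[] d = new double[c];
            int zero = -1;
            for (int j = 0; j < c; j++) {
                d[j] = dist2(x[i], centers[j], w);
                if (d[j] == 0.0) zero = j;
            }
            // A point lying exactly on a center belongs fully to that cluster.
            if (zero >= 0) { u[i][zero] = 1.0; continue; }
            for (int j = 0; j < c; j++) {
                double sum = 0.0;
                for (int l = 0; l < c; l++) sum += Math.pow(d[j] / d[l], exp);
                u[i][j] = 1.0 / sum;
            }
        }
        return u;
    }

    // Standard center update: c_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
    static double[][] updateCenters(double[][] x, double[][] u, double[][] prev, double m) {
        int c = prev.length, dim = x[0].length;
        double[][] out = new double[c][dim];
        for (int j = 0; j < c; j++) {
            double denom = 0.0;
            for (int i = 0; i < x.length; i++) {
                double um = Math.pow(u[i][j], m);
                denom += um;
                for (int k = 0; k < dim; k++) out[j][k] += um * x[i][k];
            }
            if (denom == 0.0) { out[j] = prev[j].clone(); continue; } // empty cluster: keep old center
            for (int k = 0; k < dim; k++) out[j][k] /= denom;
        }
        return out;
    }

    // Alternate the two updates for a fixed number of iterations.
    static double[][] fit(double[][] x, double[][] init, double[] w, double m, int iters) {
        double[][] centers = init;
        for (int t = 0; t < iters; t++) {
            centers = updateCenters(x, memberships(x, centers, w, m), centers, m);
        }
        return centers;
    }

    // Root mean squared error between predicted and actual scores.
    static double rmse(double[] predicted, double[] actual) {
        double s = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double e = predicted[i] - actual[i];
            s += e * e;
        }
        return Math.sqrt(s / predicted.length);
    }

    public static void main(String[] args) {
        double[][] x = {{0}, {1}, {9}, {10}};   // toy 1-D feature values
        double[][] centers = fit(x, new double[][]{{2}, {8}}, new double[]{1.0}, 2.0, 20);
        System.out.println(Arrays.deepToString(centers));
        System.out.println(rmse(new double[]{1, 2, 3}, new double[]{1, 2, 5}));
    }
}
```

In a pipeline like the one described, the feature vectors would come from the selected XML features rather than toy values, and the cluster memberships would feed the classifier. The ONN and whale optimization steps are not sketched here because the abstract gives no detail on their configuration.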

Author information

Corresponding author

Correspondence to M. Gopianand.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gopianand, M., Jaganathan, P. An effective quality analysis of XML web data using hybrid clustering and classification approach. Soft Comput 24, 2139–2150 (2020). https://doi.org/10.1007/s00500-019-04045-9

  • DOI: https://doi.org/10.1007/s00500-019-04045-9
