Automated Feature Extraction from UML Images to Measure SOA Size
Samson Wanjala Munialo1, Geoffrey Muchiri Muketha2, Kelvin Kabeti Omieno3

1Samson Wanjala Munialo*, Department of Information Technology, Meru University of Science and Technology, Meru, Kenya.
2Geoffrey Muchiri Muketha, Department of Computer Science, Murang’a University of Technology, Murang’a, Kenya.
3Kelvin Kabeti Omieno, Department of Information Technology and Informatics, Kaimosi Friends University College, Kaimosi, Kenya.

Manuscript received on May 25, 2020. | Revised Manuscript received on June 29, 2020. | Manuscript published on July 30, 2020. | PP: 1132-1137 | Volume-9 Issue-2, July 2020. | Retrieval Number: B4131079220/2020©BEIESP | DOI: 10.35940/ijrte.B4131.079220
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Text and image extraction and classification have seen enormous development. This is driven by the large amount of image data generated through document sharing for collaborative software development and the electronic storage of design documents. Deep learning is one of the recent techniques for analyzing large datasets and discovering underlying patterns. It is a branch of machine learning, inspired by the functioning of the human brain, that is used to analyze unstructured data including images, sound and text. Unified Modeling Language (UML) is an architectural design notation that gives developers a view of software components and scope. UML diagrams contain text and notations that are mostly analyzed and interpreted manually for the purpose of system implementation and scope or size measurement. Manual processing of electronic design artifacts, however, is time consuming and prone to bias and errors. Various researchers have attempted to automate the reading and interpretation of design artifacts, but this remains a challenge because the artifacts are drawn in varying styles. This study proposes an automatic tool, built on existing deep learning algorithms, that reads images of UML interface and sequence diagrams for the purpose of measuring Service Oriented Architecture (SOA) size. The tool uses a ResNet50 CNN to detect UML arrows, the EAST text detector to detect text, Tesseract OCR with Long Short-Term Memory (LSTM) to recognize text, and a multi-class Support Vector Machine to classify text. We subjected the tool to accuracy tests, which returned encouraging results.
Keywords: Unified Modeling Language, Machine Learning, Deep Learning, image classification, text extraction.
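
The four stages named in the abstract can be pictured as a chain of pluggable components. The following is a minimal sketch in Python of that data flow only, and it is not the authors' tool. The `Pipeline` class, the `demo_pipeline` helper, the box format, and the labels "service" and "operation" are all hypothetical names introduced for illustration. The stand-in lambdas replace the real ResNet50, EAST, Tesseract and SVM models so that the sketch runs without any trained model, and the per-label counts are only an assumed stand-in for the size measure, which the abstract does not define.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Pipeline:
    """Chains the four stages named in the abstract (illustrative only)."""
    detect_arrows: Callable[[object], List[Box]]   # role of the ResNet50 CNN
    detect_text: Callable[[object], List[Box]]     # role of the EAST text detector
    recognize_text: Callable[[object, Box], str]   # role of Tesseract OCR (LSTM)
    classify_text: Callable[[str], str]            # role of the multi-class SVM

    def run(self, image) -> Counter:
        """Return a count of detected elements per label (assumed size proxy)."""
        counts = Counter()
        counts["arrow"] = len(self.detect_arrows(image))
        for box in self.detect_text(image):
            text = self.recognize_text(image, box).strip()
            if text:  # skip regions where recognition produced nothing
                counts[self.classify_text(text)] += 1
        return counts


def demo_pipeline() -> Pipeline:
    """Build a pipeline from stand-in stages; no real models are involved."""
    fake_text = {
        (0, 0, 10, 5): "OrderService",
        (0, 10, 10, 5): "placeOrder()",
        (0, 20, 10, 5): "  ",  # simulates an empty recognition result
    }
    return Pipeline(
        detect_arrows=lambda image: [(5, 5, 20, 2), (5, 15, 20, 2)],
        detect_text=lambda image: list(fake_text),
        recognize_text=lambda image, box: fake_text[box],
        # Hypothetical rule standing in for a trained classifier.
        classify_text=lambda text: "operation" if text.endswith("()") else "service",
    )


if __name__ == "__main__":
    print(demo_pipeline().run(image=None))
```

Keeping each stage behind a plain callable means a real detector, recognizer or classifier could be substituted for a stand-in without changing the counting logic.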