End-to-end recognition of slab identification numbers using a deep convolutional neural network
Introduction
The steel industry is one of the world's fundamental industries. Nowadays, most steelworks are equipped with an integrated production line that comprises a furnace, a continuous casting process, and rolling mills. Smelting and refining are conducted in the furnace to produce molten steel, which is then continuously cast and cut into semi-finished products such as slabs, billets, and blooms. After continuous casting, slabs are piled up on a slab transfer machine and transferred to a slab yard. Because slabs are manufactured with different amounts of deoxidizing and alloying elements according to their intended use, identifying each individual slab is important to prevent improper processing. In the steel industry, paint marking systems are widely used to inscribe a slab identification number (SIN) that characterizes each slab. With SINs marked on them, individual slabs can be identified visually without any equipment, both within the steelworks and by customer companies, and SINs can also be recognized automatically by a computer vision system. Automatic recognition raises several issues: effective paint marking machines, sustainable image acquisition systems, and image processing techniques for recognizing SINs. This paper focuses on the recognition algorithm.
Image processing techniques have been widely used to solve recognition problems [1], [2], [3]. The recognition of SINs in factory scenes can be viewed as a text recognition problem. There are two common strategies for text recognition: a stepwise strategy and an integrated strategy. The stepwise strategy extracts text information through a series of processes comprising localization, segmentation, and verification steps. Information about text candidates, such as their positions, sizes, and orientations, is obtained in the localization step, and individual characters are segmented and classified in the subsequent steps. The stepwise approach is computationally efficient, especially for recognizing multi-oriented text, but errors made in each step accumulate. In contrast, the integrated strategy detects and recognizes text with a combined module. This approach shares character information across the overall process and jointly optimizes the detection and recognition tasks. Some integrated recognition algorithms loosely localize a region that may contain text as a preprocessing step [4].
Several methodologies have been developed for text recognition. A conventional approach employs rule-based methods [5]. Rule-based algorithms generate text candidates and filter out non-text candidates [6]. To recognize text effectively in unstructured scenes, machine-learning-based methods [7], [8], [9] were developed with various feature representations such as stroke filters [10] and local gradient features [11]. However, in practical applications the performance of these machine learning methods depends heavily on carefully engineered feature representations. Recently, data-driven methods based on deep learning have attracted considerable attention for text recognition [12], [13], [14]. Deep learning uses neural networks with many layers, and it has been successfully applied in knowledge-based systems [15], [16], [17]. The convolutional neural network [18] is the most widely used deep learning architecture for images and other multi-dimensional data. With the efficient use of graphics processing units [19] and the development of new techniques such as the rectified linear unit (ReLU) [20], deep convolutional neural networks (DCNNs) have achieved outstanding results in image classification [21], object detection [22], and other application areas [23], [24], [25], [26]. Detailed information about DCNNs is well described in other articles [27].
In our industrial setting, varying numbers of slabs are piled up on a slab transfer machine at a slab yard. An image acquisition system was installed at the slab yard to collect factory scenes. A SIN is a string of characters that characterizes a slab; each SIN consists of 9 characters arranged horizontally at similar intervals, as shown in Fig. 1. The first character is a letter that indicates a specific production line, and the subsequent 8 characters are digits that characterize the slab. Our image acquisition system was set up for the production line designated by the letter B.
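The SIN layout described above (one production-line letter followed by eight digits) can be expressed as a simple format check, which a recognition pipeline might use to reject implausible transcriptions. The sketch below is illustrative only: the name `parse_sin`, the restriction to uppercase Latin letters, and the `expected_line` parameter are assumptions, not part of the described system.

```python
import re

# Assumed SIN layout (from the text): one letter for the production line,
# followed by eight digits. Restricting the letter to A-Z is an assumption.
SIN_PATTERN = re.compile(r"([A-Z])([0-9]{8})")


def parse_sin(text, expected_line="B"):
    """Split a transcribed SIN into (line, serial), or return None if the
    whole string does not match the assumed 1-letter + 8-digit format or the
    production-line letter differs from expected_line (None disables that check)."""
    match = SIN_PATTERN.fullmatch(text)
    if match is None:
        return None
    line, serial = match.groups()
    if expected_line is not None and line != expected_line:
        return None
    return line, serial


print(parse_sin("B12345678"))   # ('B', '12345678')
print(parse_sin("B1234567"))    # None: only 7 digits
print(parse_sin("A12345678"))   # None: different production line
```

Using `fullmatch` rather than `match` with `$` avoids accepting a string with a trailing newline.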
The automatic recognition of SINs in an actual industrial field is challenging for the following reasons:
1) Various lengths of slabs
Slabs of different lengths are aligned at their centers on a slab transfer machine. Because the frontal surfaces of the slabs lie at different distances from the image acquisition system, SINs appear at various sizes in a scene. Furthermore, a shadow can be cast on the region of a SIN if the slab above it is relatively long.
2) High temperature of a slab
Due to the high temperature of a slab, character edges are blurred, and parts of a character can be deformed or removed. Because a hot object radiates a large amount of infrared and visible light, the quality of SINs is further degraded by reddish color noise that appears on the surface of a hot slab.
3) Environments of the vision system
Other obstacles to recognition include a complicated background caused by complex manufacturing facilities, changing lighting conditions during 24-hour operation, and large amounts of dust particles in the actual steelworks.
Fig. 2 presents actual factory scenes that illustrate the challenges of recognizing SINs.
This paper proposes a pipeline for the end-to-end recognition of SINs. End-to-end recognition is a far more challenging problem than classifying characters or recognizing cropped word images, because all characters, with variable sizes and locations, must be correctly predicted in an unstructured scene. The term end-to-end recognition means that the proposed system takes a factory scene as input and yields transcribed SINs that characterize the individual slabs in that scene. A deep convolutional neural network and a sliding window method are employed in the recognition pipeline. The main contribution of this work is twofold: (1) an accumulated response map that utilizes the information of neighboring patches and (2) a model-based score function that simultaneously recognizes all characters in a SIN. Actual factory scenes, which were used in [28], were utilized to develop and evaluate the proposed algorithm. Thorough experiments were conducted to demonstrate the effectiveness of each contribution.
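To make the idea of pooling evidence from neighboring patches concrete, the following is a generic sliding-window accumulation sketch. It is a hedged illustration, not the paper's exact formulation: `classify_patch` is a hypothetical stand-in for the trained DCNN, and the rule of summing each window's class scores over the pixels it covers and then averaging by coverage count, as well as the window size and stride, are assumptions.

```python
import numpy as np


def accumulated_response_map(image, classify_patch, win_h, win_w, stride, num_classes):
    """Slide a win_h x win_w window over an image and add each window's class
    scores to every pixel the window covers. Dividing by the number of windows
    covering each pixel gives an averaged per-pixel response, so every pixel
    combines the evidence of all overlapping (neighboring) patches."""
    h, w = image.shape[:2]
    acc = np.zeros((h, w, num_classes), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for top in range(0, h - win_h + 1, stride):
        for left in range(0, w - win_w + 1, stride):
            patch = image[top:top + win_h, left:left + win_w]
            # classify_patch is hypothetical: any callable returning num_classes scores
            scores = np.asarray(classify_patch(patch), dtype=np.float64)
            acc[top:top + win_h, left:left + win_w] += scores  # broadcast over the window
            count[top:top + win_h, left:left + win_w] += 1.0
    covered = count > 0
    acc[covered] /= count[covered][:, None]  # average only where at least one window landed
    return acc, count


# Usage with a dummy two-class "classifier" (foreground score = mean intensity)
image = np.random.default_rng(0).random((20, 30))
response, coverage = accumulated_response_map(
    image, lambda p: [p.mean(), 1.0 - p.mean()], win_h=8, win_w=6, stride=2, num_classes=2)
print(response.shape, coverage.max())
```

Averaging rather than taking the single best window makes each pixel's response less sensitive to one noisy patch, which is the motivation the text gives for using neighboring patches.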
The remaining sections are organized as follows. Section 2 reviews related work, and Section 3 describes the datasets. Section 4 describes the training procedure for DCNNs, and Section 5 explains the end-to-end recognition algorithm for SINs. Section 6 presents experimental results, and Section 7 concludes the paper.
Related work
The end-to-end recognition of SINs in factory scenes is closely related to scene text recognition, which is an important task in the field of visual information retrieval. Recently, outstanding progress has been achieved in text recognition through the effective use of deep learning methodologies. An end-to-end text recognition algorithm was proposed in [14], in which a DCNN and a sliding window method were used to detect and recognize individual characters. In this work, non-maximum
Dataset
Factory scenes were recorded in an actual steelworks over 55 working days, and 4501 scenes containing 9130 slabs were collected as 24-bit color images of size 1200 (height) × 1920 (width) pixels. Images acquired during 30 and 6 working days were used to construct the training and validation sets, respectively, and the remaining images (from the other 19 working days) were used to form a test set for evaluating the recognition performance of the proposed algorithm. Table 1 summarizes the information about training, validation,
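Splitting by working day, as described above, keeps scenes recorded on the same day (which tend to share lighting and slab batches) from leaking between subsets. A minimal sketch follows; the function name `split_by_working_day`, the input format of `(scene_id, day)` pairs, and the chronological assignment of days to subsets are assumptions, since the text only gives the day counts.

```python
from collections import defaultdict


def split_by_working_day(scene_days, n_train_days=30, n_val_days=6):
    """Assign scenes to train/val/test subsets by working day so that scenes
    from one day never appear in two subsets. Days are taken in sorted
    (assumed chronological) order; all remaining days go to the test set."""
    by_day = defaultdict(list)
    for scene_id, day in scene_days:
        by_day[day].append(scene_id)
    days = sorted(by_day)

    def scenes_of(selected_days):
        return [scene for d in selected_days for scene in by_day[d]]

    train_days = days[:n_train_days]
    val_days = days[n_train_days:n_train_days + n_val_days]
    test_days = days[n_train_days + n_val_days:]
    return {
        "train": scenes_of(train_days),
        "val": scenes_of(val_days),
        "test": scenes_of(test_days),
        "test_days": len(test_days),
    }


# Usage with synthetic data: 55 working days, two scenes per day
scenes = [(f"scene_{d}_{k}", d) for d in range(55) for k in range(2)]
split = split_by_working_day(scenes)
print(len(split["train"]), len(split["val"]), len(split["test"]), split["test_days"])
```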
Training data construction
For the training images, the bounding box of each character and its class were recorded to generate ground-truth data. Using the ground-truth data, patch images were collected separately from the character and background regions of the 1850 training images. From the training images, 33,741 character patches were collected, and data augmentation was applied to these original character regions to alleviate the problem of insufficient training data. For each individual
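Collecting character patches from the ground-truth boxes amounts to cropping each box and resizing it to the fixed input size (80 × 56, read here as height × width). The sketch below is a hedged illustration: the function names, the `(top, left, bottom, right)` box convention, nearest-neighbor resizing, and the random-translation jitter are all assumptions. In particular, the jitter is only a hypothetical stand-in, because the paper's actual augmentation procedure is not given in this excerpt.

```python
import numpy as np

PATCH_H, PATCH_W = 80, 56  # fixed patch size from the text (height x width assumed)


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize by integer index arithmetic (works for 2-D or H x W x C)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]


def crop_character(image, box, max_shift=0, rng=None):
    """Crop a ground-truth box (top, left, bottom, right; bottom/right exclusive)
    and resize it to PATCH_H x PATCH_W. If max_shift > 0, the box is translated
    by a random offset in [-max_shift, max_shift] pixels, clipped so it stays
    inside the image (a hypothetical augmentation, not the paper's)."""
    top, left, bottom, right = box
    if max_shift > 0:
        rng = rng if rng is not None else np.random.default_rng()
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        h, w = image.shape[:2]
        dy = int(np.clip(dy, -top, h - bottom))
        dx = int(np.clip(dx, -left, w - right))
        top, bottom, left, right = top + dy, bottom + dy, left + dx, right + dx
    return resize_nearest(image[top:bottom, left:right], PATCH_H, PATCH_W)


# Usage on a blank 1200 x 1920 color frame (the scene size stated in the text)
frame = np.zeros((1200, 1920, 3), dtype=np.uint8)
patch = crop_character(frame, (100, 200, 200, 270), max_shift=5, rng=np.random.default_rng(0))
print(patch.shape)
```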
Multiscale image
The heights of most training characters ranged from 50 to 140 pixels, and these character regions were resized to a fixed size of 80 × 56. Based on the sizes of the training characters, multiscale analysis with five scales (60%, 80%, 100%, 120%, 140%) is applied to a test image to recognize SINs of various sizes. SIN candidates are detected from the five resized versions for each slab. Among the candidates, the one with the maximum model-based score remains, and the other
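The selection step above keeps, across the five scales, the single SIN candidate with the highest model-based score. Because the paper's score function is not given in this excerpt, the scoring rule below is entirely hypothetical: it rewards high mean character confidence and penalizes uneven spacing among the 9 horizontally arranged characters (coefficient of variation of the gaps between character centers). The names `model_score` and `best_candidate`, the candidate tuple format, and `spacing_weight` are likewise assumptions.

```python
import numpy as np

SCALES = (0.6, 0.8, 1.0, 1.2, 1.4)  # the five scales stated in the text
SIN_LENGTH = 9                      # characters per SIN


def model_score(char_conf, x_centers, spacing_weight=1.0):
    """Hypothetical score for one SIN candidate: mean character confidence minus
    a penalty for irregular spacing. The coefficient of variation of the gaps is
    scale-invariant, so scores from different image scales remain comparable."""
    char_conf = np.asarray(char_conf, dtype=float)
    x_centers = np.sort(np.asarray(x_centers, dtype=float))
    if char_conf.size != SIN_LENGTH or x_centers.size != SIN_LENGTH:
        return -np.inf  # wrong number of characters: reject the candidate
    gaps = np.diff(x_centers)
    if np.any(gaps <= 0):
        return -np.inf  # overlapping character centers
    irregularity = gaps.std() / gaps.mean()
    return char_conf.mean() - spacing_weight * irregularity


def best_candidate(candidates_per_scale):
    """candidates_per_scale maps a scale to a list of (label, conf, x_centers).
    Return (score, scale, label) of the single highest-scoring candidate."""
    best = (-np.inf, None, None)
    for scale, candidates in candidates_per_scale.items():
        for label, conf, xs in candidates:
            score = model_score(conf, xs)
            if score > best[0]:
                best = (score, scale, label)
    return best


# Usage: an evenly spaced candidate beats a slightly more confident but irregular one
candidates = {
    1.0: [("B12345678", [0.90] * 9, range(0, 90, 10))],
    1.2: [("B12345679", [0.95] * 9, [0, 10, 20, 30, 40, 50, 60, 70, 100])],
}
print(best_candidate(candidates))
```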
Experimental environment and evaluation measures
The hardware environment for the experiments consisted of an Intel Core i7-3930K CPU (3.2 GHz), 16 GB DDR3 RAM, and an NVIDIA GeForce GTX TITAN X GPU. The Neural Network Toolbox in MATLAB R2016b was used to develop the algorithm.
The accuracy of classifying characters in the test set is presented in Section 6.3; it was measured as the ratio of the number of correctly classified characters to the total number of test characters. The end-to-end recognition performance for the test set is analyzed from
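The character classification accuracy defined above is a simple ratio. The helper below is an illustrative sketch (the name `character_accuracy` is hypothetical) that compares predicted and ground-truth labels position by position.

```python
def character_accuracy(predicted, ground_truth):
    """Number of correctly classified test characters divided by the total
    number of test characters."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# One of five characters is misclassified ('8' predicted instead of '3')
print(character_accuracy(list("B1284"), list("B1234")))
```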
Conclusion
This paper proposes an end-to-end recognition algorithm for SINs in actual factory scenes. Recognizing a SIN is challenging because of low-quality characters in complicated scenes. A deep-learning-based method is used to handle these challenges and to overcome the limitations of previous rule-based algorithms. The proposed algorithm adopts an integrated recognition strategy; it loosely estimates the vertical positions of SINs in a coarse search, and precise localization and recognition
References (33)
- et al., Pose and illumination variable face recognition via sparse representation and illumination dictionary, Knowl. Based Syst. (2016).
- et al., A keypoints-based feature extraction method for iris recognition under variable image quality conditions, Knowl. Based Syst. (2016).
- et al., Facial expression recognition with automatic segmentation of face regions using a fuzzy based classification approach, Knowl. Based Syst. (2016).
- et al., Accurate text localization in images based on SVM output scores, Image Vis. Comput. (2009).
- et al., A stroke filter and its application to text localization, Pattern Recogn. Lett. (2009).
- et al., Recognition of handwritten characters using local gradient feature descriptors, Eng. Appl. Artif. Intel. (2015).
- et al., Protein secondary structure prediction by using deep learning method, Knowl. Based Syst. (2017).
- et al., Deep neural network framework and transformed MFCCs for speaker's age and gender classification, Knowl. Based Syst. (2017).
- et al., Segmentation of DNA using simple recurrent neural network, Knowl. Based Syst. (2012).
- et al., Supervised remote sensing image segmentation using boosted convolutional neural networks, Knowl. Based Syst. (2016).
- Multi-column deep neural network for traffic sign classification, Neural Netw.
- Localizing slab identification numbers in factory scene images, Expert Syst. Appl.
- Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models, Pattern Recognit.
- Text detection and recognition in imagery: a survey, IEEE Trans. Pattern Anal. Mach. Intell.
- Historical review of OCR research and development, Proc. IEEE.
- Robust text detection in natural scene images, IEEE Trans. Pattern Anal. Mach. Intell.