Elsevier

Knowledge-Based Systems

Volume 132, 15 September 2017, Pages 1-10
End-to-end recognition of slab identification numbers using a deep convolutional neural network

https://doi.org/10.1016/j.knosys.2017.06.017

Abstract

This paper proposes a novel algorithm for the end-to-end recognition of slab identification numbers (SINs). In the steel industry, automatic recognition of individual product information is important for production management. The recognition of SINs in actual factory scenes is a challenging problem due to complicated backgrounds and the low quality of characters. Conventional rule-based algorithms were developed to extract the information of SINs, but these methods require engineering knowledge and tedious parameter tuning. The proposed algorithm employs a data-driven method to overcome these limitations and to handle the challenges of recognizing SINs. This paper proposes an accumulated response map and a model-based score function to effectively use the outputs of a deep convolutional neural network. Experiments were thoroughly conducted on industrial data collected from an actual steelworks to verify the effectiveness of the proposed algorithm. Experimental results demonstrate that simultaneous recognition of all characters in a SIN by optimizing the model-based score function yields more robust performance than separate recognition of individual characters.

Introduction

The steel industry is one of the fundamental industries in the world. Nowadays, most steelworks are equipped with an integrated production line that contains a furnace, a continuous casting process, and rolling mills. The smelting and refining processes are conducted in the furnace to produce molten steel, and the molten steel is continuously cast and cut to produce semi-finished steel products such as slabs, billets, and blooms. After the continuous casting, slabs are piled up on a slab transfer machine and transferred to a slab yard. Because slabs are manufactured with different amounts of deoxidizing and alloying elements according to the purpose of production, the identification of each individual slab is important to prevent an improper process. In the steel industry, paint marking systems are widely used to inscribe a slab identification number (SIN) that characterizes a slab. By marking SINs, individual slabs are visually identifiable without any equipment in a steelworks or by customer companies, and SINs can be automatically recognized by using a computer vision system. Several issues arise concerning effective paint marking machines, sustainable image acquisition systems, and image processing techniques for automatically recognizing SINs. This paper focuses on the recognition algorithm.

Image processing techniques have been widely utilized for solving recognition problems [1], [2], [3]. The recognition of SINs in factory scenes can be viewed as a problem of text recognition. There are two common strategies for text recognition: the stepwise strategy and the integrated strategy. The stepwise strategy extracts text information by conducting a series of processes that contain localization, segmentation, and verification steps. The information of text candidates such as positions, sizes, and orientations is obtained in the localization step, and individual characters are segmented and classified in successive steps. The stepwise approach is computationally efficient, especially for the recognition of multi-oriented texts, but errors in each step are accumulated. On the other hand, the integrated strategy detects and recognizes texts with a combined module. This approach shares character information across the overall process and jointly optimizes the detection and recognition tasks. Some integrated recognition algorithms loosely localize a region that may contain a text as a preprocessing step [4].

Several methodologies have been developed for text recognition. A conventional approach employs rule-based methods [5]. Rule-based algorithms generate text candidates and filter out non-text candidates [6]. To effectively recognize texts in unstructured scenes, machine learning based methods [7], [8], [9] were developed with various feature representations such as stroke filters [10] and local gradient features [11]. However, in practical applications the performance of these machine learning methods depends heavily on carefully engineered feature representations. Recently, data-driven methods based on deep learning have attracted considerable attention for text recognition [12], [13], [14]. Deep learning uses neural networks that contain many layers, and it has been successfully used in knowledge-based systems [15], [16], [17]. A convolutional neural network [18] is the most popular deep learning structure for images or multi-dimensional data. With the efficient use of graphics processing units [19] and the development of new techniques such as the rectified linear unit (ReLU) [20], deep convolutional neural networks (DCNNs) have achieved outstanding results in image classification [21], object detection [22], and application areas [23], [24], [25], [26]. Detailed information about DCNNs is available in other articles [27].

In our industrial setting, a varying number of slabs are piled up on a slab transfer machine at a slab yard. An image acquisition system was installed at the slab yard to collect factory scenes. A SIN is a string of characters that characterizes a slab, and one SIN consists of 9 characters that are horizontally arranged at similar distances, as shown in Fig. 1. The first character is an alphabetical character that indicates a specific production line, and the following 8 characters are numerical characters that characterize a slab. Our image acquisition system was set up for the production line indicated by B.
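The SIN layout described above (one production-line letter followed by eight digits) can be expressed as a simple format check. The following is a minimal sketch, not part of the proposed algorithm; the names `SIN_PATTERN` and `is_valid_sin` are hypothetical, and restricting the letter to uppercase A-Z is an assumption.

```python
import re

# Assumed layout: one uppercase letter (production line) followed by 8 digits.
SIN_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def is_valid_sin(text: str, line_code: str = "B") -> bool:
    """Return True if `text` has the 9-character SIN layout for `line_code`."""
    if SIN_PATTERN.fullmatch(text) is None:
        return False
    # The first character must match the expected production line.
    return text[0] == line_code
```

Such a check could be used to reject transcriptions that cannot be valid SINs, for example a string whose first character is not B for the production line considered here.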

The automatic recognition of SINs in an actual industrial field is a challenging problem for the following reasons:

  • 1) Various lengths of slabs

    Slabs with different lengths are arranged by their centers on a slab transfer machine. Different distances from the frontal surfaces of slabs to the image acquisition system cause SINs to appear at various sizes in a scene. Furthermore, a shadow can be cast on the region of a SIN if the slab above it is relatively long.

  • 2) High temperature of a slab

    Due to the high temperature of a slab, edges of characters are blurred, and a part of a character can be deformed or removed. Because a high-temperature object radiates a large amount of infrared and visible light, the quality of SINs is degraded by reddish color noise that appears on the surface of a hot slab.

  • 3) Environments of the vision system

    There are other obstacles to recognition, such as complicated backgrounds due to complex manufacturing facilities, changes in lighting conditions during 24 h operation, and a large amount of dust particles in the actual steelworks.

Fig. 2 presents actual factory scenes that show challenges for the recognition of SINs.

This paper proposes a pipeline for the end-to-end recognition of SINs. End-to-end recognition is a far more challenging problem than the classification of characters or the recognition of cropped word images because all characters, with variable size and location, must be correctly predicted in an unstructured scene. The term end-to-end recognition implies that the proposed system takes a factory scene as an input and yields transcribed SINs that characterize the individual slabs in the input scene. A deep convolutional neural network and a sliding window method were employed in the recognition pipeline. The main contribution of this work is twofold: (1) an accumulated response map for utilizing the information of neighboring patches and (2) a model-based score function for simultaneously recognizing all characters in a SIN. Actual factory scenes, which were used in [28], were utilized to develop and evaluate the proposed algorithm. Experiments were thoroughly conducted to demonstrate the effectiveness of each contribution.
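To make the idea of combining a sliding window with information from neighboring patches more concrete, the sketch below accumulates window scores along one image axis. It is only an illustration under stated assumptions: the exact definition of the accumulated response map is not given in this excerpt, so summing each window's score over the positions it covers is an assumed scheme, and `sliding_windows`, `accumulate_responses`, and `score_fn` (a stand-in for a classifier output) are hypothetical names.

```python
from typing import Callable, List


def sliding_windows(length: int, window: int, stride: int) -> List[int]:
    """Start offsets of all windows of size `window` that fit in `length`."""
    return list(range(0, length - window + 1, stride))


def accumulate_responses(
    length: int,
    window: int,
    stride: int,
    score_fn: Callable[[int], float],
) -> List[float]:
    """Sum each window's score over every position the window covers.

    Assumed accumulation scheme: positions covered by several overlapping
    windows collect the scores of all of them.
    """
    response = [0.0] * length
    for start in sliding_windows(length, window, stride):
        score = score_fn(start)  # hypothetical per-window classifier score
        for x in range(start, start + window):
            response[x] += score
    return response
```

With a constant score of 1.0, the result simply counts how many windows overlap each position, which shows how interior positions receive evidence from more neighboring patches than positions at the border.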

The remaining sections are organized as follows. Section 2 reviews related work, and Section 3 describes the datasets. Section 4 describes the training procedure for DCNNs, and Section 5 explains the end-to-end recognition algorithm for SINs. Section 6 presents experimental results, and Section 7 presents the conclusion.

Related work

The end-to-end recognition of SINs in factory scenes is closely related to the problem of scene text recognition which is an important task in the field of visual information retrieval. Recently, outstanding progress has been achieved for text recognition with the effective use of deep learning methodologies. An end-to-end text recognition algorithm was proposed in [14], and a DCNN and sliding window method were utilized to detect and recognize individual characters. In this work, non-maximum

Dataset

Factory scenes were recorded in an actual steelworks during 55 working days, and 4501 scenes that contain 9130 slabs were collected as 24-bit color images with the size of 1200 (height) × 1920 (width). Image data acquired during 30 and 6 working days were used to construct training and validation sets, and the remaining images were utilized to organize a test set for evaluating the recognition performance of the proposed algorithm. Table 1 summarizes the information about training, validation,
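The figures stated above imply a few derived quantities, worked out here as a short calculation. The variable names are illustrative only; the per-set image counts of Table 1 are not reproduced in this excerpt and are therefore not used.

```python
# Figures stated in the text.
TOTAL_DAYS = 55
TRAIN_DAYS = 30
VAL_DAYS = 6
TOTAL_SCENES = 4501
TOTAL_SLABS = 9130

# Days left for the test set after the training and validation days.
test_days = TOTAL_DAYS - TRAIN_DAYS - VAL_DAYS

# Average number of slabs visible in one scene.
slabs_per_scene = TOTAL_SLABS / TOTAL_SCENES

print(f"test days: {test_days}")
print(f"slabs per scene: {slabs_per_scene:.2f}")
```

The split is made by recording day rather than by random sampling of images, so the test scenes come from days that contributed nothing to training or validation.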

Training data construction

For the training images, the bounding box and class of each character were recorded to generate ground-truth data. By using the ground-truth data, patch images were separately collected from the character and background regions in the 1850 training images. From the training images, 33,741 character patches were collected, and data augmentation was conducted for these original character regions to alleviate the problem of insufficient training data. For each individual

Multiscale image

The heights of most training characters ranged from 50 to 140 pixels, and these character regions were resized to the fixed size of 80 × 56. Based on the sizes of training characters, multiscale analysis with five scales (60%, 80%, 100%, 120%, 140%) is applied to a test image to recognize SINs of various sizes. SIN candidates are detected from the five resized versions for a slab. Among the candidates, one candidate with the maximum model-based score remains, and the other
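The arithmetic behind the five scales can be sketched as follows. This is an illustration, not the paper's procedure: `best_scale` only shows which scale brings a character of a given height closest to the 80-pixel training height, whereas the text selects among candidates by the maximum model-based score, which `select_best` mimics with placeholder scores. All function names are hypothetical.

```python
from typing import List, Tuple

SCALES = (0.6, 0.8, 1.0, 1.2, 1.4)  # the five scales from the text
TARGET_HEIGHT = 80                  # character height used for training patches


def scaled_size(height: int, width: int, scale: float) -> Tuple[int, int]:
    """Image size (height, width) after resizing by `scale`."""
    return round(height * scale), round(width * scale)


def best_scale(char_height: float) -> float:
    """Scale that brings `char_height` closest to the training height."""
    return min(SCALES, key=lambda s: abs(char_height * s - TARGET_HEIGHT))


def select_best(candidates: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Keep the (sin, score) candidate with the maximum score."""
    return max(candidates, key=lambda c: c[1])
```

For instance, a 140-pixel character shrinks to 84 pixels at the 60% scale and a 50-pixel character grows to 70 pixels at the 140% scale, so both ends of the stated height range land near the 80-pixel training height.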

Experimental environment and evaluation measures

The hardware environment for the experiments consists of an Intel Core i7-3930K CPU (3.2 GHz), 16 GB of DDR3 RAM, and a GeForce GTX TITAN X GPU. The Neural Network Toolbox in MATLAB 2016b was used to develop the algorithm.

The accuracy for classifying characters in the test set is presented in Section 6.3, and it was measured by the ratio of the number of correctly classified characters to the total number of test characters. The end-to-end recognition performance for the test set is analyzed from
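The character classification accuracy defined above, the ratio of correctly classified characters to the total number of test characters, can be written as a small function. This is a minimal sketch; the name `character_accuracy` and the list-based inputs are assumptions for illustration.

```python
from typing import Sequence


def character_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Ratio of correctly classified characters to all test characters."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test character is required")
    # Count positions where the predicted class equals the ground truth.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Note that this measure scores each character independently, so a SIN with eight of nine characters correct still contributes eight correct characters even though the SIN as a whole is misread.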

Conclusion

This paper proposes an end-to-end recognition algorithm for SINs in actual factory scenes. The recognition of a SIN is a challenging problem due to low-quality characters in complicated scenes. A deep learning based method is utilized to handle the challenges and to overcome limitations of the previous rule-based algorithms. The proposed algorithm adopts an integrated recognition strategy; it loosely estimates vertical positions of SINs in coarse search, and precise localization and recognition

References (33)

  • D. Cireşan et al., Multi-column deep neural network for traffic sign classification, Neural Netw. (2012)
  • S. Choi et al., Localizing slab identification numbers in factory scene images, Expert Syst. Appl. (2012)
  • Y.-C. Wu et al., Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models, Pattern Recognit. (2017)
  • Q. Ye et al., Text detection and recognition in imagery: a survey, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • S. Mori et al., Historical review of OCR research and development, Proc. IEEE (1992)
  • X.-C. Yin et al., Robust text detection in natural scene images, IEEE Trans. Pattern Anal. Mach. Intell. (2014)