1 Introduction

Various techniques such as laser scanning and 3D cameras have been used to assess the stability of infrastructure [15,16,17, 19, 26]. A widely used method is damage detection from digital images. Based on the edge characteristics of the road surface image, the approximate location of a road crack can be detected in the pavement image [1, 11]. Thermal infrared images as well as digital images have also been applied to crack detection [18, 20]. However, speckle noise in the image is often misidentified as a crack because of the low contrast between cracks and the road background. Other detection methods use a histogram-based machine vision algorithm to identify pavement cracks [8]. The method first divides the collected road surface image into many small samples and uses the parameters of the histogram to express the characteristics of each sample. However, it is not appropriate for irregular pavement conditions. Some algorithms convert the image to grayscale before using a threshold to segment it [24]. Since the analysis can then be performed with a single color value, the number of variables is reduced.

Peng et al. [13] proposed the use of the histogram and edge detection to analyze the pavement image and detect cracks. Here, the road image is segmented and filtered with a grayscale threshold estimated from the histogram and edge detection. This method detected pavement cracks more clearly. Kapela et al. [6] obtained a histogram-mixed Gaussian fitting function for each subsample image by evaluating the histogram after image segmentation. The intersection of two Gaussian fitting functions is used to choose the best segmentation point during the segmentation of the image. This study focused on choosing the number of image sub-blocks. If the area of each sub-block is relatively large, the location of the crack on the road surface is not accurate. Conversely, if the sub-blocks are relatively small (and therefore numerous), it takes longer to evaluate the crack locations.

With the development of high-speed video capture technology and large-storage hardware [27], much research has focused on automatic detection of pavement damage to achieve real-time road maintenance, and crack detection by image processing has made considerable technological progress in both academia and industry. Ref. [9] proposed a new image threshold algorithm based on the neighboring difference histogram method (NDHM). The algorithm is based on the standard deviation of the image, which is closely related to the road surface cracks. It includes the threshold method and builds an objective function that quantifies the difference between the two categories. Ref. [25] proposed a histogram analysis of the segmented images. Because of its accuracy and efficiency in analyzing image data, the threshold segmentation method is widely used for crack detection. However, these methods use a single threshold to filter the entire image, so they are not appropriate for varied pavement environments, especially images that include shadows.

Threshold-based algorithms are among the most popular crack detection methods [2] because of their simplicity and high efficiency. Histogram-based thresholding methods, whether based on a Gaussian hypothesis or on adaptive or local thresholding, can separate the background from cracks by the distribution of the gray values of pixels [5]. However, this approach has limits: a single segmentation is susceptible to noise and is not accurate enough [3, 4]. In recent years, with the significant increase in image data size, machine learning-based methods have attracted a lot of attention [3, 4] and have contributed extensively to the study of pavement crack detection. Neural networks are at the core of trainable crack detection algorithms: they have been shown to outperform threshold-based techniques and morphological tools (Safaei et al. [14]). CrackForest [21] incorporates multiple levels of complementary functionality to characterize cracks and exploit structured information in crack patches.

The aim of this paper is to integrate the local and global properties of cracks into one algorithm. It finds crack locations by combining global and local segmentation. The image processing mainly focuses on enhancing contrast through several stages: converting the RGB image into a grayscale image; histogram equalization; and mean filtering.

2 Algorithm framework

The algorithm consists of the following stages: pre-processing, local threshold-based extraction, global threshold segmentation to obtain the potential crack region, and local threshold segmentation to obtain the cracks.

The algorithm framework is shown in Fig. 1. After image acquisition, the RGB image is converted into a grayscale image to simplify its structure; histogram equalization improves image contrast, and mean filtering is used to remove noise initially. The lane lines of the road are filtered out based on a fixed threshold. The adaptive threshold, which combines the characteristics of the global threshold and the local threshold, is then calculated. In the crack extraction process, a sliding window-based deviation is first segmented globally to obtain a potential crack region, and then local threshold segmentation is performed on the potential crack region. If cracks are present, the final image with the significant cracks is output. For the analysis of algorithm efficiency and accuracy, all images in the crack image library are manually marked to obtain crack regions and non-crack regions.

Fig. 1 Structure of the crack detection algorithm

3 Pavement image acquisition and system overview

Driving tests were performed on the selected route to detect cracks in the pavement. Original image data were collected from twenty round trips at a speed of 40 km/h on Su Tong Li road in Suzhou, China, which has a total length of 3.8 km (Fig. 2a). A camera was used to collect images of the pavement surface. In order to compare the pavement images, the camera had to be fixed on the vehicle before the experiment. The distance between the camera and the pavement was 1 m, and the camera was set at an angle of 30° to the ground; details of the apparatus are shown in Fig. 2b. A tripod was installed at the back window of the car to minimize camera vibration. GPS information was also collected by several mobile phones during the driving tests.

Fig. 2 Overview of driving tests

The image is greatly affected by the environment and by varying illumination, which can lead to quality problems in images of the road surface. The most common factors include uneven lighting on the road, shadows, trash and debris, lanes or markings, and oil stains, all of which may influence the detection. In particular, uneven illumination in the captured image produces a shadow pattern that must be removed in subsequent processing.

4 Crack detection processing

4.1 Pre-processing of crack detection

Vision-based crack detection systems are susceptible to high-contrast areas caused by many possible non-crack objects, such as dark spots in the road surface, lane stripes, and raised pavement markings. Another factor that affects the result is speckle noise, which is often mixed with cracks in the road image. The cracks themselves may exhibit low contrast and poor spatial continuity due to particulate material on the pavement, crack degradation, and unreliable crack shadows. In order to minimize these negative effects and improve the effectiveness of the crack detection algorithm, some pre-processing operations must be performed.

4.1.1 Grayscale processing and Histogram equalization to enhance image contrast

In this paper, the image is represented in grayscale, with the gray value calculated from the RGB values as the brightness [10], as shown in Eq. (1):

$$ v = \max \left( {R,G,B} \right) $$
(1)

The grayscale values of images are concentrated in a narrow interval. The histogram equalization can be used to adjust the distribution of grayscale value to enhance the local contrast so that the crack and the background are more distinct. Equation (2) [7, 22, 23] shows the algorithm for histogram equalization.

$$ f\left( {x,y} \right) = R \cdot \mathop \sum \limits_{{i = X_{0} }}^{{X_{k} }} \left( {n_{i} /N} \right) $$
(2)

where \(f\left( {x,y} \right)\) is the output image of the histogram equalization, \(R\) is the range of the output image (in this paper, R is 255), \(X_{0}\) is the minimum grayscale value in the input image, \(X_{k}\) is the maximum grayscale value in the input image, \(n_{i}\) is the number of pixels with grayscale value \(i\), and \(N\) is the total number of pixels in the image.
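The two steps above can be sketched in a few lines of Python (used here for illustration only; the implementation in this paper is in MATLAB). The sketch assumes 8-bit integer channels and reads the sum in Eq. (2) as the cumulative histogram up to each pixel's own gray level, which is the usual form of histogram equalization. The function names `to_gray` and `equalize` are hypothetical.

```python
def to_gray(rgb_image):
    """Eq. (1): the gray value of a pixel is the maximum of its R, G, B values."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def equalize(gray, out_range=255):
    """Histogram equalization through the cumulative histogram (assumed reading of Eq. (2))."""
    total = sum(len(row) for row in gray)
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    # cdf[v] is the fraction of pixels whose gray value is <= v
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return [[round(out_range * cdf[v]) for v in row] for row in gray]


rgb = [[(10, 20, 30), (40, 5, 5)], [(0, 0, 0), (200, 100, 50)]]
gray = to_gray(rgb)      # [[30, 40], [0, 200]]
stretched = equalize(gray)
```

After equalization the four gray values of this toy image are spread over the full output range instead of being clustered at the dark end.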

Figure 3 is a schematic diagram of the process of grayscale equalization. Figure 3b is the gray histogram of the input image (see Fig. 3a), which shows the distribution of the number of pixels according to grayscale. The gray values of most pixels are concentrated between 70 and 145, which represents the pavement pixels. Because of this crowded distribution, cracks and normal pavement are not clearly distinguished, which can disturb the extraction of cracks from the original image. The image processed by histogram equalization is shown in Fig. 3c. The histogram after equalization covers a wider range than before, as shown in Fig. 3d, which means that the contrast of the image is stronger.

Fig. 3 The process of histogram equalization

4.1.2 Filtering traffic lines and signs

In this paper, Eq. (3) is proposed as the filtering method to remove the traffic lane. In the driving tests performed for this paper, the grayscale value of the traffic lane after histogram equalization was always higher than ‘150,’ and the grayscale values of most of the road pavement and cracks were lower than ‘150.’ Therefore, the algorithm of this study includes a filter that removes pixels whose grayscale value is higher than ‘150.’ Fig. 4 shows the results after applying the filtering of Eq. (3).

$$ g\left( {x,y} \right) = \left\{ {\begin{array}{*{20}l} {f\left( {x,y} \right),} \hfill & {f\left( {x,y} \right) \le 150} \hfill \\ {\frac{1}{M}\mathop \sum \limits_{{\left( {x,y} \right) \in G}} f\left( {x,y} \right),} \hfill & {f\left( {x,y} \right) > 150} \hfill \\ \end{array} } \right. $$
(3)

where \(g\left( {x,y} \right)\) denotes the filtered output, \(f\left( {x,y} \right)\) is the input image after histogram equalization, \(G\) is the set of pixels in the image, and \(M\) is the total number of pixels in the image.
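A minimal Python sketch of this lane filter is given below. It assumes that the replacement value is the mean gray value of the whole equalized image, rounded to an integer; the function name `filter_lane` is hypothetical.

```python
def filter_lane(image, threshold=150):
    """Replace pixels brighter than the threshold with the image mean (Eq. (3))."""
    total = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / total
    fill = round(mean)  # assumption: the mean is rounded to an integer gray level
    return [[v if v <= threshold else fill for v in row] for row in image]


equalized = [[100, 200], [50, 150]]
filtered = filter_lane(equalized)  # only the 200 pixel exceeds the threshold
```

The fixed value of 150 follows the driving tests reported here; other cameras or lighting conditions would probably need a different value.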

Fig. 4 Image after traffic lane filtering and mean filtering

The mean filter is mainly used for noise reduction. The surface of the pavement is composed of very small asphalt particles. Since the particles do not all have the same color, the grayscale values of the pixels representing them differ considerably, even between neighboring pixels in an image. The mean filter of Eq. (4) smooths these variations.

$$ G\left( {x,y} \right) = \frac{1}{mn}\mathop \sum \limits_{i,j \in s} S\left( {i,j} \right) $$
(4)
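As an illustration, the mean filter of Eq. (4) could be written as follows. The sketch assumes an m × n window centered on each pixel and, at the image border, averages only the pixels that actually fall inside the image; the border rule and the name `mean_filter` are assumptions, not taken from this paper.

```python
def mean_filter(image, m=3, n=3):
    """Average each pixel over an m x n neighborhood (Eq. (4))."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            # Collect the neighborhood, clipped at the image border
            window = [
                image[i][j]
                for i in range(max(0, x - m // 2), min(rows, x + m // 2 + 1))
                for j in range(max(0, y - n // 2), min(cols, y + n // 2 + 1))
            ]
            out[x][y] = sum(window) / len(window)
    return out


speckle = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = mean_filter(speckle)
```

A single bright speckle of value 9 is reduced to 1.0 at its center, which is the smoothing effect described above.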

5 Crack image segmentation based on adaptive threshold processing

5.1 Grayscale characteristics of pavement image

In the current study, the cracks are characterized by their grayscale distribution, as shown in Fig. 5. Figure 5a is the image after pre-processing to enhance the contrast. Four different windows are selected in it to compare the features of each region within an image: a window including a crack (box ‘A’ in Fig. 5a); a window affected by the traffic lane (box ‘B’ in Fig. 5a); a window affected by shadow (box ‘C’ in Fig. 5a); and normal road (box ‘D’ in Fig. 5a). In order to better show the grayscale distribution of cracks, Fig. 5b shows the grayscale distribution after a rotation of 90 degrees; it shows that the crack has a lower grayscale. Moreover, low grayscale values also occur at random positions in the image, so the adaptive threshold is the key to extracting crack images. Figure 5c indicates the position of the crack in the grayscale distribution of the four windows. In Fig. 5c (A), many pixels are concentrated between grayscale values of 100 and 140, which represents the typical pavement, but a large peak caused by the crack can be found near ‘0,’ which represents pure black. Figure 5c (B) shows the histogram of the region that includes the traffic lane. Most of the pixels are concentrated at the grayscale value representing the traffic lane, so the average value is slightly darker than this peak. In the remaining regions (see Fig. 5c (C) and (D)), the grayscale is mainly concentrated around 100. Here, the grayscale differences within typical pavement are not large; the small change in the spatial distribution of grayscale indicates that there are no cracks in these regions.

Fig. 5 Characteristic analysis of cracks

5.2 Selection of Sliding window size based on the linear sweep algorithm (LSA)

The selection of the sliding window size is crucial in threshold segmentation. The variance reflects the grayscale change of the image within the window, and the window should cover the crack, so the window value m is defined as three times the crack width (3 W). The cross-sectional variation of the grayscale is therefore used to estimate the width of the crack and thus the size of the sliding window. The scans are carried out along the rows and columns, from top to bottom and from left to right. The mark in Fig. 6b indicates the position of the crack on the gradation curve. Figure 6c shows a partially enlarged curve of the crack, which has a recessed structure defined by the descending starting point S, the valley point V, the end point E, the crack width W, and the grayscale edge intensities I1 and I2. Both I1 and I2 should be higher than the threshold I.

Fig. 6 Schematic diagram of selecting sliding window values

Sliding window operator:

  1. (1)

    The valley point is located first. After histogram equalization, the valley point must lie at a low gray level, so the valley point V is assumed to be less than 50. The potential valley bottom is therefore V(x) = minimum(p(u)) with p(u) < 50, where p(u) is the grayscale along the image rows and columns.

  2. (2)

    The gradient is calculated on both sides of the valley. The left gradient is l(u) = dp(u), and the starting point S is reached when l(u) > 0. The right gradient is r(u) = dp(u), and the end point E is reached when r(u) > 0.

  3. (3)

    At the same time, I1 and I2 are calculated and must satisfy the threshold I, whose value is set to 70 based on a number of experiments. When all of the above conditions are met, the area is a potential crack zone. The difference between points E and S is the crack width. The maximum value of this difference over the entire search is W, which gives the window value 3 W.
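The three steps of the sliding window operator can be sketched as below. This is one possible reading of the procedure, not the exact implementation: the sketch looks at the single darkest point of each scan line, walks outward while the gray value keeps rising to find S and E, and takes I1 and I2 as the gray-level rise from V to S and from V to E. The function names and the handling of ties are assumptions.

```python
def crack_width(profile, valley_max=50, edge_min=70):
    """Width E - S of a crack-like valley on one scan line, or 0 if none is found."""
    v = min(range(len(profile)), key=profile.__getitem__)  # step 1: valley point V
    if profile[v] >= valley_max:
        return 0
    s = v
    while s > 0 and profile[s - 1] > profile[s]:  # step 2: climb left to S
        s -= 1
    e = v
    while e < len(profile) - 1 and profile[e + 1] > profile[e]:  # climb right to E
        e += 1
    i1 = profile[s] - profile[v]  # step 3: edge intensities I1 and I2
    i2 = profile[e] - profile[v]
    if i1 <= edge_min or i2 <= edge_min:
        return 0
    return e - s


def window_size(image):
    """Scan all rows and columns and return 3 W, where W is the largest width found."""
    rows = [list(r) for r in image]
    cols = [list(c) for c in zip(*image)]
    return 3 * max(crack_width(line) for line in rows + cols)


line = [120, 118, 121, 90, 20, 10, 60, 125, 123]
width = crack_width(line)        # valley at index 5, S at index 2, E at index 7
window = window_size([line] * 3)
```

On this toy scan line the valley spans five pixels, so the window value would be 15.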

5.3 Multiple thresholding method (MTM)

Both the global and local threshold algorithms distinguish cracks through thresholds, but much noise still remains. Therefore, a further improvement of the algorithm, the multiple thresholding method, is developed in this paper. The method is based on finding the maximum interclass variance of the standard deviation in an image. It extracts the potential crack region effectively, so that most noise can be filtered out. The algorithm adaptively applies different thresholds to segment the image without excessive parameters. Figure 7 shows the adaptive thresholding process. Figure 7a is the original image, Fig. 7b is the mean image, and Fig. 7c is the deviation image; the local threshold based on the deviation and the mean is obtained according to the adaptive window (shown in Fig. 7d). The deviation image is segmented using the global threshold to obtain a potential crack image (shown in Fig. 7e). Further segmentation then yields the crack image (Fig. 7g). The multiple thresholding method can be generally summarized as follows:

Fig. 7 Explanation of the adaptive thresholding process

5.3.1 Calculation of mean and deviation using sliding windows

For the calculation of the mean and standard deviation, the pre-treated image is scanned with sliding windows. In this study, the window size is adaptive. The value of each pixel in Fig. 7b is equal to the mean value of all pixels within the adaptive window, and the value of each pixel in Fig. 7c is the standard deviation within the adaptive window. The two quantities are given by Eq. (5):

$$ \begin{aligned} & m(i,j) = \frac{1}{{R^{2} }}\sum\limits_{x = i - R/2}^{i + R/2} {\sum\limits_{y = j - R/2}^{j + R/2} {I(x,y)} } \\ & \sigma (i,j) = \left( {\frac{1}{{R^{2} }}\sum\limits_{x = i - R/2}^{i + R/2} {\sum\limits_{y = j - R/2}^{j + R/2} {\left( {I(x,y) - m(i,j)} \right)^{2} } } } \right)^{\frac{1}{2}} \\ \end{aligned} $$
(5)

where \(m\left( {i, j} \right)\) is the mean within the sliding window, \(\sigma \left( {i, j} \right)\) is the standard deviation within the sliding window, R is the adaptive window size, and I(x,y) is the grayscale of the pixels.
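A direct, unoptimized Python sketch of the mean and deviation images is shown below. It assumes the window is centered on pixel (i, j) and is clipped at the image border, so border pixels are averaged over fewer than R² values; the name `local_mean_std` is hypothetical.

```python
import math


def local_mean_std(image, R):
    """Mean image and standard deviation image over an R x R sliding window (Eq. (5))."""
    rows, cols = len(image), len(image[0])
    half = R // 2
    mean = [[0.0] * cols for _ in range(rows)]
    std = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Window centered on (i, j), clipped at the image border
            window = [
                image[x][y]
                for x in range(max(0, i - half), min(rows, i + half + 1))
                for y in range(max(0, j - half), min(cols, j + half + 1))
            ]
            m = sum(window) / len(window)
            mean[i][j] = m
            std[i][j] = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
    return mean, std


patch = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
mean_img, std_img = local_mean_std(patch, R=3)
```

The deviation is zero in uniform regions and large where the window straddles a dark crack and bright pavement, which is why the deviation image highlights potential crack regions.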

5.3.2 Segmentation of potential crack region by global threshold

The global threshold separates the pixels of the image into two grayscale categories. An interclass variance is defined to express the difference in grayscale value between the two categories, and the grayscale value at which the interclass variance reaches its maximum is taken as the threshold. The specific implementation is as follows:

In this method, the threshold is determined by the maximum interclass variance method [12]. The pixel values of the input image range from 0 to the maximum of \( \sigma \left( {i, j} \right)\). The probability of the grayscale level ‘i’ is calculated from Eq. (6).

$$ p_{i} = n_{i} /N $$
(6)

where \(n_{i}\) is the number of pixels at level ‘i,’ and N is the total number of pixels (N = \(n_{0} + n_{1} + n_{2} + \cdots + n_{\max \left( \sigma \right)}\)).

If a threshold T is assumed to segment the image into two categories, \(C_{0}\) and \(C_{1}\), then \(C_{0}\) contains the pixels with grayscale values from 0 to T, and \(C_{1}\) contains the pixels with grayscale values from T + 1 to \({\text{max}}\left( \sigma \right)\). The probability (\(\omega_{0}\)) of \(C_{0}\) can therefore be calculated by Eq. (7):

$$ \omega_{0} = \mathop \sum \limits_{i = 0}^{T} p_{i} $$
(7)

The mean grayscale (\(\mu_{0}\)) of the class \(C_{0}\) can be represented by Eq. (8).

$$ \mu_{0} = \mathop \sum \limits_{i = 0}^{T} i \times p_{i} /\omega_{0} $$
(8)

The probability (\(\omega_{1}\)) and mean grayscale (\(\mu_{1}\)) of the class \(C_{1}\) are calculated by Eqs. (9) and (10), respectively.

$$ \omega_{1} = \mathop \sum \limits_{i = T + 1}^{\max \left( \sigma \right)} p_{i} $$
(9)
$$ \mu_{1} = \mathop \sum \limits_{i = T + 1}^{\max \left( \sigma \right)} i \times p_{i} /\omega_{1} $$
(10)

The overall mean grayscale \(\mu\) of the two classes is:

$$ { }\mu_{1} \omega_{1} + { }\mu_{0} \omega_{0} = \mu $$
(11)

Then, the interclass variance \(\sigma^{2}\) is as follows:

$$ { }\sigma^{2} = \omega_{0} \times (\mu_{0} - \mu )^{2} + { }\omega_{1} \times (\mu_{1} - \mu )^{2} $$
(12)

Figure 8 shows the result of evaluating the threshold value with the interclass variance method. Most of the pixels are located in the region with a higher grayscale than the threshold value (\(C_{1}\)), whereas all the pixels representing the crack are located in the region with a lower grayscale than the threshold value (\(C_{0}\)). The threshold-based segmentation method can thus separate the image into two different categories. In Fig. 8, the interclass variance reaches its maximum at a threshold value (T) of 27, so this value is used as the threshold to segment the image. The image obtained by applying the threshold is shown in Fig. 7e.
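Equations (6) to (12) amount to an exhaustive search over T, which can be sketched as follows. The sketch assumes that the deviation values have been rounded to non-negative integer levels and, when several thresholds give the same variance, keeps the smallest one; the name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(values):
    """Threshold T that maximizes the interclass variance of Eq. (12)."""
    top = max(values)
    hist = [0] * (top + 1)
    for v in values:
        hist[v] += 1
    n = len(values)
    p = [h / n for h in hist]                     # Eq. (6)
    mu = sum(i * p_i for i, p_i in enumerate(p))  # overall mean, Eq. (11)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0.0, 0.0
    for t in range(top):          # T = top would leave class C1 empty
        w0 += p[t]                # Eq. (7)
        sum0 += t * p[t]
        w1 = 1.0 - w0             # Eq. (9)
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = sum0 / w0           # Eq. (8)
        mu1 = (mu - sum0) / w1    # Eq. (10)
        var = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2  # Eq. (12)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


levels = [1, 1, 2, 2, 2, 10, 11, 11, 12]
t = otsu_threshold(levels)
```

For this bimodal toy list the threshold falls in the gap between the two clusters, so every value up to 2 lands in class C0 and every value from 10 upward in class C1.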

Fig. 8 Extraction of the potential crack using global threshold method

In order to further subdivide the cracks, the binarized potential crack image is fused with the mean-filtered image. The following equations show the extraction of the potential crack region.

$$ F\left( {i,j} \right) = \left( {P\left( {i,j} \right) + 1} \right) \times \left( {M\left( {i,j} \right) + 1} \right) - 1 $$
(13)
$$ F\left( {i,j} \right) = \left( {P\left( {i,j} \right) + 1} \right) \times \left( {\left\{ {\begin{array}{*{20}c} 0 \\ {255} \\ \end{array} } \right. + 1} \right) - 1 $$
(14)
$$ F\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {P\left( {i,j} \right),} \hfill & {M\left( {i,j} \right) = 0} \hfill \\ {\left( {P\left( {i,j} \right) + 1} \right) \times 256 - 1 \ge 255,} \hfill & {M\left( {i,j} \right) = 255} \hfill \\ \end{array} } \right. $$
(15)
$$ F\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {P\left( {i,j} \right),} \hfill & {M\left( {i,j} \right) = 0} \hfill \\ {255,} \hfill & {M\left( {i,j} \right) = 255} \hfill \\ \end{array} } \right. $$
(16)

where \(P\left( {i,j} \right)\) is the image after the mean filtering, \( M\left( {i,j} \right)\) is the potential crack image, and the \(F\left( {i,j} \right)\) is the generated image. The results of fusion processing can be seen in Fig. 7f.
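The fusion of Eqs. (13) to (16) reduces to a masked copy, as the short sketch below illustrates. It assumes 8-bit images in which the binary mask M takes only the values 0 and 255 and the result is saturated at 255; the name `fuse` is hypothetical.

```python
def fuse(P, M):
    """Keep P where the mask M is 0 and set the result to 255 where M is 255."""
    return [
        [min(255, (p + 1) * (m + 1) - 1) for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(P, M)
    ]


filtered = [[12, 200], [0, 90]]
mask = [[0, 255], [255, 0]]
fused = fuse(filtered, mask)
```

Pixels inside the potential crack region keep their filtered gray value, while all other pixels become white and no longer influence the local threshold step.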

5.3.3 Final segmentation using local threshold

The local threshold method divides the image into several windows to calculate the thresholds for each divided window separately. Equation (17) shows the threshold value calculation.

$$ T_{{{\text{niblack}}}} = m + k \times \sigma $$
(17)

where m is the mean value, k is an adjustment factor (normally k is −0.2), and \(\sigma\) is the standard deviation (Vijayan et al. 2017).
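For a single window, Eq. (17) can be evaluated as in the sketch below. The labeling rule, in which pixels darker than the threshold are marked as crack (0) and all others as background (255), is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def niblack_threshold(window, k=-0.2):
    """Eq. (17): T = m + k * sigma for the gray values inside one window."""
    m = sum(window) / len(window)
    sigma = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
    return m + k * sigma


def segment_window(window, k=-0.2):
    """Assumed labeling: 0 for pixels darker than the threshold, 255 otherwise."""
    t = niblack_threshold(window, k)
    return [0 if v < t else 255 for v in window]


window = [100, 100, 100, 100, 100, 100, 100, 100, 10]
threshold = niblack_threshold(window)  # mean 90, sigma about 28.3
labels = segment_window(window)
```

With k = −0.2 the threshold sits slightly below the window mean, so only the single dark pixel is labeled as crack in this example.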

5.4 Algorithm performance

In order to evaluate the performance of the proposed algorithm, several experiments were carried out. All of the images have a resolution of 880 by 480. We manually annotated the real crack curves on these images for objective performance evaluation. We implemented the proposed method in MATLAB and tested it on a Windows 10 PC with an Intel i5-9400F CPU and 24 GB of memory. First, we focus on the quantitative evaluation of test performance through two measurements, sensitivity and precision, which are commonly used to measure the performance of such algorithms. In Eq. (18), Pg is the actual crack region in the sample image, Ph is the extracted crack region, Num is the pixel number, Rs is the sensitivity representing the integrity of crack extraction, Rp is the precision of crack extraction, and 1 − Rp is the error rate of crack extraction.

$$ {\text{Rs}} = \frac{{{\text{Num}}\left( {{\text{Ph}} \cap {\text{Pg}}} \right)}}{{{\text{Num}}\left( {{\text{Pg}}} \right)}};\quad {\text{Rp}} = \frac{{{\text{Num}}\left( {{\text{Ph}} \cap {\text{Pg}}} \right)}}{{{\text{Num}}\left( {{\text{Ph}}} \right)}} $$
(18)

Because the cracks in the road image have a certain width, we allow a certain tolerance when measuring the coincidence between the detected crack curve and the actual surface crack curve. More specifically, the average crack width in the images we collected was about 2 pixels. The proposed algorithm is compared with two other algorithms: the Otsu method and the local threshold method.
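The two measures of Eq. (18) can be computed from pixel sets as sketched below. For simplicity the sketch counts only exact pixel matches and does not model the tolerance mentioned above; the function name and the representation of regions as sets of (row, column) coordinates are assumptions.

```python
def sensitivity_precision(extracted, truth):
    """Rs and Rp of Eq. (18) for two sets of (row, col) crack pixels."""
    overlap = len(extracted & truth)                    # Num(Ph ∩ Pg)
    rs = overlap / len(truth) if truth else 0.0         # sensitivity
    rp = overlap / len(extracted) if extracted else 0.0  # precision
    return rs, rp


truth = {(0, 0), (0, 1), (0, 2), (0, 3)}
extracted = {(0, 1), (0, 2), (1, 2)}
rs, rp = sensitivity_precision(extracted, truth)
```

Here two of the four true crack pixels are found (Rs = 0.5) and two of the three extracted pixels are correct (Rp = 2/3), so the error rate 1 − Rp is 1/3.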

Various types of cracks exist under complex road conditions; hence, a case study was performed to verify that the algorithm proposed in this paper can also detect other kinds of cracks. Figure 9a shows the original image when cracks occur in multiple directions, and the results for the different threshold methods are shown in Figs. 9b, c, respectively. Figure 9d shows the result of applying the PCRM proposed in this paper. Figure 9e shows a case in which the traffic lane occupies a large area of the image. The results for the segmentation methods are shown in Fig. 9f, g, respectively, and Fig. 9h shows the result of applying the PCRM proposed in this paper. In both cases, the shape of the crack can be distinguished when the potential crack region is applied. It is important to evaluate the potential crack region because clear crack extraction is required in automatic crack detection (Table 1).

Fig. 9 Comparison with different condition images of pavement

Table 1 Extraction performance comparison of different methods

Through comparison of these typical experiments, it is found that the Niblack with the pre-treatment algorithm proposed in this paper can remove the image noise and reduce its interference to the crack region extraction, making the extraction more accurate.

Since the gray distribution may differ between the thicker and thinner areas of a crack, and the adaptive sliding window size differs accordingly, the segmentation results for cracks of different widths are shown in Fig. 10. The different cracks are extracted very well, but because some areas of the thicker crack are close to the background color, these cracks cannot be extracted completely.

Fig. 10 Results of segmentation of areas with thicker or thinner cracks

To further investigate the reliability of the algorithm, the German asphalt pavement distress (Gap) dataset was used to test the segmentation results of the different methods. The Gap dataset contains a total of 1,969 gray-value images covering different categories of distress such as cracks, pits, and inlays. The image resolution is 1920 × 1080 pixels, and each pixel represents 1.2 mm × 1.2 mm. Because the images are large and GPU memory is limited, each image is cropped into 6 non-overlapping image regions of size 640 × 540 pixels. Only image regions with 1000 pixels are kept, so we get 509 images for testing.
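The cropping described above can be illustrated by computing the tile boundaries. The sketch assumes a regular grid of three columns and two rows starting at the top-left corner, which is the only way to fit six non-overlapping 640 × 540 regions into a 1920 × 1080 image; the function name `tile_boxes` is hypothetical.

```python
def tile_boxes(width=1920, height=1080, tile_w=640, tile_h=540):
    """Return (left, top, right, bottom) boxes of non-overlapping tiles, row by row."""
    return [
        (left, top, left + tile_w, top + tile_h)
        for top in range(0, height - tile_h + 1, tile_h)
        for left in range(0, width - tile_w + 1, tile_w)
    ]


boxes = tile_boxes()
```

Each box can then be used to slice the corresponding region out of the image array before segmentation.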

To validate the proposed method, the detection performance was evaluated with accuracy, recall, and precision scores. The number of true/false positive/negative detections was determined by comparing the results of the method with the manual marking results from human experts. Table 2 shows the validation results.

Table 2 Comparison of the validation results

6 Conclusions

In this study, driving tests were performed to detect cracks in pavement, and image processing and threshold segmentation methods were applied to detect cracks in the images automatically. In addition, the multiple thresholding method (MTM) was developed in this paper, which can show the crack regions of the images clearly. The detailed conclusions of this paper are as follows:

  1. (1)

    In order to remove the noises from the crack images, the pre-treatment of images was performed, which includes grayscale processing, histogram equalization to enhance image contrast, filtering traffic lane, and mean filtering. Most of the main noises can be filtered by this image processing, and it is accurate to use the pre-treated images for the next steps to detect the cracks.

  2. (2)

    Global and local threshold methods were applied to segment the pre-treated images. The Otsu method uses one threshold for the entire image, whereas the Niblack method uses thresholds corresponding to the divided windows of the image. If the threshold is evaluated with the Otsu method, most pixels representing shadows cannot be removed. In this paper, thresholds evaluated by the Niblack method were compared in four different areas of an image: the crack; the traffic line; the shadow; and normal road. It was found that the thresholds of the Niblack method can remove noise such as the traffic lane and the gradation effect caused by light.

  3. (3)

    Both the global threshold and local threshold algorithms distinguish cracks through thresholds, but much noise still remains. Therefore, the multiple thresholding method (MTM) was developed in this paper. A maximum interclass variance can be evaluated from the standard deviation results of an image. The standard deviation value corresponding to the maximum interclass variance is used as the threshold, so that, based on PCRM, the potential crack region can be extracted from the image.

  4. (4)

    The proposed image processing algorithm was applied to the image data taken from the driving tests, and all 41 cracks on the experimental route were detected. The longitude and latitude corresponding to the detected cracks were presented. It is expected that cracks can be detected more effectively if the developed algorithm is applied to crack images of pavement or other infrastructure.