1 Introduction

Oceanography is the study of the physical, biological, chemical, archaeological and geological data collected from the sea. Oceanographers use imaging technology for their studies. However, limitations still prevail in the underwater environment when collecting and processing underwater images. Some of the basic limitations in processing are associated with the physical properties of acoustic propagation in the water medium. Today's acoustic based instruments can remotely image the ocean floor, look below the sea bed and measure various physical oceanographic parameters. With a variety of instruments for imaging underwater objects, such as side scan sonar, multi-beam sonar and synthetic aperture sonar, imaging systems have reached an advanced level. Side scan sonar equipment covers a large region and provides an image of the sea floor [1]. Side scan sonar imaging was developed by Professor Harold Edgerton in the 1960s, building on the ASDIC (Anti-Submarine Detection Investigation Committee) system used to detect submarines during World War II [2].

2 Problem definition

The configuration of the side scan sonar system is discussed in this section. The basic principle of sonar is echo sounding, where the echo returns are plotted as an image. It is very effective in covering a wide region of the seafloor. The sidescan system is usually mounted on a towfish or on an unmanned underwater vehicle. Figure 1 illustrates the sidescan sonar system configuration, where the acoustic source insonifies the sea floor and the intensity of the returns is plotted as an image [3].

Fig. 1

Sidescan sonar configuration

2.1 Effect of antenna beam pattern on resolution

The antenna beam pattern plays a vital role in producing high quality sonar images. The horizontal beam should be extremely narrow, between 0.1° and 0.5°, to achieve the best along-track resolution, while the vertical beam should be around 80° wide to maximise the range of the sonar. The overall quality of sonar images is improved by increasing the pixel density, and hence the resolution. Since resolution is related to frequency, manufacturers increase the operating frequency. The resolution depends linearly on the wavelength λ, which is related to frequency by λ = v/f, where ‘v’ is the velocity of sound in water and ‘f’ is the frequency. This implies that a high frequency sonar (on the order of hundreds of kHz) produces a high resolution image but has a shorter range, since high frequencies penetrate only a short distance or depth; low frequencies travel longer distances and deeper into the water, but produce low resolution images.

The other limitation is that the resolution varies with range: the insonified region increases linearly with range, and the along-track resolution δy is

$$\delta y = R\uptheta$$
(1)

where ‘R’ is the range and ‘θ’ is the horizontal beam aperture. This results in degradation of side scan images at longer ranges.
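As a quick numeric illustration of Eq. (1), the following MATLAB fragment evaluates the along-track resolution for an assumed 0.3° horizontal beam at a few example ranges; both the beam width and the ranges are illustrative values, not parameters of the sonar used in this work.

% Along-track resolution, Eq. (1), for an assumed 0.3 deg horizontal beam
theta = deg2rad(0.3);            % horizontal beam aperture in radians (assumed)
R     = [25 50 100 200];         % example ranges in metres
dy    = R * theta;               % along-track resolution delta_y = R * theta
fprintf('R = %4d m -> delta_y = %.2f m\n', [R; dy]);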

2.2 Need for resolution

The question of resolution has been raised again by the advent of very high resolution sonar equipment. The limitations of side scan technology highlighted above still emphasise the need for producing high resolution images. Identifying the resolution required for object recognition is a difficult task. Many researchers have put effort into finding the minimum resolution [4,5,6] as well as differentiating the shape of an object, such as a sphere or a cube [7, 8]. Very little research has been done on resolution improvement in sonar images, and the problem remains open. We therefore focus here on a super resolution algorithm to analyse the resolution needed for object identification.

The need for resolution enhancement is further emphasised by the real time sonar data, collected in XTF format, used in our work. The intensities are extracted using MATLAB and then converted to an image [9,10,11].

The intensity values, which show the characteristics of the seabed, are obtained from the amplitude samples. Figure 2 shows the distribution of the pixel intensity values, which are then converted to the JPEG image used in this work; a minimal sketch of this conversion is given after Fig. 2.

Fig. 2

Pixel intensity values distribution
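A minimal MATLAB sketch of this conversion is given below; it assumes the XTF amplitude samples have already been parsed into a matrix named intensity (pings × samples), and the output file name is purely illustrative.

% Convert extracted sidescan intensities to an 8-bit greyscale JPEG
img = mat2gray(intensity);            % scale raw amplitudes to [0, 1]
img = im2uint8(img);                  % quantise to 8-bit grey levels
imwrite(img, 'sidescan_patch.jpg');   % save for further processing (name assumed)
imshow(img);                          % visual check of the seabed texture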

Figure 3 shows a portion of the real time image. Since the surveyed area is 3 × 2.5 km², only a portion of the image is processed. The intensity values (z axis) are plotted against the easting (x axis) and northing (y axis) coordinates. Some of the values in Fig. 3 are zero, which indicates missing data due to ambient noise and the other reasons discussed in the problem definition above. This emphasises the need for resolution enhancement in sonar images.

Fig. 3

Sub image

2.3 Automatic target recognition problem in sonar images

Automatic object detection and identification in sonar imagery has long been an active research area in naval and maritime applications. As technology advances, threats to naval and maritime operations have increased, so new benchmarks for object identification have to be adopted. Detection and identification techniques concentrate on local contrast and global clarity [12, 13]; alternative supervised learning methods that examine the shape of objects using the basic physics of acoustics [7, 8] show a lot of promise. Still, the false alarm rate for object recognition is likely to remain high [12,13,14].

Determining the resolution required to perform object recognition still remains a challenging task. Misidentification of targets is more frequent at low resolution; that is, image based classifiers fail at low resolution and classification accuracy drops [7, 8]. This creates the need for new classification versus resolution algorithms [9, 15] that work on autonomous platforms in real time.

Image resolution is measured in pixels per square inch; a pixel is the smallest unit of an image. The most important factors that corrupt image resolution are ambient noise [16], scattering, and reflection from the surface and the ocean floor. Signal reflected from the surface is diverted back into the water, some of the signal is transmitted into the seabed, and some is scattered by suspended particles floating in the water, aquatic organisms and plants, sand, rocks, objects, etc. This leads to noise in the image, known as ambient noise, which reduces the signal to noise ratio and makes sonar images noisy. Therefore, images obtained directly from the side scan sonar need to be enhanced for accurate interpretation. This can be accomplished by an efficient super-resolution enhancement algorithm. Super-resolution applies image processing techniques to degraded low-resolution images in order to obtain a high resolution image. This work presents the development of a post processing algorithm for high resolution that addresses the challenges faced in target object recognition.

The sonar image is viewed as a low resolution (LR) image. Given a collection of these sonar images, the mutual information of all the low resolution images is transferred to a common grid or coordinate system. With proper a priori information about the degradation parameters, the registered images are fused to form the high resolution (HR) estimate. These high resolution images carry fine detail and can be processed for target identification. Very little research has been done on enhancing the resolution of sonar images in terms of pixel density, which motivates this research work on improving resolution.

The objective is to produce sonar images of very high quality that are clear, sharp and rich in fine detail, with much less loss. Such images can be used for detecting seabed objects with greater clarity. All of this adds to the need for post processing the sidescan sonar image and for developing a new super resolution algorithm. Hence it is necessary to design novel super resolution algorithms suitable for images captured by side scan sonar equipment, thereby improving classification accuracy. These processes are performed in near real time.

2.4 Related works

Demirel et al. [17] introduced a technique for sharper image resolution. The image is decomposed into subbands using the discrete wavelet transform (DWT), which preserves edges; the high frequency subbands are interpolated and added to the subbands generated by the stationary wavelet transform (SWT), which helps to reduce the loss of information caused by downsampling. The authors compared their method with existing techniques such as bilinear, bicubic, WZP, NEDI, HMM, HMM SR, WZP-CS, WZP-CS-ER, DWT SR and CWT SR interpolation. Sree Sharmila et al. [18] worked on LANDSAT remote sensing images to classify water and non-water bodies. The authors denoised the image using hybrid transforms (HDL) and improved the resolution using the discrete wavelet transform (DWT); GLCM texture features are derived and given as input to support vector machine classifiers. For denoising, the lifting technique is compared with the HDL technique, and for resolution enhancement, SWT and DWT are compared. HDL and DWT outperformed the others in their respective processes, with PSNR used as the performance measure. Zhou et al. [19] presented a novel image enhancement algorithm based on the curvelet transform for sonar images. The curvelet transform constructs a multi-channel enhancement structure based on the human visual system (HVS); this method removes noise and low contrast to obtain the enhanced image. Burguera and Oliver [20] proposed mapping techniques for side scan sonar images and discussed a resolution enhancement technique in which a probabilistic model is used to estimate the sea-floor region. Sinai et al. [21] suggested the K-means algorithm to segment mine-like objects in side scan sonar images. A contour algorithm is used to sharpen and restore the edges of the object, image features of shadows and highlights are extracted, and the Neyman–Pearson criterion is applied to identify the mine-like objects; this reduced false alarm rates and other errors when tested on real image data and produced good results. Hans et al. [22] presented a matrix based prior algorithm on hyperspectral images for obtaining super resolution. The method involves a direct or indirect mapping approach in the reconstruction stage. The existing techniques of Yang et al. [23] and Dong et al. are compared with the matrix based prior algorithm, with performance measured in terms of peak signal to noise ratio (PSNR) and Structural Similarity Index (SSIM). On et al. [24] discussed the resolution enhancement of sonar images where low frequency signals are used to capture images beneath the sea bed; the authors used Wiener filtering to enhance azimuth resolution. Priyadarshini et al. (2017) performed contrast enhancement of sonar images using the stationary wavelet transform (SWT). The image is decomposed into low and high frequency components of different subbands, a Laplacian filter is used to enhance the low frequency component, and the inverse SWT is applied to reconstruct a high contrast image; the result is compared with DWT interpolation. Rajarapollu and Mankar [25] replaced mixed pixels caused by artifacts using bicubic interpolation and compared it with existing algorithms such as bilinear and nearest neighbour interpolation.

3 Existing method

Resolution is frequently referred to as an important aspect of an image and is determined by pixels per square inch; a pixel is the smallest unit of an image. Images are processed to obtain enhanced resolution through higher pixel density. Resolution improvement of sonar images still remains a problem and is usually compensated for by interpolation. Interpolation is one of the most commonly used techniques for image resolution enhancement and is widely used in applications such as super resolution [10], facial reconstruction, multiple description coding, etc. It estimates unknown pixels or missing data in the image from the neighbouring pixels. The well known techniques discussed here are bilinear interpolation, bicubic interpolation and wavelet transforms.

The existing methods give blurred edge boundaries and degraded image information: the edges of objects are not clearly defined, that is, the boundary between the background and the edge of the object is unclear when the resolution is poor. Hence resolution enhancement is needed prior to object identification and classification. The existing techniques used for resolution enhancement are listed below and compared with the proposed super resolution using sparse representation.

The proposed method is compared with the following existing techniques:

  • Bilinear interpolation [26],

  • Bicubic interpolation [25],

  • Discrete wavelet transform [18],

  • Stationary wavelet transform [27],

  • Combined discrete and stationary wavelet transform [10, 17].

3.1 Bilinear interpolation method

Bilinear interpolation smooths edges by manipulating the neighbouring pixels to find the unknown pixel intensity at coordinates (x, y). A pixel represents characteristics such as contrast, brightness, colour, etc. Edge contrast is reduced by averaging the surrounding pixel values. Bilinear interpolation is among the most common techniques used in image processing due to its computational simplicity. The unknown pixel is estimated as follows.

$$B_{l} \left( {x,y} \right) = o + lx + my + nxy$$
(2)

where l, m, n, o are the bilinear coefficients obtained from the four neighbouring pixels and Bl(x, y) is the interpolated pixel intensity.
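A minimal sketch of bilinear upscaling of a sonar patch in MATLAB (Image Processing Toolbox) follows; the file name is hypothetical and the patch is assumed to be a greyscale JPEG.

% Bilinear upscaling of a low resolution sonar patch
lr = im2double(imread('sonar_patch_256.jpg'));   % 256 x 256 low resolution input (assumed)
if size(lr, 3) > 1, lr = rgb2gray(lr); end       % ensure a single-channel image
hr_bilinear = imresize(lr, 2, 'bilinear');       % 512 x 512; Eq. (2) applied per output pixel
imshowpair(lr, hr_bilinear, 'montage');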

3.2 Bicubic interpolation method

Bicubic interpolation [25] uses sixteen neighbouring pixel intensities to estimate the unknown pixel intensity and uses that information for enhancement. The 16 coefficients \(p_{ik}\) are obtained from 16 equations, where (x, y) are the coordinates of the pixel. The interpolating function is:

$$b_{c} \left( {x,y} \right) = \mathop \sum \limits_{i = 0}^{3} \mathop \sum \limits_{k = 0}^{3} p_{ik} x^{i} y^{k}$$
(3)

It produces smooth results for enlargements below 150%, but the quality degrades quickly above 150% enlargement.
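For comparison, bicubic upscaling of the same patch is a one-line change, continuing the sketch above.

% Bicubic upscaling of the same low resolution patch
hr_bicubic = imresize(lr, 2, 'bicubic');          % each output pixel uses a 4 x 4 neighbourhood, Eq. (3)
imshowpair(hr_bilinear, hr_bicubic, 'montage');   % visual comparison of the two results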

3.3 Wavelet transforms

The wavelet transform captures both time and frequency information and is used in many applications such as denoising of sonar images [28], compression, etc. The discrete wavelet transform (DWT) is widely applied in image processing: it decomposes an image into low and high frequency subbands and helps to preserve high frequency detail. The stationary wavelet transform (SWT) decomposes the image into subbands of the same size as the input.
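The following lines sketch the two decompositions in MATLAB (Wavelet Toolbox assumed, with the db1/Haar wavelet chosen only for illustration), continuing with the patch lr from the previous sketches.

% One-level DWT and SWT of the low resolution patch
[LL, LH, HL, HH]     = dwt2(lr, 'db1');      % DWT: four half-size subbands
[sLL, sLH, sHL, sHH] = swt2(lr, 1, 'db1');   % SWT: four subbands of the original size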

3.4 Combined DWT–SWT

In this method the low resolution image is processed in the wavelet domain to obtain the high resolution image. Image quality is improved by interpolating the high frequency (HF) components, and edge detail is enhanced by using the SWT in an intermediate stage. The DWT decomposes the input image into different sub-bands and the HF bands are interpolated. The downsampling in the DWT produces loss of information in the subbands, so the SWT is used to reduce the resulting artifacts; the HF subbands obtained using the SWT are of the same size as the input. The LL sub-band mainly represents the illumination information of the low resolution image and contains less detail. This combined DWT–SWT method preserves the HF components. A minimal sketch of the combined scheme is given below.
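The sketch below follows the general scheme of [10, 17] for a factor-2 enlargement; the wavelet choice and the interpolation method are assumptions, and lr carries over from the earlier sketches.

% Combined DWT-SWT resolution enhancement (sketch, 2x upscaling)
[~, LH, HL, HH]    = dwt2(lr, 'db1');          % half-size detail subbands of the LR image
[~, sLH, sHL, sHH] = swt2(lr, 1, 'db1');       % full-size detail subbands (no downsampling)
LH2 = imresize(LH, 2, 'bicubic') + sLH;        % interpolate DWT HF bands and add SWT detail
HL2 = imresize(HL, 2, 'bicubic') + sHL;
HH2 = imresize(HH, 2, 'bicubic') + sHH;
% the LR image itself is used in place of the LL band, as in [10, 17]
hr_wavelet = idwt2(lr, LH2, HL2, HH2, 'db1');  % inverse DWT gives the 2x enlarged image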

4 Proposed method

The existing methods give blurred edge boundaries and degraded image information; the edges of objects (between foreground and background) are not clearly defined when the resolution becomes poor due to degradation. Hence resolution enhancement is needed prior to object identification and classification. To overcome the disadvantages of the traditional methods, a sparse representation algorithm is proposed, because high dimensional sonar image data exhibits the property of sparsity. The existing resolution enhancement techniques described above are compared with the sparse representation algorithm for super resolution.

4.1 Methodology for super resolution using sparse representation

Sonar images contain high dimensional data and are degraded for the various reasons explained in the previous sections, which clearly establish the need for resolution enhancement. The proposed method uses a sparse representation algorithm to produce super resolution images.

Sparsity is an important property in high-dimensional data analysis and is often easy to interpret; it reduces model complexity and avoids overfitting. In a sparse representation, a signal is expressed as a linear combination of basis vectors with only a few non-zero coefficients. Sparse representation algorithms have been applied to many problems in image processing, such as restoration [29] and denoising [30, 31], with enhancement being the most recent. Researchers have applied the K-SVD method [32] to learn a dictionary [33] from image patches and used it for denoising. Super resolution via sparse representation is employed in this work to give sharper edges and textures of the target.

Super resolution via sparse representation (SRSR) relies on a pair of dictionaries: Dlow, containing low-resolution image patches, and Dhigh, containing the corresponding high-resolution patches [34]. The image is partitioned into patches, each patch is represented by sparse coefficients over the dictionaries, and these dictionaries are then used to obtain the high resolution image.

Let X and Z be the high-resolution image and its corresponding low-resolution image, respectively. The Dhigh dictionary is used to create the super resolution image, since it contains high resolution features, while Dlow contains low resolution features. The reconstruction constraint is Z = SBX, where S is the downsampling operator and B is the blurring filter. A simple way of forming such a dictionary pair is sketched below.
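In this sketch, co-located patch pairs are sampled from a high resolution training image train_hr (an assumed variable); in practice the pair would be learned jointly, for example with K-SVD [32]. The 3 × 3 / 9 × 9 patch sizes follow the algorithm description further below.

% Sketch: sampling a coupled LR/HR patch dictionary pair from a training image
scale    = 3;                                         % 3x3 LR patches map to 9x9 HR patches
train_hr = train_hr(1:scale*floor(end/scale), 1:scale*floor(end/scale));  % crop to a multiple of 3
train_lr = imresize(train_hr, 1/scale, 'bicubic');    % simulate the degradation Z = SBX
K = 512;                                              % number of dictionary atoms (assumed)
[hl, wl] = size(train_lr);
Dlow = zeros(9, K);  Dhigh = zeros(81, K);
for k = 1:K
  r  = randi(hl - 2);  c = randi(wl - 2);             % random 3x3 LR patch location
  zl = train_lr(r:r+2, c:c+2);
  xh = train_hr((r-1)*scale+(1:9), (c-1)*scale+(1:9));% co-located 9x9 HR patch
  Dlow(:, k) = zl(:);  Dhigh(:, k) = xh(:);
end
nrm   = max(vecnorm(Dlow), eps);                      % shared per-atom scaling
Dlow  = Dlow ./ nrm;  Dhigh = Dhigh ./ nrm;           % keep the LR/HR coupling consistent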

In the reconstruction phase, noise in the high resolution estimate is removed and the final super resolution image X∗ is obtained. The entire super-resolution via sparse representation procedure is summarised below.

Super-resolution algorithm via sparse representation

  • Partition the degraded image Z into patches z, where Z is the low resolution image (Z = SBX, S is the downsampling operator and B is the blurring filter).

  • Construct the dictionaries Dlow and Dhigh, jointly denoted as D.

  • Represent the image patches z of the low-resolution image Z using the dictionaries Dlow and Dhigh.

Each patch, together with its overlap with neighbouring patches, is represented by the sparse coefficient vector α obtained from

$$\min \left\| \alpha \right\| \quad {\text{s}}.{\text{t}}.\quad \left\| {FD_{low} \alpha - Fz} \right\|_{2}^{2} \le \epsilon_{1} ,\quad \left\| {LD_{high} \alpha - p} \right\|_{2}^{2} \le \epsilon_{1}$$

where ‘F’ is a feature extraction operator that extracts edges; it is similar to a high pass filter and ensures that the sparse coefficients ‘α’ fit the perceptually important features of the low resolution patch ‘z’. Since some high frequency content is still present in the low resolution image, ‘F’ helps to extract this edge data.

Here ‘z’ is the low resolution patch, ‘p’ contains the previously reconstructed high resolution pixels in the overlap region, ‘L’ extracts the overlap between the previously reconstructed high resolution image and the current patch, and ‘α’ is the sparse coefficient vector.

  • Consider 3 × 3 patches z of Z with an overlap of 1 pixel; the corresponding high resolution patches are 9 × 9 with an overlap of 3 pixels.

For every overlapping patch z of size 3 × 3 of Z (taken from top to bottom and left to right), apply the one-pass algorithm (the one-pass algorithm enhances compatibility between neighbouring patches [35]):

Step 1: Calculate the optimum sparse representation coefficients α∗.

$$\alpha^{*} = \mathop{\arg \min }\limits_{\alpha } \left\| {\hat{z} - D\alpha } \right\|_{2}^{2} + \lambda \left\| \alpha \right\|_{1}$$
(4)

where \(\hat{z} = \left( {\begin{array}{*{20}c} {Fz} \\ p \\ \end{array} } \right)\) and \(D = \left( {\begin{array}{*{20}c} {FD_{low} } \\ {LD_{high} } \\ \end{array} } \right)\), F is the feature extraction operator extracting edges, p contains the previously reconstructed high resolution pixels in the overlap region, L extracts the overlap between the previously reconstructed high resolution image and the current patch, and \(\lambda\) is a regularisation parameter controlling the sparsity, set to \(\lambda = 50 \times {\text{dimension}}\,\left( {{\text{patch}}\,{\text{feature}}} \right)\).

Step 2: Calculate the high-resolution patch x = Dhigh α∗ (reconstruct the high resolution patch). Compute the sparse coefficients for all the patches.

Step 3: Place all the patches x back in their original positions and combine them to form the high-resolution image X0 (a MATLAB sketch of Steps 1–3 is given at the end of this phase).

$$x = D_{high} \,\alpha^{*} \;\; \Rightarrow \;\; X_{0} ,\quad {\text{STOP}}$$
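A compact MATLAB sketch of Steps 1–3 is given below, reusing the dictionaries Dlow and Dhigh and the low resolution image lr from the earlier sketches; the factor of 3 follows from the 3 × 3 to 9 × 9 patch sizes stated above. Two simplifications are assumed: the overlap constraint (L, p) is dropped and overlapping high resolution patches are simply averaged, and Eq. (4) is solved with a plain iterative shrinkage-thresholding (ISTA) loop with a small λ suited to pixel values scaled to [0, 1].

% Patch-wise sparse coding and HR patch assembly (Steps 1-3, simplified)
scale = 3;  lrP = 3;  hrP = scale * lrP;              % 3x3 LR patches -> 9x9 HR patches
[h, w] = size(lr);
Xsum   = zeros(scale*h, scale*w);  Wsum = zeros(scale*h, scale*w);
lambda = 0.1;                                         % regularisation weight (assumed for [0,1] pixels)
for r = 1:lrP-1:h-lrP+1                               % 1-pixel overlap on the LR grid
  for c = 1:lrP-1:w-lrP+1
    z     = lr(r:r+lrP-1, c:c+lrP-1);
    alpha = ista(Dlow, z(:), lambda, 100);            % Step 1: sparse coefficients (Eq. 4)
    x     = reshape(Dhigh * alpha, hrP, hrP);         % Step 2: high resolution patch
    rr    = (r-1)*scale + (1:hrP);  cc = (c-1)*scale + (1:hrP);
    Xsum(rr, cc) = Xsum(rr, cc) + x;                  % Step 3: place the patch back
    Wsum(rr, cc) = Wsum(rr, cc) + 1;
  end
end
X0 = Xsum ./ max(Wsum, 1);                            % average overlapped pixels (uncovered border stays zero)

function alpha = ista(D, y, lambda, iters)
% ISTA for min ||y - D*alpha||_2^2 + lambda*||alpha||_1
  L = norm(D)^2;  alpha = zeros(size(D, 2), 1);
  for k = 1:iters
    v     = alpha - D' * (D * alpha - y) / L;         % gradient step
    alpha = sign(v) .* max(abs(v) - lambda/(2*L), 0); % soft threshold
  end
end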

Reconstruction Phase

This phase removes the noise generated in the previous phase and enforces the global reconstruction constraint.

  • Let the high resolution image obtained above be denoted by the matrix X0.

  • Project X0 onto the solution space of the reconstruction constraint Z = SBX by calculating X∗.

  • The problem is formulated as in Eq. (5) and solved by the back projection method (a sketch of this iteration is given after the output step).

    $$X_{i + 1} = X_{i} + \left( {\left( {Z - SBX_{i} } \right) \uparrow s} \right) * b_{p}$$
    (5)

    where Xi+1 is the high resolution estimate after the ith iteration, ↑s denotes upsampling by a factor of ‘s’, and bp is the back projection filter applied by convolution.

Step 4: Calculate X∗, the super-resolution image, by iterating Eq. (5).

This ensures that the reconstructed image X0 satisfies the global constraint and removes artifacts:

$$X^{*} = \mathop{\arg \min }\limits_{X} \left\| {X - X_{0} } \right\|_{2}^{2} \quad {\text{s}}.{\text{t}}.\;\; Z = SBX$$
(6)
  • Output: X∗. (X∗ is the final super-resolution image).
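A sketch of the reconstruction phase in MATLAB is shown below; the back projection filter, the downsampling factor and the iteration count are assumptions, and lr and X0 carry over from the previous sketches.

% Iterative back projection (Eq. 5) to enforce the global constraint Z = SBX
s  = 3;                                      % downsampling factor (assumed)
bp = fspecial('gaussian', 5, 1);             % back projection / blurring filter (assumed Gaussian)
X  = X0;                                     % start from the sparse-coding output
for i = 1:20
  Zi  = imresize(imfilter(X, bp, 'symmetric'), 1/s, 'bicubic');  % simulate SBX_i
  err = imresize(lr - Zi, s, 'bicubic');                         % (Z - SBX_i) upsampled by s
  X   = X + imfilter(err, bp, 'symmetric');                      % Eq. (5) update
end
Xstar = X;                                   % final super resolution image X*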

5 Performance metrics

To evaluate the performance of the algorithms, PSNR and SSIM are calculated as follows.

5.1 Peak signal to noise ratio (PSNR)

Peak signal to noise ratio (PSNR) is the ratio of the maximum signal power to the noise power and is defined through the mean squared error (MSE). Given a noise-free r × c monochrome image I and its noisy counterpart N, the MSE is given in Eq. (7):

$$MSE = \frac{1}{rc}\mathop \sum \limits_{i = 0}^{r - 1} \mathop \sum \limits_{j = 0}^{c - 1} \left[ {I\left( {i,j} \right) - N\left( {i,j} \right)} \right]^{2}$$
(7)

i.e. the MSE is the squared difference between the output and input images averaged over the r × c (rows × columns) pixels.

The PSNR (in dB) is defined as:

$$PSNR = 10 \log_{10} \frac{{{\text{G}}^{2} }}{MSE}$$
(8)

G is the maximum possible pixel value of the image; when the pixels are represented using 8 bits per sample, G = 255.
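The PSNR of Eqs. (7)–(8) can be computed in MATLAB as below; ref is an assumed noise-free reference image of the same size as the enhanced image Xstar from the earlier sketch.

% PSNR between the enhanced image and a reference, on 8-bit grey levels
G   = 255;
A   = double(im2uint8(Xstar));  Rf = double(im2uint8(ref));
mse = mean((A(:) - Rf(:)).^2);                  % Eq. (7)
psnr_db = 10 * log10(G^2 / mse);                % Eq. (8)
% equivalently: psnr(im2uint8(Xstar), im2uint8(ref)) with the Image Processing Toolbox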

5.2 Structural Similarity Index (SSIM)

The Structural Similarity Index (SSIM) is used for measuring image quality. At a given pixel I, the SSIM between two images img1 and img2 is computed as given in Eq. (9):

$$SSIM = \frac{2\,{\text{m}}1\left( {\text{I}} \right)\,{\text{m}}2\left( {\text{I}} \right) + {\text{C}}1}{{\text{m}}1\left( {\text{I}} \right)^{2} + {\text{m}}2\left( {\text{I}} \right)^{2} + {\text{C}}1} \times \frac{2\,{\text{cov}}\left( {\text{I}} \right) + {\text{C}}2}{\sigma 1\left( {\text{I}} \right)^{2} + \sigma 2\left( {\text{I}} \right)^{2} + {\text{C}}2}$$
(9)

where m1(I) and m2(I) are the means of img1 and img2 computed around I over a small window, \(\sigma\)1(I) and \(\sigma\)2(I) are the corresponding standard deviations, cov(I) is the covariance between img1 and img2, C1 = (K1 · P)² and C2 = (K2 · P)² are regularisation constants with K1, K2 > 0 (typically K1 = 0.01, K2 = 0.03), and P is the range of the pixel values (P = 255 for an 8-bit image).
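The SSIM of Eq. (9) can be obtained with the built-in MATLAB implementation, which uses a default window and the constants given above; Xstar and ref are the variables from the PSNR sketch.

% SSIM between the enhanced image and the reference
ssim_val = ssim(im2uint8(Xstar), im2uint8(ref));
fprintf('PSNR = %.2f dB, SSIM = %.3f\n', psnr_db, ssim_val);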

6 Results and discussion

6.1 Dataset

The acoustic images are acquired with the EdgeTech 4125 series dual frequency side scan sonar. The sonar operates at a frequency of 500 kHz and is used for water surveys; the side scan sonar was deployed at a depth of 20 metres. The data are processed using the Discover software and recorded in the eXtended Triton Format (XTF). The XTF file is then processed using MATLAB and the image is created.

6.2 Experimental results

This work uses real time sonar images, and the software used is MATLAB.

The image was acquired in the Bay of Bengal [11], at latitude 13:07.7935 N and longitude 80:18.1166 E.

Figure 4 illustrates the sonar image data of size 256 × 256 acquired in real time and the resolution enhancement obtained using the following techniques:

Fig. 4

Resolution enhancement on sonar images

  • Bilinear interpolation (a),

  • Bicubic interpolation (b),

  • Discrete wavelet transform (c),

  • Stationary wavelet transform (d),

  • Combined discrete wavelet transform and stationary wavelet transform (e),

  • Super resolution using sparse representation (f).

The image obtained by the sparse representation algorithm gives a clear picture with sharp edges. The performance is assessed using the peak signal to noise ratio (PSNR), which is greater when the resolved image and the source image are alike; a higher value means better resolution, since distortion is reduced after applying super-resolution techniques. Performance is also assessed using the Structural Similarity Index (SSIM).

The real time low resolution sonar images are enhanced to high resolution images of size 512 × 512. Among the different techniques used, the super resolution algorithm using sparse representation provided the highest peak signal to noise ratio (PSNR), as shown in Table 1.

Table 1 PSNR for various resolution enhancement techniques

Figure 5 graphically illustrates the performance of the various resolution enhancement techniques in terms of PSNR. The techniques compared are bilinear interpolation, bicubic interpolation, discrete wavelet transform, stationary wavelet transform, combined discrete and stationary wavelet transform, and super resolution using sparse representation. It is found that the peak signal to noise ratio is highest for the super resolution algorithm using sparse representation. A high PSNR value implies high resolution, so the sparse representation algorithm produces a super resolution image superior to the other techniques.

Fig. 5

Performance metrics of various resolution enhancement techniques (PSNR)

Table 2 shows the Structural Similarity Index values obtained for the resolution enhancement techniques, and Fig. 6 graphically illustrates the performance of the various techniques in terms of SSIM. The techniques compared are bilinear interpolation, bicubic interpolation, discrete wavelet transform, stationary wavelet transform, combined discrete and stationary wavelet transform, and super resolution using sparse representation. SSIM measures the structural similarity between two images; its value lies between −1 and 1 and is close to 1 when the two images are nearly identical.

Table 2 Structural Similarity Index (SSIM)
Fig. 6

Performance metrics of various resolution enhancement techniques (SSIM)

The SSIM for the super resolution algorithm using sparse representation lies in the range 0.8–0.9, which implies that image quality is improved and image details are retained better than with the other techniques.

7 Conclusion and future work

The sparsity of the sonar image is exploited to obtain the super resolution image. Compared with the existing interpolation and wavelet transform techniques, the proposed work performs better in terms of peak signal to noise ratio (PSNR) and Structural Similarity Index (SSIM). The proposed method gives clear, sharp edges with the resolution enhanced to 512 × 512 pixels, which highlights the boundaries of objects. Hence we conclude that super resolution using sparse representation is suitable for side scan sonar images; it provides better pixel resolution and replaces pixels of missing data caused by ambient noise and other degradation factors. Future work is planned to improve the resolution of the images using the fast wavelet transform, which will reduce the computation time, and to analyse target classification accuracy versus resolution enhancement, thereby improving object detection and classification accuracy.