A general framework for image fusion based on multi-scale transform and sparse representation
Introduction
In recent years, image fusion has become an important topic in the image processing community. The goal of image fusion is to generate a composite image by integrating the complementary information from multiple source images of the same scene [1]. For an image fusion system, the input source images can be acquired either from different types of imaging sensors or from a single sensor whose optical parameters can be varied, and the output, called the fused image, will be more suitable for human or machine perception than any individual source image. Image fusion techniques have been widely employed in applications such as computer vision, surveillance, medical imaging, and remote sensing.
Multi-scale transform (MST) theories are the most popular tools used in various image fusion scenarios such as multi-focus image fusion, visible-infrared image fusion, and multimodal medical image fusion. Classical MST-based fusion methods include pyramid-based ones like the Laplacian pyramid (LP) [2], ratio of low-pass pyramid (RP) [3] and gradient pyramid (GP) [4], wavelet-based ones like the discrete wavelet transform (DWT) [5], stationary wavelet transform (SWT) [6] and dual-tree complex wavelet transform (DTCWT) [7], and multi-scale geometric analysis (MGA)-based ones like the curvelet transform (CVT) [8] and nonsubsampled contourlet transform (NSCT) [9]. In general, MST-based fusion methods consist of the following three steps [10]. First, decompose the source images into a multi-scale transform domain. Then, merge the transformed coefficients with a given fusion rule. Finally, reconstruct the fused image by performing the corresponding inverse transform over the merged coefficients. These methods assume that the underlying salient information of the source images can be extracted from the decomposed coefficients. Obviously, the selection of the transform domain plays a crucial role in these methods. A comparative study of different MST-based methods is reported in [11], where Li et al. found that the NSCT-based method can generally achieve the best results. In addition to the selection of the transform domain, the fusion rule applied in each high-pass or low-pass band also has a great impact on the fused results. Conventionally, the absolute value of a high-pass coefficient is used as the activity level measurement for high-pass fusion. The simplest rule is selecting the coefficient with the largest absolute value at each pixel position (the "max-absolute" rule). Many improved high-pass fusion rules that make use of neighboring coefficients' information have also been developed.
However, compared with the great concentration on developing effective rules for high-pass fusion, less attention has been paid to the fusion of low-pass bands. In most MST-based fusion methods, the low-pass bands are just simply merged by averaging all the source inputs (the “averaging” rule).
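The three-step MST pipeline, with the "max-absolute" rule for high-pass bands and the "averaging" rule for the low-pass band, can be sketched in a few lines of numpy. This is a minimal illustration, not the code evaluated in the paper: a 3x3 box blur stands in for the usual Gaussian kernel, and nearest-neighbor resampling for the usual interpolation.

```python
import numpy as np

def blur(img):
    # 3x3 box blur with edge padding (a crude stand-in for a Gaussian kernel)
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def laplacian_pyramid(img, levels):
    # Step 1: decompose into `levels` high-pass bands plus a low-pass residual
    pyr, cur = [], img.astype(float)
    for _ in range(levels):
        low = blur(cur)[::2, ::2]                              # blur + downsample
        up = np.repeat(np.repeat(low, 2, 0), 2, 1)[:cur.shape[0], :cur.shape[1]]
        pyr.append(cur - up)                                   # high-pass (detail) band
        cur = low
    pyr.append(cur)                                            # low-pass residual
    return pyr

def reconstruct(pyr):
    # Step 3: invert the decomposition by upsampling and adding back details
    cur = pyr[-1]
    for high in reversed(pyr[:-1]):
        up = np.repeat(np.repeat(cur, 2, 0), 2, 1)[:high.shape[0], :high.shape[1]]
        cur = up + high
    return cur

def fuse_lp(a, b, levels=3):
    # Step 2: merge coefficients band by band
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    fused = [np.where(np.abs(ha) >= np.abs(hb), ha, hb)        # "max-absolute" rule
             for ha, hb in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))                      # "averaging" rule
    return reconstruct(fused)
```

Because each high-pass band stores exactly the difference removed at that level, `reconstruct(laplacian_pyramid(x, k))` recovers `x` exactly, so all fusion effects come from the merging rules alone.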
Sparse representation (SR) addresses the signals' natural sparsity, which is in accord with the physiological characteristics of the human visual system [12]. The basic assumption behind SR is that a signal s ∈ R^n can be approximately represented by a linear combination of a "few" atoms from an overcomplete dictionary D ∈ R^{n×m}, where n is the signal dimension and m is the dictionary size (m > n). That is, the signal can be expressed as s ≈ Dα, where α ∈ R^m is the unknown sparse coefficient vector. As the dictionary is overcomplete, there are numerous feasible solutions for this underdetermined system. The target of SR is to calculate the sparsest α, i.e., the one containing the fewest nonzero entries among all feasible solutions (a process known as sparse coding). In SR-based image processing methods, sparse coding is often performed on local image patches for the sake of algorithm stability and efficiency [13]. Yang and Li [14] first introduced SR into image fusion. The sliding window technique (patches are overlapped) is adopted in their method to make the fusion process more robust to noise and mis-registration. In [14], the l1-norm of the sparse coefficient vector is used as the activity level measurement. Particularly, among all the source sparse vectors, the one with the maximal l1-norm is selected as the fused sparse vector (the "max-L1" rule). The fused image is finally reconstructed from all the fused sparse vectors. Their experimental results show that the SR-based fusion method has clear advantages over traditional MST-based methods for multi-focus image fusion and can lead to state-of-the-art results. In the past few years, SR-based fusion has emerged as an active new branch of image fusion research, with many improved approaches being proposed [15], [16], [17], [18].
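The patch-level SR scheme with the "max-L1" rule can likewise be sketched. This is a simplified illustration under stated assumptions, not the authors' implementation: a bare-bones orthogonal matching pursuit replaces the solver and learned dictionary used in practice, and only a single patch pair is fused.

```python
import numpy as np

def omp(D, s, k):
    """Greedy orthogonal matching pursuit selecting at most k atoms of D."""
    residual, idx = s.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in idx:
            break                                  # no new atom improves the fit
        idx.append(j)
        # re-fit all selected atoms jointly (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, idx], s, rcond=None)
        residual = s - D[:, idx] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[idx] = coef
    return alpha

def fuse_patch(pa, pb, D, k=4):
    # sparse-code both source patches, then apply the "max-L1" rule:
    # keep the sparse vector with the larger l1-norm (higher activity level)
    aa, ab = omp(D, pa, k), omp(D, pb, k)
    a_fused = aa if np.abs(aa).sum() >= np.abs(ab).sum() else ab
    return D @ a_fused                             # reconstruct the fused patch
```

In a full method, overlapping patches extracted by a sliding window would each be fused this way and averaged back into the output image.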
Although both the MST- and SR-based methods have achieved great success in image fusion, it is worth noticing that both of them have defects, which will be further discussed in this paper. To overcome these disadvantages, we present a general image fusion framework that exploits the complementary advantages of MST and SR. Specifically, the low-pass MST bands are merged with an SR-based fusion approach, while the high-pass MST bands are fused using the conventional "max-absolute" rule with a local window based consistency verification scheme [5]. To verify the effectiveness of the proposed framework, six popular multi-scale transforms (MSTs), namely LP, RP, DWT, DTCWT, CVT and NSCT, with decomposition levels ranging from one to four, are tested in our experiments. By comparing the fused results subjectively and objectively, we identify the best-performing methods under the proposed framework for the fusion of multi-focus, visible-infrared and medical images, respectively. The effect of the sliding window's step length is also investigated. Experimental results demonstrate that the combined methods can clearly outperform both the MST- and SR-based methods. Furthermore, the proposed fusion methods can obtain state-of-the-art fused results, especially for the fusion of medical images as well as visible-infrared images.
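As a small illustration of the high-pass rule used in the framework, the sketch below applies the "max-absolute" rule followed by a window-based majority vote, one common form of consistency verification; the actual scheme in the paper may differ in detail.

```python
import numpy as np

def fuse_highpass(ha, hb, win=3):
    # "max-absolute" rule: binary decision map, 1 where source A wins
    dm = (np.abs(ha) >= np.abs(hb)).astype(float)
    # consistency verification: replace each decision by the majority vote
    # inside a win x win neighborhood, removing isolated (likely noisy) choices
    p = np.pad(dm, win // 2, mode="edge")
    h, w = dm.shape
    votes = sum(p[i:i + h, j:j + w] for i in range(win) for j in range(win))
    dm = (votes > win * win / 2).astype(float)
    return dm * ha + (1.0 - dm) * hb
```

A single pixel where the "wrong" source happens to have the larger coefficient, e.g. due to noise, is outvoted by its neighbors and flipped, which keeps the decision map spatially consistent.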
The rest of this paper is organized as follows. We first present the detailed fusion framework in Section 2. In Section 3, the disadvantages of MST- and SR-based methods and why the proposed framework can overcome them are discussed from a theoretical perspective. The experimental results are given in Section 4. Section 5 summarizes some main conclusions of this paper.
Section snippets
Proposed fusion framework
To better exhibit the advantages of the proposed framework over MST- and SR-based methods, we first present the details of our framework in this section.
Why the proposed framework works
In this section, for each of the MST- and SR-based fusion methods, we first itemize its main defects and then show why the proposed framework can overcome them. All the points given here will be further experimentally verified in Section 4.
Source images
As shown in Fig. 3, 26 pairs of source images grouped into three categories are employed to verify the effectiveness of the proposed fusion framework. Among them, there are 10 pairs of multi-focus images (Fig. 3(a)), 8 pairs of visible-infrared images (Fig. 3(b)) and 8 pairs of medical images (Fig. 3(c)). For each pair, the two source images are assumed to be pre-registered in our study.
Objective evaluation metrics
It is not an easy task to quantitatively evaluate the quality of a fused image since the reference image (i.e., the ground truth) is usually not available in practice.
Conclusion
In this paper, we present a general image fusion framework with multi-scale transform (MST) and sparse representation (SR). In the framework, the low-pass MST bands are merged with the SR-based scheme while the high-pass bands are fused using the conventional "max-absolute" rule. The advantages of the proposed fusion framework over conventional MST- and SR-based methods are first analyzed theoretically, and then experimentally verified. In our experiments, six popular multi-scale transforms (LP, RP, DWT, DTCWT, CVT and NSCT) with different decomposition levels are tested, and the combined methods are shown to clearly outperform both the individual MST- and SR-based methods.
Acknowledgements
The authors first sincerely thank the editors and anonymous reviewers for their constructive comments and suggestions, which are of great value to us. The authors would also like to thank Prof. Shutao Li and Dr. Xudong Kang from Hunan University (China), Dr. Xiaobo Qu from Xiamen University (China), and Prof. Zheng Liu from Toyota Technological Institute (Japan) for generously providing some source images and codes used in the publications [11], [24], [27], [30]. This work is supported by the
References (30)
Image fusion by a ratio of low pass pyramid, Pattern Recogn. Lett. (1989)
Multisensor image fusion using the wavelet transform, Graph. Models Image Process. (1995)
Pixel- and region-based image fusion with complex wavelets, Inform. Fusion (2007)
Remote sensing image fusion using the curvelet transform, Inform. Fusion (2007)
Multifocus image fusion using the nonsubsampled contourlet transform, Signal Process. (2009)
A general framework for multiresolution image fusion: from pixels to regions, Inform. Fusion (2003)
Performance comparison of different multi-resolution transforms for image fusion, Inform. Fusion (2011)
Pixel-level image fusion with simultaneous orthogonal matching pursuit, Inform. Fusion (2012)
Simultaneous image fusion and super-resolution using sparse representation, Inform. Fusion (2013)
Multi-frame compression: theory and design, Signal Process. (2000)
Image fusion: advances in the state of the art, Inform. Fusion
The Laplacian pyramid as a compact image code, IEEE Trans. Commun.
Gradient-based multiresolution image fusion, IEEE Trans. Image Process.
Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature