Information Fusion

Volume 24, July 2015, Pages 147-164

A general framework for image fusion based on multi-scale transform and sparse representation

https://doi.org/10.1016/j.inffus.2014.09.004

Highlights

  • Includes discussion on multi-scale transform (MST) based image fusion methods.

  • Includes discussion on sparse representation (SR) based image fusion methods.

  • Presents a general image fusion framework with MST and SR.

  • Introduces several promising image fusion methods under the proposed framework.

  • Provides a new image fusion toolbox.

Abstract

In the image fusion literature, multi-scale transform (MST) and sparse representation (SR) are the two most widely used signal/image representation theories. This paper presents a general image fusion framework that combines MST and SR to simultaneously overcome the inherent defects of both the MST- and SR-based fusion methods. In our fusion framework, the MST is first performed on each of the pre-registered source images to obtain their low-pass and high-pass coefficients. Then, the low-pass bands are merged with an SR-based fusion approach, while the high-pass bands are fused using the absolute values of the coefficients as the activity level measurement. The fused image is finally obtained by performing the inverse MST on the merged coefficients. The advantages of the proposed fusion framework over individual MST- or SR-based methods are first exhibited in detail from a theoretical point of view, and then experimentally verified with multi-focus, visible-infrared and medical image fusion. In particular, six popular multi-scale transforms, namely the Laplacian pyramid (LP), ratio of low-pass pyramid (RP), discrete wavelet transform (DWT), dual-tree complex wavelet transform (DTCWT), curvelet transform (CVT) and nonsubsampled contourlet transform (NSCT), with decomposition levels ranging from one to four, are tested in our experiments. By comparing the fused results subjectively and objectively, we identify the best-performing fusion method under the proposed framework for each category of image fusion. The effect of the sliding window's step length is also investigated. Furthermore, experimental results demonstrate that the proposed fusion framework can achieve state-of-the-art performance, especially for the fusion of multimodal images.

Introduction

In recent years, image fusion has become an important topic in the image processing community. The goal of image fusion is to generate a composite image by integrating the complementary information from multiple source images of the same scene [1]. For an image fusion system, the input source images can be acquired either from different types of imaging sensors or from a single sensor whose optical parameters can be changed; the output, known as the fused image, is more suitable for human or machine perception than any individual source image. Image fusion techniques have been widely employed in many applications such as computer vision, surveillance, medical imaging, and remote sensing.

Multi-scale transform (MST) theories are the most popular tools used in various image fusion scenarios such as multi-focus image fusion, visible-infrared image fusion, and multimodal medical image fusion. Classical MST-based fusion methods include pyramid-based ones like the Laplacian pyramid (LP) [2], ratio of low-pass pyramid (RP) [3] and gradient pyramid (GP) [4], wavelet-based ones like the discrete wavelet transform (DWT) [5], stationary wavelet transform (SWT) [6] and dual-tree complex wavelet transform (DTCWT) [7], and multi-scale geometric analysis (MGA)-based ones like the curvelet transform (CVT) [8] and nonsubsampled contourlet transform (NSCT) [9]. In general, MST-based fusion methods consist of three steps [10]. First, decompose the source images into a multi-scale transform domain. Then, merge the transformed coefficients with a given fusion rule. Finally, reconstruct the fused image by performing the corresponding inverse transform on the merged coefficients. These methods assume that the underlying salient information of the source images can be extracted from the decomposed coefficients, so the selection of the transform domain plays a crucial role. A comparative study of different MST-based methods is reported in [11], where Li et al. found that the NSCT-based method generally achieves the best results. In addition to the choice of transform domain, the fusion rules applied to the high-pass and low-pass bands also have a great impact on the fused results. Conventionally, the absolute value of a high-pass coefficient is used as the activity level measurement for high-pass fusion. The simplest rule selects, at each pixel position, the coefficient with the largest absolute value (the “max-absolute” rule), and many improved high-pass rules that exploit information from neighboring coefficients have also been developed. Compared with this concentration of effort on high-pass fusion, however, much less attention has been paid to the fusion of the low-pass bands: in most MST-based fusion methods, they are simply merged by averaging the source inputs (the “averaging” rule).
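As a concrete illustration of these three steps, below is a minimal sketch (not the paper's implementation) of conventional MST-based fusion, using the Laplacian pyramid as the transform, the “max-absolute” rule for the high-pass bands and the “averaging” rule for the low-pass band. It relies on OpenCV's pyrDown/pyrUp; the helper names and the default of three decomposition levels are illustrative choices.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Build a Laplacian pyramid; the last element is the low-pass residual."""
    pyr = []
    cur = img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)   # high-pass band at this scale
        cur = down
    pyr.append(cur)            # low-pass (approximation) band
    return pyr

def fuse_lp(img_a, img_b, levels=3):
    """Conventional LP fusion: max-absolute for high-pass, averaging for low-pass."""
    pa, pb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    fused = []
    for ha, hb in zip(pa[:-1], pb[:-1]):
        # "max-absolute" rule: keep the coefficient with the larger magnitude
        fused.append(np.where(np.abs(ha) >= np.abs(hb), ha, hb))
    # "averaging" rule on the low-pass band
    fused.append(0.5 * (pa[-1] + pb[-1]))
    # Inverse transform: upsample and add from coarse to fine
    out = fused[-1]
    for h in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=(h.shape[1], h.shape[0])) + h
    return out
```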

Sparse representation (SR) models the natural sparsity of signals, which accords with the physiological characteristics of the human visual system [12]. The basic assumption behind SR is that a signal $x \in \mathbb{R}^n$ can be approximately represented by a linear combination of a “few” atoms from an overcomplete dictionary $D \in \mathbb{R}^{n \times m}$ ($n < m$), where $n$ is the signal dimension and $m$ is the dictionary size. That is, the signal can be expressed as $x \approx D\alpha$, where $\alpha \in \mathbb{R}^m$ is the unknown sparse coefficient vector. As the dictionary is overcomplete, this underdetermined system has numerous feasible solutions; the target of SR is to find the sparsest $\alpha$, i.e., the one with the fewest nonzero entries among all feasible solutions (a process known as sparse coding). In SR-based image processing methods, sparse coding is often performed on local image patches for the sake of algorithm stability and efficiency [13]. Yang and Li [14] first introduced SR into image fusion. Their method adopts the sliding window technique (overlapping patches) to make the fusion process more robust to noise and mis-registration. In [14], the sparse coefficient vector is used as the activity level measurement: among all the source sparse vectors, the one with the maximal $\ell_1$-norm is selected as the fused sparse vector (the “max-L1” rule), and the fused image is finally reconstructed from all the fused sparse vectors. Their experimental results show that the SR-based fusion method has clear advantages over traditional MST-based methods for multi-focus image fusion and can lead to state-of-the-art results. In the past few years, SR-based fusion has emerged as an active branch of image fusion research, with many improved approaches being proposed [15], [16], [17], [18].
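A bare-bones version of this patch-based scheme might look as follows, using scikit-learn's orthogonal matching pursuit for the sparse coding step. The dictionary D (e.g., DCT-based or K-SVD-trained), the 8×8 patch size, the sparsity level k and the handling of patch means are all assumptions for illustration; Yang and Li's actual method differs in such details.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sr_fuse(img_a, img_b, D, patch=8, step=1, k=8):
    """Patch-wise SR fusion with the "max-L1" rule.

    D is an overcomplete dictionary of shape (patch*patch, m),
    e.g. learned with K-SVD or built from DCT atoms (assumed given).
    """
    H, W = img_a.shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            pa = img_a[i:i+patch, j:j+patch].reshape(-1).astype(np.float64)
            pb = img_b[i:i+patch, j:j+patch].reshape(-1).astype(np.float64)
            ma, mb = pa.mean(), pb.mean()   # patch means handled separately
            aa = orthogonal_mp(D, pa - ma, n_nonzero_coefs=k)
            ab = orthogonal_mp(D, pb - mb, n_nonzero_coefs=k)
            # "max-L1" rule: keep the sparse vector with the larger l1-norm
            if np.abs(aa).sum() >= np.abs(ab).sum():
                rec = D @ aa + ma
            else:
                rec = D @ ab + mb
            acc[i:i+patch, j:j+patch] += rec.reshape(patch, patch)
            cnt[i:i+patch, j:j+patch] += 1
    return acc / np.maximum(cnt, 1)         # average overlapping patches
```

In practice the dictionary columns should be $\ell_2$-normalized for OMP, and the step length of the sliding window trades fusion quality against run time, a point the paper returns to in its experiments.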

Although both the MST- and SR-based methods have achieved great success in image fusion, it is worth noting that each has inherent defects, which are discussed further in this paper. To overcome these disadvantages, we present a general image fusion framework that exploits the complementary advantages of MST and SR. Specifically, the low-pass MST bands are merged with an SR-based fusion approach, while the high-pass MST bands are fused using the conventional “max-absolute” rule with a local-window-based consistency verification scheme [5]. To verify the effectiveness of the proposed framework, six popular multi-scale transforms (MSTs), namely LP, RP, DWT, DTCWT, CVT and NSCT, with decomposition levels ranging from one to four, are tested in our experiments. By comparing the fused results subjectively and objectively, we identify the best-performing methods under the proposed framework for the fusion of multi-focus, visible-infrared and medical images, respectively. The effect of the sliding window's step length is also investigated. Experimental results demonstrate that the combined methods clearly outperform both the MST- and SR-based methods, and that the proposed fusion methods achieve state-of-the-art results, especially for the fusion of medical and visible-infrared images.

The rest of this paper is organized as follows. Section 2 presents the detailed fusion framework. Section 3 discusses the disadvantages of the MST- and SR-based methods and explains, from a theoretical perspective, why the proposed framework can overcome them. Section 4 reports the experimental results, and Section 5 summarizes the main conclusions.

Section snippets

Proposed fusion framework

To better exhibit the advantages of the proposed framework over MST- and SR-based methods, we first present the details of our framework in this section.
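Putting the two pieces together, the framework can be sketched as below, reusing the laplacian_pyramid and sr_fuse helpers sketched in the introduction. The LP stands in for any of the six transforms studied here, and the consistency-verification step applied to the high-pass bands in the actual method is omitted for brevity; this is an illustrative sketch, not the paper's reference code.

```python
def mst_sr_fuse(img_a, img_b, D, levels=3):
    """Hybrid fusion: SR ("max-L1") on the low-pass band,
    "max-absolute" on the high-pass bands."""
    pa = laplacian_pyramid(img_a.astype(np.float32), levels)
    pb = laplacian_pyramid(img_b.astype(np.float32), levels)
    # High-pass bands: pick the coefficient with the larger absolute value
    fused = [np.where(np.abs(ha) >= np.abs(hb), ha, hb)
             for ha, hb in zip(pa[:-1], pb[:-1])]
    # Low-pass band: patch-wise sparse-representation fusion
    fused.append(sr_fuse(pa[-1], pb[-1], D))
    # Inverse transform: upsample and add from coarse to fine
    out = fused[-1]
    for h in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=(h.shape[1], h.shape[0])) + h
    return out
```

Swapping in another transform only changes the decomposition and reconstruction routines; the two fusion rules are unchanged.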

Why the proposed framework works

In this section, for each of the MST- and SR-based fusion methods, we first itemize its main defects and then show why the proposed framework can overcome them. All the points given here will be further experimentally verified in Section 4.

Source images

As shown in Fig. 3, 26 pairs of source images grouped into three categories are employed to verify the effectiveness of the proposed fusion framework. Among them, there are 10 pairs of multi-focus images (Fig. 3(a)), 8 pairs of visible-infrared images (Fig. 3(b)) and 8 pairs of medical images (Fig. 3(c)). For each pair, the two source images are assumed to be pre-registered in our study.

Objective evaluation metrics

It is not an easy task to quantitatively evaluate the quality of a fused image, since a reference image (ground truth) is generally unavailable in practice.
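Fusion quality is therefore typically assessed with reference-free metrics computed from the fused image and the sources. As one illustrative example (not necessarily among the exact metrics used in this paper), the widely used mutual-information score rewards a fused image that retains much of the information of both inputs; a minimal implementation might be:

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Mutual information between two images via their joint histogram."""
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def fusion_mi(fused, src_a, src_b):
    """MI-based fusion score: information the fused image retains from each source."""
    return mutual_information(fused, src_a) + mutual_information(fused, src_b)
```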

Conclusion

In this paper, we present a general image fusion framework based on multi-scale transform (MST) and sparse representation (SR). In the framework, the low-pass MST bands are merged with the SR-based scheme, while the high-pass bands are fused using the conventional “max-absolute” rule. The advantages of the proposed fusion framework over conventional MST- and SR-based methods are first analyzed theoretically, and then experimentally verified. In our experiments, six popular multi-scale transforms (LP, RP, DWT, DTCWT, CVT and NSCT) with decomposition levels ranging from one to four are tested.

Acknowledgements

The authors first sincerely thank the editors and anonymous reviewers for their constructive comments and suggestions, which are of great value to us. The authors would also like to thank Prof. Shutao Li and Dr. Xudong Kang from Hunan University (China), Dr. Xiaobo Qu from Xiamen University (China), and Prof. Zheng Liu from Toyota Technological Institute (Japan) for generously providing some source images and codes used in the publications [11], [24], [27], [30]. This work is supported by the

References (30)

  • A. Goshtasby et al., Image fusion: advances in the state of the art, Inform. Fusion (2007).

  • P. Burt et al., The Laplacian pyramid as a compact image code, IEEE Trans. Commun. (1983).

  • V. Petrovic et al., Gradient-based multiresolution image fusion, IEEE Trans. Image Process. (2004).

  • M. Beaulieu, S. Foucher, L. Gagnon, Multi-spectral image resolution refinement using stationary wavelet transform, in:...

  • B.A. Olshausen et al., Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature (1996).