1 Introduction

The range of light intensity in natural scenes is vast: the illumination intensity of sunlight can be 100 million times higher than that of starlight. High dynamic range (HDR) images are therefore widely used to capture more of the information in natural scenes. However, most display devices available to us have a limited dynamic range [21]. HDR tone mapping, or dynamic range compression, is usually required to reproduce HDR images within the dynamic range of standard display devices, so that the details in both dark and bright areas remain as faithfully visible as possible.

In recent years, many HDR image compression methods have been proposed. Roughly, the existing methods can be classified into two categories: global operators and local operators. Global operators apply the same transformation to every pixel, regardless of spatial position. For example, Tumblin and Rushmeier proposed a tone mapping operator in 1993 [25] that uses a single spatially invariant adaptation level for the scene and another for the display. Although it preserves brightness, it loses visibility in high dynamic range scenes [12]. Instead of brightness, Ward [26] and Ferwerda et al. [8] aimed at preserving contrast. They used a scaling factor to transform real-world luminance values into the displayable range, on the assumption that a just noticeable difference (JND) in the real world can be mapped to a JND on the display device. They built their global models on the threshold versus intensity (TVI) function. Unlike Ward's contrast-based scale factor, which only considers photopic lighting conditions, Ferwerda et al. added a scotopic component. These models preserve contrast well but may lose visibility in regions of very high or very low intensity [12]. In 2000, Pattanaik et al. [20] proposed a new time-dependent tone reproduction operator following the framework of Tumblin and Rushmeier. In short, although these global models have low computational cost, they have difficulty preserving details in high-contrast scenes.

In contrast, local operators adapt the mapping function to the statistics and context of local pixels. Compared with global operators, local operators preserve details better. However, a fundamental problem with local operators is the halo artifacts appearing around high-contrast edges. For example, Pattanaik et al. [19] described a comprehensive computational model of human visual system adaptation and spatial vision for realistic tone reproduction. This model can display HDR scenes on conventional display devices, but the dynamic range compression is performed by applying a different gain-control factor to each band-pass, which causes strong halo effects. To solve this problem, Durand et al. [6] presented a method with an edge-preserving filter (the bilateral filter) that integrates local intensities to avoid halo artifacts.

In 2004, Ledda et al. [13] proposed a local model based on the work of Pattanaik et al. [20]. The adaptation part of that method models the mechanisms of human visual adaptation, with the help of a bilateral filter to avoid halo artifacts. Along this line, other works have also attempted to build bio-inspired methods for HDR image rendering and dynamic range compression [9, 14, 28]. Biologically, the human visual system can respond to a huge luminance range thanks to its ability of visual adaptation, mainly dark adaptation and light adaptation. Dark adaptation refers to how the visual system recovers its sensitivity when going from a bright environment to a dark one, while light adaptation refers to the process by which the visual system recovers its sensitivity when going from a dark environment to a bright one. In the retina, cone photoreceptors respond to higher light levels, while rods are highly sensitive to light under dark and dim conditions. Therefore, light and dark adaptation are related to the switch between cones and rods.

In this paper, we propose a new visual adaptation model that compresses the dynamic range of HDR images according to the mechanisms of cones and rods in the retina. We design two separate channels (cone and rod channels) with different adaptation properties to compress the dynamic range of the light and dark information in HDR images. A simple receptive field model is further used to enhance the local contrast and improve the visibility of details. Finally, the compressed HDR image is obtained by converting the fused luminance back to the RGB color space. Experimental results suggest that the proposed retinal adaptation model can effectively compress the dynamic range of HDR images while preserving scene details well.

2 The Proposed Model

In this study, we propose a new dynamic range compression method that models the mechanisms of visual adaptation. The input image is first separated into two luminance distribution maps for the rod and cone channels. The responses of rods and cones are then obtained with the Naka-Rushton adaptation model [16] according to the local luminance of the scene. A local enhancement operator based on the receptive field of ganglion cells is next used to enhance the local contrast in both the cone and rod channels. The cone and rod channels are then adaptively fused into a unified luminance map with compressed dynamic range and enhanced local contrast. The flowchart of dynamic range compression in the proposed model is shown in Fig. 1. Finally, the luminance information is recovered to the RGB space to obtain the color image.

Fig. 1. The flowchart of dynamic range compression in the proposed model. The spatially varying weighting w(x, y) for fusion is computed with Eq. (10).

2.1 Response of Photoreceptors

According to the visual adaptation mechanisms, cone and rod photoreceptors dynamically vary their response range to better adapt to the available luminance of the environment. The model proposed by Naka and Rushton [16] has been widely used to describe the responses of cones [2, 11] and rods [4]. It states that the response of a photoreceptor at any adaptation level can be described as \(R = I^n/(I^n + \sigma ^n)\), where I is the light intensity, \(\sigma \) is the semi-saturation parameter determined by the adaptation level, and n is a sensitivity-control exponent normally in the range of 0.7–1.0 [3, 17]. Thus, dark and light adaptation can be explained by changing the value of \(\sigma \) at varying luminance levels. For example, on a sunny day we cannot see well when we first enter a dark room, because \(\sigma \) is still set at a high adaptation level while I is of low intensity, which makes the response R almost zero. After tens of minutes, \(\sigma \) shifts to a smaller adaptation level, the response R increases, and visual sensitivity is restored. Figure 2 shows the response curves at different adaptation levels obtained by varying the value of \(\sigma \).

Fig. 2. Response to luminance with different adaptation levels.
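To make the adaptation behavior concrete, the following minimal Python sketch evaluates the Naka-Rushton response at two adaptation levels. The exponent n = 0.74 is an arbitrary choice within the 0.7–1.0 range cited above, not a value taken from the referenced studies.

```python
import numpy as np

def naka_rushton(I, sigma, n=0.74):
    """Photoreceptor response R = I^n / (I^n + sigma^n)."""
    I = np.asarray(I, dtype=float)
    return I**n / (I**n + sigma**n)

# The dark-room example: the same low intensity I produces almost no
# response while sigma is still at a high (photopic) adaptation level,
# and a strong response once sigma has dropped to a scotopic level.
I = 1.0
print(naka_rushton(I, sigma=100.0))  # ~0.03: just after entering the dark room
print(naka_rushton(I, sigma=0.1))    # ~0.85: after dark adaptation
```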

HDR image compression mainly refers to compressing the dynamic range of the luminance information. In this work, we first convert the input HDR image (RGB) into two different luminance channels for further processing by rods and cones [19], i.e., the photopic luminance of cones (\(L_{cone}\)) and the scotopic luminance of rods (\(L_{rod}\)):

$$\begin{aligned} L_{cone} = 0.25 \cdot I^R_{in} + 0.67 \cdot I^G_{in} + 0.065 \cdot I^B_{in} \end{aligned}$$
(1)
$$\begin{aligned} L_{rod} = -0.702 \cdot X + 1.039 \cdot Y + 0.433 \cdot Z \end{aligned}$$
(2)

where \(I^R_{in}\), \(I^G_{in}\) and \(I^B_{in}\) are the R, G and B components of the given HDR image, and X, Y and Z are the three components obtained by transforming the original RGB space into the XYZ color space. We can then obtain the cone and rod luminance responses with the Naka-Rushton equation [16] according to

$$\begin{aligned} {R_{cone}}(x,y) = \frac{{L_{cone}^n(x,y)}}{{L_{cone}^n(x,y) + \sigma _{cone}^n(x,y)}} \end{aligned}$$
(3)
$$\begin{aligned} {R_{rod}}(x,y) = \frac{{L_{rod}^n(x,y)}}{{L_{rod}^n(x,y) + \sigma _{rod}^n(x,y)}} \end{aligned}$$
(4)

Following previous work [27], there is an empirical relation between the adaptation parameter \(\sigma \) and the visual intensity level:

$$\begin{aligned} {\sigma _{\mathrm{{cone}}}} = L_{cone}^\alpha \cdot \beta \end{aligned}$$
(5)
$$\begin{aligned} {\sigma _{\mathrm{{rod}}}} = L_{rod}^\alpha \cdot \beta \end{aligned}$$
(6)

In this paper, we experimentally set \(\alpha \) to 0.69 for both the cone and rod channels, while \(\beta \) is a constant that differs between rods and cones, i.e., \(\beta =4\) for the cone channel and \(\beta =2\) for the rod channel.
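The per-pixel computation of Eqs. (1)–(6) can be summarized in a short NumPy sketch, assuming HxWx3 float images. The exponent n = 0.74 is again an arbitrary value within the stated 0.7–1.0 range, and the epsilon clamp that keeps both luminances positive is our own numerical guard, not part of the model.

```python
import numpy as np

def photoreceptor_responses(rgb, xyz, n=0.74, alpha=0.69,
                            beta_cone=4.0, beta_rod=2.0, eps=1e-6):
    """Cone/rod responses of Eqs. (1)-(6) for HxWx3 float images."""
    # Eq. (1): photopic luminance driving the cone channel
    L_cone = 0.25 * rgb[..., 0] + 0.67 * rgb[..., 1] + 0.065 * rgb[..., 2]
    # Eq. (2): scotopic luminance driving the rod channel
    L_rod = -0.702 * xyz[..., 0] + 1.039 * xyz[..., 1] + 0.433 * xyz[..., 2]

    # Keep both luminances strictly positive before exponentiation (our guard).
    L_cone = np.maximum(L_cone, eps)
    L_rod = np.maximum(L_rod, eps)

    # Eqs. (5)-(6): spatially varying semi-saturation (adaptation) levels
    sigma_cone = beta_cone * L_cone**alpha
    sigma_rod = beta_rod * L_rod**alpha

    # Eqs. (3)-(4): Naka-Rushton responses of the two channels
    R_cone = L_cone**n / (L_cone**n + sigma_cone**n)
    R_rod = L_rod**n / (L_rod**n + sigma_rod**n)
    return R_cone, R_rod, L_cone
```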

2.2 Local Enhancement and Fusion

The responses of the cones and rods pass through the retina, which contains bipolar cells, horizontal cells, retinal ganglion cells, etc. In this paper, we focus only on the local enhancement performed by the retinal ganglion cells. We use a difference-of-Gaussians (DOG) model [23] to simulate the receptive field of ganglion cells and enhance the local contrast of the visual scene. The outputs of the ganglion cells (\(G_{cone}\) and \(G_{rod}\)) are computed as:

$$\begin{aligned} {G_{cone}}(x,y) = {R_{cone}}(x,y) * \left( {g(x,y;\sigma _{rf}^c) - k \cdot g(x,y;\sigma _{rf}^s)} \right) \end{aligned}$$
(7)
$$\begin{aligned} {G_{rod}}(x,y) = {R_{rod}}(x,y) * \left( {g(x,y;\sigma _{rf}^c) - k \cdot g(x,y;\sigma _{rf}^s)} \right) \end{aligned}$$
(8)
$$\begin{aligned} g(x,y;\sigma _{rf}^{}) = \frac{1}{{2\pi \sigma _{rf}^2}}\exp ( - \frac{{{x^2} + {y^2}}}{{2\sigma _{rf}^2}}) \end{aligned}$$
(9)

where \(*\) denotes the convolution operator, and \(\sigma _{rf}^c\) and \(\sigma _{rf}^s\) are respectively the standard deviations of the Gaussian-shaped receptive field center and its surround, which are experimentally set to 0.5 and 2.0 in this work. k denotes the sensitivity of the inhibitory annular surround.
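Since convolution is linear, convolving with the DOG kernel of Eqs. (7)–(9) equals the difference of two Gaussian-blurred copies of the response map, which is how the sketch below implements it. The surround gain k = 0.8 is a placeholder of our own, since its value is left unspecified above.

```python
from scipy.ndimage import gaussian_filter

def dog_enhance(R, sigma_c=0.5, sigma_s=2.0, k=0.8):
    """Eqs. (7)-(9): centre-surround (DOG) enhancement of a response map R."""
    center = gaussian_filter(R, sigma=sigma_c)    # receptive-field center
    surround = gaussian_filter(R, sigma=sigma_s)  # inhibitory surround
    return center - k * surround                  # k = 0.8 is an assumed value
```

Using two separable Gaussian filters rather than an explicit 2-D DOG kernel gives the same result at lower cost, which is a common implementation choice rather than part of the model itself.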

To combine the ganglion cells' outputs of the cone and rod channels, we use a sigmoid function to describe the weighting of the cone and rod systems when fusing the two signals:

$$\begin{aligned} w(x,y) = {1 \over {0.2 + {L_{cone}}{{(x,y)}^{ - 0.1}}}} \end{aligned}$$
(10)

where the fixed parameters (i.e., 0.2 and \(-0.1\)) are experimentally set.

From Eq. (10), the value of w(x, y) increases with luminance, which matches the light and dark adaptation mechanisms: in the photopic condition the cone system contributes more, while the rod system is more sensitive in the scotopic range. Thus, the fused luminance response is given by

$$\begin{aligned} {L_{out}}(x,y) = w(x,y) \cdot {G_{cone}}(x,y) + (1 - w(x,y)) \cdot {G_{rod}}(x,y) \end{aligned}$$
(11)
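With the parameters stated above, the fusion of Eqs. (10)–(11) reduces to a few array operations; a minimal sketch, assuming the luminance map has been kept strictly positive as in the earlier sketch:

```python
def fuse_channels(G_cone, G_rod, L_cone):
    """Eqs. (10)-(11): luminance-dependent fusion of cone and rod outputs."""
    # Eq. (10): w grows with luminance, so cones dominate in bright regions
    # and rods in dark ones (L_cone must be > 0 for the negative exponent).
    w = 1.0 / (0.2 + L_cone**(-0.1))
    # Eq. (11): weighted combination of the two enhanced channels
    return w * G_cone + (1.0 - w) * G_rod
```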

2.3 Recovering of the Color Image

To recombine the luminance information into a color image while keeping the color stable, we hold the ratios between the color channels constant before and after compression [10]. Considering that for some input images with high saturation the output may appear oversaturated under this simple rule, we add an exponent s to control the saturation of the output image [21], described as

$$\begin{aligned} I_{out}^c(x,y) = {L_{out}}(x,y){\left( {\frac{{I_{in}^c(x,y)}}{{{L_{in}}(x,y)}}} \right) ^s},{} {} {} {} {} {} {} {} c \in \{ R,G,B\} \end{aligned}$$
(12)

where \(I_{out}^c(x,y), c \in \{ R,G,B\}\) are the RGB channels of the output image, and \({L_{in}}(x,y)\) is taken to be \({L_{cone}}(x,y)\). The exponent s is a parameter between 0 and 1, and we set \(s=0.8\) for most scenes.
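A sketch of the color recovery in Eq. (12); the small epsilon added to the denominator is our own guard against division by zero, not part of the formula.

```python
import numpy as np

def recover_color(rgb_in, L_in, L_out, s=0.8, eps=1e-6):
    """Eq. (12): rebuild RGB from the compressed luminance map."""
    # Per-channel chromatic ratios of the input, kept (up to the exponent s)
    # so that color stays stable while the luminance is replaced.
    ratios = rgb_in / (L_in[..., None] + eps)
    return L_out[..., None] * ratios**s
```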

Fig. 3. Comparison of the results with local algorithms on the indoor “office” image.

3 Experimental Results

In this section, we evaluated the performance of the proposed method by comparing it with several existing algorithms. We considered two representative local algorithms, proposed by Durand and Dorsey [18] and Meylan et al. [15], and one typical global algorithm, proposed by Pattanaik et al. [20]. The model of Durand et al. uses a bilateral filter to integrate local intensity. The algorithm proposed by Meylan et al. is an adaptive Retinex model. The method of Pattanaik et al. aims at simulating certain mechanisms of the human visual system, but with a global operation.

Fig. 4. Comparison of the results with local algorithms on indoor scenes.

Fig. 5. Comparison of the results with local algorithms on four outdoor scenes.

Fig. 6. Comparison of the results with a global algorithm.

Experimental results are shown in Figs. 3, 4, 5, 6 and 7. In Fig. 3, we tested the methods on an indoor HDR scene named “office”. Figure 3(a)–(d) lists the results for the whole image (first row), the local details in the dark area (second row), and the local details in the bright area (third row). For the dark regions, our results and Meylan's are better than those of Durand's method. For the bright region, our method achieves good luminance compression and preserves more details. The last row (Fig. 3(e)) shows different exposure levels of the HDR image; it clearly shows that our model and Durand's method keep the wall slightly blue, similar to the original HDR image, while Meylan's result has a serious color cast, e.g., the wall even appears slightly yellow. Two other indoor HDR scenes are shown in Fig. 4. Durand's model produces serious contrast reversal artifacts around high-contrast edges; for instance, the edges of the ceiling lamps in the top image of Fig. 4 are distorted.

We also selected four outdoor HDR scenes for further comparison. The results are shown in Fig. 5. In some situations Durand's algorithm loses information, while our method preserves clearer details in both dark and bright areas, with less color cast.

Fig. 7. Comparison on Napa Valley and Hotel Room from MPII. In each scene, from top to bottom and left to right, are the original HDR image, our result, and the results of Ashikhmin [1], Drago [5], Durand [6], Fattal [7], Reinhard [22], Drago's Retinex, Tumblin [24] and Ward [26], respectively.

Table 1. Quantitative comparison on images from MPII with entropy as the metric.

We also compared our model with the method of Pattanaik et al. [20], a global tone reproduction operator (shown in Fig. 6). The results of Pattanaik et al. are taken from pfstools (http://pfstools.sourceforge.net/tmo_gallery/). We can see that Pattanaik's results lose much color information and have lower contrast.

We further compared our method with Ashikhmin's tone mapping algorithm [1], Drago's adaptive logarithmic mapping [5], Durand's bilateral filtering [6], Fattal's gradient domain compression [7], Reinhard's photographic tone reproduction [22], Tumblin's fovea interactive method [24] and Ward's contrast-based operator [26]. The results of these methods were downloaded from the Max Planck Institut Informatik (MPII) (http://resources.mpi-inf.mpg.de/tmo/NewExperiment/TmoOverview.html). Two examples, the Napa Valley and Hotel Room scenes from the MPII dataset, are shown in Fig. 7. We calculated entropy as a quantitative metric. The entropy values of our method and the seven compared methods on five scenes are listed in Table 1. A higher entropy score indicates richer information content in an image and hence better performance of the method.
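The exact entropy computation is not spelled out above; a common choice, and the one sketched below, is the Shannon entropy of the grayscale intensity histogram (in bits), with 256 bins assumed.

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy of an image's intensity histogram, in bits."""
    hist, _ = np.histogram(gray, bins=bins)
    p = hist / hist.sum()  # normalize counts to a probability distribution
    p = p[p > 0]           # drop empty bins so log2 is well defined
    return float(-np.sum(p * np.log2(p)))
```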

4 Conclusion

In this paper, we proposed a visual adaptation model based on retinal adaptation mechanisms to compress the dynamic range of HDR images. The model considers the sensitivity changes underlying the light and dark adaptation mechanisms as well as the receptive field property of retinal ganglion cells. We compared our results with several typical compression operators on both indoor and outdoor scenes. In general, the proposed model can efficiently compress the dynamic range and enhance the local contrast. Considering that high-level information in the visual system can often guide low-level visual processing, in future work we will consider incorporating global visual effects when compressing the dynamic range.