1 Introduction

The human visual system can handle massively different levels in input brightness. This is necessary to cope with the large range of luminance levels that appear around us—for us to be able to navigate and operate in dim night light as well as in bright sun light. The field of high dynamic range (HDR) imaging tries to address the problem of capturing and displaying these large ranges using devices (cameras and displays) with limited dynamic range. During the last years, the HDR field has grown, and today, many camera devices have built in functionality for acquiring HDR images. This can be done in hardware using sensors with pixels that can capture very large differences in dynamic range. It can also be done by taking several low dynamic range (LDR) images at different exposures and then combining them using software [10, 19, 45]. Recently the capability of recording HDR video data has arisen. One important part when working with HDR data is the ability to visualize it on LDR displays. The process of transferring an HDR image to an LDR image is known as tone mapping. Depending on the application the role of the tone mapping operator (TMO) can be different, but in most applications the ability to capture both detail in darker areas and very bright ones is important. Tone mapping can also be an important component in image enhancement for, e.g., images taken under poor lighting [34, 59].

In [15] a number of criteria that an ideal TMO should exhibit are listed, namely

  1. 1.

    Temporal model free from artifacts such as flickering, ghosting and disturbing (too noticeable) temporal color changes.

  2. 2.

    Local processing to achieve sufficient dynamic range compression in all circumstances while maintaining a good level of detail and contrast.

  3. 3.

    Efficient algorithms, since large amount of data need processing, and turnaround times should be kept as short as possible.

  4. 4.

    No need for parameter tuning.

  5. 5.

    Calibration of input data should be kept to a minimum, e.g., without the need of considering scaling of data.

  6. 6.

    Capability of generating high-quality results for a wide range of video inputs with highly different characteristics.

  7. 7.

    Explicit treatment of noise and color.

We will in this paper present a new framework for doing automatic tone mapping of HDR image and video data. Our procedure addresses all criteria except 2 above. Our approach is spatially global but temporally local, to ensure temporal consistency. Even though we do not do any local processing in the image domain, we believe that our approach gives a high level of contrast. Depending on how the HDR data were constructed or recorded, there may be a need for noise reduction of the data. We do not consider this aspect in this paper, but concentrate solely on the tone mapping.

We will look at the tone mapping problem as a clustering problem. If we are given an input image with large dynamic range, i.e., with a large range of intensity values, we want to map these intensity values to a much smaller range. We can describe this problem as a clustering of the input intensity levels into a smaller set of output levels. In our setting we are looking at an HDR input with a very large input discretization, and performing clustering in three dimensions is not tractable. Iterative local algorithms will inevitably lead to local minima. Instead we work with only the luminance channel, and we show how we can find the global optimum using dynamic programming. This leads to a very efficient and stable tone mapping algorithm. We call our algorithm democratic tone mapping since all input pixels get to vote on which output levels we should use. The method has a very natural extension to handle HDR video input. The material presented here is partly based on the conference paper [38].

1.1 Related Work

A large number of tone mapping algorithms have been proposed over the years. Some of the first work include [53, 58]. One can divide the tone mapping algorithms into global algorithms, that apply a global transformation on the pixel intensities, and local algorithms, where the transformation also depends on the spatial structure in the image. For a discussion on the differences see [60]. The global algorithms include simply applying some fixed function such as a logarithm or a power function. In [12] the authors present a method that adapts a logarithmic function to mimic the human visual system’s response to HDR input. In [27] the image histogram is used and a variant of histogram equalization is applied but with additional properties based on ideas from human perception. Using histogram equalization will often lead to not efficiently using the colorspace, due to discretization effects.

The local algorithms usually apply some form of local filtering to be able to increase contrast locally. This often comes at the cost of higher computational complexity and can lead to strange artifacts. In [13] the authors use bilateral filtering to steer the local tone mapping. In [37] a perceptual model is used to steer the contrast mapping, which is performed in the gradient domain. In [35] the authors address the problem of designing display-dependent tone mappings. In [44] the authors propose an automatic version of the zone system developed by Ansel Adams for conventional photographic printing. The method also includes local filtering based on the photographic procedure of dodging and burning. The global part of their method is in spirit similar to our approach. In [29] they use K-means to cluster the image into regions and then apply individual gamma correction to each segment. For general overview of tone mapping we refer to the book [43].

Tone mapping is not only important for images, but also for HDR video. This field has received increasing interest during the last years, much due to the fact that HDR video data are gaining popularity [22, 25, 40, 52]. Some of the earliest extensions to video simply apply TMOs image wise, adding temporal filtering to avoid flickering [22, 35, 41]. These simple operators can be targeted at real-time processing [23]. A number of operators are based on ideas mimicking the human visual system, some of which are global, e.g., [16, 21, 39, 55] and some local, e.g., [5, 28]. Some recent methods use more involved local processing and filtering schemes to achieve high local contrast and also reduce noise [2, 4, 6, 7, 14]. For overview and evaluation of video tone mapping operators see [15, 40].

In this paper, we address the problem of tone mapping as a clustering problem. This is not an entirely new idea in the realm of quantization of images. The idea of clustering color values was popular during the 1980s and 1990s, when the displays had very low dynamic range, and the object was to take an ordinary 24-bit color image and map it to a smaller palette of colors that could be displayed on the screen. In this case there have been a number of algorithms that use variants of K-means clustering, see [8, 47, 48]. Here the clustering was done on three-dimensional input, i.e., color values. The algorithms used variants of the standard K-means [31] to avoid local minima. The number of input points was quite small, and the number of output classes, i.e., the palette, was relatively small so these methods worked well, but (as shown in Sect. 6.2) they are prone to get stuck in local minima, when the size of the problem increases.

2 Problem Formulation

We will begin by formulating our clustering problem for a luminance input image. Then, we will describe how this can be applied to RGB images and video inputs in the following sections. Let us consider the following problem. We are given an input gray value image, I(xy), with a large number (N) intensity levels. We would like to find an approximate image \({\hat{I}}(x,y)\) with a smaller number (K) intensity levels, i.e., we would like to solve

$$\begin{aligned} \min _{c_1,c_2,\ldots ,c_K}||I-{\hat{I}}||_2, \end{aligned}$$
(1)

where

$$\begin{aligned} I(x,y) \in \{u_1,u_2,\ldots ,u_N\} \end{aligned}$$
(2)

and

$$\begin{aligned} {\hat{I}}(x,y) \in \{c_1,c_2,\ldots ,c_K\}. \end{aligned}$$
(3)

If we calculate the histogram corresponding to the input image’s distribution, we can reformulate the problem as:

Problem 1

(K-means clustering tone mapping) Given \(K \in {\mathbb {Z}}^+\) and a number of gray values \(u_i \in {\mathbb {R}}\) with a corresponding distribution histogram \(h(i), i = 1,\ldots ,N,\) the K-means tone mapping problem is finding the K points \(c_l \in {\mathbb {R}}\) that solve:

$$\begin{aligned} D(N,K) = \min _{c_1,c_2,\ldots ,c_K} \sum _{i=0}^{N} h(i)d(u_i,c_1,\ldots ,c_K)^2, \end{aligned}$$
(4)

where

$$\begin{aligned} d(u_i,c_1,\ldots ,c_K) = \min _l |u_i-c_l|. \end{aligned}$$
(5)

This is a weighted K-means clustering problem. One usually solves it using some form of iterative scheme that converges to a local minimum. A classic way of solving it is alternating between estimating the cluster centers \(c_l\) and the assignment of points \(u_i\) to clusters. If we have assigned n points \(u_i\) to a cluster l, then the best estimate of \(c_l\) is the weighted mean

$$\begin{aligned} c_{\{1,\ldots ,n\}} = \frac{\sum _{i=1}^n h(i)u_i}{\sum _{i=1}^n h(i)}. \end{aligned}$$
(6)

For ease of notation we will henceforth use the notation \(c_l\) for cluster number l or \(c_{\{1,\ldots , n\}}\) for the cluster corresponding to points \(\{u_1,\ldots ,u_n\}\). The contribution of this cluster to the error function (4) is then equal to:

$$\begin{aligned} f(u_1,\ldots ,u_n)=\sum _{i=1}^n h(i)(u_i-c_{\{1,\ldots , n\}})^2. \end{aligned}$$
(7)

The assignment that minimizes (4) given the cluster centers \(c_l\) is simply taking the nearest \(c_l\) for each point \(u_i\). One can keep on iteratively alternating between assigning points to clusters and updating the cluster centers according to (6). It can easily be shown that this alternating scheme converges to a local minimum, but there are no guarantees that this is a global minimum. In fact for most problems, it is highly dependent on the initialization. There are numerous ways of initializing. See [50] for an extensive review of K-means clustering methods.

The K-means clustering problem is in general NP-hard, for most dimensions, sizes of input and number of clusters, see [1, 9, 33, 57] for details. However, since the points we are working with are one-dimensional, i.e., \(u_i \in {\mathbb {R}}\) , we can actually find the global minimum of problem 1 using dynamic programming. This is what makes our method tractable. In the next section we will describe the details of our approach. We will in Sect. 4 describe how we use our solver to construct a tone mapping method for color images, and in Sect. 5 how we extend it to video input.

3 A Dynamic Programming Scheme

Problem 1 is a weighted K-means clustering problem, with data points in \({\mathbb {R}}\). We will now show how we can devise a dynamic programming scheme that accurately and quickly gives the minimum solution to our problem. For details on dynamic programming see, e.g., [24].

We use an approach similar to [3, 57] and modify it to fit our weighted K-means problem. We will now show the recurrence relation for our problem:

Theorem 1

For any \(n>1\) and \(k>1\) we have

$$\begin{aligned} D(n,k)=\min _{k \le i \le n} (D(i-1,k-1)+f(u_i,\ldots ,u_n)). \end{aligned}$$
(8)

Proof

Since our data points are one-dimensional, we can sort them in ascending order. Assume that we have obtained a solution D(nk) to (4) and let \(u_i\) be the smallest point that belongs to cluster k. Then it is clear that \(D(i-1,k-1)\) is the optimal solution for the first \(i-1\) points clustered into \(k-1\) sets. \(\square \)

Equation (8) defines the Bellman equation for our dynamic programming scheme and gives us our tools to solve problem 1. We iteratively solve D(nk) using (8) and store the results in an \(N \times K\) matrix. The initial values for \(n=1\) or \(k=1\) are given by the trivial solutions. We can read out the optimal solution to our original problem at position (NK) in the matrix. The clustering and the cluster centers of the optimal solution are then found by backtracking in the matrix. In order to efficiently calculate D(nk) we need to be able to iteratively update the function f from (7). We start by calculating the cumulative distribution H(i) of h(i), given by

$$\begin{aligned} H(i) = \sum _{j=1}^{i} h(j), \quad i=1,\ldots ,N. \end{aligned}$$
(9)
figure b

Theorem 2

For a point set \(\{u_1,\ldots ,u_n\}\) the error contribution of those points can be updated by

$$\begin{aligned} f(u_1,\ldots ,u_n)&= \sum _{i=1}^{n} h(i)(u_i-c_{\{1,\ldots , n\}})^2 \end{aligned}$$
(10)
$$\begin{aligned}&= \sum _{i=1}^{n-1} h(i)(u_i-c_{\{1,\ldots , n-1\}})^2 \end{aligned}$$
(11)
$$\begin{aligned}&\qquad + \frac{h(n)H(n-1)}{H(n)}(u_n-c_{\{1,\ldots , n-1\}})^2, \end{aligned}$$
(12)

where the weighted mean of the point set \(\{u_1,\ldots ,u_n\}\) is updated by:

$$\begin{aligned} c_{\{1,\ldots , n\}} = \frac{h(n)u_n+H(n-1)\cdot c_{\{1,\ldots , n-1\}}}{H(n)}. \end{aligned}$$
(13)

Proof

We can, without loss of generality, assume that we have transformed the coordinates so that \(c_{\{1,\ldots , n-1\}}=0\) and hence \(\sum _{i=1}^{n-1}h(i)u_i = 0\). This gives according to (13):

$$\begin{aligned} c_{\{1,\ldots , n\}} = h(n)u_n/H(n), \end{aligned}$$
(14)

giving us

$$\begin{aligned} f(n)&= \sum _{i=1}^{n} h(i)(u_i-c_{\{1,\ldots , n\}})^2 \end{aligned}$$
(15)
$$\begin{aligned}&=\sum _{i=1}^{n} h(i)\left( u_i{-}\frac{h(n)u_n}{H(n)}\right) ^2. \end{aligned}$$
(16)

We can write this as

$$\begin{aligned} f(n)&= f(n-1) + h(n)\left( u_n-\frac{h(n)u_n}{H(n)}\right) ^2 \end{aligned}$$
(17)
$$\begin{aligned}&= f(n-1) + h(n)u_n^2\frac{H(n-1)^2}{H(n)^2}. \end{aligned}$$
(18)

The first part can be simplified as

$$\begin{aligned} f(n-1)&= \sum _{i=1}^{n-1} h(i)\left( u_i-\frac{h(n)u_n}{H(n)}\right) ^2 \end{aligned}$$
(19)
$$\begin{aligned}&= \sum _{i=1}^{n-1} h(i) \left( u_i^2-2u_i\frac{h(n)u_n}{H(n)} +\frac{h(n)^2u_n^2}{H(n)^2}\right) \end{aligned}$$
(20)
$$\begin{aligned}&= \sum _{i=1}^{n-1} h(i) u_i^2 + \frac{H(n-1)h(n)^2u_n^2}{H(n)^2}. \end{aligned}$$
(21)

Combining (17) and (19) then yields

$$\begin{aligned} f(n) =&\sum _{i=1}^{n-1} h(i) u_i^2 {+} \frac{H(n-1)h(n)^2u_n^2 {+} h(n)u_n^2 H(n-1)^2}{H(n)^2} \end{aligned}$$
(22)
$$\begin{aligned} =&\sum _{i=1}^{n-1} h(i) u_i^2 +\frac{H(n-1)h(n)u_n^2}{H(n)}\frac{(h(n)+H(n-1))}{H(n)} \end{aligned}$$
(23)
$$\begin{aligned} =&\sum _{i=1}^{n-1} h(i) u_i^2 +\frac{h(n)H(n-1)}{H(n)}u_n^2. \end{aligned}$$
(24)

\(\square \)

Without using (12) each entry D(nk) would take \(n^2\) iterations to calculate, and the total complexity would become \(N\cdot K \cdot N^2 = N^3K\). However using (12) we can compute \(f(u_1,\ldots ,u_n)\) in constant time, and this gives a total complexity of \(N^2K\).

4 Tone Mapping of HDR Color Images

The discussion in the previous section was concerned with grayscale images. In this section we will describe the whole algorithm for an HDR color input image. We assume an RGB input image. Algorithm 1 is based on that the input points \(u_i\) are one-dimensional. There are a number of ways in which one could apply the clustering on a color image, including working in numerous different color space representations. We will here describe two fast, efficient and color-preserving methods. They are both based on running Algorithm 1 on the luminance channel.

We start by estimating the intensity channel \(I_{\mathrm{gr}}\) from the RGB image. One could use several estimates of the intensity, such as estimating the luminance using a standard weighted average. We have found however that our method performs best when we use the maximum of the three channels as our intensity. We then do a preprocessing step by taking the logarithm of \(I_{\mathrm{gr}}\), giving us \(I_{\log }\). We can then run Algorithm 1. When we have clustered the intensity channel into the desired K levels, we calculate our transfer function \(F(s): \{u_1,\ldots ,u_N\} \rightarrow \{1,\ldots , K\} \) by finding the nearest neighbor of each input level s:

$$\begin{aligned} F(s) = \mathop {arg\,min\,}\limits _l | c_l-s|. \end{aligned}$$
(25)

The output image \(I_{\mathrm{out}}\) is then constructed in one of two ways,

$$\begin{aligned} I^{\mathrm{ch}}_{\mathrm{out}}=F\left( I^{\mathrm{ch}}_{\log }\right) , \end{aligned}$$
(26)

or

$$\begin{aligned} I^{\mathrm{ch}}_{\mathrm{out}}=\frac{I^{\mathrm{ch}}}{I_{\mathrm{gr}}} \cdot F(I_{\mathrm{gr}}), \end{aligned}$$
(27)

where ch denotes the color channel, i.e. , R, G or B. Equation (26) simply applies the function F on the whole RGB image pixel-wise, whereas (27) applies the function F on the intensity channel, and then each color channel is given by multiplying with the color weight of each pixel. Using (27) corresponds to the way that was proposed in [49]. The different steps are summarized in Algorithm 2. Using equation (26) usually gives very good results, but can in some cases give somewhat subdued colors. Using equation (27) on the other hand will in general give more saturated colors, but can in some cases give exaggerated colors. This has been noted before, and in [54], the suggestion was to instead use

$$\begin{aligned} I^{\mathrm{ch}}_{\mathrm{out}}=\left( \frac{I^{\mathrm{ch}}}{I_{\mathrm{gr}}}\right) ^q \cdot F(I_{\mathrm{gr}}), \end{aligned}$$
(28)

where q controls color saturation. It can actually be shown that (26) actually corresponds closely to (28) for a special choice of q, [36].

figure c

In Fig. 1 the two extreme cases are exemplified for two test images. The top shows the result of running Algorithm 2 using (26), and the middle row shows the result of using (27). For the left image the colors are better reproduced in the middle image, but for the right image the top row is better. A simple way of controlling the color is taking a weighted average of the two outputs. This is shown in the bottom of Fig. 1. For the example images that we have tested, we have found that this gives a good trade-off. In [36] a different choice of color correction model was suggested, namely

$$\begin{aligned} I^{\mathrm{ch}}_{\mathrm{out}}=\left( \left( \frac{I^{\mathrm{ch}}}{I_{\mathrm{gr}}}-1\right) q+1\right) \cdot F(I_{\mathrm{gr}}), \end{aligned}$$
(29)

where again q controls the color saturation. They further suggest choosing q as a certain sigmoid function of the contrast factor of the tone mapping function. We have tried various versions of this, but found that they give over-saturated colors for our tone mapping function. In Fig. 2 example outputs are shown. Here the luminance preserving sigmoid function was chosen, and the slope of the tone mapping function was used as contrast factor (see [36] for details). Since our tone mapping function is piece-wise constant, the slope is zero everywhere. But a natural choice of slope is to use the derivative of a continuous piece-wise linear function as approximation.

Fig. 1
figure 1

The figure shows the different color outputs for two example images. Top shows the result of running Algorithm 2 using (26), the middle shows the result of using (27), and the bottom row shows the output using a linear combination of the two (in this case \(0.7 \cdot (26)+0.3 \cdot \) (27)). For the left image the colors are better reproduced in the middle row, but the right image is better in the top row. The bottom row shows the linear combination, which gives a very good compromise for the example images that we have tested

Fig. 2
figure 2

The figure shows the different color outputs for two example images using the color model (29). The luminance preserving sigmoid function was chosen, and the slope of the tone mapping function was used as contrast factor (see [36] for details). The right image suffers from over-saturation of colors

5 Tone Mapping of HDR Video

The tone mapping procedure described in the previous sections can easily be extended to efficiently handle video input. The most basic approach is to run Algorithm 2 on every frame. We will modify this approach in two ways, to give an efficient and temporally coherent output. First of all one can notice that the most time-consuming step of Algorithm 2 is running the dynamic programming. The complexity of this depends on the number of input bins and output bins, but not on the image size since it uses the distribution of gray values. This means that it would be just as costly to run the dynamic programming on one frame as it would be to run it on the whole sequence. Running it on only one frame will in many cases lead to unrobust behavior. Using it on the whole sequence will on the other hand lead to not using the maximum range for individual frames. We suggest using the distribution of a small number of neighboring frames for each frame.

For most scenes the gray value distribution will not change dramatically within a couple of frames. If running times are a priority, we propose the use of key frames, where the tone mapping function is estimated for each key frame using the dynamic programming scheme, and for intermediate frames the tone mapping function is interpolated linearly between the two nearest key frames.

Key frames can be chosen either with fixed intervals or when the gray value distributions differ significantly. A suitable metric for determining the difference is the Wasserstein metric or the Earth mover’s distance (EMD), see [30, 56]. For two finite discrete one-dimensional distributions, represented by their histograms \(h_1\) and \(h_2\), the EMD distance d can be calculated in closed form. If \(H_1\) and \(H_2\) are the cumulative distributions of \(h_1\) and \(h_2\), then

$$\begin{aligned} d = \sum _{i=1}^{N} |H_1(i)-H_2(i)|, \end{aligned}$$
(30)

where N is the number of bins in the histograms.

The different steps are summarized in Algorithm 3.

figure d

Note that Algorithm 3 is described for an offline scenario, but it could just as easily be run as an online algorithm for a real-time application, where the key frames are estimated online. In the experiments in the paper, we used a fixed distance of \(\varDelta t = 20\) frames between key frames. We further used \(n=1\) (i.e. , the key frame histogram is based on three frames). We need to estimate the two neighboring key frames to linearly interpolate between them. This means that we get a lag of \(20+1=21\) frames, which for many applications is not a problem.

6 Results

We have implemented our tone mapping algorithm and conducted a number of tests. First we show in Sect. 6.1 that our method gives a substantially better optimum compared to standard K-means clustering. In Sect. 6.2 we study the time complexity of our algorithm, and then in Sect. 6.3, we show results on a set of standard HDR images and compare with a number of different tone mapping algorithms. We have consistently used \(K=256\) in our experiments, corresponding to 8-bit output, but this could be set to any output quantization you like. In Sect. 6.4 we test our algorithms on HDR video input.

Fig. 3
figure 3

Comparison between standard K-means and our optimal approach using \(K=5\) clusters (Left) and \(K=256\) clusters (Right). Top shows histograms over the \(l^2\)-norms of the differences between the global optimum and local optimums (based on 200 random initializations of standard K-means.) Bottom shows the final energy of the 200 solutions, with also the global optimum depicted in solid green. One can see that for \(K=5\) we get very similar results, and the local optimization leads to solutions close to the global one in most cases. For \(K=256\) on the other hand, the local optimization gives solutions far from the optimum, both in terms of the actual solution and the final error

Fig. 4
figure 4

Silhouette plots from the different clustering examples. Left shows the five clusters, when \(K=5\). Top shows one run of standard K-means, and bottom shows our optimal method. In this case both methods give very similar results, and both show balanced clustering silhouettes. Right shows the clustering when \(K=256\). Top is again one run of standard K-means, and bottom is our method. For visualization purposes, only a random five of the 256 clusters are shown. One can see that in this case, our method gives more balanced silhouettes than standard K-means

6.1 Comparison with Standard Iterative K-means

A standard iterative K-means algorithm will converge to a local minimum. Algorithm 1 will converge to the global minimum, but one may ask how often the local iterative scheme gets stuck in a local minimum, and how far this is from the global optimum. In order to investigate this we did some simple qualitative experiments where we ran Algorithm 1 on an input image (with 5000 gray levels). Our hypothesis is that for larger K the ordinary K-means will get stuck in local optima. To check this, we then ran the standard iterative K-means clustering and compared the resulting solution to the global optimum. We repeated this for a large number of runs with random initialization. We tried this for a smaller K (= 5) and a larger K (= 256). In Fig. 3 the results are shown. It shows at the top, histograms over the \(l^2\) differences between the local solution for the cluster centers and the true solution. The center points were of course sorted before the norm was taken. Also shown are plots of the resulting error energy (4) for the different runs, with the global optimum also plotted. The figures clearly show that in this case K-means finds the global optimum for the smaller K, but there was a large difference between the local solutions and the true solution for the large K, and in no case was the true optimum found using the local iterative method (right of Fig. 3). To further investigate the solutions we constructed silhouette plots [46] that can be seen in Fig. 4. For the small K there is little difference, but for the large K the optimal solution has a more even silhouette than the example run, based on standard K-means. For visibility purposes only five random clusters out of the 256 are shown. In Table 1, statistics from the silhouettes are shown. The table shows the mean and standard deviation of the silhouettes, for the optimal method compared to three runs of standard K-means. Again one can see that there is little difference for \(K=5\), but for \(K=256\), the optimal algorithm achieves less spread than standard K-means.

Table 1 Statistics from the silhouette plots

As a final check we calculated Fleiss’ kappa index [17]. This measures how well a set of different clusterings agree, on a scale where a value that is negative or close to zero indicates low agreement, and a value close to one indicates high agreement. In this case we got 0.96 for \(K=5\), which means that all the clusterings from the different runs (including the optimal solution) agree to a very high extent. For \(K=256\) the Fleiss’ kappa index was equal to 0.05, with little agreement. All the different measures point in the same direction that for the high-dimensional case that we are interested in, the optimal dynamic programming gives a much better solution than standard K-means. For our tone mapping application, the choice of \(K=256\) is a natural one, but one could still ask whether the number of clusters could be chosen by other means. One way of selecting K is by investigating the so-called gap statistic [51]. For the images that we have tested, we get a much lower value than 256 if we look at the gap statistic, typically around 5. This probably reflects more of the global modality of the distribution, than the smaller characteristics of the distribution that we want to capture. It does suggest that if speed is of very high priority, a smaller K could be chosen, and then some simpler interpolation scheme could be used in between the clusters. We have however not investigated this further.

Fig. 5
figure 5

Left shows execution time for running the complete Algorithm 2 as a function of the output discretization K. Right shows the execution time as a function of the square of the input discretization N. The plots follow the predicted behavior of the algorithm, i.e. , linear in the number of output bins K and quadratic in the number of input bins N

Fig. 6
figure 6

The figure shows cutouts from the results on (top to bottom) NancyChurch3, Rosette and NancyChurch1. HDR radiance maps courtesy of Mantiuk [20] and Debevec [42]. The figure shows from left to right [35, 44] and our method. One can see that the compared methods suffer from over-saturation, color artifacts and loss of detail. The results are best viewed on screen

6.2 Algorithm Complexity and Stability

Next we wanted to check whether the total algorithm followed the expected complexity. In order to do this we ran Algorithm 2 on a randomly generated input image and varied the number of input gray levels N and the number of output gray levels K. Our implementation was done in MATLAB, with the most time-consuming step, i.e., the dynamic programming part, done using compiled mex-functions. The MATLAB and mex files can be downloaded from [11]. All tests were conducted on a desktop computer running Ubuntu, with an Intel Core i7 3.6 GHz processor. The results are shown in Fig. 5, where the respective dependences on K and N are shown. Here we set \(K=256\) when N varied, and \(N=2000\) when K varied. The most time-consuming part of the algorithm is the dynamic programming step, and this is linear in K and quadratic in N which is validated in the graph.

6.3 Results on HDR Images

We have tested our method on a number of HDR input images and compared with a number of standard tone mapping algorithms. The images were collected from R. Mantiuk [20] and P. Debevec [42]. We used the HDR image tool [32] to do the processing. It contains implementations of a number of tone mapping algorithms. Throughout our tests, we have only used the default parameter settings as supplied by Luminance HDR. It is probably so that in some cases better results can be found by tweaking the parameters manually, but since our method does not contain any parameters and the goal for us was to have an automatic system we opted for the default parameters. We have compared our method to the methods of [12, 13, 35, 37, 44]. Of these we found that [35] and [44] gave significantly better results over the set of test images. Our method gave very similar results to these two methods. In Fig. 6 we show magnified cutouts from three example images. Here we can see that the two compared methods (on the left) exhibit problems with over-saturation, loss of detail resolution and color artifacts. In Fig. 7 the output of our algorithm is shown for a number of different input HDR images.

Fig. 7
figure 7

The result of running our Algorithm 2 on a number of HDR images. No parameters need to be set to produce the output. The results are best viewed on screen. HDR radiance maps courtesy of Debevec [42] and Mantiuk [20]

Throughout our tests we used a fixed discretization \((N=5000)\) for the input intensity distribution. We have experimented with higher values of N but this does not give any noticeable effects. Further more, in our algorithm, empty bins are removed before the dynamic programming step, and the complexity of the algorithm will only depend on the number of non-empty bins. Sampling the input distribution with higher N will result in slightly finer modeling of the input luminance distribution, but most added bins will be empty. (So in this sense the complexity depends mostly on the actual input distribution and not on the fixed value of 5000).

6.4 Results on HDR Video

We have run our algorithm on a number of test HDR videos from [25, 26, 40], that were also used in the evaluation in [15]. An overview of the sequences can be seen in Table 2. We used the same parameter settings for all input videos. The number of input bins was fixed at \(N=5000\). We used seven neighboring images to estimate the key frame gray value distributions, and we used a fixed distance of 20 frames between key frames. We then ran Algorithm 3. In Fig. 9 the output of two of the frames for a number of sequences is shown. Even though the luminance greatly changes both spatially and temporally, we achieve significant contrast overall. We do not experience any flickering, color artifacts or ghosting. The dynamic programming solver was the same as for the still images, i.e. , a mex MATLAB implementation. The rest of the steps in Algorithm 3 were implemented in MATLAB. The total running times for our tone mapping on the different test sequences are shown in Table 2. The resulting corresponding framerates are also shown. In order to investigate the time dependence of our algorithm further, we conducted the same experiment as described in [15]. In this test, the output intensity for two spatial locations is plotted as functions of time or frame number. We chose the same sequence (the student sequence) and the same two locations as in [15], corresponding to two points with different temporal behavior. Four frames from our output are shown in Fig. 8 with the two points overlaid in red and green, respectively. The intensities for the two points are shown in Fig. 10. One can see that in this case there is no apparent flickering, as well as no saturation or overshooting issues.

Table 2 The tested HDR sequences from [25, 40]
Fig. 8
figure 8

Four frames from the output of Algorithm 3 with the student sequence as input. Also overlaid in red and green are the two locations whose time dependence is shown in Fig. 10

Fig. 9
figure 9

The figure shows two frames (left and right, respectively) of the output from Algorithm 3, using (from top to bottom) the window sequence, the hallway sequence, the hallway 2 sequence, the driving sequence and the exhibition sequence. The results are best viewed on screen

Fig. 10
figure 10

The time dependence of the output intensity at the two points shown in Fig. 8 (indicated by the red and green dots). The graph shows the intensity that results from running Algorithm 3 on the student sequence. One can see that there are no apparent problems with flickering, overshooting or over-saturation for these points

We have also done comparisons to a number of state-of-the-art HDR video tone mapping algorithms. We used four sequences from the dataset of Froelich et al. [18], namely Poker fullshot, Smith hammering, Cars fullshot and Showgirl 2.

We have compared our results with three state-of-the-art video tone mapping algorithms, the zonal brightness coherency method of Boitard et al. [7], the temporally coherent local tone mapping method of Aydin et al. [2] and the real-time noise-aware tone mapping of Eilertsen et al. [14]. In Fig. 11 some resulting output frames are shown, with magnified cutouts. All methods generally give outputs with overall good brightness and contrast, with little temporal flickering and artifacts. Due to the local filtering of the compared methods, they exhibit stronger local contrast, but this can in some cases lead to artificial details and cartoonishness. We have also conducted a qualitative subjective comparison between the different tone mapping operators. We follow the setup in [14], where a number of persons evaluated the four test sequences, with respect to artifacts and image quality. Specifically the tone mapping operators were graded on overall brightness, overall contrast, overall color saturation, temporal color consistency, temporal flickering, ghosting, excessive noise and detail reproduction. We used a 32” \(1920 \times 1080\) BenQ BL3200PT LCD monitor, with a peak luminance of \(300\hbox { cd/m}^2\). The sequences were shown double blind in random order to ten persons, who graded them according to the above scale. The results can be seen in Fig. 12, where the results for the four tested tone mapping operators are shown, from top to bottom our method, Aydin et al. [2], Boitard et al. [7] and Eilertsen et al. [14]. One can see that all methods introduce very little artifacts. There is slightly more variation in the evaluation of the image characteristics, which can be expected. One can see that our method compares favorably with the other state-of-the-art methods. Note that our method is the only global method of the four tested algorithms.

Fig. 11
figure 11

Individual frames from the sequences in [18]. Top two rows show the output from our method, with magnified sections. The next two rows show the result of Eilertsen et al. [14], and the lower two rows show the result of Aydin et al. [2]. The competing methods give in some areas slightly higher local contrast, but also suffer from some artifacts. In the highlights of the Cars fullshot sequence, both Aydin and Eilertsen show color saturation artifacts. In the Showgirl 2 sequence, the local contrast amplification of Aydin gives rise to the appearance of facial scarring. In the challenging Smith hammering sequence, Aydin again has problems with saturation and Eilertsen lacks medium-level contrast at the cost of having high detail contrast. This gives the fire a very artificial color appearance. The results are best viewed on screen

Fig. 12
figure 12

The result of the subjective evaluation of the four tested tone mapping algorithms. The figure shows from top to bottom the result using our method, Aydin et al. [2], Boitard et al. [7], and Eilertsen et al. [14]. All methods give satisfactory results, with minor variations. Note that our method is the only purely global method

7 Conclusion

We have in this paper presented a novel tone mapping algorithm that is based on K-means clustering. We solve the clustering problem using a dynamic programming approach. This enables us to not only solve the clustering problem efficiently but also find the global optimum. The clustering algorithm runs in \(O(N^2K)\) for N input luminance levels and K output levels. We have shown how this can be used to tone map both HDR color image and video data. Our algorithm gives comparable result to state-of-the-art tone mapping algorithms and with the large benefit of minimal need for parameter setting. In some cases in HDR imaging it is beneficial to be able to adjust the output manually, and some of the compared local methods could be tuned to get better effect in local color and contrast. But in many cases a totally automatic procedure is highly desirable.