1 Introduction

The past few years have seen a major performance leap in single-image super-resolution (SR), both in terms of reconstruction accuracy (as measured e.g. by PSNR, SSIM) [11, 19, 36, 38, 39] and in terms of visual quality (as rated by human observers) [18, 24, 31, 42, 44]. However, the more SR methods have advanced, the more evident it has become that reconstruction accuracy and perceptual quality are typically in disagreement with each other. That is, models which excel at minimizing the reconstruction error tend to produce visually unpleasing results, while models that produce results with superior visual quality are rated poorly by distortion measures like PSNR, SSIM, IFC, etc. [4, 13, 18, 24, 31] (see Fig. 1). Recently, it has been shown that this disagreement cannot be completely resolved by seeking better distortion measures [1]. Namely, there is a fundamental tradeoff between the ability to achieve low distortion and low deviation from natural image statistics, no matter which full-reference dissimilarity criterion is used to measure distortion.

Fig. 1.

Inconsistency between PSNR/SSIM values and perceptual quality. From left to right: nearest-neighbor (NN) interpolation, SRResNet [18] which aims for high PSNR, and SRGAN [18] which aims for high perceptual quality. The perceptual quality of SRGAN is far better than that of SRResNet. However, its PSNR/SSIM values are substantially lower than those of SRResNet, and even lower than those of NN interpolation. The image is from the BSD dataset [23].

These observations have led to the formation of two distinct research trends (see Fig. 2). The first is aimed at improving the reconstruction accuracy according to popular full-reference distortion metrics, and the second targets high perceptual quality. While reconstruction accuracy can be precisely quantified, perceptual quality is often estimated through user studies, in which, due to practical limitations, each user is typically exposed to only a small number of methods and/or a small number of images per method. Therefore, reports on perceptual quality are often inaccurate and hard to reproduce. As a result, novel methods cannot be easily compared to their predecessors in terms of perceptual quality, and existing benchmarks and challenges (e.g., NTIRE [38]) focus mostly on quantifying reconstruction accuracy, using e.g., PSNR/SSIM. As perceptually-aware super-resolution has gained increasing attention in recent years, a benchmark for evaluating perceptual-quality driven algorithms is needed.

Fig. 2.

Two directions in image super-resolution. Super-resolution algorithms, plotted according to their mean reconstruction accuracy (measured by RMSE) and mean perceptual quality (measured by the recent metric [22]). Current methods group into two clusters: (i) upper-left: high PSNR/SSIM, and (ii) lower-right: high perceptual quality. Scores are computed on the BSD test set [23]. The plotted methods are [6, 12, 13, 15, 17, 18, 19, 24, 31, 37].

The 2018 PIRM challenge on perceptual super-resolution took place in conjunction with the 2018 Perceptual Image Restoration and Manipulation (PIRM) workshop. This challenge compared and ranked perceptual super-resolution algorithms. In contrast to previous challenges, the evaluation was performed in a perceptual-quality aware manner, as suggested in [1]. Specifically, we define perceptual quality as the visual quality of the reconstructed image regardless of its similarity to any ground-truth image. Namely, it is the extent to which the reconstruction looks like a valid natural image. Therefore, we measured the perceptual quality of the reconstructed images using no-reference image quality measures, which do not rely on the ground-truth image.

Although the main motivation of the challenge is to promote algorithms that produce images with good perceptual quality, similarity to the ground truth images is obviously also of importance. For example, perfect perceptual quality can be achieved by randomly drawing natural images that have nothing to do with the input images. Such a scheme would score quite poorly in terms of reconstruction accuracy. We therefore evaluate algorithms on a 2-dimensional plane, where one axis is the full-reference root mean squared error (RMSE) distortion, and the second axis is a perceptual index which combines the no-reference image quality measures of [22, 27]. This approach jointly quantifies accuracy and perceptual quality, thus enabling perceptual-driven methods to compete alongside algorithms that target PSNR maximization. PIRM is therefore the first established benchmark for perceptual-quality driven image restoration, which will hopefully be extended to other perceptual computer-vision tasks in the future.

The outcomes arising from this challenge are manifold:

  • Participants introduced algorithms that substantially improve upon the state of the art in perceptual SR. The submitted methods incorporate novelties in optimization objectives (losses), conv-net architectures, generative adversarial network (GAN) variants, training schemes and more. These enabled the submissions to surpass the performance of baselines such as EnhanceNet [31] and CX [24] by an impressive margin. The results are presented in Sect. 4, and the main novelties are discussed in Sect. 6.

  • We validate our chosen perceptual index through a human-opinion study, and find that it is highly correlated with the ratings of human observers. This provides empirical evidence that no-reference image quality measures can faithfully assess perceptual quality. The results of the human-opinion study are presented in Sect. 4.1.

  • We also test the agreement of many other commonly used image quality measures with the human-opinion scores, and find that most of them are either uncorrelated or anti-correlated. This shows that most existing schemes for evaluating image restoration algorithms cannot be used to quantify perceptual quality. The results of this analysis are presented in Sect. 5.

  • The challenge results provide insights on the trade-off between perception and distortion (suggested and analyzed in [1]). In particular, in the low-distortion regime, participants showed considerable improvements in perceptual quality over methods that excel in RMSE (e.g. EDSR [19]), while incurring only a small increase in RMSE. This indicates that the tradeoff is severe in this regime. Furthermore, in the good perceptual quality regime, participants were able to improve both perceptual quality and distortion over state-of-the-art perceptual SR methods (e.g. EnhanceNet [31]). This indicates that previous methods were quite far from the theoretical perception-distortion bound discussed in [1].

2 Perceptual Super Resolution

The field of image super-resolution (SR) has been dominated by convolutional-network based methods in recent years. At first, the adopted optimization objective was an \(\ell_1/\ell_2\) loss, which aimed to improve the reconstruction accuracy (in terms of e.g. PSNR, SSIM). While the first attempt to apply a conv-net to image SR [6] did not significantly surpass the performance of prior methods, it laid the groundwork for major improvements in PSNR/SSIM values over the following years [10, 11, 15, 17, 18, 19, 34, 39, 51, 52]. During these years, the rising PSNR/SSIM values were not always accompanied by a rise in perceptual quality. In fact, in many cases they were accompanied by increasingly blurry and unnatural outputs. These observations led to a significant shift of the optimization objective, from PSNR maximization to perceptual quality maximization. We refer to this new line of works as perceptual SR.

The first work to adopt such an objective for SR was that by Johnson et al. [13], which added an \(\ell _2\) loss on the deep features extracted from the outputs (commonly referred to as the perceptual loss). The next major breakthrough in perceptual SR was presented by Ledig et al. [18], who adopted the perceptual loss and combined it with an adversarial loss (originally suggested for generative modeling by [9]). This was further developed in [31], where a texture matching loss was added to the perceptual and adversarial losses. Recently, [24] showed that natural image statistics can be maintained by replacing the perceptual loss with the contextual loss [25]. These ideas were further extended in e.g., [8, 35, 42, 44].

These perceptual SR methods have established a fresh research direction that produces algorithms with superior perceptual quality. However, in all these works, this has come at the cost of a substantial decrease in PSNR and SSIM values, indicating that these common distortion measures do not faithfully quantify the perceptual quality of SR methods [1]. As such, perceptual SR algorithms cannot participate in any challenge or benchmark based on these standard measures (e.g., NTIRE [38]), and cannot be compared or ranked using these common metrics.

3 The PIRM Challenge on Perceptual SR

The PIRM challenge is the first to compare and rank perceptual image super-resolution algorithms. The essential difference compared to previous challenges is the novel evaluation scheme, which is not based solely on common distortion measures such as PSNR/SSIM.

Task. The challenge task is \(4\times \) super-resolution of a single image which was down-sampled with a bicubic kernel.
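For concreteness, a minimal sketch of this degradation model is given below. It uses Pillow's bicubic resampling as a stand-in for the challenge's bicubic kernel; the exact anti-aliasing settings of the official downsampling script are an assumption here, not taken from the challenge materials.

```python
# Minimal sketch of the 4x bicubic degradation (assumed, not the official script).
from PIL import Image

def make_lr(hr_path: str, scale: int = 4) -> Image.Image:
    """Downsample a high-resolution image by `scale` with a bicubic kernel."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    return hr.resize((w // scale, h // scale), resample=Image.BICUBIC)
```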

Datasets. Validation and testing of the submitted methods were performed on two sets of 100 images each. These images cover diverse contents, including people, objects, environments, flora, natural scenery, etc. Participants did not have access to the high-res ground truth images during the challenge, and these images were not available from any online source prior to the challenge. These image sets (high and low resolution) are now available online. Datasets for model training were chosen by the participants.

Evaluation. The evaluation scheme is based on [1], which proposed to evaluate image restoration algorithms on the perception-distortion plane (see Fig. 3). The rationale behind this method is briefly explained in the Introduction.

In the PIRM challenge, the perception-distortion plane was divided into three regions by setting thresholds on the RMSE values (regions 1 / 2 / 3 were defined by \(\text {RMSE} \le 11.5/12.5/16\) respectively, see Fig. 3). In each region, the goal was to obtain the best mean perceptual quality. That is, participants attempted to move as far downward as possible in the perception-distortion plane. The perceptual index (PI) we chose for the vertical axis combines the no-reference image quality measures of Ma et al. [22] and NIQE [27] as

$$\begin{aligned} \text {PI} = \tfrac{1}{2} \left( (10-\text {Ma}) + \text {NIQE} \right) . \end{aligned}$$
(1)

Notice that in this setting, a lower perceptual index indicates better perceptual quality. The RMSE was computed as the square root of the mean squared error (MSE) over all pixels in all images, that is

$$\begin{aligned} \text {RMSE} = \Big (\tfrac{1}{M} \sum _{i=1}^{M} \tfrac{1}{N_i} \Vert x_i^{\text {HR}} - x_i^{\text {EST}}\Vert ^2 \Big )^{1/2}, \end{aligned}$$
(2)

where \(x_i^{\text {HR}}\) and \(x_i^{\text {EST}}\) are the ith ground truth and estimated images respectively, \(N_i\) is the number of pixels in \(x_i^{\text {HR}}\), and M is the number of images in the test set. Both the RMSE and the PI were computed on the y-channel after removing a 4-pixel border. We encouraged participants to submit methods for all three regions, and indeed many did (see Table 1).
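For reference, a minimal sketch of this evaluation protocol follows. The functions `ma_score` and `niqe` stand in for the no-reference measures of Ma et al. [22] and NIQE [27], whose reference implementations are distributed separately, so these names are placeholders; pixel values are assumed to be in [0, 255].

```python
# Hedged sketch of the challenge evaluation (Eqs. (1)-(2)).
import numpy as np

def to_y_channel(rgb: np.ndarray) -> np.ndarray:
    """ITU-R BT.601 luma, as used by common SR evaluation scripts."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def crop_border(y: np.ndarray, border: int = 4) -> np.ndarray:
    """Remove the 4-pixel border before scoring, as in the challenge."""
    return y[border:-border, border:-border]

def challenge_scores(hr_images, est_images, ma_score, niqe):
    mse_per_image, pi_per_image = [], []
    for hr, est in zip(hr_images, est_images):
        y_hr = crop_border(to_y_channel(hr))
        y_est = crop_border(to_y_channel(est))
        mse_per_image.append(np.mean((y_hr - y_est) ** 2))  # (1/N_i)||.||^2
        pi_per_image.append(0.5 * ((10 - ma_score(y_est)) + niqe(y_est)))  # Eq. (1)
    rmse = np.sqrt(np.mean(mse_per_image))  # Eq. (2): average MSEs, then sqrt
    return rmse, float(np.mean(pi_per_image))
```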

Fig. 3.

Evaluating algorithms on the perception-distortion plane. The performance of each algorithm is quantified by two measures: (i) the RMSE distortion (x-axis), and (ii) the perceptual index, which is based on no-reference image quality measures (y-axis, see Eq. (1)). It has been shown in [1] that the best attainable perceptual quality improves as the allowable distortion level increases (blue curve). In the PIRM challenge, the perception-distortion plane was divided into three regions by placing thresholds on the RMSE. In each region, the challenge goal was to obtain the best perceptual quality.

4 Challenge Results

Twenty-one teams participated in the test phase of the challenge. Table 1 reports the top scoring teams in each region, where the team members and affiliations can be found in Appendix A. Figure 4(a) plots all test phase submissions on the perception-distortion plane (teams were allowed up to 10 final submissions). Figure 4(b) shows the correlation between our perceptual index (PI) and human-opinion scores on the top 10 submissions (see details in Sect. 5). The high correlation justifies our definition of the PI. In Fig. 5 we compare the visual outputs of several top methods in each region (the number in a method's name indicates the region of the submission); additional visual comparisons can be found in Appendix C. A table with the scores of all participating teams in each region can be found in Appendix B.

Table 1. Challenge results. The top 9 submissions in each region. For submissions with a marginal PI difference (up to 0.01), the one with the lower RMSE is ranked higher. Submissions with marginal differences in both the PI and the RMSE are ranked together (marked by \(*\)). We performed a human-opinion study on the top submissions, shown in bold (see Sect. 4.1). See the cited papers for descriptions of the submissions. Team members and affiliations can be found in Appendix A. A full table of the test phase results appears in Appendix B.

The submitted algorithms exceed the performance of previous SR methods in all regions, pushing forward the state-of-the-art in perceptual SR. In Region 3, challenge submissions outperform the EnhanceNet [31] baseline, as well as the recently proposed CX [24] algorithm. Notice that several submissions improve upon the baselines in both perceptual quality and reconstruction accuracy, which are both important. In Region 2, the top submissions present fairly good perceptual quality with a far lower distortion than the methods in Region 3. Such methods could prove advantageous in applications where reconstruction accuracy is valuable. Inspection of the Region 1 results reveals that participants obtained a significant improvement in the PI (\(45\%\)) w.r.t. the EDSR baseline [19], with only a small increase in the RMSE (\(7\%\), i.e. 0.77 gray-levels per pixel).

The results provide insights on the tradeoff between perceptual quality and distortion, which is clearly observed when progressing from Region 1 to Region 3. First, the tradeoff appears to be stronger in the low-distortion regime (Region 1), implying that PSNR maximization can have damaging effects on perceptual quality. In the high perceptual quality regime (Region 3), notice that beyond some point, increasing the RMSE allows only a slight improvement in perceptual quality. This indicates that it is possible to achieve perceptual quality similar to that of the current state-of-the-art methods with considerably lower RMSE values.

Fig. 4.

Submissions on the perception-distortion plane. (a) Each submission is a point on the perception-distortion plane, whose axes are the RMSE (2) and the PI (1). The perceptual quality of the challenge submissions exceeds that of the EDSR [19], EnhanceNet [31] and CX [24] baselines (plotted in red). Notice the tradeoff between perceptual quality and distortion, i.e. as the perceptual quality of the submissions improved (lower PI), their RMSE increased. (b) The mean opinion score of 35 human raters vs. the mean perceptual index (PI) on the 10 top submissions. The PI is highly correlated with human opinion scores (Spearman's correlation of 0.83), as visualized by the least-squares fit. This validates our definition of the PI. A thorough analysis of other image quality measures appears in Sect. 5.

Fig. 5.

Visual results. SR results of several top methods in each region, along with the EDSR [19] and EnhanceNet [31] baselines. The attainable perceptual quality becomes higher as the allowed RMSE increases.

4.1 Human Opinion Study

We validate the challenge results with a human-opinion study. Thirty-five raters were each shown the outputs of 12 algorithms (the 10 top challenge submissions and 2 baselines) on 20 images (240 images per rater). For each image, they were asked to rate how realistic the image looked on a scale of 1 to 4, corresponding to: 1-Definitely fake, 2-Probably fake, 3-Probably real, and 4-Definitely real. We made it clear that “real” corresponds to a natural image and “fake” corresponds to the output of an algorithm. This scale tests how natural the outputs look. Note that raters were not exposed to the original ground-truth images; therefore this study does not test distortion in any way, but rather only perceptual quality. The mean human-opinion scores are shown in Fig. 6.

Fig. 6.

Human opinion scores. Thirty-five human raters rated 12 methods (10 top submissions, 2 baselines). The voting scale was from 1 to 4, corresponding to: 1-Definitely fake, 2-Probably fake, 3-Probably real, and 4-Definitely real. These scores validate that the challenge submissions surpassed the performance of the state-of-the-art baselines by significant margins. Furthermore, this study shows again that improved perceptual quality can be attained only when allowing higher RMSE values (progressing from Region 1 to 3).

Fig. 7.

Human-opinion histogram. Normalized histogram of votes per method. Mean scores are shown as red dots. Notice that all methods fail to achieve a large percentage of “definitely real” votes, indicating that there is still much to be done in perceptual super-resolution.

The human-opinion study validates that the challenge submissions surpassed the performance of the state-of-the-art baselines by significant margins. Region 3 submissions, and even Region 2 submissions, are considered notably better than EnhanceNet by human raters. Region 1 submissions were rated far better in visual quality than EDSR (with only a slight increase in RMSE). The tradeoff between perceptual quality and distortion is once more revealed, as the best attainable perceptual quality increases with the increase in RMSE. Note that while the PI is well correlated with the human-opinion scores on a coarse scale (between regions), it is not always well correlated with these scores on a finer scale (rankings within regions), as can be seen when comparing the rankings in Table 1 and Fig. 6. This highlights the urgent need for better perceptual quality metrics, a point which is further analyzed in Sect. 5.

Figure 7 shows the normalized histogram of votes per method. Notice that all methods fail to achieve a large percentage of “definitely real” votes, indicating that there is still much to be done in perceptual super-resolution. In all submitted results, unnatural features tend to appear in the reconstructions (at \(4\times \) magnification), which degrade the perceptual quality. Notice that the outputs of EDSR, a state-of-the-art algorithm in terms of distortion, are mostly voted as “definitely fake”. This is due to the blurriness caused by the aggressive averaging that results from optimizing for distortion.

4.2 Not All Images Are Created Equal

The results presented in the previous sections show the general trends when averaging over a set of images. Interestingly, when examining single images, there can be considerable variability in SR results. First, some images are much easier to super-resolve than others. In such cases, the outputs of all SR methods tend towards high perceptual quality. Such an example can be seen on the left side of Fig. 8, where the outputs of all methods on the “graffiti” image are rated considerably higher than those on the “mountain” image. In both cases it seems advantageous to move towards Region 3, but the SR of texture-less images (such as “graffiti”) will generally produce visually pleasing results. Another deviation from the average trend occurs for images which contain more structure than texture. On such images, methods from Region 1, which favor accuracy, succeed in maintaining large-scale structures, as opposed to generative-based methods from Region 3, which tend to distort structures and often produce visually unpleasing results. For example, on the “building” image on the right side of Fig. 8, the outputs of EDSR are visually pleasing while the outputs of Region 3 methods are rated unsatisfactory. However, for images with fine unstructured details, such as the “carved stone” image, it is beneficial to move towards Region 3. This calls for novel methods, which can either adaptively favor structure preservation vs. texture reconstruction, or employ generative models capable of outputting large-scale structured regions.

Fig. 8.

Variability between images. Left: Some images are easier to super-resolve than others, where all SR methods tend towards high perceptual quality. Right: Images dominated by structure are better reconstructed by methods which target accuracy (e.g. EDSR), while texture-rich images with fine details are reconstructed with high perceptual quality by methods in region 3.

5 Analyzing Quality Measures

The lack of a faithful criterion for assessing the perceptual quality of images is restricting progress in perceptually-aware image reconstruction and manipulation tasks. The current main tool for comparing methods is human-opinion studies, which are hardly reproducible, making it practically impossible to systematically compare methods and assess progress. Here, we analyze the relation between existing image quality metrics and human-opinion scores, in order to conclude which metrics are best suited for quantifying perceptual quality. In Fig. 9, we plot the mean opinion scores of the methods included in the human-opinion study vs. the mean score according to the common full-reference measures RMSE, SSIM [45], IFC [33], and LPIPS [50], as well as the no-reference measures by Ma et al. [22], NIQE [27], BRISQUE [26], and the PI defined in (1). For each measure, we report Spearman's correlation coefficient with the raters' mean opinion scores, and also plot the corresponding least-squares linear fit.
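The correlation analysis itself is straightforward; a short sketch, where `mos` and `metric` hold one mean score per evaluated method:

```python
# Spearman's rho between a quality measure and mean opinion scores,
# plus the least-squares linear fit shown in the scatter plots.
import numpy as np
from scipy.stats import spearmanr

def analyze_measure(mos: np.ndarray, metric: np.ndarray):
    rho, _ = spearmanr(metric, mos)                    # rank correlation (Corr)
    slope, intercept = np.polyfit(metric, mos, deg=1)  # line for the plot
    return rho, slope, intercept
```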

Fig. 9.

Analysis of image quality measures. First row: Scatter plots of mean-opinion-score (y-axis) vs. common image quality measures (x-axis) for the 10 top challenge submissions, along with Spearman’s correlation coefficients (Corr) and a least-squares linear fit (in red). Note that RMSE, SSIM and IFC are anti-correlated with human-opinion-scores, and that our PI is the most correlated. Second row: zoom-in on the high perceptual quality regime (mean scores above 2.3), and the corresponding least-squares linear fits in magenta. In this regime, even the LPIPS, Ma, and BRISQUE measures, which score well on the first row, do not correlate with the human raters’ scores and only NIQE and our PI have high correlations.

As seen in Fig. 9, RMSE, SSIM and IFC, which are widely used for evaluating the quality of image reconstruction algorithms, are anti-correlated with perceptual quality and thus inappropriate for evaluating it. Ma et al. and BRISQUE show moderate correlation with human-opinion-scores, while LPIPS, NIQE and PI are highly correlated, with PI being the most correlated.

The bottom pane of Fig. 9 focuses on the high perceptual quality regime, where it is important to distinguish between methods and correctly rank them. Metrics which excel in this regime will make it possible to assess progress in perceptual SR and to systematically compare methods. This is done by zooming in on the region of mean opinion scores above 2.3 (a new least-squares linear fit appears in magenta). These plots reveal that LPIPS, Ma et al. and BRISQUE fail to faithfully quantify the perceptual quality in this regime. The only measures capable of correctly evaluating the perceptual quality of perceptually-aware SR algorithms are NIQE and the PI (which is a combination of NIQE and Ma). Note that we also tested the full-reference measures VIF [32], FSIM [49] and MS-SSIM [46], and the no-reference measures CORNIA [48] and BLIINDS [30], all of which failed to correctly assess the perceptual quality.

We also analyze the correlation between human-opinion scores and common image quality measures on single images. In Fig. 10 we plot the scores for the outputs of each tested challenge method on all 40 tested images (480 images altogether), where we average only over different human raters. To eliminate the variations between images (see Sect. 4.2), we first subtract the mean score of each image (over different raters) for both the human-opinion scores and the image quality measures. As can be seen, these results are similar in trend to the results presented in Fig. 9.
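The normalization step can be sketched as follows; this is one plausible reading of the protocol, where `scores` is an (images × methods) array of either rater-averaged opinion scores or a quality measure, and each image's mean over methods is removed:

```python
# Hedged sketch of the per-image normalization used before Fig. 10.
import numpy as np

def remove_image_means(scores: np.ndarray) -> np.ndarray:
    """Subtract each image's mean (row-wise) to remove between-image variability."""
    return scores - scores.mean(axis=1, keepdims=True)
```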

Fig. 10.

Analysis of image quality measures on single images. Scatter plots of 480 outputs of challenge methods according to the mean-opinion-score (y-axis) and 8 common image quality measures (x-axis). As above, RMSE, SSIM and IFC are anti-correlated with human-opinion-scores, while NIQE and PI are most correlated (especially in the high perceptual quality regime).

6 Current Trends in Perceptual Super Resolution

All twenty-one groups who participated in the PIRM SR challenge submitted algorithms based on deep nets. We next briefly review the current trends reflected in the submitted algorithms, in terms of three main aspects: the loss functions, the architectures, and methods for traversing the perception-distortion tradeoff. Note that the scope of this paper is not to review the field of SR, but rather to summarize the leading trends in the PIRM SR challenge. Additional details on the submitted methods can be found in the PIRM workshop proceedings.

6.1 Loss Functions

Traditionally, neural networks for single-image SR are trained with \(\ell_1/\ell_2\) norm objectives [47, 53]. These training objectives have been shown to enhance the values of common image evaluation metrics, e.g. PSNR, SSIM. In the PIRM perceptual SR challenge, the evaluation methodology assesses the perceptual quality of algorithms, which is not necessarily enhanced by \(\ell_1/\ell_2\) objectives [1]. As a consequence, a variety of other loss functions were suggested. The main observed trend is the use of adversarial training [9] in order to learn the statistics of natural images and reconstruct realistic images. Most participants used the standard GAN loss [9]. Others [43] used a recent adaptation of the standard GAN loss named Relativistic GAN [14], which emphasizes the relation between fake and real examples by modifying the loss function. Vu et al. [41] suggested to further improve the relativistic GAN by wrapping it with the focal loss [20], which up-weights difficult examples and down-weights easy ones.
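For illustration, a hedged PyTorch sketch of the relativistic average GAN loss [14] in the form used by ESRGAN-style submissions [43]: the discriminator estimates whether a real image is relatively more realistic than the average fake, and vice versa.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator loss; d_real / d_fake are raw (pre-sigmoid) logits."""
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.zeros_like(d_fake))
    return (loss_real + loss_fake) / 2

def ragan_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Generator loss: the symmetric counterpart, with labels swapped."""
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.ones_like(d_fake))
    return (loss_real + loss_fake) / 2
```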

Training the network solely with an adversarial loss is not enough, since affinity to the input (distortion) is also of importance. A natural solution is to combine the GAN loss with an \(\ell_1/\ell_2\) loss, thereby targeting both perceptual quality and distortion. However, it was shown in [18, 31] that \(\ell_1/\ell_2\) losses prevent the generation of textures, which are crucial for perceptual quality. To overcome this, challenge participants used loss functions which are considered more perceptual (i.e. capture semantics). The “perceptual loss” [13] appeared in most submitted solutions, where participants chose different nets and layers for extracting deep features. An alternative to the perceptual loss, used by [28], is the contextual loss [24, 25], which encourages the reconstructed images to have the same statistics as the high-resolution ground-truth images.
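A minimal sketch of such a perceptual loss is given below, using VGG-19 features. The specific layer choice (conv5_4, before activation, popularized by [18, 43]) is only one of the options participants used; inputs are assumed to be already normalized to VGG input statistics, and torchvision ≥ 0.13 is assumed for the `weights` argument.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """l2 distance between deep VGG-19 features of the output and ground truth."""
    def __init__(self):
        super().__init__()
        # Truncate at index 35: features up to conv5_4, before its activation.
        self.features = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the loss network stays fixed

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return nn.functional.mse_loss(self.features(sr), self.features(hr))
```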

A different approach [8] that achieved high perceptual quality is transferring texture by training with the Gram loss [7], without adversarial training. These participants show that standard texture transfer can be further improved by controlling the process using homogeneous semantic regions.
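The Gram loss matches second-order feature statistics (i.e. texture); a sketch, operating on feature maps such as those extracted by the loss network above:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) -> (B, C, C) normalized Gram matrices."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def gram_loss(feat_sr: torch.Tensor, feat_hr: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(gram_matrix(feat_sr), gram_matrix(feat_hr))
```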

Submissions also applied other distortion losses, including an MS-SSIM loss that emphasizes structural fidelity, a Discrete Cosine Transform (DCT) based loss, and an \(\ell_1\) norm between image gradients [2], the latter two suggested in order to overcome the smoothing effect of the MSE loss.
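As an example of the latter, a sketch of an \(\ell_1\) loss on image gradients, using simple finite differences (the exact gradient operator in [2] may differ):

```python
import torch

def gradient_l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """L1 distance between horizontal and vertical finite-difference gradients."""
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]  # horizontal differences
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]  # vertical differences
    return (dx(sr) - dx(hr)).abs().mean() + (dy(sr) - dy(hr)).abs().mean()
```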

6.2 Architecture

The second crucial component of the submissions is the network architecture. Overall, most participating teams adopted state-of-the-art architectures from successful PSNR-maximization based SR methods and replaced the loss function. The main trend is to use the EDSR architecture [19] for the generator and the SRGAN architecture [18] for the discriminator. Wang et al. [43] suggested to replace the residual block of EDSR with the Residual-in-Residual Dense Block (RRDB), which combines multi-level residual networks and dense connections. RRDB enables the use of deeper models and, as a result, improves the recovered textures. Others used Deep Back-Projection Networks (DBPN) [11], Enhanced Upscale Modules (EUSR) [16], and Multi-Grid Back-Projection (MGBP) [28].
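A hedged PyTorch sketch of the RRDB of [43] follows; the channel widths (64/32) and the 0.2 residual scaling are the defaults reported for ESRGAN, and details may differ from individual submissions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five densely connected convs with local residual scaling."""
    def __init__(self, nf: int = 64, gc: int = 32):
        super().__init__()
        # Conv i sees the input plus all previously grown feature maps.
        self.convs = nn.ModuleList(
            nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, padding=1)
            for i in range(5))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            if i < 4:
                feats.append(self.act(out))
        return x + 0.2 * out  # local residual with scaling

class RRDB(nn.Module):
    """Three dense blocks wrapped in an outer (residual-in-residual) skip."""
    def __init__(self, nf: int = 64, gc: int = 32):
        super().__init__()
        self.blocks = nn.Sequential(*(DenseBlock(nf, gc) for _ in range(3)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 0.2 * self.blocks(x)
```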

6.3 Traversing the Perception-Distortion Tradeoff

The tradeoff between perceptual quality and distortion raises the question of how to control the compromise between these two objectives. The importance of this question is two-fold: First, the optimal working point along the perception-distortion curve is domain specific, and moreover image specific. Second, it is hard to predict the final working point, especially when the full objective is complex and when adversarial training is incorporated. Below we elaborate on four possible solutions (see pros and cons in Table 2):

  1. Retrain the network for each working point. This can be done by modifying the relative weights of the loss terms (e.g. the adversarial and distortion losses).

  2. Interpolate between the output images of two pretrained networks (in the pixel domain), for example by using soft thresholding [5].

  3. Interpolate between the parameters of two networks with the same architecture but different losses. This generates a third network that is easy to control (see [43] for details, and the sketch after this list).

  4. Control the tradeoff with an additional network input. For example, [28] added noise to the input in order to traverse the curve by changing the noise level at test time.
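For illustration, a minimal sketch of option 3, network parameter interpolation as described in [43]; it assumes both checkpoints are plain state dicts with identical keys (real checkpoints may nest the weights under an extra key):

```python
# Hedged sketch of network (parameter) interpolation: blend the weights of a
# PSNR-oriented model and a GAN-trained model with the same architecture to
# traverse the perception-distortion curve at test time.
import torch

def interpolate_params(psnr_ckpt: str, gan_ckpt: str, alpha: float) -> dict:
    """alpha = 0 -> PSNR-oriented weights; alpha = 1 -> GAN-trained weights."""
    psnr_sd = torch.load(psnr_ckpt, map_location="cpu")
    gan_sd = torch.load(gan_ckpt, map_location="cpu")
    return {k: (1 - alpha) * v + alpha * gan_sd[k] for k, v in psnr_sd.items()}
```

Sweeping \(\alpha\) from 0 to 1 then traces a path on the perception-distortion plane without any retraining.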

Table 2. Pros and cons of the suggested methods for controlling the compromise between perceptual quality and distortion.

7 Conclusions

The 2018 PIRM challenge is the first benchmark for perceptual-quality driven SR algorithms. The novel evaluation methodology used in this challenge enabled the assessment and ranking of perceptual SR methods alongside those which target PSNR maximization. With this evaluation scheme, we compared the submitted algorithms with existing baselines, which revealed that the proposed methods push forward the state-of-the-art of this field. We also conducted a thorough study of the capability of common image quality measures to capture the perceptual quality of images. This study exposed that most common image quality measures are inadequate for quantifying perceptual quality.

We conclude this report by pointing to several challenges in the field of perceptual SR, which should be the focus of future work. While we have witnessed major improvements over the past several years, in challenging scenarios such as \(4\times \) SR, the outputs of current methods still generally appear unrealistic to human observers. This highlights that there is still much to be done to achieve high perceptual quality in SR. Most common image quality measures fail to quantify the perceptual quality of SR methods, and there is still much room for improvement in this essential task. Perceptual-quality driven algorithms have yet to appear for the real-world scenario of blind SR. The perceptual quality objective, which has gained much attention for the SR task, should also gain attention for other image restoration tasks, e.g. deblurring. Finally, since a tradeoff between reconstruction accuracy and perceptual quality exists, schemes for controlling the compromise between the two can lead to adaptive SR schemes. This may promote new ways of quantifying the performance of SR algorithms, for instance, by measuring the area under the curve in the perception-distortion plane.