Raw or Cooked? Object Detection on RAW Images

Ljungbergh, William; Johnander, Joakim; Petersson, Christoffer; Felsberg, Michael

doi:10.1007/978-3-031-31435-3_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13885)

Included in the following conference series:

Scandinavian Conference on Image Analysis

Abstract

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
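
To make the idea of learning an ISP parameter jointly with the end task concrete, the following is a minimal sketch under stated assumptions. It assumes a simple gamma curve as the learnable operation and a toy linear head with a squared loss in place of an object detector; neither is taken from the abstract, which does not specify the operation the paper proposes. The names `gamma_op` and `train_jointly` are hypothetical.

```python
import math

def gamma_op(x, gamma):
    """Assumed learnable per-pixel operation: y = x ** gamma, for x in (0, 1]."""
    return x ** gamma

def train_jointly(pixels, targets, steps=500, lr=0.1):
    """Fit the ISP parameter (gamma) together with a toy linear head (w, b).

    All three parameters receive gradients from the same task loss, which is
    what "jointly" means here. The linear head is a stand-in for a detector.
    """
    gamma, w, b = 1.0, 1.0, 0.0  # gamma = 1.0 leaves the input unchanged
    n = len(pixels)
    history = []
    for _ in range(steps):
        g_gamma = g_w = g_b = 0.0
        loss = 0.0
        for x, t in zip(pixels, targets):
            y = gamma_op(x, gamma)
            err = w * y + b - t
            loss += err * err / n
            # Chain rule: d(x ** gamma) / d(gamma) = x ** gamma * ln(x)
            g_gamma += 2 * err * w * y * math.log(x) / n
            g_w += 2 * err * y / n
            g_b += 2 * err / n
        history.append(loss)
        gamma -= lr * g_gamma
        w -= lr * g_w
        b -= lr * g_b
    return gamma, w, b, history

# Synthetic "RAW" intensities in (0, 1] and targets the task is assumed to prefer.
pixels = [i / 20 for i in range(1, 21)]
targets = [x ** 0.45 for x in pixels]
gamma, w, b, history = train_jointly(pixels, targets)
print(f"learned gamma={gamma:.3f}, first loss={history[0]:.5f}, last loss={history[-1]:.5f}")
```

In a real pipeline the same pattern would apply with the detector loss replacing the squared error, and automatic differentiation replacing the hand-written gradients.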

Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Computer Vision Laboratory, Linköping University, 581 83, Linköping, Sweden
William Ljungbergh, Joakim Johnander & Michael Felsberg
Zenseact, Lindholmspiren 2, 417 56, Gothenburg, Sweden
William Ljungbergh, Joakim Johnander & Christoffer Petersson

Corresponding author

Correspondence to William Ljungbergh.

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Rikke Gade
Linköping University, Linköping, Sweden
Michael Felsberg
Tampere University, Tampere, Finland
Joni-Kristian Kämäräinen

Cite this paper

Ljungbergh, W., Johnander, J., Petersson, C., Felsberg, M. (2023). Raw or Cooked? Object Detection on RAW Images. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13885. Springer, Cham. https://doi.org/10.1007/978-3-031-31435-3_25

DOI: https://doi.org/10.1007/978-3-031-31435-3_25
Published: 27 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31434-6
Online ISBN: 978-3-031-31435-3
eBook Packages: Computer Science, Computer Science (R0)
