OpenCL-based optimization methods for utilizing forward DCT and quantization of image compression on a heterogeneous platform

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Modern computer systems and handheld devices are equipped with high-performance computing resources, such as general-purpose GPUs (GPGPUs) and multi-core CPUs. Using these resources for computation has become a general trend, which makes their availability an important consideration for real-time processing. The discrete cosine transform (DCT) and quantization are two major, computationally intensive operations in image compression standards. In this paper, we develop an efficient parallel implementation of the forward DCT and quantization algorithms for JPEG image compression using the Open Computing Language (OpenCL). This OpenCL-based implementation uses a multi-core CPU and a GPGPU to perform the DCT and quantization computations. We demonstrate the capability of this design through two proposed working scenarios. The approach also applies optimization techniques to reduce kernel execution time and data movement. We developed an optimized OpenCL kernel for a particular device using device-based optimization factors: thread granularity, work-item mapping, workload allocation, and vector-based memory access. Evaluated in a heterogeneous environment, the proposed parallel implementation sped up the execution of DCT and quantization by factors of 7.97 and 8.65 for image sizes of 1024 × 1024 and 2048 × 2048, respectively, in 4:4:4 format.
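To make the two operations named in the abstract concrete, the following is a minimal sequential sketch of the forward DCT and quantization of a single 8 × 8 block. It is not the paper's OpenCL kernel: the function names (`forward_dct_8x8`, `quantize`, `encode_block`) are hypothetical, the DCT is the direct textbook formula with no fast-algorithm optimizations, and the quantization table is assumed to be the example luminance table from Annex K of the JPEG standard.

```python
import math

N = 8  # JPEG processes the image in independent 8x8 blocks

# Assumption: example luminance quantization table (JPEG standard, Annex K).
LUMA_QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def forward_dct_8x8(block):
    """Direct (unoptimized) 2D DCT-II of one level-shifted 8x8 block."""
    def c(k):
        # Normalization factor for the zero-frequency basis function.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


def quantize(coeffs, qtable=LUMA_QTABLE):
    """Divide each coefficient by its table entry and round to an integer.

    Python's round() rounds exact halves to the nearest even integer.
    """
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(N)]
            for u in range(N)]


def encode_block(pixels):
    """Level-shift 8-bit samples by 128, then apply the DCT and quantize."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return quantize(forward_dct_8x8(shifted))
```

Each block is transformed and quantized without reference to any other block, which is the property a parallel implementation can exploit: in a data-parallel setting, blocks (or individual coefficients within a block) can be assigned to separate work-items.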

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2012R1A1A2043400).

Author information

Corresponding author

Correspondence to Nasser Alqudami.

About this article

Cite this article

Alqudami, N., Kim, SD. OpenCL-based optimization methods for utilizing forward DCT and quantization of image compression on a heterogeneous platform. J Real-Time Image Proc 12, 219–235 (2016). https://doi.org/10.1007/s11554-015-0507-5

  • DOI: https://doi.org/10.1007/s11554-015-0507-5
