Intelligent Visual Media Processing: When Graphics Meets Vision

Cheng, Ming-Ming; Hou, Qi-Bin; Zhang, Song-Hai; Rosin, Paul L.

doi:10.1007/s11390-017-1681-7

Survey
Published: 11 January 2017

Volume 32, pages 110–121 (2017)
Journal of Computer Science and Technology

Ming-Ming Cheng¹, Qi-Bin Hou¹, Song-Hai Zhang² & Paul L. Rosin¹,³

Abstract

The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: 1) the availability of big data from the Internet has created a demand for processing an ever-increasing, vast amount of resources; 2) powerful processing tools, such as deep neural networks, provide effective ways of learning how to deal with heterogeneous visual data; 3) new data capture devices, such as the Kinect, serve as a bridge between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, covering research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible directions for further research.

Acknowledgments

We would like to thank the anonymous reviewers for their useful feedback.

Author information

Authors and Affiliations

College of Computer Science and Control Engineering, Nankai University, Tianjin, 300071, China
Ming-Ming Cheng, Qi-Bin Hou & Paul L. Rosin
TNList, Tsinghua University, Beijing, 100084, China
Song-Hai Zhang
School of Computer Science and Informatics, Cardiff University, Wales, CF10 3EU, U.K.
Paul L. Rosin

Corresponding author

Correspondence to Ming-Ming Cheng.

About this article

Cite this article

Cheng, MM., Hou, QB., Zhang, SH. et al. Intelligent Visual Media Processing: When Graphics Meets Vision. J. Comput. Sci. Technol. 32, 110–121 (2017). https://doi.org/10.1007/s11390-017-1681-7

Received: 11 August 2016
Revised: 26 October 2016
Published: 11 January 2017
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11390-017-1681-7
