Information Fusion

Volume 64, December 2020, Pages 131-148
Deepfakes and beyond: A Survey of face manipulation and fake detection

https://doi.org/10.1016/j.inffus.2020.06.014

Highlights

  • DeepFakes and beyond: types of facial manipulations.

  • Facial manipulation techniques.

  • Databases for research in face manipulation and fake detection.

  • Key benchmarks for technology evaluation of fake detection methods.

  • Summary of fake detection results for each facial manipulation group.

Abstract

Free access to large-scale public databases, together with the fast progress of deep learning techniques, in particular Generative Adversarial Networks, has led to the generation of very realistic fake content, with corresponding implications for society in this era of fake news.

This survey provides a thorough review of techniques for manipulating face images, including DeepFake methods, and of methods to detect such manipulations. In particular, four types of facial manipulation are reviewed: i) entire face synthesis, ii) identity swap (DeepFakes), iii) attribute manipulation, and iv) expression swap. For each manipulation group, we provide details regarding manipulation techniques, existing public databases, and key benchmarks for the technology evaluation of fake detection methods, including a summary of the results from those evaluations. Among all the aspects discussed in the survey, we pay special attention to the latest generation of DeepFakes, highlighting their improvements and the challenges they pose for fake detection.

In addition to the survey information, we also discuss open issues and future trends that should be considered to further advance the field.

Introduction

Fake images and videos including facial information generated by digital manipulation, in particular with DeepFake methods [1], have recently become a great public concern [2], [3]. The very popular term “DeepFake” refers to a deep learning based technique able to create fake videos by swapping the face of a person with the face of another person. The term originated in late 2017, after a Reddit user named “deepfakes” claimed to have developed a machine learning algorithm that helped him to transpose celebrity faces into porn videos [4]. In addition to fake pornography, some of the more harmful uses of such fake content include fake news, hoaxes, and financial fraud. As a result, the area of research traditionally dedicated to general media forensics [5], [6], [7], [8], [9], [10], [11] is being invigorated and is now dedicating growing efforts to the detection of facial manipulation in images and videos [12]. Part of these renewed efforts in fake face detection builds on past research in biometric anti-spoofing [13], [14], [15] and modern data-driven deep learning [16], [17]. The growing interest in fake face detection is demonstrated by the increasing number of workshops at top conferences [18], [19], [20], [21], [22], international projects such as MediFor funded by the Defense Advanced Research Projects Agency (DARPA), and competitions such as the recent Media Forensics Challenge (MFC2018)1 and the Deepfake Detection Challenge (DFDC)2, launched by the National Institute of Standards and Technology (NIST) and Facebook, respectively.

Traditionally, the number and realism of facial manipulations have been limited by the lack of sophisticated editing tools, the domain expertise required, and the complex and time-consuming process involved. For example, an early work on this topic [23] was able to modify the lip motion of a person speaking using a different audio track, by making connections between the sounds of the audio track and the shape of the subject’s face. However, much has changed since those early works. Nowadays, it is becoming increasingly easy to automatically synthesise non-existent faces or to manipulate the real face of a person in an image/video, thanks to: i) the accessibility of large-scale public data, and ii) the evolution of deep learning techniques that eliminate many manual editing steps, such as Autoencoders (AE) and Generative Adversarial Networks (GAN) [24], [25]. As a result, open software and mobile applications such as ZAO3 and FaceApp4 have been released, opening the door for anyone to create fake images and videos without any experience in the field.
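The identity-swap tools that descend from the original “deepfakes” code are usually described as a pair of autoencoders that share one encoder but keep a separate decoder per identity: a face of person A is encoded and then decoded with person B’s decoder. The following is a minimal PyTorch sketch of that idea only; the 64x64 resolution, layer sizes, and latent dimension are illustrative assumptions, not the architecture of any particular tool.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared encoder: maps a 3x64x64 face crop to a latent vector."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Identity-specific decoder: reconstructs a face from the shared latent."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 8, 8))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # one decoder per identity

# Training (sketch): reconstruct faces of A with decoder_a and faces of B with
# decoder_b while sharing the encoder, so both identities map to a common latent space.
# Swapping at inference time: encode a face of A, decode it with B's decoder.
with torch.no_grad():
    face_a = torch.rand(1, 3, 64, 64)       # placeholder face crop of person A
    swapped = decoder_b(encoder(face_a))     # rendered with person B's appearance
print(swapped.shape)  # torch.Size([1, 3, 64, 64])
```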

In response to this increasingly sophisticated and realistic manipulated content, the research community is making a large effort to design improved methods for face manipulation detection. Traditional fake detection methods in media forensics have commonly been based on: i) in-camera fingerprints, the analysis of the intrinsic fingerprints introduced by the camera device, both hardware and software, such as the optical lens [26], colour filter array and interpolation [27], [28], and compression [29], [30], among others; and ii) out-camera fingerprints, the analysis of the external fingerprints introduced by editing software, such as copy-paste or copy-move operations on different elements of the image [31], [32], frame-rate reduction in a video [33], [34], [35], etc. However, most of the features considered in traditional fake detection methods are highly dependent on the specific training scenario, and are therefore not robust against unseen conditions [6], [8], [16]. This is especially important nowadays, as most fake media content is shared on social networks, whose platforms automatically modify the original image/video, e.g., through compression and resizing operations [12].
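As an illustration of the out-camera fingerprint family, the sketch below follows the keypoint-matching idea behind SIFT-based copy-move detection such as [31]: an image’s SIFT descriptors are matched against themselves, and matches between distant image locations are flagged as possible duplicated regions. This is a minimal sketch assuming OpenCV’s built-in SIFT; the ratio and distance thresholds are illustrative choices, not values from the cited work.

```python
import cv2
import numpy as np

def detect_copy_move(image_path, ratio=0.6, min_shift=40):
    """Flag pairs of SIFT keypoints whose descriptors match closely but whose
    locations are far apart, a hint of a copy-moved region."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # k=3: m[0] is the trivial self-match (distance ~0), so compare m[1] vs m[2].
    matches = matcher.knnMatch(descriptors, descriptors, k=3)

    suspicious = []
    for m in matches:
        if len(m) < 3:
            continue
        _, best, second = m
        if best.distance < ratio * second.distance:        # Lowe-style ratio test
            p1 = np.array(keypoints[best.queryIdx].pt)
            p2 = np.array(keypoints[best.trainIdx].pt)
            if np.linalg.norm(p1 - p2) > min_shift:         # ignore near-identical locations
                suspicious.append((tuple(p1.astype(int)), tuple(p2.astype(int))))
    return suspicious

# pairs = detect_copy_move("example.jpg")   # hypothetical input image
# print(len(pairs), "matched keypoint pairs between distant regions")
```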

This survey provides an in-depth review of digital manipulation techniques applied to facial content, motivated by the large number of possible harmful applications, e.g., the generation of fake news that spreads misinformation during political elections, or security threats [36], [37]. Specifically, we cover four types of manipulation: i) entire face synthesis, ii) identity swap, iii) attribute manipulation, and iv) expression swap. These four main types of face manipulation are well established by the research community and have received most of its attention in the last few years. In addition, we also review some other challenging and dangerous face manipulations that are not yet as popular, such as face morphing.

Finally, for completeness, we would like to highlight other recent surveys in the field. In [38], the authors cover the topic of DeepFakes from a general perspective, proposing the R.E.A.L. framework to manage DeepFake risks. In addition, Verdoliva has recently surveyed in [39] both the traditional manipulation and fake detection approaches considered in general media forensics and the latest deep learning techniques. The present survey complements [38] and [39] with a more detailed review of each facial manipulation group, covering manipulation techniques, existing public databases, and key benchmarks for the technology evaluation of fake detection methods, together with a summary of the results from those evaluations. In addition, we pay special attention to the latest generation of DeepFakes, highlighting their improvements and the challenges they pose for fake detection.

The remainder of the article is organised as follows. We first provide in Section 2 a general description of different types of facial manipulation. Then, from Section 3 to Section 6 we describe the key aspects of each type of facial manipulation including public databases for research, detection methods, and benchmark results. Section 7 focuses on other interesting types of face manipulation techniques not covered in previous sections. Finally, we provide in Section 8 our concluding remarks, highlighting open issues and future trends.

Section snippets

Types of facial manipulations

Facial manipulations can be categorised into four main groups according to the level of manipulation. Fig. 1 graphically summarises each facial manipulation group. A description of each of them is provided below, from higher to lower level of manipulation:

  • Entire Face Synthesis: this manipulation creates entire non-existent face images, usually through powerful GANs, e.g., the recent StyleGAN approach proposed in [41]. These techniques achieve astonishing results, generating

Manipulation techniques and public databases

This manipulation creates entire non-existent face images. Table 1 summarises the main publicly available databases for research on the detection of entire face synthesis manipulations. Four different databases of fake images are relevant here, all of them based on the same GAN architectures: ProGAN [48] and StyleGAN [41]. It is interesting to note that each fake image may be characterised by a specific GAN fingerprint, just like natural images are identified by a
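A common way to estimate such a GAN fingerprint, in the spirit of works asking whether GANs leave artificial fingerprints, is to average noise residuals (image minus a denoised version) over many images from the same generator, analogous to PRNU camera fingerprints. Below is a minimal sketch assuming a non-local-means denoiser and same-sized grayscale images; published methods use more elaborate denoising filters and larger image sets.

```python
import cv2
import numpy as np

def estimate_gan_fingerprint(image_paths):
    """Average noise residuals (image - denoised image) over images produced by
    the same generator; all images are assumed to have identical dimensions."""
    residuals = []
    for path in image_paths:
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        denoised = cv2.fastNlMeansDenoising(img, h=5)
        residuals.append(img.astype(np.float32) - denoised.astype(np.float32))
    return np.mean(residuals, axis=0)

def correlate_with_fingerprint(image_path, fingerprint):
    """Normalised correlation between one image's residual and a fingerprint:
    a high value suggests the image comes from that generator."""
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(img, h=5)
    residual = img.astype(np.float32) - denoised.astype(np.float32)
    r, f = residual - residual.mean(), fingerprint - fingerprint.mean()
    return float(np.sum(r * f) / (np.linalg.norm(r) * np.linalg.norm(f) + 1e-8))

# fingerprint = estimate_gan_fingerprint(["gan_0001.png", "gan_0002.png"])  # hypothetical files
# score = correlate_with_fingerprint("test.png", fingerprint)               # hypothetical probe
```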

Manipulation techniques and public databases

This is one of the most popular face manipulation research lines nowadays, due to the great public concern around DeepFakes [2], [3]. It consists of replacing the face of one person in a video with the face of another person. Unlike entire face synthesis, where manipulations are carried out at the image level, in identity swap the goal is to generate realistic fake videos.
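Because identity swap targets videos rather than single images, detection benchmarks commonly score face crops frame by frame and then aggregate the scores into a single video-level decision. The sketch below illustrates that pipeline; the untrained placeholder CNN, the OpenCV Haar face detector, and the frame-sampling rate are assumptions for illustration only, and in practice the classifier would be a network trained on one of the databases discussed below.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

# Placeholder frame classifier: in practice a CNN trained on face crops from a
# fake-video database (e.g., an XceptionNet-style binary model).
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def video_fake_score(video_path, every_n=10, size=128):
    """Score a video as fake by averaging per-frame scores on detected face crops."""
    cap = cv2.VideoCapture(video_path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:                               # sample frames sparsely
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_detector.detectMultiScale(gray, 1.3, 5)
            for (x, y, w, h) in faces[:1]:                    # first detected face only
                crop = cv2.resize(frame[y:y + h, x:x + w], (size, size))
                tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
                with torch.no_grad():
                    scores.append(torch.sigmoid(classifier(tensor)).item())
        idx += 1
    cap.release()
    return float(np.mean(scores)) if scores else None         # video-level decision score

# score = video_fake_score("suspect_video.mp4")   # hypothetical input file
```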

From publicly available fake databases such as the UADFV database [82] up to the recent Celeb-DF and DFDC

Manipulation techniques and public databases

This face manipulation consists of modifying some attributes of the face in an image, such as the colour of the hair or the skin, the gender, the age, adding glasses, etc. Despite the success of GAN-based frameworks for general image translation and manipulation [43], [71], [125], [126], [127], [128], [129], and in particular for face attribute manipulation [43], [44], [68], [130], [131], [132], [133], [134], few databases are publicly available for research in this area, to the best of our
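Many of the cited attribute-manipulation models (e.g., StarGAN-style approaches) condition a single generator on a target attribute vector that is spatially replicated and concatenated with the image channels, so one network can perform several different edits. The toy PyTorch module below sketches only that conditioning interface; the tiny convolutional generator and the five example attributes are illustrative placeholders, not a published architecture.

```python
import torch
import torch.nn as nn

class AttributeGenerator(nn.Module):
    """Toy attribute-conditioned generator: the target attribute vector is
    broadcast over the spatial dimensions and concatenated with the image
    channels before the convolutional layers."""
    def __init__(self, num_attrs=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_attrs, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        )
    def forward(self, img, attrs):
        # attrs: (batch, num_attrs) target attribute vector, e.g. [blond, glasses, ...]
        b, _, h, w = img.shape
        attr_maps = attrs.view(b, -1, 1, 1).expand(b, attrs.size(1), h, w)
        return self.net(torch.cat([img, attr_maps], dim=1))

gen = AttributeGenerator(num_attrs=5)
face = torch.rand(1, 3, 128, 128) * 2 - 1            # placeholder face in [-1, 1]
target = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])    # hypothetical target attributes
edited = gen(face, target)
print(edited.shape)  # torch.Size([1, 3, 128, 128])
```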

Manipulation techniques and public databases

This manipulation, also known as face reenactment, consists of modifying the facial expression of a person. We focus on the most popular techniques, Face2Face and NeuralTextures, which replace the facial expression of one person in a video with the facial expression of another person (also in a video). To the best of our knowledge, the only database available for research in this area is FaceForensics++ [12], an extension of FaceForensics [87].
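Detection on FaceForensics++ is typically framed as frame-level binary classification on face crops, with XceptionNet as the usual reported baseline. The sketch below shows that setup using a torchvision ResNet-18 only as a readily available stand-in for XceptionNet; the folder layout crops/train/{real,fake} and the short training schedule are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Binary real/fake classifier on face crops extracted from video frames.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)     # classes: 0 = real, 1 = fake

# Hypothetical folder layout: crops/train/{real,fake}/*.png (one face crop per frame).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = ImageFolder("crops/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                            # short fine-tuning schedule (illustrative)
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```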

Initially, the FaceForensics database was focused

Other face manipulation directions

The four classes of face manipulation techniques described above are the ones that have received most attention in the last few years, but they do not cover all possible face manipulations. This section discusses some other challenging and dangerous approaches to face manipulation: face morphing, face de-identification, and face synthesis based on audio or text (i.e., audio-to-video and text-to-video).

Concluding remarks

Motivated by the ongoing success of digital face manipulations, especially DeepFakes, this survey provides a comprehensive panorama of the field, including up-to-date details of: i) the types of facial manipulation, ii) facial manipulation techniques, iii) public databases for research, and iv) benchmarks for the detection of each facial manipulation group, including key results achieved by the most representative manipulation detection approaches.

Generally speaking, most current face manipulations

CRediT authorship contribution statement

Ruben Tolosana: Conceptualization, Investigation, Writing - original draft, Writing - review & editing, Visualization, Funding acquisition. Ruben Vera-Rodriguez: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Julian Fierrez: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Aythami Morales: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Javier Ortega-Garcia: Conceptualization, Writing -

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by projects: PRIMA (H2020-MSCA-ITN-2019-860315), TRESPASS-ETN (H2020-MSCA-ITN-2019-860813), BIBECA (MINECO/FEDER RTI2018-101248-B-I00), Bio-Guard (Ayudas Fundación BBVA a Equipos de Investigación Científica 2017), and Accenture. Ruben Tolosana is supported by Consejería de Educación, Juventud y Deporte de la Comunidad de Madrid y Fondo Social Europeo.

References (200)

  • S. Milani et al., An overview on video forensics, APSIPA Transactions on Signal and Information Processing, 2012.
  • A. Piva, An overview on image forensics, ISRN Signal Processing, 2013.
  • A. Rössler et al., FaceForensics++: Learning to Detect Manipulated Facial Images, Proc. IEEE/CVF International Conference on Computer Vision, 2019.
  • J. Galbally et al., Biometric Anti-Spoofing Methods: A Survey in Face Recognition, IEEE Access, 2014.
  • A. Hadid et al., Biometrics Systems Under Spoofing Attack: An Evaluation Methodology and Lessons Learned, IEEE Signal Process. Mag., 2015.
  • S. Marcel et al., Handbook of Biometric Anti-Spoofing (2nd edition), 2019.
  • J. Neves et al., GANprintR: Improved Fakes and Evaluation of the State-of-the-Art in Face Manipulation Detection, IEEE J. Sel. Top. Signal Process., 2020.
  • H. Dang et al., On the Detection of Digital Face Manipulation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  • C. Canton et al., Applications of Computer Vision and Pattern Recognition to Media Forensics, IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  • B. Biggio et al., Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes, International Conference on Machine Learning, 2019.
  • L. Verdoliva et al., Multimedia Forensics, ACM Multimedia, 2019.
  • K. Raja et al., Workshop on Deepfakes and Presentation Attacks in Biometrics, IEEE Winter Conference on Applications of Computer Vision, 2020.
  • M. Barni et al., MultiMedia Forensics in the Wild, IEEE International Conference on Pattern Recognition, 2020.
  • C. Bregler et al., Video Rewrite: Driving Visual Speech with Audio, Comput. Graph. (ACM), 1997.
  • D.P. Kingma et al., Auto-Encoding Variational Bayes, Proc. International Conference on Learning Representations, 2013.
  • I. Goodfellow et al., Generative Adversarial Nets, Proc. Advances in Neural Information Processing Systems, 2014.
  • I. Yerushalmy et al., Digital Image Forgery Detection Based on Lens and Sensor Aberration, Int. J. Comput. Vis., 2011.
  • A.C. Popescu et al., Exposing Digital Forgeries in Color Filter Array Interpolated Images, IEEE Trans. Signal Process., 2005.
  • H. Cao et al., Accurate Detection of Demosaicing Regularity for Digital Image Forensics, IEEE Trans. Inf. Forensics Secur., 2009.
  • Z. Lin et al., Fast, Automatic and Fine-Grained Tampered JPEG Image Detection via DCT Coefficient Analysis, Pattern Recognit., 2009.
  • Y.L. Chen et al., Detecting Recompression of JPEG Images via Periodicity Analysis of Compression Artifacts for Tampering Detection, IEEE Trans. Inf. Forensics Secur., 2011.
  • I. Amerini et al., A SIFT-Based Forensic Method for Copy-Move Attack Detection and Transformation Recovery, IEEE Trans. Inf. Forensics Secur., 2011.
  • D. Cozzolino et al., Splicebuster: A New Blind Image Splicing Detector, Proc. IEEE International Workshop on Information Forensics and Security, 2015.
  • A. Gironi et al., A Video Forensic Technique for Detecting Frame Deletion and Insertion, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
  • B.C. Hosler et al., Detecting Video Speed Manipulation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
  • H. Allcott et al., Social Media and Fake News in the 2016 Election, Journal of Economic Perspectives, 2017.
  • D.M. Lazer et al., The Science of Fake News, Science, 2018.
  • J. Kietzmann et al., Deepfakes: Trick or Treat?, Bus. Horiz., 2020.
  • L. Verdoliva, Media Forensics and DeepFakes: An Overview, IEEE J. Sel. Top. Signal Process., 2020.
  • Y. Li et al., Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  • T. Karras et al., A Style-Based Generator Architecture for Generative Adversarial Networks, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  • E. Gonzalez-Sosa et al., Facial Soft Biometrics for Recognition in the Wild: Recent Works, Annotation and COTS Evaluation, IEEE Trans. Inf. Forensics Secur., 2018.
  • Y. Choi et al., StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
  • M. Liu et al., STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  • J. Thies et al., Face2Face: Real-Time Face Capture and Reenactment of RGB Videos, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016.
  • J. Thies et al., Deferred Neural Rendering: Image Synthesis Using Neural Textures, ACM Trans. Graph., 2019.
  • 100,000 Faces Generated by AI, 2018, URL...
  • T. Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, Proc. International Conference on Learning Representations, 2018.
  • F. Marra et al., Do GANs Leave Artificial Fingerprints?, Proc. IEEE Conference on Multimedia Information Processing and Retrieval, 2019.
  • M. Albright et al., Source Generator Attribution via Inversion, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.