DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection
Introduction
Fake images and videos including facial information generated by digital manipulation, in particular with DeepFake methods [1], have recently become a great public concern [2], [3]. The very popular term “DeepFake” refers to a deep learning-based technique able to create fake videos by swapping the face of a person with the face of another person. The term originated after a Reddit user named “deepfakes” claimed in late 2017 to have developed a machine learning algorithm that helped him to transpose celebrity faces into porn videos [4]. In addition to fake pornography, some of the more harmful uses of such fake content include fake news, hoaxes, and financial fraud. As a result, the area of research traditionally dedicated to general media forensics [5], [6], [7], [8], [9], [10], [11] is being invigorated and is now devoting growing efforts to detecting facial manipulation in images and videos [12]. Part of these renewed efforts in fake face detection build on past research in biometric anti-spoofing [13], [14], [15] and modern data-driven deep learning [16], [17]. The growing interest in fake face detection is demonstrated by the increasing number of workshops at top conferences [18], [19], [20], [21], [22], international projects such as MediFor funded by the Defense Advanced Research Projects Agency (DARPA), and competitions such as the recent Media Forensics Challenge (MFC2018)1 and the Deepfake Detection Challenge (DFDC)2, launched by the National Institute of Standards and Technology (NIST) and Facebook, respectively.
Traditionally, the number and realism of facial manipulations have been limited by the lack of sophisticated editing tools, the domain expertise required, and the complex and time-consuming process involved. For example, an early work on this topic [23] was able to modify the lip motion of a person speaking using a different audio track, by making connections between the sounds of the audio track and the shape of the subject’s face. Since those early works, however, the field has evolved rapidly. Nowadays, it is becoming increasingly easy to automatically synthesise non-existent faces or manipulate the real face of one person in an image/video, thanks to: i) the accessibility of large-scale public data, and ii) the evolution of deep learning techniques that eliminate many manual editing steps, such as Autoencoders (AE) and Generative Adversarial Networks (GAN) [24], [25]. As a result, open software and mobile applications such as ZAO3 and FaceApp4 have been released, opening the door for anyone to create fake images and videos without any experience in the field.
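As a rough illustration of the adversarial training that makes GAN-based synthesis possible, the following is a minimal sketch, not the implementation of any published model: a one-dimensional toy generator learns to match a “real” Gaussian distribution by playing against a logistic-regression discriminator. All parameters, names, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

# "Real" data: a 1-D Gaussian with mean 4 (a stand-in for real faces).
def sample_real(n):
    return rng.normal(4.0, 1.0, n)

a, c = 1.0, 0.0   # generator: noise z -> a*z + c
w, b = 0.1, 0.0   # discriminator: D(x) = sigmoid(w*x + b)

lr, n = 0.01, 128
for step in range(2000):
    # Discriminator ascent on log D(real) + log(1 - D(fake))
    xr, z = sample_real(n), rng.normal(0, 1, n)
    xf = a * z + c
    dr, df = sigmoid(w * xr + b), sigmoid(w * xf + b)
    gw = np.mean((1 - dr) * xr) - np.mean(df * xf)
    gb = np.mean(1 - dr) - np.mean(df)
    w += lr * gw
    b += lr * gb
    # Generator ascent on log D(fake) (non-saturating loss)
    z = rng.normal(0, 1, n)
    xf = a * z + c
    df = sigmoid(w * xf + b)
    a += lr * np.mean((1 - df) * w * z)
    c += lr * np.mean((1 - df) * w)

print(round(c, 2))  # generated mean: drifts toward the real mean of 4
```

The same minimax dynamic, scaled up to deep convolutional networks and image data, is what drives the face synthesis methods surveyed here.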
In response to this increasingly sophisticated and realistic manipulated content, large efforts are being carried out by the research community to design improved methods for face manipulation detection. Traditional fake detection methods in media forensics have commonly been based on: i) in-camera fingerprints, the analysis of the intrinsic fingerprints introduced by the camera device, both hardware and software, such as the optical lens [26], colour filter array and interpolation [27], [28], and compression [29], [30], among others, and ii) out-camera fingerprints, the analysis of the external fingerprints introduced by editing software, such as copy-paste or copy-move operations on different elements of the image [31], [32], frame-rate reduction in a video [33], [34], [35], etc. However, most of the features considered in traditional fake detection methods are highly dependent on the specific training scenario, and are therefore not robust against unseen conditions [6], [8], [16]. This is especially important nowadays, as most fake media content is shared on social networks, whose platforms automatically modify the original image/video, e.g., through compression and resizing operations [12].
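As a toy illustration of out-camera fingerprint analysis, the sketch below implements the simplest form of copy-move detection: split the image into non-overlapping blocks and flag exact duplicates. The function name and toy image are our own assumptions; practical detectors such as [31], [32] use overlapping blocks and robust features (e.g., DCT coefficients or SIFT keypoints) to survive compression and resizing.

```python
import numpy as np
from collections import defaultdict

def copy_move_blocks(img, block=8):
    """Flag duplicated blocks: an identical region pasted elsewhere
    is a classic out-camera (editing-software) fingerprint."""
    seen = defaultdict(list)
    h, w = img.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            key = img[y:y + block, x:x + block].tobytes()  # exact match
            seen[key].append((y, x))
    return [locs for locs in seen.values() if len(locs) > 1]

# Toy grayscale image with one region copied elsewhere
rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
img[40:48, 40:48] = img[8:16, 8:16]   # simulate a copy-move forgery
print(copy_move_blocks(img))          # the duplicated block pair
```

Because exact byte matching breaks under even mild recompression, this also illustrates why such handcrafted features fail on social-network content, motivating the data-driven detectors discussed later.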
This survey provides an in-depth review of digital manipulation techniques applied to facial content, motivated by the large number of possible harmful applications, e.g., the generation of fake news that could spread misinformation during political elections, and security threats [36], [37]. Specifically, we cover four types of manipulation: i) entire face synthesis, ii) identity swap, iii) attribute manipulation, and iv) expression swap. These four main types of face manipulation are well established by the research community and have received most attention in the last few years. In addition, we also review other challenging and dangerous face manipulations that are not yet so popular, such as face morphing.
Finally, for completeness, we would like to highlight other recent surveys in the field. In [38], the authors cover the topic of DeepFakes from a general perspective, proposing the R.E.A.L framework to manage DeepFake risks. In addition, Verdoliva has recently surveyed in [39] the traditional manipulation and fake detection approaches considered in general media forensics, as well as the latest deep learning techniques. The present survey complements [38] and [39] with a more detailed review of each facial manipulation group, including manipulation techniques, existing public databases, and key benchmarks for the technology evaluation of fake detection methods, together with a summary of results from those evaluations. In addition, we pay special attention to the latest generation of DeepFakes, highlighting their improvements and the challenges they pose for fake detection.
The remainder of the article is organised as follows. We first provide in Section 2 a general description of different types of facial manipulation. Then, from Section 3 to Section 6 we describe the key aspects of each type of facial manipulation including public databases for research, detection methods, and benchmark results. Section 7 focuses on other interesting types of face manipulation techniques not covered in previous sections. Finally, we provide in Section 8 our concluding remarks, highlighting open issues and future trends.
Types of facial manipulations
Facial manipulations can be categorised into four main groups according to the level of manipulation. Fig. 1 graphically summarises each facial manipulation group. A description of each of them is provided below, from the highest to the lowest level of manipulation:
- Entire Face Synthesis: this manipulation creates entirely non-existent face images, usually through powerful GANs, e.g., through the recent StyleGAN approach proposed in [41]. These techniques achieve astonishing results, generating
Manipulation techniques and public databases
This manipulation creates entire non-existent face images. Table 1 summarises the main publicly available databases for research on the detection of image manipulations based on entire face synthesis. Four different databases of fake images are relevant here, all of them based on the same GAN architectures: ProGAN [48] and StyleGAN [41]. It is worth noting that each fake image may be characterised by a specific GAN fingerprint, just like natural images are identified by a
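The notion of a GAN fingerprint mentioned above can be illustrated with a toy sketch, loosely inspired by noise-residual analysis: average the high-frequency residuals of many images from one generator, then correlate a probe image's residual against that average. The denoising filter, data, and function names here are illustrative assumptions, not the method of any cited paper.

```python
import numpy as np

def residual(img):
    """High-frequency residual: image minus a 3x3 mean-filtered copy
    (a crude stand-in for the denoising filters used in practice)."""
    pad = np.pad(img, 1, mode="edge")
    smooth = sum(pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                 for dy in range(3) for dx in range(3)) / 9.0
    return img - smooth

def estimate_fingerprint(images):
    # Content averages out across images; the shared artifact remains.
    return np.mean([residual(im) for im in images], axis=0)

def correlation(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy demo: a fixed "generator artifact" added to random image content
rng = np.random.default_rng(0)
artifact = rng.normal(0, 1, (32, 32))
fakes = [rng.normal(0, 5, (32, 32)) + artifact for _ in range(200)]
fp = estimate_fingerprint(fakes)

probe_fake = rng.normal(0, 5, (32, 32)) + artifact
probe_real = rng.normal(0, 5, (32, 32))
# The fake probe correlates more strongly with the fingerprint
print(correlation(residual(probe_fake), fp),
      correlation(residual(probe_real), fp))
```

Real GAN-fingerprint work operates analogously on the generator's upsampling and checkerboard artifacts rather than on an additive pattern, and can even attribute a fake image to its source generator.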
Manipulation techniques and public databases
This is one of the most popular face manipulation research lines nowadays due to the great public concern around DeepFakes [2], [3]. It consists of replacing the face of one person in a video with the face of another person. Unlike entire face synthesis, where manipulations are carried out at the image level, in identity swap the goal is to generate realistic fake videos.
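The core architecture behind the original DeepFake identity-swap tools can be sketched with plain linear maps: one encoder shared between the two identities and one decoder per identity; at test time a face of person A is encoded and then decoded with B's decoder. This is a deliberately simplified, untrained sketch under our own assumptions (real tools use deep convolutional autoencoders trained per identity pair, plus face alignment and blending).

```python
import numpy as np

rng = np.random.default_rng(0)
D, LATENT = 64, 8   # face-vector size and shared latent size (toy values)

# Training (omitted) would fit the shared encoder E on faces of both A
# and B, decoder D_A on reconstructions of A, and D_B on those of B.
E   = rng.normal(0, 0.1, (LATENT, D))   # shared encoder
D_A = rng.normal(0, 0.1, (D, LATENT))   # decoder for identity A
D_B = rng.normal(0, 0.1, (D, LATENT))   # decoder for identity B

def swap_identity(face_a):
    """Encode a face of A with the shared encoder, then decode with B's
    decoder: the latent keeps pose/expression, the decoder supplies
    B's appearance."""
    latent = E @ face_a
    return D_B @ latent

face_a = rng.normal(0, 1, D)
fake = swap_identity(face_a)
print(fake.shape)  # a face-sized vector rendered "as" identity B
```

The shared encoder is the key design choice: because both identities pass through the same bottleneck, the latent code becomes identity-agnostic, which is what lets the swap preserve the source video's pose and expression.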
From the first publicly available fake databases, such as the UADFV database [82], up to the recent Celeb-DF and DFDC
Manipulation techniques and public databases
This face manipulation consists of modifying some attributes of the face in an image, such as the colour of the hair or the skin, the gender, or the age, or adding glasses. Despite the success of GAN-based frameworks for general image translation and manipulation [43], [71], [125], [126], [127], [128], [129], and in particular for face attribute manipulation [43], [44], [68], [130], [131], [132], [133], [134], few databases are publicly available for research in this area, to the best of our
Manipulation techniques and public databases
This manipulation, also known as face reenactment, consists of modifying the facial expression of the person. We focus on the most popular techniques Face2Face and NeuralTextures, which replace the facial expression of one person in a video with the facial expression of another person (also in a video). To the best of our knowledge, the only available database for research in this area is FaceForensics++ [12], an extension of FaceForensics [87].
Initially, the FaceForensics database was focused
Other face manipulation directions
The four classes of face manipulation techniques described above are the ones receiving most attention in the last few years, but they do not cover all possible face manipulations. This section discusses some other challenging and dangerous approaches to face manipulation: face morphing, face de-identification, and face synthesis based on audio or text (i.e., audio-to-video and text-to-video).
Concluding remarks
Motivated by the ongoing success of digital face manipulations, especially DeepFakes, this survey provides a comprehensive panorama of the field, including an up-to-date review of: i) types of facial manipulation, ii) facial manipulation techniques, iii) public databases for research, and iv) benchmarks for the detection of each facial manipulation group, including key results achieved by the most representative detection approaches.
Generally speaking, most current face manipulations
CRediT authorship contribution statement
Ruben Tolosana: Conceptualization, Investigation, Writing - original draft, Writing - review & editing, Visualization, Funding acquisition. Ruben Vera-Rodriguez: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Julian Fierrez: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Aythami Morales: Conceptualization, Writing - review & editing, Visualization, Funding acquisition. Javier Ortega-Garcia: Conceptualization, Writing -
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by projects: PRIMA (H2020-MSCA-ITN-2019-860315), TRESPASS-ETN (H2020-MSCA-ITN-2019-860813), BIBECA (MINECO/FEDER RTI2018-101248-B-I00), Bio-Guard (Ayudas Fundación BBVA a Equipos de Investigación Científica 2017), and Accenture. Ruben Tolosana is supported by Consejería de Educación, Juventud y Deporte de la Comunidad de Madrid y Fondo Social Europeo.
References (200)
- Digital image integrity - a survey of protection and verification techniques, Digit. Signal Process., 2017
- Exposing video inter-frame forgery based on velocity field consistency, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
- P. Korshunov, S. Marcel, DeepFakes: a new threat to face recognition? Assessment and detection, arXiv:1812.08685...
- D. Citron, How DeepFakes undermine truth and threaten democracy, 2019, URL...
- R. Cellan-Jones, Deepfake videos double in nine months, 2019, URL...
- BBC Bitesize, Deepfakes: what are they and why would I make one?, 2019, URL...
- Digital image forensics via intrinsic fingerprints, IEEE Trans. Inf. Forensics Secur., 2008
- Image forgery detection, IEEE Signal Process. Mag., 2009
- Forensic detection of image manipulation using statistical intrinsic fingerprints, IEEE Trans. Inf. Forensics Secur., 2010
- Vision of the unseen: current trends and challenges in digital image and video forensics, ACM Comput. Surv., 2011
- An overview on video forensics, APSIPA Transactions on Signal and Information Processing
- An overview on image forensics, ISRN Signal Processing
- FaceForensics++: learning to detect manipulated facial images, Proc. IEEE/CVF International Conference on Computer Vision
- Biometric anti-spoofing methods: a survey in face recognition, IEEE Access
- Biometrics systems under spoofing attack: an evaluation methodology and lessons learned, IEEE Signal Process. Mag.
- Handbook of Biometric Anti-Spoofing (2nd edition)
- GANprintR: improved fakes and evaluation of the state-of-the-art in face manipulation detection, IEEE J. Sel. Top. Signal Process.
- On the detection of digital face manipulation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Applications of computer vision and pattern recognition to media forensics, IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Synthetic realities: deep learning for detecting audio-visual fakes, International Conference on Machine Learning
- Multimedia forensics, ACM Multimedia
- Workshop on deepfakes and presentation attacks in biometrics, IEEE Winter Conference on Applications of Computer Vision
- MultiMedia forensics in the wild, IEEE International Conference on Pattern Recognition
- Video Rewrite: driving visual speech with audio, Comput. Graph. (ACM)
- Auto-encoding variational Bayes, Proc. International Conference on Learning Representations
- Generative adversarial nets, Proc. Advances in Neural Information Processing Systems
- Digital image forgery detection based on lens and sensor aberration, Int. J. Comput. Vis.
- Exposing digital forgeries in color filter array interpolated images, IEEE Trans. Signal Process.
- Accurate detection of demosaicing regularity for digital image forensics, IEEE Trans. Inf. Forensics Secur.
- Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis, Pattern Recognit.
- Detecting recompression of JPEG images via periodicity analysis of compression artifacts for tampering detection, IEEE Trans. Inf. Forensics Secur.
- A SIFT-based forensic method for copy-move attack detection and transformation recovery, IEEE Trans. Inf. Forensics Secur.
- Splicebuster: a new blind image splicing detector, Proc. IEEE International Workshop on Information Forensics and Security
- A video forensic technique for detecting frame deletion and insertion, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing
- Detecting video speed manipulation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
- Social media and fake news in the 2016 election, Journal of Economic Perspectives
- The science of fake news, Science
- Deepfakes: trick or treat?, Bus. Horiz.
- Media forensics and DeepFakes: an overview, IEEE J. Sel. Top. Signal Process.
- Celeb-DF: a large-scale challenging dataset for DeepFake forensics, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- A style-based generator architecture for generative adversarial networks, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Facial soft biometrics for recognition in the wild: recent works, annotation and COTS evaluation, IEEE Trans. Inf. Forensics Secur.
- StarGAN: unified generative adversarial networks for multi-domain image-to-image translation, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- STGAN: a unified selective transfer network for arbitrary image attribute editing, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Face2Face: real-time face capture and reenactment of RGB videos, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Deferred neural rendering: image synthesis using neural textures, ACM Trans. Graph.
- Progressive growing of GANs for improved quality, stability, and variation, Proc. International Conference on Learning Representations
- Do GANs leave artificial fingerprints?, Proc. IEEE Conference on Multimedia Information Processing and Retrieval
- Source generator attribution via inversion, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops