Generative AI models are increasingly used to create synthetic media, including deepfakes and other manipulated media. Their applications range from innocuous memes to misleading and false content. Fact-checkers and organizations working to detect and combat such content often publish detailed reports outlining the tools and methods they use to identify deepfakes and manipulated video and audio. These include a range of automated AI tools, both proprietary and open-source.
We wanted to better understand how well open-source tools and machine learning models detect deepfakes and AI-generated content. This lets us compare their results with those of proprietary tools and services, giving a useful baseline for evaluating paid services. There are also many detection techniques, models, and tools for identifying synthetic content in video, images, and audio, and analyzing open-source tools helps us understand which techniques and models are available for deepfake detection. Models differ in their pre-training data, architecture, and the features they prioritize, so we wanted to see whether model performance correlates with the type of manipulation a model can detect. We also wanted to understand how easy it is to set up and run these FOSS models. Which frameworks, libraries, and tools are commonly used for training and inference? Do the model developers provide containerization options, such as a Dockerfile?
For our initial exploration and literature review, we began by identifying open-source projects focused on deepfake detection. We chose to look at the Deepfake-o-Meter project from the University at Buffalo, an open platform that integrates state-of-the-art methods for detecting AI-generated images, video, and audio, supporting 18 models across these modalities [1]. We also chose it because it is an active platform and therefore more likely to include the best-performing models. The authors compare Deepfake-o-Meter with other commercial and open-source projects such as Deepware, WeVerify, DuckDuckGoose, and Sensity, and also report its usage statistics.
We used this project to get a sense of the norms in open-source ML models for deepfake detection.
We documented each model's details (underlying architecture, training dataset, experimental setup, results, key conclusions, and shortcomings) in this GitHub issue: https://github.com/tattle-made/deepfake-marker/issues/2
Alongside this, we looked at the different types of deepfakes across modalities and categorized them in the figure below. This breakdown highlights the key forms of manipulation in image, video, audio, and multimodal deepfakes, and gives us further insight into their diverse applications [1][2][3].
Because we wanted to assess feasibility in terms of training, inference, and the potential for self-hosting, we did a high-level categorisation of all the models in Deepfake-o-Meter by attributes such as the framework/library used, programming language, and ease of use (easy: instructions to run the model are given and the weights are open; medium: the model needs to be trained; hard: the model needs to be trained and the training instructions are vague or unavailable).
| Model Name | Framework/Library | Language | Ease of Use | Modality |
|---|---|---|---|---|
| Nodown | Torch | Python | Easy | Image |
| GLFF | Torch | Python | Medium | Image |
| DMimageDetection | Torch | Python | Hard | Image |
| CLIP-ViT | Torch | Python | Easy | Image |
| DSP-FWA | Torch | Python | Easy | Video |
| LIPINC | TensorFlow, OpenCV | Python | Easy | Video |
| LSDA | Torch | Python | Easy (Dockerfile provided) | Video |
| RawNet2 | Torch | Python | Hard | Audio |
| RawNet2-Vocoder | Torch | Python | Easy | Audio |
| Whisper | Torch, openai-whisper | Python | Easy | Audio |
(A more detailed extension of this table can be found here)
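To make the ease-of-use rubric more concrete, here is a minimal sketch of one possible reading of it as a decision rule. It assumes a hypothetical `RepoInfo` record with three yes/no properties of a model's repository; these field names are illustrative and are not part of Deepfake-o-Meter or any of the listed models. It also assumes that "hard" means the model must be trained *and* the training instructions are vague or missing, which keeps it distinct from "medium".

```python
from dataclasses import dataclass


@dataclass
class RepoInfo:
    """Hypothetical summary of what a model's repository provides.

    Field names are illustrative assumptions, not part of Deepfake-o-Meter.
    """
    weights_released: bool       # pretrained weights are openly downloadable
    run_instructions: bool       # instructions to run inference are given
    training_instructions: bool  # clear instructions to train the model are given


def ease_of_use(repo: RepoInfo) -> str:
    """Map repository properties onto the Easy/Medium/Hard scale."""
    if repo.weights_released and repo.run_instructions:
        # Easy: the model can be run as-is with the released weights.
        return "Easy"
    if repo.training_instructions:
        # Medium: the model has to be trained first, but training is documented.
        return "Medium"
    # Hard: the model has to be trained and training instructions are vague or missing.
    return "Hard"


if __name__ == "__main__":
    print(ease_of_use(RepoInfo(True, True, False)))    # Easy
    print(ease_of_use(RepoInfo(False, False, True)))   # Medium
    print(ease_of_use(RepoInfo(False, False, False)))  # Hard
```

Checking open weights and run instructions first reflects the self-hosting focus: a model that can be run immediately is "easy" regardless of how well its training is documented.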
Of the 18 models in Deepfake-o-Meter, 6 target images, 7 target video, and 5 target audio.
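As a quick consistency check, the rows of the table above can be tallied against these per-modality totals. The sketch below transcribes the table by hand, treating LSDA's Dockerfile note as plain "Easy". It shows that the table covers 10 of the 18 models and that most of the listed models are Torch-based.

```python
from collections import Counter

# (model, framework/library, ease of use, modality), transcribed from the table above.
# LSDA's Dockerfile note is dropped and the entry is recorded simply as "Easy".
MODELS = [
    ("Nodown", "Torch", "Easy", "Image"),
    ("GLFF", "Torch", "Medium", "Image"),
    ("DMimageDetection", "Torch", "Hard", "Image"),
    ("CLIP-ViT", "Torch", "Easy", "Image"),
    ("DSP-FWA", "Torch", "Easy", "Video"),
    ("LIPINC", "TensorFlow, OpenCV", "Easy", "Video"),
    ("LSDA", "Torch", "Easy", "Video"),
    ("RawNet2", "Torch", "Hard", "Audio"),
    ("RawNet2-Vocoder", "Torch", "Easy", "Audio"),
    ("Whisper", "Torch, openai-whisper", "Easy", "Audio"),
]

# Per-modality totals for all 18 Deepfake-o-Meter models, as stated in the text.
DFOM_TOTALS = {"Image": 6, "Video": 7, "Audio": 5}

by_modality = Counter(modality for _, _, _, modality in MODELS)
by_ease = Counter(ease for _, _, ease, _ in MODELS)
# Substring check: counts rows whose framework column mentions Torch.
uses_torch = sum("Torch" in framework for _, framework, _, _ in MODELS)

if __name__ == "__main__":
    for modality, total in DFOM_TOTALS.items():
        print(f"{modality}: {by_modality[modality]} of {total} models listed")
    print("Ease of use:", dict(by_ease))
    print(f"Torch-based: {uses_torch} of {len(MODELS)}")
```

Running it prints `Image: 4 of 6 models listed`, `Video: 3 of 7 models listed`, `Audio: 3 of 5 models listed`, then `Ease of use: {'Easy': 7, 'Medium': 1, 'Hard': 2}` and `Torch-based: 9 of 10`.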
[1] DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection, https://arxiv.org/abs/2404.13146
[2] Deepfake Generation and Detection: A Benchmark and Survey, https://arxiv.org/abs/2403.17881
[3] Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset, https://arxiv.org/abs/2411.15457