Model & training

RawNetLite + meta-learning layer. Key numbers, architecture, and how we train for robust, cross-domain detection.

Key metrics

Representative performance from our evaluation setup. Actual numbers depend on dataset and evaluation protocol.

~94%

Accuracy (in-domain)

Typical on held-out same-domain data

< 8%

EER (cross-domain)

Equal error rate on unseen domains

> 0.96

AUC

Area under ROC curve

< 100 ms

Latency (CPU)

Per 3 s clip, single request

Model card

Input
Raw waveform, 16 kHz, 3 s, mono
Output
P(fake), label (real / fake)
Framework
PyTorch, RawNetLite + meta-learning
Limitations
3 s analysis window; performance may degrade on very low-quality audio or unseen attack types

RawNetLite architecture

Lightweight 1D CNN on the raw waveform—no mel spectrograms or hand-crafted features. Learns directly from samples; "Lite" = fewer parameters for fast inference.

Input

(batch, 1, 48000) @ 16 kHz, 3 s

Backbone

1D conv blocks, no RNN

Output

Single logit → P(fake)
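The shape contract above can be sketched as a minimal PyTorch module. This is an illustrative stand-in, not the checkpoint's exact architecture: the layer counts, channel widths, kernel sizes, and strides here are hypothetical; only the input shape (batch, 1, 48000) and the single-logit output follow the spec above.

```python
import torch
import torch.nn as nn

class RawNetLiteSketch(nn.Module):
    """Illustrative RawNetLite-style raw-waveform classifier.
    Layer sizes are hypothetical; the real model may differ."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            # Strided 1D convs downsample the 48000-sample waveform, no RNN.
            nn.Conv1d(1, 16, kernel_size=7, stride=4, padding=3),
            nn.BatchNorm1d(16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=4, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=4, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)  # (batch, 64, T) -> (batch, 64, 1)
        self.head = nn.Linear(64, 1)         # single logit

    def forward(self, x):                    # x: (batch, 1, 48000)
        z = self.pool(self.backbone(x)).squeeze(-1)
        return self.head(z)                  # (batch, 1); sigmoid -> P(fake)

model = RawNetLiteSketch().eval()
logit = model(torch.randn(2, 1, 48000))
p_fake = torch.sigmoid(logit)
```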

Meta-learning layer (our key contribution)

A meta-learner on top of RawNetLite embeddings that adapts quickly to new domains or attack types with few examples—our main research contribution.

  • Adapts with minimal extra data
  • Improves cross-domain and few-shot performance
  • Stays effective as new deepfakes appear
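One simple way to realize "adapts with minimal extra data" is to freeze the backbone and fine-tune only a small head on a few-shot support set from the new domain. This is a hedged sketch of that idea; the project's actual meta-learning procedure (e.g. a MAML/Reptile-style inner loop) is not specified here, and `embed_fn`, `adapt_head`, and all sizes are hypothetical.

```python
import torch
import torch.nn as nn

def adapt_head(embed_fn, head, support_x, support_y, steps=5, lr=1e-2):
    """Few-shot adaptation sketch: fine-tune only the classification head
    on a handful of labeled clips, keeping backbone embeddings frozen.
    Hypothetical helper; not the project's exact meta-learning algorithm."""
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    with torch.no_grad():
        z = embed_fn(support_x)              # precompute frozen embeddings
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(z).squeeze(-1), support_y)
        loss.backward()
        opt.step()
    return head

# Toy usage with a stand-in 64-dim embedding function:
embed_fn = lambda x: x.mean(dim=-1)          # placeholder for the backbone
head = nn.Linear(64, 1)
support_x = torch.randn(8, 64, 100)          # 8 support clips (new domain)
support_y = torch.tensor([0., 0., 0., 0., 1., 1., 1., 1.])
adapted = adapt_head(embed_fn, head, support_x, support_y)
```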

Why it matters

New generators appear constantly. Meta-learning lets the system adapt without full retraining.

Training methodology

Checkpoint: augmented_triple_cross_domain_focal_rawnet_lite.pt

Augmentation

Noise, time stretch, codec simulation for robustness.
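Of the three augmentations, additive noise is the simplest to sketch: scale white noise so the mix hits a target signal-to-noise ratio. Time stretch and codec simulation would be layered on similarly (e.g. via torchaudio). The function name and SNR values are illustrative, not taken from the project's code.

```python
import torch

def add_noise(wave, snr_db=15.0):
    """Additive white noise at a target SNR in dB (illustrative sketch).
    Scales noise so 10*log10(signal_power / noise_power) == snr_db."""
    signal_power = wave.pow(2).mean()
    noise = torch.randn_like(wave)
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise

clip = torch.randn(1, 48000)            # 3 s @ 16 kHz
noisy = add_noise(clip, snr_db=10.0)    # snr_db=10 is an example value
```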

Triple / multi-domain

Multiple datasets and synthesis methods.

Cross-domain eval

EER, AUC on unseen domains.

Focal loss

Focus on hard examples, better calibration.
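The binary focal loss (Lin et al., 2017) down-weights easy examples by a factor of (1 − p_t)^γ so gradients concentrate on hard ones. A self-contained sketch follows; the γ = 2.0 and α = 0.25 defaults are the common literature values, not necessarily those used for this checkpoint.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: alpha_t * (1 - p_t)^gamma * BCE, averaged.
    gamma/alpha are common defaults, not this project's confirmed values."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = binary_focal_loss(logits, targets)
```

With γ = 0 and α = 0.5 the expression reduces to half of plain BCE, which is a handy sanity check.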

Training loop

  • Preprocess (16 kHz, 3 s, mono, normalize)
  • Augment per epoch → forward → focal loss → backprop
  • Validate on held-out and cross-domain sets → save best checkpoint
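The loop above can be sketched for a single epoch. Everything here is a stand-in: the tiny model and the `augment` callable are toys, and plain BCE substitutes for the focal loss to keep the sketch self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, augment, device="cpu"):
    """One-epoch sketch: augment -> forward -> loss -> backprop.
    Plain BCE stands in for the focal loss used in the real training."""
    model.train()
    total = 0.0
    for wave, label in loader:               # wave: (batch, 1, 48000)
        wave, label = augment(wave).to(device), label.to(device)
        optimizer.zero_grad()
        logit = model(wave).squeeze(-1)
        loss = F.binary_cross_entropy_with_logits(logit, label)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)

# Toy run on random data with a tiny stand-in model:
model = nn.Sequential(nn.Conv1d(1, 4, 9, stride=16), nn.AdaptiveAvgPool1d(1),
                      nn.Flatten(), nn.Linear(4, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.randn(2, 1, 48000), torch.tensor([0.0, 1.0]))]
avg_loss = train_epoch(model, loader, opt,
                       augment=lambda w: w + 0.01 * torch.randn_like(w))
```

After each epoch you would evaluate on the held-out and cross-domain validation sets and keep the best checkpoint, as listed above.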

Datasets

Typical sources: ASVspoof (LA, PA, DF), ADD, in-the-wild; real from VCTK/LibriSpeech; fake from TTS/VC systems. Triple cross-domain = multiple domains for train and eval.

Inference

Load the .pt checkpoint with torch.load, apply the same preprocessing as training (resample → mono → normalize → 3 s crop/pad), then run one forward pass: logit → sigmoid → P(fake) and label. No augmentation at inference.
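The preprocessing chain can be sketched as below. Assumptions: the stride-based resample is a crude placeholder (a real pipeline would use torchaudio.transforms.Resample), peak normalization is assumed (the exact normalization used in training is not specified), and the model-loading lines are left as comments since the checkpoint's serialization format is not stated here.

```python
import torch

def preprocess(wave, sr, target_sr=16000, seconds=3):
    """Inference preprocessing sketch: mono -> 16 kHz -> normalize -> 3 s.
    Naive resampling and peak normalization are assumptions, not the
    project's confirmed pipeline."""
    if wave.dim() == 2 and wave.size(0) > 1:        # (channels, n) -> mono
        wave = wave.mean(dim=0, keepdim=True)
    if sr != target_sr:                             # crude nearest-sample resample
        n_out = int(wave.size(-1) * target_sr / sr)
        idx = torch.linspace(0, wave.size(-1) - 1, n_out).long()
        wave = wave[..., idx]
    wave = wave / wave.abs().max().clamp_min(1e-9)  # peak normalize
    n = target_sr * seconds
    if wave.size(-1) < n:                           # pad short clips
        wave = torch.nn.functional.pad(wave, (0, n - wave.size(-1)))
    return wave[..., :n].unsqueeze(0)               # (1, 1, 48000)

# Hypothetical usage with the checkpoint named in the training section:
# model = torch.load("augmented_triple_cross_domain_focal_rawnet_lite.pt",
#                    map_location="cpu")
# model.eval()
x = preprocess(torch.randn(2, 32000), sr=8000)      # stereo, 8 kHz, 4 s clip
# with torch.no_grad():
#     p_fake = torch.sigmoid(model(x)).item()
# label = "fake" if p_fake >= 0.5 else "real"
```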