About

Architecture and flow for the prediction-ready voice deepfake detection system.

What this does

You upload an audio file. The backend loads it with torchaudio, resamples to 16 kHz, normalizes, and trims or pads to 3 seconds (48,000 samples). RawNetLite plus our meta-learning layer (the research golden point) runs on the raw waveform and outputs a single probability, P(fake). We threshold at 0.5 to label the clip as real or fake. The frontend stores each result in browser history so you can review past runs.
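Two of the preprocessing steps above, peak normalization and trim/pad to a fixed 3-second window, can be sketched in plain Python. This is illustrative only: the real backend operates on torchaudio tensors, and the function names here (peak_normalize, fix_length) are hypothetical, not the actual API.

```python
# Sketch of two preprocessing steps: peak-normalize, then trim or
# zero-pad to a fixed 3-second window at 16 kHz.
# Pure-Python lists stand in for the torchaudio tensors the backend uses.

TARGET_SR = 16_000          # backend resamples to 16 kHz
TARGET_LEN = 3 * TARGET_SR  # 3 seconds -> 48,000 samples

def peak_normalize(samples):
    """Scale so the loudest sample has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]

def fix_length(samples, target_len=TARGET_LEN):
    """Trim to target_len samples, or zero-pad on the right."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))
```

After these steps every clip, regardless of its original length, reaches the model as a fixed-size 48,000-sample waveform.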

Our golden point: meta-learning layer

Our main research contribution is a meta-learning layer on top of the RawNetLite encoder. It enables quick adaptation to new domains or attack types with few examples, improving generalization to unseen deepfakes. See the Model page for details.

Frontend architecture

Next.js app with routes: Home, Detect, Process, Model, API docs, Get started, Mission, Results, Status, FAQ, About. All prediction and health calls go through Next.js API routes (/api/predict, /api/health), which proxy to the Flask backend. Types and schema live in src/types/prediction.ts. The results store (src/lib/results-store.ts) persists to localStorage. We provide production-level API documentation and a get-started guide for local setup and deployment. Framer Motion is used for section and list animations.

API & open source

We are open-sourcing the project and providing API access. See API docs for endpoints, request/response schemas, and code examples. An API playground is planned for in-browser testing. Get started covers local setup and model download. Deployment covers local hosting, hosting on a VPS (including where to get a server), Nginx, SSL, and security (secrets, HTTPS, firewall, rate limiting).

Backend (Flask)

Flask serves /health and /predict. Audio is loaded with torchaudio.load(), resampled to 16 kHz, converted to mono, peak-normalized, and trimmed/padded to 3 seconds before inference. The model returns { probability, label }. Ensure the model file is in the configured MODEL_ROOT and that the backend is running; Detect and Status depend on both.
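As a rough illustration of the { probability, label } shape the backend returns, a successful /predict call might look like the following (the field values are made up for this example; see API docs for the authoritative schema):

```json
{
  "probability": 0.91,
  "label": "fake"
}
```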

Pipeline in short

Upload → temp file → load waveform + sample rate → mono → resample 16 kHz → normalize → (1, 48000) → RawNetLite → logit → P(fake), label → JSON → frontend. For a step-by-step breakdown with code snippets and librosa comparison, see Process. For model architecture and training (augmentation, cross-domain, focal loss), see Model.
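The last steps of the pipeline, logit → P(fake) → label, amount to a sigmoid followed by the 0.5 threshold. A minimal sketch in plain Python (the function name is hypothetical; the real backend does this on a torch tensor):

```python
import math

def logit_to_prediction(logit, threshold=0.5):
    """Squash a raw model logit into P(fake), then apply the decision threshold."""
    p_fake = 1.0 / (1.0 + math.exp(-logit))
    label = "fake" if p_fake >= threshold else "real"
    return {"probability": p_fake, "label": label}
```

This is also where the JSON payload sent back to the frontend gets its shape.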

Desired end-to-end flow

1) User selects audio on Detect.
2) Frontend POSTs the file to /api/predict.
3) Next.js forwards it to Flask /predict.
4) Backend preprocesses (load, resample, normalize, trim/pad), runs the model, and returns probability and label.
5) Frontend displays the result and appends it to the results store.
6) User can view history on Results, check backend health on Status, and read Process and Model for details.