Architecture and end-to-end flow for the voice deepfake detection system, from upload to verdict.
You upload an audio file. The backend loads it, resamples it, and slices it into 3-second chunks. Each chunk is converted into a 288-dimensional feature vector and passed through a prototypical meta-learning model that compares it against Real/Fake prototypes. We aggregate the chunk votes into a file-level probability P(fake) and a label. The frontend stores each result in browser history so you can review past runs.
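For intuition, here is a minimal sketch of the aggregation step. The plain mean over chunk probabilities, the 0.5 decision threshold, and the output keys are assumptions; the actual backend may weight or threshold votes differently.

```python
from statistics import mean

def aggregate_chunks(chunk_probs: list[float], threshold: float = 0.5) -> dict:
    """Combine per-chunk P(fake) scores into a file-level verdict.

    chunk_probs holds one P(fake) per 3-second chunk. A plain mean and a
    0.5 threshold are assumed here; the real backend may differ.
    """
    p_fake = mean(chunk_probs)
    return {
        "probability_fake": p_fake,
        "label": "fake" if p_fake >= threshold else "real",
        "chunk_votes": [p >= threshold for p in chunk_probs],
    }

# Example: three chunks lean fake, one leans real -> the file is labeled fake.
print(aggregate_chunks([0.91, 0.78, 0.66, 0.32]))
```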
Our main research contribution is a meta-learning layer on top of a prototypical encoder. It enables quick adaptation to new domains or attack types with few examples, improving generalization to unseen deepfakes. See the Model page for details.
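As a rough illustration of the prototype comparison, here is a NumPy sketch of standard prototypical-network scoring. The squared Euclidean distance and the softmax over negative distances are assumptions; the encoder, distance metric, and meta-learning loop are described on the Model page.

```python
import numpy as np

def prototypical_predict(embedding: np.ndarray,
                         proto_real: np.ndarray,
                         proto_fake: np.ndarray) -> float:
    """Return P(fake) for one chunk embedding.

    Prototypes are the class means of support-set embeddings (the usual
    prototypical-network setup); squared Euclidean distance and a softmax
    over negative distances are assumed here.
    """
    d_real = np.sum((embedding - proto_real) ** 2)
    d_fake = np.sum((embedding - proto_fake) ** 2)
    logits = np.array([-d_real, -d_fake])
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    p_real, p_fake = exp / exp.sum()
    return float(p_fake)

# Toy run with random 288-dim vectors standing in for real embeddings.
rng = np.random.default_rng(0)
emb, proto_r, proto_f = (rng.normal(size=288) for _ in range(3))
print(prototypical_predict(emb, proto_r, proto_f))
```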
The frontend is a Next.js app with routes: Home, Detect, Process, Model, API docs, Get started, Mission, Results, Status, FAQ, About. All prediction and health calls go through Next.js API routes (/api/predict, /api/health), which proxy to the Flask backend. Types and schema live in src/types/prediction.ts. The results store (src/lib/results-store.ts) persists to localStorage. We provide production-grade API documentation and a get-started guide for local setup and deployment. Framer Motion is used for section and list animations.
We are open-sourcing the project and providing API access. See the API docs for endpoints, request/response schemas, and code examples. An API playground is planned for in-browser testing. Get started covers local setup and model download. Deployment covers hosting locally, hosting on a VPS (including where to get a server), Nginx, SSL, and security (secrets, HTTPS, firewall, rate limiting).
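A minimal Python client sketch against the two documented endpoints. The base URL, the multipart field name "file", and the response keys are assumptions; the API docs carry the authoritative schema.

```python
import requests

BASE = "http://localhost:3000"  # Next.js server; adjust for your deployment

# Health check before uploading (proxied to the backend).
print(requests.get(f"{BASE}/api/health", timeout=10).json())

# Upload an audio file for prediction. The field name "file" and the
# response keys below are assumptions; consult the API docs.
with open("sample.wav", "rb") as f:
    resp = requests.post(f"{BASE}/api/predict", files={"file": f}, timeout=120)
resp.raise_for_status()
result = resp.json()
print(result.get("label"), result.get("probability"))
```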
The production backend is a Flask service (with an optional Docker stack including Gunicorn + Nginx). Audio is loaded from a temporary file, standardized, chunked, and sent through the prototypical meta-learning model described on the Model page. The API returns both the aggregated verdict and per-chunk details; the Next.js API routes normalize this for the UI.
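A stripped-down sketch of what that route could look like. run_pipeline is a hypothetical stand-in, and the field and key names are assumptions; the real service adds validation, logging, and the per-chunk detail mentioned above.

```python
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

def run_pipeline(path: str) -> dict:
    """Hypothetical stand-in for the real pipeline: load -> chunk ->
    288-dim features -> prototypical model -> aggregated verdict."""
    return {"label": "real", "probability_fake": 0.02, "chunks": []}

@app.post("/predict")
def predict():
    # Multipart field name "file" is an assumption; see the API docs.
    upload = request.files["file"]
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        upload.save(path)            # write the upload to a temporary file
        result = run_pipeline(path)  # verdict + per-chunk details
    finally:
        os.remove(path)
    return jsonify(result)
```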
Upload → temp file → load waveform + sample rate → mono → resample → 3 s chunks → 288‑dim features → prototypical meta-learning → votes + aggregates → JSON → frontend. For a step-by-step breakdown with code snippets, see Process. For model architecture and training (features, prototypes, aggregation), see Model.
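A condensed sketch of the preprocessing stage. librosa for loading, a 16 kHz target rate, and zero-padding of the final partial chunk are assumptions; the 288-dim feature extractor itself is covered on the Model page.

```python
import numpy as np
import librosa

TARGET_SR = 16_000   # assumed target sample rate
CHUNK_SECONDS = 3

def load_and_chunk(path: str) -> list[np.ndarray]:
    """Load audio, force mono at the target rate, and split into 3 s chunks."""
    # librosa.load resamples to sr and downmixes to mono when mono=True.
    wav, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    chunk_len = TARGET_SR * CHUNK_SECONDS
    # Zero-pad the tail so the last partial chunk is kept (an assumption;
    # the backend may instead trim or drop short tails).
    pad = (-len(wav)) % chunk_len
    wav = np.pad(wav, (0, pad))
    return [wav[i:i + chunk_len] for i in range(0, len(wav), chunk_len)]
```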
1) The user selects audio on Detect. 2) The frontend POSTs the file to /api/predict. 3) Next.js forwards the request to Flask /predict. 4) The backend preprocesses the audio (load, resample, normalize, trim/pad), runs the model, and returns a probability and label. 5) The frontend displays the result and appends it to the results store. 6) The user can view history on Results, check backend health on Status, and read Process and Model for details.