Whisper on Swedish hardware

Record.
Transcribe.
Who said what.

Audio to text with speaker diarization. Whisper-large-v3 runs on dedicated GPU hardware in Sweden. Your audio never leaves the country.

Why staik VOICE?

The Whisper model runs on dedicated GPU hardware in Sweden. No audio is sent outside the country and nothing is used for AI training.

Pyannote 3.1 separates speakers automatically, ideal for meetings, interviews and podcasts. Toggle it on or off as needed.

Compatible with OpenAI's /v1/audio/transcriptions endpoint. Just swap base_url in your existing client. Supports mp3, wav, m4a, webm and ogg.
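As a rough illustration of what "swap base_url" means in practice, the sketch below builds an OpenAI-style multipart request with nothing but the Python standard library. The base URL, model identifier and API key are placeholders and assumptions, not values taken from this page; check the service's own documentation for the real ones.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumptions (placeholders, not from the product page):
BASE_URL = "https://api.example.com/v1"  # hypothetical base URL
MODEL = "whisper-large-v3"               # assumed model identifier
API_KEY = "YOUR_API_KEY"                 # placeholder credential


def build_transcription_request(audio_name, audio_bytes, base_url=BASE_URL,
                                model=MODEL, response_format="json"):
    """Build (but do not send) a multipart POST to {base_url}/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, as in the OpenAI transcription endpoint.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f'\r\n\r\n{value}\r\n'.encode()
        )
    # The audio file itself.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{audio_name}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path):
    """Send the audio file and return the parsed JSON response."""
    p = Path(path)
    req = build_transcription_request(p.name, p.read_bytes())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If you already use the official OpenAI Python SDK, the equivalent change is passing `base_url` (and your key) to the `OpenAI(...)` client constructor and calling `client.audio.transcriptions.create(...)` as before.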

From audio file to diarized text in seconds.

Drop an audio file or pick from your device. Supports mp3, wav, m4a, webm and ogg up to 100 MB.

The WhisperX pipeline transcribes with Whisper large-v3 on GPUs in Sweden and produces word-level timestamps, while pyannote assigns each stretch of speech to a speaker.
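To give a feel for how word-level timestamps and speaker turns can be combined, here is a simplified sketch that labels each word with the speaker turn it overlaps most. It is an illustration under assumed data shapes (the `Word` class and the `(start, end, label)` turn tuples are hypothetical), not the actual WhisperX or pyannote code, which may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A speaker turn as (start_seconds, end_seconds, speaker_label). Assumed shape.
Turn = Tuple[float, float, str]


@dataclass
class Word:
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Give each word the label of the turn with the largest time overlap."""
    for word in words:
        best_label, best_overlap = None, 0.0
        for start, end, label in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(word.end, end) - max(word.start, start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        # Words that overlap no turn keep speaker=None.
        word.speaker = best_label
    return words
```

A word that straddles a speaker change goes to whichever turn covers more of it, which is a simple and common tie-breaking choice.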

Get plain text, diarized text, SRT, VTT or JSON with word-level timestamps and speaker labels.
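If you request JSON and want to build your own subtitles, the conversion is short. The sketch below assumes each word arrives as a dict with `word`, `start`, `end` and `speaker` keys; that schema is a guess for illustration, so adapt the field names to the actual response.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words):
    """Merge consecutive words from the same speaker into numbered SRT cues."""
    cues = []
    for w in words:
        if cues and cues[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current cue.
            cues[-1]["end"] = w["end"]
            cues[-1]["text"] += " " + w["word"]
        else:
            # Speaker changed (or first word): start a new cue.
            cues.append({"speaker": w["speaker"], "start": w["start"],
                         "end": w["end"], "text": w["word"]})
    blocks = []
    for i, c in enumerate(cues, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(c['start'])} --> {srt_timestamp(c['end'])}\n"
            f"[{c['speaker']}] {c['text']}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)
```

In real use you would probably also split long cues by duration or character count; this version only splits on speaker changes to keep the idea visible.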