Documentation
Everything you need to know about staik VOICE.
Getting started
- Click "Try for free" on the homepage, or log in with your own API key from api.staik.se.
- Drop an audio file or pick one from your device. Supported formats are mp3, wav, m4a, webm and ogg, up to 100 MB and 30 minutes per file.
- Choose language (auto, Swedish or English) and whether speaker diarization should be on.
- Click "Transcribe". Whisper-large-v3 runs on a Swedish GPU and, if diarization is on, pyannote separates the speakers. The result appears on screen.
- Copy, download as .txt, .srt, .vtt or JSON, or share directly.
Supported formats
Audio formats
- MP3
- WAV
- M4A / AAC
- WebM (Opus / Vorbis)
- OGG
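As a rough pre-upload check, the format list above can be mirrored in a lookup by file extension. This is a minimal sketch: the helper name `is_supported_audio` is hypothetical, and matching on the extension alone is an assumption, since the service may inspect the file contents instead.

```python
from pathlib import Path

# Extensions taken from the format list above. Matching by extension is an
# assumption of this sketch, not documented service behavior.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("meeting.MP3"))
print(is_supported_audio("notes.flac"))
```

The comparison is case-insensitive, so names such as `meeting.MP3` are accepted as well.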
Use cases
- Meeting notes
- Interviews and podcasts
- Lectures and seminars
- Voice memos
- Internal calls
- Research interviews
API reference
staik VOICE uses the staik API (api.staik.se/v1/audio/transcriptions) for transcription. The endpoint is compatible with OpenAI's Whisper API, so an existing OpenAI client works once its base_url points to https://api.staik.se/v1.
POST https://api.staik.se/v1/audio/transcriptions
Authentication
Bearer token in the Authorization header. Get a key at api.staik.se.
Model
whisper-large-v3
curl
curl -X POST https://api.staik.se/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -F file=@meeting.mp3 \
  -F model=whisper-large-v3 \
  -F response_format=verbose_json \
  -F diarize=true \
  -F language=sv
Python (openai SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.staik.se/v1",
)

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        file=f,
        model="whisper-large-v3",
        response_format="verbose_json",
        extra_body={"diarize": True},  # extra field forwarded to the staik API
    )

# Recent openai SDK versions return segments as objects, so read fields as attributes.
for seg in transcript.segments:
    speaker = getattr(seg, "speaker", "?")
    print(f"[{speaker} {seg.start:.1f}s] {seg.text}")
Response
The response follows OpenAI's verbose_json format with extra fields for speaker per segment and word-level timestamps.
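To illustrate how the per-segment fields can be used, the sketch below turns a list of segments into SRT subtitle text. It assumes the segments are plain dicts parsed from the raw JSON body, with `start`, `end` and `text` as in OpenAI's verbose_json format and an optional `speaker` key. The helper names and the sample data are made up for illustration and are not real API output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues, prefixing the speaker if present."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speaker")  # assumed field name for the speaker label
        if speaker:
            text = f"{speaker}: {text}"
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up sample data for illustration only.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Welcome everyone.", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.0, "text": " Thanks.", "speaker": "SPEAKER_01"},
]
print(segments_to_srt(sample))
```

Segments without a speaker value are written as plain text, so the same function should also work when diarization is off.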
Limitations
- MVP: sync mode accepts at most 100 MB per file and 30 minutes of audio.
- Longer files (>30 min) will be handled by an async mode planned for the next stage.
- Pricing model: 1 minute of audio = 1,000 tokens.
- Speaker diarization needs at least two distinct voices to work well.
- Whisper-large-v3 supports 99 languages; Swedish and English are primarily tested.
- Quality depends on recording — avoid heavy background noise and overlapping speech.
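The size, duration and pricing figures in the list above can be combined into a quick client-side estimate. This is a sketch under stated assumptions: 1 MB is taken as 1024 × 1024 bytes, partial minutes are billed pro rata and rounded up to a whole token, and the function names are hypothetical. The actual billing rules may differ.

```python
import math

MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB, assuming 1 MB = 1024 * 1024 bytes
MAX_SYNC_SECONDS = 30 * 60          # 30 minutes of audio in sync mode
TOKENS_PER_MINUTE = 1_000           # 1 minute of audio = 1,000 tokens


def fits_sync_limits(size_bytes: int, duration_seconds: float) -> bool:
    """Check a file against both sync-mode limits."""
    return size_bytes <= MAX_FILE_BYTES and duration_seconds <= MAX_SYNC_SECONDS


def estimate_tokens(duration_seconds: float) -> int:
    """Pro-rata token estimate; rounding up is an assumption of this sketch."""
    return math.ceil(duration_seconds * TOKENS_PER_MINUTE / 60)


print(fits_sync_limits(40 * 1024 * 1024, 25 * 60))
print(estimate_tokens(90))
```

Under these assumptions a 90-second clip comes to 1,500 tokens, and a file at the 30-minute limit to 30,000 tokens.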
Plans and pricing
The demo account is free for short clips. With your own key you get more tokens and can transcribe longer files. See all plans at api.staik.se.
Tips for best results
- Record in a quiet environment — minimize background noise.
- Place the microphone so all participants are heard equally.
- Use an external microphone for longer meetings — better quality means better transcription.
- 16 kHz mono is enough — higher sample rates don't add quality.
- For diarization: avoid people talking over each other.
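To see whether a WAV recording already matches the 16 kHz mono tip above, its header can be read with Python's standard `wave` module. This is a minimal sketch for uncompressed WAV files only. The helper names are hypothetical, and the silent test file is generated purely for illustration.

```python
import os
import tempfile
import wave


def wav_format(path: str) -> tuple[int, int]:
    """Return (sample_rate_hz, channels) read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path: str) -> bool:
    """True if the file is sampled at 16 kHz with a single channel."""
    return wav_format(path) == (16_000, 1)


# Write one second of 16-bit silence as a stand-in for a real recording.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16_000)
        out.writeframes(b"\x00\x00" * 16_000)
    print(wav_format(path), is_16k_mono(path))
```

Other formats such as mp3 or m4a need a decoder to inspect, which is outside the scope of this sketch.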