AI transcription · 99 languages · self-hosted

Audio to text, in any language.

Drop an audio or video file and get a clean, timestamped transcript. Powered by an open-source AI model running on our own servers. Free. No signup needed.

200 MB / file · txt · srt · vtt · json · files purged in 24h · no third-party API

Audio + video · up to 200 MB · 99 languages · files purged in 24h

Word-level timestamps

Every transcript ships with ready-to-embed subtitles (SRT and WebVTT) alongside plain text and JSON.

99 languages, 1 engine

Auto-detect by default. Switch to translate mode and get any source language → English in one pass.

Self-hosted, private

No OpenAI, no Google, no third-party API. Files are deleted within 24 hours. No model training.

How it works

Three steps, no upsell.

  1. Drop a file

    Audio or video, up to 200 MB. mp3, wav, m4a, mp4, mov, webm — all accepted.

  2. Pick options

    Choose a language (or auto-detect), a model size, and whether to translate to English.

  3. Download

    Plain text, SRT, WebVTT, or full JSON with timestamps — all generated in one pass.

Self-hosted AI

Your audio doesn’t leave our servers.

SpeedScribe runs an open-source speech-recognition model on our own infrastructure — no calls to OpenAI, Google, or any other third-party API. We delete uploaded files automatically and never use them for model training.

  • Word-level timestamps · synced subtitles ready to embed.
  • 99 languages with auto-detect, plus a translate-to-English mode.
  • Guest files deleted within 24h; deleting your account wipes everything within 7 days.
  • Output: plain text, SRT, WebVTT, full JSON — all four every time.
speedscribe · job_a3f12c · done

interview.m4a · 22:14 · 21.8 MB

[00:00:04] “So we ran the experiment three times — every run gave us the same outcome.”

[00:00:11] “That’s the part that surprised everyone on the team.”

.txt · .srt · .vtt · .json

Languages

99 languages. One engine.

Auto-detect on by default. Pin a language only if the engine guesses wrong. Translate mode forces output to English from any source.

  • English
  • Русский
  • Español
  • Français
  • Deutsch
  • Italiano
  • Português
  • Polski
  • Nederlands
  • Türkçe
  • العربية
  • हिन्दी
  • 中文
  • 日本語
  • 한국어
  • Tiếng Việt
  • ไทย
  • Bahasa Indonesia
  • Українська
  • Čeština
  • Dansk
  • Suomi
  • Norsk
  • + 76 more

Made for

People who ship text.

Podcasts

Generate show-notes and chapter timestamps in minutes.

Meetings

Drop a Zoom recording, get searchable notes.

YouTube subtitles

Export VTT and upload — no third-party caption service.

Interviews

Transcribe research interviews while keeping files private.

Lectures

Turn class recordings into study material.

Voice notes

Keep dictated notes searchable and exportable.

FAQ

Frequently asked.

Is SpeedScribe really free?

Yes. Guests get a few free transcriptions and ~30 minutes of audio total. Registered users get 30 jobs/day and ~2 hours of audio/day, no credit card.

How accurate is it?

On clear English audio, it's very accurate: a word error rate (WER) in the low single digits on the larger models. Noisy recordings, strong accents, and overlapping speakers are harder, so pick a bigger model for those.
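WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference: WER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted words and N is the reference length. The sketch below is a generic textbook implementation, not SpeedScribe's own evaluation code; it splits on whitespace and does no case or punctuation normalization, which published benchmarks usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "that's the part that surprised everyone on the team"
    hyp = "that's the part that surprised everyone in the team"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Swapping "on" for "in" is one substitution over nine reference words, roughly 11% WER. Because insertions count too, WER can exceed 100% when the output contains many extra words.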

Which file formats are supported?

Common audio (mp3, wav, m4a, ogg, opus, flac) and video (mp4, mov, webm) up to 200 MB. We extract the audio internally; you don't have to convert anything beforehand.
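If you script your uploads, a cheap pre-check can catch obviously unsupported files before you spend bandwidth on them. The helper below is hypothetical (`precheck_upload` is our own name, not part of any SpeedScribe API): it only looks at the file extension and size, using the formats and 200 MB limit listed above, and it assumes "200 MB" means 200 × 1024 × 1024 bytes. The server stays the authority on what it actually accepts, and it may inspect file contents rather than trust the name.

```python
from pathlib import Path

# Extensions from the FAQ answer above; the server may accept more than this.
ACCEPTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".ogg", ".opus", ".flac",  # audio
    ".mp4", ".mov", ".webm",                           # video
}
# Assumption: the 200 MB limit is measured in binary megabytes (MiB).
MAX_BYTES = 200 * 1024 * 1024


def precheck_upload(filename: str, size_bytes: int) -> list[str]:
    """Hypothetical client-side check; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    print(precheck_upload("interview.m4a", 21_800_000))
    print(precheck_upload("notes.txt", 1_024))
```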

Does it support languages other than English?

Yes — 99 languages. Auto-detect is on by default. There's also a translate mode that forces output to English from any source language.

What output formats do I get?

Every job emits four formats: plain text (.txt), SRT subtitles (.srt), WebVTT (.vtt), and a full JSON with segment-level timestamps. You can download any of them.

Do you keep my files?

Guest uploads are deleted within 24 hours. Logged-in users keep their history for 30 days and can delete it manually from the dashboard at any time.

Do you train models on my audio?

No. Your audio is used only to fulfil your transcription job. We don't repurpose it.

How long does it take?

Roughly 0.3–0.6× the audio length on the default model, so a 30-minute file takes about 9–18 minutes and a 5-minute file about 1.5–3 minutes.
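The timing estimate is plain multiplication: audio length times the 0.3–0.6× factor quoted for the default model. A small helper makes the range explicit; `estimate_processing_minutes` is a hypothetical name, and its default factors come from this FAQ answer rather than any guarantee, since queue length and model size will shift real numbers.

```python
def estimate_processing_minutes(
    audio_minutes: float,
    low_factor: float = 0.3,   # fastest case quoted for the default model
    high_factor: float = 0.6,  # slowest case quoted for the default model
) -> tuple[float, float]:
    """Return the (fastest, slowest) expected processing time in minutes."""
    if audio_minutes < 0:
        raise ValueError("audio length cannot be negative")
    return audio_minutes * low_factor, audio_minutes * high_factor


if __name__ == "__main__":
    for minutes in (5, 30):
        fast, slow = estimate_processing_minutes(minutes)
        print(f"{minutes}-minute file: about {fast:.1f}-{slow:.1f} minutes")
```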

Want history? Make an account.

Free forever. Email is the only thing we ask for.
