Kamal Sankarraj

How to transcribe any audio or video file with AI

PandaStudio is a video editor, so people are sometimes surprised it transcribes loose audio files too. It does. A podcast mp3, a Zoom mp4, a voice memo m4a, a lecture mov — anything with an audio track goes in, and a clean transcript (or an SRT / VTT subtitle file) comes out. It runs entirely on your machine: nothing is uploaded, there's no per-minute fee, and it works offline once the model is cached.

There are two ways to do it. The easy one is to ask your AI assistant in plain English. The scriptable one is four terminal commands. Both are below.

The easy way: ask your AI assistant

If you already use ChatGPT or Claude, you can hand the whole job to it. PandaStudio ships an official Skill that teaches any AI agent its full command surface, so the agent knows exactly how to drive a transcription and how to format the result.

  1. Install PandaStudio. Download the free app from writepanda.ai. The transcription engine is bundled, so there's nothing else to set up.
  2. Connect your AI assistant. Follow the one-time setup guide for your tool: ChatGPT (Codex), Claude Desktop, Claude Code, Cursor, OpenClaw, or Hermes Agent. Each takes about five minutes.
  3. Ask it. In a normal chat, type something like:

"Transcribe /Users/me/Desktop/interview.mp3 and give me an SRT subtitle file."

"Turn lecture.mp4 into a plain-text transcript with speaker paragraphs."

"Make a .vtt for podcast-ep12.wav so I can upload captions to YouTube."

The agent does the rest: it wraps your file in a PandaStudio project, runs the transcription, reads back the word-level timestamps, and formats them into exactly the file you asked for. Subtitles come back as valid SRT or VTT, ready to drop into YouTube, Premiere, or any player. You never see a command line.

The scriptable way: four CLI commands

Prefer to script it, or need to batch a folder of files? PandaStudio ships a CLI on npm. Install it once:

npm install -g @writepanda/cli

Then going from a file to a transcript takes four commands. PandaStudio is project-based, so the first command wraps your file in a project. That's the one quirk, but it's a single line, and only the file's audio track is transcribed:

# 1. Wrap the file in a project (creates it, adds your audio/video as a clip)
ID=$(pandastudio project.new --name="Interview" \
  --withMedia="/abs/path/interview.mp3" --json | jq -r '.data.id')

# 2. Transcribe. The model (~470 MB) auto-downloads on your first ever run.
JOB=$(pandastudio transcript.transcribe --id=$ID --json | jq -r '.data.jobId')

# 3. Wait for the job to finish.
pandastudio job.wait --id=$JOB --json

# 4. Get the transcript — words + segments, each with millisecond timestamps.
pandastudio transcript.get --id=$ID --json

Step 4 returns JSON with both word-level and segment-level timestamps. For plain text, pull segments[].text. For subtitles, the timestamps are already there in milliseconds, so converting to SRT or VTT is a short script — or just hand the JSON to any LLM and ask it to format the subtitle file for you (which is exactly what the "easy way" above does under the hood).
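As a rough idea of what that short script can look like, here is a minimal sketch as a bash function built on jq, which the commands above already use. It is not part of PandaStudio, and the function name transcript_to is hypothetical. It assumes the segments sit at .data.segments (by analogy with .data.id and .data.jobId above) and that each segment has text plus start and end in milliseconds. Those field names are guesses, so check the real output of transcript.get and adjust the jq paths if they differ.

```shell
# Sketch only. Requires bash and jq.
# ASSUMPTION: transcript.get --json puts segments at .data.segments, and each
# segment has .text plus .start/.end in milliseconds. Adjust paths if not.
# Usage: <json on stdin> | transcript_to txt|srt|vtt
transcript_to() {
  local fmt="$1"
  jq -r --arg fmt "$fmt" '
    # Zero-pad a non-negative integer to 2 or 3 digits (never truncates).
    def pad2: tostring | if length < 2 then "0" + . else . end;
    def pad3: tostring | if length == 1 then "00" + . elif length == 2 then "0" + . else . end;
    # Milliseconds to HH:MM:SS plus a separator and mmm (SRT uses a comma, VTT a period).
    def ts($sep): floor as $t
      | "\(($t / 3600000) | floor | pad2):\((($t % 3600000) / 60000) | floor | pad2):\((($t % 60000) / 1000) | floor | pad2)\($sep)\(($t % 1000) | pad3)";
    .data.segments as $segs
    | if $fmt == "txt" then $segs[].text
      elif $fmt == "srt" then
        # SRT cues are numbered from 1 and separated by a blank line.
        $segs | to_entries[]
        | "\(.key + 1)\n\(.value.start | ts(",")) --> \(.value.end | ts(","))\n\(.value.text)\n"
      elif $fmt == "vtt" then
        # VTT starts with a WEBVTT header line; cue numbers are optional and omitted.
        "WEBVTT\n", ($segs[] | "\(.start | ts(".")) --> \(.end | ts("."))\n\(.text)\n")
      else error("unknown format: \($fmt)")
      end
  '
}
```

Usage, continuing from the four commands above:

pandastudio transcript.get --id=$ID --json | transcript_to srt > interview.srt

Keeping the formatting in a single jq program means the same JSON can be turned into plain text, SRT, or VTT by changing one argument.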

The desktop app provides the transcription engine, so keep PandaStudio installed — the CLI auto-launches it if it isn't already open.
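For a folder of files, the same four commands can be wrapped in a loop. The sketch below is one way to do it in bash, not an official PandaStudio script. It uses only the commands and flags shown above. The function name transcribe_dir, the list of extensions, and the one-JSON-file-per-input output layout are illustrative choices. It resolves the folder to an absolute path because the example above passes an absolute path to --withMedia.

```shell
# Sketch only. Requires bash (brace expansion) and jq.
# Runs project.new -> transcript.transcribe -> job.wait -> transcript.get
# for each media file in DIR, writing OUTDIR/<name>.json per input.
# ASSUMPTION: the extension list and output layout are illustrative, not PandaStudio defaults.
transcribe_dir() {
  local dir out f id job base
  dir=$(cd "$1" && pwd) || return 1    # absolute path for --withMedia
  out="$2"
  mkdir -p "$out"
  for f in "$dir"/*.{mp3,m4a,wav,mp4,mov}; do
    [ -e "$f" ] || continue            # skip patterns that matched no files
    base=$(basename "$f")
    id=$(pandastudio project.new --name="${base%.*}" --withMedia="$f" --json | jq -r '.data.id')
    if [ -z "$id" ] || [ "$id" = "null" ]; then
      echo "skipping $base: could not create project" >&2
      continue
    fi
    job=$(pandastudio transcript.transcribe --id="$id" --json | jq -r '.data.jobId')
    pandastudio job.wait --id="$job" --json > /dev/null
    pandastudio transcript.get --id="$id" --json > "$out/${base%.*}.json"
    echo "done: $base" >&2
  done
}
```

Usage:

transcribe_dir ~/Recordings ~/Recordings/transcripts

Files are processed one at a time, which keeps the sketch simple and avoids queuing several transcription jobs on the same machine at once.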

Languages, accuracy, and privacy

  • English + 25 European languages work out of the box with auto-detection, using a fast on-device model that preserves filler words and gives you word-level timing.
  • Chinese, Japanese, Korean, Hindi, Arabic, and Thai are supported via a second model. Open Settings, pick the language, and download the model once (about 1.1 GB), then transcribe as normal.
  • Nothing leaves your machine. Transcription runs locally on your CPU. There's no upload, no cloud queue, and no per-minute charge — a real difference from web transcription services if your audio is sensitive (client calls, interviews, medical, legal).
  • Works offline once the model is cached, so you can transcribe on a plane or anywhere without a connection.

From file to transcript in a few minutes

And once your file is in PandaStudio, you're one sentence away from the rest of the editor too: clean up the audio, cut filler words and silences from the transcript, add captions, and export — all by asking your AI assistant.