Generate high-accuracy captions for a video and attach them as a text track using the Mux Robots API.
Generate captions for a Mux asset and automatically attach them as a text track. Premium captions use a higher-accuracy speech model than Mux's standard auto-generated captions, and add optional speaker labels, word-level timestamps, and custom phrase hints for proper nouns and jargon. Captions are generated directly from the asset's audio, so no existing track is required. See the Generate Premium Captions API reference for the full endpoint specification. See Mux Robots pricing for unit costs.
Experimental
The generate-premium-captions workflow is experimental. The API shape, parameters, behavior, and pricing may change. If we make changes to this workflow while you're using it, we'll let you know via email.
If a text track with the same language code already exists on the asset, the job is rejected. Set replace_existing to true to delete the existing track first.
To produce captions, this workflow needs an audio-only static rendition of the asset. If the asset already has one, it's reused. If not, the workflow creates one to process the audio and deletes it once the job completes, so you aren't charged for extra storage.
generate-premium-captions job

curl https://api.mux.com/robots/v0/jobs/generate-premium-captions \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "phrases": ["Mux", "API"]
    }
  }' \
  -u ${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}

This request is asynchronous. The POST returns immediately with the job in pending status and does not include results. We strongly recommend listening for the robots.job.generate_premium_captions.completed webhook: the payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll GET /robots/v0/jobs/generate-premium-captions/{JOB_ID} with the id from the response until the status is completed.
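If you do need to poll, the loop can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not an official client: the helper names (`basic_auth_header`, `fetch_job`, `wait_for_completion`), the polling interval, and the attempt limit are all assumptions. Only the pending and completed statuses are named in this guide, so the sketch simply keeps polling on anything other than completed.

```python
import base64
import json
import time
import urllib.request

BASE_URL = "https://api.mux.com/robots/v0/jobs/generate-premium-captions"


def basic_auth_header(token_id: str, token_secret: str) -> str:
    """Build the HTTP Basic auth header value that curl's -u flag sends."""
    raw = f"{token_id}:{token_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_job(job_id: str, token_id: str, token_secret: str) -> dict:
    """GET the job by id and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/{job_id}",
        headers={"Authorization": basic_auth_header(token_id, token_secret)},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_completion(get_job, interval_seconds=5.0, max_attempts=60,
                        sleep=time.sleep):
    """Call get_job() until the job status is "completed".

    get_job is any zero-argument callable returning the job response dict.
    Any status other than "completed" is polled again (an assumption, since
    this guide does not list other statuses).
    """
    for attempt in range(max_attempts):
        job = get_job()
        if job["data"]["status"] == "completed":
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("job did not complete within the polling budget")
```

A caller would typically wrap `fetch_job` in a lambda, for example `wait_for_completion(lambda: fetch_job(job_id, token_id, token_secret))`. Passing the fetcher in as a callable keeps the loop easy to exercise without network access.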
| Parameter | Type | Description |
|---|---|---|
| asset_id | string | Required. The Mux asset ID of the video to caption. |
| language_code | string | BCP 47 language code of the audio (e.g. en, es). Auto-detected when omitted. |
| replace_existing | boolean | When true, any existing text track with the same language code is deleted before the new track is uploaded. When false (the default), the request is rejected if a matching track already exists. |
| track_name | string | Custom name for the uploaded Mux text track. Defaults to "{Language} (Generated)" using the resolved language code. |
| include_speakers | boolean | When true, speaker labels are identified and added to each caption cue. Useful for interviews, podcasts, and multi-speaker content. Defaults to false. |
| include_words | boolean | When true, word-level timestamps are exported as a JSON file accessible via temporary_words_url in the output. Billed at a higher unit rate. Defaults to false. |
| upload_to_mux | boolean | Whether to upload the generated captions as a new text track on the asset. Defaults to true. When false, no track is created (and replace_existing must also be false); the captions remain available via temporary_srt_url. |
| phrases | array of strings | Best-effort list of words or short phrases (proper nouns, product names, jargon) likely to appear in the audio, used to bias recognition toward correct spellings. Up to 100 phrases, each up to 50 characters. Does not guarantee exact output. |
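The constraints in the table above can be checked on the client before a job is submitted. The following is a sketch only: `validate_parameters` is a hypothetical helper, not part of any Mux SDK, and it assumes the 50-character limit can be approximated with Python's `len`. The API remains the source of truth for what it accepts.

```python
def validate_parameters(params: dict) -> list[str]:
    """Return a list of problems found in a generate-premium-captions
    parameters object; an empty list means no problems were detected."""
    errors = []

    # asset_id is the only required parameter.
    if not params.get("asset_id"):
        errors.append("asset_id is required")

    # phrases: up to 100 entries, each up to 50 characters.
    phrases = params.get("phrases", [])
    if len(phrases) > 100:
        errors.append("phrases accepts at most 100 entries")
    for index, phrase in enumerate(phrases):
        if len(phrase) > 50:
            errors.append(f"phrases[{index}] exceeds 50 characters")

    # replace_existing must be false when no track is uploaded.
    upload_to_mux = params.get("upload_to_mux", True)
    replace_existing = params.get("replace_existing", False)
    if not upload_to_mux and replace_existing:
        errors.append("replace_existing must be false when upload_to_mux is false")

    return errors
```

For example, `validate_parameters({"asset_id": "YOUR_ASSET_ID", "phrases": ["Mux", "API"]})` returns an empty list.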
The outputs object is included in the job once its status is completed. You'll receive it on the robots.job.generate_premium_captions.completed webhook (recommended), or you can fetch it with GET /robots/v0/jobs/generate-premium-captions/{JOB_ID}. It contains:
| Field | Type | Description |
|---|---|---|
| track_id | string | Mux text track ID of the newly uploaded caption track. Omitted when upload_to_mux is false. |
| language_code | string | Resolved language code of the generated captions (may differ from the requested code when auto-detected). |
| temporary_srt_url | string | Temporary pre-signed URL to download the generated SRT file. Expires 7 days after the job completes. |
| temporary_words_url | string | Temporary pre-signed URL to download the word-level timestamp JSON. Present when include_words is true. Expires 7 days after the job completes, so download and store it for long-term access. |
| replaced_track_id | string | Mux track ID of the deleted track, present when replace_existing was true. |
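Because the temporary URLs expire, one option is to copy the files into your own storage as soon as the job completes. The sketch below is illustrative: `persist_outputs` and the local file names `captions.srt` and `words.json` are assumptions, and it writes to a local directory purely for simplicity.

```python
import os
import urllib.request


def persist_outputs(outputs: dict, dest_dir: str,
                    download=urllib.request.urlretrieve) -> dict:
    """Download whichever temporary files are present in a job's outputs.

    Returns a mapping of output field name to the local path it was saved to.
    The download callable takes (url, path) and can be swapped for testing.
    """
    os.makedirs(dest_dir, exist_ok=True)

    # Hypothetical local file names for each temporary URL field.
    targets = {
        "temporary_srt_url": "captions.srt",
        "temporary_words_url": "words.json",
    }

    saved = {}
    for field, filename in targets.items():
        url = outputs.get(field)
        if url:  # temporary_words_url is only present with include_words
            path = os.path.join(dest_dir, filename)
            download(url, path)
            saved[field] = path
    return saved
```

Fields that are absent from outputs are skipped, so the same helper works whether or not include_words was set.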
This is the payload delivered to the robots.job.generate_premium_captions.completed webhook, and the same shape you get from GET /robots/v0/jobs/generate-premium-captions/{JOB_ID}:
{
  "data": {
    "id": "rjob_yza567",
    "workflow": "generate-premium-captions",
    "status": "completed",
    "units_consumed": 3000,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "replace_existing": false,
      "include_speakers": false,
      "include_words": false,
      "upload_to_mux": true,
      "phrases": ["Mux", "API"]
    },
    "outputs": {
      "track_id": "track_en_abc123",
      "language_code": "en",
      "temporary_srt_url": "https://storage.googleapis.com/..."
    }
  }
}

When upload_to_mux is true (the default), the caption track is automatically attached to your asset, and viewers will see the new language option in the player's caption menu.
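A webhook receiver can reduce the completed-job payload to the handful of fields an application usually needs. This sketch assumes the request body has exactly the shape shown in the sample payload; `summarize_completed_job` is a hypothetical helper, and it deliberately leaves out verifying that the request really came from Mux, which a production handler would typically also do.

```python
import json


def summarize_completed_job(raw_body: str):
    """Parse a webhook body and return the key fields of a completed
    generate-premium-captions job, or None for anything else."""
    job = json.loads(raw_body)["data"]

    if job.get("workflow") != "generate-premium-captions":
        return None
    if job.get("status") != "completed":
        return None

    outputs = job.get("outputs", {})
    return {
        "job_id": job["id"],
        "asset_id": job["parameters"]["asset_id"],
        # Optional outputs are read with .get() because they may be omitted,
        # e.g. track_id when upload_to_mux is false.
        "track_id": outputs.get("track_id"),
        "language_code": outputs.get("language_code"),
        "srt_url": outputs.get("temporary_srt_url"),
        "words_url": outputs.get("temporary_words_url"),
        "replaced_track_id": outputs.get("replaced_track_id"),
    }
```

Fields that the job did not produce come back as `None`, so callers can check `summary["track_id"]` to tell whether a track was attached to the asset.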