Generate premium captions

Generate high-accuracy captions for a video and attach them as a text track using the Mux Robots API.

Premium captions use a higher-accuracy speech model than Mux's standard auto-generated captions, and can optionally add speaker labels, word-level timestamps, and custom phrase hints for proper nouns and jargon. Captions are generated directly from the asset's audio, so no existing text track is required. See the Generate Premium Captions API reference for the full endpoint specification, and Mux Robots pricing for unit costs.

Experimental

The generate-premium-captions workflow is experimental. The API shape, parameters, behavior, and pricing may change. If we make changes to this workflow while you're using it, we'll let you know via email.

By default, the job is rejected if a text track with the same language code already exists on the asset. Set `replace_existing` to `true` to delete the existing track first.

To produce captions, this workflow needs an audio-only static rendition of the asset. If the asset already has one, it's reused. If not, the workflow creates one to process the audio and deletes it once the job completes, so you aren't charged for extra storage.

Create a `generate-premium-captions` job

curl https://api.mux.com/robots/v0/jobs/generate-premium-captions \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "phrases": ["Mux", "API"]
    }
  }' \
  -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}"

This request is asynchronous. The POST returns immediately with the job in `pending` status and does not include results. We strongly recommend listening for the `robots.job.generate_premium_captions.completed` webhook: the payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll `GET /robots/v0/jobs/generate-premium-captions/{JOB_ID}` with the `id` from the response until the `status` is `completed`.
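
If you do need to poll, the loop can be sketched roughly as below. This is a minimal illustration, not an official client: `wait_for_job` and `http_get_json` are hypothetical helper names, and the 5-second interval and 60-attempt cap are arbitrary assumptions. The sketch only checks for the `completed` status described above; this page does not list failure statuses, so a real integration should also handle those per the API reference.

```python
import base64
import json
import os
import time
import urllib.request
from typing import Callable

# Path from the polling instructions above; host taken from the curl example.
JOB_URL = "https://api.mux.com/robots/v0/jobs/generate-premium-captions/{job_id}"


def http_get_json(url: str) -> dict:
    """Hypothetical helper: authenticated GET using the same basic auth as curl -u."""
    token = f"{os.environ['MUX_TOKEN_ID']}:{os.environ['MUX_TOKEN_SECRET']}"
    auth = base64.b64encode(token.encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 5.0,   # assumed polling interval
    max_attempts: int = 60,    # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the job until its status is "completed", then return the job object."""
    url = JOB_URL.format(job_id=job_id)
    for _ in range(max_attempts):
        job = fetch_job(url)["data"]
        if job["status"] == "completed":
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not completed after {max_attempts} polls")
```

Calling `wait_for_job(http_get_json, "rjob_yza567")` would block until the job finishes or the attempt limit is reached. Passing the fetch and sleep functions in as arguments keeps the loop easy to test without network access.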

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `asset_id` | string | Required. The Mux asset ID of the video to caption. |
| `language_code` | string | BCP 47 language code of the audio (e.g. `en`, `es`). Auto-detected when omitted. |
| `replace_existing` | boolean | When `true`, any existing text track with the same language code is deleted before the new track is uploaded. When `false` (the default), the request is rejected if a matching track already exists. |
| `track_name` | string | Custom name for the uploaded Mux text track. Defaults to `"{Language} (Generated)"` using the resolved language code. |
| `include_speakers` | boolean | When `true`, speaker labels are identified and added to each caption cue. Useful for interviews, podcasts, and multi-speaker content. Defaults to `false`. |
| `include_words` | boolean | When `true`, word-level timestamps are exported as a JSON file accessible via `temporary_words_url` in the output. Billed at a higher unit rate. Defaults to `false`. |
| `upload_to_mux` | boolean | Whether to upload the generated captions as a new text track on the asset. Defaults to `true`. When `false`, no track is created (and `replace_existing` must also be `false`); the captions remain available via `temporary_srt_url`. |
| `phrases` | array of strings | Best-effort list of words or short phrases (proper nouns, product names, jargon) likely to appear in the audio, used to bias recognition toward correct spellings. Up to 100 phrases, each up to 50 characters. Does not guarantee exact output. |
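
The constraints in the table can be checked on the client before a job is submitted. The following is a rough sketch under stated assumptions: `build_caption_request` is a hypothetical helper, Python's `len()` is assumed to match how the API counts characters, and optional parameters left at their documented defaults are simply omitted so the API's own defaults apply.

```python
from typing import Optional, Sequence

MAX_PHRASES = 100       # "Up to 100 phrases" (from the table above)
MAX_PHRASE_CHARS = 50   # "each up to 50 characters" (from the table above)


def build_caption_request(
    asset_id: str,
    *,
    language_code: Optional[str] = None,
    track_name: Optional[str] = None,
    phrases: Optional[Sequence[str]] = None,
    replace_existing: bool = False,
    include_speakers: bool = False,
    include_words: bool = False,
    upload_to_mux: bool = True,
) -> dict:
    """Return a JSON-serializable request body, or raise ValueError."""
    if not asset_id:
        raise ValueError("asset_id is required")
    phrase_list = list(phrases or [])
    if len(phrase_list) > MAX_PHRASES:
        raise ValueError(f"at most {MAX_PHRASES} phrases are allowed")
    if any(len(p) > MAX_PHRASE_CHARS for p in phrase_list):
        raise ValueError(f"each phrase must be at most {MAX_PHRASE_CHARS} characters")
    if not upload_to_mux and replace_existing:
        # The table requires replace_existing to be false when nothing is uploaded.
        raise ValueError("replace_existing must be false when upload_to_mux is false")

    params: dict = {"asset_id": asset_id}
    if language_code is not None:
        params["language_code"] = language_code
    if track_name is not None:
        params["track_name"] = track_name
    if phrase_list:
        params["phrases"] = phrase_list
    # Only send booleans that differ from the documented defaults.
    if replace_existing:
        params["replace_existing"] = True
    if include_speakers:
        params["include_speakers"] = True
    if include_words:
        params["include_words"] = True
    if not upload_to_mux:
        params["upload_to_mux"] = False
    return {"parameters": params}
```

For example, `build_caption_request("YOUR_ASSET_ID", language_code="en", phrases=["Mux", "API"])` produces the same body as the curl request above.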

Output

The `outputs` object is included in the job once its `status` is `completed`. You'll receive it on the `robots.job.generate_premium_captions.completed` webhook (recommended), or you can fetch it with `GET /robots/v0/jobs/generate-premium-captions/{JOB_ID}`. It contains:

| Field | Type | Description |
| --- | --- | --- |
| `track_id` | string | Mux text track ID of the newly uploaded caption track. Omitted when `upload_to_mux` is `false`. |
| `language_code` | string | Resolved language code of the generated captions (may differ from the requested code when auto-detected). |
| `temporary_srt_url` | string | Temporary pre-signed URL to download the generated SRT file. Expires 7 days after the job completes. |
| `temporary_words_url` | string | Temporary pre-signed URL to download the word-level timestamp JSON. Present when `include_words` is `true`. Expires 7 days after the job completes, so download and store it for long-term access. |
| `replaced_track_id` | string | Mux track ID of the deleted track. Present when `replace_existing` was `true`. |

Example response

This is the payload delivered to the `robots.job.generate_premium_captions.completed` webhook, and the same shape you get from `GET /robots/v0/jobs/generate-premium-captions/{JOB_ID}`:

{
  "data": {
    "id": "rjob_yza567",
    "workflow": "generate-premium-captions",
    "status": "completed",
    "units_consumed": 3000,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "replace_existing": false,
      "include_speakers": false,
      "include_words": false,
      "upload_to_mux": true,
      "phrases": ["Mux", "API"]
    },
    "outputs": {
      "track_id": "track_en_abc123",
      "language_code": "en",
      "temporary_srt_url": "https://storage.googleapis.com/..."
    }
  }
}

When `upload_to_mux` is `true` (the default), the caption track is automatically attached to your asset, and viewers will see the new language option in the player's caption menu.
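
A webhook handler or polling client might unpack the completed job along these lines. This is an illustrative sketch only: `CaptionResult` and `parse_completed_job` are hypothetical names, and the code assumes `language_code` and `temporary_srt_url` are always present in `outputs`, since the Output table lists no condition under which they are omitted. The other output fields are treated as optional, as the table describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionResult:
    job_id: str
    language_code: str
    srt_url: str
    track_id: Optional[str]           # omitted when upload_to_mux is false
    words_url: Optional[str]          # present when include_words is true
    replaced_track_id: Optional[str]  # present when replace_existing was true


def parse_completed_job(payload: dict) -> CaptionResult:
    """Extract caption outputs from a completed generate-premium-captions job."""
    job = payload["data"]
    if job.get("workflow") != "generate-premium-captions":
        raise ValueError(f"unexpected workflow: {job.get('workflow')!r}")
    if job.get("status") != "completed":
        raise ValueError(f"job is not completed: {job.get('status')!r}")
    outputs = job["outputs"]
    return CaptionResult(
        job_id=job["id"],
        language_code=outputs["language_code"],   # assumed always present
        srt_url=outputs["temporary_srt_url"],     # assumed always present
        track_id=outputs.get("track_id"),
        words_url=outputs.get("temporary_words_url"),
        replaced_track_id=outputs.get("replaced_track_id"),
    )
```

Because the temporary URLs expire 7 days after the job completes, a handler like this is a natural place to download the SRT (and word-level JSON, if requested) into your own storage.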
