Segment a Mux video asset into ordered scenes using the Mux Robots API.
Experimental
The find-scenes workflow is experimental. The API shape, parameters, behavior, and pricing may change. If we make changes to this workflow while you're using it, we'll let you know via email.
find-scenes can be enabled by contacting our support team.
Find scenes in a video and get structured, timestamped scene metadata. This workflow segments a Mux asset into ordered scenes, using visual signals and transcript cues when available. It is useful for building scene indexes, searchable video catalogs, navigation aids, and review workflows. See the Find Scenes API reference for the full endpoint specification. See Mux Robots pricing for unit costs.
Find scenes requires a video asset with generated shots in the ready state. Audio-only assets are not supported. Segmentation works from visual shots alone, but a text track makes the output richer by adding audible context to each scene. For the best results, make sure your asset has captions, either auto-generated or manually added, before running a find-scenes job.
Create a find-scenes job:

```shell
curl https://api.mux.com/robots/v0/jobs/find-scenes \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    }
  }' \
  -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}"
```

This request is asynchronous. The POST returns immediately with the job in pending status and does not include results. We strongly recommend listening for the robots.job.find_scenes.completed webhook. The payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll GET /robots/v0/jobs/find-scenes/{JOB_ID} with the id from the response until the status is completed.
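If you do fall back to polling, the loop described above can be sketched as follows. This is a minimal illustration, not an official client: `fetch_find_scenes_job` and `wait_for_job` are hypothetical helper names, the polling interval and timeout are arbitrary choices, and the sketch assumes the GET response wraps the job in a `data` object as in the completed payload shown later in this guide. Only the pending and completed statuses are described here, so any other status is simply polled again until the timeout.

```python
import base64
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_find_scenes_job(job_id: str) -> Dict[str, Any]:
    """GET the job with HTTP Basic auth, the same scheme as `curl -u`."""
    url = f"https://api.mux.com/robots/v0/jobs/find-scenes/{job_id}"
    credentials = f"{os.environ['MUX_TOKEN_ID']}:{os.environ['MUX_TOKEN_SECRET']}"
    token = base64.b64encode(credentials.encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,   # assumption: arbitrary polling interval
    timeout_s: float = 600.0,  # assumption: arbitrary upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job status is "completed", then return the job object."""
    waited = 0.0
    while True:
        job = fetch_job(job_id)["data"]
        if job["status"] == "completed":
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

A call would look like `job = wait_for_job(fetch_find_scenes_job, "JOB_ID")`, after which `job["outputs"]` holds the scenes. Passing the fetch function in as an argument keeps the loop easy to test without network access.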
| Parameter | Type | Description |
|---|---|---|
| asset_id | string | Required. The Mux asset ID of the video asset to segment into scenes. |
| language_code | string | Optional. Preferred transcript language code to analyze when a matching ready transcript track is available. |
| min_scenes | integer | Optional. Lower-bound hint for scene segmentation. Mux avoids collapsing clearly distinct beats below this count when the content supports more granular scene boundaries. |
| min_scene_duration_ms | integer | Optional. Preferred minimum scene duration in milliseconds. Must be at least 1000. Defaults to 15000. |
| output_steering | object | Optional. Curated controls that guide scene grouping, titles, narratives, and concepts without changing the response schema. See Output steering. |
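The parameter rules in the table can be checked on the client before a job is submitted. The sketch below assumes a hypothetical helper named `build_find_scenes_body`; it enforces only the constraints stated above (a required `asset_id` and a `min_scene_duration_ms` of at least 1000) and leaves optional fields out of the body when they are not set, so that the documented defaults apply.

```python
from typing import Any, Dict, Optional


def build_find_scenes_body(
    asset_id: str,
    language_code: Optional[str] = None,
    min_scenes: Optional[int] = None,
    min_scene_duration_ms: Optional[int] = None,
    output_steering: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /robots/v0/jobs/find-scenes."""
    if not asset_id:
        raise ValueError("asset_id is required")
    if min_scene_duration_ms is not None and min_scene_duration_ms < 1000:
        raise ValueError("min_scene_duration_ms must be at least 1000")

    parameters: Dict[str, Any] = {"asset_id": asset_id}
    optional = {
        "language_code": language_code,
        "min_scenes": min_scenes,
        "min_scene_duration_ms": min_scene_duration_ms,
        "output_steering": output_steering,
    }
    # Omit unset fields so the API applies its own defaults (for example 15000 ms).
    parameters.update({key: value for key, value in optional.items() if value is not None})
    return {"parameters": parameters}
```

Serializing the result with `json.dumps` gives a body in the same shape as the `-d` argument of the curl request.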
Use output_steering when you want best-effort control over how scenes are grouped and described. These fields guide the workflow but do not guarantee exact output.
| Field | Type | Description |
|---|---|---|
| segmentation_strategy | string | Preferred segmentation strategy. Supported values: editorial_beats, topic_changes, visual_transitions, action_progression, and instructional_steps. |
| title_style | string | Preferred scene title style. Supported values: descriptive, editorial, search_optimized, and accessibility. |
| narration_detail | string | Preferred level of narrative detail. Supported values: concise, balanced, and detailed. |
| audience | string | Intended audience used to guide scene grouping, titles, and narrative emphasis. |
| brand_terms | array of strings | Preferred brand or domain terms to use when supported by the source content. |
| topic_taxonomy | object | Controlled vocabulary used to steer scene concepts without changing the response schema. |
| topic_taxonomy.name | string | Optional customer-facing name for the taxonomy. |
| topic_taxonomy.values | array | Controlled vocabulary values. Each value has a required label and optional description and aliases. |
| topic_taxonomy.allow_other | boolean | When true, non-taxonomy values may be used when no taxonomy value applies. |
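Because several of these fields accept only a fixed set of values, a small client-side check can catch typos before a job is submitted. The following is a sketch only: `steering_problems` is a hypothetical helper, the supported values are copied from the table above, and the API itself may validate differently or report errors in another form.

```python
from typing import Any, Dict, List

# Supported values as listed in the output_steering field table.
SUPPORTED_VALUES = {
    "segmentation_strategy": {
        "editorial_beats", "topic_changes", "visual_transitions",
        "action_progression", "instructional_steps",
    },
    "title_style": {"descriptive", "editorial", "search_optimized", "accessibility"},
    "narration_detail": {"concise", "balanced", "detailed"},
}


def steering_problems(steering: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    for field, allowed in SUPPORTED_VALUES.items():
        value = steering.get(field)
        if value is not None and (not isinstance(value, str) or value not in allowed):
            problems.append(f"{field}: unsupported value {value!r}")

    terms = steering.get("brand_terms")
    if terms is not None and not (
        isinstance(terms, list) and all(isinstance(term, str) for term in terms)
    ):
        problems.append("brand_terms: expected an array of strings")

    taxonomy = steering.get("topic_taxonomy")
    if taxonomy is not None:
        # Each taxonomy value needs a label; description and aliases are optional.
        for index, entry in enumerate(taxonomy.get("values", [])):
            if not isinstance(entry, dict) or not entry.get("label"):
                problems.append(f"topic_taxonomy.values[{index}]: label is required")
        allow_other = taxonomy.get("allow_other")
        if allow_other is not None and not isinstance(allow_other, bool):
            problems.append("topic_taxonomy.allow_other: expected a boolean")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.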
```json
{
  "parameters": {
    "asset_id": "YOUR_ASSET_ID",
    "output_steering": {
      "segmentation_strategy": "instructional_steps",
      "title_style": "search_optimized",
      "narration_detail": "detailed",
      "audience": "support agents reviewing product walkthroughs",
      "brand_terms": ["Mux Video", "Mux Data"],
      "topic_taxonomy": {
        "name": "Product areas",
        "values": [
          {
            "label": "Video uploads",
            "description": "Uploading, ingesting, or configuring new video assets",
            "aliases": ["uploads", "direct uploads"]
          },
          {
            "label": "Playback analytics",
            "description": "Viewer experience, engagement, or streaming performance"
          }
        ],
        "allow_other": true
      }
    }
  }
}
```

The outputs object is included in the job once its status is completed. You'll receive it on the robots.job.find_scenes.completed webhook (recommended), or you can fetch it with GET /robots/v0/jobs/find-scenes/{JOB_ID}. It contains:
| Field | Type | Description |
|---|---|---|
| scenes | array | Ordered scene segments extracted from the asset. |
| scenes[].start_ms | number | Scene start time in milliseconds. |
| scenes[].end_ms | number | Scene end time in milliseconds. |
| scenes[].title | string | Short human-readable title for the scene. |
| scenes[].cues | array | Transcript cues that overlap the scene. Can be empty when no usable transcript is available. |
| scenes[].cues[].start_ms | number | Cue start time in milliseconds. |
| scenes[].cues[].end_ms | number | Cue end time in milliseconds. |
| scenes[].cues[].text | string | Transcript text for this cue. |
| scenes[].audible_narrative | string | One-sentence summary of what is being said during the scene. Present only when transcript cues are available. |
| scenes[].blended_narrative | string | Combined audible and visual narrative for the scene when both signals are meaningfully available. |
| scenes[].notable_audible_concepts | array | Multi-word descriptive phrases capturing key audible concepts in the scene. Present only when transcript cues are available. |
| scenes[].visual_narrative | string | One-sentence summary of what is visually happening during the scene. |
| scenes[].notable_visual_concepts | array | Scored visual concepts extracted from sampled scene frames. Each item has concept, score, and rationale. |
| scenes[].shots | array | Optional ordered shot or beat breakdown within the scene, including timings and short visual descriptions. |
| scenes[].shots[].start_ms | number | Shot or beat start time in milliseconds. |
| scenes[].shots[].end_ms | number | Shot or beat end time in milliseconds. |
| scenes[].shots[].visual_description | string | Short visual description of what happens during this shot or beat. |
| scenes[].shot_count | integer | How many timed shot or beat entries make up the scene. When shots is present, this equals shots.length. |
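Several of these fields are conditional: `audible_narrative` and `notable_audible_concepts` appear only when transcript cues are available, `blended_narrative` only when both signals are usable, and `shots` is optional. Code that consumes the output should therefore read them defensively. The sketch below uses two hypothetical helpers; the fallback order in `best_narrative` (blended, then audible, then visual) is an illustrative choice, not something the API prescribes.

```python
from typing import Any, Dict, List, Optional


def best_narrative(scene: Dict[str, Any]) -> Optional[str]:
    """Pick the richest narrative that is present on the scene."""
    for key in ("blended_narrative", "audible_narrative", "visual_narrative"):
        text = scene.get(key)
        if text:
            return text
    return None


def shot_count_mismatches(scenes: List[Dict[str, Any]]) -> List[int]:
    """Return indexes of scenes where shots is present but shot_count != len(shots)."""
    return [
        index
        for index, scene in enumerate(scenes)
        if scene.get("shots") is not None and scene.get("shot_count") != len(scene["shots"])
    ]
```

For an asset without a usable transcript, `best_narrative` falls through to `visual_narrative`, so a scene index built on it still has a description for each scene.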
This is the payload delivered to the robots.job.find_scenes.completed webhook, and the same shape you get from GET /robots/v0/jobs/find-scenes/{JOB_ID}:
```json
{
  "data": {
    "id": "rjob_scn123",
    "workflow": "find-scenes",
    "status": "completed",
    "units_consumed": 1,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    },
    "outputs": {
      "scenes": [
        {
          "start_ms": 0,
          "end_ms": 18500,
          "title": "Opening Product Context",
          "cues": [
            {
              "start_ms": 500,
              "end_ms": 4300,
              "text": "Today we are walking through the upload workflow."
            }
          ],
          "audible_narrative": "The speaker introduces the upload workflow and sets context for the walkthrough.",
          "blended_narrative": "A presenter introduces the product workflow while the interface is visible on screen.",
          "notable_audible_concepts": ["upload workflow walkthrough", "product context"],
          "visual_narrative": "A presenter stands beside a screen showing a video upload interface.",
          "notable_visual_concepts": [
            {
              "concept": "upload interface",
              "score": 0.88,
              "rationale": "The visible UI anchors the scene around the upload workflow."
            }
          ],
          "shots": [
            {
              "start_ms": 0,
              "end_ms": 8200,
              "visual_description": "The presenter appears next to the dashboard."
            },
            {
              "start_ms": 8200,
              "end_ms": 18500,
              "visual_description": "The screen focuses on the upload controls."
            }
          ],
          "shot_count": 2
        }
      ]
    }
  }
}
```
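As one example of the navigation aids mentioned earlier, the scenes in this payload can be turned into a WebVTT chapters file that many video players can display. This is a sketch under stated assumptions: `format_timestamp` and `scenes_to_webvtt` are hypothetical helpers, not part of the Mux API, and the input is the full payload shown above with the job under `data`.

```python
from typing import Any, Dict


def format_timestamp(ms: float) -> str:
    """Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(ms))
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def scenes_to_webvtt(payload: Dict[str, Any]) -> str:
    """Render each scene as one chapter cue, using the scene title as cue text."""
    lines = ["WEBVTT", ""]
    for scene in payload["data"]["outputs"]["scenes"]:
        start = format_timestamp(scene["start_ms"])
        end = format_timestamp(scene["end_ms"])
        lines.append(f"{start} --> {end}")
        # Collapse whitespace so a title never spans multiple lines inside a cue.
        lines.append(" ".join(scene["title"].split()))
        lines.append("")
    return "\n".join(lines)
```

For the sample payload this produces a single cue running from `00:00:00.000` to `00:00:18.500` with the text `Opening Product Context`.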