Find scenes

Segment a Mux video asset into ordered scenes using the Mux Robots API.

Experimental

The find-scenes workflow is experimental. The API shape, parameters, behavior, and pricing may change. If we make changes to this workflow while you're using it, we'll let you know via email.

Find scenes in a video and get structured, timestamped scene metadata. This workflow segments a Mux asset into ordered scenes, using visual signals and transcript cues when available. It is useful for building scene indexes, searchable video catalogs, navigation aids, and review workflows. See the Find Scenes API reference for the full endpoint specification. See Mux Robots pricing for unit costs.

Find scenes requires a video asset with generated shots in the ready state. Audio-only assets are not supported. Segmentation works from visual shots alone, but a text track makes the output richer by adding audible context to each scene. For the best results, make sure your asset has captions, either auto-generated or manually added, before running a find-scenes job.

Create a `find-scenes` job

curl https://api.mux.com/robots/v0/jobs/find-scenes \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    }
  }' \
  -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}"

This request is asynchronous. The POST returns immediately with the job in `pending` status and does not include results. We strongly recommend listening for the `robots.job.find_scenes.completed` webhook. The payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll `GET /robots/v0/jobs/find-scenes/{JOB_ID}` with the `id` from the response until the status is `completed`.
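If you do need to poll, the loop can be sketched roughly as follows. This is a minimal illustration, not an official client: `wait_for_job` and its `fetch_job` argument are hypothetical names, and `fetch_job` is assumed to perform the authenticated GET request and return the parsed JSON body. Statuses other than `pending` and `completed` are not described on this page, so the sketch treats anything that is not `completed` as "keep waiting" and relies on a timeout.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical: wraps the GET request
    job_id: str,
    interval_s: float = 5.0,   # assumed polling interval, not a documented value
    timeout_s: float = 600.0,  # assumed upper bound, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job's status is "completed", then return the job object."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_job(job_id)
        # The example response wraps the job in a top-level "data" key.
        job = body.get("data", body)
        if job.get("status") == "completed":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_s}s")
        sleep(interval_s)
```

Passing the fetch function in keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.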

Parameters

Parameter	Type	Description
`asset_id`	string	Required. The Mux asset ID of the video asset to segment into scenes.
`language_code`	string	Optional. Preferred transcript language code to analyze when a matching ready transcript track is available.
`min_scenes`	integer	Optional. Lower-bound hint for scene segmentation. Mux avoids collapsing clearly distinct beats below this count when the content supports more granular scene boundaries.
`min_scene_duration_ms`	integer	Optional. Preferred minimum scene duration in milliseconds. Must be at least `1000`. Defaults to `15000`.
`output_steering`	object	Optional. Curated controls that guide scene grouping, titles, narratives, and concepts without changing the response schema. See Output steering.
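As a rough sketch of how the table above maps to a request body, the helper below assembles the `parameters` object and applies the one numeric constraint stated here (`min_scene_duration_ms` must be at least `1000`). The function name `build_find_scenes_body` is hypothetical, and the check is a client-side convenience only. How the API itself reports invalid input is not covered on this page.

```python
from typing import Any, Dict, Optional


def build_find_scenes_body(
    asset_id: str,
    language_code: Optional[str] = None,
    min_scenes: Optional[int] = None,
    min_scene_duration_ms: Optional[int] = None,
    output_steering: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /robots/v0/jobs/find-scenes."""
    if not asset_id:
        raise ValueError("asset_id is required")
    if min_scene_duration_ms is not None and min_scene_duration_ms < 1000:
        raise ValueError("min_scene_duration_ms must be at least 1000")

    params: Dict[str, Any] = {"asset_id": asset_id}
    optional = {
        "language_code": language_code,
        "min_scenes": min_scenes,
        "min_scene_duration_ms": min_scene_duration_ms,
        "output_steering": output_steering,
    }
    # Omit unset optional fields so the documented defaults apply.
    params.update({k: v for k, v in optional.items() if v is not None})
    return {"parameters": params}
```

Leaving `min_scene_duration_ms` out of the body lets it fall back to the documented default of `15000`.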

Output steering

Use `output_steering` when you want best-effort control over how scenes are grouped and described. These fields guide the workflow but do not guarantee exact output.

Field	Type	Description
`segmentation_strategy`	string	Preferred segmentation strategy. Supported values: `editorial_beats`, `topic_changes`, `visual_transitions`, `action_progression`, and `instructional_steps`.
`title_style`	string	Preferred scene title style. Supported values: `descriptive`, `editorial`, `search_optimized`, and `accessibility`.
`narration_detail`	string	Preferred level of narrative detail. Supported values: `concise`, `balanced`, and `detailed`.
`audience`	string	Intended audience used to guide scene grouping, titles, and narrative emphasis.
`brand_terms`	array of strings	Preferred brand or domain terms to use when supported by the source content.
`topic_taxonomy`	object	Controlled vocabulary used to steer scene concepts without changing the response schema.
`topic_taxonomy.name`	string	Optional customer-facing name for the taxonomy.
`topic_taxonomy.values`	array	Controlled vocabulary values. Each value has a required `label` and optional `description` and `aliases`.
`topic_taxonomy.allow_other`	boolean	When `true`, non-taxonomy values may be used when no taxonomy value applies.

{
  "parameters": {
    "asset_id": "YOUR_ASSET_ID",
    "output_steering": {
      "segmentation_strategy": "instructional_steps",
      "title_style": "search_optimized",
      "narration_detail": "detailed",
      "audience": "support agents reviewing product walkthroughs",
      "brand_terms": ["Mux Video", "Mux Data"],
      "topic_taxonomy": {
        "name": "Product areas",
        "values": [
          {
            "label": "Video uploads",
            "description": "Uploading, ingesting, or configuring new video assets",
            "aliases": ["uploads", "direct uploads"]
          },
          {
            "label": "Playback analytics",
            "description": "Viewer experience, engagement, or streaming performance"
          }
        ],
        "allow_other": true
      }
    }
  }
}
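Because several steering fields only accept a fixed set of values, it can help to check an `output_steering` object before sending it. The sketch below is a hypothetical client-side pre-check (`steering_problems` is not part of any Mux SDK) built only from the supported values and the required `label` listed in the table above.

```python
from typing import Any, Dict, List

# Supported values copied from the output steering table.
SUPPORTED_VALUES = {
    "segmentation_strategy": {
        "editorial_beats", "topic_changes", "visual_transitions",
        "action_progression", "instructional_steps",
    },
    "title_style": {"descriptive", "editorial", "search_optimized", "accessibility"},
    "narration_detail": {"concise", "balanced", "detailed"},
}


def steering_problems(steering: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in an output_steering object."""
    problems: List[str] = []
    for field, allowed in SUPPORTED_VALUES.items():
        value = steering.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: unsupported value {value!r}")

    taxonomy = steering.get("topic_taxonomy")
    if taxonomy is not None:
        for i, entry in enumerate(taxonomy.get("values", [])):
            # Each taxonomy value needs a label; description and aliases are optional.
            if not entry.get("label"):
                problems.append(f"topic_taxonomy.values[{i}]: missing required label")
    return problems
```

An empty list means the object passed these local checks. It says nothing about how closely the workflow will follow the steering, which remains best-effort.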

Output

The `outputs` object is included in the job once its status is `completed`. You'll receive it on the `robots.job.find_scenes.completed` webhook (recommended), or you can fetch it with `GET /robots/v0/jobs/find-scenes/{JOB_ID}`. It contains:

Field	Type	Description
`scenes`	array	Ordered scene segments extracted from the asset.
`scenes[].start_ms`	number	Scene start time in milliseconds.
`scenes[].end_ms`	number	Scene end time in milliseconds.
`scenes[].title`	string	Short human-readable title for the scene.
`scenes[].cues`	array	Transcript cues that overlap the scene. Can be empty when no usable transcript is available.
`scenes[].cues[].start_ms`	number	Cue start time in milliseconds.
`scenes[].cues[].end_ms`	number	Cue end time in milliseconds.
`scenes[].cues[].text`	string	Transcript text for this cue.
`scenes[].audible_narrative`	string	One-sentence summary of what is being said during the scene. Present only when transcript cues are available.
`scenes[].blended_narrative`	string	Combined audible and visual narrative for the scene when both signals are meaningfully available.
`scenes[].notable_audible_concepts`	array	Multi-word descriptive phrases capturing key audible concepts in the scene. Present only when transcript cues are available.
`scenes[].visual_narrative`	string	One-sentence summary of what is visually happening during the scene.
`scenes[].notable_visual_concepts`	array	Scored visual concepts extracted from sampled scene frames. Each item has `concept`, `score`, and `rationale`.
`scenes[].shots`	array	Optional ordered shot or beat breakdown within the scene, including timings and short visual descriptions.
`scenes[].shots[].start_ms`	number	Shot or beat start time in milliseconds.
`scenes[].shots[].end_ms`	number	Shot or beat end time in milliseconds.
`scenes[].shots[].visual_description`	string	Short visual description of what happens during this shot or beat.
`scenes[].shot_count`	integer	How many timed shot or beat entries make up the scene. When `shots` is present, this equals `shots.length`.
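One common use of these fields is a chapter list for navigation. The following sketch turns `scenes[].start_ms` and `scenes[].title` into `HH:MM:SS Title` lines. The helper names and the line format are illustrative choices, not something the API produces.

```python
from typing import Any, Dict, List


def format_timestamp(ms: float) -> str:
    """Format a millisecond offset as HH:MM:SS, dropping fractional seconds."""
    total_seconds = int(ms) // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def scenes_to_chapters(scenes: List[Dict[str, Any]]) -> List[str]:
    """Build one chapter line per scene, in the order the scenes were returned."""
    return [
        f"{format_timestamp(scene['start_ms'])} {scene['title']}"
        for scene in scenes
    ]
```

Only `start_ms` and `title` are read here. Fields described as present only when a transcript is available, such as `audible_narrative`, are better accessed with `scene.get(...)`.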

Example response

This is the payload delivered to the `robots.job.find_scenes.completed` webhook, and the same shape you get from `GET /robots/v0/jobs/find-scenes/{JOB_ID}`:

{
  "data": {
    "id": "rjob_scn123",
    "workflow": "find-scenes",
    "status": "completed",
    "units_consumed": 1,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    },
    "outputs": {
      "scenes": [
        {
          "start_ms": 0,
          "end_ms": 18500,
          "title": "Opening Product Context",
          "cues": [
            {
              "start_ms": 500,
              "end_ms": 4300,
              "text": "Today we are walking through the upload workflow."
            }
          ],
          "audible_narrative": "The speaker introduces the upload workflow and sets context for the walkthrough.",
          "blended_narrative": "A presenter introduces the product workflow while the interface is visible on screen.",
          "notable_audible_concepts": ["upload workflow walkthrough", "product context"],
          "visual_narrative": "A presenter stands beside a screen showing a video upload interface.",
          "notable_visual_concepts": [
            {
              "concept": "upload interface",
              "score": 0.88,
              "rationale": "The visible UI anchors the scene around the upload workflow."
            }
          ],
          "shots": [
            {
              "start_ms": 0,
              "end_ms": 8200,
              "visual_description": "The presenter appears next to the dashboard."
            },
            {
              "start_ms": 8200,
              "end_ms": 18500,
              "visual_description": "The screen focuses on the upload controls."
            }
          ],
          "shot_count": 2
        }
      ]
    }
  }
}
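To illustrate consuming this payload, the sketch below pulls the scenes out of a raw JSON body and filters them by concept, which is the basis of a simple scene search. Both function names are hypothetical. The `min_score` default of `0.5` is an arbitrary assumption, and the code assumes `score` behaves like the `0.88` value in the example. Webhook signature verification is out of scope here.

```python
import json
from typing import Any, Dict, List


def scenes_from_payload(raw_body: str) -> List[Dict[str, Any]]:
    """Return the scenes from a completed find-scenes job payload, else []."""
    job = json.loads(raw_body).get("data", {})
    if job.get("workflow") != "find-scenes" or job.get("status") != "completed":
        return []
    return job.get("outputs", {}).get("scenes", [])


def scenes_matching(
    scenes: List[Dict[str, Any]], term: str, min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Return scenes whose audible or visual concepts contain the search term."""
    needle = term.lower()
    hits = []
    for scene in scenes:
        # Visual concepts are scored objects; keep only those at or above min_score.
        visual = [
            item["concept"]
            for item in scene.get("notable_visual_concepts", [])
            if item.get("score", 0) >= min_score
        ]
        # Audible concepts are plain strings and may be absent without a transcript.
        audible = scene.get("notable_audible_concepts", [])
        if any(needle in concept.lower() for concept in visual + audible):
            hits.append(scene)
    return hits
```

Run against the example above, a search for "upload" would match the single scene through both its visual and audible concepts.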
