# Find scenes
Segment a Mux video asset into ordered scenes using the Mux Robots API.
<Callout type="warning" title="Experimental">
  The `find-scenes` workflow is experimental. The API shape, parameters, behavior, and pricing may change. If we make changes to this workflow while you're using it, we'll let you know via email.

  To enable `find-scenes`, [contact our support team](/support).
</Callout>

Find scenes in a video and get structured, timestamped scene metadata. This workflow segments a Mux asset into ordered scenes using visual signals and, when available, transcript cues. It is useful for building scene indexes, searchable video catalogs, navigation aids, and review workflows. See the <ApiRefLink href="/docs/api-reference/robots/find-scenes">Find Scenes API reference</ApiRefLink> for the full endpoint specification. See [Mux Robots pricing](/docs/pricing/overview#mux-robots-pricing) for unit costs.

<Callout type="info">
  Find scenes requires a video asset with generated shots in the `ready` state. Audio-only assets are not supported. Segmentation works from visual shots alone, but a text track makes the output richer by adding audible context to each scene. For the best results, make sure your asset has captions, either [auto-generated](/docs/guides/add-autogenerated-captions-and-use-transcripts) or [manually added](/docs/guides/add-subtitles-to-your-videos), before running a find-scenes job.
</Callout>

## Create a `find-scenes` job

```bash
curl https://api.mux.com/robots/v0/jobs/find-scenes \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    }
  }' \
  -u ${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}
```

<Callout type="info">
  This request is **asynchronous**. The `POST` returns immediately with the job in `pending` status and does not include results. **We strongly recommend listening for the [`robots.job.find_scenes.completed` webhook](/docs/guides/robots#webhooks)**. The payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, you can poll `GET /robots/v0/jobs/find-scenes/{JOB_ID}` with the `id` from the response until the status is `completed`.
</Callout>

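If you can't receive webhooks, a polling loop along these lines works. This is a minimal POSIX shell sketch, not an official client. The function name and its defaults (60 attempts, 10 seconds apart) are arbitrary choices, and it reads the status with `grep` and `sed` instead of a JSON parser to avoid depending on jq. Because this page only documents the `pending` and `completed` statuses, the loop stops on `completed` or when it runs out of attempts. A production version should also stop on any failure status the API reports.

```shell
# Hypothetical helper: poll a find-scenes job until its status is "completed",
# then print the full job JSON (which includes data.outputs).
# Assumes MUX_TOKEN_ID and MUX_TOKEN_SECRET are set in the environment.
# Usage: wait_for_find_scenes_job JOB_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
wait_for_find_scenes_job() {
  job_id="$1"
  max_attempts="${2:-60}"
  interval="${3:-10}"
  attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    response=$(curl -s "https://api.mux.com/robots/v0/jobs/find-scenes/${job_id}" \
      -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}") || return 1

    # Read the first "status" value without requiring jq. This assumes the
    # job's own status is the first "status" key in the response body.
    job_status=$(printf '%s' "$response" \
      | grep -o '"status": *"[^"]*"' \
      | head -n 1 \
      | sed 's/.*"\([^"]*\)"$/\1/')

    if [ "$job_status" = "completed" ]; then
      printf '%s\n' "$response"
      return 0
    fi

    echo "Job ${job_id} is ${job_status:-in an unknown state} (attempt ${attempt}/${max_attempts})" >&2
    attempt=$((attempt + 1))
    sleep "$interval"
  done

  echo "Gave up waiting for job ${job_id}" >&2
  return 1
}
```

For example, `wait_for_find_scenes_job rjob_scn123 > job.json` waits for the job and saves the completed payload, which has the same shape as the example response at the end of this page.

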
## Parameters

| Parameter | Type | Description |
| :-- | :-- | :-- |
| `asset_id` | string | **Required.** The Mux asset ID of the video asset to segment into scenes. |
| `language_code` | string | **Optional.** Language code (for example, `en`) of the transcript to analyze. Applied when a matching transcript track is available in the `ready` state. |
| `min_scenes` | integer | **Optional.** Hint for the minimum number of scenes. When the content supports finer scene boundaries, Mux avoids merging clearly distinct beats into fewer scenes than this. |
| `min_scene_duration_ms` | integer | **Optional.** Preferred minimum scene duration in milliseconds. Must be at least `1000`. Defaults to `15000`. |
| `output_steering` | object | **Optional.** Curated controls that guide scene grouping, titles, narratives, and concepts without changing the response schema. See [Output steering](#output-steering). |

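To catch invalid values before a job is created, you can check the documented constraints on the client side. The following is a minimal shell sketch, not part of the Mux API: the function name `find_scenes_request_body` is hypothetical, it only covers `asset_id`, `min_scenes`, and `min_scene_duration_ms`, it always sends `min_scene_duration_ms` (using the documented default of `15000` when you omit it), and it assumes the asset ID needs no JSON escaping.

```shell
# Hypothetical helper (not part of the Mux API): print a find-scenes request
# body after checking the constraints documented in the parameters table.
# Usage: find_scenes_request_body ASSET_ID [MIN_SCENES] [MIN_SCENE_DURATION_MS]
find_scenes_request_body() {
  asset_id="$1"
  min_scenes="$2"
  min_duration_ms="${3:-15000}"   # documented default when omitted

  # asset_id is the only required parameter.
  if [ -z "$asset_id" ]; then
    echo "asset_id is required" >&2
    return 1
  fi

  # min_scene_duration_ms must be an integer of at least 1000.
  case "$min_duration_ms" in
    '' | *[!0-9]*)
      echo "min_scene_duration_ms must be an integer" >&2
      return 1
      ;;
  esac
  if [ "$min_duration_ms" -lt 1000 ]; then
    echo "min_scene_duration_ms must be at least 1000" >&2
    return 1
  fi

  params="\"asset_id\": \"$asset_id\", \"min_scene_duration_ms\": $min_duration_ms"

  # min_scenes is optional; include it only when it is set.
  if [ -n "$min_scenes" ]; then
    case "$min_scenes" in
      *[!0-9]*)
        echo "min_scenes must be an integer" >&2
        return 1
        ;;
    esac
    params="$params, \"min_scenes\": $min_scenes"
  fi

  printf '{"parameters": {%s}}\n' "$params"
}
```

You could then pass the result to the request shown earlier with `-d "$(find_scenes_request_body "$ASSET_ID" 4 15000)"`.

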
### Output steering

Use `output_steering` when you want best-effort control over how scenes are grouped and described. These fields guide the workflow but do not guarantee exact output.

| Field | Type | Description |
| :-- | :-- | :-- |
| `segmentation_strategy` | string | Preferred segmentation strategy. Supported values: `editorial_beats`, `topic_changes`, `visual_transitions`, `action_progression`, and `instructional_steps`. |
| `title_style` | string | Preferred scene title style. Supported values: `descriptive`, `editorial`, `search_optimized`, and `accessibility`. |
| `narration_detail` | string | Preferred level of narrative detail. Supported values: `concise`, `balanced`, and `detailed`. |
| `audience` | string | Intended audience used to guide scene grouping, titles, and narrative emphasis. |
| `brand_terms` | array of strings | Preferred brand or domain terms to use when supported by the source content. |
| `topic_taxonomy` | object | Controlled vocabulary used to steer scene concepts without changing the response schema. |
| `topic_taxonomy.name` | string | Optional customer-facing name for the taxonomy. |
| `topic_taxonomy.values` | array of objects | Controlled vocabulary values. Each value has a required `label` (string) and optional `description` (string) and `aliases` (array of strings). |
| `topic_taxonomy.allow_other` | boolean | When `true`, values outside the taxonomy may be used if no taxonomy value applies. |

```json
{
  "parameters": {
    "asset_id": "YOUR_ASSET_ID",
    "output_steering": {
      "segmentation_strategy": "instructional_steps",
      "title_style": "search_optimized",
      "narration_detail": "detailed",
      "audience": "support agents reviewing product walkthroughs",
      "brand_terms": ["Mux Video", "Mux Data"],
      "topic_taxonomy": {
        "name": "Product areas",
        "values": [
          {
            "label": "Video uploads",
            "description": "Uploading, ingesting, or configuring new video assets",
            "aliases": ["uploads", "direct uploads"]
          },
          {
            "label": "Playback analytics",
            "description": "Viewer experience, engagement, or streaming performance"
          }
        ],
        "allow_other": true
      }
    }
  }
}
```
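
A steering payload like this is easier to maintain in a file. The sketch below is a hypothetical shell helper, not an official client, that submits a saved request body with curl's `--data-binary @file` syntax and prints the new job's ID. It assumes, based on the example response later on this page, that the created job comes back under `data` with its `id` as the first `id` key, and it uses `grep` and `sed` rather than a JSON parser to avoid a jq dependency.

```shell
# Hypothetical helper: create a find-scenes job from a JSON request body saved
# in a file and print the new job's ID.
# Assumes MUX_TOKEN_ID and MUX_TOKEN_SECRET are set in the environment.
create_find_scenes_job() {
  request_file="$1"

  response=$(curl -s https://api.mux.com/robots/v0/jobs/find-scenes \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary @"$request_file" \
    -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}") || return 1

  # Pull out the first "id" value. The "asset_id" key does not match because
  # the pattern requires a quote directly before "id".
  job_id=$(printf '%s' "$response" \
    | grep -o '"id": *"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/')

  if [ -z "$job_id" ]; then
    echo "No job id in response: $response" >&2
    return 1
  fi
  printf '%s\n' "$job_id"
}
```

For example, `create_find_scenes_job find-scenes-request.json` prints an ID such as `rjob_scn123`, which you can use to fetch the job or match it to its webhook.
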

## Output

The `outputs` object is included in the job once its status is `completed`. You'll receive it on the [`robots.job.find_scenes.completed`](/docs/guides/robots#webhooks) webhook (recommended), or you can fetch it with `GET /robots/v0/jobs/find-scenes/{JOB_ID}`. It contains:

| Field | Type | Description |
| :-- | :-- | :-- |
| `scenes` | array | Ordered scene segments extracted from the asset. |
| `scenes[].start_ms` | number | Scene start time in milliseconds. |
| `scenes[].end_ms` | number | Scene end time in milliseconds. |
| `scenes[].title` | string | Short human-readable title for the scene. |
| `scenes[].cues` | array | Transcript cues that overlap the scene. Can be empty when no usable transcript is available. |
| `scenes[].cues[].start_ms` | number | Cue start time in milliseconds. |
| `scenes[].cues[].end_ms` | number | Cue end time in milliseconds. |
| `scenes[].cues[].text` | string | Transcript text for this cue. |
| `scenes[].audible_narrative` | string | One-sentence summary of what is being said during the scene. Present only when transcript cues are available. |
| `scenes[].blended_narrative` | string | Combined audible and visual narrative for the scene when both signals are meaningfully available. |
| `scenes[].notable_audible_concepts` | array | Multi-word descriptive phrases capturing key audible concepts in the scene. Present only when transcript cues are available. |
| `scenes[].visual_narrative` | string | One-sentence summary of what is visually happening during the scene. |
| `scenes[].notable_visual_concepts` | array | Scored visual concepts extracted from sampled scene frames. Each item has `concept`, `score`, and `rationale`. |
| `scenes[].shots` | array | Optional ordered shot or beat breakdown within the scene, including timings and short visual descriptions. |
| `scenes[].shots[].start_ms` | number | Shot or beat start time in milliseconds. |
| `scenes[].shots[].end_ms` | number | Shot or beat end time in milliseconds. |
| `scenes[].shots[].visual_description` | string | Short visual description of what happens during this shot or beat. |
| `scenes[].shot_count` | integer | Number of timed shot or beat entries in the scene. When `shots` is present, this equals `shots.length`. |

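Because scenes are ordered and carry `start_ms` and `end_ms`, you can map a playback position to its scene, for example to highlight the current scene in a navigation UI. This minimal shell sketch assumes you have already extracted one tab-separated `start_ms`, `end_ms`, `title` line per scene (for example with `jq -r '.data.outputs.scenes[] | [.start_ms, .end_ms, .title] | @tsv'`), that timestamps are whole milliseconds, and that each scene covers the range from `start_ms` up to but not including `end_ms`; this page does not say whether `end_ms` is inclusive. The function name `scene_at_ms` is hypothetical.

```shell
# Hypothetical helper: read "start_ms<TAB>end_ms<TAB>title" lines on stdin
# (one per scene, extracted from outputs.scenes) and print the title of the
# scene that contains the given playback position in milliseconds.
scene_at_ms() {
  position_ms="$1"
  tab=$(printf '\t')

  while IFS="$tab" read -r start_ms end_ms title || [ -n "$start_ms" ]; do
    [ -n "$start_ms" ] || continue   # skip blank lines
    # Assumption: each scene is the half-open interval [start_ms, end_ms).
    if [ "$position_ms" -ge "$start_ms" ] && [ "$position_ms" -lt "$end_ms" ]; then
      printf '%s\n' "$title"
      return 0
    fi
  done

  return 1   # the position falls outside every scene
}
```

With the example response below, a position of `9000` falls in the first scene, so the function prints `Opening Product Context`.

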
## Example response

This is the payload delivered to the [`robots.job.find_scenes.completed`](/docs/guides/robots#webhooks) webhook, and the same shape you get from `GET /robots/v0/jobs/find-scenes/{JOB_ID}`:

```json
{
  "data": {
    "id": "rjob_scn123",
    "workflow": "find-scenes",
    "status": "completed",
    "units_consumed": 1,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "language_code": "en",
      "min_scenes": 4,
      "min_scene_duration_ms": 15000
    },
    "outputs": {
      "scenes": [
        {
          "start_ms": 0,
          "end_ms": 18500,
          "title": "Opening Product Context",
          "cues": [
            {
              "start_ms": 500,
              "end_ms": 4300,
              "text": "Today we are walking through the upload workflow."
            }
          ],
          "audible_narrative": "The speaker introduces the upload workflow and sets context for the walkthrough.",
          "blended_narrative": "A presenter introduces the product workflow while the interface is visible on screen.",
          "notable_audible_concepts": ["upload workflow walkthrough", "product context"],
          "visual_narrative": "A presenter stands beside a screen showing a video upload interface.",
          "notable_visual_concepts": [
            {
              "concept": "upload interface",
              "score": 0.88,
              "rationale": "The visible UI anchors the scene around the upload workflow."
            }
          ],
          "shots": [
            {
              "start_ms": 0,
              "end_ms": 8200,
              "visual_description": "The presenter appears next to the dashboard."
            },
            {
              "start_ms": 8200,
              "end_ms": 18500,
              "visual_description": "The screen focuses on the upload controls."
            }
          ],
          "shot_count": 2
        }
      ]
    }
  }
}
```
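
One way to use the ordered scenes as navigation aids is to turn them into a WebVTT chapters track. The sketch below is a hypothetical shell helper (`scenes_to_webvtt` is not part of any Mux API) that reads tab-separated `start_ms`, `end_ms`, `title` lines, for example produced from this payload with `jq -r '.data.outputs.scenes[] | [.start_ms, .end_ms, .title] | @tsv'`, and prints one WebVTT cue per scene. It assumes whole-millisecond timestamps and titles that fit on one line.

```shell
# Hypothetical helper: format whole milliseconds as a WebVTT timestamp
# (HH:MM:SS.mmm).
ms_to_vtt() {
  ms="$1"
  printf '%02d:%02d:%02d.%03d' \
    $((ms / 3600000)) $(( (ms % 3600000) / 60000 )) \
    $(( (ms % 60000) / 1000 )) $((ms % 1000))
}

# Hypothetical helper: turn "start_ms<TAB>end_ms<TAB>title" lines on stdin
# (one per scene) into a WebVTT chapters file on stdout.
scenes_to_webvtt() {
  tab=$(printf '\t')
  printf 'WEBVTT\n'
  while IFS="$tab" read -r start_ms end_ms title || [ -n "$start_ms" ]; do
    [ -n "$start_ms" ] || continue   # skip blank lines
    printf '\n%s --> %s\n%s\n' \
      "$(ms_to_vtt "$start_ms")" "$(ms_to_vtt "$end_ms")" "$title"
  done
}
```

For the example above, `jq -r '...' job.json | scenes_to_webvtt > chapters.vtt` produces a single cue, `00:00:00.000 --> 00:00:18.500`, titled `Opening Product Context`.
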