Learn how to use an LLM for translating your subtitles to another language
Mux uses OpenAI's Whisper model to create auto-generated captions, and that comes with a few restrictions. One of the most impactful, if you want your content to be accessible on a global scale, is the restriction that your captions must be generated in the same language as your source audio. Thankfully, large language models are quite skilled when it comes to translating text.
Here, we’ll outline a simple workflow that can watch your Mux organization for new assets with generated English subtitle tracks. When we see one, we’ll send it off to an LLM for translation into Spanish, and then we’ll upload the new track to the existing asset.
This example assumes that generated_subtitles is set with a language_code of en on your asset (your video’s audio will also need to be in English for this example).
When you receive the video.asset.track.ready webhook for your asset’s caption track, it will have a data.type value of text, a data.language_code of en, and a data.text_source value of generated_vod.
Let's outline a simple example, written in Python and Flask. Please note that this isn’t production ready, and should instead serve as an example of what is possible with this kind of implementation. We'll need to account for the following: setting up the environment, calling Claude for the translation, receiving the Mux webhook, uploading the translated track back to Mux, and cleaning up the temporary file.
Let's create our venv, activate it, and install flask and requests. We'll use Flask for serving our web endpoints, and Requests for interacting with Mux.
python3 -m venv venv
source venv/bin/activate
pip install flask requests
We'll also pre-populate some variables in our script for later use, and also set up our imports:
# Grab Flask for web-hosting
from flask import Flask, request, jsonify
# Grab json for parsing json responses from claude, os for file handling,
# threading for delayed file deletion, requests for web interaction
import requests, json, os, threading
# Store your various API credentials and webhook secret here
mux_access_token = "YOUR_MUX_ACCESS_TOKEN"
mux_access_secret = "YOUR_MUX_ACCESS_SECRET"
anthropic_api_key = "YOUR_ANTHROPIC_API_KEY"
# The public URL of this webhook receiver, used for hosting the translated VTT file
host = "YOUR_PUBLIC_HOST_URL"
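Hard-coding credentials is fine for a quick experiment, but a common alternative is to read them from environment variables. The following is a minimal sketch of that approach. The helper load_setting and the variable names (MUX_ACCESS_TOKEN, MUX_ACCESS_SECRET, ANTHROPIC_API_KEY, PUBLIC_HOST_URL) are hypothetical names chosen for illustration, not something Mux or Anthropic requires.

```python
import os

def load_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value or default

# Hypothetical variable names; fall back to the placeholders used in this guide
mux_access_token = load_setting("MUX_ACCESS_TOKEN", "YOUR_MUX_ACCESS_TOKEN")
mux_access_secret = load_setting("MUX_ACCESS_SECRET", "YOUR_MUX_ACCESS_SECRET")
anthropic_api_key = load_setting("ANTHROPIC_API_KEY", "YOUR_ANTHROPIC_API_KEY")
host = load_setting("PUBLIC_HOST_URL", "YOUR_PUBLIC_HOST_URL")
```

With this in place, you would export the values in your shell before starting the script, and the rest of the example can keep using the same variable names.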
We need to create our function to interact with Claude, which we will do via a request to Anthropic's API. Note the prompt of "Translate the following subtitle track to spanish, return in JSON format with key 'translation'."
We're using this phrasing to give Claude context for the request, and to specify a standardized output that we can parse later. We've specified Claude Sonnet 4 to strike a good balance between response time and accuracy.
# Call Claude to translate the subtitle track; could be modified to support more languages, etc.
def translate_with_claude(vtt_text, anthropic_api_key):
    claude_url = "https://api.anthropic.com/v1/messages"
    headers = {
        "x-api-key": anthropic_api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01"
    }
    payload = {
        "model": "claude-sonnet-4-20250514",
        # Cap on the length of the reply; a translated track longer than this is cut off
        "max_tokens": 8192,
        "messages": [
            {"role": "user", "content": f"Translate the following subtitle track to spanish, return in JSON format with key 'translation'.\n\n{vtt_text}"}
        ]
    }
    try:
        claude_response = requests.post(claude_url, headers=headers, json=payload)
        claude_response.raise_for_status()
        result = claude_response.json()
        return result
    except Exception:
        # Return None on any failure so the caller can tell the translation did not happen
        return None
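Because max_tokens caps the length of Claude's reply, a long subtitle track may not fit in a single response. One possible way around that is to split the track into smaller WebVTT documents and translate each one separately. The sketch below assumes cues are separated by blank lines, as in a typical WebVTT file. The function name split_vtt_cues and the default of 50 cues per chunk are illustrative choices, not part of the Mux or Anthropic APIs.

```python
def split_vtt_cues(vtt_text, max_cues=50):
    """Split a WebVTT document into smaller WebVTT documents of at most max_cues cues each."""
    # Blocks in a WebVTT file are separated by blank lines
    normalized = vtt_text.replace("\r\n", "\n").strip()
    blocks = [block.strip() for block in normalized.split("\n\n") if block.strip()]
    # Keep only blocks with a timing line ("-->"); this drops the WEBVTT header
    # and any NOTE or STYLE blocks
    cues = [block for block in blocks if "-->" in block]
    chunks = []
    for start in range(0, len(cues), max_cues):
        batch = cues[start:start + max_cues]
        # Each chunk gets its own header so it is a complete WebVTT document
        chunks.append("WEBVTT\n\n" + "\n\n".join(batch) + "\n")
    return chunks
```

Each chunk could then be passed to translate_with_claude in turn. Stitching the translated chunks back into a single file is left out of this sketch.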
Next we'll create our webhook handler in our Python script. We'll create a route that serves POST requests made to the root of our endpoint. Here we're ensuring that we only act on the proper assets and webhooks. Once we see a generated track, we'll try to download the text into memory, and pass it off to Claude. We're also starting the web server here on port 5000.
app = Flask(__name__)
@app.route('/', methods=['POST']) # Webhook receiver endpoint
def webhook_receiver():
    translation = None
    data = request.json
    # Only react to webhooks of type video.asset.track.ready, ignore all others
    if not (isinstance(data, dict) and data.get("type") == "video.asset.track.ready"):
        return jsonify({"status": "ignored"}), 200
    # Check for required values inside the data dict to prove this is the right track
    datadict = data.get("data") if isinstance(data, dict) else None
    if (
        isinstance(datadict, dict)
        and datadict.get("type") == "text"
        and datadict.get("language_code") == "en"
        and datadict.get("text_source") == "generated_vod"
    ):
        # Extract asset_id and id from the data
        asset_id = datadict.get("asset_id")
        track_id = datadict.get("id")
        # Request playback ID from Mux API
        mux_url = f"https://api.mux.com/video/v1/assets/{asset_id}"
        try:
            response = requests.get(
                mux_url,
                auth=(mux_access_token, mux_access_secret)
            )
            response.raise_for_status()
            mux_data = response.json()
            playback_ids = mux_data.get("data", {}).get("playback_ids", [])
            # Construct the Mux stream URL for the text track
            if playback_ids:
                playback_id = playback_ids[0].get("id")
                mux_track_url = f"https://stream.mux.com/{playback_id}/text/{track_id}.vtt"
                # Download the VTT file, translate it
                try:
                    vtt_response = requests.get(mux_track_url)
                    vtt_response.raise_for_status()
                    vtt_text = vtt_response.text
                    claude_result = translate_with_claude(vtt_text, anthropic_api_key) # Translate the subtitle track using Claude
                    ########################################
                    # Handle our response from Claude here #
                    ########################################
                # Nested exception handlers keep the webhook receiver from crashing; in a production system you'd want to handle these more gracefully
                except Exception as e:
                    pass
        except Exception:
            pass
    # Signal that we processed the webhook
    return jsonify({"status": "success"}), 200
# Listen on port 5000
if __name__ == '__main__':
    app.run(port=5000)
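The handler above trusts any POST request it receives. Mux signs its webhooks, so a receiver can check that a request really came from Mux before acting on it. The sketch below assumes the scheme described in Mux's webhook documentation, where the Mux-Signature header carries a timestamp (t) and an HMAC-SHA256 hex digest (v1) computed over the timestamp, a period, and the raw request body. The function name verify_mux_signature, the webhook_secret parameter, and the five-minute tolerance are illustrative choices, and the details should be confirmed against the current Mux documentation.

```python
import hashlib
import hmac
import time

def verify_mux_signature(raw_body, signature_header, webhook_secret, tolerance_seconds=300):
    """Return True if the signature header matches an HMAC-SHA256 of the raw body (bytes)."""
    try:
        # Header is assumed to look like "t=<unix timestamp>,v1=<hex digest>"
        parts = dict(item.split("=", 1) for item in signature_header.split(","))
        timestamp = parts["t"]
        expected = parts["v1"]
        age = abs(time.time() - int(timestamp))
    except (AttributeError, KeyError, ValueError):
        # Missing or malformed header
        return False
    # Reject old timestamps to limit replayed requests
    if age > tolerance_seconds:
        return False
    signed_payload = f"{timestamp}.".encode("utf-8") + raw_body
    computed = hmac.new(webhook_secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison of the two digests
    return hmac.compare_digest(computed.encode("utf-8"), expected.encode("utf-8"))
```

In the Flask handler, the raw body is available from request.get_data() and the header from request.headers.get("Mux-Signature"). A request that fails the check could be answered with a 4xx status before any parsing takes place.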
Once we have our response from Claude, we'll want to extract the JSON from the response, and then take the value of the translation key that we asked for in our prompt. We'll save that output as a file to the /static directory, and make a POST back to Mux that points to the new file.
                    ########################################
                    # Handle our response from Claude here #
                    ########################################
                    # This runs inside the handler's existing try block, so a failed parse is caught there
                    # Extract the translated text from Claude's response, upload to Mux as a new asset track
                    translation = json.loads(claude_result['content'][0]['text'].replace('```json', '').replace('```', '').strip())['translation']
                    # Save the translated VTT file to static directory under asset_id
                    static_dir = os.path.join(os.path.dirname(__file__), "static")
                    os.makedirs(static_dir, exist_ok=True)
                    translated_vtt_path = os.path.join(static_dir, f"{asset_id}_es.vtt")
                    with open(translated_vtt_path, "w", encoding="utf-8") as f:
                        f.write(translation)
                    # Upload the translated VTT file to Mux as a new text track, using the static directory URL
                    mux_tracks_url = f"https://api.mux.com/video/v1/assets/{asset_id}/tracks"
                    track_payload = {
                        "type": "text",
                        "text_type": "subtitles",
                        "language_code": "es",
                        "name": "Spanish",
                        "url": f"{host}/static/{asset_id}_es.vtt"
                    }
                    try:
                        track_response = requests.post(
                            mux_tracks_url,
                            auth=(mux_access_token, mux_access_secret),
                            json=track_payload
                        )
                        track_response.raise_for_status()
                    except Exception:
                        pass
                    ##########################################################################
                    # Delete the translated VTT file after 60 seconds in a background thread #
                    ##########################################################################
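Before uploading the translated file, it can be worth checking that the result still looks like the original track. A translation should change the cue text but leave the timing lines alone, so one lightweight check is to compare the timestamps of the two files. The following is a sketch under that assumption. The names timing_lines and looks_like_valid_translation are hypothetical helpers, not part of any library.

```python
import re

# Matches a WebVTT timing line such as "00:00:01.000 --> 00:00:04.000"
TIMING_LINE = re.compile(
    r"^\s*((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)

def timing_lines(vtt_text):
    """Return a list of (start, end) timestamp pairs, one per cue, in file order."""
    pairs = []
    for line in vtt_text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

def looks_like_valid_translation(original_vtt, translated_vtt):
    """Return True if the translation keeps the WEBVTT header and every cue timing."""
    if not isinstance(translated_vtt, str):
        return False
    # A WebVTT file must begin with the string WEBVTT (optionally after a byte order mark)
    if not translated_vtt.lstrip("\ufeff").startswith("WEBVTT"):
        return False
    return timing_lines(original_vtt) == timing_lines(translated_vtt)
```

In the handler, this could be called with vtt_text and translation before the file is written, skipping the upload when it returns False.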
Finally, let's set up a thread that will delete the translated file after 60 seconds. This will give Mux time to grab the file, without filling up our local storage over time.
##########################################################################
# Delete the translated VTT file after 60 seconds in a background thread #
##########################################################################
threading.Timer(60, delete_vtt, args=(translated_vtt_path,)).start()
And we'll need to create that delete_vtt function as well.
def delete_vtt(path):
    try:
        os.remove(path)
    except Exception:
        pass
If you're developing locally, you can use ngrok to expose your local server to the internet:
ngrok http 5000
This will give you a public URL that you can use to test the webhook. Finally, in your Mux Dashboard, you can create a new webhook endpoint pointing to your public URL. You'll want to use that endpoint for YOUR_PUBLIC_HOST_URL in our example as well.
Here’s a sample video showing how the output of this workflow could look, with the Spanish subtitles coming entirely from Anthropic’s Claude, generated on the fly as the Whisper model captions were finalized:
Relying on AI to translate your subtitles means that you likely won't be manually checking each translation. Because AI isn't 100% perfect, you may want to indicate in your app that the translations being displayed are auto-generated, in case they contain inconsistencies or inaccuracies. Additionally, if you’re basing your translations off of Mux’s auto-generated captions, you’re chaining two different AI models in your workflow, which increases the likelihood of minor mistakes. In small scale testing, we've found Claude to be quite reliable and consistent, but your mileage may vary, and you should consider adding whatever checks and controls you feel are necessary to manage mistakes.
For example: AI may not always understand the contextual meaning of a conversation. It might not always correctly translate jokes, slang, or culturally specific expressions. Depending on the type of content being translated, you may want to add a manual verification step before publishing the translations.