
Add auto-generated captions to your videos and use transcripts

Learn how to add auto-generated captions to your on-demand Mux Video assets to increase accessibility, and how to create transcripts for further processing.

Beta

This feature is currently in public beta. If you encounter any issues, please let us know!

Overview

Mux uses OpenAI's Whisper model to automatically generate captions for on-demand assets. This guide shows you how to enable the feature, what you can do with it, and what limitations you might encounter.

Generally, you should expect auto-generated captions to work well on English-language content with reasonably clear audio. They may work less well on assets that contain a lot of non-speech audio (music, background noise, or extended periods of silence).

We recommend that you try it out on some of your typical content, and see if the results meet your expectations.

We also have a video guide for learning how to use this new feature!

Enabling auto-generated captions

When you create a Mux asset with the create asset API, you can add a generated_subtitles array to the API call, as follows:

// POST /video/v1/assets
{
    "input": [
        {
            "url": "...",
            "generated_subtitles": [
                {
                    "language_code": "en",
                    "name": "English CC"
                }
            ]
        }
    ],
    "playback_policy": "public"
}

Note that language_code must currently be en; other languages aren't supported yet.
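
If you're calling the API from code, the same request might look like the following in TypeScript. This is a minimal sketch using Node 18+'s built-in fetch; the MUX_TOKEN_ID and MUX_TOKEN_SECRET environment variables and the input URL are placeholders you'd supply yourself.

// POST /video/v1/assets with generated_subtitles (TypeScript sketch)
const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

const response = await fetch("https://api.mux.com/video/v1/assets", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Basic ${auth}`,
  },
  body: JSON.stringify({
    input: [
      {
        url: "https://example.com/video.mp4", // placeholder source URL
        generated_subtitles: [{ language_code: "en", name: "English CC" }],
      },
    ],
    playback_policy: "public",
  }),
});

const { data: asset } = await response.json();
console.log("Created asset:", asset.id);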

You can also enable auto-generated captions if you're using Direct Uploads, by specifying the generated_subtitles configuration in the first entry of the input list of the new_asset_settings object, like this:

// POST /video/v1/uploads
{
    "new_asset_settings": {
        "playback_policy": [
            "public"
        ],
        "input": [
            {
                "generated_subtitles": [
                    {
                        "language_code": "en",
                        "name": "English CC"
                    }
                ]
            }
        ]
    },
    "cors_origin": "*"
}
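
As a sketch, the direct upload flow in TypeScript has two steps: create the upload with the caption settings, then PUT your file to the signed URL Mux returns. The credentials and local file path below are placeholders.

// POST /video/v1/uploads, then upload the file (TypeScript sketch)
import { readFile } from "node:fs/promises";

const auth = Buffer.from(
  `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
).toString("base64");

// 1. Create the direct upload with auto-generated captions enabled
const createRes = await fetch("https://api.mux.com/video/v1/uploads", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Basic ${auth}`,
  },
  body: JSON.stringify({
    new_asset_settings: {
      playback_policy: ["public"],
      input: [
        { generated_subtitles: [{ language_code: "en", name: "English CC" }] },
      ],
    },
    cors_origin: "*",
  }),
});
const { data: upload } = await createRes.json();

// 2. PUT the video file to the signed upload URL
await fetch(upload.url, {
  method: "PUT",
  body: await readFile("./my-video.mp4"), // placeholder local file
});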

Auto-captioning happens separately from the initial asset ingest so that it doesn't delay the asset becoming available for playback. If you want to know when the captions text track is ready, listen for the video.asset.track.ready webhook for a track with "text_source": "generated_vod".
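
As a sketch, a webhook handler that waits for this event might look like the following (Express-style TypeScript; the route path is arbitrary, and webhook signature verification is omitted for brevity but recommended in production):

// Webhook listener for ready auto-generated caption tracks (TypeScript sketch)
import express from "express";

const app = express();
app.use(express.json());

app.post("/mux-webhooks", (req, res) => {
  const event = req.body;

  // The auto-generated captions track is identified by
  // text_source === "generated_vod"
  if (
    event.type === "video.asset.track.ready" &&
    event.data?.text_source === "generated_vod"
  ) {
    console.log("Auto-generated captions track ready:", event.data.id);
  }

  res.sendStatus(200);
});

app.listen(3000);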

Retrieving a transcript

For assets that have a ready auto-generated captions track, you can also request a transcript (a plain text file) of the speech recognized in your asset.

To get it, request the following URL, using a playback ID for your asset and the track ID of the generated_vod text track:

https://stream.mux.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt

You might find this transcript useful for further processing in other systems: for example, content moderation, sentiment analysis, summarization, or extracting insights from your content.
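
As a sketch, fetching the transcript in TypeScript looks like this (the playback and track IDs are placeholders, and no auth header is needed because the example assumes a public playback policy):

// GET the plain-text transcript (TypeScript sketch)
const PLAYBACK_ID = "your-playback-id"; // placeholder
const TRACK_ID = "your-track-id"; // placeholder

const res = await fetch(
  `https://stream.mux.com/${PLAYBACK_ID}/text/${TRACK_ID}.txt`
);
const transcript = await res.text();
console.log(transcript);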

FAQ

How much does this feature cost?

There is no additional charge for this feature. It's included as part of the standard encoding charge for Mux Video assets.

Can I add auto-generated captions to an existing asset?

This is not currently available. However, if you create a copy of the asset through our 'clip' functionality, you can enable auto-generated captions on the new asset.

Some of the generated captions are wrong. What can I do?

We're sorry to hear that! Although automatic speech recognition has improved enormously in recent years, it can still get things wrong sometimes.

One option is to edit and replace the mis-recognized speech in the captions track:

  1. Download the full VTT file we generated at https://stream.mux.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
  2. Edit the VTT file using your preferred text editor
  3. Delete the auto-generated track with the 'delete track' API
  4. Add a new track to your asset using the edited VTT file, via the 'create track' API (see the sketch below)
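
A sketch of steps 3 and 4 in TypeScript might look like this, assuming you've hosted the edited VTT file at a URL Mux can fetch; the asset ID, track ID, credentials, and VTT URL are all placeholders:

// Replace the auto-generated track with an edited one (TypeScript sketch)
const ASSET_ID = "your-asset-id"; // placeholder
const OLD_TRACK_ID = "your-generated-track-id"; // placeholder

const headers = {
  "Content-Type": "application/json",
  Authorization: `Basic ${Buffer.from(
    `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
  ).toString("base64")}`,
};

// Step 3: delete the auto-generated track
await fetch(
  `https://api.mux.com/video/v1/assets/${ASSET_ID}/tracks/${OLD_TRACK_ID}`,
  { method: "DELETE", headers }
);

// Step 4: add the edited VTT file as a new text track
await fetch(`https://api.mux.com/video/v1/assets/${ASSET_ID}/tracks`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    url: "https://example.com/edited-captions.vtt", // placeholder hosted VTT
    type: "text",
    text_type: "subtitles",
    closed_captions: true,
    language_code: "en",
    name: "English CC",
  }),
});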

Do you support languages other than English?

Sorry, we don't currently support other languages. We also don't recommend using this feature on mixed-language content.
