Add auto-generated captions to your videos and use transcripts

Learn how to add auto-generated captions to your on-demand Mux Video assets, to increase accessibility and to create transcripts for further processing.

In this guide:

Overview

Learn which use cases auto-generated captions are suited for

Enabling auto-generated captions

Learn how to enable auto-generated captions at asset creation time

Enabling auto-generated captions retroactively

Learn how to enable auto-generated captions after an asset has been created

Retrieving a transcript

Learn how to retrieve a transcript file based on the auto-generated captions

FAQ

Read commonly asked questions and answers about auto-generated captions

Overview

Mux uses OpenAI's Whisper model to automatically generate captions for on-demand assets. This guide shows you how to enable this feature, what you can do with it, and which limitations you might encounter.

Generally, you should expect auto-generated captions to work well for content with reasonably clear audio. They may work less well for assets that contain a lot of non-speech audio (music, background noise, extended periods of silence).

We recommend that you try it out on some of your typical content, and see if the results meet your expectations.

This feature is designed to generate captions in the same language as your content's audio. It should not be used to programmatically generate translated captions in other languages.

We also have a video guide for learning how to use this new feature!

Enabling auto-generated captions

When you create a Mux Asset, you can add a generated_subtitles array to the input entry in the API call, as follows:

// POST /video/v1/assets
{
    "input": [
        {
            "url": "...",
            "generated_subtitles": [
                {
                    "language_code": "en",
                    "name": "English CC"
                }
            ]
        }
    ],
    "playback_policy": "public",
    "encoding_tier": "smart"
}
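As a rough illustration, the request above could be assembled in Python using only the standard library. This is a sketch under stated assumptions, not an official client: the API host `https://api.mux.com`, HTTP Basic authentication, and the environment variable names `MUX_TOKEN_ID` and `MUX_TOKEN_SECRET` are assumptions that do not come from this guide, and `build_create_asset_request` is a hypothetical helper name.

```python
import base64
import json
import os
import urllib.request

MUX_API_BASE = "https://api.mux.com"  # assumption: API host


def build_create_asset_request(video_url, language_code="en", name="English CC"):
    """Build (but do not send) a POST /video/v1/assets request."""
    body = {
        "input": [
            {
                "url": video_url,
                # Captions are requested on the input entry itself.
                "generated_subtitles": [
                    {"language_code": language_code, "name": name}
                ],
            }
        ],
        "playback_policy": "public",
        "encoding_tier": "smart",
    }
    # Assumption: Basic auth with credentials read from hypothetical
    # environment variables.
    token_id = os.environ.get("MUX_TOKEN_ID", "")
    token_secret = os.environ.get("MUX_TOKEN_SECRET", "")
    credentials = base64.b64encode(f"{token_id}:{token_secret}".encode()).decode()
    return urllib.request.Request(
        f"{MUX_API_BASE}/video/v1/assets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_asset_request("https://example.com/video.mp4")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the JSON body before any call is made.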

Mux supports the following languages and corresponding language codes for VOD generated captions. Languages labeled as "beta" may have lower accuracy.

Language	Language Code	Status
English	en	Stable
Spanish	es	Stable
Italian	it	Stable
Portuguese	pt	Stable
German	de	Stable
French	fr	Stable
Polish	pl	Beta
Russian	ru	Beta
Dutch	nl	Beta
Catalan	ca	Beta
Turkish	tr	Beta
Swedish	sv	Beta
Ukrainian	uk	Beta
Norwegian	no	Beta
Finnish	fi	Beta
Slovak	sk	Beta
Greek	el	Beta
Czech	cs	Beta
Croatian	hr	Beta
Danish	da	Beta
Romanian	ro	Beta
Bulgarian	bg	Beta
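If you set the language code from your own metadata, it may help to check it against the table above before calling the API. The sketch below simply copies the table into a dictionary; `SUPPORTED_LANGUAGES` and `caption_language_status` are hypothetical names, and the table may change over time.

```python
# Language codes and statuses copied from the table above.
SUPPORTED_LANGUAGES = {
    "en": "Stable", "es": "Stable", "it": "Stable",
    "pt": "Stable", "de": "Stable", "fr": "Stable",
    "pl": "Beta", "ru": "Beta", "nl": "Beta", "ca": "Beta",
    "tr": "Beta", "sv": "Beta", "uk": "Beta", "no": "Beta",
    "fi": "Beta", "sk": "Beta", "el": "Beta", "cs": "Beta",
    "hr": "Beta", "da": "Beta", "ro": "Beta", "bg": "Beta",
}


def caption_language_status(language_code):
    """Return "Stable" or "Beta", or raise if the code is not in the table."""
    status = SUPPORTED_LANGUAGES.get(language_code)
    if status is None:
        raise ValueError(f"unsupported caption language: {language_code!r}")
    return status


if __name__ == "__main__":
    for code in ("en", "pl"):
        print(code, caption_language_status(code))
```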

You can also enable auto-generated captions if you're using Direct Uploads by specifying the generated_subtitles configuration in the first entry of the input list of the new_asset_settings object, like this:

// POST /video/v1/uploads
{
    "new_asset_settings": {
        "playback_policy": [
            "public"
        ],
        "encoding_tier": "smart",
        "input": [
            {
                "generated_subtitles": [
                    {
                        "language_code": "en",
                        "name": "English CC"
                    }
                ]
            }
        ]
    },
    "cors_origin": "*"
}
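If you already build new_asset_settings elsewhere in your code, one way to add captions is a small helper that places generated_subtitles in the first input entry, creating that entry when it is missing. This is only a sketch; `with_generated_subtitles` is a hypothetical helper, not part of any Mux SDK.

```python
import copy
import json


def with_generated_subtitles(new_asset_settings, language_code="en", name="English CC"):
    """Return a copy of the settings with captions requested on input[0]."""
    settings = copy.deepcopy(new_asset_settings)
    inputs = settings.setdefault("input", [])
    if not inputs:
        # Direct uploads have no input URL, so an otherwise empty entry is enough.
        inputs.append({})
    inputs[0].setdefault("generated_subtitles", []).append(
        {"language_code": language_code, "name": name}
    )
    return settings


base_settings = {"playback_policy": ["public"], "encoding_tier": "smart"}
upload_body = {
    "new_asset_settings": with_generated_subtitles(base_settings),
    "cors_origin": "*",
}

if __name__ == "__main__":
    print(json.dumps(upload_body, indent=4))
```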

Auto-captioning happens separately from the initial asset ingest, so it doesn't delay the asset becoming available for playback. If you want to know when the text track for the captions is ready, listen for the video.asset.track.ready webhook for a track with "text_source": "generated_vod".
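A webhook receiver could filter for this event roughly as follows. The sketch assumes the webhook body is JSON with a top-level `type` field and the track object under `data`; that payload shape, and the function names, are assumptions for illustration, so check them against a real delivery before relying on them.

```python
import json


def is_generated_captions_ready(event):
    """True when the event reports a ready auto-generated captions track."""
    data = event.get("data") or {}
    return (
        event.get("type") == "video.asset.track.ready"
        and data.get("text_source") == "generated_vod"
    )


def ready_caption_track_id(raw_body):
    """Parse a raw webhook body and return the track ID, or None if not relevant."""
    event = json.loads(raw_body)
    if is_generated_captions_ready(event):
        # Assumption: the track ID is exposed as data.id.
        return event["data"].get("id")
    return None


if __name__ == "__main__":
    sample = json.dumps({
        "type": "video.asset.track.ready",
        "data": {"id": "example-track-id", "type": "text", "text_source": "generated_vod"},
    })
    print(ready_caption_track_id(sample))
```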

Enabling auto-generated captions retroactively

You can retroactively add captions to any smart asset created in the last 7 days by POSTing to the generate-subtitles endpoint on the asset audio track that you want to generate captions for, as shown below:

// POST /video/v1/assets/${ASSET_ID}/tracks/${AUDIO_TRACK_ID}/generate-subtitles
{
  "generated_subtitles": [
    {
      "language_code": "en",
      "name": "English (generated)"
    }
  ]
}
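To fill in the path above you need the ID of the asset's audio track. A minimal sketch, assuming the asset object you get back from the API carries a `tracks` list whose entries have `type` and `id` fields (the helper names are hypothetical):

```python
import json


def find_audio_track_id(asset):
    """Return the ID of the first audio track on the asset, or None."""
    for track in asset.get("tracks", []):
        if track.get("type") == "audio":
            return track.get("id")
    return None


def build_generate_subtitles_call(asset, language_code="en", name="English (generated)"):
    """Return the (path, JSON body) pair for the generate-subtitles request."""
    track_id = find_audio_track_id(asset)
    if track_id is None:
        raise ValueError("asset has no audio track to caption")
    path = f"/video/v1/assets/{asset['id']}/tracks/{track_id}/generate-subtitles"
    body = {"generated_subtitles": [{"language_code": language_code, "name": name}]}
    return path, json.dumps(body)


if __name__ == "__main__":
    # Illustrative asset object with made-up IDs.
    example_asset = {
        "id": "example-asset-id",
        "tracks": [
            {"id": "example-video-track", "type": "video"},
            {"id": "example-audio-track", "type": "audio"},
        ],
    }
    print(build_generate_subtitles_call(example_asset))
```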

If you need to use this API to backfill captions to assets created more than 7 days ago, please reach out and we'd be happy to help. Please note that there may be a charge for backfilling captions onto large libraries.

Retrieving a transcript

For assets that have a ready auto-generated captions track, you can also request a transcript (a plain text file) of the speech recognized in your asset.

To get it, request the following URL, using a playback ID for your asset and the track ID of the generated_vod text track:

https://stream.mux.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt
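The URL pattern above can be wrapped in a small helper. This sketch assumes a public playback ID (a signed playback policy would presumably need a token, which is not covered here), and the function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def text_track_url(playback_id, track_id, extension="txt"):
    """Build the stream.mux.com URL for a text track (.txt transcript or .vtt captions)."""
    if extension not in ("txt", "vtt"):
        raise ValueError("extension must be 'txt' or 'vtt'")
    return (
        f"https://stream.mux.com/{quote(playback_id, safe='')}"
        f"/text/{quote(track_id, safe='')}.{extension}"
    )


def fetch_transcript(playback_id, track_id):
    """Download the plain-text transcript. Performs a network request."""
    with urllib.request.urlopen(text_track_url(playback_id, track_id)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder IDs; substitute your own before calling fetch_transcript.
    print(text_track_url("PLAYBACK_ID", "TRACK_ID"))
```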

You might find this transcript useful for further processing in other systems, for example content moderation, sentiment analysis, summarization, or extracting insights from your content.

FAQ

How much does auto-generated captioning cost for on-demand assets?

There is no additional charge for this feature. It's included as part of the standard encoding charge for Mux Video assets.

How long does it take to generate captions?

It depends on the length of the asset, but generally it takes about 0.1x content duration. As an example, a 1 hour asset would take about 6 minutes to generate captions for.
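If you want a rough figure for your own pipeline, the rule of thumb above is a single multiplication. Treat the 0.1 factor as an approximation rather than a guarantee; `estimated_caption_seconds` is a hypothetical helper.

```python
def estimated_caption_seconds(duration_seconds, factor=0.1):
    """Rough estimate of caption generation time from the asset duration."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds * factor


if __name__ == "__main__":
    one_hour = 60 * 60
    minutes = estimated_caption_seconds(one_hour) / 60
    print(f"about {minutes:.0f} minutes for a 1 hour asset")
```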

Help, the captions you generated are full of mistakes!

We're sorry to hear that! Unfortunately, though automatic speech recognition has improved enormously in recent years, sometimes it can still get things wrong.

One option you have is to edit and replace the mis-recognized speech in the captions track:

1. Download the full VTT file we generated at https://stream.mux.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
2. Edit the VTT file using your preferred text editor
3. Delete the auto-generated track with the 'delete track' API
4. Add a new track to your asset using the edited VTT file, using the create track API
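If the same words are mis-recognized repeatedly, the editing step can be scripted instead of done by hand. The sketch below applies whole-word replacements to cue text only, leaving the WEBVTT header and timing lines untouched. It is a simplification that ignores NOTE and STYLE blocks, and `correct_vtt` is a hypothetical helper; you would still delete the old track and create a new one from the corrected file yourself.

```python
import re


def correct_vtt(vtt_text, corrections):
    """Apply whole-word replacements to cue text lines of a WebVTT file."""
    fixed_lines = []
    for line in vtt_text.splitlines():
        is_timing = "-->" in line
        is_header = line.startswith("WEBVTT")
        if not is_timing and not is_header:
            for wrong, right in corrections.items():
                # A function replacement keeps backslashes in `right` literal.
                line = re.sub(
                    rf"\b{re.escape(wrong)}\b",
                    lambda _match, replacement=right: replacement,
                    line,
                )
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


if __name__ == "__main__":
    sample = (
        "WEBVTT\n"
        "\n"
        "00:00:00.000 --> 00:00:02.000\n"
        "Welcome to mucks video\n"
    )
    print(correct_vtt(sample, {"mucks": "Mux"}))
```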
