Offering text captions that are synchronized with videos is an increasingly common practice on the web, in movies, on TV, and at live events. These captions are essential for those who cannot hear or have difficulty hearing the audio. Making transcripts available is another important part of providing full access to everyone who is unable to use audio and video, including people with limited bandwidth, people who lack time, and people with disabilities. Transcripts include descriptions of what is on screen, and, like captions, they should identify who is speaking and indicate important sound effects.

Captions and transcripts are not just for people with hearing difficulties. Captioning helps many others, very likely including you.

Do you ever read the captions while watching a sports game or other TV show in a noisy setting? Have you found captions helpful when the audio in a video is difficult to understand? Wouldn't captions be handy when watching a video in a library? Do you often read captions or subtitles when watching online content at work because you can't find your headphones? Perhaps most important, have you ever wanted to use captions but found the available ones frustratingly incomprehensible? If so, you already know that captions are for everyone, and that quality matters.

Other situations in which captions are valuable include:

  • Supporting non-native English speakers, who may find reading easier than listening
  • Assisting everyone with multi-modal learning strategies, such as learning to spell unfamiliar words
  • Making text available to aid Search Engine Optimization (SEO) and, in the case of transcripts, making the content "human searchable"

Definition of Terms

The terms "captions" and "subtitles" are often used interchangeably, although captions go one step further than subtitles by adding non-spoken information, such as speaker identification and sound effects. Transcripts are textual versions of the spoken words, whereas captions and subtitles break the text into segments short enough to fit on screen and add timecode information. Common caption and subtitle file formats include DFXP (.dfxp), SubRip (.srt), and WebVTT (.vtt).
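To make the difference between a transcript and a caption file concrete, the Python sketch below takes a few hypothetical, already-timed cues, wraps each one onto short lines, and writes them out as SubRip (.srt) and WebVTT (.vtt) text. The `Cue` class, the helper names, the sample timings, and the 42-character line limit are illustrative assumptions, not requirements of any captioning service; in practice the timings come from a captioning service or editing tool.

```python
from dataclasses import dataclass

MAX_LINE_CHARS = 42  # assumed on-screen line length; adjust to your player


@dataclass
class Cue:
    """One caption: start/end times in seconds plus the text shown on screen."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT, sep=".")."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def split_into_lines(text: str, max_chars: int = MAX_LINE_CHARS) -> list[str]:
    """Greedily wrap words so that each caption line fits on screen."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SubRip: numbered blocks separated by blank lines."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append("\n".join([str(number), timing, *split_into_lines(cue.text)]))
    return "\n\n".join(blocks) + "\n"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues as WebVTT: a WEBVTT header, then unnumbered cue blocks."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = (f"{format_timestamp(cue.start, '.')} --> "
                  f"{format_timestamp(cue.end, '.')}")
        blocks.append("\n".join([timing, *split_into_lines(cue.text)]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical cues; real timings would come from a captioning service.
    cues = [
        Cue(0.0, 2.5, "[MUSIC]"),
        Cue(2.5, 6.0, "PROFESSOR: Today we'll look at how plants "
                      "convert sunlight into energy."),
    ]
    print(to_srt(cues))
    print(to_webvtt(cues))
```

The two formats differ mainly in the header line, cue numbering, and the millisecond separator (a comma in SRT, a period in WebVTT). The bracketed [MUSIC] cue and the speaker label show the kind of non-spoken information that distinguishes captions from a plain transcript.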

Video descriptions, also known as described video, are provided for people who are blind or visually impaired. They add audio content that is inserted during pauses in the regular audio. Video descriptions convey on-screen activity that is essential to understanding the overall content. For example, adding video description to a video of a science experiment can ensure that a blind or visually impaired student fully grasps the concepts being discussed.

Captioning Services

A range of captioning services is in use by the Stanford community. Some services are suited to handling large batches of content, while others work best for short videos or when you already have a prepared transcript. Which service to choose also depends on whether the video is already online and where it is hosted (e.g., a departmental web page or YouTube). Some services offer captioning and editing workflows designed to work with third-party platforms, such as YouTube and Vimeo.

Cost varies depending on whether you have a transcript, how quickly you need the results, how much editing you wish to do, and so on. Typically, prices range from $1.50 to $3 per minute.
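As a rough illustration of how the per-minute range adds up, this Python sketch estimates the cost of captioning a batch of videos. It assumes each video is billed by the minute with partial minutes rounded up; actual vendor billing rules may differ, so treat the result as a ballpark figure only. The function name is hypothetical.

```python
import math


def estimate_caption_cost(durations_min: list[float],
                          low_rate: float = 1.50,
                          high_rate: float = 3.00) -> tuple[float, float]:
    """Return (low, high) cost estimates in dollars for a batch of videos.

    Assumption: each video is billed per minute, with partial minutes
    rounded up. Rates default to the $1.50 to $3 per-minute range.
    """
    billable_minutes = sum(math.ceil(d) for d in durations_min)
    return billable_minutes * low_rate, billable_minutes * high_rate


if __name__ == "__main__":
    low, high = estimate_caption_cost([12.3, 47.5])
    print(f"Estimated cost: ${low:,.2f} to ${high:,.2f}")
```

For example, ten 50-minute lectures (500 minutes) would fall between $750 and $1,500 at these rates.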

Please contact us for assistance selecting the solution that is best for your unit.

3Play Media

3Play Media is a third-party transcription and captioning service available to the Stanford community. The company provides a range of products and services, including automated workflows with YouTube, Kaltura, Mediasite, Echo360, and many other video platforms. Its user interface is flexible and supports categorizing, searching, editing, and format conversion. You can also import your existing captions and transcripts. 3Play Media also has translation partners for multilingual videos. For more information, prices and set-up instructions, please visit

Cielo24

Cielo24 is a third-party service with which Stanford has negotiated per-minute rates for transcription and captioning. It provides automated workflows for adding captions to videos hosted on YouTube, Vimeo, Kaltura, and several other video hosting services. For example, if you use YouTube, you can request captions simply by providing the YouTube URL of your video, and the captions will be added to your video automatically once they are created. The platform also provides an online tool for editing your captions. For more information, prices and set-up instructions, please visit

Rev

Rev offers a bare-bones, low-cost captioning and transcription service. It is cost-effective for occasional short, general-audience videos, provided you know how to manually upload caption files and link them to your videos.

Do-It-Yourself with YouTube

YouTube offers three ways to add captions yourself; which one to use depends on what kind of transcript you have.

A Word About Quality

Regardless of which service you choose, it is always a good idea to proofread the captions you receive before making them publicly available. Accurately transcribed captions help the multimedia content on your site make a good impression. Pay special attention to unusual words, such as names and technical terms specific to a particular field.
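One way to speed up that check is to keep a glossary of the names and technical terms that appear in your videos and flag caption words that look like near-misses of those terms. The Python sketch below does this with the standard library's `difflib.get_close_matches`. The function name, the sample glossary, and the 0.75 similarity cutoff are illustrative assumptions, and the script only points out candidates for a human proofreader to review; it does not replace reading the captions.

```python
import difflib
import re


def flag_suspect_terms(caption_text: str, glossary: list[str],
                       cutoff: float = 0.75) -> dict[str, str]:
    """Map caption words that closely resemble (but do not exactly match)
    a glossary term to the likely intended term.

    Hypothetical proofreading helper; the cutoff is an assumed threshold.
    """
    # Compare case-insensitively but report the glossary's own spelling.
    by_lower = {term.lower(): term for term in glossary}
    candidates = list(by_lower)
    suspects: dict[str, str] = {}
    for word in re.findall(r"[A-Za-z']+", caption_text):
        key = word.lower()
        if key in by_lower:
            continue  # exact match: already spelled correctly
        match = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
        if match:
            suspects[word] = by_lower[match[0]]
    return suspects


if __name__ == "__main__":
    captions = "Today Professor Smyth explains photosynthesys in chloroplasts."
    glossary = ["Smith", "photosynthesis", "chloroplasts"]
    for heard, intended in flag_suspect_terms(captions, glossary).items():
        print(f"Check '{heard}': did you mean '{intended}'?")
```

A higher cutoff flags fewer, closer matches; a lower one catches more garbled terms at the cost of more false alarms.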

As you may know, Google offers an automatic captioning feature on YouTube, but its results are less than optimal. You can edit the automatic captions, but consider whether it would be faster to do that, to generate your own captions from a transcript, or to work with a third-party vendor.

Target Audience: Content Creator
Last modified: March 4, 2016