Commons:Timed Text

For other uses, you may be looking for Commons:File captions.

TimedText is a custom Wikimedia Commons namespace that holds closed captioning text, or subtitles, to be associated with other media, such as audio or video files. This page explains the feature's concept and use.

Closed captioning (CC) and subtitling are both processes of displaying text on a television, video screen, or other visual display to provide additional or interpretive information. Both are typically used as a transcription of the audio portion of a program as it occurs (either verbatim or in edited form), sometimes including descriptions of non-speech elements. This helps deaf and hard-of-hearing people, and it allows non-native speakers to understand the content of a multimedia file.

Using

Also see Commons:Video#Subtitles and closed captioning.

Thumbnails of videos and audio clips that have closed captioning available show an overlaid CC icon. When you open the player, subtitles in your language are enabled automatically. Use the CC icon in the player controls to switch between languages, turn subtitles on and off, or change their formatting.

Timed Text can be used for any media that is presented in a time sequence:

Audio file
Silent video
Spoken video
Animation demonstrating a concept or how something works

Actual examples

Commons:Timed Text Demo Page: a page highlighting a few Timed Text examples.
TimedText:Krazy Kat Bugologist 1916 silent.ogv.de.srt, German captions
TimedText:Krazy Kat Bugologist 1916 silent.ogv.en.srt, English captions
TimedText:Wikipedia Edit 2014.webm.pl.srt, Polish captions

Finding

Search Timed Text

To search, enter the name of the video after the TimedText: prefix (do not delete the prefix), e.g. TimedText:Elephants_Dream.ogv.

REMINDER: To create a TimedText page that doesn't exist yet, don't forget to add the language code and extension, e.g. TimedText:Elephants_Dream.ogv.en.srt.

{{Allpages|102}} lists all pages in namespace 102, the TimedText namespace.

Commons lacks a dedicated way to find Timed Text files in a specific language. The following links search for some Timed Text .srt files in different languages, but they suffer from the limitations of the search function: it does not show all matches, it includes non-matches, and it lacks regular-expression support.

English • German • French • Portuguese • Russian • Swedish • Ukrainian • Polish • Indonesian

Other methods to help users find Timed Text:

{{Closed captions}} displays links to all the closed captioning files available for a file; it can be placed on a media page and on its talk page.
{{special|Prefixindex/TimedText:{{PAGENAME}}.|stripprefix|1|subtitles}} yields a link to all related Timed Text files (example).
Commons:Timed Text/search by lang displays search links for all Timed Text files for a given language, useful for Commons pages, Categories and Talk pages.
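Outside the wiki, the same prefix lookup can be done through the MediaWiki API. The sketch below only builds the query URL (it does not fetch anything). It assumes the standard list=allpages module with apnamespace=102 (the TimedText namespace, as noted above) and apprefix set to the file name followed by a dot, so that every language suffix matches. The helper name timedtext_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Commons API endpoint; namespace 102 is TimedText (see {{Allpages|102}} above).
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
TIMEDTEXT_NAMESPACE = 102


def timedtext_query_url(file_name: str, limit: int = 50) -> str:
    """Build a list=allpages query for every TimedText page of one file.

    Subtitle pages are named "<file name>.<lang>.srt", so the prefix
    "<file name>." matches all languages. apprefix takes the title
    without the "TimedText:" namespace prefix.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": TIMEDTEXT_NAMESPACE,
        "apprefix": file_name + ".",
        "aplimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


print(timedtext_query_url("Elephants Dream.ogv"))
```

Opening the printed URL in a browser returns a JSON list of the matching subtitle pages.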

Marking and Finding videos that need subtitles

The template {{Captions requested}} can be used to mark a video as needing captions. The template adds the file to Category:Videos needing subtitles, so others can see which videos need transcripts and which users or authors requested them.

This template and category are within the scope of Commons:WikiProject Deaf and its sister projects meta:Deaf Wikimedians and Wikipedia:WikiProject Deaf.

Finding videos that need subtitles translation

One way to find such videos is to open the subcategory of Category:Files with closed captioning for your source language, and then use Help:FastCCI (at the top right of the page) to show only the videos that don't have subtitles in your target language.

Example

To find videos with English subtitles to translate, go to Category:Files with closed captioning in English.
Then click the FastCCI arrow to open the sub-menu and select "In this category but not in..."
In the text box, enter the category for your target language:
- For German, enter Files with closed captioning in German
- For French, enter Files with closed captioning in French
- For Russian, enter Files with closed captioning in Russian
- etc.

Timed Text talk

The TimedText talk namespace is for discussing the respective Timed Text pages, but it could also be used to link and categorize the Timed Text page.

Maintenance tasks

Patrol changes in the TimedText namespace: RecentChanges
Find orphaned timed text pages that have no associated media file (any longer).

Uploading

To upload an already created subtitle file, open the file on your computer in a text editor (such as Notepad) and copy the text into a new page in the TimedText namespace that matches the filename of the video and the language code.

Creating

Commons uses the SubRip (.srt) file format for closed captioning and subtitles. You can create these files in multiple ways.

Create subtitles page for existing Commons files

Option 1: in the Commons page of the file (recommended)

You can use the "TimedText" link at the top of any suitable multimedia file on Commons.

Option 2: directly in the media player

By using the CC button in the toolbar of the Wikimedia HTML5 media player, you can select subtitles if they are available, or open the Subtitles editor to create subtitles for the video.

Option 3: creating a blank page (for advanced users)

You can always create the page directly on Commons, following the naming pattern TimedText:[File_Name.extension].[language].srt, where [File_Name.extension] is the name of the file on Commons and [language] is the ISO code for the language.

Example: to add subtitles to Elephants_Dream.ogg, you can create the page TimedText:Elephants_Dream.ogg.en.srt for English subtitles, or TimedText:Elephants_Dream.ogg.fr.srt for French subtitles.
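The naming pattern can be expressed as a small function, e.g. for scripts that prepare subtitle pages. This is a sketch for illustration: timedtext_title is a hypothetical helper, and its language-code check is a loose assumption (ASCII letters and hyphens only, as in en, fr or zh-hans), not Commons' actual validation.

```python
def timedtext_title(file_name: str, lang: str) -> str:
    """Return the TimedText page title for a Commons file and a language code.

    Pattern: TimedText:<File_Name.extension>.<language>.srt
    A leading "File:" prefix on the file name is dropped if present.
    """
    # Loose sanity check (assumption): codes like "en", "fr" or "zh-hans".
    if not lang or not lang.isascii() or not lang.replace("-", "").isalpha():
        raise ValueError(f"unexpected language code: {lang!r}")
    if file_name.startswith("File:"):
        file_name = file_name[len("File:"):]
    return f"TimedText:{file_name}.{lang}.srt"


print(timedtext_title("Elephants_Dream.ogg", "en"))       # TimedText:Elephants_Dream.ogg.en.srt
print(timedtext_title("File:Elephants_Dream.ogg", "fr"))  # TimedText:Elephants_Dream.ogg.fr.srt
```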

Extracting existing subtitles to import them

Create Subtitles from DVD

To copy existing subtitles from a DVD, you can use software such as SubRip. You can then copy and paste them into the Commons subtitle page.

Create Subtitles with YouTube

YouTube allows users with a YouTube account to create subtitles for any uploaded file. Keep in mind that the speech recognition is automated and can produce unexpected results; uploading a transcript of the file to YouTube gives a much better result. You can then copy and paste the subtitles into the Commons subtitle page.

Steps to create the subtitles (a video tutorial of the steps can be found here):

Upload the file. (The file must include a video track, but it can be blank or anything else.)
While uploading, set the Video language for your file under the "Show more" menu.
Or, after uploading, select "Subtitles" in the video's Details or in the YouTube Studio navigation.
Click "Add" or "Add language".
You can add subtitles in one of three ways:
1. Upload a transcript in the proper format.
2. Copy and paste the transcript.
3. Type manually while watching the video.
The captions are then integrated into the video.
Download the .sbv file from the Subtitles menu, under the three-dot menu, while in the "Edit Timings" view.
Convert the contents of the .sbv file to an .srt file. Various online tools can help with this step.
1. ffmpeg is one open-source option (directions).
Copy the contents of the .srt file into the corresponding TimedText page of the video on Wikimedia Commons.

Downloading subtitles from YouTube

You can download subtitles from videos on YouTube (and probably several other video websites) as follows:

Install yt-dlp.
Run yt-dlp --list-subs URL (replace URL with the YouTube URL) to see which subtitle languages and formats are available.
Run e.g. yt-dlp --write-subs --sub-langs en --sub-format vtt --skip-download URL to download only the English subtitles, without the video.
If SRT subtitles are available, use --sub-format srt instead of vtt; --sub-langs all downloads all languages at once.
Convert the VTT subtitles (or whatever format you have) to SRT using a tool like FFmpeg (see #Convert YouTube Subtitles to Timed Text format) or a web UI like this.
You can then paste these into the TimedText page of the video on Wikimedia Commons.
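If no converter is at hand, a basic WebVTT-to-SRT conversion is simple enough to script. The sketch below is a minimal illustration, not a full WebVTT parser: it assumes well-formed input, skips the header and NOTE/STYLE/REGION blocks, drops cue identifiers and cue settings (such as align:start), converts the timestamps, and strips inline tags except <b>, <i> and <u>. It does not undo the rolling, repeated lines of auto-generated captions.

```python
import html
import re

# "start --> end", optionally followed by cue settings such as "align:start".
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")
# Any inline tag except <b>, <i>, <u> and their closing tags.
TAG = re.compile(r"<(?!/?[biu]>)[^>]*>")


def vtt_time_to_srt(ts: str) -> str:
    """Convert "mm:ss.ttt" or "hh:mm:ss.ttt" to SRT's "hh:mm:ss,ttt"."""
    parts = ts.split(":")
    if len(parts) == 2:  # hours are optional in WebVTT
        parts.insert(0, "00")
    hours, minutes, seconds = parts
    secs, millis = seconds.split(".")
    return f"{int(hours):02d}:{minutes}:{secs},{millis}"


def vtt_to_srt(vtt: str) -> str:
    text = vtt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        # The timing line may be preceded by an optional cue identifier.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue  # WEBVTT header, NOTE, STYLE or REGION block
        match = TIMING.match(lines[idx])
        if match is None:
            continue
        start, end = (vtt_time_to_srt(t) for t in match.groups())
        body = [html.unescape(TAG.sub("", line)) for line in lines[idx + 1:]]
        body = [line for line in body if line.strip()]
        if body:
            cues.append(f"{len(cues) + 1}\n{start} --> {end}\n" + "\n".join(body))
    return "\n\n".join(cues) + "\n"
```

For example, passing the contents of a downloaded .vtt file to vtt_to_srt returns SRT text that can be pasted into the TimedText page; check the result before saving.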

If you use the video2commons tool, you can check "Import subtitles", but that does not work for VTT subtitles (phab:T368298), so for these videos you also need to follow the steps above to import the subtitles.

Converting scrolling captions to block captions

YouTube's auto-generated subtitles are scrolling captions. I wrote a program that converts these to block captions so they can be put on Commons. First, download the video with yt-dlp --write-auto-subs URL (replace URL with the video's URL). Then, use option 3 of the program. It works reasonably well, but it tends to put "word. word," at the end of a block, which is wrong because a full stop is a good place to end a block. The code is long, however, and would be hard to fix now.

Machine transcription

You can use the open source tool SoniTranslate to generate machine-transcribed subtitles more easily and quickly. You should check the output, especially if you also use the tool for machine translation into other languages: for example, it may spell out years in words instead of digits, or get people's names wrong. How to use this tool is described in Help:AI video dubbing.^[1] If there are no existing subtitles to import, this is likely the fastest way to add Timed Text. Depending on the length of the video, transcription usually takes only a few seconds, even without a GPU.

The timings are well suited to dubbing videos into other languages, which is often not the case for manually made subtitles. You can edit the subtitles, save them as an SRT file and use that as input to the tool to create audio or subtitles in another language.

Creating subtitles with whisper.cpp

As of 2024, the Whisper AI models^[1] are among the most advanced speech transcription models available, and they can be run locally, either using Python or whisper.cpp. Unlike the earlier Vosk models, they also produce punctuation, bringing their output much closer to a high-quality human transcription. Still, you should check AI-generated subtitles against the video: correct mistakes, add punctuation, check the spelling of people's and place names, check facts and figures, etc. AI subtitles are very useful as a first draft, but they often contain silly mistakes a human transcriber would not make.

An advantage of whisper.cpp is that it is particularly optimized for running on the CPU rather than the GPU (so it is especially useful if you have an AMD graphics card and therefore no CUDA). But CUDA and Metal (on a Mac) are also supported, therefore it can easily adapt to different hardware configurations. Another advantage is that it does not require installing any external dependencies, i.e. no Python or PyTorch, since it is written in C++, making it a much smaller download than a Python machine learning environment.

Some video editing and closed captioning GUI software now features built-in Whisper functionality: Open source examples include the video editor Kdenlive (since version 23.04; requires Python) and Subtitle Edit (either Python or C++ can be used to run Whisper models).

But running the command-line version of whisper.cpp directly to create an SRT file is not too difficult either, provided your operating system has a C compiler, make, etc. to compile it with:

First, use e.g. ffmpeg to extract a video's audio track and convert it to 16 kHz sample rate:

ffmpeg -i some_video.ogv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

Next, compile whisper.cpp and download a model (the base model optimized for English content is about 140 MB; "medium" can also handle other languages and is about 1.5 GB) and then start the conversion with e.g.:

./main -m models/ggml-base.en.bin -f audio.wav -t 8 -pc -osrt

This uses 8 CPU threads (-t 8) and creates an SRT file called audio.wav.srt in the same directory. With -pc, words are color-coded by confidence during recognition (green = very certain, red = very uncertain), so you can quickly see whether the model is having trouble. If a smaller model delivers unusable output, try a larger model, e.g. medium, which is slower but produces better results. In recent versions of whisper.cpp, the main binary has been renamed whisper-cli.

You can also translate from other languages, e.g. adding "-l fr -tr" to the options will translate French audio to English.

Convert YouTube Subtitles to Timed Text format

SBV Subtitles

If you export YouTube subtitles in SBV format, you can use ffmpeg to convert the subtitle file to the SubRip (SRT) format used by Commons. The -fix_sub_duration option also fixes the overlapping timings that are common when converting YouTube subtitles for Commons.

ffmpeg -fix_sub_duration -i input.sbv output.srt
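For reference, the conversion itself is straightforward. The sketch below assumes YouTube's SBV layout: one "h:mm:ss.mmm,h:mm:ss.mmm" timing line followed by the text, with blocks separated by blank lines. Its overlap handling, which trims each block's end time to the next block's start, is similar in spirit to -fix_sub_duration but is not ffmpeg's implementation.

```python
import re

SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def srt_time(ms: int) -> str:
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sbv_to_srt(sbv: str, fix_overlaps: bool = True) -> str:
    cues = []  # [start_ms, end_ms, text_lines]
    for block in re.split(r"\n\s*\n", sbv.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        match = SBV_TIMING.match(lines[0].strip())
        if match is None:
            continue  # not a timing block; skip it
        g = match.groups()
        cues.append([to_ms(*g[:4]), to_ms(*g[4:]), lines[1:]])
    if fix_overlaps:
        # End each block no later than the start of the next one.
        for current, following in zip(cues, cues[1:]):
            current[1] = min(current[1], following[0])
    return "\n\n".join(
        f"{i}\n{srt_time(start)} --> {srt_time(end)}\n" + "\n".join(text)
        for i, (start, end, text) in enumerate(cues, start=1)
    ) + "\n"
```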

XML Subtitles

This section describes how to convert YouTube subtitles in XML format to SubRip (SRT) format, the Timed Text subtitle format used on Wikimedia Commons.

If

the YouTube video has subtitles in some language (e.g. I created this YouTube video with subtitles in English, Russian and Livvi-Karelian),
this video was uploaded to Wikimedia Commons (e.g. this file),
and you want to copy the YouTube subtitles to the same video on Commons,

Then:

Download the subtitles in XML by putting the ID of the YouTube video at the end of this URL: http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
Install Ruby.
Download a Ruby program that converts video subtitles from YouTube's XML format to the SubRip format.
Run this program to convert the XML file to an .SRT file.
Copy and paste the contents of the .SRT file into the corresponding page of the video on Wikimedia Commons.
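If you would rather not install Ruby, the same conversion can be sketched in Python with only the standard library. This is not the Ruby program mentioned above. It assumes the legacy timedtext layout of <text start="seconds" dur="seconds"> elements inside a <transcript> root, and that the text may be HTML-escaped a second time (as in It&amp;#39;s), so it unescapes once more after parsing.

```python
import html
import xml.etree.ElementTree as ET


def srt_time(seconds: float) -> str:
    ms = round(seconds * 1000)
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def youtube_xml_to_srt(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    blocks = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        end = start + float(node.get("dur", "0"))
        # The XML parser unescapes once; unescape again for double-escaped text.
        body = html.unescape(node.text or "").strip()
        if body:
            timing = f"{srt_time(start)} --> {srt_time(end)}"
            blocks.append(f"{len(blocks) + 1}\n{timing}\n{body}")
    return "\n\n".join(blocks) + "\n"
```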

General tips

Type what is said in SRT format. This is one subtitle block:

1
00:00:20,000 --> 00:00:24,400
Words here.
Also get a caption editor.

These are two blocks:

1
00:00:20,000 --> 00:00:21,500
Words more words.

2
00:00:21,500 --> 00:00:24,400
More.
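The block layout above (a counter, a "start --> end" line with hh:mm:ss,mmm timestamps, then the text) is easy to generate from a script, e.g. when turning another tool's output into subtitles. A minimal sketch; the function names are illustrative and times are given in milliseconds.

```python
from __future__ import annotations


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as hh:mm:ss,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_block(index: int, start_ms: int, end_ms: int, lines: list[str]) -> str:
    if end_ms <= start_ms:
        raise ValueError("a block must end after it starts")
    timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
    return "\n".join([str(index), timing, *lines])


def srt_document(cues: list[tuple[int, int, list[str]]]) -> str:
    """Number the cues from 1 and separate blocks with a blank line."""
    blocks = [srt_block(i, s, e, lines) for i, (s, e, lines) in enumerate(cues, start=1)]
    return "\n\n".join(blocks) + "\n"


# Reproduces the two-block example above.
print(srt_document([
    (20_000, 21_500, ["Words more words."]),
    (21_500, 24_400, ["More."]),
]))
```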

If one person says "Words more words." at the same time as another person says "More.", writing "Words more words. More." on one line would be wrong. Put:

-Words more words.
-More.

Putting "-Words more words. -More." on one line is also wrong; it needs the line break. Two lines are the maximum most of the time, although in the past people have used three.

If there is enough time to show each line in its own block, do that: "Words more words." and then "More." in the next block.

(Subtitle Edit marks a block red if it is too short for the amount of text, indicating that you need to join blocks. By default, the limit is 25 characters per second.)

Each new person or thing making a sound gets a dash, e.g. a baby cries and then an alarm goes off:

-[baby cries]
-[alarm goes off]

However, if the sounds are far apart, they should get their own blocks: [baby crying], then later [alarm goes off]. (The -ing form is used in case the sound goes on longer.)

Cutting lines

Split a line once it exceeds 43 characters, but don't split a name across lines. Try to split at commas or full stops. Do this:

This line is very okay.
See I broke the line.

Not this:

This line is very okay. See
I broke the line.

That is wrong because "See" comes right after a full stop, so the line should break before it.

Don't do this:

I know little information about Taylor
Swift to be a Swifty.

because it splits her name.

Do this:

I know little information about
Taylor Swift to be a Swifty.

That is good because it does not split her name.

But not this:

I know too little information about Taylor Swift
to be a Swifty

Because that is longer than 43 characters in one line, so it is too long. When splitting blocks due to length, don't make a block with only one or two words at the end of a sentence.
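These rules can be approximated in code for a first pass. The sketch below assumes a block of at most two lines with a 43-character limit per line. It prefers a break right after a comma or full stop, otherwise the most balanced break, and never breaks inside a phrase listed in protected (a hypothetical parameter where you list names such as "Taylor Swift"). When no valid break exists, the text needs to be split into two blocks instead.

```python
import re

MAX_LINE = 43            # characters per line, as recommended above
BREAK_AFTER = ".,!?;:"   # punctuation that makes a good place to break


def split_caption(text: str, protected=(), max_len: int = MAX_LINE) -> list:
    """Split one block's text into at most two lines.

    protected: phrases (e.g. names) that must not be broken across lines.
    Raises ValueError if the text cannot fit on two lines.
    """
    text = " ".join(text.split())  # normalize whitespace
    if len(text) <= max_len:
        return [text]
    # Character spans covered by protected phrases.
    spans = [(m.start(), m.end())
             for phrase in protected
             for m in re.finditer(re.escape(phrase), text)]
    best = None
    for pos, char in enumerate(text):
        if char != " ":
            continue
        first, second = text[:pos], text[pos + 1:]
        if len(first) > max_len or len(second) > max_len:
            continue
        if any(start < pos < end for start, end in spans):
            continue  # would split a protected phrase
        # Prefer breaking after punctuation, then the most balanced lines.
        score = (0 if first[-1] in BREAK_AFTER else 1, abs(len(first) - len(second)))
        if best is None or score < best[0]:
            best = (score, [first, second])
    if best is None:
        raise ValueError("too long for two lines: split it into two blocks")
    return best[1]


print(split_caption("This line is very okay. See I broke the line."))
print(split_caption("I know little information about Taylor Swift to be a Swifty.",
                    protected=["Taylor Swift"]))
```

Ranking punctuation above balance is a design choice: with a very short first clause it can give lopsided lines, so check the result by eye.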

Style

You can pick your favorite style as long as you keep the style consistent for the whole video and match other people's styles when editing other people’s closed captions.

Square brackets for speakers and sounds

Square brackets are the more popular style (citation needed):

[wolf-whistles]

-[speaker1] Words.
-[speaker2] Words.

This style always uses two dashes when there are two people per block.

-Words
-[speaker2] words

-[speaker1] Words
-Words

Round brackets for sounds

In this style, a speaker is identified by their name in capitals followed by a colon (NAME:).

-SPEAKER: Text.

SPEAKER 1: Words.
SPEAKER 2: Words.

When writing in this style, two dashes aren't used unless it's similar to this:

-Okay, Sia.
-SASHA: Wait.

or this:

-QUNNI: Okay, Sia.
-Wait.

All

Some people put spaces:

( sing-song )

Some people use square brackets:

[ sing-song ]

Some people capitalize the first letter:

[ Gasps ]

Some people capitalize the first letter without spaces:

[Gasps]

Some people capitalize the first letter with parentheses:

(Gasps)

Some people capitalize the first letter of each word:

[ All Laughing ]

These variations can be combined.

Some people always double dash with a space after the dash:

- [Speaker 1] Words.
- [Speaker 2] Words.

Compare how it looks without the space:

-[Speaker 1] Words.
-[Speaker 2] Words.

Some people double dash only on the second line:^[2]

[Speaker 1] Words.
-[Speaker 2] Words.

Words.
-[Speaker 2] Words.

[Speaker 1] Words.
-Words.

Words.
-Words.

Some prefer to double dash only on the second line with a space.

Putting a new dash for each speaker

Some styles put a new dash every time the speaker changes. Either a dash followed by a space ("- "), which is more common, or just a dash ("-") can be used. This is what it looks like:

1
00:00:20,000 --> 00:00:23,000
- Speaker 1
- Speaker 2

2
00:00:23,000 --> 00:00:26,000
Speaker 2
- Speaker 1

3
00:00:26,000 --> 00:00:29,000
- Speaker 3
- Speaker 1

4
00:00:29,000 --> 00:00:32,000
Speaker 1

5
00:00:32,000 --> 00:00:36,000
- Speaker 2

Why brackets are better

The problem with putting the name in uppercase with a colon at the end, is you still may need to write things in brackets after the name:

MAN (on TV): Spell Okay Correctly Moment!
WOMAN: Okay, not ok.

Either MAN (on TV): Words, or the worse-looking MAN: (on TV) Words.

So forget the uppercase name, right? And just use the square brackets for both speakers and sounds:

[man on TV] OK or O.K. is also okay.

When using square brackets for speakers and sounds, always use two dashes when there are two people in one block.

-[man on TV] But okay is better.
-[girl] Because OK is like "How are U"

Also never do this: SPEAKER: (YAWNS) I've seen it, it's dumb.^[3]

I would recommend using square brackets to indicate sounds and speakers, and not putting a new dash at each speaker change, because if a dog started barking it would be unclear whether that counts as a new speaker (probably only speech counts). Use whatever style you want, except *asterisks*. If you edit someone else's closed captions, you have to match their style, and if it isn't your preference, don't bother changing it: if it ain't broke, don't fix it. The captions of Friends by the Media Access Group at the WGBH Educational Foundation always used italics for sounds, e.g. ( clicking ),^[4] but today some people also use italics to indicate that the source of a sound is not on screen, e.g. someone sighing over the phone. In that case you could put [sighs] in italics along with their words.

The sound description does not need to last until the sound is finished, because keeping blocks on screen for a long time is annoying. Clear the screen every now and then (e.g. at a shot change or after 8 seconds), and if the sound is still going on, you can put [crying continues]. Read this:[1]

Sound names

Here is a list of common sound descriptions:

[gasps]
[chuckles]
[chuckling]
[laughs]
[laughing]
[laughter]

([laughter] can be used when many people laugh, but [all laugh] or [all laughing] should be used if they start laughing more noticeably.)

[wolf-whistles]
[groans]
[hawking]

The suffix -ing is used when a sound lasts longer than usual.

Character naming

If someone speaks and you can't tell who it is without the sound, put their name: [Elsa] ♪ I can't! ♪ Name forcing means mentioning a character's name before it has been mentioned in the dialogue. To avoid name forcing, you can use their gender or occupation, for example [man] or [waiter], before the words. You can also shorten their name, for example Captain Raymond Holt to Ray or Holt, if there may not be enough time to read the full name.

Sometimes the dialogue can make it clear who is talking and putting the name might not be needed.

When adding the character’s name, don't add it for every caption block, only the first one when the character starts talking.

[Name] words words words, or [Name over phone] words words words if it's over the phone.

Next block doesn't have the name of the person:

more words here

A word with a different tone of voice should be in italics, especially if they are implying something. Follow WP:Italics to know what types of media to put in italics. If you need to use italics but the character is off screen and their words are already in italics, then un-italicize the text that would be italicized if they were on screen.

When to paraphrase

If viewers can't read the full text in time, remove some words and paraphrase. Try never to go over 25 characters per second (CPS) and aim for under 20. For children's shows, don't go over 17 CPS, because children read more slowly. Subtitle Edit shows the CPS. When paraphrasing, you must keep the original meaning and try not to change the sentence in a way that adds or removes a question mark. When removing words, don't use acronyms, for example replacing "Oh my God" with "OMG". If a sentence at 20 CPS ends up paraphrased down to 7 CPS, too many words were likely removed. Paraphrasing may not be as necessary if the speaker pauses after their sentence: the blocks can then be delayed so that they extend into the pause, or, similarly, appear slightly early.
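The reading-speed limits above come down to simple arithmetic: characters divided by display time in seconds, so 17 characters shown for 1.5 seconds is about 11.3 CPS. A minimal sketch of a checker; it assumes that spaces count as characters while line breaks and <b>/<i>/<u> tags do not, which may differ from how Subtitle Edit counts by default.

```python
from __future__ import annotations

import re

MAX_CPS = 25        # never go over
TARGET_CPS = 20     # aim to stay under
KIDS_MAX_CPS = 17   # children's shows

FORMAT_TAGS = re.compile(r"</?[biu]>")


def chars_per_second(lines: list[str], start_ms: int, end_ms: int) -> float:
    """Visible characters (spaces included; line breaks and tags excluded) per second."""
    duration = (end_ms - start_ms) / 1000
    if duration <= 0:
        raise ValueError("a block must end after it starts")
    chars = sum(len(FORMAT_TAGS.sub("", line)) for line in lines)
    return chars / duration


def reading_speed_warnings(lines: list[str], start_ms: int, end_ms: int,
                           for_kids: bool = False) -> list[str]:
    cps = chars_per_second(lines, start_ms, end_ms)
    limit = KIDS_MAX_CPS if for_kids else MAX_CPS
    if cps > limit:
        return [f"{cps:.1f} CPS is over {limit}: paraphrase or lengthen the block"]
    if not for_kids and cps > TARGET_CPS:
        return [f"{cps:.1f} CPS is above the target of {TARGET_CPS}"]
    return []
```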

Numbers, Symbols and Acronyms

Uppercase letters shouldn't be used for screaming. Italics can be used for a whole paragraph to emphasize the yelling. ^[5]

Use capital letters when someone says an acronym (e.g. NASA) or an initialism (e.g. ADHD), even though the reader may not be able to tell whether they said each letter (an initialism) or not (an acronym).

Made-up words that originated from an acronym should also be in capital letters. For example, in Heartbreak High they go to a class called Sexual Literacy Tutorial, but the students call it "sluts".

The subtitles are written as "I'm going to SLT's" even if they say "sluts".

If someone says "I paid thirteen dollars", put "I paid $13". If they say "It cost me ten grand but if I asked the other guy it would've been twenty", you can put "It cost me $10,000 but if I asked the other guy it would've been $20,000." or "It cost me $10,000 but if I asked the other guy it would've been 20."

If a character is talking about Wikipedia and they say "W P colon mos" put WP:MOS as that's how you type it. If slash is said, then it should be subtitled as "slash". For example, "detective slash genius," is subtitled as "detective slash genius," not "detective/genius."

Music

Lyrics being sung should have proper capitalization and no full stop, and should be surrounded by the ♪ character (Unicode U+266A, decimal 9834, or Alt+266A). You can also use ♫ (Unicode U+266B, decimal 9835, or Alt+266B), e.g.

1
00:00:20,000 --> 00:00:24,400
♪ Take me out to the ball game ♪
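When subtitles are generated or cleaned up by a script, the note characters are easiest to insert by code point. A small sketch; lyric_line is a hypothetical helper that applies the rules above (no trailing full stop, a note at both ends).

```python
EIGHTH_NOTE = "\u266a"   # ♪ (U+266A, decimal 9834)
BEAMED_NOTES = "\u266b"  # ♫ (U+266B, decimal 9835)


def lyric_line(lyric: str, note: str = EIGHTH_NOTE) -> str:
    """Wrap a sung line in note characters, dropping a single trailing full stop."""
    lyric = lyric.strip()
    if lyric.endswith(".") and not lyric.endswith(".."):
        lyric = lyric[:-1]
    return f"{note} {lyric} {note}"


print(lyric_line("Take me out to the ball game."))  # ♪ Take me out to the ball game ♪
```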

When the characters start talking over the music, lyrics should be omitted so the audience can read what the characters are saying.

Markup

The only markup supported by the SRT format is:

Bold – <b>...</b>
Italic – <i>...</i>
Underline – <u>...</u>

REMINDER: Wikicode formatting is not supported.

Internationalization

After the subtitles have been transcribed in the original language of the video onto a Timed Text file, they can be translated into other languages as follows:

Open the Timed Text file in the original language, say English for example TimedText:Elephants Dream.ogv.en.srt, in edit mode and copy the whole of the page.
In the address bar replace "en" with the language code of your choice, say "fr", then paste the original text in the new page.
View the original video, then translate the text into your language.
After saving the new page, the video with the subtitles should load onto the page; you can view it to check the timing of the subtitles.
Add a category link to the talk page [[Category:Timed Text in Language Name|Language Name]]. For example, see TimedText talk:Elephants Dream.ogv.fr.srt.
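Because only the text changes in a translation, the copy-and-translate steps above can also be done offline: keep each block's number and timing line and swap in the translated text. A sketch, assuming a well-formed SRT page and a hypothetical list of translations with one entry per block, in order.

```python
import re


def parse_srt(srt: str) -> list:
    """Return (number, timing line, text lines) for each block of an SRT page."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) >= 2 and "-->" in lines[1]:
            cues.append((lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def translate_srt(srt: str, translations: list) -> str:
    """Keep every block's number and timing; replace only its text.

    translations: one string per block, in order; use newlines inside an
    entry for two-line blocks.
    """
    cues = parse_srt(srt)
    if len(cues) != len(translations):
        raise ValueError(f"{len(cues)} blocks but {len(translations)} translations")
    blocks = [f"{number}\n{timing}\n{text.strip()}"
              for (number, timing, _), text in zip(cues, translations)]
    return "\n\n".join(blocks) + "\n"
```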

Wikipedia articles about the topics of Timed Text or subtitles

These are articles about either Q844253: Timed text, or Q204028: subtitle.

Dansk: Undertekster
Deutsch: Untertitelung
Ελληνικά: Υπότιτλοι
English: Timed Text is also termed subtitles, closed captioning and closed caption text. See also Subtitle (captioning).
Esperanto: Subtekstoj
Español: Subtítulo
Français : sous-titrage
Interlingua: Subtitulos
Italiano: Sottotitolo
日本語: 字幕
한국어: 자막
Македонски: Толкување
Nederlands: Ondertiteling
Norsk bokmål: Undertekster
Português: Legenda
Русский: Субтитры
Slovenščina: podnaslovi
Svenska: Textning
Українська: Субтитри
粵語: 字幕
中文：字幕
Bahasa Indonesia: Teks Berwaktu

Linking

This section needs expansion.

How to associate closed captions with multimedia files?

Redirect to avoid duplicated content, for example TimedText:Elephants Dream (high quality).ogv.pt.srt redirects to the existing TimedText:Elephants Dream.ogv.pt.srt. This ensures the closed captions template displays the correct file name of the caption files (this could be important with movie clips).
{{Closed captions}}'s parameter is an alternative
More support is needed for the Timed Text feature.
Categorizing: it is not possible to categorize the Timed Text page itself, but its Timed Text talk page can be categorized.

A possible categorization scheme is:

 [[:Category:File formats]] + [[:Category:Media types]]
                       |
               [[:Category:Timed Text]] + [[:Category:Legend in German]]
                                   | 
                           [[:Category:Timed Text in German]]
 
                                   + [[:Category:Legend in French]]
                                   | 
                           [[:Category:Timed Text in French]]
 
                                   + [[:Category:Legend in English]]
                                   | 
                           [[:Category:Timed Text in English]]

Related categories: Category:Files with closed captioning

References

[1] AI: Artificial Intelligence
[2] File:Boston Police Attack Nonviolent Protestors at "Straight Pride".webm
[3] Popeye Fright to the Finish with Closed Captions and Audio Description
[4] Friends episode "The One With The Two Parties"
[5] Friends episode "The One With Phoebe's Birthday Dinner"
