Editing Captions and Subtitles

The main section of the editor lists all captions and subtitles available for the current language:

Selecting two captions enables the Join button

The + button allows you to create a new caption. If a caption is currently selected, a new caption is created below it. When no captions are selected, the new caption is inserted at the end of the list, after all existing captions for the current language.
The - button deletes all selected captions. The button is not available when the list is empty or if no captions are currently selected.
The ⚙ button opens the global preferences, where you can customize certain behaviors of the user interface. For example, you can define what happens after editing the in or out timecodes of a caption, or define a time range when searching for captions by timecode.
The Join button is available when exactly two captions are selected. Clicking this button merges the two captions into a single caption. Text from both captions is kept on separate lines. The in timecode of the merged caption is the in timecode of the first caption, and its out timecode is the out timecode of the second caption. The end result is a single caption that is displayed on screen for the same duration as the previous two captions:

Two captions have been merged into one
The Split button is available when a single caption is selected. Clicking this button causes the single caption to be split in two, with the first line of text being assigned to the first caption and subsequent lines of text assigned to the second caption. The duration of each caption will be exactly half of what the original caption was. Only captions that span multiple lines of text can be split.
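
The Join and Split rules described above can be sketched in a few lines of Python. This is only an illustration, not the editor's own code: the `Caption` class, the use of frame counts for timecodes, and the rounding of the midpoint for odd durations are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: int  # in timecode, as a frame count (assumed representation)
    end: int    # out timecode, as a frame count
    text: str


def join(first: Caption, second: Caption) -> Caption:
    """Merge two captions: in time of the first, out time of the second."""
    return Caption(first.start, second.end, first.text + "\n" + second.text)


def split(caption: Caption) -> tuple[Caption, Caption]:
    """Split a multi-line caption; each part gets half of the duration."""
    head, newline, rest = caption.text.partition("\n")
    if not newline:
        raise ValueError("only captions with multiple lines of text can be split")
    # Integer midpoint: an assumption for durations with an odd frame count.
    middle = caption.start + (caption.end - caption.start) // 2
    return Caption(caption.start, middle, head), Caption(middle, caption.end, rest)
```

Splitting the result of a join returns the first line of text in one caption and all remaining lines in the other.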

To edit a caption’s text, click, select and type as you would in a normal text editor. The user interface allows for up to 4 lines of text to be previewed. We recommend that no caption use more than 2 lines of text. This is both a strict requirement for exporting to iTunes Timed Text and good practice to follow. The readability of your captions on any device and display suffers when more than 2 lines are presented at once.

Working with Timecodes

All timecodes are displayed in the SMPTE notation, using colons as delimiters for non-drop frame mode (HH:MM:SS:FF). When drop frame mode is enabled for NTSC frame rates, the last component in the timecode is separated by a semicolon (HH:MM:SS;FF).

When entering timecodes, remember that components on the right take precedence. For example the timecode 2:30:02 translates to 2 minutes, 30 seconds and 2 frames. When entering an invalid frame number, it is automatically adjusted to be within the range allowed by the current frame rate. For example, timecode 1:10:28 is automatically adjusted to 1:10:23 if the current frame rate is 24fps, since this only allows frame numbers from 0 to 23.
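
As a rough sketch of these two entry rules (right-hand components take precedence, and frame numbers are adjusted to fit the frame rate), consider the following Python function. The function name and the use of a whole-number frame rate are assumptions for illustration; the editor's actual parsing may differ.

```python
def parse_timecode(text: str, fps: int) -> tuple[int, int, int, int]:
    """Return (hours, minutes, seconds, frames) from a partial timecode entry."""
    parts = [int(p) for p in text.replace(";", ":").split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"invalid timecode: {text!r}")
    # Components on the right take precedence, so missing ones are padded on the left.
    hours, minutes, seconds, frames = [0] * (4 - len(parts)) + parts
    # Adjust the frame number to the range allowed by the frame rate (0 to fps - 1).
    frames = min(frames, fps - 1)
    return hours, minutes, seconds, frames
```

With this sketch, `parse_timecode("2:30:02", 24)` yields 2 minutes, 30 seconds and 2 frames, and `parse_timecode("1:10:28", 24)` is adjusted to frame 23.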

Timing Options

Options available under the Timing section affect all captions and all languages in the file:

Timing options

The Frame rate option affects how timecodes are displayed. It also affects the rules for altering existing timecodes or entering new ones. Change the frame rate to best match the project and data you are working with. Changes to the frame rate are non-destructive and do not cause any retiming problems. You can also choose one frame rate for editing and an entirely different one for exporting.
The Drop frame option lets you switch to and from the SMPTE drop-frame mode for Broadcast NTSC frame rates: 29.97fps and 59.94fps. Please note some important side effects of turning on drop frame mode:
1. Timecodes are displayed according to the rules of this standard. Any timecodes you enter are similarly expected to follow the rules defined by the standard.
2. Enabling or disabling drop frame mode can cause captions to shift forward or backward by small amounts, when timecodes are edited. This is due to the nonintuitive and peculiar rules behind the drop-frame standard. You are not required to turn on drop-frame mode simply because you are expected to export captions in that standard. The export window gives you a chance to enable drop frame mode in its own UI, as a single-shot option that only affects the exported captions.
The Offset timecode defines a time shift applied to all captions. For example, an offset of 2:00:00 means that all captions are considered relative to the 2 minute mark on the timeline. In the same scenario, a caption that is set to start at 1:12 (1 second, 12 frames) will in fact start at 2:01:12.
This value is imported from iTunes Timed Text files, and few other file formats have a native concept of a global time offset. Its usefulness should be clear during export operations, since it allows you to shift all captions forward or backward in time without requiring you to manually adjust every timecode. A huge time saver!
The Overlap option allows captions to have overlapping time ranges, so that one or more captions may appear simultaneously on screen. By default the Allowed button is off because many popular formats do not allow captions to appear simultaneously on screen. Two captions are said to be overlapping when the time period represented by their in/out timecodes intersects. Turn Allowed on to allow captions to have overlapping time ranges. This can be useful when multiple characters on screen are speaking at the same time, and you would like to create separate captions for each voice. Simultaneous captions can be exported to WebVTT, or burned in through the Caption Burner plug-in.
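
The definition of overlapping captions given above (their in/out time periods intersect) can be expressed as a short test. This sketch assumes timecodes are already converted to frame counts and that the out timecode is exclusive, so a caption that starts exactly where the previous one ends is not treated as overlapping. Both are assumptions, not documented behavior.

```python
def overlaps(a_in: int, a_out: int, b_in: int, b_out: int) -> bool:
    """True when the two in/out ranges intersect (out timecode assumed exclusive)."""
    return a_in < b_out and b_in < a_out


def find_conflicts(ranges: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return every pair of (in, out) ranges that overlap, ordered by in timecode."""
    ordered = sorted(ranges)
    return [
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if overlaps(*a, *b)
    ]
```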

Appearance Options

Options under the Appearance section affect all selected captions:

Appearance options

A preview canvas provides a quick glance at how the caption might appear within the frame, according to the options selected below. Text formatting options will apply to entire captions, or only to the selected text within a caption.

The Text Box option lets you choose the size of the area where caption text is rendered, relative to the width of the frame. A value of 100% means that the text can occupy the entire width of the frame. A value of 50% means the text is rendered at most half as wide as the entire frame. It then becomes possible to position the text box relative to the frame by using the horizontal slider below the preview image, or by clicking and dragging directly within the preview. The Text Box option allows you to position a caption directly below a character’s face, and it is generally required when working with simultaneous captions.
The Position option controls the general location of the caption within the frame. iTunes Timed Text only supports captions in the top or bottom regions of the frame. The WebVTT specification allows for greater flexibility. You can fine tune the vertical position of the text by using the vertical slider to the right of the preview image, or by clicking and dragging over the preview image itself. Please note that many text-based caption file formats, such as SRT, support relative positioning of captions within the frame only through {\an...} tags.
The Alignment option affects the relative alignment of text within its enclosing box. iTunes Timed Text does not support this option, and neither do most text-based caption file formats. This option is currently available for output to WebVTT.
The Style option allows you to apply basic text styles to any selected text, or to all selected captions at once. Remember to select portions of the caption text if you want a specific style to affect specific words or characters rather than the entire caption. Styles are supported by iTunes Timed Text, WebVTT and by a number of text-based formats through the inclusion of HTML-style tags in the output.
The Color option lets you pick a color for any selected text. Text colors are natively supported by iTunes Timed Text. SRT and other text-based formats support text colors through the inclusion of HTML-style tags in the output. Remember to select portions of the caption text if you want to change the color of specific words or characters rather than the entire caption.

Notably absent from this section are any options to choose a font and size. Most caption file formats do not support font information, instead putting the responsibility of picking the correct font and size to the software and/or device being used to display captions.

Searching Captions

Below and to the left of your captions is a search box that supports text and timecode-based searches:

Filtering captions by entering search terms

When you enter one or more keywords in the search box, all captions that contain any of those words are displayed.

You can also search for captions by entering a timecode:

Filtering captions by timecode

All captions that appear near that timecode are displayed. By default, captions that appear 5 seconds before or after the given time are matched. You can set a different range through the Settings window.
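
One plausible reading of the timecode search is sketched below: a caption matches when the searched time falls within its in/out range, widened by the search window on both sides. The tuple layout `(in_seconds, out_seconds, text)` and the function name are assumptions for the example.

```python
def near_timecode(captions, target: float, window: float = 5.0):
    """Return captions displayed within `window` seconds of `target`.

    Each caption is assumed to be a tuple of (in_seconds, out_seconds, text).
    """
    return [c for c in captions if c[0] - window <= target <= c[1] + window]
```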

Finding and Fixing Problems

The editor automatically detects when two or more captions are set to appear simultaneously during playback. These timing conflicts are highlighted in the user interface:

Captions whose timecodes overlap are reported as conflicts

In the example above, a value of 8 was entered in the out timecode of the first caption when a value of 7 was intended. The result of this mistake is that the first caption overlaps the next caption. Changing the out timecode of the first caption to 2:57:17 would fix the problem.

When the imported data contains multiple conflicts, enable the Conflicts Only option to temporarily display only captions that have timing problems to be resolved.

Click the ↑ and ↓ buttons to jump to the previous and next caption with problems. For quick navigation, the keyboard shortcut for these buttons is the Option ⌥ key followed by the up or down arrow on your keyboard.

Translating Captions

Multiple languages can be edited and saved through a single editor. This makes it easier to manage different translations for the same media.

When creating a new set of captions, the current system locale is used to create the initial language, e.g. "English (United States)". When importing an existing file, add the language that matches the captions being imported. For example, when importing a SubRip (.srt) file containing German language subtitles, store those captions as "German".

To add a new language to the editor, click the + button to the right of the Language menu. Similarly, click the - button to delete the current language and all its associated captions from the editor.

Only one language is visible by default while editing

Translation

When creating a new language, you are given a chance to duplicate existing captions to the new language. This is a common technique for anyone who embarks on the translation effort.

The editor goes a step further to make translation easier. Click the Translate ❯ button to reveal a second, side-by-side view where you can load a reference language (your source for the translation):

Two languages, visible side-by-side

Load your source language on the right side (the reference view) and work on the translation on the left (the primary view).

Click on a caption in one language to find its nearest equivalent in the other language. This allows you to identify the caption(s) that match a given timecode, and help you verify the correctness of the translation.
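
How the nearest equivalent is chosen is not specified here. A simple approach, shown purely as an assumption, is to pick the caption in the other language whose in timecode is closest to that of the clicked caption:

```python
def nearest_caption(target_start: float, candidates):
    """Return the candidate whose in time is closest to `target_start`.

    Each candidate is assumed to be a tuple of (in_seconds, text).
    """
    return min(candidates, key=lambda c: abs(c[0] - target_start))
```

Because the match is based on time rather than position in the list, it still works when the two languages have a different number of captions.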

Languages are not required to have the same number of captions or the same timecodes. Translators have complete freedom to use more or fewer captions, and at different timecodes, to translate the underlying material.

Only captions and timecodes in the left view can be changed. Text and timecodes in the right view are locked. They can only be selected and copied.

Importing

The import process begins when you open an existing file, or when you click the Import button in the editing window.

Certain file formats embed enough information to be readable directly, with no user intervention. iTunes Timed Text files are one such example. All other formats may require you to fine-tune the import process through this window:

Import options

The most likely settings are automatically applied. In most cases, your only job when importing is to match the language to the contents of the file.

The Format menu lets you pick one of the supported file formats. This format instructs the import process on how to interpret the file contents. As for most options in the Import window, this is automatically guessed from the file extension and contents of the file. In most cases it is not necessary to change the initial selection.
The Language menu lets you select among hundreds of languages and regions used across the world. Match the language and region selection to the contents of the file. For example: when importing Japanese captions, select Japanese. When importing German subtitles meant for Swiss viewership, select German (Switzerland), etc.
The Frame rate menu lets you select a frame rate when importing from files that use SMPTE timecodes, such as Adobe Encore Script. These file formats use frame numbers in their timecodes but fail to declare what range of frames was used (0 to 23 for 24fps, 0 to 29 for 30fps, etc.).
In these cases it is very important that you match the frame rate to the known frame rate used to generate the source file. If the source file was known to use drop frame timecodes, enable the Drop frame (NTSC only) option as well. This option is only available for Broadcast NTSC frame rates: 29.97fps and 59.94fps.
The Text encoding option lets you choose how the contents of the text file are to be interpreted. Recent files are mainly encoded using one of several Unicode standards (UTF-8, UTF-16) to ensure that characters and glyphs in any language can be represented. Unfortunately a large number of captions are still available in files that use text encodings specific to a family of languages (e.g. Western European languages) or single languages (Japanese). Our software tries to automatically detect the text encoding from the input file. A Preview is available to check the contents of the file as interpreted through the given encoding. If the preview looks incorrect, switch to a different encoding until the contents of the file are read correctly.
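
To illustrate why a preview matters, here is a minimal sketch of one common way to guess a text encoding. It is not the detection logic used by our software: it only checks for a UTF-16 byte order mark, then tries UTF-8, then falls back to a legacy encoding chosen by the caller (Windows-1252 by default, an assumption for this example).

```python
import codecs


def guess_decode(data: bytes, legacy: str = "cp1252") -> tuple[str, str]:
    """Return (text, encoding name) using a BOM check, then UTF-8, then a fallback."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte order mark and removes it.
        return data.decode("utf-16"), "utf-16"
    try:
        # "utf-8-sig" accepts UTF-8 with or without a byte order mark.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Legacy encodings rarely fail to decode, so the result may be wrong:
        # this is the case where checking a preview is essential.
        return data.decode(legacy, errors="replace"), legacy
```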

Importing WebVTT

When importing WebVTT files, the following UI is visible under the Format menu:

Additional options available for WebVTT

The extra options give you control over any voice tags found in the source file:

The Voice Tags: Insert with delimiter option lets you convert any voice tags to caption text during the import process.
The delimiter is a series of characters used to separate voice identifiers from the caption text.
The style options affect the voices only, allowing you to make them bold, italic and/or underline, and to assign a custom color.

For example, when using the default delimiter of ": " with the bold option enabled and the color set to cyan, the WebVTT cue:

<v John>Hello!</v>

...is imported as the caption:

John: Hello!
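
The text side of this conversion can be approximated with a regular expression. The sketch below is an illustration only: it handles the voice name and delimiter but leaves out the bold and color styling, and the pattern and function name are assumptions rather than the importer's actual code.

```python
import re

# Matches <v Name>text</v>, with an optional class (<v.loud Name>) and an
# optional closing tag, since the closing tag may be left out in a cue.
VOICE_TAG = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>(.*?)(?:</v>|$)", re.DOTALL)


def inline_voices(cue_text: str, delimiter: str = ": ") -> str:
    """Replace each voice tag with 'Name<delimiter>text'."""
    return VOICE_TAG.sub(
        lambda m: f"{m.group(1).strip()}{delimiter}{m.group(2)}", cue_text
    )
```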

Importing Adobe Premiere Pro Markers

When importing Premiere Pro Markers files, the following UI is visible under the Format menu:

The Import menu is only available when importing Adobe Premiere Pro markers stored in CSV files. It lets you pick whether you wish to import caption text from the marker’s name, comments or both.

Importing Lyrics (LRC)

When importing LRC files, the following UI is visible under the Format menu:

Additional options available for Lyrics (LRC)

The Apply custom style to active lyrics option creates multiple captions for each set of word time tags found in the source file, and applies a different style to the active lyrics. When this option is disabled, any word time tags are ignored, and only one caption is created for each line of lyrics.

Not all LRC files contain word time tags. It is a feature described as Enhanced LRC, designed to help karaoke machines highlight active lyrics on screen.

Exporting

The export process begins when you click the Export button in the editing window. Export options are available inside the standard Save panel:

File export options

The options you choose at this stage only affect the export process. The original captions and options are left untouched during an export. This allows you to quickly perform a series of export operations at various output settings. For example, you might want to output a series of iTunes Timed Text files at various frame rates, or to create drop frame and non-drop frame versions of the same captions.

The Language menu lets you select which language is displayed by the editor. Since multiple languages may be available, your first job is to select the language you wish to export to a file.
The Format menu lets you choose the file format to save your data into. Every file format has different characteristics. The Summary at the bottom of the window is meant to provide a quick review.
If the output file format supports text styles, you can choose to include those styles by enabling the Export styles option. When you omit text styles, all captions are exported as text only, regardless of the features supported by the file format. When exporting to YouTube, remember to export without markup as required by their platform.
If the output file format supports caption positioning, you can choose to include the size and relative position of each text box by enabling the Export positions option. When you omit positioning, the software or device displaying the captions has control over their location within the frame.
When exporting a SubRip (SRT) file for Facebook, enable the Facebook Naming Convention option to have the correct language suffix appended to your filename. For example, if your filename is "Demo" and it contains English subtitles for the United States, the filename becomes Demo.en_US.srt. This option is not available when exporting to formats other than SubRip.
The Frame rate menu lets you pick a desired frame rate for the exported timecodes. This option is only available for file formats that use SMPTE timecodes, i.e. iTunes Timed Text and Adobe Encore Script. As with all other export options, changing the frame rate at this stage only affects the current export. Enable the Drop frame option if you want the timecodes to be generated using the SMPTE drop frame notation for NTSC broadcast frame rates (29.97fps and 59.94fps).
The Start time timecode lets you offset all exported timecodes by the same value. This may be useful when extra content has been added at the beginning of the media, since you can quickly offset all captions by the required time without having to edit any timecodes. As an example, if you enter a Start time of 2:30:00 and your first caption starts at 1:00 (one second) the actual exported timecode for the first caption will be 2:31:00.
Enable the Export start timecode even when zero option when exporting to iTunes Timed Text (iTT) to improve compatibility with software that always expects this timecode, even when the offset is zero. This option has no effect when exporting to other formats.
The Text encoding menu lets you select one of the common encodings used to transfer text files across computers. This option is disabled for some formats, like iTunes Timed Text and WebVTT, that require files to be in a specific flavor of Unicode (UTF-8). When exporting to other text-based formats, it may make sense to select an encoding that is known to be supported by the software that will be used to read and display your captions. Many such programs still use one of the legacy Windows codepages corresponding to one or more languages.
When exporting to Unicode (UTF-8 or UTF-16) the Add Byte Order Mark (BOM) option lets you decide if you want the output file to contain a special, invisible character sequence that instructs other software on how to interpret the contents of the file correctly.
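
The Start time arithmetic described above can be checked with a short sketch. It assumes non-drop frame timecodes held as `(hours, minutes, seconds, frames)` tuples and a whole-number frame rate; the helper names are made up for the example.

```python
def to_frames(timecode, fps: int) -> int:
    """Convert (hours, minutes, seconds, frames) to a total frame count."""
    hours, minutes, seconds, frames = timecode
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def from_frames(total: int, fps: int):
    """Convert a total frame count back to (hours, minutes, seconds, frames)."""
    seconds, frames = divmod(total, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds, frames


def shift(timecode, start_time, fps: int):
    """Offset a caption timecode by the export Start time."""
    return from_frames(to_frames(timecode, fps) + to_frames(start_time, fps), fps)
```

A caption at 1:00 exported with a Start time of 2:30:00 comes out at 2:31:00, matching the example above.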

Formats

iTunes Timed Text (iTT)
WebVTT
SubRip (SRT)
SubViewer (SUB)
Adobe Encore Script
Adobe Premiere Pro Markers (CSV)
Lyrics (LRC)

iTunes Timed Text (iTT)

iTunes Timed Text (iTT) is a subset of the Timed Text Markup Language by the World Wide Web Consortium (W3C). All iTT documents are TTML documents that use the restricted subset of TTML. iTunes Timed Text is natively supported by Final Cut Pro 10.4 (or later). iTT files store all timecodes in the SMPTE format, with a distinction between drop frame (HH:MM:SS;FF) and non-drop frame (HH:MM:SS:FF) timecodes.
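
For readers unfamiliar with drop frame notation, the sketch below formats a frame count as a 29.97fps drop frame timecode. It follows the general SMPTE rule (frame numbers 00 and 01 are skipped at the start of every minute, except every tenth minute) and is not taken from our software or from the iTT specification. The function name is an assumption.

```python
def dropframe_timecode(frame_count: int) -> str:
    """Format a 29.97fps frame count as HH:MM:SS;FF drop frame timecode."""
    dropped = 2                               # frame numbers skipped per minute
    per_minute = 30 * 60 - dropped            # frames in a minute that skips numbers
    per_ten_minutes = 30 * 600 - 9 * dropped  # frames in a ten minute block
    tens, rest = divmod(frame_count, per_ten_minutes)
    skipped = 9 * dropped * tens
    if rest > dropped:
        skipped += dropped * ((rest - dropped) // per_minute)
    # Adding back the skipped numbers gives a count that splits at 30 per second.
    number = frame_count + skipped
    seconds, frames = divmod(number, 30)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"
```

No frames of video are discarded: only the numbering changes, which is why the label jumps from 00:00:59;29 straight to 00:01:00;02.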

When Importing iTunes Timed Text

The majority of information is provided by the iTT specification, and the language, style and timing stored in the file will be faithfully imported.

When Exporting iTunes Timed Text

Most of the information you can manipulate through the editor is faithfully exported to the iTT file, with a few important exceptions. The iTT specification does not allow for simultaneous captions (i.e. captions whose time periods overlap). iTT does not support text box sizing and placement. iTT does not allow you to customize text alignment, since all captions are centered within the frame.

WebVTT

WebVTT is an evolving standard by the World Wide Web Consortium called The Web Video Text Tracks Format. The export process supports a limited but growing subset of the specification that deals with static captions:

Bold, italic and underlined text via HTML-like syntax.
Voices identified through HTML-like tags can be inlined during the import process. For example, the caption:
```
<v John>Hello!</v>
```
can be imported as:
```
John: Hello!
```

Most other information is skipped during an import. WebVTT uses its own format for timecodes (00:00:00.000) where the last component represents milliseconds.

When Importing WebVTT

The import process recognizes text styles defined inline via HTML-style syntax, and text colors as defined via CSS-like statements in STYLE sections. Voices can be inlined into captions or skipped, based on the options provided for the import process. All other information is skipped.

When Exporting WebVTT

Most of the caption information you can manipulate in the editor is exported to WebVTT. Bold, italic and underline text styles are exported via inline HTML-style syntax. Text alignment, text box size and relative positioning within the frame are also exported via Cue attributes. Text colors will be exported once the STYLE section is widely supported.

SubRip (SRT)

SubRip (SRT) remains one of the most popular text-based caption file formats, despite having no formal specification and limited support for text formatting via HTML-like syntax. SubRip files use their own format for timecodes (00:00:00,000) where the last component represents milliseconds.
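
The SubRip and WebVTT timecode formats differ only in the character that precedes the milliseconds (a comma versus a period). A small sketch makes this concrete. The function name and the choice of milliseconds as input are assumptions for the example.

```python
def format_milliseconds(total_ms: int, fraction_separator: str = ",") -> str:
    """Format a time in milliseconds as HH:MM:SS,mmm (SubRip) or HH:MM:SS.mmm (WebVTT)."""
    seconds, milliseconds = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{fraction_separator}{milliseconds:03d}"
```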

When Importing SubRip (.srt)

Text styles and colors are imported with appropriate HTML-like syntax.

Relative positioning of each caption or subtitle in the frame is supported by recognizing {\an1...9} tags at the beginning of each line.
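
The {\an1...9} tags follow the numeric keypad layout used by SubStation Alpha subtitles: 1 to 3 are the bottom row, 4 to 6 the middle row and 7 to 9 the top row, each running left, center, right. The sketch below shows how such a tag could be read from the start of a line. The function name and the returned labels are assumptions for illustration.

```python
import re

AN_TAG = re.compile(r"^\{\\an([1-9])\}")


def parse_position(line: str):
    """Return ((vertical, horizontal), text) or (None, line) when no tag is present."""
    match = AN_TAG.match(line)
    if match is None:
        return None, line
    number = int(match.group(1))
    vertical = ("bottom", "middle", "top")[(number - 1) // 3]
    horizontal = ("left", "center", "right")[(number - 1) % 3]
    return (vertical, horizontal), line[match.end():]
```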

When Exporting SubRip (.srt)

Text styles (bold, italic, underline) and colors are exported via HTML-like syntax. Text alignment and text box size are not supported. Simultaneous captions and relative positioning within the frame are supported by prepending the equivalent {\an1...9} tag at the beginning of each line. Make sure that you export your file as UTF-8 when targeting YouTube, Facebook or other popular social media platforms. In some cases it also helps to export files without markup (text formatting) to guarantee the best results. You can export captions without markup by turning off the Export Styles (markup) option in the Export panel.

SubViewer (SUB)

SubViewer is a text-based file format that does not support any appearance options or simultaneous captions. It is still popular with a number of software packages and web video platforms, such as YouTube. SubViewer files use their own format for timecodes (00:00:00.00) where the last component represents hundredths of a second.

When Importing SubViewer (.sub)

Only caption text is imported.

When Exporting SubViewer (.sub)

Only caption text is exported.

Adobe Encore Script

Adobe Encore supports both Text and Image-based subtitles. Caption Converter allows you to import and export Text Script files. The file format is extremely simple. Timecodes use SMPTE-like components (HH;MM;SS;FF) but no distinction is made between drop and non-drop timecodes.

When Importing Adobe Encore Script

Only caption text is imported.

When Exporting Adobe Encore Script

Only caption text is exported. Make sure that you export the file in one of the Unicode formats (UTF-8 and UTF-16) to ensure that text in all languages is correctly preserved. The use of a Byte Order Mark is optional.
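
For reference, a Byte Order Mark is a short byte sequence placed at the very start of the file. The sketch below shows what adding one amounts to. The function name and the choice of little-endian byte order for UTF-16 are assumptions for this example, not a description of what the exporter writes.

```python
import codecs


def encode_text(text: str, encoding: str = "utf-8", add_bom: bool = False) -> bytes:
    """Encode text as UTF-8 or UTF-16 (little-endian), optionally with a BOM."""
    if encoding == "utf-8":
        return (codecs.BOM_UTF8 if add_bom else b"") + text.encode("utf-8")
    if encoding == "utf-16":
        return (codecs.BOM_UTF16_LE if add_bom else b"") + text.encode("utf-16-le")
    raise ValueError(f"unsupported encoding: {encoding}")
```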

Adobe Premiere Pro Markers (CSV)

Adobe Premiere Pro allows you to export markers in XML and CSV files. Caption Converter currently allows you to import CSV files only. Premiere Pro uses SMPTE-like notation for its timecodes, distinguishing between non-drop frame (HH:MM:SS:FF) and drop-frame mode (HH;MM;SS;FF). The .csv file extension suggests that file contents should always be comma-separated values. In practice, recent versions of Premiere Pro seem to export tab-separated values instead. Either variant is detected and handled automatically by the import process.

When Importing Adobe Premiere Pro Markers

Only caption text is imported. Make sure to match the Drop frame setting to the value expected in the file. The import process will fail if your selection does not match the data in the file. While markers do not have any associated text formatting, Premiere Pro allows users to enter both a name and arbitrary comments for each marker. You can import marker name, comments, or both, via a setting in the import window. When importing name and comments, the name is imported as the first line in the caption and the comments are imported in subsequent lines.
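
The handling of comma-separated and tab-separated files, and of the name/comments choice, can be sketched as follows. The column names used here (`Name`, `Comments`, `In`, `Out`) are hypothetical placeholders chosen for the example. They are not guaranteed to match the headers Premiere Pro writes, and the real importer may detect the delimiter differently.

```python
import csv
import io


def read_markers(text: str, mode: str = "both"):
    """Return (in, out, caption text) tuples from marker data.

    `mode` is "name", "comments" or "both". Column names are hypothetical.
    """
    header = text.splitlines()[0]
    # Use tabs when the header line contains one, commas otherwise.
    delimiter = "\t" if "\t" in header else ","
    captions = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        parts = {
            "name": [row["Name"]],
            "comments": [row["Comments"]],
            "both": [row["Name"], row["Comments"]],
        }[mode]
        # The name becomes the first line, the comments the following lines.
        captions.append((row["In"], row["Out"], "\n".join(p for p in parts if p)))
    return captions
```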

When Exporting Adobe Premiere Pro Markers

Exporting to this format is not possible. Premiere Pro allows the importing of markers saved in the Final Cut Pro XML Interchange Format which predates Final Cut Pro X. Let us know if you are looking forward to having it as an option.

Lyrics (LRC)

LRC is a text-based file format that does not support any appearance options or simultaneous captions. It is popular for storing and displaying song lyrics. Its timecodes use the format MM:SS.XX, where MM is minutes, SS is seconds, and XX is hundredths of a second.

When Importing Lyrics (.lrc)

Text and any global offset stored in the file are imported. Since lyrics do not often include the out timecode, each line of lyrics ends where the next line begins. When the file does not specify the out timecode of the last lyric, the duration of the song is used to assign the correct timecode. Should the overall song duration also be unavailable or incorrect, the last lyric is assigned a default duration of one second.
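
The rules for assigning out timecodes can be summarized in a short sketch. It assumes in times are given in seconds and sorted. The function and parameter names are made up for the example, and treating a song duration that is not later than the last in time as "incorrect" is an assumption.

```python
def assign_out_times(starts, song_duration=None, default_last=1.0):
    """Return (in, out) pairs for sorted in times given in seconds."""
    ranges = []
    for index, start in enumerate(starts):
        if index + 1 < len(starts):
            end = starts[index + 1]      # each line ends where the next begins
        elif song_duration is not None and song_duration > start:
            end = song_duration          # last line ends with the song
        else:
            end = start + default_last   # fallback: default duration of one second
        ranges.append((start, end))
    return ranges
```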

When the LRC file contains enhanced word time tags, you have the option to translate a single line of lyrics into multiple captions, where the current words are displayed through a different style. You can choose a style for the active lyrics in the Import window. If you choose not to apply a custom style to active lyrics, all enhanced word tags are ignored, and one caption is created for each line.
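
As an illustration of what word time tags look like, the sketch below extracts them from a single line of lyrics. It assumes the commonly seen Enhanced LRC syntax, where each word or phrase is preceded by a tag such as `<00:12.50>`. Files that use a different variant would need a different pattern, and this is not the importer's own parser.

```python
import re

# A word time tag followed by the text it applies to, e.g. "<00:12.50>world".
WORD_TAG = re.compile(r"<(\d+):(\d{2})\.(\d{2})>([^<]*)")


def word_timings(line_text: str):
    """Return (seconds, text) pairs for each word time tag in a line."""
    return [
        (int(minutes) * 60 + int(seconds) + int(hundredths) / 100, text.strip())
        for minutes, seconds, hundredths, text in WORD_TAG.findall(line_text)
    ]
```

Each pair could then become one caption in which the text up to that point is shown in the active style.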

When Exporting Lyrics (.lrc)

Only caption text and the overall offset are exported. Note that the LRC file format interprets the [offset:...] field with the opposite meaning to our software. A positive offset value indicates that lyrics are delayed by the specified amount. In our own software, a positive offset indicates that time should be fast-forwarded by the desired amount, thus causing captions to appear sooner on screen.
