March 16, 2026 · 5 min read

How to Add Live Captions to Any Video in Chrome (Free AI Tool)

Add real-time captions to any video playing in Chrome — YouTube, Netflix, Twitch, or any website. Compare Chrome's built-in captions with AI-powered alternatives.

live captions · accessibility · AI · chrome extension · transcription

Adi, Founder, Hearably

Whether you are watching a foreign language lecture, following along in a noisy environment, or have hearing difficulties, live captions transform the video experience. But Chrome’s built-in captioning has significant limitations that most users discover the hard way.

This guide covers every way to add real-time captions to any video in Chrome, from the free built-in option to AI-powered tools that handle multiple languages, and explains when each approach works best.

Chrome’s Built-In Live Caption Feature

Chrome has shipped a native Live Caption feature since version 89 (2021). It uses an on-device speech recognition model to transcribe English audio in real time.

How to enable it

Open Chrome Settings (chrome://settings)
Click Accessibility in the left sidebar
Toggle Live Caption on
Chrome will download a small speech recognition model (approximately 100 MB)
A caption bar appears at the bottom of any tab playing audio

What it does well

Privacy: all processing happens on-device. No audio is sent to Google’s servers.
Works on any website: if audio is playing in the tab, it gets captioned. YouTube, Netflix, Twitch, random embedded videos — all work.
Zero cost: completely free, no extension needed.
Low latency: captions appear within roughly 0.5 to 1 second of speech.

Where it falls short

Chrome Live Caption has several limitations that become apparent quickly:

English only (on most platforms). Google has added a handful of additional languages on ChromeOS and Android, but desktop Chrome primarily supports English. If you are watching a German lecture, a Spanish podcast, or a Japanese stream, Chrome Live Caption outputs nothing useful: it attempts to transcribe the foreign speech as English and produces gibberish.

No punctuation or formatting. The output is a continuous stream of lowercase words with minimal punctuation. Long sentences run together, making it hard to follow complex content like academic lectures or technical presentations.

No export or save option. You cannot save the captions as an SRT or VTT file. The text appears in a floating bar and disappears. If you need a transcript for notes, studying, or accessibility compliance, Chrome Live Caption does not help.

No speaker identification. In content with multiple speakers — interviews, panel discussions, podcasts — all speech appears as a single undifferentiated stream. There is no indication of who is speaking.

Accuracy degrades with accents and background noise. The model handles standard American English reasonably well but struggles with strong accents, fast speech, overlapping speakers, and background music.

AI-Powered Live Captions with Hearably

Hearably’s live caption feature takes a different approach by running a much larger AI model directly in your browser.

How it works

Hearably uses a 4-billion parameter speech recognition model (based on Voxtral Mini) that runs entirely client-side via WebGPU. The model is significantly more capable than Chrome’s built-in speech recognition:

150 million parameters (Chrome) vs 4 billion parameters (Hearably), roughly 27x larger
13 languages supported: English, German, French, Spanish, Russian, Chinese, Japanese, Italian, Portuguese, Dutch, Arabic, Hindi, Korean
Proper punctuation and capitalization from the model’s language understanding
Better accent handling due to broader training data

The model downloads once (approximately 2.5 GB across several shards) and is cached in your browser. After the initial download, captions work offline with no server dependency.
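As a rough sanity check on the figures above, the sketch below works out the parameter-count ratio between the two models and the average storage per parameter implied by a roughly 2.5 GB download for 4 billion parameters. It assumes decimal gigabytes (1 GB = 10^9 bytes) and that the download is almost entirely model weights; neither assumption is stated here, and the function names are illustrative only.

```typescript
// Figures quoted in this article; all of them are approximate.
const CHROME_PARAMS = 150e6;
const HEARABLY_PARAMS = 4e9;
const HEARABLY_DOWNLOAD_BYTES = 2.5e9; // assumption: 1 GB = 1e9 bytes

// How many times larger one model is than another, by parameter count.
function sizeRatio(largerParams: number, smallerParams: number): number {
  return largerParams / smallerParams;
}

// Average bits stored per parameter, assuming the file is all weights.
function bitsPerParameter(fileBytes: number, params: number): number {
  return (fileBytes * 8) / params;
}

console.log(sizeRatio(HEARABLY_PARAMS, CHROME_PARAMS).toFixed(1));
console.log(bitsPerParameter(HEARABLY_DOWNLOAD_BYTES, HEARABLY_PARAMS));
```

Under these assumptions the ratio comes out at about 26.7 and the average at about 5 bits per parameter, which would be consistent with compressed (quantized) weights rather than 16-bit ones.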

Feature comparison

Feature	Chrome Live Caption	Hearably AI Captions
Languages	English (desktop)	13 languages
Model size	~100 MB	~2.5 GB (cached)
Punctuation	Minimal	Full punctuation + capitalization
Accuracy (English)	Good (standard accents)	Very good (all accents)
Accuracy (other languages)	Not supported	Good to very good
Export to SRT/VTT	No	Yes
Speaker identification	No	Planned
Works offline (after download)	Yes	Yes
Privacy	On-device	On-device
System requirements	Any Chrome 89+	Chrome 113+ with WebGPU
Cost	Free	Free (beta)

System requirements for AI captions

The AI caption engine uses WebGPU for inference, which requires:

Chrome 113 or later (or Edge 113+)
A GPU with at least 4 GB VRAM (most modern laptops meet this)
macOS 12.3+, Windows 10+, or ChromeOS 113+
The initial model download of approximately 2.5 GB (cached after first use)

On a MacBook with an Apple M1 or later chip, the model runs with approximately 1-2 seconds of latency. On a modern Windows laptop with a discrete GPU, performance is similar. Older machines with integrated graphics may see higher latency (3-5 seconds).
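If you want to check whether a browser exposes WebGPU at all, a minimal sketch looks like the following. It relies on the standard `navigator.gpu` entry point and its `requestAdapter()` method; the `NavigatorLike`, `GpuLike`, and `WebGpuStatus` types are hypothetical local definitions so the snippet compiles without extra type packages. This is not Hearably's actual check. WebGPU does not report VRAM size directly, so the sketch only tests availability.

```typescript
// Minimal structural types (hypothetical) standing in for the browser's WebGPU types.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "ready" | "no-webgpu" | "no-adapter";

// Reports whether WebGPU is exposed and whether a GPU adapter can be obtained.
async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) {
    return "no-webgpu"; // browser too old, or WebGPU disabled
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ready" : "no-adapter"; // null means no usable GPU was found
}

// In a browser you might call:
//   checkWebGpu(navigator as NavigatorLike).then((status) => console.log(status));
```

A "no-adapter" result usually means the API exists but the GPU or driver is not supported, which is a different problem from an outdated browser.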

Use Case: Foreign Language Content

One of the most valuable applications of AI-powered captions is watching content in a language you are learning or do not speak at all. Chrome Live Caption cannot help here, but multilingual AI captions open up entirely new content.

Scenario: watching a German university lecture

Enable Hearably’s live captions
Select German as the source language
The AI model transcribes the German speech with proper German punctuation and spelling
Optionally, enable translation to see English captions (translation feature in development)

This works on any website — not just platforms with built-in subtitle support. A live-streamed German conference on a university website, a French podcast embedded in a blog, a Japanese VOD on a niche platform — all get accurate captions.

Scenario: following along with accented English

Many users enable captions not because they do not understand the language but because accented speech is harder to parse in real time, especially in noisy environments. The larger AI model handles Indian English, Scottish English, Nigerian English, and other accented varieties significantly better than Chrome’s smaller model because it was trained on a more diverse dataset.

Use Case: Accessibility and Hearing Difficulties

For users who are deaf or hard of hearing, caption quality directly impacts comprehension. The differences between basic and AI-powered captions matter significantly:

Punctuation affects meaning. “Let’s eat grandma” versus “Let’s eat, grandma.” Chrome’s basic captions frequently omit commas, periods, and question marks, forcing the reader to infer sentence boundaries from context. AI captions provide full punctuation.

Capitalization signals proper nouns. Is the speaker talking about “apple” (the fruit) or “Apple” (the company)? Without capitalization, context is the only clue. AI captions capitalize properly.

Word accuracy is not optional. For someone who cannot hear the original audio at all, a misrecognized word is not a minor annoyance — it is missing information. The higher accuracy of a larger model directly translates to better comprehension.

Use Case: Note-Taking and Study

Students watching recorded lectures benefit enormously from live captions they can export. Hearably’s auto caption generator produces SRT or VTT files from any video, which can then be:

Imported into note-taking apps alongside timestamps
Searched for specific terms mentioned in the lecture
Used to create study guides from transcript text
Shared with classmates who missed the lecture

The ability to export captions transforms a passive viewing experience into a searchable text document.
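For readers unfamiliar with the two file formats, here is a minimal sketch of how timed caption cues can be serialized to SRT and WebVTT. The `Cue` shape and the function names are assumptions made for illustration; this does not describe Hearably's exporter.

```typescript
// A single caption cue; times are in milliseconds (hypothetical shape).
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".".
function formatTimestamp(ms: number, sep: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(total % 1000, 3);
}

// SRT: numbered cues separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) =>
      (i + 1) + "\n" +
      formatTimestamp(c.startMs, ",") + " --> " + formatTimestamp(c.endMs, ",") + "\n" +
      c.text + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by blank lines.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) =>
      formatTimestamp(c.startMs, ".") + " --> " + formatTimestamp(c.endMs, ".") + "\n" +
      c.text + "\n")
    .join("\n");
  return "WEBVTT\n\n" + body;
}
```

Because both formats are plain text with timestamps, the exported file can be searched or pasted into notes without any special tooling.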

How to Get the Best Caption Accuracy

Regardless of which captioning tool you use, these tips improve accuracy:

Use headphones when possible

When you play audio through speakers, room echo and ambient noise get mixed back into the audio stream. Headphones eliminate this feedback loop, giving the captioning model a cleaner input signal.

Enable noise reduction

Hearably’s volume booster includes a noise reduction feature that cleans up the audio stream before it reaches the captioning model. Reducing background noise lowers word error rate by 15-30% on noisy content.
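Word error rate (WER) is the standard way to score a transcript against a reference: the number of substituted, deleted, and inserted words divided by the number of words in the reference. The sketch below is a simplified version that lowercases and splits on whitespace only. The `relativeReduction` helper assumes the percentage is a relative change, which is not specified here.

```typescript
// Lowercase and split on whitespace; a real evaluation would also normalize punctuation.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Relative reduction between two WER values (assumed interpretation).
function relativeReduction(before: number, after: number): number {
  return (before - after) / before;
}
```

For example, a transcript that drops one word from a four-word reference scores a WER of 0.25, and going from a WER of 0.20 to 0.15 is a relative reduction of about 25%.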

Choose the right source language

If you know the spoken language, explicitly selecting it (rather than relying on auto-detection) improves accuracy. Language auto-detection works for most content but can oscillate between similar languages (Spanish/Portuguese, Dutch/German) in the first few seconds.
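To illustrate why auto-detection can flip between similar languages early on, and one generic way such flips can be damped, the sketch below only commits to a language after several consecutive detections agree. This is a hypothetical smoothing rule, not a description of how Hearably or any particular model handles detection.

```typescript
// Returns the first language seen `needed` times in a row, or null if the
// detections never settle. `detections` holds per-chunk language codes.
function stableLanguage(detections: string[], needed: number): string | null {
  let last: string | null = null;
  let run = 0;
  for (const code of detections) {
    if (code === last) {
      run += 1;
    } else {
      last = code;
      run = 1;
    }
    if (run >= needed) {
      return last;
    }
  }
  return null;
}

// Early chunks flip between Spanish and Portuguese before settling.
console.log(stableLanguage(["es", "pt", "es", "es", "es"], 3));
```

Selecting the language manually skips this settling period entirely, which is why it helps most in the first few seconds of a video.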

Avoid extreme volume boosting during captioning

While Hearably’s volume booster and AI captions work simultaneously, boosting volume beyond 400% can introduce compression artifacts that slightly degrade transcription accuracy. For captioning purposes, a moderate boost (100-200%) with EQ adjustments produces better results than extreme amplification.
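To put the percentages in perspective, the sketch below converts a boost percentage to a linear gain and to decibels, then counts how many samples of a signal would exceed full scale after boosting. It assumes the percentage maps linearly to gain with 100% meaning no change, and that samples are normalized to the range -1 to 1. Both are assumptions about the booster, not documented behavior.

```typescript
// Assumption: 100% = unity gain, 400% = 4x amplitude.
function boostPercentToGain(percent: number): number {
  return percent / 100;
}

// Amplitude gain expressed in decibels: 20 * log10(gain).
function gainToDecibels(gain: number): number {
  return 20 * (Math.log(gain) / Math.LN10);
}

// Fraction of samples that would exceed full scale (|x| > 1) after boosting.
// These are the peaks a limiter or compressor has to squash.
function fractionOverFullScale(samples: number[], gain: number): number {
  if (samples.length === 0) return 0;
  let over = 0;
  for (const s of samples) {
    if (Math.abs(s * gain) > 1) over += 1;
  }
  return over / samples.length;
}

const samples = [0.1, 0.3, -0.5, 0.2];
console.log(gainToDecibels(boostPercentToGain(400)).toFixed(2));
console.log(fractionOverFullScale(samples, boostPercentToGain(400)));
console.log(fractionOverFullScale(samples, boostPercentToGain(200)));
```

With these sample values, a 400% boost is about 12 dB and pushes half the samples past full scale, while a 200% boost leaves all of them within range.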

Start Captioning Any Video in Chrome

Chrome’s built-in Live Caption is a decent starting point for English-only content. For multilingual support, better accuracy, proper formatting, and exportable transcripts, AI-powered captions are the next step.

Install Hearably from the Chrome Web Store and enable AI captions on any video. The model downloads once in the background while you continue browsing, and captions are available on every site from that point forward — no per-site configuration, no subscriptions to individual platforms, no dependence on content creators adding subtitles.

Every video. Every language. Every word.
