How to Add Live Captions to Any Video in Chrome (Free AI Tool)
Add real-time captions to any video playing in Chrome — YouTube, Netflix, Twitch, or any website. Compare Chrome's built-in captions with AI-powered alternatives.
Whether you are watching a foreign-language lecture, following along in a noisy environment, or are hard of hearing, live captions transform the video experience. But Chrome’s built-in captioning has significant limitations that most users discover the hard way.
This guide covers every way to add real-time captions to any video in Chrome, from the free built-in option to AI-powered tools that handle multiple languages, and explains when each approach works best.
Chrome’s Built-In Live Caption Feature
Chrome has shipped a native Live Caption feature since version 89 (2021). It uses an on-device speech recognition model to transcribe English audio in real time.
How to enable it
- Open Chrome Settings (chrome://settings)
- Click Accessibility in the left sidebar
- Toggle Live Caption on
- Chrome will download a small speech recognition model (approximately 100 MB)
- A caption bar appears at the bottom of any tab playing audio
What it does well
- Privacy: all processing happens on-device. No audio is sent to Google’s servers.
- Works on any website: if audio is playing in the tab, it gets captioned. YouTube, Netflix, Twitch, random embedded videos — all work.
- Zero cost: completely free, no extension needed.
- Low latency: captions appear roughly 500 ms to 1 s after the words are spoken.
Where it falls short
Chrome Live Caption has several limitations that become apparent quickly:
English only (on most platforms). While Google has added a handful of additional languages on ChromeOS and Android, the desktop version of Chrome primarily supports English. If you are watching a German lecture, a Spanish podcast, or a Japanese stream, Chrome Live Caption outputs nothing useful: it tries to transcribe the foreign speech as English and produces gibberish.
Minimal punctuation and formatting. The output is a continuous stream of lowercase words with little punctuation. Long sentences run together, making it hard to follow complex content like academic lectures or technical presentations.
No export or save option. You cannot save the captions as an SRT or VTT file. The text appears in a floating bar and disappears. If you need a transcript for notes, studying, or accessibility compliance, Chrome Live Caption does not help.
No speaker identification. In content with multiple speakers — interviews, panel discussions, podcasts — all speech appears as a single undifferentiated stream. There is no indication of who is speaking.
Accuracy degrades with accents and background noise. The model handles standard American English reasonably well but struggles with strong accents, fast speech, overlapping speakers, and background music.
AI-Powered Live Captions with Hearably
Hearably’s live caption feature takes a different approach by running a much larger AI model directly in your browser.
How it works
Hearably uses a 4-billion-parameter speech recognition model (based on Voxtral Mini) that runs entirely client-side via WebGPU. The model is significantly more capable than Chrome’s built-in speech recognition:
- 150 million parameters (Chrome) vs 4 billion parameters (Hearably), roughly 27x larger
- 13 languages supported: English, German, French, Spanish, Russian, Chinese, Japanese, Italian, Portuguese, Dutch, Arabic, Hindi, Korean
- Proper punctuation and capitalization from the model’s language understanding
- Better accent handling due to broader training data
The model downloads once (approximately 2.5 GB across several shards) and is cached in your browser. After the initial download, captions work offline with no server dependency.
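The size figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in this article; it assumes "2.5 GB" means 2.5 × 10^9 bytes and that the download consists almost entirely of model weights, neither of which is confirmed here. Under those assumptions the download works out to about 5 bits per parameter, which would suggest the weights ship in a compressed (quantized) form rather than as 16-bit or 32-bit numbers.

```python
# Figures quoted in the article. The byte convention is an assumption.
CHROME_PARAMS = 150_000_000
HEARABLY_PARAMS = 4_000_000_000
DOWNLOAD_BYTES = 2.5e9  # assumes "2.5 GB" = 2.5 * 10**9 bytes


def size_ratio(large: int, small: int) -> float:
    """How many times larger one parameter count is than the other."""
    return large / small


def bits_per_parameter(total_bytes: float, n_params: int) -> float:
    """Average storage per parameter, assuming the file is all weights."""
    return total_bytes * 8 / n_params


if __name__ == "__main__":
    print(f"{size_ratio(HEARABLY_PARAMS, CHROME_PARAMS):.1f}x larger")
    print(f"{bits_per_parameter(DOWNLOAD_BYTES, HEARABLY_PARAMS):.1f} bits per parameter")
```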
Feature comparison
| Feature | Chrome Live Caption | Hearably AI Captions |
|---|---|---|
| Languages | English (desktop) | 13 languages |
| Model size | ~100 MB | ~2.5 GB (cached) |
| Punctuation | Minimal | Full punctuation + capitalization |
| Accuracy (English) | Good (standard accents) | Very good (wider range of accents) |
| Accuracy (other languages) | Not supported | Good to very good |
| Export to SRT/VTT | No | Yes |
| Speaker identification | No | Planned |
| Works offline (after download) | Yes | Yes |
| Privacy | On-device | On-device |
| System requirements | Any Chrome 89+ | Chrome 113+ with WebGPU |
| Cost | Free | Free (beta) |
System requirements for AI captions
The AI caption engine uses WebGPU for inference, which requires:
- Chrome 113 or later (or Edge 113+)
- A GPU with at least 4 GB VRAM (most modern laptops meet this)
- macOS 12.3+, Windows 10+, or ChromeOS 113+
- The initial model download of approximately 2.5 GB (cached after first use)
On a MacBook with an Apple M1 or later chip, the model runs with approximately 1-2 seconds of latency. On a modern Windows laptop with a discrete GPU, performance is similar. Older machines with integrated graphics may see higher latency (3-5 seconds).
Use Case: Foreign Language Content
One of the most valuable applications of AI-powered captions is watching content in a language you are learning or do not speak at all. Chrome Live Caption cannot help here, but multilingual AI captions open up entirely new content.
Scenario: watching a German university lecture
- Enable Hearably’s live captions
- Select German as the source language
- The AI model transcribes the German speech with proper German punctuation and spelling
- Optionally, enable translation to see English captions (translation feature in development)
This works on any website — not just platforms with built-in subtitle support. A live-streamed German conference on a university website, a French podcast embedded in a blog, a Japanese VOD on a niche platform — all get accurate captions.
Scenario: following along with accented English
Many users enable captions not because they do not understand the language but because accented speech is harder to parse in real time, especially in noisy environments. The larger AI model handles Indian English, Scottish English, Nigerian English, and other accented varieties significantly better than Chrome’s smaller model because it was trained on a more diverse dataset.
Use Case: Accessibility and Hearing Difficulties
For users who are deaf or hard of hearing, caption quality directly impacts comprehension. The differences between basic and AI-powered captions matter significantly:
Punctuation affects meaning. “Let’s eat grandma” versus “Let’s eat, grandma.” Chrome’s basic captions frequently omit commas, periods, and question marks, forcing the reader to infer sentence boundaries from context. AI captions provide full punctuation.
Capitalization signals proper nouns. Is the speaker talking about “apple” (the fruit) or “Apple” (the company)? Without capitalization, context is the only clue. AI captions capitalize properly.
Word accuracy is not optional. For someone who cannot hear the original audio at all, a misrecognized word is not a minor annoyance — it is missing information. The higher accuracy of a larger model directly translates to better comprehension.
Use Case: Note-Taking and Study
Students watching recorded lectures benefit enormously from live captions they can export. Hearably’s auto caption generator produces SRT or VTT files from any video, which can then be:
- Imported into note-taking apps alongside timestamps
- Searched for specific terms mentioned in the lecture
- Used to create study guides from transcript text
- Shared with classmates who missed the lecture
The ability to export captions transforms a passive viewing experience into a searchable text document.
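To make the export idea concrete, here is a minimal sketch of what an SRT file contains and how a transcript becomes searchable. It is not Hearably's implementation: the `Segment` type and the `to_srt` and `find_term` helpers are hypothetical names invented for this illustration. The only fixed part is the SRT layout itself: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between entries.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One caption entry (hypothetical type for this sketch)."""
    start: float  # seconds from the start of the video
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT text, with entries separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text}\n")
    return "\n".join(blocks)


def find_term(segments: list[Segment], term: str) -> list[str]:
    """Return start timestamps of segments mentioning the term (case-insensitive)."""
    needle = term.lower()
    return [srt_timestamp(s.start) for s in segments if needle in s.text.lower()]


if __name__ == "__main__":
    lecture = [
        Segment(0.0, 2.5, "Welcome to the lecture."),
        Segment(2.5, 6.0, "Today we cover entropy."),
        Segment(3661.5, 3664.0, "Entropy also appears in coding theory."),
    ]
    print(to_srt(lecture))
    print(find_term(lecture, "entropy"))
```

Because each entry carries its own timestamp, a search hit points straight back to the moment in the video where the term was spoken.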
How to Get the Best Caption Accuracy
Regardless of which captioning tool you use, these tips improve accuracy:
Use headphones when possible
When you play audio through speakers, room echo and ambient noise get mixed back into the audio stream. Headphones eliminate this feedback loop, giving the captioning model a cleaner input signal.
Enable noise reduction
Hearably’s volume booster includes a noise reduction feature that cleans up the audio stream before it reaches the captioning model. On noisy content, reducing background noise can lower the word error rate by 15-30%.
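For readers unfamiliar with the metric, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn the caption output into a correct reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table. The article does not say whether the 15-30% figure is a relative or an absolute change; the `relative_reduction` helper shows the relative reading purely as an assumption, and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the captions
            insertion = curr[j - 1] + 1  # extra word in the captions
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Fractional drop in WER, e.g. 0.20 -> 0.15 is a 25% relative reduction."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(word_error_rate(reference, "the cat sat on mat"))
    print(relative_reduction(0.20, 0.15))
```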
Choose the right source language
If you know the spoken language, explicitly selecting it (rather than relying on auto-detection) improves accuracy. Language auto-detection works for most content but can oscillate between similar languages (Spanish/Portuguese, Dutch/German) in the first few seconds.
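The oscillation described above is easy to picture with a toy example. The sketch below is not how Hearably or any particular detector works; it is a hypothetical illustration in which a detector emits one language guess per audio chunk and a sliding-window vote steadies the result. The function name, the window size, and the sample guesses are all assumptions. It also shows why explicit selection is simpler: with a fixed language there is nothing to smooth.

```python
from collections import Counter, deque


def smooth_language(detections: list[str], window: int = 5) -> list[str]:
    """Steady per-chunk language guesses with a sliding-window vote.

    The current choice is kept until another language has strictly more
    votes inside the window, so a single stray guess does not flip it.
    """
    recent: deque[str] = deque(maxlen=window)
    current = None
    smoothed = []
    for lang in detections:
        recent.append(lang)
        counts = Counter(recent)
        leader, leader_votes = counts.most_common(1)[0]
        if current is None or counts[current] < leader_votes:
            current = leader
        smoothed.append(current)
    return smoothed


if __name__ == "__main__":
    # Made-up guesses for Spanish audio that is sometimes mistaken for Portuguese.
    raw = ["es", "pt", "es", "pt", "es", "es", "es"]
    print(smooth_language(raw))
```

The trade-off is responsiveness: a larger window resists flicker but takes longer to follow a genuine change of language.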
Avoid extreme volume boosting during captioning
While Hearably’s volume booster and AI captions work simultaneously, boosting volume beyond 400% can introduce compression artifacts that slightly degrade transcription accuracy. For captioning purposes, a moderate boost (100-200%) with EQ adjustments produces better results than extreme amplification.
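A rough way to see why extreme boosts hurt is to convert the percentage into decibels and count how many samples would exceed full scale. This sketch assumes the boost percentage is a plain linear gain (so 100% leaves the signal unchanged) and that samples are normalized to the range -1.0 to 1.0; neither detail is confirmed for Hearably, and a real audio pipeline would typically apply a limiter or compressor rather than let samples overshoot. The sample values are invented for illustration.

```python
import math


def boost_percent_to_db(percent: float) -> float:
    """Convert a linear gain percentage to decibels (100% -> 0 dB)."""
    if percent <= 0:
        raise ValueError("boost percentage must be positive")
    return 20 * math.log10(percent / 100)


def fraction_over_full_scale(samples: list[float], percent: float) -> float:
    """Share of samples whose boosted magnitude would exceed 1.0."""
    if not samples:
        raise ValueError("need at least one sample")
    gain = percent / 100
    over = sum(1 for s in samples if abs(s * gain) > 1.0)
    return over / len(samples)


if __name__ == "__main__":
    samples = [0.1, -0.3, 0.5, -0.9]  # made-up normalized audio samples
    for percent in (100, 200, 400):
        db = boost_percent_to_db(percent)
        over = fraction_over_full_scale(samples, percent)
        print(f"{percent}%: {db:+.1f} dB, {over:.0%} of samples over full scale")
```

Under these assumptions a 400% boost is about +12 dB, so any sample above a quarter of full scale has to be squeezed back down, and that squeezing is where the distortion comes from.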
Start Captioning Any Video in Chrome
Chrome’s built-in Live Caption is a decent starting point for English-only content. For multilingual support, better accuracy, proper formatting, and exportable transcripts, AI-powered captions are the next step.
Install Hearably from the Chrome Web Store and enable AI captions on any video. The model downloads once in the background while you continue browsing, and captions are available on every site from that point forward — no per-site configuration, no subscriptions to individual platforms, no dependence on content creators adding subtitles.
Every video. Every language. Every word.