AI-POWERED LIVE CAPTIONS
💬

Live Captions for Any Website

Real-time AI transcription on any webpage. Whisper and Moonshine models run entirely in your browser — no cloud, minimal latency, and no data leaves your device. 90+ languages with auto-detection.

Real-time enhancement via extension · Or upload a file for free in Studio

🎵
Try it now — drop your file here
MP3, WAV, FLAC, MP4, MOV — 10-second free preview

Live captions have become essential for accessibility, comprehension, and multitasking — but until now, browser-based captioning has been limited to a handful of platforms with their own built-in implementations. YouTube has auto-captions (often inaccurate), Google Meet has live transcription (English-centric), and Chrome itself offers a system-level Live Caption feature that only supports English. If you watch a lecture on Coursera, a webinar on a custom platform, or a foreign-language stream on Twitch, you are on your own. Hearably changes this by bringing AI-powered live captions to every website in Chrome and Edge.

The technology behind Hearably's captions is OpenAI Whisper (and the lighter Moonshine variant), running entirely inside your browser via WebAssembly and ONNX Runtime. When you enable captions, Hearably captures the tab's audio stream using Chrome's tabCapture API, feeds it through a Voice Activity Detection (VAD) gate that filters out silence and background noise, and sends voiced segments to the Whisper model for transcription. The result appears as a styled subtitle overlay directly on the webpage — positioned, colored, and sized to your preferences, draggable to any corner of the screen.

What makes this approach powerful is that the transcription model runs locally in an offscreen document. Your audio never leaves your device. There is no cloud API call, no transcription server, no usage-based billing, and no privacy concern. The VAD gate is critical for performance: rather than transcribing continuous audio (which would overwhelm the model and produce hallucinated text during silence), the system only sends audio segments where speech is detected. This reduces GPU/CPU load by 60-80% and eliminates the phantom text artifacts that plague always-on transcription systems.

Language detection is automatic. Whisper's multilingual model identifies the spoken language from the first few seconds of audio and transcribes accordingly — no manual language selection required. This works across 90+ languages including English, Spanish, French, German, Japanese, Korean, Mandarin, Arabic, Hindi, Portuguese, and dozens more. For multilingual content (a professor switching between English and their native language, for example), the model adapts in real time as the language changes.

The subtitle overlay itself is fully customizable: font size, text color, background opacity, position (top, center, bottom), maximum line count, and fade timing. Captions appear with a smooth text buffer that accumulates partial results and commits final transcriptions — avoiding the jittery word-by-word display that makes most live caption systems hard to read. The overlay is rendered as a lightweight DOM element injected by the content script, positioned above the page content with a high z-index so it works on fullscreen video players, embedded iframes, and complex web apps alike.
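The overlay described above can be sketched as a small style-derivation function. The setting and property names below are illustrative, not Hearably's actual internals, but the key ideas from the text are real: a fixed-position element with a very high z-index so it renders above fullscreen players and iframes.

```typescript
// Hypothetical user settings for the caption overlay.
interface OverlaySettings {
  fontSizePx: number;
  textColor: string;
  backgroundOpacity: number; // 0..1
  position: "top" | "center" | "bottom";
}

// Derive the inline style the content script might apply to the injected
// overlay element. zIndex uses the maximum 32-bit value so the captions
// stay above fullscreen video players and embedded iframes.
function overlayStyle(s: OverlaySettings): Record<string, string> {
  const vertical =
    s.position === "top" ? { top: "5%" } :
    s.position === "center" ? { top: "50%" } :
    { bottom: "5%" };
  return {
    position: "fixed",
    left: "50%",
    transform: "translateX(-50%)",
    zIndex: "2147483647",
    fontSize: `${s.fontSizePx}px`,
    color: s.textColor,
    background: `rgba(0, 0, 0, ${s.backgroundOpacity})`,
    ...vertical,
  };
}
```

Dragging then only needs to update `top`/`bottom`/`left` on pointer events; the rest of the style stays fixed.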

How In-Browser AI Transcription Works

Hearably's live caption pipeline runs entirely within Chrome's extension architecture, using three coordinated components. The content script renders the subtitle overlay on the active webpage. The service worker orchestrates audio capture via chrome.tabCapture.getMediaStreamId(). The offscreen document hosts the actual transcription engine — a Web Worker running Whisper or Moonshine via ONNX Runtime WebAssembly.
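The three-component relay above can be modeled with a small set of message types. The names below (`CaptionMessage`, `routeMessage`) are hypothetical sketches of the architecture described in the text, not Hearably's real API:

```typescript
// Illustrative message shapes for the content script <-> service worker <->
// offscreen document relay. All names here are assumptions for the sketch.
type CaptionMessage =
  | { kind: "start-capture"; tabId: number }             // popup -> service worker
  | { kind: "stream-ready"; streamId: string }           // service worker -> offscreen
  | { kind: "transcript"; text: string; final: boolean } // offscreen -> content script
  | { kind: "stop-capture" };

// Decide which component should handle each message. In the real extension
// the service worker would obtain streamId via chrome.tabCapture.getMediaStreamId()
// before sending "stream-ready" to the offscreen document.
function routeMessage(msg: CaptionMessage): "worker" | "offscreen" | "content" {
  switch (msg.kind) {
    case "start-capture": return "worker";    // worker initiates tab capture
    case "stream-ready":  return "offscreen"; // offscreen doc starts transcribing
    case "transcript":    return "content";   // content script renders the caption
    case "stop-capture":  return "offscreen"; // offscreen doc tears down the model
  }
}
```

Keeping the routing in one discriminated union makes it easy to see, at a glance, which component owns each step of the pipeline.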

Audio flows from the tab capture stream into a ring buffer that continuously stores raw PCM samples. A Voice Activity Detection (VAD) module (Silero VAD, also running in WASM) analyzes the buffer in real time, flagging segments where speech probability exceeds a threshold. Only these voiced segments are sent to the Whisper model for inference. This VAD gating is essential: without it, the model would hallucinate repetitive text during silent passages and consume unnecessary compute cycles.
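A minimal sketch of that VAD gating: scan fixed-size frames of PCM audio and emit only the contiguous runs where a speech-probability function exceeds the threshold. Here `speechProb` stands in for a real VAD model such as Silero; the function and parameter names are assumptions for illustration.

```typescript
// Gate raw PCM samples through a per-frame speech-probability check,
// returning only the voiced segments (runs of consecutive speech frames).
function gateSpeech(
  samples: Float32Array,
  frameSize: number,
  threshold: number,
  speechProb: (frame: Float32Array) => number, // stand-in for a VAD model
): Float32Array[] {
  const segments: Float32Array[] = [];
  let current: number[] = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    const frame = samples.subarray(i, i + frameSize);
    if (speechProb(frame) >= threshold) {
      current.push(...frame); // voiced frame: keep accumulating this segment
    } else if (current.length > 0) {
      segments.push(Float32Array.from(current)); // silence: flush the segment
      current = [];
    }
  }
  if (current.length > 0) segments.push(Float32Array.from(current));
  return segments;
}
```

Only the returned segments would be handed to the transcription model; silent frames never reach it, which is where the compute savings come from.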

The Whisper model itself runs as an ONNX graph in a dedicated Web Worker thread, keeping the main thread and audio processing thread unblocked. Inference on a typical 5-second audio segment takes 200-800ms depending on hardware, producing a text result that is sent back to the offscreen document, relayed to the service worker, and finally dispatched to the content script for rendering. The text buffer in the content script accumulates partial results, applies sentence boundary detection, and renders the final caption with a configurable fade-out. A repetition_penalty parameter in the Whisper decoder prevents the common failure mode where the model loops on repeated phrases.
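The text-buffer behavior described above — partial results overwrite a pending line while final results are committed at sentence boundaries — can be sketched as follows. This is an illustrative model of the behavior, not Hearably's source; the class and method names are assumptions.

```typescript
// Accumulate caption text: partials replace the live line, finals are split
// at naive sentence boundaries (. ! ?) and committed for stable rendering.
class CaptionBuffer {
  private pending = "";
  private committed: string[] = [];

  push(text: string, final: boolean): void {
    if (!final) {
      this.pending = text; // partial result: replace, don't append
      return;
    }
    this.pending = "";
    // naive sentence boundary detection on ., !, ?
    for (const s of text.match(/[^.!?]+[.!?]?/g) ?? []) {
      const t = s.trim();
      if (t) this.committed.push(t);
    }
  }

  // The last `maxLines` committed sentences plus the live partial line.
  visible(maxLines: number): string[] {
    const lines = this.committed.slice(-maxLines);
    return this.pending ? [...lines, this.pending] : lines;
  }
}
```

Because committed sentences never change once rendered, the overlay avoids the jittery word-by-word redraws the text mentions.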

How to get the most out of Live Captions for Any Website

1

Enable captions on lecture and course platforms

Hearably captions work on Coursera, Udemy, edX, Khan Academy, and any platform that plays audio in the browser. Unlike platform-specific captions that may be inaccurate or unavailable, Hearably generates fresh transcriptions from the actual audio using Whisper AI.

2

Use auto language detection for foreign content

Watching a video in a language you are learning? Hearably auto-detects the spoken language and transcribes in that language. No manual selection needed. This works across 90+ languages and adapts in real time if the speaker switches languages.

3

Customize subtitle appearance for readability

Adjust font size, text color, background opacity, and position in the extension settings. For video content, bottom-center with a semi-transparent dark background works best. For web pages with text, top-right minimizes visual interference.

4

Drag the subtitle overlay to any position

The caption overlay is draggable — click and drag it to any corner of the screen. This is especially useful on fullscreen video where the default position might overlap with on-screen controls or existing subtitles.

5

Combine with volume boost for hearing accessibility

For users with hearing difficulties, enable both captions and volume boost simultaneously. The captions provide visual confirmation of speech while the boost ensures maximum audibility. Voice Boost mode at 200-300% plus captions covers both visual and auditory channels.

6

Captions work on video calls and webinars

Enable Hearably captions on Zoom Web, Google Meet, Microsoft Teams (web), or any browser-based video call. The transcription captures all audio from the tab, including remote participants, making it ideal for accessibility in meetings.

7

Privacy-first: no audio leaves your device

Unlike cloud transcription services that send your audio to external servers, Hearably runs Whisper entirely in your browser. Your audio data stays on your machine. This is critical for confidential meetings, medical consultations, and any scenario where audio privacy matters.

Built for this exact use case

💬

Real-Time AI Captions

Whisper and Moonshine AI models transcribe audio in real time, directly in your browser. Captions appear as a styled overlay on any webpage — no cloud processing, no delay, no privacy compromise.

🌍

90+ Languages Auto-Detected

Whisper automatically identifies the spoken language and transcribes accordingly. English, Spanish, Japanese, Arabic, Hindi, and 85+ more — no manual language selection required.

🎨

Customizable Subtitle Overlay

Adjust font size, color, background opacity, position, and line count. Drag the overlay anywhere on screen. Smooth text buffering eliminates jittery word-by-word display.

🔇

VAD-Gated Transcription

Voice Activity Detection filters silence and noise before transcription. This prevents hallucinated text during quiet passages and reduces CPU usage by 60-80% compared to always-on transcription.

Choose your method

Different situations call for different tools. Hearably gives you both.

REAL-TIME

Chrome Extension

Enhance audio live while you stream. The extension intercepts your tab's audio and processes it in real time — volume boost, EQ, presets — without downloading anything.

Best for:
  • Streaming on YouTube, Netflix, Spotify
  • Video calls on Zoom, Meet, Teams
  • Any website with audio
  • When you want instant, always-on enhancement
Add to Chrome — Free
FILE-BASED
🎛️

Free Online Studio

Upload an audio or video file, apply volume boost + 10-band EQ, preview in real time, then download the enhanced WAV. Your file never leaves your browser.

Best for:
  • Downloaded videos or music files
  • Podcast episodes you want to boost before sharing
  • Voice recordings, lectures, interviews
  • When you need a permanently enhanced file
Open Free Studio

Pro tip: Use a YouTube-to-MP3 tool to download the audio, then enhance it in Hearably Studio with EQ + volume boost. Perfect for offline listening, DJ sets, or sharing on social media.

Three clicks to better audio

1

Install

Add Hearably from the Chrome Web Store. Under 300KB, installs in seconds.

2

Enhance

Click the Hearably icon and tap "Enhance." Boost kicks in instantly.

3

Enjoy

Adjust volume, EQ, and presets. Works on any website with audio.

Frequently asked questions

Does the transcription run in the cloud?

No. Hearably runs Whisper (or the lighter Moonshine variant) entirely in your browser using WebAssembly and ONNX Runtime. Audio never leaves your device. There is no cloud API, no server, and no usage-based billing.

How accurate are the live captions?

Whisper is one of the most accurate speech recognition models available, trained on 680,000 hours of multilingual audio. Accuracy varies by language and audio quality, but for clear speech in supported languages, word error rates are typically 5-10% — comparable to or better than YouTube auto-captions.

What languages are supported?

Over 90 languages including English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin Chinese, Arabic, Hindi, Russian, Turkish, Dutch, Polish, and many more. The model auto-detects the spoken language.

Does it work on Netflix, YouTube, and other streaming sites?

Yes. Hearably captions work on any website that plays audio in Chrome or Edge — YouTube, Netflix, Twitch, Spotify podcasts, Coursera, Zoom Web, Google Meet, and any other site. The captions are generated from the tab audio stream, independent of the platform.

How much CPU does live transcription use?

With VAD gating, CPU usage is modest — the Whisper model only runs when speech is detected. On modern hardware (2020+), expect 10-25% CPU usage during active speech. During silence, usage drops to near zero. The model runs in a Web Worker thread, so it does not block the main browser UI.

Can I export the transcription as text or SRT?

The live caption feature is designed for real-time viewing. For exportable transcriptions, use Hearably Studio's caption generator which produces SRT and plain text files from uploaded audio/video files.

How is this different from Chrome's built-in Live Caption?

Chrome's built-in Live Caption supports English plus a handful of recently added languages, uses a smaller on-device model, and renders captions in a system-level panel outside the browser window. Hearably supports 90+ languages, uses the more accurate Whisper model, and renders captions directly on the webpage as a styled, draggable overlay.

Does it work with fullscreen video?

Yes. The subtitle overlay has a high z-index and is rendered above all page content, including fullscreen video players. You can drag the overlay to any position on the fullscreen display.

Add AI captions to any website — instantly

Real-time Whisper transcription, 90+ languages, zero cloud. Install free and start captioning any audio in your browser.

Real-Time Enhancement

Boost audio live while you stream, browse, or call. Works on every website.

Add to Chrome — Free
Chrome & Edge · Under 300KB
OR
🎛️

Boost a File Online

Upload an MP3, WAV, or video file. Enhance with EQ & volume boost. Download instantly.

Open Free Studio
No signup · No upload to servers · 100% in-browser

Want to check your levels first? Try our free dB meter.