Building Ramble #2: Capturing Audio in Real-Time

The unexpected challenges of streaming microphone audio through modern browser APIs

By Ricardo Amaral

While building Ramble, we needed real-time audio capture to stream microphone input to our backend for processing. The initial implementation used a Web Audio API feature called ScriptProcessorNode, which made this straightforward: create an inline callback, process audio samples, send them over a WebSocket. No extra files, no complex setup. But there was a catch. The API was deprecated, and browsers could drop support at any time. Rather than wait for Ramble to break unexpectedly, we migrated to the modern replacement: AudioWorklet.

This write-up covers the interesting challenges involved: understanding why the old API had to go, implementing the worklet processor itself, getting the worklet module to load correctly across our multi-CDN environment, and discovering that our desktop app was keeping computers awake.

Migrating from a Deprecated API

ScriptProcessorNode has been deprecated in the Web Audio specification for years. Chrome introduced its replacement, AudioWorklet, back in version 64 (December 2017), and it shipped enabled by default in Chrome 66. The deprecation wasn’t arbitrary. The old API had a fundamental design flaw: it ran audio processing on the main UI thread.

When your audio callback competes with React rendering, animations, and DOM updates, you get problems. The UI can freeze briefly, causing audio glitches. Or audio processing gets delayed, causing latency. Google’s announcement put it bluntly: “the event handling is asynchronous by design, and the code execution happens on the main thread… causing either UI to jank or audio to glitch.”

AudioWorklet solves this by running audio code on a dedicated audio rendering thread, completely separate from the main thread. The trade-off is complexity: instead of an inline callback, you now need a separate JavaScript module that the browser loads and executes in that audio thread. The deprecation was our main driver for migrating, but for Ramble (an AI feature that streams voice input in real-time), the improved latency and reliability were welcome bonuses.

Inside the Worklet

The worklet itself lives in a standalone JavaScript file. It defines a class extending AudioWorkletProcessor with a single method: process(). The Web Audio API calls this method automatically, typically delivering around 128 audio samples at a time (a few milliseconds of audio per call, depending on the device’s sample rate).

Sending 128 samples over a WebSocket every few milliseconds would be inefficient. Instead, we accumulate samples into a buffer:

process(inputs) {
    // First channel of the first input: the mono microphone signal
    const inputChannel = inputs[0]?.[0]
    
    if (inputChannel) {
        for (const sample of inputChannel) {
            this.buffer[this.bufferIndex++] = sample
            
            // Buffer full: copy it, ship it to the main thread, start over
            if (this.bufferIndex >= this.bufferSize) {
                this.port.postMessage(new Float32Array(this.buffer))
                this.bufferIndex = 0
            }
        }
    }
    
    // Returning true keeps the processor alive for the next render quantum
    return true
}

We chose a buffer size of 2048 samples, roughly 40-50ms of audio depending on the device’s sample rate. Larger buffers mean fewer messages but higher latency; smaller buffers mean more overhead. 2048 felt like a reasonable middle ground for voice input.
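For reference, the surrounding worklet file is roughly shaped like the sketch below. The class and processor names are illustrative rather than taken from our source; the real file wires up the 2048-sample buffer described above and registers the processor so the main thread can instantiate it.

// Illustrative sketch of the worklet module's overall shape; the class and
// processor names here are made up, not Ramble's actual identifiers
class RambleCaptureProcessor extends AudioWorkletProcessor {
    constructor() {
        super()
        // Accumulation buffer described above: roughly 40-50ms of audio
        this.bufferSize = 2048
        this.buffer = new Float32Array(this.bufferSize)
        this.bufferIndex = 0
    }

    process(inputs) {
        // ... the accumulation loop shown earlier
        return true
    }
}

// Makes the processor available to AudioWorkletNode under this name
registerProcessor('ramble-capture-processor', RambleCaptureProcessor)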

When the buffer fills, we send it to the main thread via postMessage. The main thread then resamples to 16kHz (what our backend expects), converts the floating-point samples to 16-bit integers, base64-encodes them, and sends them over the WebSocket.
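As a rough sketch of that main-thread step (resampleTo16kHz, workletNode, and webSocket below are placeholders for the real plumbing, and the actual message framing is app-specific):

// Rough sketch of the main-thread conversion; helper and variable names
// are placeholders, not Ramble's actual code
function floatTo16BitPCM(samples) {
    const pcm = new Int16Array(samples.length)
    for (let i = 0; i < samples.length; i++) {
        // Clamp to [-1, 1], then scale into the signed 16-bit range
        const s = Math.max(-1, Math.min(1, samples[i]))
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff
    }
    return pcm
}

function toBase64(pcm) {
    let binary = ''
    for (const byte of new Uint8Array(pcm.buffer)) {
        binary += String.fromCharCode(byte)
    }
    return btoa(binary)
}

workletNode.port.onmessage = (event) => {
    // event.data is the Float32Array posted by the worklet
    const samples = resampleTo16kHz(event.data, audioContext.sampleRate)
    webSocket.send(toBase64(floatTo16BitPCM(samples)))
}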

One trade-off we accepted: if a session ends with a partial buffer, those samples are lost. In practice, this is fine. Sessions end either when users click “add tasks” (after they’ve finished speaking) or via voice commands like “that’s it” (which the backend processes before closing the session). By then, the microphone is capturing silence anyway.

Serving the Worklet Module

Unlike regular JavaScript, worklet modules can’t be bundled with your application code. They must be loaded separately via audioContext.audioWorklet.addModule(url). This introduced complications 😬

We serve Todoist from multiple CDNs, with a watchdog system that dynamically selects one based on availability and performance. The worklet URL needs to use whichever CDN was selected. We also use content-hashing for cache busting, so the filename changes with each deployment.

Our solution: at build time, we copy the worklet file as ramble-audio-capture.[contenthash].js and inject the filename into the HTML via a data attribute. At runtime, we read that attribute, prepend the selected CDN base URL, and call addModule(). We trigger this load early during app initialization so the worklet is ready by the time a user opens Ramble. If loading fails for any reason, we detect it and show an error before they try to record.
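At runtime, the loading half looks roughly like this (the data attribute name, selectedCdnBaseUrl, and reportWorkletLoadFailure are illustrative stand-ins):

// Illustrative sketch of loading the worklet module at runtime
async function loadRambleWorklet(audioContext) {
    // The build step injects the content-hashed filename, e.g.
    // <html data-ramble-worklet="ramble-audio-capture.abc123.js">
    const filename = document.documentElement.dataset.rambleWorklet
    const url = `${selectedCdnBaseUrl}/${filename}`

    try {
        await audioContext.audioWorklet.addModule(url)
    } catch (error) {
        // Surface the failure now so the UI can warn before recording starts
        reportWorkletLoadFailure(error)
        throw error
    }
}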

Electron and the Sleep Problem

Here’s something we didn’t anticipate: in our desktop app, computers stopped going to sleep while Todoist was open 😴

Browsers start AudioContext in a suspended state due to autoplay policies. But Electron doesn’t have those restrictions, so the context starts running immediately. A running audio context means an active audio stream at the OS level, and operating systems interpret that as “something is playing audio, don’t sleep.”

We found this through user reports. The fix was manual lifecycle management: suspend the context immediately after creating it in Electron, resume it when the user actually starts a Ramble session, then suspend again when the session ends. Not complicated once we understood the cause, but definitely not something we’d planned for.
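Sketched out, the lifecycle looks something like this (isElectron and the session functions are stand-ins for however the app detects the platform and wires up sessions):

// Sketch of the Electron-specific suspend/resume lifecycle
const audioContext = new AudioContext()

if (isElectron) {
    // No autoplay policy in Electron, so the context starts "running" and keeps
    // an OS-level audio stream open, which prevents the machine from sleeping
    await audioContext.suspend()
}

async function startRambleSession() {
    if (audioContext.state === 'suspended') {
        await audioContext.resume()
    }
    // ... connect the microphone stream and worklet node
}

async function endRambleSession() {
    // ... tear down the stream
    await audioContext.suspend()
}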

Wrapping Up

On paper, this migration was simple: replace a deprecated API with its modern equivalent. In practice, each section of this write-up represents something we didn’t anticipate. Thread boundaries, CDN quirks, OS sleep behavior—all discoveries made along the way.

Getting the worklet to load reliably was the most frustrating part. It worked locally, broke in staging, worked again after fixes, then failed in edge cases we hadn’t considered. It took a few iterations, but we got there.

Ramble now runs on a foundation that won’t break when browsers eventually drop ScriptProcessorNode. We also got the benefits of off-thread audio processing: lower latency, no competition with UI rendering. Worth doing before it became urgent.