
How we built Wispr Flow's floating pill

How we reconstructed Wispr Flow's floating pill in Electron. It's a non-focusable window that floats above other apps, reacts to hotkeys, and turns your mic into a live waveform.

Matthew Wang · Maintainer, Freestyle · Jun 8, 2026 · 6 min read

For the past two weeks, Aditya and I have been building Freestyle, an open-source alternative to Wispr Flow. We were both Wispr Flow addicts, but we felt that voice dictation is a commodity and should be free and local-first.

We thought building it would be a cakewalk. Two weeks in, we've taken that assumption back. Getting sub-second latency and good-enough accuracy is what separates a weekend project from a real product, and it's the part we're still chasing.

I want to share what we've learned along the way. This article covers how we built the floating pill.

The pill is its own Electron window

We built Freestyle in Electron, a framework for building desktop apps with web technologies (React, in our case) running inside a Chromium window. In Wispr Flow and Freestyle, the pill is the little capsule that pops up when you talk, shows a waveform, then disappears.

The floating pill, mid-dictation.

The trick to rendering the pill is that it isn't part of the main app window. The pill is a separate, floating BrowserWindow, the Electron API for creating and controlling windows. If you want to learn more about it, see the Electron docs.

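import { BrowserWindow } from "electron";

const pillWindow = new BrowserWindow({
  width: 260,
  height: 90,
  frame: false,       // no title bar or window chrome
  transparent: true,  // see-through window background
  hasShadow: false,
  resizable: false,
  alwaysOnTop: true,
  skipTaskbar: true,
  focusable: false,   // the pill never takes keyboard focus
  ...(process.platform === "darwin" ? { type: "panel" } : {}),
});

// Raise the z-level and keep the pill visible across workspaces.
pillWindow.setAlwaysOnTop(true, "screen-saver");
pillWindow.setVisibleOnAllWorkspaces(true, { visibleOnFullScreen: true });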

transparent: true makes the window's background see-through, so the window itself is invisible. The rounded pill is just an HTML element sitting inside a larger transparent rectangle. That's what makes it look like a free-floating shape.

We set focusable: false because the pill should never take keyboard focus. The app has to paste text into whatever you're typing in without stealing focus from it. For the same reason, we show the pill by calling pillWindow.showInactive() instead of show(), so the window appears without being activated and the editor keeps focus.
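
To make the focus behavior concrete, here is a minimal sketch of a show/hide helper. The names setPillVisible and PillWindowLike are hypothetical and not taken from the Freestyle codebase. The interface lists only the three BrowserWindow methods the helper calls (isVisible, showInactive, hide), so a real BrowserWindow should fit it.

```typescript
// Hypothetical helper, not from the Freestyle codebase.
// PillWindowLike lists only the BrowserWindow methods used below.
interface PillWindowLike {
  isVisible(): boolean;
  showInactive(): void;
  hide(): void;
}

// Show or hide the pill without ever activating its window.
function setPillVisible(win: PillWindowLike, visible: boolean): void {
  if (visible && !win.isVisible()) {
    // showInactive() displays the window but leaves focus where it was.
    win.showInactive();
  } else if (!visible && win.isVisible()) {
    win.hide();
  }
}
```

Calling setPillVisible(pillWindow, true) when dictation starts and setPillVisible(pillWindow, false) when it ends would be one way to wire it up. The visibility checks only make repeated calls harmless.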

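Lastly, setAlwaysOnTop() with the screen-saver level sets the z-level, ensuring the pill shows up above other apps. setVisibleOnAllWorkspaces() keeps the pill visible as you switch workspaces, and visibleOnFullScreen extends that to fullscreen windows.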

Pill responding to hotkey presses over IPC

All of the hotkey handling lives in Electron's main process, not in the pill. To tell the pill when the user presses and releases the hotkey, the main process sends it IPC messages:

pillWindow.webContents.send("hotkey:down"); // start recording
pillWindow.webContents.send("hotkey:up");   // stop & transcribe
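
On the receiving side, something has to listen for those two channels. Below is a rough sketch of what that subscription could look like. subscribeHotkeys, IpcLike, and HotkeyHandlers are hypothetical names. IpcLike is a structural stand-in with the same on/removeListener shape as Electron's ipcRenderer, which a renderer would typically reach through a preload script. Only the channel names come from the snippet above.

```typescript
// Structural stand-in for the parts of Electron's ipcRenderer used here.
// Hypothetical type, defined so the sketch runs without Electron.
interface IpcLike {
  on(channel: string, listener: (...args: unknown[]) => void): unknown;
  removeListener(channel: string, listener: (...args: unknown[]) => void): unknown;
}

// Hypothetical callbacks the pill would provide.
interface HotkeyHandlers {
  onDown(): void;
  onUp(): void;
}

// Subscribe to both hotkey channels and return an unsubscribe function.
function subscribeHotkeys(ipc: IpcLike, handlers: HotkeyHandlers): () => void {
  const down = (): void => handlers.onDown();
  const up = (): void => handlers.onUp();

  ipc.on("hotkey:down", down);
  ipc.on("hotkey:up", up);

  return () => {
    ipc.removeListener("hotkey:down", down);
    ipc.removeListener("hotkey:up", up);
  };
}
```

Returning the unsubscribe function makes it easy to clean up the listeners when the pill's component unmounts.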

The pill has its own state machine, idle → initializing → recording → transcribing → idle, that responds to inputs from the main process.
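
As an illustration, the four states can be modeled as a small pure transition function. This is a sketch, not Freestyle's actual implementation. The state names and the hotkey:down and hotkey:up events come from this article, while the mic:ready and transcript:done events, and the choice to cancel when the key is released before the mic is ready, are assumptions made for the example.

```typescript
type PillState = "idle" | "initializing" | "recording" | "transcribing";

// "mic:ready" and "transcript:done" are hypothetical event names.
type PillEvent = "hotkey:down" | "hotkey:up" | "mic:ready" | "transcript:done";

// Pure transition function: given a state and an event, return the next state.
// Events that make no sense in the current state are ignored.
function transition(state: PillState, event: PillEvent): PillState {
  switch (state) {
    case "idle":
      return event === "hotkey:down" ? "initializing" : state;
    case "initializing":
      if (event === "mic:ready") return "recording";
      // Assumption: releasing the key before the mic is ready cancels.
      if (event === "hotkey:up") return "idle";
      return state;
    case "recording":
      return event === "hotkey:up" ? "transcribing" : state;
    case "transcribing":
      return event === "transcript:done" ? "idle" : state;
  }
}
```

Keeping the transitions in one pure function means stray events, such as a second hotkey:down while transcribing, are dropped instead of corrupting the pill's state.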

Making the pill's waveform feel alive

The fun part about building the pill is creating its reactive waveform.

The pill itself is the window that activates your mic. Since it's a Chromium window, it grabs the mic via the getUserMedia() API.

From there, it connects the mic stream to an AnalyserNode, which analyzes the audio and hands back snapshots of how much energy is in each frequency band. Those snapshots are what become the bars.

const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
});

const ctx = new AudioContext();
const source = ctx.createMediaStreamSource(stream);
const analyser = ctx.createAnalyser();
analyser.fftSize = 256;
source.connect(analyser);

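Each frame, we take that frequency snapshot, slice it into 14 buckets, one per bar, and average the energy in each to get a target height. With fftSize set to 256, the analyser exposes 128 frequency bins, so each bar averages 9 bins and the top 2 bins go unused.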

const BARS = 14;
const RISE = 0.55;
const FALL = 0.22;

function smoothBars(current: number[], target: number[]): number[] {
  return current.map((now, i) => {
    const goal = target[i] ?? 0;
    const speed = goal > now ? RISE : FALL;
    return now + (goal - now) * speed;
  });
}

function nextFrame(analyser: AnalyserNode, bars: number[]): number[] {
  const spectrum = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(spectrum);

  const binsPerBar = Math.floor(spectrum.length / BARS);
  const target = Array.from({ length: BARS }, (_, i) => {
    let sum = 0;
    for (let j = 0; j < binsPerBar; j++) sum += spectrum[i * binsPerBar + j];
    return sum / binsPerBar / 255;
  });

  return smoothBars(bars, target);
}

That asymmetry in smoothBars, where bars rise at 0.55 but fall at 0.22, gives the waveform its smooth, voice-y feel: it jumps with your voice, then settles with a decay. Lastly, the bars are all rendered as plain SVG lines, and a requestAnimationFrame loop drives the motion. This avoids a React re-render on every frame, which could drop frames.
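
To see how much the rise and fall rates differ in practice, here is a small worked example for a single bar. It reuses the RISE and FALL constants and the smoothing rule from smoothBars. The step and framesToSettle helpers and the 5% tolerance are illustrative assumptions, not part of Freestyle.

```typescript
const RISE = 0.55;
const FALL = 0.22;

// One smoothing step for a single bar, using the same rule as smoothBars.
function step(now: number, goal: number): number {
  const speed = goal > now ? RISE : FALL;
  return now + (goal - now) * speed;
}

// Hypothetical helper: frames until the bar is within `tolerance` of its goal.
function framesToSettle(start: number, goal: number, tolerance = 0.05): number {
  let now = start;
  let frames = 0;
  while (Math.abs(goal - now) > tolerance) {
    now = step(now, goal);
    frames++;
  }
  return frames;
}

console.log(framesToSettle(0, 1)); // 4 frames to rise from silence to full height
console.log(framesToSettle(1, 0)); // 13 frames to fall back to silence
```

Assuming a 60 Hz display, that is roughly 67 ms to rise and 217 ms to fall, so a bar reacts quickly when you start speaking and lingers briefly once you stop.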

What's next

First, I want to thank everyone in the community who has contributed to the project in any way.

There's still a lot of work to do to build an open-source dictation app that feels native and has low latency. We want to keep refining how we use local models, post-processing, and the rendering of things like this pill.

We're looking to build a community of devs interested in working in the OSS voice space. If you found this article interesting, please check out the GitHub repo and consider joining the community!

Matthew Wang, maintainer of Freestyle Voice
