I Made Claude Code Talk

The Moment

I was 90 minutes into a session. Claude finished a task, dropped 400 words explaining what it did and why, and I had to stop. Stop touching the keyboard. Stop thinking. And read! When the fuck did I learn how to read?

That's when I went — wait, why am I doing this...

I've got a terminal open, a diff open, three files I'm about to touch... and I'm spending 15 seconds doing the one thing that breaks the whole loop. Eyes off the code. Brain switches modes. Parse the wall of text. Confirm it did the thing. Eyes back. Re-acquire context. Keep going. Every. Single. Response.

That's not minor friction. That's a cognitive context switch on every agent reply, and I was just... accepting it. Like it was normal. Like that's just how this works now.

I don't know if it bothers other people the same way it bothers me — maybe most devs are fine with it — but I couldn't stop thinking about it once I noticed it. So I built the thing that makes it stop.

How It Works

claude-code-tts hooks into Claude Code's lifecycle events... specifically the hooks that fire when an agent starts, when it completes, when it hits a notification. Those hooks pipe the response text to a local TTS daemon running in the background.

The daemon is the part that actually matters. First time I prototyped this, there was a cold-start delay every single time — 1.2 seconds before any voice came out, which is honestly worse than just reading. So the daemon solves that... it keeps the model warm, first word hits in under 200ms. That's the difference between audio that integrates into your flow and audio that interrupts it.

Voices come from Microsoft Edge TTS — 21 neural voices, no API key, no account, no $0.015 per character quietly ticking up while you're heads-down. Free. And not "good for free" — just actually good.

npx @domdhi/claude-code-tts

That's it. One command. Configures itself, starts the daemon, wires the hooks. No editing config files. I hate tools that make you do setup rituals.

The Stupid Fun Part I Didn't Expect to Care About

Okay so the natural language voice selection thing... I thought this was a gimmick when I was building it. It's not a gimmick, lol.

You don't pick a voice ID. You describe what you want:

"british male"
"bubbly girl faster"
"calm deep voice"
"professional female slower"

The system figures out which of the 21 voices matches and sets the rate. I've got 30+ configured agents in my setup — gave each one a different voice. Orchestrator sounds different from the code reviewer. QA engineer sounds different from the architect. After a week of this you just know which agent is talking without looking at anything.

I realize that sounds insane. It kind of is. But it works and I'm not changing it.

Offline? There's a kokoro-onnx fallback. Quality drops a bit but it's fully local, zero network call. Either way you're covered.

What Actually Changed

Here's what I expected: I'd hear confirmations instead of reading them, feel slightly less annoyed, whatever, ship it.

Here's what actually happened — my eyes stopped leaving the code. Not sometimes. Basically never. Claude finishes, I hear the done signal, I'm already in the next file. I'm not reading "I've successfully refactored the route handler to use the new pattern" — I'm listening to it while my cursor is already somewhere else.

That sounds like a small thing. It isn't.

When you're reading, you're in receive mode. You're parsing language, confirming what happened, deciding next steps — sequentially, one thing at a time. When you're listening... your brain can stay in build mode. The audio comes in on a different channel. You process it without stopping whatever else you're processing.

I don't know the neuroscience of why this works, and honestly I don't care. I just know that after about three days your brain rewires around it... and going back to silent text walls feels like playing a game with the sound off. Technically functional. Noticeably, stubbornly worse. You can do it but why would you.

The workflow change is bigger than the tool itself. The tool is just hooks and a daemon and 21 voices. The change is what happens to your cognitive loop when Claude talks to you instead of displaying at you. That's the shit that's actually different.

What It Doesn't Do

It doesn't read every character of every response out loud — nobody wants 80 lines of TypeScript narrated character by character. Long code blocks get summarized or skipped. There's configuration around what triggers speech vs. what passes through silently, and you can tune it.

Also not magic for every workflow. If you're doing rapid back-and-forth where you're actively steering with quick prompts, the audio can feel like lag. The sweet spot is longer autonomous tasks — give Claude something to go build, keep working, listen for when it's done. That's when the parallel processing really clicks.

Session pooling handles multiple Claude Code instances running at the same time. I usually have 2-3 going in parallel... each gets its own voice config so they don't step on each other.

Get It

npx @domdhi/claude-code-tts

MIT licensed. Repo is at github.com/Domdhi/claude-code-tts if you want to dig into per-agent routing or tune what triggers speech.

I can't go back to silence.