Use Voice Mode with Hermes

This guide is the practical companion to the Voice Mode feature reference.

The feature page explains what voice mode can do; this guide shows how to use it well in practice.

What voice mode is good for

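Voice mode is especially useful when:

  - you want hands-free use while coding or researching
  - typing is inconvenient
  - you want spoken replies on your phone through Telegram
  - you want a live conversation with Hermes in a Discord voice channel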

Choose your voice mode setup

Hermes offers three distinct voice experiences.

| Mode | Best for | Platform |
|---|---|---|
| Interactive microphone loop | Personal hands-free use while coding or researching | CLI |
| Voice replies in chat | Spoken responses alongside normal messaging | Telegram, Discord |
| Live voice channel bot | Group or personal live conversation in a voice channel (VC) | Discord voice channels |

A good path is:

  1. get text working first
  2. enable voice replies second
  3. move to Discord voice channels last if you want the full experience

Step 1: make sure normal Hermes works first

Before touching voice mode, verify that Hermes works in plain text mode. Start it:

hermes

Ask something simple:

What tools do you have available?

If that does not work reliably yet, fix text mode first.

Step 2: install the right extras

CLI microphone + playback

pip install "hermes-agent[voice]"

Messaging platforms

pip install "hermes-agent[messaging]"

Premium ElevenLabs TTS

pip install "hermes-agent[tts-premium]"

Local NeuTTS (optional)

python -m pip install -U "neutts[all]"

Everything

pip install "hermes-agent[all]"

Step 3: install system dependencies

macOS

brew install portaudio ffmpeg opus
brew install espeak-ng

Ubuntu / Debian

sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng

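Why these matter:

  - portaudio: microphone capture and audio playback for CLI voice mode
  - ffmpeg: audio format conversion
  - opus: the audio codec used by Discord voice
  - espeak-ng: required by local NeuTTS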
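To confirm the system pieces are in place before moving on, a small standalone check like the one below can help. It is a hypothetical helper, not part of Hermes: it looks for ffmpeg and espeak-ng on your PATH and asks ctypes for the portaudio and opus shared libraries. `ctypes.util.find_library` can miss libraries installed under non-default prefixes (for example Homebrew on Apple Silicon), so treat a miss as a hint rather than proof.

```python
import ctypes.util
import shutil

# Hypothetical pre-flight check (not a Hermes command).
# Command-line tools looked up on PATH; espeak-ng only matters if you use NeuTTS.
REQUIRED_BINARIES = ["ffmpeg", "espeak-ng"]
# Shared libraries; find_library takes the bare name ("portaudio", not "libportaudio.so").
REQUIRED_LIBRARIES = ["portaudio", "opus"]


def missing_dependencies() -> list[str]:
    """Return the names of system dependencies that could not be located."""
    missing = [name for name in REQUIRED_BINARIES if shutil.which(name) is None]
    missing += [name for name in REQUIRED_LIBRARIES if ctypes.util.find_library(name) is None]
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All system dependencies found.")
```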

Step 4: choose STT and TTS providers

Hermes supports both local and cloud speech stacks.

Easiest / cheapest setup

Use local STT and free Edge TTS:

  - STT provider: local (no API key needed)
  - TTS provider: edge (free)

This is usually the best place to start.

Environment file example

Add to ~/.hermes/.env:

# Cloud STT options (local needs no key)
GROQ_API_KEY=***
VOICE_TOOLS_OPENAI_KEY=***

# Premium TTS (optional)
ELEVENLABS_API_KEY=***
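If you are unsure which optional keys you have actually filled in, the sketch below reads the env file and reports them. It is a hypothetical helper, not how Hermes itself loads the file: it assumes simple KEY=VALUE lines, skips comments and blank lines, and does not handle `export` prefixes or inline comments.

```python
from pathlib import Path

# Hypothetical helper (not part of Hermes).
# Keys named in the .env example; local STT and Edge TTS need none of them.
VOICE_KEYS = ["GROQ_API_KEY", "VOICE_TOOLS_OPENAI_KEY", "ELEVENLABS_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one style of matching quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_voice_keys(env_path: Path = Path.home() / ".hermes" / ".env") -> list[str]:
    """Return which voice-related keys have a non-empty value in the env file."""
    if not env_path.exists():
        return []
    values = parse_env(env_path.read_text())
    return [key for key in VOICE_KEYS if values.get(key)]
```

For example, `parse_env("# comment\nGROQ_API_KEY=abc\n")` returns `{"GROQ_API_KEY": "abc"}`.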

Provider recommendations

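Speech-to-text

  - local: runs on your machine and needs no API key
  - Groq: cloud, needs GROQ_API_KEY
  - OpenAI: cloud, needs VOICE_TOOLS_OPENAI_KEY

Text-to-speech

  - Edge: free, no API key
  - ElevenLabs: premium, needs ELEVENLABS_API_KEY and hermes-agent[tts-premium]
  - NeuTTS: local, needs neutts and espeak-ng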

If you use hermes setup

If you choose NeuTTS in the setup wizard, Hermes checks whether neutts is already installed. If it is missing, the wizard explains that NeuTTS needs the Python package neutts and the system package espeak-ng, and offers to install both. It installs espeak-ng with your platform's package manager and then runs:

python -m pip install -U neutts[all]

If you skip that install or it fails, the wizard falls back to Edge TTS.
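The wizard's fallback rule can be summarized as a tiny decision function. This illustrates the behavior described above and is not Hermes's actual code; detecting espeak-ng with `shutil.which` is an assumption about how one might check for the system package.

```python
import importlib.util
import shutil


def neutts_available() -> bool:
    """True if the neutts package is importable and espeak-ng is on PATH (illustrative check)."""
    return importlib.util.find_spec("neutts") is not None and shutil.which("espeak-ng") is not None


def choose_tts_provider(requested: str) -> str:
    """Fall back to Edge TTS when NeuTTS was requested but is not installed."""
    if requested == "neutts" and not neutts_available():
        return "edge"
    return requested
```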

A typical starting configuration for voice, STT, and TTS:

voice:
  record_key: "ctrl+b"
  max_recording_seconds: 120
  auto_tts: false
  silence_threshold: 200
  silence_duration: 3.0

stt:
  provider: "local"
  local:
    model: "base"

tts:
  provider: "edge"
  edge:
    voice: "en-US-AriaNeural"

This is a good conservative default for most people.

If you want local TTS instead, switch the tts block to:

tts:
  provider: "neutts"
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu

Use case 1: CLI voice mode

Turn it on

Start Hermes:

hermes

Inside the CLI:

/voice on

Recording flow

Default key: Ctrl+B (set with voice.record_key)

Workflow:

  1. press Ctrl+B
  2. speak
  3. wait for silence detection to stop recording automatically
  4. Hermes transcribes and responds
  5. if TTS is on, it speaks the answer
  6. the loop can automatically restart for continuous use
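The workflow above is essentially a loop. The sketch below expresses it with hypothetical callables (record, transcribe, respond, speak) standing in for the microphone, STT provider, agent, and TTS provider; it is not Hermes's internal API. `max_turns` bounds the loop so the sketch terminates, and an empty recording ends it early.

```python
from typing import Callable


def voice_loop(
    record: Callable[[], bytes],       # hypothetical: returns audio once silence ends the take
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    tts_enabled: bool,
    max_turns: int,
) -> list[str]:
    """Run up to max_turns record -> transcribe -> respond -> (speak) cycles."""
    replies = []
    for _ in range(max_turns):
        audio = record()
        if not audio:
            break  # nothing captured: stop instead of looping forever
        text = transcribe(audio)
        reply = respond(text)
        replies.append(reply)
        if tts_enabled:
            speak(reply)  # only when TTS is on
    return replies
```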

Useful commands

/voice
/voice on
/voice off
/voice tts
/voice status

Good CLI workflows

Walk-up debugging

Say:

I keep getting a docker permission error. Help me debug it.

Then continue hands-free:

Research / brainstorming

Great for:

Accessibility / low-typing sessions

If typing is inconvenient, voice mode is one of the fastest ways to stay in the full Hermes loop.

Tuning CLI behavior

Silence threshold

If recording starts or stops too aggressively, tune:

voice:
  silence_threshold: 250

A higher threshold makes detection less sensitive.
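To see why a higher threshold means less sensitivity, here is a minimal sketch of amplitude-based silence detection. It assumes the threshold is compared against the root-mean-square (RMS) level of 16-bit PCM audio chunks; Hermes may measure loudness differently, so treat the numbers as illustrative.

```python
import math
import struct


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM samples."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_silent(chunk: bytes, silence_threshold: float = 200) -> bool:
    """Assumed rule: a chunk counts as silence when its RMS falls below the threshold."""
    return rms(chunk) < silence_threshold
```

In this sketch, steady background noise at an RMS of 220 counts as speech at the default threshold of 200, so recording keeps running until max_recording_seconds; at 250 the same noise counts as silence.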

Silence duration

If you pause a lot between sentences, increase:

voice:
  silence_duration: 4.0
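Silence duration works together with the threshold: recording stops only after enough consecutive silent audio. The sketch below assumes fixed-length chunks that have already been classified as silent or not; the function name and chunk length are illustrative, not Hermes settings.

```python
def should_stop(silent_flags: list[bool], chunk_seconds: float, silence_duration: float = 3.0) -> bool:
    """Stop once the trailing run of silent chunks lasts at least silence_duration seconds."""
    trailing = 0
    for silent in reversed(silent_flags):
        if not silent:
            break  # the run of silence ends at the most recent speech chunk
        trailing += 1
    return trailing * chunk_seconds >= silence_duration
```

With 0.5 s chunks, a 3.5 s pause ends the recording at the default silence_duration of 3.0 but not at 4.0, which is why raising the value helps if you pause between sentences.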

Record key

If Ctrl+B conflicts with your terminal or tmux habits:

voice:
  record_key: "ctrl+space"

Use case 2: voice replies in Telegram or Discord

This mode is simpler than full voice channels.

Hermes stays a normal chat bot, but can speak replies.

Start the gateway

hermes gateway

Turn on voice replies

Inside Telegram or Discord:

/voice on

or

/voice tts

Modes

| Mode | Meaning |
|---|---|
| off | text only |
| voice_only | speak only when the user sent voice |
| all | speak every reply |
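The three modes reduce to a simple rule about when a reply gets spoken. The function below restates the modes table as code; it is an illustration, not Hermes's implementation.

```python
def should_speak(mode: str, user_sent_voice: bool) -> bool:
    """Decide whether a reply gets a spoken version, following the modes table."""
    if mode == "all":
        return True
    if mode == "voice_only":
        return user_sent_voice
    if mode == "off":
        return False
    raise ValueError(f"unknown voice mode: {mode!r}")
```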

When to use which mode

Good messaging workflows

Telegram assistant on your phone

Use when:

Discord DMs with spoken output

Useful when you want private interaction without the @mention requirement that applies in server channels.

Use case 3: Discord voice channels

This is the most advanced mode.

Hermes joins a Discord VC, listens to user speech, transcribes it, runs the normal agent pipeline, and speaks replies back into the channel.

Required Discord permissions

In addition to the normal text-bot setup, make sure the bot has these voice permissions:

  - Connect
  - Speak
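Discord encodes permissions as an integer bitfield, for example in the permissions parameter of a bot invite link. Per Discord's API documentation, Connect is bit 20 and Speak is bit 21, so a quick check (a hypothetical helper, not part of Hermes) looks like this:

```python
CONNECT = 1 << 20  # Discord "Connect" permission bit
SPEAK = 1 << 21    # Discord "Speak" permission bit
VOICE_BITS = CONNECT | SPEAK


def has_voice_permissions(permissions: int) -> bool:
    """True if a permission integer includes both Connect and Speak (hypothetical helper)."""
    return (permissions & VOICE_BITS) == VOICE_BITS
```

VOICE_BITS evaluates to 3145728; combine it with the permission bits your text-bot setup already uses.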

Also enable privileged intents in the Developer Portal:

Join and leave

In a Discord text channel where the bot is present:

/voice join
/voice leave
/voice status

What happens when joined

Best practices for Discord VC use

Voice quality recommendations

Best quality setup

Best speed / convenience setup

Best zero-cost setup

Common failure modes

“No audio device found”

Install PortAudio (portaudio on macOS, portaudio19-dev on Ubuntu / Debian).

“Bot joins but hears nothing”

Check:

“It transcribes but does not speak”

Check:

“Whisper outputs garbage”

Try:

“It works in DMs but not in server channels”

This is often caused by the mention policy.

By default, the bot needs an @mention in Discord server text channels unless configured otherwise.
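The default mention policy can be sketched as follows. The `require_mention` parameter is a hypothetical stand-in for whatever setting your configuration uses to relax the rule; the function is illustrative, not Hermes code.

```python
def should_respond(is_dm: bool, bot_mentioned: bool, require_mention: bool = True) -> bool:
    """DMs always get a reply; server channels need an @mention unless the rule is relaxed."""
    if is_dm:
        return True
    return bot_mentioned or not require_mention
```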

Suggested first-week setup

If you want the shortest path to success:

  1. get text Hermes working
  2. install hermes-agent[voice]
  3. use CLI voice mode with local STT + Edge TTS
  4. then enable /voice on in Telegram or Discord
  5. only after that, try Discord VC mode

That progression keeps the debugging surface small.