Support & Getting Started

OpenVox – Local Voice AI

Getting Started with OpenVox

Welcome to OpenVox – Local Voice AI
Transform text into natural speech with 300+ premium AI voices, plus OmniVoice coverage across 600+ languages, all generated privately and offline on your Mac.

Quick Start (5 Minutes)

1. Install OpenVox

  • Download OpenVox from the Mac App Store
  • Open it from your Applications folder
  • Grant any required permissions (microphone for voice cloning, file access for AudioBooks)

2. Your First Generation

Step 1: Enter Your Text

  • Type or paste any text into the main text area
  • Try something simple like: "Hello! This is my first audio generation with OpenVox."

Step 2: Choose a Voice

  • Click the voice selector or browse voices
  • For your first try, use Kokoro-82M voices (fast generation)
  • Popular choices: "Bella" (US Female) or "Adam" (US Male)

Step 3: Generate

  • Click "Generate" or press ⌘+Enter
  • On first use, OpenVox downloads the AI model (~2-5 minutes, one-time only)
  • Watch the progress bar with real-time ETA
  • Audio plays automatically when ready!

Step 4: Export (Optional)

  • Click "Export" to save your audio
  • Choose WAV (high quality) or MP3 (smaller file)
  • Save to your desired location

Congratulations! You've created your first AI-generated audio.

Understanding OpenVox

Four Local AI Models

OpenVox includes four complementary models, each optimized for different needs:

OmniVoice (Massive Language Coverage)
  • 600+ languages with broad global and regional coverage
  • Best for: Long-tail languages, regional support, and underserved language coverage
  • Built for Mac so wide language coverage fits into a real local workflow
Kokoro-82M (Fast & Efficient)
  • 60+ voices in 9 languages
  • ⚡ Optimized for speed and long documents
  • Best for: Articles, scripts, batch processing
  • Languages: American English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Portuguese
Chatterbox (Quality & Versatility)
  • 240+ voices in 23 languages
  • Best for: High-quality voiceovers, audiobooks, and premium speech output
  • Strong choice for voice cloning and expressive controls like CFG weight and exaggeration
  • Covers major world languages including Arabic, German, Korean, Swedish, Turkish, and more
Qwen3 TTS + Voice Design (Custom Reusable Voices)
  • Design voices from text prompts and reuse them across workflows
  • Best for: Custom voice creation, reusable presets, and polished voice design workflows
  • Model sizes: 1.6GB to 3.1GB depending on the selected Qwen3 package
  • Supports multiple languages and locally generated voice samples

Choosing the Right Model:

  • Need broad language coverage? → Use OmniVoice
  • Need speed? → Use Kokoro-82M
  • Need premium voiceovers or voice cloning? → Use Chatterbox or Qwen3 TTS
  • Need custom, reusable designed voices? → Use Qwen3 TTS + Voice Design

Main Features Tour

AI Speech Generation (Main Tab)

The primary feature for converting text to speech.

Basic Controls:

  • Text Input: Enter or paste your text (no length limits)
  • Voice Selector: Browse 300+ premium voices, or switch to OmniVoice when you need 600+ language coverage
  • Speed: Adjust from 0.5x to 2.0x (default: 1.0x)
  • Model Switcher: Choose between OmniVoice, Kokoro-82M, Chatterbox, and Qwen3 TTS depending on the workflow

Advanced Controls:

  • Temperature: Control randomness (higher = more variation)
  • CFG Weight: Classifier-free guidance (Chatterbox only)
  • Exaggeration: Voice characteristic intensity (Chatterbox only)
  • Post-Processing: Silence removal, audio normalization

Tips for Best Results:

  • Use proper punctuation for natural pacing
  • Break very long texts into paragraphs
  • Use commas for pauses
  • Avoid excessive ALL CAPS or exclamation marks!!!
  • For technical terms, use phonetic spelling if mispronounced

Conversations

Create multi-speaker scripts with up to 4 voices for interviews, skits, dialogue, and character scenes.

How to Use:

  1. Click the Conversations tab in the sidebar
  2. Start a new conversation script or import a .txt or .pdf file
  3. Add up to 4 speakers and assign a voice to each speaker
  4. Write each line under the correct speaker so OpenVox can keep the dialogue structure intact
  5. Generate the scene, preview the result, then adjust speaker voices or pacing as needed
  6. Export the completed conversation audio when it is ready

Tips:

  • Keep speaker names short and consistent, such as Host, Guest, Narrator, or Character 1
  • Use punctuation and paragraph breaks to control pacing between turns
  • Use distinct voices for each speaker so dialogue is easy to follow
  • Import scripts when you already have a draft, then refine speaker assignments inside Conversations

AI AudioBook Generation

Create complete audiobooks from PDF or text files.

How to Use:

  1. Click the AudioBook tab in the sidebar
  2. Click "New AudioBook" or import a PDF/text file
  3. OpenVox auto-detects chapters (or you can create them manually)
  4. Set the voice and settings per chapter (or use the same for all)
  5. Generate individual chapters or batch-process the entire book
  6. Export final audio when complete

Features:

  • Chapter-by-chapter management
  • Per-chapter voice customization
  • Batch processing for entire books
  • Integrated audio player for preview
  • Chapter reordering and deletion

Best For: Converting books to audio format, creating narrated content, long-form content with multiple sections.

AI Voice Changer

Transform existing audio to different voice characteristics.

How to Use:

  1. Click the Voice Changer tab
  2. Import source audio (MP3 or WAV)
  3. Select target voice from 300+ options
  4. Adjust exaggeration control (how much to transform)
  5. Click "Convert"
  6. Audio longer than 30 seconds is split into chunks automatically
  7. Export transformed audio

Best For: Character voice variations, podcast voice consistency, audio enhancement, creative voice effects.

AI Voice Cloning

Clone voices from your own audio samples using Chatterbox or Qwen3 TTS.

How to Use:

  1. Click the Voice Clone tab
  2. Click "New Voice"
  3. Choose Language: Select a supported cloning language for Chatterbox or Qwen3 TTS
  4. Select Gender: Male or Female
  5. Provide Audio: Upload audio file (10-20 seconds recommended) OR record directly in the app
  6. Add Transcript: Type what the audio says (improves accuracy)
  7. Click "Create Voice"
  8. Use your cloned voice in AI Speech or AudioBook

Requirements:

  • Clear audio sample (10-20 seconds recommended)
  • One speaker only (no background voices)
  • Reference transcript matching the audio (Apple Speech Recognition can detect it for you)
  • Supported formats: MP3 or WAV

Tips:

  • Use high-quality audio (no background noise)
  • Speak naturally at normal pace
  • Provide accurate transcript
  • Keep reference samples focused; OpenVox warns when audio is longer than the recommended 10-20 seconds

💾 Generation History

All your generations are automatically saved locally.

Features:

  • Search by text content or voice name
  • Filter by AI Speech or Voice Changer
  • Grid or list view
  • Replay audio instantly
  • Reuse settings from previous generations
  • Export or delete past generations

Access: Click the History tab in the sidebar. Use the search bar at the top to filter quickly, click any item to replay its audio, and right-click for export or delete options.

Voice Design (Qwen3)

The Voice Design page lets you create unique AI voices using Qwen3 Voice Design. Describe the desired voice style in natural language, select a language and gender, and generate custom samples for use in AI Speech and AudioBook.

  1. Enter a Voice Description: Use natural language to describe the style, tone, or personality you want (e.g., "cinematic storyteller", "premium assistant", "energetic announcer").
  2. Write a Script: Provide a short sample script (up to 30 seconds) for the voice to speak. This helps the model understand context and emotion.
  3. Select Language & Gender: Choose the language and gender for your custom voice. Qwen3 supports multiple languages and both male and female voices.
  4. Generate & Preview: Click to generate the sample. Listen to the result and adjust your description or script as needed.
  5. Save to Voice Library: Name your voice and select its gender, then save it as a reusable reference for AI Speech and AudioBook.

Tips for Best Voice Design Results

  • Use clear, descriptive language for voice style (e.g., "warm resonance", "controlled pacing", "studio polish").
  • Try different scripts to capture emotion or context.
  • Save multiple versions for different use cases (e.g., audiobook narrator, assistant, character voices).
  • All generated voices are stored locally and can be managed in the Voice Library.

Qwen3 Voice Design is the most advanced voice-creation option in OpenVox, aiming for state-of-the-art (SOTA) quality and the strongest cloning of custom voices. Experiment and enjoy creating unique voices for your projects!

Voice Library

OpenVox includes 300+ professional voices plus OmniVoice coverage across 600+ languages for broader language reach.

Browsing Voices

By Model:

  • OmniVoice: 600+ languages for broad global and regional language support
  • Kokoro-82M: Fast generation, 60+ voices, 9 languages
  • Chatterbox: High quality, 240+ voices, 23 languages
  • Qwen3 TTS + Voice Design: Custom reusable voices, advanced voice design, and multiple language workflows

By Language:

  • Filter by your target language
  • See voice count per language
  • Preview voices with sample audio

By Characteristics:

  • Gender: Male, Female
  • Age: Young, Middle-aged, Old (Chatterbox only)
  • Accent: American, British (Chatterbox only)

Favorites: Click the star icon to save a voice as a favorite. Favorites give you quick access to your preferred voices and work across all features.

Voice Previews

  • Click the play icon next to any voice
  • Listen to sample audio before generating
  • Preview shows voice characteristics
  • Helps you choose the right voice for your project

Managing Models

Model Library

Access via Sidebar → Manage Models button.

Available Models:

  • OmniVoice: ~3GB
  • Kokoro-82M: ~327MB (fast, 9 languages)
  • Chatterbox (Standard): ~1.2GB (high quality)
  • Chatterbox (8-bit): ~600MB (balanced)
  • Chatterbox (4-bit): ~400MB (memory efficient)
  • Chatterbox Multilingual: ~800MB-1.5GB (23 languages)
  • Qwen3 TTS Base: 1.6GB to 2.6GB (SOTA quality, advanced voice design, best voice cloning locally)
  • Qwen3 TTS Voice Design: 3.1GB (advanced voice design, custom voices)

Actions:

  • View download status
  • Download models in advance
  • Delete unused models to free space
  • Switch between quantization levels

Choosing Quantization:

  • Standard: Best quality, largest size
  • 8-bit: Balanced quality and size
  • 4-bit: Smallest size, slightly lower quality

First-Time Model Download

On your first generation, OpenVox automatically downloads the required model:

  • OmniVoice: Download time varies by selected package and connection speed
  • Kokoro-82M: 2-5 minutes
  • Chatterbox: 5-15 minutes (varies by version)
  • Qwen3 TTS Base: 5-20 minutes (varies by model size)

Progress Tracking:

  • Real-time download progress
  • ETA displayed
  • You can keep using the app while downloading

Internet Required: Only for the one-time download. Models are downloaded from Hugging Face and cached locally in ~/.cache/huggingface/. After that, OpenVox works completely offline.
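To see which downloaded models take up the most space, you can list the cache from Terminal. This is a minimal sketch that assumes the models follow the standard Hugging Face hub layout: one models--<org>--<name> folder per model under ~/.cache/huggingface/hub/ (the path listed under Data Storage). The helper name openvox_cache_usage is hypothetical; if your install keeps models elsewhere, pass that folder as the first argument. Deleting models is still best done from Manage Models in the app.

```bash
# Hypothetical helper: list cached model folders, largest first, size in KiB.
# Assumes the Hugging Face hub layout (one models--<org>--<name> folder per
# model) under ~/.cache/huggingface/hub/; pass another folder as $1 to override.
openvox_cache_usage() {
  local hub=${1:-"$HOME/.cache/huggingface/hub"} dir
  if [ ! -d "$hub" ]; then
    echo "No model cache found at $hub" >&2
    return 1
  fi
  # du -sk prints "<KiB><TAB><path>" for each folder; sort numerically, biggest first.
  for dir in "$hub"/models--*; do
    [ -d "$dir" ] && du -sk "$dir"
  done | sort -rn
}
```

Each output line is a size in KiB followed by the model folder it belongs to.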

Local API Usage

OpenVox can expose a local voice API for apps, agents, scripts, and automations. The API is hosted on your Mac at http://127.0.0.1:8000/v1. Keep it on loopback unless you intentionally need local network access.

Recommended request flow:

  1. Call GET /models to discover available model IDs.
  2. Before the first speech request, call POST /models/{model}/load to warm the model.
  3. Call GET /models/{model}/languages to find valid language codes.
  4. Call GET /models/{model}/voices?language={code} to select a compatible voice.
  5. Use the same language code in POST /audio/speech so the voice and synthesis language stay aligned.

For example, with the kokoro model:

OPENVOX_API="http://127.0.0.1:8000/v1"

curl "$OPENVOX_API/models"
curl -X POST "$OPENVOX_API/models/kokoro/load"
curl "$OPENVOX_API/models/kokoro/languages"
curl "$OPENVOX_API/models/kokoro/voices?language=en"
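The five steps above can be wrapped in one script so an app or agent checks that the API is up, warms the model, and confirms the voice before it speaks. This is a minimal sketch under stated assumptions: the response bodies are JSON in which each voice ID appears as a quoted string (the exact response shape is not documented here, so check it against your install), and the function names voice_listed and openvox_prepare are hypothetical.

```bash
OPENVOX_API="${OPENVOX_API:-http://127.0.0.1:8000/v1}"

# Hypothetical helper: succeeds if the response body ($1) contains the voice
# ID ($2) as a quoted JSON string. A plain text match, not a JSON parser.
voice_listed() {
  case $1 in
    *"\"$2\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: run steps 1-4 for one model/language/voice, printing
# the valid language codes along the way. Return codes: 2 = API unreachable,
# 3 = voice not offered for that language, 1 = any other request failure.
openvox_prepare() {  # args: model language voice
  local model=$1 lang=$2 voice=$3 voices
  # Step 1 doubles as an availability check; --max-time keeps it short.
  if ! curl -fsS --max-time 3 "$OPENVOX_API/models" >/dev/null 2>&1; then
    echo "Local voice output is currently unavailable." >&2
    return 2
  fi
  curl -fsS -X POST "$OPENVOX_API/models/$model/load" >/dev/null || return 1   # step 2: warm up
  curl -fsS "$OPENVOX_API/models/$model/languages" || return 1                  # step 3: language codes
  voices=$(curl -fsS "$OPENVOX_API/models/$model/voices?language=$lang") || return 1  # step 4
  if ! voice_listed "$voices" "$voice"; then
    echo "Voice $voice is not listed for $model/$lang; choose one of:" >&2
    printf '%s\n' "$voices" >&2
    return 3
  fi
}
```

For example, openvox_prepare kokoro en af_bella returns 0 only when af_bella is offered for en. In that case, send the speech request with the same language code. A return code of 2 is the cue to stay in text mode.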

Generate a complete WAV file:

curl -X POST "http://127.0.0.1:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Your spoken reply text here",
    "language": "en",
    "voice": "af_bella",
    "response_format": "wav"
  }' \
  --output openvox-reply.wav
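The hand-written JSON above breaks as soon as the input text contains double quotes, backslashes, or line breaks. In addition, OpenVox runs only one generation or preload job at a time and answers 429 while it is busy. Below is a minimal sketch of a wrapper that escapes the text and retries on 429. It assumes a fixed pause between attempts is acceptable. The escaping covers only quotes, backslashes, and newlines, and the names json_escape, openvox_payload, and openvox_speak are hypothetical.

```bash
OPENVOX_API="${OPENVOX_API:-http://127.0.0.1:8000/v1}"

# Hypothetical helper: escape backslashes, double quotes and newlines so text
# can sit inside a JSON string. Other control characters (tabs, etc.) are not handled.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Hypothetical helper: build the /audio/speech body. Model, language and voice
# are assumed to be plain IDs (as returned by the discovery endpoints).
openvox_payload() {  # args: model language voice text
  printf '{"model":"%s","input":"%s","language":"%s","voice":"%s","response_format":"wav"}' \
    "$1" "$(json_escape "$4")" "$2" "$3"
}

# Hypothetical helper: send the request and save the WAV. While the server
# answers 429 (busy), wait and retry up to the given number of attempts (default 5).
openvox_speak() {  # args: model language voice text output.wav [attempts]
  local body code attempt=1 max=${6:-5}
  body=$(openvox_payload "$1" "$2" "$3" "$4")
  while :; do
    code=$(curl -sS -o "$5" -w '%{http_code}' -X POST "$OPENVOX_API/audio/speech" \
      -H "Content-Type: application/json" -d "$body") || return 1
    case $code in
      2??) return 0 ;;
      429) if [ "$attempt" -ge "$max" ]; then
             echo "OpenVox is still busy after $max attempts" >&2
             rm -f "$5"; return 1
           fi ;;
      *) echo "OpenVox returned HTTP $code" >&2
         rm -f "$5"; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep 2   # fixed pause; a longer or growing delay also works
  done
}
```

For example, openvox_speak kokoro en af_bella 'She said "hello".' openvox-reply.wav sends the quoted text safely and writes the same file as the request above.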

Stream audio for incremental playback:

curl -N -X POST "http://127.0.0.1:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatterbox-turbo-small",
    "language": "en",
    "input": "OpenVox is streaming this response locally.",
    "stream": true
  }'

  • Streaming uses server-sent events named response.created, audio.chunk, and response.completed.
  • In each audio.chunk event, the audio field of the data payload holds base64-encoded WAV data; decode it before playback.
  • If the API returns 429, wait and retry: only one generation or preload job can run at a time.
  • If a requested voice is missing, fetch the voice list again for the same model and language, then choose a valid replacement.
  • If the local API is unavailable, keep your app or agent in text mode and show that local voice output is currently unavailable.
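Decoding the stream takes two steps: pick out the audio.chunk events, then base64-decode each payload into a playable WAV file. This is a minimal sketch, assuming each event's JSON sits on a single data: line and carries the chunk in an "audio" string field. The names sse_audio_chunks and save_chunks are hypothetical, and afplay (bundled with macOS) is only one way to play the result.

```bash
# Hypothetical helper: read an SSE stream on stdin and print the base64 payload
# of every audio.chunk event, one per line, flushing each line immediately.
sse_audio_chunks() {
  awk '
    { sub(/\r$/, "") }                                   # tolerate CRLF line endings
    /^event:/ { sub(/^event: ?/, ""); event = $0; next }
    /^$/ { event = ""; next }                            # a blank line ends an event
    /^data:/ && event == "audio.chunk" {
      if (match($0, /"audio"[ \t]*:[ \t]*"[^"]*"/)) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^"audio"[ \t]*:[ \t]*"/, "", s)
        sub(/"$/, "", s)
        gsub(/\\/, "", s)   # base64 has no backslashes, so drop JSON escapes such as \/
        print s
        fflush()
      }
    }
  '
}

# Hypothetical helper: decode each payload line into a numbered WAV file in
# directory $1 and print the path of each file as soon as it is written.
save_chunks() {
  local dir=$1 n=0 b64 file
  mkdir -p "$dir"
  while IFS= read -r b64; do
    n=$((n + 1))
    file=$(printf '%s/chunk-%04d.wav' "$dir" "$n")
    printf '%s' "$b64" | base64 --decode > "$file" || return 1
    printf '%s\n' "$file"
  done
}

# Usage (macOS): pipe the streaming curl request above through both helpers
# and play each chunk as it arrives:
#   curl -N -X POST ... | sse_audio_chunks | save_chunks chunks \
#     | while IFS= read -r f; do afplay "$f"; done
```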

Keyboard Shortcuts

Speed up your workflow with these shortcuts:

General

  • ⌘+Enter – Generate audio
  • ⌘+L – Focus text input
  • Space – Play/Pause audio
  • ⌘+H – Open History

Text Editing

  • ⌘+A – Select all text
  • ⌘+C – Copy
  • ⌘+V – Paste
  • ⌘+Z – Undo

Navigation

  • ⌘+1 – AI Speech tab
  • ⌘+2 – AudioBook tab
  • ⌘+3 – Voice Changer tab
  • ⌘+4 – Voice Clone tab
  • ⌘+5 – History tab

Tips for Best Results

Text Input

Do:

  • Use proper punctuation for natural pacing
  • Break long texts into paragraphs
  • Use commas for pauses
  • Write in complete sentences
  • Use quotes for dialogue: "Hello," she said.

Don't:

  • Use excessive exclamation marks!!!
  • Write in ALL CAPS (unless emphasizing)
  • Include URLs or code (spell them out instead)
  • Use special characters excessively ($$$, ***, etc.)

Voice Selection

For Narration:

  • Use Narrative voices (Kokoro)
  • Professional, clear, storytelling tone

For Conversational:

  • Use Conversational A/B voices (Kokoro)
  • Natural, friendly tone

For Professional:

  • Use Professional voices (Chatterbox or Qwen3 TTS)
  • Formal, clear, business-appropriate

For Character Voices:

  • Use Chatterbox with exaggeration control
  • Experiment with different ages and accents

Speed Settings

  • 0.5x-0.8x: Slow, deliberate (learning content)
  • 1.0x: Natural pace (default)
  • 1.2x-1.5x: Faster (podcast-style)
  • 1.5x-2.0x: Very fast (time-saving)

Long Documents

For texts over 5,000 words:

  • Kokoro-82M: Fast generation, ideal for long documents
  • AudioBook Feature: Best for books with chapters
  • Batch Processing: Generate multiple sections at once
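If a long document lives in a single plain-text file, cutting it at blank lines gives sections you can generate one at a time, whether you paste them into the app or send them to the local API. This is a minimal sketch using awk's paragraph mode. The name split_paragraphs is hypothetical, and sections are neither merged nor size-balanced.

```bash
# Hypothetical helper: split a text file into one file per paragraph
# (paragraphs are separated by one or more blank lines). Writes
# part-0001.txt, part-0002.txt, ... into directory $2 and prints the count.
split_paragraphs() {  # args: input.txt output_dir
  mkdir -p "$2" || return 1
  awk -v dir="$2" '
    BEGIN { RS = "" }              # paragraph mode: blank lines separate records
    {
      n++
      file = sprintf("%s/part-%04d.txt", dir, n)
      print > file
      close(file)
    }
    END { print n + 0 }            # report how many parts were written
  ' "$1"
}
```

The numbered file names sort in reading order, so a simple loop over part-*.txt processes the sections in sequence.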

Export & File Management

Export Formats

WAV (Recommended for Quality)

  • 24kHz, 16-bit, lossless
  • Best for editing or professional use
  • Larger file size
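Because WAV export is uncompressed 16-bit audio at 24 kHz, file size grows linearly with duration. Here is a quick estimate, assuming mono output and a canonical 44-byte WAV header (neither is stated in this guide). wav_bytes is a hypothetical helper.

```bash
# Hypothetical helper: estimate the size of an uncompressed 24 kHz, 16-bit WAV.
# Assumes mono by default and a canonical 44-byte header.
wav_bytes() {  # args: duration_in_seconds [channels]
  local seconds=$1 channels=${2:-1}
  local rate=24000 bytes_per_sample=2 header=44
  echo $(( rate * bytes_per_sample * channels * seconds + header ))
}

# One minute of mono speech: 24000 * 2 * 60 + 44 = 2,880,044 bytes (about 2.9 MB),
# so an hour comes to roughly 173 MB.
```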

MP3 (Recommended for Sharing)

  • Compressed, widely compatible
  • Smaller file size
  • Good for podcasts, videos, web

Export Options

From Generation View:

  • Click the "Export" button after generation
  • Choose a format and location
  • Audio is saved with a timestamp

From History:

  • Right-click any generation
  • Select "Export"
  • Batch export multiple items

Drag & Drop: Drag audio from History to Finder for a quick export without dialogs.

Privacy & Offline Use

100% Private

Your data never leaves your Mac:

  • All AI processing happens locally
  • No cloud services or servers
  • No analytics or tracking
  • No account required
  • No internet after model download

You can verify this: after the initial model download, Activity Monitor or Little Snitch should show no network activity from OpenVox, and all data stays in the local app container.
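From Terminal, lsof can list the network sockets a process has open. This is a minimal sketch, assuming the app's process name starts with OpenVox (check Activity Monitor for the exact name). It hides loopback sockets, such as the optional local API on 127.0.0.1:8000, so anything it prints reaches beyond your Mac. external_sockets is a hypothetical helper name.

```bash
# Hypothetical helper: filter lsof output read on stdin. Skips the header row
# and drops sockets on the loopback interface (127.0.0.1, [::1], localhost),
# so only connections or listeners that reach beyond this Mac are printed.
external_sockets() {
  awk 'NR > 1 && $0 !~ /127\.0\.0\.1|\[::1\]|localhost/'
}

# Usage: -a ANDs the selections, -n and -P keep hosts and ports numeric,
# -i selects network sockets, -c selects processes whose name starts with OpenVox.
#   lsof -a -nP -i -c OpenVox | external_sockets
# No output means no non-loopback sockets at that moment.
```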

Completely Offline

After initial setup:

  • No internet required for generation
  • Perfect for travel
  • Works on planes, trains, remote areas
  • No API rate limits
  • Unlimited generations

Internet is only needed for the one-time model download and for app updates from the Mac App Store.

Data Storage

Where your data lives:

  • Generation History: Local SwiftData database
  • Cloned Voices: Local app container
  • Preferences: macOS UserDefaults
  • AI Models: ~/.cache/huggingface/hub/
  • Exported Audio: Your chosen location
  • Nothing in the cloud!

Troubleshooting

Model Download Issues

Problem: Download stuck or slow

Solutions:

  • Check your internet connection
  • Try a smaller model (4-bit instead of standard)
  • Check disk space (free at least the size of the model, up to ~3GB)
  • Restart the app and retry
  • Check the Model Library for progress

Generation Issues

Problem: Audio sounds robotic

Solutions:

  • Reset speed to 1.0x
  • Try a different voice
  • Simplify punctuation
  • Break up long sentences

Problem: Words mispronounced

Solutions:

  • Use phonetic spelling (e.g., "Nee-chuh" for "Nietzsche")
  • Add hyphens (e.g., "data-base")
  • Use commas for pacing
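If the same words keep coming out wrong across many documents, you can keep the respellings in a small list and apply them before pasting or sending text. This is a minimal sketch, assuming a hypothetical file format of one word=respelling pair per line. Matching is plain, case-sensitive substring replacement, so review the result before generating. apply_respellings is a hypothetical helper name.

```bash
# Hypothetical helper: apply respellings from a dictionary file (one
# word=respelling pair per line) to a text file and print the result.
# Replacements are literal, case-sensitive substrings applied in no fixed order.
apply_respellings() {  # args: dictionary.txt input.txt
  awk '
    # Literal replace-all (no regex), so entries may safely contain & or .
    function replace_all(s, from, to,    out, i) {
      out = ""
      while ((i = index(s, from)) > 0) {
        out = out substr(s, 1, i - 1) to
        s = substr(s, i + length(from))
      }
      return out s
    }
    FILENAME == ARGV[1] {
      eq = index($0, "=")
      if (eq > 1) map[substr($0, 1, eq - 1)] = substr($0, eq + 1)
      next
    }
    {
      for (word in map) $0 = replace_all($0, word, map[word])
      print
    }
  ' "$1" "$2"
}
```

With a dictionary containing Nietzsche=Nee-chuh and database=data-base, the examples above are applied automatically to every line of the input file.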

Performance Issues

Problem: Generation is slow

Solutions:

  • Close other intensive apps
  • Make sure your Mac is plugged in and not in Low Power Mode
  • Use Kokoro-82M for faster generation
  • Restart your Mac to free up memory
  • Check Activity Monitor for runaway processes

Can't Find Features

Problem: Where is voice cloning?

Answer: Click the "Voice Clone" tab in the sidebar.

Problem: Where do I create multi-speaker dialogue?

Answer: Click the "Conversations" tab in the sidebar, then add speakers or import a .txt/.pdf script.

Problem: How do I import a PDF for an audiobook?

Answer: Click the "AudioBook" tab → "New AudioBook" → Import PDF.

Problem: Where are the advanced controls?

Answer: Expand the "Advanced" section below the voice selector.

System Requirements

Minimum Requirements

  • Mac: Apple Silicon (M1, M2, M3, M4, or later)
  • macOS: macOS 15.0 (Sequoia) or later
  • Disk Space: 2-6GB recommended (varies by models installed)
  • RAM: 8GB minimum (16GB recommended)
  • Internet: For one-time model download only

Intel Macs Not Supported

OpenVox requires Apple Silicon because it runs on Apple's MLX framework, which is not available on Intel Macs. There are no plans for Intel support.

Next Steps

Explore More Features

Try Conversations:

  1. Click Conversations tab
  2. Add two to four speakers or import a .txt/.pdf script
  3. Assign a different voice to each speaker
  4. Generate your first interview, skit, or dialogue scene

Try AudioBook Generation:

  1. Click AudioBook tab
  2. Import a PDF or text file
  3. Generate chapter by chapter
  4. Create your first audiobook!

Experiment with Voice Cloning:

  1. Click Voice Clone tab
  2. Record or upload 10-20 seconds of clean audio
  3. Create your custom voice
  4. Use it in AI Speech

Use Voice Changer:

  1. Click Voice Changer tab
  2. Import existing audio
  3. Transform to different voice
  4. Export transformed audio

Optimize Your Workflow

Set Favorites:

  • Star your preferred voices
  • Quick access across all features

Use Keyboard Shortcuts:

  • ⌘+Enter to generate
  • Space to play/pause
  • ⌘+L to focus text

Organize History: Use search to find past generations, reuse settings from earlier generations, and export batches for projects.

Learn Advanced Techniques

Fine-Tune Generation:

  • Experiment with Temperature
  • Adjust CFG Weight (Chatterbox)
  • Use Exaggeration for character voices (Chatterbox)

Optimize for Use Case:

  • Podcasts: Conversational voices at 1.2x
  • Audiobooks: Narrative voices with chapters
  • Professional: Professional voices at 1.0x
  • Character Work: Chatterbox or Qwen3 TTS

Batch Processing: Use AudioBook for multi-chapter content, clone voices for consistency, and export in bulk from History.

Getting Help

Documentation

  • FAQ: Comprehensive answers to common questions
  • Support Docs: Detailed technical information
  • This Guide: Getting started and feature overview

Contact Support

  • Email: [email protected]
  • Website: theoracleguy.in/support
  • Response Time: 24-48 hours

Report Issues

Include in your report:

  • Description of the problem
  • Steps to reproduce
  • Sample text (if generation-related)
  • System info (macOS version, Mac model)
  • Screenshots or error messages

Welcome to OpenVox!

You're now ready to transform text into natural speech with complete privacy and control.

Remember:

  • 100% private – No cloud, no tracking
  • Completely offline after setup
  • 300+ premium voices plus 600+ language coverage with OmniVoice
  • Four models: OmniVoice, Kokoro-82M, Chatterbox, and Qwen3 TTS + Voice Design
  • Apple Silicon accelerated via MLX
  • Conversations, Voice Cloning, AudioBooks, and Voice Changer included

Need Help?

  • Check the FAQ for quick answers
  • Visit theoracleguy.in/support for more information
  • Email [email protected]

Enjoy creating amazing audio with OpenVox!

*Last Updated: April 17, 2026*

Frequently Asked Questions

Need more help?

Email us at [email protected] or visit the OpenVox page for more information.
