Welcome to OpenVox – Local Voice AI
Transform text into natural speech with 300+ premium AI voices, plus OmniVoice coverage across 600+ languages, completely private and offline on your Mac.
Quick Start (5 Minutes)
1. Install OpenVox
• Download OpenVox from the Mac App Store
• Open from your Applications folder
• Grant any required permissions (microphone for voice cloning, file access for AudioBooks)
2. Your First Generation
Step 1: Enter Your Text
• Type or paste any text into the main text area
• Try something simple like: "Hello! This is my first audio generation with OpenVox."
Step 2: Choose a Voice
• Click the voice selector or browse voices
• For your first try, use Kokoro-82M voices (fast generation)
• Popular choices: "Bella" (US Female) or "Adam" (US Male)
Step 3: Generate
• Click "Generate" or press ⌘+Enter
• On first use, OpenVox will download the AI model (~2-5 minutes, one-time only)
• Watch the progress bar with real-time ETA
• Audio plays automatically when ready!
Step 4: Export (Optional)
• Click "Export" to save your audio
• Choose WAV (high quality) or MP3 (smaller file)
• Save to your desired location
Congratulations! You've created your first AI-generated audio.
Understanding OpenVox
Four Local AI Models
OpenVox includes four complementary models, each optimized for different needs:
OmniVoice (Massive Language Coverage)
• 600+ languages with broad global and regional coverage
• Best for: Long-tail languages, regional support, and underserved language coverage
• Built for Mac so wide language coverage fits into a real local workflow
Kokoro-82M (Fast & Efficient)
• 60+ voices in 9 languages
• ⚡ Optimized for speed and long documents
• Best for: Articles, scripts, batch processing
• Languages include English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Portuguese
Chatterbox (Quality & Versatility)
• 240+ voices in 23 languages
• Best for: High-quality voiceovers, audiobooks, and premium speech output
• Strong choice for voice cloning and expressive controls like CFG weight and exaggeration
• Covers major world languages including Arabic, German, Korean, Swedish, Turkish, and more
Qwen3 TTS + Voice Design (Custom Reusable Voices)
• Design voices from text prompts and reuse them across workflows
• Best for: Custom voice creation, reusable presets, and polished voice design workflows
• Model sizes: 1.6GB to 3.1GB depending on the selected Qwen3 package
• Supports multiple languages and locally generated voice samples
Choosing the Right Model:
• Need broad language coverage? → Use OmniVoice
• Need speed? → Use Kokoro-82M
• Need premium voiceovers or cloning? → Use Chatterbox
• Need custom voice design? → Use Qwen3 TTS
• Need reusable designed voices? → Use Qwen3 TTS + Voice Design
Main Features Tour
AI Speech Generation (Main Tab)
The primary feature for converting text to speech.
Basic Controls:
• Text Input: Enter or paste your text (no length limits)
• Voice Selector: Browse 300+ premium voices and switch into OmniVoice when you need 600+ language coverage
• Speed: Adjust from 0.5x to 2.0x (default: 1.0x)
• Model Switcher: Choose between OmniVoice, Kokoro-82M, Chatterbox, and Qwen3 TTS depending on the workflow
Advanced Controls:
• Temperature: Control randomness (higher = more variation)
• Reference Audio: Keep reference samples focused; OpenVox warns when audio is longer than the recommended 10-20 seconds
💾 Generation History
All your generations are automatically saved locally.
Features:
• Search by text content or voice name
• Filter by AI Speech or Voice Changer
• Grid or list view
• Replay audio instantly
• Reuse settings from previous generations
• Export or delete past generations
Access:
• Click the History tab in the sidebar
• Use the search bar at the top for quick filtering
• Click any item to replay its audio
• Right-click for export or delete options
Voice Design (Qwen3)
The Voice Design page lets you create unique AI voices using Qwen3 Voice Design. You describe the desired voice style in natural language, select a language and gender, and generate custom samples for use in AI Speech and AudioBook.
1. Enter a Voice Description: Use natural language to describe the style, tone, or personality you want (e.g., "cinematic storyteller", "premium assistant", "energetic announcer").
2. Write a Script: Provide a short sample script (up to 30 seconds) for the voice to speak. This helps the model understand context and emotion.
3. Select Language & Gender: Choose the language and gender for your custom voice. Qwen3 supports multiple languages and both male and female options.
4. Generate & Preview: Click to generate the sample. Listen to the result and adjust your description or script until it sounds right.
5. Save to Voice Library: Store your generated voice as a reusable reference for AI Speech and AudioBook. Name your voice and select gender before saving.
Tips for Best Voice Design Results
• Use clear, descriptive language for voice style (e.g., "warm resonance", "controlled pacing", "studio polish").
• Try different scripts to capture emotion or context.
• Save multiple versions for different use cases (e.g., audiobook narrator, assistant, character voices).
• All generated voices are stored locally and can be managed in the Voice Library.
Qwen3 Voice Design is the most advanced option in OpenVox for creating custom voices locally. Experiment and enjoy creating unique voices for your projects!
Voice Library
OpenVox includes 300+ professional voices plus OmniVoice coverage across 600+ languages for broader language reach.
Browsing Voices
By Model:
• OmniVoice: 600+ languages for broad global and regional language support
• Kokoro-82M: Fast generation, 60+ voices, 9 languages
• Chatterbox: High quality, 240+ voices, 23 languages
• Qwen3 TTS + Voice Design: Custom reusable voices, advanced voice design, and multiple language workflows
By Language:
• Filter by your target language
• See voice count per language
• Preview voices with sample audio
By Characteristics:
• Gender: Male, Female
• Age: Young, Middle-aged, Old (Chatterbox only)
• Accent: American, British (Chatterbox only)
Favorites:
• Click the star icon to save a voice as a favorite
• Quick access to your preferred voices
• Works across all features
Voice Previews
• Click the play icon next to any voice
• Listen to sample audio before generating
• Preview shows voice characteristics
• Helps you choose the right voice for your project
Choosing Quantization:
• Standard: Best quality, largest size
• 8-bit: Balanced quality/size
• 4-bit: Smallest size, slightly lower quality
First-Time Model Download
On your first generation, OpenVox automatically downloads the required model:
• OmniVoice: Download time varies by selected package and connection speed
• Kokoro-82M: 2-5 minutes
• Chatterbox: 5-15 minutes (varies by version)
• Qwen3 TTS Base: 5-20 minutes (varies by model size)
Progress Tracking:
• Real-time download progress
• ETA displayed
• Can use app while downloading
Internet Required:
• One-time download only
• Downloaded from Hugging Face
• Models cached locally in ~/.cache/huggingface/
• After download: Completely offline!
Local API Usage
OpenVox can expose a local voice API for apps, agents, scripts, and automations. The API is hosted on your Mac at http://127.0.0.1:8000/v1. Keep it on loopback unless you intentionally need local network access.
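Before sending requests, a client may want to confirm that something is listening on the loopback address. The following is a minimal Python sketch, assuming the host and port from the URL above; the function name `api_reachable` is illustrative and not part of OpenVox.

```python
import socket


def api_reachable(host="127.0.0.1", port=8000, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, blocked) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An app or agent could call this at startup and stay in text-only mode when it returns False. A successful connection only shows that the port is open, not that the service behind it is OpenVox.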
Recommended request flow:
1. Call GET /models to discover available model IDs.
2. Before the first speech request, call POST /models/{model}/load to warm the model.
3. Call GET /models/{model}/languages to find valid language codes.
4. Call GET /models/{model}/voices?language={code} to select a compatible voice.
5. Use the same language code in POST /audio/speech so the voice and synthesis language stay aligned.
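The flow above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not an official client: the helper names are hypothetical, response bodies are returned as raw bytes because their JSON shape is not described in this guide, and the speech body carries only the fields shown in the curl example (how a chosen voice is passed is not documented here, so it is left out). The retry helper follows the 429 note later in this section.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000/v1"


def flow_requests(model, language, text, base_url=BASE_URL):
    """Return the five recommended calls as (method, url, body) tuples."""
    model_path = quote(model, safe="")
    voices_query = urlencode({"language": language})
    return [
        ("GET", f"{base_url}/models", None),
        ("POST", f"{base_url}/models/{model_path}/load", None),
        ("GET", f"{base_url}/models/{model_path}/languages", None),
        ("GET", f"{base_url}/models/{model_path}/voices?{voices_query}", None),
        ("POST", f"{base_url}/audio/speech",
         {"model": model, "language": language, "input": text}),
    ]


def send(method, url, body=None, timeout=60):
    """Send one request and return the raw response bytes."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def send_with_retry(call, sender=send, attempts=5, delay=1.0, sleep=time.sleep):
    """Retry while the server answers 429 (busy); re-raise any other error."""
    for attempt in range(attempts):
        try:
            return sender(*call)
        except urllib.error.HTTPError as error:
            if error.code != 429 or attempt == attempts - 1:
                raise
            sleep(delay)
```

A caller would run the tuples in order, for example `for call in flow_requests("chatterbox-turbo-small", "en", "Hello"): send_with_retry(call)`, and inspect the language and voice responses before issuing the final speech request.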
curl -N -X POST "http://127.0.0.1:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatterbox-turbo-small",
    "language": "en",
    "input": "OpenVox is streaming this response locally.",
    "stream": true
  }'
• Streaming uses server-sent events named response.created, audio.chunk, and response.completed.
• Treat audio.chunk.data.audio as base64-encoded WAV data and decode it before playback.
• If the API returns 429, wait and retry because only one generation or preload job can run at a time.
• If a requested voice is missing, fetch the voice list again for the same model and language, then choose a valid replacement.
• If the local API is unavailable, keep your app or agent in text mode and show that local voice output is currently unavailable.
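The streaming notes above can be illustrated with a small parser. This Python sketch assumes the event name arrives in the standard `event:` field of each server-sent event and that the `data:` payload of an audio.chunk event is a JSON object with a top-level `audio` key holding the base64 string; both are readings of the notes above rather than confirmed wire details, and the function names are hypothetical.

```python
import base64
import json


def iter_sse_events(lines):
    """Yield (event_name, data_text) pairs from an iterable of SSE text lines."""
    name, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event
            if data:
                yield name, "\n".join(data)
            name, data = "message", []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon
            data.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored


def collect_wav_chunks(lines):
    """Decode every audio.chunk payload until response.completed arrives."""
    chunks = []
    for name, data in iter_sse_events(lines):
        if name == "audio.chunk":
            payload = json.loads(data)
            chunks.append(base64.b64decode(payload["audio"]))  # assumed key
        elif name == "response.completed":
            break
    return chunks
```

With `urllib`, the response object can be iterated line by line and each line decoded as UTF-8 before being passed in. The chunks are returned as a list rather than joined, since this guide does not say whether each chunk carries its own WAV header.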
Keyboard Shortcuts
Speed up your workflow with these shortcuts:
General
• ⌘+Enter – Generate audio
• ⌘+L – Focus text input
• Space – Play/Pause audio
• ⌘+H – Open History
Text Editing
• ⌘+A – Select all text
• ⌘+C – Copy
• ⌘+V – Paste
• ⌘+Z – Undo
Navigation
• ⌘+1 – AI Speech tab
• ⌘+2 – AudioBook tab
• ⌘+3 – Voice Changer tab
• ⌘+4 – Voice Clone tab
• ⌘+5 – History tab
Tips for Best Results
Text Input
Do:
• Use proper punctuation for natural pacing
• Break long texts into paragraphs
• Use commas for pauses
• Write in complete sentences
• Use quotes for dialogue: "Hello," she said.
Don't:
• Use excessive exclamation marks!!!
• Write in ALL CAPS (unless emphasizing)
• Include URLs or code (spell them out instead)
• Use special characters excessively ($$$, ***, etc.)
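When text comes from a script or automation, some of these tips can be applied before generation. The following Python sketch is one possible cleanup pass, not something OpenVox does itself; the function name and the exact character set are assumptions. It collapses repeated "!" or "?" and drops runs of three or more identical special characters. It does not rewrite ALL CAPS or spell out URLs.

```python
import re


def tidy_for_speech(text):
    """Apply a few of the text-input tips to raw text."""
    text = re.sub(r"([!?])\1+", r"\1", text)        # "!!!" becomes "!"
    text = re.sub(r"([$*#~^_=])\1{2,}", "", text)   # drop runs like "$$$" or "***"
    text = re.sub(r"[ \t]{2,}", " ", text)          # collapse leftover spaces
    return text.strip()
```

For example, `tidy_for_speech("Wow!!! Big sale $$$ today ***")` yields `"Wow! Big sale today"`.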
Voice Selection
For Narration:
• Use Narrative voices (Kokoro)
• Professional, clear, storytelling tone
For Conversational:
• Use Conversational A/B voices (Kokoro)
• Natural, friendly tone
For Professional:
• Use Professional voices (Chatterbox or Qwen3 TTS)
• Formal, clear, business-appropriate
For Character Voices:
• Use Chatterbox with exaggeration control
• Experiment with different ages and accents
Speed Settings
• 0.5x-0.8x: Slow, deliberate (learning content)
• 1.0x: Natural pace (default)
• 1.2x-1.5x: Faster (podcast-style)
• 1.5x-2.0x: Very fast (time-saving)
Long Documents
For texts over 5,000 words:
• Kokoro-82M: Fast generation, ideal for long docs
• AudioBook Feature: Best for books with chapters
• Batch Processing: Generate multiple sections at once
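If you prepare long texts outside the app, splitting them into sections yourself is one way to feed batch generation. The Python sketch below groups paragraphs under a word budget; the function name is hypothetical, and the 5,000-word default simply reuses the threshold mentioned above rather than any limit enforced by OpenVox.

```python
def split_into_sections(text, max_words=5000):
    """Group paragraphs into sections of roughly max_words words.

    Paragraphs are never split, so a single paragraph longer than
    max_words becomes its own oversized section.
    """
    sections, current, count = [], [], 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        words = len(paragraph.split())
        # Start a new section when adding this paragraph would exceed the budget
        if current and count + words > max_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        sections.append("\n\n".join(current))
    return sections
```

Splitting on paragraph boundaries keeps sentences intact, which matches the earlier tip about breaking long texts into paragraphs.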
Export & File Management
Export Formats
WAV (Recommended for Quality)
• 24kHz, 16-bit, lossless
• Best for editing or professional use
• Larger file size
MP3 (Recommended for Sharing)
• Compressed, widely compatible
• Smaller file size
• Good for podcasts, videos, web
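The WAV figures above translate directly into file size. Assuming mono audio (the channel count is not stated in this guide), one minute is 24,000 samples/s × 2 bytes × 60 s = 2,880,000 bytes, roughly 2.9 MB before the small file header. The Python sketch below does the same arithmetic and reads the format of an exported file with the standard `wave` module; the function names are illustrative.

```python
import wave


def pcm_bytes(seconds, sample_rate=24000, bits=16, channels=1):
    """Size of the uncompressed audio payload in bytes (header excluded)."""
    return int(seconds * sample_rate * (bits // 8) * channels)


def wav_format(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()
```

Running `wav_format` on an exported file is a quick way to confirm the sample rate and bit depth before importing the audio into an editor.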
Export Options
From Generation View:
• Click "Export" button after generation
• Choose format and location
• Audio saved with timestamp
From History:
• Right-click any generation
• Select "Export"
• Batch export multiple items
Drag & Drop:
• Drag audio from History to Finder
• Quick export without dialogs
Privacy & Offline Use
100% Private
Your data never leaves your Mac:
• All AI processing happens locally
• No cloud services or servers
• No analytics or tracking
• No account required
• No internet needed after model download
You can verify:
• Use Activity Monitor or Little Snitch
• After the initial model download, you should see zero network activity
• All data is stored in the local app container
Completely Offline
After initial setup:
• No internet required for generation
• Perfect for travel
• Works on planes, trains, remote areas
• No API rate limits
• Unlimited generations
You only need internet for:
• Initial model download (one-time)
• App updates from the Mac App Store
Data Storage
Where your data lives:
• Generation History: Local SwiftData database
• Cloned Voices: Local app container
• Preferences: macOS UserDefaults
• AI Models: ~/.cache/huggingface/hub/
• Exported Audio: Your chosen location
• Nothing in the cloud!
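To see how much disk space the downloaded models occupy, you can total the files under the model cache path listed above. This Python sketch is one way to do it; the function name is hypothetical. Symlinks are skipped because the Hugging Face cache typically links snapshot files to shared blobs, and counting both would inflate the total.

```python
from pathlib import Path


def cache_size_bytes(root=None):
    """Total size in bytes of regular files under the model cache directory."""
    if root is None:
        root = Path.home() / ".cache" / "huggingface" / "hub"
    root = Path(root)
    if not root.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
```

Dividing the result by `1024 ** 3` gives the size in GiB. The total covers every model in that cache, including any downloaded by other tools.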
Troubleshooting
Model Download Issues
Problem: Download stuck or slow
Solutions:
• Check internet connection
• Try smaller model (4-bit vs standard)
• Check disk space (need 1-3GB free)
• Restart app and retry
• Check Model Library for progress
Generation Issues
Problem: Audio sounds robotic
Solutions:
• Reset speed to 1.0x
• Try different voice
• Simplify punctuation
• Break long sentences
Problem: Words mispronounced
Solutions:
• Use phonetic spelling (e.g., "Nee-chuh" for "Nietzsche")
• Add hyphens (e.g., "data-base")
• Use commas for pacing
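For scripts that feed text to OpenVox repeatedly, the phonetic-spelling tip can be automated with a small substitution table. The Python sketch below is an illustration with a hypothetical function name; the override dictionary is something you maintain yourself, and matching is case-sensitive and limited to whole words.

```python
import re


def apply_pronunciations(text, overrides):
    """Replace whole words in text using a {word: respelling} mapping."""
    if not overrides:
        return text
    # Longest keys first so longer words win over shorter ones sharing a prefix
    words = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: overrides[match.group(1)], text)
```

For example, `apply_pronunciations("Nietzsche used a database.", {"Nietzsche": "Nee-chuh", "database": "data-base"})` returns `"Nee-chuh used a data-base."`.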
Performance Issues
Problem: Generation is slow
Solutions:
• Close other intensive apps
• Ensure Mac is plugged in (not low-power mode)
• Use Kokoro-82M for faster generation
• Restart Mac to clear memory
• Check Activity Monitor for runaway processes
Can't Find Features
Problem: Where is voice cloning?
Answer: Click the "Voice Clone" tab in the sidebar.
Problem: Where do I create multi-speaker dialogue?
Answer: Click the "Conversations" tab in the sidebar, then add speakers or import a .txt/.pdf script.
Problem: How do I import a PDF for an audiobook?
Answer: Click the "AudioBook" tab → "New AudioBook" → Import PDF.
System Requirements
• Disk Space: 2-6GB recommended (varies by models installed)
• RAM: 8GB minimum (16GB recommended)
• Internet: For one-time model download only
Intel Macs Not Supported
OpenVox requires Apple Silicon and uses Apple's MLX framework, which is not available on Intel Macs. There are no plans for Intel support as MLX is Apple Silicon-only.
Next Steps
Explore More Features
Try Conversations:
1. Click the Conversations tab
2. Add two to four speakers or import a .txt/.pdf script
3. Assign a different voice to each speaker
4. Generate your first interview, skit, or dialogue scene
Try AudioBook Generation:
1. Click the AudioBook tab
2. Import a PDF or text file
3. Generate chapter by chapter
4. Create your first audiobook!
Experiment with Voice Cloning:
1. Click the Voice Clone tab
2. Record or upload 10-20 seconds of clean audio
3. Create your custom voice
4. Use it in AI Speech
Use Voice Changer:
1. Click the Voice Changer tab
2. Import existing audio
3. Transform it to a different voice
4. Export the transformed audio
Optimize Your Workflow
Set Favorites:
• Star your preferred voices
• Quick access across all features
Use Keyboard Shortcuts:
• ⌘+Enter to generate
• Space to play/pause
• ⌘+L to focus text
Organize History:
• Use search to find past generations
• Reuse settings from history
• Export batches for projects
Learn Advanced Techniques
Fine-Tune Generation:
• Experiment with Temperature
• Adjust CFG Weight (Chatterbox)
• Use Exaggeration for character voices (Chatterbox)
Optimize for Use Case:
• Podcasts: Conversational voices at 1.2x
• Audiobooks: Narrative voices with chapters
• Professional: Professional voices at 1.0x
• Character Work: Chatterbox or Qwen3 TTS
Batch Processing:
• Use AudioBook for multi-chapter content
• Clone voices for consistency
• Export in bulk from History
Getting Help
Documentation
• FAQ: Comprehensive answers to common questions
• Support Docs: Detailed technical information
• This Guide: Getting started and feature overview