Getting Started with OpenVox
Welcome to OpenVox – Local Voice AI
Transform text into natural speech with 300+ AI voices across 23 languages, completely private and offline on your Mac.
Quick Start (5 Minutes)
1. Install OpenVox
- • Download OpenVox from the Mac App Store
- • Open from your Applications folder
- • Grant any required permissions (microphone for voice cloning, file access for AudioBooks)
2. Your First Generation
Step 1: Enter Your Text
- • Type or paste any text into the main text area
- • Try something simple like: "Hello! This is my first audio generation with OpenVox."
Step 2: Choose a Voice
- • Click the voice selector or browse voices
- • For your first try, use Kokoro-82M voices (fast generation)
- • Popular choices: "Bella" (US Female) or "Adam" (US Male)
Step 3: Generate
- • Click "Generate" or press ⌘+Enter
- • On first use, OpenVox will download the AI model (~2-5 minutes, one-time only)
- • Watch the progress bar with real-time ETA
- • Audio plays automatically when ready!
Step 4: Export (Optional)
- • Click "Export" to save your audio
- • Choose WAV (high quality) or MP3 (smaller file)
- • Save to your desired location
🎉 Congratulations! You've created your first AI-generated audio.
Understanding OpenVox
Two Powerful AI Models
OpenVox includes two complementary models, each optimized for different needs:
🚀 Kokoro-82M (Fast & Efficient)
- • 60+ voices in 9 languages
- • ⚡ Optimized for speed and long documents
- • 📄 Best for: Articles, scripts, batch processing
- • 🌍 Languages: US English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Portuguese
🎨 Chatterbox (Quality & Versatility)
- • 240+ voices in 23 languages
- • 🎙️ Optimized for voice cloning and premium audio
- • ✨ Best for: High-quality voiceovers, audiobooks, professional projects
- • 🌍 Languages: All Kokoro languages plus Arabic, Danish, Dutch, Finnish, German, Greek, Hebrew, Korean, Malay, Norwegian, Polish, Russian, Swahili, Swedish, Turkish
Choosing the Right Model:
- • Need speed? → Use Kokoro-82M
- • Need quality? → Use Chatterbox
- • Need voice cloning? → Use Chatterbox (only model that supports cloning)
Main Features Tour
🎤 AI Speech Generation (Main Tab)
The primary feature for converting text to speech.
Basic Controls:
- • Text Input: Enter or paste your text (no length limits)
- • Voice Selector: Browse 300+ voices by model, language, gender, or favorites
- • Speed: Adjust from 0.5x to 2.0x (default: 1.0x)
- • Model Switcher: Choose between Kokoro-82M and Chatterbox
Advanced Controls:
- • Temperature: Control randomness (higher = more variation)
- • CFG Weight: Classifier-free guidance (Chatterbox only)
- • Exaggeration: Voice characteristic intensity (Chatterbox only)
- • Post-Processing: Silence removal, audio normalization
Tips for Best Results:
- • Use proper punctuation for natural pacing
- • Break very long texts into paragraphs
- • Use commas for pauses
- • Avoid excessive ALL CAPS or exclamation marks!!!
- • For technical terms, use phonetic spelling if mispronounced
📖 AI AudioBook Generation
Create complete audiobooks from PDF or text files.
How to Use:
- Click the AudioBook tab in the sidebar
- Click "New AudioBook" or import a PDF/text file
- OpenVox auto-detects chapters (or you can create them manually)
- Set the voice and settings per chapter (or use the same for all)
- Generate individual chapters or batch process the entire book
- Export the final audio when complete
Features:
- • Chapter-by-chapter management
- • Per-chapter voice customization
- • Batch processing for entire books
- • Integrated audio player for preview
- • Chapter reordering and deletion
Best For: Converting books to audio format, creating narrated content, long-form content with multiple sections.
🔄 AI Voice Changer
Transform existing audio to different voice characteristics.
How to Use:
- Click the Voice Changer tab
- Import source audio (MP3 or WAV)
- Select target voice from 300+ options
- Adjust exaggeration control (how much to transform)
- Click "Convert"
- Audio longer than 30 seconds is split into chunks automatically, so no extra steps are needed
- Export transformed audio
Best For: Character voice variations, podcast voice consistency, audio enhancement, creative voice effects.
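To make the chunking step more concrete, here is a minimal sketch of how a long recording could be divided into spans of at most 30 seconds. This guide does not describe how OpenVox chunks audio internally, so the fixed, non-overlapping spans and the `chunk_spans` helper are assumptions for illustration only.

```python
def chunk_spans(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole recording.

    Assumption: fixed-length, non-overlapping chunks. The app's real
    strategy is not documented here and may differ.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        # The final chunk is shortened so it ends with the recording.
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    for start, end in chunk_spans(75.0):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Under these assumptions, a 75-second recording yields three spans: two of 30 seconds and a final one of 15 seconds.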
🎙️ Voice Cloning
Clone voices from your own audio samples (Chatterbox only).
How to Use:
- Click the Voice Clone tab
- Click "New Voice"
- Choose Language: 23 languages supported
- Select Gender: Male or Female
- Provide Audio: Upload audio file (15-30 seconds recommended) OR record directly in the app
- Add Transcript: Type what the audio says (improves accuracy)
- Click "Create Voice"
- Use your cloned voice in AI Speech or AudioBook
Requirements:
- • Clear audio sample (at least 15 seconds; 15-30 seconds recommended)
- • One speaker only (no background voices)
- • Reference transcript matching audio
- • Supported format: MP3 or WAV
Tips:
- • Use high-quality audio (no background noise)
- • Speak naturally at normal pace
- • Provide accurate transcript
- • Longer samples (30-60s) = better results
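If you want to check a sample before uploading it, the sketch below measures a WAV file with Python's standard `wave` module and labels its length against the guidance above. It runs separately from OpenVox; the function names and the three labels are hypothetical, and MP3 files are not handled because `wave` reads WAV only.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample(seconds: float) -> str:
    """Label a duration against the guidance in this section."""
    if seconds < 15:
        return "too short"    # below the 15-second floor
    if seconds <= 30:
        return "recommended"  # the 15-30 second range
    return "long"             # 30-60 s is described as giving better results


if __name__ == "__main__":
    import sys

    # Usage: python check_sample.py my_voice.wav
    for path in sys.argv[1:]:
        seconds = wav_duration_seconds(path)
        print(f"{path}: {seconds:.1f}s ({classify_sample(seconds)})")
```

The check covers duration only. Background noise, multiple speakers, and transcript accuracy still need a listen.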
💾 Generation History
All your generations are automatically saved locally.
Features:
- • Search by text content or voice name
- • Filter by AI Speech or Voice Changer
- • Grid or list view
- • Replay audio instantly
- • Reuse settings from previous generations
- • Export or delete past generations
Access:
- • Click the History tab in the sidebar
- • Use the search bar at the top for quick filtering
- • Click any item to replay its audio
- • Right-click for export or delete options
Voice Library
OpenVox includes 300+ professional voices optimized for different use cases.
Browsing Voices
By Model:
- • Kokoro-82M: Fast generation, 60+ voices, 9 languages
- • Chatterbox: High quality, 240+ voices, 23 languages
By Language:
- • Filter by your target language
- • See voice count per language
- • Preview voices with sample audio
By Characteristics:
- • Gender: Male, Female
- • Age: Young, Middle-aged, Old (Chatterbox only)
- • Accent: American, British (Chatterbox only)
Favorites:
- • Click the star icon to save a voice as a favorite
- • Get quick access to your preferred voices
- • Favorites work across all features
Voice Previews
- • Click the play icon next to any voice
- • Listen to sample audio before generating
- • Preview shows voice characteristics
- • Helps you choose the right voice for your project
Managing Models
Model Library
Access via Sidebar → Manage Models button.
Available Models:
- • Kokoro-82M: ~327MB (fast, 9 languages)
- • Chatterbox (Standard): ~1.2GB (high quality)
- • Chatterbox (8-bit): ~600MB (balanced)
- • Chatterbox (4-bit): ~400MB (memory efficient)
- • Chatterbox Multilingual: ~800MB-1.5GB (23 languages)
Actions:
- • View download status
- • Download models in advance
- • Delete unused models to free space
- • Switch between quantization levels
Choosing Quantization:
- • Standard: Best quality, largest size
- • 8-bit: Balanced quality and size
- • 4-bit: Smallest size, slightly lower quality
First-Time Model Download
On your first generation, OpenVox automatically downloads the required model:
- • Kokoro-82M: 2-5 minutes
- • Chatterbox: 5-15 minutes (varies by version)
Progress Tracking:
- • Real-time download progress
- • ETA displayed
- • Can use app while downloading
Internet Required:
- • One-time download only
- • Models are downloaded from Hugging Face
- • Models are cached locally in ~/.cache/huggingface/
- • After the download, OpenVox works completely offline
Keyboard Shortcuts
Speed up your workflow with these shortcuts:
General
- • ⌘+Enter – Generate audio
- • ⌘+L – Focus text input
- • Space – Play/Pause audio
- • ⌘+H – Open History
Text Editing
- • ⌘+A – Select all text
- • ⌘+C – Copy
- • ⌘+V – Paste
- • ⌘+Z – Undo
Navigation
- • ⌘+1 – AI Speech tab
- • ⌘+2 – AudioBook tab
- • ⌘+3 – Voice Changer tab
- • ⌘+4 – Voice Clone tab
- • ⌘+5 – History tab
Tips for Best Results
Text Input
Do:
- • ✅ Use proper punctuation for natural pacing
- • ✅ Break long texts into paragraphs
- • ✅ Use commas for pauses
- • ✅ Write in complete sentences
- • ✅ Use quotes for dialogue: "Hello," she said.
Don't:
- • ❌ Use excessive exclamation marks!!!
- • ❌ Write in ALL CAPS (unless emphasizing)
- • ❌ Include URLs or code (spell them out instead)
- • ❌ Use special characters excessively ($$$, ***, etc.)
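If you prepare text outside the app, a few of these rules can be applied automatically before pasting. The sketch below is one possible cleanup pass in Python; `tidy_for_tts` is a hypothetical helper, not part of OpenVox, and its thresholds (five or more capital letters, three or more repeated symbols) are assumptions chosen for illustration.

```python
import re


def tidy_for_tts(text: str) -> str:
    """Apply some of the Do/Don't tips to raw text."""
    # Collapse runs of "!" or "?" to a single mark.
    text = re.sub(r"([!?])\1+", r"\1", text)
    # Drop runs of three or more decorative symbols such as "***" or "$$$".
    text = re.sub(r"([*$#~_=])\1{2,}", "", text)
    # Soften ALL-CAPS words of five or more letters. Shorter words are
    # left alone because they are often acronyms (a heuristic, not a rule).
    text = re.sub(r"\b[A-Z]{5,}\b", lambda m: m.group(0).capitalize(), text)
    # Squeeze the extra spaces left behind by the removals above.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    print(tidy_for_tts("WOW!!! This is GREAT *** news???"))
```

URLs and code are deliberately left untouched, since spelling them out well depends on how you want them read aloud.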
Voice Selection
For Narration:
- • Use Narrative voices (Kokoro)
- • Professional, clear, storytelling tone
For Conversational:
- • Use Conversational A/B voices (Kokoro)
- • Natural, friendly tone
For Professional:
- • Use Professional voices (Chatterbox)
- • Formal, clear, business-appropriate
For Character Voices:
- • Use Chatterbox with exaggeration control
- • Experiment with different ages and accents
Speed Settings
- • 0.5x-0.8x: Slow, deliberate (learning content)
- • 1.0x: Natural pace (default)
- • 1.2x-1.5x: Faster (podcast-style)
- • 1.5x-2.0x: Very fast (time-saving)
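As a rough rule of thumb, playback length scales inversely with the speed setting. The sketch below shows the arithmetic; `playback_seconds` is a hypothetical helper, and it assumes the speed factor behaves like a simple rate multiplier, which this guide does not state explicitly.

```python
def playback_seconds(base_seconds: float, speed: float) -> float:
    """Estimate audio length at a given speed, relative to 1.0x."""
    # The Speed control ranges from 0.5x to 2.0x.
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return base_seconds / speed


if __name__ == "__main__":
    base = 600.0  # a 10-minute recording at 1.0x
    for speed in (0.5, 1.0, 1.5, 2.0):
        print(f"{speed:.1f}x -> {playback_seconds(base, speed) / 60:.1f} min")
```

Under that assumption, ten minutes of audio at 1.0x takes about 6.7 minutes at 1.5x and 20 minutes at 0.5x.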
Long Documents
For texts over 5,000 words:
- • Kokoro-82M: Fast generation, ideal for long documents
- • AudioBook Feature: Best for books with chapters
- • Batch Processing: Generate multiple sections at once
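If you prefer to split a long document yourself before generating, one simple approach is to group whole paragraphs into sections under a word budget. The Python sketch below illustrates this; `split_into_sections` and the 1,000-word default are assumptions for illustration, not OpenVox settings.

```python
def split_into_sections(text: str, max_words: int = 1000) -> list[str]:
    """Group paragraphs into sections of roughly max_words words.

    Paragraphs are never cut, so a single paragraph longer than
    max_words becomes its own oversized section.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Start a new section when this paragraph would exceed the budget.
        if current and count + words > max_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        sections.append("\n\n".join(current))
    return sections


if __name__ == "__main__":
    sample = "a b c\n\nd e\n\nf g h i"
    for number, section in enumerate(split_into_sections(sample, max_words=5), 1):
        print(f"Section {number}: {len(section.split())} words")
```

Splitting on paragraph boundaries keeps sentences intact, which matches the tip about breaking long texts into paragraphs.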
Export & File Management
Export Formats
WAV (Recommended for Quality)
- • 24kHz, 16-bit, lossless
- • Best for editing or professional use
- • Larger file size
MP3 (Recommended for Sharing)
- • Compressed, widely compatible
- • Smaller file size
- • Good for podcasts, videos, web
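To see why WAV exports are larger, you can estimate the uncompressed size from the 24kHz, 16-bit format listed above. The sketch below does the arithmetic in Python; it assumes mono audio, which this guide does not specify, and it ignores the small file header.

```python
def wav_data_bytes(seconds: float, sample_rate: int = 24_000,
                   bit_depth: int = 16, channels: int = 1) -> int:
    """Size of the uncompressed PCM audio data, excluding the file header.

    channels=1 (mono) is an assumption; pass channels=2 for stereo.
    """
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        size_mb = wav_data_bytes(minutes * 60) / 1_000_000
        print(f"{minutes:>3} min is about {size_mb:.1f} MB")
```

With these assumptions, one minute of audio is 2,880,000 bytes, or roughly 2.9 MB. MP3 size depends on the bitrate chosen by the encoder, so it is not estimated here.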
Export Options
From Generation View:
- • Click "Export" button after generation
- • Choose format and location
- • Audio saved with timestamp
From History:
- • Right-click any generation
- • Select "Export"
- • Batch export multiple items
Drag & Drop: Drag audio from History to Finder for a quick export without dialogs.
Privacy & Offline Use
100% Private
Your data never leaves your Mac:
- • ✅ All AI processing happens locally
- • ✅ No cloud services or servers
- • ✅ No analytics or tracking
- • ✅ No account required
- • ✅ No internet needed after model download
You can verify this yourself:
- • Monitor OpenVox with Activity Monitor or Little Snitch
- • After the initial model download, you should see zero network activity
- • All data is stored in the local app container
Completely Offline
After initial setup:
- • ✅ No internet required for generation
- • ✅ Perfect for travel
- • ✅ Works on planes, trains, remote areas
- • ✅ No API rate limits
- • ✅ Unlimited generations
Internet is only needed for:
- • Initial model download (one-time)
- • App updates from the Mac App Store
Data Storage
Where your data lives:
- • Generation History: Local SwiftData database
- • Cloned Voices: Local app container
- • Preferences: macOS UserDefaults
- • AI Models: ~/.cache/huggingface/hub/
- • Exported Audio: Your chosen location
- • Nothing in the cloud!
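To see how much disk space downloaded models occupy, you can total the files under the model cache path listed above. The Python sketch below is one way to do that outside the app. It skips symbolic links because the Hugging Face hub cache links snapshot files to shared blobs, and counting both would inflate the total. The `directory_size_bytes` helper name is hypothetical.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default
    # and yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface" / "hub"
    size_gb = directory_size_bytes(cache) / 1_000_000_000
    print(f"{cache}: {size_gb:.2f} GB")
```

Other tools that use the same cache location will be included in the total, so the figure is an upper bound on what OpenVox itself has downloaded.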
Troubleshooting
Model Download Issues
Problem: Download stuck or slow
Solutions:
- • Check internet connection
- • Try smaller model (4-bit vs standard)
- • Check disk space (need 1-3GB free)
- • Restart app and retry
- • Check Model Library for progress
Generation Issues
Problem: Audio sounds robotic
Solutions:
- • Reset speed to 1.0x
- • Try different voice
- • Simplify punctuation
- • Break long sentences
Problem: Words mispronounced
Solutions:
- • Use phonetic spelling (e.g., "Nee-chuh" for "Nietzsche")
- • Add hyphens (e.g., "data-base")
- • Use commas for pacing
Performance Issues
Problem: Generation is slow
Solutions:
- • Close other intensive apps
- • Plug in your Mac and turn off Low Power Mode
- • Use Kokoro-82M for faster generation
- • Restart Mac to clear memory
- • Check Activity Monitor for runaway processes
Can't Find Features
Problem: Where is voice cloning?
Answer: Click "Voice Clone" tab in sidebar
Problem: How to import PDF for audiobook?
Answer: Click "AudioBook" tab → "New AudioBook" → Import PDF
Problem: Where are advanced controls?
Answer: Expand "Advanced" section below voice selector
System Requirements
Minimum Requirements
- • Mac: Apple Silicon (M1, M2, M3, M4, or later)
- • macOS: macOS 15.0 (Sequoia) or later
- • Disk Space: 1-3GB (varies by models installed)
- • RAM: 8GB minimum (16GB recommended)
- • Internet: For one-time model download only
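If you are unsure whether a Mac qualifies, the chip and macOS version can be checked with Python's standard `platform` module. The sketch below is a minimal check under two assumptions: `meets_requirements` is a hypothetical helper, and Python is running natively, since under Rosetta `platform.machine()` reports `x86_64` even on Apple Silicon. RAM and disk space are not checked.

```python
import platform


def meets_requirements(machine: str, macos_version: str) -> bool:
    """Check for Apple Silicon and macOS 15 or later."""
    if machine != "arm64":
        return False  # Intel Macs report "x86_64"
    if not macos_version:
        return False  # empty string means this is not macOS
    major = int(macos_version.split(".")[0])
    return major >= 15


if __name__ == "__main__":
    machine = platform.machine()
    version = platform.mac_ver()[0]
    ok = meets_requirements(machine, version)
    print(f"chip={machine} macOS={version or 'n/a'} supported={ok}")
```

You can get the same information without code from Apple menu → About This Mac.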
Intel Macs Not Supported
OpenVox is built on Apple's MLX framework, which runs only on Apple Silicon, so Intel Macs are not supported. There are no plans to add Intel support.
Next Steps
Explore More Features
Try AudioBook Generation:
- Click AudioBook tab
- Import a PDF or text file
- Generate chapter by chapter
- Create your first audiobook!
Experiment with Voice Cloning:
- Click Voice Clone tab
- Record or upload 30 seconds of audio
- Create your custom voice
- Use it in AI Speech
Use Voice Changer:
- Click Voice Changer tab
- Import existing audio
- Transform to different voice
- Export transformed audio
Optimize Your Workflow
Set Favorites:
- • Star your preferred voices
- • Quick access across all features
Use Keyboard Shortcuts:
- • ⌘+Enter to generate
- • Space to play/pause
- • ⌘+L to focus text
Organize History:
- • Use search to find past generations
- • Reuse settings from History
- • Export batches for projects
Learn Advanced Techniques
Fine-Tune Generation:
- • Experiment with Temperature
- • Adjust CFG Weight (Chatterbox)
- • Use Exaggeration for character voices
Optimize for Use Case:
- • Podcasts: Conversational voices at 1.2x
- • Audiobooks: Narrative voices with chapters
- • Professional: Professional voices at 1.0x
- • Character Work: Chatterbox with high exaggeration
Batch Processing:
- • Use AudioBook for multi-chapter content
- • Clone voices for consistency
- • Export in bulk from History
Getting Help
Documentation
- • FAQ: Comprehensive answers to common questions
- • Support Docs: Detailed technical information
- • This Guide: Getting started and feature overview
Contact Support
- • Email: support@theoracleguy.in
- • Website: theoracleguy.in/support
- • Response Time: 24-48 hours
Report Issues
Include in your report:
- • Description of the problem
- • Steps to reproduce
- • Sample text (if generation-related)
- • System info (macOS version, Mac model)
- • Screenshots or error messages
Welcome to OpenVox!
You're now ready to transform text into natural speech with complete privacy and control.
Remember:
- • 🔒 100% private – No cloud, no tracking
- • 🌐 Completely offline after setup
- • 🎙️ 300+ voices across 23 languages
- • 🚀 Two models: Kokoro-82M (speed) and Chatterbox (quality)
- • ⚡ Apple Silicon accelerated via MLX
- • 🎨 Voice cloning, AudioBooks, Voice Changer included
Need Help?
- • Check the FAQ for quick answers
- • Visit theoracleguy.in/support for more information
- • Email support@theoracleguy.in
Enjoy creating amazing audio with OpenVox!
*Last Updated: January 27, 2026 | Version 1.0.0*