Welcome to OpenVox – Local Voice AI
Transform text into natural speech with 300+ premium AI voices, plus OmniVoice coverage across 600+ languages, completely private and offline on your device.
Quick Start (5 Minutes)
1. Install OpenVox
• Download OpenVox from the App Store on Mac or iPad, or download the Windows installer from openvoxai.com
• Open the app from Applications (macOS), the Home Screen (iPadOS), or the Start menu (Windows)
• Grant any required permissions (microphone for voice cloning, file access for AudioBooks)
2. Your First Generation
Step 1: Enter Your Text
• Type or paste any text into the main text area
• Try something simple like: "Hello! This is my first audio generation with OpenVox."
Step 2: Choose a Voice
• Click the voice selector or browse voices
• For your first try, use Kokoro-82M voices (fast generation)
• Popular choices: "Bella" (US Female) or "Adam" (US Male)
Step 3: Generate
• Select "Generate" (or use the platform keyboard shortcut shown in the app)
• On first use, OpenVox will download the AI model (~2-5 minutes, one-time only)
• Watch the progress bar with real-time ETA
• Audio plays automatically when ready!
Step 4: Export (Optional)
• Click "Export" to save your audio
• Choose WAV (high quality) or MP3 (smaller file)
• Save to your desired location
Congratulations! You've created your first AI-generated audio.
Understanding OpenVox
Four Local AI Models
OpenVox includes four complementary models, each optimized for different needs:
OmniVoice (Massive Language Coverage)
• 600+ languages with broad global and regional coverage
• Best for: Long-tail languages, regional support, and underserved language coverage
• Built for Mac and iPad so wide language coverage fits into a real local workflow
Kokoro-82M (Fast & Efficient)
• 60+ voices in 9 languages
• ⚡ Optimized for speed and long documents
• Best for: Articles, scripts, batch processing
• Languages include English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Portuguese
Chatterbox (Quality & Versatility)
• 240+ voices in 23 languages
• Best for: High-quality voiceovers, audiobooks, and premium speech output
• Strong choice for voice cloning and expressive controls like CFG weight and exaggeration
• Covers major world languages including Arabic, German, Korean, Swedish, Turkish, and more
Qwen3 TTS + Voice Design (Custom Reusable Voices)
• Design voices from text prompts and reuse them across workflows
• Best for: Custom voice creation, reusable presets, and polished voice design workflows
• Model sizes: 1.6GB to 3.1GB depending on the selected Qwen3 package
• Supports multiple languages and locally generated voice samples
Choosing the Right Model:
• Need broad language coverage? → Use OmniVoice
• Need speed? → Use Kokoro-82M
• Need premium voiceovers or cloning? → Use Chatterbox
• Need custom or reusable designed voices? → Use Qwen3 TTS + Voice Design
Main Features Tour
AI Speech Generation (Main Tab)
The primary feature for converting text to speech.
Basic Controls:
• Text Input: Enter or paste your text (no length limits)
• Voice Selector: Browse 300+ premium voices, or switch to OmniVoice when you need 600+ language coverage
• Speed: Adjust from 0.5x to 2.0x (default: 1.0x)
• Model Switcher: Choose between OmniVoice, Kokoro-82M, Chatterbox, and Qwen3 TTS depending on the workflow
Advanced Controls:
• Temperature: Control randomness (higher = more variation)
• Reference Audio: Keep reference samples focused; OpenVox warns when audio is longer than the recommended 10-20 seconds
💾 Generation History
All your generations are automatically saved locally.
Voice Design (Qwen3)
The Voice Design page lets you create unique AI voices using Qwen3 Voice Design. You can describe the desired voice style in natural language, select language and gender, and generate custom samples for use in AI Speech and AudioBook.
1. Enter a Voice Description: Use natural language to describe the style, tone, or personality you want (e.g., "cinematic storyteller", "premium assistant", "energetic announcer").
2. Write a Script: Provide a short sample script (up to 30 seconds) for the voice to speak. This helps the model understand context and emotion.
3. Select Language & Gender: Choose the language and gender for your custom voice. Qwen3 supports multiple languages and both male/female options.
4. Generate & Preview: Click to generate the sample. Listen to the result and adjust your description or script for best results.
5. Save to Voice Library: Store your generated voice as a reusable reference for AI Speech and AudioBook. Name your voice and select gender before saving.
Tips for Best Voice Design Results
• Use clear, descriptive language for voice style (e.g., "warm resonance", "controlled pacing", "studio polish").
• Try different scripts to capture emotion or context.
• Save multiple versions for different use cases (e.g., audiobook narrator, assistant, character voices).
• All generated voices are stored locally and can be managed in the Voice Library.
Qwen3 Voice Design is the most flexible way to create custom voices in OpenVox. Experiment with different descriptions and scripts to create unique voices for your projects!
Generation History Features:
• Search by text content or voice name
• Filter by AI Speech or Voice Changer
• Grid or list view
• Replay audio instantly
• Reuse settings from previous generations
• Export or delete past generations
Access: Click the History tab in the sidebar. Use the search bar at the top for quick filtering, click any item to replay its audio, and right-click for export or delete options.
Voice Library
OpenVox includes 300+ professional voices plus OmniVoice coverage across 600+ languages for broader language reach.
Browsing Voices
By Model:
• OmniVoice: 600+ languages for broad global and regional language support
• Kokoro-82M: Fast generation, 60+ voices, 9 languages
• Chatterbox: High quality, 240+ voices, 23 languages
• Qwen3 TTS + Voice Design: Custom reusable voices, advanced voice design, and multiple language workflows
By Language:
• Filter by your target language
• See voice count per language
• Preview voices with sample audio
By Characteristics:
• Gender: Male, Female
• Age: Young, Middle-aged, Old (Chatterbox only)
• Accent: American, British (Chatterbox only)
Favorites: Click the star icon to save a voice as a favorite. Favorites give you quick access to your preferred voices and work across all features.
Voice Previews
• Click the play icon next to any voice
• Listen to sample audio before generating
• Preview shows voice characteristics
• Helps you choose the right voice for your project
Choosing Quantization:
• Standard: Best quality, largest size
• 8-bit: Balanced quality/size
• 4-bit: Smallest size, slightly lower quality
First-Time Model Download
On your first generation, OpenVox automatically downloads the required model:
• OmniVoice: Download time varies by selected package and connection speed
• Kokoro-82M: 2-5 minutes
• Chatterbox: 5-15 minutes (varies by version)
• Qwen3 TTS Base: 5-20 minutes (varies by model size)
Progress Tracking:
• Real-time download progress
• ETA displayed
• Can use app while downloading
Internet Required:
• One-time download only, from Hugging Face
• Models are cached locally (macOS: ~/.cache/huggingface/; iPadOS and Windows: app-managed local storage)
• After download, core generation works completely offline
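To see how much disk space downloaded models occupy on macOS, you can sum the file sizes under the cache folder mentioned above. This is a minimal sketch, not part of OpenVox: the helper names dir_size_bytes and format_gb are made up for illustration, and it assumes that symlinked entries in the cache point at files stored elsewhere in the same folder, so they are skipped to avoid counting them twice.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root; return 0 if root is missing."""
    if not root.exists():
        return 0
    total = 0
    for path in root.rglob("*"):
        # Skip symlinks so a file reachable through a link is counted only once.
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def format_gb(num_bytes: int) -> str:
    """Format a byte count in decimal gigabytes with two decimals."""
    return f"{num_bytes / 1e9:.2f} GB"
```

For example, format_gb(dir_size_bytes(Path.home() / ".cache" / "huggingface")) reports the total size of that cache folder. The folder may also hold models downloaded by other tools.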
Local API Usage
Note: The Local API is a macOS-only feature and is not available on iPadOS or Windows.
OpenVox for macOS can expose a local voice API for apps, agents, scripts, and automations. The API is hosted on your Mac at http://127.0.0.1:8000/v1. Keep it on loopback unless you intentionally need local network access.
Recommended request flow:
1. Call GET /models to discover available model IDs.
2. Before the first speech request, call POST /models/{model}/load to warm the model.
3. Call GET /models/{model}/languages to find valid language codes.
4. Call GET /models/{model}/voices?language={code} to select a compatible voice.
5. Use the same language code in POST /audio/speech so the voice and synthesis language stay aligned.
curl -N -X POST "http://127.0.0.1:8000/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"model": "chatterbox-turbo-small",
"language": "en",
"input": "OpenVox is streaming this response locally.",
"stream": true
}'
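The recommended request flow can also be written down as an ordered plan before any request is sent. The sketch below only builds the method, URL, and JSON body for each step. It uses the endpoints and body fields that appear in this guide. The function name plan_requests is hypothetical, and the speech body omits a voice field because the guide does not show that field's name.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000/v1"  # loopback address given in this guide


def plan_requests(model: str, language: str, text: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (method, url, json_body) tuples in the recommended order."""
    model_path = quote(model, safe="")  # escape the model ID for use in a URL path
    voices_query = urlencode({"language": language})
    speech_body = json.dumps(
        {"model": model, "language": language, "input": text, "stream": True}
    )
    return [
        ("GET", f"{BASE_URL}/models", None),                           # discover model IDs
        ("POST", f"{BASE_URL}/models/{model_path}/load", None),        # warm the model
        ("GET", f"{BASE_URL}/models/{model_path}/languages", None),    # valid language codes
        ("GET", f"{BASE_URL}/models/{model_path}/voices?{voices_query}", None),  # compatible voices
        ("POST", f"{BASE_URL}/audio/speech", speech_body),             # same language code as above
    ]
```

Building the plan from a single language argument keeps the voice lookup and the speech request on the same language code, which is the point of the last step in the flow.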
• Streaming uses server-sent events named response.created, audio.chunk, and response.completed.
• Treat audio.chunk.data.audio as base64-encoded WAV data and decode it before playback.
• If the API returns 429, wait and retry because only one generation or preload job can run at a time.
• If a requested voice is missing, fetch the voice list again for the same model and language, then choose a valid replacement.
• If the local API is unavailable, keep your app or agent in text mode and show that local voice output is currently unavailable.
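A client that reads the stream needs to split it into events and decode the audio chunks. The following is a minimal sketch of that step. It assumes each event arrives as standard server-sent-event lines ("event:" and "data:" fields, ended by a blank line) and that the data of an audio.chunk event is a JSON object with an "audio" key holding the base64 string. The exact payload layout is an assumption based on the audio.chunk.data.audio wording above, and the function names are hypothetical.

```python
import base64
import json


def parse_sse(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line marks the end of one event.
            if event is not None or data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_audio(lines):
    """Return the decoded bytes of every audio.chunk event, in order."""
    chunks = []
    for event, data in parse_sse(lines):
        if event == "audio.chunk":
            payload = json.loads(data)  # assumed shape: {"audio": "<base64>"}
            chunks.append(base64.b64decode(payload["audio"]))
        elif event == "response.completed":
            break
    return chunks
```

The sketch returns the chunks as separate byte strings and leaves playback or concatenation to the caller, since the guide does not say whether each chunk carries its own WAV header.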
Keyboard Shortcuts
Speed up your workflow with these shortcuts. The ⌘ combinations below are for macOS; on other platforms, use the shortcuts shown in the app.
General
• ⌘+Enter – Generate audio
• ⌘+L – Focus text input
• Space – Play/Pause audio
• ⌘+H – Open History
Text Editing
• ⌘+A – Select all text
• ⌘+C – Copy
• ⌘+V – Paste
• ⌘+Z – Undo
Navigation
• ⌘+1 – AI Speech tab
• ⌘+2 – AudioBook tab
• ⌘+3 – Voice Changer tab
• ⌘+4 – Voice Clone tab
• ⌘+5 – History tab
Tips for Best Results
Text Input
Do:
• Use proper punctuation for natural pacing
• Break long texts into paragraphs
• Use commas for pauses
• Write in complete sentences
• Use quotes for dialogue: "Hello," she said.
Don't:
• Use excessive exclamation marks!!!
• Write in ALL CAPS (unless emphasizing)
• Include URLs or code (spell them out instead)
• Use special characters excessively ($$$, ***, etc.)
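If you prepare text with a script before pasting it into OpenVox, the "Don't" items above can be checked automatically. This is a rough sketch with hypothetical names. The patterns are simple heuristics chosen for illustration (two or more exclamation marks, two or more consecutive all-caps words, http/www links, three or more repeated symbols), not rules that OpenVox itself applies.

```python
import re

# Each rule pairs a short label with a pattern for one "Don't" item.
RULES = [
    ("repeated exclamation marks", re.compile(r"!{2,}")),
    ("run of ALL CAPS words", re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b")),
    ("URL", re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)),
    ("repeated special characters", re.compile(r"([$*#@%^&~])\1{2,}")),
]


def lint_tts_text(text):
    """Return the labels of all rules that match the text, in rule order."""
    return [label for label, pattern in RULES if pattern.search(text)]
```

A single capitalized acronym does not trigger the caps rule, which leaves room for occasional emphasis.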
Voice Selection
For Narration:
• Use Narrative voices (Kokoro)
• Professional, clear, storytelling tone
For Conversational:
• Use Conversational A/B voices (Kokoro)
• Natural, friendly tone
For Professional:
• Use Professional voices (Chatterbox or Qwen3 TTS)
• Formal, clear, business-appropriate
For Character Voices:
• Use Chatterbox with exaggeration control
• Experiment with different ages and accents
Speed Settings
• 0.5x-0.8x: Slow, deliberate (learning content)
• 1.0x: Natural pace (default)
• 1.2x-1.5x: Faster (podcast-style)
• 1.5x-2.0x: Very fast (time-saving)
Long Documents
For texts over 5,000 words:
• Kokoro-82M: Fast generation, ideal for long docs
• AudioBook Feature: Best for books with chapters
• Batch Processing: Generate multiple sections at once
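If you want to split a long text into sections yourself before generating, one simple approach is to group whole paragraphs under a word budget. The sketch below is an illustration, not how OpenVox's AudioBook or batch features work internally. The function name is hypothetical, and the default budget of 5,000 words is borrowed from the threshold above.

```python
def split_into_sections(text, max_words=5000):
    """Group paragraphs (separated by blank lines) into sections of at most max_words.

    A single paragraph longer than max_words is kept whole in its own section.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Start a new section when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        sections.append("\n\n".join(current))
    return sections
```

Splitting on paragraph boundaries keeps sentences intact, which matches the earlier advice to break long texts into paragraphs.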
Export & File Management
Export Formats
WAV (Recommended for Quality)
• 24kHz, 16-bit, lossless
• Best for editing or professional use
• Larger file size
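The "larger file size" of WAV can be estimated from the format above, because uncompressed audio takes sample rate × bytes per sample × channels for every second. The sketch assumes mono output and a basic 44-byte PCM header. The guide does not state the channel count, so treat the result as an estimate.

```python
SAMPLE_RATE_HZ = 24_000   # 24kHz, from the export format above
BYTES_PER_SAMPLE = 2      # 16-bit samples
WAV_HEADER_BYTES = 44     # size of a basic PCM WAV header


def wav_size_bytes(seconds, channels=1):
    """Estimate the size of an uncompressed WAV export (mono assumed by default)."""
    audio_bytes = int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels)
    return WAV_HEADER_BYTES + audio_bytes
```

Under these assumptions, one minute of audio is about 2.9 MB and one hour is about 173 MB.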
MP3 (Recommended for Sharing)
• Compressed, widely compatible
• Smaller file size
• Good for podcasts, videos, web
Export Options
From Generation View:
• Click "Export" button after generation
• Choose format and location
• Audio saved with timestamp
From History:
• Right-click any generation
• Select "Export"
• Batch export multiple items
Drag & Drop: Drag audio from history to Finder or File Explorer for quick export without dialogs.
Privacy & Offline Use
100% Private
Your data never leaves your device:
• All AI processing happens locally
• No cloud services or servers
• No analytics or tracking
• No account required
• No internet after model download
You can verify: Use Activity Monitor or Little Snitch on macOS, Settings on iPadOS, or Resource Monitor/Windows Firewall on Windows. After initial model downloads, core generation remains local and app data stays in local application storage.
Completely Offline
After initial setup:
• No internet required for generation
• Perfect for travel
• Works on planes, trains, remote areas
• No API rate limits
• Unlimited generations
You only need internet for initial model downloads and app updates from the App Store or the Windows distribution channel.
Minimum Requirements (macOS)
• Mac: Mac with Apple Silicon (Intel Macs are not supported)
• Disk Space: 2-6GB recommended (varies by models installed)
• RAM: 8GB minimum (16GB recommended)
• Internet: For one-time model download only
Minimum Requirements (iPadOS)
• iPad: iPad with Apple Silicon (M1, M2, M3, M4, or later)
• iPadOS: iPadOS 18.4 or later
• Disk Space: 2-6GB recommended (varies by models installed)
• Internet: For one-time model download only
Minimum Requirements (Windows)
• OS: Windows 10 or Windows 11
• Architecture: x64
• RAM: 8 GB minimum, 16 GB recommended
• Disk Space: 1–2 GB free for the app
• Hardware: Works on CPU; NVIDIA GPU with 4 GB+ VRAM recommended for faster generation
• Internet: Required for installer, model downloads, and updates
Intel Macs Not Supported
On Mac, OpenVox requires Apple Silicon because it uses Apple's MLX framework, which is not available on Intel Macs. There are no plans for Intel Mac support.
Next Steps
Explore More Features
Try Conversations:
1. Click the Conversations tab
2. Add two to four speakers or import a .txt/.pdf script
3. Assign a different voice to each speaker
4. Generate your first interview, skit, or dialogue scene
Try AudioBook Generation:
1. Click the AudioBook tab
2. Import a PDF or text file
3. Generate chapter by chapter
4. Create your first audiobook!
Experiment with Voice Cloning:
1. Click the Voice Clone tab
2. Record or upload 10-20 seconds of clean audio
3. Create your custom voice
4. Use it in AI Speech
Use Voice Changer:
1. Click the Voice Changer tab
2. Import existing audio
3. Transform it to a different voice
4. Export the transformed audio
Optimize Your Workflow
Set Favorites:
• Star your preferred voices
• Quick access across all features
Use Keyboard Shortcuts:
• ⌘+Enter to generate
• Space to play/pause
• ⌘+L to focus text
Organize History: Use search to find past generations, reuse settings from history, and export batches for projects.
Learn Advanced Techniques
Fine-Tune Generation:
• Experiment with Temperature
• Adjust CFG Weight (Chatterbox)
• Use Exaggeration for character voices (Chatterbox)
Optimize for Use Case:
• Podcasts: Conversational voices at 1.2x
• Audiobooks: Narrative voices with chapters
• Professional: Professional voices at 1.0x
• Character Work: Chatterbox or Qwen3 TTS
Batch Processing: Use AudioBook for multi-chapter content, clone voices for consistency, and export in bulk from History.
Getting Help
Documentation
• FAQ: Comprehensive answers to common questions
• Support Docs: Detailed technical information
• This Guide: Getting started and feature overview