Available Providers
OpenAI (Recommended - Best Balance)
Why Choose OpenAI:
- Natural sounding voices
- Wide language support
- Good cost structure
- Easy setup (if already using GPT)
ElevenLabs (Best Voice Quality)
Why Choose ElevenLabs:
- Most natural sounding voices
- Extensive voice library
- Voice cloning available
- Premium sound quality
Cartesia (Enterprise Grade)
Why Choose Cartesia:
- Studio-quality audio
- Ultra-low latency
- Extreme customization
- Enterprise support
Rime AI (Fastest)
Why Choose Rime AI:
- Fastest synthesis
- Good quality
- Real-time performance
- Lower cost
Inworld AI (Character Voices)
Why Choose Inworld:
- Character-specific voices
- Interactive responses
- Custom personality
- Story-driven conversations
Provider Comparison
| Feature | OpenAI | ElevenLabs | Cartesia | Rime AI |
|---|---|---|---|---|
| Voice Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Cost (more stars = lower cost) | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Voices | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Languages | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
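The table can also be read as data. The sketch below copies the star ratings into a dictionary and ranks providers by a weighted sum, which is one possible way to turn priorities into a shortlist. The function name `rank_providers` and the weighting scheme are illustrative assumptions, not part of the platform, and the cost row is assumed to mean that more stars is cheaper.

```python
# Star ratings copied from the comparison table (1-5, higher is better).
# Assumption: for "cost", more stars means a lower price.
RATINGS = {
    "OpenAI":     {"quality": 4, "speed": 4, "cost": 4, "voices": 3, "languages": 5},
    "ElevenLabs": {"quality": 5, "speed": 4, "cost": 3, "voices": 5, "languages": 4},
    "Cartesia":   {"quality": 5, "speed": 5, "cost": 2, "voices": 4, "languages": 4},
    "Rime AI":    {"quality": 4, "speed": 5, "cost": 5, "voices": 3, "languages": 4},
}


def rank_providers(weights):
    """Return provider names sorted by weighted star score, best first."""
    def score(name):
        return sum(RATINGS[name][feature] * weight
                   for feature, weight in weights.items())
    return sorted(RATINGS, key=score, reverse=True)


# Example: speed matters twice as much as cost, nothing else matters.
print(rank_providers({"speed": 2, "cost": 1})[0])
```

Features left out of `weights` are ignored, so a caller only lists what it cares about.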
Setup
OpenAI TTS Setup (Recommended)
Step 1: Get API Key
If you’re already using OpenAI for your LLM, you have everything you need.
- Go to platform.openai.com/api-keys
- Copy your API key (or use an existing one)
- Ensure TTS is enabled on your account
Step 2: Add API Key
- Go to Settings → Developer Settings
- Click “API Keys”
- Select “OpenAI” from provider list (if not already added)
- Paste API key
- Save
Step 3: Configure Agent
- Create/Edit Agent
- Under “Text-to-Speech”, select a model:
  - tts-1 (fastest)
  - tts-1-hd (highest quality)
- Choose voice:
  - Alloy, Echo, Fable, Onyx, Nova, Shimmer
- Select language
- Adjust speed (0.25 - 4.0)
- Save
Step 4: Test
- Make a web call
- Listen to voice quality
- Adjust if needed
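The agent settings above correspond to a small set of values: a model, a voice, and a speed. As a rough illustration, the sketch below validates those values and builds a dictionary shaped like the JSON body of OpenAI's speech endpoint (`model`, `input`, `voice`, `speed`), as far as those field names are understood here. The helper name `build_tts_request` is hypothetical, and nothing is sent over the network.

```python
OPENAI_TTS_MODELS = {"tts-1", "tts-1-hd"}
OPENAI_TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_tts_request(text, model="tts-1", voice="alloy", speed=1.0):
    """Validate TTS settings and return a request body as a dict.

    Hypothetical helper: it only checks values against the options
    listed in this guide and does not call any API.
    """
    voice = voice.lower()
    if model not in OPENAI_TTS_MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"model": model, "input": text, "voice": voice, "speed": speed}


request_body = build_tts_request("Thanks for calling.", voice="Nova", speed=1.1)
print(request_body)
```

Checking the values before a call makes configuration mistakes such as an out-of-range speed fail early, with a clear message.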
ElevenLabs Setup
Step 1: Create Account
- Visit elevenlabs.io
- Sign up for account
- Verify email
Step 2: Get API Key
- Go to Account Settings
- Click “API Keys”
- Copy your API Key
- Save securely
Step 3: Add API Key
- Go to Settings → Developer Settings
- Click “API Keys”
- Select “ElevenLabs” from provider list
- Paste API key
- Click “Test Connection”
- Save
Step 4: Configure Agent
- Create/Edit Agent
- Under “Text-to-Speech”, select “ElevenLabs”
- Choose voice:
  - Browse available voices
  - Listen to samples
  - Select favorite
- Configure stability & clarity
- Save
Step 5: Clone a Voice (Optional)
- Go to ElevenLabs dashboard
- Click “Voice Lab”
- Select “Clone Voice”
- Record 30-120 seconds of your voice
- ElevenLabs creates voice
- Use in agents
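Before uploading a recording for cloning, it can help to confirm that it falls inside the 30-120 second window mentioned above. The following is a minimal sketch using Python's standard `wave` module. It assumes the recording is an uncompressed WAV file, and the helper names are illustrative only.

```python
import wave

MIN_SECONDS = 30
MAX_SECONDS = 120


def wav_duration_seconds(source):
    """Return the length of a WAV recording in seconds.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(duration_seconds):
    """Check a duration against the 30-120 second range from this guide."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


print(is_valid_clone_sample(45))   # inside the range
print(is_valid_clone_sample(10))   # too short
```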
Cartesia Setup
Step 1: Contact Sales
- Visit cartesia.ai
- Request enterprise access
- Complete onboarding
- Receive API credentials
Step 2: Add API Key
- Go to Settings → Developer Settings
- Add Cartesia API key
- Configure for TTS
- Save
Step 3: Configure Agent
- Create/Edit Agent
- Select Cartesia for TTS
- Choose voice (provided by sales)
- Configure quality settings
- Save
Rime AI Setup
Step 1: Create Account
- Visit rime.ai
- Sign up for account
- Verify email
Step 2: Get API Key
- Go to Dashboard
- Click “API Keys”
- Create new key
- Copy key
Step 3: Add API Key
- Go to Settings → Developer Settings
- Select “Rime AI”
- Paste API key
- Save
Step 4: Configure Agent
- Create/Edit Agent
- Select “Rime AI” for TTS
- Choose voice
- Adjust speed and tone
- Save
Cost Comparison
Monthly Costs (1,000 calls, 2 min average per call, 200 characters/min), listed from lowest to highest:
- Rime AI (cheapest)
- OpenAI (good value)
- ElevenLabs (premium)
- Cartesia (enterprise)
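The workload in the heading works out to 1,000 × 2 × 200 = 400,000 characters per month. The sketch below reproduces that arithmetic and shows how a per-character price would convert it into a monthly figure. The price used in the example is a placeholder, not a real rate for any provider. Check each provider's current pricing page before estimating.

```python
def monthly_characters(calls, minutes_per_call, chars_per_minute):
    """Total characters synthesized in a month for a given call volume."""
    return calls * minutes_per_call * chars_per_minute


def monthly_cost(characters, price_per_million_chars):
    """Monthly cost given a price per one million characters."""
    return characters * price_per_million_chars / 1_000_000


chars = monthly_characters(calls=1000, minutes_per_call=2, chars_per_minute=200)
print(chars)  # 400000

# Placeholder price for illustration only (not a real provider rate).
PLACEHOLDER_PRICE_PER_MILLION = 10.0
print(monthly_cost(chars, PLACEHOLDER_PRICE_PER_MILLION))
```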
Voice Selection
OpenAI Voices
| Voice | Gender | Style | Use Case |
|---|---|---|---|
| Alloy | Neutral | Professional | Customer service |
| Echo | Male | Clear | Sales, support |
| Fable | Narrative | Storytelling | Interactive scripts |
| Onyx | Male | Deep, warm | Executive calls |
| Nova | Female | Professional | General purpose |
| Shimmer | Female | Upbeat | Friendly service |
Recommended voices by use case:
- Customer service: Nova or Alloy
- Sales: Echo
- Executive: Onyx
- Friendly tone: Shimmer
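If agents are created programmatically, the use-case recommendations can be kept in a small lookup table. This is a sketch only. The dictionary keys and the `recommend_voice` helper are illustrative names, and the fallback to Alloy is an assumption based on its neutral, professional description in the table.

```python
# Use case -> recommended OpenAI voices, in order of preference.
RECOMMENDED_VOICES = {
    "customer service": ["nova", "alloy"],
    "sales": ["echo"],
    "executive": ["onyx"],
    "friendly": ["shimmer"],
}


def recommend_voice(use_case, default="alloy"):
    """Return the first recommended voice for a use case, or a default."""
    options = RECOMMENDED_VOICES.get(use_case.strip().lower())
    return options[0] if options else default


print(recommend_voice("Sales"))
print(recommend_voice("something else"))
```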
ElevenLabs Voice Selection
Browse the 100+ available voices and narrow them down:
- Listen to samples
- Test with your script
- Consider context
- Select voice that matches brand
Custom Voice Cloning
Create a voice that sounds like you.
Configuration Options
Speed Control
Adjust how fast the agent speaks.
Voice Quality
OpenAI quality options: tts-1 (fastest) and tts-1-hd (highest quality).
Language & Accent
Most providers support multiple languages.
Advanced Features
Voice Cloning (ElevenLabs)
Create custom voices from a 30-120 second recording in the ElevenLabs Voice Lab.
Streaming Audio
Get audio as the response generates.
Emotion & Style Customization
Available through some providers.
Troubleshooting
Voice sounds robotic or unnatural
If you are already using ElevenLabs, adjust the stability and clarity settings. Otherwise, try ElevenLabs for a more natural sound.
Speech is too fast or too slow
Adjust the speed setting. OpenAI accepts 0.25x to 4.0x; the supported range varies by provider.
Wrong pronunciation of company names
Add a phonetic guide to the agent’s knowledge base, or use a custom vocabulary feature if the provider offers one.
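One way to picture a phonetic guide is as a table of respellings applied to the text before it is synthesized. The sketch below shows that idea with plain string substitution. It assumes you can preprocess the text yourself, which this platform may or may not expose, and the company names and respellings are made up for illustration.

```python
import re

# Hypothetical entries: written form -> phonetic respelling.
PRONUNCIATIONS = {
    "Acme": "ACK-mee",
    "Xylo": "ZY-loh",
}


def apply_pronunciations(text, pronunciations):
    """Replace whole-word matches with their phonetic respelling."""
    if not pronunciations:
        return text
    # Longest names first so longer entries win over shorter prefixes.
    names = sorted(pronunciations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: pronunciations[match.group(1)], text)


print(apply_pronunciations("Welcome to Acme support.", PRONUNCIATIONS))
```

Matching on word boundaries avoids rewriting longer words that merely contain a listed name.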
Cost is higher than expected
Switch to Rime AI (lowest cost), or shorten agent responses to reduce the number of characters synthesized.
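Because TTS usage is driven by character count, capping response length reduces cost directly. The sketch below trims a response to a character budget while keeping whole sentences where possible. It is only an illustration of the idea. In practice the shorter length is usually achieved through the agent's prompt, and the `shorten_response` helper is a hypothetical name.

```python
import re


def shorten_response(text, max_chars):
    """Keep as many whole sentences as fit within max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = ""
    for sentence in sentences:
        candidate = f"{kept} {sentence}".strip()
        if len(candidate) > max_chars:
            break
        kept = candidate
    # If even the first sentence is too long, fall back to a hard cut.
    return kept or text[:max_chars].rstrip()


print(shorten_response("Hello there. How are you today? Goodbye.", 20))
```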
Latency too high for real-time calls
With OpenAI, use tts-1 rather than tts-1-hd, or switch to Rime AI or Cartesia.
Best Practices
1. Start with OpenAI
2. Test Multiple Voices
3. Match Voice to Brand
4. Optimize Response Length
5. Monitor Quality
Performance Tips
Real-Time Optimization
Cost Optimization
Quality Ranking
See Also
Speech-to-Text
Configure voice input
AI Models Overview
Compare all AI providers
Agent Setup
Create complete agents