
Available Providers

Why Choose OpenAI:
  • Natural sounding voices
  • Wide language support
  • Good cost structure
  • Easy setup (if already using GPT)
Quality: Excellent
Speed: 1-3 seconds to first audio
Cost: ~$0.015 per 1K characters
Languages: 50+
Voices: 6 options (Alloy, Echo, Fable, Onyx, Nova, Shimmer)

ElevenLabs (Best Voice Quality)

Why Choose ElevenLabs:
  • Most natural sounding voices
  • Extensive voice library
  • Voice cloning available
  • Premium sound quality
Quality: Premium
Speed: 1-2 seconds
Cost: ~$0.03 per 1K characters
Languages: 30+
Voices: 100+ (including custom clones)

Cartesia (Enterprise Grade)

Why Choose Cartesia:
  • Studio-quality audio
  • Ultra-low latency
  • Extreme customization
  • Enterprise support
Quality: Studio
Speed: < 1 second
Cost: Custom pricing
Languages: 40+
Voices: Customizable

Rime AI (Fastest)

Why Choose Rime AI:
  • Fastest synthesis
  • Good quality
  • Real-time performance
  • Lower cost
Quality: Good
Speed: < 500ms (fastest)
Cost: ~$0.01 per 1K characters
Languages: 30+
Voices: 20+

Inworld AI (Character Voices)

Why Choose Inworld:
  • Character-specific voices
  • Interactive responses
  • Custom personality
  • Story-driven conversations
Quality: Good
Speed: 1-2 seconds
Cost: Custom pricing
Languages: 20+
Voices: Character personas

Provider Comparison

FeatureOpenAIElevenLabsCartesiaRime AI
Voice Quality⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Speed⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Cost⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Voices⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Languages⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐

Setup

OpenAI Setup

Step 1: Get API Key
If you're already using OpenAI for your LLM, you have everything you need.
  1. Go to platform.openai.com/api-keys
  2. Copy API key (or use existing)
  3. Ensure TTS is enabled in account
Step 2: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Click “API Keys”
  3. Select “OpenAI” from provider list (if not already added)
  4. Paste API key
  5. Save
Step 3: Configure Agent
  1. Create/Edit Agent
  2. Under “Text-to-Speech”, select:
    • tts-1 (fastest)
    • tts-1-hd (highest quality)
  3. Choose voice:
    • Alloy, Echo, Fable, Onyx, Nova, Shimmer
  4. Select language
  5. Adjust speed (0.25 - 4.0)
  6. Save
Step 4: Test
  1. Make a web call
  2. Listen to voice quality
  3. Adjust if needed
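
The Step 3 options can be captured as a small validated settings object. This is a minimal sketch, assuming a hypothetical `OpenAITTSSettings` class: CallIntel's actual configuration schema is not shown in this guide, so the class and field names are illustrative only. The allowed models, voices, and speed range are the ones listed in the steps above.

```python
from dataclasses import dataclass

# Values taken from the setup steps above.
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class OpenAITTSSettings:
    """Hypothetical container for an agent's OpenAI TTS options."""

    model: str = "tts-1"
    voice: str = "alloy"
    speed: float = 1.0

    def __post_init__(self) -> None:
        # Reject values outside the options documented in this guide.
        if self.model not in ALLOWED_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.voice.lower() not in ALLOWED_VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")


settings = OpenAITTSSettings(model="tts-1", voice="Nova", speed=1.0)
print(settings)
```

Validating up front means a typo in the voice name or an out-of-range speed fails before any test call is made.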

ElevenLabs Setup

Step 1: Create Account
  1. Visit elevenlabs.io
  2. Sign up for account
  3. Verify email
Step 2: Get API Key
  1. Go to Account Settings
  2. Click “API Keys”
  3. Copy your API Key
  4. Save securely
Step 3: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Click “API Keys”
  3. Select “ElevenLabs” from provider list
  4. Paste API key
  5. Click “Test Connection”
  6. Save
Step 4: Configure Agent
  1. Create/Edit Agent
  2. Under “Text-to-Speech”, select “ElevenLabs”
  3. Choose voice:
    • Browse available voices
    • Listen to samples
    • Select favorite
  4. Configure stability & clarity
  5. Save
Step 5: Optional - Clone Your Voice
To create a custom voice:
  1. Go to ElevenLabs dashboard
  2. Click “Voice Lab”
  3. Select “Clone Voice”
  4. Record 30-120 seconds of your voice
  5. ElevenLabs creates voice
  6. Use in agents

Cartesia Setup

Step 1: Contact Sales
  1. Visit cartesia.ai
  2. Request enterprise access
  3. Complete onboarding
  4. Receive API credentials
Step 2: Configure in CallIntel
  1. Go to Settings → Developer Settings
  2. Add Cartesia API key
  3. Configure for TTS
  4. Save
Step 3: Configure Agent
  1. Create/Edit Agent
  2. Select Cartesia for TTS
  3. Choose voice (provided by sales)
  4. Configure quality settings
  5. Save

Rime AI Setup

Step 1: Create Account
  1. Visit rime.ai
  2. Sign up for account
  3. Verify email
Step 2: Get API Key
  1. Go to Dashboard
  2. Click “API Keys”
  3. Create new key
  4. Copy key
Step 3: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Select “Rime AI”
  3. Paste API key
  4. Save
Step 4: Configure Agent
  1. Create/Edit Agent
  2. Select “Rime AI” for TTS
  3. Choose voice
  4. Adjust speed and tone
  5. Save

Cost Comparison

Monthly Costs (1000 calls, 2 min avg, 200 chars/min)

OpenAI (tts-1):
1000 calls × 2 min × 200 chars/min = 400,000 chars
400,000 × ($0.015/1000 chars) = $6/month
Cost per call: $0.006

ElevenLabs:
1000 calls × 2 min × 200 chars/min = 400,000 chars
400,000 × ($0.03/1000 chars) = $12/month
Cost per call: $0.012

Rime AI:
1000 calls × 2 min × 200 chars/min = 400,000 chars
400,000 × ($0.01/1000 chars) = $4/month
Cost per call: $0.004

Cartesia:
Custom pricing (typically enterprise rates)

Cost Ranking:
  1. Rime AI (cheapest)
  2. OpenAI (good value)
  3. ElevenLabs (premium)
  4. Cartesia (enterprise)
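
The arithmetic above can be reproduced for other call volumes with a few lines of Python. This is a minimal sketch: the rates are the approximate per-1K-character figures quoted in this guide, and the function name `monthly_tts_cost` is illustrative, not part of any provider's tooling. Cartesia is omitted because its pricing is custom.

```python
# Approximate per-1K-character rates quoted in this guide (USD).
RATES_PER_1K_CHARS = {
    "OpenAI (tts-1)": 0.015,
    "ElevenLabs": 0.03,
    "Rime AI": 0.01,
}


def monthly_tts_cost(rate_per_1k, calls=1000, minutes_per_call=2.0, chars_per_minute=200):
    """Return (monthly_cost, cost_per_call) in dollars for the given usage."""
    total_chars = calls * minutes_per_call * chars_per_minute
    monthly = total_chars / 1000 * rate_per_1k
    return monthly, monthly / calls


# Print providers from cheapest to most expensive rate.
for provider, rate in sorted(RATES_PER_1K_CHARS.items(), key=lambda item: item[1]):
    monthly, per_call = monthly_tts_cost(rate)
    print(f"{provider}: ${monthly:.2f}/month, ${per_call:.3f}/call")
```

Changing `calls`, `minutes_per_call`, or `chars_per_minute` shows how quickly the gap between providers grows at higher volumes.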

Voice Selection

OpenAI Voices

Voice   | Gender    | Style        | Use Case
--------|-----------|--------------|--------------------
Alloy   | Neutral   | Professional | Customer service
Echo    | Male      | Clear        | Sales, support
Fable   | Narrative | Storytelling | Interactive scripts
Onyx    | Male      | Deep, warm   | Executive calls
Nova    | Female    | Professional | General purpose
Shimmer | Female    | Upbeat       | Friendly service
Recommendation:
  • Customer service: Nova or Alloy
  • Sales: Echo
  • Executive: Onyx
  • Friendly tone: Shimmer

ElevenLabs Voice Selection

Browse 100+ voices by:
- Gender (male, female, neutral)
- Age (young, middle-aged, senior)
- Accent (American, British, etc.)
- Tone (friendly, professional, etc.)
Tips:
  1. Listen to samples
  2. Test with your script
  3. Consider context
  4. Select voice that matches brand

Custom Voice Cloning

Create voice that sounds like you:
Step 1: Record yourself (30-120 seconds)
Step 2: ElevenLabs processes audio
Step 3: Clone available for use
Step 4: Use in agents

Cost: Included with subscription
Quality: Premium
Use: Brand consistency

Configuration Options

Speed Control

Adjust how fast agent speaks:
0.5x = Very slow (for complex topics)
0.75x = Slow (for clarity)
1.0x = Normal (default)
1.25x = Fast (for efficiency)
1.5x = Very fast (for quick updates)

Recommendation: 1.0x for customer service
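
One way to reason about the speed setting is that the multiplier scales the speaking rate, so playback time shrinks in inverse proportion. The sketch below assumes a baseline of 200 characters per minute at 1.0x, the same figure used in the cost comparison. Real speaking rates vary by voice and provider, so treat the result as a rough estimate.

```python
def estimated_duration_seconds(text: str, speed: float = 1.0,
                               chars_per_minute: float = 200.0) -> float:
    """Estimate how long `text` takes to speak at the given speed multiplier.

    `chars_per_minute` is an assumed baseline rate at 1.0x, not a measured value.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return len(text) / (chars_per_minute * speed) * 60.0


message = "Thanks for calling. How can I help you today?"
for speed in (0.75, 1.0, 1.25):
    print(f"{speed}x: {estimated_duration_seconds(message, speed):.1f} s")
```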

Voice Quality

OpenAI Quality Options:
tts-1: Faster, lower latency (50ms)
tts-1-hd: Higher-quality audio (100ms latency)

Recommendation:
- Real-time calls: tts-1
- Pre-recorded calls: tts-1-hd

ElevenLabs Stability/Clarity:
Stability: 0-100 (lower = more variable)
Clarity: 0-100 (higher = more clear)

Balanced: Stability 50, Clarity 75
Natural: Stability 30, Clarity 75
Consistent: Stability 100, Clarity 75
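
The three presets above can be kept in one place and converted when needed. This is a minimal sketch, assuming a downstream API that expects 0.0-1.0 fractions rather than the 0-100 scale shown here. That is an assumption, so check the provider's documentation. The key names `stability` and `clarity` mirror this guide and are not confirmed API field names.

```python
# Presets copied from this guide, on the 0-100 scale.
PRESETS = {
    "balanced": {"stability": 50, "clarity": 75},
    "natural": {"stability": 30, "clarity": 75},
    "consistent": {"stability": 100, "clarity": 75},
}


def to_unit_scale(preset_name: str) -> dict:
    """Return the named preset with each value converted from 0-100 to 0.0-1.0."""
    preset = PRESETS[preset_name.lower()]
    for key, value in preset.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{key} must be between 0 and 100, got {value}")
    return {key: value / 100 for key, value in preset.items()}


print(to_unit_scale("natural"))
```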

Language & Accent

Most providers support multiple languages:
OpenAI: 50+ languages
ElevenLabs: 30+ languages
Rime AI: 30+ languages

Configuration:
1. Select language in agent settings
2. Choose appropriate voice
3. Test pronunciation
4. Save

Advanced Features

Voice Cloning (ElevenLabs)

Create custom voices:
Requirements:
- 30-120 seconds of clear audio
- No background noise
- Natural speaking voice
- 1-2 sentences recommended

Quality: Professional, realistic
Cost: Included in subscription
Use Cases: Brand consistency, CEO voice
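
The duration requirement can be checked before uploading a recording. This is a minimal sketch using Python's standard `wave` module, assuming the sample is an uncompressed WAV file. It checks only the 30-120 second length. Background noise and speaking style still need a listen. The function names are illustrative.

```python
import wave

# Duration bounds from the requirements above.
MIN_SECONDS, MAX_SECONDS = 30, 120


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(path: str) -> bool:
    """True if the recording is within the 30-120 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_valid_clone_sample("my_voice.wav")` (a hypothetical file name) would return `False` for a 10-second clip.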

Streaming Audio

Get audio as response generates:
Advantage: Start speaking immediately
Latency: Very low (< 500ms to first word)
Quality: Same as buffered
Cost: Same pricing
Best For: Real-time conversations
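
The benefit of streaming is easiest to see by measuring the time to the first audio chunk separately from the total. The sketch below consumes chunks as they arrive and records both. `fake_tts_stream` is a hypothetical stand-in for a provider's streaming response, and no real provider API is called.

```python
import time
from typing import Iterable, Iterator, Tuple


def play_streaming(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Consume audio chunks as they arrive.

    Returns (seconds until the first chunk, total bytes received).
    """
    start = time.monotonic()
    first_chunk_at = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic() - start
        # A real player would hand the chunk to the audio device here.
        total_bytes += len(chunk)
    if first_chunk_at is None:
        raise ValueError("stream produced no audio")
    return first_chunk_at, total_bytes


def fake_tts_stream(n_chunks: int = 5, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    for _ in range(n_chunks):
        time.sleep(delay)  # simulate synthesis time per chunk
        yield b"\x00" * 320


first, total = play_streaming(fake_tts_stream())
print(f"first chunk after {first * 1000:.0f} ms, {total} bytes total")
```

With buffered playback the caller would wait for all chunks before hearing anything. With streaming, playback can begin once the first chunk arrives.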

Emotion & Style Customization

Available through some providers:
OpenAI: Fixed emotions per voice
ElevenLabs: Variability control (emotion)
Rime AI: Emotion settings available

Use: Match tone to conversation context

Troubleshooting

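If the voice sounds unnatural, try ElevenLabs for a more natural sound, or adjust the stability/clarity settings if you already use ElevenLabs.
If speech is too fast or too slow, adjust the speed setting (the supported range depends on the provider; OpenAI accepts 0.25 - 4.0).
If words are mispronounced, add a phonetic guide to the agent's knowledge base, or use a custom vocabulary feature if available.
If costs are too high, switch to Rime AI (lowest cost), or reduce character count by shortening responses.
If latency is too high, use tts-1 (not tts-1-hd) for OpenAI, or switch to Rime AI/Cartesia.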

Best Practices

1. Start with OpenAI

Best balance of:
- Quality (natural sounding)
- Cost (reasonable)
- Languages (50+)
- Simplicity (integrated with GPT)

2. Test Multiple Voices

Create test agents with different voices
Make sample calls
Compare quality
Choose best for brand

3. Match Voice to Brand

Professional service: Nova or Alloy
Friendly service: Shimmer or custom
Technical support: Echo or Onyx
Sales: Nova (warm and professional)

4. Optimize Response Length

Shorter responses = Lower cost + Better experience
Keep responses under 200 characters
Use bullet points instead of paragraphs
Remove unnecessary words
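
The 200-character guideline can be enforced in code before text is sent for synthesis. This is a minimal sketch, not a CallIntel feature: it keeps whole sentences while they fit within the limit and falls back to cutting at a word boundary when even the first sentence is too long. The function name `shorten_response` is illustrative.

```python
import re

MAX_CHARS = 200  # guideline from this section


def shorten_response(text: str, max_chars: int = MAX_CHARS) -> str:
    """Trim `text` to at most `max_chars`, preferring sentence boundaries."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = []
    length = 0
    for sentence in sentences:
        added = len(sentence) + (1 if kept else 0)  # +1 for the joining space
        if length + added > max_chars:
            break
        kept.append(sentence)
        length += added
    if kept:
        return " ".join(kept)

    # The first sentence alone is too long: cut at the last word boundary.
    return text[:max_chars].rsplit(" ", 1)[0]


reply = "Your order has shipped. " * 20
print(len(shorten_response(reply)))
```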

5. Monitor Quality

Weekly Checks:
- Listen to sample calls
- Check pronunciation
- Verify speed
- Ensure consistency

Performance Tips

Real-Time Optimization

Configuration | Latency   | Quality
--------------|-----------|----------
OpenAI tts-1  | 50-100ms  | Good
Rime AI       | 50-150ms  | Good
Cartesia      | < 50ms    | Excellent
ElevenLabs    | 100-200ms | Excellent

Cost Optimization

Rank | Provider   | Cost/Call
-----|------------|----------
1    | Rime AI    | $0.004
2    | OpenAI     | $0.006
3    | ElevenLabs | $0.012
4    | Cartesia   | Enterprise

Quality Ranking

1. ElevenLabs (most natural)
2. Cartesia (studio quality)
3. OpenAI (professional)
4. Rime AI (good quality)
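
The three rankings in this section can be combined into a simple selection rule: filter by a latency budget, then pick the best remaining provider by cost or quality. This is a minimal sketch built only from the figures above. It uses the upper end of each latency range from the Real-Time Optimization table, and treats Cartesia's enterprise pricing as the last cost rank, as in the Cost Optimization table. The function `pick_provider` is illustrative.

```python
# Data copied from the tables above. Rank 1 is best.
PROVIDERS = {
    "OpenAI tts-1": {"max_latency_ms": 100, "cost_rank": 2, "quality_rank": 3},
    "Rime AI":      {"max_latency_ms": 150, "cost_rank": 1, "quality_rank": 4},
    "Cartesia":     {"max_latency_ms": 50,  "cost_rank": 4, "quality_rank": 2},
    "ElevenLabs":   {"max_latency_ms": 200, "cost_rank": 3, "quality_rank": 1},
}


def pick_provider(latency_budget_ms: int, priority: str = "cost"):
    """Return the best provider within the latency budget, or None.

    `priority` is "cost" or "quality" and selects which ranking to use.
    """
    if priority not in ("cost", "quality"):
        raise ValueError("priority must be 'cost' or 'quality'")
    candidates = [
        name for name, info in PROVIDERS.items()
        if info["max_latency_ms"] <= latency_budget_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: PROVIDERS[name][f"{priority}_rank"])


print(pick_provider(100, "cost"))
print(pick_provider(200, "quality"))
```

The rule is deliberately crude. Measured latency on your own calls is a better input than the published ranges.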

See Also


Support

OpenAI TTS Docs

Contact Support