Available Providers
Deepgram (Recommended - Best Performance)
Why Choose Deepgram:
- Lowest latency (200 ms)
- Highest accuracy
- Excellent cost/performance ratio
- Best for real-time conversations
AssemblyAI (Best for Accuracy)
Why Choose AssemblyAI:
- Exceptional accuracy
- Great documentation
- Reliable service
- Good cost structure
Cartesia (Enterprise Grade)
Why Choose Cartesia:
- Enterprise-level performance
- Custom models available
- Premium support
- Dedicated infrastructure
- Latency: < 250ms
- Accuracy: 96%+
- Cost: Custom pricing
- Languages: 40+
- Special Features: Domain adaptation, custom models
Provider Comparison
| Feature | Deepgram | AssemblyAI | Cartesia |
|---|---|---|---|
| Latency | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Cost | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Languages | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Support | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
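One way to turn the star ratings above into a decision is to weight each row by how much it matters for your use case. The sketch below is illustrative only: the `RATINGS` values are copied from the table, while the weights and the `rank_providers` helper are hypothetical.

```python
# Star ratings copied from the comparison table (1-5 stars per feature).
RATINGS = {
    "Deepgram":   {"latency": 5, "accuracy": 5, "cost": 4, "languages": 4, "support": 4},
    "AssemblyAI": {"latency": 4, "accuracy": 5, "cost": 3, "languages": 5, "support": 5},
    "Cartesia":   {"latency": 5, "accuracy": 5, "cost": 2, "languages": 4, "support": 5},
}


def rank_providers(weights):
    """Return (provider, score) pairs sorted by weighted star total, best first."""
    scores = {
        name: sum(weights.get(feature, 0) * stars for feature, stars in row.items())
        for name, row in RATINGS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights for a real-time voice agent: latency matters most.
realtime_weights = {"latency": 3, "accuracy": 2, "cost": 2, "languages": 1, "support": 1}
ranking = rank_providers(realtime_weights)
```

With different weights (for example, a heavier weight on `languages` or `support`) the order can change, so treat the result as a starting point rather than a verdict.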
Setup
Deepgram Setup (Recommended)
Step 1: Create Account
- Visit deepgram.com
- Sign up for free account
- Verify email
- Go to Console
- Click “API Keys” in left menu
- Click “Create a new API Key”
- Select “Scoped API Key”
- Assign “speech-to-text” scope
- Copy key
- Go to Settings → Developer Settings
- Click “API Keys”
- Select “Deepgram” from provider list
- Paste API key
- Click “Test Connection”
- Save
- Create/Edit Agent
- Under “Speech-to-Text”, select:
- nova-2 (default, best accuracy)
- nova (faster)
- enhanced (most accurate)
- Select language and options
- Save
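For readers who want to see what these selections might look like at the API level, here is a minimal sketch. It assumes the platform forwards the agent settings to Deepgram's streaming endpoint (`wss://api.deepgram.com/v1/listen`) as the `model`, `language`, and `punctuate` query parameters; the `build_listen_url` helper is hypothetical and the platform's actual behavior may differ.

```python
from urllib.parse import urlencode

# Deepgram's streaming transcription endpoint.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-2", language="en-US", punctuate=True):
    """Build a streaming URL from the model, language, and punctuation choices."""
    params = {
        "model": model,
        "language": language,
        # Deepgram expects lowercase booleans in the query string.
        "punctuate": str(punctuate).lower(),
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


url = build_listen_url()
```

Swapping `model="nova"` into the call mirrors choosing the faster model in the agent settings.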
AssemblyAI Setup
Step 1: Create Account
- Visit assemblyai.com
- Sign up for account
- Verify email
- Go to Dashboard
- Click “API Keys” in sidebar
- Copy API key under “Your API Token”
- Go to Settings → Developer Settings
- Click “API Keys”
- Select “AssemblyAI” from provider list
- Paste API key
- Click “Test Connection”
- Save
- Create/Edit Agent
- Under “Speech-to-Text”, select:
- “Standard” (default)
- “Enhanced” (more accurate)
- Configure language
- Enable diarization if needed
- Save
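The language and diarization options above could map onto an AssemblyAI transcription request roughly as follows. This is a sketch under assumptions: it targets AssemblyAI's `/v2/transcript` endpoint with the `language_code`, `language_detection`, and `speaker_labels` fields, and the `build_transcript_request` helper is hypothetical. It only builds the request and does not send it.

```python
import json
from urllib.request import Request

ASSEMBLYAI_TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url, language_code=None, diarize=False):
    """Build (but do not send) a transcription request.

    If no language code is given, ask AssemblyAI to detect the language.
    """
    payload = {"audio_url": audio_url, "speaker_labels": diarize}
    if language_code is None:
        payload["language_detection"] = True
    else:
        payload["language_code"] = language_code
    return Request(
        ASSEMBLYAI_TRANSCRIPT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Placeholder key and URL, for illustration only.
request = build_transcript_request(
    "YOUR_API_KEY", "https://example.com/call.wav", diarize=True
)
```

Passing the result to `urllib.request.urlopen` would submit the job; that step is left out here so the sketch runs without network access or a real key.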
Cartesia Setup
Step 1: Contact Sales
- Visit cartesia.ai
- Request enterprise access
- Complete onboarding
- Receive API credentials
- Go to Settings → Developer Settings
- Click “API Keys”
- Select “Cartesia” from provider list
- Paste API key
- Click “Test Connection”
- Save
- Create/Edit Agent
- Under “Speech-to-Text”, select Cartesia
- Configure custom model (if provided)
- Set language and options
- Save
Cost Comparison
Monthly Costs (1000 calls, 2 min avg)
Cost Example (1000 calls/month):
Deepgram:
- 1000 calls × 2 minutes = 2000 minutes
- 2000 × 0.0043 USD = 8.60 USD/month
- Cost per call: 0.0086 USD
AssemblyAI:
- 1000 calls × 2 minutes = 2000 minutes
- 2000 × 0.0075 USD = 15.00 USD/month
- Cost per call: 0.015 USD
Cartesia:
- Custom pricing (typically 5000-10000 USD/month for enterprise)
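The arithmetic above can be reproduced for your own call volume with a small helper. This is a sketch: `monthly_cost` is a hypothetical function, and the per-minute rates are the ones used in the example, which may not match current provider pricing.

```python
def monthly_cost(calls, avg_minutes, rate_per_minute):
    """Return (total minutes, monthly cost in USD, cost per call in USD)."""
    minutes = calls * avg_minutes
    total = minutes * rate_per_minute
    return minutes, round(total, 2), round(total / calls, 4)


# Deepgram rate from the example: 0.0043 USD per minute.
deepgram = monthly_cost(1000, 2, 0.0043)

# The 0.0075 USD per minute rate from the example.
higher_rate = monthly_cost(1000, 2, 0.0075)
```

Changing `calls` or `avg_minutes` shows how quickly the gap between the two rates grows with volume.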
Language Support
Deepgram Languages
AssemblyAI Languages
AssemblyAI supports 99 languages with automatic detection.
Auto-Detection:
Configuration Options
Model Selection
Deepgram Models
Nova 2 (Default)
Key Configuration Options
Punctuation
Advanced Features
Real-Time Transcription
Get transcripts as the customer speaks:
Accuracy Enhancements
Boost Accuracy:
- Provide domain vocabulary
- Add expected phrases
- Use correct language setting
- Choose higher-accuracy model
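As an illustration of providing domain vocabulary, the sketch below builds a query string using Deepgram's `keywords` parameter in its `term:boost` form. Treat this as an assumption: keyword boosting is not available on every model, the platform may expose vocabulary differently, and the `vocabulary_params` helper and sample terms are hypothetical.

```python
from urllib.parse import urlencode


def vocabulary_params(terms, boost=2):
    """Build repeated `keywords=term:boost` pairs for domain vocabulary."""
    return urlencode([("keywords", f"{term}:{boost}") for term in terms])


# Hypothetical vocabulary for a billing support agent.
query = vocabulary_params(["refund", "chargeback"])
```

The colon is percent-encoded as `%3A` in the resulting query string, which is the standard URL encoding for that character.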
Sentiment Analysis
Available through some providers:
Troubleshooting
Transcripts are inaccurate
Try: Deepgram Nova 2 (best accuracy), AssemblyAI Enhanced model, or add domain vocabulary.
Calls sound cut off or delayed
Switch to Deepgram (lowest latency). Check internet connection quality.
Cost is higher than expected
Monitor actual usage in provider dashboard. Consider switching to Deepgram for better pricing.
Not recognizing foreign languages
Ensure correct language is selected. Some providers need explicit language configuration.
API key authentication fails
Verify key is correct, not expired, and has correct permissions. Regenerate key in provider dashboard.
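A frequent cause of authentication failures is a key pasted with stray whitespace or quotes. The sketch below shows one way to sanity-check a key locally before use. The `clean_api_key` and `deepgram_auth_header` helpers are hypothetical; the `Authorization: Token <key>` header format is the one Deepgram's API expects.

```python
def clean_api_key(raw):
    """Strip surrounding whitespace and quotes, then reject obviously bad keys."""
    key = raw.strip().strip("\"'")
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("API key is empty or contains whitespace")
    return key


def deepgram_auth_header(raw_key):
    """Build the Authorization header for Deepgram from a pasted key."""
    return {"Authorization": f"Token {clean_api_key(raw_key)}"}


# A key copied with quotes and a trailing newline still yields a clean header.
header = deepgram_auth_header('  "abc123"\n')
```

This only catches formatting problems; an expired key or a missing scope still has to be checked in the provider dashboard.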
Best Practices
1. Start with Deepgram Nova 2
2. Monitor Accuracy
3. Use Language Setting
4. Enable Helpful Features
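To monitor accuracy in practice, one common metric is word error rate (WER): the word-level edit distance between a human-checked reference transcript and the provider's output, divided by the number of reference words. The sketch below is a minimal, dependency-free version; the `word_error_rate` helper and the sample sentences are illustrative and not part of any provider's API.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("please cancel my order", "please cancel the order")
```

Scoring a small sample of calls this way each week gives a trend line that makes it easier to compare models or providers on your own audio.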
Performance Tips
Optimize Response Time
Monitor Quality
Create dashboards for:
See Also
Text-to-Speech
Configure voice output
Speaker Identification
Identify different speakers
Agent Setup
Configure complete agents