Available Providers

Deepgram

Why Choose Deepgram:
  • Lowest latency (< 200ms)
  • Highest accuracy
  • Excellent cost/performance ratio
  • Best for real-time conversations
Specifications:
  • Latency: < 200ms (fastest)
  • Accuracy: 95%+
  • Cost: ~0.0043 USD per minute
  • Languages: 40+

AssemblyAI (Best for Accuracy)

Why Choose AssemblyAI:
  • Exceptional accuracy
  • Great documentation
  • Reliable service
  • Good cost structure
Specifications:
  • Latency: 200-400ms
  • Accuracy: 95%+
  • Cost: ~0.0075 USD per minute
  • Languages: 99+
  • Special Features: Speaker diarization, entity detection

Cartesia (Enterprise Grade)

Why Choose Cartesia:
  • Enterprise-level performance
  • Custom models available
  • Premium support
  • Dedicated infrastructure
Specifications:
  • Latency: < 250ms
  • Accuracy: 96%+
  • Cost: Custom pricing
  • Languages: 40+
  • Special Features: Domain adaptation, custom models

Provider Comparison

FeatureDeepgramAssemblyAICartesia
Latency⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Accuracy⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Cost⭐⭐⭐⭐⭐⭐⭐⭐⭐
Languages⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Support⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐

Deepgram Setup

Step 1: Create Account
  1. Visit deepgram.com
  2. Sign up for free account
  3. Verify email
Step 2: Get API Key
  1. Go to Console
  2. Click “API Keys” in left menu
  3. Click “Create a new API Key”
  4. Select “Scoped API Key”
  5. Assign “speech-to-text” scope
  6. Copy key
Step 3: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Click “API Keys”
  3. Select “Deepgram” from provider list
  4. Paste API key
  5. Click “Test Connection”
  6. Save
Step 4: Configure Agent
  1. Create/Edit Agent
  2. Under “Speech-to-Text”, select:
    • nova-2 (default, best accuracy)
    • nova (faster)
    • enhanced (best for noisy environments)
  3. Select language and options
  4. Save

AssemblyAI Setup

Step 1: Create Account
  1. Visit assemblyai.com
  2. Sign up for account
  3. Verify email
Step 2: Get API Key
  1. Go to Dashboard
  2. Click “API Keys” in sidebar
  3. Copy API key under “Your API Token”
Step 3: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Click “API Keys”
  3. Select “AssemblyAI” from provider list
  4. Paste API key
  5. Click “Test Connection”
  6. Save
Step 4: Configure Agent
  1. Create/Edit Agent
  2. Under “Speech-to-Text”, select:
    • “Standard” (default)
    • “Enhanced” (more accurate)
  3. Configure language
  4. Enable diarization if needed
  5. Save

Cartesia Setup

Step 1: Contact Sales
  1. Visit cartesia.ai
  2. Request enterprise access
  3. Complete onboarding
  4. Receive API credentials
Step 2: Get API Key
  Cartesia provides the API key during onboarding.
Step 3: Add to CallIntel
  1. Go to Settings → Developer Settings
  2. Click “API Keys”
  3. Select “Cartesia” from provider list
  4. Paste API key
  5. Click “Test Connection”
  6. Save
Step 4: Configure Agent
  1. Create/Edit Agent
  2. Under “Speech-to-Text”, select Cartesia
  3. Configure custom model (if provided)
  4. Set language and options
  5. Save

Cost Comparison

Monthly Costs (1000 calls, 2 min avg)

Deepgram:
  • 1000 calls × 2 minutes = 2000 minutes
  • 2000 × 0.0043 USD = 8.60 USD/month
  • Cost per call: 0.0086 USD
AssemblyAI:
  • 1000 calls × 2 minutes = 2000 minutes
  • 2000 × 0.0075 USD = 15 USD/month
  • Cost per call: 0.015 USD
Cartesia:
  • Custom pricing (typically 5000-10000 USD/month for enterprise)
Recommendation: Use Deepgram for the best cost-to-performance ratio.
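
The arithmetic above extends to other call volumes. The following is a minimal sketch, assuming the per-minute rates quoted on this page (actual provider pricing may differ). The names `RATES_USD_PER_MINUTE` and `monthly_cost` are illustrative and not part of CallIntel.

```python
# Per-minute rates as quoted in this guide (USD); treat them as estimates.
RATES_USD_PER_MINUTE = {
    "deepgram": 0.0043,
    "assemblyai": 0.0075,
}


def monthly_cost(calls: int, avg_minutes: float, rate_per_minute: float) -> float:
    """Return the estimated monthly STT cost in USD."""
    return calls * avg_minutes * rate_per_minute


if __name__ == "__main__":
    calls = 1000
    for provider, rate in RATES_USD_PER_MINUTE.items():
        total = monthly_cost(calls=calls, avg_minutes=2, rate_per_minute=rate)
        # Report both the monthly total and the per-call figure.
        print(f"{provider}: {total:.2f} USD/month, {total / calls:.4f} USD/call")
```

Cartesia is left out because its pricing is custom and has no per-minute rate to plug in.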

Language Support

Deepgram Languages

English (US, UK, Australia, India, New Zealand)
Spanish, French, German, Italian, Dutch, Russian
Portuguese, Mandarin, Cantonese, Korean, Japanese
Arabic, Hindi, and 20+ more
Configure in Agent:
1. Edit Agent
2. Under STT, select language
3. Use language codes:
   - en (English)
   - es (Spanish)
   - fr (French)
   - de (German)
   - etc.
4. Save

AssemblyAI Languages

AssemblyAI supports 99+ languages with automatic detection.
Auto-Detection:
1. Set language to "Auto"
2. System detects language automatically
3. Great for multilingual operations

Configuration Options

Model Selection

Deepgram Models

Nova 2 (Default)
Best For: Most use cases
Accuracy: Highest
Speed: Fast
Cost: Standard
Recommended: YES ✓
Nova
Best For: Budget-conscious
Accuracy: Excellent
Speed: Very Fast
Cost: Standard
Enhanced
Best For: Noisy environments
Accuracy: Very High
Speed: Standard
Cost: Standard

Key Configuration Options

Punctuation
Enabled: Add punctuation to transcripts
Disabled: Raw text without punctuation
Default: Enabled
Recommendation: Enable for better readability
Profanity Filter
Enabled: Replace profanity with [profanity]
Disabled: Include all words
Default: Disabled
Recommendation: Enable for customer interactions
Number Conversion
Enabled: Convert "one" to "1"
Disabled: Keep as written text
Default: Enabled
Recommendation: Enable for phone numbers, quantities
Speaker Diarization (AssemblyAI)
Enabled: Identify different speakers
Disabled: Single speaker transcript
Cost: Additional (check pricing)
Recommendation: Enable for multi-party calls
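
To keep these settings consistent across agents, it can help to record them in one place. The sketch below is a hypothetical container, not CallIntel's actual configuration format: the class `SttOptions`, its field names, and the `validate` helper are assumptions made for illustration. The defaults mirror the ones listed above. The guide does not state a default for diarization, so it is assumed to be off.

```python
from dataclasses import dataclass, asdict


@dataclass
class SttOptions:
    """Hypothetical container for the options described in this guide."""
    punctuation: bool = True        # guide default: enabled
    profanity_filter: bool = False  # guide default: disabled
    number_conversion: bool = True  # guide default: enabled
    diarization: bool = False       # assumed off; the guide gives no default


def validate(provider: str, options: SttOptions) -> list[str]:
    """Return warnings for option/provider combinations the guide does not list."""
    warnings = []
    if options.diarization and provider != "assemblyai":
        warnings.append("Speaker diarization is listed for AssemblyAI only.")
    return warnings


if __name__ == "__main__":
    opts = SttOptions(profanity_filter=True, diarization=True)
    print(asdict(opts))
    print(validate("deepgram", opts))
```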

Advanced Features

Real-Time Transcription

Get transcripts as the customer speaks:
Advantage: Immediate feedback
Use Case: Live coaching, monitoring
Cost: Higher (streaming mode)
Latency: 200-500ms

Accuracy Enhancements

Boost Accuracy:
  1. Provide domain vocabulary
  2. Add expected phrases
  3. Use correct language setting
  4. Choose higher-accuracy model
Example:
Your company: "Acme Corp"
Products: "WidgetPro", "GadgetMax"
Action: Add these terms to the STT configuration
Result: Better recognition of custom terms

Sentiment Analysis

Available through some providers:
Sentiment Detection:
- Positive, Negative, Neutral
- Confidence score
- Available in: AssemblyAI

Troubleshooting

Poor transcription accuracy: Try Deepgram Nova 2 (best accuracy), the AssemblyAI Enhanced model, or add domain vocabulary.
High latency: Switch to Deepgram (lowest latency) and check internet connection quality.
Higher-than-expected costs: Monitor actual usage in the provider dashboard. Consider switching to Deepgram for better pricing.
Wrong or undetected language: Ensure the correct language is selected. Some providers need explicit language configuration.
API key or connection errors: Verify the key is correct, not expired, and has the correct permissions. If needed, regenerate the key in the provider dashboard.

Best Practices

1. Start with Deepgram Nova 2

Best balance of:
- Accuracy (highest)
- Speed (fast)
- Cost (competitive)
- Languages (40+)

2. Monitor Accuracy

Weekly Review:
- Sample 10 calls
- Check transcript accuracy
- Note problem phrases
- Adjust configuration if needed
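
One common way to put a number on "transcript accuracy" during the weekly review is word error rate (WER): the word-level edit distance between a human-corrected reference and the STT output, divided by the reference length. This guide does not say how CallIntel measures accuracy, so the following is a sketch of the general technique. The function name and the simple lowercase/whitespace normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "thank you for calling acme corp"
    hypothesis = "thank you for calling acne corp"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

A rough accuracy percentage is `(1 - WER) * 100`. Punctuation and number formatting affect the score, so compare transcripts produced with the same option settings.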

3. Use Language Setting

Always specify the language explicitly.
Auto-detection works but is slower.
A specified language is about 5-10ms faster.

4. Enable Helpful Features

✓ Enable: Punctuation (readability)
✓ Enable: Number conversion (clarity)
✓ Consider: Profanity filter (compliance)
✓ Consider: Diarization (if multi-party)

Performance Tips

Optimize Response Time

Action               | Impact    | Recommendation
---------------------|-----------|----------------
Lower-latency model  | Critical  | Deepgram Nova 2
Specify language     | ~10ms     | Always specify
Enable streaming     | 200-300ms | Default in CallIntel
Reduce audio quality | -5ms      | Not recommended

Monitor Quality

Create dashboards for:
- Transcription accuracy %
- Processing latency (ms)
- Cost per minute
- Language breakdown
- Error rate
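
Most of these dashboard metrics can be derived from per-call records. The sketch below assumes a hypothetical record shape (`latency_ms`, `minutes`, `cost_usd`, `language`, `error`); CallIntel's actual export format is not described in this guide. Accuracy is omitted because it needs reference transcripts.

```python
from collections import Counter
from statistics import mean


def summarize(calls: list[dict]) -> dict:
    """Aggregate per-call records (hypothetical shape) into dashboard metrics."""
    if not calls:
        raise ValueError("no call records to summarize")

    succeeded = [c for c in calls if not c["error"]]
    total_minutes = sum(c["minutes"] for c in calls)
    total_cost = sum(c["cost_usd"] for c in calls)

    return {
        # Latency is averaged over successful calls only.
        "avg_latency_ms": mean(c["latency_ms"] for c in succeeded) if succeeded else None,
        "cost_per_minute_usd": total_cost / total_minutes if total_minutes else None,
        "language_breakdown": dict(Counter(c["language"] for c in calls)),
        "error_rate": (len(calls) - len(succeeded)) / len(calls),
    }


if __name__ == "__main__":
    sample = [
        {"latency_ms": 200, "minutes": 2.0, "cost_usd": 0.0086, "language": "en", "error": False},
        {"latency_ms": 300, "minutes": 2.0, "cost_usd": 0.0086, "language": "es", "error": False},
        {"latency_ms": 0, "minutes": 0.0, "cost_usd": 0.0, "language": "en", "error": True},
    ]
    print(summarize(sample))
```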
