Speech-to-Text Integration

Available Providers

Deepgram (Recommended - Best Performance)

Why Choose Deepgram:

Lowest latency (under 200ms)
High accuracy (95%+)
Excellent cost/performance ratio
Best for real-time conversations

Specifications:

Latency: < 200ms (fastest)
Accuracy: 95%+
Cost: ~0.0043 USD per minute
Languages: 40+

AssemblyAI (Best for Accuracy)

Why Choose AssemblyAI:

Exceptional accuracy
Great documentation
Reliable service
Good cost structure

Specifications:

Latency: 200-400ms
Accuracy: 95%+
Cost: ~0.0075 USD per minute
Languages: 99+
Special Features: Speaker diarization, entity detection

Cartesia (Enterprise Grade)

Why Choose Cartesia:

Enterprise-level performance
Custom models available
Premium support
Dedicated infrastructure

Specifications:

Latency: < 250ms
Accuracy: 96%+
Cost: Custom pricing
Languages: 40+
Special Features: Domain adaptation, custom models

Provider Comparison

Feature   | Deepgram   | AssemblyAI | Cartesia
----------|------------|------------|-----------
Latency   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐
Accuracy  | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐
Cost      | ⭐⭐⭐⭐   | ⭐⭐⭐     | ⭐⭐
Languages | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐
Support   | ⭐⭐⭐⭐   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐

Setup

Deepgram Setup (Recommended)

Step 1: Create Account

Visit deepgram.com
Sign up for a free account
Verify email

Step 2: Get API Key

Go to Console
Click “API Keys” in left menu
Click “Create a new API Key”
Select “Scoped API Key”
Assign “speech-to-text” scope
Copy key

Step 3: Add to CallIntel

Go to Settings → Developer Settings
Click “API Keys”
Select “Deepgram” from provider list
Paste API key
Click “Test Connection”
Save
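If Test Connection fails, calling Deepgram directly can show whether the problem is the key itself rather than CallIntel. The sketch below is a minimal example using only Python's standard library. It assumes Deepgram's pre-recorded endpoint `https://api.deepgram.com/v1/listen`, the `Token <key>` authorization scheme, and an audio file URL that you host yourself; the function names are illustrative and are not part of CallIntel.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_check(api_key: str, audio_url: str, model: str = "nova-2") -> urllib.request.Request:
    """Build a small pre-recorded transcription request for testing a key."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # Deepgram uses "Token", not "Bearer"
            "Content-Type": "application/json",
        },
        method="POST",
    )


def check_deepgram_key(api_key: str, audio_url: str) -> bool:
    """Return True if Deepgram accepts the key and transcribes the file."""
    request = build_deepgram_check(api_key, audio_url)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # 401/403 usually indicates a wrong, expired, or under-scoped key
        print(f"Deepgram returned HTTP {err.code}")
        return False
```

For example, `check_deepgram_key(key, audio_url)` with a URL to a short recording you control should return `True` for a working key. Network errors other than HTTP status codes are not caught, so connectivity problems surface as exceptions instead of being mistaken for a bad key.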

Step 4: Configure Agent

Create/Edit Agent
Under “Speech-to-Text”, select:
- nova-2 (default, best accuracy)
- nova (faster)
- enhanced (best for noisy environments)
Select language and options
Save

AssemblyAI Setup

Step 1: Create Account

Visit assemblyai.com
Sign up for an account
Verify email

Step 2: Get API Key

Go to Dashboard
Click “API Keys” in sidebar
Copy API key under “Your API Token”

Step 3: Add to CallIntel

Go to Settings → Developer Settings
Click “API Keys”
Select “AssemblyAI” from provider list
Paste API key
Click “Test Connection”
Save

Step 4: Configure Agent

Create/Edit Agent
Under “Speech-to-Text”, select:
- “Standard” (default)
- “Enhanced” (more accurate)
Configure language
Enable diarization if needed
Save
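When diarization is on, AssemblyAI returns an `utterances` list in which each entry carries a `speaker` label, `text`, and `start`/`end` timestamps in milliseconds. The sketch below assumes that CallIntel's diarization toggle corresponds to AssemblyAI's `speaker_labels` request field (an assumption about CallIntel's internals) and shows how a diarized transcript might be rendered as readable lines. The function name is hypothetical.

```python
from typing import Any


def format_utterances(transcript: dict[str, Any]) -> str:
    """Render AssemblyAI diarized utterances as 'Speaker A: ...' lines."""
    # "utterances" is absent or None when speaker_labels was not enabled
    utterances = transcript.get("utterances") or []
    lines = []
    for utt in utterances:
        seconds = utt["start"] / 1000  # AssemblyAI timestamps are in milliseconds
        lines.append(f"[{seconds:6.1f}s] Speaker {utt['speaker']}: {utt['text']}")
    return "\n".join(lines)
```

Returning an empty string for non-diarized transcripts keeps the helper safe to call on every call record, whether or not diarization was enabled for that agent.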

Cartesia Setup

Step 1: Contact Sales

Visit cartesia.ai
Request enterprise access
Complete onboarding
Receive API credentials

Step 2: Get API Key

Cartesia provides your API key during onboarding.

Step 3: Add to CallIntel

Go to Settings → Developer Settings
Click “API Keys”
Select “Cartesia” from provider list
Paste API key
Click “Test Connection”
Save

Step 4: Configure Agent

Create/Edit Agent
Under “Speech-to-Text”, select Cartesia
Configure custom model (if provided)
Set language and options
Save

Cost Comparison

Monthly Costs (1000 calls, 2 min avg)

Deepgram:

1000 calls × 2 minutes = 2000 minutes
2000 × 0.0043 USD = 8.60 USD/month
Cost per call: 0.0086 USD

AssemblyAI:

1000 calls × 2 minutes = 2000 minutes
2000 × 0.0075 USD = 15 USD/month
Cost per call: 0.015 USD

Cartesia:

Custom pricing (typically 5000-10000 USD/month for enterprise)

Recommendation: Use Deepgram for cost-optimal performance.
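The arithmetic above generalizes to any call volume. The sketch below uses the per-minute rates quoted on this page; rates change, so confirm current pricing with each provider before budgeting. Cartesia is omitted because its pricing is custom.

```python
def monthly_stt_cost(calls: int, avg_minutes: float, usd_per_minute: float) -> tuple[float, float]:
    """Return (monthly cost, cost per call) in USD for per-minute STT pricing."""
    monthly = calls * avg_minutes * usd_per_minute
    per_call = avg_minutes * usd_per_minute
    return round(monthly, 2), round(per_call, 4)


# Rates quoted on this page (USD per minute)
RATES = {"Deepgram": 0.0043, "AssemblyAI": 0.0075}

for name, rate in RATES.items():
    monthly, per_call = monthly_stt_cost(calls=1000, avg_minutes=2, usd_per_minute=rate)
    print(f"{name}: {monthly:.2f} USD/month, {per_call} USD/call")
```

With 1000 calls averaging 2 minutes, this reproduces the figures above: 8.60 USD/month for Deepgram and 15.00 USD/month for AssemblyAI.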

Language Support

Deepgram Languages

English (US, UK, Australia, India, New Zealand)
Spanish, French, German, Italian, Dutch, Russian
Portuguese, Mandarin, Cantonese, Korean, Japanese
Arabic, Hindi, and 20+ more

Configure in Agent:

1. Edit Agent
2. Under STT, select language
3. Use language codes:
   - en (English)
   - es (Spanish)
   - fr (French)
   - de (German)
   - etc.
4. Save

AssemblyAI Languages

AssemblyAI supports 99+ languages with automatic language detection.

Auto-Detection:

Set language to "Auto"
System detects language automatically
Great for multilingual operations
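The choice between an explicit language and "Auto" maps naturally onto AssemblyAI's transcript request, which accepts either a `language_code` (such as `es`) or `language_detection: true`. A hedged sketch, assuming those AssemblyAI request fields and that CallIntel's "Auto" setting corresponds to `language_detection` (an assumption, not documented CallIntel behavior). The body would be sent with POST to `https://api.assemblyai.com/v2/transcript` with your key in the `authorization` header; the helper name is hypothetical.

```python
import json
from typing import Optional


def assemblyai_request_body(audio_url: str, language: Optional[str] = None,
                            diarization: bool = False) -> str:
    """Build the JSON body for an AssemblyAI transcript request.

    language=None or "auto" enables automatic language detection;
    any other value (e.g. "es") is sent as an explicit language_code.
    """
    body = {"audio_url": audio_url, "speaker_labels": diarization}
    if language is None or language.lower() == "auto":
        body["language_detection"] = True
    else:
        body["language_code"] = language
    return json.dumps(body)
```

Sending an explicit code when you know the caller's language avoids the detection step, which matches the Best Practices advice on this page to specify the language whenever possible.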

Configuration Options

Model Selection

Deepgram Models

Nova 2 (Default)

Best For: Most use cases
Accuracy: Highest
Speed: Fast
Cost: Standard
Recommended: YES ✓

Nova

Best For: Budget-conscious
Accuracy: Excellent
Speed: Very Fast
Cost: Standard

Enhanced

Best For: Noisy environments
Accuracy: Very High
Speed: Standard
Cost: Standard

Key Configuration Options

Punctuation

Enabled: Add punctuation to transcripts
Disabled: Raw text without punctuation
Default: Enabled
Recommendation: Enable for better readability

Profanity Filter

Enabled: Replace profanity with [profanity]
Disabled: Include all words
Default: Disabled
Recommendation: Enable for customer interactions

Number Conversion

Enabled: Convert "one" to "1"
Disabled: Keep numbers as spoken words ("one")
Default: Enabled
Recommendation: Enable for phone numbers, quantities

Speaker Diarization (AssemblyAI)

Enabled: Identify different speakers
Disabled: Single speaker transcript
Cost: Additional (check pricing)
Recommendation: Enable for multi-party calls
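For reference, the Deepgram-side options above correspond roughly to query parameters on Deepgram's `/v1/listen` endpoint. This sketch assumes the parameter names `model`, `language`, `punctuate`, `profanity_filter`, and `numerals` from Deepgram's API, and that CallIntel's toggles map onto them one-to-one (an assumption about CallIntel's internals; newer Deepgram models may favor `smart_format`, so check Deepgram's API reference). `DeepgramOptions` is a hypothetical class, and the `language` default is illustrative.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class DeepgramOptions:
    """Hypothetical mirror of the agent STT settings (defaults as documented above)."""
    model: str = "nova-2"           # Default model
    language: str = "en"            # Illustrative default; set explicitly per agent
    punctuate: bool = True          # Punctuation, default: Enabled
    profanity_filter: bool = False  # Profanity Filter, default: Disabled
    numerals: bool = True           # Number Conversion, default: Enabled

    def query_string(self) -> str:
        """Encode the options as a query string, booleans as lowercase strings."""
        params = {
            "model": self.model,
            "language": self.language,
            "punctuate": str(self.punctuate).lower(),
            "profanity_filter": str(self.profanity_filter).lower(),
            "numerals": str(self.numerals).lower(),
        }
        return urlencode(params)
```

For a customer-facing agent following the recommendations above, `DeepgramOptions(profanity_filter=True).query_string()` enables punctuation, number conversion, and the profanity filter together; the result is appended after `https://api.deepgram.com/v1/listen?`.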

Advanced Features

Real-Time Transcription

Get transcripts while the customer is speaking:

Advantage: Immediate feedback
Use Case: Live coaching, monitoring
Cost: Higher (streaming mode)
Latency: 200-500ms
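In streaming mode, a provider sends partial (interim) results that may change, followed by final results. The sketch below assumes Deepgram's live-streaming message format, where each `Results` message has `channel.alternatives[0].transcript` and an `is_final` flag (interim messages arrive only when `interim_results=true` is requested). CallIntel handles streaming for you; this only illustrates the interim-versus-final distinction if you consume Deepgram's stream directly. The function name is hypothetical.

```python
import json
from typing import Iterable


def collect_final_transcript(messages: Iterable[str]) -> str:
    """Join finalized segments from Deepgram live-streaming JSON messages.

    Interim results (is_final false) are printed for live monitoring but not
    stored, because a later message with is_final true covers the same audio.
    """
    finals = []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") != "Results":
            continue  # skip Metadata and other non-transcript messages
        text = msg["channel"]["alternatives"][0]["transcript"]
        if not text:
            continue  # silence produces empty transcripts
        if msg.get("is_final"):
            finals.append(text)
        else:
            print(f"(interim) {text}")
    return " ".join(finals)
```

Showing interim text gives the "immediate feedback" useful for live coaching, while storing only final segments keeps the saved transcript free of duplicated or revised words.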

Accuracy Enhancements

Boost Accuracy:

Provide domain vocabulary
Add expected phrases
Use correct language setting
Choose higher-accuracy model

Example:

Your company: "Acme Corp"
Products: "WidgetPro", "GadgetMax"
Add to STT configuration
Result: Better recognition of custom terms
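The Acme example above can be expressed as provider request parameters. This is a hedged sketch assuming Deepgram's `keywords` query parameter with its `term:intensifier` syntax (supported on Nova-2 and older models; newer models may use a different mechanism) and AssemblyAI's `word_boost` and `boost_param` request fields. How CallIntel stores your vocabulary is not documented here, so both helper names and the boost value of 2 are hypothetical.

```python
from urllib.parse import urlencode


def deepgram_keyword_params(terms: dict[str, float]) -> str:
    """Encode Deepgram `keywords` query params as term:intensifier pairs."""
    # The key repeats once per term, e.g. keywords=WidgetPro:2&keywords=GadgetMax:2
    pairs = [("keywords", f"{term}:{boost:g}") for term, boost in terms.items()]
    return urlencode(pairs, safe=":")  # keep the colon readable instead of %3A


def assemblyai_boost_fields(terms: list[str], level: str = "high") -> dict:
    """Fields to merge into an AssemblyAI transcript request body."""
    if level not in {"low", "default", "high"}:
        raise ValueError("boost_param must be 'low', 'default', or 'high'")
    return {"word_boost": terms, "boost_param": level}


vocabulary = ["Acme Corp", "WidgetPro", "GadgetMax"]
print(deepgram_keyword_params({"WidgetPro": 2, "GadgetMax": 2}))
print(assemblyai_boost_fields(vocabulary))
```

Keep boost lists short and specific: boosting many common words can make the model over-predict them, so start with product and company names.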

Sentiment Analysis

Available through some providers:

Sentiment Detection:
- Positive, Negative, Neutral
- Confidence score
- Available in: AssemblyAI
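When sentiment analysis is requested from AssemblyAI (`sentiment_analysis: true`), the transcript includes `sentiment_analysis_results`, one entry per sentence with a `sentiment` label (`POSITIVE`, `NEUTRAL`, or `NEGATIVE`) and a `confidence` score. A minimal sketch for turning that into a per-call summary; the `min_confidence` threshold of 0.5 is an illustrative choice, not a provider recommendation, and the function name is hypothetical.

```python
from collections import Counter
from typing import Any


def summarize_sentiment(transcript: dict[str, Any], min_confidence: float = 0.5) -> dict[str, int]:
    """Count POSITIVE/NEUTRAL/NEGATIVE sentences, skipping low-confidence ones."""
    results = transcript.get("sentiment_analysis_results") or []
    counts = Counter(
        r["sentiment"] for r in results if r.get("confidence", 0.0) >= min_confidence
    )
    return {label: counts.get(label, 0) for label in ("POSITIVE", "NEUTRAL", "NEGATIVE")}
```

A rising share of `NEGATIVE` sentences across calls is a reasonable trigger for manual review, which pairs well with the weekly accuracy review under Best Practices.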

Troubleshooting

Transcripts are inaccurate

Try: Deepgram Nova 2 (best accuracy), AssemblyAI Enhanced model, or add domain vocabulary.

Calls sound cut off or delayed

Switch to Deepgram (lowest latency). Check internet connection quality.

Cost is higher than expected

Monitor actual usage in provider dashboard. Consider switching to Deepgram for better pricing.

Not recognizing foreign languages

Ensure correct language is selected. Some providers need explicit language configuration.

API key authentication fails

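Verify that the key is correct, has not expired, and has the required permissions (for Deepgram, the speech-to-text scope assigned in Step 2). If it still fails, regenerate the key in the provider dashboard.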

Best Practices

1. Start with Deepgram Nova 2

Best balance of:
- Accuracy (highest)
- Speed (fast)
- Cost (competitive)
- Languages (40+)

2. Monitor Accuracy

Weekly Review:
- Sample 10 calls
- Check transcript accuracy
- Note problem phrases
- Adjust configuration if needed
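"Check transcript accuracy" becomes measurable if you hand-correct each sampled transcript and compute the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the STT output into the corrected text, divided by the number of words in the corrected text. One common way to express this as an accuracy percentage is 1 − WER. The sketch below uses a simple lowercase, whitespace-split normalization; punctuation is not stripped, which is an illustrative simplification.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev holds edit distances from the previous reference prefix
    # to every prefix of the hypothesis (row 0: j insertions)
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, if the caller said "order widgetpro today" and the transcript reads "order widget pro today", WER is 2/3 (one substitution plus one insertion), a strong hint that "WidgetPro" belongs in your domain vocabulary.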

3. Use Language Setting

Always specify language explicitly
Auto-detection works but is slower
Specified language: 5-10ms faster

4. Enable Helpful Features

✓ Enable: Punctuation (readability)
✓ Enable: Number conversion (clarity)
✓ Consider: Profanity filter (compliance)
✓ Consider: Diarization (if multi-party)

Performance Tips

Optimize Response Time

Action               | Impact    | Recommendation
---------------------|-----------|----------------
Lower-latency model  | Critical  | Deepgram Nova 2
Specify language     | ~10ms     | Always specify
Enable streaming     | 200-300ms | Default in CallIntel
Reduce audio quality | -5ms      | Not recommended

Monitor Quality

Create dashboards for:

- Transcription accuracy %
- Processing latency (ms)
- Cost per minute
- Language breakdown
- Error rate
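The dashboard metrics above can be computed from per-call records. This sketch assumes a hypothetical record layout (`CallRecord` and its field names are illustrative, not a CallIntel export format) in which manually reviewed calls carry a WER value from the weekly accuracy review.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class CallRecord:
    """Hypothetical per-call record; adapt field names to your own data."""
    minutes: float
    latency_ms: float
    cost_usd: float
    language: str
    failed: bool = False
    wer: Optional[float] = None  # set only for manually reviewed calls


def dashboard_metrics(calls: list[CallRecord]) -> dict:
    """Aggregate the dashboard metrics listed above for a batch of calls."""
    if not calls:
        raise ValueError("need at least one call")
    reviewed = [c.wer for c in calls if c.wer is not None]
    total_minutes = sum(c.minutes for c in calls)
    return {
        "accuracy_pct": round(100 * (1 - mean(reviewed)), 1) if reviewed else None,
        "avg_latency_ms": round(mean(c.latency_ms for c in calls), 1),
        "cost_per_minute_usd": round(sum(c.cost_usd for c in calls) / total_minutes, 4),
        "languages": dict(Counter(c.language for c in calls)),
        "error_rate_pct": round(100 * sum(c.failed for c in calls) / len(calls), 1),
    }
```

Accuracy is reported only from reviewed calls, since WER needs a corrected reference transcript; the other metrics use every call in the batch.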

See Also

Text-to-Speech

Configure voice output

Speaker Identification

Identify different speakers

Agent Setup

Configure complete agents

Support

Deepgram Docs

Deepgram Documentation

AssemblyAI Docs

AssemblyAI Documentation

Contact Support

Email: callintel01@gmail.com
