Speaker identification (also called diarization) lets the system identify and distinguish between the different speakers in a call. This guide covers setup and best practices.
Speaker diarization automatically labels who is speaking at each moment in an audio file.
Example Output:
[00:00-00:05] Speaker 1 (Customer): "Hi, I'd like to place an order"
[00:05-00:12] Speaker 2 (Agent): "Hello! I'd be happy to help you"
[00:12-00:20] Speaker 1: "Great, I need..."
[00:20-00:35] Speaker 2: "Perfect, let me look that up for you"
Speakers Detected: Identifies multiple speakers
Accuracy: Good
Cost: Included in standard pricing
Latency: Real-time
Languages: All supported languages
Special Features: Built into standard STT
Scenario: Monitoring agent performance and call quality
Benefits:
✓ Verify agent followed procedures
✓ Identify who said what in disputes
✓ Measure talk time (agent vs customer)
✓ Training and coaching improvements
Best For: Unknown number of speakers
Accuracy: High
Performance: Slightly slower
Recommended: Most cases
Specify Count:
If you know there are always 2 speakers (agent + customer):
- Set to 2 speakers
- Faster processing
- More accurate identification

Common Scenarios:
- 2: Standard agent + customer call
- 3: Agent + supervisor + customer
- 4+: Conference calls or multi-agent calls
[00:00-00:05] SPEAKER_00: "Hello, this is customer service"
[00:05-00:10] SPEAKER_01: "Hi! I have a question about my order"
[00:10-00:15] SPEAKER_00: "I'd be happy to help!"
[00:15-00:25] SPEAKER_01: "Great, I ordered on Tuesday and..."
{
  "transcript": [
    {
      "speaker": "SPEAKER_00",
      "start": 0.0,
      "end": 5.0,
      "text": "Hello, this is customer service"
    },
    {
      "speaker": "SPEAKER_01",
      "start": 5.0,
      "end": 10.0,
      "text": "Hi! I have a question about my order"
    }
  ]
}
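As an illustration of what the JSON format makes possible, the sketch below sums talk time per speaker, one of the quality-assurance measures mentioned earlier. It assumes that `start` and `end` are offsets in seconds (consistent with the timestamps in the text format, but not stated explicitly); the function name `talk_time_by_speaker` is hypothetical and not part of any provider API.

```python
import json
from collections import defaultdict

# Sample payload in the JSON shape shown above.
payload = """
{
  "transcript": [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 5.0,
     "text": "Hello, this is customer service"},
    {"speaker": "SPEAKER_01", "start": 5.0, "end": 10.0,
     "text": "Hi! I have a question about my order"}
  ]
}
"""

def talk_time_by_speaker(raw_json):
    """Sum segment durations for each speaker label.

    Assumption: "start" and "end" are offsets in seconds.
    """
    totals = defaultdict(float)
    for segment in json.loads(raw_json)["transcript"]:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)

totals = talk_time_by_speaker(payload)
for speaker, seconds in totals.items():
    print(f"{speaker}: {seconds:.1f}s")
```

Dividing one speaker's total by the sum of all totals gives the agent vs customer talk ratio.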
[00:00] AGENT: "Hello! Thanks for calling ABC Company"
[00:05] CUSTOMER: "Hi, I saw your ad and wanted to know more"
[00:10] AGENT: "Great! Let me tell you about our products..."
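If your output only contains generic labels such as `SPEAKER_00`, a role-labeled transcript like the one above can be produced in post-processing. The following is a minimal sketch: the mapping `ROLE_BY_LABEL` and the helper `format_with_roles` are hypothetical, and deciding which label belongs to which role is an assumption you have to supply yourself (for example from call metadata).

```python
# Hypothetical mapping from diarization labels to roles. Which label is the
# agent is NOT known from diarization alone; supply it from your own data.
ROLE_BY_LABEL = {"SPEAKER_00": "AGENT", "SPEAKER_01": "CUSTOMER"}

def format_with_roles(segments, role_by_label):
    """Render segments as '[MM:SS] ROLE: "text"' lines."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        # Fall back to the raw label when no role is known for it.
        role = role_by_label.get(seg["speaker"], seg["speaker"])
        text = seg["text"]
        lines.append(f'[{minutes:02d}:{seconds:02d}] {role}: "{text}"')
    return lines

segments = [
    {"speaker": "SPEAKER_00", "start": 0.0,
     "text": "Hello! Thanks for calling ABC Company"},
    {"speaker": "SPEAKER_01", "start": 5.0,
     "text": "Hi, I saw your ad and wanted to know more"},
]
labeled = format_with_roles(segments, ROLE_BY_LABEL)
print("\n".join(labeled))
```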
Not every call needs diarization. Only enable it for:
- Quality assurance calls
- Escalations
- Training calls
- Compliance-critical calls
Skipping it on standard calls saves on cost.
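One way to apply this rule is a small gate in your own call-handling code. This is only a sketch: the call-type identifiers and the function `should_diarize` are hypothetical, and how the result is passed to the transcription service depends on your provider.

```python
# Call categories taken from the list above. The string identifiers are
# hypothetical; use whatever your own system calls these categories.
DIARIZATION_CALL_TYPES = {
    "quality_assurance",
    "escalation",
    "training",
    "compliance",
}

def should_diarize(call_type):
    """Return True only for call types that warrant diarization."""
    return call_type in DIARIZATION_CALL_TYPES

for call_type in ("escalation", "standard"):
    print(call_type, should_diarize(call_type))
```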
When processing transcripts:
- Label speakers by role (Agent, Supervisor, Customer)
- Add timestamps for easy reference
- Create searchable indexes
- Archive for compliance
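A searchable index can be as simple as a mapping from words to the segments that contain them. The sketch below is one possible approach, not a prescribed one; `build_index` and the segment fields are assumptions modeled on the transcript examples in this guide.

```python
import re
from collections import defaultdict

def build_index(segments):
    """Map each lowercase word to (start_seconds, speaker) for every
    segment in which the word occurs."""
    index = defaultdict(list)
    for seg in segments:
        # set() so a word repeated within one segment is indexed once.
        for word in set(re.findall(r"[a-z0-9']+", seg["text"].lower())):
            index[word].append((seg["start"], seg["speaker"]))
    return index

segments = [
    {"speaker": "Agent", "start": 0.0,
     "text": "Hello, this is customer service"},
    {"speaker": "Customer", "start": 5.0,
     "text": "Hi! I have a question about my order"},
]
index = build_index(segments)
print(index["order"])  # [(5.0, 'Customer')]
```

Because each entry keeps the timestamp and the speaker role, a search hit points straight to the moment in the call and to who said it.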
Benefits:
- Trained on your specific voices
- Better accuracy for known speakers
- Speaker identification by name
- Customized thresholds
Cost: Enterprise pricing
Setup: Contact provider
1. Diarization Error Rate (DER) - Lower is better (< 5% is excellent)
2. Speaker Count Accuracy - Correct identification of number of speakers
3. Speaker Labeling Accuracy - Correct assignment of labels to speakers
4. Processing Latency - Time to generate diarization
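For reference, DER is conventionally defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total duration of reference speech. The sketch below computes it from those four durations; the function name and the sample numbers are illustrative only and are not measurements from this system.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + confusion) / total reference speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech

# Illustrative numbers only: 6 seconds of errors over 200 seconds of speech.
der = diarization_error_rate(
    missed=2.0, false_alarm=1.0, confusion=3.0, total_speech=200.0
)
print(f"DER: {der:.1%}")  # DER: 3.0%
```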