Best Practices for Building Medical Scribes
Introduction
Building a robust medical scribe requires careful consideration of accuracy, latency, speaker identification, and real-time capabilities while maintaining HIPAA compliance and clinical documentation standards. This guide addresses common questions and provides practical solutions for both post-visit and live encounter transcription scenarios.
Why AssemblyAI for Medical Scribes?
AssemblyAI stands out as the premier choice for medical scribes with several key advantages:
Industry-Leading Accuracy with Pre-recorded Audio
- Universal-3-Pro model delivers exceptional accuracy for medical terminology and clinical documentation
- 2.9% speaker diarization error rate for precise attribution between provider and patient
- Comprehensive LLM Gateway integration for intelligent post-processing into structured clinical notes
Streaming with Universal-3 Pro
As medical scribes evolve toward real-time documentation, AssemblyAI’s Universal-3 Pro Streaming model (u3-rt-pro) offers significant benefits:
- Ultra-low latency (~300ms) enables live transcription during patient encounters
- Format turns feature provides structured, speaker-aware output in real-time
- Keyterms prompt allows providing medical context and patient history to improve accuracy
- Medical mode (
domain: "medical-v1") for improved medical terminology accuracy
End-to-End Voice AI Platform
Unlike fragmented solutions, AssemblyAI provides a unified API for:
- Transcription with speaker diarization (provider vs. patient)
- Medical terminology recognition and contextual understanding
- HIPAA-compliant PII redaction on both text and audio
- Post-processing workflows with LLM Gateway - from SOAP notes to completely custom clinical documentation
- Streaming and pre-recorded transcription in a single platform
- Compliance and Security built for medical workloads (BAA, HIPAA, DPA, etc.)
When Should I Use Pre-recorded vs Streaming for Medical Scribes?
Understanding when to use pre-recorded versus streaming is critical for clinical workflows.
Use Pre-recorded (Universal-3-Pro) when:
Post-visit documentation - Encounter already happened, need highest accuracy
- Maximum accuracy required - Universal-3-Pro has highest medical terminology accuracy
- Complex medical terminology - Rare medications, genetic conditions, specialized procedures
- HIPAA compliance critical - Full PII redaction with audio de-identification
- Structured note generation - SOAP notes, H&P, discharge summaries via LLM Gateway
- Quality assurance - Review and editing workflow needed
- Specialty documentation - Oncology, cardiology, neurology with complex terminology
- Speaker diarization needed - Automatic provider vs. patient separation
Best for: Post-visit SOAP notes, specialist consultations, hospital discharge summaries, quality review
Use Streaming (Universal-3 Pro Streaming) When:
Live encounter documentation - Real-time transcription during patient visit
- Immediate documentation - No delay between encounter and note
- Telemedicine visits - Document while seeing patient virtually
- Emergency department - Fast-paced, immediate documentation needed
- Primary care visits - Standard encounters with common terminology
- Real-time review - Provider can review and correct during visit
- Ambient documentation - Microphone running throughout encounter
Best for: Telemedicine, primary care visits, ED encounters, real-time clinical decision support
Hybrid Approach (Recommended)
Many medical scribes use both:
- Streaming during visit - Real-time documentation, immediate review by provider
- Universal-3-Pro post-processing - Run audio through Universal-3-Pro after visit for:
- Highest accuracy verification
- Complex terminology correction
- Complete HIPAA compliance workflow
- Final structured note generation
- Speaker diarization (provider vs. patient)
Example workflow:
- Provider sees patient → Streaming captures real-time notes
- Visit ends → Audio sent to Universal-3-Pro for final high-accuracy transcription
- LLM Gateway generates structured SOAP note from high-accuracy transcript
- Provider reviews and signs final note
This gives real-time utility during visits while ensuring maximum accuracy for official documentation.
What Languages and Features for a Medical Scribe?
Pre-Recorded doctor patient visits (Universal-3-Pro)
Languages: For post-visit documentation, Universal-3-Pro supports English for the highest accuracy transcription. If you want to use other languages, Universal-2 is a suitable alternative.
Core Features:
- Speaker diarization (provider-patient separation)
- Multichannel audio support — when provider and patient are on separate audio channels, enables perfect speaker separation without diarization
- Automatic formatting, punctuation, and capitalization
- Keyterms Prompting for medical specialties and conditions (up to 1000 terms for Universal-3-Pro)
- Ability to prompt on related medical terms and improve the accuracy of others (for example,
ibuprofenimprovingnaproxen) - Natural language prompting (Universal-3-Pro) — up to 1,500 words to guide transcription behavior
Speech Understanding Models:
- Entity detection for medications, conditions, and procedures
- Sentiment analysis for patient experience insights
- Speaker identification for separating doctor and patient in a visit
Guardrails:
- PII redaction on text and audio for HIPAA compliance
Real-Time Streaming (Universal-3 Pro Streaming)
Languages:
For live encounter transcription, Universal-3 Pro Streaming (u3-rt-pro) supports English, optimized for medical contexts with the highest streaming accuracy. Use with domain: "medical-v1" for improved medical terminology recognition.
Streaming-Specific Features:
- Partial and final transcripts for responsive documentation
- Format turns for structured provider-patient dialogue
- Keyterms Prompt for patient history and current medications (up to 1000 terms)
- Natural language prompting (up to 1,500 words) for guiding transcription behavior
- Turn detection with configurable silence thresholds for natural clinical conversation flow
- Mid-session configuration updates via
UpdateConfigurationmessages — dynamically update keyterms and prompt mid-session. Use keyterms for known context like prescription medications and disease names, and use the prompt for unknown context like “this is a doctor-patient visit in a cardiology clinic” - Post-processing LLM Gateway integration for increasing medical accuracy
Recommended approach: Use streaming for real-time documentation, then run through Universal-3-Pro post-visit for accurate speaker-labeled final notes.
For full details, see the Universal-3 Pro Streaming documentation.
How Can I Get Started Building a Post-Visit Medical Scribe?
Here’s a complete example implementing pre-recorded transcription with Universal-3-Pro:
How Can I Get Started Building a Real-Time Medical Scribe?
Here’s a complete example for real-time streaming transcription with LLM post-processing:
How Do I Handle HIPAA Compliance?
HIPAA compliance is mandatory for all medical transcription workflows. Here’s how to ensure your medical scribe meets requirements:
Required HIPAA Guardrails
1. Business Associate Agreement (BAA)
- AssemblyAI provides a BAA for healthcare customers
- Required before processing any PHI
- Contact us to execute BAA
2. PII Redaction (Required)
3. Secure Audio Storage
4. Access Controls
5. Audit Logging
For complete HIPAA guidance, see our Healthcare Compliance Guide.
What Workflows Can I Build for My AI Medical Scribe?
Use these flags to transform raw medical conversations into structured clinical documentation. Below is plain-English behavior, output shape, and clinical use cases for each option.
Entity Detection (Medical)
entity_detection: true
What it does: Extracts medical entities (medications, conditions, procedures, anatomy).
Output: Array of { entity_type, text, start, end }.
Great for: Medication reconciliation, problem list updates, procedure coding.
Notes: Recognizes brand/generic drug names, medical conditions, surgical procedures. Entity types include drug, medical_condition, and medical_process.
Redact PII Text (HIPAA Compliance)
redact_pii: true
What it does: Scans transcript for Protected Health Information and replaces per HIPAA requirements.
Output: text with PHI replaced; original timing preserved.
Great for: De-identification, research datasets, training data.
Notes: Covers all 18 HIPAA identifiers when properly configured.
redact_pii_policies: [person_name, date_of_birth, healthcare_number, phone_number, email_address]
Restricts redaction scope to key HIPAA identifiers:
person_name– patient and provider namesdate_of_birth– full or partial DOBhealthcare_number– MRN, health plan numbersphone_number– contact numbersemail_address– electronic addresses
Why this set: Ensures HIPAA compliance while preserving clinical content for documentation.
redact_pii_sub: hash
What it does: Replaces each PHI span with a stable hash token.
Example:
"Patient John Doe, DOB 1/15/1980, MRN 12345" ⟶
"Patient #2af4…, DOB #7b91…, MRN #e13c…"
Benefits:
- Maintains referential integrity across document
- Preserves sentence structure for NLP/LLM processing
- Prevents reconstruction of original PHI
Redact PII Audio (HIPAA Compliance)
redact_pii_audio: true
What it does: Produces HIPAA-compliant audio with PHI portions silenced.
Output: redacted_audio_url in the transcript payload.
Great for: Quality assurance, training, research.
Notes: Original audio preserved separately; ensure proper access controls.
Sentiment Analysis (Patient Experience)
sentiment_analysis: true
What it does: Analyzes emotional tone of patient responses.
Output: Array of { text, sentiment, confidence, start, end }.
Great for: Patient satisfaction, pain assessment, mental health screening.
Notes: Helpful for identifying distressed or dissatisfied patients.
End-to-End Clinical Documentation Effect
Clinical Documentation Example
Original Encounter:
“Hi, I’m Dr. Smith. John Doe, born 1/15/1980, is here for follow-up. He’s taking metformin 1000mg twice daily for his diabetes.”
With medical scribe settings:
- Text: “Hi, I’m #2af4…. #7b91…, born #e13c…, is here for follow-up. He’s taking metformin 1000mg twice daily for his diabetes.”
- Entities:
[ { entity_type: "drug", text: "metformin 1000mg" }, { entity_type: "medical_condition", text: "diabetes" } ] - Clinical note: Structured SOAP format via LLM Gateway
- Redacted audio: PHI portions silenced for compliance
LLM Gateway for Clinical Notes
Our LLM Gateway enables transformation of raw transcripts into structured clinical documentation using the same API.
Here’s a complete example of generating structured SOAP notes from medical encounter transcripts:
Advanced SOAP Note Features
How Do I Improve the Accuracy of My Medical Scribe?
Medical Keyterms Strategy
The most effective approach for medical keyterms:
1. Patient-Specific Context
2. Specialty-Specific Terms
3. Visit-Specific Context
Using Keyterms Prompt for Streaming with LLM Gateway Enhancement
Common Medical Terminology - Top 1000 Terms
Even if you don’t know the context of a specific medical conversation, you can boost the accuracy of transcription by providing the top 1000 medical words in your field.
How Should I Handle Pre-recorded Transcription in Production?
Webhook Callbacks (Recommended)
For high-volume clinical workflows, use webhooks instead of polling:
Webhook handler example:
Scaling Considerations
- Rate limits: 20,000 POST requests per 5-minute window
- Concurrent transcriptions: 200+ for paid accounts (queued beyond that)
- Ramp up gradually - Start at 10-50 concurrent, double incrementally
- Use exponential backoff with jitter for 429 errors
- Contact Sales before large-scale rollouts
How Can I Improve the Latency of My Medical Scribe?
Async Chunking for Long Encounters
For lengthy patient visits, implement chunking to get progressive documentation. This is especially useful for:
- Hospital rounds (in-person microphone running ambient)
- Comprehensive physicals
- Specialty consultations
When to Use Streaming Instead
For optimal clinical workflow integration, streaming is ideal when:
-
Real-time documentation needed:
- Emergency department encounters
- Telemedicine visits
- Procedure documentation
-
Immediate clinical decision support:
- Medication interaction checking
- Diagnosis suggestion
- Protocol reminders
-
Live quality assurance:
- Compliance monitoring
- Training supervision
- Documentation coaching
Streaming provides:
- ~300ms latency for immediate documentation
- Real-time partial results for provider review
- No delay between encounter end and note availability
- Live clinical decision support integration
How Can I Use Speaker Identification for Doctor and Patient Recognition?
Speaker Identification can automatically distinguish between doctors and patients in medical encounters, replacing generic “Speaker A” and “Speaker B” labels with meaningful role-based identifiers.
Why Use Speaker Identification in Medical Scribes?
Clinical Benefits:
- Clear attribution - Know exactly who said what in clinical documentation
- SOAP note structure - Automatically separate subjective (patient) from objective (provider) statements
- Compliance documentation - Proper attribution for regulatory requirements
- Quality assurance - Review provider-patient communication patterns
- Training analysis - Analyze communication styles for medical education
Medical Speaker Identification Setup
Method 1: Role-Based Identification (Recommended)
Method 2: Name-Based Identification
For scenarios where you know the specific doctor’s name: