Models
AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.
Slam-1
Highest accuracy for English with fine-tuning support and customization via prompting
Universal
Best for out-of-the-box transcription with excellent accuracy and low latency
Nano
Most cost-effective with broad language support
Streaming
Optimized for real-time applications with sub-500ms initial response
Choosing the right model
Slam-1
- Best for: English content requiring highest accuracy
- Key benefits:
  - Superior accuracy for English content
  - Fine-tuning support
  - Ideal for domain-specific terminology
Universal
- Best for: Production-ready transcription out of the box
- Key benefits:
  - Excellent accuracy with low latency
  - Multi-language support
  - No configuration needed
Nano
- Best for: Cost-sensitive applications with broad language needs
- Key benefits:
  - Most cost-effective option
  - Widest language support
  - Fastest transcription speed
Streaming
- Best for: Real-time voice applications and voice agents
- Key benefits:
  - Sub-500ms initial response time
  - Continuous speech recognition
  - Ideal for interactive applications and voice agents
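The guidance above can be read as a simple decision rule. The following is a minimal sketch of that rule, not part of any AssemblyAI SDK: the `Requirements` class, its fields, the `choose_model` function, and the order in which the checks are applied are all assumptions made for illustration. The returned strings are the display names used on this page, not API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of what an application needs."""
    real_time: bool = False               # live audio, voice agents
    english_only: bool = False            # all audio is in English
    needs_highest_accuracy: bool = False  # e.g. domain-specific terminology
    cost_sensitive: bool = False          # price matters more than accuracy


def choose_model(req: Requirements) -> str:
    """Return the display name of the model suggested by the guidance above.

    The priority order is an assumption: real-time needs are checked first,
    then English accuracy, then cost, with Universal as the default.
    """
    if req.real_time:
        return "Streaming"
    if req.english_only and req.needs_highest_accuracy:
        return "Slam-1"
    if req.cost_sensitive:
        return "Nano"
    return "Universal"


if __name__ == "__main__":
    print(choose_model(Requirements(english_only=True, needs_highest_accuracy=True)))
    print(choose_model(Requirements(cost_sensitive=True)))
    print(choose_model(Requirements()))
```

Real workloads often mix these requirements, so treat the function as a starting point and adjust the priority order to match your own trade-offs between accuracy, latency, cost, and language coverage.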
Pricing
For detailed pricing information, visit our pricing page.
For volume discounts, please reach out to sales@assemblyai.com.
Next steps
- For pre-recorded audio, see how to select your model
- For real-time transcription, check out our streaming documentation