Models

AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.

Choosing the right model

Slam-1

  • Best for: English content requiring highest accuracy
  • Key benefits:
    • Superior accuracy for English content
    • Fine-tuning support
    • Ideal for domain-specific terminology

Universal

  • Best for: Production-ready transcription out of the box
  • Key benefits:
    • Excellent accuracy-to-latency ratio
    • Multi-language support
    • No configuration needed

Nano

  • Best for: Cost-sensitive applications with broad language needs
  • Key benefits:
    • Most cost-effective option
    • Widest language support
    • Fastest transcription speed

Streaming

  • Best for: Real-time voice applications and voice agents
  • Key benefits:
    • Sub-500ms initial response time
    • Continuous speech recognition
    • Ideal for interactive applications
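
The guidance above can be condensed into a small decision helper. The following is only a sketch of one way to encode the "Best for" lines: the `Requirements` fields, the `choose_model` function, and the order in which the checks run are illustrative assumptions, not part of any AssemblyAI SDK or API. The returned strings are the model names as they appear on this page, not API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of what an application needs."""
    real_time: bool = False               # live audio, voice agents
    english_only: bool = False            # all content is in English
    needs_highest_accuracy: bool = False  # accuracy matters more than cost
    cost_sensitive: bool = False          # lowest price is the priority


def choose_model(req: Requirements) -> str:
    """Map requirements to a model name from this page.

    The precedence (real-time first, then accuracy, then cost) is an
    assumption made for this sketch.
    """
    if req.real_time:
        return "Streaming"
    if req.english_only and req.needs_highest_accuracy:
        return "Slam-1"
    if req.cost_sensitive:
        return "Nano"
    # Default: production-ready transcription with no configuration.
    return "Universal"


if __name__ == "__main__":
    print(choose_model(Requirements(english_only=True, needs_highest_accuracy=True)))
    print(choose_model(Requirements(cost_sensitive=True)))
```

Real projects often weigh several of these factors at once, so treat the helper as a starting point for discussion and not as a rule.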

Pricing

For detailed pricing information, visit our pricing page.

Model       Price       Volume discounts
Universal   $0.37/hr    Available
Slam-1      $0.37/hr    Available
Nano        $0.12/hr    Available
Streaming   $0.47/hr    Available

For volume discounts, please reach out to sales@assemblyai.com.
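
To get a rough sense of what a workload might cost, the listed rates can be multiplied by the audio duration. The sketch below assumes the rates in the table are per hour of audio (as the "/hr" suffix indicates), ignores volume discounts, and uses a hypothetical helper named `estimate_cost`. Rates can change, so the pricing page remains the authoritative source.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates copied from the table above, assumed to be USD per hour of audio.
PRICE_PER_HOUR_USD = {
    "Universal": Decimal("0.37"),
    "Slam-1": Decimal("0.37"),
    "Nano": Decimal("0.12"),
    "Streaming": Decimal("0.47"),
}


def estimate_cost(model: str, audio_seconds: float) -> Decimal:
    """Estimate the list-price cost in USD, rounded to the nearest cent.

    Volume discounts are not modeled. Raises KeyError for unknown models.
    """
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    cost = PRICE_PER_HOUR_USD[model] * hours
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Ten hours of audio on Universal at the listed rate.
    print(estimate_cost("Universal", 10 * 3600))  # prints 3.70
```

`Decimal` is used instead of floats so that cent-level rounding is predictable.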

Next steps