Simple, transparent pricing
Free
Start building with $50 of free credits
- Access to industry-leading Speech-to-Text and Audio Intelligence models
- Speech recognition
- Speaker diarization
- Custom spelling and vocabulary
- Profanity filtering, auto punctuation and casing
- Transcribe up to 416 hours of audio for free
- Get tips and support as you build from developer docs and Community resources
Pay as you go
Start as low as $0.12/hr for Speech-to-Text
- Unlimited access to Speech-to-Text, Audio Intelligence, and LeMUR
- Streaming Speech-to-Text
- Concurrency starting at 200 files and 100 streams
- Technical support via live chat and email
Custom
Start building with $50 of free credit
- Flexible, zero-obligation pricing that scales to millions of hours
- Dedicated technical support with response time under one hour
- Customize rate limits - scale to any workload
- Customized SLAs and SLOs
- Compliance with EU Data Residency standards
- Self-hosted deployments (On-prem, VPC) (Coming soon!)
- Early access to new models and model improvements
- Available through AWS Marketplace
Speech-to-Text
Build on top of the most accurate Speech-to-Text model on the market with >93% accuracy
Tiers | Free | Pay as you go | Custom |
---|---|---|---|
Nano: Fast, lightweight Speech AI at an accessible price point | Free up to $50 | $0.12 /hr | Lower rates based on volume |
Best: Highest accuracy and most advanced capabilities | Free up to $50 | $0.37 /hr | Lower rates based on volume |
Features | |||
Speaker Diarization: Automatically detect the number of speakers in your audio file and associate each word in the transcription text with its speaker. | |||
Automatic Language Detection: Automatically detect whether the dominant language of the spoken audio is supported by our API and route it to the appropriate model for transcription. | |||
Profanity Filtering: Automatically detect and replace profanity in the transcription text. | |||
Custom Vocabulary (only available in the Best tier): Boost accuracy for vocabulary that is unique or custom to your specific use case or product. | |||
Multichannel: Transcribe each channel of an audio file with multiple speakers separately. | |||
Filler Word Filtering: Optionally include disfluencies in the transcripts of your audio files. | |||
Custom Spelling: Specify how you would like certain words to be spelled or formatted in the transcription text. | |||
Word Timestamps: Word-by-word timestamps across the entire transcript text. | |||
Auto Punctuation and Casing: Automatically add punctuation and proper-noun casing to the transcription text. |
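As a rough illustration of how the $50 in free credits translates into transcription time, the sketch below divides the credit by each tier's pay-as-you-go hourly rate from the table above. The names `TIER_RATES` and `hours_covered` are hypothetical, and the sketch assumes the credits are spent only on Speech-to-Text, with no Audio Intelligence features enabled.

```python
# Pay-as-you-go hourly rates in dollars, copied from the Speech-to-Text tiers table.
TIER_RATES = {"Nano": 0.12, "Best": 0.37}


def hours_covered(credit_dollars: float, hourly_rate: float) -> float:
    """Hours of audio a credit balance covers at a given hourly rate."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return credit_dollars / hourly_rate


if __name__ == "__main__":
    for tier, rate in TIER_RATES.items():
        # Truncate to whole hours for display.
        print(f"{tier}: about {int(hours_covered(50, rate))} hours")
```

At the Nano rate this works out to roughly 416 hours, which matches the figure quoted in the Free plan. At the Best rate the same credit covers roughly 135 hours.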
Streaming Speech-to-Text
Transcribe live audio and video files synchronously at low latency and high quality
Tiers | Free | Pay as you go | Custom |
---|---|---|---|
Best: Highest accuracy and most advanced capabilities | | $0.47 /hr | Lower rates based on volume |
Features | |||
Auto Punctuation and Casing: Automatically add punctuation and proper-noun casing to the transcription text. | |||
Custom Vocabulary (only available in the Best tier): Boost accuracy for vocabulary that is unique or custom to your specific use case or product. | |||
End of Utterance Detection: Customize End of Utterance Detection to more accurately detect when one speaker finishes an utterance in Streaming Speech-to-Text. | |||
ITN/Formatting: Automatically convert spoken-form text into its proper written format to increase transcript readability. |
Speech Understanding
LeMUR Models | Free | Pay as you go | Custom |
---|---|---|---|
Claude 3.5 Sonnet: Claude 3.5 Sonnet is the most intelligent model to date, outperforming Claude 3 Opus on a wide range of evaluations, with the speed and cost of Claude 3 Sonnet. | | $0.003 / 1k tokens (Input), $0.015 / 1k tokens (Output) | $0.003 / 1k tokens (Input), $0.015 / 1k tokens (Output) |
Claude 3 Opus: Claude 3 Opus is good at handling complex analysis, longer tasks with many steps, and higher-order math and coding tasks. | | $0.015 / 1k tokens (Input), $0.075 / 1k tokens (Output) | $0.015 / 1k tokens (Input), $0.075 / 1k tokens (Output) |
Claude 3 Haiku: Claude 3 Haiku is the fastest model that can execute lightweight actions. | | $0.00025 / 1k tokens (Input), $0.00125 / 1k tokens (Output) | $0.00025 / 1k tokens (Input), $0.00125 / 1k tokens (Output) |
Claude 3 Sonnet: Claude 3 Sonnet is a legacy model with a balanced combination of performance and speed for efficient, high-throughput tasks. | | $0.003 / 1k tokens (Input), $0.015 / 1k tokens (Output) | $0.003 / 1k tokens (Input), $0.015 / 1k tokens (Output) |
Audio Intelligence Features | |||
Entity Detection Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations. | Included in free credits | $0.08 /hr | Lower rates based on volume |
Topic Detection Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting. | Included in free credits | $0.15 /hr | Lower rates based on volume |
Key Phrases Accurately identify significant words and phrases in your transcription, enabling you to extract the most pertinent concepts or highlights from your audio/video file. | Included in free credits | $0.01 /hr | Lower rates based on volume |
PII Audio Redaction | Included in free credits | $0.05 /hr | Lower rates based on volume |
PII Redaction Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the transcription text before it is returned to you. | Included in free credits | $0.08 /hr | Lower rates based on volume |
Sentiment Analysis With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files. | Included in free credits | $0.02 /hr | Lower rates based on volume |
Content Moderation Detect sensitive content in your audio and video files - such as hate speech, violence, sensitive social issues, alcohol, drugs, and more. | Included in free credits | $0.15 /hr | Lower rates based on volume |
Auto Chapters Automatically generate a summary over time for audio and video files. | Included in free credits | $0.08 /hr | Lower rates based on volume |
Summarization Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale. Customize the summary types to best fit your use case. | Included in free credits | $0.03 /hr | Lower rates based on volume |
Frequently Asked Questions
AssemblyAI’s Best tier is our most robust and accurate offering, houses our most powerful models, and has the broadest range of capabilities. The Best tier is suited for use cases where accuracy and power are paramount. AssemblyAI’s Nano tier is a fast, lightweight offering that gives product and development teams access to Speech AI at an attainable price point across 99 languages. It is best for teams with extensive language needs, and those who are looking for a low-cost Speech AI option.
Yes! With the free offer, you get $50 in credits to use towards AssemblyAI’s Speech-to-Text APIs. To add more credits and gain access to Streaming and LeMUR, simply add a credit card to your account.
Absolutely! If you plan to send large volumes of audio and video content through our API, please reach out to us here to see if you qualify for a volume discount.
Most audio files sent to AssemblyAI's API can be processed in less than 60 seconds.
Once you add a credit card and deposit funds into your account, those funds are drawn down as you use the API.
When multichannel is enabled, each channel is transcribed and billed separately. To calculate your total cost, multiply your recording's duration by the hourly transcription rate (billed per second), then multiply that result by the number of channels.
For example, if you sent a 5-minute recording with three channels, you would be billed for the 5 minutes of audio multiplied by the standard rate, with that total multiplied by three channels. This is equivalent to being billed for 15 minutes of audio.
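The multichannel arithmetic above can be sketched in a few lines. This is a minimal illustration, not the billing system's implementation: the function names are hypothetical, and the sketch assumes per-second billing means the hourly rate is prorated by 1/3600 with no rounding (how invoices round is not stated here).

```python
def billed_minutes(duration_seconds: float, channels: int) -> float:
    """Minutes of audio billed when every channel is charged separately."""
    return duration_seconds * channels / 60


def multichannel_cost(duration_seconds: float, hourly_rate: float, channels: int) -> float:
    """Estimated cost in dollars: duration x per-second rate x number of channels."""
    if duration_seconds < 0 or hourly_rate < 0 or channels < 1:
        raise ValueError("duration and rate must be non-negative, channels at least 1")
    per_second_rate = hourly_rate / 3600  # assumption: hourly rate prorated per second
    return duration_seconds * per_second_rate * channels


if __name__ == "__main__":
    # The 5-minute, three-channel recording from the example, at the Best rate of $0.37/hr.
    print(billed_minutes(300, 3))
    print(round(multichannel_cost(300, 0.37, 3), 4))
```

For the 5-minute, three-channel recording, `billed_minutes` returns 15, and at the $0.37/hr Best rate the estimated charge is a little over nine cents.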
You can also get started with AssemblyAI on the AWS Marketplace—or ask your AWS account team about how to leverage AssemblyAI to revolutionize the way your company understands its customers.
Feel free to email us at support@assemblyai.com, or click the chat button in the bottom right corner of your browser to chat live with our API Support team!
We support over 99 languages and counting, including Global English (English and all of its accents).
In the context of a Large Language Model (LLM), a “token” is the smallest unit of text processed by the model. 100 tokens map to roughly 75 words.
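To get a feel for token-based pricing, the sketch below turns a word count into an approximate token count using the 100-tokens-per-75-words rule of thumb, then applies the per-1k-token LeMUR rates listed in the pricing table. The function names and the dictionary keys are hypothetical (the keys are display names, not API model identifiers), and real token counts depend on the model's tokenizer, so treat the result as an estimate only.

```python
# (input, output) rates in dollars per 1k tokens, copied from the LeMUR pricing table.
# Keys are display names used only for lookup in this sketch.
LEMUR_RATES = {
    "Claude 3.5 Sonnet": (0.003, 0.015),
    "Claude 3 Opus": (0.015, 0.075),
    "Claude 3 Haiku": (0.00025, 0.00125),
    "Claude 3 Sonnet": (0.003, 0.015),
}


def estimate_tokens(word_count: int) -> int:
    """Approximate tokens from words using the 100 tokens ~ 75 words rule of thumb."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return -(-word_count * 100 // 75)  # ceiling division, rounds the estimate up


def lemur_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one request at the listed per-1k-token rates."""
    input_rate, output_rate = LEMUR_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1000


if __name__ == "__main__":
    # Assumption: a 7,500-word transcript is sent as input and 500 tokens come back.
    tokens_in = estimate_tokens(7500)
    print(tokens_in, lemur_cost("Claude 3 Haiku", tokens_in, 500))
```

Under these assumptions, a 7,500-word transcript is about 10,000 input tokens, and the same request costs sixty times more on Claude 3 Opus than on Claude 3 Haiku because both of its rates are sixty times higher.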
Turn voice data into unparalleled product experiences
Partner with the leader in Speech AI to build powerful products with breakthrough industry impact.
