Evaluations
Choosing the right speech-to-text (STT) model for your product requires more than reviewing public benchmarks. Public benchmarks can be misleading due to overfitting: models are often trained on the same datasets used for evaluation, which inflates their reported accuracy.
Running an evaluation on your own audio data is the most reliable way to determine which model performs best for your specific use case. AssemblyAI provides evaluation tools for both pre-recorded and streaming transcription, measuring metrics that matter in production.
Pre-recorded audio evaluations
Assess which pre-recorded audio STT model is best for your use case. Pre-recorded evaluations measure accuracy using metrics like Word Error Rate (WER) and Full-Word Error Rate (FWER), giving you a clear picture of transcription quality on your actual audio.
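To make the WER metric concrete, here is a minimal, illustrative sketch of how it is typically computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. This is a generic implementation for intuition, not AssemblyAI's evaluation tooling.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words (classic Levenshtein DP).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution across four reference words -> WER of 0.25
print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```

Note that simple normalization choices (lowercasing, punctuation handling) can shift WER noticeably, which is one more reason to evaluate all candidate models with the same pipeline on your own audio.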
Streaming evaluations
Assess which streaming STT model is best for your voice agent or real-time use case. Streaming evaluations focus on latency metrics like Time to First Token (TTFT) and Time to Complete Turn (TTCT) alongside accuracy, since both speed and correctness matter for real-time applications.
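As a rough sketch of what these latency metrics capture, the snippet below computes TTFT and TTCT from timestamps you might record around a streaming session. The event timestamps here are simulated, and the exact definitions (what counts as the "first token" or the "complete turn") are illustrative assumptions, not AssemblyAI's measurement methodology.

```python
def streaming_latency(audio_start: float, speech_end: float,
                      token_times: list[float], final_time: float) -> dict:
    """Compute illustrative streaming latency metrics (all inputs in seconds).

    audio_start: when the first audio chunk was sent to the STT service
    speech_end:  when the speaker finished the turn
    token_times: arrival times of interim transcript tokens
    final_time:  when the complete transcript for the turn arrived
    """
    return {
        # Time to First Token: delay from sending audio to the first token
        "ttft": round(token_times[0] - audio_start, 3),
        # Time to Complete Turn: delay from end of speech to final transcript
        "ttct": round(final_time - speech_end, 3),
    }

# Simulated timestamps for a single 2.5-second turn
metrics = streaming_latency(
    audio_start=0.00,
    speech_end=2.50,
    token_times=[0.35, 0.90, 1.60],
    final_time=2.95,
)
print(metrics)  # {'ttft': 0.35, 'ttct': 0.45}
```

For a voice agent, TTCT is often the number users actually feel, since the agent cannot respond until the turn's transcript is complete.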
Benchmarks
If you want to review AssemblyAI’s current model performance before running your own evaluation, see our benchmarks for the latest accuracy and latency numbers across pre-recorded and streaming models.