Transcribe an audio file with Universal-1 in Node.js

Learn how to transcribe an audio file in your Node.js applications with industry-leading accuracy using Universal-1.

We recently announced our latest speech model, Universal-1, which sets a new standard for speech-to-text accuracy. Trained on millions of hours of audio data, Universal-1 demonstrates near-human accuracy, even with accented speech, background noise, and difficult phrases like flight numbers and email addresses.

Universal-1 is also an order of magnitude faster than our previous model, Conformer-2, and supports English, Spanish, French, and German, with more languages coming shortly.

Along with Universal-1, we’ve also introduced two new pricing tiers: Best and Nano.

  • Best lets you take advantage of Universal-1 for applications where accuracy is paramount.
  • Nano is our new cost-effective tier with support for 99 different languages.

In this post, you'll learn how to transcribe an audio file in your Node.js applications using Universal-1 and Nano.

Set up the AssemblyAI Node.js SDK

The easiest way to start transcribing audio is by using one of our official SDKs.

To install the AssemblyAI Node.js SDK, run the following command in the same directory as your Node.js project:

npm install assemblyai

Import the npm package and create an authenticated SDK client using the API key from your account dashboard.

import { AssemblyAI } from 'assemblyai'

const client = new AssemblyAI({
  apiKey: 'YOUR_API_KEY' 
})

You'll find all the operations you need to transcribe audio on the client object.
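Hard-coding the key is fine for a quick test, but you may prefer to keep it out of your source code. Here is a minimal sketch of one way to do that, assuming the key is stored in an environment variable. The variable name ASSEMBLYAI_API_KEY and the requireApiKey helper are choices made for this example, not something the SDK provides.

```typescript
// Hypothetical helper: looks up the API key in an environment-like object
// and fails fast with a readable message when it is missing or blank.
type Env = Record<string, string | undefined>

function requireApiKey(env: Env, name = 'ASSEMBLYAI_API_KEY'): string {
  const key = env[name]?.trim()
  if (!key) {
    throw new Error(`Missing ${name}. Set it before starting the app.`)
  }
  return key
}

// Usage in a Node.js process:
//   const client = new AssemblyAI({ apiKey: requireApiKey(process.env) })
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.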

Transcribe an audio file using Universal-1

By default, all transcriptions use the Best tier, so you’ll always get the highest accuracy without any extra configuration.

To transcribe an audio file from a URL using the Best tier, create a new file called main.ts with the following code:

import { AssemblyAI } from 'assemblyai'

const client = new AssemblyAI({
  apiKey: 'YOUR_API_KEY' 
})

const audioUrl =
  'https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3'

const params = {
  audio: audioUrl,
}

const run = async () => {
  const transcript = await client.transcripts.transcribe(params)
  console.log(transcript.text)

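  // utterances is only populated when speaker labels are enabled,
  // so fall back to an empty list instead of asserting non-null.
  for (const utterance of transcript.utterances ?? []) {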
    console.log(`Speaker ${utterance.speaker}: ${utterance.text}`)
  }
}

run()
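A transcription can also fail, for example when the audio URL is unreachable, so it can help to check the result before printing it. The sketch below is one way to do that. It assumes the transcript object exposes status and error fields alongside text and utterances. TranscriptLike and formatTranscript are names invented for this example, and the real SDK type has many more fields.

```typescript
// Minimal shape of the fields this sketch reads (an assumption for the example).
type Utterance = { speaker: string; text: string }
type TranscriptLike = {
  status: string
  text?: string | null
  error?: string | null
  utterances?: Utterance[] | null
}

// Turns a finished transcript into printable lines, or throws on failure.
function formatTranscript(transcript: TranscriptLike): string[] {
  if (transcript.status === 'error') {
    throw new Error(`Transcription failed: ${transcript.error ?? 'unknown error'}`)
  }
  const lines = [transcript.text ?? '']
  // Utterances may be absent, so fall back to an empty list.
  for (const utterance of transcript.utterances ?? []) {
    lines.push(`Speaker ${utterance.speaker}: ${utterance.text}`)
  }
  return lines
}

// Usage inside run():
//   formatTranscript(transcript).forEach((line) => console.log(line))
```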

You can also change the audioUrl to a file path to transcribe an audio file available on your local computer:

const audioUrl = './5_common_sports_injuries.mp3'

Nano—a cost-effective speech-to-text alternative

Switching between Best and Nano is only a matter of setting speech_model in your transcription parameters.

const params = {
  audio: audioUrl,
  speech_model: 'nano',
}
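If you switch tiers in more than one place, a small helper can keep that choice in one spot. This is only a sketch: buildParams and the Tier type are hypothetical names, and the helper relies on the behavior described above, where Best is the default and speech_model only needs to be set for Nano.

```typescript
type Tier = 'best' | 'nano'
type TierParams = { audio: string; speech_model?: 'nano' }

// Best is the default tier, so speech_model is only set when Nano is requested.
function buildParams(audio: string, tier: Tier = 'best'): TierParams {
  return tier === 'nano' ? { audio, speech_model: 'nano' } : { audio }
}

// Usage:
//   const transcript = await client.transcripts.transcribe(buildParams(audioUrl, 'nano'))
```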

Best or Nano, which one is right for you?

With two speech-to-text options, you might wonder which one you should use for your application.

We recommend using Best for applications where it’s critical to get accurate, high-quality transcripts—for example, when you want to display the transcript to your end user.

If you have a high volume of transcriptions and are looking to reduce costs, or if you need additional language support, consider Nano.
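The guidance above can be written down as a rough heuristic. The sketch below is illustrative only: suggestTier and the Needs type are hypothetical, the two-letter language codes are an assumption, and the language list is simply the four languages this post names for Universal-1.

```typescript
// Languages this post lists for Universal-1. The two-letter codes are an
// assumption made for this sketch.
const UNIVERSAL_1_LANGUAGES = new Set(['en', 'es', 'fr', 'de'])

type Needs = {
  language: string
  accuracyCritical: boolean
  highVolume: boolean
}

function suggestTier(needs: Needs): 'best' | 'nano' {
  // Languages outside the list above point to Nano's wider coverage.
  if (!UNIVERSAL_1_LANGUAGES.has(needs.language)) return 'nano'
  // Accuracy-critical work, such as user-facing transcripts, stays on Best.
  if (needs.accuracyCritical) return 'best'
  // Otherwise, cost-sensitive high-volume jobs are candidates for Nano.
  return needs.highVolume ? 'nano' : 'best'
}
```

Treat the result as a starting point rather than a rule; your own accuracy and cost requirements should have the final say.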

We encourage you to compare the results to find which one works best for the application you’re building.
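One way to run that comparison is to transcribe the same file with both tiers and look at the two texts side by side. The sketch below assumes a transcribe function with the same shape as the call used earlier in this post. compareTiers and the Transcriber type are hypothetical names, and the function is passed in so the sketch does not depend on the SDK directly.

```typescript
// Minimal stand-in for the part of the SDK client this sketch needs.
type CompareParams = { audio: string; speech_model?: 'nano' }
type Transcriber = (params: CompareParams) => Promise<{ text?: string | null }>

async function compareTiers(transcribe: Transcriber, audio: string) {
  // Run both requests concurrently; omitting speech_model uses the default Best tier.
  const [best, nano] = await Promise.all([
    transcribe({ audio }),
    transcribe({ audio, speech_model: 'nano' }),
  ])
  return { best: best.text ?? '', nano: nano.text ?? '' }
}

// Usage with the client created earlier:
//   const result = await compareTiers((p) => client.transcripts.transcribe(p), audioUrl)
//   console.log(result.best)
//   console.log(result.nano)
```

Keep in mind that each comparison sends two transcription requests for the same file.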

To read more about Universal-1, see our research article.