Key Phrases - AssemblyAI

Supported languages

Supported models

Supported regions

US & EU

The Key Phrases model identifies significant words and phrases in your transcript and lets you extract the most important concepts or highlights from your audio or video file.

Key phrase results are not embedded inline in the transcript text field. They are returned in a separate auto_highlights_result object in the API response, which contains each phrase along with its relevancy rank, occurrence count, and timestamps. See the Response section and code examples below for how to access this field.

Quickstart

Python
Python SDK
JavaScript
JavaScript SDK

Enable Key Phrases by setting auto_highlights to True in the JSON payload.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

with open("./local_file.mp3", "rb") as f:
    response = requests.post(base_url + "/v2/upload",
                            headers=headers,
                            data=f)

upload_url = response.json()["upload_url"]

data = {
    "audio_url": upload_url, # You can also use a URL to an audio or video file on the web
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "auto_highlights": True
}

url = base_url + "/v2/transcript"
response = requests.post(url, json=data, headers=headers)

transcript_id = response.json()['id']
polling_endpoint = base_url + "/v2/transcript/" + transcript_id

print(f"Transcript ID: {transcript_id}")

while True:
    transcription_result = requests.get(polling_endpoint, headers=headers).json()

    if transcription_result['status'] == 'completed':
        for result in transcription_result['auto_highlights_result']['results']:
            print(f"Highlight: {result['text']}, Count: {result['count']}, Rank: {result['rank']}, Timestamps: {result['timestamps']}")
        break
    elif transcription_result['status'] == 'error':
        raise RuntimeError(f"Transcription failed: {transcription_result['error']}")
    else:
        time.sleep(3)

Enable Key Phrases by setting auto_highlights to True in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    auto_highlights=True
)

transcript = aai.Transcriber().transcribe(audio_file, config)
print(f"Transcript ID: {transcript.id}")

for result in transcript.auto_highlights.results:
    print(f"Highlight: {result.text}, Count: {result.count}, Rank: {result.rank}, Timestamps: {result.timestamps}")

Enable Key Phrases by setting auto_highlights to true in the JSON payload.

import fs from "fs-extra";

const baseUrl = "https://api.assemblyai.com";

const headers = {
  authorization: "<YOUR_API_KEY>",
};

const path = "./my-audio.mp3";
const audioData = await fs.readFile(path);
let res = await fetch(`${baseUrl}/v2/upload`, {
  method: "POST",
  headers,
  body: audioData,
});
if (!res.ok) throw new Error(`Error: ${res.status}`);
const uploadResponse = await res.json();
const uploadUrl = uploadResponse.upload_url;

const data = {
  audio_url: uploadUrl, // You can also use a URL to an audio or video file on the web
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  auto_highlights: true,
};

const url = `${baseUrl}/v2/transcript`;
res = await fetch(url, {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify(data),
});
if (!res.ok) throw new Error(`Error: ${res.status}`);
const response = await res.json();

const transcriptId = response.id;
console.log("Transcript ID: ", transcriptId);

const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

while (true) {
  res = await fetch(pollingEndpoint, { headers });
  if (!res.ok) throw new Error(`Error: ${res.status}`);
  const transcriptionResult = await res.json();

  if (transcriptionResult.status === "completed") {
    for (const result of transcriptionResult.auto_highlights_result.results) {
      const timestamps = result.timestamps
        .map(({ start, end }) => `[Timestamp(start=${start}, end=${end})]`)
        .join(", ");
      console.log(
        `Highlight: ${result.text}, Count: ${result.count}, Rank: ${result.rank}, Timestamps: ${timestamps}`
      );
    }
    break;
  } else if (transcriptionResult.status === "error") {
    throw new Error(`Transcription failed: ${transcriptionResult.error}`);
  } else {
    await new Promise((resolve) => setTimeout(resolve, 3000));
  }
}

Enable Key Phrases by setting auto_highlights to true in the transcription config.

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: "<YOUR_API_KEY>",
});

// const audioFile = './local_file.mp3'
const audioFile = "https://assembly.ai/wildfires.mp3";

const params = {
  audio: audioFile,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  auto_highlights: true,
};

const run = async () => {
  const transcript = await client.transcripts.transcribe(params);

  for (const result of transcript.auto_highlights_result.results) {
    const timestamps = result.timestamps
      .map(({ start, end }) => `[Timestamp(start=${start}, end=${end})]`)
      .join(", ");
    console.log(
      `Highlight: ${result.text}, Count: ${result.count}, Rank ${result.rank}, Timestamps: ${timestamps}`
    );
  }
};

run();

Example output

Highlight: air quality alerts, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=3978, end=5114)]
Highlight: wide ranging air quality consequences, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=235388, end=238838)]
Highlight: more fires, Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=184716, end=185186)]
...

API reference

Request

curl https://api.assemblyai.com/v2/transcript \
--header "Authorization: <YOUR_API_KEY>" \
--header "Content-Type: application/json" \
--data '{
  "audio_url": "YOUR_AUDIO_URL",
  "auto_highlights": true
}'

Key	Type	Description
`auto_highlights`	boolean	Enable Key Phrases.

Response

{
  auto_highlights_result: {
    status: "success",
    results: [
      {
        count: 1,
        rank: 0.08,
        text: "air quality alerts",
        timestamps: [
          {
            start: 3978,
            end: 5114,
          },
        ],
      },
      {
        count: 1,
        rank: 0.08,
        text: "wide ranging air quality consequences",
        timestamps: [
          {
            start: 235388,
            end: 238694,
          },
        ],
      },
      {
        count: 1,
        rank: 0.07,
        text: "more wildfires",
        timestamps: [
          {
            start: 230972,
            end: 232354,
          },
        ],
      },
      {
        count: 1,
        rank: 0.07,
        text: "air pollution",
        timestamps: [
          {
            start: 156004,
            end: 156910,
          },
        ],
      },
      {
        count: 3,
        rank: 0.07,
        text: "weather systems",
        timestamps: [
          {
            start: 47344,
            end: 47958,
          },
          {
            start: 205268,
            end: 205818,
          },
          {
            start: 211588,
            end: 213434,
          },
        ],
      },
      {
        count: 2,
        rank: 0.06,
        text: "high levels",
        timestamps: [
          {
            start: 121128,
            end: 121646,
          },
          {
            start: 155412,
            end: 155866,
          },
        ],
      },
      {
        count: 1,
        rank: 0.06,
        text: "health conditions",
        timestamps: [
          {
            start: 152138,
            end: 152666,
          },
        ],
      },
      {
        count: 2,
        rank: 0.06,
        text: "Peter de Carlo",
        timestamps: [
          {
            start: 18948,
            end: 19930,
          },
          {
            start: 268298,
            end: 269194,
          },
        ],
      },
      {
        count: 1,
        rank: 0.06,
        text: "New York City",
        timestamps: [
          {
            start: 125768,
            end: 126274,
          },
        ],
      },
      {
        count: 1,
        rank: 0.05,
        text: "respiratory conditions",
        timestamps: [
          {
            start: 152964,
            end: 153786,
          },
        ],
      },
      {
        count: 3,
        rank: 0.05,
        text: "New York",
        timestamps: [
          {
            start: 125768,
            end: 126034,
          },
          {
            start: 171448,
            end: 171938,
          },
          {
            start: 176008,
            end: 176322,
          },
        ],
      },
      {
        count: 3,
        rank: 0.05,
        text: "climate change",
        timestamps: [
          {
            start: 229548,
            end: 230230,
          },
          {
            start: 244576,
            end: 245162,
          },
          {
            start: 263348,
            end: 263950,
          },
        ],
      },
      {
        count: 1,
        rank: 0.05,
        text: "Johns Hopkins University Varsity",
        timestamps: [
          {
            start: 23972,
            end: 25490,
          },
        ],
      },
      {
        count: 1,
        rank: 0.05,
        text: "heart conditions",
        timestamps: [
          {
            start: 153988,
            end: 154506,
          },
        ],
      },
      {
        count: 1,
        rank: 0.05,
        text: "air quality warnings",
        timestamps: [
          {
            start: 12308,
            end: 13434,
          },
        ],
      },
    ],
  },
}

Key	Type	Description
`auto_highlights_result`	object	The result of the Key Phrases model.
`auto_highlights_result.status`	string	Is either `success` or `unavailable` in the rare case that the Key Phrases model failed.
`auto_highlights_result.results`	array	A temporally-sequential array of key phrases.
`auto_highlights_result.results[i].count`	number	The total number of times the i-th key phrase appears in the audio file.
`auto_highlights_result.results[i].rank`	number	The total relevancy to the overall audio file of this key phrase. A greater number means that the key phrase is more relevant.
`auto_highlights_result.results[i].text`	string	The text itself of the key phrase.
`auto_highlights_result.results[i].timestamps[j].start`	number	The starting time of the j-th appearance of the i-th key phrase.
`auto_highlights_result.results[i].timestamps[j].end`	number	The ending time of the j-th appearance of the i-th key phrase.

The response also includes the request parameters used to generate the transcript.

Frequently Asked Questions

How does the Key Phrases model identify important phrases in my transcription?

The Key Phrases model uses natural language processing and machine learning algorithms to analyze the frequency and distribution of words and phrases in your transcription. The algorithm identifies key phrases based on their relevancy score, which takes into account factors such as the number of times a phrase occurs, the distance between occurrences, and the overall length of the transcription.

What is the difference between the Key Phrases model and the Topic Detection model?

The Key Phrases model is designed to identify important phrases and words in your transcription, whereas the Topic Detection model is designed to categorize your transcription into predefined topics. While both models use natural language processing and machine learning algorithms, they have different goals and approaches to analyzing your text.

Can the Key Phrases model handle misspelled or unrecognized words?

Yes, the Key Phrases model can handle misspelled or unrecognized words to some extent. However, the accuracy of the detection may depend on the severity of the misspelling or the obscurity of the word. It’s recommended to provide high-quality, relevant audio files with accurate transcriptions for the best results.

What are some limitations of the Key Phrases model?

Some limitations of the Key Phrases model include its limited understanding of context, which may lead to inaccuracies in identifying the most important phrases in certain cases, such as text with heavy use of jargon or idioms. Additionally, the model assigns higher scores to words or phrases that occur more frequently in the text, which may lead to an over-representation of common words and phrases that may not be as important in the context of the text. Finally, the Key Phrases model is a general-purpose algorithm that can’t be easily customized or fine-tuned for specific domains, meaning it may not perform as well for specialized texts where certain keywords or concepts may be more important than others.

How can I optimize the performance of the Key Phrases model?

To optimize the performance of the Key Phrases model, it’s recommended to provide high-quality, relevant audio files with accurate transcriptions, to review and adjust the model’s configuration parameters, such as the confidence threshold for key phrase detection, and to refer to the list of identified key phrases to guide the analysis. It may also be helpful to consider adding additional training data to the model or consulting with AssemblyAI support for further assistance.

​Quickstart

​Example output

​API reference

​Request

​Response

​Frequently Asked Questions

Quickstart

Example output

API reference

Request

Response

Frequently Asked Questions