Translate AssemblyAI Transcripts Into Other Languages Using Commercial Models

This cookbook walks through how to translate AssemblyAI transcripts using a variety of commercial and open-source machine translation models.

Which model to choose depends on your use case and preferences. Here are some points to consider when choosing a translation model:

  • Accuracy and quality of translation: compare the translations from each provider on your own transcripts to see which you prefer
  • Language support: check the supported languages for Google Translate, DeepL, and translate-python
  • Cost: commercial models usually offer a free tier or trial, but they will incur costs eventually
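
One way to act on the first point is to run the same sentences through each candidate and read the results side by side. The sketch below assumes each provider has been wrapped in a function that takes a string and returns the translated string. `compare_translations` and the stand-in translators are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

def compare_translations(
    sentences: List[str],
    translators: Dict[str, Callable[[str], str]],
) -> List[Dict[str, str]]:
    """Run every sentence through every translator and collect the results."""
    rows = []
    for sentence in sentences:
        row = {"source": sentence}
        for name, translate_fn in translators.items():
            row[name] = translate_fn(sentence)
        rows.append(row)
    return rows

# Stand-in translators so the sketch runs without any API credentials.
demo_translators = {
    "provider_a": lambda text: text.upper(),
    "provider_b": lambda text: text[::-1],
}

for row in compare_translations(["hola mundo"], demo_translators):
    for label, value in row.items():
        print(f"{label}: {value}")
    print()
```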

Setup

To get started, paste your API token into the empty string below. If you don’t already have an API token, you can get one for free here.

AAI_API_TOKEN = ""

Do not share this token with anyone: it is a private key uniquely associated with your account.
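
One way to keep the token out of a shared notebook or repository is to read it from an environment variable instead of pasting it into the source. This is a minimal sketch, and the variable name `ASSEMBLYAI_API_KEY` is an assumption; use whatever name you exported in your shell.

```python
import os

# "ASSEMBLYAI_API_KEY" is an assumed variable name; use whatever name you exported.
AAI_API_TOKEN = os.environ.get("ASSEMBLYAI_API_KEY", "")

if not AAI_API_TOKEN:
    print("ASSEMBLYAI_API_KEY is not set; the token is empty.")
```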

Next, we’ll install the AssemblyAI Python SDK, which lets us transcribe audio in just a few lines of code.

$ pip install assemblyai

Finally, import the assemblyai package and set your API token in the settings:

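import assemblyai as aai

# Set the API key
aai.settings.api_key = AAI_API_TOKEN

# Enable automatic language detection so the source language is identified for us
config = aai.TranscriptionConfig(language_detection=True)
transcriber = aai.Transcriber(config=config)

# Transcribe a local audio file (replace the path with your own file)
transcript = transcriber.transcribe('./my-audio.mp3')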
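
Before translating, it can help to confirm that the transcription actually succeeded. The sketch below assumes the transcript object exposes an `error` attribute that is empty on success. `require_success` is a hypothetical helper, and the stand-in objects only mimic that one attribute so the example runs without calling the API.

```python
from types import SimpleNamespace

def require_success(transcript):
    """Return the transcript, or raise if it carries an error message."""
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"Transcription failed: {error}")
    return transcript

# Stand-in objects so the sketch runs without calling the API.
ok = SimpleNamespace(error=None, text="hola")
failed = SimpleNamespace(error="File not found", text=None)

print(require_success(ok).text)
try:
    require_success(failed)
except RuntimeError as exc:
    print(exc)
```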

Specify the target language for the translation:

to_lang = 'en'

Get the detected language code from the AssemblyAI JSON response:

from_lang = transcript.json_response['language_code']
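
If the detected language already matches the target, there is nothing to translate. The following sketch compares the two codes after reducing them to their two-letter base, on the assumption that codes may arrive with a regional suffix such as `en_us`. `base_language` and `needs_translation` are hypothetical helpers, not part of any SDK.

```python
def base_language(code: str) -> str:
    """Reduce a code such as 'en_us' or 'pt-BR' to its base ('en', 'pt')."""
    return code.replace("_", "-").split("-")[0].lower()

def needs_translation(from_lang: str, to_lang: str) -> bool:
    """True when the source and target base languages differ."""
    return base_language(from_lang) != base_language(to_lang)

print(needs_translation("en_us", "en"))  # False
print(needs_translation("es", "en"))     # True
```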

Commercial Models

Google Translate API

https://cloud.google.com/translate/docs/reference/libraries/v2/python

Note: you will need a GCP account as well as app credentials to make this API request.

$ pip install google-cloud-translate==2.0.1

Follow Google’s docs on how to generate a credentials JSON file: https://cloud.google.com/docs/authentication/application-default-credentials

import os

# Point the Google client library at your credentials JSON file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './translate_creds.json'

from google.cloud import translate_v2

def translate(target: str, text: str) -> dict:
    """Translates text into the target language.

    Target must be an ISO 639-1 language code.
    See https://g.co/cloud/translate/v2/translate-reference#supported_languages
    """
    translate_client = translate_v2.Client()

    if isinstance(text, bytes):
        text = text.decode("utf-8")

    # Text can also be a sequence of strings, in which case this method
    # will return a sequence of results for each text.
    result = translate_client.translate(text, target_language=target)

    print("Text: {}".format(result["input"]))
    print("Translation: {}".format(result["translatedText"]))
    print()

    return result

# Translate the transcript one sentence at a time
for sent in transcript.get_sentences():
    translate(to_lang, sent.text)
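
The comment in the function above notes that the client also accepts a sequence of strings, which avoids one request per sentence. Below is a minimal sketch of that batching idea. `translate_batch` is a hypothetical callable standing in for the real client call, the batch size of 100 is an assumed value rather than an official limit, and `html.unescape` is applied because the returned text may contain HTML entities such as `&#39;`.

```python
import html
from typing import Callable, Dict, Iterator, List

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def translate_in_batches(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[Dict[str, str]]],
    batch_size: int = 100,  # assumed value, not an official limit
) -> List[str]:
    """Translate sentences batch by batch and decode HTML entities."""
    translated = []
    for batch in chunked(sentences, batch_size):
        for result in translate_batch(batch):
            translated.append(html.unescape(result["translatedText"]))
    return translated

# Stand-in for the real client call so the sketch runs offline.
def fake_translate_batch(batch):
    return [{"input": text, "translatedText": text + " &#39;ok&#39;"} for text in batch]

print(translate_in_batches(["uno", "dos", "tres"], fake_translate_batch, batch_size=2))
```

With a real `translate_v2.Client()` instance named `client`, `translate_batch` could be `lambda batch: client.translate(batch, target_language=to_lang)`.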

DeepL API

https://www.deepl.com/en/docs-api

$ pip install deepl

You will need a DeepL account and API token, which can be found here: https://www.deepl.com/pro-api

DEEPL_API_TOKEN = ''

import deepl

# Create the client once and reuse it for every sentence
translator = deepl.Translator(DEEPL_API_TOKEN)

def translate(text):
    # Note: for English targets DeepL expects a regional variant
    # such as "EN-US" or "EN-GB" rather than plain "EN"
    result = translator.translate_text(text, target_lang="EN-US")
    return result.text

# Example usage
for sent in transcript.get_sentences():
    translated_text = translate(sent.text)
    print("Text: {}".format(sent.text))
    print("Translation: {}".format(translated_text))
    print()
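
Because the target code above is hard-coded, changing `to_lang` elsewhere has no effect on the DeepL call. A possible sketch of deriving the DeepL target from a two-letter code follows. `to_deepl_target` is a hypothetical helper, and the regional defaults (US English, Brazilian Portuguese) are assumptions; check DeepL's list of supported target languages before relying on the output.

```python
# Targets that DeepL expects with a regional variant. The defaults chosen
# here (US English, Brazilian Portuguese) are assumptions; pick what you need.
REGIONAL_DEFAULTS = {"en": "EN-US", "pt": "PT-BR"}

def to_deepl_target(code: str) -> str:
    """Map a code such as 'en', 'en_gb', or 'de' to a DeepL-style target code."""
    normalized = code.replace("_", "-")
    if "-" in normalized:
        # A region was already supplied, so keep it and only fix the casing.
        return normalized.upper()
    return REGIONAL_DEFAULTS.get(normalized.lower(), normalized.upper())

print(to_deepl_target("en"))     # EN-US
print(to_deepl_target("en_gb"))  # EN-GB
print(to_deepl_target("de"))     # DE
```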

Open-Source Models

translate-python Library

https://github.com/terryyin/translate-python

$ pip install translate

from translate import Translator

def translate(text):
    translator = Translator(to_lang=to_lang, from_lang=from_lang)
    translation = translator.translate(text)
    return translation

for sent in transcript.get_sentences():
    translated_text = translate(sent.text)
    print("Text: {}".format(sent.text))
    print("Translation: {}".format(translated_text))
    print()
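
The loops above only print each translation. If you want to keep the results, a small sketch like the following collects the sentence pairs and joins them into one translated text. `translate_sentences` is a hypothetical helper, and the stand-in translator simply upper-cases its input so the example runs without network access.

```python
from typing import Callable, List, Tuple

def translate_sentences(
    sentences: List[str],
    translate_fn: Callable[[str], str],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the joined translation plus (source, translation) pairs."""
    pairs = [(sentence, translate_fn(sentence)) for sentence in sentences]
    full_text = " ".join(translated for _, translated in pairs)
    return full_text, pairs

# Stand-in translator so the sketch runs without network access.
full_text, pairs = translate_sentences(["hola.", "adios."], lambda s: s.upper())
print(full_text)
```

In this cookbook's terms, `sentences` would be `[sent.text for sent in transcript.get_sentences()]` and `translate_fn` would be one of the `translate` functions defined above.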

Further Documentation

Cookbook: Translate subtitles