Translate AssemblyAI Transcripts Into Other Languages Using Commercial Models
This cookbook walks through how to translate AssemblyAI transcripts using a variety of commercial and open-source machine translation models.
Choosing a model depends on your use case and preferences. Here are some considerations when choosing a model for translation:
- Accuracy and Quality of Translation: you should compare the translations from each provider to see which translation you prefer
- Language Support: check which languages Google Translate, DeepL, and translate-python each support
- Cost: while commercial models usually have a free tier or trial, they will incur costs eventually
Setup
To get started, paste your API token into the empty string below. If you don’t already have an API token, you can get one for free here.
Make sure not to share this token with anyone; it is a private key uniquely associated with your account.
Next, we’ll install the AssemblyAI Python SDK, which will allow us to transcribe audio in just a few lines of code.
Finally, import the assemblyai package and set your API token in the settings:
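A minimal sketch of this setup step, assuming the SDK has been installed (for example with `pip install assemblyai`). The `require_api_key` and `transcribe` helpers are hypothetical names introduced for illustration; language detection is enabled because a later step reads the detected language code from the response.

```python
API_KEY = ""  # paste your AssemblyAI API token here; keep it private


def require_api_key(key: str) -> str:
    """Hypothetical helper: fail early if the token was left empty."""
    key = key.strip()
    if not key:
        raise ValueError("Set API_KEY to your AssemblyAI API token first.")
    return key


def transcribe(audio_url: str, api_key: str):
    """Transcribe audio with automatic language detection enabled."""
    # Imported lazily so the helper above has no third-party dependency.
    import assemblyai as aai

    aai.settings.api_key = require_api_key(api_key)
    config = aai.TranscriptionConfig(language_detection=True)
    transcript = aai.Transcriber().transcribe(audio_url, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript
```

With a real token and an audio file URL of your own, a call would look like `transcript = transcribe(audio_url, API_KEY)`.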
Specify the target language for the translation
Get the detected language code from the AssemblyAI JSON response
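The two steps above can be sketched as follows. The target language `"en"` is only an example value, `get_detected_language` is a hypothetical helper, and the dictionary passed to it is a stand-in for `transcript.json_response` from a transcript created with language detection enabled.

```python
to_lang = "en"  # target language for the translation (example value)


def get_detected_language(json_response: dict) -> str:
    """Read the detected language code from a transcript's JSON response."""
    code = json_response.get("language_code")
    if not code:
        raise KeyError("language_code missing from the JSON response")
    return code


# With a real transcript object this would be:
# from_lang = get_detected_language(transcript.json_response)
from_lang = get_detected_language({"language_code": "es"})  # stand-in response
```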
Commercial Models
Google Translate API
https://cloud.google.com/translate/docs/reference/libraries/v2/python
Note: you will need a GCP account as well as app credentials to make this API request.
Follow Google’s docs on how to generate a credentials JSON file: https://cloud.google.com/docs/authentication/application-default-credentials
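A minimal sketch of the request, assuming the `google-cloud-translate` package is installed and Application Default Credentials are configured as described in Google's docs. The function name and the optional `client` parameter are hypothetical; the parameter exists only so the function can be exercised without network access.

```python
def translate_with_google(text, to_lang, from_lang=None, client=None):
    """Translate text with the Cloud Translation v2 (Basic) client."""
    if client is None:
        # Requires `pip install google-cloud-translate`. Client() picks up
        # Application Default Credentials, e.g. GOOGLE_APPLICATION_CREDENTIALS.
        from google.cloud import translate_v2 as translate

        client = translate.Client()
    result = client.translate(
        text,
        target_language=to_lang,
        source_language=from_lang,
        format_="text",  # ask for plain text rather than HTML-escaped output
    )
    return result["translatedText"]
```

For a transcript, the call would be along the lines of `translate_with_google(transcript.text, to_lang, from_lang)`.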
DeepL API
https://www.deepl.com/en/docs-api
You will need a DeepL account and API token, which can be found here: https://www.deepl.com/pro-api
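A minimal sketch using the official `deepl` Python package (`pip install deepl`). DeepL expects a regional variant for English and Portuguese targets, so the sketch maps short codes to one; the choice of `EN-US` and `PT-BR` is an assumption, and the helper names and the optional `translator` parameter are hypothetical.

```python
def to_deepl_target(code: str) -> str:
    """Map a language code to a DeepL target code (regional defaults are assumptions)."""
    variants = {"en": "EN-US", "pt": "PT-BR"}
    base = code.replace("_", "-").lower()
    return variants.get(base, base.upper())


def to_deepl_source(code: str) -> str:
    """DeepL source codes carry no regional variant, so keep only the base code."""
    return code.replace("_", "-").split("-")[0].upper()


def translate_with_deepl(text, to_lang, from_lang=None, auth_key="", translator=None):
    """Translate text with DeepL and return the translated string."""
    if translator is None:
        import deepl  # imported lazily; needs a DeepL API token

        translator = deepl.Translator(auth_key)
    result = translator.translate_text(
        text,
        target_lang=to_deepl_target(to_lang),
        source_lang=to_deepl_source(from_lang) if from_lang else None,
    )
    return result.text
```

Passing `from_lang=None` leaves source-language detection to DeepL.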
Open-Source Models
translate-python Library
https://github.com/terryyin/translate-python
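A minimal sketch with translate-python (`pip install translate`). The default provider may reject long queries, so the text is split into chunks first; the 500-character limit is an assumption to check against the provider you use. The helper names and the optional `translator` parameter are hypothetical.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_with_translate_python(text, to_lang, from_lang, translator=None):
    """Translate text chunk by chunk and join the results with spaces."""
    if translator is None:
        from translate import Translator  # imported lazily

        translator = Translator(to_lang=to_lang, from_lang=from_lang)
    return " ".join(translator.translate(chunk) for chunk in chunk_text(text))
```

Splitting on whitespace can cut a sentence in half, which may hurt translation quality; splitting on sentence boundaries would be a natural refinement.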
Further Documentation
Cookbook: Translate subtitles