Data retention and model training

Model training

We consider model training critical to providing you with the most accurate models and services that we can. Only certain files submitted to the API, as permitted by the applicable contract, are used for model training. These files undergo a redaction process designed to remove personally identifiable information before any remaining data is used for model training. We will not use files you submit for model training if you are subject to a Business Associate Addendum, are using our European servers, or have opted out of model training. You can find more information on whether and how to opt out here.

LLM Gateway model training

AssemblyAI has opted out of model training with all LLM Gateway providers.

Please note this is separate from whether AssemblyAI may train our models with your data. You can find more information on whether and how to opt out of data sharing for our model improvement program here.

Encryption

Data at rest is encrypted with AES-128 or AES-256, and data in transit is protected with TLS 1.2+. AssemblyAI posts SSL scan results to its Trust Center quarterly to verify that its services use TLS with modern cipher suites.

Async

For transcription of pre-recorded audio, AssemblyAI supports the following TLS versions and cipher suites:

Supported TLS versions:

  • TLS 1.3
  • TLS 1.2

Supported cipher suites:

  • TLS_AES_128_GCM_SHA256
  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES128-GCM-SHA256
  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-RSA-AES256-GCM-SHA384

Streaming

For transcription of streaming audio, AssemblyAI supports the following TLS version and cipher suites:

Supported TLS versions:

  • TLS 1.3

Supported cipher suites:

  • TLS_AES_128_GCM_SHA256
  • TLS_AES_256_GCM_SHA384
  • TLS_CHACHA20_POLY1305_SHA256

Ensure your client or application is configured to use one of the supported TLS versions and cipher suites when connecting to AssemblyAI services.

GDPR compliance

We have designed our products with GDPR principles in mind, but we also understand that privacy compliance is a moving target. As privacy requirements continue to evolve rapidly, we are constantly working to assess and improve our practices. You can read more about our privacy practices in our Privacy Policy here, and in our Data Processing Addendum here.

SOC2 certification

We have both SOC2 Type 1 and Type 2 certifications. You can find more information on this on our Trust Center. We also have a great blog post on the subject, which you can find here.

Data retention

Streaming production environment

If you are opted out of model training, we offer zero data retention of audio and transcripts for our Streaming product. Certain metadata about the transcript is stored and maintained for logging and billing purposes.

The model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.

Asynchronous production environment

Retention for each artifact type* depends on whether a Time-To-Live (TTL) is configured, a BAA is executed, neither is in place, or the customer initiates a deletion request:

Customer-Uploaded Audio Files

  • TTL configured: Deleted at TTL expiration, which is 3 days by default.
  • BAA executed: Deleted at TTL expiration, which is 3 days by default.
  • No TTL or BAA: Deletion process begins at 24 hours and completes within 48 hours.
  • Customer-initiated deletion request: Deleted when the customer initiates a deletion request.

Final Transcription Artifact

  • TTL configured: Deletion process begins in AWS at TTL expiration, which can be set as low as one (1) hour, subject to AWS TTL processing times.**
  • BAA executed: Default deletion process begins at 72 hours (the TTL can be set as low as 1 hour), subject to AWS TTL processing times.**
  • No TTL or BAA: Retained indefinitely.
  • Customer-initiated deletion request: Deleted when the customer initiates a deletion request.

Customer-Provided URL Audio Reference (linked to the lifecycle of the Final Transcription Artifact)

  • TTL configured: Same as the Final Transcription Artifact.**
  • BAA executed: Same as the Final Transcription Artifact.**
  • No TTL or BAA: Retained indefinitely.
  • Customer-initiated deletion request: Deleted when the customer initiates a deletion request.

Intermediate Artifact

  • TTL configured: Linked to the lifecycle of the Final Transcription Artifact; same as the Final Transcription Artifact.**
  • BAA executed: Linked to the lifecycle of the Final Transcription Artifact; same as the Final Transcription Artifact.**
  • No TTL or BAA: Deletion process begins at 48 hours and completes within 72 hours.
  • Customer-initiated deletion request: Deleted when the customer initiates a deletion request.

Transcription Request Text Inputs (content inputs such as key terms prompts, word boost lists, etc.)

  • TTL configured: Linked to the lifecycle of the Final Transcription Artifact; same as the Final Transcription Artifact.**
  • BAA executed: Linked to the lifecycle of the Final Transcription Artifact; same as the Final Transcription Artifact.**
  • No TTL or BAA: Retained indefinitely (linked to the lifecycle of the Final Transcription Artifact).
  • Customer-initiated deletion request: Deleted when the customer initiates a deletion request.

*Certain metadata is stored for logging and billing purposes.

**The minimum TTL that AssemblyAI may set for Final Transcription Artifacts in the asynchronous production environment is one (1) hour. AssemblyAI implements TTLs using Amazon Web Services' ("AWS") DynamoDB TTL mechanism (the "AWS TTL"). The deletion process begins in AWS at TTL expiration but is subject to AWS TTL processing times. In practice, these deletion events typically take place anywhere from a few minutes to a few hours after the deletion process begins in AWS, depending on circumstances such as server location, though we have seen lag times ranging from 2-3 hours to a few days. Once the artifact is deleted in AWS, AssemblyAI processes this deletion almost immediately. See here for more information about AWS's TTL mechanism.
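A customer-initiated deletion request is made against an individual transcript via the API's DELETE endpoint. The sketch below only builds the request with Python's standard library and does not send it; substitute a real API key and transcript ID, and confirm the endpoint against the current API reference before relying on it:

```python
import urllib.request

API_BASE = "https://api.assemblyai.com/v2/transcript"

def build_delete_request(api_key, transcript_id):
    """Build a DELETE request for one transcript.

    Deletion removes the audio and transcript data; certain metadata is
    retained for logging and billing purposes, as noted above.
    """
    return urllib.request.Request(
        f"{API_BASE}/{transcript_id}",
        headers={"Authorization": api_key},
        method="DELETE",
    )

# To send it: urllib.request.urlopen(build_delete_request(api_key, transcript_id))
```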

Confirming deletion

Should you wish to confirm that a file has been deleted, or if you did not store the transcript_id when the transcription request was made, you can retrieve a list of all transcripts. Make a GET request to https://api.assemblyai.com/v2/transcript to return a list of all transcripts created, or append a transcript_id to the URL to review a single transcript.
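As a minimal sketch using only Python's standard library, the same helper can build either the list request or a single-transcript lookup (substitute a real API key before sending):

```python
import json
import urllib.request

API_BASE = "https://api.assemblyai.com/v2/transcript"

def build_get_request(api_key, transcript_id=None):
    """GET the transcript list, or a single transcript when an ID is given."""
    url = API_BASE if transcript_id is None else f"{API_BASE}/{transcript_id}"
    return urllib.request.Request(url, headers={"Authorization": api_key})

def fetch(api_key, transcript_id=None):
    """Send the request and decode the JSON response; inspect the result to
    confirm the transcript data is no longer present."""
    with urllib.request.urlopen(build_get_request(api_key, transcript_id)) as resp:
        return json.load(resp)
```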

The model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.

LLM Gateway production environment

  • If you have an executed BAA and use either Anthropic or Google inference models, we offer zero data retention for LLM Gateway inputs and outputs. Certain metadata is stored for logging and billing purposes.

  • If you have a designated TTL on your LLM Gateway account, we delete inputs and outputs on an hourly basis. Certain metadata is stored for logging and billing purposes.

  • If a customer initiates a deletion request, inputs and outputs are deleted at the time of the request. Certain metadata is stored for logging and billing purposes. For deletion of speech understanding requests, please see below.

  • For speech understanding requests, such as translation, speaker ID, or custom formatting, the retention is linked to the life of an asynchronous Final Transcription Artifact, noted above.

For more information about how Anthropic, Google, and OpenAI retain data, please refer to the Available Models charts in our LLM Gateway Overview.

AssemblyAI has opted out of model training with all LLM Gateway providers. Please note this is separate from whether AssemblyAI may train our models with your data, and the model training environment differs from the production environment. You can find more information on model training in our Model Training section. If you would like to opt out of model training, please see our Opt-Out FAQ.