Custom Formatting

Global English (en)
Australian English (en_au)
British English (en_uk)
US English (en_us)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Dutch (nl)
Hindi (hi)
Japanese (ja)
Chinese (zh)
Finnish (fi)
Korean (ko)
Polish (pl)
Russian (ru)
Turkish (tr)
Ukrainian (uk)
Vietnamese (vi)
Afrikaans (af)
Albanian (sq)
Amharic (am)
Arabic (ar)
Armenian (hy)
Assamese (as)
Azerbaijani (az)
Bashkir (ba)
Basque (eu)
Belarusian (be)
Bengali (bn)
Bosnian (bs)
Breton (br)
Bulgarian (bg)
Catalan (ca)
Croatian (hr)
Czech (cs)
Danish (da)
Estonian (et)
Faroese (fo)
Galician (gl)
Georgian (ka)
Greek (el)
Gujarati (gu)
Haitian (ht)
Hausa (ha)
Hawaiian (haw)
Hebrew (he)
Hungarian (hu)
Icelandic (is)
Indonesian (id)
Javanese (jw)
Kannada (kn)
Kazakh (kk)
Lao (lo)
Latin (la)
Latvian (lv)
Lingala (ln)
Lithuanian (lt)
Luxembourgish (lb)
Macedonian (mk)
Malagasy (mg)
Malay (ms)
Malayalam (ml)
Maltese (mt)
Maori (mi)
Marathi (mr)
Mongolian (mn)
Nepali (ne)
Norwegian (no)
Norwegian Nynorsk (nn)
Occitan (oc)
Panjabi (pa)
Pashto (ps)
Persian (fa)
Romanian (ro)
Sanskrit (sa)
Serbian (sr)
Shona (sn)
Sindhi (sd)
Sinhala (si)
Slovak (sk)
Slovenian (sl)
Somali (so)
Sundanese (su)
Swahili (sw)
Swedish (sv)
Tagalog (tl)
Tajik (tg)
Tamil (ta)
Tatar (tt)
Telugu (te)
Turkmen (tk)
Urdu (ur)
Uzbek (uz)
Welsh (cy)
Yiddish (yi)
Yoruba (yo)

Slam 1 (slam-1)
Universal (universal)

US only

Overview

The Custom Formatting feature automatically standardizes and formats specific types of information in your transcripts, ensuring consistency across dates, phone numbers, emails, and other data types. This reduces the need for post-processing and returns formatted output that is ready for your application.

Key capabilities:

  • Format dates in your preferred style (US, European, ISO, etc.)
  • Standardize phone number formats with custom patterns
  • Control currency and decimal precision
  • Convert spelled-out text into formatted patterns
  • Format URLs as hyperlinks
  • Apply multiple formatting rules simultaneously

Common use cases:

  • Standardizing contact information in customer service transcripts
  • Formatting financial data in earnings calls
  • Preparing transcripts for CRM systems with specific format requirements
  • Creating consistent documentation from meetings
  • Processing legal or medical transcripts with strict formatting standards

Quickstart

There are two ways to use Custom Formatting:

  1. Transcribe and format in one request - Best when you’re starting a new transcription and want to automatically format the transcript text as part of that process
  2. Transcribe and format in separate requests - Best when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks

Method 1: Transcribe and format in one request

This method is ideal when you’re starting fresh and want both transcription and formatting in a single workflow.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/phone-msg.m4a"

# Configure transcription with custom formatting
data = {
    "audio_url": audio_url,
    "speaker_labels": True,
    "speech_understanding": {
        "request": {
            "custom_formatting": {
                "date": "mm/dd/yyyy",
                "phone_number": "(xxx)xxx-xxxx",
                "email": "username@domain.com",
                "format_utterances": True
            }
        }
    }
}

# Submit transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Access and display results
print("\n--- Formatting Details ---")
mapping = transcript['speech_understanding']['response']['custom_formatting']['mapping']
for original, formatted in mapping.items():
    print(f"Original: {original}")
    print(f"Formatted: {formatted}\n")

Method 2: Transcribe and format in separate requests

This method is useful when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/phone-msg.m4a"

# Submit transcription request (without formatting)
data = {
    "audio_url": audio_url,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription completion
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print("Transcription completed!")
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Add custom formatting configuration to the completed transcript
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "custom_formatting": {
                "date": "mm/dd/yyyy",
                "phone_number": "(xxx)xxx-xxxx",
                "email": "username@domain.com",
                "format_utterances": True
            }
        }
    }
}

# Send to Speech Understanding API for formatting
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

print("Formatting completed!")

# Access and display results
print("\n--- Formatting Details ---")
mapping = result['speech_understanding']['response']['custom_formatting']['mapping']
for original, formatted in mapping.items():
    print(f"Original: {original}")
    print(f"Formatted: {formatted}\n")

Expected output:

--- Formatting Details ---
Original: Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466. Also, my cell phone number is 555-679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is 555-679-3466. Thanks.
Formatted: Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466. Also, my cell phone number is (555)679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is (555)679-3466. Thanks.
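
Both quickstart scripts poll in an unbounded `while True` loop. The sketch below shows one possible way to bound that loop with a timeout. The helper name `wait_for_transcript` and its `fetch`, `interval_s`, and `timeout_s` parameters are illustrative assumptions, not part of the AssemblyAI API; `fetch` is any callable that returns the transcript JSON as a dict.

```python
import time
from typing import Any, Callable, Dict


def wait_for_transcript(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Call fetch() until the transcript completes, fails, or the timeout passes.

    Hypothetical helper: only the "completed" and "error" statuses used in
    the quickstart are treated specially; any other status means "keep waiting".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcript still '{status}' after {timeout_s} seconds")
        time.sleep(interval_s)


# Usage with the quickstart variables (needs `requests` and a valid API key):
# transcript = wait_for_transcript(
#     lambda: requests.get(polling_endpoint, headers=headers).json()
# )
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP client, so it can be exercised without network access.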

Output format

Data from the Custom Formatting API is returned in the custom_formatting object, which is nested under speech_understanding.response. The formatted_text key contains a formatted version of the transcript text.

If Speaker Diarization is used in the request and format_utterances is enabled, a formatted_utterances key is returned containing formatted utterances with preserved timestamps.

Example response structure:

{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              },
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}

Key features of the output:

  • Formatted text: The formatted transcript is returned in the formatted_text key
  • Formatted utterances: When format_utterances is enabled, speaker-separated segments in the formatted_utterances key include formatted text
  • Preserved timestamps: All word-level timestamps in formatted_utterances remain intact after formatting, allowing you to maintain temporal alignment with the audio
  • Mapping object: Shows exactly what transformations were applied (original → formatted)
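
As a minimal sketch of reading these fields, the snippet below pulls the custom_formatting object out of a response dict and lists the entries that actually changed. The helper names `get_custom_formatting` and `list_changes` are hypothetical, and the sample payload is a shortened copy of the example response above rather than real API output.

```python
from typing import Any, Dict, List, Tuple


def get_custom_formatting(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return speech_understanding.response.custom_formatting, or {} if absent."""
    understanding = payload.get("speech_understanding") or {}
    response = understanding.get("response") or {}
    return response.get("custom_formatting") or {}


def list_changes(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (original, formatted) pairs from the mapping that differ."""
    mapping = get_custom_formatting(payload).get("mapping") or {}
    return [(orig, fmt) for orig, fmt in mapping.items() if orig != fmt]


# Shortened sample based on the example response structure
sample = {
    "status": "completed",
    "speech_understanding": {
        "response": {
            "custom_formatting": {
                "formatted_text": "My phone number is (555)679-3466...",
                "mapping": {
                    "555-679-3466": "(555)679-3466",
                    "555-679-8244": "(555)679-8244",
                },
                "status": "success",
            }
        }
    },
}

for original, formatted in list_changes(sample):
    print(f"{original} -> {formatted}")
```

Falling back to empty dicts means a transcript without formatting data yields an empty result instead of a KeyError.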

Understanding the custom_formatting parameter

The custom_formatting parameter accepts an object with specific formatting rules for different data types in your transcript. Each property in the object defines how a particular type of information should be formatted.

Available formatting options

| Parameter | Type | Description | Example Values |
| --- | --- | --- | --- |
| date | string | Specifies the format pattern for dates in the transcript | "mm/dd/yyyy", "dd/mm/yyyy", "yyyy-mm-dd" |
| phone_number | string | Specifies the format pattern for phone numbers | "(xxx)xxx-xxxx", "xxx-xxx-xxxx", "xxx.xxx.xxxx" |
| email | string | Specifies the format pattern for email addresses | "username@domain.com", "firstname.lastname@domain.com" |
| format_utterances | boolean | When true, applies formatting to utterances in addition to the main text field. Preserves all word-level timestamps. | true, false (default: false) |

Example configuration:

{
  "custom_formatting": {
    "date": "mm/dd/yyyy",
    "phone_number": "(xxx)xxx-xxxx",
    "email": "username@domain.com",
    "format_utterances": true
  }
}

When you include this configuration in your transcription request, the API will automatically detect and format dates, phone numbers, and emails in your transcript according to the specified patterns. With format_utterances enabled, the formatting is applied to both the main transcript text and individual speaker utterances while preserving all timing information.
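
The configuration can also be assembled programmatically. The following is a sketch under the assumption that you want to omit unset options and catch obviously wrong types before sending the request; `build_custom_formatting` is a hypothetical helper, not an SDK function, and it only covers the four options documented in the table above.

```python
from typing import Any, Dict, Optional


def build_custom_formatting(
    date: Optional[str] = None,
    phone_number: Optional[str] = None,
    email: Optional[str] = None,
    format_utterances: bool = False,
) -> Dict[str, Any]:
    """Build the speech_understanding block for a transcription request."""
    if not isinstance(format_utterances, bool):
        raise ValueError("format_utterances must be a boolean")

    options: Dict[str, Any] = {}
    for key, value in (("date", date), ("phone_number", phone_number), ("email", email)):
        if value is None:
            continue  # leave unset options out of the request
        if not isinstance(value, str) or not value:
            raise ValueError(f"{key} must be a non-empty string pattern")
        options[key] = value
    options["format_utterances"] = format_utterances

    return {"speech_understanding": {"request": {"custom_formatting": options}}}


# Merge into a transcription request body
data = {"audio_url": "https://assembly.ai/phone-msg.m4a", "speaker_labels": True}
data.update(build_custom_formatting(phone_number="(xxx)xxx-xxxx", format_utterances=True))
print(data)
```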

Common formatting patterns

Date formats

| Pattern | Example Output | Description |
| --- | --- | --- |
| mm/dd/yyyy | 09/19/1991 | US format (month/day/year) |
| dd/mm/yyyy | 19/09/1991 | European format (day/month/year) |
| yyyy-mm-dd | 1991-09-19 | ISO 8601 format |
| mm-dd-yyyy | 09-19-1991 | US format with dashes |
| dd.mm.yyyy | 19.09.1991 | European format with dots |
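
To preview what a date pattern produces, the patterns in the table can be approximated locally with Python's strftime. This is only an illustration of the pattern semantics, assuming the tokens yyyy, mm, and dd mean four-digit year, two-digit month, and two-digit day; it is not how the API performs formatting, and `render_date` is a hypothetical name.

```python
from datetime import date

# Pattern tokens and the strftime directives assumed to correspond to them
_TOKENS = (("yyyy", "%Y"), ("mm", "%m"), ("dd", "%d"))


def render_date(pattern: str, value: date) -> str:
    """Render a date using a pattern such as "mm/dd/yyyy" (local preview only)."""
    fmt = pattern
    for token, directive in _TOKENS:
        fmt = fmt.replace(token, directive)
    return value.strftime(fmt)


sample_date = date(1991, 9, 19)
for pattern in ("mm/dd/yyyy", "dd/mm/yyyy", "yyyy-mm-dd", "mm-dd-yyyy", "dd.mm.yyyy"):
    print(f"{pattern:12} {render_date(pattern, sample_date)}")
```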

Phone number formats

| Pattern | Example Output | Description |
| --- | --- | --- |
| (xxx)xxx-xxxx | (555)679-3466 | Parentheses and dash |
| xxx-xxx-xxxx | 555-679-3466 | Dashes only |
| xxx.xxx.xxxx | 555.679.3466 | Dots separator |
| +x(xxx)xxx-xxxx | +1(555)679-3466 | International format |
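
The phone patterns read as templates in which each x stands for one digit. The sketch below reproduces the table's example outputs locally under that assumption. It is a preview aid with a hypothetical `render_phone` helper and does not describe the API's internal behavior.

```python
def render_phone(pattern: str, number: str) -> str:
    """Fill each "x" in the pattern with the next digit of the number."""
    digits = [ch for ch in number if ch.isdigit()]
    if pattern.count("x") != len(digits):
        raise ValueError(
            f"pattern expects {pattern.count('x')} digits, got {len(digits)}"
        )
    remaining = iter(digits)
    return "".join(next(remaining) if ch == "x" else ch for ch in pattern)


print(render_phone("(xxx)xxx-xxxx", "555-679-3466"))
print(render_phone("xxx.xxx.xxxx", "555-679-3466"))
print(render_phone("+x(xxx)xxx-xxxx", "1 555 679 3466"))
```

Rejecting a digit-count mismatch up front avoids silently producing a truncated or partially filled number.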

Email formats

| Pattern | Example Output |
| --- | --- |
| username@domain.com | john.doe@example.com |
| firstname.lastname@domain.com | john.doe@company.com |

Best practices

  1. Choose appropriate formats: Select formatting patterns that match your application’s requirements and regional standards.

  2. Combine formatting rules: You can apply multiple formatting rules simultaneously for comprehensive text standardization.

  3. Test with sample data: Verify your formatting patterns work correctly with representative audio samples before processing large batches.

  4. Review the mapping: Check the mapping object in the response to see exactly what was changed and verify the results.

  5. Consider regional differences: Be mindful of date and phone number format differences when processing international content.
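
One way to act on the "review the mapping" practice is an automated sanity check. The sketch below flags mapping entries whose digits differ between the original and formatted text, which should not happen when a phone number is only re-punctuated. It assumes the mapping shape shown in the example response; `digits_changed` is a hypothetical helper, and entries such as dates with spelled-out months would legitimately be flagged, so treat the result as a review list rather than an error list.

```python
from typing import Dict


def digits_changed(mapping: Dict[str, str]) -> Dict[str, str]:
    """Return mapping entries whose digit sequence differs after formatting."""

    def only_digits(text: str) -> str:
        return "".join(ch for ch in text if ch.isdigit())

    return {
        original: formatted
        for original, formatted in mapping.items()
        if only_digits(original) != only_digits(formatted)
    }


mapping = {
    "555-679-3466": "(555)679-3466",   # digits preserved
    "555-679-8244": "(555)679-8244",   # digits preserved
    "555-679-0000": "(555)679-0001",   # made-up entry to show a flagged case
}
print(digits_changed(mapping))
```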

API reference

Request

Method 1: Transcribe and format in one request

When creating a new transcription, include the speech_understanding parameter directly in your transcription request:

curl -X POST \
  "https://api.assemblyai.com/v2/transcript" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/phone-msg.m4a",
    "speaker_labels": true,
    "speech_understanding": {
      "request": {
        "custom_formatting": {
          "date": "mm/dd/yyyy",
          "phone_number": "(xxx)xxx-xxxx",
          "email": "username@domain.com",
          "format_utterances": true
        }
      }
    }
  }'

Method 2: Add formatting to existing transcripts

For existing transcripts, confirm that the transcript has completed, then send its ID to the Speech Understanding API:

# Step 1: Get the transcript and check that its "status" is "completed"
transcript=$(curl -s -X GET \
  "https://api.assemblyai.com/v2/transcript/YOUR_TRANSCRIPT_ID" \
  -H "Authorization: YOUR_API_KEY")

# Step 2: Add custom formatting and send to Speech Understanding API
curl -X POST \
  "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "YOUR_TRANSCRIPT_ID",
    "speech_understanding": {
      "request": {
        "custom_formatting": {
          "date": "mm/dd/yyyy",
          "phone_number": "(xxx)xxx-xxxx",
          "email": "username@domain.com",
          "format_utterances": true
        }
      }
    }
  }'

| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| speech_understanding | object | Yes | Container for speech understanding requests. |
| speech_understanding.request | object | Yes | The understanding request configuration. |
| speech_understanding.request.custom_formatting | object | Yes | Custom formatting configuration. |
| custom_formatting.date | string | No | Date format pattern. Common patterns: mm/dd/yyyy (US), dd/mm/yyyy (European), yyyy-mm-dd (ISO). |
| custom_formatting.phone_number | string | No | Phone number format pattern. Examples: (xxx)xxx-xxxx, xxx-xxx-xxxx, xxx.xxx.xxxx. |
| custom_formatting.email | string | No | Email format pattern. Example: username@domain.com. |
| custom_formatting.format_utterances | boolean | No | When true, applies formatting to speaker utterances in addition to the main text. Preserves word-level timestamps. Default: false. |

Response

The Custom Formatting API returns your original transcript response with formatting applied to the text field and additional formatting details in the speech_understanding object. When format_utterances is enabled, formatted utterances with preserved timestamps are also included.

{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              },
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}

| Key | Type | Description |
| --- | --- | --- |
| text | string | The transcript text with custom formatting applied. |
| speech_understanding | object | Container for speech understanding request and response information. |
| speech_understanding.request | object | The original custom formatting request configuration that was submitted. |
| speech_understanding.request.custom_formatting | object | The formatting parameters that were used. |
| speech_understanding.response | object | The response information from the formatting process. |
| speech_understanding.response.custom_formatting | object | Details about the formatting operation. |
| speech_understanding.response.custom_formatting.formatted_text | string | The complete transcript with custom formatting applied. Identical to the text field. |
| speech_understanding.response.custom_formatting.formatted_utterances | array | Array of speaker utterances with formatting applied. Only present when format_utterances is true. Each utterance includes speaker label, timestamps, confidence scores, formatted text, and word-level details with preserved timestamps. |
| speech_understanding.response.custom_formatting.mapping | object | An object showing the original text segments and their formatted versions. Keys are original text, values are formatted text. |
| speech_understanding.response.custom_formatting.status | string | The status of the formatting operation. Will be "success" when formatting completes successfully. |

Understanding formatted_utterances

When format_utterances is enabled, each object in the formatted_utterances array contains:

| Field | Type | Description |
| --- | --- | --- |
| speaker | string | Speaker identifier (e.g., “A”, “B”) |
| start | integer | Start time of the utterance in milliseconds |
| end | integer | End time of the utterance in milliseconds |
| text | string | The utterance text with custom formatting applied |
| confidence | number | Confidence score for the utterance (0-1) |
| words | array | Array of word objects with individual timestamps, text (formatted), confidence scores, and speaker labels |

Important: All timestamps in both utterances and words are preserved exactly as they appear in the original transcription, ensuring perfect temporal alignment with the audio even after formatting is applied.
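
Because start and end are given in milliseconds, formatted utterances can be rendered as a timestamped, speaker-labeled transcript. The snippet below is a small sketch of that, using only the fields listed in the table above; `format_timestamp` and `render_utterances` are hypothetical helper names, and the sample utterance is a shortened copy of the example response.

```python
from typing import Any, Dict, List


def format_timestamp(ms: int) -> str:
    """Convert milliseconds to an MM:SS.mmm string."""
    minutes, remainder_ms = divmod(ms, 60_000)
    return f"{minutes:02d}:{remainder_ms / 1000:06.3f}"


def render_utterances(utterances: List[Dict[str, Any]]) -> List[str]:
    """Render each utterance as "[start - end] Speaker X: text"."""
    return [
        f"[{format_timestamp(u['start'])} - {format_timestamp(u['end'])}] "
        f"Speaker {u['speaker']}: {u['text']}"
        for u in utterances
    ]


formatted_utterances = [
    {
        "speaker": "A",
        "start": 1920,
        "end": 26000,
        "text": "My phone number is (555)679-3466.",
        "confidence": 0.99,
    }
]
for line in render_utterances(formatted_utterances):
    print(line)
```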

Key differences from standard transcription

| Field | Standard Transcription | With Custom Formatting | With format_utterances=true |
| --- | --- | --- | --- |
| text | Transcribed text with default formatting | Transcribed text with your custom formatting rules applied | Same as custom formatting |
| speech_understanding | Not present | Object containing formatting request, response, and mapping | Same, plus formatted_text and formatted_utterances |
| utterances | Speaker-separated segments with original text | Unchanged | Unchanged (original utterances remain) |
| Word timestamps | Original timestamps | Preserved exactly | Preserved exactly in formatted_utterances |

All other fields from the original transcript (words, utterances, confidence, etc.) remain unchanged. The formatted_utterances field provides an additional view of the data with formatting applied while maintaining complete timestamp fidelity.
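
Since the original utterances stay in place next to formatted_utterances, the two can be compared to see which segments formatting touched. The sketch below assumes both arrays have the same length and order, which this page does not state explicitly, and uses a hypothetical `utterance_diffs` helper. It also checks that the start and end times match, as described above.

```python
from typing import Any, Dict, List


def utterance_diffs(transcript: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair original and formatted utterances by position and report text changes."""
    originals = transcript.get("utterances") or []
    understanding = transcript.get("speech_understanding") or {}
    response = understanding.get("response") or {}
    formatting = response.get("custom_formatting") or {}
    formatted = formatting.get("formatted_utterances") or []

    diffs = []
    for before, after in zip(originals, formatted):
        # Timestamps are documented as preserved; fail loudly if they are not
        if (before["start"], before["end"]) != (after["start"], after["end"]):
            raise ValueError(f"Timestamp mismatch at {before['start']} ms")
        if before["text"] != after["text"]:
            diffs.append(
                {"start": before["start"], "before": before["text"], "after": after["text"]}
            )
    return diffs


# Made-up sample in the shape of the example response
sample_transcript = {
    "utterances": [
        {"speaker": "A", "start": 1920, "end": 26000, "text": "Call 555-679-3466."},
        {"speaker": "B", "start": 26500, "end": 28000, "text": "Thanks."},
    ],
    "speech_understanding": {
        "response": {
            "custom_formatting": {
                "formatted_utterances": [
                    {"speaker": "A", "start": 1920, "end": 26000, "text": "Call (555)679-3466."},
                    {"speaker": "B", "start": 26500, "end": 28000, "text": "Thanks."},
                ]
            }
        }
    },
}
print(utterance_diffs(sample_transcript))
```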