The Custom Formatting feature automatically standardizes and formats specific types of information in your transcripts, ensuring consistency across dates, phone numbers, emails, and other data types. This eliminates the need for post-processing and provides clean, formatted output ready for your application.
Key capabilities:
Format dates in your preferred style (US, European, ISO, etc.)
Standardize phone number formats with custom patterns
Control currency and decimal precision
Convert spelled-out text into formatted patterns
Format URLs as hyperlinks
Apply multiple formatting rules simultaneously
Common use cases:
Standardizing contact information in customer service transcripts
Formatting financial data in earnings calls
Preparing transcripts for CRM systems with specific format requirements
Creating consistent documentation from meetings
Processing legal or medical transcripts with strict formatting standards
Transcribe and format in one request - Best when you’re starting a new transcription and want to automatically format the transcript text as part of that process
Transcribe and format in separate requests - Best when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks
Method 2: Transcribe and format in separate requests
This method is useful when you already have text that you would like to format or for more complicated workflows where you want to separate the transcription and formatting tasks.
Python
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
audio_url = "https://assembly.ai/phone-msg.m4a"

# Submit transcription request (without formatting)
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription completion
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print("Transcription completed!")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Add custom formatting configuration to the completed transcript
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "custom_formatting": {
                "date": "mm/dd/yyyy",
                "phone_number": "(xxx)xxx-xxxx",
                "email": "username@domain.com",
                "format_utterances": True
            }
        }
    }
}

# Send to Speech Understanding API for formatting
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

print("Formatting completed!")

# Access and display results
print("\n--- Formatting Details ---")
mapping = result['speech_understanding']['response']['custom_formatting']['mapping']
for original, formatted in mapping.items():
    print(f"Original: {original}")
    print(f"Formatted: {formatted}\n")

JavaScript
const baseUrl = "https://api.assemblyai.com";
const headers = {
  authorization: "<YOUR_API_KEY>",
  "content-type": "application/json",
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const audioUrl = "https://assembly.ai/phone-msg.m4a";

// Submit transcription request (without formatting)
const data = {
  audio_url: audioUrl,
  speech_models: ["universal-3-pro", "universal-2"],
  language_detection: true,
  speaker_labels: true,
};

async function transcribeAndFormat() {
  // Start transcription
  const response = await fetch(`${baseUrl}/v2/transcript`, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data),
  });
  const { id: transcriptId } = await response.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription completion
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      console.log("Transcription completed!");
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise((resolve) => setTimeout(resolve, 3000));
    }
  }

  // Add custom formatting configuration to the completed transcript
  const understandingBody = {
    transcript_id: transcriptId,
    speech_understanding: {
      request: {
        custom_formatting: {
          date: "mm/dd/yyyy",
          phone_number: "(xxx)xxx-xxxx",
          email: "username@domain.com",
          format_utterances: true,
        },
      },
    },
  };

  // Send to Speech Understanding API for formatting
  const understandingResponse = await fetch(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    {
      method: "POST",
      headers: headers,
      body: JSON.stringify(understandingBody),
    }
  );
  const result = await understandingResponse.json();

  // Access and display results
  console.log("\n--- Formatting Details ---");
  const mapping = result.speech_understanding.response.custom_formatting.mapping;
  for (const [original, formatted] of Object.entries(mapping)) {
    console.log(`Original: ${original}`);
    console.log(`Formatted: ${formatted}\n`);
  }
}

transcribeAndFormat();
Expected output:
--- Formatting Details ---
Original: Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466. Also, my cell phone number is 555-679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is 555-679-3466. Thanks.
Formatted: Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466. Also, my cell phone number is (555)679-8244. Once again, if you could call me back, I'd appreciate it. My phone number is (555)679-3466. Thanks.
Data from the Custom Formatting API is returned in the custom_formatting object, which sits under response in the speech_understanding object. The formatted_text key contains a formatted version of the transcript text.
If Speaker Diarization is used in the request, a formatted_utterances key is also returned, containing formatted utterances with preserved timestamps.
Example response structure:
{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is 555-679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              }
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}
Key features of the output:
Formatted Text: Formatted text can be found in the formatted_text key
Formatted utterances: When format_utterances is enabled, speaker-separated segments in the formatted_utterances key include formatted text
Preserved timestamps: All word-level timestamps in formatted_utterances remain intact after formatting, allowing you to maintain temporal alignment with the audio
Mapping object: Shows exactly what transformations were applied (original → formatted)
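As a rough illustration of how these output keys might be read, the sketch below pulls formatted_text, formatted_utterances, and mapping out of a response dictionary shaped like the example response structure. The helper name summarize_formatting and the sample data are hypothetical, and the code assumes the nesting shown on this page rather than any additional guarantees from the API.

```python
from typing import Any, Dict


def summarize_formatting(result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the custom formatting output from a response dict.

    Hypothetical helper: assumes the nesting
    speech_understanding -> response -> custom_formatting.
    """
    formatting = (
        result.get("speech_understanding", {})
        .get("response", {})
        .get("custom_formatting", {})
    )
    return {
        "formatted_text": formatting.get("formatted_text"),
        # formatted_utterances is only expected when format_utterances is enabled
        "utterance_texts": [
            (u["speaker"], u["text"])
            for u in formatting.get("formatted_utterances", [])
        ],
        "mapping": formatting.get("mapping", {}),
    }


# Sample data trimmed from the example response (not a live API call)
sample = {
    "speech_understanding": {
        "response": {
            "custom_formatting": {
                "formatted_text": "My phone number is (555)679-3466.",
                "formatted_utterances": [
                    {
                        "speaker": "A",
                        "start": 1920,
                        "end": 26000,
                        "text": "My phone number is (555)679-3466.",
                    }
                ],
                "mapping": {"555-679-3466": "(555)679-3466"},
                "status": "success",
            }
        }
    }
}

summary = summarize_formatting(sample)
for original, formatted in summary["mapping"].items():
    print(f"{original} -> {formatted}")
```

Using chained .get calls means a response without a speech_understanding object yields empty values instead of raising a KeyError, which may or may not be the behavior you want in production code.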
The custom_formatting parameter accepts an object with specific formatting rules for different data types in your transcript. Each property in the object defines how a particular type of information should be formatted.
When you include this configuration in your transcription request, the API will automatically detect and format dates, phone numbers, and emails in your transcript according to the specified patterns. With format_utterances enabled, the formatting is applied to both the main transcript text and individual speaker utterances while preserving all timing information.
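For illustration, a configuration like the one described here could be assembled as follows. This is a minimal sketch: build_understanding_body is a hypothetical helper, not part of any SDK. The keys (date, phone_number, email, format_utterances) and the request nesting are taken from the examples on this page, and omitting unused keys is an assumption about what the API accepts.

```python
from typing import Any, Dict, Optional


def build_understanding_body(
    transcript_id: str,
    date: Optional[str] = None,
    phone_number: Optional[str] = None,
    email: Optional[str] = None,
    format_utterances: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body containing only the rules that were provided.

    Hypothetical helper; leaving out unused keys is an assumption.
    """
    rules: Dict[str, Any] = {}
    if date is not None:
        rules["date"] = date
    if phone_number is not None:
        rules["phone_number"] = phone_number
    if email is not None:
        rules["email"] = email
    if format_utterances:
        rules["format_utterances"] = True
    return {
        "transcript_id": transcript_id,
        "speech_understanding": {"request": {"custom_formatting": rules}},
    }


# Placeholder transcript ID; substitute the ID returned by your own request
body = build_understanding_body(
    "<TRANSCRIPT_ID>",
    date="mm/dd/yyyy",
    phone_number="(xxx)xxx-xxxx",
    format_utterances=True,
)
print(body["speech_understanding"]["request"]["custom_formatting"])
```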
The Custom Formatting API returns your original transcript response with formatting applied to the text field and additional formatting details in the speech_understanding object. When format_utterances is enabled, formatted utterances with preserved timestamps are also included.
{
  "id": "2accd7f2-445b-4d08-b10b-1bafdd5906ed",
  "status": "completed",
  "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
  "speech_understanding": {
    "request": {
      "custom_formatting": {
        "date": "mm/dd/yyyy",
        "phone_number": "(xxx)xxx-xxxx",
        "email": "username@domain.com",
        "format_utterances": true
      }
    },
    "response": {
      "custom_formatting": {
        "formatted_text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
        "formatted_utterances": [
          {
            "confidence": 0.9920061471354167,
            "end": 26000,
            "speaker": "A",
            "start": 1920,
            "text": "Yes, I would appreciate it if you could call me back. My phone number is (555)679-3466...",
            "words": [
              {
                "speaker": "A",
                "start": 1920,
                "end": 2160,
                "text": "Yes,",
                "confidence": 0.808349609375
              }
              // ... more words
            ]
          }
        ],
        "mapping": {
          "555-679-3466": "(555)679-3466",
          "555-679-8244": "(555)679-8244"
        },
        "status": "success"
      }
    }
  }
}
| Key | Type | Description |
| --- | --- | --- |
| text | string | The transcript text with custom formatting applied. |
| speech_understanding | object | Container for speech understanding request and response information. |
| speech_understanding.request | object | The original custom formatting request configuration that was submitted. |
| speech_understanding.request.custom_formatting | object | The formatting parameters that were used. |
| speech_understanding.response | object | The response information from the formatting process. |
| speech_understanding.response.custom_formatting.formatted_utterances | array | Array of speaker utterances with formatting applied. Only present when format_utterances is true. Each utterance includes speaker label, timestamps, confidence scores, formatted text, and word-level details with preserved timestamps. |
When format_utterances is enabled, each object in the formatted_utterances array contains:
| Field | Type | Description |
| --- | --- | --- |
| speaker | string | Speaker identifier (e.g., “A”, “B”) |
| start | integer | Start time of the utterance in milliseconds |
| end | integer | End time of the utterance in milliseconds |
| text | string | The utterance text with custom formatting applied |
| confidence | number | Confidence score for the utterance (0-1) |
| words | array | Array of word objects with individual timestamps, text (formatted), confidence scores, and speaker labels |
Important: All timestamps in both utterances and words are preserved exactly as they appear in the original transcription, ensuring perfect temporal alignment with the audio even after formatting is applied.
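One way to sanity-check this in your own pipeline is to compare utterance timings before and after formatting. The sketch below is a hypothetical check, not something the API provides. It assumes you have both the original utterances and formatted_utterances as lists of dictionaries with start and end fields, and it compares utterance-level timings only, since this page does not say whether the number of word objects stays the same after formatting.

```python
from typing import Any, Dict, List, Tuple


def utterance_timings(utterances: List[Dict[str, Any]]) -> List[Tuple[int, int]]:
    """Return the (start, end) pair in milliseconds for each utterance."""
    return [(u["start"], u["end"]) for u in utterances]


def timestamps_preserved(
    original: List[Dict[str, Any]], formatted: List[Dict[str, Any]]
) -> bool:
    """True when both lists have identical utterance-level timings."""
    return utterance_timings(original) == utterance_timings(formatted)


# Sample data modeled on the example response (not a live API call)
original_utterances = [
    {"speaker": "A", "start": 1920, "end": 26000,
     "text": "My phone number is 555-679-3466."}
]
formatted_utterances = [
    {"speaker": "A", "start": 1920, "end": 26000,
     "text": "My phone number is (555)679-3466."}
]

print(timestamps_preserved(original_utterances, formatted_utterances))
```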
| Field | Without custom formatting | With custom formatting | With custom formatting and format_utterances |
| --- | --- | --- | --- |
| text | | Transcribed text with your custom formatting rules applied | Same as custom formatting |
| speech_understanding | Not present | Object containing formatting request, response, and mapping | Same, plus formatted_text and formatted_utterances |
| utterances | Speaker-separated segments with original text | Unchanged | Unchanged (original utterances remain) |
| Word timestamps | Original timestamps | Preserved exactly | Preserved exactly in formatted_utterances |
All other fields from the original transcript (words, utterances, confidence, etc.) remain unchanged. The formatted_utterances field provides an additional view of the data with formatting applied while maintaining complete timestamp fidelity.
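Because formatted_utterances carries both formatted text and millisecond timestamps, it can be turned into timestamped lines for display. The following is a minimal sketch under that assumption. The helpers ms_to_clock and utterance_line are hypothetical names, and the sample utterance is trimmed from the example response.

```python
from typing import Any, Dict


def ms_to_clock(ms: int) -> str:
    """Convert milliseconds to an MM:SS.mmm string."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def utterance_line(utterance: Dict[str, Any]) -> str:
    """Render one formatted utterance as a timestamped, speaker-labeled line."""
    start = ms_to_clock(utterance["start"])
    end = ms_to_clock(utterance["end"])
    return f"[{start} - {end}] {utterance['speaker']}: {utterance['text']}"


# Sample utterance modeled on the example response (not a live API call)
sample_utterance = {
    "speaker": "A",
    "start": 1920,
    "end": 26000,
    "text": "My phone number is (555)679-3466.",
}

print(utterance_line(sample_utterance))
# prints "[00:01.920 - 00:26.000] A: My phone number is (555)679-3466."
```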