Webhooks for streaming speech-to-text

Webhooks allow you to receive the complete transcript via HTTP callback when a streaming session ends. This is in addition to the real-time WebSocket responses you receive during the session, such as partial and finalized turns. These WebSocket messages are delivered continuously as audio is processed, while the webhook is sent once after the session terminates and contains only the finalized turns.

This guide covers webhooks for streaming audio transcription. For webhooks with pre-recorded audio, see Webhooks for pre-recorded audio.

Configure webhooks for a streaming session

To use webhooks with streaming speech-to-text, add the following parameters to your WebSocket connection URL:

| Parameter | Required | Description |
| --- | --- | --- |
| webhook_url | Yes | The URL to send the transcript to when the session ends. |
| webhook_auth_header_name | No | The name of the authentication header to include in the webhook request. |
| webhook_auth_header_value | No | The value of the authentication header to include in the webhook request. |

Don't have a webhook endpoint yet?

Create a test webhook endpoint with webhook.site to test your webhook integration.

Example WebSocket URL with webhook parameters

Add the webhook parameters as query parameters to the WebSocket URL:

wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&webhook_url=https://example.com/webhook

To include authentication:

wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&webhook_url=https://example.com/webhook&webhook_auth_header_name=X-Webhook-Secret&webhook_auth_header_value=secret-value

The following Python example streams microphone audio and passes the webhook parameters when it opens the connection:

import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode
from datetime import datetime

# --- Configuration ---
YOUR_API_KEY = "<YOUR_API_KEY>"

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "format_turns": True,
    # Webhook parameters
    "webhook_url": "https://example.com/webhook",
    "webhook_auth_header_name": "X-Webhook-Secret",  # Optional
    "webhook_auth_header_value": "secret-value",  # Optional
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
# urlencode percent-encodes the webhook URL so it is a valid query parameter
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

# Audio Configuration
FRAMES_PER_BUFFER = 800  # 50ms of audio (0.05s * 16000Hz)
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

# Global variables
audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()


def on_open(ws):
    """Called when the WebSocket connection is established."""
    print("WebSocket connection opened.")
    print(f"Connected to: {API_ENDPOINT}")

    def stream_audio():
        global stream
        print("Starting audio streaming...")
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break
        print("Audio streaming stopped.")

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()


def on_message(ws, message):
    """Called when a message is received from the WebSocket."""
    try:
        data = json.loads(message)
        msg_type = data.get("type")

        if msg_type == "Begin":
            session_id = data.get("id")
            expires_at = data.get("expires_at")
            print(f"\nSession began: ID={session_id}, ExpiresAt={datetime.fromtimestamp(expires_at)}")
        elif msg_type == "Turn":
            transcript = data.get("transcript", "")
            if data.get('end_of_turn'):
                # Clear the partial line, then print the finalized turn
                print("\r" + " " * 80 + "\r", end="")
                print(transcript)
            else:
                print(f"\r{transcript}", end="")
        elif msg_type == "Termination":
            audio_duration = data.get("audio_duration_seconds", 0)
            session_duration = data.get("session_duration_seconds", 0)
            print(f"\nSession Terminated: Audio Duration={audio_duration}s, Session Duration={session_duration}s")
    except json.JSONDecodeError as e:
        print(f"Error decoding message: {e}")
    except Exception as e:
        print(f"Error handling message: {e}")


def on_error(ws, error):
    """Called when a WebSocket error occurs."""
    print(f"\nWebSocket Error: {error}")
    stop_event.set()


def on_close(ws, close_status_code, close_msg):
    """Called when the WebSocket connection is closed."""
    print(f"\nWebSocket Disconnected: Status={close_status_code}, Msg={close_msg}")
    global stream, audio
    stop_event.set()

    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
        stream = None
    if audio:
        audio.terminate()
        audio = None
    if audio_thread and audio_thread.is_alive():
        audio_thread.join(timeout=1.0)


def run():
    global audio, stream, ws_app

    audio = pyaudio.PyAudio()

    try:
        stream = audio.open(
            input=True,
            frames_per_buffer=FRAMES_PER_BUFFER,
            channels=CHANNELS,
            format=FORMAT,
            rate=SAMPLE_RATE,
        )
        print("Microphone stream opened successfully.")
        print("Speak into your microphone. Press Ctrl+C to stop.")
    except Exception as e:
        print(f"Error opening microphone stream: {e}")
        if audio:
            audio.terminate()
        return

    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nCtrl+C received. Stopping...")
        stop_event.set()

        # Ask the server to end the session; the webhook is sent after termination
        if ws_app and ws_app.sock and ws_app.sock.connected:
            try:
                terminate_message = {"type": "Terminate"}
                ws_app.send(json.dumps(terminate_message))
                time.sleep(2)
            except Exception as e:
                print(f"Error sending termination message: {e}")

        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)

    finally:
        if stream and stream.is_active():
            stream.stop_stream()
        if stream:
            stream.close()
        if audio:
            audio.terminate()
        print("Cleanup complete.")


if __name__ == "__main__":
    run()

Handle webhook deliveries

When the streaming session ends, AssemblyAI sends an HTTP POST request to the webhook URL you specified. The request body contains the complete transcript from the session.

Your webhook endpoint must return a 2xx HTTP status code within 10 seconds to indicate successful receipt. If a 2xx status is not received within 10 seconds, AssemblyAI will retry the webhook call up to a total of 10 attempts. If at any point your endpoint returns a 4xx status code, the webhook call is considered failed and will not be retried.
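
To make the acknowledgement rules above concrete, here is a minimal receiver sketch built only on the Python standard library. The handler class, the `start_receiver` helper, and the in-memory `received_payloads` list are hypothetical names chosen for illustration, and the choice of 204 for success and 400 for a malformed body is an assumption rather than something AssemblyAI prescribes.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received_payloads = []  # hypothetical in-memory store, for illustration only


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes declared by Content-Length.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            # Malformed body: a 4xx response is treated as failed and not retried.
            self.send_response(400)
            self.end_headers()
            return
        received_payloads.append(payload)
        # Acknowledge right away; any 2xx status signals successful receipt.
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_receiver(port=0):
    """Start the receiver in a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_receiver(8000)` would listen on local port 8000. In a real deployment you would expose the endpoint over HTTPS behind your usual web server or framework.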

Static Webhook IP addresses

AssemblyAI sends all webhook deliveries from fixed IP addresses:

| Region | IP Address |
| --- | --- |
| US | 44.238.19.20 |
| EU | 54.220.25.36 |
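
If you want to restrict your endpoint to these addresses, a check along the following lines may help. This is a sketch: `is_allowed_source` is a hypothetical helper, and it assumes `remote_addr` is the true client address. Behind a proxy or load balancer you would first need to recover the original address from whatever forwarding header your infrastructure sets.

```python
import ipaddress

# Source addresses taken from the table above.
ASSEMBLYAI_WEBHOOK_IPS = {
    ipaddress.ip_address("44.238.19.20"),  # US
    ipaddress.ip_address("54.220.25.36"),  # EU
}


def is_allowed_source(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed webhook source IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in ASSEMBLYAI_WEBHOOK_IPS
    except ValueError:
        # Not a parseable IP address at all.
        return False
```

An IP allowlist complements the authentication header rather than replacing it.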

Delivery payload

The webhook delivery payload contains the complete transcript from the streaming session as a JSON object. The payload includes the session ID and an array of messages containing all the transcript turns.

{
  "session_id": "273e79fd-99e9-4e1d-91da-90f56a132d01",
  "messages": [
    {
      "turn_order": 0,
      "turn_is_formatted": true,
      "end_of_turn": true,
      "transcript": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US Skylines from Maine to Maryland to Minnesota are gray and smoggy, and in some places the air.",
      "end_of_turn_confidence": 0.5005,
      "words": [
        {
          "start": 4880,
          "end": 5040,
          "text": "Smoke",
          "confidence": 0.76054,
          "word_is_final": true
        },
        {
          "start": 5280,
          "end": 5360,
          "text": "from",
          "confidence": 0.761065,
          "word_is_final": true
        }
      ],
      "utterance": "",
      "type": "Turn"
    }
  ]
}

| Key | Type | Description |
| --- | --- | --- |
| session_id | string | The unique identifier for the streaming session. |
| messages | array | An array of transcript turn objects from the session. |
| messages[].turn_order | integer | The order of the turn in the session (0-indexed). |
| messages[].turn_is_formatted | boolean | Whether the transcript has been formatted. |
| messages[].end_of_turn | boolean | Whether this message represents the end of a turn. |
| messages[].transcript | string | The transcribed text for this turn. |
| messages[].end_of_turn_confidence | number | Confidence score for the end of turn detection. |
| messages[].words | array | Word-level details including timestamps and confidence scores. |
| messages[].type | string | The message type, typically “Turn”. |
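
As an illustration of working with this payload, the sketch below joins the turns into a single transcript string. `full_transcript` is a hypothetical helper. It assumes the payload has the shape documented above and that sorting by `turn_order` is the right way to order turns.

```python
def full_transcript(payload: dict) -> str:
    """Join the transcript of every turn in the payload, in turn order."""
    # Keep only turn messages; the type field is documented as typically "Turn".
    turns = [m for m in payload.get("messages", []) if m.get("type") == "Turn"]
    turns.sort(key=lambda m: m.get("turn_order", 0))
    # Skip turns with an empty or missing transcript.
    return " ".join(m["transcript"] for m in turns if m.get("transcript"))
```

For example, given two turns with `turn_order` 1 and 0, the function returns their transcripts in order 0 then 1, separated by a space.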

Authenticate webhook deliveries

To secure your webhook endpoint, you can include custom authentication headers in the webhook request. When configuring your streaming session, provide the webhook_auth_header_name and webhook_auth_header_value parameters.

AssemblyAI will include this header in the webhook request, allowing you to verify that the request came from AssemblyAI.

webhook_auth_header_name=X-Webhook-Secret&webhook_auth_header_value=secret-value

In your webhook receiver, verify the header value matches what you configured:

auth_header = request.headers.get("X-Webhook-Secret")
if auth_header != "secret-value":
    return "Unauthorized", 401
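
A plain `!=` comparison works, but comparing secrets in constant time is a common hardening step. The sketch below uses `hmac.compare_digest` from the Python standard library. `is_authorized` is a hypothetical helper, and `headers` is assumed to be any mapping with a `.get` method, such as a framework's request headers object.

```python
import hmac

EXPECTED_SECRET = "secret-value"  # load from configuration or a secret store in practice


def is_authorized(headers, expected=EXPECTED_SECRET, header_name="X-Webhook-Secret"):
    """Return True if the request carries the expected secret header value."""
    supplied = headers.get(header_name) or ""
    # compare_digest takes time independent of where the values differ.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

If the check fails, respond with 401 as in the snippet above. Because a 4xx status is not retried, a delivery rejected this way is not sent again.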

Best practices

When implementing webhooks for streaming speech-to-text, consider the following best practices:

  1. Always verify authentication: If you configure an authentication header, always verify it in your webhook receiver to ensure requests are from AssemblyAI.

  2. Respond quickly: Return a response from your webhook endpoint as quickly as possible. If you need to perform time-consuming processing, do it asynchronously after returning the response.

  3. Handle failures gracefully: Your webhook endpoint should handle errors gracefully and return appropriate HTTP status codes.

  4. Use HTTPS: Always use HTTPS for your webhook URL to ensure the transcript data is encrypted in transit.

  5. Log webhook deliveries: Keep logs of webhook deliveries for debugging and auditing purposes.
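
The "respond quickly" and "log webhook deliveries" practices can be combined by handing each payload to a background worker. The following is a sketch under stated assumptions: `accept_delivery`, `worker`, `seen_sessions`, and `results` are hypothetical names, the in-memory set stands in for durable storage, and skipping repeated `session_id` values assumes that a retried delivery carries the same session ID.

```python
import queue
import threading

jobs = queue.Queue()
seen_sessions = set()  # hypothetical dedupe store; use a database in practice
results = []           # stands in for real downstream processing


def accept_delivery(payload: dict) -> int:
    """Called by the HTTP handler: enqueue the payload and acknowledge at once."""
    jobs.put(payload)
    return 204  # any 2xx tells the sender the delivery was received


def worker() -> None:
    """Drain the queue in the background, skipping sessions already handled."""
    while True:
        payload = jobs.get()
        try:
            session_id = payload.get("session_id")
            if session_id in seen_sessions:
                continue  # duplicate delivery, for example after a retry
            seen_sessions.add(session_id)
            # Log first, then do the slow work outside the request/response cycle.
            print(f"processing session {session_id}")
            results.append((session_id, len(payload.get("messages", []))))
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The HTTP handler only calls `accept_delivery` and returns its status code, so the response time does not depend on how long processing takes.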