My name is Sonny.
Session initialization
When the session begins, you receive a Begin message with the session ID and expiration time.
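As a minimal sketch of consuming this message, the helper below parses a Begin payload and returns the session ID and expiry. The JSON field names `type`, `id`, and `expires_at` (taken here as Unix seconds), the function name `parse_begin`, and the sample payload are assumptions for illustration; check them against the actual message schema.

```python
import json
from datetime import datetime, timezone


def parse_begin(raw: str) -> tuple[str, datetime]:
    """Return (session_id, expiry) from a Begin message.

    Assumed field names: "type", "id", and "expires_at" (Unix seconds).
    """
    msg = json.loads(raw)
    if msg.get("type") != "Begin":
        raise ValueError(f"expected a Begin message, got {msg.get('type')!r}")
    expires = datetime.fromtimestamp(msg["expires_at"], tz=timezone.utc)
    return msg["id"], expires


# Hypothetical payload, not captured from a real session.
example = '{"type": "Begin", "id": "session-123", "expires_at": 1700000000}'
session_id, expires = parse_begin(example)
print(session_id, expires.isoformat())
```

Storing the expiry alongside the session ID lets a client decide ahead of time when it needs to open a fresh session.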
Speech started
SpeechStarted is emitted only on Universal-3 Pro Streaming. Universal Streaming skips this message and goes straight to the first Turn.
The SpeechStarted message indicates that speech has been detected. The timestamp field indicates when the speech was detected, in milliseconds relative to the beginning of the audio stream. The confidence field is the confidence score that speech has started.
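The following sketch shows one way to read the two fields described above. The `timestamp` and `confidence` names come from the text; the `type` discriminator, the function name, and the sample values are assumptions.

```python
import json


def describe_speech_started(raw: str) -> str:
    """Summarize a SpeechStarted message as a human-readable string."""
    msg = json.loads(raw)
    # "type" as the message discriminator is an assumption.
    if msg.get("type") != "SpeechStarted":
        raise ValueError(f"expected SpeechStarted, got {msg.get('type')!r}")
    # timestamp is in milliseconds from the start of the audio stream.
    seconds = msg["timestamp"] / 1000.0
    return f"speech detected at {seconds:.2f}s (confidence {msg['confidence']:.2f})"


# Hypothetical payload for illustration only.
summary = describe_speech_started(
    '{"type": "SpeechStarted", "timestamp": 1250, "confidence": 0.93}'
)
print(summary)
```

Because Universal Streaming never sends this message, a client that supports both models should treat it as optional and not wait for it before handling a Turn.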
Partial transcript
As the speaker is talking, the server emits one or more Turn messages with end_of_turn: false. These are partial transcripts.
End of turn
When the turn ends, the server emits a Turn message with end_of_turn: true and the final transcript.
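The partial and final Turn messages can be handled with a small state holder, sketched below. The `end_of_turn` flag comes from the text; the `type` and `transcript` field names and the `TurnTracker` class are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Keeps the latest partial transcript and the list of finished turns."""

    partial: str = ""
    finals: list[str] = field(default_factory=list)

    def handle(self, msg: dict) -> None:
        # Ignore anything that is not a Turn message.
        if msg.get("type") != "Turn":
            return
        text = msg.get("transcript", "")  # assumed field name
        if msg.get("end_of_turn"):
            # Final transcript: commit it and clear the in-progress text.
            self.finals.append(text)
            self.partial = ""
        else:
            # Partial transcript: replace the previous partial.
            self.partial = text


tracker = TurnTracker()
for message in [
    {"type": "Turn", "end_of_turn": False, "transcript": "hello"},
    {"type": "Turn", "end_of_turn": False, "transcript": "hello there"},
    {"type": "Turn", "end_of_turn": True, "transcript": "Hello there."},
]:
    tracker.handle(message)
print(tracker.finals)
```

Each partial replaces the previous one and is never appended, which assumes that every partial carries the full text of the turn so far.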
Keep alive
KeepAlive messages are not required. By default, sessions remain open until explicitly terminated or until the 3-hour maximum session duration is reached.
KeepAlive is only relevant if you have configured the inactivity_timeout connection parameter, which closes the session after a period in which no audio or messages are sent. If you are using inactivity_timeout and want to keep the session open while no audio is being sent, send a KeepAlive message to reset the inactivity timer.
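A client that sets inactivity_timeout needs to decide when to send KeepAlive during silence. The sketch below keeps that decision as a pure function so it can be driven by any event loop. The `{"type": "KeepAlive"}` payload shape, the helper names, and the choice to send once half of the timeout has elapsed are all assumptions, not documented behavior.

```python
import json


def keepalive_payload() -> str:
    """JSON text for a KeepAlive message (payload shape is assumed)."""
    return json.dumps({"type": "KeepAlive"})


def should_send_keepalive(
    last_sent_at: float,
    now: float,
    inactivity_timeout: float,
    margin: float = 0.5,
) -> bool:
    """True once the idle time has used up `margin` of the timeout.

    All times are in seconds. `last_sent_at` is when audio or any message
    was last sent to the server.
    """
    return (now - last_sent_at) >= inactivity_timeout * margin


# With a 30 s timeout, nothing is needed after 10 s of silence,
# but a KeepAlive is due after 15 s.
early = should_send_keepalive(last_sent_at=0.0, now=10.0, inactivity_timeout=30.0)
due = should_send_keepalive(last_sent_at=0.0, now=15.0, inactivity_timeout=30.0)
print(early, due, keepalive_payload())
```

Sending well before the timeout leaves headroom for network delay; the 0.5 margin is an arbitrary starting point.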
Session termination
To end a session, the client must send a Terminate message. The server then responds with a Termination message containing the total audio and session durations, and closes the connection.
After the server sends the Termination message, no further messages will be sent and the WebSocket connection will be closed.
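The shutdown exchange can be sketched as a pair of helpers: one builds the client's Terminate message and one reads the durations from the server's Termination reply. The `type` discriminator and the duration field names `audio_duration_seconds` and `session_duration_seconds` are assumptions here, as are the function names and sample values.

```python
import json


def terminate_payload() -> str:
    """JSON text for the client's Terminate message (shape is assumed)."""
    return json.dumps({"type": "Terminate"})


def parse_termination(raw: str) -> tuple[float, float]:
    """Return (audio_seconds, session_seconds) from a Termination message.

    The two duration field names are assumptions for this sketch.
    """
    msg = json.loads(raw)
    if msg.get("type") != "Termination":
        raise ValueError(f"expected Termination, got {msg.get('type')!r}")
    return (
        float(msg["audio_duration_seconds"]),
        float(msg["session_duration_seconds"]),
    )


# Hypothetical server reply for illustration only.
reply = (
    '{"type": "Termination", '
    '"audio_duration_seconds": 12, "session_duration_seconds": 15}'
)
audio_s, session_s = parse_termination(reply)
print(terminate_payload(), audio_s, session_s)
```

A client would typically send the Terminate payload, keep reading until it sees Termination, and only then treat the socket as closed, so that any final Turn is not lost.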
If Streaming Diarization is enabled (speaker_labels: true), the server may emit a SpeakerRevision message immediately before Termination. The end-of-session refinement adds approximately 400 ms of latency at session close. See Revised speaker labels for the message schema and consumption guidance.