Transcribe Genesys Cloud Recordings with AssemblyAI

This guide walks through the process of setting up a transcription pipeline to send audio data from Genesys Cloud to AssemblyAI.

To accomplish this, we’ll stream audio through Genesys’s AudioHook Monitor integration to a WebSocket server. Upon call completion, the server will process this audio into a wav file and send it to AssemblyAI’s Speech-to-Text API for pre-recorded audio transcription.

You can find all the necessary code for this guide here.

Architecture Overview

Here’s the general flow our app will follow:

+---------------------+    +--------------------+    +----------------------+
| 1. Genesys Cloud    | →  | 2. WebSocket       | →  | 3. Convert Raw       |
| (AudioHook Monitor) |    |    Server          |    |    Audio to WAV      |
+---------------------+    +--------------------+    +----------------------+

+---------------------+    +---------------------+    +----------------------+
| 6. S3 Bucket        | ←  | 5. AssemblyAI API   | ←  | 4. Audio Upload (S3) |
| (Transcript Store)  |    | (Transcription)     |    | (Trigger Lambda)     |
+---------------------+    +---------------------+    +----------------------+

Getting started

Before we begin, make sure you have:

  • An AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
  • An AWS account, an Access key, and permissions to S3, Lambda, and CloudWatch.
  • A Genesys Cloud account with the necessary permissions to create call flows, phone numbers, and routes.
  • ngrok installed.

Genesys AudioHook Monitor

To stream your voice calls to third-party services outside the Genesys Cloud platform, Genesys offers an official integration called AudioHook Monitor.

This integration allows you to specify the URL of a WebSocket server that implements the AudioHook Protocol. Once a connection has been established, Genesys sends both text frames (metadata messages and events encoded as JSON) and binary frames (the raw audio data in μ-law (Mu-law, PCMU) format).
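
To make this concrete, here’s a minimal sketch (using the ws package) of how a server can tell these two frame types apart. This is illustrative only; it omits the AudioHook responses the protocol requires, and the file handling is a placeholder.

// Minimal sketch: distinguishing Genesys text frames (JSON messages) from
// binary frames (raw mu-law audio). Not a full AudioHook implementation.
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 3000 });

wss.on('connection', (ws) => {
  ws.on('message', (data, isBinary) => {
    if (isBinary) {
      // Binary frame: raw mu-law (PCMU) audio bytes
      console.log(`Received binary audio data: ${data.length} bytes`);
      // ...append the bytes to a temporary recording file here...
    } else {
      // Text frame: a JSON-encoded AudioHook message (open, ping, close, etc.)
      const message = JSON.parse(data.toString());
      console.log(`Received ${message.type} message, seq=${message.seq}`);
      // ...respond according to the AudioHook protocol here...
    }
  });
});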

An understanding of this integration and protocol is recommended before proceeding with this tutorial; the Genesys AudioHook Monitor documentation and the AudioHook protocol specification are helpful resources to get started.

Step 1: Create a call flow in Genesys (optional)

You may already have an inbound call flow set up in Genesys (if so, skip to Step 3), but we’ll create a simple one from scratch for the sake of this tutorial.

Adapting existing call flows

All you need to do is add the Audio Monitoring step from the Toolbox and make sure Suppress recording for the entire flow is unchecked in the flow’s Recording and Speech Recognition settings.

1

Within the Architect tool, click Add to create a new call flow. Enter a Name for your flow and click Create Flow. Select the newly created flow to open the drag and drop editor.

2

Create a Reusable Task from the bottom left of the left-side menu. From the Toolbox, search for Audio Monitoring and drag it just after the Start step of our flow. In the right-side menu for this option, make sure Enable Monitoring is enabled.

3

Back in the Toolbox, search for Transfer to User and set that as the next step. In the right-hand menu, under User, select the user who should receive the call. Under Pre-Transfer Audio and Failed Transfer Audio, type your preferred messages.

4

Search for Disconnect in the toolbox and drag that as the step following Failure.

5

Search the Toolbox for Jump to Reusable Task and drag this tool to the Main Menu at the top of the left-side menu. Select a DTMF and Speech Recognition value (this will be used to transfer your call to the agent). Under Task, select the task you just created.

6

Under Settings in the left-side menu, navigate to the Recording and Speech Recognition section. Make sure Suppress recording for the entire flow is unchecked.

7

In the top navbar, make sure to click Save and then click Publish to have your changes take effect.

Step 2: Setup a phone and routing for your flow (optional)

1

In the Genesys Cloud Admin section, navigate to the Phones page under the Telephony section and click Add to create a new phone. For Person, assign the User from your organization that you selected for the Transfer to User step in the previous section.

2

Navigate to the Number Management page under the Genesys Cloud Voice section and select Purchase Numbers. Enter an area code and click Search. Select a phone number from the list and click Complete Purchase.

3

Navigate to the Call Routing page under the Routing section and select Add. Under What call flow should be used? select your flow. For Inbound Numbers, type the number you purchased in the above step. Then click Create.

4

Under the Telephony section, navigate to the External Trunks page. Click Create New. Under Caller ID, set the Caller Address to the phone number you purchased, formatted as an E.164 number.

5

Under SIP Access Control, select Allow All (note: this is only for development and testing purposes, please specify actual IPs in production). Under the Media section, make sure you select Record calls on this trunk. Then click Save External Trunk (it may take a few moments for your trunk to be ready).

Step 3: Create an S3 bucket

After a Genesys call ends, our server will upload the audio files to S3, so we need a bucket to store them.

1

In the AWS console, navigate to the S3 services page and click Create bucket. Give your bucket a name like your-audiohook-bucket. Scroll down and click Create bucket.

Step 4: Create a WebSocket server

In this step, we’ll set up a WebSocket server to receive messages and audio data from Genesys as they are sent. Our server must respond to certain events (i.e. open, close, ping, pause, etc.) according to the AudioHook protocol. Outside of these event messages, audio data is also transferred. We’ll capture and temporarily store this audio locally until the connection is closed, at which point the server processes the audio into a wav file and uploads both the wav and raw audio files to an S3 bucket.
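
As a rough illustration of that final step, here’s a simplified sketch of converting the raw μ-law audio to wav with ffmpeg and uploading it to S3 with the AWS SDK v3. The function name, file paths, and object key are illustrative only; the example repo’s server.js handles this in more detail.

// Simplified sketch of the close-handling step, assuming 8 kHz stereo mu-law
// input and ffmpeg available on the PATH. Names and paths are illustrative.
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { readFile } from 'node:fs/promises';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const run = promisify(execFile);
const s3 = new S3Client();

async function finalizeRecording(rawPath, wavPath, bucket, key) {
  // Convert the raw mu-law audio to WAV with ffmpeg (stereo, 8 kHz)
  await run('ffmpeg', ['-f', 'mulaw', '-ar', '8000', '-ac', '2', '-i', rawPath, wavPath]);

  // Upload the WAV file to S3; the object key determines the S3 event that fires later
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: key,                      // e.g. 'calls/<id>/audio/recording.wav'
    Body: await readFile(wavPath),
    ContentType: 'audio/wav'
  }));
}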

The AudioHook Monitor will send requests to a WebSocket URL we specify when setting up the integration in Step 5. When first enabled, AudioHook Monitor will do a quick verification step to ensure that the WebSocket server has implemented the AudioHook protocol correctly.

For this example, the server is written in JavaScript (Express) and hosted locally. We’ll use ngrok to create a secure tunnel that exposes it to the internet with a public URL so that Genesys can make a connection. However, the server can be implemented using your preferred programming language and deployed in whatever environment you choose, provided both support WebSocket TLS connections for secure bidirectional text and binary message exchange.

Server implementation

This server is a method to get up and running quickly for development and testing purposes without the complexity of a production deployment. How you implement this in practice will vary widely depending on your traffic volume, scaling needs, reliability requirements, security concerns, etc.

1

Clone this example repo of a WebSocket server that implements the AudioHook protocol. Follow the README instructions to download the necessary dependencies and start the server. Make sure to look over server.js to get an understanding of how the requests from Genesys are received, processed, and responded to, as well as how the audio is stored, converted, and uploaded to our S3 bucket.

2

Make sure to create a .env file and set the variables:

PORT=3000
AWS_REGION=us-east-1 # Region of your S3 bucket
AWS_ACCESS_KEY_ID=<ACCESS_ID> # Found under IAM > Security Credentials
AWS_SECRET_ACCESS_KEY=<SECRET_KEY> # Found under IAM > Security Credentials
S3_BUCKET=your-audiohook-bucket # Name of your S3 bucket
S3_KEY_PREFIX=calls/ # The file structure you want your bucket to follow
API_KEY=<YOUR_API_KEY> # Used for authenticating messages from Genesys
RECORDINGS_DIR=./recordings # Temp file storage location

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can be found on your account’s IAM > Security Credentials page. API_KEY is explained further in the next step, but it can be anything you want to verify that the requests are actually originating from Genesys.

3

Download ngrok. Assuming your server is running on port 3000, run ngrok http 3000 --inspect=false in your terminal. From the resulting terminal output, note the forwarding URL, which should look something like this: https://<id>.ngrok-free.app. This is the WebSocket server URL that we’ll provide to the AudioHook Monitor in the next step.

Step 5: Setting up AudioHook Monitor

1

In the Genesys Cloud Admin section, navigate to the Integrations page. Add a new integration via the plus sign in the top right corner.

2

Search for AudioHook Monitor and install.

3

Navigate to the AudioHook Monitor’s Configuration tab. Under the Properties section, make sure both channels are selected and the Connection URI is set to the ngrok URL from the previous step, replacing https with wss.

4

In the Configuration tab, navigate to the Credentials section, and click Configure. Here you can set an API key to a value our server will use to verify that requests originated from Genesys. This is done via the X-API-KEY request header. Our server will compare this key to the value we set in our .env for API_KEY, so make sure they match. Click Save.
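
For reference, here’s a minimal sketch of how that check might look on the server side during the HTTP upgrade. It assumes the ws package with a WebSocketServer created in noServer mode; the example repo may implement this check differently.

// Sketch: verify the X-API-KEY header before accepting the WebSocket connection.
import http from 'node:http';
import { WebSocketServer } from 'ws';

const server = http.createServer();
const wss = new WebSocketServer({ noServer: true });

server.on('upgrade', (request, socket, head) => {
  // Node lowercases incoming header names, so X-API-KEY arrives as 'x-api-key'
  if (request.headers['x-api-key'] !== process.env.API_KEY) {
    socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
    socket.destroy();
    return;
  }
  // Key matches: complete the WebSocket handshake as usual
  wss.handleUpgrade(request, socket, head, (ws) => {
    wss.emit('connection', ws, request);
  });
});

server.listen(process.env.PORT || 3000);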

5

Back on the Integrations page, click the toggle button under the Status column to activate your AudioHook. Genesys will attempt to verify our server is correctly configured according to the AudioHook protocol. If it is unable to do so, a red error will show with the reason for the failed connection. If it succeeds, the connection will toggle to Active.

Step 6: Set up your AssemblyAI API call

1

Navigate to the Lambda services page, and create a new function. Set the runtime to Node.js 22.x. In the Change default execution role section, choose the option to create a new role with basic Lambda permissions. Assign a function name and then click Create function.

2

In this new function, scroll down to the Code Source section and paste the following code into index.js:

// Import required AWS SDK modules
import { S3 } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { GetObjectCommand } from '@aws-sdk/client-s3';

// Configure logging
const logger = {
  info: (data) => console.log(JSON.stringify(data)),
  error: (data) => console.error(JSON.stringify(data))
};

// Configuration settings for AssemblyAI
// See config parameters here: https://www.assemblyai.com/docs/api-reference/transcripts/submit
const ASSEMBLYAI_CONFIG = {
  multichannel: true // Using multichannel here as we told Genesys to send us multichannel audio.
};

// Initialize AWS S3 client
const s3Client = new S3();

/**
 * Generate a presigned URL for the S3 object
 * @param {string} bucket - S3 bucket name
 * @param {string} key - S3 object key
 * @param {number} expiration - URL expiration time in seconds
 * @returns {Promise<string>} Presigned URL
 */
const getPresignedUrl = async (bucket, key, expiration = 3600) => {
  logger.info({
    message: "Generating presigned URL",
    bucket: bucket,
    key: key,
    expiration: expiration
  });

  const command = new GetObjectCommand({
    Bucket: bucket,
    Key: key
  });

  return getSignedUrl(s3Client, command, { expiresIn: expiration });
}

/**
 * Delete transcript data from AssemblyAI's database
 * @param {string} transcriptId - The AssemblyAI transcript ID to delete
 * @param {string} apiKey - The AssemblyAI API key
 * @returns {Promise<boolean>} True if deletion was successful, False otherwise
 */
const deleteTranscriptFromAssemblyAI = async (transcriptId, apiKey) => {
  try {
    const response = await fetch(`https://api.assemblyai.com/v2/transcript/${transcriptId}`, {
      method: 'DELETE',
      headers: {
        'authorization': apiKey,
        'content-type': 'application/json'
      }
    });

    if (response.ok) {
      logger.info(`Successfully deleted transcript ${transcriptId} from AssemblyAI`);
      return true;
    } else {
      const errorData = await response.text();
      logger.error(`Failed to delete transcript ${transcriptId}: HTTP ${response.status} - ${errorData}`);
      return false;
    }
  } catch (error) {
    logger.error(`Error deleting transcript ${transcriptId}: ${error.message}`);
    return false;
  }
}

/**
 * Submit audio for transcription
 * @param {object} requestData - Request data including audio URL and config
 * @param {string} apiKey - AssemblyAI API key
 * @returns {Promise<string>} Transcript ID
 */
const submitTranscriptionRequest = async (requestData, apiKey) => {
  const response = await fetch('https://api.assemblyai.com/v2/transcript', {
    method: 'POST',
    headers: {
      'authorization': apiKey,
      'content-type': 'application/json'
    },
    body: JSON.stringify(requestData)
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`Failed to submit audio for transcription: ${errorText}`);
  }

  const responseData = await response.json();
  const transcriptId = responseData.id;

  logger.info({
    message: "Audio submitted for transcription",
    transcript_id: transcriptId
  });

  return transcriptId;
}

/**
 * Poll for transcription completion
 * @param {string} transcriptId - Transcript ID
 * @param {string} apiKey - AssemblyAI API key
 * @returns {Promise<object>} Transcription data
 */
const pollTranscriptionStatus = async (transcriptId, apiKey) => {
  const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

  // Keep polling until we get a completion or error
  while (true) {
    const response = await fetch(`https://api.assemblyai.com/v2/transcript/${transcriptId}`, {
      method: 'GET',
      headers: {
        'authorization': apiKey,
        'content-type': 'application/json'
      }
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Failed to poll transcription status: ${errorText}`);
    }

    const pollingData = await response.json();

    if (pollingData.status === 'completed') {
      logger.info({ message: "Transcription completed successfully" });
      return pollingData;
    } else if (pollingData.status === 'error') {
      throw new Error(`Transcription failed: ${pollingData.error}`);
    }

    // Wait before polling again
    await sleep(3000);
  }
}

/**
 * Transcribe audio using AssemblyAI API
 * @param {string} audioUrl - URL of the audio file
 * @param {string} apiKey - AssemblyAI API key
 * @returns {Promise<object>} Transcription data
 */
const transcribeAudio = async (audioUrl, apiKey) => {
  logger.info({ message: "Starting audio transcription" });

  // Prepare request data with config parameters
  const requestData = { audio_url: audioUrl, ...ASSEMBLYAI_CONFIG };

  // Submit the audio file for transcription
  const transcriptId = await submitTranscriptionRequest(requestData, apiKey);

  // Poll for transcription completion
  return await pollTranscriptionStatus(transcriptId, apiKey);
}

/**
 * Lambda function handler
 * @param {object} event - S3 event
 * @param {object} context - Lambda context
 * @returns {Promise<object>} Response
 */
export const handler = async (event, context) => {
  try {
    // Get the AssemblyAI API key from environment variables
    const apiKey = process.env.ASSEMBLYAI_API_KEY;
    if (!apiKey) {
      throw new Error("ASSEMBLYAI_API_KEY environment variable is not set");
    }

    // Process each record in the S3 event
    const records = event.Records || [];

    for (const record of records) {
      // Get the S3 bucket and key
      const bucket = record.s3.bucket.name;
      const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

      // Generate a presigned URL for the audio file
      const audioUrl = await getPresignedUrl(bucket, key);

      // Get the full transcript JSON from AssemblyAI
      const transcriptData = await transcribeAudio(audioUrl, apiKey);

      // Prepare the transcript key - maintaining path structure but changing directory and extension
      const transcriptKey = key
        .replace('audio', 'transcripts')
        .replace('.wav', '.json');

      // Convert the JSON data to a string
      const transcriptJsonStr = JSON.stringify(transcriptData, null, 2);

      // Upload the transcript JSON to the same bucket but in the transcripts directory
      await s3Client.putObject({
        Bucket: bucket, // Use the same bucket
        Key: transcriptKey, // Store under the /transcripts directory
        Body: transcriptJsonStr,
        ContentType: 'application/json'
      });

      logger.info({
        message: "Transcript uploaded to transcript bucket successfully.",
        key: transcriptKey
      });

      // Uncomment the following line to delete transcript data from AssemblyAI after saving to S3
      // https://www.assemblyai.com/docs/api-reference/transcripts/delete
      // await deleteTranscriptFromAssemblyAI(transcriptData.id, apiKey);
    }

    return {
      statusCode: 200,
      body: JSON.stringify({
        message: "Audio file(s) processed successfully",
        detail: "Transcripts have been stored in the transcripts directory"
      })
    };
  } catch (error) {
    console.error(`Error: ${error.message}`);
    return {
      statusCode: 500,
      body: JSON.stringify({
        message: "Error processing audio file(s)",
        error: error.message
      })
    };
  }
};
3

At the top of the Lambda function, you can edit the config to enable features for your transcripts. Since our call has two channels, we’ll want to set multichannel to true. To see all available parameters, check out our API reference.

const ASSEMBLYAI_CONFIG = {
  multichannel: true,
  // language_code: 'en_us',
  // redact_pii: true,
  // etc.
};

If you would like to delete transcripts from AssemblyAI after completion, you can uncomment the deleteTranscriptFromAssemblyAI call near the end of the handler. This ensures the transcript data is only saved to your S3 bucket and not stored in AssemblyAI’s database.

For more on our data retention policies, see this page from our FAQ.

4

Once you have finished editing the Lambda function, click Deploy to save your changes.

5

On the same page, navigate to the Configuration section. Under General configuration, click Edit, and then adjust Timeout to 15min 0sec and click Save. The processing times for transcription will be much shorter, but this ensures the function will have plenty of time to run.

6

On the left side panel, click Environment variables. Click Edit. Add an environment variable, ASSEMBLYAI_API_KEY, and set the value to your AssemblyAI API key. Then click Save.

7

Now, navigate to the IAM services page. On the left side panel under Access Management, click Roles and search for your Lambda function’s role (its structure should look like <function_name>-<role_id>). Click the role and then in the Permissions policies section click the dropdown for Add permissions and then select Attach policies.

8

From this page, find the policies named AmazonS3FullAccess and CloudWatchEventsFullAccess. Click Add permissions for both.

CloudWatchEventsFullAccess is optional, but helpful for debugging purposes. Once your Lambda runs, it should output all logs to CloudWatch under a log group named /aws/lambda/<your-lambda-fn>.

9

Now, navigate to the S3 services page and click into the general purpose bucket where your Genesys recordings are stored. Browse to the Properties tab and then scroll down to Event notifications. Click Create event notification.

10

Give the event a name and then in the Prefix section enter calls/ (or whatever S3_KEY_PREFIX is set to), and in the Suffix section enter .wav. This will ensure the event is triggered once our wav file has been uploaded. In the Event types section, select All object create events.
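
If you prefer to configure this trigger programmatically rather than through the console, a roughly equivalent setup with the AWS SDK v3 is sketched below. The Lambda ARN is a placeholder, and note that outside the console you also need to grant S3 permission to invoke the function (the console does this for you automatically).

// Sketch: set the same S3 event notification via the AWS SDK v3.
import { S3Client, PutBucketNotificationConfigurationCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client();

await s3.send(new PutBucketNotificationConfigurationCommand({
  Bucket: 'your-audiohook-bucket',
  NotificationConfiguration: {
    LambdaFunctionConfigurations: [{
      LambdaFunctionArn: 'arn:aws:lambda:<region>:<account-id>:function:<function-name>', // placeholder
      Events: ['s3:ObjectCreated:*'],          // All object create events
      Filter: {
        Key: {
          FilterRules: [
            { Name: 'prefix', Value: 'calls/' }, // Match S3_KEY_PREFIX
            { Name: 'suffix', Value: '.wav' }    // Only trigger on wav uploads
          ]
        }
      }
    }]
  }
}));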

11

Scroll down to the Destination section, set the destination as Lambda function and then select the Lambda function we created in Step 6. Then click Save changes.

Step 7: Transcribe your first call

1

To test that everything is working, call the phone number you linked to this flow in Step 2. Referring to the example flow above, press the DTMF value on the keypad or say the Speech Recognition value. Once transferred, your WebSocket server should start receiving data and logging output to the console:

{
  version: '2',
  id: '<id>',
  type: 'ping',
  seq: 4,
  position: 'PT8.2S',
  parameters: { rtt: 'PT0.035392266S' },
  serverseq: 3
}
Received binary audio data: 3200 bytes
Received binary audio data: 3200 bytes
...
Processed 146KB of audio data so far
2

Once the call has ended, you should see the following server logs:

{
  version: '2',
  id: '<id>',
  type: 'close',
  seq: 5,
  position: 'PT10.2S',
  parameters: { reason: 'end' },
  serverseq: 4
}
Handling close message
Closing file stream
Converting raw audio to WAV: '<wav file name>'

# Skipping ffmpeg output for brevity...

Uploading recording '<raw file>' to S3
Successfully uploaded raw recording to S3: '<raw file>'
Successfully uploaded WAV recording to S3: '<wav file>'
Sent closed response, seq=5
WebSocket closed for session '<session_id>': code=1000, reason=Session Ended
Cleaning up session '<session_id>'
Deleted local raw recording file: '<raw_file>'
Deleted local WAV recording file: '<wav_file>'
3

To view the logs for this Lambda function, navigate to the CloudWatch services page and under the Logs section, select Log groups. Select the log group that matches your Lambda to view the most recent log stream. This can be very useful for debugging purposes if you run into any issues.

4

Head to your S3 bucket. Within the /calls directory, files will be stored under a unique identifier with the following structure:

your-audiohook-bucket/calls/<timestamp>_<call_id>_<speaker_id>/<file_type>

with audio files (both raw and wav) under /audio and transcript responses under /transcripts.

The raw file can be useful to keep for converting to other formats in the future, but uploading it can be omitted to save on storage costs.

5

Success! You have successfully integrated AssemblyAI with Genesys Cloud via AudioHook Monitor. If you run into any issues or have further questions, please reach out to our Support team.

Other considerations

Supported audio formats

  • Audio is sent as binary WebSocket frames containing the raw audio data in the negotiated format. Currently, only μ-law (Mu-law, PCMU) is supported.
  • Before being uploaded to S3, the audio is converted to wav format using ffmpeg. As a lossless format, wav generally yields high transcription accuracy, but it is not required. A full list of file formats supported by AssemblyAI’s API can be found here.

Multichannel

  • As mentioned in Step 6, the multichannel parameter should be enabled because the files are stereo, with one channel per participant. When possible, AssemblyAI recommends multichannel audio for the most accurate transcription results.
  • If a single channel is preferred, you can simplify the approach by configuring AudioHook to send a single channel containing both speakers and adjusting your server code to handle single-channel audio.