Transcribe streaming audio from a microphone in Java
Learn how to transcribe streaming audio in Java.
Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Java.
Supported languages
Streaming Speech-to-Text is only available for English.
Before you begin
To complete this tutorial, you need:
- Java 8 or above.
- An AssemblyAI account with a credit card set up.
Here’s the full sample code for what you’ll build in this tutorial:
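This listing is a sketch rather than verbatim SDK documentation: the RealtimeTranscriber builder methods (apiKey, sampleRate, the onXxx callbacks, connect, sendAudio, close) reflect one version of the Java SDK, and the class name App is arbitrary. Check the exact names against the SDK version you install.

```java
import com.assemblyai.api.RealtimeTranscriber;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.IOException;
import java.util.Arrays;

public final class App {
    public static void main(String... args) throws IOException {
        Thread thread = new Thread(() -> {
            try {
                // Build the real-time transcriber with your API key and event handlers.
                RealtimeTranscriber realtimeTranscriber = RealtimeTranscriber.builder()
                        .apiKey("YOUR_API_KEY")
                        .sampleRate(16_000)
                        .onSessionBegins(sessionBegins -> System.out.println(
                                "Session opened with ID: " + sessionBegins.getSessionId()))
                        .onPartialTranscript(transcript -> {
                            if (!transcript.getText().isEmpty()) {
                                System.out.println("Partial: " + transcript.getText());
                            }
                        })
                        .onFinalTranscript(transcript ->
                                System.out.println("Final: " + transcript.getText()))
                        .onError(err -> System.out.println("Error: " + err.getMessage()))
                        .build();

                System.out.println("Connecting to real-time transcript service");
                realtimeTranscriber.connect();

                System.out.println("Starting recording");
                // 16 kHz, 16-bit, mono, signed PCM, little-endian (pcm_s16le).
                AudioFormat format = new AudioFormat(16_000.0f, 16, 1, true, false);
                TargetDataLine microphone = AudioSystem.getTargetDataLine(format);
                microphone.open(format);
                microphone.start();

                byte[] buffer = new byte[microphone.getBufferSize()];
                while (!Thread.interrupted()) {
                    // Read the next chunk from the microphone and stream it to AssemblyAI.
                    int numBytesRead = microphone.read(buffer, 0, buffer.length);
                    realtimeTranscriber.sendAudio(Arrays.copyOfRange(buffer, 0, numBytesRead));
                }

                System.out.println("Stopping recording");
                microphone.close();

                System.out.println("Closing real-time transcript connection");
                realtimeTranscriber.close();
            } catch (LineUnavailableException e) {
                throw new RuntimeException(e);
            }
        });

        thread.start();
        System.out.println("Press ENTER key to stop...");
        System.in.read();
        thread.interrupt();
    }
}
```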
Step 1: Install the SDK
Include the latest version of AssemblyAI’s Java SDK in your project dependencies:
Maven
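The coordinates and version below are assumptions: the SDK is commonly published as com.assemblyai:assemblyai-java, and ASSEMBLYAI_SDK_VERSION stands in for the latest release on Maven Central.

```xml
<dependency>
    <groupId>com.assemblyai</groupId>
    <artifactId>assemblyai-java</artifactId>
    <version>ASSEMBLYAI_SDK_VERSION</version>
</dependency>
```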
Gradle
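Using the same assumed coordinates:

```groovy
dependencies {
    implementation 'com.assemblyai:assemblyai-java:ASSEMBLYAI_SDK_VERSION'
}
```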
Step 2: Create a real-time transcriber
In this step, you’ll create a real-time transcriber and configure it to use your API key.
Browse to Account, and then click the text under Your API key to copy it.
Use the builder to create a new real-time transcriber with your API key, a sample rate of 16 kHz, and lambdas to log the different events.
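A sketch of that builder call follows; the callback and getter names (onSessionBegins, onPartialTranscript, onFinalTranscript, onError, getText, getSessionId) are assumptions based on one SDK version.

```java
RealtimeTranscriber realtimeTranscriber = RealtimeTranscriber.builder()
        .apiKey("YOUR_API_KEY")
        .sampleRate(16_000)
        .onSessionBegins(sessionBegins -> System.out.println(
                "Session opened with ID: " + sessionBegins.getSessionId()))
        .onPartialTranscript(transcript -> {
            if (!transcript.getText().isEmpty()) {
                System.out.println("Partial: " + transcript.getText());
            }
        })
        .onFinalTranscript(transcript ->
                System.out.println("Final: " + transcript.getText()))
        .onError(err -> System.out.println("Error: " + err.getMessage()))
        .build();
```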
Replace YOUR_API_KEY with your copied API key.
The real-time transcriber returns two types of transcripts: partial and final.
- Partial transcripts are returned as the audio is being streamed to AssemblyAI.
- Final transcripts are returned when the service detects a pause in speech.
End of utterance controls
You can configure the silence threshold for automatic utterance detection and programmatically force the end of an utterance to immediately get a Final transcript.
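As a sketch, assuming the Java SDK mirrors the utterance controls of AssemblyAI's other SDKs (both method names below are assumptions; verify them against your SDK version):

```java
// Assumed method names; verify against your SDK version.
// Wait for at least 500 ms of silence before ending an utterance.
realtimeTranscriber.configureEndUtteranceSilenceThreshold(500);

// Force the end of the current utterance to get a Final transcript immediately.
realtimeTranscriber.forceEndUtterance();
```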
Sample rate
The sample_rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher-quality audio, which may lead to better transcripts, but also mean more data is sent over the network.
We recommend the following sample rates:
- Minimum quality: 8_000 (8 kHz)
- Medium quality: 16_000 (16 kHz)
- Maximum quality: 48_000 (48 kHz)
If you don’t set a sample rate on the real-time transcriber, it defaults to 16 kHz.
Step 3: Connect the streaming service
Connect to the streaming service so you can send audio to it.
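For example, using the transcriber built in the previous step:

```java
System.out.println("Connecting to real-time transcript service");
realtimeTranscriber.connect();
```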
Step 4: Record audio from microphone
In this step, you’ll use Java’s built-in APIs for recording audio.
Create the audio format that the real-time service expects: single channel, pcm_s16le (PCM signed 16-bit little-endian) encoded, with a sample rate of 16_000.
The sample rate needs to be the same value as you configured on the real-time transcriber.
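Below is a sketch of the recording loop using javax.sound.sampled (AudioFormat, AudioSystem, TargetDataLine, plus java.util.Arrays; see the imports in the full sample). The sendAudio call is the assumed SDK method from the earlier snippet.

```java
System.out.println("Starting recording");
// 16 kHz, 16-bit, mono, signed PCM, little-endian (pcm_s16le).
AudioFormat format = new AudioFormat(16_000.0f, 16, 1, true, false);
TargetDataLine microphone = AudioSystem.getTargetDataLine(format);
microphone.open(format);
microphone.start();

byte[] buffer = new byte[microphone.getBufferSize()];
while (!Thread.interrupted()) {
    // Read the next chunk from the microphone and stream it to the transcriber.
    int numBytesRead = microphone.read(buffer, 0, buffer.length);
    realtimeTranscriber.sendAudio(Arrays.copyOfRange(buffer, 0, numBytesRead));
}

System.out.println("Stopping recording");
microphone.close();
```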
Audio data format
By default, transcriptions expect PCM16-encoded audio. If you want to use mu-law encoding, see Specifying the encoding.
Step 5: Disconnect the real-time service
When you are done, close the transcriber.
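For example:

```java
System.out.println("Closing real-time transcript connection");
realtimeTranscriber.close();
```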
Step 6: Run the code in a thread
To be able to listen for user input while the recording is happening, you need to run the code in a separate thread. When the user hits enter, interrupt the thread and exit the program.
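A sketch of that wrapper: the thread body is steps 2 through 5, and System.in.read() (which requires main to declare throws IOException) blocks until the user presses Enter.

```java
Thread thread = new Thread(() -> {
    // Steps 2-5: build the transcriber, connect, run the recording loop, and close.
});
thread.start();

System.out.println("Press ENTER key to stop...");
System.in.read();   // Blocks until the user presses Enter.
thread.interrupt(); // Ends the recording loop, which then closes the transcriber.
```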
Next steps
To learn more about Streaming Speech-to-Text, see the following resources:
Need some help?
If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.