Transcribe streaming audio from a microphone in Go

Learn how to transcribe streaming audio in Go.

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Go.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

You can download the full sample code from GitHub.

Step 1: Install dependencies

1

PortAudio is a cross-platform library for streaming audio. The Go SDK uses PortAudio to stream audio from your microphone.

$ brew install portaudio
2

Install the AssemblyAI Go module using go get.

$ go get github.com/AssemblyAI/assemblyai-go-sdk

Step 2: Create a transcriber

In this step, you’ll define a transcriber to handle real-time events.

1

Create a new file called main.go that imports the AssemblyAI Go module.

package main

import (
	"fmt"

	aai "github.com/AssemblyAI/assemblyai-go-sdk"
)
2

Create a RealTimeTranscriber and define a callback for each real-time event you want to handle.

func main() {
	transcriber := &aai.RealTimeTranscriber{
		// Called once the session has been established.
		OnSessionBegins: func(event aai.SessionBegins) {
			fmt.Println("session begins")
		},
		// Called after the session has been closed.
		OnSessionTerminated: func(event aai.SessionTerminated) {
			fmt.Println("session terminated")
		},
		// Final transcripts are printed on their own line.
		OnFinalTranscript: func(transcript aai.FinalTranscript) {
			fmt.Println(transcript.Text)
		},
		// Partial transcripts overwrite the current line until a final one arrives.
		OnPartialTranscript: func(transcript aai.PartialTranscript) {
			fmt.Printf("%s\r", transcript.Text)
		},
		OnError: func(err error) {
			fmt.Printf("Something bad happened: %v\n", err)
		},
	}
}
3

Browse to Account, and then click Copy API key under Copy your API key.

4

Create a new RealTimeClient using the transcriber you created. Replace <YOUR_API_KEY> with your copied API key.

client := aai.NewRealTimeClientWithOptions(
	aai.WithRealTimeAPIKey("<YOUR_API_KEY>"),
	aai.WithRealTimeTranscriber(transcriber),
)
Sample rate

Sample rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network. By default, the SDK uses a sample rate of 16 kHz. You can set your own sample rate using the WithSampleRate option.

client := aai.NewRealTimeClientWithOptions(
	aai.WithRealTimeAPIKey("<YOUR_API_KEY>"),
	aai.WithRealTimeTranscriber(transcriber),
	aai.WithSampleRate(16_000),
)

We recommend the following sample rates:

  • Minimum quality: 8_000 (8 kHz)
  • Medium quality: 16_000 (16 kHz)
  • Maximum quality: 48_000 (48 kHz)
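
To get a feel for the network cost of each option, the sketch below estimates the raw data rate, assuming single-channel 16-bit PCM (2 bytes per sample) and ignoring any overhead added in transit. The helper bytesPerSecond is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// bytesPerSecond returns the raw data rate of single-channel 16-bit PCM audio.
// Each sample occupies 2 bytes; transport overhead is not included.
func bytesPerSecond(sampleRate int) int {
	const bytesPerSample = 2
	return sampleRate * bytesPerSample
}

func main() {
	for _, rate := range []int{8_000, 16_000, 48_000} {
		fmt.Printf("%d Hz -> %d bytes/s\n", rate, bytesPerSecond(rate))
	}
}
```

Under these assumptions, going from 16 kHz to 48 kHz triples the amount of audio data sent per second.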

Step 3: Connect the transcriber

To stream audio to AssemblyAI, you first need to establish a connection to the API using client.Connect().

ctx := context.Background()

if err := client.Connect(ctx); err != nil {
	log.Fatal(err)
}

You’ve set up the transcriber to handle real-time events, and connected it to the API. Next, you’ll create a recorder to record audio from your microphone.

Step 4: Record audio from microphone

In this step, you’ll configure your Go app to record audio from your microphone. You’ll use the gordonklaus/portaudio module to make this easier.

1

Install the portaudio module for Go.

$ go get github.com/gordonklaus/portaudio
2

Create a new file called recorder.go with the following code:

package main

import (
	"bytes"
	"encoding/binary"

	"github.com/gordonklaus/portaudio"
)

// recorder reads 16-bit audio samples from the default input device.
type recorder struct {
	stream *portaudio.Stream
	buffer []int16
}

func newRecorder(sampleRate int, framesPerBuffer int) (*recorder, error) {
	buffer := make([]int16, framesPerBuffer)

	// One input channel, no output channels.
	stream, err := portaudio.OpenDefaultStream(1, 0, float64(sampleRate), framesPerBuffer, buffer)
	if err != nil {
		return nil, err
	}

	return &recorder{
		stream: stream,
		buffer: buffer,
	}, nil
}

// Read fills the buffer from the microphone and returns the samples
// encoded as little-endian bytes.
func (r *recorder) Read() ([]byte, error) {
	if err := r.stream.Read(); err != nil {
		return nil, err
	}

	var buf bytes.Buffer

	if err := binary.Write(&buf, binary.LittleEndian, r.buffer); err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}

func (r *recorder) Start() error {
	return r.stream.Start()
}

func (r *recorder) Stop() error {
	return r.stream.Stop()
}

func (r *recorder) Close() error {
	return r.stream.Close()
}
3

In main.go, open a microphone stream. The sampleRate must be the same value as the one you passed to RealTimeClient (16_000 by default).

if err := portaudio.Initialize(); err != nil {
	log.Fatal(err)
}
defer portaudio.Terminate()

var (
	// Must match the sample rate you used for the transcriber.
	sampleRate = 16000

	// Determines how many audio samples to send at once (3200 / 16000 = 200 ms).
	framesPerBuffer = 3200
)

rec, err := newRecorder(sampleRate, framesPerBuffer)
if err != nil {
	log.Fatal(err)
}

if err := rec.Start(); err != nil {
	log.Fatal(err)
}
Audio data format

The recorder formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:

  • Single channel
  • 16-bit signed integer PCM or mu-law encoding
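
If your source produces audio in another shape, you need to convert it first. As an illustration only, the sketch below assumes interleaved float32 samples in the range [-1, 1], downmixes them to one channel by averaging, and encodes the result as 16-bit little-endian PCM, the same byte order the recorder above uses. The function toMonoPCM16 is a hypothetical helper, and averaging is just one simple way to downmix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toMonoPCM16 converts interleaved float32 samples in [-1, 1] with the given
// channel count into single-channel 16-bit little-endian PCM bytes.
func toMonoPCM16(samples []float32, channels int) []byte {
	if channels < 1 {
		return nil
	}
	frames := len(samples) / channels
	out := make([]byte, frames*2)
	for i := 0; i < frames; i++ {
		// Average all channels of this frame to downmix to mono.
		var sum float32
		for c := 0; c < channels; c++ {
			sum += samples[i*channels+c]
		}
		v := sum / float32(channels)
		// Clamp to the valid range so the conversion cannot overflow.
		if v > 1 {
			v = 1
		} else if v < -1 {
			v = -1
		}
		binary.LittleEndian.PutUint16(out[i*2:], uint16(int16(v*32767)))
	}
	return out
}

func main() {
	stereo := []float32{1, 1, -1, -1, 0, 0} // three stereo frames
	fmt.Printf("% x\n", toMonoPCM16(stereo, 2))
}
```
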
4

Read data from the microphone stream, and send it to AssemblyAI for transcription using client.Send().

for {
	select {
	default:
		// Read audio samples from the microphone.
		b, err := rec.Read()
		if err != nil {
			log.Fatal(err)
		}

		// Send partial audio samples.
		if err := client.Send(ctx, b); err != nil {
			log.Fatal(err)
		}
	}
}

Step 5: Disconnect the transcriber

In this step, you’ll clean up resources by stopping the recorder and disconnecting the transcriber.

To disconnect the transcriber on Ctrl+C, use client.Disconnect(). Disconnect() accepts a boolean parameter that allows you to wait for any remaining transcriptions before closing the connection.

sigs := make(chan os.Signal, 1)

signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

for {
	select {
	case <-sigs:
		// Stop recording.
		if err := rec.Stop(); err != nil {
			log.Fatal(err)
		}

		// Disconnect the transcriber, waiting for any remaining transcripts.
		if err := client.Disconnect(ctx, true); err != nil {
			log.Fatal(err)
		}

		os.Exit(0)
	default:
		// Read audio samples from the microphone.
		b, err := rec.Read()
		if err != nil {
			log.Fatal(err)
		}

		// Send partial audio samples.
		if err := client.Send(ctx, b); err != nil {
			log.Fatal(err)
		}
	}
}

Run your application to start transcribing. Your OS may require you to allow your app to access your microphone. If prompted, click Allow.

You can also find the source code for this tutorial on GitHub.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.
