Transcribe streaming audio from a microphone in Go — AssemblyAI

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Go.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

Go installed.
An AssemblyAI account with a credit card set up.

You can download the full sample code from GitHub.

Step 1: Install dependencies

PortAudio is a cross-platform library for streaming audio. The Go SDK uses PortAudio to stream audio from your microphone.

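On macOS, install PortAudio with Homebrew:

$ brew install portaudio

On Windows and Linux, install PortAudio using the method appropriate for your platform.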

Install the AssemblyAI Go module using go get.

$ go get github.com/AssemblyAI/assemblyai-go-sdk

Step 2: Create a transcriber

In this step, you’ll define a transcriber to handle real-time events.

Create a new file called main.go that imports the AssemblyAI Go module.

package main

import (
    "fmt"

    aai "github.com/AssemblyAI/assemblyai-go-sdk"
)

Create a RealTimeTranscriber and define a callback for each real-time event you want to handle.

func main() {
    // Define a callback for each real-time event.
    transcriber := &aai.RealTimeTranscriber{
        OnSessionBegins: func(event aai.SessionBegins) {
            fmt.Println("session begins")
        },
        OnSessionTerminated: func(event aai.SessionTerminated) {
            fmt.Println("session terminated")
        },
        OnFinalTranscript: func(transcript aai.FinalTranscript) {
            fmt.Println(transcript.Text)
        },
        OnPartialTranscript: func(transcript aai.PartialTranscript) {
            // The carriage return lets the next partial overwrite this line.
            fmt.Printf("%s\r", transcript.Text)
        },
        OnError: func(err error) {
            fmt.Printf("Something bad happened: %v\n", err)
        },
    }
}

Browse to Account, and then click Copy API key under Copy your API key.

Create a new RealTimeClient using the transcriber you created. Replace <YOUR_API_KEY> with your copied API key.

client := aai.NewRealTimeClientWithOptions(
    aai.WithRealTimeAPIKey("<YOUR_API_KEY>"),
    aai.WithRealTimeTranscriber(transcriber),
)

Sample rate

Sample rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network. By default, the SDK uses a sample rate of 16 kHz. You can set your own sample rate using the WithSampleRate option.

client := aai.NewRealTimeClientWithOptions(
    aai.WithRealTimeAPIKey("<YOUR_API_KEY>"),
    aai.WithRealTimeTranscriber(transcriber),
    aai.WithSampleRate(16_000),
)

We recommend the following sample rates:

Minimum quality: 8_000 (8 kHz)
Medium quality: 16_000 (16 kHz)
Maximum quality: 48_000 (48 kHz)
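To make the trade-off between quality and network usage concrete, the following sketch estimates the raw data rate for each recommended sample rate. It assumes single-channel, 16-bit PCM audio (2 bytes per sample) and ignores any protocol overhead; the bytesPerSecond helper is hypothetical and not part of the SDK.

```go
package main

import "fmt"

// bytesPerSecond returns the raw data rate of single-channel,
// 16-bit PCM audio at the given sample rate (2 bytes per sample).
// Hypothetical helper for illustration; not part of the SDK.
func bytesPerSecond(sampleRate int) int {
	const bytesPerSample = 2 // 16 bits
	return sampleRate * bytesPerSample
}

func main() {
	for _, rate := range []int{8_000, 16_000, 48_000} {
		fmt.Printf("%d Hz: %d bytes/s\n", rate, bytesPerSecond(rate))
	}
}
```

At the default 16 kHz, this works out to 32,000 bytes of audio per second.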

Step 3: Connect the transcriber

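To stream audio to AssemblyAI, you first need to establish a connection to the API using client.Connect(). This snippet uses the context and log packages, so add them to the imports in main.go.

ctx := context.Background()

// Open the connection to the real-time API.
if err := client.Connect(ctx); err != nil {
    log.Fatal(err)
}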

You’ve set up the transcriber to handle real-time events, and connected it to the API. Next, you’ll create a recorder to record audio from your microphone.

Step 4: Record audio from microphone

In this step, you’ll configure your Go app to record audio from your microphone. You’ll use the gordonklaus/portaudio module to make this easier.

Install the portaudio module for Go.

$ go get github.com/gordonklaus/portaudio

Create a new file called recorder.go with the following code:

package main

import (
    "bytes"
    "encoding/binary"

    "github.com/gordonklaus/portaudio"
)

// recorder reads 16-bit audio samples from the default input device.
type recorder struct {
    stream *portaudio.Stream
    buffer []int16
}

// newRecorder opens the default input device with one input channel
// and no output channels.
func newRecorder(sampleRate int, framesPerBuffer int) (*recorder, error) {
    buffer := make([]int16, framesPerBuffer)

    stream, err := portaudio.OpenDefaultStream(1, 0, float64(sampleRate), framesPerBuffer, buffer)
    if err != nil {
        return nil, err
    }

    return &recorder{
        stream: stream,
        buffer: buffer,
    }, nil
}

// Read fills the buffer from the microphone and returns the samples
// as little-endian bytes.
func (r *recorder) Read() ([]byte, error) {
    if err := r.stream.Read(); err != nil {
        return nil, err
    }

    var buf bytes.Buffer

    if err := binary.Write(&buf, binary.LittleEndian, r.buffer); err != nil {
        return nil, err
    }

    return buf.Bytes(), nil
}

func (r *recorder) Start() error {
    return r.stream.Start()
}

func (r *recorder) Stop() error {
    return r.stream.Stop()
}

func (r *recorder) Close() error {
    return r.stream.Close()
}

In main.go, open a microphone stream. The sampleRate must be the same value as the one you passed to RealTimeClient (16_000 by default). This snippet uses the log and github.com/gordonklaus/portaudio packages, so add them to the imports in main.go.

// Initialize PortAudio before opening any stream.
if err := portaudio.Initialize(); err != nil {
    log.Fatal(err)
}
defer portaudio.Terminate()

var (
    // Must match the sample rate you used for the transcriber.
    sampleRate = 16000

    // Determines how many audio samples to send at once (3200 / 16000 = 200 ms).
    framesPerBuffer = 3200
)

rec, err := newRecorder(sampleRate, framesPerBuffer)
if err != nil {
    log.Fatal(err)
}

if err := rec.Start(); err != nil {
    log.Fatal(err)
}

Audio data format

The recorder formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:

Single channel
16-bit signed integer PCM or mu-law encoding
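If your audio comes from a source that produces floating-point samples, you would need to convert it before sending it. The sketch below shows one way to do that, assuming mono float32 samples in the range [-1, 1] and little-endian byte order (the same order the recorder uses). The floatToPCM16 function is a hypothetical helper, not part of the SDK.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// floatToPCM16 converts mono samples in the range [-1, 1] to
// little-endian 16-bit signed PCM bytes. Out-of-range values are clamped.
// Hypothetical helper for illustration.
func floatToPCM16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		v := int16(s * 32767)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

func main() {
	pcm := floatToPCM16([]float32{0, 1, -1})
	fmt.Println(len(pcm), pcm)
}
```

Each input sample becomes two bytes, so the result can be passed to client.Send() in the same way as the bytes returned by the recorder.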

Read data from the microphone stream, and send it to AssemblyAI for transcription using client.Send().

for {
    select {
    default:
        // Read audio samples from the microphone.
        b, err := rec.Read()
        if err != nil {
            log.Fatal(err)
        }

        // Send partial audio samples.
        if err := client.Send(ctx, b); err != nil {
            log.Fatal(err)
        }
    }
}

Step 5: Disconnect the transcriber

In this step, you’ll clean up resources by stopping the recorder and disconnecting the transcriber.

To disconnect the transcriber on Ctrl+C, use client.Disconnect(). Disconnect() accepts a boolean parameter that allows you to wait for any remaining transcriptions before closing the connection. This snippet uses the os, os/signal, and syscall packages, so add them to the imports in main.go.

sigs := make(chan os.Signal, 1)

// Deliver Ctrl+C (SIGINT) and SIGTERM to the sigs channel.
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

for {
    select {
    case <-sigs:
        // Stop recording.
        if err := rec.Stop(); err != nil {
            log.Fatal(err)
        }

        // Disconnect the transcriber.
        if err := client.Disconnect(ctx, true); err != nil {
            log.Fatal(err)
        }

        os.Exit(0)
    default:
        // Read audio samples from the microphone.
        b, err := rec.Read()
        if err != nil {
            log.Fatal(err)
        }

        // Send partial audio samples.
        if err := client.Send(ctx, b); err != nil {
            log.Fatal(err)
        }
    }
}

Run your application to start transcribing. Your OS may require you to allow your app to access your microphone. If prompted, click Allow.

You can also find the source code for this tutorial on GitHub.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.
