Transcribe streaming audio from a microphone in Go
Learn how to transcribe streaming audio in Go.
Overview
By the end of this tutorial, you'll be able to transcribe audio from your microphone in Go.
Streaming Speech-to-Text is only available for English. See Supported languages.
Step 1: Install dependencies
- 1
PortAudio is a cross-platform library for streaming audio. The Go SDK uses PortAudio to stream audio from your microphone.
- 2
Install the AssemblyAI Go module using
go get
.go get github.com/AssemblyAI/assemblyai-go-sdk
Step 2: Create a transcriber
In this step, you'll define a transcriber to handle real-time events.
- 1
Create a new file called
main.go
that imports the AssemblyAI Go module.package main
import (
aai "github.com/AssemblyAI/assemblyai-go-sdk"
) - 2
Create a type that implements RealtimeHandler.
func main() {
transcriber := &aai.RealTimeTranscriber{
OnSessionBegins: func(event assemblyai.SessionBegins) {
fmt.Println("session begins")
},
OnSessionTerminated: func(event assemblyai.SessionTerminated) {
fmt.Println("session terminated")
},
OnFinalTranscript: func(transcript assemblyai.FinalTranscript) {
fmt.Println(transcript.Text)
},
OnPartialTranscript: func(transcript assemblyai.PartialTranscript) {
fmt.Printf("%s\r", transcript.Text)
},
OnError: func(err error) {
fmt.Printf("Something bad happened: %v", err)
},
}
} - 3
Browse to , and then click Copy API key under Copy your API key.
- 4
Create a new RealTimeClient using the function you created. Replace
YOUR_API_KEY
with your copied API key.client := aai.NewRealTimeClientWithOptions(
aai.WithRealTimeAPIKey("YOUR_API_KEY" ),
aai.WithRealTimeTranscriber(transcriber),
)Sample rateSample rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network. By default, the SDK uses a sample rate of 16 kHz. You can set your own sample rate using the
WithSampleRate
option.client := aai.NewRealTimeClientWithOptions(
aai.WithRealTimeAPIKey("YOUR_API_KEY" )
aai.WithRealTimeTranscriber(transcriber),
aai.WithSampleRate(16_000),
)We recommend the following sample rates:
- Minimum quality:
8_000
(8 kHz) - Medium quality:
16_000
(16 kHz) - Maximum quality:
48_000
(48 kHz)
- Minimum quality:
Step 3: Connect the transcriber
To stream audio to AssemblyAI, you first need to establish a connection to the API using client.Connect()
.
ctx := context.Background()
if err := client.Connect(ctx); err != nil {
logger.Fatal(err)
}
You've set up the transcriber to handle real-time events, and connected it to the API. Next, you'll create a recorder to record audio from your microphone.
Step 4: Record audio from microphone
In this step, you'll configure your Go app to record audio from your microphone. You'll use the gordonklaus/portaudio module to make this easier.
- 1
Install the portaudio module for Go.
go get github.com/gordonklaus/portaudio
- 2
Create a new file called
recorder.go
with the following code:package main
import (
"bytes"
"encoding/binary"
"github.com/gordonklaus/portaudio"
)
type recorder struct {
stream *portaudio.Stream
buffer []int16
}
func newRecorder(sampleRate int, framesPerBuffer int) (*recorder, error) {
buffer := make([]int16, framesPerBuffer)
stream, err := portaudio.OpenDefaultStream(1, 0, float64(sampleRate), framesPerBuffer, buffer)
if err != nil {
return nil, err
}
return &recorder{
stream: stream,
buffer: buffer,
}, nil
}
func (r *recorder) Read() ([]byte, error) {
if err := r.stream.Read(); err != nil {
return nil, err
}
var buf bytes.Buffer
if err := binary.Write(&buf, binary.LittleEndian, r.buffer); err != nil {
return nil, err
}
return buf.Bytes(), nil
}
func (r *recorder) Start() error {
return r.stream.Start()
}
func (r *recorder) Stop() error {
return r.stream.Stop()
}
func (r *recorder) Close() error {
return r.stream.Close()
} - 3
In
main.go
, open a microphone stream. ThesampleRate
must be the same value as the one you passed toRealTimeClient
(16_000
by default).portaudio.Initialize()
defer portaudio.Terminate()
var (
// Must match the sample rate you used for the transcriber.
sampleRate = 16000
// Determines how many audio samples to send at once (3200 / 16000 = 200 ms).
framesPerBuffer = 3200
)
rec, err := newRecorder(sampleRate, framesPerBuffer)
if err != nil {
log.Fatal(err)
}
if err := rec.Start(); err != nil {
log.Fatal(err)
}Audio data formatThe recorder formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:
- Single channel
- 16-bit signed integer PCM or mu-law encoding
- 4
Read data from the microphone stream, and send it to AssemblyAI for transcription using
client.Send()
.for {
select {
default:
// Read audio samples from the microphone.
b, err := rec.Read()
if err != nil {
logger.Fatal(err)
}
// Send partial audio samples.
if err := client.Send(ctx, b); err != nil {
logger.Fatal(err)
}
}
}
Step 5: Disconnect the transcriber
In this step, you'll clean up resources by stopping the recorder and disconnecting the transcriber.
To disconnect the transcriber on Ctrl+C
, use client.Disconnect()
. Disconnect()
accepts a boolean parameter that allows you to wait for any remaining transcriptions before closing the connection.
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
for {
select {
case <-sigs:
// Stop recording.
if err := rec.Stop(); err != nil {
log.Fatal(err)
}
// Disconnect the transcriber.
if err := client.Disconnect(ctx, true); err != nil {
log.Fatal(err)
}
os.Exit(0)
default:
// Read audio samples from the microphone.
b, err := rec.Read()
if err != nil {
logger.Fatal(err)
}
// Send partial audio samples.
if err := client.Send(ctx, b); err != nil {
logger.Fatal(err)
}
}
}
}
Run your application to start transcribing. Your OS may require you to allow your app to access your microphone. If prompted, click Allow.
You can also find the source code for this tutorial on GitHub.
Need some help?
If you get stuck, or have any other questions, we'd love to help you out. Ask our support team in our Discord server.