October 27, 2025

Detect scam calls using Go with LeMUR and Twilio

Learn how to detect scam attempts in phone calls, using LLM Gateway.

Marcus Olsson
Senior Developer Educator
Reviewed by
Ryan O'Connor
Senior Developer Educator

As technology advances, scam attempts are becoming increasingly sophisticated and harder to identify. Voice AI uses recent advances in speech recognition and generative AI to understand spoken language, and can give us the upper hand against scammers. In fact, a recent industry report found that over 80% of leaders predict real-time conversation intelligence will be the most transformative capability in the coming year.

In this tutorial, you'll build an application that uses Voice AI to transcribe a phone call and identify scam attempts.

By the end of this tutorial, you'll be able to:

  • Transcribe a phone call in real-time using AssemblyAI and Twilio.
  • Summarize and assess a phone call transcript using AssemblyAI's LLM Gateway.

Before you get started

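To complete this tutorial, you'll need:

  • Go installed on your computer.
  • An AssemblyAI account and API key.
  • A Twilio account with a phone number.
  • An ngrok account (set up in Step 1).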

Step 1: Set up ngrok

Twilio will need to access your server through a publicly available URL. In this tutorial, you'll use ngrok to create a publicly available URL for an application running on your local computer.

💡

If you already have a preferred way to expose your application publicly, you may skip this step.

  1. Sign up for an ngrok account.
  2. Install ngrok for your platform.
  3. Authenticate your ngrok agent using Your Authtoken.
    ngrok config add-authtoken <your_token>
  4. Open an ngrok tunnel for port 8080. ngrok will only tunnel connections while the following command is running.
    ngrok http 8080

You'll see something similar to the output below, where the URL next to Forwarding is the publicly available URL that forwards to your local 8080 port (https://84c5df474.ngrok-free.dev in the example output).

ngrok (Ctrl+C to quit)
Session Status                online
Account                       inconshreveable (Plan: Free)
Version                       3.0.0
Region                        United States (us)
Latency                       78ms
Web Interface                 http://127.0.0.1:4040
Forwarding                    https://84c5df474.ngrok-free.dev -> http://localhost:8080

Connections                   ttl     opn     rt1     rt5     p50     p90
                             0       0       0.00    0.00    0.00    0.00

Sample output for ngrok http 8080

Copy the Forwarding URL in your terminal output and save it for the next step.

Step 2: Set up Twilio

You'll need to register a phone number with Twilio and configure it to call your server application whenever someone calls that number. The steps below use the Twilio CLI; alternatively, you can update the voice URL for your phone number in the Twilio console.

  1. Sign up for a Twilio account.
  2. Download Twilio CLI.
  3. In a new terminal, log in using Twilio CLI. You'll be asked to enter an identifier for your new profile, for example dev.
    twilio login
  4. Select the profile you created.
    twilio profiles:use <your_profile_id>
  5. Update the voice URL for your phone number.
    twilio phone-number:update <your_twilio_number> --voice-url <your_ngrok_url>

💡

You'll find the Account SID, Auth Token, and phone number under Account info on your Twilio console.

Now, when someone calls your phone number, Twilio sends a request to your ngrok URL, which forwards it to port 8080 on your local computer. This lets you test changes locally without deploying each one to a cloud instance, which speeds up development.

Next up, you'll build the server application to handle the phone call.

Security and compliance considerations

Voice call data requires encryption in transit via WSS and PII redaction for stored transcripts. Audio streams contain sensitive information like names and credit card numbers, and a 2025 trends report highlights that over 30% of teams consider data privacy a significant implementation challenge.

Essential security requirements:

  • Data in Transit: Use WSS for encrypted audio streaming
  • PII Handling: Enable PII Redaction to remove sensitive data from transcripts by setting specific PII policies to filter for details like banking information, phone numbers, and email addresses.
  • Storage and Access: Implement encryption at rest and strict access controls

While our API provides the tools for secure handling, it's your responsibility to ensure your application architecture is compliant with regulations like GDPR or TCPA.

Step 3: Transcribe phone calls in real-time

Real-time phone call transcription requires a WebSocket connection between Twilio MediaStream and AssemblyAI's streaming API. The Go server handles audio data in PCMU format at 8000Hz sample rate.

The implementation uses AssemblyAI's Go SDK with Twilio MediaStream for real-time audio processing. The code is updated to use our latest Universal-Streaming (v3) API, which uses an event-based model for handling transcripts.

  1. Create and navigate into a new project directory.
    mkdir scam-screener-go
    cd scam-screener-go
  2. Initialize your Go module.
    go mod init scam-screener-go
  3. Install the AssemblyAI Go SDK.
    go get github.com/AssemblyAI/assemblyai-go-sdk
  4. Install the WebSocket module by Quinn Rivenwell. You'll need this to handle the incoming Twilio MediaStream connection.
    go get nhooyr.io/websocket
  5. Create a new file called main.go with the following code. This starter code sets up a real-time transcription session for a Twilio phone call using the latest AssemblyAI Go SDK.
    package main

    import (
        "context"
        "fmt"
        "log"
        "net/http"
        "os"

        "nhooyr.io/websocket"
        "nhooyr.io/websocket/wsjson"

        aai "github.com/AssemblyAI/assemblyai-go-sdk"
        "github.com/AssemblyAI/assemblyai-go-sdk/streaming/v3"
    )

    var apiKey = os.Getenv("ASSEMBLYAI_API_KEY")

    func main() {
        http.HandleFunc("/", twilio)
        http.HandleFunc("/media", media)
        log.Println("Server is running on port 8080")
        if err := http.ListenAndServe(":8080", nil); err != nil {
            log.Fatal(err)
        }
    }

    func twilio(w http.ResponseWriter, r *http.Request) {
        if r.Method != "POST" {
            w.WriteHeader(http.StatusMethodNotAllowed)
            return
        }
        twiML := `<?xml version="1.0" encoding="UTF-8"?>
        <Response>
          <Say>
            Speak to see your audio transcribed in the console.
          </Say>
          <Connect>
            <Stream url="%s"></Stream>
          </Connect>
        </Response>`
        w.Header().Add("Content-Type", "application/xml")
        fmt.Fprintln(w, fmt.Sprintf(twiML, "wss://"+r.Host+"/media"))
    }

    type TwilioMessage struct {
        Event string `json:"event"`
        Media struct {
            // Contains audio samples.
            Payload []byte `json:"payload"`
        } `json:"media"`
    }

    func media(w http.ResponseWriter, r *http.Request) {
        // Upgrade HTTP request to WebSocket.
        c, err := websocket.Accept(w, r, nil)
        if err != nil {
            log.Println("unable to upgrade connection to websocket:", err)
            w.WriteHeader(http.StatusInternalServerError)
            return
        }
        defer c.CloseNow()

        ctx, cancel := context.WithCancel(r.Context())
        defer cancel()

        client := aai.NewStreamingClient(apiKey)

        client.On(streaming.EventBegin, func(event *streaming.BeginEvent) {
            log.Printf("Session started: %s", event.ID)
        })

        client.On(streaming.EventTurn, func(event *streaming.TurnEvent) {
            if event.Transcript == "" {
                return
            }
            if event.EndOfTurn {
                // This is a final transcript for an utterance.
                fmt.Printf("%s\n", event.Transcript)
            } else {
                // This is a partial transcript.
                fmt.Printf("%s\r", event.Transcript)
            }
        })

        client.On(streaming.EventTermination, func(event *streaming.TerminationEvent) {
            log.Printf("Session terminated: %s", event.Message)
        })

        client.On(streaming.EventError, func(event *streaming.ErrorEvent) {
            log.Printf("Error occurred: %s", event.Error)
        })

        err = client.Connect(
            ctx,
            streaming.StreamingParameters{
                // Twilio MediaStream sends audio in mu-law format.
                Encoding: streaming.AudioEncodingPCM_MULAW,
                // Twilio MediaStream sends audio at 8000 samples per second.
                SampleRate: 8000,
            },
        )

        if err != nil {
            log.Println("unable to connect to real-time transcription:", err)
            c.Close(websocket.StatusInternalError, err.Error())
            return
        }
        log.Println("connected to real-time transcription")

        defer client.Disconnect()

        for {
            var message TwilioMessage
            err = wsjson.Read(ctx, c, &message)
            if err != nil {
                log.Println("unable to read twilio message:", err)
                c.Close(websocket.StatusInternalError, err.Error())
                return
            }

            switch message.Event {
            case "connected":
                log.Println("twilio mediastream connected")
            case "start":
                log.Println("twilio mediastream started")
            case "media":
                if err := client.SendAudio(ctx, message.Media.Payload); err != nil {
                    log.Println("unable to send audio for real-time transcription:", err)
                    c.Close(websocket.StatusInternalError, err.Error())
                    return
                }
            case "stop":
                log.Println("twilio mediastream stopped")
                // The client will be disconnected by the defer statement.
                c.Close(websocket.StatusNormalClosure, "")
                return
            }
        }
    }

Next, you'll inspect the full transcript from the phone call to determine whether it's a scam call or not.

Step 4: Store the full transcript

The real-time streaming service returns a final transcript for each utterance when it detects the end of that utterance. An utterance is a continuous piece of speech separated by silence. Since a phone call may contain multiple utterances, you need to store all final transcripts for later analysis.

Modify the media function to store these transcripts. You can do this by defining a slice within the function's scope and appending to it inside the On(streaming.EventTurn, ...) event handler.

  1. In main.go, inside the media function and before you initialize the client, declare a slice to hold the final transcripts:
    var finalTranscripts []string
  2. Update the client.On(streaming.EventTurn, ...) handler to append the transcript text to this slice whenever a final utterance is received (when event.EndOfTurn is true).
    client.On(streaming.EventTurn, func(event *streaming.TurnEvent) {
        if event.Transcript == "" {
            return
        }
        if event.EndOfTurn {
            // This is a final transcript for an utterance.
            fmt.Printf("%s\n", event.Transcript)
            finalTranscripts = append(finalTranscripts, event.Transcript)
        } else {
            // This is a partial transcript.
            fmt.Printf("%s\r", event.Transcript)
        }
    })

Once the phone call ends, the finalTranscripts slice will contain the full transcript for the phone call, ready for the next step.

Step 5: Summarize the full transcript using the LLM Gateway

AssemblyAI's LLM Gateway provides a unified interface to powerful Large Language Models (LLMs), allowing you to perform various tasks on audio data. In this step, you'll use the LLM Gateway to create a summary of the phone call and assess whether it's a scam attempt. This is done by making a direct HTTP request to the LLM Gateway API, as this functionality is no longer part of the Go SDK.

  1. Add the necessary imports for making HTTP requests and handling JSON to the top of your main.go file.
    import (
        "bytes"
        "encoding/json"
        "strings"
        // ... other imports
    )
  2. Create a new function called summarize. This function constructs a prompt, sends the full transcript to the LLM Gateway's chat completions endpoint, and returns the model's response.
    func summarize(ctx context.Context, transcriptText string) (string, error) {
        // Define the prompt structure for the LLM Gateway.
        systemPrompt := "You're a personal assistant helping an elderly person screen for scam calls. Provide a one-sentence summary of the call in the second person, followed by a brief assessment of whether it's a scam call. Do not provide a preamble."
        userPrompt := fmt.Sprintf("Here is the call transcript:\n%s", transcriptText)

        // Prepare the request payload for the LLM Gateway.
        payload := map[string]interface{}{
            "model": "claude-3-5-haiku-20241022", // A fast and capable model for this task.
            "messages": []map[string]string{
                {"role": "system", "content": systemPrompt},
                {"role": "user", "content": userPrompt},
            },
            "max_tokens":  150,
            "temperature": 0.3,
        }

        payloadBytes, err := json.Marshal(payload)
        if err != nil {
            return "", fmt.Errorf("failed to marshal payload: %w", err)
        }

        // Create and send the HTTP request.
        req, err := http.NewRequestWithContext(ctx, "POST", "https://llm-gateway.assemblyai.com/v1/chat/completions", bytes.NewReader(payloadBytes))
        if err != nil {
            return "", fmt.Errorf("failed to create request: %w", err)
        }
        req.Header.Set("Authorization", apiKey)
        req.Header.Set("Content-Type", "application/json")

        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return "", fmt.Errorf("failed to send request: %w", err)
        }
        defer resp.Body.Close()

        if resp.StatusCode != http.StatusOK {
            return "", fmt.Errorf("api request failed with status: %s", resp.Status)
        }

        // Decode the response.
        var result struct {
            Choices []struct {
                Message struct {
                    Content string `json:"content"`
                } `json:"message"`
            } `json:"choices"`
        }

        if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
            return "", fmt.Errorf("failed to decode response: %w", err)
        }

        if len(result.Choices) == 0 {
            return "", fmt.Errorf("no response choices returned")
        }

        return result.Choices[0].Message.Content, nil
    }
  3. In the media function, update the stop case to call the new summarize function after the client has disconnected.
    case "stop":
        log.Println("twilio mediastream stopped")
        // Disconnect the client first to ensure all final transcripts are processed.
        client.Disconnect()
        log.Println("disconnected from real-time transcription")

        // Now, summarize the full transcript.
        summary, err := summarize(ctx, strings.Join(finalTranscripts, "\n"))
        if err != nil {
            log.Println("unable to summarize call:", err)
            c.Close(websocket.StatusInternalError, "Failed to summarize call.")
            return
        }
        log.Println("Summary:", summary)

        c.Close(websocket.StatusNormalClosure, "")
        return
  4. Start the server to accept calls. The code reads your API key from the ASSEMBLYAI_API_KEY environment variable, so set it in the same terminal first (macOS or Linux shell shown).
    export ASSEMBLYAI_API_KEY=<your_api_key>
    go run main.go

Call the Twilio phone number, leave a message, then hang up. In your terminal, you'll see output similar to the following:

Server is running on port 8080
Session started: 12345-abc-6789
connected to real-time transcription
twilio mediastream connected
twilio mediastream started
Hi. My name is John. I'm excited to share with you an exclusive offer that will slice your electric bill in half. Call me back.
twilio mediastream stopped
Session terminated: Session has been gracefully terminated.
disconnected from real-time transcription
Summary: You received a call from someone named John who is offering an exclusive deal to cut your electric bill in half. This has the characteristics of a potential scam.

Terminal output

Optimizing detection accuracy

Improve LLM Gateway detection accuracy by refining the prompt with specific instructions and examples:

  • Few-shot Examples: Add scam/legitimate call examples with expected outputs within the user message.
  • Specific Detection: In the system prompt, instruct the model to look for urgency phrases, PII requests, and unrealistic offers.
  • Context Persona: Change the system prompt's persona from a personal assistant helping an elderly person to, for example, a compliance officer or a security filter.

Iterating on your prompt is the fastest way to tune the system's performance for your specific needs.

Testing and validation strategies

Validate scam detection accuracy with a structured testing approach:

  1. Collect Samples: Record or find audio samples of both known scam calls and legitimate calls (e.g., appointment reminders, customer service inquiries).
  2. Create a Test Suite: Write a script that runs these audio files through your application and saves the LLM Gateway output for each one.
  3. Evaluate Results: Review the outputs. Is the system correctly identifying scams (true positives)? Is it incorrectly flagging legitimate calls (false positives)? Is it missing obvious scams (false negatives)?

Use the evaluation results to guide your prompt optimization. If you notice a pattern in the failures, adjust your prompt to address that specific weakness and run the test suite again.

Performance optimization for production

Scale beyond single-call processing using Go's goroutine model:

  • The net/http server handles each incoming connection in its own goroutine, so concurrent requests need no extra code.
  • Each /media WebSocket connection therefore runs in a separate goroutine for the duration of the call.
  • For horizontal scaling, deploy multiple instances behind a load balancer.

Because each call has its own goroutine, one long call doesn't block the server from accepting and processing new ones. As traffic grows beyond what a single instance can handle, run more instances of the application behind a load balancer to keep the service responsive.

Next steps and advanced implementations

You've now built a functional, real-time scam detection system in Go. By combining Twilio for telephony, our streaming speech-to-text for transcription, and the LLM Gateway for intelligent analysis, you have a powerful foundation for a production-ready Voice AI application.

From here, you could expand the application by:

  • Integrating the output with a database to log scam attempts.
  • Building a real-time dashboard to monitor call activity.
  • Using other AssemblyAI Speech Understanding models, like Sentiment Analysis, to add another layer of detection based on the caller's tone.

Now that you have a working prototype, the next step is to take it to production. To get started with your own application, try our API for free.

Complete code example

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
    "strings"

    "nhooyr.io/websocket"
    "nhooyr.io/websocket/wsjson"

    aai "github.com/AssemblyAI/assemblyai-go-sdk"
    "github.com/AssemblyAI/assemblyai-go-sdk/streaming/v3"
)

var apiKey = os.Getenv("ASSEMBLYAI_API_KEY")

func main() {
    http.HandleFunc("/", twilio)
    http.HandleFunc("/media", media)
    log.Println("Server is running on port 8080")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}

func twilio(w http.ResponseWriter, r *http.Request) {
    if r.Method != "POST" {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }
    twiML := `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Say>
        Speak to see your audio transcribed in the console.
      </Say>
      <Connect>
        <Stream url="%s"></Stream>
      </Connect>
    </Response>`
    w.Header().Add("Content-Type", "application/xml")
    fmt.Fprintln(w, fmt.Sprintf(twiML, "wss://"+r.Host+"/media"))
}

type TwilioMessage struct {
    Event string `json:"event"`
    Media struct {
        // Contains audio samples.
        Payload []byte `json:"payload"`
    } `json:"media"`
}

func media(w http.ResponseWriter, r *http.Request) {
    // Upgrade HTTP request to WebSocket.
    c, err := websocket.Accept(w, r, nil)
    if err != nil {
        log.Println("unable to upgrade connection to websocket:", err)
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    defer c.CloseNow()

    ctx, cancel := context.WithCancel(r.Context())
    defer cancel()

    var finalTranscripts []string

    client := aai.NewStreamingClient(apiKey)

    client.On(streaming.EventBegin, func(event *streaming.BeginEvent) {
        log.Printf("Session started: %s", event.ID)
    })

    client.On(streaming.EventTurn, func(event *streaming.TurnEvent) {
        if event.Transcript == "" {
            return
        }
        if event.EndOfTurn {
            // This is a final transcript for an utterance.
            fmt.Printf("%s\n", event.Transcript)
            finalTranscripts = append(finalTranscripts, event.Transcript)
        } else {
            // This is a partial transcript.
            fmt.Printf("%s\r", event.Transcript)
        }
    })

    client.On(streaming.EventTermination, func(event *streaming.TerminationEvent) {
        log.Printf("Session terminated: %s", event.Message)
    })

    client.On(streaming.EventError, func(event *streaming.ErrorEvent) {
        log.Printf("Error occurred: %s", event.Error)
    })

    err = client.Connect(
        ctx,
        streaming.StreamingParameters{
            // Twilio MediaStream sends audio in mu-law format.
            Encoding: streaming.AudioEncodingPCM_MULAW,
            // Twilio MediaStream sends audio at 8000 samples per second.
            SampleRate: 8000,
        },
    )

    if err != nil {
        log.Println("unable to connect to real-time transcription:", err)
        c.Close(websocket.StatusInternalError, err.Error())
        return
    }
    log.Println("connected to real-time transcription")

    defer client.Disconnect()

    for {
        var message TwilioMessage
        err = wsjson.Read(ctx, c, &message)
        if err != nil {
            log.Println("unable to read twilio message:", err)
            c.Close(websocket.StatusInternalError, err.Error())
            return
        }

        switch message.Event {
        case "connected":
            log.Println("twilio mediastream connected")
        case "start":
            log.Println("twilio mediastream started")
        case "media":
            if err := client.SendAudio(ctx, message.Media.Payload); err != nil {
                log.Println("unable to send audio for real-time transcription:", err)
                c.Close(websocket.StatusInternalError, err.Error())
                return
            }
        case "stop":
            log.Println("twilio mediastream stopped")
            // Disconnect the client first to ensure all final transcripts are processed.
            client.Disconnect()
            log.Println("disconnected from real-time transcription")

            // Now, summarize the full transcript.
            summary, err := summarize(ctx, strings.Join(finalTranscripts, "\n"))
            if err != nil {
                log.Println("unable to summarize call:", err)
                c.Close(websocket.StatusInternalError, "Failed to summarize call.")
                return
            }
            log.Println("Summary:", summary)

            c.Close(websocket.StatusNormalClosure, "")
            return
        }
    }
}

func summarize(ctx context.Context, transcriptText string) (string, error) {
    // Define the prompt structure for the LLM Gateway.
    systemPrompt := "You're a personal assistant helping an elderly person screen for scam calls. Provide a one-sentence summary of the call in the second person, followed by a brief assessment of whether it's a scam call. Do not provide a preamble."
    userPrompt := fmt.Sprintf("Here is the call transcript:\n%s", transcriptText)

    // Prepare the request payload for the LLM Gateway.
    payload := map[string]interface{}{
        "model": "claude-3-5-haiku-20241022", // A fast and capable model for this task.
        "messages": []map[string]string{
            {"role": "system", "content": systemPrompt},
            {"role": "user", "content": userPrompt},
        },
        "max_tokens":  150,
        "temperature": 0.3,
    }

    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        return "", fmt.Errorf("failed to marshal payload: %w", err)
    }

    // Create and send the HTTP request.
    req, err := http.NewRequestWithContext(ctx, "POST", "https://llm-gateway.assemblyai.com/v1/chat/completions", bytes.NewReader(payloadBytes))
    if err != nil {
        return "", fmt.Errorf("failed to create request: %w", err)
    }
    req.Header.Set("Authorization", apiKey)
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return "", fmt.Errorf("failed to send request: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("api request failed with status: %s", resp.Status)
    }

    // Decode the response.
    var result struct {
        Choices []struct {
            Message struct {
                Content string `json:"content"`
            } `json:"message"`
        } `json:"choices"`
    }

    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return "", fmt.Errorf("failed to decode response: %w", err)
    }

    if len(result.Choices) == 0 {
        return "", fmt.Errorf("no response choices returned")
    }

    return result.Choices[0].Message.Content, nil
}

Frequently asked questions about detecting scam calls with Go

How do I handle multiple concurrent calls efficiently?

Go's net/http server handles each incoming connection in its own goroutine, so every /media WebSocket connection is processed concurrently without architectural changes.

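What audio quality settings work best for scam detection accuracy?

Twilio MediaStream sends audio as mu-law (PCMU) at an 8000Hz sample rate, so configure the streaming session with the same encoding and sample rate. A mismatch between the audio you send and the settings you declare will degrade or break transcription.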

How can I customize the LLM Gateway prompt to catch specific scam patterns?

Add specific detection instructions to the system message in your prompt, for example: Pay close attention to phrases related to tax debt, overdue bills, or limited-time financial offers.

What security measures should I implement when processing call data?

Use WSS for data in transit and PII Redaction for stored transcripts. Implement encryption at rest and strict access controls for stored data.

How do I debug WebSocket connection issues with Twilio MediaStream?

Start by checking your ngrok tunnel is active and the forwarding URL is correct in your Twilio configuration. Add detailed logging to the media handler to track connection states. Common issues include incorrect audio encoding settings, expired ngrok sessions, or firewall restrictions blocking WebSocket connections.
