Real-time voice triage with Twilio, Claude, and ConnectWise

The MSP after-hours problem isn't hard to describe: a client calls at 11 PM, the on-call tech's phone rings, and 70% of the time it's a password reset that could have waited until morning. After enough of those calls, the tech starts not picking up. The client has a bad experience. The MSP loses a tech.

The fix is triage: classify the call as P1 (server down, production is on fire) or P3 (forgot my password) before it reaches a human. Wake the human for P1. Log everything else for next-business-day.

The interesting engineering question is when to triage. You could do it after the call ends: record → transcribe → classify. But that's too late for P1s — you've already hung up on the client who needed help.

NightDesk triages during the call, one turn at a time.

The triage kernel

The core logic is a single function:

def run_turn(turns: list[Turn], runbook: Runbook, llm: LLMInvoker) -> Decision:
    ...

turns is the conversation so far. runbook is the per-customer context (what does "normal" look like for this customer, who to wake for P1s, what their business hours are). llm is the Claude Haiku callable.

The kernel returns a Decision with one of three actions:

GATHER — ask the caller another question. The text in the decision is what the agent says out loud.
RESOLVE — issue is handled; write a ticket, hang up cleanly.
ESCALATE — wake the on-call tech; write an open ticket; stay on the line.
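
Here's a minimal sketch of the shapes involved, to make the three actions concrete. It is not the NightDesk source: the field names on Turn and Decision, the "caller"/"agent" speaker labels, the LLMInvoker signature, and the is_terminal helper are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    GATHER = "GATHER"      # ask the caller another question
    RESOLVE = "RESOLVE"    # write a ticket, hang up cleanly
    ESCALATE = "ESCALATE"  # wake the on-call tech, open a ticket


@dataclass(frozen=True)
class Turn:
    speaker: str  # assumed labels: "caller" or "agent"
    text: str


@dataclass(frozen=True)
class Decision:
    action: Action
    text: str  # what the agent says out loud


# Assumed shape: the LLM dependency takes a prompt and returns raw text.
LLMInvoker = Callable[[str], str]


def is_terminal(decision: Decision) -> bool:
    """GATHER keeps the conversation going; the other two actions end triage."""
    return decision.action is not Action.GATHER
```

Keeping Decision immutable means a webhook handler can log or persist it without worrying that later code changed it.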

The bias is toward ESCALATE on uncertainty. A false positive is a cheap mistake: a tech gets woken up for a P2. A false negative, a P1 that never reaches a human, can cost the MSP the customer.

# From the prompt passed to Haiku on each turn:
#
# You are a phone-based triage agent for an MSP. Based on the
# conversation so far and the customer runbook, decide:
#
# - GATHER: you need more information. Include a short question.
# - RESOLVE: the issue can wait or is already addressed.
# - ESCALATE: this warrants waking the on-call engineer immediately.
#
# When uncertain, escalate. A false positive wakes a tech for 5 minutes.
# A false negative leaves a client's server down overnight.
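
The same bias can be enforced in code when the model's reply is parsed. The sketch below assumes the reply format is "ACTION | text" (the shape the test stub further down uses); parse_reply and FALLBACK_TEXT are hypothetical names. Anything malformed is treated as ESCALATE rather than guessed at.

```python
from enum import Enum


class Action(Enum):
    GATHER = "GATHER"
    RESOLVE = "RESOLVE"
    ESCALATE = "ESCALATE"


# Hypothetical line spoken when the reply cannot be parsed.
FALLBACK_TEXT = "I'm getting an engineer on the line for you now."


def parse_reply(raw: str) -> tuple[Action, str]:
    """Parse 'ACTION | text'; anything malformed becomes ESCALATE."""
    head, sep, tail = raw.partition("|")
    name = head.strip().upper()
    text = tail.strip()
    if not sep or name not in Action.__members__:
        return Action.ESCALATE, FALLBACK_TEXT
    if Action[name] is Action.GATHER and not text:
        # A GATHER with no question would leave the caller in silence.
        return Action.ESCALATE, FALLBACK_TEXT
    return Action[name], text
```

The prompt asks the model to escalate when it is uncertain; this makes the parser do the same when the model's output is the uncertain part.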

The Twilio integration

Twilio handles the actual phone call and posts webhook events to the Lambda:

POST /twilio/incoming — new inbound call. Return TwiML to greet and prompt.
POST /twilio/gather — caller finished speaking. Body contains the transcribed speech. Return TwiML based on the kernel's decision: speak the GATHER text, or read a RESOLVE/ESCALATE closing message.
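
As a rough sketch of the second webhook's response, a decision can be rendered to TwiML with the standard library alone. Gather, Say, and Hangup are real TwiML verbs; the twiml_for helper and its string-based action argument are assumptions, and how an ESCALATE call is kept on the line is not covered here.

```python
from xml.etree import ElementTree as ET


def twiml_for(action: str, text: str, gather_url: str = "/twilio/gather") -> str:
    """Render a kernel decision as a TwiML document (sketch)."""
    response = ET.Element("Response")
    if action == "GATHER":
        # Speak the question, then listen; Twilio POSTs the transcribed
        # speech to gather_url when the caller stops talking.
        gather = ET.SubElement(
            response, "Gather", input="speech", action=gather_url, method="POST"
        )
        ET.SubElement(gather, "Say").text = text
    else:
        # RESOLVE or ESCALATE: read the closing message.
        ET.SubElement(response, "Say").text = text
        if action == "RESOLVE":
            ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")
```

Building the XML through ElementTree rather than string formatting means a caller-derived phrase containing "&" or "<" is escaped automatically.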

Every call gets an in-flight record in DynamoDB: call SID, tenant, the running transcript, and the current kernel state. The Lambda holds no state between turns; the record is loaded from DynamoDB at the start of each webhook request.
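
The load, decide, save cycle for one webhook might look like the sketch below. The record fields follow the list above, but CallRecord, CallStore, InMemoryStore, and handle_gather are hypothetical names, and the in-memory store is only a stand-in for the DynamoDB table so the example runs without AWS.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class CallRecord:
    call_sid: str
    tenant: str
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    state: str = "GATHER"  # last kernel action


class CallStore(Protocol):
    def load(self, call_sid: str) -> CallRecord: ...
    def save(self, record: CallRecord) -> None: ...


class InMemoryStore:
    """Stand-in for the DynamoDB table, for local runs and tests."""

    def __init__(self) -> None:
        self._items: dict[str, CallRecord] = {}

    def load(self, call_sid: str) -> CallRecord:
        return self._items[call_sid]

    def save(self, record: CallRecord) -> None:
        self._items[record.call_sid] = record


Decide = Callable[[list[tuple[str, str]]], tuple[str, str]]


def handle_gather(store: CallStore, call_sid: str, speech: str, decide: Decide) -> str:
    """One webhook: load state, add the caller turn, decide, persist, reply."""
    record = store.load(call_sid)
    record.transcript.append(("caller", speech))
    action, text = decide(record.transcript)
    record.transcript.append(("agent", text))
    record.state = action
    store.save(record)
    return text
```

Because the handler only sees the CallStore interface, a DynamoDB-backed class with the same two methods can replace the in-memory one without touching the handler.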

The Twilio integration is shallow by design. Any other VoIP provider that can POST a webhook and accept TwiML-equivalent instructions can slot in. The triage kernel has no knowledge of Twilio.

The per-customer runbook

This is what makes the difference between generic AI responses and actually useful triage. Each MSP customer has a YAML runbook:

customer_name: "Acme Corp"
business_hours: "Mon-Fri 8a-6p CT"
known_issues:
  - "WiFi at main office drops intermittently — known, scheduled for Thursday"
  - "Backup server ran slow last week — being monitored"
escalation_rules:
  - trigger: "server down OR can't access email OR everyone locked out"
    target: "mike@acmemsp.com"
    method: "sms"
  - trigger: "slow OR cannot print OR password"
    target: null
    method: "noop"  # no escalation — morning ticket only
p1_threshold: "production is blocked for more than 1 person"

The runbook gets injected into every Haiku prompt for this customer's calls. "WiFi at main office drops intermittently" is context the triage agent needs — without it, a caller saying "the WiFi is slow" would be ambiguous.
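
One way the injection could work, assuming the YAML has already been parsed into a plain dict by whatever YAML library is in use: flatten it into a labeled text block that gets appended to the prompt. render_runbook and the exact wording of the labels are hypothetical.

```python
def render_runbook(runbook: dict) -> str:
    """Flatten a parsed runbook into a text block for the prompt (sketch)."""
    lines = [
        f"Customer: {runbook['customer_name']}",
        f"Business hours: {runbook['business_hours']}",
        f"P1 threshold: {runbook['p1_threshold']}",
        "Known issues:",
    ]
    lines += [f"- {issue}" for issue in runbook.get("known_issues", [])]
    lines.append("Escalation rules:")
    for rule in runbook.get("escalation_rules", []):
        if rule.get("target"):
            outcome = f"{rule['method']} {rule['target']}"
        else:
            outcome = "no escalation, morning ticket only"
        lines.append(f"- if {rule['trigger']}: {outcome}")
    return "\n".join(lines)


# Trimmed copy of the runbook above, as it would look after parsing.
ACME = {
    "customer_name": "Acme Corp",
    "business_hours": "Mon-Fri 8a-6p CT",
    "known_issues": ["WiFi at main office drops intermittently"],
    "escalation_rules": [
        {"trigger": "server down OR can't access email", "target": "mike@acmemsp.com", "method": "sms"},
        {"trigger": "slow OR cannot print OR password", "target": None, "method": "noop"},
    ],
    "p1_threshold": "production is blocked for more than 1 person",
}

print(render_runbook(ACME))
```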

MSPs fill these out during onboarding. The format is intentionally human-readable YAML so the MSP owner can edit it directly without a UI.

The CW ticket

Whether RESOLVE or ESCALATE, a ConnectWise ticket gets created via the CW Manage REST API. The ticket includes:

Summary: first 120 characters of the caller's opening statement
Body: full call transcript + Haiku's classification reasoning + call metadata
Priority: P1 (critical), P2 (high), P3 (normal)
Board: the MSP's after-hours board (configured per tenant)

For ESCALATE, the ticket is created and left open; for RESOLVE, it's created and immediately set to "closed with notes." In the morning triage view, the on-call tech sees open tickets only for the calls that were escalated.
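
The field rules above can be sketched as a small pure function that runs before any API call. TicketDraft and draft_ticket are hypothetical names; mapping the draft onto ConnectWise's actual ticket schema is deliberately left out, and call metadata is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketDraft:
    summary: str
    body: str
    priority: str  # "P1", "P2", or "P3"
    board: str
    status: str    # "open" or "closed with notes"


def draft_ticket(
    action: str,
    priority: str,
    transcript: list[tuple[str, str]],
    reasoning: str,
    board: str,
) -> TicketDraft:
    """Assemble ticket fields from a finished call (sketch)."""
    # The caller's opening statement is their first turn in the transcript.
    opening = next((text for speaker, text in transcript if speaker == "caller"), "")
    body = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
    body += f"\n\nClassification reasoning: {reasoning}"
    return TicketDraft(
        summary=opening[:120],
        body=body,
        priority=priority,
        board=board,
        status="open" if action == "ESCALATE" else "closed with notes",
    )
```

Keeping this step free of network calls makes the truncation and status rules easy to unit test.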

The dashboard

There's a /dashboard route that renders the last 50 calls: timestamp, tenant, priority badge, first caller turn (so the on-call tech can see what the issue was), and ticket link. Protected by a bearer token.
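
A sketch of the two pieces of logic behind that route: the bearer-token check and the "last 50" selection. The function names and the dict shape of a call row are assumptions, and timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    scheme, _, token = (authorization_header or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    return hmac.compare_digest(token.encode(), expected_token.encode())


def latest_calls(calls: list[dict], limit: int = 50) -> list[dict]:
    """Newest first, capped at `limit` rows."""
    return sorted(calls, key=lambda c: c["timestamp"], reverse=True)[:limit]
```

hmac.compare_digest is used instead of == so the comparison time does not leak how much of the token matched.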

Not a SIEM. Just enough context for a night manager or the MSP owner to know "what happened overnight" without opening ConnectWise.

Testing a triage kernel

The kernel is the part worth testing carefully. All of its side effects (Haiku calls, CW API, paging) are injected via dependencies, so tests use fakes:

def test_p1_escalates_immediately(self):
    llm_stub = LLMStub(responses=["ESCALATE | Our servers are down."])
    runbook = fixtures.ACME_RUNBOOK
    turns = [Turn("caller", "Our email server is down and nobody can get in.")]
    decision = run_turn(turns, runbook, llm=llm_stub)
    self.assertEqual(decision.action, Action.ESCALATE)

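The scripted-conversation tests run an entire multi-turn conversation with a stubbed LLM and verify the kernel reaches the right terminal action. These tests catch regressions in the kernel's own logic: if a change to reply parsing or turn handling causes GATHER to fire when ESCALATE was expected, the test catches it before it reaches a real call. Because the LLM is stubbed, they check how the kernel handles a given model reply, not what a rewritten Haiku prompt will make the model say.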
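
A scripted-conversation test could be structured roughly like this. To keep the example self-contained, toy_run_turn is a stand-in for the real kernel, and run_script plus the "ACTION | text" reply format are assumptions; only the LLMStub(responses=[...]) shape comes from the test above.

```python
class LLMStub:
    """Returns scripted replies in order, ignoring the prompt."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def __call__(self, prompt: str) -> str:
        return next(self._responses)


def toy_run_turn(turns, llm):
    """Stand-in kernel: call the LLM and parse 'ACTION | text'."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    action, _, text = llm(transcript).partition("|")
    return action.strip(), text.strip()


def run_script(caller_lines, llm, max_turns=10):
    """Feed caller lines one at a time until a terminal action is reached."""
    turns = []
    for line in caller_lines[:max_turns]:
        turns.append(("caller", line))
        action, text = toy_run_turn(turns, llm)
        turns.append(("agent", text))
        if action != "GATHER":
            return action, turns
    # The script ran out without a terminal decision: escalate by default.
    return "ESCALATE", turns


stub = LLMStub(responses=[
    "GATHER | Is anyone else affected?",
    "ESCALATE | Paging the on-call engineer now.",
])
final_action, turns = run_script(
    ["Email is acting up.", "Yes, the whole office is locked out."], stub
)
assert final_action == "ESCALATE"
assert len(turns) == 4  # two caller turns, two agent turns
```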

What I'd build next

Pre-call context injection: when a caller's number matches a known contact in CW, retrieve their company's recent ticket history and inject it into the runbook before the first turn. "This caller's company has had three P1s in the past 90 days, two involving their cloud services." That changes the escalation threshold — companies with a recent incident history get a lower bar.

The data is already in CW. It's an API call during the initial webhook. Low effort, high signal.
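
A sketch of the summarizing step, assuming the ticket history has already been fetched and reduced to dicts with a priority and an opened date. history_note and that dict shape are hypothetical, and the CW lookup itself is not shown.

```python
from datetime import date, timedelta


def history_note(tickets: list[dict], today: date, window_days: int = 90) -> str:
    """Summarize recent P1 history as one sentence for the runbook context."""
    cutoff = today - timedelta(days=window_days)
    recent_p1s = [
        t for t in tickets if t["priority"] == "P1" and t["opened"] >= cutoff
    ]
    if not recent_p1s:
        return ""  # nothing to inject
    return (
        f"This caller's company has had {len(recent_p1s)} P1 ticket(s) "
        f"in the past {window_days} days."
    )
```

Returning an empty string when there is no recent history keeps the prompt unchanged for most callers.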

The code is in the NightDesk repo. The pilot pricing is $199/month for up to 500 endpoints.

— Chester