NightDesk: voice triage for MSP after-hours calls

At 2 AM, a client calls your on-call number. The problem: their VPN is down. The fix: reboot the router. The cost: an engineer who now can't get back to sleep until 4.

This happens constantly at managed service providers. Some MSPs use answering services — a real human takes the call, reads from a script, maybe escalates. Most of those humans have no technical knowledge, so they escalate everything. Others just let the call ring through to on-call, which means someone gets woken up for every problem, trivial or not.

I built NightDesk to handle the first layer of that triage — the calls that have a scripted answer and don't need a person.

The architecture

The call flow is: Twilio Voice → Lambda → Claude Haiku → ConnectWise (for ticket creation).

Why Haiku and not Sonnet or Opus? Latency. Twilio's <Gather> verb has a configurable timeout — after the caller finishes talking, you have a window to return the next TwiML instruction before Twilio times out and hangs up or plays silence. That window is short. Haiku answers in under a second on warm Lambda. Sonnet or Opus would push you past 2–3 seconds, and 3 seconds of silence mid-call feels like a dropped connection.

The Lambda function is invoked on every <Gather> callback from Twilio. It receives the caller's SpeechResult (Twilio's STT transcription), pulls the conversation history from DynamoDB, appends the new turn, sends it to Haiku, parses the response, and returns TwiML.

The agent's response includes one of three decisions: GATHER (keep talking), RESOLVE (problem solved), or ESCALATE (need a human). GATHER loops the conversation. RESOLVE creates a closed CW ticket and plays a goodbye message. ESCALATE creates an open ticket and pages the on-call engineer via CW's paging integration — then tells the caller "someone will call you back shortly."

The hard part: stateless multi-turn over Twilio

This is the piece that took the most engineering work. Twilio's <Gather> verb is stateless — each user turn is a separate HTTP POST to your Lambda function. There's no persistent socket, no session cookie, no implicit context. Each request arrives cold.

The conversation history has to live somewhere durable that Lambda can reach in under 200ms. I used DynamoDB with the Twilio CallSid as the partition key. At the start of the call, Lambda creates a record. On each subsequent turn, it fetches the record, deserializes the message array, appends the new turn, calls Haiku, appends the assistant reply, and writes the record back.

The message array gets passed to Claude in the standard messages format. The system prompt is loaded fresh on each turn (more on that below). This means every call to Claude has the full conversation context, which is important — Haiku needs to know whether the caller already said their printer was offline three turns ago.

One edge case: if the Lambda invocation takes too long and Twilio's <Gather> times out, the call falls through to a fallback TwiML route that plays "let me connect you to someone" and escalates. Better to escalate cleanly than to leave the caller in silence.

Runbook injection: no fine-tuning needed

Each MSP tenant uploads a short plaintext runbook — their standard responses, escalation criteria, and any terminology specific to their clients. A typical runbook is 200–400 words. Something like:

If caller reports VPN issues: ask them to reboot the Meraki MX. If that doesn't resolve in 5 minutes, escalate.
If caller reports email down: check if it's Outlook-specific. If yes, ask them to clear Outlook profile. If all mail, escalate immediately.
...

At the start of each call, Lambda fetches the tenant's runbook from S3 and prepends it to Claude's system prompt. This is what lets the agent give reasonable, tenant-specific answers without any model fine-tuning. The knowledge is in the runbook, not the model weights.

The tradeoff: the agent is only as good as the runbook. MSPs that write detailed runbooks get a much better triage experience than ones who upload three lines. This is actually a forcing function — it makes the MSP articulate their own standard procedures, which most have never written down formally.

Ticket creation

Both RESOLVE and ESCALATE decisions create CW tickets through the REST API. The ticket body includes the call transcript — the full conversation in plain text — so whoever opens the ticket in CW can see exactly what the caller said and what the agent told them.

RESOLVE tickets are created as closed with status "Resolved by NightDesk." ESCALATE tickets are open, assigned to the on-call resource, with status "In Progress." The on-call paging goes through CW's schedule API — NightDesk pulls the current on-call contact and triggers an alert.

This means nothing gets lost. Even if the agent handles the call end-to-end, there's a paper trail in CW. MSPs are already working out of CW tickets, so the agent flows into their existing workflow rather than creating a parallel system.

Where it is now

NightDesk is in pilot-only mode. The infrastructure is built and tested, but I'm waiting for the first real MSP to go live with it.

The skeptic's question I keep getting: "Will callers accept talking to a bot?" I think at 2 AM, most callers would rather have a bot give them the VPN reboot instructions right now than wait 20 minutes for a callback. But the first real call will tell me more than any amount of speculation.

If you run an MSP and want to be the pilot customer, I'd like to hear from you.