The MSP after-hours problem: why every solution you've tried is wrong

If you run an MSP, you've probably solved the after-hours call problem. And you've probably re-solved it a few times.

Here's how the cycle usually goes.

Iteration 1: Cell phones

In the beginning, you gave your cell number to your best clients. They called when things broke. You answered. You were the safety net.

This worked great until it didn't. "I sleep with my phone volume on because of clients" is a badge of honor in the MSP world right up until it isn't. Until the call at 11:30pm about a printer problem. Until your family asks why you're always at the table but never actually there.

The on-call rotation was supposed to fix this. It didn't fix it — it distributed it. Now it's not just you not sleeping, it's your whole senior team rotating through not sleeping.

Iteration 2: The answering service

An answering service sounds like the answer. Real humans, available 24/7, taking messages and passing them along to whoever's on call.

The problem: answering service operators aren't IT literate. They take a message. They don't know if "the server is down" means the whole company is offline or a user's desktop is slow. They don't know if the caller is your anchor client or a chronic complainer. They page the on-call tech for everything above a certain stated urgency, and what clears that bar is determined by a script that doesn't know your runbook.

You end up with one of two outcomes:

Too many pages: The answering service escalates everything and your on-call tech gets woken up for P4 issues anyway
Too few pages: You configure the answering service to only escalate on "emergency" calls, and it misses a server-down because the panicked user said "something's wrong with the computer"

The answering service operator can't triage. They can only transcribe.

Iteration 3: The after-hours retainer

Some MSPs hire a part-time NOC or after-hours engineer on retainer. Someone who monitors for alerts and takes calls.

This works. It also costs $3,000-8,000/month for reasonable coverage. At that price point, it's only viable for MSPs above a certain revenue threshold — and even then, you're paying for a human to sit by a phone during hours when 80% of calls are P4 issues that don't need human judgment.

The retainer is the right solution for enterprise MSPs with high call volumes and true 24/7 SLA commitments. For the majority of MSPs — under 50 clients, mostly SMB — it's a lot of money for coverage that's usually idle.

What actually happens at 2am

I've talked to enough MSP owners to know what actually happens at 2am when a client calls.

Scenario A (most common): The call goes to voicemail. The client panics. They text an emergency number. The on-call tech wakes up. The on-call tech calls back. It turns out a user's Outlook is showing a sync error that they noticed when they woke up to use the bathroom. Ticket created. Everyone goes back to sleep. One hour of lost sleep for a P4.

Scenario B: The call goes to an answering service. The operator takes a message. "Server down — call immediately." On-call tech wakes up, calls client back. It's a single user whose VPN isn't connecting. P3, not a server issue. Another lost hour.

Scenario C (what you want): Something actually is down. Infrastructure-level. Multiple users affected. The on-call tech gets paged immediately with full context — which client, what they said, the severity classification, and a pre-created ticket in ConnectWise. They wake up knowing what the problem is before they pick up the phone.

The problem with the current solutions is that Scenarios A and B are indistinguishable from Scenario C until a human evaluates them. And that's what you're paying for: the human judgment to distinguish a P1 from a P4 at 2am.

That judgment doesn't have to be human.

What changes with AI triage

An AI that's trained on your runbook can make the same judgment call a good dispatcher makes:

"Outlook not syncing" → P4, log the ticket, schedule a callback, don't page anyone
"Server is completely down, no one can access files" → P1, page the on-call immediately with the transcript and ticket ID
"VPN isn't connecting" → P3, log the ticket, ask the caller if they want a callback in the morning or prefer to wait
"Our credit card processor is down and we have a line of customers" → P1, page immediately

The difference from an answering service: the AI has your runbook. It knows your escalation thresholds. It knows which clients are on P1 SLAs and which are standard. It creates a ticket in ConnectWise before it pages your engineer — so the engineer wakes up to a ticket with full context instead of a voicemail number and a vague description.
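As a rough illustration of the flow described above, here is a minimal Python sketch. It swaps a simple keyword table in for the AI model, so it is a stand-in and not how NightDesk actually classifies calls. The rule table, the `UNKNOWN` fallback, and the `create_ticket` / `page_on_call` callables are all hypothetical placeholders for your runbook, your ConnectWise integration, and your paging tool.

```python
from typing import Callable, NamedTuple


class Triage(NamedTuple):
    severity: str
    page: bool


# Hypothetical runbook rules, checked in order (most severe first).
RUNBOOK_RULES = [
    (("completely down", "no one can access", "processor is down"), Triage("P1", True)),
    (("vpn",), Triage("P3", False)),
    (("outlook", "not syncing"), Triage("P4", False)),
]


def triage(transcript: str) -> Triage:
    """Map what the caller said to a severity and a page/no-page decision."""
    text = transcript.lower()
    for phrases, result in RUNBOOK_RULES:
        if any(phrase in text for phrase in phrases):
            return result
    # Assumption: wording that no rule recognizes is escalated, not dropped.
    return Triage("UNKNOWN", True)


def handle_call(
    transcript: str,
    create_ticket: Callable[[str, str], str],
    page_on_call: Callable[[str, str, str], None],
) -> str:
    """Create the ticket first, then page only if the triage result says so."""
    result = triage(transcript)
    ticket_id = create_ticket(result.severity, transcript)
    if result.page:
        # The page carries the ticket ID, so the engineer wakes up with context.
        page_on_call(ticket_id, result.severity, transcript)
    return ticket_id
```

The ordering is the point: the ticket exists before anyone is paged, and a P3 or P4 call ends with a ticket and no page at all. Per-client SLA tiers are left out of this sketch; they would be one more input to the page decision.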

The difference from "just use a better answering service": the AI doesn't charge per call. At scale, that changes the margin math significantly compared with a human answering service.

The cost math

A typical MSP answering service charges $0.75-2.50 per minute for live agent time. At 100 after-hours calls/month averaging 4 minutes each, that's $300-1,000/month — and that's not including the page to the on-call engineer for the 20% that actually need it.

AI triage via Twilio + Lambda costs roughly:

Twilio number: $1.15/month
Twilio voice: ~$0.013/minute (roughly $5-20/month for 100 calls)
Lambda compute: cents per month
Total: $10-25/month for the same 100 calls
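To make the arithmetic easy to check against your own call volume, here is a small Python sketch using only the figures quoted above (100 calls, 4 minutes each, and the per-minute rates). The function and variable names are illustrative, and Lambda compute is left out because it is quoted only as "cents per month".

```python
CALLS_PER_MONTH = 100
MINUTES_PER_CALL = 4
TOTAL_MINUTES = CALLS_PER_MONTH * MINUTES_PER_CALL  # 400 minutes


def monthly_cost(rate_per_minute: float, fixed_fee: float = 0.0) -> float:
    """Monthly cost in dollars for a per-minute rate plus an optional fixed fee."""
    return round(fixed_fee + rate_per_minute * TOTAL_MINUTES, 2)


# Human answering service at the low and high per-minute rates.
answering_low = monthly_cost(0.75)
answering_high = monthly_cost(2.50)

# Twilio number rental plus per-minute voice charges.
twilio_total = monthly_cost(0.013, fixed_fee=1.15)

print(f"Answering service: ${answering_low:.2f} to ${answering_high:.2f}")
print(f"Twilio number + voice: ${twilio_total:.2f}")
```

At exactly 400 minutes the two Twilio line items come to about $6.35, so the $10-25 total quoted above appears to leave headroom for longer calls and compute. Change `CALLS_PER_MONTH` and `MINUTES_PER_CALL` to see where your own numbers land.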

At $199/month for NightDesk Solo, you're getting the AI triage + ticket creation + on-call paging + morning debrief + CSAT tracking + call quality scoring for a fraction of what a comparable human answering service costs.

The other way to think about the cost: what's one hour of your on-call engineer's sleep worth? If the AI handles the P4 calls that would have woken them up, you're paying $199/month to preserve that sleep. For most MSP owners, that math is obvious.

What the pilot looks like

The 90-day pilot for NightDesk is free. Here's what happens during those 90 days:

Week 1: You point a Twilio number at the NightDesk webhook. Upload your runbook (we have a template generator). Test with a few simulated calls. Watch the first real call come in and see what the AI does.

Weeks 2-4: Real calls start coming through. You watch the Slack debrief every morning. You see which calls were P4s that didn't need your engineer (but would have woken them at 2am under the old system). You tune the runbook based on what you see.

Months 2-3: Your on-call rotation relaxes. It doesn't disappear; the true P1s still page your engineer. But the 80% that don't need human judgment are now handled without waking anyone up.

End of pilot: You either see measurably fewer unnecessary wakeups (at which point $199/month is an obvious decision) or you don't (at which point you don't pay anything).
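One way to put a number on "measurably fewer unnecessary wakeups" is to tally the overnight call log. The Python sketch below assumes a hypothetical log format of (severity, paged) pairs and treats any page for a non-P1 call as unnecessary; both are assumptions for illustration, not the format of the actual debrief.

```python
from collections import Counter
from typing import Iterable, Tuple


def debrief(calls: Iterable[Tuple[str, bool]]) -> dict:
    """Summarize overnight calls given (severity, paged_engineer) pairs."""
    calls = list(calls)
    by_severity = Counter(severity for severity, _ in calls)
    pages = sum(1 for _, paged in calls if paged)
    # Assumption: a page for anything other than a P1 counts as unnecessary.
    unnecessary = sum(1 for severity, paged in calls if paged and severity != "P1")
    return {
        "by_severity": dict(by_severity),
        "pages": pages,
        "unnecessary_pages": unnecessary,
    }
```

Running this on a month of calls from before the pilot and a month from during it gives you a like-for-like count of unnecessary pages to compare.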

If you're running an MSP and the after-hours call problem sounds familiar, I'm looking for three more pilots. Reply here or reach out at chester@nightdesk.io.