Replacing a binary threshold cliff with a one-hour ramp
The inactivity-monitor cron on a side-project safety app was paging me at 7:01 AM, every morning, with "you haven't checked in." I was asleep. The cause was a hard cliff in the threshold function: at 07:00 sharp, the "acceptable gap since last activity" dropped from ten hours to two. Seven hours of overnight quiet — a perfectly normal night — was suddenly five hours over budget.
What was happening
// before
if ($inSleepWindow) {
    $threshold = $sleepInactivityThresholdHours; // 10
} else {
    $threshold = $inactivityThresholdHours; // 2
}
At 06:59, $inSleepWindow was true, the threshold was 10 hours, and a gap of 7 hours was fine. At 07:00, the same gap was five hours over the active-hours threshold and the alert fired.
The user (me) had not moved or done anything different. The math just changed under me.
What I found
This is a domain modeling problem masquerading as a code problem. A real user's morning is not a step function. They wake up gradually, fumble for a phone, eventually do something the system can observe. The threshold ought to relax into "active" gradually, the same way humans do.
The fix
Linear ramp over the first hour past wake time:
// $nowHour must be fractional hours (07:30 is 7.5) for the ramp to take intermediate values
$hoursIntoMorning = $nowHour - $sleepEndHour;
if ($hoursIntoMorning < 0) {
    // still in sleep window
    $threshold = $sleepInactivityThresholdHours;
} elseif ($hoursIntoMorning < 1.0) {
    // ramp 10h -> 2h over the first hour after wake
    $t = $hoursIntoMorning;
    $threshold = $sleepInactivityThresholdHours
        + $t * ($inactivityThresholdHours - $sleepInactivityThresholdHours);
} else {
    $threshold = $inactivityThresholdHours;
}
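At 07:00 the threshold is still 10h. At 07:30 it's 6h. At 08:00 it's 2h, fully into active-hours rules. The alert now fires only when the gap exceeds the shrinking threshold: overnight quiet that began after 01:30 no longer fires before 07:30, while the 7-hour gap from the opening example crosses about 20 minutes into the ramp, at 07:20, where gap and threshold are both 7h20m. Quiet that continues past 08:00 still fires under the normal 2h rule.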
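To sanity-check the ramp values, here is a minimal sketch that wraps the same branch logic in a function and evaluates it at a few clock times. The name rampedThreshold and its default arguments are assumptions made for illustration, not the app's actual code; the 10h and 2h thresholds and the 07:00 wake time come from the example above.

```php
<?php
// Hypothetical helper: the same three branches as the fix, wrapped so the
// threshold can be evaluated at arbitrary times.
// $nowHour is fractional hours since midnight (07:30 is 7.5).
function rampedThreshold(
    float $nowHour,
    float $sleepEndHour = 7.0,
    float $sleepHours = 10.0,
    float $activeHours = 2.0
): float {
    $hoursIntoMorning = $nowHour - $sleepEndHour;
    if ($hoursIntoMorning < 0) {
        return $sleepHours;               // still in sleep window
    }
    if ($hoursIntoMorning < 1.0) {
        // linear ramp from the sleep threshold down to the active threshold
        return $sleepHours + $hoursIntoMorning * ($activeHours - $sleepHours);
    }
    return $activeHours;                  // fully into active hours
}

// Print the threshold at a few points around the ramp.
foreach ([6.5, 7.0, 7.25, 7.5, 8.0] as $hour) {
    printf("%.2f -> %.1fh\n", $hour, rampedThreshold($hour));
}
```

Under these assumptions the function gives 10h up to and including 07:00, 8h at 07:15, 6h at 07:30, and 2h from 08:00 onward.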
I also bumped the sleep-window learner's minimum window from 4 hours to 7 hours, because the learner had been producing unrealistically narrow sleep windows (1am-5am for me) and the cliff-at-the-edge problem was being amplified by the bad window.
What I'd do differently
Step-function thresholds in any time-of-day-aware system are a smell. Almost any state that depends on "what time is it" has fuzzy edges in reality and benefits from a ramp.
The general pattern: if you're tempted to write "if (timeNow > someBoundary) thresholdA else thresholdB", ask "is there a real-world reason this should be discontinuous?" Usually the answer is no. Replace it with a ramp.
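One way to make that replacement cheap is a small reusable helper. The sketch below is an assumption about how such a helper might look, not something from the app: the function name ramp and its parameter names are hypothetical. It interpolates linearly between two values across a boundary interval and holds flat on either side.

```php
<?php
// Hypothetical generic helper: move linearly from $fromValue to $toValue
// as $x goes from $start to $end, and stay constant outside that range.
function ramp(float $x, float $start, float $end, float $fromValue, float $toValue): float
{
    if ($end <= $start) {
        throw new InvalidArgumentException('ramp end must be after ramp start');
    }
    // Progress through the ramp, clamped to [0, 1].
    $t = max(0.0, min(1.0, ($x - $start) / ($end - $start)));
    return $fromValue + $t * ($toValue - $fromValue);
}

// The 10h -> 2h morning ramp from 07:00 to 08:00, evaluated at 07:30.
$threshold = ramp(7.5, 7.0, 8.0, 10.0, 2.0); // 6.0
```

The clamp is what turns the three-branch if/elseif/else into a single expression; the width of the ramp becomes a parameter instead of a hard-coded 1.0.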
Bonus side effect: the ramp is debuggable. When the alert fires mid-ramp I can compute the threshold value at that moment and explain to myself exactly why this gap was over budget. The cliff version was either "below threshold" or "way over threshold" with no useful gradient between them.
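As a rough illustration of that kind of explanation, the sketch below returns a one-line summary of the threshold in force and how far the gap is from it. The function explainAlert and its message format are hypothetical, invented for this example; the thresholds and wake time are the ones used throughout the post.

```php
<?php
// Hypothetical debugging aid: describe the threshold at a given moment and
// how far the observed gap is over or under it.
// Hours are fractional (7.75 is 07:45).
function explainAlert(
    float $nowHour,
    float $gapHours,
    float $sleepEndHour = 7.0,
    float $sleepHours = 10.0,
    float $activeHours = 2.0
): string {
    // Progress through the one-hour ramp, clamped to [0, 1].
    $t = max(0.0, min(1.0, $nowHour - $sleepEndHour));
    $threshold = $sleepHours + $t * ($activeHours - $sleepHours);
    $margin = $gapHours - $threshold;

    return sprintf(
        'threshold %.2fh (%d%% through ramp), gap %.2fh, %s by %.2fh',
        $threshold,
        (int) round($t * 100),
        $gapHours,
        $margin > 0 ? 'over' : 'under',
        abs($margin)
    );
}

echo explainAlert(7.75, 6.0), "\n";
// prints: threshold 4.00h (75% through ramp), gap 6.00h, over by 2.00h
```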