A tiered mail classifier for cheap triage
Once all my mail was flowing through a Lambda, I wanted automatic triage — drop the obvious noise, bucket the rest, only forward the things I'll actually look at. I didn't want to pay a model to read every newsletter, so I went tiered.
The tiers
Tier 0 — deterministic rules. ~70-80% of volume. Sender domain
plus List-Id matching plus a couple of body heuristics. Routes
known-source mail (GitHub, AWS billing, vendor webhooks, etc.)
without ever touching an LLM. About 115 rules in a JSON file,
hand-curated, version-controlled.
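
A minimal sketch of what Tier 0 matching might look like. The rule schema (`sender_domain`, `list_id`, `body_contains`), the example rules, and their bucket assignments are all assumptions for illustration; the real JSON file is not shown here.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical rule shape; in practice this would be loaded from the JSON file.
RULES = [
    {"name": "github", "sender_domain": "github.com", "bucket": "automated"},
    {"name": "aws-billing", "sender_domain": "aws.amazon.com",
     "body_contains": "billing", "bucket": "important"},
    {"name": "some-list", "list_id": "announce.example.org", "bucket": "bulk"},
]

def domain_of(msg):
    """Return the lowercased domain of the From address, or '' if there is none."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.rpartition("@")[2].lower()

def match_rule(msg, body, rules=RULES):
    """Return the first rule whose conditions all hold, or None to fall through to Tier 1."""
    domain = domain_of(msg)
    list_id = str(msg.get("List-Id", "")).lower()
    text = body.lower()
    for rule in rules:
        want = rule.get("sender_domain")
        # Match the domain itself or any subdomain of it.
        if want and not (domain == want or domain.endswith("." + want)):
            continue
        if "list_id" in rule and rule["list_id"] not in list_id:
            continue
        if "body_contains" in rule and rule["body_contains"] not in text:
            continue
        return rule
    return None
```

Because the first match wins, rule order in the file matters: specific rules go first, catch-alls last.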
Tier 1 — Haiku classifier. The ambiguous middle. Subject + first
500 chars + sender metadata → {bucket, confidence, reason}. ~50
messages/day go through this tier — roughly $0.15/month at current
volume.
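
A sketch of the Tier 1 contract, leaving out the model call itself. The function names and input field names are hypothetical; only the 500-character excerpt and the {bucket, confidence, reason} shape come from the description above.

```python
import json

EXCERPT_CHARS = 500  # "first 500 chars" of the body

def build_tier1_input(subject, body, sender, list_id=None):
    """Assemble the small payload the classifier sees (field names are assumptions)."""
    return {
        "subject": subject,
        "body_excerpt": body[:EXCERPT_CHARS],
        "sender": sender,
        "list_id": list_id,
    }

def parse_tier1_output(raw):
    """Validate the model's JSON reply; return a clean dict, or None if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    bucket = data.get("bucket")
    conf = data.get("confidence")
    if not isinstance(bucket, str) or not bucket:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not 0 <= conf <= 1:
        return None
    return {"bucket": bucket, "confidence": float(conf), "reason": str(data.get("reason", ""))}
```

One reasonable choice is to treat a `None` result like a low-confidence answer and escalate it. On cost: ~50 messages/day is about 1,500 calls a month, so $0.15/month works out to roughly $0.0001 per message.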
Tier 2 — Sonnet, on demand only. Two cases:
1. Low confidence (<0.7) from Tier 1 escalates here.
2. Anything bucketed as "human, action required" gets a second pass
to extract the action, deadline, and a suggested reply draft.
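
The two escalation cases can be sketched as a single check. The function name and the returned reason strings are hypothetical; the 0.7 threshold and the bucket label come from the list above.

```python
ESCALATE_BELOW = 0.7                      # Tier 1 confidence threshold
ACTION_BUCKET = "human, action required"  # bucket that always gets a second pass

def needs_tier2(result):
    """Return why a Tier 1 result should go to Tier 2, or None if it should not."""
    if result["confidence"] < ESCALATE_BELOW:
        return "low_confidence"
    if result["bucket"] == ACTION_BUCKET:
        return "extract_action"
    return None
```

Note that a confidence of exactly 0.7 does not escalate, matching the strict "<0.7" condition.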
The buckets
Five output buckets and three actions:
- inbox: default forward
- important: forward, prepend [IMPORTANT] to subject
- review: forward, prepend [REVIEW]
- bulk: forward, prepend [BULK]
- automated: forward, prepend [AUTOMATED]
And three drop targets that bypass the forward entirely — cron noise
from my own homelab, bandwidth reports, DMARC aggregate XML. Forwarded
mail is stamped with X-Mail-Classifier-* headers on the way out so
Gmail can route on them.
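
A sketch of the stamping step using the standard-library `email` package. The specific header names `X-Mail-Classifier-Bucket` and `X-Mail-Classifier-Rule` are assumptions; only the `X-Mail-Classifier-*` prefix and the subject tags come from the text above.

```python
from email.message import EmailMessage

# Subject tags per bucket; "inbox" forwards unchanged.
PREFIXES = {
    "important": "[IMPORTANT]",
    "review": "[REVIEW]",
    "bulk": "[BULK]",
    "automated": "[AUTOMATED]",
}

def stamp(msg, bucket, rule_name):
    """Add classifier headers and, where the bucket calls for it, a subject tag."""
    # Delete first so re-stamping replaces rather than duplicates headers.
    del msg["X-Mail-Classifier-Bucket"]
    del msg["X-Mail-Classifier-Rule"]
    msg["X-Mail-Classifier-Bucket"] = bucket   # hypothetical header name
    msg["X-Mail-Classifier-Rule"] = rule_name  # hypothetical header name

    prefix = PREFIXES.get(bucket)
    if prefix:
        subject = str(msg.get("Subject", ""))
        if not subject.startswith(prefix):
            del msg["Subject"]
            msg["Subject"] = f"{prefix} {subject}".strip()
    return msg
```

Routing on headers rather than on the subject tag keeps Gmail filters working even if the tag wording changes later.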
The aggressive-drops audit
After a few days I ran an audit on what the classifier was dropping and found a problem: about 58% of "dropped" mail (636 of ~1,090) was mis-categorized.
The three biggest false positives:
- self-loop (~590 messages): Anything from me@my-own-domain. I'd assumed this was test noise. Turned out it was things like 401(k) withdrawal confirmations, trip itineraries, and payment confirmations, forwarded from other places where I use that address. Reclassified to review.
- mailer-daemon bounces (~16 messages): Actual delivery failure notifications for my own outbound mail. Important. Split the rule: keep dropping postmaster@, forward mailer-daemon@ as review.
- parker-daemon (~22 messages): I'd assumed parker@<domain> was a cron user. Turned out to be a real person who happens to send agent-style work updates. Reclassified to review.
The genuinely-droppable stuff — Proxmox backup status mail, daily bandwidth reports, DMARC aggregate XML — was correctly identified. ~5.6% drop rate in the corrected ruleset, which feels honest.
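
An audit like this boils down to a per-rule tally of reviewed drops. A small sketch, assuming each dropped message has been hand-labeled as a correct or incorrect drop; the input shape and function name are hypothetical.

```python
from collections import Counter

def audit_drops(reviewed):
    """Tally reviewed drops per rule.

    `reviewed` is an iterable of (rule_name, was_correct) pairs.
    Returns (rule_name, wrong, total) tuples, worst offenders first.
    """
    total = Counter()
    wrong = Counter()
    for rule, was_correct in reviewed:
        total[rule] += 1
        if not was_correct:
            wrong[rule] += 1
    return sorted(
        ((rule, wrong[rule], total[rule]) for rule in total),
        key=lambda row: row[1],
        reverse=True,
    )
```

Summing the `wrong` column over the `total` column gives the headline figure; with the numbers above, 636 / 1,090 is about 58%.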
The unsubscribe sweep
Two simple Tier 0 rules added later picked up the long tail of newsletters automatically:
- has-list-unsubscribe-header: if the message has an RFC 2369 List-Unsubscribe: header, it's bulk mail by definition.
- body-contains-unsubscribe: case-insensitive search for "unsubscribe" in the first 8 KB of body.
Both rules ordered last so they don't override an important match
from earlier in the chain. Dry-run found ~30 historical matches; in
production the rules auto-classified about 1,500 unmatched senders
to [BULK] over the following week.
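
The two rules are small enough to sketch directly. This assumes the body is available as raw bytes; the `tail_bucket` helper and its `earlier_match` parameter are hypothetical, standing in for "ordered last in the chain".

```python
from email.message import EmailMessage

BODY_SCAN_BYTES = 8 * 1024  # only look at the first 8 KB of the body

def has_list_unsubscribe_header(msg):
    """True if the message carries an RFC 2369 List-Unsubscribe header."""
    return msg.get("List-Unsubscribe") is not None

def body_contains_unsubscribe(body):
    """Case-insensitive search for 'unsubscribe' in the first 8 KB of a bytes body."""
    return b"unsubscribe" in body[:BODY_SCAN_BYTES].lower()

def tail_bucket(msg, body, earlier_match=None):
    """Apply the two rules last: an earlier match always wins."""
    if earlier_match is not None:
        return earlier_match
    if has_list_unsubscribe_header(msg) or body_contains_unsubscribe(body):
        return "bulk"
    return None
```

Capping the scan at 8 KB keeps the check cheap on large HTML mail, at the price of missing footers that sit beyond the cutoff.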
What I'd do differently
I'd build the audit-the-drops report first, not third. The whole
point of a tiered classifier is that the cheap deterministic tier
takes most of the volume, so the cost of getting that tier wrong is
high. I had 600+ messages mislabeled before I bothered to look. Now
the rule is: any drop-action rule has to ship with a dry-run that
shows the last N matches in human-readable form for me to approve
before it goes live.
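
A sketch of what such a dry-run could look like. It assumes historical mail is available as a list of dicts with `date`, `from`, and `subject` keys, oldest first; that storage shape and the function name are assumptions.

```python
def dry_run(rule_name, predicate, history, n=20):
    """Return printable lines for the last n historical messages a rule would match.

    `predicate` takes one message dict and returns True if the rule would fire.
    """
    matches = [m for m in history if predicate(m)]
    shown = matches[-n:] if n > 0 else []
    lines = [f"{rule_name}: {len(matches)} historical matches, showing last {len(shown)}"]
    for m in shown:
        lines.append(f"  {m['date']}  {m['from']:<30}  {m['subject']}")
    return lines
```

A usage example with made-up data:

```python
history = [
    {"date": "2024-01-01", "from": "root@homelab.example", "subject": "cron ok"},
    {"date": "2024-01-02", "from": "me@example.com", "subject": "401(k) confirmation"},
]
for line in dry_run("homelab-cron", lambda m: m["from"].startswith("root@"), history):
    print(line)
```

Seeing real subjects next to the rule name is what would have caught the self-loop mistake before it shipped.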