A tiered mail classifier for cheap triage

FILE 0x68·A TIERED MAIL CLASSIFIER FOR CHEAP TRIAGE

May 8, 2026 · email, llm, classification

Once all my mail was flowing through a Lambda, I wanted automatic triage — drop the obvious noise, bucket the rest, only forward the things I'll actually look at. I didn't want to pay a model to read every newsletter, so I went tiered.

The tiers

Tier 0 — deterministic rules. ~70-80% of volume. Sender domain plus List-Id matching plus a couple of body heuristics. Routes known-source mail (GitHub, AWS billing, vendor webhooks, etc.) without ever touching an LLM. About 115 rules in a JSON file, hand-curated, version-controlled.
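
As a rough illustration of the Tier 0 idea, here is a minimal sketch of a first-match-wins rule matcher. The rule fields (`sender_domain`, `list_id_contains`, `bucket`) and the example rules are assumptions made for this sketch; the actual JSON schema is not shown here, and the body heuristics are left out.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional

# Hypothetical rule entries; the real file holds about 115 hand-curated rules.
RULES = [
    {"name": "github", "sender_domain": "github.com", "bucket": "automated"},
    {"name": "dev-list", "list_id_contains": "dev.lists.example.org", "bucket": "bulk"},
]


def sender_domain(msg: EmailMessage) -> str:
    """Return the lowercased domain of the From address, or '' if there is none."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def match_tier0(msg: EmailMessage, rules: list) -> Optional[dict]:
    """Return the first rule whose conditions all hold, or None if nothing matches."""
    domain = sender_domain(msg)
    list_id = str(msg.get("List-Id", "")).lower()
    for rule in rules:
        want_domain = rule.get("sender_domain")
        # Accept the exact domain or any subdomain of it.
        if want_domain and domain != want_domain and not domain.endswith("." + want_domain):
            continue
        want_list = rule.get("list_id_contains")
        if want_list and want_list not in list_id:
            continue
        return rule
    return None
```

A message that returns `None` here is what falls through to Tier 1.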

Tier 1 — Haiku classifier. The ambiguous middle. Subject + first 500 chars + sender metadata → {bucket, confidence, reason}. ~50 messages/day go through this tier — roughly $0.15/month at current volume.
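
The input and output shapes for this tier can be sketched as follows. This deliberately leaves out the model call itself; the JSON field names in the request (`subject`, `sender`, `body_excerpt`) and the assumption that the model replies with a JSON object are illustrative, not taken from the real implementation.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    bucket: str
    confidence: float
    reason: str


def build_tier1_input(subject: str, body: str, sender: str, max_chars: int = 500) -> str:
    """Serialize only what the classifier needs: subject, sender, first 500 chars."""
    return json.dumps(
        {"subject": subject, "sender": sender, "body_excerpt": body[:max_chars]}
    )


def parse_verdict(raw: str) -> Verdict:
    """Parse a {bucket, confidence, reason} reply and reject out-of-range confidence."""
    data = json.loads(raw)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Verdict(str(data["bucket"]), confidence, str(data.get("reason", "")))
```

Truncating the body before it is serialized keeps the per-message token count small and predictable, which is where the low monthly cost comes from.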

Tier 2 — Sonnet, on demand only. Two cases: 1. Low confidence (<0.7) from Tier 1 escalates here. 2. Anything bucketed as "human, action required" gets a second pass to extract the action, deadline, and a suggested reply draft.
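
The two escalation cases reduce to a small predicate. A minimal sketch, assuming the Tier 1 verdict exposes a bucket string and a confidence float; the constant and function names are made up for illustration.

```python
CONFIDENCE_FLOOR = 0.7                    # threshold quoted above
ACTION_BUCKET = "human, action required"  # bucket label as quoted above


def tier2_reasons(bucket: str, confidence: float) -> list:
    """Return the reasons a message should go to Tier 2; an empty list means none."""
    reasons = []
    if confidence < CONFIDENCE_FLOOR:
        reasons.append("low_confidence")
    if bucket == ACTION_BUCKET:
        reasons.append("extract_action")
    return reasons
```

Returning a list rather than a boolean lets the caller tell "re-classify this" apart from "extract the action, deadline, and reply draft", since the two cases want different prompts.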

The buckets

Five output buckets, all of which forward:

inbox — default forward
important — forward, prepend [IMPORTANT] to subject
review — forward, prepend [REVIEW]
bulk — forward, prepend [BULK]
automated — forward, prepend [AUTOMATED]

And three drop targets that bypass the forward entirely: cron noise from my own homelab, bandwidth reports, and DMARC aggregate XML. Everything that does get forwarded is stamped with X-Mail-Classifier-* headers on the way out so Gmail can route on them.
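
Tagging a forwarded message might look like the sketch below. The subject prefixes come from the bucket list above; the two concrete header names (`X-Mail-Classifier-Bucket`, `X-Mail-Classifier-Rule`) are guesses at what sits behind the `X-Mail-Classifier-*` pattern.

```python
from email.message import EmailMessage

PREFIXES = {
    "inbox": "",
    "important": "[IMPORTANT]",
    "review": "[REVIEW]",
    "bulk": "[BULK]",
    "automated": "[AUTOMATED]",
}


def stamp(msg: EmailMessage, bucket: str, rule_name: str) -> EmailMessage:
    """Prefix the subject for the bucket and add classifier headers, idempotently."""
    prefix = PREFIXES[bucket]
    subject = str(msg.get("Subject", ""))
    if prefix and not subject.startswith(prefix):
        del msg["Subject"]  # deleting a missing header is a no-op
        msg["Subject"] = f"{prefix} {subject}"
    # Hypothetical header names; remove old copies so re-stamping does not duplicate.
    del msg["X-Mail-Classifier-Bucket"]
    del msg["X-Mail-Classifier-Rule"]
    msg["X-Mail-Classifier-Bucket"] = bucket
    msg["X-Mail-Classifier-Rule"] = rule_name
    return msg
```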

The aggressive-drops audit

After a few days I ran an audit on what the classifier was dropping and found a problem: about 58% of "dropped" mail (636 of ~1,090) was mis-categorized.

The three biggest false positives:

self-loop (~590 messages): Anything from me@my-own-domain. I'd assumed this was test noise. Turned out it was things like 401(k) withdrawal confirmations, trip itineraries, and payment confirmations — forwarded from other places where I use that address. Reclassified to review.
mailer-daemon bounces (~16 messages): Actual delivery failure notifications for my own outbound mail. Important. Split the rule: keep dropping postmaster@, forward mailer-daemon@ as review.
parker-daemon (~22 messages): I'd assumed parker@<domain> was a cron user. Turned out to be a real person who happens to send agent-style work updates. Reclassified to review.

The genuinely-droppable stuff — Proxmox backup status mail, daily bandwidth reports, DMARC aggregate XML — was correctly identified. ~5.6% drop rate in the corrected ruleset, which feels honest.
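
An audit like this can be as simple as grouping dropped messages by the rule that dropped them and printing a few samples per rule. A minimal sketch, assuming each dropped message was logged as a dict with `rule`, `sender`, and `subject` keys (a hypothetical log format):

```python
from collections import Counter, defaultdict


def audit_drops(dropped: list, samples_per_rule: int = 3) -> list:
    """Summarize dropped mail per rule, largest first, with a few sample subjects."""
    counts = Counter(d["rule"] for d in dropped)
    samples = defaultdict(list)
    for d in dropped:
        if len(samples[d["rule"]]) < samples_per_rule:
            samples[d["rule"]].append(f'{d["sender"]}: {d["subject"]}')
    total = len(dropped)
    lines = []
    for rule, n in counts.most_common():
        lines.append(f"{rule}: {n} dropped ({n / total:.1%})")
        lines.extend(f"    {s}" for s in samples[rule])
    return lines
```

The sample subjects are the part that matters: a count alone would not have shown that the self-loop rule was eating trip itineraries and payment confirmations.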

The unsubscribe sweep

Two simple Tier 0 rules added later picked up the long tail of newsletters automatically:

has-list-unsubscribe-header: if the message has an RFC 2369 List-Unsubscribe: header, it's bulk mail by definition.
body-contains-unsubscribe: case-insensitive search for "unsubscribe" in the first 8 KB of body.

Both rules are ordered last so they can't override a more specific match from earlier in the chain. A dry run found ~30 historical matches; in production the rules auto-classified about 1,500 previously unmatched senders as [BULK] over the following week.
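
The two rules and their last-in-chain ordering can be sketched with the standard library's `email` package. The function names are illustrative, and the 8 KB limit is applied to the UTF-8 encoded text of the preferred body part, which is one possible reading of "first 8 KB of body".

```python
from email.message import EmailMessage
from typing import Optional


def has_list_unsubscribe(msg: EmailMessage) -> bool:
    """True if the message carries an RFC 2369 List-Unsubscribe header."""
    return msg.get("List-Unsubscribe") is not None


def body_contains_unsubscribe(msg: EmailMessage, limit: int = 8 * 1024) -> bool:
    """Case-insensitive search for 'unsubscribe' in the first `limit` bytes of body."""
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return False
    head = body.get_content().encode("utf-8", errors="replace")[:limit]
    return b"unsubscribe" in head.lower()


def unsubscribe_fallback(msg: EmailMessage, earlier_bucket: Optional[str]) -> Optional[str]:
    """Apply the two rules only when no earlier rule in the chain has matched."""
    if earlier_bucket is not None:
        return earlier_bucket
    if has_list_unsubscribe(msg) or body_contains_unsubscribe(msg):
        return "bulk"
    return None
```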

What I'd do differently

I'd build the audit-the-drops report first, not third. The whole point of a tiered classifier is that the cheap deterministic tier takes most of the volume, so the cost of getting that tier wrong is high. I had 600+ messages mislabeled before I bothered to look. Now the rule is: any drop-action rule has to ship with a dry-run that shows the last N matches in human-readable form for me to approve before it goes live.
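
A dry-run gate of that kind does not need much machinery. The sketch below assumes historical messages are available as dicts with `date`, `sender`, and `subject` keys and that a candidate rule can be expressed as a predicate; both are assumptions for illustration.

```python
from typing import Callable


def dry_run(rule_name: str, predicate: Callable, history: list, last_n: int = 20) -> str:
    """Render the most recent matches of a candidate drop rule for human review."""
    matches = [m for m in history if predicate(m)]
    lines = [f"rule {rule_name!r}: {len(matches)} of {len(history)} historical messages match"]
    for m in matches[-last_n:]:
        lines.append(f'  {m["date"]}  {m["sender"]}  {m["subject"]}')
    return "\n".join(lines)
```

For example, `dry_run("homelab-cron", lambda m: m["sender"].startswith("root@"), history)` would print the match count followed by the latest matching subjects, which is enough to spot a real person hiding behind a rule meant for cron.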