FILE 0x68·A TIERED MAIL CLASSIFIER FOR CHEAP TRIAGE

A tiered mail classifier for cheap triage

May 8, 2026 · email, llm, classification

Once all my mail was flowing through a Lambda, I wanted automatic triage — drop the obvious noise, bucket the rest, only forward the things I'll actually look at. I didn't want to pay a model to read every newsletter, so I went tiered.

The tiers

Tier 0 — deterministic rules. ~70-80% of volume. Sender domain plus List-Id matching plus a couple of body heuristics. Routes known-source mail (GitHub, AWS billing, vendor webhooks, etc.) without ever touching an LLM. About 115 rules in a JSON file, hand-curated, version-controlled.
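A minimal sketch of a first-match Tier 0 matcher. The post doesn't publish its rule schema, so this assumes each JSON rule has optional sender_domain, list_id, and body_contains keys plus a target bucket, and that every key present must match. The field names, the example rules, and the subdomain suffix match are all illustrative assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule schema and example rules; values are assumed lowercase.
RULES = json.loads("""
[
  {"name": "github", "sender_domain": "github.com", "bucket": "automated"},
  {"name": "vendor-billing", "sender_domain": "billing.example.com", "bucket": "important"},
  {"name": "announce-list", "list_id": "announce.example.org", "bucket": "bulk"},
  {"name": "webhook-body", "body_contains": "delivery attempt", "bucket": "automated"}
]
""")


@dataclass
class Message:
    sender: str        # bare address, e.g. "noreply@github.com"
    list_id: str = ""  # raw List-Id header, e.g. "Announce <announce.example.org>"
    body: str = ""


def bare_list_id(header: str) -> str:
    """List-Id values look like 'Description <id>'; keep only the id."""
    start, end = header.find("<"), header.rfind(">")
    return header[start + 1:end].lower() if 0 <= start < end else header.strip().lower()


def domain_matches(sender: str, domain: str) -> bool:
    # Match the domain itself or any subdomain (e.g. notifications.github.com).
    d = sender.rsplit("@", 1)[-1].lower()
    return d == domain or d.endswith("." + domain)


def tier0(msg: Message, rules: list) -> Optional[str]:
    """Return the bucket of the first rule whose keys all match, else None (go to Tier 1)."""
    for rule in rules:
        if "sender_domain" in rule and not domain_matches(msg.sender, rule["sender_domain"]):
            continue
        if "list_id" in rule and bare_list_id(msg.list_id) != rule["list_id"]:
            continue
        if "body_contains" in rule and rule["body_contains"] not in msg.body.lower():
            continue
        return rule["bucket"]
    return None
```

Because the first match wins, rule order in the JSON file doubles as priority.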

Tier 1 — Haiku classifier. The ambiguous middle. Subject + first 500 chars + sender metadata → {bucket, confidence, reason}. ~50 messages/day go through this tier — roughly $0.15/month at current volume.
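A sketch of the plumbing around the Tier 1 call, with the model call itself left out: trim the input to sender, subject, and the first 500 characters of body, then validate the {bucket, confidence, reason} reply. The JSON field names are illustrative, and treating malformed output as a zero-confidence review result is an assumption, not something the post describes.

```python
import json
from dataclasses import dataclass


@dataclass
class Tier1Result:
    bucket: str
    confidence: float
    reason: str


def build_tier1_input(subject: str, body: str, sender: str, max_chars: int = 500) -> str:
    """Serialize only what Tier 1 sees: sender, subject, and the first max_chars of body."""
    return json.dumps({"from": sender, "subject": subject, "body_preview": body[:max_chars]})


def parse_tier1_reply(raw: str, allowed: set) -> Tier1Result:
    """Validate the model's JSON reply; malformed or out-of-range output becomes a
    zero-confidence 'review' result (an assumed fallback)."""
    try:
        data = json.loads(raw)
        bucket = str(data["bucket"])
        confidence = float(data["confidence"])
        reason = str(data.get("reason", ""))
    except (ValueError, KeyError, TypeError):
        return Tier1Result("review", 0.0, "unparseable model output")
    if bucket not in allowed or not 0.0 <= confidence <= 1.0:
        return Tier1Result("review", 0.0, "bucket or confidence out of range")
    return Tier1Result(bucket, confidence, reason)
```

A zero-confidence result sits below the 0.7 escalation threshold, so garbage output gets a second look instead of being silently bucketed.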

Tier 2 — Sonnet, on demand only. Two cases:

  1. Low confidence (<0.7) from Tier 1 escalates here.
  2. Anything bucketed as "human, action required" gets a second pass to extract the action, deadline, and a suggested reply draft.
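The two escalation cases reduce to a small routing check. A sketch, assuming the Tier 1 result carries bucket and confidence fields; the post doesn't say how the "human, action required" label relates to the five output buckets, so the string is used verbatim, and the task names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ESCALATE_BELOW = 0.7                      # confidence threshold from the post
ACTION_BUCKET = "human, action required"  # label as the post writes it


@dataclass
class Tier1Result:
    bucket: str
    confidence: float


def tier2_task(result: Tier1Result) -> Optional[str]:
    """Return which Sonnet pass a Tier 1 result needs, or None if Tier 1's answer stands."""
    if result.confidence < ESCALATE_BELOW:
        return "reclassify"      # case 1: re-run classification with the larger model
    if result.bucket == ACTION_BUCKET:
        return "extract-action"  # case 2: pull out action, deadline, and a reply draft
    return None
```

Checking confidence first means a shaky action-required label is reclassified before anything extracts a deadline from it; the post doesn't say whether the two passes chain.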

The buckets

Five output buckets and three actions:

  • inbox — default forward
  • important — forward, prepend [IMPORTANT] to subject
  • review — forward, prepend [REVIEW]
  • bulk — forward, prepend [BULK]
  • automated — forward, prepend [AUTOMATED]

And three drop targets that bypass the forward entirely: cron noise from my own homelab, bandwidth reports, and DMARC aggregate XML. Forwarded messages are stamped with X-Mail-Classifier-* headers on the way out so Gmail can route on them.
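A sketch of applying these actions with Python's standard email package. The prefix table comes from the bucket list above; the drop-target names and the header names X-Mail-Classifier-Bucket and X-Mail-Classifier-Rule are hypothetical, since the post only gives the X-Mail-Classifier-* prefix.

```python
from email.message import EmailMessage
from typing import Optional

# Subject prefixes from the bucket list; inbox forwards with the subject untouched.
PREFIXES = {
    "inbox": "",
    "important": "[IMPORTANT]",
    "review": "[REVIEW]",
    "bulk": "[BULK]",
    "automated": "[AUTOMATED]",
}
# Hypothetical names for the three drop targets.
DROP_TARGETS = {"drop-homelab-cron", "drop-bandwidth-report", "drop-dmarc-aggregate"}


def apply_action(msg: EmailMessage, bucket: str, rule: str) -> Optional[EmailMessage]:
    """Return the message to forward, or None when the bucket is a drop target."""
    if bucket in DROP_TARGETS:
        return None
    prefix = PREFIXES[bucket]
    subject = str(msg.get("Subject", ""))
    if prefix and not subject.startswith(prefix):
        if "Subject" in msg:
            msg.replace_header("Subject", f"{prefix} {subject}")  # keep a single Subject
        else:
            msg["Subject"] = prefix
    # Hypothetical header names under the X-Mail-Classifier-* prefix.
    # del is a no-op when absent; it prevents duplicates if a message is reclassified.
    for name, value in (("X-Mail-Classifier-Bucket", bucket), ("X-Mail-Classifier-Rule", rule)):
        del msg[name]
        msg[name] = value
    return msg
```

replace_header matters here: under the default policy, EmailMessage refuses a second Subject header rather than silently appending one.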

The aggressive-drops audit

After a few days I audited what the classifier was dropping and found a problem: about 58% of "dropped" mail (636 of ~1,090) was miscategorized.
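Once each dropped message has been hand-labeled as a correct or incorrect drop, the audit is simple counting. A sketch, assuming the labels arrive as (rule name, should-have-dropped) pairs; the function name and input shape are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def audit_drops(labels: Iterable[Tuple[str, bool]]) -> Tuple[float, Counter]:
    """labels holds one (rule_name, should_have_dropped) pair per dropped message,
    judged by hand. Returns the miscategorized share and wrong drops per rule."""
    total = 0
    wrong: Counter = Counter()
    for rule, correct in labels:
        total += 1
        if not correct:
            wrong[rule] += 1
    share = sum(wrong.values()) / total if total else 0.0
    return share, wrong
```

With the headline figures, 636 / 1090 ≈ 0.583, the 58% above; calling most_common(3) on the returned Counter ranks the worst rules, as in the list of false positives that follows.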

The three biggest false positives:

  • self-loop (~590 messages): Anything from me@my-own-domain. I'd assumed this was test noise. Turned out it was things like 401(k) withdrawal confirmations, trip itineraries, and payment confirmations — forwarded from other places where I use that address. Reclassified to review.

  • mailer-daemon bounces (~16 messages): Actual delivery failure notifications for my own outbound mail. Important. Split the rule: keep dropping postmaster@, forward mailer-daemon@ as review.

  • parker-daemon (~22 messages): I'd assumed parker@<domain> was a cron user. Turned out to be a real person who happens to send agent-style work updates. Reclassified to review.
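A sketch of the corrected address rules, assuming they key on the sender's local part and full address. MY_DOMAIN is a placeholder for my-own-domain, and the parker rule is left out because its domain isn't given.

```python
from typing import Optional

MY_DOMAIN = "my-own-domain.example"  # placeholder; the real domain isn't in the post
SELF_ADDRESS = f"me@{MY_DOMAIN}"


def address_rule(sender: str) -> Optional[str]:
    """Return 'drop' or a bucket for the audited addresses, or None to fall through."""
    addr = sender.strip().lower()
    local = addr.partition("@")[0]
    if local == "postmaster":
        return "drop"    # still noise, keep dropping
    if local == "mailer-daemon":
        return "review"  # real bounces for my outbound mail
    if addr == SELF_ADDRESS:
        return "review"  # self-loop: forwarded confirmations and itineraries
    return None
```

Splitting on the local part is what lets postmaster@ and mailer-daemon@ take different actions even when both come from the same remote MTA.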

The genuinely droppable stuff (Proxmox backup status mail, daily bandwidth reports, DMARC aggregate XML) was correctly identified. The corrected ruleset drops about 5.6% of mail, which feels honest.

The unsubscribe sweep

Two simple Tier 0 rules added later picked up the long tail of newsletters automatically:

  1. has-list-unsubscribe-header: if the message has an RFC 2369 List-Unsubscribe: header, it's bulk mail by definition.
  2. body-contains-unsubscribe: case-insensitive search for "unsubscribe" in the first 8 KB of body.
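A sketch of the two checks with the standard email package. Parsing with policy.default yields an EmailMessage, which provides get_body(); preferring the plain-text part over HTML and measuring the 8 KB in UTF-8 bytes are assumptions.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

BODY_SCAN_BYTES = 8 * 1024


def parse(raw: bytes) -> EmailMessage:
    # policy.default yields an EmailMessage, which provides get_body() and get_content().
    return message_from_bytes(raw, policy=policy.default)


def has_list_unsubscribe_header(msg: EmailMessage) -> bool:
    """Rule 1: an RFC 2369 List-Unsubscribe header is present."""
    return msg["List-Unsubscribe"] is not None


def body_contains_unsubscribe(msg: EmailMessage) -> bool:
    """Rule 2: 'unsubscribe' appears, case-insensitively, in the first 8 KB of body."""
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return False
    head = part.get_content().encode("utf-8")[:BODY_SCAN_BYTES]
    # Decoding with errors="ignore" drops a multibyte character split at the cut.
    return "unsubscribe" in head.decode("utf-8", errors="ignore").casefold()
```

The header check is nearly free, while the body check has to decode a MIME part, which is one reason to cap the scan at a fixed prefix.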

Both rules are ordered last so they can't override an important match from earlier in the chain. A dry run found ~30 historical matches; in production, the rules auto-classified about 1,500 unmatched senders to [BULK] over the following week.

What I'd do differently

I'd build the audit-the-drops report first, not third. The whole point of a tiered classifier is that the cheap deterministic tier takes most of the volume, so the cost of getting that tier wrong is high. I had 600+ messages mislabeled before I bothered to look. Now the rule is: any drop-action rule has to ship with a dry run that shows its last N matches in human-readable form, and I approve it before it goes live.
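A sketch of that dry-run gate, assuming stored history is available oldest-first and a candidate rule is just a predicate over a stored message; the Stored record, the function name, and the report layout are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Stored:
    date: str
    sender: str
    subject: str
    body: str


def dry_run(rule: Callable[[Stored], bool], history: Iterable[Stored], n: int = 20) -> str:
    """Render the last n stored messages (oldest-first input) that a candidate
    drop rule would have caught, for a human to approve or reject."""
    matches = [m for m in history if rule(m)]
    shown = matches[-n:] if n > 0 else []  # guard: matches[-0:] would return everything
    lines = [f"{len(matches)} historical matches; showing last {len(shown)}:"]
    for m in shown:
        preview = " ".join(m.body.split())[:80]  # collapse whitespace for a one-line preview
        lines.append(f"{m.date}  {m.sender:<30}  {m.subject[:50]}")
        lines.append(f"    {preview}")
    return "\n".join(lines)
```

The first line reports the total match count, so a rule that would have caught hundreds of messages can't hide behind a short preview.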