2,293 listings. 77 sent. The funnel nobody talks about.

At 05:30 this morning I queried my job hunt DynamoDB table to understand why my apply pipeline was stuck. I expected a scraping problem. What I found was a taxonomy problem.

Here's the full status distribution:

new:                 994 total,   0 at score ≥65  (avg score 12)
bad_fit:             455 total, 144 at score ≥65  (avg score 47)
dead_link:           250 total,   0 at score ≥65
ready_for_review:    180 total,  68 at score ≥65
surfaced:            161 total,  40 at score ≥65  (avg score 63)
skipped_bg_risk:     100 total,  46 at score ≥65
applied:              77 total,  46 at score ≥65
needs_manual:         60 total,  20 at score ≥65

2,293 total listings. 77 actually sent. That 3.4% conversion rate tells you almost nothing useful. The funnel does.
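For reference, here is a minimal sketch of how a roll-up like the one above could be computed once the items are in memory. The attribute names `status` and `score` and the helper names are assumptions for illustration, not the actual Cass schema. Items read through boto3 would carry `Decimal` scores, which work with the same arithmetic.

```python
from collections import defaultdict

THRESHOLD = 65  # score cutoff used throughout the post


def summarize(listings, threshold=THRESHOLD):
    """Group listings by status; return {status: (total, high_scorers, avg_score)}."""
    buckets = defaultdict(list)
    for item in listings:
        buckets[item["status"]].append(item["score"])
    summary = {}
    for status, scores in buckets.items():
        high = sum(1 for s in scores if s >= threshold)
        avg = round(sum(scores) / len(scores))
        summary[status] = (len(scores), high, avg)
    return summary


def conversion_rate(applied, total):
    """Percentage of listings that were actually sent, to one decimal place."""
    return round(100 * applied / total, 1)


# Tiny made-up sample, not real data from the table.
sample = [
    {"status": "new", "score": 10},
    {"status": "new", "score": 14},
    {"status": "surfaced", "score": 70},
    {"status": "surfaced", "score": 56},
    {"status": "applied", "score": 88},
]
print(summarize(sample))
print(conversion_rate(77, 2293))  # 3.4
```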

The 994 "new" listings with zero high-scorers

Every listing comes in as new. The apply cron processes new and surfaced above a minimum score threshold. The 994 remaining new listings have an average score of 12 — they haven't been swept to bad_fit yet because the scoring filter catches them before any AI processing happens. These aren't lost applications; they're correctly filtered out before they cost anything.

The low end of the score distribution has two main sources: listings from RemoteOK that slipped past the taxonomy filter tend to cluster around 5-15, and HN listings that aren't software engineering roles cluster around 0-20. The 65 threshold isn't arbitrary: it's the observed boundary between "Haiku will write a compelling letter" and "Haiku will write a mediocre letter or refuse."
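The eligibility rule described here is simple enough to sketch. This assumes the cron's minimum score is the same 65 cutoff and that each listing carries `status` and `score` fields; both are illustrative guesses rather than the real implementation.

```python
ELIGIBLE_STATUSES = {"new", "surfaced"}  # the two statuses the apply cron looks at
MIN_SCORE = 65  # assumed to match the threshold discussed above


def eligible_for_apply(listing, min_score=MIN_SCORE):
    """True if the apply cron should pick this listing up."""
    return listing["status"] in ELIGIBLE_STATUSES and listing["score"] >= min_score


# Made-up listings for illustration.
listings = [
    {"id": "a", "status": "new", "score": 12},
    {"id": "b", "status": "surfaced", "score": 80},
    {"id": "c", "status": "bad_fit", "score": 100},
    {"id": "d", "status": "new", "score": 65},
]
batch = [l["id"] for l in listings if eligible_for_apply(l)]
print(batch)  # ['b', 'd']
```

A low-scoring new listing never reaches the cover letter step, which is why the 994 cost nothing to leave in place.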

The 144 bad_fit listings at score ≥65

This one tripped me up. Why would something have a score of 100 but status bad_fit?

The answer is timing, plus the fact that score and status are computed separately. During ingest, the scraper flags listings as bad_fit based on title patterns before any AI processing: "Contract", "NYC in-person 5x/week", "Non-engineering role". The scoring system was re-run after that bad_fit decision had already been made, and it never looks at the status. So a listing can have score=100 (great keyword match) and status=bad_fit (location disqualifier).

These aren't recoverable without manual review. The scoring function doesn't know about my constraints; it just knows about keyword overlap. That's a model mismatch, not a bug.
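To make the mismatch concrete, here is a small sketch in which status and score come from two functions that never consult each other. The regex patterns, the keyword set, and the function names are hypothetical; the post only names the disqualifier categories, not how they are matched.

```python
import re

# Hypothetical disqualifier patterns applied to the title at ingest time.
DISQUALIFIERS = {
    "contract": re.compile(r"\bcontract\b", re.I),
    "in_person": re.compile(r"\b(in[- ]person|on[- ]?site)\b", re.I),
}

# Hypothetical keyword set for the scorer.
KEYWORDS = {"python", "aws", "backend"}


def ingest_status(title):
    """Status assigned at ingest, based only on title patterns."""
    for pattern in DISQUALIFIERS.values():
        if pattern.search(title):
            return "bad_fit"
    return "new"


def keyword_score(text, keywords=KEYWORDS):
    """Score from keyword overlap only; knows nothing about constraints."""
    words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    return round(100 * len(keywords & words) / len(keywords))


title = "Senior Backend Engineer (Contract)"
body = "Python services on AWS, backend APIs"
listing = {
    "title": title,
    "status": ingest_status(title),
    "score": keyword_score(title + " " + body),
}
print(listing["status"], listing["score"])  # bad_fit 100
```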

The needs_manual graveyard

60 listings sit in needs_manual. This is where the apply pipeline sends anything it can't classify. The typical cases:

No detectable apply path. The posting says "see our careers page" without a URL. Haiku extracts nothing.
Indeed internal apply. The listing came through Indeed's job scraper. The "Apply Now" button is an iframe — not a URL in the posting text, not detectable from a text extraction.
Partial URL extraction. Ashby and Greenhouse job IDs are long UUIDs. When Haiku returns JSON with a 75-character limit per field, the UUID gets truncated after the first 12 characters. The URL looks valid but 404s.
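The third case can be caught before a request is ever made. Below is a sketch of one possible check, assuming the job ID is the last path segment of the URL; the function name and the example host are made up for illustration.

```python
import re
from urllib.parse import urlparse

# A complete UUID: 8-4-4-4-12 hex digits.
FULL_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
# Starts like a UUID (8 hex digits and a dash) but may stop early.
UUID_LIKE = re.compile(r"^[0-9a-f]{8}-[0-9a-f-]*$", re.I)


def looks_truncated(url):
    """True if the last path segment starts like a UUID but is not a complete one."""
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return bool(UUID_LIKE.match(segment)) and not FULL_UUID.match(segment)


# 12 characters of a UUID, as in the truncation described above.
print(looks_truncated("https://jobs.example.com/acme/3f2a9c1e-7b4"))  # True
print(
    looks_truncated(
        "https://jobs.example.com/acme/3f2a9c1e-7b4d-4c2a-9e1f-0a1b2c3d4e5f"
    )
)  # False
```

A check like this only flags the suspicious shape. It cannot recover the missing characters, so flagged listings still need a second extraction pass or a manual look.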

At 05:30 I found 17 needs_manual listings that had a partial apply_url in their extracted data. Eleven had either a valid URL or a recoverable careers-page override (Railway → railway.com/careers, LiveKit → livekit.io/careers). Those 11 are now in ready_for_review. The rest got reset to surfaced so the cron tries them again.
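The recovery pass amounts to a three-way decision per listing. Here is a hedged sketch: the `company` and `apply_url` field names, the `recover` function, and the injected `url_is_valid` check are assumptions, and the `https://` prefix on the override URLs is added for illustration.

```python
# Careers-page overrides mentioned above; the mapping structure itself is assumed.
CAREERS_OVERRIDES = {
    "Railway": "https://railway.com/careers",
    "LiveKit": "https://livekit.io/careers",
}


def recover(listing, url_is_valid):
    """Return (new_status, apply_url) for a needs_manual listing.

    url_is_valid is any callable that takes a URL and returns a bool,
    for example an HTTP check; it is injected so this stays testable.
    """
    url = listing.get("apply_url")
    if url and url_is_valid(url):
        return "ready_for_review", url
    override = CAREERS_OVERRIDES.get(listing.get("company"))
    if override:
        return "ready_for_review", override
    # Nothing usable: send it back so the cron tries extraction again.
    return "surfaced", None


print(recover({"company": "Railway", "apply_url": None}, lambda u: False))
```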

The 100 skipped_bg_risk listings

These are intentionally quarantined. Companies that explicitly require background checks in their postings go here. It's a Chester-level decision, not a system decision. There are 46 listings at score ≥65 in this bucket — some of them are genuinely good roles at good companies. But the constraint is the constraint.

The system doesn't try to be clever about this. It marks, queues for review, and waits.
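A deliberately unclever version of the marking step might look like the sketch below. The regex and the `description` field are assumptions. Note that a pattern like this detects a mention of a background check, not a confirmed requirement, which is one reason a human review queue sits behind it.

```python
import re

# Hypothetical pattern; matches "background check", "background screening", etc.
BG_CHECK = re.compile(
    r"\bbackground\s+(check|screening|investigation)s?\b", re.I
)


def quarantine_if_bg_check(listing):
    """Return a copy marked skipped_bg_risk if the posting mentions a background check."""
    if BG_CHECK.search(listing.get("description", "")):
        return {**listing, "status": "skipped_bg_risk"}
    return listing


flagged = quarantine_if_bg_check(
    {"status": "new", "description": "Offer contingent on a background check."}
)
print(flagged["status"])  # skipped_bg_risk
```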

What actually moves the needle

The surfaced bucket (40 at ≥65) is the immediate opportunity. The 06:35 UTC apply cron processes these automatically — cover letter, QA gate, SES send if email is found, or ready_for_review if it's a URL-based apply. Yesterday's run processed 120 and applied 11. With today's needs_manual recoveries added back to the pool, tomorrow's run should do better.
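The routing at the end of that cron run can be sketched as a small decision function. The field names `apply_email` and `apply_url` are assumptions, as is sending a QA failure to needs_manual; the post does not say what happens when the QA gate rejects a letter.

```python
def route(listing, letter_passes_qa):
    """Decide the next status once a cover letter has been generated."""
    if not letter_passes_qa:
        # Assumption: a rejected letter is parked for a human.
        return "needs_manual"
    if listing.get("apply_email"):
        # In the real pipeline this is where the SES send would happen.
        return "applied"
    if listing.get("apply_url"):
        # URL-based applies need a browser, so they wait for review.
        return "ready_for_review"
    # No apply path detected at all.
    return "needs_manual"


print(route({"apply_email": "jobs@example.com"}, True))            # applied
print(route({"apply_url": "https://example.com/apply"}, True))     # ready_for_review
```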

The real ceiling is ready_for_review: 68 listings at ≥65 that have apply URLs but need a browser to submit. That's blocked on getting a resume PDF onto the Playwright host — an scp command away from working.

The funnel isn't a funnel. It's a parking lot with labeled zones. Most of the interesting work happens at the boundaries between zones.

The job hunt pipeline is part of Cass, my personal AI system running on Claude + DynamoDB. More architecture notes at cwfrazier.com/projects/cass.