The funnel math of automated job hunting
I've been building an automated job application pipeline for the last few weeks. Not a scraper that dumps links—a system that reads job postings, scores them, drafts cover letters, and sends email applications via SES. Here's what the numbers actually look like after running it for a month.
The pool
2,396 total listings from four sources:
- HN "Who is Hiring?" — 1,269 listings (most recent + May + April threads)
- HN job board (hn-jobs scraper) — 611
- Indeed scraper (via cass-browser) — 335
- RemoteOK — 173 (moved to weekly; terrible signal-to-noise ratio)
Score distribution (0–100, Haiku-scored):
- ≥ 65: 391 listings
- ≥ 50: 578
- ≥ 30: 1,155
- Average: 33.1
Most job listings are bad fits. That's expected. The scoring model weights for remote-only work, senior IC roles, a Python/TypeScript/Swift stack, AI/agent-adjacent products, startup stage, and an email apply path. An average of 33.1 puts the typical listing right around the lowest bar: just under half the pool (1,155 of 2,396, about 48%) scores 30 or higher, and only about 16% (391) reaches 65.
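The threshold counts above turn into shares of the pool with a few lines of arithmetic. This is a minimal sketch: the counts are copied from the list above, and the constant and function names are made up for illustration, not taken from the pipeline.

```python
# Counts copied from the score distribution above.
# POOL_SIZE, AT_OR_ABOVE and share_at_or_above are illustrative names.
POOL_SIZE = 2396
AT_OR_ABOVE = {65: 391, 50: 578, 30: 1155}


def share_at_or_above(threshold: int) -> float:
    """Fraction of the whole pool scoring at or above a threshold."""
    return AT_OR_ABOVE[threshold] / POOL_SIZE


for threshold in sorted(AT_OR_ABOVE, reverse=True):
    print(f">= {threshold}: {share_at_or_above(threshold):.1%}")
```

The shares are cumulative: every listing counted at 65 or above is also counted at 50 and at 30.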
What happens to each listing
80 applied. Email path: Haiku reads the listing, generates a 4–6 sentence cover letter tailored to the specific role and company, sends via SES with resume attached, marks DynamoDB row as applied.
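For a sense of what the send step involves, here is a rough sketch of building an email with a PDF attachment and handing it to SES. It is an assumption about the shape of this step, not the pipeline's actual code: the function names, subject format, and default region are hypothetical. The message is built with Python's standard `email` library and sent with boto3's `send_raw_email`, which accepts a raw MIME message.

```python
from email.message import EmailMessage


def build_application_email(sender: str, recipient: str, role: str,
                            company: str, cover_letter: str,
                            resume_pdf: bytes,
                            resume_name: str = "resume.pdf") -> EmailMessage:
    """Build a MIME message: cover letter as the body, resume attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Application: {role} at {company}"  # hypothetical format
    msg.set_content(cover_letter)
    msg.add_attachment(resume_pdf, maintype="application",
                       subtype="pdf", filename=resume_name)
    return msg


def send_via_ses(msg: EmailMessage, region: str = "us-east-1") -> str:
    """Send the raw message through SES and return the SES message ID.

    Needs boto3 and AWS credentials; the region is a placeholder.
    """
    import boto3  # third-party, imported lazily so the builder works without it

    ses = boto3.client("ses", region_name=region)
    response = ses.send_raw_email(RawMessage={"Data": msg.as_bytes()})
    return response["MessageId"]
```

Keeping the builder separate from the sender means the message can be inspected or tested without touching AWS.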
192 ready_for_review. URL path: the listing has a web apply form (Greenhouse, Lever, Ashby, Indeed, etc.). Haiku drafts the cover letter and stores it, but submitting requires a browser driver. An ATS driver exists but needs a resume PDF synced to the Playwright host.
467 bad_fit. Haiku declined to write a cover letter after reading the full listing—role was wrong seniority, required relocation, sales-adjacent, or the posting was thin enough that a specific letter couldn't be written without fabricating claims.
251 dead_link. The job post was live when scraped but gone by apply time. This is almost entirely the YC company pages—startup job pages have maybe a 30-day half-life.
109 skipped_bg_risk. Roles at companies where a background check would surface a Chapter 13 bankruptcy that was discharged in March. The scoring model flags public companies, fintechs, federal contractors, and any company that explicitly mentions a background check. These get skipped automatically.
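Tallying the five statuses against the pool shows how much of it they cover. This is a small sketch using only the counts listed above; the variable and function names are illustrative.

```python
# Status counts copied from the breakdown above; names are illustrative.
POOL_SIZE = 2396
STATUS_COUNTS = {
    "applied": 80,
    "ready_for_review": 192,
    "bad_fit": 467,
    "dead_link": 251,
    "skipped_bg_risk": 109,
}


def funnel_shares(counts: dict[str, int], pool_size: int) -> dict[str, float]:
    """Each status as a fraction of the whole pool."""
    return {status: n / pool_size for status, n in counts.items()}


accounted = sum(STATUS_COUNTS.values())
remaining = POOL_SIZE - accounted  # listings in none of the five statuses

for status, share in funnel_shares(STATUS_COUNTS, POOL_SIZE).items():
    print(f"{status:>18}: {share:.1%}")
print(f"{'accounted':>18}: {accounted} of {POOL_SIZE} ({remaining} remaining)")
```

The five statuses cover 1,099 of the 2,396 listings, which leaves 1,297 that fall under none of them.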
The bottleneck
The email path converts at about 11 applications per 120 processed listings—roughly 9%. That's not bad. The cover letter template requires a specific reason-why sentence ("I'm particularly interested in X's approach to Y"), and Haiku declines to write the letter when the listing is too thin to support a genuine specific claim. That gate is working correctly: the letters that go out are specific.
The real bottleneck is the URL path. 192 listings in ready_for_review have web apply forms that need a browser. The ATS driver (Playwright, Haiku-planned field filling) exists and handles Greenhouse/Lever/Ashby. The blocker: the resume PDF needs to be on the same host as the Playwright process. An SSH key + one scp step would unlock most of those 192 applications in the next run; the 23 that came from Indeed need a separate driver.
What the scores don't tell you
Dead link rate: ~10.5% of the pool. HN company pages rot the fastest. I fixed the root cause (scraper now writes the HN thread URL as fallback), but the existing pool has the old URLs.
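The fallback idea can be sketched as follows. This is an illustration under assumptions, not the scraper's real code: the `Listing` fields, `resolve_apply_url`, and the injected `is_alive` check are all hypothetical names. The liveness check is passed in as a function so the logic can be exercised without making network requests.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Listing:
    apply_url: str                    # company job page captured at scrape time
    thread_url: Optional[str] = None  # hypothetical field: HN thread URL fallback


def resolve_apply_url(listing: Listing,
                      is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL that still resolves, or None for a dead link."""
    if is_alive(listing.apply_url):
        return listing.apply_url
    if listing.thread_url is not None and is_alive(listing.thread_url):
        return listing.thread_url
    return None  # caller would mark the row dead_link


# Usage with a stubbed liveness check (no network involved):
live_urls = {"https://news.ycombinator.com/item?id=1"}
listing = Listing("https://example.com/jobs/42",
                  "https://news.ycombinator.com/item?id=1")
print(resolve_apply_url(listing, lambda url: url in live_urls))
```

Listings scraped before the fix have no `thread_url`, so they would still resolve to `None` once the company page is gone.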
Source quality: RemoteOK contributed 173 listings, which produced 0 applications and 153 bad-fit marks. That's a 0% conversion rate, with about 88% of its listings rejected outright. It's on a weekly schedule now instead of daily while I figure out whether to fix the taxonomy filter or drop it.
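The same rate calculation applies to any slice of the funnel. A minimal sketch, using the RemoteOK counts from this paragraph and the email-path figure of 11 applications per 120 processed listings quoted earlier; the helper name is made up.

```python
def rate(part: int, whole: int) -> float:
    """Fraction part/whole, returning 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


# RemoteOK: 173 listings, 0 applications, 153 bad-fit marks.
remoteok_conversion = rate(0, 173)
remoteok_bad_fit = rate(153, 173)

# Email path: 11 applications per 120 processed listings.
email_conversion = rate(11, 120)

print(f"RemoteOK conversion: {remoteok_conversion:.1%}")
print(f"RemoteOK bad fit:    {remoteok_bad_fit:.1%}")
print(f"Email path:          {email_conversion:.1%}")
```

The zero-denominator guard matters for a source that has been scraped but not yet processed, where a plain division would raise an error.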
Score ceiling: The highest-scored listings (80–100) tend to be early-stage AI companies with email apply paths. They're the ones actually converting. The 65–79 range is mostly URL-only paths that go into ready_for_review.
What's next
The ATS driver is built. The SSH key is one ssh-copy-id command. When that's set up, the pipeline should be able to process 150+ web applications per day rather than the current 11 email applications.
The Indeed driver (MacPilot + Safari for the Cloudflare bypass) is the other path: 23 Indeed-sourced listings are waiting in ready_for_review for a driver that can handle Indeed's Turnstile without triggering bot detection. Safari with the existing session cookies passes cleanly.
The pipeline is working. The funnel is measurable. The next unlock is one SSH key.