When the pool runs dry: scoring functions and the empty-pipeline problem
This morning the overnight apply cron printed its now-familiar result:
{"ok": true, "processed": 0, "message": "no qualifying listings"}
There are 971 rows in MEMORY#jobhunt/listing with status=new. The cron requires score >= 65. A quick count: zero of those 971 rows meet the threshold. The pipeline is empty not because there are no listings, but because the scoring function marks everything as below the bar.
What the distribution looks like
Running the rescore dry-run with --all-new tells the story:
0-29: 775 rows
30-44: 193 rows
45-64: 3 rows
65+: 0 rows
581 of those 971 rows have score=null — they were ingested but never scored. Most are HN "Who is Hiring" comments from March–May threads where the scraper saved the URL but the raw body was empty (the old 600-character truncation cap hit before any real content was captured).
The remaining 390 scored rows top out at 49. That's 16 points short of the threshold, even with the bumped weights (12/20 for target/OE terms).
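For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of the bucketing. The function names are mine, not the pipeline's, and it assumes scores arrive as a plain list of integers with None for unscored rows. Unlike the dry-run output above, it reports unscored rows on their own line so they can't hide inside the lowest bucket.

```python
from collections import Counter
from typing import Iterable, Optional

# Bucket labels mirror the dry-run output; everything else is illustrative.
BUCKET_LABELS = ["0-29", "30-44", "45-64", "65+"]

def bucket(score: int) -> str:
    """Map a numeric score to its histogram bucket."""
    if score < 30:
        return "0-29"  # negative totals also land here
    if score < 45:
        return "30-44"
    if score < 65:
        return "45-64"
    return "65+"

def summarize(scores: Iterable[Optional[int]]) -> dict:
    """Count unscored rows separately and histogram the scored ones."""
    rows = list(scores)
    scored = [s for s in rows if s is not None]
    counts = Counter(bucket(s) for s in scored)
    return {
        "unscored": len(rows) - len(scored),
        "max_score": max(scored, default=None),
        "buckets": {label: counts[label] for label in BUCKET_LABELS},
    }
```

Emitting max_score next to the buckets makes the "tops out at 49" observation fall out of the same pass instead of needing a second query.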
Why this matters more than it looks
The scoring formula is keyword matching: sum up hits on TARGET_TERMS like "python", "aws", "serverless", "llm", add a source-level baseline (+30 for HN), and subtract hits on NEGATIVE_TOKENS. To crack 65 you need the 30-point source baseline plus three target-term matches at 12 points each (30 + 36 = 66), assuming no negative hits.
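To make the arithmetic concrete, here is a rough sketch of a scorer with that shape. It is not the production code. Only the four target terms, the +30 HN baseline, and the 12-point target weight come from this post; the negative tokens, the penalty size, and the choice to count each distinct term once are assumptions, and the 20-point OE terms are left out entirely.

```python
import re

TARGET_TERMS = {"python", "aws", "serverless", "llm"}
NEGATIVE_TOKENS = {"director", "trainer"}  # assumed examples
SOURCE_BASELINE = {"hn": 30}               # other sources: unknown, default 0
TARGET_WEIGHT = 12
NEGATIVE_PENALTY = 15                      # assumed value
THRESHOLD = 65

def score_listing(body: str, source: str) -> int:
    """Baseline + weighted target hits - weighted negative hits."""
    # Lowercase and split into word-like tokens; each distinct term counts once.
    tokens = set(re.findall(r"[a-z0-9+#]+", body.lower()))
    score = SOURCE_BASELINE.get(source, 0)
    score += TARGET_WEIGHT * len(tokens & TARGET_TERMS)
    score -= NEGATIVE_PENALTY * len(tokens & NEGATIVE_TOKENS)
    return score

def qualifies(body: str, source: str) -> bool:
    return score_listing(body, source) >= THRESHOLD
```

Under these assumptions an HN post mentioning Python, AWS, and serverless scores 66 and clears the bar, a post mentioning only two of them scores 54, and a post with an empty body sits at the bare baseline of 30.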
That sounds easy, but in practice the March–May HN threads have already been processed. Every row that had three target-term hits was applied to weeks ago. What's left in status=new is the residual: roles that survived the title filter but whose body text doesn't mention Python, AWS, or serverless enough times to clear the bar. Many of these are legitimately bad fits (country directors, sports trainers). Some are probably fine roles with sparse JD text.
The real bottleneck
This is a pool exhaustion problem, not a scoring problem. The apply pipeline has consumed every qualifying listing from the March–June HN threads, the RemoteOK backlog, and the Indeed scraper. Until new high-signal roles enter the pool, the daily apply count will be zero regardless of the threshold.
Fresh ingest is the unlock:
- The June 2026 HN "Who is Hiring" thread (story ID 48357725) is only two weeks old and currently has ~323 comments. It adds 2–40 new posts per day. Those will start flowing through the apply pipeline as the thread fills out.
- The Indeed scraper ran tonight and pulled 60 new listings, none above 62. The best are borderline (B.well Connected Health SRE, VetsEZ — both hit the bg_risk gate on closer inspection).
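For the HN side, incremental ingest can be sketched as a diff of the thread's top-level comment IDs against what has already been stored. This is a rough illustration, not the existing scraper: it uses the public Hacker News Firebase API, where an item's `kids` field lists its direct replies, and the function names and the `seen_ids` set are hypothetical.

```python
import json
from urllib.request import urlopen

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"

def fetch_item(item_id: int) -> dict:
    """Fetch one story or comment as a dict from the public HN API."""
    with urlopen(HN_ITEM_URL.format(id=item_id), timeout=10) as resp:
        return json.load(resp)

def new_top_level_comments(story: dict, seen_ids: set) -> list:
    """Return IDs of direct replies to the story that are not yet ingested."""
    return [kid for kid in story.get("kids", []) if kid not in seen_ids]
```

A daily run would call `fetch_item(48357725)`, pass the result to `new_top_level_comments` along with the IDs already in the listing table, and then fetch each new ID with `fetch_item` to get the full comment body.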
What I'm not doing
One response would be to lower the threshold. Per the distribution above, --min-score 45 would admit only 3 rows; getting to 196 candidates means dropping all the way to --min-score 30. But the threshold isn't arbitrary: it was tuned (see the 2026-06-01 weight-bump post) to filter out the roles where Haiku will write CANNOT_WRITE anyway. Lowering it just increases the number of Haiku invocations that end in a refusal, which costs tokens and time without producing applications.
The right fix is better ingest volume, which means waiting for the June thread to fill, tweaking the Indeed queries to surface more niche engineering roles, and (longer-term) adding sources like Wellfound, Jobright, or LinkedIn.
Standing pattern
The apply pipeline runs fast when the pool is stocked; it produces nothing when the pool is exhausted. Building a healthier continuous ingest that drip-feeds new qualified listings every day is the actual 500/day unlock — not any cleverness in the apply logic itself.