Two scoring systems that never talked to each other

I added an Indeed scraper to the job pipeline six weeks ago. It worked — 60-100 new listings per run, deduped against the existing pool, scraped JD bodies, saved to DynamoDB. The apply cron ignored almost all of them.

I knew the apply pool was exhausted. I'd bumped the scoring weights twice, rescored the entire status=new pool, confirmed the HN and RemoteOK listings were qualifying at the new threshold. But the Indeed-scraped listings kept sitting at scores of 28, 36, 44. The threshold was 65.

I assumed the JDs were just thin — not enough signal. "The right listings will come in eventually."

This morning I looked more carefully.

The two scoring systems

The main pipeline scores listings with rescore.py. It's a keyword matcher with source bonuses:

# rescore.py
src = c.get("source", "")
if src == "remoteok":
    s += 30
elif src == "hn-whoishiring":
    s += 30
elif src == "hn-jobs":
    s += 25
elif src == "indeed-scraped":
    s += 0  # indeed-scraped uses a different scorer

The comment on the s += 0 line was added when the Indeed scraper was built. It meant: "this source has its own scoring function, don't double-count."

But the Indeed scraper's own score_listing() started at zero:

def score_listing(title: str, body: str) -> int:
    blob = (title + " " + body[:2000]).lower()
    score = 0  # ← no source bonus
    senior_signals = ("senior", "staff", "principal", "lead engineer")
    for s in senior_signals:
        if s in blob:
            score += 20
            break
    for s in tech_signals:
        if s in blob:
            score += 8
    if "remote" in blob:
        score += 10
    ...

So the HN/RemoteOK listings got a +30 or +25 source bonus. The Indeed listings got +0.

For a typical senior Python SRE role, an HN listing scores: 3 tech signals (24) + source bonus (30) + remote (10) = 64. Add one more tech match or a salary mention and it clears 65.

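For the same role from Indeed: 3 tech signals (24) + remote (10) = 34. Even with the +20 that score_listing() adds for a senior title, that is 54, still 11 short of 65. An Indeed listing needs several more keyword hits than the same role on HN before it clears the threshold, which is why almost none did.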

The two scorers had diverged silently. The comment "uses a different scorer" was accurate — just incomplete. The different scorer was missing a calibration step.
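To make the gap concrete, here is a minimal sketch that reproduces the arithmetic above. The weights (+8 per tech signal, +10 for remote, +30 or +0 source bonus) come from the snippets in this post; the helper example_score is hypothetical and exists only for illustration. It is not the code in either scorer.

```python
# Illustrative only: reproduces the worked example, not the real scorers.
TECH_WEIGHT = 8     # points per matched tech signal
REMOTE_BONUS = 10   # points when "remote" appears in the listing
THRESHOLD = 65      # apply threshold shared by both sources


def example_score(tech_matches: int, remote: bool, source_bonus: int) -> int:
    """Hypothetical helper: sum the components used in the worked example."""
    remote_points = REMOTE_BONUS if remote else 0
    return tech_matches * TECH_WEIGHT + remote_points + source_bonus


hn = example_score(tech_matches=3, remote=True, source_bonus=30)
indeed = example_score(tech_matches=3, remote=True, source_bonus=0)

print(hn, indeed, hn - indeed)  # 64 34 30
```

The 30-point difference is exactly the source bonus: identical listing content, different starting line.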

The fix

I added a baseline to the Indeed scorer:

def score_listing(title: str, body: str) -> int:
    """...
    Baseline starts at 15 (source bonus, calibrated against HN/RemoteOK +30)
    to account for the fact that indeed-scraped JDs have lower keyword density
    than self-selected startup listings. The apply threshold of 65 requires
    real signal on top of the baseline.
    """
    blob = (title + " " + body[:2000]).lower()
    score = 15  # indeed-scraped source bonus (half of HN/RemoteOK +30)
    ...

+15 instead of +30 because Indeed is the general job market, not the self-selected HN "Who Is Hiring" crowd. HN listings skew heavily toward startups and remote-first companies, which match my profile better on average. Indeed is broader, so its source bonus should be smaller.

Then I backfilled all 117 existing indeed-scraped status=new listings with the +15 adjustment. Seven cleared 65.
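A backfill like this can be sketched over plain dicts, leaving the DynamoDB read and write out. The field names score and baseline_applied, the function name backfill_indeed, and the >= comparison against the threshold are assumptions for illustration; only source, status=new, the +15 and the 65 come from the post. The baseline_applied flag is a hypothetical guard so that running the backfill twice does not add the bonus twice.

```python
INDEED_BASELINE = 15   # the new indeed-scraped source bonus
THRESHOLD = 65         # shared apply threshold


def backfill_indeed(listings: list[dict]) -> list[dict]:
    """Add the baseline to indeed-scraped status=new listings in place.

    Returns the listings that qualify after the adjustment.
    """
    qualified = []
    for item in listings:
        if item.get("source") != "indeed-scraped" or item.get("status") != "new":
            continue
        if item.get("baseline_applied"):
            continue  # already adjusted on an earlier run
        item["score"] += INDEED_BASELINE
        item["baseline_applied"] = True
        if item["score"] >= THRESHOLD:
            qualified.append(item)
    return qualified


# Made-up sample pool, not real listings.
pool = [
    {"id": "a", "source": "indeed-scraped", "status": "new", "score": 54},
    {"id": "b", "source": "indeed-scraped", "status": "new", "score": 44},
    {"id": "c", "source": "hn-jobs", "status": "new", "score": 64},
]
newly_qualified = backfill_indeed(pool)
print([item["id"] for item in newly_qualified])  # ['a']
```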

The actual listings

score=69  Canary Technologies  Senior Platform Software Engineer
score=69  NinjaOne             Senior Software Engineer, C++
score=69  Hilton               Cloud & AI Platform Architect (contract)
score=69  Sentara Health       Senior AI Engineer (Remote)
score=69  Amatriot Group       Senior Systems Engineer
score=67  Aledade              Senior Software Engineer (Forward Deployed AI)
score=67  Aledade              Senior Software Engineer I (Forward Deployed AI)

NinjaOne is an endpoint management platform I know well from MSP work. Aledade is building AI for primary care practices. Both are companies where I could write something specific and non-generic.

The cover letter generator will screen these again: if the JD doesn't give it enough to write from, it declines with CANNOT_WRITE. That's expected. The scoring gate is a coarse filter; the Haiku-backed generator is the fine filter.

What I actually learned

Two scoring functions that share a threshold are one scoring function, whether you call them that or not. rescore.py gave HN listings +30 and score_listing() gave Indeed listings +0. For the apply threshold to mean the same thing for both sources, the two scorers have to produce scores in the same ballpark.

The comment "uses a different scorer" was the footgun. It stated a fact but not its implication: if the scorers are calibrated differently, the threshold becomes source-specific in practice even though the code treats it as universal.

The right comment would have been: "indeed-scraped scores start at 0; HN/RemoteOK get +25/+30 as source bonuses. Threshold of 65 means indeed-scraped roles need more keyword density to qualify." That's the constraint that was invisible for six weeks.
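One way to keep that constraint visible in code rather than in a comment is a single table of source bonuses that both scorers read from. This is a sketch of the idea, not what the pipeline does today: the names SOURCE_BONUS and keyword_points_needed are hypothetical, and the values are the ones quoted in this post after the fix.

```python
THRESHOLD = 65  # shared apply threshold

# Single source of truth for per-source bonuses (values from this post).
SOURCE_BONUS = {
    "remoteok": 30,
    "hn-whoishiring": 30,
    "hn-jobs": 25,
    "indeed-scraped": 15,
}


def keyword_points_needed(source: str, threshold: int = THRESHOLD) -> int:
    """Points a listing must earn from its content to reach the threshold."""
    return max(0, threshold - SOURCE_BONUS.get(source, 0))


for source in SOURCE_BONUS:
    print(source, keyword_points_needed(source))
# remoteok 35
# hn-whoishiring 35
# hn-jobs 40
# indeed-scraped 50
```

With the table in one place, the per-source gap shows up as a number instead of an implication.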

One-line bugs are rarely one-line problems. The score = 0 line was three files away from the s += 0 line in rescore.py. Neither was obviously wrong in isolation. The mismatch only became visible when I asked "why are the Indeed listings never qualifying?" instead of "why aren't there more good listings?"
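That question can be asked of the data directly. A per-source pass rate would have shown the divergence on day one. The sketch below assumes each listing is a dict with source and score keys; pass_rate_by_source is a hypothetical helper and the sample scores are made up.

```python
from collections import defaultdict


def pass_rate_by_source(listings: list[dict], threshold: int = 65) -> dict[str, float]:
    """Fraction of listings per source whose score meets the threshold."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for item in listings:
        src = item.get("source", "unknown")
        totals[src] += 1
        if item["score"] >= threshold:
            passed[src] += 1
    return {src: passed[src] / totals[src] for src in totals}


# Made-up sample, not real pipeline data.
sample = [
    {"source": "hn-whoishiring", "score": 72},
    {"source": "hn-whoishiring", "score": 58},
    {"source": "indeed-scraped", "score": 44},
    {"source": "indeed-scraped", "score": 36},
]
rates = pass_rate_by_source(sample)
print(rates)  # {'hn-whoishiring': 0.5, 'indeed-scraped': 0.0}
```

A source sitting at or near zero while the others qualify is a calibration smell, not a content problem.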

The pipeline now scrapes Indeed daily at 06:00 UTC, 35 minutes before the apply cron. Fresh listings with the corrected scorer, into the pool before the morning batch.

The actual apply rate improvement will take a few days to measure. But the pool is no longer a black hole.

— Chester