A keyword router that lied about which mode it was in

FILE 0x3A·A KEYWORD ROUTER THAT LIED ABOUT WHICH MODE IT WAS IN

April 30, 2026 · llm, router, prompts, bugs

My assistant has a cheap classifier that routes short, simple questions to a fast model without tools, and harder questions to a tool-capable model. The classifier was mis-routing "what are the new tickets on my board" to the cheap path. Worse, the cheap model then claimed it was in "fallback mode" when it was actually in chat-only mode by design.

What was happening

The router has two keyword lists:

QUICK_KEYWORDS — short factual lookups that should skip tools. Triggered when the message starts with patterns like "what are…" and is under 30 words.
AGENT_KEYWORDS — terms that always force the tool-capable path because they reference systems the fast model can't reach.

The set difference matters: if a message hits a QUICK pattern and doesn't hit any AGENT pattern, it gets cheap-routed. Most of the time that's correct. "What are the time zones in the US" doesn't need tools.
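The decision described above can be sketched in a few lines. This is a minimal reconstruction, not the actual router: the pattern lists are stand-ins with one entry each, `route` and `MAX_QUICK_WORDS` are names invented for the example, and sending everything that isn't QUICK to the agent path is an assumption.

```python
import re

# Stand-in lists with one entry each; the real entries are not shown in this post.
QUICK_PATTERNS = [re.compile(r"^what are\b", re.IGNORECASE)]
AGENT_PATTERNS = [re.compile(r"\bcalendar\b", re.IGNORECASE)]
MAX_QUICK_WORDS = 30  # "under 30 words"


def route(message: str) -> str:
    """Return "agent" or "quick" for a user message."""
    text = message.strip()
    # Any AGENT hit forces the tool-capable path, even for short messages.
    if any(p.search(text) for p in AGENT_PATTERNS):
        return "agent"
    is_short = len(text.split()) < MAX_QUICK_WORDS
    if is_short and any(p.search(text) for p in QUICK_PATTERNS):
        return "quick"
    # Assumption: anything that is not QUICK goes to the tool-capable model.
    return "agent"


print(route("What are the time zones in the US"))     # quick
print(route("what are my calendar events today"))     # agent
print(route("what are the new tickets on my board"))  # quick, which is the bug
```

The last call shows the failure: with no ticket or board vocabulary in the AGENT list, nothing stops the QUICK match.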

But "what are the new tickets on my board" was hitting QUICK (starts with "what are…", under 30 words) and not hitting AGENT — because the agent list didn't include ticket, board, or any of my actual support-system vocabulary. So the cheap model got the question. The cheap model has no tools and no access to that system. It correctly punted, but the way it punted was wrong:

I'm in fallback mode and can't access your tickets.

There's no "fallback mode." That phrasing came from the cheap model's own system prompt, which I'd written months ago with a different mental model. It was telling the user about an internal state that doesn't exist, which made the bug look like a hallucination instead of a routing problem.

What I found

Two changes had to happen in lockstep:

Make the agent list aware of every system the assistant can reach. If a word in the user's message refers to something only a tool can answer, it has to be in AGENT_KEYWORDS so the router knows to take the tool path.
Fix the cheap model's prompt so when it does decline, it describes its actual mode honestly. "I'm running in chat-only mode for this turn — ask me again and I'll route to the tool-capable agent" beats "fallback" by a mile, because the user can act on it.

The fix

AGENT_KEYWORDS = [
    # ... existing entries ...
    r"\bticket(s)?\b",
    r"\bboard\b",
    r"\bpsa\b",
    r"\b(connect|manage)\.com\b",
    # plus the analogous keywords for every other tool the
    # assistant has — memex, signal, mail, todos, calendar, etc.
]
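A quick regression check for the new entries might look like the sketch below. It covers only the four patterns listed above; `forces_agent` is a hypothetical helper, and case-insensitive matching is an assumption about how the router compiles the list.

```python
import re

# Only the four patterns added in the fix; existing entries are omitted.
NEW_AGENT_KEYWORDS = [
    r"\bticket(s)?\b",
    r"\bboard\b",
    r"\bpsa\b",
    r"\b(connect|manage)\.com\b",
]

# Assumption: the router matches case-insensitively.
AGENT_RE = re.compile("|".join(NEW_AGENT_KEYWORDS), re.IGNORECASE)


def forces_agent(message: str) -> bool:
    """True if any agent keyword appears in the message."""
    return AGENT_RE.search(message) is not None


# The message from the bug report now hits the agent list.
assert forces_agent("what are the new tickets on my board")
assert forces_agent("What's open in the PSA?")
# A plain factual question still does not.
assert not forces_agent("What are the time zones in the US")
# Word boundaries keep "dashboard" from matching "board".
assert not forces_agent("what are good dashboard colors")
```

The same word boundaries mean "ticketing" would not match `\bticket(s)?\b`, so inflected forms need their own entries if they matter.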

And in the cheap model's system prompt:

You are operating in chat-only mode for this single turn — no
tools are available. If the user asks for something that needs a
tool, say so plainly and tell them you'll route the question on
re-ask. Do not call this "fallback mode" or "limited mode" — say
chat-only mode and name the tool the question needs.

After both changes, the same question routes to the agent path and returns the tickets. When a tool-needing question still slips past the keyword list and lands on the cheap model, the decline is at least truthful.

What I'd do differently

The smaller lesson: any keyword router has a vocabulary maintenance problem. Every new tool I add expands what AGENT_KEYWORDS needs to know. I should be generating that list from the tool registry instead of hand-maintaining it — there's no reason the source of truth shouldn't be "every tool name and its known synonyms," with the router compiling that into the regex at startup.
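A sketch of what that generation step could look like, assuming a registry where each tool carries a name and a list of synonyms. The `Tool` class, `TOOL_REGISTRY`, `compile_agent_pattern`, and the synonyms themselves are all hypothetical, not the real registry.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical registry entry: a tool name plus the words users call it."""
    name: str
    synonyms: tuple[str, ...] = ()


# Illustrative registry; the synonyms are made up for this sketch.
TOOL_REGISTRY = [
    Tool("tickets", ("ticket", "board", "psa")),
    Tool("calendar", ("meeting", "schedule")),
    Tool("mail", ("email", "inbox")),
]


def compile_agent_pattern(registry: list[Tool]) -> re.Pattern[str]:
    """Build one case-insensitive regex from every tool name and synonym."""
    terms = {t.name for t in registry} | {s for t in registry for s in t.synonyms}
    # Escape each term so it is matched literally; sort for a stable pattern.
    alternation = "|".join(re.escape(term) for term in sorted(terms))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


# Compiled once at startup; adding a tool to the registry updates the router.
AGENT_RE = compile_agent_pattern(TOOL_REGISTRY)

assert AGENT_RE.search("what are the new tickets on my board")
assert not AGENT_RE.search("What are the time zones in the US")
```

The point of the design is that the keyword list can no longer drift from the tool list: a tool without synonyms still contributes its own name.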

The larger one: be careful what your model's system prompt asserts about itself. The cheap model wasn't hallucinating when it said "fallback mode"; it was reciting the words I'd put in front of it. If those words don't reflect the actual runtime state, you've shipped a confidently wrong assistant. Treat the prompt's first-person claims as carefully as you'd treat any other API contract.
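One way to hold a prompt to that contract is a small test that extracts the mode the prompt claims and compares it with the modes the runtime actually has. This is only a sketch: `RUNTIME_MODES`, `MODE_CLAIM`, and `unknown_mode_claims` are invented names, the regex catches just one phrasing of a mode claim, and the old prompt wording is a guess because the original text isn't shown here.

```python
import re

# Assumption: the runtime has exactly these modes; the names are illustrative.
RUNTIME_MODES = {"chat-only", "agent"}

# Matches first-person mode claims such as "You are operating in X mode".
MODE_CLAIM = re.compile(
    r"\byou are (?:operating |running )?in ([\w-]+) mode\b", re.IGNORECASE
)


def unknown_mode_claims(prompt: str) -> set[str]:
    """Return mode names the prompt asserts that the runtime does not have."""
    claimed = {m.group(1).lower() for m in MODE_CLAIM.finditer(prompt)}
    return claimed - RUNTIME_MODES


# Hypothetical old wording; the original prompt text is not shown in this post.
old_prompt = "You are in fallback mode. Tools are unavailable."
new_prompt = (
    "You are operating in chat-only mode for this single turn. "
    "No tools are available."
)

assert unknown_mode_claims(old_prompt) == {"fallback"}
assert unknown_mode_claims(new_prompt) == set()
```

Run in CI against every system prompt, a check like this would have flagged "fallback mode" the day the runtime stopped having one.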