The three failure modes that kill personal AI agents
I've been running a personal AI agent (Cass) for over a year. I know a lot of engineers who have tried to build one and given up. They all fail in the same three ways.
If you're building a personal AI assistant and it's not sticking, you're probably hitting one of these.
Failure mode 1: Memory loss
The symptom: "Why doesn't it remember what I told it?"
The first instinct is to store the last N messages in a database and prepend them to every Claude call. It works. For about two weeks.
Then you tell the agent something important on a Monday and ask about it on a Saturday and it has no idea what you're talking about. The message is in the database. The problem is that "last N messages" is the wrong model for memory.
Humans remember facts, not a rolling transcript of recent conversation. When I text Cass "what's my sister's birthday?" I need her to know that because I told her months ago — not because it came up recently.
The right model has two layers:
- Conversation history: the last 10–20 message pairs, passed to Claude as a messages array. Keeps context within a session. Expires quickly. Don't over-invest here.
- Persistent memory: a separate DynamoDB table of facts about you. Loaded at the start of every call and injected into the system prompt. Things like: your timezone, your mortgage due date, your car VIN, your flight on Thursday. These are true regardless of the last conversation.
The mistake most engineers make: they treat persistent memory like a longer conversation history. It's not. It's a structured fact store. You design it like a schema, not like a log.
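Here is a minimal sketch of the two layers, not Cass's actual code. It assumes an in-memory dict standing in for the DynamoDB table, and the names remember, build_system_prompt, and recent_history are hypothetical. The point it illustrates is that facts are keyed and overwritten like rows in a schema, while conversation history is simply trimmed.

```python
# Stand-in for the persistent fact table (the post uses DynamoDB; a dict keeps
# this sketch self-contained). Keys are chosen up front, like schema columns.
FACTS: dict[str, str] = {}


def remember(key: str, value: str) -> None:
    """Upsert a fact. Writing the same key again overwrites; nothing is appended."""
    FACTS[key] = value


def build_system_prompt(base_prompt: str, facts: dict[str, str]) -> str:
    """Inject every stored fact into the system prompt, regardless of recency."""
    if not facts:
        return base_prompt
    fact_lines = "\n".join(f"- {k}: {v}" for k, v in sorted(facts.items()))
    return f"{base_prompt}\n\nKnown facts about the user:\n{fact_lines}"


def recent_history(messages: list[dict], max_pairs: int = 15) -> list[dict]:
    """Keep roughly the last max_pairs user/assistant pairs of the transcript."""
    if max_pairs <= 0:
        return []
    trimmed = messages[-2 * max_pairs:]
    # Drop leading non-user turns so the trimmed history starts with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

With this split, the birthday question works months later because the answer lives in FACTS and lands in every system prompt, while recent_history stays short no matter how long the transcript grows.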
Failure mode 2: Flaky tools
The symptom: "It keeps failing on the weather API" or "It tried to create the calendar event but got an error and now it's in a weird state."
Tool calls are where agents fall apart in production. External APIs are unreliable. They time out. They return a 200 with an error inside the body. They change their response shape without warning. They throttle you at inconvenient times.
If you let tool failures propagate back to Claude as raw exceptions, bad things happen:
- Claude panics and returns a confused response
- Claude retries forever and costs you money
- Claude silently skips the tool, the user gets no result, and everyone is confused
The fix is a contract. Every tool call in Cass is wrapped in the same interface:
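from typing import Any, Callable

def call_tool(name: str, fn: Callable[..., Any], **kwargs: Any) -> dict:
    """Run a tool and always return a well-formed dict, never a raw exception."""
    try:
        result = fn(**kwargs)
        return {"ok": True, "result": result}
    except TimeoutError:
        # Transient failure: the caller may try again.
        return {"ok": False, "error": "timeout", "retry": True}
    except Exception as e:
        # Anything else: truncate the message so a traceback never reaches the model.
        return {"ok": False, "error": str(e)[:200], "retry": False}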
Claude gets back either {"ok": true, "result": ...} or {"ok": false, "error": "...", "retry": ...}, where retry is true only for timeouts. It never sees a raw traceback. It always gets a well-formed response it can handle.
Simple pattern. Eliminates an entire class of production failures.
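The contract also gives you a place to cap retries, which addresses the "retries forever and costs you money" case. The sketch below is one way to do that, not necessarily how Cass does it. It repeats call_tool so the block is self-contained, and call_tool_with_retry, max_attempts, base_delay, and the injectable sleep parameter are all hypothetical names.

```python
import time
from typing import Any, Callable


def call_tool(name: str, fn: Callable[..., Any], **kwargs: Any) -> dict:
    """The wrapper from the post: always returns a well-formed dict."""
    try:
        return {"ok": True, "result": fn(**kwargs)}
    except TimeoutError:
        return {"ok": False, "error": "timeout", "retry": True}
    except Exception as e:
        return {"ok": False, "error": str(e)[:200], "retry": False}


def call_tool_with_retry(
    name: str,
    fn: Callable[..., Any],
    max_attempts: int = 3,      # assumed cap; tune per tool
    base_delay: float = 0.5,    # assumed seconds before the first retry
    sleep: Callable[[float], Any] = time.sleep,  # injectable so tests don't wait
    **kwargs: Any,
) -> dict:
    """Retry only failures marked retryable, with exponential backoff and a hard cap."""
    response: dict = {"ok": False, "error": "not attempted", "retry": False}
    for attempt in range(1, max_attempts + 1):
        response = call_tool(name, fn, **kwargs)
        if response["ok"] or not response["retry"]:
            return response
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    # Out of attempts: tell the model plainly that retrying will not help.
    return {**response, "retry": False}
```

Because the final response flips retry to false, the model gets an unambiguous signal to stop and report the failure. One caveat: some HTTP clients raise their own timeout exception types rather than the built-in TimeoutError, so a real wrapper may need to catch those as well.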
Failure mode 3: No channel (or the wrong channel)
The symptom: "I have this great agent but I never actually use it because checking on it is inconvenient."
This is the most overlooked failure mode, because it's not a bug. The thing works. You just don't use it.
If your agent lives at a URL you have to visit in a browser, you'll use it twice and then forget about it. The channel — how messages get from you to the agent and back — determines whether the agent is ambient or abandoned.
SMS works because it's in my hand 200 times a day with no app to open. I text Cass the same way I'd text a person. She replies. There's no mode-switching.
But SMS isn't always right:
- Long outputs belong in email (flight itineraries, job digests, weekly summaries)
- Time-sensitive escalations belong in Signal (distinct notification sound, E2E encrypted)
- Voice is right when you're driving
The agent should route to the right channel for each message type. A job digest that arrives as a 3,000-character SMS at 7am is friction. The same digest as an email you read over coffee is not.
This isn't an advanced problem. It just requires you to think about it on day one instead of after you've already wired everything to a single webhook.
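The routing rules above fit in a few lines. This is a rough sketch under stated assumptions: the Channel enum, the pick_channel function, the urgent and driving flags, and the 480-character SMS threshold are all hypothetical, and the precedence (voice, then Signal, then email, then SMS) is one reasonable ordering rather than a description of Cass.

```python
from enum import Enum


class Channel(Enum):
    SMS = "sms"
    EMAIL = "email"
    SIGNAL = "signal"
    VOICE = "voice"


# Assumed cutoff: anything longer than a few SMS segments reads better as email.
SMS_MAX_CHARS = 480


def pick_channel(body: str, urgent: bool = False, driving: bool = False) -> Channel:
    """Choose a delivery channel per message instead of hardcoding one webhook."""
    if driving:
        return Channel.VOICE      # hands-free beats everything else
    if urgent:
        return Channel.SIGNAL     # distinct notification for escalations
    if len(body) > SMS_MAX_CHARS:
        return Channel.EMAIL      # digests, itineraries, weekly summaries
    return Channel.SMS            # default: short, conversational replies
```

Keeping the decision in one function means the rest of the agent only ever calls a single send path, and adding a channel later does not touch the tools.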
What the three modes have in common
None of these are hard problems. They're problems that only show up in production, after the demo worked fine.
Memory loss shows up after two weeks. Tool failures show up when the weather API has its monthly outage. Channel friction shows up the third time you have to open a browser tab to talk to your agent.
The pattern: invest in the infrastructure layer upfront. Memory architecture, tool contracts, channel routing. These are boring compared to the fun part (making Claude do interesting things). They're also what separates an agent that runs for a year from a demo that runs for a week.
I documented all three failure modes (and how to fix them) in a course I'm working on: Build Your Own Cass. Module 2 covers the memory architecture, Module 3 covers the tool contract, Module 4 covers channel routing. It's for engineers who've tried to build a personal AI agent and hit one of these walls.
— Chester