Eight products, one night: what autonomous AI agents actually ship

I've been running Claude autonomously on my homelab overnight, from 00:30 to 06:00, for about six weeks. Every morning when I wake up there's a digest file waiting for me: a log of what shipped, what broke, and what needs my eyes.

I've been meaning to write a full accounting. Here it is.

What it is and isn't

The setup is a ChesterGPT instance (my personal Cass deployment) that fires at 00:30 CDT and runs until 06:00 CDT. The agent gets five and a half hours, a Claude Max subscription, DynamoDB for memory, and a list of priorities in order:

1. Open todos
2. Money-making projects
3. Standing maintenance (blog, CV, mail triage, infra)

It can read anything, write to repos on overnight branches, run tests, query databases, and write the morning log. It cannot deploy to production (with one carve-out), send messages, or spend money.
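
A minimal sketch of how rules like these could be written down as a default-deny policy. This is not the actual ChesterGPT configuration: the Action names, the "overnight/" branch prefix, and the is_allowed helper are all hypothetical, and the one deploy carve-out is left unmodeled.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    READ = "read"
    WRITE_BRANCH = "write_branch"
    RUN_TESTS = "run_tests"
    QUERY_DB = "query_db"
    WRITE_LOG = "write_log"
    DEPLOY_PROD = "deploy_prod"
    SEND_MESSAGE = "send_message"
    SPEND_MONEY = "spend_money"


# Hypothetical branch prefix; the real naming scheme is an assumption.
OVERNIGHT_PREFIX = "overnight/"

# Actions that are allowed with no further conditions.
ALWAYS_ALLOWED = {Action.READ, Action.RUN_TESTS, Action.QUERY_DB, Action.WRITE_LOG}


def is_allowed(action: Action, branch: Optional[str] = None) -> bool:
    """Return True if the overnight agent may perform this action."""
    if action in ALWAYS_ALLOWED:
        return True
    if action is Action.WRITE_BRANCH:
        # Writes are only permitted on overnight branches.
        return branch is not None and branch.startswith(OVERNIGHT_PREFIX)
    # Deploys, outbound messages, and spending are denied by default.
    return False
```

The useful property is the last line: anything not explicitly listed falls through to a refusal, so adding a new capability is a deliberate choice rather than an accident.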

The most interesting thing it does is priority 2.

Eight products

Over the past six weeks of overnight passes, the agent has built eight deployable micro-SaaS products from scratch. All are code-complete, tested, and documented. None are yet making money because they're waiting for me to register domains, run the SAM deploy command, and add Stripe keys.

The products, in the order they were built:

CertWatch — Domain and SSL cert expiry monitor. Free tier: 5 domains, weekly digest. 151 tests. Waiting on me to register certwatch.app and run bash deploy/build_and_deploy.sh.

OSS Pulse — Maintenance health monitor for your open-source dependencies. Computes a 0-100 score per repo, watches GitHub Security Advisories for CVEs. 104 tests. Waiting on osspulse.io.

NightDesk — AI after-hours triage for MSPs. Amazon Connect answers calls, transcribes and classifies urgency, creates ConnectWise tickets. Wakes the on-call tech only for P1s. Waiting on pilot outreach to 5 MSPs.

TicketScope — Chrome extension sidebar for ConnectWise Manage. Opens a panel on each ticket with AI summary, time-entry draft, and similar past tickets. 12 tests.

CW Slack Bridge — Serverless bridge that routes ConnectWise ticket events to Slack channels. Per-customer channel routing. 77 tests.

EverCV — Continuous resume management. Watches your GitHub commits, CW time entries, and daily done-log to maintain an always-current resume. Tailoring endpoint for specific job postings. 100 tests.

Build Your Own Cass — 8-module course on building a personal AI agent from scratch. All modules written. Waiting on Gumroad listing (~15 minutes of my time).

Brand Monitor — Brand mention alerts for founders. Monitors HN, Reddit, GitHub, Google News for any keyword. Sends a weekly digest, fires immediately on spikes. 56 tests.
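
To make "fires immediately on spikes" concrete, here is one way a spike check could work. This is a sketch under my own assumptions, not Brand Monitor's actual code: the is_spike name, the z-score approach, and the threshold and min_mentions defaults are all illustrative.

```python
from statistics import mean, pstdev


def is_spike(history, current, threshold=3.0, min_mentions=5):
    """Flag `current` as a spike if it sits far above the recent baseline.

    history: mention counts from earlier periods (e.g. previous days).
    current: mention count for the period being checked.
    """
    # Too little history or too few mentions to call anything a spike.
    if len(history) < 2 or current < min_mentions:
        return False
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Flat history: any increase over the baseline counts.
        return current > baseline
    # Standard z-score: how many deviations above the baseline we are.
    return (current - baseline) / spread >= threshold
```

The min_mentions floor is there so that a keyword going from one mention to three doesn't page anyone.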

What "code-complete" means

Every product above has:

Working Lambda functions with environment-variable configuration
A SAM template.yaml that deploys the whole stack with one command
Unit tests that pass in a clean environment (no local AWS state required)
A deploy/build_and_deploy.sh one-liner
A landing page (dark theme, pricing table, FAQ)
A customer quick-start guide
A Show HN draft ready to post
A social kit (Twitter thread, LinkedIn post, IH post)
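
For a sense of what the first and third items look like together, here is a minimal sketch of a handle(event) function that reads its settings from environment variables and a test that needs no AWS state. The variable names (TABLE_NAME, FREE_TIER_LIMIT), the request shape, and the load_config helper are assumptions for illustration, not code from any of the eight products.

```python
import json
import os


def load_config(env=os.environ):
    """Read settings from environment variables (names are hypothetical)."""
    return {
        "table_name": env.get("TABLE_NAME", "example-table"),
        "free_tier_limit": int(env.get("FREE_TIER_LIMIT", "5")),
    }


def handle(event, context=None, env=os.environ):
    """Accept a list of domains, rejecting requests over the free-tier limit."""
    config = load_config(env)
    body = json.loads(event.get("body") or "{}")
    domains = body.get("domains", [])
    if len(domains) > config["free_tier_limit"]:
        return {"statusCode": 400, "body": json.dumps({"error": "too many domains"})}
    return {"statusCode": 200, "body": json.dumps({"accepted": len(domains)})}


def test_handle_rejects_over_limit():
    # The environment is passed in as a plain dict, so no AWS setup is needed.
    event = {"body": json.dumps({"domains": ["a.com", "b.com", "c.com"]})}
    response = handle(event, env={"FREE_TIER_LIMIT": "2"})
    assert response["statusCode"] == 400
```

Passing the environment in as an argument is what keeps the test clean: nothing touches the real process environment or a real table.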

What "code-complete" doesn't mean:

It's running anywhere
Anyone has paid for it
The product name is finalized
The domain is registered

The gap between "code-complete" and "live" is about two hours of my time per product. For all eight products, that's roughly sixteen hours of morning work spread over a few weekends.

What the agent actually does well

Test-first discipline. Every function the agent writes comes with a test file. Not because I asked for it — it just does this. The test suites range from 12 to 151 tests per project. They're not perfect, but they're there, and they catch real bugs before I'd find them.

Pattern recognition across projects. By project 4, the agent had developed strong opinions about DynamoDB table design, SAM template structure, and how to build a good handle(event) function. Products 5-8 came out cleaner than 1-4 because the agent had built a mental model of what works.

Uninterrupted focus. Five and a half hours of uninterrupted building time is genuinely different from the same stretch of human work. No Slack, no email, no context switches. The agent compounds its own progress within a session in a way that's harder to replicate during the day.

Documentation. This surprised me. The agent writes better documentation than I do when I'm tired at 2 AM. The customer quick-start guides are clear, the FAQ sections anticipate real questions, the Show HN drafts have real story arcs.

What it doesn't do well

Judgment calls. "Is this the right product?" and "Who exactly is the customer?" are not questions the agent can answer. It executes on the spec I give it. If the spec is wrong — wrong target market, wrong pricing, wrong feature set — it builds the wrong thing very efficiently.

External dependencies. The agent can call read-only APIs during research, but it can't register a domain, create a Stripe account, set up SES identities, or approve an AWS budget increase. Nearly all of the "not yet making money" items in the list above are blocked on something that requires a credit card or a legal click; NightDesk's pilot outreach is blocked because the agent can't send messages.

Validation. Every product above is built for a pain I personally experienced. That's a good prior. But it's one data point. I don't know yet if there are 200 people willing to pay $15/mo for Brand Monitor, or if it's a niche of 12.

The bottleneck

Me.

The agent works five and a half hours and produces deployable work. I wake up, read the log, and then... don't deploy it immediately because I have a day job and clients and a backlog of things I already committed to.

The bottleneck isn't the agent's ability to build. It's my ability to close the loop: to do the roughly two hours of morning work per product that turn a branch into a live product.

I'm writing this partly as accountability. Eight products, roughly two hours each to take live. Some fraction of them will find users. The ones that don't get two months of data and then a decision. The ones that do compound from there.

The code for most of these will end up public once they're deployed. The most interesting one is probably Brand Monitor — built specifically for the HN problem, has a free tier, and the Show HN story writes itself. That one goes up this week.

— Chester