CostWatch: 25 automated scanners for AWS waste you're already paying for
I've been building a tool called CostWatch for the last few weeks. The premise is simple: AWS accounts accumulate waste over time, and most of it is invisible until someone looks.
Here's what a typical AWS account looks like after 18 months of active development:
- 3 ElastiCache clusters from services that were sunset in Q1
- A 500GB EBS volume from a dev environment that was "temporary"
- 2 NAT Gateways in the same VPC from a network redesign
- A SageMaker endpoint that's been idle since the ML experiment ended
- 40 manual RDS snapshots accumulating at $0.095/GB-month
- 6 S3 buckets with abandoned multipart uploads
- 12 Route53 hosted zones with only NS+SOA records
None of this shows up as a budget alert. AWS Cost Explorer tells you what you spent — not what you wasted.
The scanner list
CostWatch now has 25+ scanners, organized by service:
Compute
- EC2 rightsizing (average CPU < 10% over 30 days → suggests next smaller type)
- ECS Fargate idle tasks (no activity in 30 days, or matching non-prod name patterns)
- Lambda unused functions (zero invocations in 30 days)
- Lambda cold start optimizer (avg InitDuration > 1000ms → SnapStart for Java, provisioned concurrency for Python/Node.js)
- Lightsail stopped instances (still billed at $3.50–$160/mo while stopped)
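To make the rightsizing rule concrete, here is a minimal sketch of the "average CPU < 10% over 30 days" check. It is not CostWatch's actual code: the size ladder, the function names, and the input shape (one average-CPU value per day) are assumptions for illustration.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical size ladder for illustration; not every family offers every size.
SIZE_LADDER = ["nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"]

CPU_THRESHOLD_PERCENT = 10.0  # from the rule above: average CPU < 10%
WINDOW_DAYS = 30              # from the rule above: over 30 days


def next_smaller_type(instance_type: str) -> Optional[str]:
    """Return the next smaller size in the same family, or None if there is none."""
    family, _, size = instance_type.partition(".")
    if size not in SIZE_LADDER:
        return None
    index = SIZE_LADDER.index(size)
    if index == 0:
        return None  # already the smallest size on the ladder
    return f"{family}.{SIZE_LADDER[index - 1]}"


def rightsizing_suggestion(instance_type: str,
                           daily_avg_cpu: Sequence[float]) -> Optional[str]:
    """Suggest a smaller type when the 30-day average CPU is under the threshold."""
    if len(daily_avg_cpu) < WINDOW_DAYS:
        return None  # not enough data to cover the full window
    if mean(daily_avg_cpu[-WINDOW_DAYS:]) >= CPU_THRESHOLD_PERCENT:
        return None
    return next_smaller_type(instance_type)
```

A real scanner would also need to confirm that the suggested type exists in that family and region before reporting it.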
Database / Caching
- RDS idle instances (zero connections in 7 days)
- RDS stale manual snapshots (>90 days old)
- ElastiCache idle clusters (zero CurrConnections, covers Redis + Memcached)
- OpenSearch idle domains (zero SearchRate + IndexingRate)
- DynamoDB over-provisioned capacity (consumed < 20% of provisioned on both RCU and WCU)
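The DynamoDB rule is a two-sided threshold: a table is flagged only when both reads and writes are under 20% of what is provisioned. A rough sketch, assuming a hypothetical `TableCapacity` record holding provisioned and average consumed units:

```python
from dataclasses import dataclass

UTILIZATION_THRESHOLD = 0.20  # from the rule above: consumed < 20% of provisioned


@dataclass(frozen=True)
class TableCapacity:
    """Hypothetical record; all four values are capacity units per second."""
    provisioned_rcu: float
    provisioned_wcu: float
    consumed_rcu: float
    consumed_wcu: float


def is_over_provisioned(table: TableCapacity) -> bool:
    """True only when BOTH read and write utilization are under the threshold."""
    if table.provisioned_rcu <= 0 or table.provisioned_wcu <= 0:
        return False  # nothing provisioned (e.g. on-demand tables), nothing to shrink
    read_utilization = table.consumed_rcu / table.provisioned_rcu
    write_utilization = table.consumed_wcu / table.provisioned_wcu
    return (read_utilization < UTILIZATION_THRESHOLD
            and write_utilization < UTILIZATION_THRESHOLD)
```

Requiring both sides to be low avoids flagging a write-heavy table that just happens to be read rarely.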
Storage
- EBS unattached volumes (not attached to any running or stopped instance)
- EBS orphaned snapshots (> 90 days, not referenced by any AMI)
- EBS lifecycle policy checker (volumes with no DLM policy and no recent snapshot)
- S3 incomplete multipart uploads (>7 days old — billed but never finalized)
- ECR orphaned images (never pulled or last pulled >90 days ago)
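As an illustration of the multipart-upload check, the sketch below filters upload records by age. It assumes each record looks like an entry in the `Uploads` list returned by S3's ListMultipartUploads call (`Key`, `UploadId`, and a timezone-aware `Initiated` timestamp). The function name is made up for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping, Optional

MAX_AGE = timedelta(days=7)  # from the rule above: >7 days old


def stale_multipart_uploads(
    uploads: Iterable[Mapping[str, Any]],
    now: Optional[datetime] = None,
) -> List[Mapping[str, Any]]:
    """Return uploads that were initiated more than MAX_AGE ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_AGE
    return [upload for upload in uploads if upload["Initiated"] < cutoff]
```

Passing `now` explicitly keeps the function deterministic in tests.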
Networking
- NAT Gateway cost estimator (< 1GB/week → VPC endpoint candidate; $0.045/GB data processing cost)
- NAT Gateway duplicate detector (two NAT GWs in the same VPC)
- Unattached Elastic IPs ($0.005/hr each)
- Route53 empty hosted zones (only NS+SOA, $0.50/mo each)
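The duplicate NAT Gateway check boils down to grouping gateways by VPC. A minimal sketch, assuming records shaped like the entries EC2's DescribeNatGateways returns (`NatGatewayId`, `VpcId`, `State`). The function name is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def duplicate_nat_gateways(
    nat_gateways: Iterable[Mapping[str, Any]],
) -> Dict[str, List[str]]:
    """Map each VPC ID to its NAT Gateway IDs when it has two or more available ones."""
    by_vpc: Dict[str, List[str]] = defaultdict(list)
    for gateway in nat_gateways:
        if gateway.get("State") != "available":
            continue  # skip gateways that are pending, deleting, deleted, or failed
        by_vpc[gateway["VpcId"]].append(gateway["NatGatewayId"])
    return {vpc: sorted(ids) for vpc, ids in by_vpc.items() if len(ids) >= 2}
```

One NAT Gateway per Availability Zone is a common high-availability layout, so a finding here is a prompt to check, not proof of waste.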
Messaging / Streaming
- Kinesis Data Streams idle (ACTIVE streams, zero throughput, $10.95/shard/mo)
- MSK idle clusters (Kafka, zero BytesIn + BytesOut)
Serverless / API
- API Gateway idle stages (REST + HTTP/WebSocket, zero requests in 7 days)
- Step Functions idle state machines (STANDARD state machines, no executions)
Security / Compliance
- IAM inactive users (no activity for 90+ days; HIGH risk if access keys are still active)
- IAM unused roles (skips service-linked roles)
- Secrets Manager unused secrets (not accessed in 90 days, $0.40/secret/mo)
- CloudWatch silent alarms (no AlarmActions — fires but notifies nobody)
- EC2 Reserved Instance expiration alerts (within 30 days)
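The silent-alarm check is the simplest of the group. A sketch, assuming alarm records shaped like the `MetricAlarms` entries CloudWatch's DescribeAlarms returns (`AlarmName`, `AlarmActions`). The function name is illustrative.

```python
from typing import Any, Iterable, List, Mapping


def silent_alarms(metric_alarms: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return names of alarms with no AlarmActions, i.e. nobody is notified."""
    return [
        alarm["AlarmName"]
        for alarm in metric_alarms
        if not alarm.get("AlarmActions")  # a missing key or an empty list both count
    ]
```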
Unified output
- Weekly savings email: reads all scanner tables, sums findings, sorts by savings desc, sends HTML email with the top opportunity highlighted.
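The aggregation step described above (sum, sort by savings descending, highlight the top item) could look roughly like this. The field names `monthly_savings_usd` and `resource_id` and the function name are assumptions for the sketch, not CostWatch's actual schema.

```python
from typing import Any, Dict, List, Mapping, Sequence


def build_weekly_summary(
    findings_by_scanner: Mapping[str, Sequence[Mapping[str, Any]]],
) -> Dict[str, Any]:
    """Merge findings from every scanner, rank by savings, and total them."""
    all_findings: List[Dict[str, Any]] = [
        {**finding, "scanner": scanner}  # remember which scanner produced it
        for scanner, findings in findings_by_scanner.items()
        for finding in findings
    ]
    ranked = sorted(all_findings,
                    key=lambda f: f["monthly_savings_usd"],
                    reverse=True)
    total = round(sum(f["monthly_savings_usd"] for f in ranked), 2)
    return {
        "total_monthly_savings_usd": total,
        "top_opportunity": ranked[0] if ranked else None,
        "findings": ranked,
    }
```

Rendering the HTML and sending it through SES would sit on top of this summary.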
How it's built
Each scanner is a standalone Lambda function, scheduled via SAM. Each one writes its findings to its own DynamoDB table. A separate aggregator reads all the tables and sends the weekly email through SES.
Tagging a resource with costwatch:ignore=true excludes it from future scans, which is useful for intentional standby infrastructure.
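A sketch of how a scanner might honor the ignore tag, assuming resources carry EC2-style tag lists (`[{"Key": ..., "Value": ...}]`). The helper names are hypothetical, and matching the value case-insensitively is an assumption of this example.

```python
from typing import Any, Iterable, List, Mapping

IGNORE_TAG_KEY = "costwatch:ignore"
IGNORE_TAG_VALUE = "true"


def is_ignored(resource: Mapping[str, Any]) -> bool:
    """True when the resource carries the costwatch:ignore=true tag."""
    for tag in resource.get("Tags") or []:
        value = str(tag.get("Value", "")).strip().lower()  # assumption: case-insensitive
        if tag.get("Key") == IGNORE_TAG_KEY and value == IGNORE_TAG_VALUE:
            return True
    return False


def drop_ignored(resources: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Filter out resources the owner has opted out of scanning."""
    return [resource for resource in resources if not is_ignored(resource)]
```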
The test suite has 1,490+ tests covering every scanner. Each test uses injectable mocks for the AWS clients so there's no live AWS access in CI.
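To show what "injectable mocks" can mean in practice, here is a small sketch: the scanner receives its AWS client as a parameter, so a test can hand it a fake object instead of a real boto3 client. `scan_unattached_volumes` and `FakeEC2Client` are names invented for this example. The only AWS call imitated is EC2's `describe_volumes` with a `status` filter, and pagination is left out for brevity.

```python
from typing import Any, Dict, List


def scan_unattached_volumes(ec2_client: Any) -> List[Dict[str, Any]]:
    """List volumes in the 'available' state, i.e. not attached to any instance."""
    response = ec2_client.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    return [
        {"resource_id": volume["VolumeId"], "size_gib": volume["Size"]}
        for volume in response["Volumes"]
        if not volume.get("Attachments")  # defensive: skip anything still attached
    ]


class FakeEC2Client:
    """Test double that returns canned volumes and records the calls it receives."""

    def __init__(self, volumes: List[Dict[str, Any]]) -> None:
        self._volumes = volumes
        self.calls: List[Dict[str, Any]] = []

    def describe_volumes(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"Volumes": self._volumes}


fake = FakeEC2Client([{"VolumeId": "vol-123", "Size": 500, "Attachments": []}])
findings = scan_unattached_volumes(fake)
```

In production the same function would be called with `boto3.client("ec2")`. In CI it only ever sees the fake.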
The design choice that matters
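CostWatch never modifies the resources it scans. The scanners have no write permissions on them, make no IAM role modifications, and keep no persistent session tokens. The only things CostWatch writes are its own findings and the weekly email. It looks, reports, and stops.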
This matters because cost optimization tools with write access are a supply chain risk. If a scanner can delete EBS volumes, a bug (or a compromise) can delete your data. CostWatch surfaces the findings; you decide what to act on.
What it finds in a typical account
The average first scan on a 12-month-old AWS account finds $800–$1,200/mo in waste. The distribution is usually:
- 60% from idle compute (RDS, ElastiCache, SageMaker) — high-cost resources that weren't decommissioned when the project ended
- 25% from orphaned storage (EBS, RDS snapshots, S3 multipart uploads) — slow accumulation that nobody notices
- 15% from networking waste (NAT GWs, Elastic IPs, empty Route53 zones) — usually from architecture changes
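For a sense of scale, the percentages above can be applied to the $800–$1,200/mo range. The snippet is plain arithmetic on the figures quoted in this post. The function name and category keys are just labels for the example.

```python
from typing import Dict

# Percent shares from the list above.
SHARES_PERCENT: Dict[str, int] = {
    "idle_compute": 60,
    "orphaned_storage": 25,
    "networking": 15,
}


def split_waste(total_monthly_usd: float) -> Dict[str, float]:
    """Split a monthly waste total across the three categories."""
    return {
        name: total_monthly_usd * percent / 100
        for name, percent in SHARES_PERCENT.items()
    }


low_estimate = split_waste(800)
high_estimate = split_waste(1200)
```

That works out to roughly $480–$720/mo from idle compute, $200–$300/mo from orphaned storage, and $120–$180/mo from networking.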
The $19/mo Solo plan pays for itself in the first month if it catches one idle ElastiCache cluster.
CostWatch is in early access at costwatch.io.