aws-costwatch: 25 ways your AWS bill is leaking money
The AWS Cost Explorer is good at showing you what you spent. It's not good at telling you why, or what to do about it.
aws-costwatch fills that gap. It's an autonomous cost scanner with 25 patterns, running as scheduled Lambda functions, writing findings to DynamoDB. It finds the waste your billing dashboard doesn't surface.
What aws-costwatch finds
Compute waste
EC2 idle instance detector. Instances where average CPU utilization is below 5% for 7 consecutive days. AWS doesn't flag these. Your bill just goes up.
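A minimal sketch of that check, assuming the scanner already has one average CPUUtilization value per day for the instance (for example from CloudWatch with a one-day period). The function name and input shape are hypothetical, not the scanner's actual code.

```python
from typing import Sequence

CPU_THRESHOLD_PCT = 5.0  # from the text: average CPU below 5%
WINDOW_DAYS = 7          # from the text: 7 consecutive days


def is_idle(daily_avg_cpu: Sequence[float],
            threshold: float = CPU_THRESHOLD_PCT,
            window: int = WINDOW_DAYS) -> bool:
    """True if each of the most recent `window` daily averages is below `threshold`."""
    if len(daily_avg_cpu) < window:
        return False  # not enough history to call the instance idle
    return all(value < threshold for value in daily_avg_cpu[-window:])
```

A single busy day inside the window resets the verdict, which is what "consecutive" implies here.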
EC2 right-sizing advisor. Identifies instances where sustained low CPU usage suggests a smaller instance type would handle the load. Groups by instance family (m5, c5, r5), estimates monthly savings at on-demand rates.
EC2 stopped instance cost. You still pay for the EBS volumes attached to a stopped instance. A stopped r5.4xlarge costs $0 for compute, but its storage keeps billing. aws-costwatch flags stopped instances, sorted by how long they've been stopped, and estimates the ongoing EBS cost.
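The EBS estimate is simple arithmetic. A rough sketch, assuming a flat $0.08 per GB-month (roughly gp3 in us-east-1; this rate is my assumption, not a number from the scanner) and a hypothetical function name:

```python
from typing import Iterable

# Assumption: gp3 storage in us-east-1. Check your own region and volume type.
ASSUMED_PRICE_PER_GB_MONTH = 0.08


def stopped_instance_ebs_cost(volume_sizes_gib: Iterable[float],
                              price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH) -> float:
    """Estimated monthly storage cost (USD) for the volumes attached to a stopped instance."""
    return round(sum(volume_sizes_gib) * price_per_gb_month, 2)
```

This ignores provisioned IOPS and throughput charges, so treat the result as a floor.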
Lambda cold start patterns. Functions with average init duration > 1000ms. Runtime-aware recommendations: SnapStart for Java, provisioned concurrency for Python/Node.js. Sorted by average init duration.
ECS Fargate idle tasks. Tasks running with average CPU below threshold for 7 days. Fargate's per-vCPU pricing means idle tasks are expensive idle.
Storage waste
S3 Intelligent-Tiering gaps. Buckets larger than 10 GB without Intelligent-Tiering lifecycle rules. S3-IT moves objects automatically between its frequent-access and infrequent-access tiers based on access patterns, for an average 30% savings on cold data. Most teams haven't enabled it.
EBS unattached volumes. Volumes not attached to any instance. These are a common artifact of terminated instances whose volumes were set to "preserve on termination."
DynamoDB capacity waste. Provisioned-mode tables where both RCU and WCU consumed < 20% of provisioned for 7 days. The table is paying for 5× what it needs. PAY_PER_REQUEST mode would cost a fraction.
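A sketch of the utilization test, assuming the inputs are average consumed and provisioned capacity units per second over the 7-day window. The function name and signature are hypothetical.

```python
def is_overprovisioned(consumed_rcu: float, provisioned_rcu: float,
                       consumed_wcu: float, provisioned_wcu: float,
                       threshold: float = 0.20) -> bool:
    """True only if BOTH read and write utilization are below `threshold`."""
    if provisioned_rcu <= 0 or provisioned_wcu <= 0:
        return False  # no provisioned capacity to compare against (e.g. on-demand table)
    read_utilization = consumed_rcu / provisioned_rcu
    write_utilization = consumed_wcu / provisioned_wcu
    return read_utilization < threshold and write_utilization < threshold
```

Requiring both sides to be low avoids flagging a table that is read-light but write-heavy, or the reverse.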
S3 versioning cost. Buckets with versioning enabled and significant old-version accumulation. Old versions persist until explicitly deleted or transitioned. Without lifecycle rules for old versions, versioned buckets grow indefinitely.
Networking
VPC NAT Gateway cost analyzer. NAT Gateways cost $0.045 per GB of data processed, plus $0.045/hour for the gateway itself. Gateways processing < 1 GB/week are paying almost entirely for the hourly charge, and they are candidates for VPC endpoints when that traffic goes to AWS services (gateway endpoints for S3 and DynamoDB carry no charge).
Elastic IP waste. Unattached Elastic IPs cost $0.005/hour = $3.60/month each. Since February 2024 AWS charges that rate for every public IPv4 address, attached or not, so an unattached address is pure waste: a small but ongoing charge that accumulates.
Unused load balancers. ALBs and NLBs with no healthy targets attached. Common after blue-green deployments where the old ALB was left running. Each ALB costs ~$16+/month in hourly charges alone, before any LCU usage.
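To see how the two charges combine, here is a small worked calculation using the rates quoted above. The 720-hour month and the function name are my assumptions.

```python
HOURLY_RATE = 0.045    # USD per gateway-hour, from the text
PER_GB_RATE = 0.045    # USD per GB processed, from the text
HOURS_PER_MONTH = 720  # assumption: a 30-day month


def nat_gateway_monthly_cost(gb_processed_per_month: float) -> float:
    """Fixed hourly charge plus data-processing charge, in USD per month."""
    fixed = HOURLY_RATE * HOURS_PER_MONTH
    variable = PER_GB_RATE * gb_processed_per_month
    return round(fixed + variable, 2)
```

At roughly 1 GB/week (about 4 GB/month), the data charge is 18 cents against a fixed charge of $32.40. The gateway is almost all standing cost.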
Monitoring & logging
CloudWatch log retention gaps. Log groups with no retention policy (retain forever). At $0.03/GB/month, log groups without retention become a significant cost over time. Common offenders: /aws/lambda/* groups from old functions, /aws/batch groups from one-time jobs.
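A sketch of the filter, assuming the input is a list of dicts shaped like the log group entries CloudWatch Logs returns when you describe log groups, where `retentionInDays` is absent for groups that never expire. The function name and the decimal-GB conversion are my assumptions.

```python
from typing import Any, Dict, List

PRICE_PER_GB_MONTH = 0.03  # from the text


def groups_without_retention(log_groups: List[Dict[str, Any]],
                             price_per_gb_month: float = PRICE_PER_GB_MONTH) -> List[Dict[str, Any]]:
    """Log groups with no retention policy, most expensive first."""
    findings = []
    for group in log_groups:
        if "retentionInDays" in group:
            continue  # a retention policy is set; not a finding
        stored_gb = group.get("storedBytes", 0) / 1e9  # assumption: decimal GB
        findings.append({
            "logGroupName": group["logGroupName"],
            "monthly_cost": round(stored_gb * price_per_gb_month, 2),
        })
    return sorted(findings, key=lambda f: f["monthly_cost"], reverse=True)
```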
CloudWatch custom metric sprawl. High custom metric counts from non-essential namespaces. Each custom metric costs $0.30/month after the 10 free metrics. A poorly configured APM agent can generate thousands.
Security & access
IAM unused role detector. Roles with no activity in the last 90 days via IAM Access Advisor. Not directly a cost driver, but unused roles increase the blast radius of a compromise and are a compliance finding in most frameworks.
Secrets Manager unused secrets. Secrets not accessed in 30+ days. Every secret costs $0.40/month whether or not anything reads it. Common accumulation pattern: dev/test secrets from dead projects, rotated secrets where the old ARN was never cleaned up.
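A sketch of the staleness check, assuming each secret is a dict with a `Name` and an optional timezone-aware `LastAccessedDate`, as in the Secrets Manager list response. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def unused_secrets(secrets: List[Dict[str, Any]],
                   now: datetime,
                   max_idle_days: int = 30) -> List[str]:
    """Names of secrets with no recorded access in the last `max_idle_days` days."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for secret in secrets:
        last_access = secret.get("LastAccessedDate")
        # A missing date means the secret has no recorded access. This also
        # catches brand-new secrets, so a real scanner might check creation date too.
        if last_access is None or last_access < cutoff:
            stale.append(secret["Name"])
    return stale
```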
Database
RDS idle instance detector. RDS instances with sustained low CPU and no connections. A stopped RDS instance can be started again in minutes; there's no reason to pay for running it if it's not in use.
RDS Multi-AZ review. Multi-AZ roughly doubles the cost of an RDS instance. For development instances (identified by naming convention or tag), Multi-AZ is usually unnecessary.
Aurora cluster rightsizing. Aurora ACUs set too high for the actual workload. Aurora Serverless v2 adjusts automatically, but the min ACU setting determines the floor cost.
Application-level
Lambda cold start by runtime. Broken down by runtime (Python, Node.js, Java, Go, etc.) with runtime-specific optimization paths. Java cold starts are an order of magnitude worse than Go; the fix is different.
API Gateway cache hit rate. Caches where the hit rate is < 20% aren't paying for themselves. API Gateway caching is billed hourly by cache size, starting at $0.02/hour for the smallest (0.5 GB) cache and rising steeply for larger ones; a cache that's always missing might as well be off.
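The hit rate itself can be derived from the CacheHitCount and CacheMissCount metrics that API Gateway publishes to CloudWatch. A minimal sketch, with hypothetical function names and the counts assumed to be sums over the evaluation window:

```python
from typing import Optional


def cache_hit_rate(hits: int, misses: int) -> Optional[float]:
    """Fraction of cacheable requests served from cache, or None if there was no traffic."""
    total = hits + misses
    return hits / total if total else None


def cache_is_underused(hits: int, misses: int, threshold: float = 0.20) -> bool:
    """True if the cache saw traffic and its hit rate is below `threshold`."""
    rate = cache_hit_rate(hits, misses)
    return rate is not None and rate < threshold
```

Returning None for zero traffic keeps an unused stage from being reported as a low-hit-rate cache; that case is a different finding.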
CloudFront origin request rate. Distributions where most requests are hitting origin rather than cache. A high cache miss rate on CloudFront means you're paying for both CDN and origin compute on every request.
Step Functions state transitions. Workflows with high state transition counts per execution. Standard workflows charge per state transition (Express workflows bill by request count and duration instead), so Standard workflows with many short-lived states can generate significant charges.
Scanning infrastructure
AWS Config rules cost. Config rules that evaluate frequently on many resources accumulate evaluation charges. Rules that rarely surface a non-compliant resource should be reviewed.
Glue job cost efficiency. Glue jobs where DPU hours × run frequency exceeds a threshold relative to apparent utility (estimated by job run frequency and data processed).
Current state
1,584 tests across 25 scanners. Each scanner is a standalone Lambda function with a DynamoDB backing table. Daily cron schedule for each scanner, with a consolidated digest emailed every morning.
The monthly cost to run aws-costwatch (the tool itself) for a mid-sized AWS account is < $5/month. The median finding from the scanner set is a $200-800/month saving.
The pricing target is $49/month for a single AWS account, $199/month for up to 10 accounts. The positioning: if aws-costwatch doesn't find at least $49/month in savings in the first 30 days, you get a refund.
Like the other tools in this portfolio, the product is built. The deployment is the next step.
The week ahead
The IAM lockdown blocking overnight deploys gets removed Monday. Then:
- sam deploy --guided from the aws-costwatch directory
- Add the first account's AWS credentials to Parameter Store
- Watch the first scan run
The findings after the first week against my own account will be the case study. I already know what it'll find — the $97 June AWS bill includes a Backup cost I haven't fully explained yet.
That's the job.