You Are Paying Too Much for Over-Provisioned AWS Resources

Most teams know they use more AWS resources than they need. Few know how much they waste. Here is where the extra cost comes from - and why CloudWatch alone will not show it.

Over-provisioning is normal in almost every AWS environment we review. It makes sense: no one gets woken up at 3am because they bought too much capacity. Teams choose safety over savings. The result is that most teams pay 30-50% more than needed. And the waste is not in the places they expect.

Where the waste really is

The easy targets - unused instances and unattached EBS volumes - are usually a small part of total waste. The bigger costs are in places that are hard to see. You need to compare billing data with usage data to find them.

  • EC2 instances using only 5-15% average CPU. They are not idle, just far too large. Switching from m5.4xlarge to m6i.xlarge often saves 60-70% per instance with no performance loss.
  • RDS instances sized for peak load that only happens once a month. The fix is not to keep the primary database at peak size 24/7. Use Performance Insights, Enhanced Monitoring, and CloudWatch to find the real bottleneck (CPU, memory, IOPS, connections, or slow queries), right-size for the steady baseline, and handle the monthly peak separately: schedule a temporary class increase during a maintenance window, move reporting/batch load to a read replica, or use Aurora Serverless v2 for variable Aurora workloads. After the shape is right, cover the stable baseline with RDS RIs or Database Savings Plans.
  • EBS gp2 volumes that should be gp3. gp3 costs less and performs better. The switch does not cause downtime and saves 20% right away.
  • NAT Gateway data processing charges. Teams send all outbound traffic through NAT without noticing the per-GB processing charge. Add gateway VPC endpoints for S3 and DynamoDB by default - they have no additional hourly or data processing charge and keep that traffic off the NAT Gateway. For the remaining traffic, monitor NAT Gateway bytes and use VPC Flow Logs to identify the top destinations. If most traffic goes to AWS services, add the right interface endpoints. If a service is mostly doing high-volume public internet egress and does not require private-subnet isolation, move that egress path to an internet-facing design so you are not paying NAT processing on every GB.
  • Elastic IPs attached to stopped instances. Each one costs $3.65/month. One is small, but we have seen accounts with 50 or more unused EIPs.
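To see how these items add up, here is a back-of-the-envelope sketch. The prices are illustrative us-east-1 on-demand rates (assumptions, not pulled from the AWS price list); plug in your own region's numbers before acting on the result.

```python
# Rough monthly waste estimate for the items above.
# All prices are illustrative assumptions, not live AWS pricing.

HOURS_PER_MONTH = 730

def gp2_to_gp3_savings(gib_provisioned: float,
                       gp2_price: float = 0.10,
                       gp3_price: float = 0.08) -> float:
    """Monthly $ saved by migrating gp2 storage to gp3 (per-GiB-month prices)."""
    return gib_provisioned * (gp2_price - gp3_price)

def idle_eip_cost(count: int, hourly_price: float = 0.005) -> float:
    """Monthly $ spent on Elastic IPs not attached to a running instance."""
    return count * hourly_price * HOURS_PER_MONTH

def rightsizing_savings(current_hourly: float, target_hourly: float) -> float:
    """Monthly $ saved by moving one instance to a smaller/newer class."""
    return (current_hourly - target_hourly) * HOURS_PER_MONTH

# Example: 10 TiB of gp2, 50 idle EIPs, one m5.4xlarge -> m6i.xlarge
total = (gp2_to_gp3_savings(10_240)
         + idle_eip_cost(50)
         + rightsizing_savings(0.768, 0.192))
print(f"${total:,.2f}/month")  # -> $807.78/month
```

Even with conservative example numbers, three line items in one account add up to hundreds of dollars a month before touching the larger EC2 and RDS fleet.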

Measure the right signals before resizing

Right-sizing fails when teams only look at the default metrics. CPU is useful, but it is not enough. A low-CPU instance can still be constrained by memory, disk space, disk I/O, network throughput, database locks, or slow SQL. Before changing instance sizes, collect the metrics that show what is actually limiting the workload.

For RDS and Aurora, enable Performance Insights or CloudWatch Database Insights where available. Look at DB load, wait events, top SQL, hosts, and users before you decide whether the database is too large or just badly tuned. A query that needs an index should not be solved by running a bigger database forever.

For EC2, install the CloudWatch Agent when memory and disk utilization are missing from the default view. The agent can publish memory, swap, disk used percent, disk free, and disk I/O metrics. Those metrics are often the difference between a safe resize and a guess. Only resize when CPU, memory, disk, network, and application latency all support the change.
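A minimal CloudWatch Agent configuration sketch for exactly those gaps: memory, swap, disk used percent, disk free, and disk I/O. The measurement names below are standard agent fields; adjust the `resources` filter to the mount points you care about.

```json
{
  "metrics": {
    "metrics_collected": {
      "mem":    { "measurement": ["mem_used_percent"] },
      "swap":   { "measurement": ["swap_used_percent"] },
      "disk":   { "measurement": ["used_percent", "free"], "resources": ["*"] },
      "diskio": { "measurement": ["reads", "writes", "read_bytes", "write_bytes"] }
    }
  }
}
```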

A step-by-step right-sizing process

  1. Tag everything. You cannot assign costs to teams without tags. Use SCPs or Config rules to require tagging.
  2. Collect 2-4 weeks of usage data for CPU, memory, network, disk, and database load. Use Performance Insights or CloudWatch Database Insights for RDS/Aurora, and install the CloudWatch Agent on EC2 instances where memory and disk utilization are missing.
  3. Find resources where peak usage (not average) is below 40% of total capacity. These are safe to make smaller.
  4. Resize in small groups. Change 3-5 instances, watch them for a week, then do more. Do not change 50 instances at the same time.
  5. Set up alerts for the metrics that matter. If latency goes up after a resize, you will know in hours, not weeks.
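Step 3 above can be sketched as a small filter. The datapoints here are hard-coded to keep the example self-contained; in practice they would come from CloudWatch (for example, GetMetricData over the 2-4 week window from step 2). Using a high percentile instead of the raw maximum is a judgment call: it keeps one monitoring blip from disqualifying an otherwise idle instance.

```python
# Sketch of step 3: flag resources whose *peak* utilization stays under 40%.
from statistics import mean

def peak(datapoints, percentile=0.95):
    """High-percentile 'peak' so a single blip does not dominate."""
    ordered = sorted(datapoints)
    idx = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[idx]

def downsize_candidates(utilization_by_resource, threshold=40.0):
    """Return resources safe to shrink: peak utilization below the threshold."""
    return [name for name, points in utilization_by_resource.items()
            if peak(points) < threshold]

metrics = {
    "web-1":   [5, 8, 12, 9, 35, 7, 6],       # peak under 40% -> candidate
    "batch-1": [10, 15, 80, 12, 11, 9, 14],   # monthly spike to 80% -> keep
}
print(downsize_candidates(metrics))                       # -> ['web-1']
print(f"batch-1 average: {mean(metrics['batch-1']):.1f}%")  # -> 21.6%
```

Note the second print: batch-1 averages well under 40%, which is exactly why filtering on averages alone would mark it as safe to shrink when it is not.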

How Finoptic turns this into one FinOps workflow

Finoptic is built as a one-stop shop for FinOps, not just another cost dashboard. Out of the box, it gives you dashboards for spend, usage, commitment coverage, waste, anomalies, and ownership across accounts and teams. That is enough to answer most cost questions quickly: where the money is going, which services changed, which teams own the spend, and which resources are likely over-provisioned.

When the first dashboard is not enough, Finoptic can go deeper with investigation plugins and automations. For example, if NAT Gateway cost is the issue, Finoptic can help enable VPC Flow Logs across the AWS Organization, collect the data, and ship a dedicated investigation dashboard that shows which workloads, destinations, and services are driving the traffic. The same model applies to other deep investigations: add the missing telemetry, automate the collection, and give the team a focused dashboard for that specific issue.

That is the difference from generic FinOps tools that stop at billing data. Finoptic combines billing visibility, cloud telemetry, automation, and purpose-built investigation dashboards, so teams can move from "we see a spike" to "we know what caused it and what to change."

If you think your environment is over-provisioned but do not have the data to prove it, that is what our free cost assessment is for. We connect Finoptic, run the analysis, and show you the exact dollar amount - usually within 48 hours.

Want to talk through this for your stack?

Free 30-minute call. No commitment.

Book a call

More from the field

  • Reserved Instances vs. Savings Plans - Which One Saves You More Money
  • SOC 2 on AWS: The Controls That Get Audited
  • Terraform at Scale: Module Patterns That Work