Every quarter, the same conversation happens in engineering organizations: infrastructure costs are climbing 30% again. Kubernetes, the orchestration layer powering modern cloud infrastructure, is the primary driver. Someone pulls metrics showing 60% CPU overprovisioning. Teams promise to "right-size" their deployments. A Slack channel gets created. Best practices get documented. Nothing changes.
Six months later, costs are higher. The cycle repeats.
This pattern is remarkably consistent across organizations. According to Datadog’s State of Cloud Costs report, 83% of container costs are wasted on idle resources, split between overprovisioned infrastructure and resource requests that far exceed actual usage. Meanwhile, analysis from cloud cost management platforms shows the average Kubernetes cluster runs at only 13 to 25% CPU utilization and 18 to 35% memory utilization.
Kubernetes has become the heartbeat of modern infrastructure. It powers how applications scale, how services communicate, and how engineering teams ship software. But this heartbeat is both your strongest asset and your most expensive liability. The system that enables your infrastructure to run reliably is the same system creating structural waste that compounds quarter over quarter.
This is not a story about lazy engineers or poor tooling. It is not about teams that "do not care" about cost or platform teams who are not doing their jobs. It is a story about incentives and how the orchestration layer that runs your infrastructure produces waste by design.
Kubernetes governs how scarce resources are allocated under uncertainty, without price signals, across multiple actors with misaligned incentives. That is not a metaphor. It is the literal definition of an economic system. And like any economic system, Kubernetes produces rational outcomes based on the incentives it creates.
The problem is that those incentives reward behaviors that appear wasteful from the outside, but make perfect sense to the engineers making allocation decisions.
Kubernetes Solved the Right Problem (For Its Time)
Let's be clear: Kubernetes was never designed to optimize for cost. It was built to solve a different problem entirely: enabling modern infrastructure to scale reliably.
Before Kubernetes, teams were drowning in custom deployment scripts built on tools like Puppet and Ansible, manual capacity planning, and increasingly fragile infrastructure. Google released Kubernetes, based on lessons learned from Borg, its internal orchestration system, with the specific goal of enabling the reliable deployment and management of containerized applications across distributed infrastructure.
The value proposition was clear: standardize infrastructure operations, abstract away the complexity of distributed systems, and let engineering teams focus on building features rather than managing servers.
Efficiency was not the priority. Enabling modern application architecture was. And for good reason. In the mid-2010s, the ability to deploy microservices, scale automatically, and orchestrate containers represented a fundamental leap in what infrastructure could do. The cost of running that infrastructure was secondary to the capability it unlocked.
But the system we inherited carries those priorities forward, even as the context has changed. Kubernetes is no longer optional for modern infrastructure. It has become the standard orchestration layer on which everything else depends. Your CI/CD pipelines run on it. Your production services run on it. Your data processing workloads run on it. The infrastructure you built to enable scale has become one of your largest operating expenses.
Cloud costs have exploded. Infrastructure spend is now a material line item that impacts business margins. Yet the fundamental allocation mechanism (how Kubernetes decides who gets what resources) still operates without any awareness of cost.
As one analysis found, typical overprovisioning factors range from 2 to 5 times actual resource needs, with annual waste per cluster ranging from $50,000 to $500,000 depending on cluster size. The orchestration layer powering your infrastructure is operating as designed. The design was not optimized for the constraints that infrastructure teams face today.
Every Economic System Has Incentives
Here is what most people miss when discussing Kubernetes cost optimization: you cannot fix a problem that is working as designed.
In traditional markets, prices create feedback loops. When demand increases, prices rise, signaling scarcity and encouraging more efficient use. When someone wastes resources, they pay for it directly and immediately. This feedback shapes behavior over time.
Kubernetes has no prices. It has resource requests and limits, but these are technical abstractions, not economic signals. An engineer who sets requests.cpu: 2000m does not see a cost. They see a configuration parameter that affects scheduling. The actual cost (the dollars spent on underlying compute) is abstracted away, delayed, and aggregated at a level where individual decisions become invisible. In reality, what that 2000m costs depends on whether it runs on AWS or Azure, whether the node is an on-demand or spot instance, and, of course, on the instance type.
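To make that abstraction concrete, here is a minimal sketch of the arithmetic the scheduler never surfaces. The per-vCPU-hour figures below are placeholder assumptions, not actual provider rates; the point is only that the same requests.cpu: 2000m translates into very different dollar amounts depending on where it lands.

```python
# Illustrative only: the same 2000m CPU request maps to different dollar
# amounts depending on placement. Prices below are rough placeholder figures,
# not actual AWS/Azure rates.
HOURS_PER_MONTH = 730

# Assumed effective price per vCPU-hour for a few hypothetical placements.
price_per_vcpu_hour = {
    "on-demand, general purpose": 0.048,
    "on-demand, compute optimized": 0.054,
    "spot, general purpose": 0.017,
}

request_vcpus = 2.0  # requests.cpu: 2000m

for placement, price in price_per_vcpu_hour.items():
    monthly = request_vcpus * price * HOURS_PER_MONTH
    print(f"{placement:32s} ~${monthly:,.0f}/month, ~${monthly * 12:,.0f}/year")
```

None of these numbers is visible at the moment the request is written into a manifest, which is exactly the problem.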
This creates a specific type of incentive structure:
- Outages are immediate and visible. They trigger incidents, wake people up at 3 am, and generate post-mortems that executives read.
- Waste is delayed and invisible. It appears in finance reports weeks later, aggregated across hundreds of services, with no clear attribution.
- Reliability is measured and rewarded. Uptime SLAs, incident counts, and MTTR are common engineering metrics.
- Efficiency is not. Few engineers have "reduce infrastructure cost by 20%" as a performance goal.
Given these incentives, what would a rational engineer do?
Why Overprovisioning Is Rational
Let's walk through a common scenario. You are deploying a new service to production. You need to set resource requests.
You could run thorough capacity tests, analyze real usage patterns, and set requests to match actual needs. This is the "right" approach in theory. But let's examine what that actually involves:
- Spinning up realistic load testing environments
- Generating production-like traffic patterns (including rare spikes)
- Instrumenting detailed metrics to understand resource consumption
- Running multiple test cycles to account for variability
- Documenting your methodology for future reference
- Getting sign-off from SRE or platform teams
This takes days or weeks. And even after all that work, you are still making an educated guess about future behavior. Traffic patterns change. Dependencies add latency. A database slowdown creates memory pressure you did not anticipate. If you underestimate and the service gets throttled or OOMKilled during a spike, you own the outage. And more often than not, running Kubernetes infrastructure expertly is not the business your company is actually in, so the effort is easy to dismiss as wasted energy.
Alternatively, you could look at what similar services use, add 50% headroom "just to be safe," and move on with your day. The deployment succeeds. The service runs fine. If it never uses those resources, no one will notice or care. There is no post-mortem for efficiency. Most importantly, the product launch or feature announcement will proceed as planned.
The rational choice is obvious: overprovision.
This is not negligence. It is risk management inside a system where the cost of being wrong in one direction (underprovisioning) is vastly higher than the cost of being wrong in the other (overprovisioning). As one DevOps engineer explained in discussing Kubernetes overprovisioning behavior: "Teams prioritize uptime over cost efficiency, especially in high-stakes environments. Overprovisioning acts as a safety net to buffer against traffic spikes or component failures."
Engineers optimize for the metric on which they are measured. And that metric is reliability, not efficiency.
Consider the career calculus: an outage that you caused by underprovisioning is a resume-generating event. It is visible and documented, and it directly impacts your reputation. Overprovisioning by 2x might waste $50,000 a year, but that cost is invisible, distributed, and rationalized as "being conservative." No engineer has ever been fired for overprovisioning.
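Here is a back-of-the-envelope sketch of that calculus. Every number in it is an assumption chosen for illustration, but it captures the asymmetry: padding can cost the company more in total while the portion attributed to the individual engineer stays close to zero.

```python
# Back-of-the-envelope sketch of the career calculus, with made-up numbers.
# An engineer weighing "right-size" against "pad by 2x" faces an asymmetric bet.

outage_cost = 250_000            # assumed cost of one throttling/OOM-driven incident
outage_prob_right_sized = 0.10   # assumed yearly chance of an incident if sized tightly
outage_prob_padded = 0.01        # assumed yearly chance if padded 2x
annual_waste_padded = 50_000     # assumed extra spend created by the 2x padding

expected_cost_right_sized = outage_prob_right_sized * outage_cost
expected_cost_padded = outage_prob_padded * outage_cost + annual_waste_padded

print(f"Right-sized: ~${expected_cost_right_sized:,.0f}/year expected, "
      f"visible and attributed directly to the engineer")
print(f"Padded 2x:   ~${expected_cost_padded:,.0f}/year expected, "
      f"almost all of it invisible and unattributed")
```

With these assumptions the padded option is actually more expensive for the company, yet it remains the safer personal choice, because nearly all of its cost lands where no one is looking.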
The problem compounds because resource requests, once set, are rarely revisited. Teams set them during initial deployment, when uncertainty is greatest and the stakes feel highest. Then they move on. Kubernetes does not send a signal when actual usage is 40% of requests. The workload runs smoothly, costs remain hidden within the overall infrastructure budget, and there is no trigger to revisit the decision.
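If you want that signal, you have to build it yourself. Below is a minimal sketch of the comparison Kubernetes never surfaces, assuming you have already pulled requested and observed CPU per workload from your metrics pipeline; the workload names and figures are invented for illustration.

```python
# Minimal sketch of the signal Kubernetes never sends: requested vs. observed CPU
# per workload. Assumes requested cores come from pod specs and observed cores
# from your usage metrics; the workloads and numbers below are illustrative.

workloads = [
    # (name, requested CPU cores, observed p95 CPU cores)
    ("checkout-api", 4.0, 1.3),
    ("search-indexer", 8.0, 2.9),
    ("batch-etl", 16.0, 6.0),
]

THRESHOLD = 0.5  # flag anything using less than half of what it reserves

for name, requested, observed in workloads:
    utilization = observed / requested
    flag = "  <-- revisit requests" if utilization < THRESHOLD else ""
    print(f"{name:16s} requested={requested:5.1f}  observed p95={observed:5.1f}  "
          f"utilization={utilization:5.0%}{flag}")
```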
Even when teams do want to optimize, they face asymmetric information. Lowering resource requests requires confidence that you understand peak load patterns, but those patterns are often opaque. You can see the average utilization, but what about the spike that occurs twice a year during the sales event? What about the gradual growth that might exceed your new limits in three months?
In traditional markets, prices solve this by making the cost of holding excess capacity continuous and visible. In Kubernetes, that cost is abstracted away. So capacity sits reserved "just in case," even when "just in case" never comes (or usually comes once or twice a year).
The Tragedy of the Commons Inside Clusters
Shared Kubernetes clusters create another dynamic: the tragedy of the commons. This is a classic economic problem where individually rational behavior leads to collective waste. Garrett Hardin's original formulation described how shared pastures become overgrazed, not through malice, but through the logic of individual optimization.
Kubernetes clusters are digital pastures. Each team has access to a shared resource pool, and each team's incentive is to secure as much as they need, plus a buffer, because the cluster is "free" from their perspective. The actual cost is paid centrally and does not appear in team budgets.
As one analysis of cloud infrastructure expenses as a tragedy of the commons explains: "The nature of cloud computing means that engineers can easily scale their demands on the infrastructure as needed... Without constraints in place, developers can continue to scale their demands on the infrastructure, justifying the action as required to meet their team's deliverables and deadlines, while inadvertently impacting the company's finances."
In a shared cluster, no single team bears the full cost. Each team knows its own resource requests, but they do not see how their decisions contribute to overall utilization. The cluster is a shared pool, and everyone takes what they need, plus a little extra, just in case.
Consider what happens in a typical organization:
The ML team reserves 8 GPUs for training jobs that run twice a week because reserving them is easier than managing scheduling conflicts. They reason: "If we do not reserve them, someone else will, and then we will be blocked." The GPUs sit idle 70% of the time, but from the ML team's perspective, this is the cost of running their jobs on demand.
The data team sets high memory requests for batch jobs "to handle edge cases," even though typical usage is much lower. They have been burned by OOMKills before, so they set requests to P99 usage, not P50. The memory sits reserved but unused most of the time.
The platform team overprovisions the cluster, anticipating growth that may or may not materialize. They are measured on uptime, not cost, so they would rather have excess capacity than risk being blamed for resource constraints. They add nodes "to be safe."
The API teams all follow similar patterns: high CPU requests because "our service is latency-sensitive," high memory requests because "we do caching," and generous limits because "what if there is a traffic spike."
Each decision is defensible in isolation. But when 50 teams make similar decisions, the cluster becomes a collection of reserved capacity that is collectively underutilized. Industry data indicate the typical cluster runs at 30-40% actual utilization, though at DevZero, we typically see production clusters operating at just 5 to 15%. Meanwhile, teams simultaneously report constant resource pressure and "not enough capacity."
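A quick sketch shows how that compounding works. The team names, requests, and usage figures below are invented, but the pattern is the one described above: nearly all of the cluster is spoken for, only a fraction of it is actually used, and a new workload still cannot be scheduled.

```python
# Sketch of how individually defensible padding compounds at the cluster level.
# Team names, requests, and usage figures are invented for illustration.

cluster_allocatable_cores = 400

team_requests_vs_usage = {
    # team: (requested cores, typical observed cores)
    "ml-training": (64, 18),
    "data-batch": (96, 30),
    "api-services": (160, 55),
    "platform-buffer": (60, 10),
}

total_requested = sum(req for req, _ in team_requests_vs_usage.values())
total_used = sum(used for _, used in team_requests_vs_usage.values())

print(f"Requested: {total_requested} of {cluster_allocatable_cores} cores "
      f"({total_requested / cluster_allocatable_cores:.0%} of the cluster is spoken for)")
print(f"Actually used: {total_used} cores "
      f"({total_used / cluster_allocatable_cores:.0%} real utilization)")
print(f"A new service asking for 40 cores is told the cluster is 'full' "
      f"even though {cluster_allocatable_cores - total_used} cores sit idle.")
```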
This paradox (simultaneous waste and scarcity) is the signature of a broken economic system. Resources are being hoarded rather than allocated efficiently. Everyone has too much reserved, and everyone feels like they do not have enough.
The tragedy deepens because teams compete for resources through configuration rather than pricing. If one team sets aggressive requests, other teams feel pressure to do the same, "or we will not get scheduled." This creates an arms race of overprovisioning, where the equilibrium is everyone claiming more than they need.
In traditional markets, prices prevent this. If hoarding resources costs you directly, you only hold what you can justify. In Kubernetes, hoarding is free (or rather, paid for by a team that sits far away). So the rational strategy is to claim early and claim generously.
Why Tooling Alone Cannot Fix This
The usual response to Kubernetes waste is more tooling: cost dashboards, right-sizing recommendations, policy engines that enforce limits. And of course, Jira tickets.
These tools provide visibility, and visibility matters. But visibility does not change incentives.
A dashboard that shows overprovisioning is useful, but only if someone has a reason to act on it. If engineers are measured on reliability rather than efficiency, they will acknowledge waste and move on. The incentives have not changed.
Policy engines can enforce limits, but they create new problems. Set limits too aggressively, and teams will pad their requests even more to avoid hitting constraints. Set them too loosely, and nothing changes. Without price signals, there is no natural mechanism for discovering the right balance.
Right-sizing recommendations suffer from the same issue. Teams receive reports suggesting lower resource requests, but they have no motivation to implement them unless they are currently blocked. The typical response is "we will look at this when we have time," which is never.
This is why cost-optimization initiatives tend to follow a pattern: initial enthusiasm, a sprint of changes, modest savings, and then regression. Without changing the underlying incentives, behavior reverts to the equilibrium the system produces. For more on this pattern, see our analysis of which Kubernetes workloads waste the most resources.
What Changes When Price Signals Exist
The solution is not to shame engineers into caring more about infrastructure cost. It is to change how the infrastructure operates so that costs are visible and actionable at the point of decision (or as close as possible).
This is what price signals do in traditional markets. They make the marginal cost of consumption clear, creating natural feedback loops that shape behavior without requiring constant management intervention. In a well-designed system, the person making the allocation decision sees the cost of that decision and has a reason to optimize.
Think about how cloud providers themselves handle this. AWS does not send you a monthly report suggesting you "might want to use fewer EC2 instances." They charge you for every hour of compute that you consume. That continuous feedback creates natural pressure to optimize. You incur costs from leaving instances running, so you shut them down. You see the price difference between instance types and choose the appropriate one.
The orchestration layer running your infrastructure, by contrast, abstracts all of this away. An engineer setting resource requests for a service may not realize they are effectively pre-purchasing $50,000 of infrastructure capacity for a workload that will use only $15,000. They see technical parameters: CPU cores and memory GBs. The economic implications and the business impact are invisible.
Applied to modern infrastructure, introducing price signals might mean:
Continuous feedback on actual cost. Not aggregated reports in finance tools three weeks later, but real-time visibility tied to the infrastructure decisions engineers make every day. Developers should see the business impact of their resource allocation decisions immediately, in the context in which they make those decisions. This transforms infrastructure cost from an abstract financial problem into a concrete engineering tradeoff. Tools like DevZero's infrastructure cost monitoring provide this type of visibility at the workload, team, and service level.
Incentives aligned with business outcomes. Engineering metrics that include infrastructure efficiency alongside performance and reliability. When teams are measured on delivering business value efficiently, they optimize for efficiency. When they are measured solely by availability, they optimize for that, and infrastructure waste is the predictable result.
Allocation that reflects actual usage. Moving from static infrastructure reservation (where you pay for what you request) to dynamic models, where teams pay for the infrastructure capacity they actually consume. This is technically complex but economically straightforward: if your service only uses 40% of its allocated infrastructure, you should be accountable for 40%, not 100%. This creates immediate pressure to allocate accurately rather than conservatively.
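As a rough sketch of what that accountability shift looks like for a single service, using illustrative dollar figures and the 40% utilization assumed above:

```python
# Sketch of request-based vs. usage-based accountability for one service.
# Dollar figures and the 40% utilization are illustrative assumptions.

monthly_cost_of_allocation = 10_000   # what the reserved capacity costs the company
observed_utilization = 0.40           # service actually consumes 40% of its allocation

charged_under_request_model = monthly_cost_of_allocation          # pay for what you reserve
charged_under_usage_model = monthly_cost_of_allocation * observed_utilization

print(f"Request-based showback: ${charged_under_request_model:,.0f}/month")
print(f"Usage-based showback:   ${charged_under_usage_model:,.0f}/month")
print(f"The gap (${charged_under_request_model - charged_under_usage_model:,.0f}/month) "
      f"is the padding the team now has a reason to question.")
```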
Ownership and attribution. Making it clear which teams own which infrastructure costs, so that optimization becomes a team responsibility rather than a centralized mandate from the infrastructure team. When the API team sees that their services account for 35% of total infrastructure spend, they have context to evaluate tradeoffs between feature velocity and infrastructure efficiency. When costs are aggregated centrally, no one feels responsible. Our guides on EKS cost optimization, GKE cost optimization, and AKS cost optimization detail specific approaches for each platform.
At DevZero, we’re working on this exact problem by building systems that make infrastructure cost a first-class operational signal, rather than an afterthought in financial reports. The goal is to create feedback loops that change behavior naturally, without requiring constant intervention from finance teams or platform engineers.
But the deeper shift is conceptual: recognizing that infrastructure cost problems are economic, not technical. You cannot tune your way out of broken incentives. You need to change the economic system governing your infrastructure's operations.
The Path Forward
When focusing on infrastructure costs, most people recommend auditing your Kubernetes resource requests, implementing policies, or using a specific tool. That is treating the symptom, not the disease.
We’re arguing for a different starting point: understand the economic system that runs your infrastructure.
Why do infrastructure costs keep increasing? Because the orchestration layer that powers modern applications allocates resources without price signals, and that absence leads to rational overprovisioning over time. Each team optimizes for its local incentives (e.g., avoiding outages and shipping features), and the aggregate result is structural waste in the infrastructure that supports your business.
Why is your infrastructure inefficient? Not because of bad engineering, but because the system running your infrastructure has broken incentives. It does not expose marginal cost, does not punish overconsumption, and does not create feedback loops tied to spending. Efficiency has no reward. Waste has no penalty.
Why do engineers overprovision? Because overprovisioning is the safest strategy in a system with no prices. The career risk of an outage is high and immediate. The cost of wasted infrastructure capacity is low and delayed. Rational actors optimize for the incentives they face.
Understanding this reframes the entire problem. You are not dealing with a configuration issue that can be tuned away. You are dealing with the fundamental economics of your infrastructure's operation. Individual engineers are not the issue. The orchestration layer your business depends on is working exactly as designed. The waste you are seeing is the rational output of that design.
Fixing it requires more than tooling. It requires changing how infrastructure teams experience cost: making it visible at the point of decision, aligning rewards with efficiency, and creating feedback loops that shape behavior over time. This means:
- Treating infrastructure efficiency as a first-class business metric, not just a technical concern
- Building systems that provide continuous cost feedback where engineers work, not in quarterly finance reports
- Creating organizational structures where teams own both the performance and the cost of the infrastructure they use
- Questioning whether the current model (shared infrastructure with no price signals) scales with your business
This is the foundation for rethinking infrastructure optimization. Not as a periodic tuning exercise. Not as a monitoring problem requiring better dashboards. But as a systems problem: the incentives governing your infrastructure need to change.
When you understand how the orchestration layer running your infrastructure actually works (and what economic signals it does and does not provide), you can start building systems that work with human nature rather than against it. The goal is not to make engineers care more about cost. The goal is to change the system so that efficient behavior becomes the rational behavior.
Your infrastructure powers your business. The question is whether that infrastructure operates under economic principles that support your business goals, or under broken incentives that create waste by design.
To address infrastructure efficiency, start by reviewing your cloud pricing models:
- AWS: EKS pricing guide
- Google Cloud: GKE pricing guide
- Azure: AKS pricing guide
After you dive deep into your pricing model, explore which workload types waste the most resources in your clusters.
The goal is not perfection. It is progress. And progress starts with seeing the system for what it is: an economic mechanism that produces rational outcomes based on broken incentives. As Charlie Munger said, "Show me the incentive and I'll show you the outcome." Change the incentives, and you change the outcomes.