How Kubernetes Waste Becomes AI Budget

Rob Fletcher
Co-Founder

Part 2 of 2: The Self-Funding AI Mandate and What It Actually Requires
A directive is spreading through engineering and finance organizations: fund AI investments through optimization savings. The FinOps Foundation's State of FinOps 2026 report, surveying more than 650 FinOps practitioners globally, identified this as one of the defining dynamics of the current moment. Organizations are not being given a new budget for AI. They are being told to find it inside the infrastructure they already run.
This fundamentally changes the nature of Kubernetes cost optimization. It is no longer primarily about reducing the cloud bill. It is about capital reallocation: moving resources from low-value consumption to high-value production. The infrastructure team that figures this out doesn't just spend less. It funds the company's most strategic technology investment.
AI Budgets Are Reallocated Capital
Most organizations treat cloud cost optimization and AI investment as separate conversations. Finance owns the cloud budget. Product and engineering own the AI roadmap. The FinOps team sits in between, trying to reduce the first without being invited to the second.
The State of FinOps 2026 data suggests that separation is breaking down. With 98% of organizations now managing AI spend (up from 31% just two years ago), and with AI identified as the single top forward-looking priority for FinOps teams, the connection between optimization savings and AI investment has become explicit. As the report noted, many organizations are directly tying traditional FinOps work to strategic technology enablement: find the waste, redirect the capital.
This isn't just a budget accounting trick. It represents a real change in how internal infrastructure efficiency is valued. When savings directly fund AI initiatives, efficiency becomes a strategic input rather than an operational cost center. The teams generating those savings gain influence over technology investment decisions. The State of FinOps 2026 found that FinOps practitioners with VP- and C-suite-level engagement have 2-4x more influence over technology selection than those engaging only at the director level. Optimization isn't just saving money anymore. It's earning a seat at the table where AI budgets get decided.
Where the Remaining Savings Actually Live
Most organizations have already captured the obvious cloud waste. Reserved instances are purchased. Idle compute has been cleaned up. The easy wins are gone, and practitioners are saying so plainly. The State of FinOps 2026 quotes one respondent: "We have hit the 'big rocks' of waste and now face a high volume of smaller opportunities that require more effort to capture."
The remaining opportunity is layered, and it gets harder to capture as you go deeper:
- Billing layer savings (reserved instances, committed use discounts, savings plans) are well understood and largely captured by mature practices
- Infrastructure layer savings (idle instances, oversized nodes, orphaned resources) have been addressed by most organizations running any form of cost governance
- Workload layer savings (resource requests and limits that don't reflect actual usage, poor bin packing, intermittent workloads holding persistent capacity) remain largely uncaptured, and this is where the biggest remaining opportunity lives
As we explored in Part 1 of this series, and in DevZero's foundational analysis of Kubernetes as an economic system, workload-level inefficiency is structural. It is produced by rational engineering behavior inside a system with no price signals. According to Datadog's State of Cloud Costs report, 83% of container costs are wasted on idle resources, split between overprovisioned infrastructure and resource requests that far exceed actual usage. The average Kubernetes cluster runs at 13-25% CPU utilization and 18-35% memory utilization. Correcting that gap manually, at scale, across hundreds of services is not a realistic option for most teams.
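To make that gap concrete, here is a back-of-envelope sketch of how request-versus-usage slack turns into a dollar figure. Every input is a hypothetical placeholder, not a number from the reports cited above; substitute your own cluster's data.

```python
# Illustrative only: monthly spend locked up by CPU-request slack.
# All inputs are hypothetical placeholders, not benchmarks.

def wasted_spend(requested_cores: float, avg_used_cores: float,
                 cost_per_core_month: float) -> float:
    """Monthly spend attributable to CPU requested but never used."""
    slack = max(requested_cores - avg_used_cores, 0.0)
    return slack * cost_per_core_month

# One service: 50 replicas requesting 4 cores each (200 cores total),
# averaging 20% utilization (40 cores), at a hypothetical $35/core-month.
print(wasted_spend(200, 40, 35.0))  # -> 5600.0 dollars/month, one service
```

Multiply that by hundreds of services and the scale of the workload-layer opportunity becomes clear.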
And for GPU workloads, the stakes are higher still. DevZero's research shows the average GPU-enabled Kubernetes cluster operates at 15-25% utilization. On a cluster running NVIDIA H100 instances, that idle capacity represents hundreds of thousands of dollars in annual waste that, under the self-funding mandate, is the most direct source of AI investment capital available.
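A rough sketch of that GPU arithmetic, using a hypothetical hourly rate (actual H100 pricing varies widely by provider and commitment level):

```python
# Illustrative only: annual cost of idle capacity on one 8-GPU node.
# The hourly rate is a hypothetical placeholder; use your provider's pricing.

GPUS_PER_NODE = 8
HOURLY_RATE_PER_GPU = 4.00   # hypothetical $/GPU-hour
HOURS_PER_YEAR = 24 * 365
UTILIZATION = 0.20           # midpoint of the 15-25% range cited above

annual_idle_cost = (GPUS_PER_NODE * HOURLY_RATE_PER_GPU
                    * HOURS_PER_YEAR * (1 - UTILIZATION))
print(f"${annual_idle_cost:,.0f} idle spend per node per year")  # -> $224,256
```

Even at conservative rates, a single underutilized GPU node can strand roughly a quarter of a million dollars a year.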
The Small Team Problem
Here is the structural challenge that makes this mandate harder than it sounds: the teams being asked to generate these savings are lean by design and getting leaner by necessity.
The State of FinOps 2026 found that organizations managing $100 million or more in cloud spend operate with FinOps teams of just 8 to 10 practitioners on average. These teams cannot manually:
- Audit workload resource configurations across hundreds of services
- Track GPU allocation patterns across training jobs and inference endpoints
- Identify bin-packing inefficiencies cluster-wide
- Continuously rightsize resources as usage patterns evolve over time
The math doesn't work. This is why the same report listed "automated remediation and easy buttons" as one of the top desired tooling capabilities, and why automation has become the defining characteristic of high-performing FinOps practices. You cannot scale through headcount alone. You scale through automation that does the continuous work humans cannot.
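To give a sense of what that automation replaces, here is a minimal sketch of one slice of the manual work: flagging pods whose CPU requests far exceed live usage, using the Kubernetes Python client and the metrics API. This is a simplified illustration, not DevZero's implementation; it assumes metrics-server is installed, uses naive unit parsing, and the 25% threshold is arbitrary.

```python
# Minimal sketch: flag pods using well under their CPU requests.
# Simplified: assumes metrics-server is installed, "...n" (nanocore)
# usage strings, and "...m" (millicore) request strings.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

# Live CPU usage from the metrics.k8s.io API, in millicores per pod.
usage = {}
for pod in metrics.list_cluster_custom_object(
        "metrics.k8s.io", "v1beta1", "pods")["items"]:
    millicores = sum(int(c["usage"]["cpu"].rstrip("n")) / 1e6
                     for c in pod["containers"])
    usage[(pod["metadata"]["namespace"], pod["metadata"]["name"])] = millicores

for pod in core.list_pod_for_all_namespaces().items:
    requested = sum(
        int(c.resources.requests["cpu"].rstrip("m"))
        for c in pod.spec.containers
        if c.resources.requests and "cpu" in c.resources.requests)
    used = usage.get((pod.metadata.namespace, pod.metadata.name), 0)
    if requested and used < 0.25 * requested:  # arbitrary 25% threshold
        print(f"{pod.metadata.namespace}/{pod.metadata.name}: "
              f"requests {requested}m, using {used:.0f}m")
```

Running this once produces a snapshot; keeping it true as usage patterns shift is the continuous work only automation can do at scale.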
The Reallocation Path Is Not Automatic
Identifying where savings live is one problem. Actually moving that capital to AI investment is another. Most organizations have a structural gap between the team that generates infrastructure savings and the team that controls AI investment decisions. Platform engineers reduce workload waste. Those savings surface in a cloud cost report. Finance adjusts the department budget eventually. AI investments get funded through a separate annual planning process. The connection between the two is loose and slow.
This matters because the self-funding mandate only works if the savings are real, visible, and attributable. If the waste reduction is invisible in aggregate cloud spend, no one can make the case that it funded anything. The State of FinOps 2026 report captured this challenge directly: "Once you fix it, it's gone. How do we give developers credit for shift-left activities?" The same question applies to infrastructure optimization. If you save $400,000 in GPU waste but can't show where that capital went, the next AI project still goes through the regular budget approval process.
This is why workload-level attribution matters as much as workload-level optimization. Knowing that the ML team's training cluster ran at 20% GPU utilization, that releasing idle GPUs saved a specific dollar amount this quarter, and that those savings were redirected to a new model training initiative: that is the narrative that elevates infrastructure efficiency from an operational cost line to a strategic funding mechanism. The FinOps Foundation's finding that VP- and C-suite-level engagement gives practitioners 2-4x more influence is downstream of exactly this capability: being able to show, clearly and in business terms, what optimization work produced and where the value went.
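One lightweight way to make that attribution concrete is to record each optimization as a structured entry that ties realized savings to the initiative they funded. A hypothetical sketch (the schema and field names are illustrative, not a DevZero or FinOps Foundation standard):

```python
# Hypothetical savings-attribution record; fields are illustrative only.
from dataclasses import dataclass

@dataclass
class SavingsAttribution:
    workload: str           # what was optimized
    quarter: str            # when the savings were realized
    action: str             # what the optimization actually did
    savings_usd: float      # realized, not projected
    funded_initiative: str  # where the recovered capital went

entry = SavingsAttribution(
    workload="ml-team/training-cluster",
    quarter="2026-Q1",
    action="released idle GPUs (cluster averaged 20% GPU utilization)",
    savings_usd=400_000.0,
    funded_initiative="new model training initiative",
)
```

However it is stored, the point is the linkage: a savings number with no destination reads as a cost line, while a savings number tied to a funded initiative reads as capital allocation.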
What Automated Workload Optimization Actually Looks Like
This is exactly the problem DevZero is built to solve. DevZero operates at the workload level within Kubernetes clusters, precisely where the remaining savings opportunity lies and where lean FinOps teams cannot realistically operate manually. The core capabilities are:
- Live rightsizing without restarts. DevZero's Multi-dimensional Pod Autoscaler continuously adjusts CPU, memory, and replica counts for running pods based on actual usage, without requiring application restarts or changes to deployment manifests. This automatically captures savings from static overprovisioning on an ongoing basis.
- Automatic GPU detection and release. For AI and ML workloads, DevZero continuously monitors GPU allocation against actual usage and, based on configurable policies, automatically releases GPUs that are allocated but idle. It targets three waste patterns: training jobs that complete and leave GPUs allocated, inference endpoints whose warm pools hold capacity through off-peak hours, and interactive notebooks left running after work ends. GPU scarcity is real; holding idle capacity you're paying for solves nothing. (A simplified sketch of this detection pattern follows the list.)
- Intelligent bin packing and instance selection. DevZero improves cluster utilization by identifying workloads that can be consolidated onto fewer nodes, reducing the total node count and the cost of running underutilized infrastructure.
- Complementary, not competitive, with existing autoscalers. Tools like Karpenter and KEDA handle node-level scaling. DevZero operates at the workload level, capturing waste that node autoscalers can't reach. A node with a single small workload holding a full GPU won't scale down under Karpenter. DevZero releases the GPU allocation while the node stays active.
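To illustrate the detection half of the GPU release pattern referenced above, here is a minimal sketch using NVIDIA's NVML Python bindings. It shows the general shape of idle detection only; it is not DevZero's mechanism, and the threshold, window, and release hook are placeholders.

```python
# Minimal idle-GPU detection sketch using NVML (pip install nvidia-ml-py).
# Illustrative only: threshold, window, and the release action are placeholders.
import time
import pynvml

IDLE_THRESHOLD_PCT = 5    # utilization below this counts as idle
IDLE_WINDOW_SECS = 1800   # require sustained idleness before acting
POLL_SECS = 60

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
idle_since = {i: None for i in range(len(handles))}

while True:
    now = time.time()
    for i, handle in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        if util < IDLE_THRESHOLD_PCT:
            idle_since[i] = idle_since[i] or now
            if now - idle_since[i] >= IDLE_WINDOW_SECS:
                # Placeholder: a real system would release the allocation here,
                # e.g., by scaling down the owning workload per policy.
                print(f"GPU {i} idle for {IDLE_WINDOW_SECS}s; eligible for release")
        else:
            idle_since[i] = None
    time.sleep(POLL_SECS)
```

The hard part is not the loop itself but the policy layer around it: distinguishing a pause between training runs from an abandoned notebook is why the release decision is policy-driven rather than a fixed timeout.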
The results from customers running these capabilities are meaningful. In one documented case, DevZero helped an enterprise AI/SaaS company cut GPU cluster costs by $776,000 while running alongside its existing Karpenter setup. In another, workload costs dropped 80% within 12 hours of enabling optimization policies. These are not one-time rightsizing exercises. They are continuous, automated savings that compound over time as new workloads are deployed and existing ones change behavior.
Critically, this happens without infrastructure changes or application modifications, and without adding headcount to the FinOps or platform engineering team.
The Strategic Reframe
The self-funding AI mandate is a prompt to rethink what infrastructure efficiency is for. It is not a cost-cutting exercise. It is a capital allocation decision. The organizations that treat it that way will move faster on AI: not because they have bigger budgets, but because they have learned to recover value from the infrastructure they already run.
The State of FinOps 2026 report closes with a clear observation: mature FinOps practices are moving from optimization to value as the goal. Workload-level efficiency is the mechanism that connects the two. Every dollar of GPU waste recovered is a dollar available for the next model, the next training run, the next AI capability your product needs.
As we argued in Part 1, AI doesn't create a new cost problem in Kubernetes. It amplifies and makes visible an existing one. The economic system governing your infrastructure has been producing waste by design for years. The self-funding mandate is the clearest signal yet that fixing it is no longer optional.
The companies that build the systems to capture that value at scale, automatically and continuously, without adding headcount, are the ones that will have the budget to compete on AI.
