Kubernetes v1.35: In-Place Pod Resize Goes Stable and What It Means for Your Infrastructure

Debo Ray
Co-Founder, CEO
January 13, 2026

The Kubernetes community has delivered Kubernetes v1.35, "Timbernetes," bringing a mix of stability, security improvements, and a genuinely practical feature that’s been years in the making: in-place pod resource updates are now generally available. This capability fundamentally changes how we manage resource allocation in production environments.

This release doesn’t reinvent Kubernetes’ orchestration magic. Rather, it reinforces the foundations: vertical scaling becomes less disruptive, workloads run more efficiently, and a sore point that has long plagued production clusters finally gets smoothed over. What follows is a deep dive into the changes that matter most if you run real workloads in real environments.

What In-Place Pod Resize Solves

For years, adjusting pod resources in Kubernetes meant recreating the pod. This disruptive process worked well for stateless applications but created significant challenges for stateful workloads, long-running batch jobs, and applications that required session state or initialization time. Moreover, the traditional attitude of “hey, it's Kubernetes - workloads can be restarted whenever” doesn't hold up well in most complex infrastructures.

Consider a data processing pipeline that has spent three hours loading a 100GB dataset into memory. Halfway through processing, you need more CPU to meet SLA requirements. With traditional Kubernetes, your options were all problematic: overprovision from the start and waste resources for the entire job duration, recreate the pod and lose three hours of progress, or live with degraded performance and miss your SLA. And all of those options assume you can predict exactly how your software will behave under a diversity of runtime conditions. In-place resize also helps with a good percentage of “cold-start” and long-startup-time problems, especially with Java applications.

Kubernetes v1.35 brings in-place updates for pod resources to stable status, enabling you to:

  • Adjust CPU and memory without restarting pods or containers
  • Scale resources dynamically based on actual workload demands
  • Maintain state and connections through resource adjustments
  • Reduce operational complexity by eliminating recreation-based workflows

The feature works through the Container Runtime Interface (CRI), allowing the kubelet to update cgroup settings directly without disrupting running containers.

kubectl patch pod my-adaptive-workload --subresource=resize -p '{
  "spec": {
    "containers": [{
      "name": "processor",
      "resources": {
        "requests": {"cpu": "4", "memory": "8Gi"},
        "limits": {"cpu": "8", "memory": "16Gi"}
      }
    }]
  }
}'
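
How a container reacts to a resize can be declared up front: the stable API includes a per-container resizePolicy that states which resources can change live and which require a container restart. A minimal sketch (the pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-adaptive-workload
spec:
  containers:
  - name: processor
    image: registry.example.com/processor:latest  # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU can change without a restart
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart this container only
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "8Gi"

NotRequired is the default, so a policy is only needed for containers (such as JVM workloads with fixed heap sizes) that cannot absorb a change while running.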

Memory Reduction Support

One of the most significant improvements in v1.35 is support for memory reduction. The system now includes:

  • Usage validation – checks whether current memory usage exceeds the new limit
  • Safety protocols – skips the resize if a memory-spike risk is detected
  • Best-effort OOM prevention – avoids out-of-memory kills during reduction
  • State tracking – monitors resize progress through pod conditions

This means you can rightsize pods after their peak memory usage period, reclaiming resources for other workloads without disruption.
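
A memory reduction is requested the same way as any other resize, and its progress can be followed through pod conditions. A sketch, reusing the illustrative pod and container names from the example above:

# Request a lower memory allocation once peak usage has passed
kubectl patch pod my-adaptive-workload --subresource=resize -p '{
  "spec": {
    "containers": [{
      "name": "processor",
      "resources": {
        "requests": {"memory": "4Gi"},
        "limits": {"memory": "8Gi"}
      }
    }]
  }
}'

# Check whether the resize is deferred or infeasible via the pod's conditions
kubectl get pod my-adaptive-workload \
  -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")].message}'

If the resize is accepted, the kubelet reports a PodResizeInProgress condition instead until the new allocation is actuated.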

Why This Changes Everything for Stateful Workloads

Machine Learning Workloads

ML training jobs are a perfect example of variable resource needs. These workloads typically go through distinct phases: data loading requires high memory to load training datasets into RAM, the training phase demands high CPU or GPU resources for model training, validation needs lower resources for inference and validation, and post-training requires minimal resources for model persistence.

With in-place resize, you can optimize each phase without losing the loaded dataset in memory. This eliminates the need to overprovision for the entire job duration or restart and reload data between phases.
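
In practice this can be a couple of resize calls issued between phases; a sketch with hypothetical pod and container names (ml-training, trainer):

# Data loaded; raise CPU for the training phase
kubectl patch pod ml-training --subresource=resize -p \
  '{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"8"},"limits":{"cpu":"16"}}}]}}'

# Training done; shrink for validation while the dataset stays in memory
kubectl patch pod ml-training --subresource=resize -p \
  '{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"1"},"limits":{"cpu":"2"}}}]}}'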

Database Optimization

Stateful workloads like databases can now adapt to usage patterns throughout the day. Organizations can increase resources during peak hours to handle traffic spikes, reduce resources during off-hours to save costs, temporarily boost capacity for batch processing or maintenance operations, and adjust read replicas based on query complexity.

This enables more cost-effective resource management without risking in-memory workload state through pod recreation, a critical improvement for teams running production databases on Kubernetes.

Long-Running Batch Jobs

Data processing, scientific computing, and rendering workloads often run for hours or days. In-place resize allows these workloads to:

  • Start with conservative resource allocations
  • Scale up when processing demands increase
  • Scale down during I/O-bound phases
  • Maintain all accumulated state throughout

Beyond In-Place Resize: When You Need Live Migration

While in-place pod resize solves many resource optimization challenges, it's important to understand its limitations and where complementary technologies add value.

What In-Place Resize Doesn't Solve

In-place pod resize is excellent for adjusting resources on the same node, but several scenarios still require pod movement. And dedicating a node to a single long-running workload just to avoid moving it isn’t an efficient way to operate infrastructure.

Node upgrades and maintenance require draining nodes for OS updates. Hardware failures necessitate moving pods to healthy nodes. Rebalancing workloads helps optimize cluster utilization across nodes. Zone or region changes may be required to meet compliance or latency requirements; reducing cross-AZ data transfer charges can be another driver. And migrating to different instance types or configurations requires relocating pods entirely.

Where Live Migration Shines

This is where solutions like live migration become critical. While in-place resize keeps your pod on the same node with adjusted resources, live migration enables:

  • Zero-downtime node maintenance – Move running pods before draining nodes
  • Hardware optimization – Migrate workloads to more cost-effective instance types
  • Failure recovery – Seamlessly move pods from failing nodes
  • Geographic optimization – Relocate pods to reduce latency or meet compliance

Combining Both Approaches

The most sophisticated infrastructure strategies combine both capabilities. Use in-place resizing for workload-driven resource adjustments on stable nodes, and use live migration for infrastructure-driven changes that require pod relocation.

Platforms like DevZero excel at the orchestration layer above these capabilities. DevZero's live rightsizing solution now supports in-place pod resizing for smoother, more granular adjustments while maintaining the ability to migrate workloads as needed. The key advantage is automated optimization without manual intervention: DevZero detects when workloads need resource adjustments, uses in-place resizing for quick, non-disruptive changes, falls back to live migration when node-level changes are required, and optimizes costs by continuously rightsizing based on actual usage.

Other Notable Updates in Kubernetes 1.35

Pod Certificates for Workload Identity (Beta)

Kubernetes now supports native workload identity with automated certificate rotation. The kubelet generates keys and requests certificates via PodCertificateRequest; credential bundles are written directly to the pod's filesystem; and the kube-apiserver enforces node restrictions at admission time. This enables pure mTLS flows without bearer tokens, dramatically simplifying service mesh and zero-trust architectures.
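
A pod opts in by mounting a podCertificate projected volume source. A sketch based on our reading of the beta API from KEP-4317 (the signer name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: mtls-client
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
    volumeMounts:
    - name: identity
      mountPath: /var/run/identity
  volumes:
  - name: identity
    projected:
      sources:
      - podCertificate:
          signerName: example.com/my-signer   # illustrative signer
          keyType: ED25519
          credentialBundlePath: credentialbundle.pem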

Node Declared Features (Alpha)

A new framework enables nodes to declare the Kubernetes features they support. Nodes report capabilities via .status.declaredFeatures, which the scheduler uses for pod placement to prevent scheduling pods on incompatible older nodes. This addresses the challenge of control-plane and node-version skew, which is increasingly important as Kubernetes releases introduce node-level features that not all nodes in a cluster may support. With Kubernetes shipping multiple releases each year, such skew is routine, which makes this framework all the more valuable.

PreferSameNode Traffic Distribution (Stable)

Service traffic distribution has been enhanced with more explicit control. The new PreferSameNode option strictly prioritizes local endpoints, while the existing PreferClose has been renamed to PreferSameZone for clarity. This provides better performance for latency-sensitive workloads and improved cost optimization by reducing cross-zone traffic.
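
Opting in is a single field on the Service spec; a minimal sketch (the service name and selector are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: latency-sensitive-svc
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferSameNode

If no endpoint exists on the client’s node, traffic falls back to other endpoints rather than failing, so this remains safe for sparsely scheduled workloads.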

Job API Managed-By Mechanism (Stable)

The Job API now includes a managedBy field for external controller delegation. This allows systems like MultiKueue to handle Job status synchronization, enables multi-cluster job dispatching patterns, and provides cleaner separation of concerns for Job orchestration as the built-in Job controller steps aside when external management is active.

We’re also excited to see how the Kubeflow and Nextflow projects incorporate this.
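
Delegation is a single field on the Job spec; when it names anything other than the built-in kubernetes.io/job-controller, the built-in controller leaves the Job alone. A sketch using MultiKueue's manager value (job name and image are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: dispatched-job
spec:
  managedBy: kueue.x-k8s.io/multikueue  # external controller owns status updates
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/worker:latest  # placeholder image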

Implementation Strategy: Getting Ready for Production

Phase 1: Assessment and Planning

Start by identifying workloads that would benefit from in-place resize: stateful applications, long-running batch jobs, ML training workflows, and databases. Document current resource allocation patterns and measure the cost of overprovisioning in your environment.

Verify your Kubernetes version upgrade path, check container runtime support (containerd, CRI-O), and validate compatibility with monitoring and observability tools.

Phase 2: Testing and Validation

Deploy v1.35 in development and staging environments to test in-place resize with representative workloads. Measure the performance impact of resize operations and validate monitoring and alerting for resize events.

Test edge cases, including memory-reduction scenarios, behavior under node-pressure conditions, resize failures and recovery, and interactions with HPA and VPA if you're using them.

Phase 3: Gradual Production Rollout

Start with non-critical workloads and enable in-place resize with conservative policies. Monitor closely for unexpected behavior and gather metrics on resource utilization improvements. Gradually enable for more critical workloads, integrate with cost optimization workflows, and update runbooks and operational procedures.

Key Metrics to Track

  • Resize operations: Frequency, success rate, duration
  • Resource utilization: Before and after resize patterns
  • Cost savings: Reduced overprovisioning costs
  • Application performance: Impact of resize on workload performance
  • Failure rates: OOM kills, resize failures, and rollback frequency

Conclusion: Building More Adaptive Infrastructure

Kubernetes v1.35 represents a significant maturation of the platform's resource management capabilities. The graduation of in-place pod resize to stable status removes one of the last major barriers to truly dynamic, adaptive infrastructure.

We're moving toward infrastructure that can respond in real-time to changing workload demands, optimize continuously without disrupting applications, reduce waste through precise resource allocation, and maintain availability through non-disruptive operations.

Bringing It All Together with DevZero

While Kubernetes provides foundational capabilities, platforms like DevZero deliver intelligent orchestration that maximizes the value of features such as in-place pod resizing. By combining:

  • Automated resource rightsizing based on actual usage patterns
  • Intelligent decision-making about when to resize vs. migrate
  • Cost optimization through continuous analysis
  • Zero-touch operations that work without manual intervention

Organizations can achieve the promise of cloud-native infrastructure: paying only for what you use, with no overprovisioning, while maintaining the highest levels of availability.

The feature is stable, well-tested, and ready for production use. For organizations looking to maximize the benefits without building complex automation themselves, solutions like DevZero provide the intelligence layer that turns raw Kubernetes features into automated cost savings and improved reliability.

Ready to see how DevZero can help you leverage Kubernetes 1.35's new capabilities for significant cost savings? Start optimizing your clusters today or learn more about how DevZero works.

Cut Kubernetes Costs with Smarter Resource Optimization

DevZero helps you unlock massive efficiency gains across your Kubernetes workloads—through live rightsizing, automatic instance selection, and adaptive scaling. No changes to your app, just better bin packing, higher node utilization, and real savings.
