Live Migration

How DevZero applies resource changes to running pods using CRIU checkpoint-restore, preserving in-memory state and open connections without an application restart.

Overview

Live migration is DevZero's approach to applying resource changes without restarting the application. The operator checkpoints the running pod using CRIU (Checkpoint/Restore In Userspace), recreates it with the new resource spec, and restores the checkpoint state into the new pod — picking up execution from exactly where it stopped.

Kubernetes schedules the replacement pod normally. It may land on the same node or a different one depending on available capacity and the updated resource requirements. Either way, the running state is preserved.

Live migration and in-place vertical scaling are complementary. In-place scaling mutates the resource allocation on an existing pod without recreating it, but carries constraints around QoS class, platform support, and node headroom. Live migration works through pod recreation — bypassing all of those constraints — while preserving the pod's running state via CRIU. DevZero can attempt in-place scaling first and fall back to live migration when in-place is not feasible.

Why Live Migration Scales Better

In-place vertical scaling has a hard constraint surface: the pod must stay on the same node, the QoS class cannot change, the platform must support the resize, and the kubelet must accept it. Any one of these can block a recommendation from being applied — sometimes indefinitely.

Live migration sidesteps all of that. Because the pod is recreated with new specs rather than mutated in place:

  • There are no QoS class restrictions — the new pod simply starts with the right values
  • Platform limitations (Windows, static pods, GPU resources) that block in-place resize don't apply to pod creation
  • Node headroom isn't a blocker on the current node — if the new specs don't fit, the scheduler places the pod somewhere that can accommodate them
  • The approach works uniformly across every workload type, without version-gating on Kubernetes 1.33+

The difference from a plain rolling restart is that CRIU preserves the pod's in-memory state: no cold start, no lost work, no connection disruption.

How It Works

CRIU works by freezing a running process tree, serializing all of its state — memory pages, file descriptors, TCP socket internals, PID and namespace membership — to a set of image files, then restoring that state into a new process. DevZero orchestrates this around the Kubernetes workload controller so the pod lifecycle stays coherent throughout.

For workloads with multiple pods, the operator migrates one pod at a time (a rolling checkpoint) so the workload remains available throughout the process.

Prepare the workload. The operator patches the workload spec with the new resource values and prevents the controller from interfering with the migration:

  • Deployments: paused via spec.paused = true
  • StatefulSets / DaemonSets: update strategy switched to OnDelete

Without this, the Deployment controller would detect the pod spec change and start its own rolling update, racing with the migration.
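
Concretely, both changes are standard Kubernetes fields; a minimal sketch of the two patches:

# Deployment: stop the rollout controller for the duration of the migration
spec:
  paused: true

# StatefulSet / DaemonSet: replace pods only when the operator deletes them
spec:
  updateStrategy:
    type: OnDelete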

Checkpoint each pod. For each pod in turn, the operator:

  1. Calls the node agent to checkpoint the pod's containers via CRIU — images written to /var/lib/dakr/checkpoints/{containerID} on the source node
  2. Creates a CheckpointRestore CRD to track the restore lifecycle (a sketch follows below)
  3. Deletes the pod, triggering the workload controller to create a replacement with the updated resource spec
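
The CheckpointRestore schema is not documented here; as an illustrative sketch only, where the group, version, and field names are hypothetical and only the checkpoint path comes from the step above:

apiVersion: dakr.devzero.io/v1          # hypothetical group/version
kind: CheckpointRestore
metadata:
  name: my-app-6d5f7b-xk2lp             # hypothetical name
spec:
  sourceNode: node-a                    # hypothetical field
  checkpointPath: /var/lib/dakr/checkpoints/{containerID}   # path from the checkpoint step
status:
  phase: Pending                        # hypothetical field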

Transfer and restore. The node agent on the node where the replacement pod is scheduled:

  1. Pulls the checkpoint image from the source
  2. Restores the process state into the newly scheduled pod
  3. Updates the networking stack — conntrack entries, SNAT/DNAT rules, iptables — so that existing connections route correctly to the pod's new IP (whether same node or different)

Wait for Ready, then continue. The operator waits for the restored pod to become Ready before moving to the next one. After all pods are migrated, the Deployment is unpaused (or the original StatefulSet/DaemonSet update strategy is restored).

Networking During Migration

TCP connections survive the node boundary. CRIU uses the TCP_REPAIR socket option to snapshot established connections, capturing sequence numbers, negotiated options (MSS, window scale, SACK), and queue state. During the migration gap, netfilter rules silently drop inbound packets from peers, so the remote end never receives a RST and the connection is not torn down.

On the destination, the networking stack is updated to route traffic correctly:

  • Conntrack entries are updated with NAT rules for established connections
  • SNAT/DNAT rules handle service routing transparently from the new IP
  • TCP liberal mode (nf_conntrack_tcp_be_liberal) is enabled to tolerate packet retransmits that arrive during the brief gap

Istio note: The istio-proxy sidecar is intentionally excluded from the checkpoint — checkpointing proxy state across a node boundary is not safe. The restored pod starts a fresh Envoy proxy that reconnects to the mesh control plane. User containers are not affected.

Workload Support

Workload Type | Strategy | Notes
Deployment | Rolling (one pod at a time) | Paused during migration to prevent the controller from triggering its own rolling update
StatefulSet | Rolling (one pod at a time) | Switched to OnDelete; original strategy restored after
DaemonSet | Rolling (one pod at a time) | Same as StatefulSet
Pod | Direct checkpoint-delete-restore | Single operation; no rolling loop
Job / ReplicaSet | Spec patch then migrate | Patch applied first; migration runs if the workload generation changed
Argo Rollout | Spec patch then migrate | Patched via SSA; no new rollout revision is triggered
CronJob | Spec patch then migrate | Active jobs replaced; future runs inherit the new resource values
SparkApplication / ScheduledSparkApplication | Spec patch then migrate | Spec updated via SSA
Kubeflow Notebook | Spec patch then migrate | Targets the underlying StatefulSet
CNPG Cluster | Spec patch then migrate | CloudNativePG clusters; spec updated via SSA

Configuration

Live migration is enabled per-recommendation via the WorkloadRecommendation CRD:

spec:
  useLiveMigration: true
  containerRecommendations:
    - containerName: my-container
      cpu:
        request: "2"
        limit: "2"
      memory:
        request: "4Gi"
        limit: "4Gi"

Nodes must be labeled dakr.devzero.io/checkpoint-node=true and must be running the node agent to participate as source or destination.
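
For reference, the label as it appears on a participating Node object (the node name is a placeholder):

apiVersion: v1
kind: Node
metadata:
  name: worker-1                            # placeholder
  labels:
    dakr.devzero.io/checkpoint-node: "true"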

Fallback Behavior

When live migration cannot be completed, the operator falls back to a standard rolling restart so the resource recommendation is still applied.

Condition | Why Migration Is Skipped
Image mismatch | The running pod's container image differs from the pod template. CRIU restores process state into a specific binary; a different image would produce undefined behavior or an outright restore failure.
No running pods | Nothing to checkpoint; the workload has no running pods to migrate.
Node agent unavailable | No agent is reachable on the source node to perform the checkpoint RPC.
Restore failure | The destination node agent accepted the restore but encountered an error during execution.

When fallback occurs, status.phase is set to AppliedWithRestartFallback and status.liveMigrationDegraded is set to true. The status.message explains why. Resource changes are applied regardless.
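
A degraded status then looks roughly like this (the message text is illustrative):

status:
  phase: AppliedWithRestartFallback
  liveMigrationDegraded: true
  message: "image mismatch: running pod image differs from pod template"   # illustrative wording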

Status and Observability

Recommendations move through these phases during a live migration:

Phase | Meaning
Applying | Workload spec is being patched with new resource values
Migrating | Checkpoint/restore is in progress
Applied | All pods migrated successfully
AppliedWithRestartFallback | Migration could not complete; a rolling restart was used instead
PartialFailure | Rolling checkpoint: some pods migrated successfully, others fell back
Failed | The recommendation could not be applied at all
Skipped | Live migration determined not viable before attempting (e.g., image mismatch)

Per-pod progress is tracked in status.rollingCheckpoint.podResults. Each pod shows its own phase (Pending, Completed, or Failed) and a message if it failed.
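
For example, a rolling checkpoint that partially fell back might report results along these lines (the pod-identifier field name is assumed; phase and message are documented above):

status:
  phase: PartialFailure
  rollingCheckpoint:
    podResults:
      - pod: my-app-0          # field name for the pod identifier is assumed
        phase: Completed
      - pod: my-app-1
        phase: Failed
        message: "restore failed on destination node"   # illustrative wording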

Limitations

  • Image coherence required: Running pods must use the same container image as the pod template. Mid-rollout Deployments where some pods use an older image will cause live migration to be skipped.
  • Istio sidecar not checkpointed: The proxy restarts fresh on the destination pod. For most applications this is transparent, but extremely latency-sensitive mesh connections may see a brief reconnect.
  • Very large memory footprints: Checkpoint image size scales with pod RSS. A pod with a 100 GiB working set produces a 100 GiB image to transfer. Migration time scales accordingly.
  • Non-dumpable processes: Applications that explicitly call prctl(PR_SET_DUMPABLE, 0) cannot be checkpointed. The operator falls back to rolling restart.
  • Node agent required on both ends: Source and destination nodes must be labeled dakr.devzero.io/checkpoint-node=true and running the agent. Migrations to unlabeled nodes are not supported.
