Live Migration

How DevZero applies resource changes to running pods using CRIU checkpoint-restore, preserving in-memory state and open connections without an application restart.

Overview

Live migration is DevZero's approach to applying resource changes without restarting the application. The operator checkpoints the running pod using CRIU (Checkpoint/Restore In Userspace), recreates it with the new resource spec, and restores the checkpoint state into the new pod — picking up execution from exactly where it stopped.

Kubernetes schedules the replacement pod normally. It may land on the same node or a different one depending on available capacity and the updated resource requirements. Either way, the running state is preserved.

Live migration and in-place vertical scaling are complementary. In-place scaling mutates the resource allocation on an existing pod without recreating it, but carries constraints around QoS class, platform support, and node headroom. Live migration works through pod recreation — bypassing all of those constraints — while preserving the pod's running state via CRIU. DevZero can attempt in-place scaling first and fall back to live migration when in-place is not feasible.

Why Live Migration Scales Better

In-place vertical scaling has a hard constraint surface: the pod must stay on the same node, the QoS class cannot change, the platform must support the resize, and the kubelet must accept it. Any one of these can block a recommendation from being applied — sometimes indefinitely.

Live migration sidesteps all of that. Because the pod is recreated with new specs rather than mutated in place:

  • There are no QoS class restrictions — the new pod simply starts with the right values
  • Platform limitations (Windows, static pods, GPU resources) that block in-place resize don't apply to pod creation
  • Node headroom isn't a blocker on the current node — if the new specs don't fit, the scheduler places the pod somewhere that can accommodate them
  • The approach works uniformly across every workload type, without version-gating on Kubernetes 1.33+

The difference from a plain rolling restart is that CRIU preserves the pod's in-memory state: no cold start, no lost work, no connection disruption.

How It Works

CRIU works by freezing a running process tree, serializing all of its state — memory pages, file descriptors, TCP socket internals, PID and namespace membership — to a set of image files, then restoring that state into a new process. DevZero orchestrates this around the Kubernetes workload controller so the pod lifecycle stays coherent throughout.

For workloads with multiple pods, the operator migrates one pod at a time (a rolling checkpoint) so the workload remains available throughout the process.

Prepare the workload. The operator patches the workload spec with the new resource values and prevents the controller from interfering with the migration:

  • Deployments: paused via spec.paused = true
  • StatefulSets / DaemonSets: update strategy switched to OnDelete

Without this, the Deployment controller would detect the pod spec change and start its own rolling update, racing with the migration.
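
Concretely, both changes are standard Kubernetes fields; a minimal sketch of the two patches:

# Deployment: stop the rollout controller for the duration of the migration
spec:
  paused: true

# StatefulSet / DaemonSet: replace pods only when the operator deletes them
spec:
  updateStrategy:
    type: OnDelete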

Checkpoint each pod. For each pod in turn, the operator:

  1. Calls the node agent to checkpoint the pod's containers via CRIU — images written to /var/lib/dakr/checkpoints/{containerID} on the source node
  2. Creates a CheckpointRestore CRD to track the restore lifecycle (a sketch follows below)
  3. Deletes the pod, triggering the workload controller to create a replacement with the updated resource spec
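
The CheckpointRestore schema is not documented here; as an illustrative sketch only, where the group, version, and field names are hypothetical and only the checkpoint path comes from the step above:

apiVersion: dakr.devzero.io/v1          # hypothetical group/version
kind: CheckpointRestore
metadata:
  name: my-app-6d5f7b-xk2lp             # hypothetical name
spec:
  sourceNode: node-a                    # hypothetical field
  checkpointPath: /var/lib/dakr/checkpoints/{containerID}   # path from the checkpoint step
status:
  phase: Pending                        # hypothetical field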

Transfer and restore. The node agent on the node where the replacement pod is scheduled:

  1. Pulls the checkpoint image from the source
  2. Restores the process state into the newly scheduled pod
  3. Updates the networking stack — conntrack entries, SNAT/DNAT rules, iptables — so that existing connections route correctly to the pod's new IP (whether same node or different)

Wait for Ready, then continue. The operator waits for the restored pod to become Ready before moving to the next one. After all pods are migrated, the Deployment is unpaused (or the original StatefulSet/DaemonSet update strategy is restored).

Networking During Migration

TCP connections survive the node boundary. CRIU uses the TCP_REPAIR socket option to snapshot established connections, capturing sequence numbers, negotiated options (MSS, window scale, SACK), and queue state. During the migration gap, netfilter rules silently drop inbound packets from peers, so the remote end never receives a RST and the connection is not torn down.

On the destination, the networking stack is updated to route traffic correctly:

  • Conntrack entries are updated with NAT rules for established connections
  • SNAT/DNAT rules handle service routing transparently from the new IP
  • TCP liberal mode (nf_conntrack_tcp_be_liberal) is enabled to tolerate packet retransmits that arrive during the brief gap

Istio note: The istio-proxy sidecar is intentionally excluded from the checkpoint — checkpointing proxy state across a node boundary is not safe. The restored pod starts a fresh Envoy proxy that reconnects to the mesh control plane. User containers are not affected.

Workload Support

Workload Type | Strategy | Notes
Deployment | Rolling (one pod at a time) | Paused during migration to prevent the controller from triggering its own rolling update
StatefulSet | Rolling (one pod at a time) | Switched to OnDelete; original strategy restored after
DaemonSet | Rolling (one pod at a time) | Same as StatefulSet
Pod | Direct checkpoint-delete-restore | Single operation; no rolling loop
Job / ReplicaSet | Spec patch then migrate | Patch applied first; migration runs if the workload generation changed
Argo Rollout | Spec patch then migrate | Patched via SSA; no new rollout revision is triggered
CronJob | Spec patch then migrate | Active jobs replaced; future runs inherit the new resource values
SparkApplication / ScheduledSparkApplication | Spec patch then migrate | Spec updated via SSA
Kubeflow Notebook | Spec patch then migrate | Targets the underlying StatefulSet
CNPG Cluster | Spec patch then migrate | CloudNativePG clusters; spec updated via SSA

Configuration

Live migration is enabled per-recommendation via the WorkloadRecommendation CRD:

spec:
  useLiveMigration: true
  containerRecommendations:
    - containerName: my-container
      cpu:
        request: "2"
        limit: "2"
      memory:
        request: "4Gi"
        limit: "4Gi"

Nodes must be labeled dakr.devzero.io/checkpoint-node=true and must be running the node agent to participate as source or destination.
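
For reference, the label as it appears on a participating Node object (the node name is a placeholder):

apiVersion: v1
kind: Node
metadata:
  name: worker-1                            # placeholder
  labels:
    dakr.devzero.io/checkpoint-node: "true"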

Fallback Behavior

When live migration cannot be completed, the operator falls back to a standard rolling restart so the resource recommendation is still applied.

Condition | Why Migration Is Skipped
Image mismatch | The running pod's container image differs from the pod template. CRIU restores process state into a specific binary; a different image would produce undefined behavior or an outright restore failure.
No running pods | Nothing to checkpoint; the workload has no running pods to migrate.
Node agent unavailable | No agent is reachable on the source node to perform the checkpoint RPC.
Restore failure | The destination node agent accepted the restore but encountered an error during execution.

When fallback occurs, status.phase is set to AppliedWithRestartFallback and status.liveMigrationDegraded is set to true. The status.message explains why. Resource changes are applied regardless.
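
A degraded status then looks roughly like this (the message text is illustrative):

status:
  phase: AppliedWithRestartFallback
  liveMigrationDegraded: true
  message: "image mismatch: running pod image differs from pod template"   # illustrative wording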

Status and Observability

Recommendations move through these phases during a live migration:

Phase | Meaning
Applying | Workload spec is being patched with new resource values
Migrating | Checkpoint/restore is in progress
Applied | All pods migrated successfully
AppliedWithRestartFallback | Migration could not complete; a rolling restart was used instead
PartialFailure | Rolling checkpoint: some pods migrated successfully, others fell back
Failed | The recommendation could not be applied at all
Skipped | Live migration determined not viable before attempting (e.g., image mismatch)

Per-pod progress is tracked in status.rollingCheckpoint.podResults. Each pod shows its own phase (Pending, Completed, or Failed) and a message if it failed.
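
For example, a rolling checkpoint that partially fell back might report results along these lines (the pod-identifier field name is assumed; phase and message are documented above):

status:
  phase: PartialFailure
  rollingCheckpoint:
    podResults:
      - pod: my-app-0          # field name for the pod identifier is assumed
        phase: Completed
      - pod: my-app-1
        phase: Failed
        message: "restore failed on destination node"   # illustrative wording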

Limitations

  • Image coherence required: Running pods must use the same container image as the pod template. Mid-rollout Deployments where some pods use an older image will cause live migration to be skipped.
  • Istio sidecar not checkpointed: The proxy restarts fresh on the destination pod. For most applications this is transparent, but extremely latency-sensitive mesh connections may see a brief reconnect.
  • Very large memory footprints: Checkpoint image size scales with pod RSS. A pod with a 100 GiB working set produces a 100 GiB image to transfer. Migration time scales accordingly.
  • Non-dumpable processes: Applications that explicitly call prctl(PR_SET_DUMPABLE, 0) cannot be checkpointed. The operator falls back to rolling restart.
  • Node agent required on both ends: Source and destination nodes must be labeled dakr.devzero.io/checkpoint-node=true and running the agent. Migrations to unlabeled nodes are not supported.
