Live Migration

Checkpointing and restoring workloads

Live migration allows you to checkpoint running workloads and restore them on different nodes (or, in some instances, the same node) without losing application state.

Live migration is currently in closed alpha.

Install the agent and scheduler alongside the DevZero dakr operator.

Choose the appropriate option below based on your cluster type.

The cloud parameter must be one of: aws, gcp, azure, or "" (for no cloud provider).

For clusters using the standard containerd runtime:

helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="containerd" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN

For RKE2 clusters:

helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="rke2" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/var/lib/rancher/rke2/agent/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/k3s/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN

For K3s clusters:

helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="k3s" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/var/lib/rancher/k3s/agent/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/k3s/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN
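
After the install, a quick sanity check is to confirm that the operator, agent, and scheduler pods are running in the dakr-operator namespace (exact pod names may vary by release):

kubectl get pods -n dakr-operator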

Label the nodes that will support live migration. This enables checkpoint/restore functionality on those nodes (to list the nodes in the cluster, run kubectl get nodes).

kubectl label node <node-name> dakr.devzero.io/checkpoint-node=true

Validate the label and ensure that the containerd shim is present.

List the nodes labeled dakr.devzero.io/checkpoint-node=true:

kubectl get nodes -l "dakr.devzero.io/checkpoint-node=true" --no-headers -o custom-columns=NAME:.metadata.name

Check the installer logs to make sure the installation completed as expected:

kubectl logs daemonset/dakr-dakr-operator-agent -n dakr-operator -c installer

The logs should look like this:

$ kubectl logs daemonset/dakr-dakr-operator-agent -n dakr-operator -c installer
    2025/09/16 14:54:53 Image docker.io/devzeroinc/dakr-criu:v0.0.28 not found locally, pulling from registry
    2025/09/16 14:54:55 installed criu binaries from docker.io/devzeroinc/dakr-criu:v0.0.28
    2025/09/16 14:54:55 installing runtime for rke2
    2025/09/16 14:54:55 unable to remove shim binary, continuing with install: remove /opt/checkpoint-shim/bin/containerd-shim-checkpoint-v2: no such file or directory
    2025/09/16 14:54:55 configuring containerd v1.7.27-k3s1
    2025/09/16 14:55:27 installed runtime
    2025/09/16 14:55:27 installed runtimeClass
    2025/09/16 14:55:27 installer completed
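
If you want to check for the shim binary directly, one option (assuming it is installed under the /opt/checkpoint-shim/bin path shown in the log above) is to start a node debug pod on a labeled node; <node-name> is a placeholder:

kubectl debug node/<node-name> -it --image=busybox -- ls -l /host/opt/checkpoint-shim/bin/containerd-shim-checkpoint-v2

The node's root filesystem is mounted at /host inside the debug container, so the path is checked on the host itself.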

Run workloads on the nodes that have checkpoint/restore (C/R) enabled; setting a nodeSelector is a fast way to achieve this (see the sketch below).
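
For example, a minimal sketch that pins a pod to the labeled nodes with a nodeSelector (the pod name and image here are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cr-demo                # placeholder name
spec:
  nodeSelector:
    dakr.devzero.io/checkpoint-node: "true"
  containers:
  - name: app
    image: nginx               # placeholder image
EOF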

Apply workload recommendations that have live-migration enabled.

Troubleshooting