Live Migration
Checkpointing and restoring workloads
Live migration allows you to checkpoint running workloads and restore them on different nodes (or, in some instances, the same node) without losing application state.
Install the agent and scheduler alongside the DevZero write operator.
Choose the appropriate command for your cluster type: containerd, RKE2, or K3s (see the agent.runtime value in each command).
The cloud parameter must be one of: aws, gcp, azure, or "" (for no cloud provider).
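Because an unsupported cloud value is easy to mistype, a small pre-flight check can catch it before running helm. The following is a minimal sketch, not part of the dakr chart: the is_valid_cloud helper and the CLOUD environment variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper (not part of the dakr chart): succeeds only when the
# value is one of the options listed above: aws, gcp, azure, or "".
is_valid_cloud() {
  case "$1" in
    aws|gcp|azure|"") return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the value is supplied through a CLOUD environment variable.
# ${CLOUD-aws} falls back to "aws" only when CLOUD is unset, so an explicit
# empty string (no cloud provider) is preserved.
CLOUD="${CLOUD-aws}"

if is_valid_cloud "$CLOUD"; then
  echo "cloud=\"$CLOUD\" is valid"
else
  echo "cloud=\"$CLOUD\" is not one of: aws, gcp, azure, \"\"" >&2
fi
```

The validated value can then be passed to the install command as --set cloud="$CLOUD".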
helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="containerd" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN

helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="rke2" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/var/lib/rancher/rke2/agent/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/k3s/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN

helm upgrade --install dakr oci://registry-1.docker.io/devzeroinc/dakr-operator \
    --version 0.1.7 \
    --namespace dakr-operator \
    --create-namespace \
    --set cloud="<REPLACE_ME_WITH_VALID_OPTION>" \
    --set image.tag="v0.0.32" \
    --set operator.endpoint="https://dakr.devzero.io" \
    --set operator.clusterLocation="<REPLACE_ME_WITH_SOMETHING_VALID>" \
    --set operator.clusterToken="" \
    --set operator.clusterName="" \
    --set operator.noCloudCreds=true \
    --set operator.nameFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.nameFromConfigMap.namespace=devzero-zxporter \
    --set operator.nameFromConfigMap.key=KUBE_CONTEXT_NAME \
    --set operator.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set operator.tokenFromConfigMap.namespace=devzero-zxporter \
    --set operator.tokenFromConfigMap.key=CLUSTER_TOKEN \
    --set operator.customScheduler=true \
    --set operator.serviceAccount.annotations=null \
    --set operator.features.argocdPatching=true \
    --set agent.enabled=true \
    --set agent.runtime="k3s" \
    --set agent.debug=true \
    --set agent.disableIOUring=true \
    --set agent.configureInotify=true \
    --set agent.inotify.maxUserInstances=9000 \
    --set agent.inotify.maxUserWatches=624288 \
    --set agent.containerdConfigPath="/var/lib/rancher/k3s/agent/etc/containerd/config.toml" \
    --set agent.containerdSock="/run/k3s/containerd/containerd.sock" \
    --set scheduler.nodeCost.controlPlaneToken="" \
    --set scheduler.nodeCost.controlPlaneAddress="https://dakr.devzero.io" \
    --set scheduler.nodeCost.tokenFromConfigMap.name=devzero-zxporter-env-config \
    --set scheduler.nodeCost.tokenFromConfigMap.namespace=devzero-zxporter \
    --set scheduler.nodeCost.tokenFromConfigMap.key=CLUSTER_TOKEN

Label the nodes that will support live migration. This enables checkpoint/restore functionality on those nodes. To list the nodes in the cluster, run kubectl get nodes.
kubectl label node <node-name> dakr.devzero.io/checkpoint-node=true

Validate the label and ensure that the containerd shim is present.
Nodes labeled dakr.devzero.io/checkpoint-node=true:
kubectl get nodes -l "dakr.devzero.io/checkpoint-node=true" --no-headers -o custom-columns=NAME:.metadata.name

Check the logs to make sure the installation completed as expected:
kubectl logs daemonset/dakr-dakr-operator-agent -n dakr-operator -c installer

The logs should look like this:
$ kubectl logs daemonset/dakr-dakr-operator-agent -n dakr-operator -c installer
    2025/09/16 14:54:53 Image docker.io/devzeroinc/dakr-criu:v0.0.28 not found locally, pulling from registry
    2025/09/16 14:54:55 installed criu binaries from docker.io/devzeroinc/dakr-criu:v0.0.28
    2025/09/16 14:54:55 installing runtime for rke2
    2025/09/16 14:54:55 unable to remove shim binary, continuing with install: remove /opt/checkpoint-shim/bin/containerd-shim-checkpoint-v2: no such file or directory
    2025/09/16 14:54:55 configuring containerd v1.7.27-k3s1
    2025/09/16 14:55:27 installed runtime
    2025/09/16 14:55:27 installed runtimeClass
    2025/09/16 14:55:27 installer completed

Run workloads on the nodes that have checkpoint/restore (C/R) enabled. Setting a node selector is a fast way to achieve this.
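One way to set the node selector is to match the dakr.devzero.io/checkpoint-node=true label applied earlier. The sketch below prints a minimal Pod manifest with that selector. The render_checkpoint_pod function, the Pod name, and the image are hypothetical placeholders, and the manifest sets only the node selector; any other fields your workload needs are not covered here.

```shell
# Hypothetical helper: print a Pod manifest pinned to nodes that carry the
# checkpoint-node label. Arguments: <pod-name> <image>.
render_checkpoint_pod() {
  name="$1"
  image="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  # Schedule only onto nodes labeled for checkpoint/restore.
  nodeSelector:
    dakr.devzero.io/checkpoint-node: "true"
  containers:
    - name: ${name}
      image: ${image}
EOF
}

# Print the manifest for review (placeholder name and image).
render_checkpoint_pod demo-app nginx:1.27
```

With cluster access, the output can be piped to kubectl apply -f - to create the Pod. The same nodeSelector stanza applies to the Pod template of a Deployment or StatefulSet.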
Apply workload recommendations that have live-migration enabled.