Onboard onto DevZero Karp for EKS
A step-by-step guide for setting up dzKarp on EKS.

This guide will help you onboard onto DevZero Karp (dzKarp) in an EKS cluster. We make the following assumptions:
- You will use an existing EKS cluster
- The EKS cluster has dakr-operator installed
- You will use existing VPC and subnets
- You will use existing security groups
- Your nodes are part of one or more node groups
- Your workloads have pod disruption budgets that adhere to EKS best practices
- Your cluster has an OIDC provider for service accounts
This guide also assumes you have the AWS CLI installed. You can perform many of these steps in the console, but we will use the command line for simplicity.
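If you want to confirm the OIDC prerequisite before starting, the optional check below lists the IAM OIDC providers in your account and filters for your cluster's issuer ID (replace <your cluster name> as in the rest of this guide); an empty result means no OIDC provider is associated with the cluster.
aws iam list-open-id-connect-providers | grep "$(aws eks describe-cluster --name <your cluster name> --query "cluster.identity.oidc.issuer" --output text | cut -d'/' -f5)"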
Set Environment Variables
First, set a variable for your cluster name:
export CLUSTER_NAME=<your cluster name>
Next, set other variables from your cluster configuration.
export KARPENTER_NAMESPACE=kube-system
export AWS_PARTITION="aws" 
export AWS_REGION="$(aws configure list | grep region | tr -s " " | cut -d" " -f3)"
export OIDC_ENDPOINT="$(aws eks describe-cluster --name "${CLUSTER_NAME}" \
    --query "cluster.identity.oidc.issuer" --output text)"
export OIDC_PROVIDER_ID=$(echo $OIDC_ENDPOINT | cut -d'/' -f5)
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' \
    --output text)
export TEMPOUT="$(mktemp)"
export KARPENTER_VERSION="1.7.1"
export K8S_VERSION=$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.version" --output text)
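Before continuing, it can help to echo the variables and confirm that none of them came back empty:
echo "${KARPENTER_NAMESPACE}" "${KARPENTER_VERSION}" "${K8S_VERSION}" "${CLUSTER_NAME}" \
    "${AWS_PARTITION}" "${AWS_REGION}" "${OIDC_ENDPOINT}" "${OIDC_PROVIDER_ID}" "${AWS_ACCOUNT_ID}"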
Run CloudFormation
Run the CloudFormation script below to configure the AWS IAM roles and policies needed for node management operations, and to set up a queue for spot interruption events.
curl -fsSL https://raw.githubusercontent.com/devzero-inc/dakr-operator-installers/refs/tags/dzkarp/dzKarp/cloudformation.yaml  > "${TEMPOUT}" \
&& aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
    "ClusterName=${CLUSTER_NAME}" \
    "AWSRegion=${AWS_REGION}" \
    "OIDCProviderID=${OIDC_PROVIDER_ID}" \
    "KarpenterNamespace=${KARPENTER_NAMESPACE}"Add Tags to Subnets and Security Groups
Add Tags to Subnets and Security Groups
We need to add tags to our subnets and security groups so that dzKarp knows which ones to use.
VPC_ID=$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.resourcesVpcConfig.vpcId" --output text)
aws ec2 create-tags \
    --resources $(aws ec2 describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" --query "Subnets[].SubnetId" --output text) \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}"Add tags to our cluster security group.
aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
    --resources $(aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)
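As an optional check, you can confirm the discovery tag was applied by filtering on it; both commands should return the resources you just tagged.
aws ec2 describe-subnets --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query "Subnets[].SubnetId" --output text
aws ec2 describe-security-groups --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query "SecurityGroups[].GroupId" --output text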
Update aws-auth ConfigMap
We need to allow nodes that use the node IAM role we just created to join the cluster. To do that, modify the aws-auth ConfigMap in the cluster.
kubectl edit configmap aws-auth -n kube-system
You will need to add a section to mapRoles that looks something like this. Replace the ${AWS_PARTITION} variable with the account partition, the ${AWS_ACCOUNT_ID} variable with your account ID, and the ${CLUSTER_NAME} variable with the cluster name, but do not replace the {{EC2PrivateDNSName}}.
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}
The full aws-auth ConfigMap should have two mapRoles entries: one for your dzKarp node role and one for your existing node group role.
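After saving the edit, you can dump the ConfigMap to confirm both entries are present:
kubectl get configmap aws-auth -n kube-system -o yaml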
Deploy dzKarp
Install dzKarp via Helm
# Logout of helm registry to perform an unauthenticated pull against the public ECR
helm registry logout public.ecr.aws
helm upgrade --install karpenter oci://public.ecr.aws/devzeroinc/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
Verify dzKarp
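Before tailing the logs, you can confirm the controller pods are running; this uses the same app.kubernetes.io/name=karpenter label as the logs command below.
kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter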
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
Check the logs and confirm that no unexpected errors are produced.
Set nodeAffinity for critical workloads (optional)
Autoscaled nodes can be prone to churn, which can disturb the workloads running on them.
You may want to set a nodeAffinity on critical cluster workloads to mitigate this.
Some examples are:
- coredns
- metrics-server
Add the following affinity to your cluster-critical workload Deployments:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: karpenter.sh/nodepool
          operator: DoesNotExist
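As a sketch of one way to apply this, the patch below adds the affinity to coredns; the Deployment name and namespace are assumptions and may differ in your cluster.
# Hypothetical example: adjust the Deployment name/namespace (coredns in kube-system) for your cluster
cat <<'EOF' > affinity-patch.yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: DoesNotExist
EOF
kubectl -n kube-system patch deployment coredns --patch-file affinity-patch.yaml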
Create Node Policy
We need to create a Node Policy in DevZero and have it target the cluster on which dzKarp was just installed.
Head over to the optimization dashboard, click on "Create Node Policy" and follow the form to create a policy suitable for your needs.
After the Policy is created, click on it in the menu and point it at the cluster on which you just installed dzKarp via "Create Target".
In about a minute this should create NodePool and EC2NodeClass objects in your Kubernetes cluster.
Check them out:
kubectl describe ec2nodeclass
kubectl describe nodepools
Migrate workloads onto autoscaled nodes
If your workloads do not have pod disruption budgets set, the following commands will cause periods of workload unavailability.
If you have cluster-autoscaler installed, it must be disabled first: scale its deployment down to zero before you proceed.
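For example, if cluster-autoscaler runs as a Deployment named cluster-autoscaler in kube-system (names vary by installation), you could scale it down with:
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0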
To remove the instances that were created by the node group, scale the node group down to a minimum size that still supports dzKarp and other critical services.
If you have a single multi-AZ node group, we suggest keeping 2 instances. Set NODEGROUP to the name of your node group, then scale it down:
export NODEGROUP=<your node group name>
aws eks update-nodegroup-config --cluster-name "${CLUSTER_NAME}" \
    --nodegroup-name "${NODEGROUP}" \
    --scaling-config "minSize=2,maxSize=2,desiredSize=2"Or, if you have multiple single-AZ node groups, we suggest 1 instance each.
for NODEGROUP in $(aws eks list-nodegroups --cluster-name "${CLUSTER_NAME}" \
    --query 'nodegroups' --output text); do aws eks update-nodegroup-config --cluster-name "${CLUSTER_NAME}" \
    --nodegroup-name "${NODEGROUP}" \
    --scaling-config "minSize=1,maxSize=1,desiredSize=1"
done
If you have a lot of nodes or workloads, you may want to scale down your node groups slowly, a few instances at a time. Watch the transition carefully for workloads that may not have enough replicas running or that lack disruption budgets.
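A couple of quick checks that can help while you scale down: list the disruption budgets that exist for your workloads, and look for pods stuck waiting for capacity.
kubectl get pdb --all-namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending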
As the node group nodes are drained, you can verify that dzKarp is creating nodes for your workloads.
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
You should also see new nodes created in your cluster as the old nodes are removed.
kubectl get nodes
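You can also list the NodeClaims that the controller creates to track each provisioned node, and filter nodes by the karpenter.sh/nodepool label to see which ones dzKarp launched:
kubectl get nodeclaims
kubectl get nodes -l karpenter.sh/nodepool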