
Onboard onto DevZero Karp on AKS

A step-by-step guide for setting up dzKarp on AKS.

This guide will help you onboard onto DevZero Karp in an AKS cluster. We make the following assumptions:

  • You will use an existing AKS cluster
  • The AKS cluster has dakr-operator installed
  • The AKS cluster has OIDC, managed identity, and workload identity enabled
  • You have the az CLI, kubectl, helm, jq, and yq installed (see the check below)
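
A quick way to sanity-check these prerequisites is sketched below; the az query paths for the OIDC and workload identity status are assumptions based on current az CLI output and may differ across CLI versions.

# check that the required CLI tools are on the PATH
for tool in az kubectl helm jq yq; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done

# check OIDC issuer and workload identity status for the cluster
az aks show --name "<cluster-name>" --resource-group "<resource-group>" \
  --query "{oidc: oidcIssuerProfile.enabled, workloadIdentity: securityProfile.workloadIdentity.enabled}" -o json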

Set Environment Variables

First, set the following variables:

export CLUSTER_NAME="<cluster-name>"
export RG=$(az aks list --query "[?name=='${CLUSTER_NAME}'].resourceGroup" -o tsv)
export LOCATION=$(az aks list --query "[?name=='${CLUSTER_NAME}'].location" -o tsv)

Set Up dzKarp Auth

Create the workload MSI that backs the dzKarp pod auth:

KMSI_JSON=$(az identity create --name karpentermsi --resource-group "${RG}" --location "${LOCATION}")

Ensure the cluster has OIDC, workload identity, and managed identity enabled (note that this may cause node disruption):

az aks update --name "${CLUSTER_NAME}" --resource-group "${RG}" \
  --enable-oidc-issuer --enable-workload-identity --enable-managed-identity

Create federated credential linked to the dzKarp service account for auth usage:

AKS_JSON=$(az aks show --name "${CLUSTER_NAME}" --resource-group "${RG}")
az identity federated-credential create --name KARPENTER_FID --identity-name karpentermsi --resource-group "${RG}" \
  --issuer "$(jq -r ".oidcIssuerProfile.issuerUrl" <<< "$AKS_JSON")" \
  --subject system:serviceaccount:kube-system:karpenter-sa \
  --audience api://AzureADTokenExchange

Create role assignments to let dzKarp manage VM and network resources:

KARPENTER_USER_ASSIGNED_CLIENT_ID=$(jq -r '.principalId' <<< "$KMSI_JSON")
RG_MC=$(jq -r ".nodeResourceGroup" <<< "$AKS_JSON")
RG_MC_RES=$(az group show --name "${RG_MC}" --query "id" -otsv)
for role in "Virtual Machine Contributor" "Network Contributor" "Managed Identity Operator"; do
  az role assignment create --assignee "${KARPENTER_USER_ASSIGNED_CLIENT_ID}" --scope "${RG_MC_RES}" --role "$role"
done

Configure helm values

The dzKarp Helm values for AKS can be configured with a helper script.

export KARPENTER_VERSION=v1.6.6
# get the values template
curl -sO https://raw.githubusercontent.com/devzero-inc/dakr-operator-installers/refs/tags/${KARPENTER_VERSION}/dzKarp/azure/karpenter-values-template.yaml
# get the helper script
curl -sO https://raw.githubusercontent.com/devzero-inc/dakr-operator-installers/refs/tags/${KARPENTER_VERSION}/dzKarp/azure/configure-values.sh
# run the helper script
chmod +x ./configure-values.sh && ./configure-values.sh ${CLUSTER_NAME} ${RG} karpenter-sa karpentermsi

Install dzKarp

Install dzKarp with helm:

helm upgrade --install karpenter oci://public.ecr.aws/devzeroinc/aks/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system --create-namespace \
  --values karpenter-values.yaml \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

Verify dzKarp

kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

Confirm that no unexpected errors are produced.
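
You can also confirm that the controller pods are running and ready; the deployment name below assumes the default karpenter release name used in the install step.

kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
kubectl rollout status deployment/karpenter -n kube-system --timeout=120s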

Set nodeAffinity for critical workloads (optional)

Autoscaled nodes can be prone to churn, which can disrupt the workloads running on them.

You may want to set a nodeAffinity on critical cluster workloads to mitigate this.

Some examples are:

  • coredns
  • metrics-server

Add the following to your cluster-critical workload deployments:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: karpenter.sh/nodepool
          operator: DoesNotExist
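
As one possible way to apply this, the sketch below patches the coredns deployment in kube-system with a merge patch; AKS may reconcile changes to managed add-ons, so treat this as illustrative only.

kubectl -n kube-system patch deployment coredns --type merge -p '
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: DoesNotExist
'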

Create Node Policy

We need to create a Node Policy in DevZero, and have it target the cluster on which dzKarp was just installed.

Head over to the optimization dashboard, click on "Create Node Policy" and follow the form to create a policy suitable for your needs.

After the policy is created, click on it in the menu and point it at the cluster where you just installed dzKarp via "Create Target".

In about a minute, this should create nodepool and nodeclass objects in your Kubernetes cluster.

Check them out:

kubectl describe aksnodeclass
kubectl describe nodepools

Migrate workloads onto autoscaled nodes

If your workloads do not have pod disruption budgets set, the following commands will cause periods of workload unavailability.
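
If you need to add one, a minimal PodDisruptionBudget could look like the sketch below; the name, namespace, and app label are placeholders for your own workload.

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
EOF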

If you have cluster-autoscaler installed, it must be disabled first; scale its deployment down to zero before you proceed.
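
How you disable it depends on how it was installed; both commands below are sketches, and the deployment and node pool names are placeholders.

# self-managed cluster-autoscaler: scale its deployment to zero
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0

# AKS-managed cluster autoscaler: disable it on the node pool
az aks nodepool update \
    --cluster-name "${CLUSTER_NAME}" \
    --resource-group "${RG}" \
    --name <your-node-pool-name> \
    --disable-cluster-autoscaler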

To remove the instances that were added by the node pool, scale your node pools down to the minimum size needed to support dzKarp and other critical services.

If you have a single node pool, we suggest scaling it to 2 instances:

az aks nodepool scale \
    --cluster-name "${CLUSTER_NAME}" \
    --resource-group "${RG}" \
    --name <your-node-pool-name> \
    --node-count 2

Or, if you have multiple node pools, choose one to keep and delete the others.
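
If that is your situation, extra node pools can be removed once their workloads have rescheduled; the node pool name below is a placeholder.

az aks nodepool delete \
    --cluster-name "${CLUSTER_NAME}" \
    --resource-group "${RG}" \
    --name <node-pool-to-remove>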

If you have a lot of nodes or workloads, you may want to scale down your node pools gradually, a few instances at a time. Watch the transition carefully for workloads that may not have enough replicas running or disruption budgets configured.

As the node pool nodes are drained, you can verify that dzKarp is creating nodes for your workloads:

kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

You should also see new nodes created in your cluster as the old nodes are removed:

kubectl get nodes
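
To see only the nodes provisioned by dzKarp, you can filter on the karpenter.sh/nodepool label; this assumes the standard Karpenter node labels are in use.

kubectl get nodes -l karpenter.sh/nodepool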