
Onboard onto DevZero Karp for OKE

A step-by-step guide for setting up dzKarp on OKE.


This guide will help you onboard onto DevZero Karp in an Oracle Kubernetes Engine (OKE) cluster. We make the following assumptions:

  • You will use an existing OKE cluster (Enhanced type is required for Workload Identity)
  • The OKE cluster has dakr-operator installed
  • You have the oci CLI installed and configured
  • You have kubectl and helm installed
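Before starting, you can quickly confirm the prerequisite tooling is on your PATH. This is a small convenience sketch, not part of the official setup:

```shell
# Check that the required CLIs are available before proceeding.
missing=""
for tool in oci kubectl helm; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing required tools:$missing" >&2
else
  echo "All required tools found."
fi
```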

Set Environment Variables

First, set the necessary environment variables for your OCI configuration.

export COMPARTMENT_ID="<your-compartment-id>"
export OCI_REGION="<your-region>"
export CLUSTER_NAME="<your-cluster-name>"
export KARPENTER_NAMESPACE="karpenter"
export KARPENTER_SERVICE_ACCOUNT="karpenter"
export TAG_NAMESPACE="oke-karpenter-ns"
export DZKARP_VERSION="1.4.3"

Configure OCI Identity

We need to create an IAM policy to allow the Karpenter workload identity to manage OCI resources.

cat <<EOF > karpenter_policy.json
[
  "Allow any-user to manage instance-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to manage instances in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to read instance-images in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to read app-catalog-listing in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to manage volume-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to manage volume-attachments in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use volumes in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use virtual-network-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to inspect vcns in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use subnets in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use network-security-groups in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use vnics in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}",
  "Allow any-user to use tag-namespaces in tenancy where all {request.principal.type = 'workload',request.principal.namespace = '${KARPENTER_NAMESPACE}',request.principal.service_account = '${KARPENTER_SERVICE_ACCOUNT}'}"
]
EOF

oci iam policy create \
    --compartment-id "${COMPARTMENT_ID}" \
    --name "karpenter-policy" \
    --description "Policy for Karpenter to manage OCI resources" \
    --statements file://karpenter_policy.json

Configure Tag Namespace

Karpenter uses OCI tags to identify and manage the instances it creates. We need to create a Tag Namespace and the required Tag Definition Keys.

TAG_NS_ID=$(oci iam tag-namespace list --compartment-id "${COMPARTMENT_ID}" --query "data[?name=='${TAG_NAMESPACE}'] | [0].id" --raw-output)
if [ -z "$TAG_NS_ID" ]; then
  TAG_NS_ID=$(oci iam tag-namespace create \
      --compartment-id "${COMPARTMENT_ID}" \
      --name "${TAG_NAMESPACE}" \
      --description "Tag namespace for Karpenter" \
      --query "data.id" --raw-output)
fi

for key in "karpenter_k8s_oracle/ocinodeclass" "karpenter_sh/managed-by" "karpenter_sh/nodepool" "karpenter_sh/nodeclaim"; do
    oci iam tag create \
        --tag-namespace-id "${TAG_NS_ID}" \
        --name "${key}" \
        --description "Karpenter tag key" \
        --is-cost-tracking true
done
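As an optional sanity check, you can read the tag definitions back and confirm all four keys were created. This is a sketch; it assumes the namespace OCID is still in TAG_NS_ID from the previous step:

```shell
# Confirm all four tag keys now exist in the namespace.
# A MISSING line indicates a failed "oci iam tag create" above.
created=$(oci iam tag list --tag-namespace-id "${TAG_NS_ID}" --query 'data[].name' --raw-output)
for key in "karpenter_k8s_oracle/ocinodeclass" "karpenter_sh/managed-by" "karpenter_sh/nodepool" "karpenter_sh/nodeclaim"; do
  if echo "$created" | grep -qF "$key"; then
    echo "OK: $key"
  else
    echo "MISSING: $key"
  fi
done
```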

Configure Dynamic Group

We need a Dynamic Group to identify the nodes created by Karpenter, and a Policy to allow them to join the cluster.

Tag-based matching rules do not work when the tag key contains / (as all Karpenter-applied tags do, e.g. karpenter_sh/managed-by). Use a compartment-only matching rule instead.

For root-compartment clusters, the policy must use "in tenancy"; "in compartment id <tenancy-ocid>" does not work even when the OCID is your tenancy OCID. If your cluster is in a sub-compartment, replace "in tenancy where target.cluster.id = ..." with "in compartment <compartment-name>".

export DYNAMIC_GROUP_NAME="karpenter-nodes-${CLUSTER_NAME}"

# Fetch cluster ID for use in the policy condition
CLUSTER_ID=$(oci ce cluster list --compartment-id "${COMPARTMENT_ID}" --name "${CLUSTER_NAME}" --query "data[0].id" --raw-output)

oci iam dynamic-group create \
    --name "${DYNAMIC_GROUP_NAME}" \
    --description "Dynamic group for Karpenter nodes in ${CLUSTER_NAME}" \
    --matching-rule "instance.compartment.id = '${COMPARTMENT_ID}'"

cat <<EOF > node_join_policy.json
[
  "Allow dynamic-group ${DYNAMIC_GROUP_NAME} to {CLUSTER_JOIN} in tenancy where target.cluster.id = '${CLUSTER_ID}'"
]
EOF

oci iam policy create \
    --compartment-id "${COMPARTMENT_ID}" \
    --name "karpenter-node-join-${CLUSTER_NAME}" \
    --description "Policy for Karpenter nodes to join cluster ${CLUSTER_NAME}" \
    --statements file://node_join_policy.json

Install dzKarp

Install dzKarp via Helm. We will retrieve the cluster endpoint from OCI and the DNS service IP from your cluster.

# Retrieve Cluster ID and Endpoint from OCI
CLUSTER_ID=$(oci ce cluster list --compartment-id "${COMPARTMENT_ID}" --name "${CLUSTER_NAME}" --query "data[0].id" --raw-output)

# Check Cluster Type (Must be ENHANCED_CLUSTER for Workload Identity)
CLUSTER_TYPE=$(oci ce cluster get --cluster-id "${CLUSTER_ID}" --query "data.type" --raw-output)
if [ "$CLUSTER_TYPE" != "ENHANCED_CLUSTER" ]; then
  echo "WARNING: Your cluster type is ${CLUSTER_TYPE}. Karpenter requires an ENHANCED cluster for Workload Identity to function correctly."
  echo "Please upgrade your cluster to ENHANCED_CLUSTER type before proceeding."
fi

CLUSTER_ENDPOINT=$(oci ce cluster get --cluster-id "${CLUSTER_ID}" --query 'data.endpoints."private-endpoint"' --raw-output)
if [ -z "$CLUSTER_ENDPOINT" ] || [ "$CLUSTER_ENDPOINT" = "null" ]; then
  CLUSTER_ENDPOINT=$(oci ce cluster get --cluster-id "${CLUSTER_ID}" --query 'data.endpoints."public-endpoint"' --raw-output)
fi

# OCI returns IP:Port, but Karpenter expects HTTPS URL
if [[ "${CLUSTER_ENDPOINT}" != https://* ]]; then
  CLUSTER_ENDPOINT="https://${CLUSTER_ENDPOINT}"
fi

# Retrieve Cluster DNS IP (requires kubectl)
export CLUSTER_DNS=$(kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}' 2>/dev/null || kubectl get svc -n kube-system coredns -o jsonpath='{.spec.clusterIP}')

# Install CRDs
helm upgrade --install karpenter-crd oci://public.ecr.aws/devzeroinc/dzkarp-oci/karpenter-crd \
  --version "${DZKARP_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace

# Install Karpenter Controller
helm upgrade --install karpenter oci://public.ecr.aws/devzeroinc/dzkarp-oci/karpenter \
  --version "${DZKARP_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.clusterEndpoint=${CLUSTER_ENDPOINT}" \
  --set "settings.clusterDns=${CLUSTER_DNS}" \
  --set "settings.compartmentId=${COMPARTMENT_ID}" \
  --set "settings.ociResourcePrincipalRegion=${OCI_REGION}" \
  --set "serviceAccount.create=true" \
  --set "serviceAccount.name=${KARPENTER_SERVICE_ACCOUNT}" \
  --wait

Verify dzKarp

Check the logs of the Karpenter controller to ensure it is running without errors.

kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
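You can also wait for the controller Deployment to report ready before moving on. This sketch assumes the Deployment is named karpenter, matching the Helm release name used above:

```shell
# Wait for the controller Deployment to become available, then list its pods.
if kubectl -n "${KARPENTER_NAMESPACE}" rollout status deploy/karpenter --timeout=120s; then
  kubectl -n "${KARPENTER_NAMESPACE}" get pods -l app.kubernetes.io/name=karpenter
else
  echo "Controller did not become ready; inspect the logs above." >&2
fi
```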

Set nodeAffinity for critical workloads (optional)

Autoscaled nodes are prone to churn, which can disrupt the workloads running on them. You may want to set a nodeAffinity on critical cluster workloads (such as coredns or metrics-server) to keep them off Karpenter-provisioned nodes.

Add the following to the pod template of your cluster-critical deployments:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: DoesNotExist
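If you prefer not to edit manifests by hand, the same affinity can be applied with a merge patch. coredns in kube-system is used here purely as an example target; adjust the namespace and name for your workload:

```shell
# Patch an existing deployment so its pods avoid Karpenter-provisioned nodes.
# "coredns" is an example target, not a requirement.
kubectl -n kube-system patch deployment coredns --type merge -p '{
  "spec": {
    "template": {
      "spec": {
        "affinity": {
          "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
              "nodeSelectorTerms": [
                {
                  "matchExpressions": [
                    {"key": "karpenter.sh/nodepool", "operator": "DoesNotExist"}
                  ]
                }
              ]
            }
          }
        }
      }
    }
  }
}'
```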

Find OKE Worker Image

The imageSelector in the OciNodeClass must reference an OKE-baked worker image. Plain Oracle Linux images will not work — they lack the kubelet, CRI-O, and the OKE bootstrap daemon required for nodes to join the cluster.

OKE images follow the naming pattern Oracle-Linux-8.10-YYYY.MM.DD-0-OKE-<k8s-ver>-<build>. Plain Oracle Linux images (e.g. Oracle-Linux-8.10-2026.01.29-0) will cause nodes to fail to bootstrap.

Find the correct image for your cluster's Kubernetes version (replace <k8s-version> with your version, e.g. v1.31):

oci ce node-pool-options get \
  --node-pool-option-id all \
  --query 'data.sources[?contains("source-name", `OKE`)].{"name":"source-name","id":"image-id"}' \
  | grep <k8s-version>

Use the returned image name or ID in the Image Selector field when creating the Node Policy in the next step.

Create Node Policy

We need to create a Node Policy in DevZero, and have it target the cluster on which dzKarp was just installed.

In the OCI configuration section of the form, set the following:

  • Image Selector: use the OKE worker image found in the previous step
  • Subnet Selector (required): specify the subnet(s) where Karpenter nodes should be launched — use the same subnets as your existing node pools
  • Security Group Selector (required): specify at least one security group. If your cluster does not use security groups, create an empty NSG (no rules) in OCI and use its OCID here
  • Meta Data (required for OKE Enhanced clusters with VCN-native pod networking):
    • oke-native-pod-networking = true
    • oke-cluster-id = last segment of your cluster OCID (e.g. cjs7uyfrsdq from ocid1.cluster.oc1.phx.aaaa...cjs7uyfrsdq)
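If you need the empty NSG mentioned above, one way to create it is sketched below. VCN_ID is a placeholder you must set to your cluster's VCN OCID:

```shell
# Create an empty network security group (no rules) in the cluster's VCN
# and capture its OCID for use in the Security Group Selector.
export VCN_ID="<your-vcn-id>"
NSG_ID=$(oci network nsg create \
    --compartment-id "${COMPARTMENT_ID}" \
    --vcn-id "${VCN_ID}" \
    --display-name "karpenter-empty-nsg" \
    --query "data.id" --raw-output)
echo "Empty NSG OCID: ${NSG_ID}"
```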

Head over to the optimization dashboard, click on "Create Node Policy" and follow the form to create a policy suitable for your needs.

After the Policy is created, click on it in the menu and point it at the cluster you just created via "Create Target".

In about a minute, this should create nodepool and ocinodeclass objects in your Kubernetes cluster.

Check them out:

kubectl describe ocinodeclass
kubectl describe nodepools

Migrate workloads onto autoscaled nodes

If your workloads do not have PodDisruptionBudgets configured, the following commands may cause periods of workload unavailability.

To migrate workloads to Karpenter, we can scale down the existing OKE node pools.

# Retrieve Cluster ID (if not set)
if [ -z "$CLUSTER_ID" ]; then
  CLUSTER_ID=$(oci ce cluster list --compartment-id "${COMPARTMENT_ID}" --name "${CLUSTER_NAME}" --query "data[0].id" --raw-output)
fi

# Get list of Node Pool IDs
NODE_POOL_IDS=$(oci ce node-pool list --compartment-id "${COMPARTMENT_ID}" --cluster-id "${CLUSTER_ID}" --query 'join(`" "`, data[].id)' --raw-output)

# Scale each node pool to a minimum size (e.g., 1)
for np_id in $NODE_POOL_IDS; do
    echo "Scaling node pool ${np_id}..."
    oci ce node-pool update --node-pool-id "${np_id}" --node-config-details '{"size": 1}' --force
done

If you have a lot of nodes or workloads you may want to slowly scale down your node pools. It is recommended to watch the transition carefully for workloads that may not have enough replicas running or disruption budgets configured.
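One way to do this gradually is to cordon and drain the pre-existing nodes one at a time. This sketch assumes that any node without the karpenter.sh/nodepool label is an old node-pool node:

```shell
# Drain non-Karpenter nodes one at a time, pausing between nodes so dzKarp
# can provision replacement capacity. Karpenter-provisioned nodes carry the
# karpenter.sh/nodepool label and are excluded by the selector.
for node in $(kubectl get nodes -l '!karpenter.sh/nodepool' -o name); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
  sleep 60  # allow replacement nodes to register before draining the next one
done
```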

As node pool nodes are drained you can verify that dzKarp is creating nodes for your workloads.

kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller

You should also see new nodes created in your cluster as the old nodes are removed.

kubectl get nodes
