Recommendations
Recommendation policies for optimizing resource utilization
Tuning `zxporter` settings to reduce sampling rates will degrade the efficiency and effectiveness of recommendations.
Policy
Recommendations are generated based on active (and historical) behavior.
Nodes (and node groups/pools)
It is not recommended to have multiple node autoscalers running at the same time in a cluster.
The following parameters influence recommendation scoring:
- Instance availability in {region, availability zone}.
- Current shape and demographics of the node {group, pool}.
- Cloud provider pricing.
- Number of candidates for removal (based on recommendation mode).
- Number of (non-DaemonSet) pods running on the node.
- Number of StatefulSet pods running on the node.
- Pod-level underutilization.
- Node-level underutilization.
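To make these inputs concrete, here is a minimal, hypothetical sketch of how they might combine into a single scale-down score per node. None of this is the actual implementation: the field names, weights, and structure are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class NodeSnapshot:
    # Illustrative stand-ins for the signals listed above.
    instance_available_in_zone: bool  # instance availability in {region, AZ}
    hourly_price: float               # cloud provider pricing
    non_daemonset_pods: int           # pods that would need rescheduling
    statefulset_pods: int             # riskier to move
    node_utilization: float           # 0.0-1.0, node-level utilization
    avg_pod_utilization: float        # 0.0-1.0, pod-level utilization

def scale_down_score(node: NodeSnapshot) -> float:
    """Higher score = better candidate for removal. Weights are arbitrary."""
    if not node.instance_available_in_zone:
        return 0.0  # no replacement capacity in this {region, AZ}; skip the node
    score = 1.0 - node.node_utilization              # reward idle nodes
    score += 0.5 * (1.0 - node.avg_pod_utilization)  # reward idle pods
    score += 0.1 * node.hourly_price                 # prefer removing pricier nodes
    score -= 0.05 * node.non_daemonset_pods          # penalize disruption
    score -= 0.25 * node.statefulset_pods            # StatefulSet pods cost more to move
    return max(score, 0.0)
```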
Workloads
Workload policies attempt to bin-pack by default, irrespective of whether node recommendation policies are configured.
The following parameters influence recommendation scoring:
- Instance availability in {region, availability zone}.
- Current shape and demographics of pod specs, including:
  - Tolerations.
  - Affinity/anti-affinity.
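Since tolerations and affinity constrain where pods can be packed, any bin-packing pass has to check feasibility before scoring a placement. Below is a simplified, hypothetical helper (not the actual implementation) using standard Kubernetes taint/toleration matching semantics:

```python
def tolerates(taint: dict, tolerations: list[dict]) -> bool:
    """Simplified Kubernetes semantics: a toleration matches a taint when the
    effects agree (an empty effect matches any) and either operator "Exists"
    covers the key or operator "Equal" matches both key and value."""
    for t in tolerations:
        if t.get("effect") not in (None, "", taint["effect"]):
            continue
        if t.get("operator") == "Exists":
            if t.get("key") in (None, "", taint["key"]):
                return True
        elif t.get("key") == taint["key"] and t.get("value") == taint.get("value"):
            return True
    return False

def can_place(pod: dict, node: dict) -> bool:
    """A pod is placeable only if every NoSchedule/NoExecute taint is
    tolerated and all required node labels (a stand-in for required node
    affinity) are present on the node."""
    for taint in node.get("taints", []):
        if taint["effect"] in ("NoSchedule", "NoExecute"):
            if not tolerates(taint, pod.get("tolerations", [])):
                return False
    required = pod.get("required_node_labels", {})
    return all(node.get("labels", {}).get(k) == v for k, v in required.items())
```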
Recommendation Modes
Different workloads have different risk tolerances, traffic profiles, and scaling behavior. Recommendation modes allow you to choose how aggressive or conservative you'd like the system to be when reducing resources.
When `dakr-operator` is applying recommendations, it currently doesn't reset limits, only requests. This is done as a reliability measure, but may change in the future.

| Mode | Requests | Limits | Notes |
| --- | --- | --- | --- |
| Balanced | Use max observed usage, but capped to avoid more than a 50% drop. | Adjusted to 75% of the current limit, but never below the new request. | The default; recommended for most workloads. |
| Aggressive | Use P90 of max (current and historical utilization). | Set to the max of 1.5× current max utilization or 75% of current limits. | Backed by a reinforcement learning algorithm. |
| Conservative | Set to 1.2× max of current utilization. | Left unchanged from current values. | Suggested for critical or stateful workloads. |
Consider a workload with peak observed usage of 4 cores and 12 Gi over the past 12 hours, currently requesting 9 cores and 32 Gi, with limits set to 14 cores and 48 Gi. In Balanced mode, the recommendation would look like this:
- CPU Requests: 4.5 cores (max observed usage is 4 cores, but the cut is capped at 50% of the current 9-core request)
- Memory Requests: 16 Gi (max observed usage is 12 Gi, but the cut is capped at 50% of the current 32 Gi request)
- CPU Limits: 10.5 cores (75% of the current 14-core limit)
- Memory Limits: 36 Gi (75% of the current 48 Gi limit)
- Behavior:
  - Requests are set to the maximum observed usage (P100).
  - If this would reduce requests by more than 50%, the cut is capped at 50% of current requests.
  - Limits are set to 75% of current values, but always ≥ the recommended requests.
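To make the Balanced rules concrete, here is a minimal sketch of the request/limit math as stated in the table above (the production logic is more sophisticated; the function name and signature are ours for illustration):

```python
def balanced(observed_peak: float, current_request: float, current_limit: float) -> tuple[float, float]:
    """Balanced mode, as described above: requests follow max observed usage
    (P100) but never drop below 50% of the current request; limits become 75%
    of the current limit, but never fall below the new request."""
    new_request = max(observed_peak, 0.5 * current_request)
    new_limit = max(0.75 * current_limit, new_request)
    return new_request, new_limit

# Worked example from above: CPU peak 4 cores, request 9, limit 14.
assert balanced(4, 9, 14) == (4.5, 10.5)
# Memory: peak 12 Gi, request 32 Gi, limit 48 Gi.
assert balanced(12, 32, 48) == (16.0, 36.0)
```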
Replica Count Adjustments (HPA-Aware)
In some cases, we will recommend adjusting the replica count of a workload if it's significantly over-provisioned based on CPU/GPU usage trends.
- Applies only when:
  - The workload has more than one replica.
  - One or more of the following is available and can be acted upon:
    - Network bandwidth information.
    - GPU usage and GPU VRAM usage data.
- Based on the selected mode:
  - Aggressive: assumes optimal resource use, multiplier = 1.0.
  - Balanced: allows a buffer, multiplier = 1.5.
  - Conservative: assumes higher future demand, multiplier = 2.0.
Broadly, our methodology can be reduced to the following formula (although our actual implementation is a bit more sophisticated):
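The original formula isn't reproduced in this section, so the sketch below is our reading of it, assuming the recommended replica count scales the current count by observed aggregate utilization and the mode multiplier:

```python
import math

def recommended_replicas(current_replicas: int, utilization: float, multiplier: float) -> int:
    """Hypothetical reconstruction (assumption, not the exact production
    formula): scale replicas to observed demand, padded by the mode
    multiplier, and never recommend fewer than one replica. `utilization`
    is aggregate observed usage as a fraction of what the current replicas
    request (e.g. 0.4 = 40% of requested CPU/GPU is actually used)."""
    return max(1, math.ceil(current_replicas * utilization * multiplier))

# e.g. 10 replicas at 40% aggregate utilization, Balanced mode (1.5x buffer):
# ceil(10 * 0.4 * 1.5) = ceil(6.0) = 6 replicas recommended.
assert recommended_replicas(10, 0.4, 1.5) == 6
```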
This ensures workloads aren't over-replicated relative to CPU/GPU demand.
Here's how to select the right mode for your use case: