Recommendations
Recommendation policies for optimizing resource utilization
Tuning `zxporter` settings to reduce sampling rates will degrade the efficiency and effectiveness of recommendations.
Policy
Recommendations are generated based on active (and historical) behavior.
Nodes (and node groups/pools)
It is not recommended to have multiple node autoscalers running at the same time in a cluster.
The following parameters influence recommendation scoring:
- Instance availability in {region, availability zone}.
- Current shape and demographics of the node {group, pool}.
- Cloud provider pricing.
- Number of candidates for removal (based on recommendation mode).
- Number of (non-DaemonSet) pods running on the node.
- Number of StatefulSet pods running on the node.
- Pod-level underutilization.
- Node-level underutilization.
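To make these inputs concrete, here is a minimal, hypothetical sketch of how they might combine into a single scale-down score per node. None of this is the actual implementation: the field names, weights, and structure are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class NodeSnapshot:
    # Illustrative stand-ins for the signals listed above.
    instance_available_in_zone: bool  # instance availability in {region, AZ}
    hourly_price: float               # cloud provider pricing
    non_daemonset_pods: int           # pods that would need rescheduling
    statefulset_pods: int             # riskier to move
    node_utilization: float           # 0.0-1.0, node-level utilization
    avg_pod_utilization: float        # 0.0-1.0, pod-level utilization

def scale_down_score(node: NodeSnapshot) -> float:
    """Higher score = better candidate for removal. Weights are arbitrary."""
    if not node.instance_available_in_zone:
        return 0.0  # no replacement capacity in this {region, AZ}; skip the node
    score = 1.0 - node.node_utilization              # reward idle nodes
    score += 0.5 * (1.0 - node.avg_pod_utilization)  # reward idle pods
    score += 0.1 * node.hourly_price                 # prefer removing pricier nodes
    score -= 0.05 * node.non_daemonset_pods          # penalize disruption
    score -= 0.25 * node.statefulset_pods            # StatefulSet pods cost more to move
    return max(score, 0.0)
```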
Workloads
Workload policies attempt to bin-pack by default, irrespective of whether node recommendation policies are configured.
The following parameters influence recommendation scoring:
- Instance availability in {region, availability zone}.
- Current shape and demographics of pod specs, including:
  - Tolerations.
  - Affinity/anti-affinity.
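Since tolerations and affinity constrain where pods can be packed, any bin-packing pass has to check feasibility before scoring a placement. Below is a simplified, hypothetical helper (not the actual implementation) using standard Kubernetes taint/toleration matching semantics:

```python
def tolerates(taint: dict, tolerations: list[dict]) -> bool:
    """Simplified Kubernetes semantics: a toleration matches a taint when the
    effects agree (an empty effect matches any) and either operator "Exists"
    covers the key or operator "Equal" matches both key and value."""
    for t in tolerations:
        if t.get("effect") not in (None, "", taint["effect"]):
            continue
        if t.get("operator") == "Exists":
            if t.get("key") in (None, "", taint["key"]):
                return True
        elif t.get("key") == taint["key"] and t.get("value") == taint.get("value"):
            return True
    return False

def can_place(pod: dict, node: dict) -> bool:
    """A pod is placeable only if every NoSchedule/NoExecute taint is
    tolerated and all required node labels (a stand-in for required node
    affinity) are present on the node."""
    for taint in node.get("taints", []):
        if taint["effect"] in ("NoSchedule", "NoExecute"):
            if not tolerates(taint, pod.get("tolerations", [])):
                return False
    required = pod.get("required_node_labels", {})
    return all(node.get("labels", {}).get(k) == v for k, v in required.items())
```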
Recommendation Modes
Different workloads have different risk tolerances, traffic profiles, and scaling behavior. Recommendation modes allow you to choose how aggressive or conservative you'd like the system to be when reducing resources.
When `dakr-operator` is applying recommendations, it currently doesn't reset limits, only requests. This is done as a reliability measure, but may change in the future.

| Mode | Requests | Limits | Notes |
| --- | --- | --- | --- |
| Balanced | Use max observed usage, but capped to avoid more than a 50% drop. | Adjusted to 75% of the current limit, but never below the new request. | The default; recommended for most workloads. |
| Aggressive | Use P90 of max (current and historical utilization). | Set to the max of 1.5× current max utilization or 75% of current limits. | Backed by a reinforcement learning algorithm. |
| Conservative | Set to 1.2× max of current utilization. | Left unchanged from current values. | Suggested for critical or stateful workloads. |
Consider a workload with peak observed usage of 4 cores and 12 Gi over the past 12 hours, currently requesting 9 cores and 32 Gi, with limits set to 14 cores and 48 Gi. In Balanced mode, the recommendation would look like this:
- CPU Requests: 4.5 cores (max observed usage is 4 cores, but the cut is capped at 50% of the current 9-core request)
- Memory Requests: 16 Gi (max observed usage is 12 Gi, but the cut is capped at 50% of the current 32 Gi request)
- CPU Limits: 10.5 cores (75% of the current 14-core limit)
- Memory Limits: 36 Gi (75% of the current 48 Gi limit)
- Behavior:
  - Requests are set to the maximum observed usage (P100).
  - If this would reduce requests by more than 50%, the cut is capped at 50% of current requests.
  - Limits are set to 75% of current values, but always ≥ the recommended requests.
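To make the Balanced rules concrete, here is a minimal sketch of the request/limit math as stated in the table above (the production logic is more sophisticated; the function name and signature are ours for illustration):

```python
def balanced(observed_peak: float, current_request: float, current_limit: float) -> tuple[float, float]:
    """Balanced mode, as described above: requests follow max observed usage
    (P100) but never drop below 50% of the current request; limits become 75%
    of the current limit, but never fall below the new request."""
    new_request = max(observed_peak, 0.5 * current_request)
    new_limit = max(0.75 * current_limit, new_request)
    return new_request, new_limit

# Worked example from above: CPU peak 4 cores, request 9, limit 14.
assert balanced(4, 9, 14) == (4.5, 10.5)
# Memory: peak 12 Gi, request 32 Gi, limit 48 Gi.
assert balanced(12, 32, 48) == (16.0, 36.0)
```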
Replica Count Adjustments (HPA-Aware)
In some cases, we will recommend adjusting the replica count of a workload if it's significantly over-provisioned based on CPU/GPU usage trends.
- Applies only when:
  - The workload has more than one replica.
  - One or more of the following is available and can be acted upon:
    - Network bandwidth information.
    - GPU usage and GPU VRAM usage data.
- Based on the selected mode:
  - Aggressive: assumes optimal resource use, multiplier = 1.0.
  - Balanced: allows a buffer, multiplier = 1.5.
  - Conservative: assumes higher future demand, multiplier = 2.0.
Broadly, our methodology can be reduced to the following formula (although our actual implementation is a bit more sophisticated):
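The original formula isn't reproduced in this section, so the sketch below is our reading of it, assuming the recommended replica count scales the current count by observed aggregate utilization and the mode multiplier:

```python
import math

def recommended_replicas(current_replicas: int, utilization: float, multiplier: float) -> int:
    """Hypothetical reconstruction (assumption, not the exact production
    formula): scale replicas to observed demand, padded by the mode
    multiplier, and never recommend fewer than one replica. `utilization`
    is aggregate observed usage as a fraction of what the current replicas
    request (e.g. 0.4 = 40% of requested CPU/GPU is actually used)."""
    return max(1, math.ceil(current_replicas * utilization * multiplier))

# e.g. 10 replicas at 40% aggregate utilization, Balanced mode (1.5x buffer):
# ceil(10 * 0.4 * 1.5) = ceil(6.0) = 6 replicas recommended.
assert recommended_replicas(10, 0.4, 1.5) == 6
```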
This ensures workloads aren't over-replicated relative to CPU/GPU demand.
Here's how to select the right mode for your use case: