Key Concepts
Core concepts in DevZero -- operators, recommendations, policies, and live migration.
Operators
DevZero uses the Kubernetes operator pattern to extend cluster functionality. Each operator has a specific responsibility:
| Operator | Role | Cluster Access |
|---|---|---|
| Read (zxporter) | Collect metrics | Read-only |
| Write (dakr-operator) | Apply recommendations, update CRDs | Read-write (scoped) |
| Node (dzkarp) | Manage node lifecycle | Node management |
| Network (zxporter-netmon) | Monitor traffic | Read-only |
| Security (dakr-security) | Scan vulnerabilities (KSPM) | Read-only |
| Scheduler (dz-scheduler) | Optimize pod placement | Scheduler bindings |
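At its core, the operator pattern is a reconcile loop: observe the current state of an object, compare it with the desired state, and act only on the difference, so that running the loop again on a converged object does nothing. The sketch below is a generic illustration of that idea, not code from any DevZero operator. `Resources`, `Workload`, and `reconcile` are hypothetical names, and a real operator would read and patch objects through the Kubernetes API instead of in-memory dataclasses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resources:
    # Hypothetical, simplified view of a container's resource requests.
    cpu_millicores: int
    memory_mib: int


@dataclass(frozen=True)
class Workload:
    name: str
    requests: Resources


def reconcile(observed: Workload, desired: Resources) -> tuple[Workload, list[str]]:
    """Compare observed and desired state; return the converged workload
    and the actions taken. Reconciling a converged workload is a no-op."""
    actions: list[str] = []
    cpu = observed.requests.cpu_millicores
    mem = observed.requests.memory_mib
    if cpu != desired.cpu_millicores:
        actions.append(f"set {observed.name} cpu request {cpu}m -> {desired.cpu_millicores}m")
    if mem != desired.memory_mib:
        actions.append(f"set {observed.name} memory request {mem}Mi -> {desired.memory_mib}Mi")
    return replace(observed, requests=desired), actions


web = Workload("web", Resources(cpu_millicores=1000, memory_mib=2048))
target = Resources(cpu_millicores=400, memory_mib=1024)

web, first = reconcile(web, target)   # two actions: CPU and memory both change
_, second = reconcile(web, target)    # no actions: already converged
print(first)
print(second)
```

Idempotence is the property that lets an operator safely re-run reconciliation on every watch event or periodic resync.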
Recommendations
A recommendation is a suggested change to a Kubernetes resource. DevZero generates recommendations by comparing actual resource usage against current requests and limits.
If the policy enables it, DevZero applies recommendations both reactively and proactively (by analyzing historical behavior).
Each recommendation includes:
- Target resource -- the workload to modify (for example, a Deployment, StatefulSet, or DaemonSet)
- Current values -- existing CPU/memory/GPU requests and limits, and replica counts
- Recommended values -- suggested values based on observed utilization
- Confidence level -- based on the amount of historical data available
- Projected savings -- estimated cost reduction
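To make the fields above concrete, here is a minimal sketch of how a CPU recommendation could be derived from usage samples. DevZero does not document its exact formula here, so everything in this block is an assumption: the target is the nearest-rank p95 of observed usage times a headroom multiplier, confidence is a label keyed on sample count, and savings use a flat per-core monthly price. `Recommendation`, `recommend_cpu`, the thresholds, and the price are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Recommendation:
    target: str                          # target resource
    current_cpu_m: int                   # current request (millicores)
    recommended_cpu_m: int               # recommended request (millicores)
    confidence: str                      # based on amount of data
    projected_savings_per_month: float   # estimated cost reduction


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile, p in 1..100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_cpu(target: str, current_cpu_m: int, usage_m: list[float],
                  headroom: float = 1.15,
                  cost_per_core_month: float = 20.0) -> Recommendation:
    # Hypothetical rule: size to p95 usage plus headroom.
    recommended = math.ceil(percentile(usage_m, 95) * headroom)
    # Hypothetical confidence thresholds on the number of samples.
    n = len(usage_m)
    confidence = "high" if n >= 1000 else "medium" if n >= 100 else "low"
    # Savings only count when the recommendation lowers the request.
    savings = max(current_cpu_m - recommended, 0) / 1000 * cost_per_core_month
    return Recommendation(target, current_cpu_m, recommended, confidence, savings)


# 100 samples of CPU usage between 100m and 199m; p95 is 194m.
rec = recommend_cpu("deployment/web", current_cpu_m=1000, usage_m=list(range(100, 200)))
print(rec)
```

With these inputs the sketch recommends 224m (ceil of 194 x 1.15), labels confidence "medium", and projects about 15.52 per month in savings at the placeholder price.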
Recommendations can be reviewed in the dashboard and applied manually, or auto-applied based on policies.
Policies
A policy defines the rules for how DevZero optimizes your cluster. Policies control:
- Scope -- which namespaces, labels, or workload types are targeted
- Aggressiveness -- how tightly to fit resources to usage (conservative, moderate, aggressive)
- Guardrails -- minimum and maximum resource values; recommendations are never set below the minimum or above the maximum
- Approval mode -- automatic application vs. audit-only
- Schedule -- time windows when changes are allowed (e.g., business hours only)
- Optimization application mode -- spec patching vs. checkpoint/restore
- In-cluster MPA -- whether to use a cluster-local MPA (multidimensional pod autoscaler) for latency-sensitive workloads
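The sketch below shows one way the scope, aggressiveness, guardrail, approval-mode, and schedule settings could combine when deciding what to do with a CPU recommendation. It is illustrative only: the `Policy` fields, the `HEADROOM` multipliers for each aggressiveness level, and the `evaluate` function are hypothetical and do not reflect DevZero's actual policy schema.

```python
import math
from dataclasses import dataclass
from datetime import time
from typing import Optional

# Hypothetical headroom multipliers: tighter fit as aggressiveness rises.
HEADROOM = {"conservative": 1.30, "moderate": 1.15, "aggressive": 1.05}


@dataclass
class Policy:
    namespaces: frozenset[str]  # scope
    aggressiveness: str         # key into HEADROOM
    min_cpu_m: int              # guardrail: lower bound
    max_cpu_m: int              # guardrail: upper bound
    auto_apply: bool            # approval mode: False means audit-only
    window_start: time          # schedule: changes allowed in [start, end)
    window_end: time


def evaluate(policy: Policy, namespace: str, p95_cpu_m: float,
             now: time) -> tuple[str, Optional[int]]:
    """Return (decision, cpu_millicores) for one workload."""
    if namespace not in policy.namespaces:
        return "skip", None
    target = math.ceil(p95_cpu_m * HEADROOM[policy.aggressiveness])
    # Guardrails clamp the value into [min, max].
    target = min(max(target, policy.min_cpu_m), policy.max_cpu_m)
    if not policy.auto_apply:
        return "audit", target
    # This simple window check does not handle windows that cross midnight.
    if not (policy.window_start <= now < policy.window_end):
        return "defer", target
    return "apply", target


business_hours = Policy(frozenset({"prod"}), "moderate", 100, 2000, True, time(9), time(17))
print(evaluate(business_hours, "prod", 194, time(10)))  # ('apply', 224)
print(evaluate(business_hours, "prod", 50, time(10)))   # ('apply', 100) after clamping
print(evaluate(business_hours, "prod", 194, time(20)))  # ('defer', 224)
print(evaluate(business_hours, "dev", 194, time(10)))   # ('skip', None)
```

Applying guardrails after the aggressiveness step means a very aggressive setting can never push a workload below the policy's floor.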
Live Migration
Live migration (also called checkpoint/restore) lets DevZero resize workloads without restarting pods. This is critical for stateful applications (databases, caches, queues) that can't tolerate downtime.
Currently, live migration is only supported for workloads that don't use init containers.
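A quick way to check the init-container restriction against your own manifests is to locate the pod spec inside each workload and look for `initContainers`, which is the real Kubernetes PodSpec field name. The helper names `pod_spec_of` and `live_migration_eligible` are hypothetical, and this check covers only the restriction stated above, not any other eligibility rules DevZero may apply.

```python
def pod_spec_of(workload: dict) -> dict:
    """Return the pod spec embedded in a workload manifest (parsed YAML/JSON)."""
    kind = workload["kind"]
    spec = workload["spec"]
    if kind == "Pod":
        return spec
    if kind == "CronJob":
        # CronJob -> jobTemplate -> Job spec -> pod template -> pod spec
        return spec["jobTemplate"]["spec"]["template"]["spec"]
    # Deployment, StatefulSet, DaemonSet, Job, ReplicaSet, ReplicationController
    return spec["template"]["spec"]


def live_migration_eligible(workload: dict) -> bool:
    """True when the pod spec declares no init containers."""
    return not pod_spec_of(workload).get("initContainers")


deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {
        "initContainers": [{"name": "migrate-db", "image": "example/migrate:1"}],
        "containers": [{"name": "app", "image": "example/app:1"}],
    }}},
}
cache = {
    "kind": "StatefulSet",
    "spec": {"template": {"spec": {
        "containers": [{"name": "redis", "image": "redis:7"}],
    }}},
}
print(live_migration_eligible(deployment))  # False: has an init container
print(live_migration_eligible(cache))       # True
```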
CRDs
DevZero operators use Custom Resource Definitions (CRDs) to represent their state in Kubernetes:
- WorkloadRecommendation -- a pending or applied recommendation for a workload
- NodeGroupRecommendation -- a recommendation for node pool changes
- NodeClass, NodePool, NodeClaim -- cluster autoscaling state tracking
- VulnerabilityReport -- security scan results for a container image
- ConfigAuditReport -- configuration compliance results
- ClusterComplianceReport -- cluster-wide compliance status
Supported Resource Types
DevZero can optimize the following Kubernetes resource types:
Unless explicitly noted, optimizations can be applied using traditional Kubernetes workload spec patching as well as checkpoint/restore (live migration).
Core Workloads
| Resource | Kind | Notes |
|---|---|---|
| Deployment | Deployment | Most common workload type |
| StatefulSet | StatefulSet | Databases, caches, queues |
| DaemonSet | DaemonSet | Per-node workloads |
| Job | Job | Batch workloads (only checkpoint/restore) |
| CronJob | CronJob | Scheduled batch workloads |
| ReplicaSet | ReplicaSet | Usually managed by a Deployment |
| ReplicationController | ReplicationController | Legacy controller |
| Pod | Pod | Standalone pods (only checkpoint/restore) |
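The Core Workloads table can be encoded as a lookup from kind to the application modes it supports. The sketch also folds in the init-container restriction from the Live Migration section; reading the two constraints together, a Job or standalone Pod with init containers would have no applicable mode. That combination is an inference from this page, not documented product behavior, and `CORE_WORKLOAD_MODES` and `applicable_modes` are hypothetical names.

```python
SPEC_PATCHING = "spec patching"
CHECKPOINT_RESTORE = "checkpoint/restore"
BOTH = frozenset({SPEC_PATCHING, CHECKPOINT_RESTORE})

# Mirrors the Core Workloads table: Job and Pod are checkpoint/restore only.
CORE_WORKLOAD_MODES = {
    "Deployment": BOTH,
    "StatefulSet": BOTH,
    "DaemonSet": BOTH,
    "Job": frozenset({CHECKPOINT_RESTORE}),
    "CronJob": BOTH,
    "ReplicaSet": BOTH,
    "ReplicationController": BOTH,
    "Pod": frozenset({CHECKPOINT_RESTORE}),
}


def applicable_modes(kind: str, has_init_containers: bool) -> frozenset[str]:
    """Modes usable for a core workload kind; empty if none apply."""
    modes = CORE_WORKLOAD_MODES.get(kind, frozenset())
    if has_init_containers:
        # Live migration is currently unsupported with init containers.
        modes = modes - {CHECKPOINT_RESTORE}
    return modes


print(sorted(applicable_modes("Deployment", has_init_containers=True)))  # ['spec patching']
print(sorted(applicable_modes("Job", has_init_containers=True)))         # []
print(sorted(applicable_modes("Pod", has_init_containers=False)))        # ['checkpoint/restore']
```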
Extended Workloads
DevZero also supports third-party resource types commonly found in data and ML platforms:
| Resource | Kind | Project |
|---|---|---|
| Argo Rollout | Rollout | Argo Rollouts |
| Kubeflow Notebook | Notebook | Kubeflow |
| Volcano Job | VolcanoJob | Volcano |
| Spark Application | SparkApplication | Spark Operator |
| Scheduled Spark Application | ScheduledSparkApplication | Spark Operator |
| CNPG Cluster | Cluster | CloudNativePG |