Container Checkpoint/Restore with CRIU

Debo Ray

Co-Founder, CEO

July 8, 2025

Container restarts are a common occurrence in production systems. Whether the cause is a node failure, scheduled maintenance, or resource rebalancing, a traditional container restart means losing all in-memory state and forcing the application to rebuild its working set from scratch. For stateful applications, this translates to service interruption, degraded performance, and potential impact on end users.

CRIU (Checkpoint/Restore In Userspace) changes this equation entirely. Instead of killing and restarting containers, CRIU enables live migration of running containers with full state preservation, including memory contents, open file descriptors, and network connections.

Note: How “live” a live migration really is depends on the implementation. Three factors matter: (a) whether the checkpointed process keeps running after the snapshot is taken, (b) how long it takes to restore the checkpointed process, and (c) how much downtime there is between the checkpointed application serving live traffic and the restored process being ready to serve it.
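One way to put a number on factor (c) is to probe the service once per second during a migration and look for the longest gap between successful responses. The sketch below is only an illustration: the health URL, the `probe` and `longest_outage` helpers, and the `<epoch_seconds> ok|fail` log format are all assumptions, not part of CRIU or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical probe: prints "<epoch_seconds> ok|fail" once per second.
# The URL is an assumption; point it at your own health endpoint.
probe() {
  url=$1
  while true; do
    if curl -fsS -o /dev/null --max-time 1 "$url" 2>/dev/null; then
      echo "$(date +%s) ok"
    else
      echo "$(date +%s) fail"
    fi
    sleep 1
  done
}

# Reads probe lines on stdin and prints the longest outage in seconds,
# measured from the last "ok" before a failure to the next "ok".
longest_outage() {
  awk '
    $2 == "ok" {
      if (down && have_ok && $1 - last_ok > max) max = $1 - last_ok
      last_ok = $1; have_ok = 1; down = 0
    }
    $2 == "fail" { down = 1 }
    END { print max + 0 }
  '
}
```

For example, run `probe http://localhost:8080/healthz > probe.log &` before the checkpoint, stop it after the restore, and then run `longest_outage < probe.log`.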

The Problem with Traditional Container Restarts

When Kubernetes reschedules a pod or Docker restarts a container, the process is destructive:

  1. SIGTERM sent to main process
  2. SIGKILL after a grace period
  3. All memory state is discarded
  4. New container starts from scratch
  5. Application rebuilds caches, reconnects to databases, and reloads configuration
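The first two steps can be reproduced on any local process. The sketch below is not how Docker or the kubelet implement them; `stop_with_grace` is a hypothetical helper that only mirrors the SIGTERM, grace period, SIGKILL sequence for a child of the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a child process of this shell the way a
# container runtime stops a container's main process.
# Returns the child's exit status; a signal death shows up as
# 128 + signal number (143 for SIGTERM, 137 for SIGKILL).
stop_with_grace() {
  pid=$1
  grace=${2:-10}
  status=0

  # Step 1: ask the process to shut down.
  kill -TERM "$pid" 2>/dev/null || true

  # Step 2: a watchdog sends SIGKILL if the grace period runs out.
  ( sleep "$grace"; kill -KILL "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog=$!

  # Block until the child is gone; whatever it held in memory is lost.
  wait "$pid" || status=$?

  # The child has exited, so the watchdog is no longer needed.
  kill "$watchdog" 2>/dev/null || true
  wait "$watchdog" 2>/dev/null || true
  return "$status"
}
```

Calling `sleep 300 & stop_with_grace "$!" 10` returns as soon as `sleep` reacts to SIGTERM; a process that ignores SIGTERM would instead be killed once the ten seconds are up.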

For a web application with a 2GB in-memory cache, this may result in 30-60 seconds of degraded performance while the cache rebuilds. For a machine learning inference service with loaded models, the restart time could be several minutes.

How CRIU Works: Process State Serialization

CRIU runs in userspace, as its name says, but relies on several Linux kernel interfaces to capture and restore complete process state:

Memory Dumping

CRIU uses /proc/PID/pagemap and /proc/PID/maps to identify all memory regions belonging to a process tree. It then:

  • Freezes the process tree by attaching to each task with ptrace(PTRACE_SEIZE) and stopping it, or via the cgroup freezer when one is configured
  • Dumps all memory pages to disk
  • Captures memory mapping information (heap, stack, shared libraries)
  • Records memory protection flags and special mappings
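To get a feel for what has to be captured, the same /proc/PID/maps file can be summarised by hand. This is a rough sketch, not CRIU's own logic: `summarize_maps` is a hypothetical helper that only adds up address ranges, while CRIU additionally consults the pagemap to decide which pages actually need dumping.

```shell
#!/usr/bin/env bash
# Summarise a /proc/<pid>/maps listing read from stdin.
# Prints the number of mappings, their total size, and the writable share.
summarize_maps() {
  regions=0
  total_kb=0
  writable_kb=0
  while read -r range perms rest; do
    [ -n "$range" ] || continue
    # Skip the kernel-provided vsyscall page: its address does not fit
    # in signed 64-bit shell arithmetic.
    case "$rest" in
      *"[vsyscall]"*) continue ;;
    esac
    start=${range%-*}
    end=${range#*-}
    size_kb=$(( (0x$end - 0x$start) / 1024 ))
    regions=$((regions + 1))
    total_kb=$((total_kb + size_kb))
    # Writable mappings (heap, stack, anonymous memory) hold mutable state.
    case "$perms" in
      *w*) writable_kb=$((writable_kb + size_kb)) ;;
    esac
  done
  echo "regions=$regions total_kb=$total_kb writable_kb=$writable_kb"
}
```

Running `summarize_maps < /proc/$$/maps` reports on the current shell; reading another process's maps file usually requires root or matching ownership.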

File Descriptor Preservation

Every open file descriptor is catalogued and preserved:

  • Regular files: path and offset position
  • Sockets: protocol state, connection endpoints, buffer contents
  • Pipes: buffer data and connection topology
  • Device files: state-dependent handling
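The categories above map directly onto what the kernel exposes under /proc/PID/fd. As an illustration only, the following sketch lists a process's descriptors and sorts them into those categories; `classify_fd_target` and `list_fds` are hypothetical helper names, and CRIU itself gathers far more detail per descriptor.

```shell
#!/usr/bin/env bash
# Classify the target of a /proc/<pid>/fd/<n> symlink.
classify_fd_target() {
  case "$1" in
    socket:*)     echo socket ;;
    pipe:*)       echo pipe ;;
    anon_inode:*) echo anon_inode ;;   # eventfd, epoll, timerfd, ...
    /dev/*)       echo device ;;
    /*)           echo file ;;
    *)            echo other ;;
  esac
}

# Print "<fd> <type> <target>" for every open descriptor of a process.
list_fds() {
  pid=$1
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    printf '%s\t%s\t%s\n' "${fd##*/}" "$(classify_fd_target "$target")" "$target"
  done
}
```

`list_fds $$` shows the descriptors of the current shell; a container's main process can be inspected the same way from the host using its host PID.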

Process Tree Topology

CRIU reconstructs the exact process hierarchy:

  • Parent-child relationships
  • Process groups and sessions
  • Signal handlers and pending signals
  • CPU registers and execution state
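The parent-child part of that hierarchy is visible from the process table. The sketch below assumes `PID PPID` pairs on stdin, as produced by `ps -e -o pid=,ppid=`, and prints every process that belongs to the tree rooted at a given PID; `descendants` is a hypothetical helper for illustration, not something CRIU ships.

```shell
#!/usr/bin/env bash
# Reads "PID PPID" lines on stdin and prints the PIDs in the tree rooted
# at $1 (including the root itself), in input order.
descendants() {
  awk -v root="$1" '
    { parent[$1] = $2; order[NR] = $1 }
    END {
      for (i = 1; i <= NR; i++) {
        p = order[i]
        # Walk up the parent chain until we reach the root or run out.
        while (p != root && (p in parent) && parent[p] != p) p = parent[p]
        if (p == root) print order[i]
      }
    }
  '
}
```

For example, `ps -e -o pid=,ppid= | descendants 1234` lists the tree that a checkpoint of PID 1234 would need to cover.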

Practical Implementation with Docker

Let's walk through a real checkpoint/restore scenario. First, ensure CRIU is installed and your kernel supports the necessary features:

# Install CRIU (Debian/Ubuntu)
sudo apt install criu

# Check kernel compatibility (needs root; prints "Looks good." on success)
sudo criu check

# Verify Docker experimental features are enabled (should print "true")
docker version --format '{{.Server.Experimental}}'

Start a container with checkpoint support enabled:

# Run container with checkpoint support
docker run -d --name webapp \
  --security-opt seccomp=unconfined \
  --cap-add SYS_PTRACE \
  --cap-add SYS_ADMIN \
  nginx:latest

# Generate some state
docker exec webapp bash -c "echo 'test data' > /tmp/state.txt"

Create a checkpoint:

# Checkpoint the container
docker checkpoint create webapp checkpoint1

# Verify container is stopped
docker ps -a

Restore from checkpoint:

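# Restore container from checkpoint
docker start --checkpoint checkpoint1 webapp

# Verify state preservation
# (this file lives in the container's writable layer, so it would also survive a plain stop/start;
# what the checkpoint adds is the in-memory state of the running processes)
docker exec webapp cat /tmp/state.txt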

Integration with containerd

For more advanced use cases, containerd provides native CRIU integration:

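# Create checkpoint with containerd
# (subcommands and flags differ between containerd releases; confirm with `ctr containers checkpoint --help`)
ctr containers checkpoint --task mycontainer checkpoint1

# Restore runtime and memory state on same or different node
ctr containers restore --live mycontainer checkpoint1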

Production Considerations

Performance Impact

Checkpoint operations aren't free. The figures below are rough and workload-dependent, so measure on your own systems:

  • Memory dump time: ~100MB/sec for typical workloads
  • Network freeze duration: 10-500ms depending on connection count
  • Restore time: Usually 2-5x faster than cold start
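As a back-of-the-envelope check, the dump time follows from memory size divided by dump throughput. The sketch below is plain arithmetic; `estimate_dump_seconds` is a hypothetical helper, and its default rate is simply the ~100MB/sec figure from the list above, not a measurement.

```shell
#!/usr/bin/env bash
# Rough dump-time estimate: memory size divided by dump throughput,
# rounded up to whole seconds.
estimate_dump_seconds() {
  mem_mb=$1
  rate_mb_per_sec=${2:-100}   # default taken from the ~100MB/sec figure above
  echo $(( (mem_mb + rate_mb_per_sec - 1) / rate_mb_per_sec ))
}

estimate_dump_seconds 2048        # 2GB cache at the default rate
estimate_dump_seconds 8192 200    # 8GB buffer pool on faster storage
```

The two calls print 21 and 41. Real dump rates depend heavily on storage and on how much of the memory is actually resident, so treat the result as an order of magnitude.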

Kernel Requirements

CRIU requires specific kernel features, including:

  • CONFIG_CHECKPOINT_RESTORE=y
  • CONFIG_NAMESPACES=y
  • CONFIG_PID_NS=y
  • CONFIG_NET_NS=y
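These options can be checked against the running kernel's build configuration. The following is a minimal sketch that covers only the four options listed above; `missing_kernel_opts` is a hypothetical helper, and where the config file lives (commonly /boot/config-$(uname -r) or /proc/config.gz) depends on the distribution.

```shell
#!/usr/bin/env bash
# Options from the list above; extend as needed.
REQUIRED_KERNEL_OPTS="CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES CONFIG_PID_NS CONFIG_NET_NS"

# Reads a kernel config on stdin and prints every required option
# that is not set to "y". Prints nothing when all are enabled.
missing_kernel_opts() {
  awk -v opts="$REQUIRED_KERNEL_OPTS" '
    /^CONFIG_[A-Za-z0-9_]+=y$/ { sub(/=y$/, ""); enabled[$0] = 1 }
    END {
      n = split(opts, want, " ")
      for (i = 1; i <= n; i++) if (!(want[i] in enabled)) print want[i]
    }
  '
}
```

Typical usage is `missing_kernel_opts < "/boot/config-$(uname -r)"` or `zcat /proc/config.gz | missing_kernel_opts`. An empty result only says these four options are present; `criu check` remains the more complete test.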

Security Implications

Checkpoint images contain complete process memory:

  • Encrypt checkpoint storage
  • Implement access controls
  • Consider secrets in memory dumps
  • Validate checkpoint integrity
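For the integrity point, a checksum manifest written at checkpoint time and verified before restore is a simple starting place. The sketch below assumes GNU coreutils' `sha256sum`; `seal_checkpoint`, `verify_checkpoint`, and the `SHA256SUMS` file name are hypothetical choices, and the directory argument stands for wherever your runtime writes its checkpoint images.

```shell
#!/usr/bin/env bash
# Record a SHA-256 manifest for every file in a checkpoint directory.
seal_checkpoint() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Succeeds only if every file still matches the manifest.
verify_checkpoint() {
  ( cd "$1" && sha256sum -c SHA256SUMS >/dev/null 2>&1 )
}
```

A plain manifest catches corruption in transit but not deliberate tampering by someone who can rewrite the manifest too, so keep it (or a signature over it) somewhere the checkpoint storage cannot modify. It also does nothing for confidentiality; encryption has to be handled separately.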

Limitations and Gotchas

Network Connections

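  • Established TCP connections can be restored (CRIU relies on the kernel's TCP repair mode and the --tcp-established option), but only if the restored container keeps the same IP address; otherwise clients must reconnect
  • UDP sockets restore more reliably because there is no connection state to rebuild
  • External services may time out during migration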

File System Dependencies

  • Absolute paths must exist on restore host
  • Mounted volumes need identical configuration
  • Device files may not be portable
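A quick preflight on the restore host can catch the first two problems before a restore is attempted. This is a minimal sketch: `missing_paths` is a hypothetical helper, and how you collect the list of paths is up to you.

```shell
#!/usr/bin/env bash
# Reads one absolute path per line on stdin and prints those that
# do not exist on this host. Prints nothing when all are present.
missing_paths() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    [ -e "$path" ] || echo "$path"
  done
}
```

One way to build the list on the source host is `docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' webapp`, then feed that output to `missing_paths` on the target host. Existence alone does not prove the contents match, so this is a first filter rather than a guarantee.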

Container Runtime Integration

  • Docker checkpoint support is experimental
  • Kubernetes native support is limited at the time of writing
  • Custom orchestration often required

Advanced Use Cases

Database Migration

For databases with large buffer pools:

# Checkpoint MySQL container with 8GB buffer pool
CHECKPOINT="checkpoint-$(date +%s)"
docker checkpoint create mysql-prod "$CHECKPOINT"

# Restore on new node in <30 seconds vs 5+ minutes cold start
# (the checkpoint files and the container definition must be available on that node)
docker start --checkpoint "$CHECKPOINT" mysql-prod
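When checkpoints are named with an epoch suffix as above, the most recent one can be picked by sorting on that suffix. The helper below is a hypothetical sketch that assumes names of the exact form `checkpoint-<epoch>` arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Reads checkpoint names on stdin and prints the one with the highest
# epoch suffix. Names that do not match checkpoint-<digits> are ignored.
latest_checkpoint() {
  grep -E '^checkpoint-[0-9]+$' | sort -t- -k2,2n | tail -n 1
}
```

With Docker, something like `docker checkpoint ls mysql-prod | awk 'NR > 1 {print $1}' | latest_checkpoint` should yield the name to pass to `docker start --checkpoint`, assuming the listing prints one header line followed by one checkpoint per row.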

Stateful Service Scaling

CRIU enables novel scaling patterns:

  • Checkpoint running instance
  • Restore multiple copies for instant horizontal scaling
  • Preserve expensive initialization state

Future: Kubernetes Integration

Several projects are working on Kubernetes integration:

  • Kubernetes Enhancement Proposal (KEP) for native checkpoint/restore
  • Podman checkpoint integration with CRI-O
  • Third-party operators for automated live migration, one of which is DevZero

Conclusion

CRIU transforms container restart from a disruptive operation into seamless live migration. While not suitable for every workload, it's particularly valuable for:

  • Stateful applications with expensive initialization
  • Services with large in-memory caches
  • Long-running computations that need migration
  • Zero-downtime maintenance scenarios

The technology is production-ready for specific use cases, though broader ecosystem integration is still evolving. For organizations running stateful workloads at scale, CRIU provides a powerful tool for moving toward zero-downtime operations, within the liveness caveats noted at the start.

Ready to implement live migration in your infrastructure? Start with non-critical workloads, measure the performance characteristics, and gradually expand to more critical services as you build operational confidence.
