Container Checkpoint/Restore with CRIU

Container restarts are a common occurrence in production systems. Whether the cause is a node failure, scheduled maintenance, or resource rebalancing, a traditional container restart discards all in-memory state and forces the application to rebuild its working set from scratch. For stateful applications, this means service interruptions, degraded performance, and potential impact on end users.

CRIU (Checkpoint/Restore In Userspace) changes this equation. Instead of killing and restarting a container, CRIU can checkpoint it and restore it later, on the same host or a different one, with its state preserved: memory contents, open file descriptors, and network connections. This is the basis for live migration of running containers.

Note: How “live” a live migration is depends on the implementation. Three questions determine it: (a) Does the checkpointed process keep running after the snapshot is taken? (b) How long does it take to restore the checkpointed process? (c) How much downtime is there between the original application serving live traffic and the restored process being ready to serve it?

The Problem with Traditional Container Restarts

When Kubernetes reschedules a pod or Docker restarts a container, the process is destructive:

SIGTERM sent to main process
SIGKILL after a grace period
All memory state is discarded
New container starts from scratch
Application rebuilds caches, reconnects to databases, and reloads configuration

For a web application with a 2 GB in-memory cache, this can mean 30-60 seconds of degraded performance while the cache rebuilds. For a machine learning inference service that must reload its models, a restart can take several minutes.

How CRIU Works: Process State Serialization

As its name says, CRIU runs in user space. It relies on several Linux kernel interfaces to capture and restore complete process state:

Memory Dumping

CRIU uses /proc/PID/pagemap and /proc/PID/maps to identify all memory regions belonging to a process tree. It then:

Freezes the process tree using ptrace(PTRACE_SEIZE)
Dumps all memory pages to disk
Captures memory mapping information (heap, stack, shared libraries)
Records memory protection flags and special mappings
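
To get a feel for the mapping information CRIU starts from, the sketch below reads /proc/PID/maps for one process and prints each region's permissions, size, and backing path. It is an illustration only, not how CRIU itself is implemented. The function name summarize_maps is made up for this example, and /proc/PID/pagemap is left out because reading it usefully requires elevated privileges.

```shell
# Summarise the memory regions of a process from /proc/<pid>/maps.
# summarize_maps is a hypothetical helper name, not part of CRIU.
summarize_maps() {
    local pid="${1:-$$}" range perms offset dev inode path
    local start end size_kib total_kib=0 count=0
    while read -r range perms offset dev inode path; do
        start="${range%-*}"     # region start address (hex)
        end="${range#*-}"       # region end address (hex)
        size_kib=$(( (0x$end - 0x$start) / 1024 ))
        total_kib=$(( total_kib + size_kib ))
        count=$(( count + 1 ))
        printf '%-5s %10d KiB  %s\n' "$perms" "$size_kib" "${path:-[anonymous]}"
    done < "/proc/$pid/maps"
    printf 'regions=%d total_kib=%d\n' "$count" "$total_kib"
}
```

Running summarize_maps with the PID of a process you own lists its heap, stack, and shared-library mappings, which are the regions a checkpoint has to account for.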

File Descriptor Preservation

Every open file descriptor is catalogued and preserved:

Regular files: path and offset position
Sockets: protocol state, connection endpoints, buffer contents
Pipes: buffer data and connection topology
Device files: state-dependent handling
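
The per-descriptor details listed above are visible through /proc as well. The following sketch prints the target and current offset of every open descriptor of a process, using /proc/PID/fd and /proc/PID/fdinfo. The helper name list_fds is an assumption for this example; CRIU gathers considerably more detail, socket state in particular, than this shows.

```shell
# Print target and file offset for each open descriptor of a process.
# list_fds is a hypothetical helper name, not part of CRIU.
list_fds() {
    local pid="${1:-$$}" fd_path fd target pos
    for fd_path in /proc/"$pid"/fd/*; do
        fd="${fd_path##*/}"
        # Skip descriptors that vanish between the glob and the readlink.
        target="$(readlink "$fd_path" 2>/dev/null)" || continue
        # fdinfo holds lines such as "pos: 1234"; pick out the offset.
        pos="$(awk '$1 == "pos:" { print $2 }' "/proc/$pid/fdinfo/$fd" 2>/dev/null)"
        printf 'fd=%s pos=%s target=%s\n' "$fd" "${pos:-?}" "$target"
    done
}
```

Regular files show up as paths, while sockets and pipes appear as targets like socket:[inode] and pipe:[inode], which hints at why they need special handling during a checkpoint.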

Process Tree Topology

CRIU reconstructs the exact process hierarchy:

Parent-child relationships
Process groups and sessions
Signal handlers and pending signals
CPU registers and execution state
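
The hierarchy information can be inspected the same way. This sketch reads the parent PID, process group, and session of a process from /proc/PID/stat. The helper name proc_ids is hypothetical. Signal state and registers are not covered here, since they are not available from this file alone.

```shell
# Print parent, process group, and session IDs of a process.
# proc_ids is a hypothetical helper name, not part of CRIU.
proc_ids() {
    local pid="$1" stat rest
    stat="$(cat "/proc/$pid/stat")" || return 1
    # Format: "pid (comm) state ppid pgrp session ...". The command name
    # may contain spaces, so cut everything up to the last ") ".
    rest="${stat##*) }"
    # Deliberately unquoted: split the remaining fields into $1, $2, ...
    set -- $rest
    printf 'pid=%s ppid=%s pgid=%s sid=%s\n' "$pid" "$2" "$3" "$4"
}
```

Calling proc_ids for each PID in a container shows the parent-child links, process groups, and sessions that have to be identical after a restore.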

Practical Implementation with Docker

Let's walk through a real checkpoint/restore scenario. First, ensure CRIU is installed and your kernel supports the necessary features:

# Install CRIU (Debian/Ubuntu)
sudo apt install criu

# Check kernel compatibility (requires root)
sudo criu check

# Verify that Docker experimental features are enabled
docker version --format '{{.Server.Experimental}}'

Start a container with checkpoint support enabled:

# Run container with checkpoint support
docker run -d --name webapp \
  --security-opt seccomp=unconfined \
  --cap-add SYS_PTRACE \
  --cap-add SYS_ADMIN \
  nginx:latest

# Generate some state
docker exec webapp bash -c "echo 'test data' > /tmp/state.txt"

Create a checkpoint:

# Checkpoint the container
docker checkpoint create webapp checkpoint1

# Verify container is stopped
docker ps -a

Restore from checkpoint:

# Restore container from checkpoint
docker start --checkpoint checkpoint1 webapp

# Verify state preservation
docker exec webapp cat /tmp/state.txt
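
The checkpoint and restore steps can be wrapped into one small helper, sketched below. The names run and checkpoint_cycle and the DRY_RUN variable are assumptions made for this example. The docker subcommands are the ones used above, plus docker checkpoint ls to list a container's checkpoints. With DRY_RUN=1 the helper only prints the commands, which is a safe way to review them first.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical convention).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Checkpoint a container, list its checkpoints, then restore it.
# Arguments: container name, checkpoint name.
checkpoint_cycle() {
    local container="$1" name="$2"
    run docker checkpoint create "$container" "$name" || return 1
    run docker checkpoint ls "$container" || return 1
    run docker start --checkpoint "$name" "$container"
}
```

For example, DRY_RUN=1 checkpoint_cycle webapp checkpoint1 prints the three docker commands without touching the container. Dropping DRY_RUN runs them for real.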

Integration with containerd

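For more advanced use cases, containerd provides native CRIU integration, exposed through its ctr client. ctr is a debugging tool whose subcommands and flags change between releases, so treat the commands below as a sketch and confirm them with ctr --help for your version: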

# Create checkpoint with containerd
ctr task checkpoint --exit mycontainer checkpoint1

# Restore on same or different node
ctr task restore --live checkpoint1 mycontainer

Production Considerations

Performance Impact

Checkpoint operations aren't free:

Memory dump time: roughly 100 MB/s for typical workloads
Network freeze duration: 10-500 ms, depending on connection count
Restore time: usually 2-5x faster than a cold start
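
Taking the dump-rate figure above at face value, a back-of-the-envelope estimate of dump time is memory size divided by throughput. The sketch below does that arithmetic, rounding up to whole seconds. The function name estimate_dump_seconds is made up for this example, and a constant rate is an assumption: real throughput depends on storage and workload, so measure your own.

```shell
# Estimate checkpoint dump time, rounded up to whole seconds.
# Arguments: memory size in MB, dump throughput in MB/s (default 100,
# assumed constant for the purpose of the estimate).
estimate_dump_seconds() {
    local size_mb="$1" rate_mb_s="${2:-100}"
    echo $(( (size_mb + rate_mb_s - 1) / rate_mb_s ))
}

estimate_dump_seconds 2048 100   # 2 GB cache: prints 21
estimate_dump_seconds 8192 100   # 8 GB working set: prints 82
```

Comparing such an estimate with the application's cold-start time is a quick way to judge whether checkpointing is worth it for a given workload.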

Kernel Requirements

CRIU requires specific kernel features:

CONFIG_CHECKPOINT_RESTORE=y
CONFIG_NAMESPACES=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
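
A quick way to check these options is to look them up in the kernel's build configuration. The sketch below takes the path to a config file and reports which of the four options are set to y. The function name check_kernel_config is an assumption for this example, and where the config file lives varies by distribution.

```shell
# Report whether the kernel options listed above are enabled in a
# kernel config file. check_kernel_config is a hypothetical helper name.
check_kernel_config() {
    local config_file="$1" opt missing=0
    for opt in CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES \
               CONFIG_PID_NS CONFIG_NET_NS; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok      $opt"
        else
            echo "missing $opt"
            missing=1
        fi
    done
    # Exit status 0 only if every option was found.
    return "$missing"
}
```

On many distributions the file is /boot/config-$(uname -r), so check_kernel_config "/boot/config-$(uname -r)" is a typical call. Kernels that expose /proc/config.gz need it decompressed to a regular file first.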

Security Implications

Checkpoint images contain complete process memory, so treat them as sensitive data:

Encrypt checkpoint storage
Implement access controls
Assume memory dumps contain secrets such as keys, tokens, and passwords
Validate checkpoint integrity
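
As a minimal sketch of the access-control and integrity points, the helpers below pack a checkpoint directory into an archive readable only by its owner and record a SHA-256 checksum that can be verified before a restore. The function names seal_checkpoint and verify_checkpoint are made up for this example. Encryption is deliberately left out and would be layered on top with whatever tool your organization already uses.

```shell
# Pack a checkpoint directory into an owner-only archive and record
# its checksum. Arguments: checkpoint directory, archive path.
seal_checkpoint() {
    local dir="$1" archive="$2"
    # umask 077 in a subshell: the archive is created with mode 600.
    ( umask 077 && tar -C "$dir" -czf "$archive" . ) || return 1
    sha256sum "$archive" > "$archive.sha256"
}

# Verify an archive against the checksum recorded by seal_checkpoint.
# Returns non-zero if the archive was modified or the checksum is missing.
verify_checkpoint() {
    sha256sum -c "$1.sha256" > /dev/null 2>&1
}
```

Use an absolute archive path so that verification works from any directory. A checksum stored next to the archive only catches accidental corruption; to detect tampering, keep the checksum somewhere the archive's writer cannot reach.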

Limitations and Gotchas

Network Connections

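TCP connections can be restored (CRIU's --tcp-established option), but only if the restored container keeps the same IP address; otherwise clients must reconnect
UDP sockets restore more reliably
External services may time out during migration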

File System Dependencies

Absolute paths must exist on the restore host
Mounted volumes need identical configuration
Device files may not be portable

Container Runtime Integration

Docker checkpoint support is experimental
Kubernetes native support is limited at the time of writing
Custom orchestration often required

Advanced Use Cases

Database Migration

For databases with large buffer pools:

# Checkpoint MySQL container with 8GB buffer pool
CHECKPOINT="checkpoint-$(date +%s)"
docker checkpoint create mysql-prod "$CHECKPOINT"

# Restore in <30 seconds vs 5+ minutes cold start
# (to restore on a new node, the checkpoint data must be copied there first)
docker start --checkpoint "$CHECKPOINT" mysql-prod

Stateful Service Scaling

CRIU enables novel scaling patterns:

Checkpoint running instance
Restore multiple copies for instant horizontal scaling
Preserve expensive initialization state

Future: Kubernetes Integration

Several efforts are bringing checkpoint/restore to Kubernetes:

Kubernetes Enhancement Proposal (KEP) 2008, forensic container checkpointing, which adds a checkpoint endpoint to the kubelet API
Checkpoint/restore support in Podman and CRI-O
Third-party operators for automated live migration, such as DevZero

Conclusion

CRIU transforms container restart from a disruptive operation into seamless live migration. While not suitable for every workload, it's particularly valuable for:

Stateful applications with expensive initialization
Services with large in-memory caches
Long-running computations that need migration
Zero-downtime maintenance scenarios

The technology is production-ready for specific use cases, though broader ecosystem integration is still evolving. For organizations running stateful workloads at scale, CRIU provides a powerful tool for achieving true zero-downtime operations.

Ready to implement live migration in your infrastructure? Start with non-critical workloads, measure the performance characteristics, and gradually expand to more critical services as you build operational confidence.
