Anything exposed to the internet gets probed by automated attacks around the clock, so container hardening isn’t optional. Here’s the configuration I apply to every publicly exposed Docker container.
These settings assume docker-compose.yml files. The goal: minimize blast radius when (not if) something gets compromised.
Non-Root User
Containers running as root can escalate to host root under certain conditions. Force a non-privileged user:
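A minimal sketch of what this looks like in a compose file. The service name `app` and the image are placeholders, and the 99:100 pair is the Unraid mapping mentioned in this section:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image, not a real one
    user: "99:100"              # UID:GID the container process runs as
```

Quoting the value keeps YAML from trying to interpret `99:100` as anything other than a string.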
Match this to a real user/group on your host. On Unraid, 99:100 maps to nobody:users.
Disable TTY and Stdin
Interactive shells are useful for debugging, not production. An attacker with container access gains nothing from these:
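Both options already default to off in Compose, so spelling them out is mostly a guard against someone switching them on later. A sketch, with the service name and image as placeholders:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    tty: false                  # no pseudo-terminal allocated
    stdin_open: false           # stdin stays closed
```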
Read-Only Filesystem
If the container doesn’t need to write anywhere outside of explicitly mounted volumes, lock it down:
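In compose terms this is a single key. A sketch, assuming a placeholder service called `app`:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    read_only: true             # root filesystem is mounted read-only
```

Volumes and tmpfs mounts declared on the service remain writable, unless they are themselves mounted read-only.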
This breaks containers that write to unexpected locations. Start with it enabled, check logs, add writable tmpfs mounts where genuinely needed.
Block Privilege Escalation
Prevent processes inside the container from gaining new privileges after startup:
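A sketch of the corresponding compose option, again with a placeholder service name and image:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    security_opt:
      - no-new-privileges:true  # sets the kernel no_new_privs flag for the container
```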
With this set, setuid and setgid binaries (su, sudo, and the like) can no longer raise a process’s privileges, which closes off a whole class of escalation vectors.
Drop All Capabilities
By default, Docker grants containers around 14 Linux capabilities. Most containers need zero of them. Drop everything:
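A sketch of dropping the full default set, assuming a placeholder service `app`:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    cap_drop:
      - ALL                     # remove every capability Docker grants by default
```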
If the container fails, add back only what it specifically requires:
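As an illustration only, here is a hypothetical web server that has to bind a port below 1024 and may need NET_BIND_SERVICE back. Which capability a given image actually needs is something to confirm from its logs or documentation:

```yaml
services:
  web:                          # hypothetical service name
    image: example/web:latest   # placeholder image
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE        # assumption: this service binds a privileged port
```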
For containers that need network scanning or raw sockets (rare), you might need NET_RAW. Question why before adding it.
Harden /tmp
The /tmp directory is the classic payload staging area. Mount it as tmpfs with execution disabled:
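A sketch using the short tmpfs syntax. The 512m size is an assumed value, and the service name and image are placeholders:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    tmpfs:
      # size=512m is an illustrative cap; pick one that fits the workload
      - /tmp:rw,noexec,nosuid,nodev,size=512m
```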
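The size limit keeps a malicious process from eating host memory, since tmpfs lives in RAM (and swap) rather than on disk. noexec prevents downloaded payloads from being executed directly. nosuid makes the kernel ignore setuid and setgid bits on files there. nodev stops device nodes on the mount from being usable as devices.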
Some applications (Plex, for example) need to execute files in /tmp for auto-updates. In those cases, change noexec to exec but keep the other restrictions.
Resource Limits
Prevent containers from consuming all host resources:
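A sketch of the three limits. The 3g and 3-CPU figures are the ones discussed in this section; the pids_limit value is an assumption picked for illustration:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    mem_limit: 3g               # hard memory ceiling for the container
    cpus: 3                     # at most three CPUs' worth of time
    pids_limit: 512             # assumed value; caps processes and threads
```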
Without pids_limit, a fork bomb inside the container can exhaust the host’s process table and take the whole machine down. Without mem_limit, a memory leak eventually triggers the host’s OOM killer, which may pick unrelated processes.
Set these based on actual container needs. Most services need far less than 3GB RAM and 3 CPUs.
Logging Limits
An attacker can fill your disk by generating massive log output. Cap it:
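A sketch with the json-file driver. The 50m by 5 split is an assumption; it is one combination that adds up to the 250MB ceiling:

```yaml
services:
  app:                          # hypothetical service name
    image: example/app:latest   # placeholder image
    logging:
      driver: json-file
      options:
        max-size: "50m"         # rotate once a log file reaches 50 MB
        max-file: "5"           # keep at most 5 files: 5 x 50 MB = 250 MB
```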
This keeps total logs under 250MB per container. Adjust based on your monitoring needs.
Read-Only Volume Mounts
If a container only reads data, mount it read-only:
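A sketch using the short volume syntax. The host path, container path, and service name are all hypothetical:

```yaml
services:
  media:                            # hypothetical service name
    image: example/media:latest     # placeholder image
    volumes:
      - /mnt/user/media:/media:ro   # trailing :ro mounts the path read-only
```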
A compromised Plex container shouldn’t be able to encrypt your media library.
Network Isolation
Run exposed containers in a separate network. If one gets compromised, it can’t reach your internal services:
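A sketch of one possible layout. The `dmz` network name comes from this section; the `internal` network, the `app` service sitting on both, and the image tags are assumptions for illustration:

```yaml
services:
  nginx:
    image: nginx:stable
    networks:
      - dmz                       # the exposed service lives only in the dmz
  app:                            # hypothetical backend attached to both networks
    image: example/app:latest     # placeholder image
    networks:
      - dmz
      - internal
  database:
    image: postgres:16
    networks:
      - internal                  # never attached to dmz

networks:
  dmz:
  internal:
    internal: true                # optional: no external connectivity for this network
```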
The database is never attached to the dmz network, so containers there cannot reach it. The exposed nginx can reach only the services you explicitly connect to both networks.
Complete Example
Here’s a hardened container configuration:
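The following is a sketch that stacks the settings from the sections above on one service. The service name, image, host path, tmpfs size, pids_limit, and log split are all assumed values, not a canonical configuration:

```yaml
services:
  app:                              # hypothetical service name
    image: example/app:latest       # placeholder image
    user: "99:100"                  # nobody:users on Unraid
    tty: false
    stdin_open: false
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    tmpfs:
      - /tmp:rw,noexec,nosuid,nodev,size=512m   # assumed size
    mem_limit: 3g
    cpus: 3
    pids_limit: 512                 # assumed value
    logging:
      driver: json-file
      options:
        max-size: "50m"             # 5 x 50 MB = 250 MB ceiling
        max-file: "5"
    volumes:
      - /mnt/user/media:/media:ro   # hypothetical read-only data mount
    networks:
      - dmz

networks:
  dmz:
```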
Reusing Configuration
YAML anchors prevent duplication across services:
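A sketch of the pattern, using a top-level extension field to hold the shared settings. The `x-hardening` name, the anchor name, and both services are hypothetical:

```yaml
x-hardening: &hardening           # top-level "x-" keys are ignored by Compose itself
  read_only: true
  security_opt:
    - no-new-privileges:true
  cap_drop:
    - ALL
  pids_limit: 512                 # assumed value

services:
  app:
    <<: *hardening                # merge the shared settings into this service
    image: example/app:latest     # placeholder image
  worker:
    <<: *hardening
    image: example/worker:latest  # placeholder image
```

The merge is shallow: if a service defines its own `cap_drop` or `security_opt`, that list replaces the anchored one instead of being appended to it.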
For larger setups, the include: directive splits configurations across files:
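A sketch of a top-level file that pulls in two others. The file paths are hypothetical, and this assumes a Docker Compose v2 release recent enough to support `include` (2.20 or later, as far as I know):

```yaml
include:
  - ./proxy/compose.yaml          # hypothetical path
  - ./media/compose.yaml          # hypothetical path

services:
  app:                            # hypothetical service defined in this file
    image: example/app:latest     # placeholder image
```

Each included file is parsed on its own, so a YAML anchor defined in one file is not visible in another.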
When Things Break
Some containers won’t start with these restrictions. My approach:
- Apply all restrictions
- Check docker logs <container>
- Remove one restriction at a time until it works
- Document why that specific container needs the exception
Common issues:
- read_only fails: Add specific writable tmpfs mounts
- cap_drop breaks networking: Add back NET_BIND_SERVICE or NET_RAW
- noexec /tmp breaks updates: Switch to exec for that container
Limitations
These settings reduce blast radius. They don’t eliminate risk. A compromised container can still:
- Exfiltrate data it has read access to
- Attack other containers on the same network
- Attempt to exploit kernel vulnerabilities
For higher-risk services, consider running them in VMs or on dedicated hardware.
This works for my self-hosted setup. Adjust the resource limits and network topology for yours.
