Pod Security
Every pod managed by Lucity runs with a hardened security context. No root processes, no extra Linux capabilities, no privilege escalation. These aren't optional flags you can forget to set. The platform enforces them in the pod specs it generates.
Seccomp Profiles
Seccomp (Secure Computing Mode) is a Linux kernel feature that restricts which system calls a process can make. Lucity uses the RuntimeDefault seccomp profile on all workload pods. It's built into containerd and blocks approximately 44 dangerous syscalls:
- `mount`, `umount` (filesystem manipulation)
- `ptrace` (process debugging/injection)
- `reboot`, `init_module` (system-level operations)
- `keyctl` (kernel keyring access)
- And dozens more that containers should never need
```yaml
securityContext:
  seccompProfile:
    type: RuntimeDefault
```
This is compatible with virtually all workloads. Unlike gVisor (which intercepts all syscalls and breaks some applications), RuntimeDefault only blocks syscalls that no legitimate container workload should use.
Security Context
User workload pods
Workload pods (Deployments and CronJobs created by the lucity-app chart) run with:
```yaml
# Pod-level
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

# Container-level
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```
- `runAsUser: 1000`: Forces the container process to run as UID 1000. The build pipeline appends post-build steps to create a non-root user and set `USER 1000:1000` in the image config, but `runAsUser` in the pod spec acts as a second layer: even if the image metadata were missing or overridden, the container still runs non-root.
- `runAsGroup: 1000` / `fsGroup: 1000`: Sets the primary group and filesystem group. Any volumes mounted into the pod are owned by GID 1000, ensuring the non-root process can write to them.
- `runAsNonRoot: true`: A safety net. Even if `runAsUser` were removed, this would prevent the container from running as root.
- `allowPrivilegeEscalation: false`: Sets the no_new_privs flag (`PR_SET_NO_NEW_PRIVS`) on the container process, so `execve()` can never grant additional privileges. Even if a binary inside the container has the setuid bit, it can't escalate.
- `capabilities: drop: ["ALL"]`: Linux capabilities are fine-grained root privileges (e.g., `NET_RAW` for raw sockets, `SYS_ADMIN` for mount). Dropping all of them means the container process has exactly zero special kernel privileges.
Non-root image compatibility
Railpack doesn't create a non-root user in built images. To prevent runtime write failures (e.g., Next.js writing to .next/cache, Nuxt writing to .output/), the build pipeline appends post-build steps that:
- Create a `lucity` user with UID 1000 in the image
- `chown` the image's `WORKDIR` to UID 1000
- Set `USER 1000:1000` in the image config
The WORKDIR is read from railpack's image config (not hardcoded), so this works for any framework or language.
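Conceptually, the appended steps leave the image config looking like this. This is an illustrative excerpt of the OCI image config (field names per the OCI image spec, rendered here as YAML); `/app` is an example working directory, not a fixed path:

```yaml
# Illustrative OCI image-config excerpt after the post-build steps.
config:
  User: "1000:1000"    # set by the appended USER step
  WorkingDir: /app     # example value; read from railpack's config, chowned to UID 1000
```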
Build Job pods
Build Job pods run trusted platform code (clone, railpack detection, BuildKit client). They currently inherit the namespace's default security context and are not explicitly hardened with `runAsNonRoot`, because they need to execute in the build runner environment. The primary isolation for builds comes from namespace separation and network policies.
BuildKit Exception
BuildKit requires `seccompProfile: Unconfined` to function. The OCI worker needs syscalls like `unshare` and `mount` to set up build environments, even with `--oci-worker-no-process-sandbox`. This is a BuildKit requirement, not a Lucity design choice.
BuildKit itself runs as UID 1000 (non-root), but with `Unconfined` seccomp. Each `RUN` step gets its own mount namespace (the filesystem is the image layers, not buildkitd's), but shares the PID and network namespace with buildkitd. `RUN` steps inherit the `Unconfined` seccomp profile, meaning they can make syscalls that RuntimeDefault would block.
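As a sketch (not Lucity's actual manifest), the BuildKit container's security context differs from workload pods in exactly one field:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000      # rootless BuildKit
  runAsGroup: 1000
  seccompProfile:
    type: Unconfined   # required: the OCI worker needs unshare/mount
```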
This is why the other isolation layers are critical:
- Network policies restrict what buildkitd (and therefore `RUN` steps) can reach over the network
- Namespace isolation keeps builds in `lucity-builds`, away from platform services
- Non-root execution (UID 1000) limits what damage a compromised build can do at the OS level
See Build Isolation for the full model.
What About gVisor?
gVisor provides stronger isolation by intercepting all syscalls through a user-space kernel. It's a meaningful security upgrade for hostile multi-tenant environments where tenants actively attempt container escapes.
The trade-offs:
| | Seccomp RuntimeDefault | gVisor |
|---|---|---|
| Compatibility | ~99% of workloads | ~85% of workloads |
| Performance | Negligible overhead | 10-30% I/O overhead |
| Node setup | None (built into containerd) | Binary install + containerd config per node |
| Maintenance | Zero | Monthly binary updates, node automation |
For most Lucity deployments (teams running their own workloads), seccomp RuntimeDefault with dropped capabilities provides strong isolation without the compatibility and operational costs of gVisor. If your threat model includes actively malicious tenants, consider adding gVisor on dedicated node pools for user workloads.
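If you do adopt gVisor, the standard mechanism is a RuntimeClass pointing at the `runsc` handler, selected per pod. This follows the upstream Kubernetes pattern; the node-pool label below is illustrative:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc            # containerd on the node must be configured with the runsc shim
---
# In the user-workload pod spec:
# spec:
#   runtimeClassName: gvisor
#   nodeSelector:
#     sandbox: gvisor     # illustrative label for a dedicated node pool
```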