CRIU is used by LXD to save the state of an LXD container, very similar to suspe...

virtuous_sloth · on June 21, 2024

Stéphane Graber (a key Incus, née LXD, contributor) just did a video about developing placement scriptlets in the Starlark language. The interesting thing, if I'm interpreting what I saw correctly, is that his cluster was 6 beefy servers plus 3 decent-sized VMs, and the idea was, I think, that containers could get placed on the nested VMs, neatly solving the migration issue with containers. It also looked like the 3 VMs in the cluster may themselves have been hosted in the cluster.

I could be wrong, though. Interesting approach if true.

ranger_danger · on June 21, 2024

> Linux just keeps on working for years without a reboot

Except I would strongly suggest not doing that as there have been some very nasty security issues fixed as of late.

skissane · on June 21, 2024

> (00.121636) Error (criu/namespaces.c:423): Can't dump nested uts namespace for 2685261

Found a GitHub issue for this: https://github.com/checkpoint-restore/criu/issues/1430

The issue apparently is that newer systemd versions create their own UTS namespace, so running systemd in a container now results in a nested UTS namespace. Containers with older versions of systemd, or which don't use systemd, shouldn't have the issue.
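To check whether a given container is affected, one option is to compare the UTS namespace of each of its processes. The sketch below is only an illustration: it assumes that "nested" here means some process in the container sits in a different UTS namespace than the others, and the function names are hypothetical rather than anything from CRIU. It reads the `/proc/<pid>/ns/uts` symlinks, which the kernel exposes for every process.

```python
import os
from collections import defaultdict


def uts_namespace_id(pid, proc_root="/proc"):
    """Return the UTS namespace identifier of a process, e.g. 'uts:[4026531838]'."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "uts"))


def group_by_uts_namespace(pids, read_ns=uts_namespace_id):
    """Group PIDs by UTS namespace; PIDs that cannot be read are skipped."""
    groups = defaultdict(list)
    for pid in pids:
        try:
            groups[read_ns(pid)].append(pid)
        except OSError:
            # Process exited or we lack permission to inspect it.
            continue
    return dict(groups)


def has_nested_uts(pids, read_ns=uts_namespace_id):
    """True if the given processes span more than one UTS namespace.

    Assumption: this approximates the condition behind the CRIU error,
    it is not CRIU's actual check.
    """
    return len(group_by_uts_namespace(pids, read_ns)) > 1
```

Passing the PIDs of a container's processes (for example its init and the systemd services it spawned) to `has_nested_uts` should report `True` when a service has been given its own UTS namespace. Reading another user's `/proc/<pid>/ns/` entries generally requires root.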

One commenter posted in April 2021 that they had a patch to add support for nested UTS namespaces, but they don't appear to have submitted it: https://github.com/checkpoint-restore/criu/issues/1430#issue...

A comment on another issue has a suggestion on how to implement nested UTS namespace support: https://github.com/checkpoint-restore/criu/issues/1011#issue...

It doesn't sound like nested UTS namespace support is impossible, just something nobody has got around to implementing.

A comment in the CRIU source code says nested namespaces are only supported for mount namespaces (CLONE_NEWNS) and network namespaces (CLONE_NEWNET): https://github.com/checkpoint-restore/criu/blob/b5e2025765b9...

But if you look at the OpenVZ fork of CRIU, you see it also supports PID (CLONE_NEWPID), UTS (CLONE_NEWUTS) and IPC (CLONE_NEWIPC) namespaces: https://bitbucket.org/openvz/criu.ovz/src/d9bf55896015a27df9...

I don't know why these additional features in OpenVZ CRIU don't exist upstream.

I think the main blocker to supporting nesting of the other namespace types (user, cgroup, time) is someone getting around to writing the code for it. It is possible some of them pose an architectural issue where a kernel enhancement might be necessary (if that's true of any, I'd say most likely of user), but I suspect for most of them it is simply that nobody has gotten around to it.
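For reference, the state of play described above can be tabulated in a few lines. This is just a summary sketch: the flag values are the ones from the Linux kernel headers, the two "supported" sets are taken from the source comment and the OpenVZ fork as described in this comment (not independently verified), and the short names and helper function are made up for illustration.

```python
# Namespace types and their clone(2) flags, as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
    "time": 0x00000080,    # CLONE_NEWTIME
}

# Nesting support as reported in this thread (an assumption, not checked
# against the current CRIU or OpenVZ sources).
UPSTREAM_NESTED = {"mnt", "net"}
OPENVZ_NESTED = UPSTREAM_NESTED | {"pid", "uts", "ipc"}


def nesting_gaps(supported):
    """Return the namespace types for which nesting is not covered."""
    return sorted(set(CLONE_FLAGS) - set(supported))


print("upstream gaps:", nesting_gaps(UPSTREAM_NESTED))
print("OpenVZ gaps:  ", nesting_gaps(OPENVZ_NESTED))
```

With the sets above, the OpenVZ gaps come out as cgroup, time and user, which are the three remaining types mentioned in the paragraph.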

The other issue is eventually someone will add another namespace type to the Linux kernel, and then CRIU will need to support that too.