[PATCH v6 0/5] pid_namespace: make init creation more flexible

Posted by Pavel Tikhomirov 2 weeks, 4 days ago
The first patch annotates accesses to ->child_reaper with the *_ONCE
macros, to protect lockless accesses from possible CPU/compiler
optimization problems.
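
For readers unfamiliar with the annotation, here is a minimal userspace
sketch of the pattern. It assumes simplified volatile-based stand-ins for
READ_ONCE()/WRITE_ONCE() (the kernel's versions in
include/asm-generic/rwonce.h do more), and the reaper_* types and helpers
are hypothetical, not code from the series. It models only the compiler
side: the lockless reader takes a single snapshot of ->child_reaper and
uses it for both the NULL check and the dereference, while the writer is
assumed to be serialized by a lock the caller holds.

```c
#include <assert.h>

/*
 * Simplified stand-ins for the kernel macros (assumption: the real ones add
 * type checks and arch hooks). The volatile cast makes the compiler perform
 * exactly the one access written, without caching, repeating or merging it.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct reaper_task {
	int pid;
};

/* Hypothetical stand-in for the one pid_namespace field we care about. */
struct reaper_ns {
	struct reaper_task *child_reaper;
};

/* Writer: assumed to run with the serializing lock held by the caller. */
static void reaper_set(struct reaper_ns *ns, struct reaper_task *task)
{
	WRITE_ONCE(ns->child_reaper, task);
}

/*
 * Lockless reader: load the pointer once into a local and use only that
 * snapshot, so the NULL check and the dereference see the same value.
 */
static int reaper_pid(struct reaper_ns *ns)
{
	struct reaper_task *reaper = READ_ONCE(ns->child_reaper);

	return reaper ? reaper->pid : 0;
}
```

Without the annotation the compiler would be free to load ->child_reaper
twice, once for the check and once for the dereference, and a concurrent
writer could change it in between.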

The second patch makes sure that init is always the first process in the
pid namespace; previously this was only checked in the set_tid case.
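
To make the invariant concrete, here is a toy allocator. It is a model
under assumptions, not the kernel's alloc_pid(): struct toy_pidns,
toy_alloc_pid() and toy_init_ready() are hypothetical, has_init stands in
for ->child_reaper, and the error codes are arbitrary choices for the toy.
The check runs only after a number has been reserved, on both the set_tid
path and the sequential path, and the reservation is undone if a
namespace without init would get a first process other than pid 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_PID_MAX 32

/* Hypothetical toy namespace, not struct pid_namespace. */
struct toy_pidns {
	bool used[TOY_PID_MAX + 1];	/* index 0 is never handed out */
	int last;			/* last sequentially allocated number */
	bool has_init;			/* stands in for ->child_reaper != NULL */
};

/*
 * Reserve a pid number: set_tid if it is > 0, otherwise the next free one.
 * The "init first" check runs after the number is reserved, on both paths.
 */
static int toy_alloc_pid(struct toy_pidns *ns, int set_tid)
{
	int nr;

	if (set_tid > 0) {
		if (set_tid > TOY_PID_MAX)
			return -EINVAL;
		if (ns->used[set_tid])
			return -EEXIST;
		nr = set_tid;
	} else {
		for (nr = ns->last + 1; nr <= TOY_PID_MAX && ns->used[nr]; nr++)
			;
		if (nr > TOY_PID_MAX)
			return -EAGAIN;
	}
	ns->used[nr] = true;

	/* Without an init, only pid 1 (the init itself) may be created. */
	if (!ns->has_init && nr != 1) {
		ns->used[nr] = false;	/* undo the reservation */
		return -EINVAL;
	}
	if (set_tid <= 0)
		ns->last = nr;
	return nr;
}

/* Called once the process that got pid 1 is fully set up as the reaper. */
static void toy_init_ready(struct toy_pidns *ns)
{
	ns->has_init = true;
}
```

In the toy, a second sequential allocation made before toy_init_ready()
fails, which roughly models another process forking into the namespace
before its first process has become the child reaper.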

The third patch allows joining a pid namespace before its init is
created. This lets one process create the pid namespace and another
process create the namespace's init after setns(). Please see the
detailed description in the patch commit message. It depends on the
second patch.
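
The flow enabled by the third patch can be sketched with plain syscalls.
This is not the selftest from patch 4; it assumes the caller has
CAP_SYS_ADMIN (e.g. runs as root), and try_init_via_setns() and
wait_exit_code() are hypothetical helpers. The "creator" process unshares
a new pid namespace without ever forking into it, the parent opens the
creator's /proc/<pid>/ns/pid_for_children, and a separate "joiner"
process setns()es into it and forks; that forked process should see
getpid() == 1. On kernels without this series the open() is expected to
fail because the namespace has no init yet, and the sketch reports that,
like a missing privilege, as 0 rather than as an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap pid and return its exit code, or -1 if it did not exit normally. */
static int wait_exit_code(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Hypothetical helper. Returns 1 if a process forked after setns() into an
 * init-less pid namespace got pid 1, 0 if some step was refused (no
 * privilege, or a kernel without this series), -1 on unexpected failure.
 */
static int try_init_via_setns(void)
{
	int ready[2], done[2], nsfd, ret = -1;
	char path[64], ok = 0;
	pid_t creator, joiner;

	if (pipe(ready))
		return -1;
	if (pipe(done)) {
		close(ready[0]);
		close(ready[1]);
		return -1;
	}

	creator = fork();
	if (creator == 0) {
		/* Creator: make the pid namespace, but never fork into it. */
		close(ready[0]);
		close(done[1]);
		ok = unshare(CLONE_NEWPID) == 0;
		if (write(ready[1], &ok, 1) != 1)
			_exit(1);
		_exit(read(done[0], &ok, 1) < 0);	/* block until parent is done */
	}
	close(ready[1]);
	close(done[0]);
	if (creator < 0)
		goto out;
	if (read(ready[0], &ok, 1) != 1 || !ok) {
		ret = 0;			/* unshare() refused, e.g. EPERM */
		goto out;
	}

	/* Without this series, this open() is expected to fail: no init yet. */
	snprintf(path, sizeof(path), "/proc/%d/ns/pid_for_children", (int)creator);
	nsfd = open(path, O_RDONLY | O_CLOEXEC);
	if (nsfd < 0) {
		ret = 0;
		goto out;
	}

	joiner = fork();
	if (joiner == 0) {
		/* Joiner: enter the namespace, then fork its first process. */
		pid_t init;

		if (setns(nsfd, CLONE_NEWPID))
			_exit(2);
		init = fork();
		if (init == 0)
			_exit(getpid() == 1 ? 0 : 1);	/* the new init checks its pid */
		if (init < 0)
			_exit(3);
		_exit(wait_exit_code(init) == 0 ? 0 : 1);
	}
	close(nsfd);
	if (joiner > 0) {
		int code = wait_exit_code(joiner);

		ret = code == 0 ? 1 : code == 2 ? 0 : -1;
	}
out:
	close(ready[0]);
	close(done[1]);				/* lets the creator exit */
	if (creator > 0)
		wait_exit_code(creator);
	return ret;
}
```

The joiner runs in its own process so the parent's pid_for_children is
left untouched; setns(CLONE_NEWPID) only changes where the caller's future
children are created, which is why the joiner has to fork once more.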

The fourth patch is a comprehensive test that covers both the basic use
case of creating a pid namespace and its init separately, and a more
specific use case which shows how clone3(set_tid) usability improves
after this change. The fifth patch adds a MAINTAINERS entry for the new
tests.

This change is generally useful as it makes clone3(set_tid) more
universal and lets it work consistently in all cases. It is also highly
useful for CRIU when handling nested containers.

v2: Use *_ONCE for atomicity of ->child_reaper accesses, and avoid taking
tasklist_lock to read it. Rebase onto master.
v3: Split the *_ONCE change and the "init is first" check into separate
commits.
v4: Update the second patch's commit message. Include Oleg's review tags.
v5: Handle one more READ_ONCE case. Include Andrei's review tags. Rebase
onto the mm tree.
v6: Require root in the pidns_init_via_setns_set_tid test and move the
wrapper userns creation to the top of pidns_init_via_setns. Add an entry
to the MAINTAINERS file for the tests. Rebase back onto master.

This series is also available here:
https://github.com/Snorch/linux/commits/allow-creating-pid-namespace-init-after-setns-v6/

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Pavel Tikhomirov (5):
  pid_namespace: avoid optimization of accesses to ->child_reaper
  pid: check init is created first after idr alloc
  pid_namespace: allow opening pid_for_children before init was created
  selftests: Add tests for creating pidns init via setns
  MAINTAINERS: add a new entry for testing pidns init creation via setns

 MAINTAINERS                                   |   7 +
 kernel/exit.c                                 |   3 +-
 kernel/fork.c                                 |   5 +-
 kernel/pid.c                                  |  19 +-
 kernel/pid_namespace.c                        |   9 -
 .../selftests/pid_namespace/.gitignore        |   1 +
 .../testing/selftests/pid_namespace/Makefile  |   2 +-
 .../pid_namespace/pidns_init_via_setns.c      | 238 ++++++++++++++++++
 8 files changed, 264 insertions(+), 20 deletions(-)
 create mode 100644 tools/testing/selftests/pid_namespace/pidns_init_via_setns.c

-- 
2.53.0
Re: [PATCH v6 0/5] pid_namespace: make init creation more flexible
Posted by Christian Brauner 2 weeks, 1 day ago
On Wed, 18 Mar 2026 13:21:48 +0100, Pavel Tikhomirov wrote:
> The first patch properly annotates accesses to ->child_reaper with
> _ONCE macroses, to protect unlocked accesses from possible cpu/compiler
> optimization problems.
> 
> The second patch makes sure that the init is always a first process in
> the pid namespace, previously this was only checked for set_tid case.
> 
> [...]

Applied to the kernel-7.1.misc branch of the vfs/vfs.git tree.
Patches in the kernel-7.1.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review of the original patch series, allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: kernel-7.1.misc

[1/5] pid_namespace: avoid optimization of accesses to ->child_reaper
      https://git.kernel.org/vfs/vfs/c/d9c857aee2eb
[2/5] pid: check init is created first after idr alloc
      https://git.kernel.org/vfs/vfs/c/39c8806e2d88
[3/5] pid_namespace: allow opening pid_for_children before init was created
      https://git.kernel.org/vfs/vfs/c/a3bdc23ba8ea
[4/5] selftests: Add tests for creating pidns init via setns
      https://git.kernel.org/vfs/vfs/c/7c5219e1a606
[5/5] MAINTAINERS: add a new entry for testing pidns init creation via setns
      https://git.kernel.org/vfs/vfs/c/2b46715fd9ec