kernel/pid_namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
It is already difficult for users to troubleshoot which of multiple pid
limits restricts their workload. The per-(hierarchical-)NS pid_max would
contribute to the confusion.
Also, the implementation copies the limit upon creation from
parent, this pattern showed cumbersome with some attributes in legacy
cgroup controllers -- it's subject to race condition between parent's
limit modification and children creation and once copied it must be
changed in the descendant.
Let's do what other places do (ucounts or cgroup limits) -- create new
pid namespaces without any limit at all. The global limit (actually any
ancestor's limit) is still effectively in place, we avoid the
set/unshare race and bumps of global (ancestral) limit have the desired
effect on pid namespace that do not care.
Link: https://lore.kernel.org/r/20240408145819.8787-1-mkoutny@suse.com/
Link: https://lore.kernel.org/r/20250221170249.890014-1-mkoutny@suse.com/
Fixes: 7863dcc72d0f4 ("pid: allow pid_max to be set per pid namespace")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
kernel/pid_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 8f6cfec87555a..7098ed44e717d 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -107,7 +107,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
goto out_free_idr;
ns->ns.ops = &pidns_operations;
- ns->pid_max = parent_pid_ns->pid_max;
+ ns->pid_max = PID_MAX_LIMIT;
err = register_pidns_sysctls(ns);
if (err)
goto out_free_inum;
base-commit: 48a5eed9ad584315c30ed35204510536235ce402
--
2.48.1
On Wed, 05 Mar 2025 15:58:49 +0100, Michal Koutný wrote:
> It is already difficult for users to troubleshoot which of multiple pid
> limits restricts their workload. The per-(hierarchical-)NS pid_max would
> contribute to the confusion.
> Also, the implementation copies the limit upon creation from
> parent, this pattern showed cumbersome with some attributes in legacy
> cgroup controllers -- it's subject to race condition between parent's
> limit modification and children creation and once copied it must be
> changed in the descendant.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] pid: Do not set pid_max in new pid namespaces
https://git.kernel.org/vfs/vfs/c/d385c8bceb14
Am Mi., 5. März 2025 um 15:59 Uhr schrieb Michal Koutný <mkoutny@suse.com>:
>
> It is already difficult for users to troubleshoot which of multiple pid
> limits restricts their workload. The per-(hierarchical-)NS pid_max would
> contribute to the confusion.
> Also, the implementation copies the limit upon creation from
> parent, this pattern showed cumbersome with some attributes in legacy
> cgroup controllers -- it's subject to race condition between parent's
> limit modification and children creation and once copied it must be
> changed in the descendant.
>
> Let's do what other places do (ucounts or cgroup limits) -- create new
> pid namespaces without any limit at all. The global limit (actually any
> ancestor's limit) is still effectively in place, we avoid the
> set/unshare race and bumps of global (ancestral) limit have the desired
> effect on pid namespace that do not care.
>
> Link: https://lore.kernel.org/r/20240408145819.8787-1-mkoutny@suse.com/
> Link: https://lore.kernel.org/r/20250221170249.890014-1-mkoutny@suse.com/
> Fixes: 7863dcc72d0f4 ("pid: allow pid_max to be set per pid namespace")
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
Dear Michal,
This completely makes sense and I tend to agree. But we also need to
ensure that the kselftest for pid_max is not broken with this change.
Let me play with this stuff a bit and I get back with "Tested-by" ;-)
Kind regards,
Alex
> ---
> kernel/pid_namespace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 8f6cfec87555a..7098ed44e717d 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -107,7 +107,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
> goto out_free_idr;
> ns->ns.ops = &pidns_operations;
>
> - ns->pid_max = parent_pid_ns->pid_max;
> + ns->pid_max = PID_MAX_LIMIT;
> err = register_pidns_sysctls(ns);
> if (err)
> goto out_free_inum;
>
> base-commit: 48a5eed9ad584315c30ed35204510536235ce402
> --
> 2.48.1
>
On Thu, Mar 06, 2025 at 10:11:27AM +0100, Alexander Mikhalitsyn <alexander@mihalicyn.com> wrote: > This completely makes sense and I tend to agree. > But we also need to > ensure that the kselftest for pid_max is not broken with this change. I built [1] and ran it with three passes. Assuming it didn't rely on the copying. > Let me play with this stuff a bit and I get back with "Tested-by" ;-) I'd be happy for further validation. Thanks, Michal [1] https://github.com/Werkov/linux/commit/019b884d5a005dd8a3e18f0865b276fc3d804d7e
© 2016 - 2026 Red Hat, Inc.