[PATCH] pid: Do not set pid_max in new pid namespaces

Michal Koutný posted 1 patch 11 months, 1 week ago
kernel/pid_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] pid: Do not set pid_max in new pid namespaces
Posted by Michal Koutný 11 months, 1 week ago
It is already difficult for users to troubleshoot which of multiple pid
limits restricts their workload. The per-(hierarchical-)NS pid_max would
contribute to the confusion.
Also, the implementation copies the limit upon creation from
parent, this pattern showed cumbersome with some attributes in legacy
cgroup controllers -- it's subject to race condition between parent's
limit modification and children creation and once copied it must be
changed in the descendant.

Let's do what other places do (ucounts or cgroup limits) -- create new
pid namespaces without any limit at all. The global limit (actually any
ancestor's limit) is still effectively in place, we avoid the
set/unshare race and bumps of global (ancestral) limit have the desired
effect on pid namespace that do not care.

Link: https://lore.kernel.org/r/20240408145819.8787-1-mkoutny@suse.com/
Link: https://lore.kernel.org/r/20250221170249.890014-1-mkoutny@suse.com/
Fixes: 7863dcc72d0f4 ("pid: allow pid_max to be set per pid namespace")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/pid_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 8f6cfec87555a..7098ed44e717d 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -107,7 +107,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 		goto out_free_idr;
 	ns->ns.ops = &pidns_operations;
 
-	ns->pid_max = parent_pid_ns->pid_max;
+	ns->pid_max = PID_MAX_LIMIT;
 	err = register_pidns_sysctls(ns);
 	if (err)
 		goto out_free_inum;

base-commit: 48a5eed9ad584315c30ed35204510536235ce402
-- 
2.48.1

Re: [PATCH] pid: Do not set pid_max in new pid namespaces
Posted by Christian Brauner 11 months, 1 week ago
On Wed, 05 Mar 2025 15:58:49 +0100, Michal Koutný wrote:
> It is already difficult for users to troubleshoot which of multiple pid
> limits restricts their workload. The per-(hierarchical-)NS pid_max would
> contribute to the confusion.
> Also, the implementation copies the limit upon creation from
> parent, this pattern showed cumbersome with some attributes in legacy
> cgroup controllers -- it's subject to race condition between parent's
> limit modification and children creation and once copied it must be
> changed in the descendant.
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] pid: Do not set pid_max in new pid namespaces
      https://git.kernel.org/vfs/vfs/c/d385c8bceb14
Re: [PATCH] pid: Do not set pid_max in new pid namespaces
Posted by Alexander Mikhalitsyn 11 months, 1 week ago
Am Mi., 5. März 2025 um 15:59 Uhr schrieb Michal Koutný <mkoutny@suse.com>:
>
> It is already difficult for users to troubleshoot which of multiple pid
> limits restricts their workload. The per-(hierarchical-)NS pid_max would
> contribute to the confusion.
> Also, the implementation copies the limit upon creation from
> parent, this pattern showed cumbersome with some attributes in legacy
> cgroup controllers -- it's subject to race condition between parent's
> limit modification and children creation and once copied it must be
> changed in the descendant.
>
> Let's do what other places do (ucounts or cgroup limits) -- create new
> pid namespaces without any limit at all. The global limit (actually any
> ancestor's limit) is still effectively in place, we avoid the
> set/unshare race and bumps of global (ancestral) limit have the desired
> effect on pid namespace that do not care.
>
> Link: https://lore.kernel.org/r/20240408145819.8787-1-mkoutny@suse.com/
> Link: https://lore.kernel.org/r/20250221170249.890014-1-mkoutny@suse.com/
> Fixes: 7863dcc72d0f4 ("pid: allow pid_max to be set per pid namespace")
> Signed-off-by: Michal Koutný <mkoutny@suse.com>

Dear Michal,

This completely makes sense and I tend to agree. But we also need to
ensure that the kselftest for pid_max is not broken with this change.
Let me play with this stuff a bit and I get back with "Tested-by" ;-)

Kind regards,
Alex

> ---
>  kernel/pid_namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 8f6cfec87555a..7098ed44e717d 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -107,7 +107,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
>                 goto out_free_idr;
>         ns->ns.ops = &pidns_operations;
>
> -       ns->pid_max = parent_pid_ns->pid_max;
> +       ns->pid_max = PID_MAX_LIMIT;
>         err = register_pidns_sysctls(ns);
>         if (err)
>                 goto out_free_inum;
>
> base-commit: 48a5eed9ad584315c30ed35204510536235ce402
> --
> 2.48.1
>
Re: [PATCH] pid: Do not set pid_max in new pid namespaces
Posted by Michal Koutný 11 months, 1 week ago
On Thu, Mar 06, 2025 at 10:11:27AM +0100, Alexander Mikhalitsyn <alexander@mihalicyn.com> wrote:
> This completely makes sense and I tend to agree.
> But we also need to
> ensure that the kselftest for pid_max is not broken with this change.

I built [1] and ran it with three passes. Assuming it didn't rely on the
copying.

> Let me play with this stuff a bit and I get back with "Tested-by" ;-)

I'd be happy for further validation.

Thanks,
Michal

[1] https://github.com/Werkov/linux/commit/019b884d5a005dd8a3e18f0865b276fc3d804d7e