kernel/cgroup/cgroup.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
From: Chen Ridong <chenridong@huawei.com>
Kernel fault injection test reports NULL pointer dereference as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000010
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1611 Comm: umount Tainted: G W 6.12.0-rc2
Tainted: [W]=WARN
RIP: 0010:__percpu_ref_switch_mode+0x30/0x320
RSP: 0018:ffffc90001f9be28 EFLAGS: 00000002
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffffffff82c12d18 RDI: ffff88810753a7d8
RBP: ffff88810f3a0888 R08: 0000000000000001 R09: 00000000000c0000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000111edc000 CR4: 00000000000006f0
Call Trace:
percpu_ref_kill_and_confirm+0x3a/0x90
cgroup_kill_sb+0x61/0x190
deactivate_locked_super+0x35/0xa0
cleanup_mnt+0x100/0x160
task_work_run+0x5c/0x90
syscall_exit_to_user_mode+0x1bc/0x1d0
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
A warning was also found in dmesg when the bug occurs,
WARNING: CPU: 5 PID: 1554 at kernel/cgroup/cgroup.c:2144
Call Trace:
? cgroup_setup_root+0x39c/0x440
cgroup1_get_tree+0x38f/0x860
? cgroup1_get_tree+0x71/0x860
vfs_get_tree+0x2c/0x100
path_mount+0x2e3/0xb90
__x64_sys_mount+0x19f/0x200
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
As mentioned above, when cgroup_bpf_inherit returns an error in
cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is
killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt
may have become NULL, leading to NULL pointer dereference.
To fix this issue, goto err when cgroup_bpf_inherit returns an error.
Additionally, if cgroup_bpf_inherit returns an error after rebinding
subsystems, the root_cgrp->self.refcnt is exited, which leads to
cgroup1_root_to_use return 1 (restart) when subsystems is mounted next.
This is due to a failure trying to get the refcnt(the root is root_cgrp,
without rebinding back to cgrp_dfl_root). So move the call to
cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root.
Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cgroup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 5886b95c6eae..8a0cbf95cc57 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2136,12 +2136,13 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
if (ret)
goto destroy_root;
- ret = rebind_subsystems(root, ss_mask);
+ ret = cgroup_bpf_inherit(root_cgrp);
if (ret)
goto exit_stats;
- ret = cgroup_bpf_inherit(root_cgrp);
- WARN_ON_ONCE(ret);
+ ret = rebind_subsystems(root, ss_mask);
+ if (ret)
+ goto exit_stats;
trace_cgroup_setup_root(root);
--
2.34.1
Hello. On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote: > As mentioned above, when cgroup_bpf_inherit returns an error in > cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is > killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt > may have become NULL, leading to NULL pointer dereference. > > To fix this issue, goto err when cgroup_bpf_inherit returns an error. > Additionally, if cgroup_bpf_inherit returns an error after rebinding > subsystems, the root_cgrp->self.refcnt is exited, which leads to > cgroup1_root_to_use return 1 (restart) when subsystems is mounted next. > This is due to a failure trying to get the refcnt(the root is root_cgrp, > without rebinding back to cgrp_dfl_root). So move the call to > cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root. > > Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline") > Signed-off-by: Chen Ridong <chenridong@huawei.com> Hm, I always thought that BPF progs can only be attached to the default hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent that). Thus I wonder whether cgroup_bpf_inherit (which is more like cgroup_bpf_init in this case) needs to be called no v1 roots at all (and with such a change, 04f8ef5643bc could be effectively reverted too). Or can bpf data be used on v1 hierarchies somehow? Thanks, Michal
On Wed, Oct 16, 2024 at 03:13:52PM +0200, Michal Koutný wrote: > Hello. > > On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote: > > As mentioned above, when cgroup_bpf_inherit returns an error in > > cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is > > killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt > > may have become NULL, leading to NULL pointer dereference. > > > > To fix this issue, goto err when cgroup_bpf_inherit returns an error. > > Additionally, if cgroup_bpf_inherit returns an error after rebinding > > subsystems, the root_cgrp->self.refcnt is exited, which leads to > > cgroup1_root_to_use return 1 (restart) when subsystems is mounted next. > > This is due to a failure trying to get the refcnt(the root is root_cgrp, > > without rebinding back to cgrp_dfl_root). So move the call to > > cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root. > > > > Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline") > > Signed-off-by: Chen Ridong <chenridong@huawei.com> > > Hm, I always thought that BPF progs can only be attached to the default > hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent > that). > > Thus I wonder whether cgroup_bpf_inherit (which is more like > cgroup_bpf_init in this case) needs to be called no v1 roots at all (and > with such a change, 04f8ef5643bc could be effectively reverted too). > > Or can bpf data be used on v1 hierarchies somehow? We relaxed some of the usages (see cgroup_v1v2_get_from_fd()) but cgroup BPF progs can only be attached to v2. Thanks. -- tejun
On 2024/10/17 1:04, Tejun Heo wrote: > On Wed, Oct 16, 2024 at 03:13:52PM +0200, Michal Koutný wrote: >> Hello. >> >> On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote: >>> As mentioned above, when cgroup_bpf_inherit returns an error in >>> cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is >>> killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt >>> may have become NULL, leading to NULL pointer dereference. >>> >>> To fix this issue, goto err when cgroup_bpf_inherit returns an error. >>> Additionally, if cgroup_bpf_inherit returns an error after rebinding >>> subsystems, the root_cgrp->self.refcnt is exited, which leads to >>> cgroup1_root_to_use return 1 (restart) when subsystems is mounted next. >>> This is due to a failure trying to get the refcnt(the root is root_cgrp, >>> without rebinding back to cgrp_dfl_root). So move the call to >>> cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root. >>> >>> Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline") >>> Signed-off-by: Chen Ridong <chenridong@huawei.com> >> >> Hm, I always thought that BPF progs can only be attached to the default >> hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent >> that). >> >> Thus I wonder whether cgroup_bpf_inherit (which is more like >> cgroup_bpf_init in this case) needs to be called no v1 roots at all (and >> with such a change, 04f8ef5643bc could be effectively reverted too). >> >> Or can bpf data be used on v1 hierarchies somehow? > > We relaxed some of the usages (see cgroup_v1v2_get_from_fd()) but cgroup BPF > progs can only be attached to v2. > > Thanks. > So, should commit 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline") be reverted, and should cgroup_bpf_inherit be only called in v2? Have I understood this correctly? Best regards, Ridong
On Thu, Oct 17, 2024 at 08:17:58PM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote: > So, should commit 04f8ef5643bc ("cgroup: Fix memory leak caused by missing > cgroup_bpf_offline") be reverted, and should cgroup_bpf_inherit be only > called in v2? Yes, that should resolve both the original leak and the current NULL ptr dereference without extra rebinding complications. I'd like to keep only cgroup v2 be attachable by BPF programs. cgroup_v1v2_get_from_fd serves the needs of traversing v1 hierarchies, not program attchments. I hope this proposal is workable also from the CC'ed BPF people perspective. Michal
© 2016 - 2024 Red Hat, Inc.