fs/kernfs/file.c | 2 +- kernel/cgroup/cgroup.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
From: Chen Ridong <chenridong@huawei.com>
A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
Stall Information) monitoring mechanism:
BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
psi_trigger_poll+0x3c/0x140
cgroup_pressure_poll+0x70/0xa0
cgroup_file_poll+0x8c/0x100
kernfs_fop_poll+0x11c/0x1c0
ep_item_poll.isra.0+0x188/0x2c0
Allocated by task 1:
cgroup_file_open+0x88/0x388
kernfs_fop_open+0x73c/0xaf0
do_dentry_open+0x5fc/0x1200
vfs_open+0xa0/0x3f0
do_open+0x7e8/0xd08
path_openat+0x2fc/0x6b0
do_filp_open+0x174/0x368
Freed by task 8462:
cgroup_file_release+0x130/0x1f8
kernfs_drain_open_files+0x17c/0x440
kernfs_drain+0x2dc/0x360
kernfs_show+0x1b8/0x288
cgroup_file_show+0x150/0x268
cgroup_pressure_write+0x1dc/0x340
cgroup_file_write+0x274/0x548
Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure
The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
- Releases PSI triggers via cgroup_file_release()
- Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
epolling disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true echo 0 > cgroup.pressure
... cgroup_file_show
kernfs_show
// inactive kn
kernfs_drain_open_files
cft->release(of);
kfree(ctx);
...
kernfs_get_active = false
echo 1 > cgroup.pressure
kernfs_show
kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll
// UAF
...
end: close(fd)
Fix this by adding released flag check in kernfs_fop_poll(), which is
treated as kn is inactive.
Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
fs/kernfs/file.c | 2 +-
kernel/cgroup/cgroup.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index a6c692cac616..d5d01f0b9392 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
__poll_t ret;
- if (!kernfs_get_active(kn))
+ if (of->released || !kernfs_get_active(kn))
return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
if (kn->attr.ops->poll)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..d8b82afed181 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4159,6 +4159,7 @@ static void cgroup_file_release(struct kernfs_open_file *of)
cft->release(of);
put_cgroup_ns(ctx->ns);
kfree(ctx);
+ of->priv = NULL;
}
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
--
2.34.1
On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: > From: Chen Ridong <chenridong@huawei.com> > > A use-after-free (UAF) vulnerability was identified in the PSI (Pressure > Stall Information) monitoring mechanism: > > BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 > Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 > > psi_trigger_poll+0x3c/0x140 > cgroup_pressure_poll+0x70/0xa0 > cgroup_file_poll+0x8c/0x100 > kernfs_fop_poll+0x11c/0x1c0 > ep_item_poll.isra.0+0x188/0x2c0 > > Allocated by task 1: > cgroup_file_open+0x88/0x388 > kernfs_fop_open+0x73c/0xaf0 > do_dentry_open+0x5fc/0x1200 > vfs_open+0xa0/0x3f0 > do_open+0x7e8/0xd08 > path_openat+0x2fc/0x6b0 > do_filp_open+0x174/0x368 > > Freed by task 8462: > cgroup_file_release+0x130/0x1f8 > kernfs_drain_open_files+0x17c/0x440 > kernfs_drain+0x2dc/0x360 > kernfs_show+0x1b8/0x288 > cgroup_file_show+0x150/0x268 > cgroup_pressure_write+0x1dc/0x340 > cgroup_file_write+0x274/0x548 > > Reproduction Steps: > 1. Open test/cpu.pressure and establish epoll monitoring > 2. Disable monitoring: echo 0 > test/cgroup.pressure > 3. Re-enable monitoring: echo 1 > test/cgroup.pressure > > The race condition occurs because: > 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: > - Releases PSI triggers via cgroup_file_release() > - Frees of->priv through kernfs_drain_open_files() > 2. While epoll still holds reference to the file and continues polling > 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv > > epolling disable/enable cgroup.pressure > fd=open(cpu.pressure) > while(1) > ... > epoll_wait > kernfs_fop_poll > kernfs_get_active = true echo 0 > cgroup.pressure > ... cgroup_file_show > kernfs_show > // inactive kn > kernfs_drain_open_files > cft->release(of); > kfree(ctx); > ... > kernfs_get_active = false > echo 1 > cgroup.pressure > kernfs_show > kernfs_activate_one(kn); > kernfs_fop_poll > kernfs_get_active = true > cgroup_file_poll > psi_trigger_poll > // UAF > ... > end: close(fd) > > Fix this by adding released flag check in kernfs_fop_poll(), which is > treated as kn is inactive. > > Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") > Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> > Signed-off-by: Chen Ridong <chenridong@huawei.com> > --- > fs/kernfs/file.c | 2 +- > kernel/cgroup/cgroup.c | 1 + > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > index a6c692cac616..d5d01f0b9392 100644 > --- a/fs/kernfs/file.c > +++ b/fs/kernfs/file.c > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > __poll_t ret; > > - if (!kernfs_get_active(kn)) > + if (of->released || !kernfs_get_active(kn)) I can see why the cgroup change is needed, but why is this test for released() an issue here? This feels like two different changes/fixes for different problems? Why does testing for released matter at this point in time? thanks, greg k-h
On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote: > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > > index a6c692cac616..d5d01f0b9392 100644 > > --- a/fs/kernfs/file.c > > +++ b/fs/kernfs/file.c > > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > > struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > > __poll_t ret; > > > > - if (!kernfs_get_active(kn)) > > + if (of->released || !kernfs_get_active(kn)) > > I can see why the cgroup change is needed, I don't see it that much. of->priv isn't checked in cgroup code anywhere so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL pointer deref :-/ (Additionally, same zeroing would be needed in error path in cgroup_file_open().) I _think_ the place to cleanup would be in @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, psi->enabled = enable; if (enable) psi_cgroup_restart(psi); + else + psi_trigger_destroy(???); } cgroup_kn_unlock(of->kn); The issue is that cgroup_pressure_write doesn't know all possible triggers to be cancelled. (The fix with of->released would only sanitize effect but not the cause IMO.) HTH, Michal
On 2025/8/15 22:42, Michal Koutný wrote: > On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote: >>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >>> index a6c692cac616..d5d01f0b9392 100644 >>> --- a/fs/kernfs/file.c >>> +++ b/fs/kernfs/file.c >>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >>> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >>> __poll_t ret; >>> >>> - if (!kernfs_get_active(kn)) >>> + if (of->released || !kernfs_get_active(kn)) >> >> I can see why the cgroup change is needed, > > I don't see it that much. of->priv isn't checked in cgroup code anywhere > so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL > pointer deref :-/ (Additionally, same zeroing would be needed in error > path in cgroup_file_open().) > Thank you, Michal, I believe assigning NULL to of->priv should be harmless. This change would make the bug more observable in practice. Without this explicit NULL assignment, the use-after-free (UAF) issue might remain hidden in some cases, particularly when KASAN is disabled. > I _think_ the place to cleanup would be in > @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, > psi->enabled = enable; > if (enable) > psi_cgroup_restart(psi); > + else > + psi_trigger_destroy(???); > } > Could you please provide more details about this modification? Do you mean we need to consider additional cleanup work when disabling cgroup.pressure? The psi_trigger_destroy is invoked as follows: cgroup_file_show kernfs_drain kernfs_drain_open_files kernfs_release_file cgroup_file_release cft->release(of); cgroup_pressure_release psi_trigger_destroy > cgroup_kn_unlock(of->kn); > > The issue is that cgroup_pressure_write doesn't know all possible > triggers to be cancelled. (The fix with of->released would only > sanitize effect but not the cause IMO.) > > HTH, > Michal -- Best regards, Ridong
On 2025/8/15 14:11, Greg KH wrote: > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: >> From: Chen Ridong <chenridong@huawei.com> >> >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >> Stall Information) monitoring mechanism: >> >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >> >> psi_trigger_poll+0x3c/0x140 >> cgroup_pressure_poll+0x70/0xa0 >> cgroup_file_poll+0x8c/0x100 >> kernfs_fop_poll+0x11c/0x1c0 >> ep_item_poll.isra.0+0x188/0x2c0 >> >> Allocated by task 1: >> cgroup_file_open+0x88/0x388 >> kernfs_fop_open+0x73c/0xaf0 >> do_dentry_open+0x5fc/0x1200 >> vfs_open+0xa0/0x3f0 >> do_open+0x7e8/0xd08 >> path_openat+0x2fc/0x6b0 >> do_filp_open+0x174/0x368 >> >> Freed by task 8462: >> cgroup_file_release+0x130/0x1f8 >> kernfs_drain_open_files+0x17c/0x440 >> kernfs_drain+0x2dc/0x360 >> kernfs_show+0x1b8/0x288 >> cgroup_file_show+0x150/0x268 >> cgroup_pressure_write+0x1dc/0x340 >> cgroup_file_write+0x274/0x548 >> >> Reproduction Steps: >> 1. Open test/cpu.pressure and establish epoll monitoring >> 2. Disable monitoring: echo 0 > test/cgroup.pressure >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >> >> The race condition occurs because: >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >> - Releases PSI triggers via cgroup_file_release() >> - Frees of->priv through kernfs_drain_open_files() >> 2. While epoll still holds reference to the file and continues polling >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >> >> epolling disable/enable cgroup.pressure >> fd=open(cpu.pressure) >> while(1) >> ... >> epoll_wait >> kernfs_fop_poll >> kernfs_get_active = true echo 0 > cgroup.pressure >> ... cgroup_file_show >> kernfs_show >> // inactive kn >> kernfs_drain_open_files >> cft->release(of); >> kfree(ctx); >> ... >> kernfs_get_active = false >> echo 1 > cgroup.pressure >> kernfs_show >> kernfs_activate_one(kn); >> kernfs_fop_poll >> kernfs_get_active = true >> cgroup_file_poll >> psi_trigger_poll >> // UAF >> ... >> end: close(fd) >> >> Fix this by adding released flag check in kernfs_fop_poll(), which is >> treated as kn is inactive. >> >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> >> Signed-off-by: Chen Ridong <chenridong@huawei.com> >> --- >> fs/kernfs/file.c | 2 +- >> kernel/cgroup/cgroup.c | 1 + >> 2 files changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >> index a6c692cac616..d5d01f0b9392 100644 >> --- a/fs/kernfs/file.c >> +++ b/fs/kernfs/file.c >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >> __poll_t ret; >> >> - if (!kernfs_get_active(kn)) >> + if (of->released || !kernfs_get_active(kn)) > > I can see why the cgroup change is needed, but why is this test for > released() an issue here? This feels like two different changes/fixes > for different problems? Why does testing for released matter at this > point in time? > > thanks, > > greg k-h Thank you for your feedback. The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL pointer access problem. If the open file is properly drained(released), it can not safely invoke the poll callback again. Otherwise, it may still lead to either UAF or NULL pointer issues Are you suggesting I should split this into two separate patches? -- Best regards, Ridong
On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote: > > > On 2025/8/15 14:11, Greg KH wrote: > > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: > >> From: Chen Ridong <chenridong@huawei.com> > >> > >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure > >> Stall Information) monitoring mechanism: > >> > >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 > >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 > >> > >> psi_trigger_poll+0x3c/0x140 > >> cgroup_pressure_poll+0x70/0xa0 > >> cgroup_file_poll+0x8c/0x100 > >> kernfs_fop_poll+0x11c/0x1c0 > >> ep_item_poll.isra.0+0x188/0x2c0 > >> > >> Allocated by task 1: > >> cgroup_file_open+0x88/0x388 > >> kernfs_fop_open+0x73c/0xaf0 > >> do_dentry_open+0x5fc/0x1200 > >> vfs_open+0xa0/0x3f0 > >> do_open+0x7e8/0xd08 > >> path_openat+0x2fc/0x6b0 > >> do_filp_open+0x174/0x368 > >> > >> Freed by task 8462: > >> cgroup_file_release+0x130/0x1f8 > >> kernfs_drain_open_files+0x17c/0x440 > >> kernfs_drain+0x2dc/0x360 > >> kernfs_show+0x1b8/0x288 > >> cgroup_file_show+0x150/0x268 > >> cgroup_pressure_write+0x1dc/0x340 > >> cgroup_file_write+0x274/0x548 > >> > >> Reproduction Steps: > >> 1. Open test/cpu.pressure and establish epoll monitoring > >> 2. Disable monitoring: echo 0 > test/cgroup.pressure > >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure > >> > >> The race condition occurs because: > >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: > >> - Releases PSI triggers via cgroup_file_release() > >> - Frees of->priv through kernfs_drain_open_files() > >> 2. While epoll still holds reference to the file and continues polling > >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv > >> > >> epolling disable/enable cgroup.pressure > >> fd=open(cpu.pressure) > >> while(1) > >> ... > >> epoll_wait > >> kernfs_fop_poll > >> kernfs_get_active = true echo 0 > cgroup.pressure > >> ... cgroup_file_show > >> kernfs_show > >> // inactive kn > >> kernfs_drain_open_files > >> cft->release(of); > >> kfree(ctx); > >> ... > >> kernfs_get_active = false > >> echo 1 > cgroup.pressure > >> kernfs_show > >> kernfs_activate_one(kn); > >> kernfs_fop_poll > >> kernfs_get_active = true > >> cgroup_file_poll > >> psi_trigger_poll > >> // UAF > >> ... > >> end: close(fd) > >> > >> Fix this by adding released flag check in kernfs_fop_poll(), which is > >> treated as kn is inactive. > >> > >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") > >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> > >> Signed-off-by: Chen Ridong <chenridong@huawei.com> > >> --- > >> fs/kernfs/file.c | 2 +- > >> kernel/cgroup/cgroup.c | 1 + > >> 2 files changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > >> index a6c692cac616..d5d01f0b9392 100644 > >> --- a/fs/kernfs/file.c > >> +++ b/fs/kernfs/file.c > >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) > >> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); > >> __poll_t ret; > >> > >> - if (!kernfs_get_active(kn)) > >> + if (of->released || !kernfs_get_active(kn)) > > > > I can see why the cgroup change is needed, but why is this test for > > released() an issue here? This feels like two different changes/fixes > > for different problems? Why does testing for released matter at this > > point in time? > > > > thanks, > > > > greg k-h > > Thank you for your feedback. > > The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL > pointer access problem. Where will the null pointer access happen? kernfs_fop_poll() isn't caring about the released file, AND you need to properly lock things when testing that value (hint, what if it changes right after you tested it?) > If the open file is properly drained(released), it can not safely invoke the poll callback again. > Otherwise, it may still lead to either UAF or NULL pointer issues > > Are you suggesting I should split this into two separate patches? This feels like two separate issues for two different things. The PSI change didn't cause the kernfs change to be required right? thanks, greg k-h
On 2025/8/15 14:43, Greg KH wrote: > On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote: >> >> >> On 2025/8/15 14:11, Greg KH wrote: >>> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote: >>>> From: Chen Ridong <chenridong@huawei.com> >>>> >>>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure >>>> Stall Information) monitoring mechanism: >>>> >>>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140 >>>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1 >>>> >>>> psi_trigger_poll+0x3c/0x140 >>>> cgroup_pressure_poll+0x70/0xa0 >>>> cgroup_file_poll+0x8c/0x100 >>>> kernfs_fop_poll+0x11c/0x1c0 >>>> ep_item_poll.isra.0+0x188/0x2c0 >>>> >>>> Allocated by task 1: >>>> cgroup_file_open+0x88/0x388 >>>> kernfs_fop_open+0x73c/0xaf0 >>>> do_dentry_open+0x5fc/0x1200 >>>> vfs_open+0xa0/0x3f0 >>>> do_open+0x7e8/0xd08 >>>> path_openat+0x2fc/0x6b0 >>>> do_filp_open+0x174/0x368 >>>> >>>> Freed by task 8462: >>>> cgroup_file_release+0x130/0x1f8 >>>> kernfs_drain_open_files+0x17c/0x440 >>>> kernfs_drain+0x2dc/0x360 >>>> kernfs_show+0x1b8/0x288 >>>> cgroup_file_show+0x150/0x268 >>>> cgroup_pressure_write+0x1dc/0x340 >>>> cgroup_file_write+0x274/0x548 >>>> >>>> Reproduction Steps: >>>> 1. Open test/cpu.pressure and establish epoll monitoring >>>> 2. Disable monitoring: echo 0 > test/cgroup.pressure >>>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure >>>> >>>> The race condition occurs because: >>>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it: >>>> - Releases PSI triggers via cgroup_file_release() >>>> - Frees of->priv through kernfs_drain_open_files() >>>> 2. While epoll still holds reference to the file and continues polling >>>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv >>>> >>>> epolling disable/enable cgroup.pressure >>>> fd=open(cpu.pressure) >>>> while(1) >>>> ... >>>> epoll_wait >>>> kernfs_fop_poll >>>> kernfs_get_active = true echo 0 > cgroup.pressure >>>> ... cgroup_file_show >>>> kernfs_show >>>> // inactive kn >>>> kernfs_drain_open_files >>>> cft->release(of); >>>> kfree(ctx); >>>> ... >>>> kernfs_get_active = false >>>> echo 1 > cgroup.pressure >>>> kernfs_show >>>> kernfs_activate_one(kn); >>>> kernfs_fop_poll >>>> kernfs_get_active = true >>>> cgroup_file_poll >>>> psi_trigger_poll >>>> // UAF >>>> ... >>>> end: close(fd) >>>> >>>> Fix this by adding released flag check in kernfs_fop_poll(), which is >>>> treated as kn is inactive. >>>> >>>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") >>>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com> >>>> Signed-off-by: Chen Ridong <chenridong@huawei.com> >>>> --- >>>> fs/kernfs/file.c | 2 +- >>>> kernel/cgroup/cgroup.c | 1 + >>>> 2 files changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c >>>> index a6c692cac616..d5d01f0b9392 100644 >>>> --- a/fs/kernfs/file.c >>>> +++ b/fs/kernfs/file.c >>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait) >>>> struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); >>>> __poll_t ret; >>>> >>>> - if (!kernfs_get_active(kn)) >>>> + if (of->released || !kernfs_get_active(kn)) >>> >>> I can see why the cgroup change is needed, but why is this test for >>> released() an issue here? This feels like two different changes/fixes >>> for different problems? Why does testing for released matter at this >>> point in time? >>> >>> thanks, >>> >>> greg k-h >> >> Thank you for your feedback. >> >> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL >> pointer access problem. > > Where will the null pointer access happen? kernfs_fop_poll() isn't > caring about the released file, AND you need to properly lock things > when testing that value (hint, what if it changes right after you tested > it?) > Issue occurs in psi_trigger_poll() during this sequence: fd = open("cpu.pressure") <...> // cgroup.pressure disabled then re-enabled of->released flag gets set to true kernfs_fop_poll() └─ kernfs_get_active() returns true (due to re-enable) <<<-- kernfs changes, check of->released └─ kn->attr.ops->poll() └─ cgroup_file_poll() └─ *ctx = of->priv; // Already released by .release()(cgroup_file_release) └─ psi_trigger_poll() └─ smp_load_acquire(trigger_ptr); // UAF: trigger_ptr == of->priv // NULL dereference after of->priv = NULL patch >> If the open file is properly drained(released), it can not safely invoke the poll callback again. >> Otherwise, it may still lead to either UAF or NULL pointer issues >> >> Are you suggesting I should split this into two separate patches? > > This feels like two separate issues for two different things. The PSI > change didn't cause the kernfs change to be required right? > > thanks, > > greg k-h Correct. The cgroup modifications make the issue more easily observable, while the kernfs changes provide the actual fix. -- Best regards, Ridong
© 2016 - 2025 Red Hat, Inc.