[PATCH] kernfs: Fix UAF in PSI polling when open file is released

Chen Ridong posted 1 patch 1 month, 2 weeks ago
There is a newer version of this series
fs/kernfs/file.c       | 2 +-
kernel/cgroup/cgroup.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
[PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Chen Ridong 1 month, 2 weeks ago
From: Chen Ridong <chenridong@huawei.com>

A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
Stall Information) monitoring mechanism:

BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1

psi_trigger_poll+0x3c/0x140
cgroup_pressure_poll+0x70/0xa0
cgroup_file_poll+0x8c/0x100
kernfs_fop_poll+0x11c/0x1c0
ep_item_poll.isra.0+0x188/0x2c0

Allocated by task 1:
cgroup_file_open+0x88/0x388
kernfs_fop_open+0x73c/0xaf0
do_dentry_open+0x5fc/0x1200
vfs_open+0xa0/0x3f0
do_open+0x7e8/0xd08
path_openat+0x2fc/0x6b0
do_filp_open+0x174/0x368

Freed by task 8462:
cgroup_file_release+0x130/0x1f8
kernfs_drain_open_files+0x17c/0x440
kernfs_drain+0x2dc/0x360
kernfs_show+0x1b8/0x288
cgroup_file_show+0x150/0x268
cgroup_pressure_write+0x1dc/0x340
cgroup_file_write+0x274/0x548

Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure

The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv

epolling			disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true	echo 0 > cgroup.pressure
...				cgroup_file_show
				kernfs_show
				// inactive kn
				kernfs_drain_open_files
				cft->release(of);
				kfree(ctx);
				...
kernfs_get_active = false
				echo 1 > cgroup.pressure
				kernfs_show
				kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll
// UAF
...
end: close(fd)

Fix this by adding released flag check in kernfs_fop_poll(), which is
treated as kn is inactive.

Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 fs/kernfs/file.c       | 2 +-
 kernel/cgroup/cgroup.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index a6c692cac616..d5d01f0b9392 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
 	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
 	__poll_t ret;
 
-	if (!kernfs_get_active(kn))
+	if (of->released || !kernfs_get_active(kn))
 		return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
 
 	if (kn->attr.ops->poll)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..d8b82afed181 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4159,6 +4159,7 @@ static void cgroup_file_release(struct kernfs_open_file *of)
 		cft->release(of);
 	put_cgroup_ns(ctx->ns);
 	kfree(ctx);
+	of->priv = NULL;
 }
 
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
-- 
2.34.1
Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Greg KH 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> Stall Information) monitoring mechanism:
> 
> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
> 
> psi_trigger_poll+0x3c/0x140
> cgroup_pressure_poll+0x70/0xa0
> cgroup_file_poll+0x8c/0x100
> kernfs_fop_poll+0x11c/0x1c0
> ep_item_poll.isra.0+0x188/0x2c0
> 
> Allocated by task 1:
> cgroup_file_open+0x88/0x388
> kernfs_fop_open+0x73c/0xaf0
> do_dentry_open+0x5fc/0x1200
> vfs_open+0xa0/0x3f0
> do_open+0x7e8/0xd08
> path_openat+0x2fc/0x6b0
> do_filp_open+0x174/0x368
> 
> Freed by task 8462:
> cgroup_file_release+0x130/0x1f8
> kernfs_drain_open_files+0x17c/0x440
> kernfs_drain+0x2dc/0x360
> kernfs_show+0x1b8/0x288
> cgroup_file_show+0x150/0x268
> cgroup_pressure_write+0x1dc/0x340
> cgroup_file_write+0x274/0x548
> 
> Reproduction Steps:
> 1. Open test/cpu.pressure and establish epoll monitoring
> 2. Disable monitoring: echo 0 > test/cgroup.pressure
> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
> 
> The race condition occurs because:
> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>    - Releases PSI triggers via cgroup_file_release()
>    - Frees of->priv through kernfs_drain_open_files()
> 2. While epoll still holds reference to the file and continues polling
> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
> 
> epolling			disable/enable cgroup.pressure
> fd=open(cpu.pressure)
> while(1)
> ...
> epoll_wait
> kernfs_fop_poll
> kernfs_get_active = true	echo 0 > cgroup.pressure
> ...				cgroup_file_show
> 				kernfs_show
> 				// inactive kn
> 				kernfs_drain_open_files
> 				cft->release(of);
> 				kfree(ctx);
> 				...
> kernfs_get_active = false
> 				echo 1 > cgroup.pressure
> 				kernfs_show
> 				kernfs_activate_one(kn);
> kernfs_fop_poll
> kernfs_get_active = true
> cgroup_file_poll
> psi_trigger_poll
> // UAF
> ...
> end: close(fd)
> 
> Fix this by adding released flag check in kernfs_fop_poll(), which is
> treated as kn is inactive.
> 
> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  fs/kernfs/file.c       | 2 +-
>  kernel/cgroup/cgroup.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index a6c692cac616..d5d01f0b9392 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>  	__poll_t ret;
>  
> -	if (!kernfs_get_active(kn))
> +	if (of->released || !kernfs_get_active(kn))

I can see why the cgroup change is needed, but why is this test for
released() an issue here?  This feels like two different changes/fixes
for different problems?  Why does testing for released matter at this
point in time?

thanks,

greg k-h
Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Michal Koutný 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
> > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > index a6c692cac616..d5d01f0b9392 100644
> > --- a/fs/kernfs/file.c
> > +++ b/fs/kernfs/file.c
> > @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
> >  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
> >  	__poll_t ret;
> >  
> > -	if (!kernfs_get_active(kn))
> > +	if (of->released || !kernfs_get_active(kn))
> 
> I can see why the cgroup change is needed,

I don't see it that much. of->priv isn't checked in cgroup code anywhere
so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL
pointer deref :-/ (Additionally, same zeroing would be needed in error
path in cgroup_file_open().)

I _think_ the place to cleanup would be in
@@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
                psi->enabled = enable;
                if (enable)
                        psi_cgroup_restart(psi);
+               else
+                       psi_trigger_destroy(???);
        }

        cgroup_kn_unlock(of->kn);

The issue is that cgroup_pressure_write doesn't know all possible
triggers to be cancelled. (The fix with of->released would only
sanitize effect but not the cause IMO.)

HTH,
Michal
Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Chen Ridong 1 month, 2 weeks ago

On 2025/8/15 22:42, Michal Koutný wrote:
> On Fri, Aug 15, 2025 at 08:11:39AM +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>>> index a6c692cac616..d5d01f0b9392 100644
>>> --- a/fs/kernfs/file.c
>>> +++ b/fs/kernfs/file.c
>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>>  	__poll_t ret;
>>>  
>>> -	if (!kernfs_get_active(kn))
>>> +	if (of->released || !kernfs_get_active(kn))
>>
>> I can see why the cgroup change is needed,
> 
> I don't see it that much. of->priv isn't checked in cgroup code anywhere
> so it isn't helpful zeroing. As Ridong writes it may trade UaF for NULL
> pointer deref :-/ (Additionally, same zeroing would be needed in error
> path in cgroup_file_open().)
> 

Thank you, Michal,

I believe assigning NULL to of->priv should be harmless. This change would make the bug more
observable in practice. Without this explicit NULL assignment, the use-after-free (UAF) issue might
remain hidden in some cases, particularly when KASAN is disabled.

> I _think_ the place to cleanup would be in
> @@ -3978,6 +3978,8 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
>                 psi->enabled = enable;
>                 if (enable)
>                         psi_cgroup_restart(psi);
> +               else
> +                       psi_trigger_destroy(???);
>         }
> 

Could you please provide more details about this modification? Do you mean we need to consider
additional cleanup work when disabling cgroup.pressure? The psi_trigger_destroy is invoked as
follows:

cgroup_file_show
  kernfs_drain
    kernfs_drain_open_files
      kernfs_release_file
        cgroup_file_release
          cft->release(of);
            cgroup_pressure_release
              psi_trigger_destroy

>         cgroup_kn_unlock(of->kn);
> 
> The issue is that cgroup_pressure_write doesn't know all possible
> triggers to be cancelled. (The fix with of->released would only
> sanitize effect but not the cause IMO.)
> 
> HTH,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Chen Ridong 1 month, 2 weeks ago

On 2025/8/15 14:11, Greg KH wrote:
> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>> Stall Information) monitoring mechanism:
>>
>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>
>> psi_trigger_poll+0x3c/0x140
>> cgroup_pressure_poll+0x70/0xa0
>> cgroup_file_poll+0x8c/0x100
>> kernfs_fop_poll+0x11c/0x1c0
>> ep_item_poll.isra.0+0x188/0x2c0
>>
>> Allocated by task 1:
>> cgroup_file_open+0x88/0x388
>> kernfs_fop_open+0x73c/0xaf0
>> do_dentry_open+0x5fc/0x1200
>> vfs_open+0xa0/0x3f0
>> do_open+0x7e8/0xd08
>> path_openat+0x2fc/0x6b0
>> do_filp_open+0x174/0x368
>>
>> Freed by task 8462:
>> cgroup_file_release+0x130/0x1f8
>> kernfs_drain_open_files+0x17c/0x440
>> kernfs_drain+0x2dc/0x360
>> kernfs_show+0x1b8/0x288
>> cgroup_file_show+0x150/0x268
>> cgroup_pressure_write+0x1dc/0x340
>> cgroup_file_write+0x274/0x548
>>
>> Reproduction Steps:
>> 1. Open test/cpu.pressure and establish epoll monitoring
>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>
>> The race condition occurs because:
>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>    - Releases PSI triggers via cgroup_file_release()
>>    - Frees of->priv through kernfs_drain_open_files()
>> 2. While epoll still holds reference to the file and continues polling
>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>
>> epolling			disable/enable cgroup.pressure
>> fd=open(cpu.pressure)
>> while(1)
>> ...
>> epoll_wait
>> kernfs_fop_poll
>> kernfs_get_active = true	echo 0 > cgroup.pressure
>> ...				cgroup_file_show
>> 				kernfs_show
>> 				// inactive kn
>> 				kernfs_drain_open_files
>> 				cft->release(of);
>> 				kfree(ctx);
>> 				...
>> kernfs_get_active = false
>> 				echo 1 > cgroup.pressure
>> 				kernfs_show
>> 				kernfs_activate_one(kn);
>> kernfs_fop_poll
>> kernfs_get_active = true
>> cgroup_file_poll
>> psi_trigger_poll
>> // UAF
>> ...
>> end: close(fd)
>>
>> Fix this by adding released flag check in kernfs_fop_poll(), which is
>> treated as kn is inactive.
>>
>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  fs/kernfs/file.c       | 2 +-
>>  kernel/cgroup/cgroup.c | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>> index a6c692cac616..d5d01f0b9392 100644
>> --- a/fs/kernfs/file.c
>> +++ b/fs/kernfs/file.c
>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>  	__poll_t ret;
>>  
>> -	if (!kernfs_get_active(kn))
>> +	if (of->released || !kernfs_get_active(kn))
> 
> I can see why the cgroup change is needed, but why is this test for
> released() an issue here?  This feels like two different changes/fixes
> for different problems?  Why does testing for released matter at this
> point in time?
> 
> thanks,
> 
> greg k-h

Thank you for your feedback.

The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
pointer access problem.

If the open file is properly drained(released), it can not safely invoke the poll callback again.
Otherwise, it may still lead to either UAF or NULL pointer issues

Are you suggesting I should split this into two separate patches?

-- 
Best regards,
Ridong
Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Greg KH 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote:
> 
> 
> On 2025/8/15 14:11, Greg KH wrote:
> > On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
> >> Stall Information) monitoring mechanism:
> >>
> >> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
> >> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
> >>
> >> psi_trigger_poll+0x3c/0x140
> >> cgroup_pressure_poll+0x70/0xa0
> >> cgroup_file_poll+0x8c/0x100
> >> kernfs_fop_poll+0x11c/0x1c0
> >> ep_item_poll.isra.0+0x188/0x2c0
> >>
> >> Allocated by task 1:
> >> cgroup_file_open+0x88/0x388
> >> kernfs_fop_open+0x73c/0xaf0
> >> do_dentry_open+0x5fc/0x1200
> >> vfs_open+0xa0/0x3f0
> >> do_open+0x7e8/0xd08
> >> path_openat+0x2fc/0x6b0
> >> do_filp_open+0x174/0x368
> >>
> >> Freed by task 8462:
> >> cgroup_file_release+0x130/0x1f8
> >> kernfs_drain_open_files+0x17c/0x440
> >> kernfs_drain+0x2dc/0x360
> >> kernfs_show+0x1b8/0x288
> >> cgroup_file_show+0x150/0x268
> >> cgroup_pressure_write+0x1dc/0x340
> >> cgroup_file_write+0x274/0x548
> >>
> >> Reproduction Steps:
> >> 1. Open test/cpu.pressure and establish epoll monitoring
> >> 2. Disable monitoring: echo 0 > test/cgroup.pressure
> >> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
> >>
> >> The race condition occurs because:
> >> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
> >>    - Releases PSI triggers via cgroup_file_release()
> >>    - Frees of->priv through kernfs_drain_open_files()
> >> 2. While epoll still holds reference to the file and continues polling
> >> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
> >>
> >> epolling			disable/enable cgroup.pressure
> >> fd=open(cpu.pressure)
> >> while(1)
> >> ...
> >> epoll_wait
> >> kernfs_fop_poll
> >> kernfs_get_active = true	echo 0 > cgroup.pressure
> >> ...				cgroup_file_show
> >> 				kernfs_show
> >> 				// inactive kn
> >> 				kernfs_drain_open_files
> >> 				cft->release(of);
> >> 				kfree(ctx);
> >> 				...
> >> kernfs_get_active = false
> >> 				echo 1 > cgroup.pressure
> >> 				kernfs_show
> >> 				kernfs_activate_one(kn);
> >> kernfs_fop_poll
> >> kernfs_get_active = true
> >> cgroup_file_poll
> >> psi_trigger_poll
> >> // UAF
> >> ...
> >> end: close(fd)
> >>
> >> Fix this by adding released flag check in kernfs_fop_poll(), which is
> >> treated as kn is inactive.
> >>
> >> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
> >> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >> ---
> >>  fs/kernfs/file.c       | 2 +-
> >>  kernel/cgroup/cgroup.c | 1 +
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> >> index a6c692cac616..d5d01f0b9392 100644
> >> --- a/fs/kernfs/file.c
> >> +++ b/fs/kernfs/file.c
> >> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
> >>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
> >>  	__poll_t ret;
> >>  
> >> -	if (!kernfs_get_active(kn))
> >> +	if (of->released || !kernfs_get_active(kn))
> > 
> > I can see why the cgroup change is needed, but why is this test for
> > released() an issue here?  This feels like two different changes/fixes
> > for different problems?  Why does testing for released matter at this
> > point in time?
> > 
> > thanks,
> > 
> > greg k-h
> 
> Thank you for your feedback.
> 
> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
> pointer access problem.

Where will the null pointer access happen?  kernfs_fop_poll() isn't
caring about the released file, AND you need to properly lock things
when testing that value (hint, what if it changes right after you tested
it?)

> If the open file is properly drained(released), it can not safely invoke the poll callback again.
> Otherwise, it may still lead to either UAF or NULL pointer issues
> 
> Are you suggesting I should split this into two separate patches?

This feels like two separate issues for two different things.  The PSI
change didn't cause the kernfs change to be required right?

thanks,

greg k-h
Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
Posted by Chen Ridong 1 month, 2 weeks ago

On 2025/8/15 14:43, Greg KH wrote:
> On Fri, Aug 15, 2025 at 02:22:54PM +0800, Chen Ridong wrote:
>>
>>
>> On 2025/8/15 14:11, Greg KH wrote:
>>> On Fri, Aug 15, 2025 at 01:34:29AM +0000, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>>>> Stall Information) monitoring mechanism:
>>>>
>>>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>>>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>>>
>>>> psi_trigger_poll+0x3c/0x140
>>>> cgroup_pressure_poll+0x70/0xa0
>>>> cgroup_file_poll+0x8c/0x100
>>>> kernfs_fop_poll+0x11c/0x1c0
>>>> ep_item_poll.isra.0+0x188/0x2c0
>>>>
>>>> Allocated by task 1:
>>>> cgroup_file_open+0x88/0x388
>>>> kernfs_fop_open+0x73c/0xaf0
>>>> do_dentry_open+0x5fc/0x1200
>>>> vfs_open+0xa0/0x3f0
>>>> do_open+0x7e8/0xd08
>>>> path_openat+0x2fc/0x6b0
>>>> do_filp_open+0x174/0x368
>>>>
>>>> Freed by task 8462:
>>>> cgroup_file_release+0x130/0x1f8
>>>> kernfs_drain_open_files+0x17c/0x440
>>>> kernfs_drain+0x2dc/0x360
>>>> kernfs_show+0x1b8/0x288
>>>> cgroup_file_show+0x150/0x268
>>>> cgroup_pressure_write+0x1dc/0x340
>>>> cgroup_file_write+0x274/0x548
>>>>
>>>> Reproduction Steps:
>>>> 1. Open test/cpu.pressure and establish epoll monitoring
>>>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>>>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>>>
>>>> The race condition occurs because:
>>>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>>>>    - Releases PSI triggers via cgroup_file_release()
>>>>    - Frees of->priv through kernfs_drain_open_files()
>>>> 2. While epoll still holds reference to the file and continues polling
>>>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>>>
>>>> epolling			disable/enable cgroup.pressure
>>>> fd=open(cpu.pressure)
>>>> while(1)
>>>> ...
>>>> epoll_wait
>>>> kernfs_fop_poll
>>>> kernfs_get_active = true	echo 0 > cgroup.pressure
>>>> ...				cgroup_file_show
>>>> 				kernfs_show
>>>> 				// inactive kn
>>>> 				kernfs_drain_open_files
>>>> 				cft->release(of);
>>>> 				kfree(ctx);
>>>> 				...
>>>> kernfs_get_active = false
>>>> 				echo 1 > cgroup.pressure
>>>> 				kernfs_show
>>>> 				kernfs_activate_one(kn);
>>>> kernfs_fop_poll
>>>> kernfs_get_active = true
>>>> cgroup_file_poll
>>>> psi_trigger_poll
>>>> // UAF
>>>> ...
>>>> end: close(fd)
>>>>
>>>> Fix this by adding released flag check in kernfs_fop_poll(), which is
>>>> treated as kn is inactive.
>>>>
>>>> Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
>>>> Reported-by: Zhang Zhantian <zhangzhaotian@huawei.com>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>>  fs/kernfs/file.c       | 2 +-
>>>>  kernel/cgroup/cgroup.c | 1 +
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>>>> index a6c692cac616..d5d01f0b9392 100644
>>>> --- a/fs/kernfs/file.c
>>>> +++ b/fs/kernfs/file.c
>>>> @@ -852,7 +852,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>>>>  	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
>>>>  	__poll_t ret;
>>>>  
>>>> -	if (!kernfs_get_active(kn))
>>>> +	if (of->released || !kernfs_get_active(kn))
>>>
>>> I can see why the cgroup change is needed, but why is this test for
>>> released() an issue here?  This feels like two different changes/fixes
>>> for different problems?  Why does testing for released matter at this
>>> point in time?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Thank you for your feedback.
>>
>> The cgroup changes can prevent the UAF (Use-After-Free) issue, but they will introduce a NULL
>> pointer access problem.
> 
> Where will the null pointer access happen?  kernfs_fop_poll() isn't
> caring about the released file, AND you need to properly lock things
> when testing that value (hint, what if it changes right after you tested
> it?)
> 


Issue occurs in psi_trigger_poll() during this sequence:

fd = open("cpu.pressure")
<...> // cgroup.pressure disabled then re-enabled
of->released flag gets set to true
kernfs_fop_poll()
└─ kernfs_get_active() returns true (due to re-enable) <<<--  kernfs changes, check of->released
    └─ kn->attr.ops->poll()
        └─ cgroup_file_poll()
	    └─ *ctx = of->priv; // Already released by .release()(cgroup_file_release)
                 └─ psi_trigger_poll()
                     └─ smp_load_acquire(trigger_ptr);
// UAF: trigger_ptr == of->priv
// NULL dereference after of->priv = NULL patch


>> If the open file is properly drained(released), it can not safely invoke the poll callback again.
>> Otherwise, it may still lead to either UAF or NULL pointer issues
>>
>> Are you suggesting I should split this into two separate patches?
> 
> This feels like two separate issues for two different things.  The PSI
> change didn't cause the kernfs change to be required right?
> 
> thanks,
> 
> greg k-h

Correct. The cgroup modifications make the issue more easily observable, while the kernfs changes
provide the actual fix.

-- 
Best regards,
Ridong