[PATCH 5.10 0/5] kernfs: backport locking and concurrency improvement

Qingfang Deng posted 5 patches 7 months ago
Posted by Qingfang Deng 7 months ago
KCSAN reports concurrent accesses to inode->i_mode:

==================================================================
BUG: KCSAN: data-race in generic_permission / kernfs_iop_permission

write to 0xffffffe001129590 of 2 bytes by task 2477 on cpu 1:
 kernfs_iop_permission+0x72/0x1a0
 link_path_walk.part.0.constprop.0+0x348/0x420
 path_openat+0xee/0x10f0
 do_filp_open+0xaa/0x160
 do_sys_openat2+0x252/0x380
 sys_openat+0x4c/0xa0
 ret_from_syscall+0x0/0x2

read to 0xffffffe001129590 of 2 bytes by task 3902 on cpu 3:
 generic_permission+0x26/0x120
 kernfs_iop_permission+0x150/0x1a0
 link_path_walk.part.0.constprop.0+0x348/0x420
 path_lookupat+0x58/0x280
 filename_lookup+0xae/0x1f0
 user_path_at_empty+0x3a/0x70
 vfs_statx+0x82/0x170
 __do_sys_newfstatat+0x36/0x70
 sys_newfstatat+0x2e/0x50
 ret_from_syscall+0x0/0x2

Reported by Kernel Concurrency Sanitizer on:
CPU: 3 PID: 3902 Comm: ls Not tainted 5.10.104+ #0
==================================================================

kernfs_iop_permission+0x72/0x1a0:

kernfs_refresh_inode at fs/kernfs/inode.c:174
 169 	
 170 	static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 171 	{
 172 		struct kernfs_iattrs *attrs = kn->iattr;
 173 	
>174<		inode->i_mode = kn->mode;
 175 		if (attrs)
 176 			/*
 177 			 * kernfs_node has non-default attributes get them from
 178 			 * persistent copy in kernfs_node.
 179 			 */

(inlined by) kernfs_iop_permission at fs/kernfs/inode.c:285
 280 			return -ECHILD;
 281 	
 282 		kn = inode->i_private;
 283 	
 284 		mutex_lock(&kernfs_mutex);
>285<		kernfs_refresh_inode(kn, inode);
 286 		mutex_unlock(&kernfs_mutex);
 287 	
 288 		return generic_permission(inode, mask);
 289 	}
 290 	

generic_permission+0x26/0x120:

acl_permission_check at fs/namei.c:298
 293 	 * Note that the POSIX ACL check cares about the MAY_NOT_BLOCK bit,
 294 	 * for RCU walking.
 295 	 */
 296 	static int acl_permission_check(struct inode *inode, int mask)
 297 	{
>298<		unsigned int mode = inode->i_mode;
 299 	
 300 		/* Are we the owner? If so, ACL's don't matter */
 301 		if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
 302 			mask &= 7;
 303 			mode >>= 6;

(inlined by) generic_permission at fs/namei.c:353
 348 		int ret;
 349 	
 350 		/*
 351 		 * Do the basic permission checks.
 352 		 */
>353<		ret = acl_permission_check(inode, mask);
 354 		if (ret != -EACCES)
 355 			return ret;
 356 	
 357 		if (S_ISDIR(inode->i_mode)) {
 358 			/* DACs are overridable for directories */

Backport the series from 5.15 to fix the concurrency bug.
https://lore.kernel.org/all/162642752894.63632.5596341704463755308.stgit@web.messagingengine.com

Ian Kent (5):
  kernfs: add a revision to identify directory node changes
  kernfs: use VFS negative dentry caching
  kernfs: switch kernfs to use an rwsem
  kernfs: use i_lock to protect concurrent inode updates
  kernfs: dont call d_splice_alias() under kernfs node lock

 fs/kernfs/dir.c             | 153 ++++++++++++++++++++----------------
 fs/kernfs/file.c            |   4 +-
 fs/kernfs/inode.c           |  26 +++---
 fs/kernfs/kernfs-internal.h |  24 +++++-
 fs/kernfs/mount.c           |  12 +--
 fs/kernfs/symlink.c         |   4 +-
 include/linux/kernfs.h      |   7 +-
 7 files changed, 138 insertions(+), 92 deletions(-)

-- 
2.43.0
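
The report above shows inode->i_mode being written by
kernfs_refresh_inode() under kernfs_mutex and then read by
generic_permission() after the mutex has been dropped. Going by its
title, "kernfs: use i_lock to protect concurrent inode updates" closes
that window with the per-inode lock. Below is a minimal userspace sketch
of that pattern, assuming this reading of the title is correct (the
actual patch may differ). All fake_* names are hypothetical stand-ins,
and a pthread mutex stands in for the i_lock spinlock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kernfs_node: the authoritative mode. */
struct fake_node {
	unsigned short mode;
};

/* Hypothetical stand-in for struct inode: a cached copy plus its lock. */
struct fake_inode {
	pthread_mutex_t i_lock;   /* plays the role of inode->i_lock */
	unsigned short i_mode;    /* plays the role of inode->i_mode */
};

/* Copy the node's mode into the inode. The caller must hold i_lock. */
static void fake_refresh_inode(const struct fake_node *kn,
			       struct fake_inode *inode)
{
	inode->i_mode = kn->mode;
}

/* Simplified owner-only check: are all requested rwx bits granted? */
static bool fake_generic_permission(const struct fake_inode *inode,
				    unsigned int mask)
{
	unsigned int mode = inode->i_mode >> 6;

	return (mask & ~mode & 7) == 0;
}

/*
 * The refresh (write) and the check (read) happen under the same
 * per-inode lock, so no other thread can write i_mode between them.
 */
static bool fake_iop_permission(const struct fake_node *kn,
				struct fake_inode *inode, unsigned int mask)
{
	bool ok;

	pthread_mutex_lock(&inode->i_lock);
	fake_refresh_inode(kn, inode);
	ok = fake_generic_permission(inode, mask);
	pthread_mutex_unlock(&inode->i_lock);

	return ok;
}
```

In the 5.10 code quoted in the report, the equivalent of
fake_generic_permission() runs after the unlock, which is the unlocked
read that KCSAN flags. The sketch treats the node's mode as constant. In
the kernel, updates to kn->mode need their own serialization, which this
example does not model.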
Re: [PATCH 5.10 0/5] kernfs: backport locking and concurrency improvement
Posted by Ian Kent 7 months ago
On 21/5/25 09:53, Qingfang Deng wrote:
> KCSAN reports concurrent accesses to inode->i_mode:
>
> ==================================================================
> BUG: KCSAN: data-race in generic_permission / kernfs_iop_permission
>
> write to 0xffffffe001129590 of 2 bytes by task 2477 on cpu 1:
>   kernfs_iop_permission+0x72/0x1a0
>   link_path_walk.part.0.constprop.0+0x348/0x420
>   path_openat+0xee/0x10f0
>   do_filp_open+0xaa/0x160
>   do_sys_openat2+0x252/0x380
>   sys_openat+0x4c/0xa0
>   ret_from_syscall+0x0/0x2
>
> read to 0xffffffe001129590 of 2 bytes by task 3902 on cpu 3:
>   generic_permission+0x26/0x120
>   kernfs_iop_permission+0x150/0x1a0
>   link_path_walk.part.0.constprop.0+0x348/0x420
>   path_lookupat+0x58/0x280
>   filename_lookup+0xae/0x1f0
>   user_path_at_empty+0x3a/0x70
>   vfs_statx+0x82/0x170
>   __do_sys_newfstatat+0x36/0x70
>   sys_newfstatat+0x2e/0x50
>   ret_from_syscall+0x0/0x2
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 3 PID: 3902 Comm: ls Not tainted 5.10.104+ #0
> ==================================================================

It's been so long since this was merged.

I vaguely remember something along these lines; after analyzing it, I
came to the conclusion that it was a false positive.

Let me think about it for a while and see if I can remember the reasoning.


Ian

>
> kernfs_iop_permission+0x72/0x1a0:
>
> kernfs_refresh_inode at fs/kernfs/inode.c:174
>   169 	
>   170 	static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
>   171 	{
>   172 		struct kernfs_iattrs *attrs = kn->iattr;
>   173 	
>> 174<		inode->i_mode = kn->mode;
>   175 		if (attrs)
>   176 			/*
>   177 			 * kernfs_node has non-default attributes get them from
>   178 			 * persistent copy in kernfs_node.
>   179 			 */
>
> (inlined by) kernfs_iop_permission at fs/kernfs/inode.c:285
>   280 			return -ECHILD;
>   281 	
>   282 		kn = inode->i_private;
>   283 	
>   284 		mutex_lock(&kernfs_mutex);
>> 285<		kernfs_refresh_inode(kn, inode);
>   286 		mutex_unlock(&kernfs_mutex);
>   287 	
>   288 		return generic_permission(inode, mask);
>   289 	}
>   290 	
>
> generic_permission+0x26/0x120:
>
> acl_permission_check at fs/namei.c:298
>   293 	 * Note that the POSIX ACL check cares about the MAY_NOT_BLOCK bit,
>   294 	 * for RCU walking.
>   295 	 */
>   296 	static int acl_permission_check(struct inode *inode, int mask)
>   297 	{
>> 298<		unsigned int mode = inode->i_mode;
>   299 	
>   300 		/* Are we the owner? If so, ACL's don't matter */
>   301 		if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
>   302 			mask &= 7;
>   303 			mode >>= 6;
>
> (inlined by) generic_permission at fs/namei.c:353
>   348 		int ret;
>   349 	
>   350 		/*
>   351 		 * Do the basic permission checks.
>   352 		 */
>> 353<		ret = acl_permission_check(inode, mask);
>   354 		if (ret != -EACCES)
>   355 			return ret;
>   356 	
>   357 		if (S_ISDIR(inode->i_mode)) {
>   358 			/* DACs are overridable for directories */
>
> Backport the series from 5.15 to fix the concurrency bug.
> https://lore.kernel.org/all/162642752894.63632.5596341704463755308.stgit@web.messagingengine.com
>
> Ian Kent (5):
>    kernfs: add a revision to identify directory node changes
>    kernfs: use VFS negative dentry caching
>    kernfs: switch kernfs to use an rwsem
>    kernfs: use i_lock to protect concurrent inode updates
>    kernfs: dont call d_splice_alias() under kernfs node lock
>
>   fs/kernfs/dir.c             | 153 ++++++++++++++++++++----------------
>   fs/kernfs/file.c            |   4 +-
>   fs/kernfs/inode.c           |  26 +++---
>   fs/kernfs/kernfs-internal.h |  24 +++++-
>   fs/kernfs/mount.c           |  12 +--
>   fs/kernfs/symlink.c         |   4 +-
>   include/linux/kernfs.h      |   7 +-
>   7 files changed, 138 insertions(+), 92 deletions(-)
>
Re: [PATCH 5.10 0/5] kernfs: backport locking and concurrency improvement
Posted by Ian Kent 7 months ago
On 21/5/25 13:35, Ian Kent wrote:
> On 21/5/25 09:53, Qingfang Deng wrote:
>> KCSAN reports concurrent accesses to inode->i_mode:
>>
>> ==================================================================
>> BUG: KCSAN: data-race in generic_permission / kernfs_iop_permission
>>
>> write to 0xffffffe001129590 of 2 bytes by task 2477 on cpu 1:
>>   kernfs_iop_permission+0x72/0x1a0
>>   link_path_walk.part.0.constprop.0+0x348/0x420
>>   path_openat+0xee/0x10f0
>>   do_filp_open+0xaa/0x160
>>   do_sys_openat2+0x252/0x380
>>   sys_openat+0x4c/0xa0
>>   ret_from_syscall+0x0/0x2
>>
>> read to 0xffffffe001129590 of 2 bytes by task 3902 on cpu 3:
>>   generic_permission+0x26/0x120
>>   kernfs_iop_permission+0x150/0x1a0
>>   link_path_walk.part.0.constprop.0+0x348/0x420
>>   path_lookupat+0x58/0x280
>>   filename_lookup+0xae/0x1f0
>>   user_path_at_empty+0x3a/0x70
>>   vfs_statx+0x82/0x170
>>   __do_sys_newfstatat+0x36/0x70
>>   sys_newfstatat+0x2e/0x50
>>   ret_from_syscall+0x0/0x2
>>
>> Reported by Kernel Concurrency Sanitizer on:
>> CPU: 3 PID: 3902 Comm: ls Not tainted 5.10.104+ #0
>> ==================================================================
>
> It's been so long since this was merged.
>
> I vaguely remember something along these lines; after analyzing it, I
> came to the conclusion that it was a false positive.
>
> Let me think about it for a while and see if I can remember the reasoning.

OK, IIRC, my thinking was that the mode is actually stored in the node's
->mode, which is always updated while holding the write lock, so copying
the same value from ->mode in multiple concurrent threads wouldn't lead
to corruption of inode->i_mode.
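
The "same value" argument can be illustrated in userspace. This is a
minimal sketch, not kernel code: node_mode and cached_mode are
hypothetical stand-ins for kn->mode and inode->i_mode, and relaxed C11
atomics are used so that the example itself is free of undefined
behavior (the kernel code in the report uses plain accesses, which is
what KCSAN flags).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NWRITERS 4
#define NITERS   100000

/* Stand-in for kn->mode: never changes while the test runs. */
static const unsigned short node_mode = 0100644;

/* Stand-in for inode->i_mode: refreshed concurrently by all writers. */
static _Atomic unsigned short cached_mode;

/* Each writer repeatedly "refreshes" the cached copy with the same value. */
static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITERS; i++)
		atomic_store_explicit(&cached_mode, node_mode,
				      memory_order_relaxed);
	return NULL;
}

/* Returns true if every read taken during the stores saw node_mode. */
static bool same_value_stores_are_benign(void)
{
	pthread_t tids[NWRITERS];
	bool ok = true;
	int n = 0;

	/* Start from a refreshed state, as after the first lookup. */
	atomic_store_explicit(&cached_mode, node_mode, memory_order_relaxed);

	while (n < NWRITERS &&
	       pthread_create(&tids[n], NULL, writer, NULL) == 0)
		n++;

	/* Reader: plays the role of the unlocked read in the report. */
	for (int i = 0; i < NITERS; i++)
		if (atomic_load_explicit(&cached_mode,
					 memory_order_relaxed) != node_mode)
			ok = false;

	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);

	return ok && n == NWRITERS;
}
```

The sketch only covers the case described above, where the node's mode
does not change during the refreshes. It says nothing about a reader
racing with a refresh that follows a real mode change.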


>
>
>
> Ian
>
>>
>> kernfs_iop_permission+0x72/0x1a0:
>>
>> kernfs_refresh_inode at fs/kernfs/inode.c:174
>>   169
>>   170     static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
>>   171     {
>>   172         struct kernfs_iattrs *attrs = kn->iattr;
>>   173
>>> 174<        inode->i_mode = kn->mode;
>>   175         if (attrs)
>>   176             /*
>>   177              * kernfs_node has non-default attributes get them from
>>   178              * persistent copy in kernfs_node.
>>   179              */
>>
>> (inlined by) kernfs_iop_permission at fs/kernfs/inode.c:285
>>   280             return -ECHILD;
>>   281
>>   282         kn = inode->i_private;
>>   283
>>   284         mutex_lock(&kernfs_mutex);
>>> 285<        kernfs_refresh_inode(kn, inode);
>>   286         mutex_unlock(&kernfs_mutex);
>>   287
>>   288         return generic_permission(inode, mask);
>>   289     }
>>   290
>>
>> generic_permission+0x26/0x120:
>>
>> acl_permission_check at fs/namei.c:298
>>   293      * Note that the POSIX ACL check cares about the MAY_NOT_BLOCK bit,
>>   294      * for RCU walking.
>>   295      */
>>   296     static int acl_permission_check(struct inode *inode, int mask)
>>   297     {
>>> 298<        unsigned int mode = inode->i_mode;
>>   299
>>   300         /* Are we the owner? If so, ACL's don't matter */
>>   301         if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
>>   302             mask &= 7;
>>   303             mode >>= 6;
>>
>> (inlined by) generic_permission at fs/namei.c:353
>>   348         int ret;
>>   349
>>   350         /*
>>   351          * Do the basic permission checks.
>>   352          */
>>> 353<        ret = acl_permission_check(inode, mask);
>>   354         if (ret != -EACCES)
>>   355             return ret;
>>   356
>>   357         if (S_ISDIR(inode->i_mode)) {
>>   358             /* DACs are overridable for directories */
>>
>> Backport the series from 5.15 to fix the concurrency bug.
>> https://lore.kernel.org/all/162642752894.63632.5596341704463755308.stgit@web.messagingengine.com 
>>
>>
>> Ian Kent (5):
>>    kernfs: add a revision to identify directory node changes
>>    kernfs: use VFS negative dentry caching
>>    kernfs: switch kernfs to use an rwsem
>>    kernfs: use i_lock to protect concurrent inode updates
>>    kernfs: dont call d_splice_alias() under kernfs node lock
>>
>>   fs/kernfs/dir.c             | 153 ++++++++++++++++++++----------------
>>   fs/kernfs/file.c            |   4 +-
>>   fs/kernfs/inode.c           |  26 +++---
>>   fs/kernfs/kernfs-internal.h |  24 +++++-
>>   fs/kernfs/mount.c           |  12 +--
>>   fs/kernfs/symlink.c         |   4 +-
>>   include/linux/kernfs.h      |   7 +-
>>   7 files changed, 138 insertions(+), 92 deletions(-)
>>
>
Re: [PATCH 5.10 0/5] kernfs: backport locking and concurrency improvement
Posted by Greg Kroah-Hartman 7 months ago
On Wed, May 21, 2025 at 09:53:30AM +0800, Qingfang Deng wrote:
> KCSAN reports concurrent accesses to inode->i_mode:

<snip>

Yes, but can you actually trigger this issue with a real workload?  How
were these tested in the 5.10 tree, and if this is a problem, can you
just move to a newer release instead?

thanks,

greg k-h
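
On the question of triggering the issue: the two stacks in the report
are an openat() and a newfstatat() (from ls) walking the same path. A
synthetic stress helper along those lines might look like the sketch
below. It is not a real workload, the names (hammer_args, open_loop,
stat_loop, hammer_path) are hypothetical, and the choice of path is an
assumption. It only exercises the two syscall paths; a report would
still require a KCSAN-enabled kernel.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

/* Per-thread arguments and result. */
struct hammer_args {
	const char *path;
	int iterations;
	int failures;   /* number of failed syscalls in this thread */
};

/* Repeatedly open and close the path (the openat() side of the report). */
static void *open_loop(void *p)
{
	struct hammer_args *a = p;

	for (int i = 0; i < a->iterations; i++) {
		int fd = open(a->path, O_RDONLY);

		if (fd < 0)
			a->failures++;
		else
			close(fd);
	}
	return NULL;
}

/* Repeatedly stat the path (the newfstatat() side of the report). */
static void *stat_loop(void *p)
{
	struct hammer_args *a = p;
	struct stat st;

	for (int i = 0; i < a->iterations; i++)
		if (stat(a->path, &st) != 0)
			a->failures++;
	return NULL;
}

/*
 * Run both loops concurrently on the same path. Returns the total number
 * of failed syscalls, or -1 if a thread could not be created.
 */
static int hammer_path(const char *path, int iterations)
{
	struct hammer_args oa = { path, iterations, 0 };
	struct hammer_args sa = { path, iterations, 0 };
	pthread_t ot, st;

	if (pthread_create(&ot, NULL, open_loop, &oa) != 0)
		return -1;
	if (pthread_create(&st, NULL, stat_loop, &sa) != 0) {
		pthread_join(ot, NULL);
		return -1;
	}
	pthread_join(ot, NULL);
	pthread_join(st, NULL);

	return oa.failures + sa.failures;
}
```

To aim at kernfs, hammer_path() would be pointed at a path below a
kernfs-backed mount such as sysfs, so that the path walk passes through
kernfs directories.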