[PATCH v4 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().

Sebastian Andrzej Siewior posted 6 patches 1 year ago
There is a newer version of this series
[PATCH v4 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().
Posted by Sebastian Andrzej Siewior 1 year ago
The readdir operation iterates over all entries and invokes dir_emit()
for every entry passing kernfs_node::name as argument.
Since the name argument can change, and become invalid, the
kernfs_root::kernfs_rwsem lock should not be dropped to prevent renames
during the operation.

The lock drop around dir_emit() has been initially introduced in commit
   1e5289c97bba2 ("sysfs: Cache the last sysfs_dirent to improve readdir scalability v2")

to avoid holding a global lock during a page fault. The lock drop is
wrong since the support of renames and not a big burden since the lock
is no longer global.

Don't re-acquire kernfs_root::kernfs_rwsem while copying the name to the
userpace buffer.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/kernfs/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 458519e416fe7..5a1fea414996e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1868,10 +1868,10 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 		file->private_data = pos;
 		kernfs_get(pos);
 
-		up_read(&root->kernfs_rwsem);
-		if (!dir_emit(ctx, name, len, ino, type))
+		if (!dir_emit(ctx, name, len, ino, type)) {
+			up_read(&root->kernfs_rwsem);
 			return 0;
-		down_read(&root->kernfs_rwsem);
+		}
 	}
 	up_read(&root->kernfs_rwsem);
 	file->private_data = NULL;
-- 
2.47.2
Re: [PATCH v4 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().
Posted by Tejun Heo 1 year ago
Hello,

On Fri, Jan 24, 2025 at 06:46:12PM +0100, Sebastian Andrzej Siewior wrote:
...
> to avoid holding a global lock during a page fault. The lock drop is
> wrong since the support of renames and not a big burden since the lock
> is no longer global.

It's still a pretty big lock. Hopefully, at least name accesses can be
converted to rcu protected ones later?

> Don't re-acquire kernfs_root::kernfs_rwsem while copying the name to the
> userpace buffer.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun
Re: [PATCH v4 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().
Posted by Sebastian Andrzej Siewior 1 year ago
On 2025-01-24 13:15:32 [-1000], Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 24, 2025 at 06:46:12PM +0100, Sebastian Andrzej Siewior wrote:
> ...
> > to avoid holding a global lock during a page fault. The lock drop is
> > wrong since the support of renames and not a big burden since the lock
> > is no longer global.
> 
> It's still a pretty big lock. Hopefully, at least name accesses can be
> converted to rcu protected ones later?

Not sure what you mean by "rcu protected ones". The name is already RCU
protected which is part of the problem :P
Assuming an upper length of name to be around 255 and
kernfs_fop_readdir() to be very early on the chain then we could copy
the name on stack within the RCU section and do this without
::kernfs_rwsem.

> > Don't re-acquire kernfs_root::kernfs_rwsem while copying the name to the
> > userpace buffer.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> Thanks.

Sebastian
Re: [PATCH v4 4/6] kernfs: Don't re-lock kernfs_root::kernfs_rwsem in kernfs_fop_readdir().
Posted by Tejun Heo 1 year ago
On Mon, Jan 27, 2025 at 10:02:00AM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-01-24 13:15:32 [-1000], Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Jan 24, 2025 at 06:46:12PM +0100, Sebastian Andrzej Siewior wrote:
> > ...
> > > to avoid holding a global lock during a page fault. The lock drop is
> > > wrong since the support of renames and not a big burden since the lock
> > > is no longer global.
> > 
> > It's still a pretty big lock. Hopefully, at least name accesses can be
> > converted to rcu protected ones later?
> 
> Not sure what you mean by "rcu protected ones". The name is already RCU
> protected which is part of the problem :P
> Assuming an upper length of name to be around 255 and
> kernfs_fop_readdir() to be very early on the chain then we could copy
> the name on stack within the RCU section and do this without
> ::kernfs_rwsem.

Please ignore me on this one. I'll take a better look next round.

Thanks.

-- 
tejun