[PATCH] fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED

Mateusz Guzik posted 1 patch 1 day, 14 hours ago
fs/namei.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
[PATCH] fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
Posted by Mateusz Guzik 1 day, 14 hours ago
Otherwise the slowpath can be taken by the caller, defeating the flag.

This regressed after calls to legitimize_links() started being
conditionally elided and stems from the routine always failing
after seeing the flag, regardless if there were any links.

In order to address both the bug and the weird semantics make it illegal
to call legitimize_links() with LOOKUP_CACHED and handle the problem at
the two callsites.

While here another tiny tidyp: ->depth = 0 can be moved into
drop_links().

Fixes: 7c179096e77eca21 ("fs: add predicts based on nd->depth")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 fs/namei.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bf0f66f0e9b9..648567c77716 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -774,6 +774,7 @@ static void drop_links(struct nameidata *nd)
 		do_delayed_call(&last->done);
 		clear_delayed_call(&last->done);
 	}
+	nd->depth = 0;
 }
 
 static void leave_rcu(struct nameidata *nd)
@@ -799,7 +800,7 @@ static void terminate_walk(struct nameidata *nd)
 	} else {
 		leave_rcu(nd);
 	}
-	nd->depth = 0;
+	VFS_BUG_ON(nd->depth);
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
 }
@@ -830,11 +831,9 @@ static inline bool legitimize_path(struct nameidata *nd,
 static bool legitimize_links(struct nameidata *nd)
 {
 	int i;
-	if (unlikely(nd->flags & LOOKUP_CACHED)) {
-		drop_links(nd);
-		nd->depth = 0;
-		return false;
-	}
+
+	VFS_BUG_ON(nd->flags & LOOKUP_CACHED);
+
 	for (i = 0; i < nd->depth; i++) {
 		struct saved *last = nd->stack + i;
 		if (unlikely(!legitimize_path(nd, &last->link, last->seq))) {
@@ -883,6 +882,10 @@ static bool try_to_unlazy(struct nameidata *nd)
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
+	if (unlikely(nd->flags & LOOKUP_CACHED)) {
+		drop_links(nd);
+		goto out1;
+	}
 	if (unlikely(nd->depth && !legitimize_links(nd)))
 		goto out1;
 	if (unlikely(!legitimize_path(nd, &nd->path, nd->seq)))
@@ -918,6 +921,10 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	int res;
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
+	if (unlikely(nd->flags & LOOKUP_CACHED)) {
+		drop_links(nd);
+		goto out2;
+	}
 	if (unlikely(nd->depth && !legitimize_links(nd)))
 		goto out2;
 	res = __legitimize_mnt(nd->path.mnt, nd->m_seq);
-- 
2.48.1
Re: [PATCH] fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
Posted by Chris Mason 1 day, 4 hours ago
On 12/12/25 7:11 AM, Mateusz Guzik wrote:
> Otherwise the slowpath can be taken by the caller, defeating the flag.
> 
> This regressed after calls to legitimize_links() started being
> conditionally elided and stems from the routine always failing
> after seeing the flag, regardless if there were any links.
> 
> In order to address both the bug and the weird semantics make it illegal
> to call legitimize_links() with LOOKUP_CACHED and handle the problem at
> the two callsites.
> 
> While here another tiny tidyp: ->depth = 0 can be moved into
> drop_links().
> 
AI flagged that last ->depth = 0 in drop_links().  I made it list
out the ways it thinks this can happen to make it easier to
call BS if it's wrong, but I think you can judge this a lot faster
than me:

> diff --git a/fs/namei.c b/fs/namei.c
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -774,6 +774,7 @@ static void drop_links(struct nameidata *nd)
>  		do_delayed_call(&last->done);
>  		clear_delayed_call(&last->done);
>  	}
> +	nd->depth = 0;
>  }
>
>  static void leave_rcu(struct nameidata *nd)
> @@ -799,7 +800,7 @@ static void terminate_walk(struct nameidata *nd)
>  	} else {
>  		leave_rcu(nd);
>  	}
> -	nd->depth = 0;
> +	VFS_BUG_ON(nd->depth);
>  	nd->path.mnt = NULL;
>  	nd->path.dentry = NULL;
>  }

Moving nd->depth = 0 into drop_links() appears to break terminate_walk()
in non-RCU mode. The function reads:

    static void terminate_walk(struct nameidata *nd)
    {
        if (unlikely(nd->depth))
            drop_links(nd);              // <-- now sets nd->depth = 0
        if (!(nd->flags & LOOKUP_RCU)) {
            int i;
            path_put(&nd->path);
            for (i = 0; i < nd->depth; i++)   // <-- nd->depth is 0 here
                path_put(&nd->stack[i].link); // <-- loop never executes
            ...
        }
        ...
    }

When terminate_walk() is called in non-RCU mode with nd->depth > 0 (after
following symlinks), the path_put() loop for nd->stack[i].link will run
zero iterations because drop_links() has already zeroed nd->depth. Can
this leak references on the symlink paths stored in nd->stack[]?

Concrete paths where this can occur:

1. Retry after RCU-walk failure (non-RCU from the start):

   filename_lookup()
     -> path_lookupat(&nd, flags | LOOKUP_RCU, path)
     -> returns -ECHILD
     -> path_lookupat(&nd, flags, path)   // retry WITHOUT LOOKUP_RCU
         -> path_init() in non-RCU mode
         -> link_path_walk() follows symlinks
             -> pick_link() calls mntget() and increments nd->depth
         -> terminate_walk()              // nd->depth > 0, refs held

2. Mid-walk transition via try_to_unlazy():

   path_lookupat() in RCU mode
     -> link_path_walk() follows symlinks, nd->depth > 0
         -> pick_link()
             -> atime_needs_update() returns true
             -> try_to_unlazy()
                 -> legitimize_links() takes refs on nd->stack[].link
                 -> leave_rcu() clears LOOKUP_RCU
     -> error occurs later
     -> terminate_walk()                  // nd->depth > 0, refs held

3. Transition via complete_walk():

   path_lookupat() in RCU mode
     -> link_path_walk() follows symlinks, nd->depth > 0
     -> complete_walk()
         -> try_to_unlazy()
             -> legitimize_links() takes refs
             -> leave_rcu()
     -> later check fails (e.g., -ENOTDIR)
     -> terminate_walk()                  // nd->depth > 0, refs held

In all these paths, nd->stack[i].link holds references that should be
released by the path_put() loop in terminate_walk(), but the loop runs
zero iterations because drop_links() has already zeroed nd->depth.

[ ... ]

-chris
Re: [PATCH] fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
Posted by Mateusz Guzik 1 day, 3 hours ago
On Fri, Dec 12, 2025 at 10:26 PM Chris Mason <clm@meta.com> wrote:
>
> On 12/12/25 7:11 AM, Mateusz Guzik wrote:
> > Otherwise the slowpath can be taken by the caller, defeating the flag.
> >
> > This regressed after calls to legitimize_links() started being
> > conditionally elided and stems from the routine always failing
> > after seeing the flag, regardless if there were any links.
> >
> > In order to address both the bug and the weird semantics make it illegal
> > to call legitimize_links() with LOOKUP_CACHED and handle the problem at
> > the two callsites.
> >
> > While here another tiny tidyp: ->depth = 0 can be moved into
> > drop_links().
> >
> AI flagged that last ->depth = 0 in drop_links().  I made it list
> out the ways it thinks this can happen to make it easier to
> call BS if it's wrong, but I think you can judge this a lot faster
> than me:
>
> > diff --git a/fs/namei.c b/fs/namei.c
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -774,6 +774,7 @@ static void drop_links(struct nameidata *nd)
> >               do_delayed_call(&last->done);
> >               clear_delayed_call(&last->done);
> >       }
> > +     nd->depth = 0;
> >  }
> >
> >  static void leave_rcu(struct nameidata *nd)
> > @@ -799,7 +800,7 @@ static void terminate_walk(struct nameidata *nd)
> >       } else {
> >               leave_rcu(nd);
> >       }
> > -     nd->depth = 0;
> > +     VFS_BUG_ON(nd->depth);
> >       nd->path.mnt = NULL;
> >       nd->path.dentry = NULL;
> >  }
>
> Moving nd->depth = 0 into drop_links() appears to break terminate_walk()
> in non-RCU mode. The function reads:
>
>     static void terminate_walk(struct nameidata *nd)
>     {
>         if (unlikely(nd->depth))
>             drop_links(nd);              // <-- now sets nd->depth = 0
>         if (!(nd->flags & LOOKUP_RCU)) {
>             int i;
>             path_put(&nd->path);
>             for (i = 0; i < nd->depth; i++)   // <-- nd->depth is 0 here
>                 path_put(&nd->stack[i].link); // <-- loop never executes
>             ...
>         }
>         ...
>     }
>
> When terminate_walk() is called in non-RCU mode with nd->depth > 0 (after
> following symlinks), the path_put() loop for nd->stack[i].link will run
> zero iterations because drop_links() has already zeroed nd->depth. Can
> this leak references on the symlink paths stored in nd->stack[]?
>
> Concrete paths where this can occur:
>
> 1. Retry after RCU-walk failure (non-RCU from the start):
>
>    filename_lookup()
>      -> path_lookupat(&nd, flags | LOOKUP_RCU, path)
>      -> returns -ECHILD
>      -> path_lookupat(&nd, flags, path)   // retry WITHOUT LOOKUP_RCU
>          -> path_init() in non-RCU mode
>          -> link_path_walk() follows symlinks
>              -> pick_link() calls mntget() and increments nd->depth
>          -> terminate_walk()              // nd->depth > 0, refs held
>
> 2. Mid-walk transition via try_to_unlazy():
>
>    path_lookupat() in RCU mode
>      -> link_path_walk() follows symlinks, nd->depth > 0
>          -> pick_link()
>              -> atime_needs_update() returns true
>              -> try_to_unlazy()
>                  -> legitimize_links() takes refs on nd->stack[].link
>                  -> leave_rcu() clears LOOKUP_RCU
>      -> error occurs later
>      -> terminate_walk()                  // nd->depth > 0, refs held
>
> 3. Transition via complete_walk():
>
>    path_lookupat() in RCU mode
>      -> link_path_walk() follows symlinks, nd->depth > 0
>      -> complete_walk()
>          -> try_to_unlazy()
>              -> legitimize_links() takes refs
>              -> leave_rcu()
>      -> later check fails (e.g., -ENOTDIR)
>      -> terminate_walk()                  // nd->depth > 0, refs held
>
> In all these paths, nd->stack[i].link holds references that should be
> released by the path_put() loop in terminate_walk(), but the loop runs
> zero iterations because drop_links() has already zeroed nd->depth.
>
> [ ... ]
>
> -chris

Quite verbose. :)

I thought I patched all of these.

But ye, the value needs to get read prior.

Again weird semantics, why is any work on symlink stack being done
after drop_links got called.

What this really wants is some form of variable poisoning so that
instead of zeroing depth I mark it as indeterminate and have kasan
trap on access.

Will send a v2 later, thanks.