fs/namei.c | 3 --- 1 file changed, 3 deletions(-)
The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu()
is redundant because the only caller (handle_mounts) already verifies
d_managed(dentry) before calling this function, so, dentry in
__follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set.
This early-out optimization never fires in practice - but it is marking
as likely().
This was detected with branch profiling, which shows 100% misprediction
in this likely.
Remove the whole if clause instead of removing the likely, given we
know for sure that dentry is not DCACHE_MANAGED_DENTRY.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
fs/namei.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index bf0f66f0e9b9..774a2f5b0a10 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1623,9 +1623,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
struct dentry *dentry = path->dentry;
unsigned int flags = dentry->d_flags;
- if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
- return true;
-
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return false;
---
base-commit: 3609fa95fb0f2c1b099e69e56634edb8fc03f87c
change-id: 20260105-dcache-01aa5d1c9b32
Best regards,
--
Breno Leitao <leitao@debian.org>
On Mon, Jan 05, 2026 at 07:10:27AM -0800, Breno Leitao wrote:
> The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu()
> is redundant because the only caller (handle_mounts) already verifies
> d_managed(dentry) before calling this function, so, dentry in
> __follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set.
>
> This early-out optimization never fires in practice - but it is marking
> as likely().
>
> This was detected with branch profiling, which shows 100% misprediction
> in this likely.
>
> Remove the whole if clause instead of removing the likely, given we
> know for sure that dentry is not DCACHE_MANAGED_DENTRY.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> fs/namei.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index bf0f66f0e9b9..774a2f5b0a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1623,9 +1623,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
> struct dentry *dentry = path->dentry;
> unsigned int flags = dentry->d_flags;
>
> - if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
> - return true;
> -
This makes me very uneasy.
You are seeing 100% misses on this one because you are never racing
against someone mounting and umounting on the dentry as you are doing
the lookup.
As in it is possible that by the time __follow_mount_rcu is invoked,
DCACHE_MANAGED_DENTRY is no longer set and with the check removed the
rest of the routine keeps executing.
AFAICS this turns harmless as is anyway, but I don't think that's safe
to rely on future-wise and more imporantly it is trivially avoidable.
I did not do it at the time because there are no d_ macros which operate
on already read flags and I could not be bothered to add them. In
retrospect a bad call, should have went with it and kept the open-coded
DCACHE_MANAGED_DENTRY check.
something like this:
diff --git a/fs/namei.c b/fs/namei.c
index bf0f66f0e9b9..c6279f8023cf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1618,13 +1618,11 @@ EXPORT_SYMBOL(follow_down);
* Try to skip to top of mountpoint pile in rcuwalk mode. Fail if
* we meet a managed dentry that would need blocking.
*/
-static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
+static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, int flags)
{
struct dentry *dentry = path->dentry;
- unsigned int flags = dentry->d_flags;
- if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
- return true;
+ VFS_BUG_ON(!(flags & DCACHE_MANAGED_DENTRY));
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
return false;
@@ -1672,9 +1670,10 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
path->dentry = dentry;
if (nd->flags & LOOKUP_RCU) {
unsigned int seq = nd->next_seq;
- if (likely(!d_managed(dentry)))
+ unsigned int flags = READ_ONCE(dentry->d_flags);
+ if (likely(!(dentry->d_flags & DCACHE_MANAGED_DENTRY)))
return 0;
- if (likely(__follow_mount_rcu(nd, path)))
+ if (likely(__follow_mount_rcu(nd, path, flags)))
return 0;
// *path and nd->next_seq might've been clobbered
path->mnt = nd->path.mnt;
Hello Mateusz,
On Wed, Jan 07, 2026 at 03:44:27PM +0100, Mateusz Guzik wrote:
> On Mon, Jan 05, 2026 at 07:10:27AM -0800, Breno Leitao wrote:
> > The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu()
> > is redundant because the only caller (handle_mounts) already verifies
> > d_managed(dentry) before calling this function, so, dentry in
> > __follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set.
> >
> > This early-out optimization never fires in practice - but it is marking
> > as likely().
> >
> > This was detected with branch profiling, which shows 100% misprediction
> > in this likely.
> >
> > Remove the whole if clause instead of removing the likely, given we
> > know for sure that dentry is not DCACHE_MANAGED_DENTRY.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > fs/namei.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index bf0f66f0e9b9..774a2f5b0a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1623,9 +1623,6 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
> > struct dentry *dentry = path->dentry;
> > unsigned int flags = dentry->d_flags;
> >
> > - if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
> > - return true;
> > -
>
> This makes me very uneasy.
>
> You are seeing 100% misses on this one because you are never racing
> against someone mounting and umounting on the dentry as you are doing
> the lookup.
I'm still learning VFS internals, so please bear with me.
If I understand correctly, the current code checks the same condition
twice in succession:
handle_mounts() {
if (!d_managed(dentry)) /* dentry->d_flags & DCACHE_MANAGED_DENTRY */
return 0;
__follow_mount_rcu() { /* Something changes here */
unsigned int flags = dentry->d_flags;
if (!(flags & DCACHE_MANAGED_DENTRY))
return
Is your concern that DCACHE_MANAGED_DENTRY could be cleared between
these two checks?
> As in it is possible that by the time __follow_mount_rcu is invoked,
> DCACHE_MANAGED_DENTRY is no longer set and with the check removed the
> rest of the routine keeps executing.
I see, but couldn't the same race occur after the second check as
well?
In other words, whether we have one check or two, DCACHE_MANAGED_DENTRY
could still be unset immediately after the final check, leading to the
same situation.
> AFAICS this turns harmless as is anyway, but I don't think that's safe
> to rely on future-wise and more imporantly it is trivially avoidable.
>
> I did not do it at the time because there are no d_ macros which operate
> on already read flags and I could not be bothered to add them. In
> retrospect a bad call, should have went with it and kept the open-coded
> DCACHE_MANAGED_DENTRY check.
>
> something like this:
>
> diff --git a/fs/namei.c b/fs/namei.c
> index bf0f66f0e9b9..c6279f8023cf 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1618,13 +1618,11 @@ EXPORT_SYMBOL(follow_down);
> * Try to skip to top of mountpoint pile in rcuwalk mode. Fail if
> * we meet a managed dentry that would need blocking.
> */
> -static bool __follow_mount_rcu(struct nameidata *nd, struct path *path)
> +static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, int flags)
> {
> struct dentry *dentry = path->dentry;
> - unsigned int flags = dentry->d_flags;
>
> - if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
> - return true;
> + VFS_BUG_ON(!(flags & DCACHE_MANAGED_DENTRY));
>
> if (unlikely(nd->flags & LOOKUP_NO_XDEV))
> return false;
> @@ -1672,9 +1670,10 @@ static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
> path->dentry = dentry;
> if (nd->flags & LOOKUP_RCU) {
> unsigned int seq = nd->next_seq;
> - if (likely(!d_managed(dentry)))
> + unsigned int flags = READ_ONCE(dentry->d_flags);
> + if (likely(!(dentry->d_flags & DCACHE_MANAGED_DENTRY)))
Minor nit: should this be "if (likely(!(flags & DCACHE_MANAGED_DENTRY)))?"
Otherwise you're reading d_flags twice but passing the stale value to
__follow_mount_rcu().
If I understand your intent correctly, you want to read the flags once
and consistently use that snapshot throughout. Is that right?
Thanks for your review,
--breno
On Wed, Jan 07, 2026 at 08:07:50AM -0800, Breno Leitao wrote:
> If I understand correctly, the current code checks the same condition
> twice in succession:
>
> handle_mounts() {
> if (!d_managed(dentry)) /* dentry->d_flags & DCACHE_MANAGED_DENTRY */
> return 0;
> __follow_mount_rcu() { /* Something changes here */
> unsigned int flags = dentry->d_flags;
> if (!(flags & DCACHE_MANAGED_DENTRY))
> return
>
> Is your concern that DCACHE_MANAGED_DENTRY could be cleared between
> these two checks?
It may, but who cares?
> > As in it is possible that by the time __follow_mount_rcu is invoked,
> > DCACHE_MANAGED_DENTRY is no longer set and with the check removed the
> > rest of the routine keeps executing.
>
> I see, but couldn't the same race occur after the second check as
> well?
>
> In other words, whether we have one check or two, DCACHE_MANAGED_DENTRY
> could still be unset immediately after the final check, leading to the
> same situation.
Folks, this is fundamentally a lockless path; there are places on it where
we care about barriers, but this is not one of those. The damn thing
may cease to be a mountpoint under us, or it may turn into a symlink, device
node or a pumpkin for all we care.
> > + unsigned int flags = READ_ONCE(dentry->d_flags);
> > + if (likely(!(dentry->d_flags & DCACHE_MANAGED_DENTRY)))
>
> Minor nit: should this be "if (likely(!(flags & DCACHE_MANAGED_DENTRY)))?"
> Otherwise you're reading d_flags twice but passing the stale value to
> __follow_mount_rcu().
>
> If I understand your intent correctly, you want to read the flags once
> and consistently use that snapshot throughout. Is that right?
TBH, this (starting with READ_ONCE()) is quite pointless; documentation is
badly needed in the area, but asserts will not replace it.
Note that mount traversal *is* wrapped in mount_lock seqcount check; anything
that used to be a mountpoint, but has ceased to be such will automatically
trigger a full repeat of pathwalk in non-rcu mode. What's more, the same
check is done upon the transition from rcu to non-rcu, with the same effect
of a mismatch.
This READ_ONCE() is pointless, with or without a check in the next line
being done the way it's done. We are explicitly OK with the damn thing
changing under us; the check in the beginning of __follow_mount_rcu()
is only a shortcut for very common case, equivalent to what the loop
below would've done if we didn't have that check there.
IOW, correctness is not an issue here; dealing with races belongs higher
in call chain and "let's read the flags into a local variable" has nothing
to do with that - it's getting a decent code generation on the fast
path and nothing beyond that.
There *ARE* places where barriers can be an issue, but those are about
the order of acquisition of inode vs sequence counter of dentry and
placement of rechecking said sequence counter. Not here...
On Wed, Jan 7, 2026 at 9:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > TBH, this (starting with READ_ONCE()) is quite pointless; documentation is > badly needed in the area, but asserts will not replace it. > > Note that mount traversal *is* wrapped in mount_lock seqcount check; anything > that used to be a mountpoint, but has ceased to be such will automatically > trigger a full repeat of pathwalk in non-rcu mode. What's more, the same > check is done upon the transition from rcu to non-rcu, with the same effect > of a mismatch. > > This READ_ONCE() is pointless, with or without a check in the next line > being done the way it's done. We are explicitly OK with the damn thing > changing under us; the check in the beginning of __follow_mount_rcu() > is only a shortcut for very common case, equivalent to what the loop > below would've done if we didn't have that check there. > As far as I can tell it currently happens to not matter for __follow_mount_rcu, so there is no correctness issue as is. I do claim that basic hygiene in case of lockless code dictates stuff gets read and acted on once if feasible, even if the code at hand would be fine without it. Any outsider (e.g., me) reading this diff has to pause and ponder if it changes anything and this is trivially avoidable by passing the found value into the routine.
On Mon, 05 Jan 2026 07:10:27 -0800, Breno Leitao wrote:
> The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu()
> is redundant because the only caller (handle_mounts) already verifies
> d_managed(dentry) before calling this function, so, dentry in
> __follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set.
>
> This early-out optimization never fires in practice - but it is marking
> as likely().
>
> [...]
Applied to the vfs-7.0.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.0.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.0.misc
[1/1] fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
https://git.kernel.org/vfs/vfs/c/a6b9f5b2f04b
On Mon, Jan 05, 2026 at 07:10:27AM -0800, Breno Leitao wrote: > The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu() > is redundant because the only caller (handle_mounts) already verifies > d_managed(dentry) before calling this function, so, dentry in > __follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set. ... since 9d2a6211a7b9 "fs: tidy up step_into() & friends before inlining", when the check got duplicated into the caller. > This early-out optimization never fires in practice - but it is marking > as likely(). > > This was detected with branch profiling, which shows 100% misprediction > in this likely. > > Remove the whole if clause instead of removing the likely, given we > know for sure that dentry is not DCACHE_MANAGED_DENTRY. AFAICS, that's OK, for the same reason why we didn't need barriers in the original... Acked-by: Al Viro <viro@zeniv.linux.org.uk>
© 2016 - 2026 Red Hat, Inc.