[PATCH] device_cgroup: remove branch hint after code refactor

Breno Leitao posted 1 patch 1 month ago
include/linux/device_cgroup.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] device_cgroup: remove branch hint after code refactor
Posted by Breno Leitao 1 month ago
commit 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the
common case in devcgroup_inode_permission()") reordered the checks in
devcgroup_inode_permission() to check the inode mode before checking
i_rdev, for better cache behavior.

However, the likely() annotation on the i_rdev check was not updated
to reflect the new code flow. Originally, when i_rdev was checked
first, likely(!inode->i_rdev) made sense because most inodes were(?)
regular files/directories, thus i_rdev == 0.

After the reorder, by the time we reach the i_rdev check, we have
already confirmed the inode IS a block or character device. Block and
character special files are precisely defined by having a device number
(i_rdev), so !inode->i_rdev is now the rare edge case, not the common
case.

Branch profiling confirmed this is 100% mispredicted:

  correct incorrect  %    Function                      File              Line
  ------- ---------  -    --------                      ----              ----
        0   2631904 100   devcgroup_inode_permission    device_cgroup.h   24

Remove likely() to avoid giving the wrong hint to the CPU.

Fixes: 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/device_cgroup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 0864773a57e8..822085bc2d20 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 	if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
 		return 0;
 
-	if (likely(!inode->i_rdev))
+	if (!inode->i_rdev)
 		return 0;
 
 	if (S_ISBLK(inode->i_mode))

---
base-commit: f0b9d8eb98dfee8d00419aa07543bdc2c1a44fb1
change-id: 20260107-likely_device-20dae82d502d

Best regards,
--  
Breno Leitao <leitao@debian.org>
Re: [PATCH] device_cgroup: remove branch hint after code refactor
Posted by Christian Brauner 3 weeks, 5 days ago
On Wed, 07 Jan 2026 06:06:36 -0800, Breno Leitao wrote:
> commit 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the
> common case in devcgroup_inode_permission()") reordered the checks in
> devcgroup_inode_permission() to check the inode mode before checking
> i_rdev, for better cache behavior.
> 
> However, the likely() annotation on the i_rdev check was not updated
> to reflect the new code flow. Originally, when i_rdev was checked
> first, likely(!inode->i_rdev) made sense because most inodes were(?)
> regular files/directories, thus i_rdev == 0.
> 
> [...]

Applied to the vfs-7.0.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.0.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.0.misc

[1/1] device_cgroup: remove branch hint after code refactor
      https://git.kernel.org/vfs/vfs/c/2a255acce2e5
Re: [PATCH] device_cgroup: remove branch hint after code refactor
Posted by Mateusz Guzik 1 month ago
On Wed, Jan 7, 2026 at 3:06 PM Breno Leitao <leitao@debian.org> wrote:
>
> commit 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the
> common case in devcgroup_inode_permission()") reordered the checks in
> devcgroup_inode_permission() to check the inode mode before checking
> i_rdev, for better cache behavior.
>
> However, the likely() annotation on the i_rdev check was not updated
> to reflect the new code flow. Originally, when i_rdev was checked
> first, likely(!inode->i_rdev) made sense because most inodes were(?)
> regular files/directories, thus i_rdev == 0.
>
> After the reorder, by the time we reach the i_rdev check, we have
> already confirmed the inode IS a block or character device. Block and
> character special files are precisely defined by having a device number
> (i_rdev), so !inode->i_rdev is now the rare edge case, not the common
> case.
>
> Branch profiling confirmed this is 100% mispredicted:
>
>   correct incorrect  %    Function                      File              Line
>   ------- ---------  -    --------                      ----              ----
>         0   2631904 100   devcgroup_inode_permission    device_cgroup.h   24
>
> Remove likely() to avoid giving the wrong hint to the CPU.
>
> Fixes: 4ef4ac360101 ("device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/linux/device_cgroup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> index 0864773a57e8..822085bc2d20 100644
> --- a/include/linux/device_cgroup.h
> +++ b/include/linux/device_cgroup.h
> @@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
>         if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
>                 return 0;
>
> -       if (likely(!inode->i_rdev))
> +       if (!inode->i_rdev)
>                 return 0;
>

The branch was left there because I could not be bothered to analyze
whether it can be straight up eleminated with the new checks in place.

A quick look at init_special_inode suggests it is an invariant rdev is
there in this case.

So for the time being I would replace likely with WARN_ON_ONCE . Might
be even a good candidate for the pending release.
Re: [PATCH] device_cgroup: remove branch hint after code refactor
Posted by Breno Leitao 1 month ago
Hello Mateusz,

On Wed, Jan 07, 2026 at 03:17:40PM +0100, Mateusz Guzik wrote:
> On Wed, Jan 7, 2026 at 3:06 PM Breno Leitao <leitao@debian.org> wrote:
> > diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> > index 0864773a57e8..822085bc2d20 100644
> > --- a/include/linux/device_cgroup.h
> > +++ b/include/linux/device_cgroup.h
> > @@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
> >         if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
> >                 return 0;
> >
> > -       if (likely(!inode->i_rdev))
> > +       if (!inode->i_rdev)
> >                 return 0;
> >
> 
> The branch was left there because I could not be bothered to analyze
> whether it can be straight up eleminated with the new checks in place.
> 
> A quick look at init_special_inode suggests it is an invariant rdev is
> there in this case.
> 
> So for the time being I would replace likely with WARN_ON_ONCE . Might
> be even a good candidate for the pending release.

Oh, in fact that was my first try, but, when I tested it, I found that
it warns in some cases.

	[  126.951410] WARNING: ./include/linux/device_cgroup.h:24 at inode_permission+0x181/0x190, CPU#4: networkd)/1212
	[  126.971659] Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) iTCO_wdt(E) iTCO_vendor_support(E) irqbypass(E) acpi_cpufreq(E) i2c_i801(E) i2c_smbus(E) wmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) evdev(E) button(E) bpf_preload(E) sch_fq_codel(E) xhci_pci(E) xhci_hcd(E) vhost_net(E) tap(E) tun(E) vhost(E) vhost_iotlb(E) mpls_gso(E) mpls_iptunnel(E) mpls_router(E) fou(E) loop(E) efivarfs(E) autofs4(E)
	[  127.097485] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
	[  127.109038] Hardware name: Quanta Delta Lake MP 29F0EMA01D0/Delta Lake-Class1, BIOS F0E_3A21 06/27/2024
	[  127.127889] RIP: 0010:inode_permission (./include/linux/device_cgroup.h:24 fs/namei.c:600)
	[  127.137542] Code: 14 81 e2 ff ff 0f 00 31 ff 3d 00 60 00 00 41 8d 0c 48 40 0f 95 c7 ff c7 e8 5c 68 35 00 85 c0 0f 85 6d ff ff ff e9 29 ff ff ff <0f> 0b e9 22 ff ff ff 0f 1f 84 00 00 00 00 00 41 56 53 49 89 f6 48
	All code
	========
	0:    14 81                    adc    $0x81,%al
	2:    e2 ff                    loop   0x3
	4:    ff 0f                    decl   (%rdi)
	6:    00 31                    add    %dh,(%rcx)
	8:    ff                       (bad)
	9:    3d 00 60 00 00           cmp    $0x6000,%eax
	e:    41 8d 0c 48              lea    (%r8,%rcx,2),%ecx
	12:    40 0f 95 c7              setne  %dil
	16:    ff c7                    inc    %edi
	18:    e8 5c 68 35 00           call   0x356879
	1d:    85 c0                    test   %eax,%eax
	1f:    0f 85 6d ff ff ff        jne    0xffffffffffffff92
	25:    e9 29 ff ff ff           jmp    0xffffffffffffff53
	2a:*    0f 0b                    ud2            <-- trapping instruction
	2c:    e9 22 ff ff ff           jmp    0xffffffffffffff53
	31:    0f 1f 84 00 00 00 00     nopl   0x0(%rax,%rax,1)
	38:    00
	39:    41 56                    push   %r14
	3b:    53                       push   %rbx
	3c:    49 89 f6                 mov    %rsi,%r14
	3f:    48                       rex.W

	Code starting with the faulting instruction
	===========================================
	0:    0f 0b                    ud2
	2:    e9 22 ff ff ff           jmp    0xffffffffffffff29
	7:    0f 1f 84 00 00 00 00     nopl   0x0(%rax,%rax,1)
	e:    00
	f:    41 56                    push   %r14
	11:    53                       push   %rbx
	12:    49 89 f6                 mov    %rsi,%r14
	15:    48                       rex.W
	[  127.175161] RSP: 0018:ffffc9000438fe60 EFLAGS: 00010246
	[  127.185667] RAX: 0000000000002000 RBX: 0000000000000010 RCX: 0000000000006000
	[  127.199992] RDX: 0000000000000000 RSI: ffff88842a9b5418 RDI: ffffffff83e31178
	[  127.214320] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
	[  127.228667] R10: 0000000000000000 R11: ffff88842a9b54b0 R12: ffffffff83e31178
	[  127.243017] R13: ffff88842a9b5418 R14: ffff88842a9b5418 R15: ffff88842a9b5498
	[  127.257360] FS:  00007f86e4bc3900(0000) GS:ffff8890b23fa000(0000) knlGS:0000000000000000
	[  127.273624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  127.285172] CR2: 000055990443e068 CR3: 00000004183b2001 CR4: 00000000007726f0
	[  127.299500] PKRU: 55555554
	[  127.304961] Call Trace:
	[  127.309885]  <TASK>
	[  127.314121]  do_faccessat (fs/open.c:508)
	[  127.321488]  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
	[  127.331973]  __x64_sys_access (fs/open.c:549 fs/open.c:547 fs/open.c:547)
	[  127.339677]  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
	[  127.347038]  ? rcu_is_watching (./include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
	[  127.354775]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
	[  127.364914] RIP: 0033:0x7f86e42ff21b


Lines against commit f0b9d8eb98df  ("Merge tag 'nfsd-6.19-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux"), plus something like:


diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 822085bc2d20..c2b933a318f8 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
        if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
                return 0;

-       if (likely(!inode->i_rdev))
+       if (WARN_ON_ONCE(!inode->i_rdev))
                return 0;

        if (S_ISBLK(inode->i_mode))


Re: [PATCH] device_cgroup: remove branch hint after code refactor
Posted by Mateusz Guzik 1 month ago
On Wed, Jan 7, 2026 at 4:17 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Mateusz,
>
> On Wed, Jan 07, 2026 at 03:17:40PM +0100, Mateusz Guzik wrote:
> > On Wed, Jan 7, 2026 at 3:06 PM Breno Leitao <leitao@debian.org> wrote:
> > > diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> > > index 0864773a57e8..822085bc2d20 100644
> > > --- a/include/linux/device_cgroup.h
> > > +++ b/include/linux/device_cgroup.h
> > > @@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
> > >         if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
> > >                 return 0;
> > >
> > > -       if (likely(!inode->i_rdev))
> > > +       if (!inode->i_rdev)
> > >                 return 0;
> > >
> >
> > The branch was left there because I could not be bothered to analyze
> > whether it can be straight up eleminated with the new checks in place.
> >
> > A quick look at init_special_inode suggests it is an invariant rdev is
> > there in this case.
> >
> > So for the time being I would replace likely with WARN_ON_ONCE . Might
> > be even a good candidate for the pending release.
>
> Oh, in fact that was my first try, but, when I tested it, I found that
> it warns in some cases.
>

heh, bummer

in that case:
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>

thanks

>         [  126.951410] WARNING: ./include/linux/device_cgroup.h:24 at inode_permission+0x181/0x190, CPU#4: networkd)/1212
>         [  126.971659] Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) iTCO_wdt(E) iTCO_vendor_support(E) irqbypass(E) acpi_cpufreq(E) i2c_i801(E) i2c_smbus(E) wmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) evdev(E) button(E) bpf_preload(E) sch_fq_codel(E) xhci_pci(E) xhci_hcd(E) vhost_net(E) tap(E) tun(E) vhost(E) vhost_iotlb(E) mpls_gso(E) mpls_iptunnel(E) mpls_router(E) fou(E) loop(E) efivarfs(E) autofs4(E)
>         [  127.097485] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
>         [  127.109038] Hardware name: Quanta Delta Lake MP 29F0EMA01D0/Delta Lake-Class1, BIOS F0E_3A21 06/27/2024
>         [  127.127889] RIP: 0010:inode_permission (./include/linux/device_cgroup.h:24 fs/namei.c:600)
>         [  127.137542] Code: 14 81 e2 ff ff 0f 00 31 ff 3d 00 60 00 00 41 8d 0c 48 40 0f 95 c7 ff c7 e8 5c 68 35 00 85 c0 0f 85 6d ff ff ff e9 29 ff ff ff <0f> 0b e9 22 ff ff ff 0f 1f 84 00 00 00 00 00 41 56 53 49 89 f6 48
>         All code
>         ========
>         0:    14 81                    adc    $0x81,%al
>         2:    e2 ff                    loop   0x3
>         4:    ff 0f                    decl   (%rdi)
>         6:    00 31                    add    %dh,(%rcx)
>         8:    ff                       (bad)
>         9:    3d 00 60 00 00           cmp    $0x6000,%eax
>         e:    41 8d 0c 48              lea    (%r8,%rcx,2),%ecx
>         12:    40 0f 95 c7              setne  %dil
>         16:    ff c7                    inc    %edi
>         18:    e8 5c 68 35 00           call   0x356879
>         1d:    85 c0                    test   %eax,%eax
>         1f:    0f 85 6d ff ff ff        jne    0xffffffffffffff92
>         25:    e9 29 ff ff ff           jmp    0xffffffffffffff53
>         2a:*    0f 0b                    ud2            <-- trapping instruction
>         2c:    e9 22 ff ff ff           jmp    0xffffffffffffff53
>         31:    0f 1f 84 00 00 00 00     nopl   0x0(%rax,%rax,1)
>         38:    00
>         39:    41 56                    push   %r14
>         3b:    53                       push   %rbx
>         3c:    49 89 f6                 mov    %rsi,%r14
>         3f:    48                       rex.W
>
>         Code starting with the faulting instruction
>         ===========================================
>         0:    0f 0b                    ud2
>         2:    e9 22 ff ff ff           jmp    0xffffffffffffff29
>         7:    0f 1f 84 00 00 00 00     nopl   0x0(%rax,%rax,1)
>         e:    00
>         f:    41 56                    push   %r14
>         11:    53                       push   %rbx
>         12:    49 89 f6                 mov    %rsi,%r14
>         15:    48                       rex.W
>         [  127.175161] RSP: 0018:ffffc9000438fe60 EFLAGS: 00010246
>         [  127.185667] RAX: 0000000000002000 RBX: 0000000000000010 RCX: 0000000000006000
>         [  127.199992] RDX: 0000000000000000 RSI: ffff88842a9b5418 RDI: ffffffff83e31178
>         [  127.214320] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
>         [  127.228667] R10: 0000000000000000 R11: ffff88842a9b54b0 R12: ffffffff83e31178
>         [  127.243017] R13: ffff88842a9b5418 R14: ffff88842a9b5418 R15: ffff88842a9b5498
>         [  127.257360] FS:  00007f86e4bc3900(0000) GS:ffff8890b23fa000(0000) knlGS:0000000000000000
>         [  127.273624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>         [  127.285172] CR2: 000055990443e068 CR3: 00000004183b2001 CR4: 00000000007726f0
>         [  127.299500] PKRU: 55555554
>         [  127.304961] Call Trace:
>         [  127.309885]  <TASK>
>         [  127.314121]  do_faccessat (fs/open.c:508)
>         [  127.321488]  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
>         [  127.331973]  __x64_sys_access (fs/open.c:549 fs/open.c:547 fs/open.c:547)
>         [  127.339677]  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
>         [  127.347038]  ? rcu_is_watching (./include/linux/context_tracking.h:128 kernel/rcu/tree.c:751)
>         [  127.354775]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
>         [  127.364914] RIP: 0033:0x7f86e42ff21b
>
>
> Lines against commit f0b9d8eb98df  ("Merge tag 'nfsd-6.19-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux"), plus something like:
>
>
> diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> index 822085bc2d20..c2b933a318f8 100644
> --- a/include/linux/device_cgroup.h
> +++ b/include/linux/device_cgroup.h
> @@ -21,7 +21,7 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
>         if (likely(!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode)))
>                 return 0;
>
> -       if (likely(!inode->i_rdev))
> +       if (WARN_ON_ONCE(!inode->i_rdev))
>                 return 0;
>
>         if (S_ISBLK(inode->i_mode))
>
>