[RFC PATCH v8 00/10] context_tracking,x86: Defer some IPIs until a user->kernel transition

Valentin Schneider posted 10 patches 1 week, 2 days ago
Context
=======

We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a
pure-userspace application get regularly interrupted by IPIs sent from
housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs
leading to various on_each_cpu() calls, e.g.:

  64359.052209596    NetworkManager       0    1405     smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
    smp_call_function_many_cond+0x1
    smp_call_function+0x39
    on_each_cpu+0x2a
    flush_tlb_kernel_range+0x7b
    __purge_vmap_area_lazy+0x70
    _vm_unmap_aliases.part.42+0xdf
    change_page_attr_set_clr+0x16a
    set_memory_ro+0x26
    bpf_int_jit_compile+0x2f9
    bpf_prog_select_runtime+0xc6
    bpf_prepare_filter+0x523
    sk_attach_filter+0x13
    sock_setsockopt+0x92c
    __sys_setsockopt+0x16a
    __x64_sys_setsockopt+0x20
    do_syscall_64+0x87
    entry_SYSCALL_64_after_hwframe+0x65

The heart of this series is the thought that while we cannot remove NOHZ_FULL
CPUs from the list of CPUs targeted by these IPIs, they may not have to execute
the callbacks immediately. Anything that only affects kernelspace can wait
until the next user->kernel transition, provided it can be executed "early
enough" in the entry code.

The original implementation is from Peter [1]. Nicolas then added kernel TLB
invalidation deferral to that [2], and I picked it up from there.

Deferral approach
=================

Previous versions would assign IPIs a "type" and have a mapping of IPI type to
callback, leveraged upon kernel entry via the context_tracking framework.

This version now gets rid of all that, and instead goes with an
"unconditionally run a catch-up sequence at kernel entry" approach - as was
suggested at LPC 2025 [3].

Another point made during LPC25 (sorry I didn't get your name!) was that when
kPTI is in use, the use of global pages is very limited and thus a CR4 write may
not be warranted for a kernel TLB flush. That means the existing CR3 RMW used to
switch between kernel and user page tables can be used as the unconditional TLB
flush, meaning I could get rid of my CR4 dance.

In the same spirit, it turns out that the MOV to CR3 in that RMW is a
serializing instruction:

  SDM vol2 chapter 4.3 - Move to/from control registers:
  ```
  MOV CR* instructions, except for MOV CR8, are serializing instructions.
  ```
That means I don't need to do anything extra on kernel entry to handle deferred
sync_core() IPIs sent from text_poke().
  
So long story short, the CR3 RMW that is executed for every user <-> kernel
transition when kPTI is enabled does everything I need to defer kernel TLB flush
and kernel text update IPIs. 

From that, I've completely nuked the context_tracking deferral faff.
The added x86-specific code is now "just" about having a software signal
to figure out which CR3 a CPU is using - easier said than done, details in
the individual changelogs.
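
To make the sender-side idea concrete, here is a minimal userspace model of such
a software signal. It is only a sketch under stated assumptions, not the code
from the patches: kernel_cr3_loaded borrows the name mentioned in the v7 -> v8
changelog, while NR_CPUS, model_kernel_entry(), model_kernel_exit() and
model_pick_ipi_targets() are hypothetical names, and plain C11 atomics stand in
for the real per-CPU machinery and its memory-ordering rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical, kept small for illustration */

/* Toy stand-in for a per-CPU "kernel CR3 is loaded" software signal. */
static atomic_bool kernel_cr3_loaded[NR_CPUS];

/* Model of kernel entry: publish the signal, then switch page tables. */
static void model_kernel_entry(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	/* Signal first: a remote CPU that reads "true" sends the IPI. */
	atomic_store(&kernel_cr3_loaded[cpu], true);
	/* The real CR3 write would go here; it is what flushes/serializes. */
}

/* Model of kernel exit; the real exit-path ordering is not modelled. */
static void model_kernel_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&kernel_cr3_loaded[cpu], false);
}

/*
 * Pick the CPUs in targets[] that need an immediate IPI. CPUs currently
 * on their user page tables are skipped, since their next kernel entry
 * performs the CR3 write anyway. Returns how many CPUs were written
 * to out[].
 */
static size_t model_pick_ipi_targets(const int *targets, size_t n, int *out)
{
	size_t picked = 0;

	for (size_t i = 0; i < n; i++) {
		int cpu = targets[i];

		assert(cpu >= 0 && cpu < NR_CPUS);
		if (atomic_load(&kernel_cr3_loaded[cpu]))
			out[picked++] = cpu;
	}
	return picked;
}
```

In this model the signal is published before the (modelled) CR3 write, so a
sender that reads "false" knows the target still has that write ahead of it.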

Kernel entry vs execution of the deferred operation
===================================================

This is what I've referred to as the "Danger Zone" during my LPC24 talk [4].

There is a non-zero length of code that is executed upon kernel entry before the
deferred operation can itself be executed (before we start getting into
context_tracking.c proper), i.e.:

  idtentry
    idtentry_body
      error_entry
        SWITCH_TO_KERNEL_CR3

This danger zone used to be much wider in v7 and earlier (from kernel entry all
the way down to ct_kernel_enter_state()). The objtool instrumentation thus now
targets .entry.text rather than .noinstr as a whole.

Show me numbers
===============

Xeon E5-2699 system with SMT off, NOHZ_FULL, 26 isolated CPUs.
RHEL10 userspace.

Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs
and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is:

$ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
                      -R "stacktrace if cpu & CPUS{$ISOL_CPUS}" \
                   -e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \
                   -e "ipi_send_cpu"     -f "cpu & CPUS{$ISOL_CPUS}" \
                   rteval --onlyload --loads-cpulist=$HK_CPUS \
                   --hackbench-runlowmem=True --duration=$DURATION

This only records IPIs sent to isolated CPUs, so any event there is interference
(with a bit of fuzz at the start/end of the workload when spawning the
processes). All tests were done with a duration of 6 hours.

v6.19
o ~6000 IPIs received, so about 230 interfering IPIs per isolated CPU
o One interfering IPI roughly every 1 minute 30 seconds on each isolated CPU

v6.19 + patches
o Zilch... With some caveats

  I still get some TLB flush IPIs sent to seemingly still-in-userspace CPUs,
  about one per ~3h for /some/ runs. I haven't seen any in the last cumulative
  24h of testing...

  pcpu_balance_work also sometimes shows up, and isn't covered by the deferral
  faff. Again, sometimes it shows up, sometimes it doesn't and hasn't for a
  while now.
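
As a sanity check, the v6.19 baseline rates can be recomputed from the quoted
totals (~6000 IPIs, 26 isolated CPUs, 6 hours). The sketch below is only
arithmetic on those figures; the helper names are made up for illustration.

```c
#include <assert.h>

/* Interfering IPIs received by one isolated CPU over the whole run. */
static double ipis_per_cpu(double total_ipis, int isolated_cpus)
{
	assert(isolated_cpus > 0);
	return total_ipis / isolated_cpus;
}

/* Mean gap, in seconds, between two interfering IPIs on one CPU. */
static double seconds_between_ipis(double duration_s, double per_cpu)
{
	assert(per_cpu > 0.0);
	return duration_s / per_cpu;
}
```

With ipis_per_cpu(6000, 26) this gives roughly 230.8 IPIs per isolated CPU, and
seconds_between_ipis(6 * 3600, 230.8) gives about 93.6 seconds, in line with the
"~230" and "1 minute 30 seconds" figures.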

Patches
=======

o Patches 1-4 are standalone objtool cleanups.

o Patches 5-6 add infrastructure for annotating static keys that may be used in
  entry code (courtesy of Josh). 

o Patch 7 adds ASM support for static keys.

o Patches 8-10 add the deferral mechanism.

Patches are also available at:
https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v8

Acknowledgements
================

Special thanks to:
o Clark Williams for listening to my ramblings about this and throwing ideas my way
o Josh Poimboeuf for all his help with everything objtool-related
o Dave Hansen for patiently educating me about mm
o All of the folks who attended various (too many?) talks about this and
  provided precious feedback.  

Links
=====

[1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/
[2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip
[3]: https://lpc.events/event/19/contributions/2219/
[4]: https://lpc.events/event/18/contributions/1889/

Revisions
=========

v7 -> v8
++++++++

o Rebased onto v6.19

o Fixed objtool --uaccess validation preventing --noinstr validation of
  unwind hints
o Added more objtool --noinstr warning fixes
o Reduced objtool noinstr static key validation to just .entry.text

o Moved the kernel_cr3_loaded signal update to before writing to CR3

o Ditched context_tracking based deferral
o Ditched the (additional) unconditional TLB flush upon kernel entry

v6 -> v7
++++++++

o Rebased onto latest v6.18-rc5 (6fa9041b7177f)
o Collected Acks (Sean, Frederic)

o Fixed <asm/context_tracking_work.h> include (Shrikanth)
o Fixed ct_set_cpu_work() CT_RCU_WATCHING logic (Frederic)

o Wrote more verbose comments about NOINSTR static keys and calls (Petr)

o [NEW PATCH] Instrumented one more static key: cpu_buf_vm_clear
o [NEW PATCH] added ASM-accessible static key helpers to gate NO_HZ_FULL logic
  in early entry code (Frederic)

v5 -> v6
++++++++

o Rebased onto v6.17
o Small conflict fixes due to the cpu_buf_idle_clear and smp_text_poke() renamings

o Added the TLB flush craziness

v4 -> v5
++++++++

o Rebased onto v6.15-rc3
o Collected Reviewed-by

o Annotated a few more static keys
o Added proper checking of noinstr sections that are in loadable code such as
  KVM early entry (Sean Christopherson)

o Switched to checking for CT_RCU_WATCHING instead of CT_STATE_KERNEL or
  CT_STATE_IDLE, which means deferral is now behaving sanely for IRQ/NMI
  entry from idle (thanks to Frederic!)

o Ditched the vmap TLB flush deferral (for now)  
  

RFCv3 -> v4
+++++++++++

o Rebased onto v6.13-rc6

o New objtool patches from Josh
o More .noinstr static key/call patches
o Static calls now handled as well (again thanks to Josh)

o Fixed clearing the work bits on kernel exit
o Messed with IRQ hitting an idle CPU vs context tracking
o Various comment and naming cleanups

o Made RCU_DYNTICKS_TORTURE depend on !COMPILE_TEST (PeterZ)
o Fixed the CT_STATE_KERNEL check when setting a deferred work (Frederic)
o Cleaned up the __flush_tlb_all() mess thanks to PeterZ

RFCv2 -> RFCv3
++++++++++++++

o Rebased onto v6.12-rc6

o Added objtool documentation for the new warning (Josh)
o Added low-size RCU watching counter to TREE04 torture scenario (Paul)
o Added FORCEFUL jump label and static key types
o Added noinstr-compliant helpers for tlb flush deferral


RFCv1 -> RFCv2
++++++++++++++

o Rebased onto v6.5-rc1

o Updated the trace filter patches (Steven)

o Fixed __ro_after_init keys used in modules (Peter)
o Dropped the extra context_tracking atomic, squashed the new bits in the
  existing .state field (Peter, Frederic)
  
o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an
  rcutorture case for a low-size counter (Paul) 

o Fixed flush_tlb_kernel_range_deferrable() definition

Josh Poimboeuf (1):
  objtool: Add .entry.text validation for static branches

Valentin Schneider (9):
  objtool: Make validate_call() recognize indirect calls to pv_ops[]
  objtool: Flesh out warning related to pv_ops[] calls
  objtool: Always pass a section to validate_unwind_hints()
  x86/retpoline: Make warn_thunk_thunk .noinstr
  sched/isolation: Mark housekeeping_overridden key as __ro_after_init
  x86/jump_label: Add ASM support for static_branch_likely()
  x86/mm/pti: Introduce a kernel/user CR3 software signal
  context_tracking,x86: Defer kernel text patching IPIs when tracking
    CR3 switches
  x86/mm, mm/vmalloc: Defer kernel TLB flush IPIs when tracking CR3
    switches

 arch/x86/Kconfig                        |  14 +++
 arch/x86/entry/calling.h                |  13 +++
 arch/x86/entry/entry.S                  |   3 +-
 arch/x86/entry/syscall_64.c             |   4 +
 arch/x86/include/asm/jump_label.h       |  33 +++++++-
 arch/x86/include/asm/text-patching.h    |   5 ++
 arch/x86/include/asm/tlbflush.h         |   4 +
 arch/x86/kernel/alternative.c           |  34 ++++++--
 arch/x86/kernel/cpu/bugs.c              |   2 +-
 arch/x86/kernel/kprobes/core.c          |   4 +-
 arch/x86/kernel/kprobes/opt.c           |   4 +-
 arch/x86/kernel/module.c                |   2 +-
 arch/x86/mm/pti.c                       |  36 +++++---
 arch/x86/mm/tlb.c                       |  34 ++++++--
 include/linux/jump_label.h              |  11 ++-
 include/linux/objtool.h                 |  16 ++++
 kernel/sched/isolation.c                |   2 +-
 mm/vmalloc.c                            |  30 +++++--
 tools/objtool/Documentation/objtool.txt |  12 +++
 tools/objtool/check.c                   | 108 ++++++++++++++++++++----
 tools/objtool/include/objtool/check.h   |   2 +
 tools/objtool/include/objtool/elf.h     |   3 +-
 tools/objtool/include/objtool/special.h |   1 +
 tools/objtool/special.c                 |  15 +++-
 24 files changed, 331 insertions(+), 61 deletions(-)

--
2.52.0
[syzbot ci] Re: context_tracking,x86: Defer some IPIs until a user->kernel transition
Posted by syzbot ci 1 week, 2 days ago
syzbot ci has tested the following series

[v8] context_tracking,x86: Defer some IPIs until a user->kernel transition
https://lore.kernel.org/all/20260324094801.3092968-1-vschneid@redhat.com
* [RFC PATCH v8 01/10] objtool: Make validate_call() recognize indirect calls to pv_ops[]
* [RFC PATCH v8 02/10] objtool: Flesh out warning related to pv_ops[] calls
* [RFC PATCH v8 03/10] objtool: Always pass a section to validate_unwind_hints()
* [RFC PATCH v8 04/10] x86/retpoline: Make warn_thunk_thunk .noinstr
* [RFC PATCH v8 05/10] sched/isolation: Mark housekeeping_overridden key as __ro_after_init
* [RFC PATCH v8 06/10] objtool: Add .entry.text validation for static branches
* [RFC PATCH v8 07/10] x86/jump_label: Add ASM support for static_branch_likely()
* [RFC PATCH v8 08/10] x86/mm/pti: Introduce a kernel/user CR3 software signal
* [RFC PATCH v8 09/10] context_tracking,x86: Defer kernel text patching IPIs when tracking CR3 switches
* [RFC PATCH v8 10/10] x86/mm, mm/vmalloc: Defer kernel TLB flush IPIs when tracking CR3 switches

and found the following issues:
* KASAN: slab-out-of-bounds Read in __dynamic_pr_debug
* KASAN: slab-use-after-free Read in __dynamic_dev_dbg

Full report is available here:
https://ci.syzbot.org/series/e1f9c661-db83-4882-8439-ab6d1b3ffe07

***

KASAN: slab-out-of-bounds Read in __dynamic_pr_debug

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      f9d6fc9557e68b48253818870d002dc4784cb2f1
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/c63282ed-913a-4673-b0e4-cd21246874b2/config
C repro:   https://ci.syzbot.org/findings/5b3af331-6a29-4205-911e-8924b9c54449/c_repro
syz repro: https://ci.syzbot.org/findings/5b3af331-6a29-4205-911e-8924b9c54449/syz_repro

==================================================================
BUG: KASAN: slab-out-of-bounds in string_nocheck lib/vsprintf.c:654 [inline]
BUG: KASAN: slab-out-of-bounds in string+0x231/0x2b0 lib/vsprintf.c:736
Read of size 1 at addr ffff8881663cbca1 by task syz.0.17/5964

CPU: 0 UID: 0 PID: 5964 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xba/0x230 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 string_nocheck lib/vsprintf.c:654 [inline]
 string+0x231/0x2b0 lib/vsprintf.c:736
 vsnprintf+0x739/0xee0 lib/vsprintf.c:2947
 va_format lib/vsprintf.c:1722 [inline]
 pointer+0x9b7/0x11f0 lib/vsprintf.c:2568
 vsnprintf+0x614/0xee0 lib/vsprintf.c:2951
 vprintk_store+0x371/0xd50 kernel/printk/printk.c:2255
 vprintk_emit+0x192/0x560 kernel/printk/printk.c:2402
 _printk+0xdd/0x130 kernel/printk/printk.c:2451
 __dynamic_pr_debug+0x1a2/0x260 lib/dynamic_debug.c:879
 nfc_llcp_wks_sap net/nfc/llcp_core.c:344 [inline]
 nfc_llcp_get_sdp_ssap+0x3a5/0x440 net/nfc/llcp_core.c:420
 llcp_sock_bind+0x3d6/0x780 net/nfc/llcp_sock.c:114
 __sys_bind_socket net/socket.c:1874 [inline]
 __sys_bind+0x2e3/0x410 net/socket.c:1905
 __do_sys_bind net/socket.c:1910 [inline]
 __se_sys_bind net/socket.c:1908 [inline]
 __x64_sys_bind+0x7a/0x90 net/socket.c:1908
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf40 arch/x86/entry/syscall_64.c:98
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5158f9c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe7ca57bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007f5159215fa0 RCX: 00007f5158f9c799
RDX: 0000000000000060 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007f5159032c99 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5159215fac R14: 00007f5159215fa0 R15: 00007f5159215fa0
 </TASK>

Allocated by task 5964:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5657 [inline]
 __kmalloc_node_track_caller_noprof+0x558/0x7f0 mm/slub.c:5768
 kmemdup_noprof+0x2b/0x70 mm/util.c:138
 kmemdup_noprof include/linux/fortify-string.h:765 [inline]
 llcp_sock_bind+0x392/0x780 net/nfc/llcp_sock.c:107
 __sys_bind_socket net/socket.c:1874 [inline]
 __sys_bind+0x2e3/0x410 net/socket.c:1905
 __do_sys_bind net/socket.c:1910 [inline]
 __se_sys_bind net/socket.c:1908 [inline]
 __x64_sys_bind+0x7a/0x90 net/socket.c:1908
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf40 arch/x86/entry/syscall_64.c:98
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8881663cbca0
 which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes to the right of
 allocated 1-byte region [ffff8881663cbca0, ffff8881663cbca1)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881663cb7e0 pfn:0x1663cb
anon flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff888100041500 0000000000000000 dead000000000001
raw: ffff8881663cb7e0 0000000080800078 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 3460727604, free_ts 3423843770
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x228/0x280 mm/page_alloc.c:1884
 prep_new_page mm/page_alloc.c:1892 [inline]
 get_page_from_freelist+0x24dc/0x2580 mm/page_alloc.c:3945
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5240
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2486
 alloc_slab_page mm/slub.c:3075 [inline]
 allocate_slab+0x86/0x3a0 mm/slub.c:3248
 new_slab mm/slub.c:3302 [inline]
 ___slab_alloc+0xd82/0x1760 mm/slub.c:4656
 __slab_alloc+0x65/0x100 mm/slub.c:4779
 __slab_alloc_node mm/slub.c:4855 [inline]
 slab_alloc_node mm/slub.c:5251 [inline]
 __do_kmalloc_node mm/slub.c:5656 [inline]
 __kmalloc_noprof+0x46c/0x7e0 mm/slub.c:5669
 kmalloc_noprof include/linux/slab.h:961 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 acpi_ns_internalize_name+0x2c9/0x3e0 drivers/acpi/acpica/nsutils.c:331
 acpi_ns_get_node_unlocked+0x186/0x480 drivers/acpi/acpica/nsutils.c:666
 acpi_ns_get_node+0x76/0xc0 drivers/acpi/acpica/nsutils.c:726
 acpi_ns_evaluate+0x283/0x1230 drivers/acpi/acpica/nseval.c:62
 acpi_evaluate_object+0x657/0xd50 drivers/acpi/acpica/nsxfeval.c:354
 acpi_get_physical_device_location+0xa0/0x2d0 drivers/acpi/utils.c:504
 acpi_store_pld_crc drivers/acpi/scan.c:728 [inline]
 acpi_device_add+0x6c4/0x940 drivers/acpi/scan.c:787
 acpi_add_single_object+0x1621/0x1b70 drivers/acpi/scan.c:1910
page last free pid 1 tgid 1 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1433 [inline]
 __free_frozen_pages+0xbf8/0xd70 mm/page_alloc.c:2973
 discard_slab mm/slub.c:3346 [inline]
 __put_partials+0x146/0x170 mm/slub.c:3886
 __slab_free+0x294/0x320 mm/slub.c:5956
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
 kasan_quarantine_remove_cache+0x1ca/0x360 mm/kasan/quarantine.c:364
 kmem_cache_shrink+0xd/0x20 mm/slab_common.c:564
 acpi_os_purge_cache+0x15/0x20 drivers/acpi/osl.c:1605
 acpi_purge_cached_objects+0xd5/0x100 drivers/acpi/acpica/utxface.c:240
 acpi_initialize_objects+0x2e/0xb0 drivers/acpi/acpica/utxfinit.c:250
 acpi_bus_init+0xaf/0x570 drivers/acpi/bus.c:1367
 acpi_init+0xa1/0x1f0 drivers/acpi/bus.c:1456
 do_one_initcall+0x250/0x840 init/main.c:1378
 do_initcall_level+0x104/0x190 init/main.c:1440
 do_initcalls+0x59/0xa0 init/main.c:1456
 kernel_init_freeable+0x2a6/0x3d0 init/main.c:1688
 kernel_init+0x1d/0x1d0 init/main.c:1578

Memory state around the buggy address:
 ffff8881663cbb80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
 ffff8881663cbc00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
>ffff8881663cbc80: fa fc fc fc 01 fc fc fc 00 fc fc fc fa fc fc fc
                               ^
 ffff8881663cbd00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
 ffff8881663cbd80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
==================================================================


***

KASAN: slab-use-after-free Read in __dynamic_dev_dbg

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      f9d6fc9557e68b48253818870d002dc4784cb2f1
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/c63282ed-913a-4673-b0e4-cd21246874b2/config
syz repro: https://ci.syzbot.org/findings/a49a383b-f0f8-498a-9415-6a927fc7d4b7/syz_repro

==================================================================
BUG: KASAN: slab-use-after-free in dev_driver_string+0x35/0xd0 drivers/base/core.c:2406
Read of size 8 at addr ffff8881137960e0 by task syz.0.17/6043

CPU: 1 UID: 0 PID: 6043 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xba/0x230 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 dev_driver_string+0x35/0xd0 drivers/base/core.c:2406
 __dynamic_dev_dbg+0x1ae/0x2e0 lib/dynamic_debug.c:906
 display_close+0x1f9/0x240 drivers/media/rc/imon.c:576
 __fput+0x44f/0xa70 fs/file_table.c:469
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0x69b/0x2310 kernel/exit.c:971
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1112
 get_signal+0x1284/0x1330 kernel/signal.c:3034
 arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
 exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:75
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
 do_syscall_64+0x2b7/0xf40 arch/x86/entry/syscall_64.c:104
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb43639c799
Code: Unable to access opcode bytes at 0x7fb43639c76f.
RSP: 002b:00007fb4372890e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007fb436616188 RCX: 00007fb43639c799
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fb436616188
RBP: 00007fb436616180 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb436616218 R14: 00007ffeece49910 R15: 00007ffeece499f8
 </TASK>

Allocated by task 5990:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x3d1/0x6e0 mm/slub.c:5780
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 usb_set_configuration+0x3c9/0x2110 drivers/usb/core/message.c:2037
 usb_generic_driver_probe+0x8d/0x150 drivers/usb/core/generic.c:250
 usb_probe_device+0x1c4/0x3b0 drivers/usb/core/driver.c:291
 call_driver_probe drivers/base/dd.c:-1 [inline]
 really_probe+0x267/0xaf0 drivers/base/dd.c:661
 __driver_probe_device+0x18c/0x320 drivers/base/dd.c:803
 driver_probe_device+0x4f/0x240 drivers/base/dd.c:833
 __device_attach_driver+0x279/0x430 drivers/base/dd.c:961
 bus_for_each_drv+0x258/0x2f0 drivers/base/bus.c:500
 __device_attach+0x2c5/0x450 drivers/base/dd.c:1033
 device_initial_probe+0xa1/0xd0 drivers/base/dd.c:1088
 bus_probe_device+0x12a/0x220 drivers/base/bus.c:574
 device_add+0x7b6/0xb70 drivers/base/core.c:3689
 usb_new_device+0xa08/0x16f0 drivers/usb/core/hub.c:2695
 hub_port_connect drivers/usb/core/hub.c:5567 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5707 [inline]
 port_event drivers/usb/core/hub.c:5871 [inline]
 hub_event+0x2a1c/0x4f30 drivers/usb/core/hub.c:5953
 process_one_work kernel/workqueue.c:3257 [inline]
 process_scheduled_works+0xaec/0x17a0 kernel/workqueue.c:3340
 worker_thread+0xda6/0x1360 kernel/workqueue.c:3421
 kthread+0x726/0x8b0 kernel/kthread.c:463
 ret_from_fork+0x51b/0xa40 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Freed by task 9:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2540 [inline]
 slab_free mm/slub.c:6674 [inline]
 kfree+0x1be/0x650 mm/slub.c:6886
 device_release+0x9e/0x1d0 drivers/base/core.c:-1
 kobject_cleanup lib/kobject.c:689 [inline]
 kobject_release lib/kobject.c:720 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x228/0x560 lib/kobject.c:737
 usb_disable_device+0x611/0x8d0 drivers/usb/core/message.c:1425
 usb_disconnect+0x32f/0x990 drivers/usb/core/hub.c:2345
 hub_port_connect drivers/usb/core/hub.c:5407 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5707 [inline]
 port_event drivers/usb/core/hub.c:5871 [inline]
 hub_event+0x1cc9/0x4f30 drivers/usb/core/hub.c:5953
 process_one_work kernel/workqueue.c:3257 [inline]
 process_scheduled_works+0xaec/0x17a0 kernel/workqueue.c:3340
 worker_thread+0xda6/0x1360 kernel/workqueue.c:3421
 kthread+0x726/0x8b0 kernel/kthread.c:463
 ret_from_fork+0x51b/0xa40 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff888113796000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 224 bytes inside of
 freed 2048-byte region [ffff888113796000, ffff888113796800)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113790
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888100042000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 017ff00000000040 ffff888100042000 dead000000000122 0000000000000000
head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 017ff00000000003 ffffea00044de401 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 6019, tgid 6019 (kworker/0:5), ts 128207296876, free_ts 125356693648
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x228/0x280 mm/page_alloc.c:1884
 prep_new_page mm/page_alloc.c:1892 [inline]
 get_page_from_freelist+0x24dc/0x2580 mm/page_alloc.c:3945
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5240
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2486
 alloc_slab_page mm/slub.c:3075 [inline]
 allocate_slab+0x86/0x3a0 mm/slub.c:3248
 new_slab mm/slub.c:3302 [inline]
 ___slab_alloc+0xd82/0x1760 mm/slub.c:4656
 __slab_alloc+0x65/0x100 mm/slub.c:4779
 __slab_alloc_node mm/slub.c:4855 [inline]
 slab_alloc_node mm/slub.c:5251 [inline]
 __do_kmalloc_node mm/slub.c:5656 [inline]
 __kmalloc_node_track_caller_noprof+0x5b7/0x7f0 mm/slub.c:5768
 kmalloc_reserve+0x136/0x290 net/core/skbuff.c:608
 __alloc_skb+0x204/0x390 net/core/skbuff.c:690
 alloc_skb include/linux/skbuff.h:1383 [inline]
 mld_newpack+0x14c/0xc90 net/ipv6/mcast.c:1775
 add_grhead+0x5a/0x2a0 net/ipv6/mcast.c:1886
 add_grec+0x1452/0x1740 net/ipv6/mcast.c:2025
 mld_send_cr net/ipv6/mcast.c:2148 [inline]
 mld_ifc_work+0x6e6/0xe70 net/ipv6/mcast.c:2693
 process_one_work kernel/workqueue.c:3257 [inline]
 process_scheduled_works+0xaec/0x17a0 kernel/workqueue.c:3340
 worker_thread+0xda6/0x1360 kernel/workqueue.c:3421
page last free pid 5243 tgid 5243 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1433 [inline]
 __free_frozen_pages+0xbf8/0xd70 mm/page_alloc.c:2973
 discard_slab mm/slub.c:3346 [inline]
 __put_partials+0x146/0x170 mm/slub.c:3886
 __slab_free+0x294/0x320 mm/slub.c:5956
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
 kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4953 [inline]
 slab_alloc_node mm/slub.c:5263 [inline]
 kmem_cache_alloc_node_noprof+0x427/0x6f0 mm/slub.c:5315
 __alloc_skb+0x1d7/0x390 net/core/skbuff.c:679
 alloc_skb include/linux/skbuff.h:1383 [inline]
 alloc_skb_with_frags+0xca/0x890 net/core/skbuff.c:6715
 sock_alloc_send_pskb+0x878/0x990 net/core/sock.c:2995
 unix_dgram_sendmsg+0x460/0x18e0 net/unix/af_unix.c:2130
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 __sys_sendto+0x709/0x7a0 net/socket.c:2206
 __do_sys_sendto net/socket.c:2213 [inline]
 __se_sys_sendto net/socket.c:2209 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2209
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf40 arch/x86/entry/syscall_64.c:98
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff888113795f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888113796000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888113796080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff888113796100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888113796180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.