From nobody Sun Feb  8 05:40:39 2026
Date: Mon, 12 Jan 2026 01:39:03 -0800 (PST)
In-Reply-To: <696487a4.050a0220.eaf7.0085.GAE@google.com>
Message-ID: <6964c137.050a0220.eaf7.0097.GAE@google.com>
Subject: Forwarded: Private message regarding: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
From: syzbot
To: linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Private message regarding: [syzbot] [mm?]
INFO: rcu detected stall in purge_vmap_node
Author: kapoorarnav43@gmail.com

From: Arnav Kapoor
Date: Sun, 12 Jan 2026 15:30:00 +0000
Subject: [PATCH] mm/kasan: add cond_resched() in shadow page table walk

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported RCU stalls during vmalloc cleanup:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
  task:kworker/0:17 state:R running task
   purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299

When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding via save_stack() ->
unwind_next_frame(), which acquires RCU read locks. Processing large
vmalloc regions can free thousands of shadow pages without yielding,
causing the worker to monopolize the CPU for 10+ seconds, leading to
RCU stalls and potential OOM.

The issue occurs in this call chain:

  purge_vmap_node()
    -> kasan_release_vmalloc_node()
      -> kasan_release_vmalloc()             [for each vmap_area]
        -> __kasan_release_vmalloc()
          -> apply_to_existing_page_range()
            -> kasan_depopulate_vmalloc_pte() [for each PTE]
              -> __free_page()
                -> __reset_page_owner()       [CONFIG_PAGE_OWNER]
                  -> save_stack()
                    -> unwind_next_frame()    [RCU read lock held]

Each shadow page free triggers stack unwinding under the RCU lock. A
single large vmalloc region can have thousands of shadow pages,
creating an unbounded RCU critical section.

The previous attempt to fix this added cond_resched() between
processing each vmap_area in kasan_release_vmalloc_node(), but that is
insufficient because a single vmap_area can still contain many pages.

Fix this by adding cond_resched() in the page table walk callback
kasan_depopulate_vmalloc_pte() after every 32 pages. This ensures
regular scheduling points during large shadow region depopulation
while minimizing overhead for typical cases.
The batch size of 32 is chosen to:
- Amortize cond_resched() overhead (typically ~100ns) over multiple pages
- Limit worst-case non-preemptible time to ~3ms on typical hardware
  (32 pages × ~100μs per stack unwind)
- Match common TLB and cache behavior

Note: We can't use need_resched() alone because under light CPU load,
need_resched() may remain false while RCU grace periods starve. The
batch count provides a guaranteed upper bound.

Reported-by: syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Signed-off-by: Arnav Kapoor
---
 mm/kasan/shadow.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 000000000000..111111111111 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -468,9 +468,23 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep,
 				      unsigned long addr, void *unused)
 {
 	pte_t pte;
 	int none;
+	static DEFINE_PER_CPU(unsigned int, depopulate_batch_count);
+	unsigned int *batch = this_cpu_ptr(&depopulate_batch_count);
 
 	arch_leave_lazy_mmu_mode();
 
+	/*
+	 * With CONFIG_PAGE_OWNER, each page free triggers expensive stack
+	 * unwinding under RCU lock. Yield periodically to prevent RCU stalls
+	 * when processing large vmalloc regions with thousands of shadow pages.
+	 */
+	if (++(*batch) >= 32) {
+		*batch = 0;
+		cond_resched();
+		arch_enter_lazy_mmu_mode();
+	}
+
 	spin_lock(&init_mm.page_table_lock);
 	pte = ptep_get(ptep);
 	none = pte_none(pte);

On Monday, 12 January 2026 at 14:10:07 UTC+5:30 syzbot wrote:

Hello,

syzbot has tested the proposed patch but the reproducer is still
triggering an issue:

INFO: rcu detected stall in unwind_next_frame

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6892/1:b..l P6893/3:b..l P6746/1:b..l
rcu: 	(detected by 1, t=10502 jiffies, g=16737, q=586 ncpus=2)
task:kworker/u8:18 state:R running task stack:24088 pid:6746 tgid:6746 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: kvfree_rcu_reclaim kfree_rcu_monitor
Call Trace:
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190
 irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x62/0x330 kernel/locking/lockdep.c:5872
Code: b4 18 12 83 f8 07 0f 87 a2 02 00 00 89 c0 48 0f a3 05 e2 c1 ee 0e 0f 82 74 02 00 00 8b 35 7a f2 ee 0e 85 f6 0f 85 8d 00 00 00 <48> 8b 44 24 30 65 48 2b 05 f9 b3 18 12 0f 85 ad 02 00 00 48 83 c4
RSP: 0018:ffffc90003fbf5b8 EFLAGS: 00000206
RAX: 0000000000000046 RBX: ffffffff8e3c96a0 RCX: 00000000993b8195
RDX: 0000000000000000 RSI: ffffffff8daa8a1d RDI: ffffffff8bf2b400
RBP: 0000000000000002 R08: 00000000e61a05bb R09: 00000000be61a05b
R10: 0000000000000002 R11: ffff888029058b30 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 rcu_read_lock include/linux/rcupdate.h:867 [inline]
 class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0xd1/0x20b0 arch/x86/kernel/unwind_orc.c:495
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:57
 kasan_save_track+0x14/0x30 mm/kasan/common.c:78
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2540 [inline]
 slab_free_freelist_hook mm/slub.c:2569 [inline]
 slab_free_bulk mm/slub.c:6703 [inline]
 kmem_cache_free_bulk mm/slub.c:7390 [inline]
 kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7369
 kfree_bulk include/linux/slab.h:830 [inline]
 kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523
 kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline]
 kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801
 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

task:sed state:R running task stack:25800 pid:6893 tgid:6893 ppid:6890 task_flags:0x400000 flags:0x00080000
Call Trace:
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047
 preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
 __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
 _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:391 [inline]
 filemap_map_pages+0x1194/0x1e00 mm/filemap.c:3931
 do_fault_around mm/memory.c:5713 [inline]
 do_read_fault mm/memory.c:5746 [inline]
 do_fault+0x9cd/0x1ad0 mm/memory.c:5889
 do_pte_missing mm/memory.c:4401 [inline]
 handle_pte_fault mm/memory.c:6273 [inline]
 __handle_mm_fault+0x1919/0x2bb0 mm/memory.c:6411
 handle_mm_fault+0x3fe/0xad0 mm/memory.c:6580
 do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
 handle_page_fault arch/x86/mm/fault.c:1476 [inline]
 exc_page_fault+0x64/0xc0 arch/x86/mm/fault.c:1532
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7fc6d7657c50
RSP: 002b:00007ffe6008c528 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007fc6d768e490 RCX: 00007ffe6008c560
RDX: 00007fc6d7689d63 RSI: 00007fc6d7689d36 RDI: 00007ffe6008c748
RBP: 0000000000000041 R08: 00007ffe6008c550 R09: 00007ffe6008c558
R10: 0000000000000004 R11: 0000000000000246 R12: 00007ffe6008c748
R13: 00007ffe6008c550 R14: 00007fc6d76ce000 R15: 00005602504c1d98

task:udevd state:R running task stack:28152 pid:6892 tgid:6892 ppid:5186 task_flags:0x400140 flags:0x00080000
Call Trace:
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047
 preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock_irqrestore+0x61/0x80 kernel/locking/spinlock.c:194
 sock_def_readable+0x15b/0x5d0 net/core/sock.c:3611
 unix_dgram_sendmsg+0xcbd/0x1830 net/unix/af_unix.c:2286
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 sock_write_iter+0x566/0x610 net/socket.c:1195
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x7d3/0x11d0 fs/read_write.c:686
 ksys_write+0x1f8/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6502bdf407
RSP: 002b:00007ffc4b535850 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f6502b53880 RCX: 00007f6502bdf407
RDX: 0000000000000000 RSI: 00007ffc4b5358f7 RDI: 000000000000000a
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f6502b536e8
R13: 0000000000000000 R14: 0000000000000000 R15: 000055685a0a1150

rcu: rcu_preempt kthread starved for 10572 jiffies! g16737 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28120 pid:16 tgid:16 ppid:2 task_flags:0x208040 flags:0x00080000
Call Trace:
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 __schedule_loop kernel/sched/core.c:6945 [inline]
 schedule+0xe7/0x3a0 kernel/sched/core.c:6960
 schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x1ea/0xaf0 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x26d/0x380 kernel/rcu/tree.c:2285
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:82
Code: a6 5f 02 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 13 19 12 00 fb f4 cc 35 03 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
RSP: 0018:ffffffff8e007df8 EFLAGS: 000002c6
RAX: 0000000000186859 RBX: 0000000000000000 RCX: ffffffff8b7846d9
RDX: 0000000000000000 RSI: ffffffff8daceab2 RDI: ffffffff8bf2b400
RBP: fffffbfff1c12f68 R08: 0000000000000001 R09: ffffed101708673d
R10: ffff8880b84339eb R11: ffffffff8e098670 R12: 0000000000000000
R13: ffffffff8e097b40 R14: ffffffff9088bdd0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8881248f5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555eaef588 CR3: 00000000522b3000 CR4: 00000000003526f0
Call Trace:
 arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
 default_idle+0x13/0x20 arch/x86/kernel/process.c:767
 default_idle_call+0x6c/0xb0 kernel/sched/idle.c:122
 cpuidle_idle_call kernel/sched/idle.c:191 [inline]
 do_idle+0x38d/0x510 kernel/sched/idle.c:332
 cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:430
 rest_init+0x16b/0x2b0 init/main.c:757
 start_kernel+0x3ef/0x4d0 init/main.c:1206
 x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:310
 x86_64_start_kernel+0x130/0x190 arch/x86/kernel/head64.c:291
 common_startup_64+0x13e/0x148

Tested on:

commit:         0f61b186 Linux 6.19-rc5
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1397199a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1859476832863c41
dashboard link: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1704399a580000