Add a missing cond_resched() in bpf_fd_array_map_clear() loop.
For PROG_ARRAY maps with many entries, this loop calls
prog_array_map_poke_run() for each entry, which can be expensive;
without yielding, this can cause RCU stalls under load:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 0, t=6502 jiffies, g=729293, q=305 ncpus=1)
rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4295096514-4295090012), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 6502 jiffies! g729293 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:0 pid:15 tgid:15 ppid:2 task_flags:0x208040 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x697/0x1430 kernel/sched/core.c:6767
__schedule_loop kernel/sched/core.c:6845 [inline]
schedule+0x10a/0x3e0 kernel/sched/core.c:6860
schedule_timeout+0x145/0x2c0 kernel/time/sleep_timeout.c:99
rcu_gp_fqs_loop+0x255/0x1350 kernel/rcu/tree.c:2046
rcu_gp_kthread+0x347/0x680 kernel/rcu/tree.c:2248
kthread+0x465/0x880 kernel/kthread.c:464
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x19/0x30 arch/x86/entry/entry_64.S:245
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 0 UID: 0 PID: 30932 Comm: kworker/0:2 Not tainted 6.14.0-13195-g967e8def1100 #2 PREEMPT(undef)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events prog_array_map_clear_deferred
RIP: 0010:write_comp_data+0x38/0x90 kernel/kcov.c:246
Call Trace:
<TASK>
prog_array_map_poke_run+0x77/0x380 kernel/bpf/arraymap.c:1096
__fd_array_map_delete_elem+0x197/0x310 kernel/bpf/arraymap.c:925
bpf_fd_array_map_clear kernel/bpf/arraymap.c:1000 [inline]
prog_array_map_clear_deferred+0x119/0x1b0 kernel/bpf/arraymap.c:1141
process_one_work+0x898/0x19d0 kernel/workqueue.c:3238
process_scheduled_works kernel/workqueue.c:3319 [inline]
worker_thread+0x770/0x10b0 kernel/workqueue.c:3400
kthread+0x465/0x880 kernel/kthread.c:464
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x19/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Fixes: da765a2f5993 ("bpf: Add poke dependency tracking for prog array maps")
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
---
kernel/bpf/arraymap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 33de68c95..5e25e0353 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1015,8 +1015,10 @@ static void bpf_fd_array_map_clear(struct bpf_map *map, bool need_defer)
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;
- for (i = 0; i < array->map.max_entries; i++)
+ for (i = 0; i < array->map.max_entries; i++) {
__fd_array_map_delete_elem(map, &i, need_defer);
+ cond_resched();
+ }
}
static void prog_array_map_seq_show_elem(struct bpf_map *map, void *key,
--
2.43.0
On 31/3/26 10:30, Sechang Lim wrote:
[...]
> Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
> Fixes: da765a2f5993 ("bpf: Add poke dependency tracking for prog array maps")
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> ---
After looking at v2, there's no functional change from v2 -> v3.

I think you should send a PING on v2 after some days instead of sending
a v3. If v2 gets applied, the tag will be picked up anyway, btw.

Besides, the changelog is missing here. It should look like:

v2 -> v3:
* ...
v2: [its lore link]

v1 -> v2:
* ...
v1: [its lore link]

Also, you should check sashiko's review [1].

[1]
https://sashiko.dev/#/patchset/20260331023056.484354-1-rhkrqnwk98%40gmail.com
> kernel/bpf/arraymap.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 33de68c95..5e25e0353 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -1015,8 +1015,10 @@ static void bpf_fd_array_map_clear(struct bpf_map *map, bool need_defer)
> struct bpf_array *array = container_of(map, struct bpf_array, map);
> int i;
>
> - for (i = 0; i < array->map.max_entries; i++)
> + for (i = 0; i < array->map.max_entries; i++) {
> __fd_array_map_delete_elem(map, &i, need_defer);
> + cond_resched();
Since bpf_fd_array_map_clear() is used across prog_array,
perf_event_array, cgroup_array, and array_of_map, and this patch aims to
avoid RCU stalls for prog_array, does this cond_resched() punish
perf_event_array, cgroup_array, and array_of_map?
Thanks,
Leon
> + }
> }
>
> static void prog_array_map_seq_show_elem(struct bpf_map *map, void *key,
On 31/3/26 13:19, Leon Hwang wrote:
> After looking at v2, there's no functional change for v2 -> v3.
>
> I think, you should send a PING in v2 after some days instead of sending
> v3. If v2 will be applied, the tag will be picked up btw.
>
> Besides, change logs are missing here.

You're right, I should have just pinged v2 instead of sending v3. The
only change was fixing a CC typo (eddyz78 -> eddyz87), no functional
change. Apologies for the missing changelog as well.

> Since bpf_fd_array_map_clear() is used across prog_array,
> perf_event_array, cgroup_array, and array_of_map, and this patch aims to
> avoid RCU stalls for prog_array, does this cond_resched() punish
> perf_event_array, cgroup_array, and array_of_map?

map_poke_run is only set in prog_array_map_ops, so the expensive path
(poke_mutex + map_poke_run) in __fd_array_map_delete_elem() is
exclusive to prog_array. For perf_event_array, cgroup_array, and
array_of_map, each iteration is just xchg + put_ptr, which is
lightweight enough that cond_resched() will not trigger rescheduling in
practice.

Thanks,
Sechang