This changeset fixes a common problem for busy networking kthreads.
These threads, e.g. NAPI threads, typically do the following (a rough
sketch of the loop appears below):
* poll a batch of packets
* if there is more work, call cond_resched to allow scheduling
* continue polling more packets while the rx queue is not empty
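In sketch form (C, with placeholder names such as poll_rx_queue,
wait_for_more_packets, and BUDGET standing in for the real
per-subsystem code, which is not shown in this cover letter), the
pattern is roughly:

    /*
     * Illustrative only: the helper names and BUDGET are placeholders,
     * not the actual NAPI or cpumap kthread code.
     */
    static int busy_poll_kthread(void *data)
    {
            while (!kthread_should_stop()) {
                    /* poll a batch of packets */
                    int work = poll_rx_queue(data, BUDGET);

                    if (work < BUDGET) {
                            /* rx queue drained, go to sleep */
                            wait_for_more_packets(data);
                    } else {
                            /*
                             * More work pending: yield the CPU. This
                             * alone does not report an RCU tasks
                             * quiescent state, which is the problem
                             * described above.
                             */
                            cond_resched();
                    }
            }
            return 0;
    }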
We observed this being a problem in production: under heavy load it
can block RCU tasks from making progress. Investigation indicates that
merely calling cond_resched is insufficient for RCU tasks to reach
quiescent states. This affects at least NAPI threads, napi_busy_loop,
and the cpumap kthread.
By reporting RCU QSes periodically in these kthreads before calling
cond_resched, the blocked RCU waiters can make progress again. Rather
than reporting a QS only for RCU tasks, this code shares the concern
noted in commit d28139c4e967 ("rcu: Apply RCU-bh QSes to RCU-sched and
RCU-preempt when safe"), so report a consolidated QS covering all
flavors to be safe.
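The helper mentioned in the v3 changelog is not spelled out in this
cover letter; as a rough sketch (the macro name, the HZ/10 rate limit,
and the exact structure are assumptions, not the posted patch), it
could look like:

    /*
     * Sketch only: the name and rate limit here are assumptions.
     * rcu_softirq_qs() reports the consolidated QS discussed in
     * commit d28139c4e967 and is expected to be called with
     * preemption disabled.
     */
    #define rcu_flavor_qs_periodic(old_ts)                          \
    do {                                                            \
            if (time_after(jiffies, (old_ts) + HZ / 10)) {          \
                    preempt_disable();                              \
                    rcu_softirq_qs();                               \
                    (old_ts) = jiffies;                             \
                    preempt_enable();                               \
            }                                                       \
    } while (0)

A busy kthread would invoke this right before cond_resched, so a
quiescent state gets reported at most a few times per second even when
the rx queue never drains.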
It is worth noting that, although this problem is reproducible in
napi_busy_loop, it only shows up when the polling interval is set as
high as 2ms, far larger than the 50us-100us recommended in the
documentation. So napi_busy_loop is left untouched.
V2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
V1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t
changes since v2:
* created a helper in the RCU header to abstract the behavior
* also fixed the cpumap kthread
changes since v1:
* disable preemption first as Paul McKenney suggested
Yan Zhai (3):
rcu: add a helper to report consolidated flavor QS
net: report RCU QS on threaded NAPI repolling
bpf: report RCU QS in cpumap kthread
include/linux/rcupdate.h | 23 +++++++++++++++++++++++
kernel/bpf/cpumap.c | 2 ++
net/core/dev.c | 3 +++
3 files changed, 28 insertions(+)
--
2.30.2
Yan Zhai <yan@cloudflare.com> writes:
> [...]
For the series:
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
On 13/03/2024 17.25, Yan Zhai wrote:
> [...]
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>