[v2] net: Handle napi_schedule() calls from non-interrupt

[PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Frederic Weisbecker 11 months, 2 weeks ago

napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit

* From a softirq disabled section, where raised softirqs are handled on
  the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
  round in do_softirq(), or further deferred to a dedicated kthread.

Other bare tasks context may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen before a while if the CPU goes idle afterwards with the tick
stopped.

Such "misuses" have been detected on several places thanks to messages
of the kind:

	"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"

For example:

       __raise_softirq_irqoff
        __napi_schedule
        rtl8152_runtime_resume.isra.0
        rtl8152_resume
        usb_resume_interface.isra.0
        usb_resume_both
        __rpm_callback
        rpm_callback
        rpm_resume
        __pm_runtime_resume
        usb_autoresume_device
        usb_remote_wakeup
        hub_event
        process_one_work
        worker_thread
        kthread
        ret_from_fork
        ret_from_fork_asm

And also:

* drivers/net/usb/r8152.c::rtl_work_func_t
* drivers/net/netdevsim/netdev.c::nsim_start_xmit

There is a long history of issues of this kind:

	019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
	330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
	e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
	e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
	c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
	970be1dff26d ("mt76: disable BH around napi_schedule() calls")
	019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
	30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
	e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
	83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
	bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
	8cf699ec849f ("mlx4: do not call napi_schedule() without care")
	ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")

This shows that relying on the caller to arrange a proper context for
the softirqs to be handled while calling napi_schedule() is very fragile
and error prone. Also fixing them can also prove challenging if the
caller may be called from different kinds of contexts.

Therefore fix this from napi_schedule() itself with waking up ksoftirqd
when softirqs are raised from task contexts.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Francois Romieu <romieu@fr.zoreil.com>
Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
Cc: Breno Leitao <leitao@debian.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 80e415ccf2c8..5c1b93a3f50a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 	 * we have to raise NET_RX_SOFTIRQ.
 	 */
 	if (!sd->in_net_rx_action)
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
 
 #ifdef CONFIG_RPS
-- 
2.48.1

Re: [PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Eric Dumazet 11 months, 2 weeks ago

On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> napi_schedule() is expected to be called either:
>
> * From an interrupt, where raised softirqs are handled on IRQ exit
>
> * From a softirq disabled section, where raised softirqs are handled on
>   the next call to local_bh_enable().
>
> * From a softirq handler, where raised softirqs are handled on the next
>   round in do_softirq(), or further deferred to a dedicated kthread.
>
> Other bare tasks context may end up ignoring the raised NET_RX vector
> until the next random softirq handling opportunity, which may not
> happen before a while if the CPU goes idle afterwards with the tick
> stopped.
>
> Such "misuses" have been detected on several places thanks to messages
> of the kind:
>
>         "NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
>
> For example:
>
>        __raise_softirq_irqoff
>         __napi_schedule
>         rtl8152_runtime_resume.isra.0
>         rtl8152_resume
>         usb_resume_interface.isra.0
>         usb_resume_both
>         __rpm_callback
>         rpm_callback
>         rpm_resume
>         __pm_runtime_resume
>         usb_autoresume_device
>         usb_remote_wakeup
>         hub_event
>         process_one_work
>         worker_thread
>         kthread
>         ret_from_fork
>         ret_from_fork_asm
>
> And also:
>
> * drivers/net/usb/r8152.c::rtl_work_func_t
> * drivers/net/netdevsim/netdev.c::nsim_start_xmit
>
> There is a long history of issues of this kind:
>
>         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
>         330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
>         e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
>         e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
>         c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
>         970be1dff26d ("mt76: disable BH around napi_schedule() calls")
>         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
>         30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
>         e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
>         83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
>         bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
>         8cf699ec849f ("mlx4: do not call napi_schedule() without care")
>         ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
>
> This shows that relying on the caller to arrange a proper context for
> the softirqs to be handled while calling napi_schedule() is very fragile
> and error prone. Also fixing them can also prove challenging if the
> caller may be called from different kinds of contexts.
>
> Therefore fix this from napi_schedule() itself with waking up ksoftirqd
> when softirqs are raised from task contexts.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
> Cc: Breno Leitao <leitao@debian.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Francois Romieu <romieu@fr.zoreil.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 80e415ccf2c8..5c1b93a3f50a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>          * we have to raise NET_RX_SOFTIRQ.
>          */
>         if (!sd->in_net_rx_action)
> -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +               raise_softirq_irqoff(NET_RX_SOFTIRQ);

Your patch is fine, but would silence performance bugs.

I would probably add something to let network developers be aware of them.

diff --git a/net/core/dev.c b/net/core/dev.c
index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct
softnet_data *sd,
        /* If not called from net_rx_action()
         * we have to raise NET_RX_SOFTIRQ.
         */
-       if (!sd->in_net_rx_action)
-               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
+       if (!sd->in_net_rx_action) {
+               raise_softirq_irqoff(NET_RX_SOFTIRQ);
+               DEBUG_NET_WARN_ON_ONCE(!in_interrupt());
+       }
 }

 #ifdef CONFIG_RPS


Looking at the list of patches, I can see idpf fix was not very good,
I will submit another patch.

Re: [PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Frederic Weisbecker 11 months, 2 weeks ago

Le Wed, Feb 26, 2025 at 11:31:24AM +0100, Eric Dumazet a écrit :
> On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker
> <frederic@kernel.org> wrote:
> >
> > napi_schedule() is expected to be called either:
> >
> > * From an interrupt, where raised softirqs are handled on IRQ exit
> >
> > * From a softirq disabled section, where raised softirqs are handled on
> >   the next call to local_bh_enable().
> >
> > * From a softirq handler, where raised softirqs are handled on the next
> >   round in do_softirq(), or further deferred to a dedicated kthread.
> >
> > Other bare tasks context may end up ignoring the raised NET_RX vector
> > until the next random softirq handling opportunity, which may not
> > happen before a while if the CPU goes idle afterwards with the tick
> > stopped.
> >
> > Such "misuses" have been detected on several places thanks to messages
> > of the kind:
> >
> >         "NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
> >
> > For example:
> >
> >        __raise_softirq_irqoff
> >         __napi_schedule
> >         rtl8152_runtime_resume.isra.0
> >         rtl8152_resume
> >         usb_resume_interface.isra.0
> >         usb_resume_both
> >         __rpm_callback
> >         rpm_callback
> >         rpm_resume
> >         __pm_runtime_resume
> >         usb_autoresume_device
> >         usb_remote_wakeup
> >         hub_event
> >         process_one_work
> >         worker_thread
> >         kthread
> >         ret_from_fork
> >         ret_from_fork_asm
> >
> > And also:
> >
> > * drivers/net/usb/r8152.c::rtl_work_func_t
> > * drivers/net/netdevsim/netdev.c::nsim_start_xmit
> >
> > There is a long history of issues of this kind:
> >
> >         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
> >         330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
> >         e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
> >         e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
> >         c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
> >         970be1dff26d ("mt76: disable BH around napi_schedule() calls")
> >         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
> >         30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
> >         e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
> >         83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
> >         bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
> >         8cf699ec849f ("mlx4: do not call napi_schedule() without care")
> >         ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
> >
> > This shows that relying on the caller to arrange a proper context for
> > the softirqs to be handled while calling napi_schedule() is very fragile
> > and error prone. Also fixing them can also prove challenging if the
> > caller may be called from different kinds of contexts.
> >
> > Therefore fix this from napi_schedule() itself with waking up ksoftirqd
> > when softirqs are raised from task contexts.
> >
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Reported-by: Jakub Kicinski <kuba@kernel.org>
> > Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> > Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
> > Cc: Breno Leitao <leitao@debian.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Francois Romieu <romieu@fr.zoreil.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  net/core/dev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 80e415ccf2c8..5c1b93a3f50a 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >          * we have to raise NET_RX_SOFTIRQ.
> >          */
> >         if (!sd->in_net_rx_action)
> > -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > +               raise_softirq_irqoff(NET_RX_SOFTIRQ);
> 
> Your patch is fine, but would silence performance bugs.
> 
> I would probably add something to let network developers be aware of them.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6
> 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct
> softnet_data *sd,
>         /* If not called from net_rx_action()
>          * we have to raise NET_RX_SOFTIRQ.
>          */
> -       if (!sd->in_net_rx_action)
> -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +       if (!sd->in_net_rx_action) {
> +               raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +               DEBUG_NET_WARN_ON_ONCE(!in_interrupt());
> +       }

That looks good and looks like what I did initially:

https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/

Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
like in the link?

Thanks.

>  }
> 
>  #ifdef CONFIG_RPS
> 
> 
> Looking at the list of patches, I can see idpf fix was not very good,
> I will submit another patch.

Re: [PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Eric Dumazet 11 months, 2 weeks ago

On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>

> That looks good and looks like what I did initially:
>
> https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/
>
> Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
> like in the link?

To be clear, I have not tried this thing yet.

Perhaps let your patch as is (for stable backports), and put the debug
stuff only after some tests, in net-next.

It is very possible that napi_schedule() in the problematic cases were
not on a fast path anyway.

Reviewed-by: Eric Dumazet <edumazet@google.com>

Re: [PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Sebastian Andrzej Siewior 11 months, 1 week ago

On 2025-02-26 14:34:39 [+0100], Eric Dumazet wrote:
> On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> 
> > That looks good and looks like what I did initially:
> >
> > https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/
> >
> > Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
> > like in the link?
> 
> To be clear, I have not tried this thing yet.
> 
> Perhaps let your patch as is (for stable backports), and put the debug
> stuff only after some tests, in net-next.
> 
> It is very possible that napi_schedule() in the problematic cases were
> not on a fast path anyway.

I got here via Sascha's stable backports. It looks to me that these
paths (the reported once) are not widely tested and then we don't have
any prints if the BH section is missing while expected.

Would it work for everyone if warnings are added and this patch is
reverted? These days ksoftirqd is not blocking any softirqs until it run
so chances are that the NAPI softirq kicks in before ksoftirqd gets on
the CPU so the damage is probably little.

I am slightly undecided here since real work usually originates in
softirq and it is hard to get this wrong but then who knows…

> Reviewed-by: Eric Dumazet <edumazet@google.com>

Sebastian

Re: [PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

Posted by Frederic Weisbecker 11 months, 2 weeks ago

Le Wed, Feb 26, 2025 at 02:34:39PM +0100, Eric Dumazet a écrit :
> On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> 
> > That looks good and looks like what I did initially:
> >
> > https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/
> >
> > Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
> > like in the link?
> 
> To be clear, I have not tried this thing yet.
> 
> Perhaps let your patch as is (for stable backports), and put the debug
> stuff only after some tests, in net-next.

Ok.

> 
> It is very possible that napi_schedule() in the problematic cases were
> not on a fast path anyway.

That was my assumption but I must confess I don't know well this realm.

> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks!