virtio_net: gate delayed refill scheduling

[PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Michael S. Tsirkin 2 months, 1 week ago

Make the delayed refill worker honor the "refill_enabled" flag by
checking it under refill_lock before requeueing itself. This
prevents a window where virtnet_rx_pause[_all]() disables NAPI and
synchronously waits for the current refill_work instance to finish only
for that instance to immediately arm another run, which then deadlocks
when it tries to double-disable NAPI.

Add and use a helper that encapsulates the locking and flag check so all
refill scheduling paths behave consistently and we no longer replicate
the spin_lock/unlock pattern.

This fixes the deadlock triggered by the XDP selftests when XDP is toggled
and RX is paused/resumed quickly.

Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Lightly tested.

Paolo is there a way to confirm this actually fixes the bug?
Could you help with that?

 drivers/net/virtio_net.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8855a994e12b..e2bfe8337f50 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -734,6 +734,15 @@ static void disable_delayed_refill(struct virtnet_info *vi)
 	spin_unlock_bh(&vi->refill_lock);
 }
 
+static void virtnet_schedule_refill_work(struct virtnet_info *vi,
+					 unsigned long delay)
+{
+	spin_lock_bh(&vi->refill_lock);
+	if (vi->refill_enabled)
+		schedule_delayed_work(&vi->refill, delay);
+	spin_unlock_bh(&vi->refill_lock);
+}
+
 static void enable_rx_mode_work(struct virtnet_info *vi)
 {
 	rtnl_lock();
@@ -2959,7 +2968,7 @@ static void refill_work(struct work_struct *work)
 		 * we will *never* try to fill again.
 		 */
 		if (still_empty)
-			schedule_delayed_work(&vi->refill, HZ/2);
+			virtnet_schedule_refill_work(vi, HZ/2);
 	}
 }
 
@@ -3026,12 +3035,8 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		packets = virtnet_receive_packets(vi, rq, budget, xdp_xmit, &stats);
 
 	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
-		if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
-			spin_lock(&vi->refill_lock);
-			if (vi->refill_enabled)
-				schedule_delayed_work(&vi->refill, 0);
-			spin_unlock(&vi->refill_lock);
-		}
+		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
+			virtnet_schedule_refill_work(vi, 0);
 	}
 
 	u64_stats_set(&stats.packets, packets);
@@ -3216,7 +3221,7 @@ static int virtnet_open(struct net_device *dev)
 		if (i < vi->curr_queue_pairs)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				virtnet_schedule_refill_work(vi, 0);
 
 		err = virtnet_enable_queue_pair(vi, i);
 		if (err < 0)
@@ -3469,7 +3474,7 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
 		virtnet_napi_enable(rq);
 
 	if (schedule_refill)
-		schedule_delayed_work(&vi->refill, 0);
+		virtnet_schedule_refill_work(vi, 0);
 }
 
 static void virtnet_rx_resume_all(struct virtnet_info *vi)
@@ -3815,10 +3820,8 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 succ:
 	vi->curr_queue_pairs = queue_pairs;
 	/* virtnet_open() will refill when device is going to up. */
-	spin_lock_bh(&vi->refill_lock);
-	if (dev->flags & IFF_UP && vi->refill_enabled)
-		schedule_delayed_work(&vi->refill, 0);
-	spin_unlock_bh(&vi->refill_lock);
+	if (dev->flags & IFF_UP)
+		virtnet_schedule_refill_work(vi, 0);
 
 	return 0;
 }
-- 
MST

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Bui Quang Minh 2 months, 1 week ago

I think the the requeue in refill_work is not the problem here. In
virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
use "even if the work re-queues itself". AFAICS, cancel_work_sync()
will disable work -> flush work -> enable again. So if the work requeue
itself in flush work, the requeue will fail because the work is already
disabled.

I think what triggers the deadlock here is a bug in
virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
__virtnet_rx_resume() which calls napi_enable() and may schedule
refill. It schedules the refill work right after napi_enable the first
receive queue. The correct way must be napi_enable all receive queues
before scheduling refill work.

The fix is like this (there are quite duplicated code, I will clean up
and send patches later if it is correct)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8e04adb57f52..892aa0805d1b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3482,20 +3482,25 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
  static void virtnet_rx_resume_all(struct virtnet_info *vi)
  {
      int i;
+    bool schedule_refill = false;
+
+    for (i = 0; i < vi->max_queue_pairs; i++)
+        __virtnet_rx_resume(vi, &vi->rq[i], false);

      enable_delayed_refill(vi);
-    for (i = 0; i < vi->max_queue_pairs; i++) {
-        if (i < vi->curr_queue_pairs)
-            __virtnet_rx_resume(vi, &vi->rq[i], true);
-        else
-            __virtnet_rx_resume(vi, &vi->rq[i], false);
-    }
+
+    for (i = 0; i < vi->curr_queue_pairs; i++)
+        if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+            schedule_refill = true;
+
+    if (schedule_refill)
+        schedule_delayed_work(&vi->refill, 0);
  }

  static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
  {
-    enable_delayed_refill(vi);
      __virtnet_rx_resume(vi, rq, true);
+    enable_delayed_refill(vi);
  }

  static int virtnet_rx_resize(struct virtnet_info *vi,

I also move the enable_delayed_refill() after we __virtnet_rx_resume()
to ensure no refill is scheduled before napi_enable().

What do you think?

Thanks,
Quang Minh

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Jason Wang 2 months, 1 week ago

On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> I think the the requeue in refill_work is not the problem here. In
> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
> will disable work -> flush work -> enable again. So if the work requeue
> itself in flush work, the requeue will fail because the work is already
> disabled.

Right.

>
> I think what triggers the deadlock here is a bug in
> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
> __virtnet_rx_resume() which calls napi_enable() and may schedule
> refill. It schedules the refill work right after napi_enable the first
> receive queue. The correct way must be napi_enable all receive queues
> before scheduling refill work.

So what you meant is that the napi_disable() is called for a queue
whose NAPI has been disabled?

cpu0] enable_delayed_refill()
cpu0] napi_enable(queue0)
cpu0] schedule_delayed_work(&vi->refill)
cpu1] napi_disable(queue0)
cpu1] napi_enable(queue0)
cpu1] napi_disable(queue1)

In this case cpu1 waits forever while holding the netdev lock. This
looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
NAPI enablement with netdev_lock()")?

>
> The fix is like this (there are quite duplicated code, I will clean up
> and send patches later if it is correct)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8e04adb57f52..892aa0805d1b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3482,20 +3482,25 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
>   static void virtnet_rx_resume_all(struct virtnet_info *vi)
>   {
>       int i;
> +    bool schedule_refill = false;
> +
> +    for (i = 0; i < vi->max_queue_pairs; i++)
> +        __virtnet_rx_resume(vi, &vi->rq[i], false);
>
>       enable_delayed_refill(vi);
> -    for (i = 0; i < vi->max_queue_pairs; i++) {
> -        if (i < vi->curr_queue_pairs)
> -            __virtnet_rx_resume(vi, &vi->rq[i], true);
> -        else
> -            __virtnet_rx_resume(vi, &vi->rq[i], false);
> -    }
> +
> +    for (i = 0; i < vi->curr_queue_pairs; i++)
> +        if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> +            schedule_refill = true;
> +
> +    if (schedule_refill)
> +        schedule_delayed_work(&vi->refill, 0);
>   }
>
>   static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
>   {
> -    enable_delayed_refill(vi);
>       __virtnet_rx_resume(vi, rq, true);
> +    enable_delayed_refill(vi);

This seems to be odd. I think at least we need to move this before:

> +    if (schedule_refill)
> +        schedule_delayed_work(&vi->refill, 0);

?

>   }
>
>   static int virtnet_rx_resize(struct virtnet_info *vi,
>
> I also move the enable_delayed_refill() after we __virtnet_rx_resume()
> to ensure no refill is scheduled before napi_enable().
>
> What do you think?

This has been implemented in your patch or I may miss something.

Thanks

>
> Thanks,
> Quang Minh
>

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Bui Quang Minh 2 months, 1 week ago

On 11/28/25 09:20, Jason Wang wrote:
> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> I think the the requeue in refill_work is not the problem here. In
>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
>> will disable work -> flush work -> enable again. So if the work requeue
>> itself in flush work, the requeue will fail because the work is already
>> disabled.
> Right.
>
>> I think what triggers the deadlock here is a bug in
>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
>> __virtnet_rx_resume() which calls napi_enable() and may schedule
>> refill. It schedules the refill work right after napi_enable the first
>> receive queue. The correct way must be napi_enable all receive queues
>> before scheduling refill work.
> So what you meant is that the napi_disable() is called for a queue
> whose NAPI has been disabled?
>
> cpu0] enable_delayed_refill()
> cpu0] napi_enable(queue0)
> cpu0] schedule_delayed_work(&vi->refill)
> cpu1] napi_disable(queue0)
> cpu1] napi_enable(queue0)
> cpu1] napi_disable(queue1)
>
> In this case cpu1 waits forever while holding the netdev lock. This
> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
> NAPI enablement with netdev_lock()")?

Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed 
refill when pausing rx"), but it has flaws.

>> The fix is like this (there are quite duplicated code, I will clean up
>> and send patches later if it is correct)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8e04adb57f52..892aa0805d1b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -3482,20 +3482,25 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
>>    static void virtnet_rx_resume_all(struct virtnet_info *vi)
>>    {
>>        int i;
>> +    bool schedule_refill = false;
>> +
>> +    for (i = 0; i < vi->max_queue_pairs; i++)
>> +        __virtnet_rx_resume(vi, &vi->rq[i], false);
>>
>>        enable_delayed_refill(vi);
>> -    for (i = 0; i < vi->max_queue_pairs; i++) {
>> -        if (i < vi->curr_queue_pairs)
>> -            __virtnet_rx_resume(vi, &vi->rq[i], true);
>> -        else
>> -            __virtnet_rx_resume(vi, &vi->rq[i], false);
>> -    }
>> +
>> +    for (i = 0; i < vi->curr_queue_pairs; i++)
>> +        if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>> +            schedule_refill = true;
>> +
>> +    if (schedule_refill)
>> +        schedule_delayed_work(&vi->refill, 0);
>>    }
>>
>>    static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
>>    {
>> -    enable_delayed_refill(vi);
>>        __virtnet_rx_resume(vi, rq, true);
>> +    enable_delayed_refill(vi);
> This seems to be odd. I think at least we need to move this before:
>
>> +    if (schedule_refill)
>> +        schedule_delayed_work(&vi->refill, 0);
> ?

Yes, I think helper __virtnet_rx_resume does not work well, because 
virtnet_rx_resume_all and virtnet_rx_resume have slightly different 
logic. So I think it's better to open-code the helper and do the logic 
directly in virtnet_rx_resume_all and virtnet_rx_resume.

>>    }
>>
>>    static int virtnet_rx_resize(struct virtnet_info *vi,
>>
>> I also move the enable_delayed_refill() after we __virtnet_rx_resume()
>> to ensure no refill is scheduled before napi_enable().
>>
>> What do you think?
> This has been implemented in your patch or I may miss something.

Yes, I move the enable_delayed_refill after the call __virtnet_rx_resume 
in the above diff because I see it creates a window where delayed refill 
is enabled before napi is enabled. However, AFAICS, other places that 
schedule the delayed refill work must acquire the rtnl_lock and/or 
netdev_lock, so it cannot happen while we are in virtnet_rx_resume[_all].

Thanks,
Quang Minh.

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Jason Wang 2 months, 1 week ago

On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> On 11/28/25 09:20, Jason Wang wrote:
> > On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >> I think the the requeue in refill_work is not the problem here. In
> >> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
> >> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
> >> will disable work -> flush work -> enable again. So if the work requeue
> >> itself in flush work, the requeue will fail because the work is already
> >> disabled.
> > Right.
> >
> >> I think what triggers the deadlock here is a bug in
> >> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
> >> __virtnet_rx_resume() which calls napi_enable() and may schedule
> >> refill. It schedules the refill work right after napi_enable the first
> >> receive queue. The correct way must be napi_enable all receive queues
> >> before scheduling refill work.
> > So what you meant is that the napi_disable() is called for a queue
> > whose NAPI has been disabled?
> >
> > cpu0] enable_delayed_refill()
> > cpu0] napi_enable(queue0)
> > cpu0] schedule_delayed_work(&vi->refill)
> > cpu1] napi_disable(queue0)
> > cpu1] napi_enable(queue0)
> > cpu1] napi_disable(queue1)
> >
> > In this case cpu1 waits forever while holding the netdev lock. This
> > looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
> > NAPI enablement with netdev_lock()")?
>
> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
> refill when pausing rx"), but it has flaws.

I wonder if a simplified version is just restoring the behaviour
before 413f0271f3966 by using napi_enable_locked() but maybe I miss
something.

Thanks

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Bui Quang Minh 2 months, 1 week ago

On 12/2/25 13:03, Jason Wang wrote:
> On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> On 11/28/25 09:20, Jason Wang wrote:
>>> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>> I think the the requeue in refill_work is not the problem here. In
>>>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
>>>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
>>>> will disable work -> flush work -> enable again. So if the work requeue
>>>> itself in flush work, the requeue will fail because the work is already
>>>> disabled.
>>> Right.
>>>
>>>> I think what triggers the deadlock here is a bug in
>>>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
>>>> __virtnet_rx_resume() which calls napi_enable() and may schedule
>>>> refill. It schedules the refill work right after napi_enable the first
>>>> receive queue. The correct way must be napi_enable all receive queues
>>>> before scheduling refill work.
>>> So what you meant is that the napi_disable() is called for a queue
>>> whose NAPI has been disabled?
>>>
>>> cpu0] enable_delayed_refill()
>>> cpu0] napi_enable(queue0)
>>> cpu0] schedule_delayed_work(&vi->refill)
>>> cpu1] napi_disable(queue0)
>>> cpu1] napi_enable(queue0)
>>> cpu1] napi_disable(queue1)
>>>
>>> In this case cpu1 waits forever while holding the netdev lock. This
>>> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
>>> NAPI enablement with netdev_lock()")?
>> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
>> refill when pausing rx"), but it has flaws.
> I wonder if a simplified version is just restoring the behaviour
> before 413f0271f3966 by using napi_enable_locked() but maybe I miss
> something.

As far as I understand, before 413f0271f3966 ("net: protect NAPI 
enablement with netdev_lock()"), the napi is protected by the 
rtnl_lock(). But in the refill_work, we don't acquire the rtnl_lock(), 
so it seems like we will have race condition before 413f0271f3966 ("net: 
protect NAPI enablement with netdev_lock()").

Thanks,
Quang Minh.

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Jason Wang 2 months, 1 week ago

On Tue, Dec 2, 2025 at 11:29 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> On 12/2/25 13:03, Jason Wang wrote:
> > On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >> On 11/28/25 09:20, Jason Wang wrote:
> >>> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>> I think the the requeue in refill_work is not the problem here. In
> >>>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
> >>>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
> >>>> will disable work -> flush work -> enable again. So if the work requeue
> >>>> itself in flush work, the requeue will fail because the work is already
> >>>> disabled.
> >>> Right.
> >>>
> >>>> I think what triggers the deadlock here is a bug in
> >>>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
> >>>> __virtnet_rx_resume() which calls napi_enable() and may schedule
> >>>> refill. It schedules the refill work right after napi_enable the first
> >>>> receive queue. The correct way must be napi_enable all receive queues
> >>>> before scheduling refill work.
> >>> So what you meant is that the napi_disable() is called for a queue
> >>> whose NAPI has been disabled?
> >>>
> >>> cpu0] enable_delayed_refill()
> >>> cpu0] napi_enable(queue0)
> >>> cpu0] schedule_delayed_work(&vi->refill)
> >>> cpu1] napi_disable(queue0)
> >>> cpu1] napi_enable(queue0)
> >>> cpu1] napi_disable(queue1)
> >>>
> >>> In this case cpu1 waits forever while holding the netdev lock. This
> >>> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
> >>> NAPI enablement with netdev_lock()")?
> >> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
> >> refill when pausing rx"), but it has flaws.
> > I wonder if a simplified version is just restoring the behaviour
> > before 413f0271f3966 by using napi_enable_locked() but maybe I miss
> > something.
>
> As far as I understand, before 413f0271f3966 ("net: protect NAPI
> enablement with netdev_lock()"), the napi is protected by the

I guess you meant napi enable/disable actually.

> rtnl_lock(). But in the refill_work, we don't acquire the rtnl_lock(),

Any reason we need to hold rtnl_lock() there?

> so it seems like we will have race condition before 413f0271f3966 ("net:
> protect NAPI enablement with netdev_lock()").
>
> Thanks,
> Quang Minh.
>

Thanks

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Bui Quang Minh 2 months ago

On 12/3/25 13:37, Jason Wang wrote:
> On Tue, Dec 2, 2025 at 11:29 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> On 12/2/25 13:03, Jason Wang wrote:
>>> On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>> On 11/28/25 09:20, Jason Wang wrote:
>>>>> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>>>>>> I think the the requeue in refill_work is not the problem here. In
>>>>>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
>>>>>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
>>>>>> will disable work -> flush work -> enable again. So if the work requeue
>>>>>> itself in flush work, the requeue will fail because the work is already
>>>>>> disabled.
>>>>> Right.
>>>>>
>>>>>> I think what triggers the deadlock here is a bug in
>>>>>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
>>>>>> __virtnet_rx_resume() which calls napi_enable() and may schedule
>>>>>> refill. It schedules the refill work right after napi_enable the first
>>>>>> receive queue. The correct way must be napi_enable all receive queues
>>>>>> before scheduling refill work.
>>>>> So what you meant is that the napi_disable() is called for a queue
>>>>> whose NAPI has been disabled?
>>>>>
>>>>> cpu0] enable_delayed_refill()
>>>>> cpu0] napi_enable(queue0)
>>>>> cpu0] schedule_delayed_work(&vi->refill)
>>>>> cpu1] napi_disable(queue0)
>>>>> cpu1] napi_enable(queue0)
>>>>> cpu1] napi_disable(queue1)
>>>>>
>>>>> In this case cpu1 waits forever while holding the netdev lock. This
>>>>> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
>>>>> NAPI enablement with netdev_lock()")?
>>>> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
>>>> refill when pausing rx"), but it has flaws.
>>> I wonder if a simplified version is just restoring the behaviour
>>> before 413f0271f3966 by using napi_enable_locked() but maybe I miss
>>> something.
>> As far as I understand, before 413f0271f3966 ("net: protect NAPI
>> enablement with netdev_lock()"), the napi is protected by the
> I guess you meant napi enable/disable actually.
>
>> rtnl_lock(). But in the refill_work, we don't acquire the rtnl_lock(),
> Any reason we need to hold rtnl_lock() there?

Correct me if I'm wrong here. Before 413f0271f3966 ("net: protect NAPI 
enablement with netdev_lock()"), napi_disable and napi_enable are not 
safe to be called concurrently.

The example race is

napi_disable -> napi_save_config -> write to n->config->defer_hard_irqs
napi_enable -> napi_restore_config -> read n->config->defer_hard_irqs

In refill_work, we don't hold any locks so the race scenario can happen.

Maybe I misunderstand what you mean by restoring the behavior before 
413f0271f3966. Do you mean that we use this pattern

     In virtnet_xdp_se;

     netdev_lock(dev);
     virtnet_rx_pause_all()
         -> napi_disable_locked

     virtnet_rx_resume_all()
         -> napi_disable_locked
     netdev_unlock(dev);

And in other places where we pause the rx too. It will hold the 
netdev_lock during the time napi is disabled so that even when 
refill_work happens concurrently, napi_disable cannot acquire the 
netdev_lock and gets stuck inside.


>
>> so it seems like we will have race condition before 413f0271f3966 ("net:
>> protect NAPI enablement with netdev_lock()").
>>
>> Thanks,
>> Quang Minh.
>>
> Thanks
>

Thanks,
Quang Minh.

Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

Posted by Jason Wang 2 months ago

On Thu, Dec 4, 2025 at 11:08 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>
> On 12/3/25 13:37, Jason Wang wrote:
> > On Tue, Dec 2, 2025 at 11:29 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >> On 12/2/25 13:03, Jason Wang wrote:
> >>> On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>> On 11/28/25 09:20, Jason Wang wrote:
> >>>>> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> >>>>>> I think the the requeue in refill_work is not the problem here. In
> >>>>>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
> >>>>>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
> >>>>>> will disable work -> flush work -> enable again. So if the work requeue
> >>>>>> itself in flush work, the requeue will fail because the work is already
> >>>>>> disabled.
> >>>>> Right.
> >>>>>
> >>>>>> I think what triggers the deadlock here is a bug in
> >>>>>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
> >>>>>> __virtnet_rx_resume() which calls napi_enable() and may schedule
> >>>>>> refill. It schedules the refill work right after napi_enable the first
> >>>>>> receive queue. The correct way must be napi_enable all receive queues
> >>>>>> before scheduling refill work.
> >>>>> So what you meant is that the napi_disable() is called for a queue
> >>>>> whose NAPI has been disabled?
> >>>>>
> >>>>> cpu0] enable_delayed_refill()
> >>>>> cpu0] napi_enable(queue0)
> >>>>> cpu0] schedule_delayed_work(&vi->refill)
> >>>>> cpu1] napi_disable(queue0)
> >>>>> cpu1] napi_enable(queue0)
> >>>>> cpu1] napi_disable(queue1)
> >>>>>
> >>>>> In this case cpu1 waits forever while holding the netdev lock. This
> >>>>> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
> >>>>> NAPI enablement with netdev_lock()")?
> >>>> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
> >>>> refill when pausing rx"), but it has flaws.
> >>> I wonder if a simplified version is just restoring the behaviour
> >>> before 413f0271f3966 by using napi_enable_locked() but maybe I miss
> >>> something.
> >> As far as I understand, before 413f0271f3966 ("net: protect NAPI
> >> enablement with netdev_lock()"), the napi is protected by the
> > I guess you meant napi enable/disable actually.
> >
> >> rtnl_lock(). But in the refill_work, we don't acquire the rtnl_lock(),
> > Any reason we need to hold rtnl_lock() there?
>
> Correct me if I'm wrong here. Before 413f0271f3966 ("net: protect NAPI
> enablement with netdev_lock()"), napi_disable and napi_enable are not
> safe to be called concurrently.
>
> The example race is
>
> napi_disable -> napi_save_config -> write to n->config->defer_hard_irqs
> napi_enable -> napi_restore_config -> read n->config->defer_hard_irqs
>
> In refill_work, we don't hold any locks so the race scenario can happen.

Ok, I get you, so it occurs after we introduced the NAPI config to virtio-net.

>
> Maybe I misunderstand what you mean by restoring the behavior before
> 413f0271f3966. Do you mean that we use this pattern
>
>      In virtnet_xdp_se;
>
>      netdev_lock(dev);
>      virtnet_rx_pause_all()
>          -> napi_disable_locked
>
>      virtnet_rx_resume_all()
>          -> napi_disable_locked
>      netdev_unlock(dev);
>
> And in other places where we pause the rx too. It will hold the
> netdev_lock during the time napi is disabled so that even when
> refill_work happens concurrently, napi_disable cannot acquire the
> netdev_lock and gets stuck inside.

It might work but I think it would be easy to either

1) use locked version everywhere

or

2) use the unlocked version everywhere

Mix using locked and unlocked may cause the code hard to be maintained

Thanks

>
>
> >
> >> so it seems like we will have race condition before 413f0271f3966 ("net:
> >> protect NAPI enablement with netdev_lock()").
> >>
> >> Thanks,
> >> Quang Minh.
> >>
> > Thanks
> >
>
> Thanks,
> Quang Minh.
>