"Use after free" issue appears in suspend once race occurs when
napi poll scheduls after `netif_device_detach` and before napi disables.
In detail: during the virtio-net suspend flow,
CPU-A sets the tx queue state to "__QUEUE_STATE_DRV_XOFF".
If a TCP connection is still active at that moment,
CPU-B may run `virtnet_poll` before NAPI is disabled,
which clears the "__QUEUE_STATE_DRV_XOFF" state of the tx queue.
This is not what the suspend path expects.
CPU-A then continues to close the driver, and the virtqueues are removed.
The sequence is as below:
--------------------------------------------------------------------------
CPU-A                                CPU-B
-----                                -----
suspend is called                    a TCP connection over virtio-net
                                     is still active
virtnet_freeze
|- virtnet_freeze_down
|  |- netif_device_detach
|  |  |- netif_tx_stop_all_queues
|  |  |  |- netif_tx_stop_queue
|  |  |  |  |- set_bit
|  |  |  |       (__QUEUE_STATE_DRV_XOFF,...)
|  |                                 softirq raised
|  |                                 |- net_rx_action
|  |                                    |- napi_poll
|  |                                       |- virtnet_poll
|  |                                          |- virtnet_poll_cleantx
|  |                                             |- netif_tx_wake_queue
|  |                                                |- test_and_clear_bit
|  |                                                     (__QUEUE_STATE_DRV_XOFF,...)
|  |- virtnet_close
|  |  |- virtnet_disable_queue_pair
|  |  |  |- virtnet_napi_tx_disable
|- remove_vq_common
--------------------------------------------------------------------------
Later, when the TCP delayed-ACK timer fires, a CPU takes the softirq
and the timer handler `tcp_delack_timer_handler` is called,
which eventually calls `start_xmit` in the virtio-net driver.
The access to the already-removed tx virtqueue then causes a panic.
The root cause of this issue is that tx NAPI
is not disabled before `netif_tx_stop_queue`:
once `virtnet_poll` is scheduled in this window,
the tx queue state will be cleared again.
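To make the race concrete, below is a minimal sketch of the two netdev
helpers involved. This is paraphrased from include/linux/netdevice.h and
is not the exact upstream code; qdisc and RCU details are omitted:

/* Minimal sketch, paraphrased; not the exact upstream implementation. */
static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
{
	/* Suspend path (CPU-A): mark the tx queue as stopped. */
	set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}

static inline void netif_tx_wake_queue(struct netdev_queue *dev_queue)
{
	/* Poll path (CPU-B): clearing the bit undoes the stop above,
	 * so a later start_xmit may touch a virtqueue that is about
	 * to be (or already has been) removed.
	 */
	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state))
		__netif_schedule(dev_queue->qdisc);
}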
To solve this issue, reorder `virtnet_freeze_down` so that
`virtnet_close` is called before `netif_device_detach`.
Co-developed-by: Ying Xu <ying123.xu@samsung.com>
Signed-off-by: Ying Xu <ying123.xu@samsung.com>
Signed-off-by: Junnan Wu <junnan01.wu@samsung.com>
---
drivers/net/virtio_net.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d14e6d602273..975bdc5dab84 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5758,14 +5758,15 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	disable_rx_mode_work(vi);
 	flush_work(&vi->rx_mode_work);
 
-	netif_tx_lock_bh(vi->dev);
-	netif_device_detach(vi->dev);
-	netif_tx_unlock_bh(vi->dev);
 	if (netif_running(vi->dev)) {
 		rtnl_lock();
 		virtnet_close(vi->dev);
 		rtnl_unlock();
 	}
+
+	netif_tx_lock_bh(vi->dev);
+	netif_device_detach(vi->dev);
+	netif_tx_unlock_bh(vi->dev);
 }
 
 static int init_vqs(struct virtnet_info *vi);
--
2.34.1
On Tue, 12 Aug 2025 17:08:17 +0800 Junnan Wu wrote:
> A "use-after-free" issue appears during suspend when a race occurs:
> a NAPI poll is scheduled after `netif_device_detach` and before NAPI is disabled.

Sounds like a fix people may want to backport. Could you repost with
an appropriate Fixes tag added, pointing to the earliest commit where
the problem can be observed?
--
pw-bot: cr
On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote:
> Sounds like a fix people may want to backport. Could you repost with
> an appropriate Fixes tag added, pointing to the earliest commit where
> the problem can be observed?

This issue is caused by commit 7b0411ef4aa69c9256d6a2c289d0a2b320414633.
After that patch, `virtnet_poll` invokes `virtnet_poll_cleantx`,
which will wake up the tx queue and clear the queue state.
If you agree with it, I will repost with this Fixes tag later.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
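For reference, below is a simplified sketch of the tx-cleaning path that
the commit named above added to the rx poll routine. It follows the
upstream names of that era but omits the napi-weight test, XDP checks,
locking-failure handling, and statistics; it is not the exact driver code:

/* Simplified sketch of virtnet_poll_cleantx() as introduced by
 * 7b0411ef4aa6; several checks and accounting steps are omitted.
 */
static void virtnet_poll_cleantx(struct receive_queue *rq)
{
	struct virtnet_info *vi = rq->vq->vdev->priv;
	unsigned int index = vq2rxq(rq->vq);
	struct send_queue *sq = &vi->sq[index];
	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);

	if (__netif_tx_trylock(txq)) {
		/* Reclaim tx buffers that the device has completed. */
		free_old_xmit_skbs(sq);

		/* If enough descriptors are free, restart the tx queue.
		 * During suspend this is exactly what clears the
		 * __QUEUE_STATE_DRV_XOFF bit that netif_device_detach()
		 * has just set.
		 */
		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
			netif_tx_wake_queue(txq);

		__netif_tx_unlock(txq);
	}
}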
On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote:
>
> On Wed, 13 Aug 2025 17:23:07 -0700 Jakub Kicinski wrote:
> > Sounds like a fix people may want to backport. Could you repost with
> > an appropriate Fixes tag added, pointing to the earliest commit where
> > the problem can be observed?
>
> This issue is caused by commit 7b0411ef4aa69c9256d6a2c289d0a2b320414633.
> After that patch, `virtnet_poll` invokes `virtnet_poll_cleantx`,
> which will wake up the tx queue and clear the queue state.
> If you agree with it, I will repost with this Fixes tag later.
>
> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")

Could you please explain why it is specific to RX NAPI but not TX?

Thanks
On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote:
> On Thu, Aug 14, 2025 at 10:36 AM Junnan Wu <junnan01.wu@samsung.com> wrote:
> > [...]
> > This issue is caused by commit 7b0411ef4aa69c9256d6a2c289d0a2b320414633.
> > After that patch, `virtnet_poll` invokes `virtnet_poll_cleantx`,
> > which will wake up the tx queue and clear the queue state.
> > If you agree with it, I will repost with this Fixes tag later.
> >
> > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
>
> Could you please explain why it is specific to RX NAPI but not TX?
>
> Thanks

This issue appears in the suspend flow: if a TCP connection in the host VM
is still sending packets before driver suspend completes, it will trigger
an RX NAPI schedule, and the "use after free" finally happens when the TCP
ACK timer fires.

And in the suspend flow, packet transmission has already stopped in the
guest VM, therefore TX NAPI will not be scheduled.
On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote:
>
> On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote:
> > [...]
> > Could you please explain why it is specific to RX NAPI but not TX?
> >
> > Thanks
>
> This issue appears in the suspend flow: if a TCP connection in the host VM
> is still sending packets before driver suspend completes, it will trigger
> an RX NAPI schedule, and the "use after free" finally happens when the TCP
> ACK timer fires.
>
> And in the suspend flow, packet transmission has already stopped in the
> guest VM, therefore TX NAPI will not be scheduled.

I basically mean: who guarantees that TX NAPI is not scheduled?

Thanks
Sorry, I basically mean that the TX NAPI triggered by userspace will not be
scheduled during suspend; anything else cannot be guaranteed, such as
unfinished packets already in the tx vq, etc.

But after this patch, once `virtnet_close` completes, both tx and rx NAPI
are disabled, which guarantees that neither will be scheduled afterwards.
The tx state will then be set to "__QUEUE_STATE_DRV_XOFF" correctly in
`netif_device_detach`.

Thanks.
On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote:
>
> Sorry, I basically mean that the TX NAPI triggered by userspace will not be
> scheduled during suspend; anything else cannot be guaranteed, such as
> unfinished packets already in the tx vq, etc.
>
> But after this patch, once `virtnet_close` completes, both tx and rx NAPI
> are disabled, which guarantees that neither will be scheduled afterwards.
> The tx state will then be set to "__QUEUE_STATE_DRV_XOFF" correctly in
> `netif_device_detach`.
>
> Thanks.

Ok, so the commit mentioned by the Fixes tag is incorrect.

Thanks
On Fri, 15 Aug 2025 13:38:21 +0800 Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Aug 15, 2025 at 10:24 AM Junnan Wu <junnan01.wu@samsung.com> wrote:
> > [...]
> > But after this patch, once `virtnet_close` completes, both tx and rx NAPI
> > are disabled, which guarantees that neither will be scheduled afterwards.
> > The tx state will then be set to "__QUEUE_STATE_DRV_XOFF" correctly in
> > `netif_device_detach`.
>
> Ok, so the commit mentioned by the Fixes tag is incorrect.

Yes, you are right. The commit in this Fixes tag is the first one I found
that adds the function `virtnet_poll_cleantx`. Actually, we are not sure
whether this issue first appears with that commit.

On our side, this issue was found by chance on kernel 5.15.

It is hard to find the exact commit that introduced this issue because
reproducing this scenario is very complex.
On Fri, 15 Aug 2025 14:06:15 +0800 Junnan Wu wrote:
> On Fri, 15 Aug 2025 13:38:21 +0800 Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> > Ok, so the commit mentioned by the Fixes tag is incorrect.
>
> Yes, you are right. The commit in this Fixes tag is the first one I found
> that adds the function `virtnet_poll_cleantx`. Actually, we are not sure
> whether this issue first appears with that commit.
>
> On our side, this issue was found by chance on kernel 5.15.
>
> It is hard to find the exact commit that introduced this issue because
> reproducing this scenario is very complex.

I think the problem needs to be more clearly understood, and then it will
be easier to find the Fixes tag. On the face of it, the patch makes it
look like close() doesn't reliably stop the device, which is highly odd.
On Fri, 15 Aug 2025 08:55:03 -0700 Jakub Kicinski wrote:
> [...]
> I think the problem needs to be more clearly understood, and then it will
> be easier to find the Fixes tag. On the face of it, the patch makes it
> look like close() doesn't reliably stop the device, which is highly odd.

Yes, you are right. It is really strange that `close()` acts like that,
because the current order has worked for a long time. But the panic call
stack in our environment shows that `virtnet_close` and
`netif_device_detach` need a correct execution order. It will take more
time to find the Fixes tag; is a Fixes tag required for the patch to be
merged?

By the way, you mentioned that "the problem needs to be more clearly
understood". Did you mean the description and sequence in the commit
message are not easy to understand? Do you have suggestions about this?

Thanks!
On Mon, 18 Aug 2025 09:15:22 +0800 Junnan Wu wrote:
> Yes, you are right. It is really strange that `close()` acts like that,
> because the current order has worked for a long time. But the panic call
> stack in our environment shows that `virtnet_close` and
> `netif_device_detach` need a correct execution order. It will take more
> time to find the Fixes tag; is a Fixes tag required for the patch to be
> merged?
>
> By the way, you mentioned that "the problem needs to be more clearly
> understood". Did you mean the description and sequence in the commit
> message are not easy to understand? Do you have suggestions about this?

Perhaps Jason gets your explanation and will correct me, but to me it
seems like the fix is based on trial and error rather than a clear
understanding of the problem. If you understood the problem clearly,
you should be able to find the Fixes tag without a problem..
On Mon, Aug 18, 2025 at 11:39 PM Jakub Kicinski <kuba@kernel.org> wrote:
> [...]
> Perhaps Jason gets your explanation and will correct me, but to me it
> seems like the fix is based on trial and error rather than a clear
> understanding of the problem. If you understood the problem clearly,
> you should be able to find the Fixes tag without a problem..

+1

The code looks fine but the Fixes tag needs to be correct.

Thanks
On Tue, 19 Aug 2025 10:48:37 +0800 Jason Wang wrote:
> [...]
> +1
>
> The code looks fine but the Fixes tag needs to be correct.
>
> Thanks

Well, I will do some work to find the correct Fixes tag. Once there is
progress, I will let you know.
On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote:
>
> On Thu, 14 Aug 2025 12:01:18 +0800 Jason Wang wrote:
> > [...]
> > Could you please explain why it is specific to RX NAPI but not TX?
> >
> > Thanks
>
> This issue appears in the suspend flow: if a TCP connection in the host VM
> is still sending packets before driver suspend completes, it will trigger
> an RX NAPI schedule, and the "use after free" finally happens when the TCP
> ACK timer fires.
>
> And in the suspend flow, packet transmission has already stopped in the
> guest VM,

The TX interrupt and NAPI are not disabled yet. Or is there anything I
miss here?

> therefore TX NAPI will not be scheduled.

Thanks
On Thu, 14 Aug 2025 14:49:06 +0800 Jason Wang wrote:
> On Thu, Aug 14, 2025 at 2:44 PM Junnan Wu <junnan01.wu@samsung.com> wrote:
> > [...]
> > And in the suspend flow, packet transmission has already stopped in the
> > guest VM,
>
> The TX interrupt and NAPI are not disabled yet. Or is there anything I
> miss here?

When the system suspends, the userspace processes that use virtio-net are
frozen first, and only then does the driver suspend callback execute. So
although the TX interrupt and NAPI are not disabled at that time, TX NAPI
will not be scheduled either.