[PATCH 3/4] [WTF] avoid qemu_del_nic() in xen_netdev_unrealize() on shutdown

David Woodhouse posted 4 patches 2 years, 3 months ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Stefano Stabellini <sstabellini@kernel.org>, Anthony Perard <anthony.perard@citrix.com>, Paul Durrant <paul@xen.org>, Jason Wang <jasowang@redhat.com>
Posted by David Woodhouse 2 years, 3 months ago
From: David Woodhouse <dwmw@amazon.co.uk>

When QEMU is exiting, qemu_cleanup() calls net_cleanup(), which deletes
the NIC from underneath the xen-net-device. When xen_netdev_unrealize()
is later called via the xenbus exit notifier, it calls qemu_del_nic() on
the already-deleted NIC and crashes.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/net/xen_nic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 84914c329c..8d25fb3101 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -25,6 +25,8 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/error.h"
 
+#include "sysemu/runstate.h"
+
 #include <sys/socket.h>
 #include <sys/ioctl.h>
 #include <sys/wait.h>
@@ -530,7 +532,11 @@ static void xen_netdev_unrealize(XenDevice *xendev)
     /* Disconnect from the frontend in case this has not already happened */
     xen_netdev_disconnect(xendev, NULL);
 
-    if (netdev->nic) {
+    /*
+     * WTF? In RUN_STATE_SHUTDOWN, qemu_cleanup()→net_cleanup() already deleted
+     * our NIC from underneath us!
+     */
+    if (netdev->nic && !runstate_check(RUN_STATE_SHUTDOWN)) {
         qemu_del_nic(netdev->nic);
     }
 }
-- 
2.40.1
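
The ordering problem described in the commit message can be modelled in a few lines of C. This is only a sketch under stated assumptions: `Nic`, `Device`, `del_nic()`, `global_cleanup()`, `device_unrealize()` and `simulate_shutdown()` are hypothetical stand-ins for illustration, not QEMU's real types or functions, and a counter replaces the crash so the effect of the guard can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
typedef struct Nic { bool alive; } Nic;
typedef struct Device { Nic *nic; } Device;

static bool shutting_down;  /* stands in for runstate_check(RUN_STATE_SHUTDOWN) */
static int double_deletes;  /* counted here; the real code would crash instead */

/* Stands in for qemu_del_nic(): deleting twice is the bug being modelled. */
static void del_nic(Nic *nic)
{
    if (!nic->alive) {
        double_deletes++;
        return;
    }
    nic->alive = false;
}

/* Stands in for net_cleanup(): deletes the NIC without telling its owner. */
static void global_cleanup(Nic *nic)
{
    shutting_down = true;
    del_nic(nic);
}

/* Stands in for xen_netdev_unrealize(); 'guarded' selects the patched check. */
static void device_unrealize(Device *dev, bool guarded)
{
    if (dev->nic && !(guarded && shutting_down)) {
        del_nic(dev->nic);
    }
}

/* Runs the shutdown sequence and returns how many double deletes occurred. */
static int simulate_shutdown(bool guarded)
{
    Nic nic = { .alive = true };
    Device dev = { .nic = &nic };

    shutting_down = false;
    double_deletes = 0;
    global_cleanup(&nic);            /* runs first on exit */
    device_unrealize(&dev, guarded); /* the exit notifier runs later */
    return double_deletes;
}
```

In this model, `simulate_shutdown(false)` reports one double delete and `simulate_shutdown(true)` reports none. The device still holds a stale pointer in both cases; the guard only avoids acting on it.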


Re: [PATCH 3/4] [WTF] avoid qemu_del_nic() in xen_netdev_unrealize() on shutdown
Posted by David Woodhouse 2 years, 3 months ago
On Tue, 2023-10-17 at 19:25 +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When QEMU is exiting, qemu_cleanup() calls net_cleanup(), which deletes
> the NIC from underneath the xen-net-device. When xen_netdev_unrealize()
> is later called via the xenbus exit notifier, it calls qemu_del_nic() on
> the already-deleted NIC and crashes.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  hw/net/xen_nic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
> index 84914c329c..8d25fb3101 100644
> --- a/hw/net/xen_nic.c
> +++ b/hw/net/xen_nic.c
> @@ -25,6 +25,8 @@
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/error.h"
>  
> +#include "sysemu/runstate.h"
> +
>  #include <sys/socket.h>
>  #include <sys/ioctl.h>
>  #include <sys/wait.h>
> @@ -530,7 +532,11 @@ static void xen_netdev_unrealize(XenDevice *xendev)
>      /* Disconnect from the frontend in case this has not already happened */
>      xen_netdev_disconnect(xendev, NULL);
>  
> -    if (netdev->nic) {
> +    /*
> +     * WTF? In RUN_STATE_SHUTDOWN, qemu_cleanup()→net_cleanup() already deleted
> +     * our NIC from underneath us!
> +     */
> +    if (netdev->nic && !runstate_check(RUN_STATE_SHUTDOWN)) {
>          qemu_del_nic(netdev->nic);
>      }
>  }

I wonder if this is the better answer? There's no point in net_cleanup()
deleting the *NICs*, is there? They belong to their devices; it's the
other net clients we really want to clean up.

--- a/net/net.c
+++ b/net/net.c
@@ -1499,18 +1499,22 @@ static void net_vm_change_state_handler(void *opaque, bool running,
 
 void net_cleanup(void)
 {
-    NetClientState *nc;
+    NetClientState *nc, **p = &net_clients.tqh_first;
 
     /*cleanup colo compare module for COLO*/
     colo_compare_cleanup();
 
-    /* We may del multiple entries during qemu_del_net_client(),
-     * so QTAILQ_FOREACH_SAFE() is also not safe here.
+    /*
+     * We may del multiple entries during qemu_del_net_client(), so
+     * QTAILQ_FOREACH_SAFE() is not safe here. The only safe pointer
+     * to keep is a NET_CLIENT_DRIVER_NIC entry, as we don't want
+     * to delete those (we'd upset the devices which own them, if we
+     * did).
      */
-    while (!QTAILQ_EMPTY(&net_clients)) {
-        nc = QTAILQ_FIRST(&net_clients);
+    while (*p) {
+        nc = *p;
         if (nc->info->type == NET_CLIENT_DRIVER_NIC) {
-            qemu_del_nic(qemu_get_nic(nc));
+            p = &nc->next.tqe_next;
         } else {
             qemu_del_net_client(nc);
         }