[PATCH] net/net.c: Fix qemu crash when hot-pluging a vhost-net failed.

Ming Yang via posted 1 patch 1 year, 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221205080859.2216-1-yangming73@huawei.com
Maintainers: Jason Wang <jasowang@redhat.com>
net/net.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
[PATCH] net/net.c: Fix qemu crash when hot-pluging a vhost-net failed.
Posted by Ming Yang via 1 year, 5 months ago
Hot-pluging a vhost-net may cause virtual machine crash in following steps:
1. Starting a vm without net devices.
2. Hot-pluging 70 memory devices.
3. Hot-pluging a vhost-net device.

The reason is : if hotplug a vhost-net failed, the nc cannot be found via function qemu_find_netdev, as
it has been cleaned up through function qemu_cleanup_net_client. Which leads to the result
that assert(nc) failed, then qemu crashed.

While, the root reason is that, in commit 46d4d36d0bf2 if not both has_vhostforce and vhostforce flags
are true, the errp would not be set. Then net_init_tap would not return a negative value, fallowed by founding nc
and assert nc.

In this patch, asserting nc is replaced with setting an error message.

Fixes: 46d4d36d0bf2("tap: setting error appropriately when calling net_init_tap_one()")
Signed-off-by: Ming Yang <yangming73@huawei.com>
Signed-off-by: Liang Zhang <zhangliang5@huawei.com>
---
 net/net.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/net.c b/net/net.c
index 840ad9dca5..1d1d7e54c4 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1103,7 +1103,16 @@ static int net_client_init1(const Netdev *netdev, bool is_netdev, Error **errp)
 
     if (is_netdev) {
         nc = qemu_find_netdev(netdev->id);
-        assert(nc);
+        /*
+         * If the tap of hotpluged net device do not has both has_vhostforce flag and vhostforce flags,
+         * when error occurs, the error messags will be report but not set to errp. Thus net_client_init_fun
+         * will not return a negatave value. Therefore the value of nc might be NULL. To make qemu robust,
+         * it is better to judge if nc is NULL.
+         */
+        if (!nc) {
+            error_setg(errp, "Device '%s' could not be initialized", netdev->id);
+            return -1;
+        }
         nc->is_netdev = true;
     }
 
-- 
2.33.0
Re: [PATCH] net/net.c: Fix qemu crash when hot-pluging a vhost-net failed.
Posted by Peter Maydell 1 year, 5 months ago
On Mon, 5 Dec 2022 at 14:24, Ming Yang via <qemu-devel@nongnu.org> wrote:
>
> Hot-pluging a vhost-net may cause virtual machine crash in following steps:
> 1. Starting a vm without net devices.
> 2. Hot-pluging 70 memory devices.
> 3. Hot-pluging a vhost-net device.
>
> The reason is : if hotplug a vhost-net failed, the nc cannot be found via function qemu_find_netdev, as
> it has been cleaned up through function qemu_cleanup_net_client. Which leads to the result
> that assert(nc) failed, then qemu crashed.
>
> While, the root reason is that, in commit 46d4d36d0bf2 if not both has_vhostforce and vhostforce flags
> are true, the errp would not be set. Then net_init_tap would not return a negative value, fallowed by founding nc
> and assert nc.
>
> In this patch, asserting nc is replaced with setting an error message.
>
> Fixes: 46d4d36d0bf2("tap: setting error appropriately when calling net_init_tap_one()")
> Signed-off-by: Ming Yang <yangming73@huawei.com>
> Signed-off-by: Liang Zhang <zhangliang5@huawei.com>
> ---
>  net/net.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/net.c b/net/net.c
> index 840ad9dca5..1d1d7e54c4 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1103,7 +1103,16 @@ static int net_client_init1(const Netdev *netdev, bool is_netdev, Error **errp)
>
>      if (is_netdev) {
>          nc = qemu_find_netdev(netdev->id);
> -        assert(nc);
> +        /*
> +         * If the tap of hotpluged net device do not has both has_vhostforce flag and vhostforce flags,
> +         * when error occurs, the error messags will be report but not set to errp. Thus net_client_init_fun
> +         * will not return a negatave value. Therefore the value of nc might be NULL. To make qemu robust,
> +         * it is better to judge if nc is NULL.
> +         */
> +        if (!nc) {
> +            error_setg(errp, "Device '%s' could not be initialized", netdev->id);
> +            return -1;
> +        }
>          nc->is_netdev = true;
>      }

This doesn't look like the right fix to me. If the net_client_init_fun
doesn't correctly initialize the netdev, it should report that error
back to the caller. We should make the tap init function correctly
return an error in this error situation, not work around it in
the caller. The assert() is correct, because it is detecting a bug
elsewhere in QEMU that we need to fix.

thanks
-- PMM