[PATCH] net/hub: make net_hub_port_cleanup idempotent

Jonah Palmer posted 1 patch 2 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250821142641.359132-1-jonah.palmer@oracle.com
Maintainers: Jason Wang <jasowang@redhat.com>
net/hub.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
[PATCH] net/hub: make net_hub_port_cleanup idempotent
Posted by Jonah Palmer 2 months, 3 weeks ago
Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.

When using a Xen networking device with hubport backends, e.g.:

-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...

the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.

Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 net/hub.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/hub.c b/net/hub.c
index e3b58b1c4f8e..ee5881f6d5cb 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -34,6 +34,7 @@ typedef struct NetHubPort {
     QLIST_ENTRY(NetHubPort) next;
     NetHub *hub;
     int id;
+    bool listed;
 } NetHubPort;
 
 struct NetHub {
@@ -129,7 +130,10 @@ static void net_hub_port_cleanup(NetClientState *nc)
 {
     NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
 
-    QLIST_REMOVE(port, next);
+    if (port->listed) {
+        QLIST_REMOVE(port, next);
+        port->listed = false;
+    }
 }
 
 static NetClientInfo net_hub_port_info = {
@@ -159,8 +163,10 @@ static NetHubPort *net_hub_port_new(NetHub *hub, const char *name,
     port = DO_UPCAST(NetHubPort, nc, nc);
     port->id = id;
     port->hub = hub;
+    port->listed = false;
 
     QLIST_INSERT_HEAD(&hub->ports, port, next);
+    port->listed = true;
 
     return port;
 }
-- 
2.47.3
Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
Posted by Jason Wang 2 months, 1 week ago
On Thu, Aug 21, 2025 at 10:28 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
>
> When using a Xen networking device with hubport backends, e.g.:
>
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
>
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
>
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>

Queued.

Thanks
Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
Posted by David Woodhouse 2 months, 3 weeks ago
On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
> 
> When using a Xen networking device with hubport backends, e.g.:
> 
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
> 
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
> 
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")

Tested-by: David Woodhouse <dwmw@amazon.co.uk>

But I hate it.

The lifetime of these objects is confusing, and this patch doesn't make
it nicer.

Why is it OK for the object to be taken off the list while it still
exists and is findable by other pointers? What does it *mean* for it to
be in that state? Doesn't it have a refcount? Can't it be unlisted
*and* freed only when that refcount goes to zero?


Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
Posted by Jonah Palmer 2 months, 3 weeks ago

On 8/21/25 11:13 AM, David Woodhouse wrote:
> On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
>> Makes the net_hub_port_cleanup function idempotent to avoid double
>> removals by guarding its QLIST_REMOVE with a flag.
>>
>> When using a Xen networking device with hubport backends, e.g.:
>>
>> -accel kvm,xen-version=0x40011
>> -netdev hubport,...
>> -device xen-net-device,...
>>
>> the shutdown order starts with net_cleanup, which walks the list and
>> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
>> called, which eventually leads to a second net_hub_port_cleanup call,
>> resulting in a segfault.
>>
>> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> 
> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> But I hate it.
> 
> The lifetime of these objects is confusing, and this patch doesn't make
> it nicer.
> 
> Why is it OK for the object to be taken off the list while it still
> exists and is findable by other pointers? What does it *mean* for it to
> be in that state? Doesn't it have a refcount? Can't it be unlisted
> *and* freed only when that refcount goes to zero?
> 
> 

I believe this "off-list but still exists" state is intentional and is 
the model for NIC peer backends. IIUC, it essentially means "detached 
from hub broadcast but owned by the NIC until device teardown".

The net core explicitly transfers ownership of a backend from the global 
list to the NIC without freeing it, so it can be torn down later by the 
device itself. See the comments in qemu_del_net_client and net_cleanup 
in net/net.c.

In other words, a backend can be removed from net_clients and still 
exist (owned by the NIC) until qemu_del_nic cleans it up and frees it. 
There's no refcount today and lifetime is manual.

The real bug here is that hubport's cleanup is not idempotent, not 
necessarily the ownership model itself.

I do agree though that the lifetime of these objects is confusing. And a 
refcount model would be theoretically cleaner, but current code 
explicitly relies on ownership transfer without refcounts.

---

Actually, now that I think about it, perhaps we don't need this extra 
'listed' state and instead can just check 'if (port->hub)' and set it to 
NULL after it's removed from the list.
Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
Posted by Jason Wang 2 months, 1 week ago
On Fri, Aug 22, 2025 at 12:39 AM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 8/21/25 11:13 AM, David Woodhouse wrote:
> > On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> >> Makes the net_hub_port_cleanup function idempotent to avoid double
> >> removals by guarding its QLIST_REMOVE with a flag.
> >>
> >> When using a Xen networking device with hubport backends, e.g.:
> >>
> >> -accel kvm,xen-version=0x40011
> >> -netdev hubport,...
> >> -device xen-net-device,...
> >>
> >> the shutdown order starts with net_cleanup, which walks the list and
> >> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> >> called, which eventually leads to a second net_hub_port_cleanup call,
> >> resulting in a segfault.
> >>
> >> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> >
> > Tested-by: David Woodhouse <dwmw@amazon.co.uk>
> >
> > But I hate it.
> >
> > The lifetime of these objects is confusing, and this patch doesn't make
> > it nicer.
> >
> > Why is it OK for the object to be taken off the list while it still
> > exists and is findable by other pointers? What does it *mean* for it to
> > be in that state? Doesn't it have a refcount? Can't it be unlisted
> > *and* freed only when that refcount goes to zero?
> >
> >
>
> I believe this "off-list but still exists" state is intentional and is
> the model for NIC peer backends. IIUC, it essentially means "detached
> from hub broadcast but owned by the NIC until device teardown".
>
> The net core explicitly transfers ownership of a backend from the global
> list to the NIC without freeing it, so it can be torn down later by the
> device itself. See the comments in qemu_del_net_client and net_cleanup
> in net/net.c.
>
> In other words, a backend can be removed from net_clients and still
> exist (owned by the NIC) until qemu_del_nic cleans it up and frees it.
> There's no refcount today and lifetime is manual.
>
> The real bug here is that hubport's cleanup is not idempotent, not
> necessarily the ownership model itself.
>
> I do agree though that the lifetime of these objects is confusing. And a
> refcount model would be theoretically cleaner, but current code
> explicitly relies on ownership transfer without refcounts.

In the long term, we should consider converting networking code to QOM.

Thanks

>
> ---
>
> Actually, now that I think about it, perhaps we don't need this extra
> 'listed' state and instead can just check 'if (port->hub)' and set it to
> NULL after it's removed from the list.
>