Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.
When using a Xen networking device with hubport backends, e.g.:
-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...
the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.
Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
net/hub.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/hub.c b/net/hub.c
index e3b58b1c4f8e..ee5881f6d5cb 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -34,6 +34,7 @@ typedef struct NetHubPort {
QLIST_ENTRY(NetHubPort) next;
NetHub *hub;
int id;
+ bool listed;
} NetHubPort;
struct NetHub {
@@ -129,7 +130,10 @@ static void net_hub_port_cleanup(NetClientState *nc)
{
NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
- QLIST_REMOVE(port, next);
+ if (port->listed) {
+ QLIST_REMOVE(port, next);
+ port->listed = false;
+ }
}
static NetClientInfo net_hub_port_info = {
@@ -159,8 +163,10 @@ static NetHubPort *net_hub_port_new(NetHub *hub, const char *name,
port = DO_UPCAST(NetHubPort, nc, nc);
port->id = id;
port->hub = hub;
+ port->listed = false;
QLIST_INSERT_HEAD(&hub->ports, port, next);
+ port->listed = true;
return port;
}
--
2.47.3
On Thu, Aug 21, 2025 at 10:28 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
>
> When using a Xen networking device with hubport backends, e.g.:
>
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
>
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
>
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Queued.
Thanks
On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
>
> When using a Xen networking device with hubport backends, e.g.:
>
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
>
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
>
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
But I hate it.
The lifetime of these objects is confusing, and this patch doesn't make
it nicer.
Why is it OK for the object to be taken off the list while it still
exists and is findable by other pointers? What does it *mean* for it to
be in that state? Doesn't it have a refcount? Can't it be unlisted
*and* freed only when that refcount goes to zero?
On 8/21/25 11:13 AM, David Woodhouse wrote:
> On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
>> Makes the net_hub_port_cleanup function idempotent to avoid double
>> removals by guarding its QLIST_REMOVE with a flag.
>>
>> When using a Xen networking device with hubport backends, e.g.:
>>
>> -accel kvm,xen-version=0x40011
>> -netdev hubport,...
>> -device xen-net-device,...
>>
>> the shutdown order starts with net_cleanup, which walks the list and
>> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
>> called, which eventually leads to a second net_hub_port_cleanup call,
>> resulting in a segfault.
>>
>> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
>
> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
>
> But I hate it.
>
> The lifetime of these objects is confusing, and this patch doesn't make
> it nicer.
>
> Why is it OK for the object to be taken off the list while it still
> exists and is findable by other pointers? What does it *mean* for it to
> be in that state? Doesn't it have a refcount? Can't it be unlisted
> *and* freed only when that refcount goes to zero?
>
>
I believe this "off-list but still exists" state is intentional and is
the model for NIC peer backends. IIUC, it essentially means "detached
from hub broadcast but owned by the NIC until device teardown".
The net core explicitly transfers ownership of a backend from the global
list to the NIC without freeing it, so it can be torn down later by the
device itself. See the comments in qemu_del_net_client and net_cleanup
in net/net.c.
In other words, a backend can be removed from net_clients and still
exist (owned by the NIC) until qemu_del_nic cleans it up and frees it.
There's no refcount today and lifetime is manual.
The real bug here is that hubport's cleanup is not idempotent, not
necessarily the ownership model itself.
I do agree though that the lifetime of these objects is confusing. And a
refcount model would be theoretically cleaner, but current code
explicitly relies on ownership transfer without refcounts.
---
Actually, now that I think about it, perhaps we don't need this extra
'listed' state and instead can just check 'if (port->hub)' and set it to
NULL after it's removed from the list.
On Fri, Aug 22, 2025 at 12:39 AM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 8/21/25 11:13 AM, David Woodhouse wrote:
> > On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> >> Makes the net_hub_port_cleanup function idempotent to avoid double
> >> removals by guarding its QLIST_REMOVE with a flag.
> >>
> >> When using a Xen networking device with hubport backends, e.g.:
> >>
> >> -accel kvm,xen-version=0x40011
> >> -netdev hubport,...
> >> -device xen-net-device,...
> >>
> >> the shutdown order starts with net_cleanup, which walks the list and
> >> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> >> called, which eventually leads to a second net_hub_port_cleanup call,
> >> resulting in a segfault.
> >>
> >> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> >
> > Tested-by: David Woodhouse <dwmw@amazon.co.uk>
> >
> > But I hate it.
> >
> > The lifetime of these objects is confusing, and this patch doesn't make
> > it nicer.
> >
> > Why is it OK for the object to be taken off the list while it still
> > exists and is findable by other pointers? What does it *mean* for it to
> > be in that state? Doesn't it have a refcount? Can't it be unlisted
> > *and* freed only when that refcount goes to zero?
> >
> >
>
> I believe this "off-list but still exists" state is intentional and is
> the model for NIC peer backends. IIUC, it essentially means "detached
> from hub broadcast but owned by the NIC until device teardown".
>
> The net core explicitly transfers ownership of a backend from the global
> list to the NIC without freeing it, so it can be torn down later by the
> device itself. See the comments in qemu_del_net_client and net_cleanup
> in net/net.c.
>
> In other words, a backend can be removed from net_clients and still
> exist (owned by the NIC) until qemu_del_nic cleans it up and frees it.
> There's no refcount today and lifetime is manual.
>
> The real bug here is that hubport's cleanup is not idempotent, not
> necessarily the ownership model itself.
>
> I do agree though that the lifetime of these objects is confusing. And a
> refcount model would be theoretically cleaner, but current code
> explicitly relies on ownership transfer without refcounts.
In the long term, we should consider converting networking code to QOM.
Thanks
>
> ---
>
> Actually, now that I think about it, perhaps we don't need this extra
> 'listed' state and instead can just check 'if (port->hub)' and set it to
> NULL after it's removed from the list.
>
© 2016 - 2025 Red Hat, Inc.