[Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize

Fam Zheng posted 1 patch 6 years, 11 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170516072414.19025-1-famz@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
hw/virtio/virtio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Fam Zheng 6 years, 11 months ago
This is noticed while working on RHBZ 1449031, and fixes the reported
crash which happens when plugging back a virtio-scsi device after
unplugging it.

The root cause of the crash is not obvious here, but the change
regardlessly makes sense so it's proposed here: the listener was
registered in .realize(), so do the cleanup in the matching .unrealize()
rather than the .finalize() callback.

The difference this makes is that, due to some other references to the
memory region that is owned here, .finalize() is not called when unplug.
(Note that memory_region_ref() does object_ref() on the owner instead of
the MemoryRegion itself.) This is something fishy, and is being
investigated independently.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/virtio/virtio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 03592c5..12604d6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2515,6 +2515,7 @@ static void virtio_device_unrealize(DeviceState *dev, Error **errp)
         }
     }
 
+    memory_listener_unregister(&vdev->listener);
     g_free(vdev->bus_name);
     vdev->bus_name = NULL;
 }
@@ -2539,7 +2540,6 @@ static void virtio_device_instance_finalize(Object *obj)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(obj);
 
-    memory_listener_unregister(&vdev->listener);
     virtio_device_free_virtqueues(vdev);
 
     g_free(vdev->config);
-- 
2.9.3


Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Fam Zheng 6 years, 11 months ago
On Tue, 05/16 15:24, Fam Zheng wrote:
> The root cause of the crash is not obvious here, but the change
> regardlessly makes sense so it's proposed here: the listener was
> registered in .realize(), so do the cleanup in the matching .unrealize()
> rather than the .finalize() callback.

Actually it seem calling memory_listener_unregister in .instance_finalize is not
safe because it can be in the RCU thread.  This race is what caused the
corruption of the listener lists.

So I think this patch is doing the right thing.

Fam

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Paolo Bonzini 6 years, 11 months ago

On 16/05/2017 10:07, Fam Zheng wrote:
> On Tue, 05/16 15:24, Fam Zheng wrote:
>> The root cause of the crash is not obvious here, but the change
>> regardlessly makes sense so it's proposed here: the listener was
>> registered in .realize(), so do the cleanup in the matching .unrealize()
>> rather than the .finalize() callback.

This is not entirely true.

Unrealize is the point where the device doesn't get any more requests.
Instance finalize is the point where there are no references anymore.
If a pending request has a reference to X, instance finalize is the
right place to free X.

However, in this case using .unrealize() should be fine.

> Actually it seem calling memory_listener_unregister in .instance_finalize is not
> safe because it can be in the RCU thread.  This race is what caused the
> corruption of the listener lists.

RCU callbacks are called with BQL held, so that shouldn't be it.  But
the patch should be okay anyway.

Paolo

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Fam Zheng 6 years, 11 months ago
On Tue, 05/16 11:23, Paolo Bonzini wrote:
> 
> 
> On 16/05/2017 10:07, Fam Zheng wrote:
> > On Tue, 05/16 15:24, Fam Zheng wrote:
> >> The root cause of the crash is not obvious here, but the change
> >> regardlessly makes sense so it's proposed here: the listener was
> >> registered in .realize(), so do the cleanup in the matching .unrealize()
> >> rather than the .finalize() callback.
> 
> This is not entirely true.
> 
> Unrealize is the point where the device doesn't get any more requests.
> Instance finalize is the point where there are no references anymore.
> If a pending request has a reference to X, instance finalize is the
> right place to free X.
> 
> However, in this case using .unrealize() should be fine.
> 
> > Actually it seem calling memory_listener_unregister in .instance_finalize is not
> > safe because it can be in the RCU thread.  This race is what caused the
> > corruption of the listener lists.
> 
> RCU callbacks are called with BQL held, so that shouldn't be it.  But
> the patch should be okay anyway.

You are right. Having had another look, I think it's because of this:
VirtIODevice is an embeded member of VirtIOSCSIPCI therefore it is never
"finalized" through QOM reference directly.  Am I right?

Fam

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Paolo Bonzini 6 years, 11 months ago

On 16/05/2017 14:25, Fam Zheng wrote:
> You are right. Having had another look, I think it's because of this:
> VirtIODevice is an embeded member of VirtIOSCSIPCI therefore it is never
> "finalized" through QOM reference directly.  Am I right?

What I would expect is:

virtio_instance_init_common:
- create the VirtIODevice with refcount 1
- create a child property for the VirtIODevice (refcount is now 2)
- unref the VirtIODevice (refcount is again 1)

virtio_pci_realize:
- virtio_pci_bus_new creates the virtio bus
- the virtio bus is added as a child property

virtio_scsi_pci_realize:
- qdev_set_parent_bus links the device to the bus (bus and VirtIODevice
refcounts are now 3)
- the VirtIODevice is realized

...
at hot-unplug time:
- the device is unrealized
- the bus is unparented (calling bus_unparent)
  - the device is unparented (calling device_unparent)
    - bus_remove_child is called (bus and VirtIODevice refcounts are now 1)
    - the VirtIODevice child property is deleted by object_unparent and
the VirtIODevice is finalized
  - the bus child property is deleted by object_unparent and the
VirtIODevice is finalized

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Fam Zheng 6 years, 11 months ago
On Tue, 05/16 14:44, Paolo Bonzini wrote:
> 
> 
> On 16/05/2017 14:25, Fam Zheng wrote:
> > You are right. Having had another look, I think it's because of this:
> > VirtIODevice is an embeded member of VirtIOSCSIPCI therefore it is never
> > "finalized" through QOM reference directly.  Am I right?
> 
> What I would expect is:
> 
> virtio_instance_init_common:
> - create the VirtIODevice with refcount 1
> - create a child property for the VirtIODevice (refcount is now 2)
> - unref the VirtIODevice (refcount is again 1)
> 
> virtio_pci_realize:
> - virtio_pci_bus_new creates the virtio bus
> - the virtio bus is added as a child property
> 
> virtio_scsi_pci_realize:
> - qdev_set_parent_bus links the device to the bus (bus and VirtIODevice
> refcounts are now 3)
> - the VirtIODevice is realized
> 
> ...
> at hot-unplug time:
> - the device is unrealized
> - the bus is unparented (calling bus_unparent)
>   - the device is unparented (calling device_unparent)
>     - bus_remove_child is called (bus and VirtIODevice refcounts are now 1)
>     - the VirtIODevice child property is deleted by object_unparent and
> the VirtIODevice is finalized
>   - the bus child property is deleted by object_unparent and the
> VirtIODevice is finalized

Sorry I don't understand. From my debugging, VirtIODevice is not finalized,
because it is embeded as VirtIOSCSIPCI.vdev.parent_obj.parent_obj, in non-zero
offset.

As I understand it, it's the VirtIOSCSIPCI instance that is being finalized, but
not VirtIODevice. Because the object_unparent is actually called on the
containing object:

Thread 4 "qemu-system-x86" hit Breakpoint 1, device_unparent (obj=0x559449824f40) at /stor/work/qemu/hw/core/qdev.c:1078
1078	    DeviceState *dev = DEVICE(obj);
(gdb) bt
#0  0x00005594455f11df in device_unparent (obj=0x559449824f40) at /stor/work/qemu/hw/core/qdev.c:1078
#1  0x00005594457be582 in object_finalize_child_property (obj=0x559447f0d320, name=0x5594498306e0 "scsi1", opaque=0x559449824f40) at /stor/work/qemu/qom/object.c:1369
#2  0x00005594457bc34a in object_property_del_child (obj=0x559447f0d320, child=0x559449824f40, errp=0x0) at /stor/work/qemu/qom/object.c:428
#3  0x00005594457bc42a in object_unparent (obj=0x559449824f40) at /stor/work/qemu/qom/object.c:447
#4  0x00005594455acf19 in acpi_pcihp_eject_slot (s=0x55944967c440, bsel=0, slots=16) at /stor/work/qemu/hw/acpi/pcihp.c:138
#5  0x00005594455ad542 in pci_write (opaque=0x55944967c440, addr=8, data=16, size=4) at /stor/work/qemu/hw/acpi/pcihp.c:272
#6  0x0000559445437667 in memory_region_write_accessor (mr=0x55944967d050, addr=8, value=0x7fdfa7569838, size=4, shift=0, mask=4294967295, attrs=...)
    at /stor/work/qemu/memory.c:526
#7  0x000055944543787f in access_with_adjusted_size (addr=8, value=0x7fdfa7569838, size=4, access_size_min=1, access_size_max=4, access=
    0x55944543757d <memory_region_write_accessor>, mr=0x55944967d050, attrs=...) at /stor/work/qemu/memory.c:592
#8  0x0000559445439fdb in memory_region_dispatch_write (mr=0x55944967d050, addr=8, data=16, size=4, attrs=...) at /stor/work/qemu/memory.c:1319
#9  0x00005594453dfa77 in address_space_write_continue (as=0x559445f1fce0 <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4, addr1=8, l=4, mr=0x55944967d050) at /stor/work/qemu/exec.c:2822
#10 0x00005594453dfc31 in address_space_write (as=0x559445f1fce0 <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4) at /stor/work/qemu/exec.c:2879
#11 0x00005594453dffbd in address_space_rw (as=0x559445f1fce0 <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4, is_write=true)
    at /stor/work/qemu/exec.c:2981
#12 0x0000559445433797 in kvm_handle_io (port=44552, attrs=..., data=0x7fdfb77d2000, direction=1, size=4, count=1) at /stor/work/qemu/kvm-all.c:1803
#13 0x0000559445433e87 in kvm_cpu_exec (cpu=0x559447f0d560) at /stor/work/qemu/kvm-all.c:2032
#14 0x0000559445419cc6 in qemu_kvm_cpu_thread_fn (arg=0x559447f0d560) at /stor/work/qemu/cpus.c:1118
#15 0x00007fdfb56ba6ca in start_thread () at /lib64/libpthread.so.0
#16 0x00007fdfb1382f7f in clone () at /lib64/libc.so.6
(gdb) p *obj.class.type
$1 = {name = 0x559447e57a20 "virtio-scsi-pci", class_size = 280, instance_size = 34688, class_init = 0x55944573573e <virtio_scsi_pci_class_init>, class_base_init = 0x0, 
  class_finalize = 0x0, class_data = 0x0, instance_init = 0x559445735837 <virtio_scsi_pci_instance_init>, instance_post_init = 0x0, instance_finalize = 0x0, 
  abstract = false, parent = 0x559447e57a40 "virtio-pci", parent_type = 0x559447e57520, class = 0x559447e7add0, num_interfaces = 0, interfaces = {{
      typename = 0x0} <repeats 32 times>}}

Am I missing something?

Fam

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Paolo Bonzini 6 years, 11 months ago
> On Tue, 05/16 14:44, Paolo Bonzini wrote:
> > On 16/05/2017 14:25, Fam Zheng wrote:
> > > You are right. Having had another look, I think it's because of this:
> > > VirtIODevice is an embeded member of VirtIOSCSIPCI therefore it is never
> > > "finalized" through QOM reference directly.  Am I right?
> > 
> > What I would expect is:
> > 
> > virtio_instance_init_common:
> > - create the VirtIODevice with refcount 1
> > - create a child property for the VirtIODevice (refcount is now 2)
> > - unref the VirtIODevice (refcount is again 1)
> > 
> > virtio_pci_realize:
> > - virtio_pci_bus_new creates the virtio bus
> > - the virtio bus is added as a child property
> > 
> > virtio_scsi_pci_realize:
> > - qdev_set_parent_bus links the device to the bus (bus and VirtIODevice
> > refcounts are now 3)
> > - the VirtIODevice is realized
> > 
> > ...
> > at hot-unplug time:
> > - the device is unrealized
> > - the bus is unparented (calling bus_unparent)
> >   - the device is unparented (calling device_unparent)
> >     - bus_remove_child is called (bus and VirtIODevice refcounts are now 1)
> >     - the VirtIODevice child property is deleted by object_unparent and
> > the VirtIODevice is finalized
> >   - the bus child property is deleted by object_unparent and the
> > VirtIODevice is finalized
> 
> Sorry I don't understand. From my debugging, VirtIODevice is not finalized,
> because it is embeded as VirtIOSCSIPCI.vdev.parent_obj.parent_obj, in
> non-zero offset.

It is correct that it's embedded.  But it is added as a child property,
and when the child property is deleted the refcount should go to zero and
the VirtIODevice should be deleted too.

The child property is deleted when bus_unparent calls object_unparent:

    while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
        DeviceState *dev = kid->child;
        object_unparent(OBJECT(dev));
    }

and in turn bus_unparent is called by the VirtIOSCSIPCI's unparent
callback (device_unparent):

    while (dev->num_child_bus) {
        bus = QLIST_FIRST(&dev->child_bus);
        object_unparent(OBJECT(bus));
    }

Thanks,

Paolo


> 
> As I understand it, it's the VirtIOSCSIPCI instance that is being finalized,
> but
> not VirtIODevice. Because the object_unparent is actually called on the
> containing object:
> 
> Thread 4 "qemu-system-x86" hit Breakpoint 1, device_unparent
> (obj=0x559449824f40) at /stor/work/qemu/hw/core/qdev.c:1078
> 1078	    DeviceState *dev = DEVICE(obj);
> (gdb) bt
> #0  0x00005594455f11df in device_unparent (obj=0x559449824f40) at
> /stor/work/qemu/hw/core/qdev.c:1078
> #1  0x00005594457be582 in object_finalize_child_property (obj=0x559447f0d320,
> name=0x5594498306e0 "scsi1", opaque=0x559449824f40) at
> /stor/work/qemu/qom/object.c:1369
> #2  0x00005594457bc34a in object_property_del_child (obj=0x559447f0d320,
> child=0x559449824f40, errp=0x0) at /stor/work/qemu/qom/object.c:428
> #3  0x00005594457bc42a in object_unparent (obj=0x559449824f40) at
> /stor/work/qemu/qom/object.c:447
> #4  0x00005594455acf19 in acpi_pcihp_eject_slot (s=0x55944967c440, bsel=0,
> slots=16) at /stor/work/qemu/hw/acpi/pcihp.c:138
> #5  0x00005594455ad542 in pci_write (opaque=0x55944967c440, addr=8, data=16,
> size=4) at /stor/work/qemu/hw/acpi/pcihp.c:272
> #6  0x0000559445437667 in memory_region_write_accessor (mr=0x55944967d050,
> addr=8, value=0x7fdfa7569838, size=4, shift=0, mask=4294967295, attrs=...)
>     at /stor/work/qemu/memory.c:526
> #7  0x000055944543787f in access_with_adjusted_size (addr=8,
> value=0x7fdfa7569838, size=4, access_size_min=1, access_size_max=4, access=
>     0x55944543757d <memory_region_write_accessor>, mr=0x55944967d050,
>     attrs=...) at /stor/work/qemu/memory.c:592
> #8  0x0000559445439fdb in memory_region_dispatch_write (mr=0x55944967d050,
> addr=8, data=16, size=4, attrs=...) at /stor/work/qemu/memory.c:1319
> #9  0x00005594453dfa77 in address_space_write_continue (as=0x559445f1fce0
> <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4,
> addr1=8, l=4, mr=0x55944967d050) at /stor/work/qemu/exec.c:2822
> #10 0x00005594453dfc31 in address_space_write (as=0x559445f1fce0
> <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4)
> at /stor/work/qemu/exec.c:2879
> #11 0x00005594453dffbd in address_space_rw (as=0x559445f1fce0
> <address_space_io>, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4,
> is_write=true)
>     at /stor/work/qemu/exec.c:2981
> #12 0x0000559445433797 in kvm_handle_io (port=44552, attrs=...,
> data=0x7fdfb77d2000, direction=1, size=4, count=1) at
> /stor/work/qemu/kvm-all.c:1803
> #13 0x0000559445433e87 in kvm_cpu_exec (cpu=0x559447f0d560) at
> /stor/work/qemu/kvm-all.c:2032
> #14 0x0000559445419cc6 in qemu_kvm_cpu_thread_fn (arg=0x559447f0d560) at
> /stor/work/qemu/cpus.c:1118
> #15 0x00007fdfb56ba6ca in start_thread () at /lib64/libpthread.so.0
> #16 0x00007fdfb1382f7f in clone () at /lib64/libc.so.6
> (gdb) p *obj.class.type
> $1 = {name = 0x559447e57a20 "virtio-scsi-pci", class_size = 280,
> instance_size = 34688, class_init = 0x55944573573e
> <virtio_scsi_pci_class_init>, class_base_init = 0x0,
>   class_finalize = 0x0, class_data = 0x0, instance_init = 0x559445735837
>   <virtio_scsi_pci_instance_init>, instance_post_init = 0x0,
>   instance_finalize = 0x0,
>   abstract = false, parent = 0x559447e57a40 "virtio-pci", parent_type =
>   0x559447e57520, class = 0x559447e7add0, num_interfaces = 0, interfaces =
>   {{
>       typename = 0x0} <repeats 32 times>}}
> 
> Am I missing something?
> 
> Fam
> 

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Fam Zheng 6 years, 11 months ago
On Wed, 05/17 02:58, Paolo Bonzini wrote:
> The child property is deleted when bus_unparent calls object_unparent:
> 
>     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
>         DeviceState *dev = kid->child;
>         object_unparent(OBJECT(dev));
>     }
> 
> and in turn bus_unparent is called by the VirtIOSCSIPCI's unparent
> callback (device_unparent):
> 
>     while (dev->num_child_bus) {
>         bus = QLIST_FIRST(&dev->child_bus);
>         object_unparent(OBJECT(bus));
>     }

OK, sorry for being dumb, these are way over my head. Let me try again:

I count three references before unplug:

a.1) object_property_add_child in virtio_instance_init_common as virtio-backend
a.2) qdev_set_parent_bus in virtio_scsi_pci_realize by virtio-pci-bus
a.3) qbus_set_hotplug_handler in virtio_scsi_device_realize for 

Only two object_unref()'s happen in unplug, respectively:

b.1) object_finalize_child_property, matches a.1)
b.2) bus_remove_child, matches a.2)

Do we need cleanup for a.3) ? The patch below does fix the crash for me.

---

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 46a3e3f..fde1b1fe 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -918,6 +918,8 @@ void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp)
 
 static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 {
+    VirtIOSCSI *s = VIRTIO_SCSI(dev);
+    qbus_set_hotplug_handler(BUS(&s->bus), NULL, &error_abort);
     virtio_scsi_common_unrealize(dev, errp);
 }


Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize
Posted by Paolo Bonzini 6 years, 11 months ago

On 17/05/2017 14:00, Fam Zheng wrote:
> On Wed, 05/17 02:58, Paolo Bonzini wrote:
>> The child property is deleted when bus_unparent calls object_unparent:
>>
>>     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
>>         DeviceState *dev = kid->child;
>>         object_unparent(OBJECT(dev));
>>     }
>>
>> and in turn bus_unparent is called by the VirtIOSCSIPCI's unparent
>> callback (device_unparent):
>>
>>     while (dev->num_child_bus) {
>>         bus = QLIST_FIRST(&dev->child_bus);
>>         object_unparent(OBJECT(bus));
>>     }
> 
> OK, sorry for being dumb, these are way over my head. Let me try again:
> 
> I count three references before unplug:
> 
> a.1) object_property_add_child in virtio_instance_init_common as virtio-backend
> a.2) qdev_set_parent_bus in virtio_scsi_pci_realize by virtio-pci-bus
> a.3) qbus_set_hotplug_handler in virtio_scsi_device_realize for 
> 
> Only two object_unref()'s happen in unplug, respectively:
> 
> b.1) object_finalize_child_property, matches a.1)
> b.2) bus_remove_child, matches a.2)
> 
> Do we need cleanup for a.3) ? The patch below does fix the crash for me.

Yes, good catch.  Either there, or in bus_unparent.

Paolo