Infinite loop in bus_unparent(), qdev bug or qdev misuse?

Markus Armbruster posted 1 patch 4 years ago
Test docker-mingw@fedora failed
Test docker-quick@centos7 failed
Test checkpatch failed
Test asan failed
Test FreeBSD failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/87tv0vzrwj.fsf@dusky.pond.sub.org
Maintainers: John Snow <jsnow@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
hw/block/fdc.c | 4 ++++
hw/core/bus.c  | 5 +++++
2 files changed, 9 insertions(+)
Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Markus Armbruster 4 years ago
I stumbled over this while working on a feature branch.  Instead of
throwing the whole branch at you as a reproducer, I give you a mock up.

This is fdctrl_connect_drives():

        dev = qdev_create(&fdctrl->bus.bus, "floppy");
        qdev_prop_set_uint32(dev, "unit", i);
        qdev_prop_set_enum(dev, "drive-type", fdctrl->qdev_for_drives[i].type);

        blk_ref(blk);
        blk_detach_dev(blk, fdc_dev);
        fdctrl->qdev_for_drives[i].blk = NULL;
        qdev_prop_set_drive(dev, "drive", blk, &local_err);
        blk_unref(blk);

        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }

        object_property_set_bool(OBJECT(dev), true, "realized", &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }

What if qdev_prop_set_drive() fails?  I don't have a reproducer ready (I
do on my feature branch), so let's mock it, and also instrument the
place where things go wrong.  Patch appended.  To try it, run
qemu-system-x86_64 without arguments.

Turns out the failure bubbles up into device_set_realized() for the
isa-fdc, where the cleanup code calls object_unparent().  This unparents
children, and ends up in bus_unparent() for the isa-fdc's floppy-bus:

    #4  0x0000555555abdb7f in bus_unparent (obj=0x55555675a9f0)
        at /work/armbru/qemu/hw/core/bus.c:148
    #5  0x0000555555d2aea6 in object_finalize_child_property
        (obj=0x55555675a800, name=0x555557281230 "floppy-bus.0", opaque=0x55555675a9f0) at /work/armbru/qemu/qom/object.c:1672
    #6  0x0000555555d2872b in object_property_del_child
        (obj=0x55555675a800, child=0x55555675a9f0, errp=0x0)
        at /work/armbru/qemu/qom/object.c:628
    #7  0x0000555555d2880b in object_unparent (obj=0x55555675a9f0)
        at /work/armbru/qemu/qom/object.c:647
    #8  0x0000555555ab9e10 in device_unparent (obj=0x55555675a800)
        at /work/armbru/qemu/hw/core/qdev.c:1101

This loop there

    while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
        DeviceState *dev = kid->child;
        object_unparent(OBJECT(dev));
    }

makes no progreess because OBJECT(dev)->parent is still null, and
therefore object_unparent() does nothing.

Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which
adds the device to the bus, but leaves ->parent null.  If this isn't
wrong outright, it's at least a dangerous state.

Work-around: call qdev_set_id(dev, NULL) right after qdev_create().
This sets ->parent.


From 2554db096866138a85482d683e57a38166bb425b Mon Sep 17 00:00:00 2001
From: Markus Armbruster <armbru@redhat.com>
Date: Mon, 4 May 2020 15:58:10 +0200
Subject: [PATCH] qdev: Hack to reproduce infinite loop in bus_unparent()

---
 hw/block/fdc.c | 4 ++++
 hw/core/bus.c  | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 9628cc171e..f57558eea4 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2523,7 +2523,11 @@ static void fdctrl_connect_drives(FDCtrl *fdctrl, DeviceState *fdc_dev,
         blk_ref(blk);
         blk_detach_dev(blk, fdc_dev);
         fdctrl->qdev_for_drives[i].blk = NULL;
+#if 0
         qdev_prop_set_drive(dev, "drive", blk, &local_err);
+#else
+        error_setg(&local_err, "hack");
+#endif
         blk_unref(blk);
 
         if (local_err) {
diff --git a/hw/core/bus.c b/hw/core/bus.c
index 3dc0a825f0..3620a7be54 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -135,12 +135,17 @@ static void bus_unparent(Object *obj)
     BusState *bus = BUS(obj);
     BusChild *kid;
 
+    printf("### %s bus=%p %s\n",
+           __func__, obj, object_get_typename(obj));
     /* Only the main system bus has no parent, and that bus is never freed */
     assert(bus->parent);
 
     while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
         DeviceState *dev = kid->child;
+        printf("### %s kid=%p %s\n",
+               __func__, OBJECT(dev), object_get_typename(OBJECT(dev)));
         object_unparent(OBJECT(dev));
+        assert(kid != QTAILQ_FIRST(&bus->children));
     }
     QLIST_REMOVE(bus, sibling);
     bus->parent->num_child_bus--;
-- 
2.21.1


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by no-reply@patchew.org 4 years ago
Patchew URL: https://patchew.org/QEMU/87tv0vzrwj.fsf@dusky.pond.sub.org/



Hi,

This series failed build test on FreeBSD host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
if qemu-system-x86_64 --help >/dev/null 2>&1; then
  QEMU=qemu-system-x86_64
elif /usr/libexec/qemu-kvm --help >/dev/null 2>&1; then
  QEMU=/usr/libexec/qemu-kvm
else
  exit 1
fi
make vm-build-freebsd J=21 QEMU=$QEMU
exit 0
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/87tv0vzrwj.fsf@dusky.pond.sub.org/testing.FreeBSD/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by no-reply@patchew.org 4 years ago
Patchew URL: https://patchew.org/QEMU/87tv0vzrwj.fsf@dusky.pond.sub.org/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/87tv0vzrwj.fsf@dusky.pond.sub.org/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Paolo Bonzini 4 years ago
On 04/05/20 16:38, Markus Armbruster wrote:
> makes no progreess because OBJECT(dev)->parent is still null, and
> therefore object_unparent() does nothing.
> 
> Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which
> adds the device to the bus, but leaves ->parent null.  If this isn't
> wrong outright, it's at least a dangerous state.
> 
> Work-around: call qdev_set_id(dev, NULL) right after qdev_create().
> This sets ->parent.

That's a good one, and especially a safe one, since it matches
qdev_device_add.  It has the disadvantage of having to touch all
qdev_create() calls.

Even better however would be to move the bus argument (and thus
qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move
qdev_set_id after qemu_opt_foreach.  I looked at the property setters
and couldn't find anything suspicious (somewhat to my surprise), but I
haven't honestly tried.

Paolo


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Peter Maydell 4 years ago
On Mon, 4 May 2020 at 16:11, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 04/05/20 16:38, Markus Armbruster wrote:
> > makes no progreess because OBJECT(dev)->parent is still null, and
> > therefore object_unparent() does nothing.
> >
> > Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which
> > adds the device to the bus, but leaves ->parent null.  If this isn't
> > wrong outright, it's at least a dangerous state.
> >
> > Work-around: call qdev_set_id(dev, NULL) right after qdev_create().
> > This sets ->parent.
>
> That's a good one, and especially a safe one, since it matches
> qdev_device_add.  It has the disadvantage of having to touch all
> qdev_create() calls.

qdev_create() is supposed to be a "board code etc uses this to
create devices" function; I don't think we should impose extra
usage requirements like "you must immediately afterwards call this
function you weren't calling before". If qdev_set_id() needs to
be called, why not call it inside qdev_create() ?

thanks
-- PMM

Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Markus Armbruster 3 years, 12 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 04/05/20 16:38, Markus Armbruster wrote:
>> makes no progreess because OBJECT(dev)->parent is still null, and
>> therefore object_unparent() does nothing.
>> 
>> Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which
>> adds the device to the bus, but leaves ->parent null.  If this isn't
>> wrong outright, it's at least a dangerous state.
>> 
>> Work-around: call qdev_set_id(dev, NULL) right after qdev_create().
>> This sets ->parent.
>
> That's a good one, and especially a safe one, since it matches
> qdev_device_add.  It has the disadvantage of having to touch all
> qdev_create() calls.

Also, it moves onboard devices from /machine/unattached/ to
/machine/peripheral-anon/.

I really regard it as a work-around, not a proper solution.

> Even better however would be to move the bus argument (and thus
> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move
> qdev_set_id after qemu_opt_foreach.  I looked at the property setters
> and couldn't find anything suspicious (somewhat to my surprise), but I
> haven't honestly tried.

Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM
parent"[*] by moving "add to parent bus" next to the place where we
ensure "has QOM parent" by putting orphans under /machine/unattached/.
Makes sense.

If we add to the bus first, the precondition ceases to hold until we
realize.  Ugly, but harmless unless we manage to actually call the
function then.

I suspect we can't realize first, because the realize method may want to
use the parent bus.

We could lift putting orphans under /machine/unattached from
device_set_realized() into those callers that don't already assign a QOM
parent.  Possibly hundreds of places, hmm.

We could factor it out into a helper, and call it right before we add to
the parent bus.  Orphans with a parent bus would get adopted ealier,
orphans without one would still get adopted at realize time.

I'll play with it.


[*] Confusing terminology; two separate parent-child relationships


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Paolo Bonzini 3 years, 12 months ago
On 05/05/20 18:03, Markus Armbruster wrote:
>> That's a good one, and especially a safe one, since it matches
>> qdev_device_add.  It has the disadvantage of having to touch all
>> qdev_create() calls.
> 
> Also, it moves onboard devices from /machine/unattached/ to
> /machine/peripheral-anon/.

Uh indeed.  No that's too ugly.

>> Even better however would be to move the bus argument (and thus
>> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move
>> qdev_set_id after qemu_opt_foreach.  I looked at the property setters
>> and couldn't find anything suspicious (somewhat to my surprise), but I
>> haven't honestly tried.
> 
> Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM
> parent"[*] by moving "add to parent bus" next to the place where we
> ensure "has QOM parent" by putting orphans under /machine/unattached/.
> Makes sense.
> 
> If we add to the bus first, the precondition ceases to hold until we
> realize.  Ugly, but harmless unless we manage to actually call the
> function then.

Shouldn't be a big deal, since users should call either qdev_set_id or
object_property_add_child before device_set_realized.

> I suspect we can't realize first, because the realize method may want to
> use the parent bus.

Right.

Moving the bus to qdev_init would be quite large but hopefully scriptable.

Paolo


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Markus Armbruster 3 years, 12 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 05/05/20 18:03, Markus Armbruster wrote:
>>> That's a good one, and especially a safe one, since it matches
>>> qdev_device_add.  It has the disadvantage of having to touch all
>>> qdev_create() calls.
>> 
>> Also, it moves onboard devices from /machine/unattached/ to
>> /machine/peripheral-anon/.
>
> Uh indeed.  No that's too ugly.
>
>>> Even better however would be to move the bus argument (and thus
>>> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move
>>> qdev_set_id after qemu_opt_foreach.  I looked at the property setters
>>> and couldn't find anything suspicious (somewhat to my surprise), but I
>>> haven't honestly tried.
>> 
>> Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM
>> parent"[*] by moving "add to parent bus" next to the place where we
>> ensure "has QOM parent" by putting orphans under /machine/unattached/.
>> Makes sense.
>> 
>> If we add to the bus first, the precondition ceases to hold until we
>> realize.  Ugly, but harmless unless we manage to actually call the
>> function then.
>
> Shouldn't be a big deal, since users should call either qdev_set_id or
> object_property_add_child before device_set_realized.

The issue isn't neglecting to set a QOM parent, it's destroying a device
before its bus children get their QOM parent.

Mostly harmless: by delaying "add to bus" until right before realize, we
narrow the window where the trap is armed, and keep it completely within
qdev_init_nofail(), qdev_device_add(), and possibly code that duplicates
their work.  Ensuring qdev_init_nofail() and qdev_device_add() don't
fail in this window should be easy enough (except for writing the
comment explaining it).  The duplicates, though... I guess we need to
double-check users of qdev_set_parent_bus().

Ugly: yes, compared to the pretty invariant "bus children all have QOM
parents".

>> I suspect we can't realize first, because the realize method may want to
>> use the parent bus.
>
> Right.
>
> Moving the bus to qdev_init would be quite large but hopefully scriptable.

Feels feasible.


[*] We might want to look into deduplicating: the string "realized"
occurs more than 450 times, and I figure almost always as property name.


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Markus Armbruster 3 years, 11 months ago
Markus Armbruster <armbru@redhat.com> writes:

> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 05/05/20 18:03, Markus Armbruster wrote:
>>>> That's a good one, and especially a safe one, since it matches
>>>> qdev_device_add.  It has the disadvantage of having to touch all
>>>> qdev_create() calls.
>>> 
>>> Also, it moves onboard devices from /machine/unattached/ to
>>> /machine/peripheral-anon/.
>>
>> Uh indeed.  No that's too ugly.
>>
>>>> Even better however would be to move the bus argument (and thus
>>>> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move
>>>> qdev_set_id after qemu_opt_foreach.  I looked at the property setters
>>>> and couldn't find anything suspicious (somewhat to my surprise), but I
>>>> haven't honestly tried.
>>> 
>>> Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM
>>> parent"[*] by moving "add to parent bus" next to the place where we
>>> ensure "has QOM parent" by putting orphans under /machine/unattached/.
>>> Makes sense.
>>> 
>>> If we add to the bus first, the precondition ceases to hold until we
>>> realize.  Ugly, but harmless unless we manage to actually call the
>>> function then.
>>
>> Shouldn't be a big deal, since users should call either qdev_set_id or
>> object_property_add_child before device_set_realized.
>
> The issue isn't neglecting to set a QOM parent, it's destroying a device
> before its bus children get their QOM parent.
>
> Mostly harmless: by delaying "add to bus" until right before realize, we
> narrow the window where the trap is armed, and keep it completely within
> qdev_init_nofail(), qdev_device_add(), and possibly code that duplicates
> their work.  Ensuring qdev_init_nofail() and qdev_device_add() don't
> fail in this window should be easy enough (except for writing the
> comment explaining it).  The duplicates, though... I guess we need to
> double-check users of qdev_set_parent_bus().
>
> Ugly: yes, compared to the pretty invariant "bus children all have QOM
> parents".
>
>>> I suspect we can't realize first, because the realize method may want to
>>> use the parent bus.
>>
>> Right.
>>
>> Moving the bus to qdev_init would be quite large but hopefully scriptable.
>
> Feels feasible.

Look, a rabbit hole :)

I got rough patches, with several more to do.

There's one thing worth mentioning.  Unrealize is recursive: unrealizing
a qdev recurses into its qbuses, and unrealizing a qbus recurses into
its qdevs.  Realize, however, is TODO recursive :)  See your commit
5942a19040 "qdev: recursively unrealize devices when unrealizing bus",
2014-06-19.

Moving "put on qbus" from qdev_create() (and its wrappers) to
qdev_init_nofail() means we put on bus by realizing.  No use to
recursive realization then,

> [*] We might want to look into deduplicating: the string "realized"
> occurs more than 450 times, and I figure almost always as property name.

I wield a sharp hatchet.


Re: Infinite loop in bus_unparent(), qdev bug or qdev misuse?
Posted by Paolo Bonzini 3 years, 11 months ago
On 12/05/20 17:58, Markus Armbruster wrote:
> 
> Moving "put on qbus" from qdev_create() (and its wrappers) to
> qdev_init_nofail() means we put on bus by realizing.  No use to
> recursive realization then,

... for qdev_init_nofail; it may still be useful to *replace*
qdev_init_nofail with object_property_add_child+qdev_set_parent_bus, and
do recursive realization.  The bug wouldn't be hit as long as
qdev_set_parent_bus is only called after object_property_add_child.

Certainly not today though!

Paolo