hw/block/fdc.c | 4 ++++ hw/core/bus.c | 5 +++++ 2 files changed, 9 insertions(+)
I stumbled over this while working on a feature branch. Instead of
throwing the whole branch at you as a reproducer, I give you a mock up.
This is fdctrl_connect_drives():
dev = qdev_create(&fdctrl->bus.bus, "floppy");
qdev_prop_set_uint32(dev, "unit", i);
qdev_prop_set_enum(dev, "drive-type", fdctrl->qdev_for_drives[i].type);
blk_ref(blk);
blk_detach_dev(blk, fdc_dev);
fdctrl->qdev_for_drives[i].blk = NULL;
qdev_prop_set_drive(dev, "drive", blk, &local_err);
blk_unref(blk);
if (local_err) {
error_propagate(errp, local_err);
return;
}
object_property_set_bool(OBJECT(dev), true, "realized", &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
What if qdev_prop_set_drive() fails? I don't have a reproducer ready (I
do on my feature branch), so let's mock it, and also instrument the
place where things go wrong. Patch appended. To try it, run
qemu-system-x86_64 without arguments.
Turns out the failure bubbles up into device_set_realized() for the
isa-fdc, where the cleanup code calls object_unparent(). This unparents
children, and ends up in bus_unparent() for the isa-fdc's floppy-bus:
#4 0x0000555555abdb7f in bus_unparent (obj=0x55555675a9f0)
at /work/armbru/qemu/hw/core/bus.c:148
#5 0x0000555555d2aea6 in object_finalize_child_property
(obj=0x55555675a800, name=0x555557281230 "floppy-bus.0", opaque=0x55555675a9f0) at /work/armbru/qemu/qom/object.c:1672
#6 0x0000555555d2872b in object_property_del_child
(obj=0x55555675a800, child=0x55555675a9f0, errp=0x0)
at /work/armbru/qemu/qom/object.c:628
#7 0x0000555555d2880b in object_unparent (obj=0x55555675a9f0)
at /work/armbru/qemu/qom/object.c:647
#8 0x0000555555ab9e10 in device_unparent (obj=0x55555675a800)
at /work/armbru/qemu/hw/core/qdev.c:1101
This loop there
while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
DeviceState *dev = kid->child;
object_unparent(OBJECT(dev));
}
makes no progreess because OBJECT(dev)->parent is still null, and
therefore object_unparent() does nothing.
Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which
adds the device to the bus, but leaves ->parent null. If this isn't
wrong outright, it's at least a dangerous state.
Work-around: call qdev_set_id(dev, NULL) right after qdev_create().
This sets ->parent.
From 2554db096866138a85482d683e57a38166bb425b Mon Sep 17 00:00:00 2001
From: Markus Armbruster <armbru@redhat.com>
Date: Mon, 4 May 2020 15:58:10 +0200
Subject: [PATCH] qdev: Hack to reproduce infinite loop in bus_unparent()
---
hw/block/fdc.c | 4 ++++
hw/core/bus.c | 5 +++++
2 files changed, 9 insertions(+)
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 9628cc171e..f57558eea4 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2523,7 +2523,11 @@ static void fdctrl_connect_drives(FDCtrl *fdctrl, DeviceState *fdc_dev,
blk_ref(blk);
blk_detach_dev(blk, fdc_dev);
fdctrl->qdev_for_drives[i].blk = NULL;
+#if 0
qdev_prop_set_drive(dev, "drive", blk, &local_err);
+#else
+ error_setg(&local_err, "hack");
+#endif
blk_unref(blk);
if (local_err) {
diff --git a/hw/core/bus.c b/hw/core/bus.c
index 3dc0a825f0..3620a7be54 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -135,12 +135,17 @@ static void bus_unparent(Object *obj)
BusState *bus = BUS(obj);
BusChild *kid;
+ printf("### %s bus=%p %s\n",
+ __func__, obj, object_get_typename(obj));
/* Only the main system bus has no parent, and that bus is never freed */
assert(bus->parent);
while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
DeviceState *dev = kid->child;
+ printf("### %s kid=%p %s\n",
+ __func__, OBJECT(dev), object_get_typename(OBJECT(dev)));
object_unparent(OBJECT(dev));
+ assert(kid != QTAILQ_FIRST(&bus->children));
}
QLIST_REMOVE(bus, sibling);
bus->parent->num_child_bus--;
--
2.21.1
Patchew URL: https://patchew.org/QEMU/87tv0vzrwj.fsf@dusky.pond.sub.org/ Hi, This series failed build test on FreeBSD host. Please find the details below. === TEST SCRIPT BEGIN === #!/bin/bash # Testing script will be invoked under the git checkout with # HEAD pointing to a commit that has the patches applied on top of "base" # branch if qemu-system-x86_64 --help >/dev/null 2>&1; then QEMU=qemu-system-x86_64 elif /usr/libexec/qemu-kvm --help >/dev/null 2>&1; then QEMU=/usr/libexec/qemu-kvm else exit 1 fi make vm-build-freebsd J=21 QEMU=$QEMU exit 0 === TEST SCRIPT END === The full log is available at http://patchew.org/logs/87tv0vzrwj.fsf@dusky.pond.sub.org/testing.FreeBSD/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Patchew URL: https://patchew.org/QEMU/87tv0vzrwj.fsf@dusky.pond.sub.org/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === The full log is available at http://patchew.org/logs/87tv0vzrwj.fsf@dusky.pond.sub.org/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
On 04/05/20 16:38, Markus Armbruster wrote: > makes no progreess because OBJECT(dev)->parent is still null, and > therefore object_unparent() does nothing. > > Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which > adds the device to the bus, but leaves ->parent null. If this isn't > wrong outright, it's at least a dangerous state. > > Work-around: call qdev_set_id(dev, NULL) right after qdev_create(). > This sets ->parent. That's a good one, and especially a safe one, since it matches qdev_device_add. It has the disadvantage of having to touch all qdev_create() calls. Even better however would be to move the bus argument (and thus qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move qdev_set_id after qemu_opt_foreach. I looked at the property setters and couldn't find anything suspicious (somewhat to my surprise), but I haven't honestly tried. Paolo
On Mon, 4 May 2020 at 16:11, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 04/05/20 16:38, Markus Armbruster wrote: > > makes no progreess because OBJECT(dev)->parent is still null, and > > therefore object_unparent() does nothing. > > > > Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which > > adds the device to the bus, but leaves ->parent null. If this isn't > > wrong outright, it's at least a dangerous state. > > > > Work-around: call qdev_set_id(dev, NULL) right after qdev_create(). > > This sets ->parent. > > That's a good one, and especially a safe one, since it matches > qdev_device_add. It has the disadvantage of having to touch all > qdev_create() calls. qdev_create() is supposed to be a "board code etc uses this to create devices" function; I don't think we should impose extra usage requirements like "you must immediately afterwards call this function you weren't calling before". If qdev_set_id() needs to be called, why not call it inside qdev_create() ? thanks -- PMM
Paolo Bonzini <pbonzini@redhat.com> writes: > On 04/05/20 16:38, Markus Armbruster wrote: >> makes no progreess because OBJECT(dev)->parent is still null, and >> therefore object_unparent() does nothing. >> >> Possible culprit: qdev_try_create() calls qdev_set_parent_bus(), which >> adds the device to the bus, but leaves ->parent null. If this isn't >> wrong outright, it's at least a dangerous state. >> >> Work-around: call qdev_set_id(dev, NULL) right after qdev_create(). >> This sets ->parent. > > That's a good one, and especially a safe one, since it matches > qdev_device_add. It has the disadvantage of having to touch all > qdev_create() calls. Also, it moves onboard devices from /machine/unattached/ to /machine/peripheral-anon/. I really regard it as a work-around, not a proper solution. > Even better however would be to move the bus argument (and thus > qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move > qdev_set_id after qemu_opt_foreach. I looked at the property setters > and couldn't find anything suspicious (somewhat to my surprise), but I > haven't honestly tried. Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM parent"[*] by moving "add to parent bus" next to the place where we ensure "has QOM parent" by putting orphans under /machine/unattached/. Makes sense. If we add to the bus first, the precondition ceases to hold until we realize. Ugly, but harmless unless we manage to actually call the function then. I suspect we can't realize first, because the realize method may want to use the parent bus. We could lift putting orphans under /machine/unattached from device_set_realized() into those callers that don't already assign a QOM parent. Possibly hundreds of places, hmm. We could factor it out into a helper, and call it right before we add to the parent bus. Orphans with a parent bus would get adopted ealier, orphans without one would still get adopted at realize time. I'll play with it. [*] Confusing terminology; two separate parent-child relationships
On 05/05/20 18:03, Markus Armbruster wrote: >> That's a good one, and especially a safe one, since it matches >> qdev_device_add. It has the disadvantage of having to touch all >> qdev_create() calls. > > Also, it moves onboard devices from /machine/unattached/ to > /machine/peripheral-anon/. Uh indeed. No that's too ugly. >> Even better however would be to move the bus argument (and thus >> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move >> qdev_set_id after qemu_opt_foreach. I looked at the property setters >> and couldn't find anything suspicious (somewhat to my surprise), but I >> haven't honestly tried. > > Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM > parent"[*] by moving "add to parent bus" next to the place where we > ensure "has QOM parent" by putting orphans under /machine/unattached/. > Makes sense. > > If we add to the bus first, the precondition ceases to hold until we > realize. Ugly, but harmless unless we manage to actually call the > function then. Shouldn't be a big deal, since users should call either qdev_set_id or object_property_add_child before device_set_realized. > I suspect we can't realize first, because the realize method may want to > use the parent bus. Right. Moving the bus to qdev_init would be quite large but hopefully scriptable. Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 05/05/20 18:03, Markus Armbruster wrote: >>> That's a good one, and especially a safe one, since it matches >>> qdev_device_add. It has the disadvantage of having to touch all >>> qdev_create() calls. >> >> Also, it moves onboard devices from /machine/unattached/ to >> /machine/peripheral-anon/. > > Uh indeed. No that's too ugly. > >>> Even better however would be to move the bus argument (and thus >>> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move >>> qdev_set_id after qemu_opt_foreach. I looked at the property setters >>> and couldn't find anything suspicious (somewhat to my surprise), but I >>> haven't honestly tried. >> >> Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM >> parent"[*] by moving "add to parent bus" next to the place where we >> ensure "has QOM parent" by putting orphans under /machine/unattached/. >> Makes sense. >> >> If we add to the bus first, the precondition ceases to hold until we >> realize. Ugly, but harmless unless we manage to actually call the >> function then. > > Shouldn't be a big deal, since users should call either qdev_set_id or > object_property_add_child before device_set_realized. The issue isn't neglecting to set a QOM parent, it's destroying a device before its bus children get their QOM parent. Mostly harmless: by delaying "add to bus" until right before realize, we narrow the window where the trap is armed, and keep it completely within qdev_init_nofail(), qdev_device_add(), and possibly code that duplicates their work. Ensuring qdev_init_nofail() and qdev_device_add() don't fail in this window should be easy enough (except for writing the comment explaining it). The duplicates, though... I guess we need to double-check users of qdev_set_parent_bus(). Ugly: yes, compared to the pretty invariant "bus children all have QOM parents". >> I suspect we can't realize first, because the realize method may want to >> use the parent bus. > > Right. > > Moving the bus to qdev_init would be quite large but hopefully scriptable. Feels feasible. [*] We might want to look into deduplicating: the string "realized" occurs more than 450 times, and I figure almost always as property name.
Markus Armbruster <armbru@redhat.com> writes: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 05/05/20 18:03, Markus Armbruster wrote: >>>> That's a good one, and especially a safe one, since it matches >>>> qdev_device_add. It has the disadvantage of having to touch all >>>> qdev_create() calls. >>> >>> Also, it moves onboard devices from /machine/unattached/ to >>> /machine/peripheral-anon/. >> >> Uh indeed. No that's too ugly. >> >>>> Even better however would be to move the bus argument (and thus >>>> qdev_set_parent_bus) to qdev_init, and likewise in qdev_device_add move >>>> qdev_set_id after qemu_opt_foreach. I looked at the property setters >>>> and couldn't find anything suspicious (somewhat to my surprise), but I >>>> haven't honestly tried. >>> >>> Thus, we satisfy bus_unparent()'s precondition "bus children have a QOM >>> parent"[*] by moving "add to parent bus" next to the place where we >>> ensure "has QOM parent" by putting orphans under /machine/unattached/. >>> Makes sense. >>> >>> If we add to the bus first, the precondition ceases to hold until we >>> realize. Ugly, but harmless unless we manage to actually call the >>> function then. >> >> Shouldn't be a big deal, since users should call either qdev_set_id or >> object_property_add_child before device_set_realized. > > The issue isn't neglecting to set a QOM parent, it's destroying a device > before its bus children get their QOM parent. > > Mostly harmless: by delaying "add to bus" until right before realize, we > narrow the window where the trap is armed, and keep it completely within > qdev_init_nofail(), qdev_device_add(), and possibly code that duplicates > their work. Ensuring qdev_init_nofail() and qdev_device_add() don't > fail in this window should be easy enough (except for writing the > comment explaining it). The duplicates, though... I guess we need to > double-check users of qdev_set_parent_bus(). > > Ugly: yes, compared to the pretty invariant "bus children all have QOM > parents". > >>> I suspect we can't realize first, because the realize method may want to >>> use the parent bus. >> >> Right. >> >> Moving the bus to qdev_init would be quite large but hopefully scriptable. > > Feels feasible. Look, a rabbit hole :) I got rough patches, with several more to do. There's one thing worth mentioning. Unrealize is recursive: unrealizing a qdev recurses into its qbuses, and unrealizing a qbus recurses into its qdevs. Realize, however, is TODO recursive :) See your commit 5942a19040 "qdev: recursively unrealize devices when unrealizing bus", 2014-06-19. Moving "put on qbus" from qdev_create() (and its wrappers) to qdev_init_nofail() means we put on bus by realizing. No use to recursive realization then, > [*] We might want to look into deduplicating: the string "realized" > occurs more than 450 times, and I figure almost always as property name. I wield a sharp hatchet.
On 12/05/20 17:58, Markus Armbruster wrote: > > Moving "put on qbus" from qdev_create() (and its wrappers) to > qdev_init_nofail() means we put on bus by realizing. No use to > recursive realization then, ... for qdev_init_nofail; it may still be useful to *replace* qdev_init_nofail with object_property_add_child+qdev_set_parent_bus, and do recursive realization. The bug wouldn't be hit as long as qdev_set_parent_bus is only called after object_property_add_child. Certainly not today though! Paolo
© 2016 - 2024 Red Hat, Inc.