qapi/qdev.json | 2 +- qapi/qom.json | 2 +- include/qom/object.h | 25 ++++++++++++++++ include/qom/object_interfaces.h | 51 +++++++++++++++++++++++++++++++++ chardev/char.c | 4 +-- hw/core/cpu-common.c | 13 ++++++--- hw/core/qdev.c | 20 +++++++++++-- hw/i386/x86-iommu.c | 26 +++++++++++++++-- migration/migration.c | 36 +++++++++++++++++++++-- qom/object.c | 50 ++++++++++++++++++++++++++++++-- qom/object_interfaces.c | 33 +++++++++++++++++++-- qom/qom-qmp-cmds.c | 4 +-- system/qdev-monitor.c | 4 +-- 13 files changed, 243 insertions(+), 27 deletions(-)
v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com This patchset introduces the singleton interface for QOM. I didn't add a changelog because there're quite a few changes here and there, plus new patches. So it might just be easier to re-read, considering the patchset isn't large. I switched v2 into RFC, because we have reviewer concerns (Phil and Dan so far) that it could be error prone to try to trap every attempts to create an object. My argument is, if we already have abstract class, meanwhile we do not allow instantiation of abstract class, so the complexity is already there. I prepared patch 1 this time to collect and track all similar random object creations; it might be helpful as a cleanup on its own to deduplicate some similar error messages. Said that, I'm still always open to rejections to this proposal. I hope v2 looks slightly cleaner by having not only object_new_allowed() but also object_new_or_fetch(). Patch layout: Patch 1-2: The patches to introduce QOM singleton interface Patch 3-5: Add support for vIOMMU singleton, some qdev change needed Patch 6-7: Add support for migration singleton, fix dangle pointer Below are copy-paste of the commit message of patch 2, that I could have put into the cover letter too. ====8<==== The singleton interface declares an object class which can only create one instance globally. Backgrounds / Use Cases ======================= There can be some existing classes that can start to benefit from it. One example is vIOMMU implementations. QEMU emulated vIOMMUs are normally implemented on top of a generic device, however it's special enough to normally only allow one instance of it for the whole system, attached to the root complex. These vIOMMU classes can be singletons in this case, so that QEMU can fail or detect yet another attempt of creating such devices for more than once, which can be fatal errors to a system. We used to have some special code guarding it from happening. In x86, pc_machine_device_pre_plug_cb() has code to detect when vIOMMU is created more than once, for instance. With singleton class, logically we could consider dropping the special code, but start to rely on QOM to make sure there's only one vIOMMU for the whole system emulation. There is a similar demand raising recently (even if the problem existed over years) in migration. Firstly, the migration object can currently be created randomly, even though not wanted, e.g. during qom-list-properties QMP commands. Ideally, there can be cases where we want to have an object walking over the properties, we could use the existing migration object instead of dynamically create one. Meanwhile, migration has a long standing issue on current_migration pointer, where it can point to freed data after the migration object is finalized. It is debatable that the pointer can be cleared after the main thread (1) join() the migration thread first, then (2) release the last refcount for the migration object and clear the pointer. However there's still major challenges [1]. With singleton, we could have a slightly but hopefully working workaround to clear the pointer during finalize(). Design ====== The idea behind is pretty simple: any object that can only be created once can now declare the TYPE_SINGLETON interface. Then, QOM facilities will make sure it won't be created more than once for the whole QEMU lifecycle. Whenever possible (e.g., on object_new_allowed() checks), pretty error message will be generated to report an error. QOM also guards at the core of object_new() so that any further violation of trying to create a singleton more than once will crash QEMU as a programming error. For example, qom-list-properties, device-list-properties, etc., will be smart enough to not try to create temporary singleton objects if the class is a singleton class and if there's existing instance created. Such usages should be rare, and this patch introduced object_new_or_fetch() just for it, which either create a new temp object when available, or fetch the instance if we found an existing singleton instance. There're only two such use cases. Meanwhile, we also guard device-add or similar paths using the singleton check in object_new_allowed(), so that it'll fail properly if a singleton class instantiate more than one object. Glib Singleton implementation ============================= One note here to mention the Glib implementation of singletons [1]. QEMU chose not to follow Glib's implementation because Glib's version is not thread safe on the constructor, so that two concurrent g_object_new() on a single can race. It's not ideal to QEMU, as QEMU has to not only support the event-driven context which is normally lock-free, but also the case where threads are heavily used. It could be QEMU's desire to properly support multi-threading by default on such new interface. The "bad" side effect of that is, QEMU's object_new() on singletons can assert failures if the singleton existed, but that's also part of the design so as to forbid such from happening, taking which as a programming error. Meanwhile since we have pretty ways to fail elsewhere on qdev creations, it should already guard us in a clean way, from anywhere that the user could try to create the singleton more than once. The current QEMU impl also guarantees object_new() always return a newly allocated object as long as properly returned, rather than silently return an existing object as what Glib's impl would do. I see it a benefit, so as to avoid unknown caller manipulate a global object, wrongly assuming it was temporarily created. [1] https://lore.kernel.org/qemu-devel/20190228122822.GD4970@work-vm/ [2] https://lore.kernel.org/r/ZxtqGQbd4Hq4APtm@redhat.com Thanks, Peter Xu (7): qom: Track dynamic initiations of random object class qom: TYPE_SINGLETON interface qdev: Make device_set_realized() be fully prepared with !machine qdev: Make qdev_get_machine() safe before machine creates x86/iommu: Make x86-iommu a singleton object migration: Make migration object a singleton object migration: Reset current_migration properly qapi/qdev.json | 2 +- qapi/qom.json | 2 +- include/qom/object.h | 25 ++++++++++++++++ include/qom/object_interfaces.h | 51 +++++++++++++++++++++++++++++++++ chardev/char.c | 4 +-- hw/core/cpu-common.c | 13 ++++++--- hw/core/qdev.c | 20 +++++++++++-- hw/i386/x86-iommu.c | 26 +++++++++++++++-- migration/migration.c | 36 +++++++++++++++++++++-- qom/object.c | 50 ++++++++++++++++++++++++++++++-- qom/object_interfaces.c | 33 +++++++++++++++++++-- qom/qom-qmp-cmds.c | 4 +-- system/qdev-monitor.c | 4 +-- 13 files changed, 243 insertions(+), 27 deletions(-) -- 2.45.0
On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > Meanwhile, migration has a long standing issue on current_migration > pointer, where it can point to freed data after the migration object is > finalized. It is debatable that the pointer can be cleared after the main > thread (1) join() the migration thread first, then (2) release the last > refcount for the migration object and clear the pointer. However there's > still major challenges [1]. With singleton, we could have a slightly but > hopefully working workaround to clear the pointer during finalize(). I'm still not entirely convinced that this singleton proposal is fixing the migration problem correctly. Based on discussions in v1, IIUC, the situation is that we have migration_shutdown() being called from qemu_cleanup(). The former will call object_unref(current_migration), but there may still be background migration threads running that access 'current_migration', and thus a potential use-after-free. Based on what the 7th patch here does, the key difference is that the finalize() method for MigrationState will set 'current_migration' to NULL after free'ing it. I don't believe that is safe. Back to the current code, if there is a use-after-free today, that implies that the background threads are *not* holding their own reference on 'current_migration', allowing the object to be free'd while they're still using it. If they held their own reference then the object_unref in migration_shutdown would not have any use after free risk. The new code is not changing the ref counting done by any threads. Therefore if there's a use-after-free in existing code, AFAICT, the same use-after-free *must* still exist in the current code. The 7th patch only fixes the use-after-free, *if and only if* the background thread tries to access 'current_migration', /after/ finalize as completed. The use-after-free in this case, has been turned into a NULL pointer reference. A background thread could be accessing the 'current_migration' pointer *concurrently* with the finalize method executing though. In this case we still have a use after free problem, only the time window in which it exists has been narrowed a little. Shouldn't the problem with migration be solved by every migration thread holding a reference on current_migration, that the thread releases when it exits, such that MigrationState is only finalized once every thread has exited ? That would not require any join() synchronization point. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > Meanwhile, migration has a long standing issue on current_migration > > pointer, where it can point to freed data after the migration object is > > finalized. It is debatable that the pointer can be cleared after the main > > thread (1) join() the migration thread first, then (2) release the last > > refcount for the migration object and clear the pointer. However there's > > still major challenges [1]. With singleton, we could have a slightly but > > hopefully working workaround to clear the pointer during finalize(). > > I'm still not entirely convinced that this singleton proposal is > fixing the migration problem correctly. > > Based on discussions in v1, IIUC, the situation is that we have > migration_shutdown() being called from qemu_cleanup(). The former > will call object_unref(current_migration), but there may still > be background migration threads running that access 'current_migration', > and thus a potential use-after-free. migration thread is fine, it takes a refcount at the entry. And btw, taking it at the entry is racy, we've just fixed it, see (in my next migration pull): https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-peterx@redhat.com/ The access reported was, IIUC, outside migration code, but after both main/migration threads released the refcount, hence after finalize(). It could be a random migration_is_running() call very late in device code, for example. > > Based on what the 7th patch here does, the key difference is that > the finalize() method for MigrationState will set 'current_migration' > to NULL after free'ing it. Yes. But this show case series isn't complete. We need a migration-side lock finally to make it safe to access. For that, see: https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-peterx@redhat.com/ > > I don't believe that is safe. I hope after the other series applied it will be 100% safe, even though I agree it's tricky. But hopefully QOM is very clean, the trickly part is still within migration, and it should be less tricky than migration implement a refcount on top of Object.. > > Back to the current code, if there is a use-after-free today, that > implies that the background threads are *not* holding their own > reference on 'current_migration', allowing the object to be free'd > while they're still using it. If they held their own reference then > the object_unref in migration_shutdown would not have any use after > free risk. > > The new code is not changing the ref counting done by any threads. > Therefore if there's a use-after-free in existing code, AFAICT, the > same use-after-free *must* still exist in the current code. > > The 7th patch only fixes the use-after-free, *if and only if* the > background thread tries to access 'current_migration', /after/ > finalize as completed. The use-after-free in this case, has been > turned into a NULL pointer reference. > > A background thread could be accessing the 'current_migration' pointer > *concurrently* with the finalize method executing though. In this case > we still have a use after free problem, only the time window in which > it exists has been narrowed a little. > > Shouldn't the problem with migration be solved by every migration thread > holding a reference on current_migration, that the thread releases when > it exits, such that MigrationState is only finalized once every thread > has exited ? That would not require any join() synchronization point. I think the question is whether things like migration_is_running() is allowed to be used anywhere, even after migration_shutdown(). My answer is, it should be ok to be used anywhere, and we don't necessarilly need to limit that. In that case the caller doesn't need to take a refcount because it's an immediate query. It can simply check its existance with the lock (after my patch 8 of the other series applied, which depends on this qom series). Thanks, -- Peter Xu
On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote: > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > > > Meanwhile, migration has a long standing issue on current_migration > > > pointer, where it can point to freed data after the migration object is > > > finalized. It is debatable that the pointer can be cleared after the main > > > thread (1) join() the migration thread first, then (2) release the last > > > refcount for the migration object and clear the pointer. However there's > > > still major challenges [1]. With singleton, we could have a slightly but > > > hopefully working workaround to clear the pointer during finalize(). > > > > I'm still not entirely convinced that this singleton proposal is > > fixing the migration problem correctly. > > > > Based on discussions in v1, IIUC, the situation is that we have > > migration_shutdown() being called from qemu_cleanup(). The former > > will call object_unref(current_migration), but there may still > > be background migration threads running that access 'current_migration', > > and thus a potential use-after-free. > > migration thread is fine, it takes a refcount at the entry. > > And btw, taking it at the entry is racy, we've just fixed it, see (in my > next migration pull): > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-peterx@redhat.com/ Yep, acquiring the refcount immediately before thread-create is what I meant. > The access reported was, IIUC, outside migration code, but after both > main/migration threads released the refcount, hence after finalize(). It > could be a random migration_is_running() call very late in device code, for > example. > > > > > Based on what the 7th patch here does, the key difference is that > > the finalize() method for MigrationState will set 'current_migration' > > to NULL after free'ing it. > > Yes. But this show case series isn't complete. We need a migration-side > lock finally to make it safe to access. For that, see: > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-peterx@redhat.com/ > > > > > I don't believe that is safe. > > I hope after the other series applied it will be 100% safe, even though I > agree it's tricky. But hopefully QOM is very clean, the trickly part is > still within migration, and it should be less tricky than migration > implement a refcount on top of Object.. Ok, so with the other series applied, this does look safe, but it also doesn't seem to really have any dependancy on the single interface code. Patch 7 here looks sufficient, in combo with the other 2 series to avoid the use-after-free flaws. > I think the question is whether things like migration_is_running() is > allowed to be used anywhere, even after migration_shutdown(). My answer > is, it should be ok to be used anywhere, and we don't necessarilly need to > limit that. In that case the caller doesn't need to take a refcount > because it's an immediate query. It can simply check its existance with > the lock (after my patch 8 of the other series applied, which depends on > this qom series). Agree, and from a practical POV, I think it would be impossible to require a ref count be held from other non-migration threads, so the locking around 'current_migration' looks like the only practical option. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote: > On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote: > > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > > > > > Meanwhile, migration has a long standing issue on current_migration > > > > pointer, where it can point to freed data after the migration object is > > > > finalized. It is debatable that the pointer can be cleared after the main > > > > thread (1) join() the migration thread first, then (2) release the last > > > > refcount for the migration object and clear the pointer. However there's > > > > still major challenges [1]. With singleton, we could have a slightly but > > > > hopefully working workaround to clear the pointer during finalize(). > > > > > > I'm still not entirely convinced that this singleton proposal is > > > fixing the migration problem correctly. > > > > > > Based on discussions in v1, IIUC, the situation is that we have > > > migration_shutdown() being called from qemu_cleanup(). The former > > > will call object_unref(current_migration), but there may still > > > be background migration threads running that access 'current_migration', > > > and thus a potential use-after-free. > > > > migration thread is fine, it takes a refcount at the entry. > > > > And btw, taking it at the entry is racy, we've just fixed it, see (in my > > next migration pull): > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-peterx@redhat.com/ > > Yep, acquiring the refcount immediately before thread-create > is what I meant. > > > The access reported was, IIUC, outside migration code, but after both > > main/migration threads released the refcount, hence after finalize(). It > > could be a random migration_is_running() call very late in device code, for > > example. > > > > > > > > > > > Based on what the 7th patch here does, the key difference is that > > > the finalize() method for MigrationState will set 'current_migration' > > > to NULL after free'ing it. > > > > Yes. But this show case series isn't complete. We need a migration-side > > lock finally to make it safe to access. For that, see: > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-peterx@redhat.com/ > > > > > > > > I don't believe that is safe. > > > > I hope after the other series applied it will be 100% safe, even though I > > agree it's tricky. But hopefully QOM is very clean, the trickly part is > > still within migration, and it should be less tricky than migration > > implement a refcount on top of Object.. > > Ok, so with the other series applied, this does look safe, but > it also doesn't seem to really have any dependancy on the > single interface code. Patch 7 here looks sufficient, in combo > with the other 2 series to avoid the use-after-free flaws. Patch 7, when applied without patch 6 and prior, will crash in device-introspect-test, trying to create yet another migration object when processing the "device-list-properties" QMP command. And it turns out that's also not the only way QEMU can crash by that. Fundamentally it's because patch 7 has global operations within init()/finalize() to fix the migration dangling pointer, hence it must not be instanciated more than once. It's also probably because I always think singleton can be useful in general to QEMU's device model where can be special devices all over the places that I'm not aware of. I didn't work on a lot of QEMU devices, but with that limited experience I still stumbled upon two devices (if taking migration object as one..) that might benefit from it. That leads to this whole series, which is also the cleanest so far I can think of to solve the immediate migration UAF. Thanks, > > > I think the question is whether things like migration_is_running() is > > allowed to be used anywhere, even after migration_shutdown(). My answer > > is, it should be ok to be used anywhere, and we don't necessarilly need to > > limit that. In that case the caller doesn't need to take a refcount > > because it's an immediate query. It can simply check its existance with > > the lock (after my patch 8 of the other series applied, which depends on > > this qom series). > > Agree, and from a practical POV, I think it would be impossible to > require a ref count be held from other non-migration threads, so the > locking around 'current_migration' looks like the only practical > option. > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > -- Peter Xu
On Wed, Oct 30, 2024 at 01:51:54PM -0400, Peter Xu wrote: > On Wed, Oct 30, 2024 at 04:13:57PM +0000, Daniel P. Berrangé wrote: > > On Wed, Oct 30, 2024 at 09:13:13AM -0400, Peter Xu wrote: > > > On Wed, Oct 30, 2024 at 09:48:07AM +0000, Daniel P. Berrangé wrote: > > > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > > > > > > > Meanwhile, migration has a long standing issue on current_migration > > > > > pointer, where it can point to freed data after the migration object is > > > > > finalized. It is debatable that the pointer can be cleared after the main > > > > > thread (1) join() the migration thread first, then (2) release the last > > > > > refcount for the migration object and clear the pointer. However there's > > > > > still major challenges [1]. With singleton, we could have a slightly but > > > > > hopefully working workaround to clear the pointer during finalize(). > > > > > > > > I'm still not entirely convinced that this singleton proposal is > > > > fixing the migration problem correctly. > > > > > > > > Based on discussions in v1, IIUC, the situation is that we have > > > > migration_shutdown() being called from qemu_cleanup(). The former > > > > will call object_unref(current_migration), but there may still > > > > be background migration threads running that access 'current_migration', > > > > and thus a potential use-after-free. > > > > > > migration thread is fine, it takes a refcount at the entry. > > > > > > And btw, taking it at the entry is racy, we've just fixed it, see (in my > > > next migration pull): > > > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-2-peterx@redhat.com/ > > > > Yep, acquiring the refcount immediately before thread-create > > is what I meant. > > > > > The access reported was, IIUC, outside migration code, but after both > > > main/migration threads released the refcount, hence after finalize(). It > > > could be a random migration_is_running() call very late in device code, for > > > example. > > > > > > > > > > > > > > > > > Based on what the 7th patch here does, the key difference is that > > > > the finalize() method for MigrationState will set 'current_migration' > > > > to NULL after free'ing it. > > > > > > Yes. But this show case series isn't complete. We need a migration-side > > > lock finally to make it safe to access. For that, see: > > > > > > https://lore.kernel.org/qemu-devel/20241024213056.1395400-9-peterx@redhat.com/ > > > > > > > > > > > I don't believe that is safe. > > > > > > I hope after the other series applied it will be 100% safe, even though I > > > agree it's tricky. But hopefully QOM is very clean, the trickly part is > > > still within migration, and it should be less tricky than migration > > > implement a refcount on top of Object.. > > > > Ok, so with the other series applied, this does look safe, but > > it also doesn't seem to really have any dependancy on the > > single interface code. Patch 7 here looks sufficient, in combo > > with the other 2 series to avoid the use-after-free flaws. > > Patch 7, when applied without patch 6 and prior, will crash in > device-introspect-test, trying to create yet another migration object when > processing the "device-list-properties" QMP command. And it turns out > that's also not the only way QEMU can crash by that. > > Fundamentally it's because patch 7 has global operations within > init()/finalize() to fix the migration dangling pointer, hence it must not > be instanciated more than once. That's a result from moving the "assert()" into the constructor. The assert(!current_migration) can be kept in migration_object_init, the constructor could conditionally set current_migration only if it is NULL, and the finalizer could conditionally clear current_migration only if it matches the current object. There's no conceptual dependancy on having a singleton interface in the patch. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Oct 30, 2024 at 05:58:15PM +0000, Daniel P. Berrangé wrote: > That's a result from moving the "assert()" into the constructor. > The assert(!current_migration) can be kept in migration_object_init, > the constructor could conditionally set current_migration only if it > is NULL, and the finalizer could conditionally clear current_migration > only if it matches the current object. There's no conceptual dependancy > on having a singleton interface in the patch. Yes that could work, but that sounds more hackish from my POV, trying to detect "which is the real migration object". We need to assume (1) the 1st migration object being created must be the global migration object, then (2) we still allow concurrent creations of such object, which I personally don't really like, even with a hack already. If this series cannot get accepted, I can try go with that, or I'll implement the refcount in migration.c, whichever I found better at last. To me, both are less clean comparing to singleton. -- Peter Xu
On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > This patchset introduces the singleton interface for QOM. I didn't add a > changelog because there're quite a few changes here and there, plus new > patches. So it might just be easier to re-read, considering the patchset > isn't large. > > I switched v2 into RFC, because we have reviewer concerns (Phil and Dan so > far) that it could be error prone to try to trap every attempts to create > an object. My argument is, if we already have abstract class, meanwhile we > do not allow instantiation of abstract class, so the complexity is already > there. I prepared patch 1 this time to collect and track all similar > random object creations; it might be helpful as a cleanup on its own to > deduplicate some similar error messages. Said that, I'm still always open > to rejections to this proposal. > > I hope v2 looks slightly cleaner by having not only object_new_allowed() > but also object_new_or_fetch(). For me, that doesn't really make it much more appealing. Yes, we already have an abstract class, but that has narrower impact, as there are fewer places in which which we can trigger instantiation of an abstract class, than where we can trigger instantiation of arbitrary objects and devices. The conversion of the iommu code results in worse error reporting, and doesn't handle the virtio-iommu case, and the migration problems appear solvable without inventing a singleton interface. So this doesn't feel like it is worth the the trouble. NB, my view point would have been different if 'object_new' had an "Error *errp" parameter. That would have made handling failure a standard part of the design pattern for object construction, thus avoiding adding asserts in the 'object_new' codepath which could be triggered by unexpected/badly validated user input. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Oct 30, 2024 at 06:07:08PM +0000, Daniel P. Berrangé wrote: > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > > This patchset introduces the singleton interface for QOM. I didn't add a > > changelog because there're quite a few changes here and there, plus new > > patches. So it might just be easier to re-read, considering the patchset > > isn't large. > > > > I switched v2 into RFC, because we have reviewer concerns (Phil and Dan so > > far) that it could be error prone to try to trap every attempts to create > > an object. My argument is, if we already have abstract class, meanwhile we > > do not allow instantiation of abstract class, so the complexity is already > > there. I prepared patch 1 this time to collect and track all similar > > random object creations; it might be helpful as a cleanup on its own to > > deduplicate some similar error messages. Said that, I'm still always open > > to rejections to this proposal. > > > > I hope v2 looks slightly cleaner by having not only object_new_allowed() > > but also object_new_or_fetch(). > > For me, that doesn't really make it much more appealing. Yes, we already have > an abstract class, but that has narrower impact, as there are fewer places > in which which we can trigger instantiation of an abstract class, than > where we can trigger instantiation of arbitrary objects and devices. There should be exactly the same number of places that will need care for either abstract or singleton. I tried to justify this with patch 1. I still think patch 1 can be seen as a cleanup too on its own (dedups the same "The class is abstract" error message), tracking random object creations so logically we could have the idea on whether a class can be instantiated at all, starting with abstract class. The real extra "complexity" is object_new_or_fetch(), but I hope it's not a major concern either. We only have two such use (aka, "please give me an object of class XXX"), which is qom/device-list-properties. I don't expect it to be common, I hope it's easy to maintain. > > The conversion of the iommu code results in worse error reporting, and > doesn't handle the virtio-iommu case, and the migration problems appear > solvable without inventing a singleton interface. So this doesn't feel > like it is worth the the trouble. IMHO that's not a major issue, I can drop patch 3-5 just to make it simple as of now. Btw, I have a TODO in patch 2 where I mentioned we can provide better error report if we want, so we can easily have exactly the same error as before with maybe a few or 10+ LOCs on top. It's trivial. object_new_allowed(): + if (object_class_is_singleton(klass)) { + Object *obj = singleton_get_instance(klass); + + if (obj) { + object_unref(obj); + /* + * TODO: Enhance the error message. E.g., the singleton class + * can provide a per-class error message in SingletonClass. + */ + error_setg(errp, "Object type '%s' conflicts with " + "an existing singleton instance", + klass->type->name); + return false; + } + } > > NB, my view point would have been different if 'object_new' had an > "Error *errp" parameter. That would have made handling failure a > standard part of the design pattern for object construction, thus > avoiding adding asserts in the 'object_new' codepath which could be > triggered by unexpected/badly validated user input. Yes I also wished object_new() can take an Error** when I started working on it. It would make this much easier, indeed. I suppose we don't need that by not allowing instance_init() to fail at all, postponing things to realize(). I suppose that's a "tactic" QEMU chose explicitly to make it easy that object_new() callers keep like before with zero error handling needed. At least for TYPE_DEVICE it looks all fine if all such operations can be offloaded into realize(). I'm not sure user creatable has those steps also because of this limitation. I was trying to do that with object_new_allowed() here instead, whenever it could be triggered by an user input. We could have an extra layer before reaching object_new() to guard any user input, and I think object_new_allowed() could play that role. When / If we want to introduce Error** to object_new() some day (or a variance of it), we could simply move object_new_allowed() into it. Thanks, -- Peter Xu
On Wed, Oct 30, 2024 at 03:08:08PM -0400, Peter Xu wrote: > On Wed, Oct 30, 2024 at 06:07:08PM +0000, Daniel P. Berrangé wrote: > > On Tue, Oct 29, 2024 at 05:16:00PM -0400, Peter Xu wrote: > > > v1: https://lore.kernel.org/r/20241024165627.1372621-1-peterx@redhat.com > > > > > > This patchset introduces the singleton interface for QOM. I didn't add a > > > changelog because there're quite a few changes here and there, plus new > > > patches. So it might just be easier to re-read, considering the patchset > > > isn't large. > > > > > > I switched v2 into RFC, because we have reviewer concerns (Phil and Dan so > > > far) that it could be error prone to try to trap every attempts to create > > > an object. My argument is, if we already have abstract class, meanwhile we > > > do not allow instantiation of abstract class, so the complexity is already > > > there. I prepared patch 1 this time to collect and track all similar > > > random object creations; it might be helpful as a cleanup on its own to > > > deduplicate some similar error messages. Said that, I'm still always open > > > to rejections to this proposal. > > > > > > I hope v2 looks slightly cleaner by having not only object_new_allowed() > > > but also object_new_or_fetch(). > > > > For me, that doesn't really make it much more appealing. Yes, we already have > > an abstract class, but that has narrower impact, as there are fewer places > > in which which we can trigger instantiation of an abstract class, than > > where we can trigger instantiation of arbitrary objects and devices. > > There should be exactly the same number of places that will need care for > either abstract or singleton. I tried to justify this with patch 1. > > I still think patch 1 can be seen as a cleanup too on its own (dedups the > same "The class is abstract" error message), tracking random object > creations so logically we could have the idea on whether a class can be > instantiated at all, starting with abstract class. I think patch 1 might be incomplete, as I'm not seeing what checks for abstract or singleton classes in the 'qdev_new' code paths, used by -device / device_add QMP. This is an example of the risks of adding more failure scenarios to object_add. > > NB, my view point would have been different if 'object_new' had an > > "Error *errp" parameter. That would have made handling failure a > > standard part of the design pattern for object construction, thus > > avoiding adding asserts in the 'object_new' codepath which could be > > triggered by unexpected/badly validated user input. > > Yes I also wished object_new() can take an Error** when I started working > on it. It would make this much easier, indeed. I suppose we don't need > that by not allowing instance_init() to fail at all, postponing things to > realize(). I suppose that's a "tactic" QEMU chose explicitly to make it > easy that object_new() callers keep like before with zero error handling > needed. At least for TYPE_DEVICE it looks all fine if all such operations > can be offloaded into realize(). I'm not sure user creatable has those > steps also because of this limitation. > > I was trying to do that with object_new_allowed() here instead, whenever it > could be triggered by an user input. We could have an extra layer before > reaching object_new() to guard any user input, and I think > object_new_allowed() could play that role. When / If we want to introduce > Error** to object_new() some day (or a variance of it), we could simply > move object_new_allowed() into it. Yes, having thought about this today, I came up with a way that we could introduce a object_new_dynamic() variant with "Error *errp" instead of asserts, and *crucially* force its use in the unsafe scenarios. ie any place that is not passing a const,static string. I've CC'd you on an RFC series that mocks up this idea. That would be sufficient to remove my objections wrt the singleton concept. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2024 Red Hat, Inc.