[v3] Require error handling for dynamically created objects

[PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Daniel P. Berrangé 1 year, 2 months ago

NB, this series is targetting 10.0, NOT for 9.2 freeze.

With code like

    Object *obj = object_new(TYPE_BLAH)

the caller can be pretty confident that they will successfully create
an object instance of TYPE_BLAH. They know exactly what type has been
requested, so it passing an abstract type for example, it is a clear
programmer error that they'll get an assertion failure.

Conversely with code like

   void somefunc(const char *typename) {
      Object * obj = object_new(typename)
      ...
   }

all bets are off, because the call of object_new() knows nothing
about what 'typename' resolves to. It could easily be an abstract
type. As a result, many code paths have added a manual check ahead
of time

   if (object_class_is_abstract(typename)) {
      error_setg(errp, ....)
   }

...except for where we forget to do this, such as qdev_new().

Overall 'object_new' is a bad design because it is inherantly
unsafe to call with unvalidated typenames.

This problem is made worse by the proposal to introduce the idea
of 'singleton' classes[1].

Thus, this series suggests a way to improve safety at build
time. The core idea is to allow 'object_new' to continue to be
used *if-and-only-if* given a static, const string, because that
scenario indicates the caller is aware of what type they are
creating at build time.

A new 'object_new_dynamic' method is proposed for cases where
the typename is dynamically chosen at runtime. This method has
an "Error **errp" parameter, which can report when an abstract
type is created, leaving the assert()s only for scenarios which
are unambiguous programmer errors.

With a little macro magic, we guarantee a compile error is
generated if 'object_new' is called with a dynamic type, forcing
all potentially unsafe code over to object_new_dynamic.

This is more tractable than adding 'Error **errp' to 'object_new'
as only a handful of places use a dynamic type name.

With this series, my objections to Peter Xu's singleton series[1]
would be largely nullified.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html

Changed in v3:

 * Remove qdev_try_new/isa_try_new/usb_try_new entirely
 * Fix use of obj_free

Changed in v2:

 * Removed "RFC" tag
 * Converted code in all non-x86_64 targets
 * Converted qdev_new to same pattern as object_new
 * Ensured test suites work now

Daniel P. Berrangé (9):
  hw: eliminate qdev_try_new, isa_try_new & usb_try_new
  qom: refactor checking abstract property when creating instances
  qom: allow failure of object_new_with_class
  qom: introduce object_new_dynamic()
  convert code to object_new_dynamic() where appropriate
  qom: enforce use of static, const string with object_new()
  qom: introduce qdev_new_dynamic()
  convert code to qdev_new_dynamic() where appropriate
  hw: enforce use of static, const string with qdev_new()

 accel/accel-user.c                    |  4 ++-
 chardev/char.c                        |  6 +++-
 hw/arm/aspeed.c                       |  6 ++--
 hw/arm/exynos4210.c                   |  3 +-
 hw/arm/highbank.c                     |  2 +-
 hw/arm/integratorcp.c                 |  2 +-
 hw/arm/mps3r.c                        |  3 +-
 hw/arm/npcm7xx_boards.c               |  2 +-
 hw/arm/realview.c                     |  3 +-
 hw/arm/sbsa-ref.c                     |  7 ++--
 hw/arm/versatilepb.c                  |  2 +-
 hw/arm/vexpress.c                     |  4 +--
 hw/arm/virt.c                         | 10 +++---
 hw/arm/xilinx_zynq.c                  |  3 +-
 hw/audio/soundhw.c                    |  2 +-
 hw/block/xen-block.c                  |  7 +++-
 hw/core/bus.c                         |  2 +-
 hw/core/cpu-common.c                  |  2 +-
 hw/core/qdev.c                        | 13 +++-----
 hw/core/sysbus.c                      |  2 +-
 hw/i2c/core.c                         |  2 +-
 hw/i386/pc.c                          | 22 ++++++------
 hw/i386/x86-common.c                  |  5 ++-
 hw/i386/xen/xen-pvh.c                 |  2 +-
 hw/intc/xics.c                        |  5 ++-
 hw/isa/isa-bus.c                      |  7 +---
 hw/mips/cps.c                         |  3 +-
 hw/pci-host/pnv_phb.c                 |  5 ++-
 hw/pci/pci.c                          |  2 +-
 hw/ppc/e500.c                         |  2 +-
 hw/ppc/pnv.c                          |  6 ++--
 hw/ppc/pnv_core.c                     |  5 ++-
 hw/ppc/spapr.c                        |  2 +-
 hw/ppc/spapr_cpu_core.c               |  5 ++-
 hw/ppc/spapr_drc.c                    |  2 +-
 hw/s390x/s390-pci-bus.c               |  6 +---
 hw/s390x/s390-virtio-ccw.c            | 10 ++++--
 hw/scsi/scsi-bus.c                    |  5 ++-
 hw/sparc/leon3.c                      |  2 +-
 hw/sparc/sun4m.c                      |  2 +-
 hw/sparc64/sparc64.c                  |  2 +-
 hw/ssi/ssi.c                          |  2 +-
 hw/usb/bus.c                          |  6 ++--
 hw/vfio/common.c                      |  6 +++-
 hw/vfio/container.c                   |  7 +++-
 include/hw/isa/isa.h                  |  1 -
 include/hw/net/ne2000-isa.h           | 16 ++++-----
 include/hw/qdev-core.h                | 41 ++++++++++++++++++-----
 include/hw/usb.h                      |  7 +---
 include/qom/object.h                  | 48 +++++++++++++++++++++++++--
 net/net.c                             |  7 ++--
 qom/object.c                          | 44 +++++++++++++++++-------
 qom/object_interfaces.c               |  7 ++--
 qom/qom-qmp-cmds.c                    | 16 +++++----
 system/qdev-monitor.c                 |  5 ++-
 system/vl.c                           |  6 ++--
 target/arm/arm-qmp-cmds.c             |  5 ++-
 target/i386/cpu-apic.c                |  8 ++++-
 target/i386/cpu-sysemu.c              | 11 ++++--
 target/i386/cpu.c                     |  4 +--
 target/loongarch/loongarch-qmp-cmds.c |  5 ++-
 target/mips/cpu.c                     |  2 +-
 target/riscv/riscv-qmp-cmds.c         |  5 ++-
 target/s390x/cpu_models_sysemu.c      |  7 ++--
 target/xtensa/cpu.c                   |  2 +-
 tests/unit/check-qom-interface.c      |  3 +-
 tests/unit/test-smp-parse.c           | 20 +++++------
 67 files changed, 312 insertions(+), 166 deletions(-)

-- 
2.46.0

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Markus Armbruster 1 year, 2 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> NB, this series is targetting 10.0, NOT for 9.2 freeze.
>
> With code like
>
>     Object *obj = object_new(TYPE_BLAH)
>
> the caller can be pretty confident that they will successfully create
> an object instance of TYPE_BLAH. They know exactly what type has been
> requested, so it passing an abstract type for example, it is a clear
> programmer error that they'll get an assertion failure.
>
> Conversely with code like
>
>    void somefunc(const char *typename) {
>       Object * obj = object_new(typename)
>       ...
>    }
>
> all bets are off, because the call of object_new() knows nothing
> about what 'typename' resolves to.

We know nothing *locally*.

Commonly, a non-local argument can demonstrate safety.  Only when the
type name comes from the user, we truly know nothing.

>                                    It could easily be an abstract
> type.

It could also be no type at all.

>       As a result, many code paths have added a manual check ahead
> of time
>
>    if (object_class_is_abstract(typename)) {
>       error_setg(errp, ....)
>    }

Actually, object_class_is_abstract() takes an ObjectClass, not a type
name string.

The actual guards we use are variations of

    klass = object_class_by_name(typename);
    if (!klass) {
        error_setg(errp, "invalid object type: %s", typename);
        return NULL;
    }

    if (object_class_is_abstract(klass)) {
        error_setg(errp, "object type '%s' is abstract", typename);
        return NULL;
    }

which covers "no type at all", too.

Sometimes, we use module_object_class_by_name() instead, which I believe
additionally loads the module providing the type, if any.  Which of the
two should be used where is a mystery to me, and I suspect we're getting
it wrong in places.  But this is turning into a digression.  To
hopefully maintain focus, I'm pretending modules don't exist until later
in this message.

Sometimes, we throw in an object_class_dynamic_cast(klass, T) to check
@typename resolves to a subtype of some T.

> ...except for where we forget to do this, such as qdev_new().

We did not forget it there!  It's by design a thin wrapper around
object_new(), with preconditions just like object_new().

> Overall 'object_new' is a bad design because it is inherantly
> unsafe to call with unvalidated typenames.

To be fair, object_new() was not designed for use with user-provided
type names.  When it chokes on type names not provided by the user, it's
clearly a programming error, and assert() is a perfectly fine way to
catch programming errors.  Same for qdev_new().

However, we do in fact use these functions with user-provided type
names, if rarely.  When we do, we need to validate the type name before
we pass it to them.

Trouble is the validation code is a bit involved, and reimplementing it
everywhere it's needed is asking for bugs.

Creating and using more interfaces that are more convenient for this
purpose would avoid that.

> This problem is made worse by the proposal to introduce the idea
> of 'singleton' classes[1].
>
> Thus, this series suggests a way to improve safety at build
> time. The core idea is to allow 'object_new' to continue to be
> used *if-and-only-if* given a static, const string, because that
> scenario indicates the caller is aware of what type they are
> creating at build time.
>
> A new 'object_new_dynamic' method is proposed for cases where
> the typename is dynamically chosen at runtime. This method has
> an "Error **errp" parameter, which can report when an abstract
> type is created, leaving the assert()s only for scenarios which
> are unambiguous programmer errors.
>
> With a little macro magic, we guarantee a compile error is
> generated if 'object_new' is called with a dynamic type, forcing
> all potentially unsafe code over to object_new_dynamic.

Three cases:

1. Type name is literal string.  No change.  This is the most common
   case.

2. It's not.

2a. Type name is user-provided.  This is rare.  We replace

        if (... guard ...) {
            ... return failure ...
        }
        obj = object_new(...);

    by

        obj = object_new_dynamic(..., errp);
        if (!obj) {
            ... return failure ...
        }

    This is an improvement.

2b. It's not.  We should replace

        obj = object_new(...);

    by

        obj = object_new_dynamic(..., &error_abort);

    Exact same behavior, just wordier, to placate the compiler.
    Tolerable as long as it's relatively rare.

    But I'm not sure it's worthwhile.  All it really does is helping
    some towards not getting case 2a wrong.  But 2a is rare.

Same for similar interfaces, e.g. qdev_new().

> This is more tractable than adding 'Error **errp' to 'object_new'
> as only a handful of places use a dynamic type name.

True!

Alright, enter modules.

Modules break a fundamental design assumption: object_new() on a
compiled-in type name is safe, i.e. the failure modes are all
programming errors.

Modules add new failure modes that are *not* programming errors:

* The module providing the type was not deployed correctly.

* It was, but the host system lacks the resources to load it.

Before modules, object_new(T) was safe unless T was user-provided.
Which implies it's safe when T is a literal string.

Since modules, object_new(T) is safe unless T is user-provided or the
type named by it is compiled as module.  This does *not* imply it's safe
when T is a literal string.

When looking at a use of object_new(), whether the argument names a type
that could be compiled as module cannot be known locally.  Therefore, we
cannot know locally whether we need to handle failure, either with a
suitable guard or by switching to a new function like
object_new_dynamic().  This is bad.

Breaking fundamental design assumptions tends to have ugly and expensive
consequences.  Consequences like having to rework every single call of
object_new() & friends.

Can we reduce the damage?  Maybe.  What if we create a
module_object_new() that takes an Error **, and make object_new() crash
& burn when the @typename argument resolves to a type provided by a
module?

Maybe module_object_new() and object_new_dynamic() could be fused into a
single function with a better name.

> With this series, my objections to Peter Xu's singleton series[1]
> would be largely nullified.
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Daniel P. Berrangé 1 year, 2 months ago

On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > NB, this series is targetting 10.0, NOT for 9.2 freeze.
> >
> > With code like
> >
> >     Object *obj = object_new(TYPE_BLAH)
> >
> > the caller can be pretty confident that they will successfully create
> > an object instance of TYPE_BLAH. They know exactly what type has been
> > requested, so it passing an abstract type for example, it is a clear
> > programmer error that they'll get an assertion failure.
> >
> > Conversely with code like
> >
> >    void somefunc(const char *typename) {
> >       Object * obj = object_new(typename)
> >       ...
> >    }
> >
> > all bets are off, because the call of object_new() knows nothing
> > about what 'typename' resolves to.
> 
> We know nothing *locally*.
> 
> Commonly, a non-local argument can demonstrate safety.  Only when the
> type name comes from the user, we truly know nothing.

...except for the failures introduced by modules not being installed,
then all bets are off for all types unless you happen to recall
which have been modularized so far.

> >                                    It could easily be an abstract
> > type.
> 
> It could also be no type at all.
> 
> >       As a result, many code paths have added a manual check ahead
> > of time
> >
> >    if (object_class_is_abstract(typename)) {
> >       error_setg(errp, ....)
> >    }
> 
> Actually, object_class_is_abstract() takes an ObjectClass, not a type
> name string.
> 
> The actual guards we use are variations of
> 
>     klass = object_class_by_name(typename);
>     if (!klass) {
>         error_setg(errp, "invalid object type: %s", typename);
>         return NULL;
>     }
> 
>     if (object_class_is_abstract(klass)) {
>         error_setg(errp, "object type '%s' is abstract", typename);
>         return NULL;
>     }
> 
> which covers "no type at all", too.
> 
> Sometimes, we use module_object_class_by_name() instead, which I believe
> additionally loads the module providing the type, if any.  Which of the
> two should be used where is a mystery to me, and I suspect we're getting
> it wrong in places.  But this is turning into a digression.  To
> hopefully maintain focus, I'm pretending modules don't exist until later
> in this message.

Yeah, I'm not a fan of having the separate module_object_class_by_name,
because it requires us to remember whether something has been modularized
or not.

> Sometimes, we throw in an object_class_dynamic_cast(klass, T) to check
> @typename resolves to a subtype of some T.
> 
> > ...except for where we forget to do this, such as qdev_new().
> 
> We did not forget it there!  It's by design a thin wrapper around
> object_new(), with preconditions just like object_new().

Yes, I think what I meant to write here, was "...except for where
we forgot todo this in *callers* of qdev_new that take user input"

> > Overall 'object_new' is a bad design because it is inherantly
> > unsafe to call with unvalidated typenames.
> 
> To be fair, object_new() was not designed for use with user-provided
> type names.  When it chokes on type names not provided by the user, it's
> clearly a programming error, and assert() is a perfectly fine way to
> catch programming errors.  Same for qdev_new().
> 
> However, we do in fact use these functions with user-provided type
> names, if rarely.  When we do, we need to validate the type name before
> we pass it to them.
> 
> Trouble is the validation code is a bit involved, and reimplementing it
> everywhere it's needed is asking for bugs.
> Creating and using more interfaces that are more convenient for this
> purpose would avoid that.

Yep, I don't have confidence in an API that will assert if the caller
forgot to validate the pre-conditions that can be triggered by user
input (or potentially other unexpected scenarios like something being
switched over to a module).

> > This problem is made worse by the proposal to introduce the idea
> > of 'singleton' classes[1].
> >
> > Thus, this series suggests a way to improve safety at build
> > time. The core idea is to allow 'object_new' to continue to be
> > used *if-and-only-if* given a static, const string, because that
> > scenario indicates the caller is aware of what type they are
> > creating at build time.
> >
> > A new 'object_new_dynamic' method is proposed for cases where
> > the typename is dynamically chosen at runtime. This method has
> > an "Error **errp" parameter, which can report when an abstract
> > type is created, leaving the assert()s only for scenarios which
> > are unambiguous programmer errors.
> >
> > With a little macro magic, we guarantee a compile error is
> > generated if 'object_new' is called with a dynamic type, forcing
> > all potentially unsafe code over to object_new_dynamic.
> 
> Three cases:
> 
> 1. Type name is literal string.  No change.  This is the most common
>    case.
> 
> 2. It's not.
> 
> 2a. Type name is user-provided.  This is rare.  We replace
> 
>         if (... guard ...) {
>             ... return failure ...
>         }
>         obj = object_new(...);
> 
>     by
> 
>         obj = object_new_dynamic(..., errp);
>         if (!obj) {
>             ... return failure ...
>         }
> 
>     This is an improvement.
> 
> 2b. It's not.  We should replace
> 
>         obj = object_new(...);
> 
>     by
> 
>         obj = object_new_dynamic(..., &error_abort);
> 
>     Exact same behavior, just wordier, to placate the compiler.
>     Tolerable as long as it's relatively rare.
> 
>     But I'm not sure it's worthwhile.  All it really does is helping
>     some towards not getting case 2a wrong.  But 2a is rare.

Yes, 2a is fairly rare, but this is amplified by the consequences
of getting it wrong, which are an assert killing your running VM.
My goal was to make it much harder to screw up and trigger an
assert, even if that makes some valid uses more verbose.

> > This is more tractable than adding 'Error **errp' to 'object_new'
> > as only a handful of places use a dynamic type name.
> 
> True!
> 
> Alright, enter modules.
> 
> Modules break a fundamental design assumption: object_new() on a
> compiled-in type name is safe, i.e. the failure modes are all
> programming errors.
> 
> Modules add new failure modes that are *not* programming errors:
> 
> * The module providing the type was not deployed correctly.
> 
> * It was, but the host system lacks the resources to load it.

Hmm, yes, I hadn't considered the 2nd problem. That's more
unpleasant, as libvirt may well have queried QEMU earlier to
detect the missing module, and assume all is safe if it is
present.

> 
> Before modules, object_new(T) was safe unless T was user-provided.
> Which implies it's safe when T is a literal string.
> 
> Since modules, object_new(T) is safe unless T is user-provided or the
> type named by it is compiled as module.  This does *not* imply it's safe
> when T is a literal string.

Agreed. 

> When looking at a use of object_new(), whether the argument names a type
> that could be compiled as module cannot be known locally.  Therefore, we
> cannot know locally whether we need to handle failure, either with a
> suitable guard or by switching to a new function like
> object_new_dynamic().  This is bad.
>
> Breaking fundamental design assumptions tends to have ugly and expensive
> consequences.  Consequences like having to rework every single call of
> object_new() & friends.
> 
> Can we reduce the damage?  Maybe.  What if we create a
> module_object_new() that takes an Error **, and make object_new() crash
> & burn when the @typename argument resolves to a type provided by a
> module?

I'm doubtful about a design where maintainers have to choose the
right API, based on mental knowledge of what is a module or not.
The place where object_new is called is typically distinct from
the module impl, so the knowledge is separated. This opens the
door to forgetting to change code from object_new to module_object_new.
This is what motivated my attempt to try to force compile time errors
scenarios which had a high chance of being user specified types.

I don't have a good answer for how to extend compile time validation
to cover non-user specified types that might be modules, without
changnig 'object_new' itself to add "Error **errp" and convert as
many callers as possible to propagate errors. That's a huge pile
of tedious work and in many cases would deteriorate  to &error_abort
since some key common use scenarios lack a "Error *errp" to propagate
into.

> Maybe module_object_new() and object_new_dynamic() could be fused into a
> single function with a better name.
> 
> > With this series, my objections to Peter Xu's singleton series[1]
> > would be largely nullified.
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Markus Armbruster 1 year, 2 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > NB, this series is targetting 10.0, NOT for 9.2 freeze.
>> >
>> > With code like
>> >
>> >     Object *obj = object_new(TYPE_BLAH)
>> >
>> > the caller can be pretty confident that they will successfully create
>> > an object instance of TYPE_BLAH. They know exactly what type has been
>> > requested, so it passing an abstract type for example, it is a clear
>> > programmer error that they'll get an assertion failure.
>> >
>> > Conversely with code like
>> >
>> >    void somefunc(const char *typename) {
>> >       Object * obj = object_new(typename)
>> >       ...
>> >    }
>> >
>> > all bets are off, because the call of object_new() knows nothing
>> > about what 'typename' resolves to.
>> 
>> We know nothing *locally*.
>> 
>> Commonly, a non-local argument can demonstrate safety.  Only when the
>> type name comes from the user, we truly know nothing.
>
> ...except for the failures introduced by modules not being installed,
> then all bets are off for all types unless you happen to recall
> which have been modularized so far.

True.  I was pretending modules don't exist until later in the message.

>> >                                    It could easily be an abstract
>> > type.
>> 
>> It could also be no type at all.
>> 
>> >       As a result, many code paths have added a manual check ahead
>> > of time
>> >
>> >    if (object_class_is_abstract(typename)) {
>> >       error_setg(errp, ....)
>> >    }
>> 
>> Actually, object_class_is_abstract() takes an ObjectClass, not a type
>> name string.
>> 
>> The actual guards we use are variations of
>> 
>>     klass = object_class_by_name(typename);
>>     if (!klass) {
>>         error_setg(errp, "invalid object type: %s", typename);
>>         return NULL;
>>     }
>> 
>>     if (object_class_is_abstract(klass)) {
>>         error_setg(errp, "object type '%s' is abstract", typename);
>>         return NULL;
>>     }
>> 
>> which covers "no type at all", too.
>> 
>> Sometimes, we use module_object_class_by_name() instead, which I believe
>> additionally loads the module providing the type, if any.  Which of the
>> two should be used where is a mystery to me, and I suspect we're getting
>> it wrong in places.  But this is turning into a digression.  To
>> hopefully maintain focus, I'm pretending modules don't exist until later
>> in this message.
>
> Yeah, I'm not a fan of having the separate module_object_class_by_name,
> because it requires us to remember whether something has been modularized
> or not.
>
>> Sometimes, we throw in an object_class_dynamic_cast(klass, T) to check
>> @typename resolves to a subtype of some T.
>> 
>> > ...except for where we forget to do this, such as qdev_new().
>> 
>> We did not forget it there!  It's by design a thin wrapper around
>> object_new(), with preconditions just like object_new().
>
> Yes, I think what I meant to write here, was "...except for where
> we forgot todo this in *callers* of qdev_new that take user input"

Correct.

>> > Overall 'object_new' is a bad design because it is inherantly
>> > unsafe to call with unvalidated typenames.
>> 
>> To be fair, object_new() was not designed for use with user-provided
>> type names.  When it chokes on type names not provided by the user, it's
>> clearly a programming error, and assert() is a perfectly fine way to
>> catch programming errors.  Same for qdev_new().
>> 
>> However, we do in fact use these functions with user-provided type
>> names, if rarely.  When we do, we need to validate the type name before
>> we pass it to them.
>> 
>> Trouble is the validation code is a bit involved, and reimplementing it
>> everywhere it's needed is asking for bugs.
>>
>> Creating and using more interfaces that are more convenient for this
>> purpose would avoid that.
>
> Yep, I don't have confidence in an API that will assert if the caller
> forgot to validate the pre-conditions that can be triggered by user
> input (or potentially other unexpected scenarios like something being
> switched over to a module).

Modules broke object_new(), but I'd rather not call object_new()'s
design bad for not accomodating a feature tacked on half-baked almost a
decade later.  But let's discuss modules further down.

Asserting preconditions isn't the problem; this is how preconditions
*should* be checked.  The problem is error-prone preconditions.

Using string type names is in theory error-prone: the compiler cannot
check the type name is valid.  It could be invalid because of a typo, or
because it names a type that's not linked into this binary.

The compiler could check with an enumeration, but then the header
defining needed to be included basically everywhere QOM is used, and
changed all the time.

So QOM went with strings.  I can't remember "invalid type name" bugs
surviving even basic testing in more than a decade of QOM use.

Except for *user-supplied* type names.  These need to be validated, we
failed to factor out common validation code, and ended up with bugs in
some of the copies.

>> > This problem is made worse by the proposal to introduce the idea
>> > of 'singleton' classes[1].
>> >
>> > Thus, this series suggests a way to improve safety at build
>> > time. The core idea is to allow 'object_new' to continue to be
>> > used *if-and-only-if* given a static, const string, because that
>> > scenario indicates the caller is aware of what type they are
>> > creating at build time.
>> >
>> > A new 'object_new_dynamic' method is proposed for cases where
>> > the typename is dynamically chosen at runtime. This method has
>> > an "Error **errp" parameter, which can report when an abstract
>> > type is created, leaving the assert()s only for scenarios which
>> > are unambiguous programmer errors.
>> >
>> > With a little macro magic, we guarantee a compile error is
>> > generated if 'object_new' is called with a dynamic type, forcing
>> > all potentially unsafe code over to object_new_dynamic.
>> 
>> Three cases:
>> 
>> 1. Type name is literal string.  No change.  This is the most common
>>    case.
>> 
>> 2. It's not.
>> 
>> 2a. Type name is user-provided.  This is rare.  We replace
>> 
>>         if (... guard ...) {
>>             ... return failure ...
>>         }
>>         obj = object_new(...);
>> 
>>     by
>> 
>>         obj = object_new_dynamic(..., errp);
>>         if (!obj) {
>>             ... return failure ...
>>         }
>> 
>>     This is an improvement.
>> 
>> 2b. It's not.  We should replace
>> 
>>         obj = object_new(...);
>> 
>>     by
>> 
>>         obj = object_new_dynamic(..., &error_abort);
>> 
>>     Exact same behavior, just wordier, to placate the compiler.
>>     Tolerable as long as it's relatively rare.
>> 
>>     But I'm not sure it's worthwhile.  All it really does is helping
>>     some towards not getting case 2a wrong.  But 2a is rare.
>
> Yes, 2a is fairly rare, but this is amplified by the consequences
> of getting it wrong, which are an assert killing your running VM.
> My goal was to make it much harder to screw up and trigger an
> assert, even if that makes some valid uses more verbose.

Has this been a problem in practice?  We have thirteen years of
experience...

>> > This is more tractable than adding 'Error **errp' to 'object_new'
>> > as only a handful of places use a dynamic type name.
>> 
>> True!
>> 
>> Alright, enter modules.
>> 
>> Modules break a fundamental design assumption: object_new() on a
>> compiled-in type name is safe, i.e. the failure modes are all
>> programming errors.
>> 
>> Modules add new failure modes that are *not* programming errors:
>> 
>> * The module providing the type was not deployed correctly.
>> 
>> * It was, but the host system lacks the resources to load it.
>
> Hmm, yes, I hadn't considered the 2nd problem. That's more
> unpleasant, as libvirt may well have queried QEMU earlier to
> detect the missing module, and assume all is safe if it is
> present.

The first problem could perhaps be hand-waved away: must deploy all
modules or else.  But that partly defeats the purpose of modules, namely
keeping unwanted dependencies off the host.

The second problem cannot: assertion failure on otherwise survivable
resource shortage is unequivocally wrong.

For more on module trouble, see "Problem 3: Loadable modules" in my memo
"Dynamic & heterogeneous machines, initial configuration: problems"[*]

>> Before modules, object_new(T) was safe unless T was user-provided.
>> Which implies it's safe when T is a literal string.
>> 
>> Since modules, object_new(T) is safe unless T is user-provided or the
>> type named by it is compiled as module.  This does *not* imply it's safe
>> when T is a literal string.
>
> Agreed. 
>
>> When looking at a use of object_new(), whether the argument names a type
>> that could be compiled as module cannot be known locally.  Therefore, we
>> cannot know locally whether we need to handle failure, either with a
>> suitable guard or by switching to a new function like
>> object_new_dynamic().  This is bad.
>>
>> Breaking fundamental design assumptions tends to have ugly and expensive
>> consequences.  Consequences like having to rework every single call of
>> object_new() & friends.
>> 
>> Can we reduce the damage?  Maybe.  What if we create a
>> module_object_new() that takes an Error **, and make object_new() crash
>> & burn when the @typename argument resolves to a type provided by a
>> module?
>
> I'm doubtful about a design where maintainers have to choose the
> right API, based on mental knowledge of what is a module or not.
> The place where object_new is called is typically distinct from
> the module impl, so the knowledge is separated. This opens the
> door to forgetting to change code from object_new to module_object_new.
> This is what motivated my attempt to try to force compile time errors
> scenarios which had a high chance of being user specified types.

We'd have to rely on the test suite here, as we've done for "type name
is valid".

> I don't have a good answer for how to extend compile time validation
> to cover non-user specified types that might be modules, without
> changnig 'object_new' itself to add "Error **errp" and convert as
> many callers as possible to propagate errors. That's a huge pile
> of tedious work and in many cases would deteriorate  to &error_abort
> since some key common use scenarios lack a "Error *errp" to propagate
> into.

I can offer two ideas.

I'll start with devices for reasons that will become apparent in a
minute.

The first idea is straighforward in conception: since the problem is
modules breaking existing design assumptions, unbreak them.

Device creation cannot fail, only realize can.  Could we delay the
problematic failure modes introduced by modules from creation to
realize?

When creating the real thing fails, create a dummy instead.  Of course,
the dummy needs to be sufficiently functional to provide for the things
we do with devices before realize, such as introspection.

Note that we already link information on modules into the binary, so
that the binary knows which modules provide a certain object.  To enable
sufficiently functional dummies, we'd have to link more.

The difficulty is "the things we do with devices before realize": do we
even know?

The other difficulty is that objects don't have realize.  User-creatable
objects have complete, which is kind of similar.  See also "Problem 5:
QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
machines, initial configuration: problems"[*].

The second idea is a variation of your idea to provide two interfaces
for object creation, where using the wrong one won't compile: a common
one that cannot fail, i.e. object_new(), and an uncommon one that can.
Let's call that one object_try_new() for now.

Your proposed "string literal" as a useful approximation of "cannot
fail".  Modules defeat that.

What if we switch from strings to something more expressive?

Step one: replace string type names by symbols

Change

    #define TYPE_FOO "foo"

    Object *object_new(const char *typename);

to something like

    extern const TypeInfoSafe foo_info;
    #define TYPE_FOO &foo_info

    Object *object_new(const TypeInfoSafe *type_info);

Step two: different symbols for safe and unsafe types

    extern const TypeInfoUnsafe bar_info;
    #define TYPE_BAR &bar_info
    
    Object *object_try_new(const TypeInfoUnsafe *type_info);

Now you cannot pass bar_info to object_new().

For a module-enabled TYPE_BAR, we already have something like

    module_obj(TYPE_BAR)

Make macro module_obj() require its argument to be TypeInfoUnsafe.

Voilà, the compiler enforces use of object_try_new() for objects
provided by loadable modules.

There will be some fallout around computed type names such as
ACCEL_OPS_NAME().  Fairly rare, I think.

More fallout around passing TYPE_ macros to functions that accept both
safe and unsafe types.  How common is that?

Worth exploring?

>> Maybe module_object_new() and object_new_dynamic() could be fused into a
>> single function with a better name.
>> 
>> > With this series, my objections to Peter Xu's singleton series[1]
>> > would be largely nullified.
>> >
>> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
>> 
>
> With regards,
> Daniel


[*] Message-ID: <87o7d1i7ky.fsf@pond.sub.org>
https://lore.kernel.org/qemu-devel/87o7d1i7ky.fsf@pond.sub.org/

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Daniel P. Berrangé 1 year, 2 months ago

On Fri, Dec 06, 2024 at 09:25:54AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
> >> To be fair, object_new() was not designed for use with user-provided
> >> type names.  When it chokes on type names not provided by the user, it's
> >> clearly a programming error, and assert() is a perfectly fine way to
> >> catch programming errors.  Same for qdev_new().
> >> 
> >> However, we do in fact use these functions with user-provided type
> >> names, if rarely.  When we do, we need to validate the type name before
> >> we pass it to them.
> >> 
> >> Trouble is the validation code is a bit involved, and reimplementing it
> >> everywhere it's needed is asking for bugs.
> >>
> >> Creating and using more interfaces that are more convenient for this
> >> purpose would avoid that.
> >
> > Yep, I don't have confidence in an API that will assert if the caller
> > forgot to validate the pre-conditions that can be triggered by user
> > input (or potentially other unexpected scenarios like something being
> > switched over to a module).
> 
> Modules broke object_new(), but I'd rather not call object_new()'s
> design bad for not accomodating a feature tacked on half-baked almost a
> decade later.  But let's discuss modules further down.
> 
> Asserting preconditions isn't the problem; this is how preconditions
> *should* be checked.  The problem is error-prone preconditions.

Yep, pre-conditions need to be something developers can be reasonably
expected to accurately comply with.

> Using string type names is in theory error-prone: the compiler cannot
> check the type name is valid.  It could be invalid because of a typo, or
> because it names a type that's not linked into this binary.

> The compiler could check with an enumeration, but then the header
> defining needed to be included basically everywhere QOM is used, and
> changed all the time.
> 
> So QOM went with strings.  I can't remember "invalid type name" bugs
> surviving even basic testing in more than a decade of QOM use.

Yep, at least for static object creation using since we're using the
pattern "object_new(TYPE_BLAH)" - even if TYPE_BLAH contains a typo,
it'll be the same typo passed in the .name = TYPE_BLAH of  TypeInfo,
so all will work fine if following normal code patterns.

> Except for *user-supplied* type names.  These need to be validated, we
> failed to factor out common validation code, and ended up with bugs in
> some of the copies.

Yep

> >> Three cases:
> >> 
> >> 1. Type name is literal string.  No change.  This is the most common
> >>    case.
> >> 
> >> 2. It's not.
> >> 
> >> 2a. Type name is user-provided.  This is rare.  We replace
> >> 
> >>         if (... guard ...) {
> >>             ... return failure ...
> >>         }
> >>         obj = object_new(...);
> >> 
> >>     by
> >> 
> >>         obj = object_new_dynamic(..., errp);
> >>         if (!obj) {
> >>             ... return failure ...
> >>         }
> >> 
> >>     This is an improvement.
> >> 
> >> 2b. It's not.  We should replace
> >> 
> >>         obj = object_new(...);
> >> 
> >>     by
> >> 
> >>         obj = object_new_dynamic(..., &error_abort);
> >> 
> >>     Exact same behavior, just wordier, to placate the compiler.
> >>     Tolerable as long as it's relatively rare.
> >> 
> >>     But I'm not sure it's worthwhile.  All it really does is helping
> >>     some towards not getting case 2a wrong.  But 2a is rare.
> >
> > Yes, 2a is fairly rare, but this is amplified by the consequences
> > of getting it wrong, which are an assert killing your running VM.
> > My goal was to make it much harder to screw up and trigger an
> > assert, even if that makes some valid uses more verbose.
> 
> Has this been a problem in practice?  We have thirteen years of
> experience...

No, but this series came out of Peter's proposal to introduce the
idea of Singleton classes, which would cause object_new to assert
in fun new scenarios. Effectively adding a new pre-condition and
would thus require all places which pass a dynamic type name to
object_new(), to be updated with a "if singleton..." check. I
wasn't happy with the idea of adding that precondition without a
way to enforce that we've not missed checks somewhere in the code.

Of course this pre-condition applies to static object_new calls
too, but those are less risky as the developer (probably) has the
mental context that the static object_new call is for a singleton.

> > I don't have a good answer for how to extend compile time validation
> > to cover non-user specified types that might be modules, without
> > changnig 'object_new' itself to add "Error **errp" and convert as
> > many callers as possible to propagate errors. That's a huge pile
> > of tedious work and in many cases would deteriorate  to &error_abort
> > since some key common use scenarios lack a "Error *errp" to propagate
> > into.
> 
> I can offer two ideas.
> 
> I'll start with devices for reasons that will become apparent in a
> minute.
> 
> The first idea is straighforward in conception: since the problem is
> modules breaking existing design assumptions, unbreak them.
> 
> Device creation cannot fail, only realize can.  Could we delay the
> problematic failure modes introduced by modules from creation to
> realize?
> 
> When creating the real thing fails, create a dummy instead.  Of course,
> the dummy needs to be sufficiently functional to provide for the things
> we do with devices before realize, such as introspection.
>
> Note that we already link information on modules into the binary, so
> that the binary knows which modules provide a certain object.  To enable
> sufficiently functional dummies, we'd have to link more.
> 
> The difficulty is "the things we do with devices before realize": do we
> even know?

Yeah, the idea of a dummy stub until realize is called fills me with
worry. It feels like something where it would be really easy to make
a mistake and have code that crashes interacting with an unrealized
object that doesn't have the struct fields you expect it to have, or
has the struct fields, but not initialized since no 'init' method
was run.

A slight refinement of your idea would be to break anything modular
into 2 distinct objects classes. MyDeviceFacade and MyDeviceImpl.
Creators of the device always call object_new(TYPE_MY_DEVICE_FACADE),
and the realize() method would load the module and make thje call
to object_new(TYPE_MY_DEVICE_IMPL).

Making something currently built-in, into a module, would involve
a bunch of tedious refactoring work, so I don't much like the
thought of choosing this as a design approach.

> The other difficulty is that objects don't have realize.  User-creatable
> objects have complete, which is kind of similar.  See also "Problem 5:
> QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
> machines, initial configuration: problems"[*].

It would be nice to have a unified model between object and devices
for the complete/realize approach, but that's a slight tangent.

> The second idea is a variation of your idea to provide two interfaces
> for object creation, where using the wrong one won't compile: a common
> one that cannot fail, i.e. object_new(), and an uncommon one that can.
> Let's call that one object_try_new() for now.
> 
> Your proposed "string literal" as a useful approximation of "cannot
> fail".  Modules defeat that.
> 
> What if we switch from strings to something more expressive?
> 
> Step one: replace string type names by symbols
> 
> Change
> 
>     #define TYPE_FOO "foo"
> 
>     Object *object_new(const char *typename);
> 
> to something like
> 
>     extern const TypeInfoSafe foo_info;
>     #define TYPE_FOO &foo_info
> 
>     Object *object_new(const TypeInfoSafe *type_info);
> 
> Step two: different symbols for safe and unsafe types
> 
>     extern const TypeInfoUnsafe bar_info;
>     #define TYPE_BAR &bar_info
>     
>     Object *object_try_new(const TypeInfoUnsafe *type_info);
> 
> Now you cannot pass bar_info to object_new().
> 
> For a module-enabled TYPE_BAR, we already have something like
> 
>     module_obj(TYPE_BAR)
> 
> Make macro module_obj() require its argument to be TypeInfoUnsafe.
> 
> Voilà, the compiler enforces use of object_try_new() for objects
> provided by loadable modules.
> 
> There will be some fallout around computed type names such as
> ACCEL_OPS_NAME().  Fairly rare, I think.
> 
> More fallout around passing TYPE_ macros to functions that accept both
> safe and unsafe types.  How common is that?

Perhaps more common than we care to admit. eg most block device drivers
are safe, except for a few we modularized which are unsafe. Most ui
frontends would be safe, except for a few we modularized. This pattern
of "except for a few we modularized" has been repeated all over, and
conceptually that's not a bad thing, as we wanted to make it easy to
modularize things incrementally.

Looking at our current /usr/bin/qemu-system-XXX binaries, they range in
size from 6 MB to 30 MB, stripped, ignoring linked libraries. Considering
work on the qemu-system-any binary that is intended to unify all targets,
I wouldn't be surprised if it came out at over 100 MB with all devices
from all targets included.

Is qemu-system-any pushing us to a place where our approach to modules
is in fact wrong ?

Modularizing piecemeal let us cull the big offenders that pulled in
huge external libraries.

People still complain QEMU is "too big" and binaries linked to too
many legacy devices.

With my distro hat on, if we had 'qemu-system-any' would I really
want to have it as monolithic binary ?

I think I would want to have loadable TCG backends for each target,
and I would want all the devices for each target to be loadable too.
eg, so I could have a 'qemu-system-any' RPM with just the core, and
'qemu-system-modules-arm', 'qemu-system-modules-x86_64', etc, or
even more fine grained than that.

IOW, everything is a module by default. Not necccessarily
1 object == 1 module, more  "N objects == 1 module", but certainly
with very few objects built-in.

In such a world, IMHO, it doesn't make sense to have TypeInfoSafe
and TypeInfoUnsafe, with different object_new/object_try_new
methods. I think we would have to accept that object_new must
get an "Error **errp", and possibly even the 'init" method too.
It would force us to make sure we can propagage into errp in
all the key places we can't do so today wrt object lifecycles.

Overall I've talked myself into believing my series here is not
worthwhile, as it doesn't solve a big enough problem, and it
needs somethign more ambituous.

> >> Maybe module_object_new() and object_new_dynamic() could be fused into a
> >> single function with a better name.
> >> 
> >> > With this series, my objections to Peter Xu's singleton series[1]
> >> > would be largely nullified.
> >> >
> >> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
> 
> [*] Message-ID: <87o7d1i7ky.fsf@pond.sub.org>
> https://lore.kernel.org/qemu-devel/87o7d1i7ky.fsf@pond.sub.org/

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v3 0/9] Require error handling for dynamically created objects

Posted by Markus Armbruster 1 year, 2 months ago

Cc: Phil, because we're now discusing qemu-system-any.

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Dec 06, 2024 at 09:25:54AM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
>> >> To be fair, object_new() was not designed for use with user-provided
>> >> type names.  When it chokes on type names not provided by the user, it's
>> >> clearly a programming error, and assert() is a perfectly fine way to
>> >> catch programming errors.  Same for qdev_new().
>> >> 
>> >> However, we do in fact use these functions with user-provided type
>> >> names, if rarely.  When we do, we need to validate the type name before
>> >> we pass it to them.
>> >> 
>> >> Trouble is the validation code is a bit involved, and reimplementing it
>> >> everywhere it's needed is asking for bugs.
>> >>
>> >> Creating and using more interfaces that are more convenient for this
>> >> purpose would avoid that.
>> >
>> > Yep, I don't have confidence in an API that will assert if the caller
>> > forgot to validate the pre-conditions that can be triggered by user
>> > input (or potentially other unexpected scenarios like something being
>> > switched over to a module).
>> 
>> Modules broke object_new(), but I'd rather not call object_new()'s
>> design bad for not accomodating a feature tacked on half-baked almost a
>> decade later.  But let's discuss modules further down.
>> 
>> Asserting preconditions isn't the problem; this is how preconditions
>> *should* be checked.  The problem is error-prone preconditions.
>
> Yep, pre-conditions need to be something developers can be reasonably
> expected to accurately comply with.
>
>> Using string type names is in theory error-prone: the compiler cannot
>> check the type name is valid.  It could be invalid because of a typo, or
>> because it names a type that's not linked into this binary.
>
>
>> The compiler could check with an enumeration, but then the header
>> defining needed to be included basically everywhere QOM is used, and
>> changed all the time.
>> 
>> So QOM went with strings.  I can't remember "invalid type name" bugs
>> surviving even basic testing in more than a decade of QOM use.
>
> Yep, at least for static object creation using since we're using the
> pattern "object_new(TYPE_BLAH)" - even if TYPE_BLAH contains a typo,
> it'll be the same typo passed in the .name = TYPE_BLAH of  TypeInfo,
> so all will work fine if following normal code patterns.

There's no shortage of qdev_new("mumble"), and even there typos haven't
been a problem.

>> Except for *user-supplied* type names.  These need to be validated, we
>> failed to factor out common validation code, and ended up with bugs in
>> some of the copies.
>
> Yep
>
>
>> >> Three cases:
>> >> 
>> >> 1. Type name is literal string.  No change.  This is the most common
>> >>    case.
>> >> 
>> >> 2. It's not.
>> >> 
>> >> 2a. Type name is user-provided.  This is rare.  We replace
>> >> 
>> >>         if (... guard ...) {
>> >>             ... return failure ...
>> >>         }
>> >>         obj = object_new(...);
>> >> 
>> >>     by
>> >> 
>> >>         obj = object_new_dynamic(..., errp);
>> >>         if (!obj) {
>> >>             ... return failure ...
>> >>         }
>> >> 
>> >>     This is an improvement.
>> >> 
>> >> 2b. It's not.  We should replace
>> >> 
>> >>         obj = object_new(...);
>> >> 
>> >>     by
>> >> 
>> >>         obj = object_new_dynamic(..., &error_abort);
>> >> 
>> >>     Exact same behavior, just wordier, to placate the compiler.
>> >>     Tolerable as long as it's relatively rare.
>> >> 
>> >>     But I'm not sure it's worthwhile.  All it really does is helping
>> >>     some towards not getting case 2a wrong.  But 2a is rare.
>> >
>> > Yes, 2a is fairly rare, but this is amplified by the consequences
>> > of getting it wrong, which are an assert killing your running VM.
>> > My goal was to make it much harder to screw up and trigger an
>> > assert, even if that makes some valid uses more verbose.
>> 
>> Has this been a problem in practice?  We have thirteen years of
>> experience...
>
> No, but this series came out of Peter's proposal to introduce the
> idea of Singleton classes, which would cause object_new to assert
> in fun new scenarios. Effectively adding a new pre-condition and
> would thus require all places which pass a dynamic type name to
> object_new(), to be updated with a "if singleton..." check. I
> wasn't happy with the idea of adding that precondition without a
> way to enforce that we've not missed checks somewhere in the code.
>
> Of course this pre-condition applies to static object_new calls
> too, but those are less risky as the developer (probably) has the
> mental context that the static object_new call is for a singleton.
>
>> > I don't have a good answer for how to extend compile time validation
>> > to cover non-user specified types that might be modules, without
>> > changnig 'object_new' itself to add "Error **errp" and convert as
>> > many callers as possible to propagate errors. That's a huge pile
>> > of tedious work and in many cases would deteriorate  to &error_abort
>> > since some key common use scenarios lack a "Error *errp" to propagate
>> > into.
>> 
>> I can offer two ideas.
>> 
>> I'll start with devices for reasons that will become apparent in a
>> minute.
>> 
>> The first idea is straighforward in conception: since the problem is
>> modules breaking existing design assumptions, unbreak them.
>> 
>> Device creation cannot fail, only realize can.  Could we delay the
>> problematic failure modes introduced by modules from creation to
>> realize?
>> 
>> When creating the real thing fails, create a dummy instead.  Of course,
>> the dummy needs to be sufficiently functional to provide for the things
>> we do with devices before realize, such as introspection.
>>
>> Note that we already link information on modules into the binary, so
>> that the binary knows which modules provide a certain object.  To enable
>> sufficiently functional dummies, we'd have to link more.
>> 
>> The difficulty is "the things we do with devices before realize": do we
>> even know?
>
> Yeah, the idea of a dummy stub until realize is called fills me with
> worry. It feels like something where it would be really easy to make
> a mistake and have code that crashes interacting with an unrealized
> object that doesn't have the struct fields you expect it to have, or
> has the struct fields, but not initialized since no 'init' method
> was run.
>
> A slight refinement of your idea would be to break anything modular
> into 2 distinct objects classes. MyDeviceFacade and MyDeviceImpl.
> Creators of the device always call object_new(TYPE_MY_DEVICE_FACADE),
> and the realize() method would load the module and make thje call
> to object_new(TYPE_MY_DEVICE_IMPL).
>
> Making something currently built-in, into a module, would involve
> a bunch of tedious refactoring work, so I don't much like the
> thought of choosing this as a design approach.
>
>> The other difficulty is that objects don't have realize.  User-creatable
>> objects have complete, which is kind of similar.  See also "Problem 5:
>> QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
>> machines, initial configuration: problems"[*].
>
> It would be nice to have a unified model between object and devices
> for the complete/realize approach, but that's a slight tangent.

Yes, and yes.

>> The second idea is a variation of your idea to provide two interfaces
>> for object creation, where using the wrong one won't compile: a common
>> one that cannot fail, i.e. object_new(), and an uncommon one that can.
>> Let's call that one object_try_new() for now.
>> 
>> Your proposed "string literal" as a useful approximation of "cannot
>> fail".  Modules defeat that.
>> 
>> What if we switch from strings to something more expressive?
>> 
>> Step one: replace string type names by symbols
>> 
>> Change
>> 
>>     #define TYPE_FOO "foo"
>> 
>>     Object *object_new(const char *typename);
>> 
>> to something like
>> 
>>     extern const TypeInfoSafe foo_info;
>>     #define TYPE_FOO &foo_info
>> 
>>     Object *object_new(const TypeInfoSafe *type_info);
>> 
>> Step two: different symbols for safe and unsafe types
>> 
>>     extern const TypeInfoUnsafe bar_info;
>>     #define TYPE_BAR &bar_info
>>     
>>     Object *object_try_new(const TypeInfoUnsafe *type_info);
>> 
>> Now you cannot pass bar_info to object_new().
>> 
>> For a module-enabled TYPE_BAR, we already have something like
>> 
>>     module_obj(TYPE_BAR)
>> 
>> Make macro module_obj() require its argument to be TypeInfoUnsafe.
>> 
>> Voilà, the compiler enforces use of object_try_new() for objects
>> provided by loadable modules.
>> 
>> There will be some fallout around computed type names such as
>> ACCEL_OPS_NAME().  Fairly rare, I think.
>> 
>> More fallout around passing TYPE_ macros to functions that accept both
>> safe and unsafe types.  How common is that?
>
> Perhaps more common than we care to admit. eg most block device drivers
> are safe, except for a few we modularized which are unsafe. Most ui
> frontends would be safe, except for a few we modularized. This pattern
> of "except for a few we modularized" has been repeated all over, and
> conceptually that's not a bad thing, as we wanted to make it easy to
> modularize things incrementally.

It's only a problem if we have functions other than object_new() &
wrappers that now take type name strings.  Since their string argument
can't become both TypeInfoSafe and TypeInfoUnsafe, we'd have to split
them into a safe and an unsafe variant just like object_new(), or splice
in a suitable conversion.  Do such functions exist?  Helpers within qom/
don't really count.

> Looking at our current /usr/bin/qemu-system-XXX binaries, they range in
> size from 6 MB to 30 MB, stripped, ignoring linked libraries. Considering
> work on the qemu-system-any binary that is intended to unify all targets,
> I wouldn't be surprised if it came out at over 100 MB with all devices
> from all targets included.
>
> Is qemu-system-any pushing us to a place where our approach to modules
> is in fact wrong ?
>
> Modularizing piecemeal let us cull the big offenders that pulled in
> huge external libraries.
>
> People still complain QEMU is "too big" and binaries linked to too
> many legacy devices.
>
> With my distro hat on, if we had 'qemu-system-any' would I really
> want to have it as monolithic binary ?

No, and I don't think that's Phil's plan.

> I think I would want to have loadable TCG backends for each target,
> and I would want all the devices for each target to be loadable too.
> eg, so I could have a 'qemu-system-any' RPM with just the core, and
> 'qemu-system-modules-arm', 'qemu-system-modules-x86_64', etc, or
> even more fine grained than that.
>
> IOW, everything is a module by default. Not necccessarily
> 1 object == 1 module, more  "N objects == 1 module", but certainly
> with very few objects built-in.

I doubt one module per QOM type makes sense.  That's a huuuuge amount
modules, a thicket of dependencies, and way too much dynamic loading and
linking.

My qemu-system-x86_64 links with ~200 shared objects according to ldd.

It sports ~800 QOM types according to qom-list-types.

Pulling in shared objects in the low hundreds is already a dubious idea.
Pushing their number into the thousands feels... unadvisable.

Quite a few QOM types live together with relatives in the same .c.  But
even one module per .c instead of one module per QOM type will still
result in an excessive number of modules.  Evidence: I count >1700 .c
under hw, and I suspect most of them would become modules.

How can we do better?

Clearly, each target requires a certain set of QOM types.  Same for each
machine type.  We could simply have one module per target plus one
module per machine type.  A homogeneous guest would need one of each.

Since some types are used by more than one target / machine type, we'd
end up with them duplicated in different target / machine type modules.
To avoid that, we'd have to factor them out into common modules the
target / machine type modules can statically require.

Sounds like work, but it should produce a lot fewer modules.

> In such a world, IMHO, it doesn't make sense to have TypeInfoSafe
> and TypeInfoUnsafe, with different object_new/object_try_new
> methods. I think we would have to accept that object_new must
> get an "Error **errp", and possibly even the 'init" method too.
> It would force us to make sure we can propagage into errp in
> all the key places we can't do so today wrt object lifecycles.

With less fine-grained modularization, such as the one I sketched above,
many (most?) instances of object_new() remain safe, because they create
instances of types provided by the same module, or a statically required
module.  Remember, shared objects don't load unless their dependencies
also load, which means a successfully loaded module will have all it
statically requires.

> Overall I've talked myself into believing my series here is not
> worthwhile, as it doesn't solve a big enough problem, and it
> needs somethign more ambituous.

Yes, but what exactly is not yet clear to me.

>> >> Maybe module_object_new() and object_new_dynamic() could be fused into a
>> >> single function with a better name.
>> >> 
>> >> > With this series, my objections to Peter Xu's singleton series[1]
>> >> > would be largely nullified.
>> >> >
>> >> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
>> 
>> [*] Message-ID: <87o7d1i7ky.fsf@pond.sub.org>
>> https://lore.kernel.org/qemu-devel/87o7d1i7ky.fsf@pond.sub.org/
>
> With regards,
> Daniel