[PATCH] tests: improve performance of device-introspect-test

Daniel P. Berrangé posted 1 patch 3 years, 8 months ago
[PATCH] tests: improve performance of device-introspect-test
Posted by Daniel P. Berrangé 3 years, 8 months ago
Total execution time with "-m slow" and x86_64 QEMU, drops from 3
minutes 15 seconds, down to 54 seconds.

Individual tests drop from 17-20 seconds, down to 3-4 seconds.

The cost of this change is that any QOM bugs resulting in the test
failure will not be directly associated with the device that caused
the failure. The test case is not frequently identifying such bugs
though, and the cause is likely easily visible in the patch series
that causes the failure. So overall the shorter running time is
considered the more important factor.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/device-introspect-test.c | 38 +++++++++++++---------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/tests/qtest/device-introspect-test.c b/tests/qtest/device-introspect-test.c
index 9abb5ec889..b4af1e19f6 100644
--- a/tests/qtest/device-introspect-test.c
+++ b/tests/qtest/device-introspect-test.c
@@ -105,14 +105,9 @@ static void test_one_device(QTestState *qts, const char *type)
 {
     QDict *resp;
     char *help;
-    char *qom_tree_start, *qom_tree_end;
-    char *qtree_start, *qtree_end;
 
     g_test_message("Testing device '%s'", type);
 
-    qom_tree_start = qtest_hmp(qts, "info qom-tree");
-    qtree_start = qtest_hmp(qts, "info qtree");
-
     resp = qtest_qmp(qts, "{'execute': 'device-list-properties',"
                           " 'arguments': {'typename': %s}}",
                type);
@@ -120,21 +115,6 @@ static void test_one_device(QTestState *qts, const char *type)
 
     help = qtest_hmp(qts, "device_add \"%s,help\"", type);
     g_free(help);
-
-    /*
-     * Some devices leave dangling pointers in QOM behind.
-     * "info qom-tree" or "info qtree" have a good chance at crashing then.
-     * Also make sure that the tree did not change.
-     */
-    qom_tree_end = qtest_hmp(qts, "info qom-tree");
-    g_assert_cmpstr(qom_tree_start, ==, qom_tree_end);
-    g_free(qom_tree_start);
-    g_free(qom_tree_end);
-
-    qtree_end = qtest_hmp(qts, "info qtree");
-    g_assert_cmpstr(qtree_start, ==, qtree_end);
-    g_free(qtree_start);
-    g_free(qtree_end);
 }
 
 static void test_device_intro_list(void)
@@ -232,10 +212,17 @@ static void test_device_intro_concrete(const void *args)
     QListEntry *entry;
     const char *type;
     QTestState *qts;
+    g_autofree char *qom_tree_start = NULL;
+    g_autofree char *qom_tree_end = NULL;
+    g_autofree char *qtree_start = NULL;
+    g_autofree char *qtree_end = NULL;
 
     qts = qtest_init(args);
     types = device_type_list(qts, false);
 
+    qom_tree_start = qtest_hmp(qts, "info qom-tree");
+    qtree_start = qtest_hmp(qts, "info qtree");
+
     QLIST_FOREACH_ENTRY(types, entry) {
         type = qdict_get_try_str(qobject_to(QDict, qlist_entry_obj(entry)),
                                  "name");
@@ -243,6 +230,17 @@ static void test_device_intro_concrete(const void *args)
         test_one_device(qts, type);
     }
 
+    /*
+     * Some devices leave dangling pointers in QOM behind.
+     * "info qom-tree" or "info qtree" have a good chance at crashing then.
+     * Also make sure that the tree did not change.
+     */
+    qom_tree_end = qtest_hmp(qts, "info qom-tree");
+    g_assert_cmpstr(qom_tree_start, ==, qom_tree_end);
+
+    qtree_end = qtest_hmp(qts, "info qtree");
+    g_assert_cmpstr(qtree_start, ==, qtree_end);
+
     qobject_unref(types);
     qtest_quit(qts);
     g_free((void *)args);
-- 
2.26.2


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Markus Armbruster 3 years, 8 months ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> Total execution time with "-m slow" and x86_64 QEMU, drops from 3
> minutes 15 seconds, down to 54 seconds.
>
> Individual tests drop from 17-20 seconds, down to 3-4 seconds.

Nice!

A few observations on this test (impatient readers may skip to
"Conclusions"):

* Your testing rig is a fair bit faster than mine.

* Run time is almost completely spent in the
  /TARGET/device/introspect/concrete/*/* cases.

  Each such case iterates over all known device types.  Each iteration
  introspects one device.

  Cases concrete/*/M use machine type M.  The test ignores "old"
  versioned machine types; see qtest_is_old_versioned_machine().

  Cases concrete/nodefaults/* use -nodefaults, cases concrete/defaults/*
  don't.

* The number of known device types varies between targets from 33
  (tricore) to several hundreds (x86_64+i386: 421, ppc 593, arm 667,
  aarch64 680, ppc64 689).  Median is 215, sum is 7485.

  Aside: why on earth do we need 334 ppc64 CPUs?

* The number of tested machine types varies between targets from two to
  over 60 (arm 62, aarch64 66).  Median is 4, sum is 286.

* Without -m slow, only /TARGET/device/introspect/concrete/defaults/none
  is run.

  This tests #devices introspections, i.e. between 33 and 689 depending
  on target, 7485 for all targets together.

  Run times for master (commit eb2c66b10e) vary for me from well under
  one second to almost 6 seconds, with x86_64 and i386 the slowest,
  followed by aarch64, arm, ppc64, ppc, s390x.  This is what "make
  check" runs.  Total run time for all 32 targets is ~72s.  That's
  roughly 100 introspections per second.

  Note I measured manual runs, like so:

  $ QEMU_AUDIO_DRV=none QTEST_QEMU_BINARY="$TARGET-softmmu/qemu-system-$TARGET" QTEST_QEMU_IMG=qemu-img time tests/qtest/device-introspect-test

* With -m slow, we run all cases.  This is relatively recent: commit
  410573aa2c 'tests/device-introspect: Test with all machines, not only
  with "none"'.  The commit message explains:

    Certain device introspection crashes used to only happen if you were
    using a certain machine, e.g. if the machine was using serial_hd() or
    nd_table[], and a device was trying to use these in its instance_init
    function, too.

  Aside: these are things an instance_init() should not do.  But
  catching things that should not be done is one reason for having
  tests.  This one is an awfully expensive catcher, but until someone
  has better ideas...

  With -m slow, we test 2 * #machines * #devices introspections,
  i.e. from 132 (tricore) to over 10k (ppc 13046, ppc64 23426, arm
  82708, aarch64 89760).  Median is ~1600, sum is ~260k.

  Except we actually test just 89k now, because the test *fails* for arm
  and aarch64 after some 500 introspections: introspecting device
  msf2-soc with machine ast2600-evb makes QEMU terminate unsuccessfully
  with "Unsupported NIC model: ftgmac100".  Cause: m2sxxx_soc_initfn()
  calls qemu_check_nic_model().  Goes back to commit 05b7374a58 "msf2:
  Add EMAC block to SmartFusion2 SoC", merged some ten weeks ago.  This
  is exactly the kind of mistake the test is designed to catch.  How
  come it wasn't?  Possibly due to unlucky combination with the slowdown
  discussed in the next item (but that was less than four weeks ago).
  I'll post a patch to fix the bug.

  Run time increases even more than the number of introspections.  Total
  run time for all targets is around one hour, which is less than 25
  introspections per second.  ppc64 is the worst by far: 26 *minutes*
  for 23k tests, less than 15 introspections per second.  sparc and rx
  achieve a paltry 5 introspections per second.

* Recent commit e8c9e65816 'qom: Make "info qom-tree" show children
  sorted' made things worse.  Reverting it cuts run time without -m slow
  from 72s to 43s (100 and 170 introspections per second, respectively),
  and with -m slow from one hour to 6 minutes (24 and 250 introspections
  per second, respectively).

  The speedup with -m slow varies much between targets: from less than 2
  (cris, moxie, nios2, or1k, s390x, tricore, unicore32) to more than 20
  (ppc64, rx, sparc).  The most sluggish introspections profit the most.

  Paolo pointed out that the sorting is n^2.  True.  I did not expect n
  to be large enough for that to matter.  What's being sorted is the
  children of each node in the QOM composition tree.  The number of
  children is almost always small.  The performance difference proves
  enough of them are large enough for the sorting to matter.  The
  variance between machine types suggests some machines have nodes with
  a particularly large number of children.

* I found a memory leak in the same commit.  Plugging it makes the test
  slightly slower for me.  I'll post the fix.

* Since some of the n appear to be large enough for n^2 to matter, I
  tried sorting more efficiently.

  With g_queue_insert_sorted() replaced by g_queue_push_head() &
  g_queue_sort(), the test without -m slow is *slower*: 80s instead of
  72s.  Same for g_slist_prepend() & g_slist_sort().  I believe that's
  because the n are indeed small for the none machine.

  -m slow gets faster: 34 minutes instead of one hour (44 introspections
  per second).  Better, but I wouldn't call it good.

* These modest time savings are dwarfed many times over by Daniel's
  patch.  With his patch and my leak fix, the test takes 51s without -m
  slow (down from 72s), and 7:29 minutes with -m slow (down from 1h).
  The test is still a pig, but anything that tests a hundred thousand
  QMP commands is bound to be one.

  My "info qom-tree" improvements don't matter anymore then, at least
  for this test: 50s vs. 51s, and 449s vs. 456s.

* Fixing the msf2-soc bug more than *doubles* the run time with -m slow:
  from 7.5 to 18 minutes.

* Measurements in tabular form:

                                without -m slow     with -m slow
  master                        72s  (100 / s)      ~1h  ( 24 / s)
  sort reverted                 43s  (170 / s)      362s (250 / s)
  sort redone, leak plugged     80s  ( 90 / s)      ~34m ( 44 / s)
  Daniel's patch, leak plugged  51s  (170 / s)      456s (200 / s)
  & sort redone                 50s  (150 / s)      449s (200 / s)
  & msf2-soc fixed              no change           ~18m (240 / s)

Conclusions:

* The test matrix is *expensive*.  Testing even a simple QMP query is
  expensive when you do it a quarter million times.  ARM is the greediest pig by
  far (170k introspections, almost two thirds of the total!), followed
  by ppc (36k), x86 (12k) and mips (11k).  Ideas on trimming excess are
  welcome.  I'm not even sure anymore this should be a qtest.

* Running "info qom-tree" and "info qtree" in the order of a hundred
  thousand times is a bad idea.  Daniel's patch makes sense.

* Making info qom-tree build unsorted lists, then sorting them
  efficiently only matters when you run "info qom-tree" way more than
  you should.  It helps more with machines where longer lists exist than
  it hurts with machines where the lists are all short.

> The cost of this change is that any QOM bugs resulting in the test
> failure will not be directly associated with the device that caused
> the failure. The test case is not frequently identifying such bugs
> though,

Most likely because it identifies them long before the patches hit the
list :)

>         and the cause is likely easily visible in the patch series
> that causes the failure. So overall the shorter running time is
> considered the more important factor.

The time savings are simply too large to be left on the table.

I agree with the idea that most of the time the offending device should
be fairly obvious.  But when it isn't, having a way to let the test do
the work of pinpointing the device would be sooo nice.  Can we keep the
slow & thorough checking as a run-time option?

> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks!


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Thomas Huth 3 years, 8 months ago
On 10/07/2020 22.03, Markus Armbruster wrote:
[...]
>   With -m slow, we test 2 * #machines * #devices introspections,
>   i.e. from 132 (tricore) to over 10k (ppc 13046, ppc64 23426, arm
>   82708, aarch64 89760).  Median is ~1600, sum is ~260k.
> 
>   Except we actually test just 89k now, because the test *fails* for arm
>   and aarch64 after some 500 introspections: introspecting device
>   msf2-soc with machine ast2600-evb makes QEMU terminate unsuccessfully
>   with "Unsupported NIC model: ftgmac100".  Cause: m2sxxx_soc_initfn()
>   calls qemu_check_nic_model().  Goes back to commit 05b7374a58 "msf2:
>   Add EMAC block to SmartFusion2 SoC", merged some ten weeks ago.  This
>   is exactly the kind of mistake the test is designed to catch.  How
>   come it wasn't?  Possibly due to unlucky combination with the slowdown
>   discussed in the next item (but that was less than four weeks ago).

Well, the explanation is simpler: Nobody ran the
device-introspect-test with the arm target in slow mode! Hardly
anybody runs the tests with SPEED=slow, and in the CI, we currently only
run the test in slow mode for the i386-softmmu, ppc64-softmmu and
mips64-softmmu targets. Simply because testing with arm was toooo slow
when I wrote that CI script, I didn't want to wait endlessly here.

But now with the speedup patch from Daniel, and maybe with some smarter
checks in the YML file (I now know that there are things like "only:
changes:" keywords so we could e.g. only run that test if something in
hw/arm/* has changed), I think it should be feasible to get this
included in the CI, too.

 Thomas


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Daniel P. Berrangé 3 years, 8 months ago
On Fri, Jul 10, 2020 at 10:03:56PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > Total execution time with "-m slow" and x86_64 QEMU, drops from 3
> > minutes 15 seconds, down to 54 seconds.
> >
> > Individual tests drop from 17-20 seconds, down to 3-4 seconds.
> 
> Nice!
> 
> A few observations on this test (impatient readers may skip to
> "Conclusions"):

snip

> * The number of known device types varies between targets from 33
>   (tricore) to several hundreds (x86_64+i386: 421, ppc 593, arm 667,
>   aarch64 680, ppc64 689).  Median is 215, sum is 7485.

snip

> * The test matrix is *expensive*.  Testing even a simple QMP query is
>   expensive when you do it a quarter million times.  ARM is the greediest pig by
>   far (170k introspections, almost two thirds of the total!), followed
>   by ppc (36k), x86 (12k) and mips (11k).  Ideas on trimming excess are
>   welcome.  I'm not even sure anymore this should be a qtest.

We have 70 arm machines, 667 devices. IIUC we are roughly testing every
device against every machine. 46,690 tests.

Most of the time devices are going to behave identically regardless of
which machine type is used. The trouble is some machines are different
enough that they can genuinely trigger different behaviour. It isn't
possible to slim the (machine, device) expansion down programmatically
while still exercising the interesting combinations unless we get a lot
more advanced.

e.g. if we have a PCI device, we only need to test it on one PCI-based
machine and one non-PCI-based machine.

It would be interesting to actually get some CPU profiling data for
this test to see if it points out anything interesting about where the
time is being spent. Even if we don't reduce the complexity, reducing
a constant time factor will potentially help greatly.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Markus Armbruster 3 years, 8 months ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Jul 10, 2020 at 10:03:56PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > Total execution time with "-m slow" and x86_64 QEMU, drops from 3
>> > minutes 15 seconds, down to 54 seconds.
>> >
>> > Individual tests drop from 17-20 seconds, down to 3-4 seconds.
>> 
>> Nice!
>> 
>> A few observations on this test (impatient readers may skip to
>> "Conclusions"):
>
> snip
>
>> * The number of known device types varies between targets from 33
>>   (tricore) to several hundreds (x86_64+i386: 421, ppc 593, arm 667,
>>   aarch64 680, ppc64 689).  Median is 215, sum is 7485.
>
> snip
>
>> * The test matrix is *expensive*.  Testing even a simple QMP query is
>>   expensive when you do it a quarter million times.  ARM is the greediest pig by
>>   far (170k introspections, almost two thirds of the total!), followed
>>   by ppc (36k), x86 (12k) and mips (11k).  Ideas on trimming excess are
>>   welcome.  I'm not even sure anymore this should be a qtest.
>
> We have 70 arm machines, 667 devices. IIUC we are roughly testing every
> device against every machine. 46,690 tests.
>
> Most of the time devices are going to behave identically regardless of
> which machine type is used. The trouble is some machines are different
> enough that they can genuinely trigger different behaviour. It isn't
> possible to slim the (machine, device) expansion down programmatically
> while still exercising the interesting combinations unless we get a lot
> more advanced.
>
> e.g. if we have a PCI device, we only need to test it on one PCI-based
> machine and one non-PCI-based machine.

The trouble is .instance_init() can do anything, and can therefore
interact badly with anything.

Example: m2sxxx_soc_initfn() of device type "msf2-soc" messes with
nd_table[0].  That's wrong.  The test doesn't catch it with machine type
"none", where nd_table[0] is blank.  It does catch it with machine type
"ast2600-evb", because aspeed_machine_init() puts something incompatible
into nd_table[0], which makes m2sxxx_soc_initfn() crash.

"msf2-soc" is not a PCI device, but if it was, then the two machines
(with and without PCI) picked for testing PCI devices may well both
leave nd_table[0] blank, and therefore not catch the bug.

Some instances of "device code does stuff it should not" could be
prevented by making interfaces inaccessible there.  We'd have to
identify device code first.  The hw/BASE-ARCH/ directories contain both
boards and devices.  Possibly even in the same .c file.

> It would be interesting to actually get some CPU profiling data for
> this test to see if it points out anything interesting about where the
> time is being spent. Even if we don't reduce the complexity, reducing
> a constant time factor will potentially help greatly.

Hunch: when we want to test device instantiation and finalization for a
million (give or take) combinations of board x device, testing them one
by one with QMP might be a bad idea.


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Laurent Vivier 3 years, 8 months ago
On 09/07/2020 13:28, Daniel P. Berrangé wrote:
> Total execution time with "-m slow" and x86_64 QEMU, drops from 3
> minutes 15 seconds, down to 54 seconds.
> 
> Individual tests drop from 17-20 seconds, down to 3-4 seconds.
> 
> The cost of this change is that any QOM bugs resulting in the test
> failure will not be directly associated with the device that caused
> the failure. The test case is not frequently identifying such bugs
> though, and the cause is likely easily visible in the patch series
> that causes the failure. So overall the shorter running time is
> considered the more important factor.

You don't apply the same change to test_device_intro_none() and
test_device_intro_abstract(): is that intended?

Thanks,
Laurent


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Daniel P. Berrangé 3 years, 8 months ago
On Thu, Jul 09, 2020 at 01:44:45PM +0200, Laurent Vivier wrote:
> On 09/07/2020 13:28, Daniel P. Berrangé wrote:
> > Total execution time with "-m slow" and x86_64 QEMU, drops from 3
> > minutes 15 seconds, down to 54 seconds.
> > 
> > Individual tests drop from 17-20 seconds, down to 3-4 seconds.
> > 
> > The cost of this change is that any QOM bugs resulting in the test
> > failure will not be directly associated with the device that caused
> > the failure. The test case is not frequently identifying such bugs
> > though, and the cause is likely easily visible in the patch series
> > that causes the failure. So overall the shorter running time is
> > considered the more important factor.
> 
> You don't apply the same change to test_device_intro_none() and
> test_device_intro_abstract(): is that intended?

Since neither of those tests will result in any device being created,
there didn't seem any point in checking the qtree output.

IIUC, both of those tests should result in an error being reported from
the device_add command, but I see nothing actually validates that is the
case. 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Laurent Vivier 3 years, 8 months ago
On 09/07/2020 13:59, Daniel P. Berrangé wrote:
> On Thu, Jul 09, 2020 at 01:44:45PM +0200, Laurent Vivier wrote:
>> On 09/07/2020 13:28, Daniel P. Berrangé wrote:
>>> Total execution time with "-m slow" and x86_64 QEMU, drops from 3
>>> minutes 15 seconds, down to 54 seconds.
>>>
>>> Individual tests drop from 17-20 seconds, down to 3-4 seconds.
>>>
>>> The cost of this change is that any QOM bugs resulting in the test
>>> failure will not be directly associated with the device that caused
>>> the failure. The test case is not frequently identifying such bugs
>>> though, and the cause is likely easily visible in the patch series
>>> that causes the failure. So overall the shorter running time is
>>> considered the more important factor.
>>
>> You don't apply the same change to test_device_intro_none() and
>> test_device_intro_abstract(): is that intended?
> 
> Since neither of those tests will result in any device being created,
> there didn't seem any point in checking the qtree output.
> 
> IIUC, both of those tests should result in an error being reported from
> the device_add command, but I see nothing actually validates that is the
> case. 

I think the purpose of these tests is precisely to ensure nothing is
created. This is why they check the qtree and not the reported error.

Markus?

Thanks,
Laurent


Re: [PATCH] tests: improve performance of device-introspect-test
Posted by Markus Armbruster 3 years, 8 months ago
Laurent Vivier <lvivier@redhat.com> writes:

> On 09/07/2020 13:59, Daniel P. Berrangé wrote:
>> On Thu, Jul 09, 2020 at 01:44:45PM +0200, Laurent Vivier wrote:
>>> On 09/07/2020 13:28, Daniel P. Berrangé wrote:
>>>> Total execution time with "-m slow" and x86_64 QEMU, drops from 3
>>>> minutes 15 seconds, down to 54 seconds.
>>>>
>>>> Individual tests drop from 17-20 seconds, down to 3-4 seconds.
>>>>
>>>> The cost of this change is that any QOM bugs resulting in the test
>>>> failure will not be directly associated with the device that caused
>>>> the failure. The test case is not frequently identifying such bugs
>>>> though, and the cause is likely easily visible in the patch series
>>>> that causes the failure. So overall the shorter running time is
>>>> considered the more important factor.
>>>
>>> You don't apply the same change to test_device_intro_none() and
>>> test_device_intro_abstract(): is that intended?
>> 
>> Since neither of those tests will result in any device being created,
>> there didn't seem any point in checking the qtree output.
>> 
>> IIUC, both of those tests should result in an error being reported from
>> the device_add command, but I see nothing actually validates that is the
>> case. 
>
> I think the purpose of these tests is precisely to ensure nothing is
> created. This is why they check the qtree and not the reported error.
>
> Markus?

Before I answer the question, let me provide a bit of background.

The tests we're discussing exercise QMP command device-list-properties.
We're trying to catch two kinds of bugs:

* device introspection crashes or hangs outright, and

* device introspection has unwelcome side effects.

We're vulnerable to such bugs since device introspection has to create
and destroy an instance (consequence of QOM's design), and the device
code that gets run on creation and destruction can do anything.

To catch crashes or hangs, we execute one introspection, throwing away
its result.

Catching general side effects is impossible, so we instead catch special
side effects that (a) have a history of actual bugs, and (b) are easy to
observe: changes to the qdev tree visible in info qtree, and changes to
the QOM composition tree visible in info qom-tree.

Now I'm ready to answer the question.

test_device_intro_none() covers introspection of nonexistent device type
"nonexistent".  device-list-properties is expected to fail without
creating a device instance, let alone run device code.

test_device_intro_abstract() covers introspection of abstract device
type "abstract".  device-list-properties is again expected to fail
without creating a device instance.

The test neglects to cover "type exists, but is not a device" (my
fault).  Same thing.

In all these cases, side effects on "info qtree" and "info qom-tree" in
the (tiny amount of) generic code they run are theoretically possible,
but vanishingly unlikely.  If generic code ever gets confused enough to
somehow create an instance, all bets are off, of course.

Daniel's patch loses us some insurance against this kind of confusion.
Do we care?

I guess the insurance is nice to have when it costs basically nothing.