[v2] fast qom tree get

[PATCH V2 0/5] fast qom tree get

Posted by Steve Sistare 6 months, 1 week ago

Using qom-list and qom-get to get all the nodes and property values in a
QOM tree can take multiple seconds because it requires thousands of
individual QOM requests.  Some managers fetch the entire tree or a large
subset of it when starting a new VM, and this cost is a substantial
fraction of startup time.

To reduce this cost, consider QAPI calls that fetch more information in
each call:
  * qom-list-get: given a path, return a list of properties and values.
  * qom-list-getv: given a list of paths, return a list of properties and
    values for each path.
  * qom-tree-get: given a path, return all descendant nodes rooted at that
    path, with properties and values for each.
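
To make the difference in request counts concrete, here is a toy model in
Python.  It is only a sketch: TREE is a made-up in-memory stand-in for a QOM
tree, no QMP connection is involved, and the traversal strategies (one
qom-get per non-child property, one qom-list-getv per tree level) are
assumptions about how a client might use each command, not code from this
series.

```python
# Toy model only: a hypothetical in-memory "QOM tree", no real QMP traffic.
# Each path maps to (number of non-child properties, list of child paths).
TREE = {
    "/machine": (3, ["/machine/peripheral", "/machine/unattached"]),
    "/machine/peripheral": (2, []),
    "/machine/unattached": (2, ["/machine/unattached/device[0]"]),
    "/machine/unattached/device[0]": (4, []),
}


def count_list_and_get(root):
    """qom-list / qom-get: one qom-list per node, one qom-get per property."""
    nprops, children = TREE[root]
    return 1 + nprops + sum(count_list_and_get(c) for c in children)


def count_list_get(root):
    """qom-list-get: one request per node returns names and values."""
    _, children = TREE[root]
    return 1 + sum(count_list_get(c) for c in children)


def count_list_getv(root):
    """qom-list-getv: one request per tree level, passing all paths at once."""
    requests, level = 0, [root]
    while level:
        requests += 1
        # The children found in this reply form the next batch of paths.
        level = [c for path in level for c in TREE[path][1]]
    return requests


def count_tree_get(root):
    """qom-tree-get: a single request returns the whole subtree."""
    return 1


if __name__ == "__main__":
    for fn in (count_list_and_get, count_list_get,
               count_list_getv, count_tree_get):
        print(fn.__name__, fn("/machine"))
```

In this model the per-property approach grows with the number of properties,
qom-list-get with the number of nodes, and qom-list-getv only with the depth
of the tree.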

In all cases, a returned property is represented by ObjectPropertyValue,
with fields name, type, and value.  If an error occurs when reading a value
the value field is omitted.  Thus an error for one property will not cause a
bulk fetch operation to fail.
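
A client therefore has to treat a missing value field as "could not be read"
rather than as an error for the whole reply.  A minimal sketch in Python,
assuming each returned property arrives as a dict with the keys name, type
and optionally value; the sample data and the format_property helper are
hypothetical:

```python
def format_property(prop):
    """Render one ObjectPropertyValue-like dict as a line of text.

    The key test is "value" in prop, not prop.get("value"), so that a
    value that is present but null (None) is not mistaken for a failed read.
    """
    if "value" in prop:
        return f"{prop['name']} ({prop['type']}): {prop['value']!r}"
    return f"{prop['name']} ({prop['type']}): <unreadable>"


# Hypothetical reply fragment: the second property could not be read,
# so its "value" key is absent.
sample = [
    {"name": "type", "type": "string", "value": "container"},
    {"name": "broken", "type": "uint32"},
]

lines = [format_property(p) for p in sample]

if __name__ == "__main__":
    print("\n".join(lines))
```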

To evaluate each method, I modified scripts/qmp/qom-tree to use the method,
verified all methods produce the same output, and timed each using:

  qemu-system-x86_64 -display none \
    -chardev socket,id=monitor0,path=/tmp/vm1.sock,server=on,wait=off \
    -mon monitor0,mode=control &

  time qom-tree -s /tmp/vm1.sock > /dev/null

I only measured once per method, but the variation is low after a warm up run.
The 'real - user - sys' column is a proxy for QEMU CPU time.

method               real(s)   user(s)   sys(s)  (real - user - sys)(s)
qom-list / qom-get   2.048     0.932     0.057   1.059
qom-list-get         0.402     0.230     0.029   0.143
qom-list-getv        0.200     0.132     0.015   0.053
qom-tree-get         0.143     0.123     0.012   0.008

qom-tree-get is the clear winner, reducing elapsed time by a factor of 14
and QEMU CPU time by a factor of 132.
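
The derived column and both ratios can be rechecked from the table with a few
lines of Python.  The numbers are copied from the table above; the variable
and function names are only illustrative.

```python
# (real, user, sys) in seconds, copied from the timing table.
TIMINGS = {
    "qom-list / qom-get": (2.048, 0.932, 0.057),
    "qom-list-get": (0.402, 0.230, 0.029),
    "qom-list-getv": (0.200, 0.132, 0.015),
    "qom-tree-get": (0.143, 0.123, 0.012),
}


def residual(real, user, sys_s):
    """real - user - sys: time not spent in the client, a proxy for QEMU."""
    return round(real - user - sys_s, 3)


BASELINE = "qom-list / qom-get"
residuals = {method: residual(*t) for method, t in TIMINGS.items()}

# Speedup of each method relative to the qom-list / qom-get baseline.
elapsed_speedup = {m: TIMINGS[BASELINE][0] / t[0] for m, t in TIMINGS.items()}
cpu_speedup = {m: residuals[BASELINE] / r for m, r in residuals.items()}

if __name__ == "__main__":
    for method in TIMINGS:
        print(f"{method:20s} residual={residuals[method]:.3f}s "
              f"elapsed x{elapsed_speedup[method]:.1f} "
              f"cpu x{cpu_speedup[method]:.1f}")
```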

qom-list-getv is slower when fetching the entire tree, but can beat
qom-tree-get when only a subset of the tree needs to be fetched (not shown).
qom-list-get is shown for comparison only, and is not included in this series.

Changes in V2:
  * removed "qom: qom_resolve_path", which was pulled separately
  * dropped the error member
  * fixed missing _list_tree in qom.py
  * updated 10.0 to 10.1

Steve Sistare (5):
  qom: qom-tree-get
  python: use qom-tree-get
  tests/qtest/qom-test: unit test for qom-tree-get
  qom: qom-list-getv
  tests/qtest/qom-test: unit test for qom-list-getv

 python/qemu/utils/qom.py        |  36 ++++++-------
 python/qemu/utils/qom_common.py |  48 +++++++++++++++++
 qapi/qom.json                   |  90 ++++++++++++++++++++++++++++++++
 qom/qom-qmp-cmds.c              | 112 +++++++++++++++++++++++++++++++++++++++
 tests/qtest/qom-test.c          | 113 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 381 insertions(+), 18 deletions(-)

base-commit: 7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16

-- 
1.8.3.1

Re: [PATCH V2 0/5] fast qom tree get

Posted by Markus Armbruster 4 months, 2 weeks ago

Steve,

My sincere apologies for the long, long delay.

It wasn't just for the usual reasons.  It was also because I had a vague
feeling of unease about qom-tree, and had trouble figuring out why.

I'll try to do your work justice before the window for 10.1 closes.

QOM maintainers, please have a look, too.

Re: [PATCH V2 0/5] fast qom tree get

Posted by Steven Sistare 4 months, 1 week ago

On 7/4/2025 8:33 AM, Markus Armbruster wrote:
> Steve,
> 
> My sincere apologies for the long, long delay.
> 
> It wasn't just for the usual reasons.  It was also because I had a vague
> feeling of unease about qom-tree, and had trouble figuring out why.
> 
> I'll try to do your work justice before the window for 10.1 closes.
> 
> QOM maintainers, please have a look, too.

Thanks, much appreciated - steve

Re: [PATCH V2 0/5] fast qom tree get

Posted by Markus Armbruster 4 months, 2 weeks ago

Steve Sistare <steven.sistare@oracle.com> writes:

> Using qom-list and qom-get to get all the nodes and property values in a
> QOM tree can take multiple seconds because it requires thousands of
> individual QOM requests.  Some managers fetch the entire tree or a large
> subset of it when starting a new VM, and this cost is a substantial
> fraction of startup time.
>
> To reduce this cost, consider QAPI calls that fetch more information in
> each call:
>   * qom-list-get: given a path, return a list of properties and values.
>   * qom-list-getv: given a list of paths, return a list of properties and
>     values for each path.
>   * qom-tree-get: given a path, return all descendant nodes rooted at that
>     path, with properties and values for each.
>
> In all cases, a returned property is represented by ObjectPropertyValue,
> with fields name, type, and value.  If an error occurs when reading a value
> the value field is omitted.  Thus an error for one property will not cause a
> bulk fetch operation to fail.
>
> To evaluate each method, I modified scripts/qmp/qom-tree to use the method,
> verified all methods produce the same output, and timed each using:
>
>   qemu-system-x86_64 -display none \
>     -chardev socket,id=monitor0,path=/tmp/vm1.sock,server=on,wait=off \
>     -mon monitor0,mode=control &
>
>   time qom-tree -s /tmp/vm1.sock > /dev/null
>
> I only measured once per method, but the variation is low after a warm up run.
> The 'real - user - sys' column is a proxy for QEMU CPU time.
>
> method               real(s)   user(s)   sys(s)  (real - user - sys)(s)
> qom-list / qom-get   2.048     0.932     0.057   1.059
> qom-list-get         0.402     0.230     0.029   0.143
> qom-list-getv        0.200     0.132     0.015   0.053
> qom-tree-get         0.143     0.123     0.012   0.008
>
> qom-tree-get is the clear winner, reducing elapsed time by a factor of 14
> and QEMU CPU time by a factor of 132.
>
> qom-list-getv is slower when fetching the entire tree, but can beat
> qom-tree-get when only a subset of the tree needs to be fetched (not shown).
> qom-list-get is shown for comparison only, and is not included in this series.

How badly do you need the additional performance qom-tree-get can give
you in certain cases?

I'm asking because I find qom-list-getv *much* simpler.

Re: [PATCH V2 0/5] fast qom tree get

Posted by Steven Sistare 4 months, 1 week ago

On 7/4/2025 8:26 AM, Markus Armbruster wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> Using qom-list and qom-get to get all the nodes and property values in a
>> QOM tree can take multiple seconds because it requires thousands of
>> individual QOM requests.  Some managers fetch the entire tree or a large
>> subset of it when starting a new VM, and this cost is a substantial
>> fraction of startup time.
>>
>> To reduce this cost, consider QAPI calls that fetch more information in
>> each call:
>>    * qom-list-get: given a path, return a list of properties and values.
>>    * qom-list-getv: given a list of paths, return a list of properties and
>>      values for each path.
>>    * qom-tree-get: given a path, return all descendant nodes rooted at that
>>      path, with properties and values for each.
>>
>> In all cases, a returned property is represented by ObjectPropertyValue,
>> with fields name, type, and value.  If an error occurs when reading a value
>> the value field is omitted.  Thus an error for one property will not cause a
>> bulk fetch operation to fail.
>>
>> To evaluate each method, I modified scripts/qmp/qom-tree to use the method,
>> verified all methods produce the same output, and timed each using:
>>
>>    qemu-system-x86_64 -display none \
>>      -chardev socket,id=monitor0,path=/tmp/vm1.sock,server=on,wait=off \
>>      -mon monitor0,mode=control &
>>
>>    time qom-tree -s /tmp/vm1.sock > /dev/null
>>
>> I only measured once per method, but the variation is low after a warm up run.
>> The 'real - user - sys' column is a proxy for QEMU CPU time.
>>
>> method               real(s)   user(s)   sys(s)  (real - user - sys)(s)
>> qom-list / qom-get   2.048     0.932     0.057   1.059
>> qom-list-get         0.402     0.230     0.029   0.143
>> qom-list-getv        0.200     0.132     0.015   0.053
>> qom-tree-get         0.143     0.123     0.012   0.008
>>
>> qom-tree-get is the clear winner, reducing elapsed time by a factor of 14
>> and QEMU CPU time by a factor of 132.
>>
>> qom-list-getv is slower when fetching the entire tree, but can beat
>> qom-tree-get when only a subset of the tree needs to be fetched (not shown).
>> qom-list-get is shown for comparison only, and is not included in this series.
> 
> How badly do you need the additional performance qom-tree-get can give
> you in certain cases?
> 
> I'm asking because I find qom-list-getv *much* simpler.

I would be content with qom-list-getv, so I will drop qom-tree-get.
qom-list-getv needs ObjectPropertyValue and qom_list_add_property_value
from the qom-tree-get patch, so I will respond to those comments.

- Steve

Re: [PATCH V2 0/5] fast qom tree get

Posted by Fabiano Rosas 5 months, 4 weeks ago

Steve Sistare <steven.sistare@oracle.com> writes:

> Using qom-list and qom-get to get all the nodes and property values in a
> QOM tree can take multiple seconds because it requires thousands of
> individual QOM requests.  Some managers fetch the entire tree or a large
> subset of it when starting a new VM, and this cost is a substantial
> fraction of startup time.
>
> To reduce this cost, consider QAPI calls that fetch more information in
> each call:
>   * qom-list-get: given a path, return a list of properties and values.
>   * qom-list-getv: given a list of paths, return a list of properties and
>     values for each path.
>   * qom-tree-get: given a path, return all descendant nodes rooted at that
>     path, with properties and values for each.
>
> In all cases, a returned property is represented by ObjectPropertyValue,
> with fields name, type, and value.  If an error occurs when reading a value
> the value field is omitted.  Thus an error for one property will not cause a
> bulk fetch operation to fail.
>
> To evaluate each method, I modified scripts/qmp/qom-tree to use the method,
> verified all methods produce the same output, and timed each using:
>
>   qemu-system-x86_64 -display none \
>     -chardev socket,id=monitor0,path=/tmp/vm1.sock,server=on,wait=off \
>     -mon monitor0,mode=control &
>
>   time qom-tree -s /tmp/vm1.sock > /dev/null
>
> I only measured once per method, but the variation is low after a warm up run.
> The 'real - user - sys' column is a proxy for QEMU CPU time.
>
> method               real(s)   user(s)   sys(s)  (real - user - sys)(s)
> qom-list / qom-get   2.048     0.932     0.057   1.059
> qom-list-get         0.402     0.230     0.029   0.143
> qom-list-getv        0.200     0.132     0.015   0.053
> qom-tree-get         0.143     0.123     0.012   0.008
>
> qom-tree-get is the clear winner, reducing elapsed time by a factor of 14
> and QEMU CPU time by a factor of 132.
>
> qom-list-getv is slower when fetching the entire tree, but can beat
> qom-tree-get when only a subset of the tree needs to be fetched (not shown).
> qom-list-get is shown for comparison only, and is not included in this series.
>
> Changes in V2:
>   * removed "qom: qom_resolve_path", which was pulled separately
>   * dropped the error member
>   * fixed missing _list_tree in qom.py
>   * updated 10.0 to 10.1
>
> Steve Sistare (5):
>   qom: qom-tree-get
>   python: use qom-tree-get
>   tests/qtest/qom-test: unit test for qom-tree-get
>   qom: qom-list-getv
>   tests/qtest/qom-test: unit test for qom-list-getv
>
>  python/qemu/utils/qom.py        |  36 ++++++-------
>  python/qemu/utils/qom_common.py |  48 +++++++++++++++++
>  qapi/qom.json                   |  90 ++++++++++++++++++++++++++++++++
>  qom/qom-qmp-cmds.c              | 112 +++++++++++++++++++++++++++++++++++++++
>  tests/qtest/qom-test.c          | 113 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 381 insertions(+), 18 deletions(-)
>
> base-commit: 7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16

Acked-by: Fabiano Rosas <farosas@suse.de>