[PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters

Posted by wucf@linux.ibm.com 3 months ago
From: Chun Feng Wu <wucf@linux.ibm.com>

* Add new elements '<throttlefilters>' and '<throttlefilter>'
* <throttlefilters> can include multiple throttle group references to form a filter chain in qemu
* The chained throttle filters feature in qemu is described at https://github.com/qemu/qemu/blob/master/docs/throttle.txt

Signed-off-by: Chun Feng Wu <wucf@linux.ibm.com>
---
 docs/formatdomain.rst             | 22 ++++++++++++++++++++++
 src/conf/schemas/domaincommon.rng | 19 ++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index b7e1f9cc83..0fa8f1267c 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -2736,6 +2736,15 @@ paravirtualized driver is specified via the ``disk`` element.
        <source dev='/dev/vhost-vdpa-0' />
        <target dev='vdg' bus='virtio'/>
      </disk>
+     <disk type='file' device='disk'>
+       <driver name='qemu' type='qcow2' />
+       <source file='/var/lib/libvirt/images/disk.qcow2'/>
+       <target dev='vdh' bus='virtio'/>
+       <throttlefilters>
+         <throttlefilter group='limit2'/>
+         <throttlefilter group='limit012'/>
+       </throttlefilters>
+     </disk>
    </devices>
    ...
 
@@ -3217,6 +3226,19 @@ paravirtualized driver is specified via the ``disk`` element.
    :since:`since after 0.4.4`; "sata" attribute value :since:`since 0.9.7`;
    "removable" attribute value :since:`since 1.1.3`;
    "rotation_rate" attribute value :since:`since 7.3.0`
+``throttlefilters``
+   The optional ``throttlefilters`` element provides the ability to set up an additional
+   per-device chain of throttle filters. :since:`Since 10.5.0`
+   For example, if we have four different disks and want to limit the I/O of each one
+   while also limiting the combined I/O of all four disks, we can use
+   ``throttlefilters`` to achieve this by setting two ``throttlefilter`` elements on
+   each disk: the disk's own filter (e.g. limit2) and a combined filter (e.g. limit012).
+   The nodes in qemu form a chain like libvirt-4-filter (node name of "limit012") ->
+   libvirt-3-filter (node name of "limit2") -> libvirt-2-format -> libvirt-1-storage.
+   ``throttlefilters`` and ``iotune`` are mutually exclusive.
+
+   ``throttlefilter``
+      The optional ``throttlefilter`` element references a defined throttle group via its ``group`` attribute.
 ``iotune``
    The optional ``iotune`` element provides the ability to provide additional
    per-device I/O tuning, with values that can vary for each device (contrast
diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index 08c520e222..7ceb8c0be2 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -1578,7 +1578,10 @@
         <ref name="encryption"/>
       </optional>
       <optional>
-        <ref name="diskIoTune"/>
+        <choice>
+          <ref name="throttlefilters"/>
+          <ref name="diskIoTune"/>
+        </choice>
       </optional>
       <optional>
         <ref name="alias"/>
@@ -6671,6 +6674,20 @@
       </element>
     </optional>
   </define>
+  <!--
+      A set of throttlefilters to reference throttlegroups
+    -->
+  <define name="throttlefilters">
+    <element name="throttlefilters">
+      <zeroOrMore>
+        <element name="throttlefilter">
+          <attribute name="group">
+            <data type="string"/>
+          </attribute>
+        </element>
+      </zeroOrMore>
+    </element>
+  </define>
   <!--
       A set of optional features: PAE, APIC, ACPI, GIC, TCG,
       HyperV Enlightenment, KVM features, paravirtual spinlocks and HAP support
-- 
2.34.1
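The `<choice>` in the `domaincommon.rng` hunk means a disk carries either `<throttlefilters>` or `<iotune>`, never both, and each `<throttlefilter>` names a group through its `group` attribute. As a rough illustration only, a hypothetical Python helper (`throttle_groups` is a made-up name, not part of libvirt) could read the referenced groups from a disk definition like the documentation example while checking those two rules. It is a sketch under that assumption and does not validate against the real RNG schema.

```python
import xml.etree.ElementTree as ET


def throttle_groups(disk_xml: str) -> list:
    """Return the throttle group names a <disk> references, in document order.

    Hypothetical helper: it only checks two rules from the proposed schema
    (no <iotune> next to <throttlefilters>, and a 'group' attribute on every
    <throttlefilter>); it does not validate against domaincommon.rng itself.
    """
    disk = ET.fromstring(disk_xml)
    filters = disk.find("throttlefilters")
    # Element truthiness depends on child count, so compare against None explicitly.
    if filters is None:
        return []
    if disk.find("iotune") is not None:
        raise ValueError("<throttlefilters> and <iotune> cannot be combined")
    groups = []
    for flt in filters.findall("throttlefilter"):
        group = flt.get("group")
        if group is None:
            raise ValueError("<throttlefilter> needs a 'group' attribute")
        groups.append(group)
    return groups


# The disk from the formatdomain.rst example in this patch.
DISK = """<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/disk.qcow2'/>
  <target dev='vdh' bus='virtio'/>
  <throttlefilters>
    <throttlefilter group='limit2'/>
    <throttlefilter group='limit012'/>
  </throttlefilters>
</disk>"""

print(throttle_groups(DISK))
```

Since the schema uses `<zeroOrMore>`, an empty `<throttlefilters/>` is accepted and the helper simply returns an empty list for it.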
Re: [PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters
Posted by Peter Krempa 2 months, 2 weeks ago
On Wed, Jun 12, 2024 at 03:02:10 -0700, wucf@linux.ibm.com wrote:
> From: Chun Feng Wu <wucf@linux.ibm.com>
> 
> * Add new elements '<throttlefilters>'
> * <ThrottleFilters> can include multiple throttlegroup references to form filter chain in qemu
> * Chained throttle filters feature in qemu is described at https://github.com/qemu/qemu/blob/master/docs/throttle.txt
> 
> Signed-off-by: Chun Feng Wu <wucf@linux.ibm.com>
> ---
>  docs/formatdomain.rst             | 22 ++++++++++++++++++++++
>  src/conf/schemas/domaincommon.rng | 19 ++++++++++++++++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> index b7e1f9cc83..0fa8f1267c 100644
> --- a/docs/formatdomain.rst
> +++ b/docs/formatdomain.rst
> @@ -2736,6 +2736,15 @@ paravirtualized driver is specified via the ``disk`` element.
>         <source dev='/dev/vhost-vdpa-0' />
>         <target dev='vdg' bus='virtio'/>
>       </disk>
> +     <disk type='file' device='disk'>
> +       <driver name='qemu' type='qcow2' />
> +       <source file='/var/lib/libvirt/images/disk.qcow2'/>
> +       <target dev='vdh' bus='virtio'/>
> +       <throttlefilters>
> +         <throttlefilter group='limit2'/>
> +         <throttlefilter group='limit012'/>
> +       </throttlefilters>
> +     </disk>
>     </devices>
>     ...
>  
> @@ -3217,6 +3226,19 @@ paravirtualized driver is specified via the ``disk`` element.
>     :since:`since after 0.4.4`; "sata" attribute value :since:`since 0.9.7`;
>     "removable" attribute value :since:`since 1.1.3`;
>     "rotation_rate" attribute value :since:`since 7.3.0`
> +``throttlefilters``
> +   The optional ``throttlefilters`` element provides the ability to provide additional
> +   per-device throttle chain :since:`Since 10.5.0`
> +   For example, if we have four different disks and we want to limit I/O for each one
> +   and we also want to limit combined I/O of all four disks, we can leverage
> +   ``throttlefilters`` to achieve this goal by setting two ``throttlefilter`` for
> +   each disk: disk's own filter(e.g. limit2) and combined filter(e.g. limit012).

> +   The nodes in qemu shape a chain like libvirt-4-filter(node name of "limit012") ->
> +   libvirt-3-filter(node name of "limit2") -> libvirt-2-format -> libvirt-1-storage.
> +   ``throttlefilters`` and ``iotune`` should be used exclusively.

Node names are a qemu driver internal implementation detail and thus
must not be noted in documentation.

> +
> +   ``throttlefilter``
> +      The optional ``throttlefilter`` element is to reference defined throttle group.
>  ``iotune``
>     The optional ``iotune`` element provides the ability to provide additional
>     per-device I/O tuning, with values that can vary for each device (contrast
> diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
> index 08c520e222..7ceb8c0be2 100644
> --- a/src/conf/schemas/domaincommon.rng
> +++ b/src/conf/schemas/domaincommon.rng
> @@ -1578,7 +1578,10 @@
>          <ref name="encryption"/>
>        </optional>
>        <optional>
> -        <ref name="diskIoTune"/>
> +        <choice>
> +          <ref name="throttlefilters"/>
> +          <ref name="diskIoTune"/>
> +        </choice>
>        </optional>
>        <optional>
>          <ref name="alias"/>
> @@ -6671,6 +6674,20 @@
>        </element>
>      </optional>
>    </define>
> +  <!--
> +      A set of throttlefilters to reference throttlegroups
> +    -->
> +  <define name="throttlefilters">
> +    <element name="throttlefilters">
> +      <zeroOrMore>
> +        <element name="throttlefilter">
> +          <attribute name="group">
> +            <data type="string"/>
> +          </attribute>
> +        </element>
> +      </zeroOrMore>
> +    </element>
> +  </define>
>    <!--
>        A set of optional features: PAE, APIC, ACPI, GIC, TCG,
>        HyperV Enlightenment, KVM features, paravirtual spinlocks and HAP support
> -- 
> 2.34.1
>
Re: [PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters
Posted by Peter Krempa 1 month, 3 weeks ago
On Tue, Jul 02, 2024 at 16:11:03 +0200, Peter Krempa wrote:
> On Wed, Jun 12, 2024 at 03:02:10 -0700, wucf@linux.ibm.com wrote:
> > From: Chun Feng Wu <wucf@linux.ibm.com>
> > 
> > * Add new elements '<throttlefilters>'
> > * <ThrottleFilters> can include multiple throttlegroup references to form filter chain in qemu
> > * Chained throttle filters feature in qemu is described at https://github.com/qemu/qemu/blob/master/docs/throttle.txt
> > 
> > Signed-off-by: Chun Feng Wu <wucf@linux.ibm.com>
> > ---
> >  docs/formatdomain.rst             | 22 ++++++++++++++++++++++
> >  src/conf/schemas/domaincommon.rng | 19 ++++++++++++++++++-
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > index b7e1f9cc83..0fa8f1267c 100644
> > --- a/docs/formatdomain.rst
> > +++ b/docs/formatdomain.rst
> > @@ -2736,6 +2736,15 @@ paravirtualized driver is specified via the ``disk`` element.
> >         <source dev='/dev/vhost-vdpa-0' />
> >         <target dev='vdg' bus='virtio'/>
> >       </disk>
> > +     <disk type='file' device='disk'>
> > +       <driver name='qemu' type='qcow2' />
> > +       <source file='/var/lib/libvirt/images/disk.qcow2'/>
> > +       <target dev='vdh' bus='virtio'/>
> > +       <throttlefilters>
> > +         <throttlefilter group='limit2'/>
> > +         <throttlefilter group='limit012'/>
> > +       </throttlefilters>
> > +     </disk>
> >     </devices>
> >     ...
> >  
> > @@ -3217,6 +3226,19 @@ paravirtualized driver is specified via the ``disk`` element.
> >     :since:`since after 0.4.4`; "sata" attribute value :since:`since 0.9.7`;
> >     "removable" attribute value :since:`since 1.1.3`;
> >     "rotation_rate" attribute value :since:`since 7.3.0`
> > +``throttlefilters``
> > +   The optional ``throttlefilters`` element provides the ability to provide additional
> > +   per-device throttle chain :since:`Since 10.5.0`
> > +   For example, if we have four different disks and we want to limit I/O for each one
> > +   and we also want to limit combined I/O of all four disks, we can leverage
> > +   ``throttlefilters`` to achieve this goal by setting two ``throttlefilter`` for
> > +   each disk: disk's own filter(e.g. limit2) and combined filter(e.g. limit012).
> 
> > +   The nodes in qemu shape a chain like libvirt-4-filter(node name of "limit012") ->
> > +   libvirt-3-filter(node name of "limit2") -> libvirt-2-format -> libvirt-1-storage.
> > +   ``throttlefilters`` and ``iotune`` should be used exclusively.
> 
> Node names are a qemu driver internal implementation detail and thus
> must not be noted in documentation.

I'm not exactly sure how the internals in qemu work here, but you also
might want to document how the order of the filters impacts things (or
that it does not impact things).
Re: [PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters
Posted by Chun Feng Wu 1 month, 1 week ago
The order of ``throttlefilter`` elements within ``throttlefilters`` doesn't matter.

I will add the above statement to the documentation.
Re: [PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters
Posted by Peter Krempa 1 month, 1 week ago
On Tue, Aug 06, 2024 at 00:27:58 -0000, Chun Feng Wu wrote:

Please keep the context in the reply. I had to check back what I've
asked.

> The order of such ``throttlefilter`` doesn't matter within ``throttlefilters``.

So IIUC, re-ordering of the filters doesn't have any guest-OS visible
impact? I'm trying to understand whether one disk can exhaust one layer
while being blocked on the next, in which case a different disk which has
only one layer (equivalent to the first disk's first layer) would be
starved, but if the filters were ordered the other way around at the
first disk it would not.

If the above can happen you'll need to document how it's supposed to
behave.
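To make the ordering question concrete, here is a toy model of a single saturated disk behind a chain of throttle filters. It is a hypothetical sketch: `Bucket` and `chain_throughput` are made-up names, each filter is a plain token bucket with a one-request burst that charges a request when it leaves that filter, and none of this is QEMU's actual throttle code. In isolation the disk completes the same number of requests with either order; what changes is where the backlog builds up, since with the looser filter first, requests that were already charged to it wait in front of the tighter one. The shared-group case raised above, where another disk competes for the same filter, is not modeled.

```python
from collections import deque

COST = 1000           # token units charged per request
TICKS_PER_SEC = 1000  # simulate in 1 ms steps


class Bucket:
    """Toy token bucket: refills at `iops` requests/s, banks at most one request."""

    def __init__(self, iops: int) -> None:
        self.rate = iops * COST // TICKS_PER_SEC  # token units added per tick
        self.tokens = 0
        self.queue: deque = deque()  # requests waiting in front of this filter

    def tick(self) -> list:
        """Advance one tick and return the requests this filter lets through."""
        self.tokens += self.rate
        passed = []
        while self.tokens >= COST and self.queue:
            self.tokens -= COST  # charge the request as it leaves this filter
            passed.append(self.queue.popleft())
        self.tokens = min(self.tokens, COST)  # no large burst credit when idle
        return passed


def chain_throughput(limits: list, seconds: int = 1) -> int:
    """Requests completed by one saturated disk whose filters run in `limits` order."""
    stages = [Bucket(iops) for iops in limits]
    completed = 0
    next_id = 0
    for _ in range(seconds * TICKS_PER_SEC):
        # The guest always has requests waiting at the first filter.
        while len(stages[0].queue) < 4:
            stages[0].queue.append(next_id)
            next_id += 1
        moving = stages[0].tick()
        for stage in stages[1:]:
            stage.queue.extend(moving)  # arrivals this tick join the next queue
            moving = stage.tick()
        completed += len(moving)
    return completed


print(chain_throughput([400, 200]), chain_throughput([200, 400]))
```

Both orders complete 200 requests per simulated second here. Answering the starvation question would need a second disk sharing one of the buckets, plus some model of how QEMU picks between members of a throttle group.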
Re: [PATCH RFC v3 02/16] schema: Add new domain elements to support multiple throttle filters
Posted by Chun Feng Wu 1 month, 1 week ago
My original conclusion is based on the following test XML:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
   ...
   <throttlegroups>
     <throttlegroup>
       <total_iops_sec>200</total_iops_sec>
       <total_iops_sec_max>200</total_iops_sec_max>
       <group_name>limit0</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>250</total_iops_sec>
       <total_iops_sec_max>250</total_iops_sec_max>
       <group_name>limit1</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>300</total_iops_sec>
       <total_iops_sec_max>300</total_iops_sec_max>
       <group_name>limit2</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>400</total_iops_sec>
       <total_iops_sec_max>400</total_iops_sec_max>
       <group_name>limit012</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
   </throttlegroups>

...

<devices>
     <!-- Disk for the operating system -->
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/images/jammy-server-cloudimg-amd64.img'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_1.qcow2'/>
       <target dev='vdb' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit0'/>
         <throttlefilter group='limit012'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_2.qcow2'/>
       <target dev='vdc' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit1'/>
         <throttlefilter group='limit012'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_3.qcow2'/>
       <target dev='vdd' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit2'/>
         <throttlefilter group='limit012'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
     </disk>

    ...

   </devices>
</domain>

If I re-order the filters in vdc as below, fio tests (randwrite) show the
same results for both the concurrent test (400 IOPS in total, around 133
(400/3) for each disk) and the individual disk tests (200 for vdb, 250 for
vdc, 300 for vdd).

     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_2.qcow2'/>
       <target dev='vdc' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit012'/>
         <throttlefilter group='limit1'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
     </disk>
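The reported figures line up with a simple idealized model: tested alone, a disk runs at the tightest limit among its groups, and under concurrent load the budget of the shared group (limit012, 400 IOPS) is split max-min fairly, with each disk capped by its own group. The sketch below is a hypothetical back-of-the-envelope check, not QEMU's throttling algorithm; the helper names are made up, and because it takes a minimum over each disk's groups it is insensitive to filter order by construction, so it cannot by itself settle the ordering question.

```python
import math

# total_iops_sec of each throttle group in the test XML above
GROUP_IOPS = {"limit0": 200, "limit1": 250, "limit2": 300, "limit012": 400}
SHARED = "limit012"  # the group every data disk references

# <throttlefilters> chain of each disk, using the re-ordered vdc
DISKS = {
    "vdb": ["limit0", "limit012"],
    "vdc": ["limit012", "limit1"],
    "vdd": ["limit2", "limit012"],
}


def individual_iops(chain: list) -> float:
    """Tested alone, a disk is bounded by the tightest group in its chain."""
    return min(GROUP_IOPS[g] for g in chain)


def fair_shares(caps: dict, budget: float) -> dict:
    """Max-min fair ("water-filling") split of a shared budget among capped disks."""
    shares = {}
    pending = sorted(caps, key=lambda d: caps[d])  # smallest cap first
    while pending:
        equal = budget / len(pending)
        if caps[pending[0]] <= equal:
            # This disk cannot use an equal share: give it its cap, re-split the rest.
            disk = pending.pop(0)
            shares[disk] = caps[disk]
            budget -= caps[disk]
        else:
            # Every remaining disk can absorb an equal share.
            shares.update({d: equal for d in pending})
            break
    return shares


def concurrent_iops(disks: dict) -> dict:
    """All disks busy at once: split the shared group, capped by each disk's own groups."""
    caps = {
        d: min((GROUP_IOPS[g] for g in chain if g != SHARED), default=math.inf)
        for d, chain in disks.items()
    }
    return fair_shares(caps, GROUP_IOPS[SHARED])


print({d: individual_iops(c) for d, c in DISKS.items()})
print(concurrent_iops(DISKS))
```

With these inputs the model gives 200, 250 and 300 IOPS for the single-disk runs and 400 / 3 ≈ 133 IOPS per disk when all three run together, which matches the fio figures above.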


And back to your case (vdb and vdc in the following XML):

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
   ...
   <throttlegroups>
     <throttlegroup>
       <total_iops_sec>200</total_iops_sec>
       <total_iops_sec_max>200</total_iops_sec_max>
       <group_name>limit0</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>250</total_iops_sec>
       <total_iops_sec_max>250</total_iops_sec_max>
       <group_name>limit1</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>300</total_iops_sec>
       <total_iops_sec_max>300</total_iops_sec_max>
       <group_name>limit2</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
     <throttlegroup>
       <total_iops_sec>400</total_iops_sec>
       <total_iops_sec_max>400</total_iops_sec_max>
       <group_name>limit012</group_name>
       <total_iops_sec_max_length>1</total_iops_sec_max_length>
     </throttlegroup>
   </throttlegroups>

...

<devices>
     <!-- Disk for the operating system -->
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/images/jammy-server-cloudimg-amd64.img'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_1.qcow2'/>
       <target dev='vdb' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit012'/>
         <throttlefilter group='limit0'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_2.qcow2'/>
       <target dev='vdc' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit012'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
     </disk>
    ...

   </devices>
</domain>


With the above XML, fio tests (randwrite) show:
- concurrent: 400 IOPS in total, around 200 (400/2) for each disk

- individual disk test: 200 for vdb, 400 for vdc

After I re-order the filters of vdb as below, the tests show the same results:

- concurrent: 400 IOPS in total, around 200 (400/2) for each disk

- individual disk test: 200 for vdb, 400 for vdc

     <disk type='file' device='disk'>
       <driver name='qemu' type='qcow2'/>
       <source file='/virt/disks/vm1_disk_1.qcow2'/>
       <target dev='vdb' bus='virtio'/>
       <throttlefilters>
         <throttlefilter group='limit0'/>
         <throttlefilter group='limit012'/>
       </throttlefilters>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
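The second scenario can be checked with the same kind of arithmetic. Again this is a hypothetical, idealized model (max-min fair sharing of limit012, each disk capped by its own groups), not QEMU's scheduler, and `expected_iops` is a made-up helper; it takes a minimum over each chain, so it cannot reveal ordering effects. It does make one thing visible: vdb's own cap (limit0, 200 IOPS) equals an even split of limit012 (400 / 2 = 200), so in this setup the per-disk limit and the fair share give the same number.

```python
import math

GROUP_IOPS = {"limit0": 200, "limit012": 400}  # total_iops_sec from the XML above
SHARED = "limit012"


def expected_iops(disks: dict) -> tuple:
    """Return (alone, concurrent) IOPS per disk under an idealized max-min model."""
    # Tested alone: the tightest group in the chain wins.
    alone = {d: min(GROUP_IOPS[g] for g in chain) for d, chain in disks.items()}
    # Concurrently: each disk is capped by its non-shared groups (none means no cap).
    caps = {
        d: min((GROUP_IOPS[g] for g in chain if g != SHARED), default=math.inf)
        for d, chain in disks.items()
    }
    budget, together = GROUP_IOPS[SHARED], {}
    pending = sorted(caps, key=lambda d: caps[d])
    while pending:
        equal = budget / len(pending)
        if caps[pending[0]] <= equal:
            d = pending.pop(0)  # capped at or below its fair share: hand back the rest
            together[d] = caps[d]
            budget -= caps[d]
        else:
            together.update({d: equal for d in pending})
            break
    return alone, together


for vdb_chain in (["limit012", "limit0"], ["limit0", "limit012"]):
    print(vdb_chain, expected_iops({"vdb": vdb_chain, "vdc": ["limit012"]}))
```

Both orderings of vdb give 200 and 400 IOPS alone and 200 and 200 IOPS concurrently, in line with the fio results above.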


Let me know if I understood your case correctly, thanks!


On 2024/8/6 15:36, Peter Krempa wrote:
> On Tue, Aug 06, 2024 at 00:27:58 -0000, Chun Feng Wu wrote:
>
> Please keep the context in the reply. I had to check back what I've
> asked.
>
>> The order of such ``throttlefilter`` doesn't matter within ``throttlefilters``.
> So IIUC, re-ordering of the filters doesn't have any guest-OS visible
> impact? I'm trying to understand whether one disk can exhaust one layer
> while be blocked on the next, in which case a different disk which has
> only one layer (equivalent to the first disk's first layer) would be
> starved, but if the filters were ordered the other way around at the
> first disk it would not.
>
> If the above can happen you'll need to document how it's supposed to
> behave.
>
-- 
Thanks and Regards,

Wu