[PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes

~hyman posted 3 patches 2 years, 6 months ago
[PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes
Posted by ~hyman 2 years, 6 months ago
From: Hyman Huang(黄勇) <yong.huang@smartx.com>

Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
whether discard and write-zeroes requests are handled by the
virtio-blk device.

To distinguish the attributes in block drive layer, use the 'virtio'
prefix to indicate that these attributes are specific for virtio-blk.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
---
 docs/formatdomain.rst             |  8 ++++++++
 src/conf/domain_conf.c            | 16 ++++++++++++++++
 src/conf/domain_conf.h            |  2 ++
 src/conf/schemas/domaincommon.rng | 10 ++++++++++
 src/conf/storage_source_conf.c    |  2 ++
 src/conf/storage_source_conf.h    |  2 ++
 src/qemu/qemu_domain.c            |  2 ++
 src/qemu/qemu_driver.c            |  4 +++-
 src/vz/vz_utils.c                 | 12 ++++++++++++
 9 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 4af0b82569..7be12ff08e 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element.
       value can be either "unmap" (allow the discard request to be passed) or
       "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU and KVM
       only)`
+   -  The optional ``virtio_discard`` and ``virtio_write_zeroes`` are attributes
+      that control whether discard and write-zeroes requests are handled by the
+      virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES
+      commands introduced in virtio-blk protocol to improve performance when
+      using SSD backend. The value can be either 'on' or 'off'. Note that
+      ``discard`` and ``write_zeroes`` implementations in the block drive layer
+      are parts of the feature logically and should be turned on when enabling
+      the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
    -  The optional ``detect_zeroes`` attribute controls whether to detect zero
       write requests. The value can be "off", "on" or "unmap". First two values
       turn the detection off and on, respectively. The third value ("unmap")
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 5ac5c0b771..0f82b489f4 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -7813,6 +7813,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDef *def,
                        VIR_XML_PROP_NONZERO, &def->discard) < 0)
         return -1;
 
+    if (virXMLPropTristateSwitch(cur, "virtio_discard", VIR_XML_PROP_NONE,
+                                 &def->virtio_discard) < 0)
+        return -1;
+
+    if (virXMLPropTristateSwitch(cur, "virtio_write_zeroes", VIR_XML_PROP_NONE,
+                                 &def->virtio_write_zeroes) < 0)
+        return -1;
+
     if (virXMLPropUInt(cur, "iothread", 10, VIR_XML_PROP_NONZERO, &def->iothread) < 0)
         return -1;
 
@@ -22514,6 +22522,14 @@ virDomainDiskDefFormatDriver(virBuffer *buf,
         virBufferAsprintf(&attrBuf, " discard='%s'",
                           virDomainDiskDiscardTypeToString(disk->discard));
 
+    if (disk->virtio_discard)
+        virBufferAsprintf(&attrBuf, " virtio_discard='%s'",
+                          virTristateSwitchTypeToString(disk->virtio_discard));
+
+    if (disk->virtio_write_zeroes)
+        virBufferAsprintf(&attrBuf, " virtio_write_zeroes='%s'",
+                          virTristateSwitchTypeToString(disk->virtio_write_zeroes));
+
     if (disk->iothread)
         virBufferAsprintf(&attrBuf, " iothread='%u'", disk->iothread);
 
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index c857ba556f..86e955333e 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -607,6 +607,8 @@ struct _virDomainDiskDef {
     unsigned int iothread; /* unused = 0, > 0 specific thread # */
     virDomainDiskDetectZeroes detect_zeroes;
     virTristateSwitch discard_no_unref;
+    virTristateSwitch virtio_discard;
+    virTristateSwitch virtio_write_zeroes;
     char *domain_name; /* backend domain name */
     unsigned int queues;
     unsigned int queue_size;
diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index c2f56b0490..c3a59aeb15 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -2510,6 +2510,16 @@
       <optional>
         <ref name="discard"/>
       </optional>
+      <optional>
+        <attribute name="virtio_discard">
+          <ref name="virOnOff"/>
+        </attribute>
+      </optional>
+      <optional>
+        <attribute name="virtio_write_zeroes">
+          <ref name="virOnOff"/>
+        </attribute>
+      </optional>
       <optional>
         <ref name="driverIOThread"/>
       </optional>
diff --git a/src/conf/storage_source_conf.c b/src/conf/storage_source_conf.c
index dcac3a8ff6..c2fee652f3 100644
--- a/src/conf/storage_source_conf.c
+++ b/src/conf/storage_source_conf.c
@@ -810,6 +810,8 @@ virStorageSourceCopy(const virStorageSource *src,
     def->discard = src->discard;
     def->detect_zeroes = src->detect_zeroes;
     def->discard_no_unref = src->discard_no_unref;
+    def->virtio_discard = src->virtio_discard;
+    def->virtio_write_zeroes = src->virtio_write_zeroes;
     def->sslverify = src->sslverify;
     def->readahead = src->readahead;
     def->timeout = src->timeout;
diff --git a/src/conf/storage_source_conf.h b/src/conf/storage_source_conf.h
index f13e7c756a..5e7f093fc0 100644
--- a/src/conf/storage_source_conf.h
+++ b/src/conf/storage_source_conf.h
@@ -400,6 +400,8 @@ struct _virStorageSource {
     int discard; /* enum virDomainDiskDiscard */
     int detect_zeroes; /* enum virDomainDiskDetectZeroes */
     virTristateSwitch discard_no_unref;
+    virTristateSwitch virtio_discard;
+    virTristateSwitch virtio_write_zeroes;
 
     bool floppyimg; /* set to true if the storage source is going to be used
                        as a source for floppy drive */
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 3700b3e711..634d8699d0 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11031,6 +11031,8 @@ qemuDomainPrepareDiskSourceData(virDomainDiskDef *disk,
     src->cachemode = disk->cachemode;
     src->discard = disk->discard;
     src->discard_no_unref = disk->discard_no_unref;
+    src->virtio_discard = disk->virtio_discard;
+    src->virtio_write_zeroes = disk->virtio_write_zeroes;
 
     if (disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY)
         src->floppyimg = true;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f20544590d..43e157da68 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -14243,9 +14243,11 @@ qemuDomainBlockCopyCommon(virDomainObj *vm,
      * Since 'mirror' has the ambition to replace it we need to propagate
      * it into the mirror too. We do it directly as otherwise we'd need
      * to modify all callers of 'qemuDomainPrepareStorageSourceBlockdev'
-     * Same for discard_no_unref */
+     * Same for discard_no_unref,virtio_discard and virtio_write_zeroes */
     mirror->detect_zeroes = disk->detect_zeroes;
     mirror->discard_no_unref = disk->discard_no_unref;
+    mirror->virtio_discard = disk->virtio_discard;
+    mirror->virtio_write_zeroes = disk->virtio_write_zeroes;
 
     /* If reusing an external image that includes a backing file but the user
      * did not enumerate the chain in the XML we need to detect the chain */
diff --git a/src/vz/vz_utils.c b/src/vz/vz_utils.c
index 7db7dbd419..1d824a9d2b 100644
--- a/src/vz/vz_utils.c
+++ b/src/vz/vz_utils.c
@@ -353,6 +353,18 @@ vzCheckDiskUnsupportedParams(virDomainDiskDef *disk)
         return -1;
     }
 
+    if (disk->virtio_discard) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("Disk virtio_discard is not supported by vz driver."));
+        return -1;
+    }
+
+    if (disk->virtio_write_zeroes) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("Disk virtio_write_zeroes is not supported by vz driver."));
+        return -1;
+    }
+
     if (disk->startupPolicy != VIR_DOMAIN_STARTUP_POLICY_DEFAULT) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("Setting up disk startup policy is not "
-- 
2.38.5
Re: [PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes
Posted by Peter Krempa 2 years, 6 months ago
On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
> From: Hyman Huang(黄勇) <yong.huang@smartx.com>
> 
> Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
> whether discard and write-zeroes requests are handled by the
> virtio-blk device.
> 
> To distinguish the attributes in block drive layer, use the 'virtio'
> prefix to indicate that these attributes are specific for virtio-blk.
> 
> Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> ---
>  docs/formatdomain.rst             |  8 ++++++++
>  src/conf/domain_conf.c            | 16 ++++++++++++++++
>  src/conf/domain_conf.h            |  2 ++
>  src/conf/schemas/domaincommon.rng | 10 ++++++++++
>  src/conf/storage_source_conf.c    |  2 ++
>  src/conf/storage_source_conf.h    |  2 ++
>  src/qemu/qemu_domain.c            |  2 ++
>  src/qemu/qemu_driver.c            |  4 +++-
>  src/vz/vz_utils.c                 | 12 ++++++++++++
>  9 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> index 4af0b82569..7be12ff08e 100644
> --- a/docs/formatdomain.rst
> +++ b/docs/formatdomain.rst
> @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element.
>        value can be either "unmap" (allow the discard request to be passed) or
>        "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU and KVM
>        only)`
> +   -  The optional ``virtio_discard`` and ``virtio_write_zeroes`` are attributes
> +      that control whether discard and write-zeroes requests are handled by the
> +      virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES
> +      commands introduced in virtio-blk protocol to improve performance when
> +      using SSD backend. The value can be either 'on' or 'off'. Note that
> +      ``discard`` and ``write_zeroes`` implementations in the block drive layer
> +      are parts of the feature logically and should be turned on when enabling
> +      the feature. :since:`Since 9.6.0 (QEMU and KVM only)`

Based on current released qemu both 'discard' and 'write-zeroes' feature
of the 'virtio-blk' device is enabled by default:

 $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard
   discard=<bool>         - on/off (default: true)
   discard_granularity=<size> -  (default: 4294967295)
   max-discard-sectors=<uint32> -  (default: 4194303)
   report-discard-granularity=<bool> -  (default: true)
 $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes
   max-write-zeroes-sectors=<uint32> -  (default: 4194303)
   write-zeroes=<bool>    - on/off (default: true)

Do you need a way to disable this feature? Based on the description the
default seems to be sane and actually what you'd want users to set.

The feature was introduced to qemu by:

commit 5c81161f804144b146607f890e84613a4cbad95c
Author: Stefano Garzarella <sgarzare@redhat.com>
Date:   Thu Feb 21 11:33:07 2019 +0100

    virtio-blk: add "discard" and "write-zeroes" properties

    In order to avoid migration issues, we enable DISCARD and
    WRITE_ZEROES features only for machine type >= 4.0

    As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the
    list [1], DISCARD operation should not have security implications
    (eg. page cache attacks), so we can enable it by default.

The default was always enabled for all new machine types, so you should
only ever need to update your machine type.

Re: [PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes
Posted by Yong Huang 2 years, 6 months ago
On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:

> On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
> > From: Hyman Huang(黄勇) <yong.huang@smartx.com>
> >
> > Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
> > whether discard and write-zeroes requests are handled by the
> > virtio-blk device.
> >
> > To distinguish the attributes in block drive layer, use the 'virtio'
> > prefix to indicate that these attributes are specific for virtio-blk.
> >
> > Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > ---
> >  docs/formatdomain.rst             |  8 ++++++++
> >  src/conf/domain_conf.c            | 16 ++++++++++++++++
> >  src/conf/domain_conf.h            |  2 ++
> >  src/conf/schemas/domaincommon.rng | 10 ++++++++++
> >  src/conf/storage_source_conf.c    |  2 ++
> >  src/conf/storage_source_conf.h    |  2 ++
> >  src/qemu/qemu_domain.c            |  2 ++
> >  src/qemu/qemu_driver.c            |  4 +++-
> >  src/vz/vz_utils.c                 | 12 ++++++++++++
> >  9 files changed, 57 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > index 4af0b82569..7be12ff08e 100644
> > --- a/docs/formatdomain.rst
> > +++ b/docs/formatdomain.rst
> > @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the
> ``disk`` element.
> >        value can be either "unmap" (allow the discard request to be
> passed) or
> >        "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU
> and KVM
> >        only)`
> > +   -  The optional ``virtio_discard`` and ``virtio_write_zeroes`` are
> attributes
> > +      that control whether discard and write-zeroes requests are
> handled by the
> > +      virtio-blk device. The feature is based on DISCARD and
> WRITE_ZEROES
> > +      commands introduced in virtio-blk protocol to improve performance
> when
> > +      using SSD backend. The value can be either 'on' or 'off'. Note
> that
> > +      ``discard`` and ``write_zeroes`` implementations in the block
> drive layer
> > +      are parts of the feature logically and should be turned on when
> enabling
> > +      the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
>
> Based on current released qemu both 'discard' and 'write-zeroes' feature
> of the 'virtio-blk' device is enabled by default:
>
>  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard
>    discard=<bool>         - on/off (default: true)
>    discard_granularity=<size> -  (default: 4294967295)
>    max-discard-sectors=<uint32> -  (default: 4194303)
>    report-discard-granularity=<bool> -  (default: true)
>  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes
>    max-write-zeroes-sectors=<uint32> -  (default: 4194303)
>    write-zeroes=<bool>    - on/off (default: true)
>
> Do you need a way to disable this feature? Based on the description the
> default seems to be sane and actually what you'd want users to set.
>
The default is reasonable indeed. But there are still some scenarios
in the production environment that discard may  need to be disabled.
For example:
- The virtio-blk discard/write-zeroes commands was introduced in
  2017 in virtio-blk spec, so the OS distribution before 2017 can not
  use this feature. In that case, the cloud management platform(CMP)
  could recognize this issue and disable the feature in advance.

- Discard/write-zeroes may need to be configured at disk granularity
  in some scenarios. Assume that CMP support 2 distributed storage
  clusters,  one cluster supports discard and another does not.
  If the VM is configured with 2 disks - vda, vdb. Which are located in
  2 clusters respectively. Or, the VM with disks located in the
  discard-supportive cluster and want to migrate disks to another
  cluster for some reason(eg. storage capability is exhausted)
  CMP may want to turn discard off explicitly.

Though the above scenarios are quite rare and the virtio spec can
ensure the feature can be negotiated and used correctly.
CMP still wants to control the features it supports carefully and
precisely.
To summarise, IMHO, libvirt may not forbid the upper layer to
control the 2 features of the virtio-blk disk. Leaving the option
configurable for CMP or even customers.

There's one more background may need to be stated:
Current released QEMU do not provide hmp/qmp to query the
final features that are negotiated between front-end and back-end
from the hypervisor side. So if CMP wants to query what features a
guest VM uses, it has to query it inside the guest VM or hack the
qemu process. This way is inconvenient for control-planes, so the CMP
needs to control the feature as aggressively as it can.

Thank Peter for the attention to this patchset.
Yong

>
> The feature was introduced to qemu by:
>
> commit 5c81161f804144b146607f890e84613a4cbad95c
> Author: Stefano Garzarella <sgarzare@redhat.com>
> Date:   Thu Feb 21 11:33:07 2019 +0100
>
>     virtio-blk: add "discard" and "write-zeroes" properties
>
>     In order to avoid migration issues, we enable DISCARD and
>     WRITE_ZEROES features only for machine type >= 4.0
>
>     As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the
>     list [1], DISCARD operation should not have security implications
>     (eg. page cache attacks), so we can enable it by default.
>
> The default was always enabled for all new machine types, so you should
> only ever need to update your machine type.
>
>

-- 
Best regards
Re: [PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes
Posted by Peter Krempa 2 years, 6 months ago
On Wed, Jul 19, 2023 at 11:52:45 +0800, Yong Huang wrote:
> On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
> 
> > On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
> > > From: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > >
> > > Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
> > > whether discard and write-zeroes requests are handled by the
> > > virtio-blk device.
> > >
> > > To distinguish the attributes in block drive layer, use the 'virtio'
> > > prefix to indicate that these attributes are specific for virtio-blk.
> > >
> > > Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > > ---
> > >  docs/formatdomain.rst             |  8 ++++++++
> > >  src/conf/domain_conf.c            | 16 ++++++++++++++++
> > >  src/conf/domain_conf.h            |  2 ++
> > >  src/conf/schemas/domaincommon.rng | 10 ++++++++++
> > >  src/conf/storage_source_conf.c    |  2 ++
> > >  src/conf/storage_source_conf.h    |  2 ++
> > >  src/qemu/qemu_domain.c            |  2 ++
> > >  src/qemu/qemu_driver.c            |  4 +++-
> > >  src/vz/vz_utils.c                 | 12 ++++++++++++
> > >  9 files changed, 57 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > > index 4af0b82569..7be12ff08e 100644
> > > --- a/docs/formatdomain.rst
> > > +++ b/docs/formatdomain.rst
> > > @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the
> > ``disk`` element.
> > >        value can be either "unmap" (allow the discard request to be
> > passed) or
> > >        "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU
> > and KVM
> > >        only)`
> > > +   -  The optional ``virtio_discard`` and ``virtio_write_zeroes`` are
> > attributes
> > > +      that control whether discard and write-zeroes requests are
> > handled by the
> > > +      virtio-blk device. The feature is based on DISCARD and
> > WRITE_ZEROES
> > > +      commands introduced in virtio-blk protocol to improve performance
> > when
> > > +      using SSD backend. The value can be either 'on' or 'off'. Note
> > that
> > > +      ``discard`` and ``write_zeroes`` implementations in the block
> > drive layer
> > > +      are parts of the feature logically and should be turned on when
> > enabling
> > > +      the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
> >
> > Based on current released qemu both 'discard' and 'write-zeroes' feature
> > of the 'virtio-blk' device is enabled by default:
> >
> >  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard
> >    discard=<bool>         - on/off (default: true)
> >    discard_granularity=<size> -  (default: 4294967295)
> >    max-discard-sectors=<uint32> -  (default: 4194303)
> >    report-discard-granularity=<bool> -  (default: true)
> >  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes
> >    max-write-zeroes-sectors=<uint32> -  (default: 4194303)
> >    write-zeroes=<bool>    - on/off (default: true)
> >
> > Do you need a way to disable this feature? Based on the description the
> > default seems to be sane and actually what you'd want users to set.
> >
> The default is reasonable indeed. But there are still some scenarios
> in the production environment that discard may  need to be disabled.
> For example:
> - The virtio-blk discard/write-zeroes commands was introduced in
>   2017 in virtio-blk spec, so the OS distribution before 2017 can not
>   use this feature. In that case, the cloud management platform(CMP)
>   could recognize this issue and disable the feature in advance.

Older drivers which do not support the feature are supposed to ignore
any new feature bits:

https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-140002

 2.2 Feature Bits

  Each virtio device offers all the features it understands. During device
  initialization, the driver reads this and tells the device the subset
  that it accepts. The only way to renegotiate is to reset the device.

  This allows for forwards and backwards compatibility: if the device is
  enhanced with a new feature bit, older drivers will not write that
  feature bit back to the device. Similarly, if a driver is enhanced with
  a feature that the device doesn’t support, it see the new feature is not
  offered.

Based on the above it's just fine to use a older OS with the new device
definition and management doesn't need to do anything to ensure
compatibility.

> - Discard/write-zeroes may need to be configured at disk granularity
>   in some scenarios. Assume that CMP support 2 distributed storage
>   clusters,  one cluster supports discard and another does not.
>   If the VM is configured with 2 disks - vda, vdb. Which are located in
>   2 clusters respectively. Or, the VM with disks located in the
>   discard-supportive cluster and want to migrate disks to another
>   cluster for some reason(eg. storage capability is exhausted)
>   CMP may want to turn discard off explicitly.

There are two distinct operations which have different requirements and
limitations based on the specification:

 - discard:
   The virtio spec says:
     "For discard commands, the device MAY deallocate the specified
      range of sectors in the device backend storage."

   Thus the device implementation is free to ignore any unsupported
   discard requests. In fact the implementation in qemu does ignore
   the request if e.g. the device's backend has discard disabled.

   The backend is in fact configured by the <driver discard="unmap">
   attribute, thus if you need to disable discard you can do that even
   now on the backend level.

 - write zeroes:

   Write zeroes becomes mandatory once negotiated, but qemu has it's own
   implementation which has integrated fallback of simply writing zeroes
   to the backing image if there isn't any faster method available.

> Though the above scenarios are quite rare and the virtio spec can
> ensure the feature can be negotiated and used correctly.
> CMP still wants to control the features it supports carefully and
> precisely.

As we both note there's not an actual technical problem here, right? As
in, everything will work as configured.

I'd prefer if you could justify/elaborate a bit more why you still need
to control those features.

> To summarise, IMHO, libvirt may not forbid the upper layer to
> control the 2 features of the virtio-blk disk. Leaving the option
> configurable for CMP or even customers.

As noted above, disard requests reaching the block device below the
image can be already configured via <driver discard="unmap/ignore">.

With the 'write-zeroes' feature there isn't really any semantic
difference regardless of what feature is supported by the underlying
storage. The difference is only in who actually does the actual writing
of zeroes.

> There's one more background may need to be stated:
> Current released QEMU do not provide hmp/qmp to query the
> final features that are negotiated between front-end and back-end
> from the hypervisor side. So if CMP wants to query what features a
> guest VM uses, it has to query it inside the guest VM or hack the
> qemu process.

I don't quite understand why that would be something any management
application needs to know. The hypervisor provides the features and if
the guest OS decides not to use them it' doesn't really matter, or does
it?

There are multiple other layers apart from the hypervisor support for
passing discards and the software that runs inside the VM (kernel,
volume manager, filesystem driver, filesystem configuration) which can
decide to use/don't use discards so the knowledge at the hypervisor
level doesn't tell you much.

> This way is inconvenient for control-planes, so the CMP
> needs to control the feature as aggressively as it can.

Based on the above, the only thing you can achieve is to forbid discards
for a VM on another level than those already supported. You will not be
able to force the OS to use discards and neither will be able to tell
whether the OS does use them in any other way you do now can (you still
can query number of discard requests send to the image backend).

> Thank Peter for the attention to this patchset.

My objections to this patchset are on the fact that the code originally
didn't provide much justification for this patch. With your explanations
I can't say I'm persuaded more of the contrary.

If you decide to still pursue this patchset:

 - Add justification to the commit message why that is needed. This is
   needed because the code itself can't justify the need for itself.
   Additionally do not use the first justification from this message, it
   simply doesn't have technical grounds.

 - rename the 'virtio_discard' and 'virtio_write_zeroes' features to
   'frontend_discard' and 'frontend_write_zeroes', so that if any other
   device frontend will expose such knobs they can be used without the
   need for yet another knob

 - modify the documentation where it states that in the vast majority of
   cases the user doesn't need to configure it because it simply does
   the right thing (this is actually the main objection, if the user
   doesn't need to set a feature it doesn't make sense to expose a
   knob)

 - Since this is a device frontend feature you'll need to add a ABI
   stability check for it (see. virDomainDiskDefCheckABIStability). Note
   that the existing discard/write_zeroes knobs don't need a ABI
   stability check because they are backend features thus can be changed
   e.g. on migration.

Re: [PATCH Libvirt v2 1/3] conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes
Posted by Yong Huang 2 years, 6 months ago
I'm sorry for the late reply, we have discussed your point of view
and agree with the point, so we stop pursuing this patchset unless
we find a persuasive scene that needs to disable these features.

Thank Peter for the explanation of the main objection. :)

Yong

On Wed, Jul 19, 2023 at 6:17 PM Peter Krempa <pkrempa@redhat.com> wrote:

> On Wed, Jul 19, 2023 at 11:52:45 +0800, Yong Huang wrote:
> > On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
> >
> > > On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
> > > > From: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > > >
> > > > Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
> > > > whether discard and write-zeroes requests are handled by the
> > > > virtio-blk device.
> > > >
> > > > To distinguish the attributes in block drive layer, use the 'virtio'
> > > > prefix to indicate that these attributes are specific for virtio-blk.
> > > >
> > > > Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > > > ---
> > > >  docs/formatdomain.rst             |  8 ++++++++
> > > >  src/conf/domain_conf.c            | 16 ++++++++++++++++
> > > >  src/conf/domain_conf.h            |  2 ++
> > > >  src/conf/schemas/domaincommon.rng | 10 ++++++++++
> > > >  src/conf/storage_source_conf.c    |  2 ++
> > > >  src/conf/storage_source_conf.h    |  2 ++
> > > >  src/qemu/qemu_domain.c            |  2 ++
> > > >  src/qemu/qemu_driver.c            |  4 +++-
> > > >  src/vz/vz_utils.c                 | 12 ++++++++++++
> > > >  9 files changed, 57 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > > > index 4af0b82569..7be12ff08e 100644
> > > > --- a/docs/formatdomain.rst
> > > > +++ b/docs/formatdomain.rst
> > > > @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the
> > > ``disk`` element.
> > > >        value can be either "unmap" (allow the discard request to be
> > > passed) or
> > > >        "ignore" (ignore the discard request). :since:`Since 1.0.6
> (QEMU
> > > and KVM
> > > >        only)`
> > > > +   -  The optional ``virtio_discard`` and ``virtio_write_zeroes``
> are
> > > attributes
> > > > +      that control whether discard and write-zeroes requests are
> > > handled by the
> > > > +      virtio-blk device. The feature is based on DISCARD and
> > > WRITE_ZEROES
> > > > +      commands introduced in virtio-blk protocol to improve
> performance
> > > when
> > > > +      using SSD backend. The value can be either 'on' or 'off'. Note
> > > that
> > > > +      ``discard`` and ``write_zeroes`` implementations in the block
> > > drive layer
> > > > +      are parts of the feature logically and should be turned on
> when
> > > enabling
> > > > +      the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
> > >
> > > Based on current released qemu both 'discard' and 'write-zeroes'
> feature
> > > of the 'virtio-blk' device is enabled by default:
> > >
> > >  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard
> > >    discard=<bool>         - on/off (default: true)
> > >    discard_granularity=<size> -  (default: 4294967295)
> > >    max-discard-sectors=<uint32> -  (default: 4194303)
> > >    report-discard-granularity=<bool> -  (default: true)
> > >  $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes
> > >    max-write-zeroes-sectors=<uint32> -  (default: 4194303)
> > >    write-zeroes=<bool>    - on/off (default: true)
> > >
> > > Do you need a way to disable this feature? Based on the description the
> > > default seems to be sane and actually what you'd want users to set.
> > >
> > The default is reasonable indeed. But there are still some scenarios
> > in the production environment that discard may  need to be disabled.
> > For example:
> > - The virtio-blk discard/write-zeroes commands was introduced in
> >   2017 in virtio-blk spec, so the OS distribution before 2017 can not
> >   use this feature. In that case, the cloud management platform(CMP)
> >   could recognize this issue and disable the feature in advance.
>
> Older drivers which do not support the feature are supposed to ignore
> any new feature bits:
>
>
> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-140002
>
>  2.2 Feature Bits
>
>   Each virtio device offers all the features it understands. During device
>   initialization, the driver reads this and tells the device the subset
>   that it accepts. The only way to renegotiate is to reset the device.
>
>   This allows for forwards and backwards compatibility: if the device is
>   enhanced with a new feature bit, older drivers will not write that
>   feature bit back to the device. Similarly, if a driver is enhanced with
>   a feature that the device doesn’t support, it see the new feature is not
>   offered.
>
> Based on the above it's just fine to use a older OS with the new device
> definition and management doesn't need to do anything to ensure
> compatibility.
>
> > - Discard/write-zeroes may need to be configured at disk granularity
> >   in some scenarios. Assume that CMP support 2 distributed storage
> >   clusters,  one cluster supports discard and another does not.
> >   If the VM is configured with 2 disks - vda, vdb. Which are located in
> >   2 clusters respectively. Or, the VM with disks located in the
> >   discard-supportive cluster and want to migrate disks to another
> >   cluster for some reason(eg. storage capability is exhausted)
> >   CMP may want to turn discard off explicitly.
>
> There are two distinct operations which have different requirements and
> limitations based on the specification:
>
>  - discard:
>    The virtio spec says:
>      "For discard commands, the device MAY deallocate the specified
>       range of sectors in the device backend storage."
>
>    Thus the device implementation is free to ignore any unsupported
>    discard requests. In fact the implementation in qemu does ignore
>    the request if e.g. the device's backend has discard disabled.
>
>    The backend is in fact configured by the <driver discard="unmap">
>    attribute, thus if you need to disable discard you can do that even
>    now on the backend level.
>
>  - write zeroes:
>
>    Write zeroes becomes mandatory once negotiated, but qemu has it's own
>    implementation which has integrated fallback of simply writing zeroes
>    to the backing image if there isn't any faster method available.
>
> > Though the above scenarios are quite rare and the virtio spec can
> > ensure the feature can be negotiated and used correctly.
> > CMP still wants to control the features it supports carefully and
> > precisely.
>
> As we both note there's not an actual technical problem here, right? As
> in, everything will work as configured.
>
> I'd prefer if you could justify/elaborate a bit more why you still need
> to control those features.
>
> > To summarise, IMHO, libvirt may not forbid the upper layer to
> > control the 2 features of the virtio-blk disk. Leaving the option
> > configurable for CMP or even customers.
>
> As noted above, disard requests reaching the block device below the
> image can be already configured via <driver discard="unmap/ignore">.
>
> With the 'write-zeroes' feature there isn't really any semantic
> difference regardless of what feature is supported by the underlying
> storage. The difference is only in who actually does the actual writing
> of zeroes.
>
> > There's one more background may need to be stated:
> > Current released QEMU do not provide hmp/qmp to query the
> > final features that are negotiated between front-end and back-end
> > from the hypervisor side. So if CMP wants to query what features a
> > guest VM uses, it has to query it inside the guest VM or hack the
> > qemu process.
>
> I don't quite understand why that would be something any management
> application needs to know. The hypervisor provides the features and if
> the guest OS decides not to use them it' doesn't really matter, or does
> it?
>
> There are multiple other layers apart from the hypervisor support for
> passing discards and the software that runs inside the VM (kernel,
> volume manager, filesystem driver, filesystem configuration) which can
> decide to use/don't use discards so the knowledge at the hypervisor
> level doesn't tell you much.
>
> > This way is inconvenient for control-planes, so the CMP
> > needs to control the feature as aggressively as it can.
>
> Based on the above, the only thing you can achieve is to forbid discards
> for a VM on another level than those already supported. You will not be
> able to force the OS to use discards and neither will be able to tell
> whether the OS does use them in any other way you do now can (you still
> can query number of discard requests send to the image backend).
>
> > Thank Peter for the attention to this patchset.
>
> My objections to this patchset are on the fact that the code originally
> didn't provide much justification for this patch. With your explanations
> I can't say I'm persuaded more of the contrary.
>
> If you decide to still pursue this patchset:
>
>  - Add justification to the commit message why that is needed. This is
>    needed because the code itself can't justify the need for itself.
>    Additionally do not use the first justification from this message, it
>    simply doesn't have technical grounds.
>
>  - rename the 'virtio_discard' and 'virtio_write_zeroes' features to
>    'frontend_discard' and 'frontend_write_zeroes', so that if any other
>    device frontend will expose such knobs they can be used without the
>    need for yet another knob
>
>  - modify the documentation where it states that in the vast majority of
>    cases the user doesn't need to configure it because it simply does
>    the right thing (this is actually the main objection, if the user
>    doesn't need to set a feature it doesn't make sense to expose a
>    knob)
>
>  - Since this is a device frontend feature you'll need to add a ABI
>    stability check for it (see. virDomainDiskDefCheckABIStability). Note
>    that the existing discard/write_zeroes knobs don't need a ABI
>    stability check because they are backend features thus can be changed
>    e.g. on migration.
>
>

-- 
Best regards