[PATCH v5 1/4] docs/qcow2: add the zoned format feature

Sam Li posted 4 patches 1 year ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
Posted by Sam Li 1 year ago
Add the specs for the zoned format feature of the qcow2 driver.
A qcow2 file can then be treated as a zoned device and passed through,
together with its zone information, to the guest via a virtio-blk or
NVMe ZNS device.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..4647c5fa29 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
     filename`` to check if the NOCOW flag is set or not (Capital 'C' is
     NOCOW flag).
 
+  .. option:: zoned
+    1 for host-managed zoned device and 0 for a non-zoned device.
+
+  .. option:: zone_size
+
+    The size of a zone in bytes. The device is divided into zones of this
+    size with the exception of the last zone, which may be smaller.
+
+  .. option:: zone_capacity
+
+    The initial capacity value, in bytes, for all zones. The capacity must
+    be less than or equal to zone size. If the last zone is smaller, then
+    its capacity is capped.
+
+    The zone capacity is per zone and may be different between zones in real
+    devices. For simplicity, QCow2 sets all zones to the same capacity.
+
+  .. option:: zone_nr_conv
+
+    The number of conventional zones of the zoned device.
+
+  .. option:: max_open_zones
+
+    The maximal allowed open zones.
+
+  .. option:: max_active_zones
+
+    The limit of the zones with implicit open, explicit open or closed state.
+
+  .. option:: max_append_sectors
+
+    The maximal number of 512-byte sectors in a zone append request.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1
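To make the max_open_zones and max_active_zones descriptions above more concrete, here is a minimal Python sketch of the accounting they imply. Open zones are those in the implicit-open or explicit-open state, and active zones additionally include closed zones. The names ZoneState and can_open_empty_zone are hypothetical and not part of QEMU. The sketch also assumes that a limit of 0 means "no limit", and it ignores the implicit-close behaviour a real device may apply.

```python
from enum import Enum


class ZoneState(Enum):
    EMPTY = "empty"
    IMPLICIT_OPEN = "implicit_open"
    EXPLICIT_OPEN = "explicit_open"
    CLOSED = "closed"
    FULL = "full"


# Open zones count against max_open_zones; open + closed zones count
# against max_active_zones.
OPEN_STATES = {ZoneState.IMPLICIT_OPEN, ZoneState.EXPLICIT_OPEN}
ACTIVE_STATES = OPEN_STATES | {ZoneState.CLOSED}


def can_open_empty_zone(states, max_open_zones, max_active_zones):
    """Return True if one more EMPTY zone may transition to an open state.

    Opening an EMPTY zone increases both the open and the active count.
    Assumption (hypothetical): a limit of 0 means "no limit".
    """
    n_open = sum(1 for s in states if s in OPEN_STATES)
    n_active = sum(1 for s in states if s in ACTIVE_STATES)
    if max_open_zones and n_open >= max_open_zones:
        return False
    if max_active_zones and n_active >= max_active_zones:
        return False
    return True
```

For example, with one implicitly open zone and one closed zone, a max_active_zones of 2 blocks opening another empty zone even though max_open_zones (also 2) still has room.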
Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
Posted by Eric Blake 1 year ago
On Mon, Oct 30, 2023 at 08:18:44PM +0800, Sam Li wrote:
> Add the specs for the zoned format feature of the qcow2 driver.
> The qcow2 file can be taken as zoned device and passed through by
> virtio-blk device or NVMe ZNS device to the guest given zoned
> information.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> index 105cb9679c..4647c5fa29 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
>      filename`` to check if the NOCOW flag is set or not (Capital 'C' is
>      NOCOW flag).
>  
> +  .. option:: zoned
> +    1 for host-managed zoned device and 0 for a non-zoned device.

Should this be a bool or enum type, instead of requiring the user to
know magic numbers?  Is there a potential to add yet another type in
the future?

> +
> +  .. option:: zone_size
> +
> +    The size of a zone in bytes. The device is divided into zones of this
> +    size with the exception of the last zone, which may be smaller.
> +
> +  .. option:: zone_capacity
> +
> +    The initial capacity value, in bytes, for all zones. The capacity must
> +    be less than or equal to zone size. If the last zone is smaller, then
> +    its capacity is capped.
> +
> +    The zone capacity is per zone and may be different between zones in real
> +    devices. For simplicity, QCow2 sets all zones to the same capacity.

Just making sure I understand: One possible setup would be to describe
a block device with zones of size 1024M but with capacity 1000M (that
is, the zone reserves 24M capacity for other purposes)?

Otherwise, I'm having a hard time seeing when you would ever set a
capacity different from size.

Are there requirements that one (or both) of these values must be
powers of 2?  Or is the requirement merely that they must be a
multiple of 512 bytes (because sub-sector operations are not
permitted)?  Is there any implicit requirement based on qcow2
implementation that a zone size/capacity must be a multiple of cluster
size (other than possibly for the last zone)?

> +
> +  .. option:: zone_nr_conv
> +
> +    The number of conventional zones of the zoned device.
> +
> +  .. option:: max_open_zones
> +
> +    The maximal allowed open zones.
> +
> +  .. option:: max_active_zones
> +
> +    The limit of the zones with implicit open, explicit open or closed state.
> +
> +  .. option:: max_append_sectors
> +
> +    The maximal number of 512-byte sectors in a zone append request.

Why is this value in sectors instead of bytes?  I understand that
drivers may be written with sectors in mind, but any time we mix units
in the public interface, it gets awkward.  I'd lean towards having
bytes here, with a requirement that it be a multiple of 512.
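A minimal Python sketch of the byte-based interface suggested here: the user supplies a byte count, which must be a multiple of 512, and the driver can still derive a sector count internally. The option name max_append_bytes and the helper below are hypothetical illustrations, not the names used in the follow-up patches.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the original option description


def max_append_sectors_from_bytes(max_append_bytes):
    """Validate a byte-based limit and convert it to 512-byte sectors.

    Hypothetical helper: rejects negative values and values that are not
    a multiple of 512, since sub-sector appends are not allowed.
    """
    if max_append_bytes < 0 or max_append_bytes % SECTOR_SIZE:
        raise ValueError(
            f"max_append_bytes must be a non-negative multiple of "
            f"{SECTOR_SIZE}, got {max_append_bytes}"
        )
    return max_append_bytes // SECTOR_SIZE
```

Keeping bytes in the public interface and sectors only internally avoids mixing units in user-visible options.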

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org
Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
Posted by Damien Le Moal 1 year ago
On 10/30/23 23:04, Eric Blake wrote:
>> +
>> +  .. option:: zone_size
>> +
>> +    The size of a zone in bytes. The device is divided into zones of this
>> +    size with the exception of the last zone, which may be smaller.
>> +
>> +  .. option:: zone_capacity
>> +
>> +    The initial capacity value, in bytes, for all zones. The capacity must
>> +    be less than or equal to zone size. If the last zone is smaller, then
>> +    its capacity is capped.
>> +
>> +    The zone capacity is per zone and may be different between zones in real
>> +    devices. For simplicity, QCow2 sets all zones to the same capacity.
> 
> Just making sure I understand: One possible setup would be to describe
> a block device with zones of size 1024M but with capacity 1000M (that
> is, the zone reserves 24M capacity for other purposes)?
> 
> Otherwise, I'm having a hard time seeing when you would ever set a
> capacity different from size.
> 
> Are there requirements that one (or both) of these values must be
> powers of 2?  Or is the requirement merely that they must be a
> multiple of 512 bytes (because sub-sector operations are not
> permitted)?  Is there any implicit requirement based on qcow2
> implementation that a zone size/capacity must be a multiple of cluster
> size (other than possibly for the last zone)?

Linux requires the zone size to be a power-of-2 number of LBAs. Since such a
value may not align with a ZNS drive's internal superblock size (e.g. its
erase-block alignment), the zone capacity can be smaller than the zone size. The
zone capacity is the number of usable LBAs within a zone. The LBAs between the
zone capacity and the zone size are unusable: reads return zeros and writes to
them fail. A drive's capacity is reported as the sum of all zone sizes, so it
may be larger than the actual usable capacity, which is the sum of all zone
capacities.
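A minimal Python sketch of this geometry, using the 1024M zone size / 1000M capacity example from earlier in the thread. It assumes a uniform zone_capacity for every zone, as the patch describes. It also interprets "its capacity is capped" to mean that a short last zone gets min(zone_capacity, last zone size). The function names are hypothetical.

```python
def zone_layout(disk_size, zone_size, zone_capacity):
    """Return (start, size, capacity) tuples for every zone, in bytes."""
    if not 0 < zone_capacity <= zone_size:
        raise ValueError("zone_capacity must be in (0, zone_size]")
    zones = []
    start = 0
    while start < disk_size:
        size = min(zone_size, disk_size - start)  # last zone may be smaller
        cap = min(zone_capacity, size)  # capacity capped for a short last zone
        zones.append((start, size, cap))
        start += size
    return zones


def reported_and_usable(zones):
    """Reported capacity is the sum of zone sizes; usable is the sum of capacities."""
    reported = sum(size for _, size, _ in zones)
    usable = sum(cap for _, _, cap in zones)
    return reported, usable


def is_usable(zones, offset):
    """Offsets between a zone's capacity and its size are unusable."""
    for start, size, cap in zones:
        if start <= offset < start + size:
            return offset < start + cap
    return False
```

With a 2560M disk, this yields two full 1024M zones (1000M usable each) and a 512M last zone, so 2560M is reported but only 2512M is usable.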

Qcow2 follows this model despite the fact that we do not have the constraint of
aligning to some hardware erase block size. This is mainly to allow emulating a
real drive configuration. If a real drive emulation is not the goal of the
use-case at hand, most users will simply want to have zone size == zone capacity
for their zoned qcow2 drives.

> 
>> +
>> +  .. option:: zone_nr_conv
>> +
>> +    The number of conventional zones of the zoned device.
>> +
>> +  .. option:: max_open_zones
>> +
>> +    The maximal allowed open zones.
>> +
>> +  .. option:: max_active_zones
>> +
>> +    The limit of the zones with implicit open, explicit open or closed state.
>> +
>> +  .. option:: max_append_sectors
>> +
>> +    The maximal number of 512-byte sectors in a zone append request.
> 
> Why is this value in sectors instead of bytes?  I understand that
> drivers may be written with sectors in mind, but any time we mix units
> in the public interface, it gets awkward.  I'd lean towards having
> bytes here, with a requirement that it be a multiple of 512.

Agreed. Let's use bytes to avoid confusion.

-- 
Damien Le Moal
Western Digital Research
Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature
Posted by Sam Li 1 year ago
Eric Blake <eblake@redhat.com> wrote on Mon, Oct 30, 2023 at 22:05:
>
> On Mon, Oct 30, 2023 at 08:18:44PM +0800, Sam Li wrote:
> > Add the specs for the zoned format feature of the qcow2 driver.
> > The qcow2 file can be taken as zoned device and passed through by
> > virtio-blk device or NVMe ZNS device to the guest given zoned
> > information.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  docs/system/qemu-block-drivers.rst.inc | 33 ++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >
> > diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
> > index 105cb9679c..4647c5fa29 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
> >      filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> >      NOCOW flag).
> >
> > +  .. option:: zoned
> > +    1 for host-managed zoned device and 0 for a non-zoned device.
>
> Should this be a bool or enum type, instead of requiring the user to
> know magic numbers?  Is there a potential to add yet another type in
> the future?

Sorry, my mistake. I forgot to document this change, but the configuration
in the subsequent patch uses an enum type.
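A minimal Python sketch of why an enum reads better than the 0/1 values in this revision of the docs. It also leaves room for further zone models later, as Eric asks. The member names and strings below are hypothetical and not taken from the follow-up patch. The legacy mapping simply mirrors the "1 for host-managed, 0 for non-zoned" wording.

```python
from enum import Enum


class ZonedMode(Enum):
    # Hypothetical spellings; the real option values may differ.
    NON_ZONED = "non-zoned"
    HOST_MANAGED = "host-managed"


def mode_from_legacy_int(value):
    """Map the magic numbers from this doc revision onto the enum."""
    legacy = {0: ZonedMode.NON_ZONED, 1: ZonedMode.HOST_MANAGED}
    try:
        return legacy[value]
    except KeyError:
        raise ValueError(f"unknown zoned value: {value!r}") from None
```

Users then write a self-describing string such as "host-managed", and adding another model later only means adding an enum member.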

>
> > +
> > +  .. option:: zone_size
> > +
> > +    The size of a zone in bytes. The device is divided into zones of this
> > +    size with the exception of the last zone, which may be smaller.
> > +
> > +  .. option:: zone_capacity
> > +
> > +    The initial capacity value, in bytes, for all zones. The capacity must
> > +    be less than or equal to zone size. If the last zone is smaller, then
> > +    its capacity is capped.
> > +
> > +    The zone capacity is per zone and may be different between zones in real
> > +    devices. For simplicity, QCow2 sets all zones to the same capacity.
>
> Just making sure I understand: One possible setup would be to describe
> a block device with zones of size 1024M but with capacity 1000M (that
> is, the zone reserves 24M capacity for other purposes)?

Yes. NVMe ZNS drives allow that.

>
> Otherwise, I'm having a hard time seeing when you would ever set a
> capacity different from size.
>
> Are there requirements that one (or both) of these values must be
> powers of 2?  Or is the requirement merely that they must be a
> multiple of 512 bytes (because sub-sector operations are not
> permitted)?  Is there any implicit requirement based on qcow2
> implementation that a zone size/capacity must be a multiple of cluster
> size (other than possibly for the last zone)?

Yes. Linux will only expose zoned devices that have a zone size
that is a power of 2 number of LBAs.

No, the zone size/capacity is not necessarily a multiple of the cluster size.
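A minimal Python sketch of the zone size constraint described above. Linux only exposes a zoned device whose zone size is a power-of-2 number of LBAs. The 512-byte LBA default is an assumption, and the helper name is hypothetical. Per the answer above, no check against the qcow2 cluster size is made.

```python
LBA_SIZE = 512  # assumption: 512-byte logical blocks


def zone_size_ok_for_linux(zone_size, lba_size=LBA_SIZE):
    """True if zone_size (bytes) is a power-of-2 number of LBAs."""
    if zone_size <= 0 or zone_size % lba_size:
        return False  # must be a whole, non-zero number of LBAs
    nr_lbas = zone_size // lba_size
    return nr_lbas & (nr_lbas - 1) == 0  # power-of-2 test for nr_lbas > 0
```

For example, a 256 MiB zone passes, while a 3 MiB zone (6144 LBAs) does not.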

>
> > +
> > +  .. option:: zone_nr_conv
> > +
> > +    The number of conventional zones of the zoned device.
> > +
> > +  .. option:: max_open_zones
> > +
> > +    The maximal allowed open zones.
> > +
> > +  .. option:: max_active_zones
> > +
> > +    The limit of the zones with implicit open, explicit open or closed state.
> > +
> > +  .. option:: max_append_sectors
> > +
> > +    The maximal number of 512-byte sectors in a zone append request.
>
> Why is this value in sectors instead of bytes?  I understand that
> drivers may be written with sectors in mind, but any time we mix units
> in the public interface, it gets awkward.  I'd lean towards having
> bytes here, with a requirement that it be a multiple of 512.

Sorry, same here: I already changed this in the following patches.

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>