[PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

Dmitry Fomichev posted 14 patches 3 years, 7 months ago
Test docker-quick@centos7 passed
Test docker-mingw@fedora passed
Test checkpatch passed
Test FreeBSD passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200923182021.3724-1-dmitry.fomichev@wdc.com
Maintainers: Klaus Jensen <its@irrelevant.dk>, Keith Busch <kbusch@kernel.org>, Max Reitz <mreitz@redhat.com>, Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
block/nvme.c          |    2 +-
hw/block/nvme.c       | 1987 +++++++++++++++++++++++++++++++++++++++--
hw/block/nvme.h       |  180 ++++
hw/block/trace-events |   39 +
include/block/nvme.h  |  210 ++++-
5 files changed, 2351 insertions(+), 67 deletions(-)
[PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Dmitry Fomichev 3 years, 7 months ago
v3 -> v4

 - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
   to a zone happen at the new write pointer variable, zone->w_ptr,
   that is advanced right after submitting the backend i/o. The existing
   zone->d.wp variable is updated upon the successful write completion
   and it is used for zone reporting. Some code has been split from
   nvme_finalize_zoned_write() function to a new function,
   nvme_advance_zone_wp().

 - Make the code compile under mingw. Switch to using QEMU API for
   mmap/msync, i.e. memory_region...(). Since mmap is not available in
   mingw (even though there is mman-win32 library available on Github),
   conditional compilation is added around these calls to avoid
   undefined symbols under mingw. A better fix would be to add stub
   functions to softmmu/memory.c for the case when CONFIG_POSIX is not
   defined, but such change is beyond the scope of this patchset and it
   can be made in a separate patch.

 - Correct permission mask used to open zone metadata file.

 - Fold "Define 64 bit cqe.result" patch into ZNS commit.

 - Use clz64/clz32 instead of defining nvme_ilog2() function.

 - Simplify rpt_empty_id_struct() code, move nvme_fill_data() back
   to ZNS patch.

 - Fix a power-on processing bug.

 - Rename NVME_CMD_ZONE_APND to NVME_CMD_ZONE_APPEND.

 - Make the list of review comments addressed in v2 of the series
   (see below).

v2 -> v3:

 - Moved nvme_fill_data() function to the NSTypes patch as it is
   now used there to output empty namespace identify structs.
 - Fixed typo in Maxim's email address.

v1 -> v2:

 - Rebased on top of qemu-nvme/next branch.
 - Incorporated feedback from Klaus and Alistair.
    * Allow a subset of CSE log to be read, not the entire log
    * Assign admin command entries in CSE log to ACS fields
    * Set LPA bit 1 to indicate support of CSE log page
    * Rename CC.CSS value CSS_ALL_NSTYPES (110b) to CSS_CSI
    * Move the code to assign lbaf.ds to a separate patch
    * Remove the change in firmware revision
    * Change "driver" to "device" in comments and annotations
    * Rename ZAMDS to ZASL
    * Correct a few format expressions and some wording in
      trace event definitions
    * Remove validation code to return NVME_CAP_EXCEEDED error
    * Make ZASL to be equal to MDTS if "zone_append_size_limit"
      module parameter is not set
    * Clean up nvme_zoned_init_ctrl() to make size calculations
      less confusing
    * Avoid changing module parameters, use separate n/s variables
      if additional calculations are necessary to convert parameters
      to running values
    * Use NVME_DEFAULT_ZONE_SIZE to assign the default zone size value
    * Use default 0 for zone capacity meaning that zone capacity will
      be equal to zone size by default
    * Issue warnings if user MAR/MOR values are too large and have
      to be adjusted
    * Use unsigned values for MAR/MOR
 - Dropped "Simulate Zone Active excursions" patch.
   Excursion behavior may depend on the internal controller
   architecture and therefore be vendor-specific.
 - Dropped support for Zone Attributes and zoned AENs for now.
   These features can be added in a future series.
 - NS Types support is extended to handle active/inactive namespaces.
 - Update the write pointer after backing storage I/O completion, not
   before. This makes the emulation to run correctly in case of
   backing device failures.
 - Avoid division in the I/O path if the device zone size is
   a power of two (the most common case). Zone index then can be
   calculated by using bit shift.
 - A few reported bugs have been fixed.
 - Indentation in function definitions has been changed to make it
   the same as the rest of the code.


Zoned Namespace (ZNS) Command Set is a newly introduced command set
published by the NVM Express, Inc. organization as TP 4053. The main
design goals of ZNS are to provide hardware designers the means to
reduce NVMe controller complexity and to allow achieving a better I/O
latency and throughput. SSDs that implement this interface are
commonly known as ZNS SSDs.

This command set is implementing a zoned storage model, similarly to
ZAC/ZBC. As such, there is already support in Linux, allowing one to
perform the majority of tasks needed for managing ZNS SSDs.

The Zoned Namespace Command Set relies on another TP, known as
Namespace Types (NVMe TP 4056), which introduces support for having
multiple command sets per namespace.

Both ZNS and Namespace Types specifications can be downloaded by
visiting the following link -

https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip

This patch series adds Namespace Types support and zoned namespace
emulation capability to the existing NVMe PCI device.

The patchset is organized as follows -

The first several patches are preparatory and are added to allow for
an easier review of the subsequent commits. The group of patches that
follows adds NS Types support with only NVM Command Set being
available. Finally, the last group of commits makes definitions and
adds new code to support Zoned Namespace Command Set.

Based-on: Message-ID: <20200729220638.344477-17-its@irrelevant.dk>
Dmitry Fomichev (11):
  hw/block/nvme: Report actual LBA data shift in LBAF
  hw/block/nvme: Add Commands Supported and Effects log
  hw/block/nvme: Define trace events related to NS Types
  hw/block/nvme: Make Zoned NS Command Set definitions
  hw/block/nvme: Define Zoned NS Command Set trace events
  hw/block/nvme: Support Zoned Namespace Command Set
  hw/block/nvme: Introduce max active and open zone limits
  hw/block/nvme: Support Zone Descriptor Extensions
  hw/block/nvme: Add injection of Offline/Read-Only zones
  hw/block/nvme: Use zone metadata file for persistence
  hw/block/nvme: Document zoned parameters in usage text

Niklas Cassel (3):
  hw/block/nvme: Introduce the Namespace Types definitions
  hw/block/nvme: Add support for Namespace Types
  hw/block/nvme: Add support for active/inactive namespaces

 block/nvme.c          |    2 +-
 hw/block/nvme.c       | 1987 +++++++++++++++++++++++++++++++++++++++--
 hw/block/nvme.h       |  180 ++++
 hw/block/trace-events |   39 +
 include/block/nvme.h  |  210 ++++-
 5 files changed, 2351 insertions(+), 67 deletions(-)

-- 
2.21.0


Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Klaus Jensen 3 years, 7 months ago
On Sep 24 03:20, Dmitry Fomichev wrote:
> v3 -> v4
> 
>  - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
>    to a zone happen at the new write pointer variable, zone->w_ptr,
>    that is advanced right after submitting the backend i/o. The existing
>    zone->d.wp variable is updated upon the successful write completion
>    and it is used for zone reporting. Some code has been split from
>    nvme_finalize_zoned_write() function to a new function,
>    nvme_advance_zone_wp().
> 

Same approach that I've used, +1.

>  - Make the code compile under mingw. Switch to using QEMU API for
>    mmap/msync, i.e. memory_region...(). Since mmap is not available in
>    mingw (even though there is mman-win32 library available on Github),
>    conditional compilation is added around these calls to avoid
>    undefined symbols under mingw. A better fix would be to add stub
>    functions to softmmu/memory.c for the case when CONFIG_POSIX is not
>    defined, but such change is beyond the scope of this patchset and it
>    can be made in a separate patch.
> 

Ewwww.

This feels like a hack or at the very least an abuse of the physical
memory management API.

If it really needs to be memory mapped, then I think a hostmem-based
approach similar to what Andrzej did for PMR is needed (I think that
will get rid of the CONFIG_POSIX ifdef at least, but still leave it
slightly tricky to get it to work on all platforms AFAIK). But really,
since we do not require memory semantics for this, then I think the
abstraction is fundamentally wrong.

I am, of course, blowing my own horn, since my implementation uses a
portable blockdev for this.

Another issue is the complete lack of endian conversions. Does it
matter? It depends. Will anyone ever use this on a big endian host and
move the meta data backing file to a little endian host? Probably not.
So does it really matter? Probably not, but it is cutting corners.

>  - Make the list of review comments addressed in v2 of the series
>    (see below).
> 

Very detailed! Thanks!
RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Dmitry Fomichev 3 years, 7 months ago
> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Thursday, September 24, 2020 5:08 PM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Maxim Levitsky
> <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org; Alistair Francis
> <Alistair.Francis@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> and Zoned Namespace Command Set
> 
> On Sep 24 03:20, Dmitry Fomichev wrote:
> > v3 -> v4
> >
> >  - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
> >    to a zone happen at the new write pointer variable, zone->w_ptr,
> >    that is advanced right after submitting the backend i/o. The existing
> >    zone->d.wp variable is updated upon the successful write completion
> >    and it is used for zone reporting. Some code has been split from
> >    nvme_finalize_zoned_write() function to a new function,
> >    nvme_advance_zone_wp().
> >
> 
> Same approach that I've used, +1.
> 
> >  - Make the code compile under mingw. Switch to using QEMU API for
> >    mmap/msync, i.e. memory_region...(). Since mmap is not available in
> >    mingw (even though there is mman-win32 library available on Github),
> >    conditional compilation is added around these calls to avoid
> >    undefined symbols under mingw. A better fix would be to add stub
> >    functions to softmmu/memory.c for the case when CONFIG_POSIX is not
> >    defined, but such change is beyond the scope of this patchset and it
> >    can be made in a separate patch.
> >
> 
> Ewwww.
> 
> This feels like a hack or at the very least an abuse of the physical
> memory management API.
> 
> If it really needs to be memory mapped, then I think a hostmem-based
> approach similar to what Andrzej did for PMR is needed (I think that
> will get rid of the CONFIG_POSIX ifdef at least, but still leave it
> slightly tricky to get it to work on all platforms AFAIK).

Ok, it looks that using the HostMemoryBackendFile backend will be
more appropriate. This will remove the need for conditional compile.

The mmap() portability is pretty decent across software platforms.
Any poor Windows user who is forced to emulate ZNS on mingw will be
able to do so, just without having zone state persistency. Considering
how specialized this stuff is in first place, I estimate the number of users
affected by this "limitation" to be exactly zero.

> But really,
> since we do not require memory semantics for this, then I think the
> abstraction is fundamentally wrong.
> 

Seriously, what is wrong with using mmap :) ? It is used successfully for
similar applications, for example -
https://github.com/open-iscsi/tcmu-runner/blob/master/file_zbc.c

> I am, of course, blowing my own horn, since my implementation uses a
> portable blockdev for this.
> 

You are making it sound like the entire WDC series relies on this approach.
Actually, the persistency is introduced in the second to last patch in the
series and it only adds a couple of lines of code in the i/o path to mark
zones dirty. This is possible because of using mmap() and I find the way
it is done to be quite elegant, not ugly :)

> Another issue is the complete lack of endian conversions. Does it
> matter? It depends. Will anyone ever use this on a big endian host and
> move the meta data backing file to a little endian host? Probably not.
> So does it really matter? Probably not, but it is cutting corners.
> 

Great point on endianness! Naturally, all file backed values are stored in
their native endianness. This way, there is no extra overhead on big endian
hardware architectures. Portability concerns can be easily addressed by
storing metadata endianness as a byte flag in its header. Then, during
initialization, the metadata validation code can detect the possible
discrepancy in endianness and automatically convert the metadata to the
endianness of the host. This part is out of scope of this series, but I would
be able to contribute such a solution as an enhancement in the future.

> >  - Make the list of review comments addressed in v2 of the series
> >    (see below).
> >
> 
> Very detailed! Thanks!
Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Klaus Jensen 3 years, 7 months ago
On Sep 28 02:33, Dmitry Fomichev wrote:
> > -----Original Message-----
> > From: Klaus Jensen <its@irrelevant.dk>
> >
> > If it really needs to be memory mapped, then I think a hostmem-based
> > approach similar to what Andrzej did for PMR is needed (I think that
> > will get rid of the CONFIG_POSIX ifdef at least, but still leave it
> > slightly tricky to get it to work on all platforms AFAIK).
> 
> Ok, it looks that using the HostMemoryBackendFile backend will be
> more appropriate. This will remove the need for conditional compile.
> 
> The mmap() portability is pretty decent across software platforms.
> Any poor Windows user who is forced to emulate ZNS on mingw will be
> able to do so, just without having zone state persistency. Considering
> how specialized this stuff is in first place, I estimate the number of users
> affected by this "limitation" to be exactly zero.
> 

QEMU is a cross platform project - we should strive for portability.

Alienating developers that use a Windows platform and calling them out
as "poor" is not exactly good for the zoned ecosystem.

> > But really,
> > since we do not require memory semantics for this, then I think the
> > abstraction is fundamentally wrong.
> > 
> 
> Seriously, what is wrong with using mmap :) ? It is used successfully for
> similar applications, for example -
> https://github.com/open-iscsi/tcmu-runner/blob/master/file_zbc.c
> 

There is nothing fundamentally wrong with mmap. I just think it is the
wrong abstraction here (and it limits portability for no good reason).
For PMR there is a good reason - it requires memory semantics.

> > I am, of course, blowing my own horn, since my implementation uses a
> > portable blockdev for this.
> > 
> 
> You are making it sound like the entire WDC series relies on this approach.
> Actually, the persistency is introduced in the second to last patch in the
> series and it only adds a couple of lines of code in the i/o path to mark
> zones dirty. This is possible because of using mmap() and I find the way
> it is done to be quite elegant, not ugly :)
> 

No, I understand that your implementation works fine without
persistance, but persistance is key. That is why my series adds it in
the first patch. Without persistence it is just a toy. And the QEMU
device is not just an "NVMe-version" of null_blk.

And I don't think I ever called the use of mmap ugly. I called out the
physical memory API shenanigans as a hack.

> > Another issue is the complete lack of endian conversions. Does it
> > matter? It depends. Will anyone ever use this on a big endian host and
> > move the meta data backing file to a little endian host? Probably not.
> > So does it really matter? Probably not, but it is cutting corners.
> > 

After I had replied this, I considered a follow-up, because there are
probably QEMU developers that would call me out on this.

This definitely DOES matter to QEMU.

> 
> Great point on endianness! Naturally, all file backed values are stored in
> their native endianness. This way, there is no extra overhead on big endian
> hardware architectures. Portability concerns can be easily addressed by
> storing metadata endianness as a byte flag in its header. Then, during
> initialization, the metadata validation code can detect the possible
> discrepancy in endianness and automatically convert the metadata to the
> endianness of the host. This part is out of scope of this series, but I would
> be able to contribute such a solution as an enhancement in the future.
> 

It is not out of scope. I don't see why we should merge something that
is arguably buggy.

Bottomline is that I just don't see why we should accept an
implementation that

  a) excludes some platforms (Windows) from using persistence; and
  b) contains endianness conversion issues

when there is a portable implementation posted that at least tries to
convert endianness as needed.
Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Keith Busch 3 years, 7 months ago
On Mon, Sep 28, 2020 at 08:36:48AM +0200, Klaus Jensen wrote:
> On Sep 28 02:33, Dmitry Fomichev wrote:
> > You are making it sound like the entire WDC series relies on this approach.
> > Actually, the persistency is introduced in the second to last patch in the
> > series and it only adds a couple of lines of code in the i/o path to mark
> > zones dirty. This is possible because of using mmap() and I find the way
> > it is done to be quite elegant, not ugly :)
> > 
> 
> No, I understand that your implementation works fine without
> persistance, but persistance is key. That is why my series adds it in
> the first patch. Without persistence it is just a toy. And the QEMU
> device is not just an "NVMe-version" of null_blk.

I really think we should be a bit more cautious of commiting to an
on-disk format for the persistent state. Both this and Klaus' persistent
state feels a bit ad-hoc, and with all the other knobs provided, it
looks too easy to have out-of-sync states, or just not being able to
boot at all if a qemu versions have different on-disk formats.

Is anyone really considering zone emulation for production level stuff
anyway? I can't imagine a real scenario where you'd want put yourself
through that: you are just giving yourself all the downsides of a zoned
block device and none of the benefits. AFAIK, this is provided as a
development vehicle, closer to a "toy".

I think we should consider trimming this down to a more minimal set that
we *do* agree on and commit for inclusion ASAP. We can iterate all the
bells & whistles and flush out the meta data's data marshalling scheme
for persistence later.

RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Dmitry Fomichev 3 years, 7 months ago
> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Monday, September 28, 2020 2:37 AM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Klaus Jensen <k.jensen@samsung.com>; Kevin
> Wolf <kwolf@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> Maxim Levitsky <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>;
> Niklas Cassel <Niklas.Cassel@wdc.com>; qemu-block@nongnu.org; qemu-
> devel@nongnu.org; Alistair Francis <Alistair.Francis@wdc.com>; Matias
> Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> and Zoned Namespace Command Set
> 
> On Sep 28 02:33, Dmitry Fomichev wrote:
> > > -----Original Message-----
> > > From: Klaus Jensen <its@irrelevant.dk>
> > >
> > > If it really needs to be memory mapped, then I think a hostmem-based
> > > approach similar to what Andrzej did for PMR is needed (I think that
> > > will get rid of the CONFIG_POSIX ifdef at least, but still leave it
> > > slightly tricky to get it to work on all platforms AFAIK).
> >
> > Ok, it looks that using the HostMemoryBackendFile backend will be
> > more appropriate. This will remove the need for conditional compile.
> >
> > The mmap() portability is pretty decent across software platforms.
> > Any poor Windows user who is forced to emulate ZNS on mingw will be
> > able to do so, just without having zone state persistency. Considering
> > how specialized this stuff is in first place, I estimate the number of users
> > affected by this "limitation" to be exactly zero.
> >
> 
> QEMU is a cross platform project - we should strive for portability.
> 
> Alienating developers that use a Windows platform and calling them out
> as "poor" is not exactly good for the zoned ecosystem.
> 

Wow. By bringing up political correctness here you are basically admitting
the fact that you have no real technical argument here. The whole Windows
issue is red herring that you are using to attack the code that is absolutely
legit, but comes from a competitor. Your initial complaint was that it
doesn't compile in mingw and that it uses "wrong" API. You have even
suggested the API to use. Now, the code uses that API and builds fine, but
now it's still not good simply because you "do not like it". It's a disgrace.

> > > But really,
> > > since we do not require memory semantics for this, then I think the
> > > abstraction is fundamentally wrong.
> > >
> >
> > Seriously, what is wrong with using mmap :) ? It is used successfully for
> > similar applications, for example -
> > https://github.com/open-iscsi/tcmu-runner/blob/master/file_zbc.c
> >
> 
> There is nothing fundamentally wrong with mmap. I just think it is the
> wrong abstraction here (and it limits portability for no good reason).
> For PMR there is a good reason - it requires memory semantics.
> 

We are trying to emulate NVMEe controller NVRAM.  The best abstraction
for emulating NVRAM would be... NVRAM!

> > > I am, of course, blowing my own horn, since my implementation uses a
> > > portable blockdev for this.
> > >
> >
> > You are making it sound like the entire WDC series relies on this approach.
> > Actually, the persistency is introduced in the second to last patch in the
> > series and it only adds a couple of lines of code in the i/o path to mark
> > zones dirty. This is possible because of using mmap() and I find the way
> > it is done to be quite elegant, not ugly :)
> >
> 
> No, I understand that your implementation works fine without
> persistance, but persistance is key. That is why my series adds it in
> the first patch. Without persistence it is just a toy. And the QEMU
> device is not just an "NVMe-version" of null_blk.
> 
> And I don't think I ever called the use of mmap ugly. I called out the
> physical memory API shenanigans as a hack.
> 
> > > Another issue is the complete lack of endian conversions. Does it
> > > matter? It depends. Will anyone ever use this on a big endian host and
> > > move the meta data backing file to a little endian host? Probably not.
> > > So does it really matter? Probably not, but it is cutting corners.
> > >
> 
> After I had replied this, I considered a follow-up, because there are
> probably QEMU developers that would call me out on this.
> 
> This definitely DOES matter to QEMU.
> 
> >
> > Great point on endianness! Naturally, all file backed values are stored in
> > their native endianness. This way, there is no extra overhead on big endian
> > hardware architectures. Portability concerns can be easily addressed by
> > storing metadata endianness as a byte flag in its header. Then, during
> > initialization, the metadata validation code can detect the possible
> > discrepancy in endianness and automatically convert the metadata to the
> > endianness of the host. This part is out of scope of this series, but I would
> > be able to contribute such a solution as an enhancement in the future.
> >
> 
> It is not out of scope. I don't see why we should merge something that
> is arguably buggy.

Again, wow! Now you turned around and arbitrarily elevated this issue from
moderate ("Does it matter?, cutting corners") to severe ("buggy"). Likely
because v5 of WDC patchset has been posted. This, again, just shows your
lack of integrity as a maintainer.

This "issue" is a real trivial one to fix as I described above and you are
blowing it up way out of proportion, making it look like it is a fundamental
problem that can not be resolved. It's not.

> 
> Bottomline is that I just don't see why we should accept an
> implementation that
> 
>   a) excludes some platforms (Windows) from using persistence; and
>   b) contains endianness conversion issues
> 
> when there is a portable implementation posted that at least tries to
> convert endianness as needed.

Doesn't that implementation discriminate against big endian architectures? :)
Ok, it is a joke - with some folks I need to clarify this.

Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Klaus Jensen 3 years, 7 months ago
On Sep 29 15:42, Dmitry Fomichev wrote:
> > -----Original Message-----
> > From: Klaus Jensen <its@irrelevant.dk>
> > Sent: Monday, September 28, 2020 2:37 AM
> > To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> > Cc: Keith Busch <kbusch@kernel.org>; Damien Le Moal
> > <Damien.LeMoal@wdc.com>; Klaus Jensen <k.jensen@samsung.com>; Kevin
> > Wolf <kwolf@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> > Maxim Levitsky <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>;
> > Niklas Cassel <Niklas.Cassel@wdc.com>; qemu-block@nongnu.org; qemu-
> > devel@nongnu.org; Alistair Francis <Alistair.Francis@wdc.com>; Matias
> > Bjorling <Matias.Bjorling@wdc.com>
> > Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> > and Zoned Namespace Command Set
> > 
> > On Sep 28 02:33, Dmitry Fomichev wrote:
> > > > -----Original Message-----
> > > > From: Klaus Jensen <its@irrelevant.dk>
> > > >
> > > > If it really needs to be memory mapped, then I think a hostmem-based
> > > > approach similar to what Andrzej did for PMR is needed (I think that
> > > > will get rid of the CONFIG_POSIX ifdef at least, but still leave it
> > > > slightly tricky to get it to work on all platforms AFAIK).
> > >
> > > Ok, it looks that using the HostMemoryBackendFile backend will be
> > > more appropriate. This will remove the need for conditional compile.
> > >
> > > The mmap() portability is pretty decent across software platforms.
> > > Any poor Windows user who is forced to emulate ZNS on mingw will be
> > > able to do so, just without having zone state persistency. Considering
> > > how specialized this stuff is in first place, I estimate the number of users
> > > affected by this "limitation" to be exactly zero.
> > >
> > 
> > QEMU is a cross platform project - we should strive for portability.
> > 
> > Alienating developers that use a Windows platform and calling them out
> > as "poor" is not exactly good for the zoned ecosystem.
> > 
> 
> Wow. By bringing up political correctness here you are basically admitting
> the fact that you have no real technical argument here.

I prefer that we support all platforms if and when we can. That's a
technical argument, not a personal one like you those you start using
now.

> The whole Windows issue is red herring that you are using to attack
> the code that is absolutely legit, but comes from a competitor.

I can't even...

> Your initial complaint was that it doesn't compile in mingw and that
> it uses "wrong" API. You have even suggested the API to use. Now, the
> code uses that API and builds fine, but now it's still not good simply
> because you "do not like it". It's a disgrace.
> 

I answered this in a previous reply.

> > > > But really,
> > > > since we do not require memory semantics for this, then I think the
> > > > abstraction is fundamentally wrong.
> > > >
> > >
> > > Seriously, what is wrong with using mmap :) ? It is used successfully for
> > > similar applications, for example -
> > > https://github.com/open-iscsi/tcmu-runner/blob/master/file_zbc.c
> > >
> > 
> > There is nothing fundamentally wrong with mmap. I just think it is the
> > wrong abstraction here (and it limits portability for no good reason).
> > For PMR there is a good reason - it requires memory semantics.
> > 
> 
> We are trying to emulate NVMEe controller NVRAM.  The best abstraction
> for emulating NVRAM would be... NVRAM!
> 

You never brought that up before and sure it could be a fair argument,
except it is not true.

PMR is emulating NVRAM (and requires memory semantics). Persistent state
is not emulating anything. It is an implementation detail.

> > > > I am, of course, blowing my own horn, since my implementation uses a
> > > > portable blockdev for this.
> > > >
> > >
> > > You are making it sound like the entire WDC series relies on this approach.
> > > Actually, the persistency is introduced in the second to last patch in the
> > > series and it only adds a couple of lines of code in the i/o path to mark
> > > zones dirty. This is possible because of using mmap() and I find the way
> > > it is done to be quite elegant, not ugly :)
> > >
> > 
> > No, I understand that your implementation works fine without
> > persistance, but persistance is key. That is why my series adds it in
> > the first patch. Without persistence it is just a toy. And the QEMU
> > device is not just an "NVMe-version" of null_blk.
> > 
> > And I don't think I ever called the use of mmap ugly. I called out the
> > physical memory API shenanigans as a hack.
> > 
> > > > Another issue is the complete lack of endian conversions. Does it
> > > > matter? It depends. Will anyone ever use this on a big endian host and
> > > > move the meta data backing file to a little endian host? Probably not.
> > > > So does it really matter? Probably not, but it is cutting corners.
> > > >
> > 
> > After I had replied this, I considered a follow-up, because there are
> > probably QEMU developers that would call me out on this.
> > 
> > This definitely DOES matter to QEMU.
> > 
> > >
> > > Great point on endianness! Naturally, all file backed values are stored in
> > > their native endianness. This way, there is no extra overhead on big endian
> > > hardware architectures. Portability concerns can be easily addressed by
> > > storing metadata endianness as a byte flag in its header. Then, during
> > > initialization, the metadata validation code can detect the possible
> > > discrepancy in endianness and automatically convert the metadata to the
> > > endianness of the host. This part is out of scope of this series, but I would
> > > be able to contribute such a solution as an enhancement in the future.
> > >
> > 
> > It is not out of scope. I don't see why we should merge something that
> > is arguably buggy.
> 
> Again, wow! Now you turned around and arbitrarily elevated this issue from
> moderate ("Does it matter?, cutting corners") to severe ("buggy"). Likely
> because v5 of WDC patchset has been posted.

No, exactly as I wrote above, after I hit reply I considered a
follow-up. I guess I should have.

> This, again, just shows your lack of integrity as a maintainer.
> 

I can't believe I just read that.

I will not put up with this. It is completely non-called for. I stand up
for my opinions and I will fight to make sure the best possible code
goes upstream. Yes, I am paid by Samsung. But I can compartmentalize. I
have been working on QEMU before Samsung and I know how to separate
corporate interest and open source. I have a proven record on this list
to show that. I really cannot believe that you brought it down to that
level. I have been putting forth technical arguments throughout this
entire review process and now you are getting personal.

Not. Cool. Please keep things professional from now.

> This "issue" is a real trivial one to fix as I described above and you are
> blowing it up way out of proportion, making it look like it is a fundamental
> problem that can not be resolved. It's not.
> 

If it is so trival to fix, please fix it. I think I made it clear that I
won't be happy until it is portable.

And please note that I have *not* complained about other parts of your
series. I have complained ALOT about the persistence implementation -
and I continue to stand behind those complaints.

I'm getting super tired of this one-sided process. I have continuously
reviewed and commented your series, I have found multiple bugs, I have
suggested improvements. Maybe if just one or two of those 9 people who
signed off on your zoned implementation could look past their own nose
and look at my series - you might just realize that its decent, portable
and offers some niceties that yours do not have (at the cost of the same
amount of code mind you).
Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Keith Busch 3 years, 7 months ago
All,

Let's de-escalate this, please. There's no reason to doubt Klaus wants
to see this to work well, just as everyone else does. We unfortunately
have conflicting proposals posted, and everyone is passionate enough
about their work, but please simmer down.

As I mentioned earlier, I'd like to refocus on the basic implementation
and save the persistent state discussion once the core is solid. After
going through it all, I feel there's enough to discuss there to keep us
busy for little while longer. Additional comments on the code will be
coming from me later today.

RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Posted by Dmitry Fomichev 3 years, 7 months ago

> -----Original Message-----
> From: Keith Busch <kbusch@kernel.org>
> Sent: Tuesday, September 29, 2020 3:22 PM
> To: Klaus Jensen <its@irrelevant.dk>
> Cc: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>; Kevin Wolf
> <kwolf@redhat.com>; Fam Zheng <fam@euphon.net>; Damien Le Moal
> <Damien.LeMoal@wdc.com>; qemu-block@nongnu.org; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Klaus Jensen <k.jensen@samsung.com>; qemu-
> devel@nongnu.org; Alistair Francis <Alistair.Francis@wdc.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Matias Bjorling
> <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> and Zoned Namespace Command Set
> 
> All,
> 
> Let's de-escalate this, please. There's no reason to doubt Klaus wants
> to see this to work well, just as everyone else does. We unfortunately
> have conflicting proposals posted, and everyone is passionate enough
> about their work, but please simmer down.
> 
> As I mentioned earlier, I'd like to refocus on the basic implementation
> and save the persistent state discussion once the core is solid. After
> going through it all, I feel there's enough to discuss there to keep us
> busy for little while longer. Additional comments on the code will be
> coming from me later today.

OK, I agree with this and I will not be replying to the email prior to this
one it the thread. Let's calm down so we will be able to have a beer at a
conference one day :)

The only one thing that I would like to cover is lack of response to Klaus'
ZNS patchset. Klaus, you are right to complain about it. Since discovering
about the large backlog of NVMe patches that you had pending
(something that we were not aware at the time of publishing our patches),
we made the decision to rebase our series on top of the patches that you
had posted before the publication time of WDC ZNS patchset. Since then,
I got caught in the constant cycle of rebasing our patches on top of your
series and that prevented me from doing much in terms of reviewing of
your commits. Now, once we seem to catch up with the current head of
development, I should be able to do more of this. There is absolutely no
ill will involved :)

Dmitry