RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

Posted by Matias Bjorling 3 years, 7 months ago

> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Tuesday, 29 September 2020 20.36
> To: Matias Bjorling <Matias.Bjorling@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Fam Zheng <fam@euphon.net>; Kevin Wolf
> <kwolf@redhat.com>; qemu-block@nongnu.org; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Klaus Jensen <k.jensen@samsung.com>; qemu-
> devel@nongnu.org; Alistair Francis <Alistair.Francis@wdc.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and
> Zoned Namespace Command Set
> 
> On Sep 29 18:17, Matias Bjorling wrote:
> >
> >
> > > -----Original Message-----
> > > From: Klaus Jensen <its@irrelevant.dk>
> > > Sent: Tuesday, 29 September 2020 20.00
> > > To: Keith Busch <kbusch@kernel.org>
> > > Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Fam Zheng
> > > <fam@euphon.net>; Kevin Wolf <kwolf@redhat.com>; qemu-
> > > block@nongnu.org; Niklas Cassel <Niklas.Cassel@wdc.com>; Klaus
> > > Jensen <k.jensen@samsung.com>; qemu-devel@nongnu.org; Alistair
> > > Francis <Alistair.Francis@wdc.com>; Philippe Mathieu-Daudé
> > > <philmd@redhat.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> > > Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> > > and Zoned Namespace Command Set
> > >
> > > On Sep 29 10:29, Keith Busch wrote:
> > > > On Tue, Sep 29, 2020 at 12:46:33PM +0200, Klaus Jensen wrote:
> > > > > It is unmistakably clear that you are invalidating my arguments
> > > > > about portability and endianness issues by suggesting that we
> > > > > just remove persistent state and deal with it later, but
> > > > > persistence is the killer feature that sets the QEMU emulated
> > > > > device apart from other emulation options. It is not about using
> > > > > emulation in production (because yeah, why would you?), but
> > > > > persistence is what makes it possible to develop and test "zoned
> > > > > FTLs" or something that requires recovery at power up.
> > > > > This is what allows testing of how your host software deals with
> > > > > opened zones being transitioned to FULL on power up, and the
> > > > > persistent tracking of LBA allocation (in my series) can be used
> > > > > to properly test error recovery if you lost state in the app.
> > > >
> > > > Hold up -- why does an OPEN zone transition to FULL on power up?
> > > > The spec suggests it should be CLOSED. The spec does appear to
> > > > support going to FULL on an NVM Subsystem Reset, though. Actually,
> > > > now that I'm looking at this part of the spec, these implicit
> > > > transitions seem a bit less clear than I expected. I'm not sure
> > > > it's clear enough to evaluate qemu's compliance right now.
> > > >
> > > > But I don't see what testing these transitions has to do with
> > > > having a persistent state. You can reboot your VM without tearing
> > > > down the running QEMU instance. You can also unbind the driver or
> > > > shutdown the controller within the running operating system. That
> > > > should make those implicit state transitions reachable in order to
> > > > exercise your FTL's recovery.
> > > >
> > >
> > > Oh dear - don't "spec" with me ;)
> > >
> > > NVMe v1.4 Section 7.3.1:
> > >
> > >     An NVM Subsystem Reset is initiated when:
> > >       * Main power is applied to the NVM subsystem;
> > >       * A value of 4E564D65h ("NVMe") is written to the NSSR.NSSRC
> > >         field;
> > >       * Requested using a method defined in the NVMe Management
> > >         Interface specification; or
> > >       * A vendor specific event occurs.
> > >
> > > In the context of QEMU, "Main power" is tearing down QEMU and
> > > starting it from scratch. Just like on a "real" host, unbinding the
> > > driver, rebooting or shutting down the controller does not cause a
> > > subsystem reset (and does not cause the zones to change state). And
> > > since the device does not indicate support for the optional
> > > NSSR.NSSRC register, that way to initiate a subsystem reset cannot be used.
> > >
> > > The reason for moving to FULL is that write pointer updates are not
> > > persisted on each advancement, only when the zone state changes. So
> > > zones that were opened might have valid data, but an invalid write pointer.
> > > So the device transitions them to FULL, as the spec allows it to.
> > >
> >
> > How about when one must also recover from intermediate states (i.e.,
> > open/closed upon power loss)? For example, I would not expect a real SSD
> > implementation to transition zones to Full when it has thousands of zones
> > open simultaneously. That could be a disaster for the PE cycles, and a lot
> > of media would go to waste. One would want applications to support that
> > kind of failure mode as well.
> 
> Christ. The WDC Strike Force is really jumping out of lightspeed here.
> I'm afraid I don't have an opposing force to engage with. So I'll be your only
> boxing bag for the evening.
> 
> As Keith just said, "Opened" is not a valid initial state. Didn't you write the
> spec? ;) As for Closed, they will be brought up as is.

Upon power failure, a zone that is in the Explicitly Opened or Implicitly Opened state and has LBAs written may be transitioned by the controller to either the Full state or the Closed state.

In the previous mail, I wanted to point out that if the intention of qemu is to test applications against power failures, it could be useful to have an option that transitions open zones to Closed, rather than Full, upon power failure.

Applications could then be tested against that behavior as well, without needing access to an SSD that implements it.
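A minimal C sketch of such an option, assuming a device model that walks its zones once at power-up and applies a policy to each. All names (`zone_power_up`, `pwrup_policy`, the `zone` struct) are hypothetical and not taken from the QEMU nvme device. It assumes Empty, Closed and Full zones come up unchanged, and that an opened zone with no LBAs written returns to Empty; that last rule is an assumption, so check the Zoned Namespace Command Set specification before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the ZNS zone states relevant to power-up (hypothetical names). */
typedef enum {
    ZS_EMPTY,
    ZS_IMPLICITLY_OPENED,
    ZS_EXPLICITLY_OPENED,
    ZS_CLOSED,
    ZS_FULL,
} zone_state;

/* Hypothetical knob: where opened zones with written LBAs end up. */
typedef enum {
    PWRUP_OPEN_TO_FULL,   /* what the QEMU device is described as doing */
    PWRUP_OPEN_TO_CLOSED, /* the alternative proposed in this mail */
} pwrup_policy;

typedef struct {
    zone_state state;
    uint64_t zslba; /* zone start LBA */
    uint64_t wp;    /* write pointer */
} zone;

/* Apply the power-up transition to a single zone. */
void zone_power_up(zone *z, pwrup_policy policy)
{
    assert(z != NULL);

    switch (z->state) {
    case ZS_IMPLICITLY_OPENED:
    case ZS_EXPLICITLY_OPENED:
        if (z->wp == z->zslba) {
            /* Assumption: nothing was written, so the zone returns to Empty. */
            z->state = ZS_EMPTY;
        } else if (policy == PWRUP_OPEN_TO_CLOSED) {
            /* Assumes the write pointer was persisted; host may resume writing. */
            z->state = ZS_CLOSED;
        } else {
            /* Write pointer may be stale, so the zone is finished instead. */
            z->state = ZS_FULL;
        }
        break;
    default:
        /* Empty, Closed and Full zones come up as they were. */
        break;
    }
}
```

With `PWRUP_OPEN_TO_CLOSED`, the host's recovery path has to handle zones that come back Closed with a partially advanced write pointer, which is the failure mode such an option would let applications exercise.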

> 
> With that in mind, I'm not sure what you are specifically referring to. I'll gently
> remind you that the QEMU nvme device is not a real SSD and does not deal with
> NAND, so it does not really do any "recovering" of intermediate states on power
> on, if that is what you are referring to.