[PATCH RFC 00/13] hw/nvme: experimental user-creatable objects

Klaus Jensen posted 13 patches 2 years, 7 months ago
There is a newer version of this series
[PATCH RFC 00/13] hw/nvme: experimental user-creatable objects
Posted by Klaus Jensen 2 years, 7 months ago
From: Klaus Jensen <k.jensen@samsung.com>

Hi,

This is an attempt at addressing a bunch of issues that have presented
themselves since we added subsystem support. It's been brewing for a
while now.

Fundamentally, I've come to the conclusion that modeling namespaces and
subsystems as "devices" is wrong. They should have been user-creatable
objects. We've run into multiple issues with hotplugging due to how
namespaces hook up to the controller with a bus. The bus-based design
made a lot of sense when we didn't have subsystem support, and it follows
the design of hw/scsi. But the problem here is that the bus-based
design dictates a single-parent relationship, and with shared namespaces,
that is just not true. If the namespaces are considered to have a single
parent, that parent is the subsystem, not any specific controller.
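
To make the single-parent constraint concrete, here is a toy model in plain
Python. It is not QEMU code; the class and field names are made up for
illustration. A bus child has exactly one parent slot, while a namespace
object can name its subsystem as owner and keep attachment as a separate
many-to-many relation.

```python
from dataclasses import dataclass, field


@dataclass
class BusChild:
    """A device on a bus: exactly one parent, as in a bus tree."""
    name: str
    parent: str  # a single controller; there is no slot for a second one


@dataclass
class Namespace:
    """A namespace owned by a subsystem, attached to any number of controllers."""
    nsid: int
    subsys: str
    attached_ctrls: set = field(default_factory=set)


# Bus model: the namespace can only hang off one controller.
private_ns = BusChild(name="ns1", parent="nvme-ctrl-0")

# Object model: the subsystem is the single owner; attachment is separate.
shared_ns = Namespace(nsid=1, subsys="subsys0")
shared_ns.attached_ctrls.update({"nvme-ctrl-0", "nvme-ctrl-1"})

print(sorted(shared_ns.attached_ctrls))
```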

This series adds a set of experimental user-creatable objects:

  -object x-nvme-subsystem
  -object x-nvme-ns-nvm
  -object x-nvme-ns-zoned

It also adds a new controller device (-device x-nvme-ctrl) that supports
these new objects (and gets rid of a bunch of deprecated and confusing
parameters). This new approach has a bunch of benefits (other than just
fixing the hotplugging issues properly) - we also get support for some
nice introspection through some new dynamic properties:

  (qemu) qom-get /machine/peripheral/nvme-ctrl-1 attached-namespaces
  [
      "/objects/nvm-1",
      "/objects/zns-1"
  ]

  (qemu) qom-list /objects/zns-1
  type (string)
  subsys (link<x-nvme-subsystem>)
  nsid (uint32)
  uuid (string)
  attached-ctrls (str)
  eui64 (string)
  blockdev (string)
  pi-first (bool)
  pi-type (NvmeProtInfoType)
  extended-lba (bool)
  metadata-size (uint16)
  lba-size (size)
  zone-descriptor-extension-size (size)
  zone-cross-read (bool)
  zone-max-open (uint32)
  zone-capacity (size)
  zone-size (size)
  zone-max-active (uint32)

  (qemu) qom-get /objects/zns-1 pi-type
  "none"

  (qemu) qom-get /objects/zns-1 eui64
  "52:54:00:17:67:a0:40:15"

  (qemu) qom-get /objects/zns-1 zone-capacity
  12582912
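
The same properties should also be reachable over QMP with the standard
qom-get command. A minimal sketch that only builds the request (it does not
talk to a running QEMU; the helper name qom_get is made up here), and that
converts the zone-capacity value above from bytes to MiB:

```python
import json


def qom_get(path: str, prop: str) -> str:
    """Build the QMP 'qom-get' request for one property of one object."""
    return json.dumps({
        "execute": "qom-get",
        "arguments": {"path": path, "property": prop},
    })


request = qom_get("/objects/zns-1", "zone-capacity")
print(request)

# The 'size' properties are reported in bytes.
zone_capacity_mib = 12582912 // (1024 * 1024)
print(zone_capacity_mib)
```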

Currently, there are no shortcuts, so you have to define the full
topology to get it up and running. Notice that the topology is explicit
(the 'subsys' and 'attached-ctrls' links). There is no 'nvme-bus'
anymore.

  -object x-nvme-subsystem,id=subsys0,subnqn=foo
  -device x-nvme-ctrl,id=nvme-ctrl-0,serial=foo,subsys=subsys0
  -device x-nvme-ctrl,id=nvme-ctrl-1,serial=bar,subsys=subsys0
  -drive  id=nvm-1,file=nvm-1.img,format=raw,if=none,discard=unmap
  -object x-nvme-ns-nvm,id=nvm-1,blockdev=nvm-1,nsid=1,subsys=subsys0,attached-ctrls=nvme-ctrl-1
  -drive  id=nvm-2,file=nvm-2.img,format=raw,if=none,discard=unmap
  -object x-nvme-ns-nvm,id=nvm-2,blockdev=nvm-2,nsid=2,subsys=subsys0,attached-ctrls=nvme-ctrl-0

It'd be nice to add some defaults for when you don't need/want a
full-blown multi controller/namespace setup.
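
Until such defaults exist, the explicit topology can be generated by a small
script. The sketch below is only a convenience wrapper around the options
shown above; the function name and its parameters are hypothetical, and it
uses no option that is not in the example.

```python
def nvme_args(subsys_id, subnqn, ctrls, namespaces):
    """Return QEMU arguments for one subsystem.

    ctrls:      list of (ctrl_id, serial)
    namespaces: list of (ns_id, image, nsid, attached_ctrl_id)
    """
    args = ["-object", f"x-nvme-subsystem,id={subsys_id},subnqn={subnqn}"]
    for ctrl_id, serial in ctrls:
        args += ["-device",
                 f"x-nvme-ctrl,id={ctrl_id},serial={serial},subsys={subsys_id}"]
    for ns_id, image, nsid, ctrl_id in namespaces:
        # Each namespace needs its own block backend, created first.
        args += ["-drive",
                 f"id={ns_id},file={image},format=raw,if=none,discard=unmap"]
        args += ["-object",
                 f"x-nvme-ns-nvm,id={ns_id},blockdev={ns_id},nsid={nsid},"
                 f"subsys={subsys_id},attached-ctrls={ctrl_id}"]
    return args


args = nvme_args(
    "subsys0", "foo",
    ctrls=[("nvme-ctrl-0", "foo"), ("nvme-ctrl-1", "bar")],
    namespaces=[("nvm-1", "nvm-1.img", 1, "nvme-ctrl-1"),
                ("nvm-2", "nvm-2.img", 2, "nvme-ctrl-0")],
)
print(" ".join(args))
```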

The first patches in this series reorganize a bunch of structs to make
it easier to separate them in later patches. The series then proceeds to
hoist the device state into separate structures such that we can reuse the
core logic in both the new objects and the existing devices. Thus, full
backwards compatibility is kept and the existing devices all work as they
did prior to this series being applied. I have chosen to separate the nvm
and zoned namespace types into individual objects. The core namespace
functionality is contained in an abstract (non user-creatable) x-nvme-ns
object; the x-nvme-ns-nvm object extends this and serves as the
parent of the x-nvme-ns-zoned object.
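
The resulting type hierarchy can be pictured with ordinary classes. This is
a Python analogy, not the QOM type registration from the patches; the class
names are invented, and only the property names (nsid, subsys, blockdev,
lba-size, zone-size, zone-capacity) come from the qom-list output above.

```python
from abc import ABC, abstractmethod


class NvmeNs(ABC):
    """Stands in for the abstract x-nvme-ns: common state, not instantiable."""

    def __init__(self, nsid, subsys):
        self.nsid = nsid
        self.subsys = subsys

    @abstractmethod
    def type_name(self):
        ...


class NvmeNsNvm(NvmeNs):
    """Stands in for x-nvme-ns-nvm: adds the block backend and LBA format."""

    def __init__(self, nsid, subsys, blockdev, lba_size):
        super().__init__(nsid, subsys)
        self.blockdev = blockdev
        self.lba_size = lba_size

    def type_name(self):
        return "x-nvme-ns-nvm"


class NvmeNsZoned(NvmeNsNvm):
    """Stands in for x-nvme-ns-zoned: an nvm namespace plus zone geometry."""

    def __init__(self, nsid, subsys, blockdev, lba_size, zone_size, zone_capacity):
        super().__init__(nsid, subsys, blockdev, lba_size)
        self.zone_size = zone_size
        self.zone_capacity = zone_capacity

    def type_name(self):
        return "x-nvme-ns-zoned"


zns = NvmeNsZoned(1, "subsys0", "zns-1", 4096, 16 << 20, 12582912)
print(zns.type_name(), isinstance(zns, NvmeNsNvm))
```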

There is definitely an alternative to this approach - one that I've
previously discussed with Hannes (and other QEMU devs, thanks!), and
that would be to add the subsystem as a system bus device.

Cheers, Klaus

Klaus Jensen (13):
  hw/nvme: move dif/pi prototypes into dif.h
  hw/nvme: move zns helpers and types into zoned.h
  hw/nvme: move zoned namespace members to separate struct
  hw/nvme: move nvm namespace members to separate struct
  hw/nvme: move BlockBackend to NvmeNamespaceNvm
  nvme: add structured type for nguid
  hw/nvme: hoist qdev state from namespace
  hw/nvme: hoist qdev state from controller
  hw/nvme: add experimental device x-nvme-ctrl
  hw/nvme: add experimental object x-nvme-subsystem
  hw/nvme: add experimental abstract object x-nvme-ns
  hw/nvme: add experimental objects x-nvme-ns-{nvm,zoned}
  hw/nvme: add attached-namespaces prop

 hw/nvme/ctrl.c       | 1187 ++++++++++++++++++++++++------------------
 hw/nvme/dif.c        |  120 +++--
 hw/nvme/dif.h        |   55 ++
 hw/nvme/meson.build  |    2 +-
 hw/nvme/ns-nvm.c     |  360 +++++++++++++
 hw/nvme/ns-zoned.c   |  449 ++++++++++++++++
 hw/nvme/ns.c         |  818 ++++++++++++++++-------------
 hw/nvme/nvm.h        |   65 +++
 hw/nvme/nvme.h       |  325 +++++-------
 hw/nvme/subsys.c     |  154 +++++-
 hw/nvme/zoned.h      |  147 ++++++
 include/block/nvme.h |   11 +-
 qapi/qom.json        |   83 +++
 13 files changed, 2612 insertions(+), 1164 deletions(-)
 create mode 100644 hw/nvme/dif.h
 create mode 100644 hw/nvme/ns-nvm.c
 create mode 100644 hw/nvme/ns-zoned.c
 create mode 100644 hw/nvme/nvm.h
 create mode 100644 hw/nvme/zoned.h

-- 
2.33.0


Re: [PATCH RFC 00/13] hw/nvme: experimental user-creatable objects
Posted by Kevin Wolf 2 years, 7 months ago
On 14.09.2021 at 22:37, Klaus Jensen wrote:
>   -object x-nvme-subsystem,id=subsys0,subnqn=foo
>   -device x-nvme-ctrl,id=nvme-ctrl-0,serial=foo,subsys=subsys0
>   -device x-nvme-ctrl,id=nvme-ctrl-1,serial=bar,subsys=subsys0
>   -drive  id=nvm-1,file=nvm-1.img,format=raw,if=none,discard=unmap
>   -object x-nvme-ns-nvm,id=nvm-1,blockdev=nvm-1,nsid=1,subsys=subsys0,attached-ctrls=nvme-ctrl-1
>   -drive  id=nvm-2,file=nvm-2.img,format=raw,if=none,discard=unmap
>   -object x-nvme-ns-nvm,id=nvm-2,blockdev=nvm-2,nsid=2,subsys=subsys0,attached-ctrls=nvme-ctrl-0

I may be wrong here, but my first gut feeling when seeing this was that
referencing the controller device in the namespace object feels
backwards. Usually, we have objects that are created independently and
then the devices reference them.

Your need to use a machine_done notifier is probably related to that,
too, because it goes against the normal initialisation order, so you
have to wait. Error handling also isn't really possible in the notifier
any more, so this series seems to just print something to stderr, but
ignore the error otherwise.
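
The ordering problem can be modelled in a few lines. This is a toy model in
Python, not the code from the series: it assumes objects are created before
devices, and the names devices, machine_done and resolve are made up. The
namespace cannot look up its controller at creation time, so the lookup runs
in a late callback where a failure can only be reported, not returned to
whoever created the object.

```python
import sys

devices = {}       # id -> device, filled in when the devices are created
machine_done = []  # callbacks run once everything exists


class Namespace:
    def __init__(self, ns_id, attached_ctrl):
        self.id = ns_id
        self.attached_ctrl = attached_ctrl
        self.ctrl = None
        # The controller does not exist yet, so resolution is deferred.
        machine_done.append(self.resolve)

    def resolve(self):
        ctrl = devices.get(self.attached_ctrl)
        if ctrl is None:
            # Too late to fail object creation; only a message is left.
            print(f"{self.id}: no controller '{self.attached_ctrl}'",
                  file=sys.stderr)
            return
        self.ctrl = ctrl


ok = Namespace("nvm-1", "nvme-ctrl-1")
bad = Namespace("nvm-2", "nvme-ctrl-9")  # misspelled controller id

devices["nvme-ctrl-1"] = object()        # devices are created afterwards

for notify in machine_done:
    notify()
```

With the reference reversed (the controller naming its namespaces), the
lookup would happen while the device is being created, where an error can
still be propagated.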

Did you consider passing a list of namespaces to the controller device
instead?

I guess a problem that you have with both ways is that support for list
options isn't great in QemuOpts, which is still used both for -object
and -device in the system emulator...
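
For comparison, a list is straightforward to express when the option is
given as JSON. The sketch below only builds such an argument. It assumes a
QEMU version whose -object accepts a JSON argument, and it assumes that
attached-ctrls is declared as a list of strings in the QAPI schema (the
qom-list output above reports it as 'str'), so treat both as unverified.

```python
import json
import shlex

# Property names are taken from the example above; the list value for
# 'attached-ctrls' is an assumption about the schema.
ns = {
    "qom-type": "x-nvme-ns-nvm",
    "id": "nvm-1",
    "blockdev": "nvm-1",
    "nsid": 1,
    "subsys": "subsys0",
    "attached-ctrls": ["nvme-ctrl-0", "nvme-ctrl-1"],
}

arg = json.dumps(ns)
print("-object", shlex.quote(arg))
```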

Kevin


Re: [PATCH RFC 00/13] hw/nvme: experimental user-creatable objects
Posted by Klaus Jensen 2 years, 7 months ago
On Sep 16 14:41, Kevin Wolf wrote:
> >   -object x-nvme-subsystem,id=subsys0,subnqn=foo
> >   -device x-nvme-ctrl,id=nvme-ctrl-0,serial=foo,subsys=subsys0
> >   -device x-nvme-ctrl,id=nvme-ctrl-1,serial=bar,subsys=subsys0
> >   -drive  id=nvm-1,file=nvm-1.img,format=raw,if=none,discard=unmap
> >   -object x-nvme-ns-nvm,id=nvm-1,blockdev=nvm-1,nsid=1,subsys=subsys0,attached-ctrls=nvme-ctrl-1
> >   -drive  id=nvm-2,file=nvm-2.img,format=raw,if=none,discard=unmap
> >   -object x-nvme-ns-nvm,id=nvm-2,blockdev=nvm-2,nsid=2,subsys=subsys0,attached-ctrls=nvme-ctrl-0
> 
> I may be wrong here, but my first gut feeling when seeing this was that
> referencing the controller device in the namespace object feels
> backwards. Usually, we have objects that are created independently and
> then the devices reference them.
> 
> Your need to use a machine_done notifier is probably related to that,
> too, because it goes against the normal initialisation order, so you
> have to wait. Error handling also isn't really possible in the notifier
> any more, so this series seems to just print something to stderr, but
> ignore the error otherwise.
> 
> Did you consider passing a list of namespaces to the controller device
> instead?
> 
> I guess a problem that you have with both ways is that support for
> list options isn't great in QemuOpts, which is still used both for
> -object and -device in the system emulator...
> 

Heh. Exactly. The ability to better support lists with -object through
QAPI is why I did it like this...

Having the list of namespaces on the controller is preferable. I'll see
what I can come up with.

Thanks!


Re: [PATCH RFC 00/13] hw/nvme: experimental user-creatable objects
Posted by Klaus Jensen 2 years, 7 months ago
On Sep 16 18:30, Klaus Jensen wrote:
> On Sep 16 14:41, Kevin Wolf wrote:
> > >   -object x-nvme-subsystem,id=subsys0,subnqn=foo
> > >   -device x-nvme-ctrl,id=nvme-ctrl-0,serial=foo,subsys=subsys0
> > >   -device x-nvme-ctrl,id=nvme-ctrl-1,serial=bar,subsys=subsys0
> > >   -drive  id=nvm-1,file=nvm-1.img,format=raw,if=none,discard=unmap
> > >   -object x-nvme-ns-nvm,id=nvm-1,blockdev=nvm-1,nsid=1,subsys=subsys0,attached-ctrls=nvme-ctrl-1
> > >   -drive  id=nvm-2,file=nvm-2.img,format=raw,if=none,discard=unmap
> > >   -object x-nvme-ns-nvm,id=nvm-2,blockdev=nvm-2,nsid=2,subsys=subsys0,attached-ctrls=nvme-ctrl-0
> > 
> > I may be wrong here, but my first gut feeling when seeing this was that
> > referencing the controller device in the namespace object feels
> > backwards. Usually, we have objects that are created independently and
> > then the devices reference them.
> > 
> > Your need to use a machine_done notifier is probably related to that,
> > too, because it goes against the normal initialisation order, so you
> > have to wait. Error handling also isn't really possible in the notifier
> > any more, so this series seems to just print something to stderr, but
> > ignore the error otherwise.
> > 
> > Did you consider passing a list of namespaces to the controller device
> > instead?
> > 
> > I guess a problem that you have with both ways is that support for
> > list options isn't great in QemuOpts, which is still used both for
> > -object and -device in the system emulator...
> > 
> 
> Heh. Exactly. The ability to better support lists with -object through
> QAPI is why I did it like this...
> 
> Having the list of namespaces on the controller is preferable. I'll see
> what I can come up with.
> 

There is also the issue that the x-nvme-ns-nvm object needs a blockdev,
and the ordering is a problem here as well. That, too, requires the
machine_done notifier.