[PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS

Jonathan Derrick posted 2 patches 1 year, 4 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20221228194141.118-1-jonathan.derrick@linux.dev
Maintainers: Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
There is a newer version of this series
docs/system/devices/nvme.rst |  60 +++++-
hw/nvme/cfg_key_checker.c    |  51 +++++
hw/nvme/ctrl-cfg.c           | 224 +++++++++++++++++++++
hw/nvme/ctrl.c               | 313 +++++++++++++++++++++++++++++-
hw/nvme/meson.build          |   2 +-
hw/nvme/ns-backend.c         | 288 +++++++++++++++++++++++++++
hw/nvme/ns.c                 | 365 +++++++++++++++++++++++++++++++----
hw/nvme/nvme.h               |  32 ++-
hw/nvme/subsys.c             |  11 +-
hw/nvme/trace-events         |   3 +
include/block/nvme.h         |  31 +++
include/hw/nvme/ctrl-cfg.h   |  24 +++
include/hw/nvme/ns-cfg.h     |  28 +++
include/hw/nvme/nvme-cfg.h   | 188 ++++++++++++++++++
qemu-img-cmds.hx             |   6 +
qemu-img.c                   | 132 +++++++++++++
16 files changed, 1704 insertions(+), 54 deletions(-)
create mode 100644 hw/nvme/cfg_key_checker.c
create mode 100644 hw/nvme/ctrl-cfg.c
create mode 100644 hw/nvme/ns-backend.c
create mode 100644 include/hw/nvme/ctrl-cfg.h
create mode 100644 include/hw/nvme/ns-cfg.h
create mode 100644 include/hw/nvme/nvme-cfg.h
[PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS
Posted by Jonathan Derrick 1 year, 4 months ago
From: Michael Kropaczek <michael.kropaczek@solidigm.com>

Description:

Currently, namespaces can be configured as follows:
1. Legacy Namespace - a single namespace within the NVMe controller, whose
   back-end is specified for the nvme device with a -drive parameter
   pointing directly to the image file.
2. Additional Namespaces - specified by nvme-ns devices, each having its
   own back-end. To have multiple namespaces, each one must be specified
   on QEMU's command line, where it is associated with the most recently
   defined nvme-bus of an nvme device.
   If such an additional namespace is to be attached and/or detached by the
   guest OS, the nvme controller has to be linked with an nvme-subsys device.

Both approaches are static: everything has to be specified on QEMU's
command line, and all specified virtual NVMe entities are processed during
QEMU's start-up, then created and provided to the guest OS.

Supporting the nvme create-ns and delete-ns commands with their specified
parameters requires a different approach.
Virtual devices representing namespaces need to be created and/or deleted
at any time during a running QEMU session. The back-end image size of a
namespace must accommodate both the payload and the metadata resulting
from the specified parameters. The total capacity of the NVMe controller,
together with its unallocated capacity, needs to be taken into account and
updated by the nvme create-ns and delete-ns commands respectively.
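
The capacity bookkeeping described above can be sketched in C. This is a
hypothetical model, not code from the patch: NvmeCapacity,
ns_backend_bytes(), cap_create_ns() and cap_delete_ns() are invented names,
and the size formula assumes the image holds nsze logical blocks of lbasz
data bytes plus ms metadata bytes per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller-wide capacity state (not the patch's struct). */
typedef struct NvmeCapacity {
    uint64_t total;       /* total capacity, e.g. as given to qemu-img -C */
    uint64_t unallocated; /* capacity not yet claimed by any namespace */
} NvmeCapacity;

/*
 * Bytes a back-end image needs for a namespace of nsze logical blocks,
 * assuming lbasz data bytes plus ms metadata bytes per block.
 * Returns 0 if the multiplication would overflow.
 */
static uint64_t ns_backend_bytes(uint64_t nsze, uint32_t lbasz, uint16_t ms)
{
    uint64_t per_block = (uint64_t)lbasz + ms;

    if (per_block != 0 && nsze > UINT64_MAX / per_block) {
        return 0;
    }
    return nsze * per_block;
}

/* create-ns: claim capacity; fail without side effects if it does not fit. */
static bool cap_create_ns(NvmeCapacity *cap, uint64_t bytes)
{
    if (bytes == 0 || bytes > cap->unallocated) {
        return false;
    }
    cap->unallocated -= bytes;
    return true;
}

/* delete-ns: return the namespace's capacity to the unallocated pool. */
static void cap_delete_ns(NvmeCapacity *cap, uint64_t bytes)
{
    /* Never give back more than is currently allocated. */
    assert(bytes <= cap->total - cap->unallocated);
    cap->unallocated += bytes;
}
```

Rejecting a create-ns that does not fit before touching any state keeps the
unallocated figure consistent if the guest's request fails.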

Here is the approach:
The nvme device gets a new parameter:
 - auto-ns-path, which specifies the path to the storage area where the
   back-end images and the necessary config files are stored.

The virtual devices representing namespaces will be created dynamically during
the running QEMU session, following nvme create-ns and delete-ns commands
issued by the guest OS. QOM classes and instances will be created using the
existing configuration scheme used during QEMU's start-up. Back-end image files
will be neither created nor deleted during QEMU's start-up or running session.
Instead, a set of back-end image files and the relevant config will be created
beforehand by the qemu-img tool with the createns sub-command.
Required parameters are -S, the serial number, which must match the serial
parameter of the qemu-system-xx -device nvme command line specification, and
-C, the total capacity. The optional -N sets the maximum number of allowed
namespaces (default 256). These options are followed by the path name of the
storage location, which corresponds to the auto-ns-path parameter of
qemu-system-xx -device nvme.
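
The createns contract described above (required -S and -C, optional -N
defaulting to 256, then the storage path) can be sketched as a small argument
checker. This is hypothetical: CreatensArgs and parse_createns() are invented,
-C is taken as a plain byte count, and the sketch does not attempt to
reproduce the actual option handling in qemu-img.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of: createns -S <serial> -C <capacity> [-N <max>] <path> */
typedef struct CreatensArgs {
    const char *serial;  /* must match serial= of -device nvme */
    uint64_t capacity;   /* total capacity, as a plain byte count here */
    uint32_t nsmax;      /* maximum number of namespaces */
    const char *path;    /* must match auto-ns-path= of -device nvme */
} CreatensArgs;

/* Parse a non-negative decimal integer; reject empty strings and trailing junk. */
static bool parse_u64(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (*s == '\0' || *s == '-') {
        return false;
    }
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;
    }
    *out = v;
    return true;
}

/* argv holds the arguments that follow the sub-command name. */
static bool parse_createns(int argc, const char *const *argv, CreatensArgs *a)
{
    uint64_t nsmax = 256;  /* default stated in the cover letter */

    *a = (CreatensArgs){0};
    for (int i = 0; i < argc; i++) {
        const char *arg = argv[i];

        if (strcmp(arg, "-S") == 0 && i + 1 < argc) {
            a->serial = argv[++i];
        } else if (strcmp(arg, "-C") == 0 && i + 1 < argc) {
            if (!parse_u64(argv[++i], &a->capacity)) {
                return false;
            }
        } else if (strcmp(arg, "-N") == 0 && i + 1 < argc) {
            if (!parse_u64(argv[++i], &nsmax)) {
                return false;
            }
        } else if (arg[0] != '-' && a->path == NULL) {
            a->path = arg;
        } else {
            return false;  /* unknown option, missing value or extra argument */
        }
    }
    /* -S, -C and the path are required; -N must fit a nonzero 32-bit count. */
    if (a->serial == NULL || a->serial[0] == '\0' || a->capacity == 0 ||
        a->path == NULL || a->path[0] == '\0' ||
        nsmax == 0 || nsmax > UINT32_MAX) {
        return false;
    }
    a->nsmax = (uint32_t)nsmax;
    return true;
}
```

For example, the arguments -S SN01 -C 1073741824 /tmp/ns are accepted with
nsmax left at 256, while omitting -S or the path is rejected.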

The created back-end image files will be pre-loaded during QEMU's start-up and,
during the running session, associated with or disassociated from the virtual
QOM namespace instances as a result of nvme create-ns or delete-ns commands.
The back-end image file of the relevant namespace will be resized as follows:
delete-ns truncates the image file to size 0, whereas create-ns resizes the
image file to the size given by the nvme create-ns command parameters.
Truncating/resizing is done through the blk_truncate() API, which uses
coroutines and should not block QEMU's main thread while AIO operations are
scheduled. All settings are retained across QEMU start-ups and shutdowns.
The implementation makes it possible to combine the existing "Additional
Namespaces" implementation with the new "Managed Namespaces". The two coexist
with obvious restrictions: both share the same NSID space, "static" namespaces
cannot be deleted, and if an NSID specified on QEMU's command line conflicts
with one previously created by nvme create-ns (and retained), QEMU aborts at
start-up.
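
The coexistence rules above can be modeled as a single table indexed by NSID.
In this hypothetical sketch (NsTable, ns_register(), ns_create() and
ns_delete() are invented), image sizes are only recorded rather than applied
with blk_truncate(), and create-ns picks the lowest free NSID, which is an
assumption rather than the patch's documented policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_MAX 256  /* assumed limit, matching the qemu-img -N default */

typedef enum NsKind { NS_FREE = 0, NS_STATIC, NS_MANAGED } NsKind;

/* One slot per NSID: slot[i] describes NSID i + 1 (NSID 0 is invalid). */
typedef struct NsSlot {
    NsKind kind;
    uint64_t size;  /* back-end image size in bytes; 0 when truncated */
} NsSlot;

typedef struct NsTable {
    NsSlot slot[NS_MAX];  /* shared by static and managed namespaces */
} NsTable;

static bool nsid_valid(uint32_t nsid)
{
    return nsid >= 1 && nsid <= NS_MAX;
}

/*
 * Start-up: register a retained managed namespace or a static one from the
 * command line. A false return for an NSID that is already taken corresponds
 * to the case in which QEMU aborts.
 */
static bool ns_register(NsTable *t, uint32_t nsid, NsKind kind, uint64_t size)
{
    if (kind == NS_FREE || !nsid_valid(nsid) ||
        t->slot[nsid - 1].kind != NS_FREE) {
        return false;
    }
    t->slot[nsid - 1] = (NsSlot){ kind, size };
    return true;
}

/* create-ns: claim the lowest free NSID and record the new size; 0 if full. */
static uint32_t ns_create(NsTable *t, uint64_t size)
{
    for (uint32_t i = 0; i < NS_MAX; i++) {
        if (t->slot[i].kind == NS_FREE) {
            t->slot[i] = (NsSlot){ NS_MANAGED, size };
            return i + 1;
        }
    }
    return 0;
}

/* delete-ns: static namespaces are refused; managed ones are truncated to 0. */
static bool ns_delete(NsTable *t, uint32_t nsid)
{
    if (!nsid_valid(nsid) || t->slot[nsid - 1].kind != NS_MANAGED) {
        return false;
    }
    t->slot[nsid - 1] = (NsSlot){ NS_FREE, 0 };
    return true;
}
```

Keeping static and managed namespaces in one table makes the shared NSID
space explicit: a conflict is simply an attempt to register a slot that is
not free.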

More than one NVMe controller associated with an NVMe subsystem is supported.
This requires that the serial= and subsys= parameters of the additional
controllers match those of the primary controller, and that auto-ns-path=
is not specified for them.
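
The multi-controller rule can be expressed as a simple check.
NvmeCtrlParams and ctrl_secondary_ok() are hypothetical; the fields only
mirror the serial=, subsys= and auto-ns-path= properties named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of the relevant properties of one -device nvme,... */
typedef struct NvmeCtrlParams {
    const char *serial;
    const char *subsys;        /* id of the linked nvme-subsys device, or NULL */
    const char *auto_ns_path;  /* NULL when auto-ns-path= is not given */
} NvmeCtrlParams;

/* Both strings present and equal. */
static bool str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* An additional controller must share serial= and subsys= with the primary
 * controller and must not set auto-ns-path=. */
static bool ctrl_secondary_ok(const NvmeCtrlParams *primary,
                              const NvmeCtrlParams *secondary)
{
    return str_eq(primary->serial, secondary->serial) &&
           str_eq(primary->subsys, secondary->subsys) &&
           secondary->auto_ns_path == NULL;
}
```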

Michael Kropaczek (2):
  hw/nvme: Support for Namespaces Management from guest OS - create-ns
  hw/nvme: Support for Namespaces Management from guest OS - delete-ns

-- 
2.37.3
Re: [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS
Posted by Kevin Wolf 1 year, 4 months ago
On 28.12.2022 at 20:41, Jonathan Derrick wrote:
> Here is the approach:
> The nvme device will get new parameter:
>  - auto-ns-path, which specifies the path to the storage area where back-end
>    image and necessary config files located stored.
> 
> The virtual devices representing namespaces will be created dynamically during
> the Qemu running session following issuance of nvme create-ns and delete-ns
> commands from the guest OS. QOM classes and instances will be created utilizing
> existing configuration scheme used during Qemu's start-up. Back-end image files
> will be neither created nor deleted during Qemu's startup or running session.
> Instead a set of back-end image files and relevant config will be created by
> qemu-img tool with createns sub-command prior to Qemu's session.
> Required parameters are: -S serial number which must match serial parameter of
> qemu-system-xx -device nvme command line specification, -C total capacity, and
> optional -N that will set a maximal limit on number of allowed
> namespaces (default 256) which will be followed by path name pointing to
> storage location corresponding to auto-ns-path of qemu-system-xx -device nvme
> parameter.
> 
> Those created back-end image files will be pre-loaded during Qemu's start-up
> and then during running Qemu's session will be associated or disassociated with
> QOM namespaces virtual instances, as a result of issuing nvme create-ns or
> delete-ns commands. The associated back-end image file for relevant namespace
> will be re-sized as follows: delete-ns command will truncate image file to the
> size of 0, whereas create-ns command will re-size the image file to the size
> provided by nvme create-ns command parameters. Truncating/re-sizing is a result
> of blk_truncate() API which utilizes co-routines and should not block Qemu main
> thread while scheduling AIO operations. It is assured that all settings will
> retain over Qemu's start-ups and shutdowns. The implementation makes it
> possible to combine the existing "Additional Namespace" implementation with the
> new "Managed Namespaces". Those will coexist with obvious restrictions, like
> both will share the same NsIds space, "static" namespaces cannot be deleted or
> if its NsId specified at Qemu's command line will conflicts with previously
> created one by nvme create-ns (and retained), this will lead to an abort of
> Qemu at its start up.

This looks like a valid approach for a proof of concept, but from a
backend perspective, I'm concerned that this approach might be too
limiting and we won't have a good path forward.

For example, how can we integrate this with snapshots? You expect a
specific filename for the image, but taking an external snapshot means
creating an overlay image with a different name.

How do we migrate storage like this? If the management tool (probably
libvirt) knows about all the namespace images and the config file (!),
it can possibly migrate them individually, but note that while a mirror
job is active, images can't be resized any more.

What if we don't want to use a directory on the local filesystem to
store the images, but some network protocol?

It seems to me that we should define proper block layer APIs for
handling namespaces, and then we can have your implementation as one
possible image driver that supports these APIs, for which we can accept
these limitations for now. At least this would already avoid having
backend logic in the device implementation, and allow us to replace it
with something better later without having to change the design of the
device emulation code.
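
The split suggested above, with the device model talking to a generic
namespace API and the directory-of-images scheme as just one driver behind
it, can be sketched with a C function-pointer table. None of these names
exist in QEMU: NsBackend, NsBackendOps, the ns_backend_*() wrappers and the
toy in-memory driver are invented for illustration, and the negative-errno
return convention follows common QEMU block-layer style.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical namespace-management interface offered by a block driver. */
typedef struct NsBackend NsBackend;

typedef struct NsBackendOps {
    int (*ns_create)(NsBackend *bs, uint32_t nsid, uint64_t bytes);
    int (*ns_delete)(NsBackend *bs, uint32_t nsid);
    int64_t (*ns_length)(NsBackend *bs, uint32_t nsid);
} NsBackendOps;

struct NsBackend {
    const NsBackendOps *ops;
    void *opaque;  /* driver-private state */
};

/* Toy driver standing in for e.g. a directory of per-namespace images. */
#define TOY_MAX_NS 8

typedef struct ToyDriverState {
    bool present[TOY_MAX_NS];
    uint64_t size[TOY_MAX_NS];
} ToyDriverState;

static int toy_create(NsBackend *bs, uint32_t nsid, uint64_t bytes)
{
    ToyDriverState *s = bs->opaque;

    if (nsid == 0 || nsid > TOY_MAX_NS) {
        return -EINVAL;
    }
    if (s->present[nsid - 1]) {
        return -EEXIST;
    }
    s->present[nsid - 1] = true;
    s->size[nsid - 1] = bytes;
    return 0;
}

static int toy_delete(NsBackend *bs, uint32_t nsid)
{
    ToyDriverState *s = bs->opaque;

    if (nsid == 0 || nsid > TOY_MAX_NS || !s->present[nsid - 1]) {
        return -ENOENT;
    }
    s->present[nsid - 1] = false;
    s->size[nsid - 1] = 0;
    return 0;
}

static int64_t toy_length(NsBackend *bs, uint32_t nsid)
{
    ToyDriverState *s = bs->opaque;

    if (nsid == 0 || nsid > TOY_MAX_NS || !s->present[nsid - 1]) {
        return -ENOENT;
    }
    return (int64_t)s->size[nsid - 1];
}

static const NsBackendOps toy_ops = { toy_create, toy_delete, toy_length };

/* What the device model would call: no files or directories visible here. */
static int ns_backend_create(NsBackend *bs, uint32_t nsid, uint64_t bytes)
{
    return bs->ops->ns_create ? bs->ops->ns_create(bs, nsid, bytes) : -ENOTSUP;
}

static int ns_backend_delete(NsBackend *bs, uint32_t nsid)
{
    return bs->ops->ns_delete ? bs->ops->ns_delete(bs, nsid) : -ENOTSUP;
}
```

A snapshot-aware, network-backed or image-format-level driver could then
fill in the same table without changes to the device emulation code.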

Eventually, I think, if we want to have dynamic namespaces properly
supported, they need to be a feature on the image format level, so that
you could keep all namespaces in a single qcow2 file.

Kevin