[PATCH 0/2] NVMe namespace hotplug and drive reconnection support

mr-083 posted 2 patches 2 days, 8 hours ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260409070114.11313-1-matthieu@min.io
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, "Dr. David Alan Gilbert" <dave@treblig.org>, Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Jesper Devantier <foss@defmacro.it>
block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
hmp-commands.hx                | 18 +++++++
hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
hw/nvme/ns.c                   |  1 +
hw/nvme/subsys.c               |  2 +
include/block/block-hmp-cmds.h |  1 +
6 files changed, 166 insertions(+)
[PATCH 0/2] NVMe namespace hotplug and drive reconnection support
Posted by mr-083 2 days, 8 hours ago
This series adds two features that together enable transparent NVMe disk
hot-swap simulation in QEMU, matching the behavior of physical NVMe
drives being pulled and reinserted in the same PCIe slot.

Problem:
Currently, hot-swapping an NVMe disk in QEMU requires removing the
entire NVMe controller via device_del, which causes the Linux guest to
assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
This breaks storage software that tracks drives by device name.

Solution:
Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus, with
proper Asynchronous Event Notification (AEN) so the guest kernel detects
namespace changes. This allows namespace-level hot-swap without removing
the NVMe controller.

Patch 2 adds a drive_insert HMP command that reconnects a host block
device file to an existing guest device after drive_del. This is the
counterpart to drive_del for non-removable devices where
blockdev-change-medium cannot be used.

The recommended hot-swap sequence is:
  1. drive_del <drive-id>          # disconnect backing store
  2. drive_insert <device> <file>  # reconnect backing store
  3. pcie_aer_inject_error <port> SDN  # trigger controller reset

After this sequence, the guest sees the same controller and namespace
names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
recovers transparently via the standard AER recovery path.

Tested with:
- Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
- NVMe subsystem model with multipath disabled
- DirectPV and MinIO AIStor storage stack

mr-083 (2):
  hw/nvme: add namespace hotplug support
  block/monitor: add drive_insert HMP command

 block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
 hmp-commands.hx                | 18 +++++++
 hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                   |  1 +
 hw/nvme/subsys.c               |  2 +
 include/block/block-hmp-cmds.h |  1 +
 6 files changed, 166 insertions(+)

--
2.50.1 (Apple Git-155)
Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
Posted by Stefan Hajnoczi 1 day, 18 hours ago
On Thu, Apr 09, 2026 at 09:01:09AM +0200, mr-083 wrote:
> This series adds two features that together enable transparent NVMe disk
> hot-swap simulation in QEMU, matching the behavior of physical NVMe
> drives being pulled and reinserted in the same PCIe slot.
> 
> Problem:
> Currently, hot-swapping an NVMe disk in QEMU requires removing the
> entire NVMe controller via device_del, which causes the Linux guest to
> assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
> This breaks storage software that tracks drives by device name.

Hi mr-083,
Neat, I was looking for something like this recently!

> Solution:
> Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus, with
> proper Asynchronous Event Notification (AEN) so the guest kernel detects
> namespace changes. This allows namespace-level hot-swap without removing
> the NVMe controller.
> 
> Patch 2 adds a drive_insert HMP command that reconnects a host block
> device file to an existing guest device after drive_del. This is the
> counterpart to drive_del for non-removable devices where
> blockdev-change-medium cannot be used.
> 
> The recommended hot-swap sequence is:
>   1. drive_del <drive-id>          # disconnect backing store
>   2. drive_insert <device> <file>  # reconnect backing store

Is it possible to achieve this with device_del + device_add instead of
introducing a new monitor command?

device_del nvme-ns2
blockdev-del nvme-ns2-blk      (or drive_del)
...
blockdev-add nvme-ns2-blk,...  (or drive_add)
device_add nvme-ns,id=nvme-ns2,nsid=2,drive=nvme-ns2-blk
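The same sequence can be scripted over QMP. A minimal sketch that only builds the JSON messages in order; the image path is a hypothetical placeholder, the "file" driver assumes a raw image, and passing nsid as an integer assumes a QEMU version whose QMP device_add accepts typed properties. It does not open a socket, and a real client would probably want to wait for the DEVICE_DELETED event before re-adding.

```python
import json


def hot_swap_commands(ns_id, node_name, nsid, image_path):
    """Return the QMP messages for a namespace-level swap, in order.

    ns_id / node_name mirror the ids used in the HMP example above;
    image_path is a hypothetical host file used only for illustration.
    """
    remove = [
        # Unplug the namespace device first, then drop its block node.
        {"execute": "device_del", "arguments": {"id": ns_id}},
        {"execute": "blockdev-del", "arguments": {"node-name": node_name}},
    ]
    add = [
        # Assumption: raw image opened directly through the "file" driver.
        {"execute": "blockdev-add",
         "arguments": {"driver": "file",
                       "node-name": node_name,
                       "filename": image_path}},
        # Re-create the namespace with the same id and nsid.
        {"execute": "device_add",
         "arguments": {"driver": "nvme-ns",
                       "id": ns_id,
                       "nsid": nsid,
                       "drive": node_name}},
    ]
    return remove + add


if __name__ == "__main__":
    for cmd in hot_swap_commands("nvme-ns2", "nvme-ns2-blk", 2, "/tmp/disk.img"):
        print(json.dumps(cmd))
```

Each printed line is one QMP message that could be written to the monitor socket after the usual qmp_capabilities handshake.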

>   3. pcie_aer_inject_error <port> SDN  # trigger controller reset

Is NVMe AEN insufficient to get the guest to recognize the Namespace
change? I looked at the Linux NVMe driver code recently and got the
impression it would process changes to the Namespace list upon receiving
the NVMe AEN.

> After this sequence, the guest sees the same controller and namespace
> names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
> recovers transparently via the standard AER recovery path.
> 
> Tested with:
> - Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
> - NVMe subsystem model with multipath disabled
> - DirectPV and MinIO AIStor storage stack
> 
> mr-083 (2):
>   hw/nvme: add namespace hotplug support
>   block/monitor: add drive_insert HMP command
> 
>  block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
>  hmp-commands.hx                | 18 +++++++
>  hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
>  hw/nvme/ns.c                   |  1 +
>  hw/nvme/subsys.c               |  2 +
>  include/block/block-hmp-cmds.h |  1 +
>  6 files changed, 166 insertions(+)
> 
> --
> 2.50.1 (Apple Git-155)
> 
Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
Posted by Matthieu Rolla 1 day, 14 hours ago
Thanks for the review!

> Is it possible to achieve this with device_del + device_add instead of
> introducing a new monitor command?

Yes, device_del + device_add works. I tested it, and the AEN properly
notifies the guest kernel, which rescans and adds or removes the block
device.

However, when filesystems (XFS via DirectPV in our case) are mounted
on the namespace, the old block device number is not reused on re-add.
The kernel's IDA allocator only frees the ID when all references to
the namespace head are released (nvme_free_ns_head), but the stale
XFS mount holds a reference indefinitely.

Without mounted filesystems, the ID is reused correctly (/dev/nvme0n1
stays nvme0n1).
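To illustrate the two cases, here is a toy model (not kernel code) of a lowest-free-ID allocator combined with reference counting. The class and function names are made up for this sketch; the only behavior assumed from the kernel is that the ID is returned to the allocator when the last reference to the namespace head is dropped, and that allocation picks the lowest free ID starting at 1.

```python
class LowestFreeIda:
    """Toy stand-in for an IDA: hands out the lowest unused ID >= min_id."""

    def __init__(self, min_id=1):
        self._min_id = min_id
        self._used = set()

    def alloc(self):
        n = self._min_id
        while n in self._used:
            n += 1
        self._used.add(n)
        return n

    def free(self, n):
        self._used.remove(n)


class NsHead:
    """Toy namespace head: the ID is released only when refs reach zero."""

    def __init__(self, ida):
        self.ida = ida
        self.instance = ida.alloc()
        self.refs = 1  # reference held by the namespace itself

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.ida.free(self.instance)


def replug(mounted):
    """Remove a namespace and add a new one; return (old_id, new_id)."""
    ida = LowestFreeIda()
    old = NsHead(ida)
    if mounted:
        old.get()  # a mounted filesystem pins the head
    old.put()      # namespace removed by the hot-unplug
    new = NsHead(ida)
    return old.instance, new.instance


if __name__ == "__main__":
    print("unmounted:", replug(mounted=False))
    print("mounted:  ", replug(mounted=True))
```

In the unmounted case the old ID is freed before the new head is created, so it is handed out again; in the mounted case the extra reference keeps it allocated and the new head gets the next ID.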

> Is NVMe AEN insufficient to get the guest to recognize the Namespace
> change?

You're right, AEN is sufficient. I confirmed that the Linux NVMe
driver processes NVME_AER_NOTICE_NS_CHANGED and rescans automatically.
The SDN was unnecessary.
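For reference, a small sketch of how the result dword of an Asynchronous Event Request completion can be decoded to spot this event. The field layout follows the NVMe base specification (event type in bits 2:0, event information in bits 15:8, log page identifier in bits 23:16); the helper names and the constant names other than NVME_AER_NOTICE_NS_CHANGED are chosen for this sketch and are not taken from the driver.

```python
AER_TYPE_NOTICE = 0x2              # Asynchronous Event Type: Notice
NVME_AER_NOTICE_NS_CHANGED = 0x00  # Notice info: Namespace Attribute Changed
LOG_PAGE_CHANGED_NS_LIST = 0x04    # Changed Namespace List log page


def decode_aen(dw0):
    """Split completion dword 0 of an AER into its three fields."""
    return {
        "type": dw0 & 0x7,
        "info": (dw0 >> 8) & 0xFF,
        "log_page": (dw0 >> 16) & 0xFF,
    }


def is_ns_changed(dw0):
    """True if the event is a namespace-attribute-changed notice."""
    ev = decode_aen(dw0)
    return (ev["type"] == AER_TYPE_NOTICE
            and ev["info"] == NVME_AER_NOTICE_NS_CHANGED)


def encode_aen(event_type, info, log_page):
    """Inverse of decode_aen, handy for building test values."""
    return (log_page << 16) | (info << 8) | event_type


if __name__ == "__main__":
    dw0 = encode_aen(AER_TYPE_NOTICE, NVME_AER_NOTICE_NS_CHANGED,
                     LOG_PAGE_CHANGED_NS_LIST)
    print(hex(dw0), decode_aen(dw0), is_ns_changed(dw0))
```

On such a notice the host is expected to read the Changed Namespace List log page and rescan the listed namespaces, which is why no controller reset is needed.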

I dropped Patch 2 (drive_insert) and sent v2 with just the
namespace hotplug support. The commit message now documents the
correct device_del + device_add flow.

Here is the link:
https://mail.gnu.org/archive/html/qemu-devel/2026-04/msg01507.html

Thanks
