[PATCH 0/4] [RFC] hw/nvme: add basic live migration support

Alexander Mikhalitsyn posted 4 patches 1 month, 3 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260217152517.271422-1-alexander@mihalicyn.com
Maintainers: Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Jesper Devantier <foss@defmacro.it>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
There is a newer version of this series
[PATCH 0/4] [RFC] hw/nvme: add basic live migration support
Posted by Alexander Mikhalitsyn 1 month, 3 weeks ago
From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Dear friends,

This patchset adds basic live migration support for the
QEMU-emulated NVMe device.

The implementation has some limitations:
- only one NVMe namespace is supported
- SMART counters are not preserved
- CMB is not supported
- PMR is not supported
- SPDM is not supported
- SR-IOV is not supported
- AERs are not fully supported

I believe I can add support for these in later versions of the patchset, or
separately on demand (when a use case appears). But I wanted to share this
first version as an RFC to get feedback in case I'm approaching
it wrong.

Kind regards,
Alex

Alexander Mikhalitsyn (4):
  hw/nvme: add migration blockers for non-supported cases
  hw/nvme: split nvme_init_sq/nvme_init_cq into helpers
  migration: add VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_ALLOC
  hw/nvme: add basic live migration support

 hw/nvme/ctrl.c              | 505 +++++++++++++++++++++++++++++++++---
 hw/nvme/nvme.h              |   5 +
 hw/nvme/trace-events        |   9 +
 include/migration/vmstate.h |  21 ++
 migration/vmstate-types.c   |  88 +++++++
 5 files changed, 598 insertions(+), 30 deletions(-)

-- 
2.47.3
Re: [PATCH 0/4] [RFC] hw/nvme: add basic live migration support
Posted by Klaus Jensen 1 month, 1 week ago
On Feb 17 16:25, Alexander Mikhalitsyn wrote:
> From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> 
> Dear friends,
> 
> This patchset adds basic live migration support for the
> QEMU-emulated NVMe device.
> 
> The implementation has some limitations:
> - only one NVMe namespace is supported
> - SMART counters are not preserved
> - CMB is not supported
> - PMR is not supported
> - SPDM is not supported
> - SR-IOV is not supported
> - AERs are not fully supported
> 
> I believe I can add support for these in later versions of the patchset, or
> separately on demand (when a use case appears). But I wanted to share this
> first version as an RFC to get feedback in case I'm approaching
> it wrong.
> 

Hi Alex,

Nice work!

As you have already identified, there are a lot of features that are
non-trivial to implement migration for. I am completely in favor of only
supporting migration on a very limited feature set (i.e., don't worry
about CMB, PMR, SPDM, SR-IOV, ZNS/FDP and so on). Focus on the bare
mandatory requirements. It would be preferable if the "is migration
possible?" test is an allowlist instead of a denylist. That makes sure
we don't add a feature down the road and forget to add it to the
denylist. I'm not 100% sure how to go about that at this point.
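A minimal sketch of the allowlist idea in C, assuming a hypothetical bitmask of enabled controller features (the FEAT_* names, MIGRATABLE_FEATURES and migration_possible() are illustrative and not taken from the patch). Anything not explicitly listed blocks migration, so a feature added later is covered by default:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; not the real NvmeCtrl fields. */
enum {
    FEAT_BASIC_IO = 1u << 0, /* admin + I/O queues, single namespace */
    FEAT_CMB      = 1u << 1,
    FEAT_PMR      = 1u << 2,
    FEAT_SPDM     = 1u << 3,
    FEAT_SRIOV    = 1u << 4,
    FEAT_ZONED    = 1u << 5,
    FEAT_FDP      = 1u << 6,
};

/* Allowlist: only features whose state the migration code handles. */
#define MIGRATABLE_FEATURES (FEAT_BASIC_IO)

/*
 * Returns true only if every enabled feature is on the allowlist.
 * Any bit not listed above, including bits defined in the future,
 * makes this return false, i.e. migration stays blocked.
 */
static bool migration_possible(uint32_t enabled_features)
{
    return (enabled_features & ~(uint32_t)MIGRATABLE_FEATURES) == 0;
}
```

With this shape, supporting a new feature means adding its bit to MIGRATABLE_FEATURES together with the state it needs migrated; forgetting to do so keeps migration blocked rather than silently losing state.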

AERs are something we need to deal with. We should not drop events. I
don't think I have a problem with aborting enqueued AERs, but dropping the
events is not acceptable.
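A minimal sketch of that policy, using a hypothetical, simplified AER state (AsyncEvent, AerState and aer_pre_save() are illustrative, not QEMU's actual structures): outstanding AER commands are collected so the caller can complete them as aborted, while the queue of not-yet-reported events is left intact for serialization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS   8
#define MAX_AER_REQS 4

/* Hypothetical, simplified model of AER-related controller state. */
typedef struct {
    uint8_t type;
    uint8_t info;
    uint8_t log_page;
} AsyncEvent;

typedef struct {
    AsyncEvent events[MAX_EVENTS];   /* events not yet reported to the host */
    size_t nr_events;
    uint16_t aer_cids[MAX_AER_REQS]; /* command IDs of outstanding AERs */
    size_t nr_aer_reqs;
} AerState;

/*
 * Hypothetical pre-save step: copy the command IDs of all outstanding AER
 * commands into aborted_cids (the caller completes them with an "aborted"
 * status) and forget them. events[] and nr_events are deliberately left
 * untouched so they can be migrated and reported on the destination.
 * Returns the number of aborted commands.
 */
static size_t aer_pre_save(AerState *s, uint16_t aborted_cids[MAX_AER_REQS])
{
    assert(s->nr_aer_reqs <= MAX_AER_REQS);

    size_t n = s->nr_aer_reqs;
    for (size_t i = 0; i < n; i++) {
        aborted_cids[i] = s->aer_cids[i];
    }
    s->nr_aer_reqs = 0;
    return n;
}
```

Whether and when the guest driver reissues the aborted commands is driver-dependent and not modeled here; the point is only that the event queue survives.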

Finally, this at a minimum needs some kind of simple smoke test to catch
regressions. Preferably as part of the QEMU test suite itself, but if
that is hard to achieve, then I may be ok with an out-of-tree test that
maintainers can use.


Cheers,
Klaus
Re: [PATCH 0/4] [RFC] hw/nvme: add basic live migration support
Posted by Alexander Mikhalitsyn 4 weeks, 1 day ago
On Fri, 27 Feb 2026 at 10:59, Klaus Jensen <its@irrelevant.dk> wrote:
>
> On Feb 17 16:25, Alexander Mikhalitsyn wrote:
> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> >
> > Dear friends,
> >
> > This patchset adds basic live migration support for the
> > QEMU-emulated NVMe device.
> >
> > The implementation has some limitations:
> > - only one NVMe namespace is supported
> > - SMART counters are not preserved
> > - CMB is not supported
> > - PMR is not supported
> > - SPDM is not supported
> > - SR-IOV is not supported
> > - AERs are not fully supported
> >
> > I believe I can add support for these in later versions of the patchset, or
> > separately on demand (when a use case appears). But I wanted to share this
> > first version as an RFC to get feedback in case I'm approaching
> > it wrong.
> >
>
> Hi Alex,
>
> Nice work!
>
> As you have already identified, there are a lot of features that are
> non-trivial to implement migration for. I am completely in favor of only
> supporting migration on a very limited feature set (i.e., don't worry
> about CMB, PMR, SPDM, SR-IOV, ZNS/FDP and so on). Focus on the bare
> mandatory requirements. It would be preferable if the "is migration
> possible?" test is an allowlist instead of a denylist. That makes sure
> we don't add a feature down the road and forget to add it to the
> denylist. I'm not 100% sure how to go about that at this point.
>
> AERs are something we need to deal with. We should not drop events. I
> don't think I have a problem with aborting enqueued AERs, but dropping the
> events is not acceptable.
>
> Finally, this at a minimum needs some kind of simple smoke test to catch
> regressions. Preferably as part of the QEMU test suite itself, but if
> that is hard to achieve, then I may be ok with an out-of-tree test that
> maintainers can use.

Hi Klaus,

JFYI: I've just sent version 4 of this patchset, which includes all the
requested changes:
- AER handling
- a new automated test
- better feature filtering

Kind regards,
Alex

>
>
> Cheers,
> Klaus
Re: [PATCH 0/4] [RFC] hw/nvme: add basic live migration support
Posted by Alexander Mikhalitsyn 1 month, 1 week ago
On Fri, 27 Feb 2026 at 10:59, Klaus Jensen <its@irrelevant.dk> wrote:
>
> On Feb 17 16:25, Alexander Mikhalitsyn wrote:
> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> >
> > Dear friends,
> >
> > This patchset adds basic live migration support for the
> > QEMU-emulated NVMe device.
> >
> > The implementation has some limitations:
> > - only one NVMe namespace is supported
> > - SMART counters are not preserved
> > - CMB is not supported
> > - PMR is not supported
> > - SPDM is not supported
> > - SR-IOV is not supported
> > - AERs are not fully supported
> >
> > I believe I can add support for these in later versions of the patchset, or
> > separately on demand (when a use case appears). But I wanted to share this
> > first version as an RFC to get feedback in case I'm approaching
> > it wrong.
> >
>
> Hi Alex,

Hi Klaus,

>
> Nice work!

Thank you ;-)

>
> As you have already identified, there are a lot of features that are
> non-trivial to implement migration for. I am completely in favor of only
> supporting migration on a very limited feature set (i.e., don't worry
> about CMB, PMR, SPDM, SR-IOV, ZNS/FDP and so on). Focus on the bare
> mandatory requirements. It would be preferable if the "is migration
> possible?" test is an allowlist instead of a denylist. That makes sure
> we don't add a feature down the road and forget to add it to the
> denylist. I'm not 100% sure how to go about that at this point.

Yeah, I agree. I'll think about it.

>
> AERs are something we need to deal with. We should not drop events. I
> don't think I have a problem with aborting enqueued AERs, but dropping the
> events is not acceptable.

Yes, this is something I'm working on right now, and this week I'll
send a v2 with graceful AER handling.

>
> Finally, this at a minimum needs some kind of simple smoke test to catch
> regressions. Preferably as part of the QEMU test suite itself, but if
> that is hard to achieve, then I may be ok with an out-of-tree test that
> maintainers can use.

Sure. My current tests were simple: I ran fio in a screen session and
then did a manual suspend/resume:

time fio --name=nvme-verify \
    --filename=/dev/nvme0n1 \
    --size=5G \
    --rw=randwrite \
    --bs=4k \
    --iodepth=16 \
    --numjobs=1 \
    --direct=0 \
    --ioengine=io_uring \
    --verify=crc32c \
    --verify_fatal=1

I also had the idea of checking that a Windows VM survives
suspend/resume.

I'll take a look at how the QEMU test suite is organized and try to come
up with something.

Thank you very much for looking into this stuff, Klaus!

Kind regards,
Alex

>
>
> Cheers,
> Klaus