[RFC 0/4] scsi: persistent reservation live migration

Stefan Hajnoczi posted 4 patches 3 weeks, 4 days ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260113215320.566595-1-stefanha@redhat.com
Maintainers: Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>, Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>
[RFC 0/4] scsi: persistent reservation live migration
Posted by Stefan Hajnoczi 3 weeks, 4 days ago
Live migration does not work for SCSI Persistent Reservations acquired on
scsi-block devices. This patch series migrates the reservation key and
reservation type so that the destination QEMU can take over the persistent
reservation with the PREEMPT service action upon live migration.
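To make the PREEMPT takeover concrete, here is a minimal C sketch of the command a destination could send. It assumes the SPC-4 layout of the 10-byte PERSISTENT RESERVE OUT CDB and the basic 24-byte parameter list. It also assumes that the destination's I_T nexus has already registered the migrated key (for example, with REGISTER AND IGNORE EXISTING KEY); the cover letter does not say how the series handles registration. `build_pr_out_preempt()` and the macro names are hypothetical and are not QEMU's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SPC values for PERSISTENT RESERVE OUT (assumed SPC-4 layout). */
#define PR_OUT_OPCODE            0x5f
#define PR_OUT_SA_PREEMPT        0x04
#define PR_SCOPE_LU              0x0
#define PR_TYPE_WRITE_EXCLUSIVE  0x1
#define PR_TYPE_EXCLUSIVE_ACCESS 0x3
#define PR_OUT_CDB_LEN           10
#define PR_OUT_PARAM_LEN         24  /* basic parameter list, no TransportIDs */

/* SCSI multi-byte fields are big-endian. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Hypothetical helper: build a PREEMPT request from an I_T nexus registered
 * with own_key, taking over the reservation held under holder_key and
 * establishing a new reservation of the given type.
 */
static void build_pr_out_preempt(uint8_t cdb[PR_OUT_CDB_LEN],
                                 uint8_t param[PR_OUT_PARAM_LEN],
                                 uint64_t own_key, uint64_t holder_key,
                                 uint8_t type)
{
    assert(type <= 0x0f); /* TYPE is a 4-bit field */

    memset(cdb, 0, PR_OUT_CDB_LEN);
    cdb[0] = PR_OUT_OPCODE;
    cdb[1] = PR_OUT_SA_PREEMPT;                    /* SERVICE ACTION, bits 4:0 */
    cdb[2] = (uint8_t)((PR_SCOPE_LU << 4) | type); /* SCOPE 7:4, TYPE 3:0 */
    cdb[8] = PR_OUT_PARAM_LEN; /* PARAMETER LIST LENGTH (bytes 5-8, big-endian) */

    memset(param, 0, PR_OUT_PARAM_LEN);
    put_be64(&param[0], own_key);    /* RESERVATION KEY */
    put_be64(&param[8], holder_key); /* SERVICE ACTION RESERVATION KEY */
    /* Bytes 16-23 (APTPL and other flags) stay zero in this sketch. */
}
```

Passing the same value for `own_key` and `holder_key` is how a takeover of the guest's own reservation would look under these assumptions: the SERVICE ACTION RESERVATION KEY names the holder being preempted, and the new reservation is established for the I_T nexus that sends the command.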

The approach involves snooping PERSISTENT RESERVE OUT replies and tracking the
scsi-block device's current reservation key and reservation type. In most cases
this involves no additional SCSI commands. This approach isn't perfect: if
another machine modifies the reservation on the physical LUN, then QEMU's state
becomes stale. Persistent reservations are inherently cooperative, so this is
acceptable as long as real applications don't run into problems.
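The snooping described above can be sketched as a small state update that runs when a PERSISTENT RESERVE OUT command completes. This is an illustrative C sketch under simplifying assumptions, not the code in the series: `PRState` and `pr_snoop_out()` are hypothetical names, only commands that complete with GOOD status change the state, and corner cases such as a PREEMPT that only removes other registrants without touching the reservation are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PERSISTENT RESERVE OUT service actions (SPC). */
enum {
    PRO_REGISTER                         = 0x00,
    PRO_RESERVE                          = 0x01,
    PRO_RELEASE                          = 0x02,
    PRO_CLEAR                            = 0x03,
    PRO_PREEMPT                          = 0x04,
    PRO_PREEMPT_AND_ABORT                = 0x05,
    PRO_REGISTER_AND_IGNORE_EXISTING_KEY = 0x06,
};

#define SCSI_STATUS_GOOD 0x00

/* Hypothetical tracked state; 0 means "none" for both fields. */
typedef struct {
    uint64_t key; /* key this initiator is registered with */
    uint8_t type; /* reservation type held, 0 if no reservation */
} PRState;

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Update tracked state after a PERSISTENT RESERVE OUT command completes. */
static void pr_snoop_out(PRState *s, const uint8_t cdb[10],
                         const uint8_t *param, size_t param_len,
                         uint8_t status)
{
    assert(param != NULL || param_len == 0);

    /* Failed commands (e.g. RESERVATION CONFLICT) change nothing. */
    if (status != SCSI_STATUS_GOOD || param_len < 24) {
        return;
    }

    uint8_t sa = cdb[1] & 0x1f;
    uint8_t type = cdb[2] & 0x0f;
    uint64_t sa_key = get_be64(&param[8]); /* SERVICE ACTION RESERVATION KEY */

    switch (sa) {
    case PRO_REGISTER:
    case PRO_REGISTER_AND_IGNORE_EXISTING_KEY:
        s->key = sa_key; /* a zero key unregisters */
        if (sa_key == 0) {
            s->type = 0; /* simplification: unregistering drops our reservation */
        }
        break;
    case PRO_RESERVE:
    case PRO_PREEMPT:           /* simplification: assumes the preempted */
    case PRO_PREEMPT_AND_ABORT: /* key held the reservation */
        s->type = type;
        break;
    case PRO_RELEASE:
        s->type = 0;
        break;
    case PRO_CLEAR:
        s->key = 0;
        s->type = 0;
        break;
    default:
        break; /* other service actions are ignored in this sketch */
    }
}
```

Because the state only changes on GOOD status, a command that fails with RESERVATION CONFLICT leaves the tracked key and type untouched, and changes made by other machines are never seen, which is the staleness the cover letter accepts.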

I am also working on a test suite called pr-tests that runs sg_persist(8)
commands across multiple machines in order to exercise various scenarios:
https://gitlab.com/stefanha/pr-tests

Stefan Hajnoczi (4):
  scsi: track SCSI reservation state for live migration
  scsi: generalize scsi_SG_IO_FROM_DEV() to scsi_SG_IO()
  scsi: add error reporting to scsi_SG_IO()
  scsi: save/load SCSI reservation state

 include/hw/scsi/scsi.h   |  15 +-
 include/scsi/constants.h |  21 +++
 hw/core/machine.c        |   4 +-
 hw/scsi/scsi-bus.c       |   3 +
 hw/scsi/scsi-disk.c      |  53 +++++++-
 hw/scsi/scsi-generic.c   | 286 +++++++++++++++++++++++++++++++++++++--
 hw/scsi/trace-events     |   2 +
 7 files changed, 369 insertions(+), 15 deletions(-)

Re: [RFC 0/4] scsi: persistent reservation live migration
Posted by Paolo Bonzini 3 weeks, 2 days ago
On 1/13/26 22:53, Stefan Hajnoczi wrote:
> Live migration does not work for SCSI Persistent Reservations acquired on
> scsi-block devices. This patch series migrates the reservation key and
> reservation type so that the destination QEMU can take over the persistent
> reservation with the PREEMPT service action upon live migration.
> 
> The approach involves snooping PERSISTENT RESERVE OUT replies and tracking the
> scsi-block device's current reservation key and reservation type. In most cases
> this involves no additional SCSI commands. This approach isn't perfect: if
> another machine modifies the reservation on the physical LUN, then QEMU's state
> becomes stale. Persistent reservations are inherently cooperative, so this is
> acceptable as long as real applications don't run into problems.

One issue is that this would not transfer reservations made by a
previous invocation of the VM.  Are you assuming that the restarted VM
won't assume it still has the reservation?  I think this is fine, but
it has to be documented, or maybe QEMU could issue a PR IN command at
startup?

> I am also working on a test suite called pr-tests that runs sg_persist(8)
> commands across multiple machines in order to exercise various scenarios:
> https://gitlab.com/stefanha/pr-tests

Thank you so much for that!

Paolo
Re: [RFC 0/4] scsi: persistent reservation live migration
Posted by Stefan Hajnoczi 3 weeks, 2 days ago
On Thu, Jan 15, 2026 at 06:11:37PM +0100, Paolo Bonzini wrote:
> One issue is that this would not transfer reservations done from a previous
> invocation of the VM.  Are you assuming that the restarted VM won't assume
> to still have the reservation?  I think this is fine, but it has to be
> documented, or maybe QEMU could issue a PR IN command at startup?

Good point. The reason for this limitation is that I don't see a
reliable way to detect reservation keys and reservations that belong to
a guest. This is why the patch series uses the snooping approach.

The basic READ KEYS and READ RESERVATION service actions for PERSISTENT
RESERVE IN only report the list of reservation keys that have been
registered and the key of the current reservation holder. There is no
way of tying that information back to the guest or even the host. It
would be necessary to know the guest's unique reservation key, but that
can be chosen by the guest at runtime. In addition, keys are not
guaranteed to be unique, so there is no way to tell whether this host is
actually the reservation holder without potentially destructive probing
(e.g., attempting a command and seeing if it results in a RESERVATION
CONFLICT).
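To illustrate how little READ RESERVATION reveals, the following hypothetical C parser decodes its parameter data. It assumes the SPC-4 layout: an 8-byte header with PRGENERATION and ADDITIONAL LENGTH, followed by a 16-byte reservation descriptor only when a reservation exists. `ReadReservation` and `parse_read_reservation()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t generation; /* PRGENERATION */
    bool reserved;       /* true if a persistent reservation exists */
    uint64_t key;        /* holder's reservation key, not an initiator identity */
    uint8_t scope;
    uint8_t type;
} ReadReservation;

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int parse_read_reservation(const uint8_t *buf, size_t len,
                                  ReadReservation *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 8) {
        return -1;
    }
    out->generation = get_be32(&buf[0]);
    out->reserved = get_be32(&buf[4]) >= 16; /* ADDITIONAL LENGTH: 0 or 16 */
    out->key = 0;
    out->scope = 0;
    out->type = 0;
    if (!out->reserved) {
        return 0;
    }
    if (len < 24) {
        return -1;
    }
    out->key = get_be64(&buf[8]);  /* RESERVATION KEY */
    out->scope = buf[21] >> 4;     /* SCOPE, bits 7:4 */
    out->type = buf[21] & 0x0f;    /* TYPE, bits 3:0 */
    return 0;
}
```

Comparing the parsed key with a key QEMU remembers cannot distinguish "this host holds the reservation" from "another initiator registered the same key and holds it", which is the ambiguity described above.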

The optional READ FULL STATUS service action improves the situation by
identifying the initiator together with the reservation holder's key,
but it is not universally available. The hardware I'm testing on doesn't
implement this service action.
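By contrast, READ FULL STATUS returns one descriptor per registered I_T nexus, each carrying an R_HOLDER bit and a TransportID that names the initiator. A minimal C sketch that walks those descriptors and returns the holder follows, assuming the SPC-4 full status descriptor layout (24 fixed bytes followed by a variable-length TransportID). `PRHolder` and `find_holder()` are hypothetical names, and comparing the returned TransportID against the host's own initiator port is left out because obtaining that identifier is transport-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t key;
    uint8_t type;
    const uint8_t *transport_id; /* points into the caller's buffer */
    uint32_t transport_id_len;
} PRHolder;

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Returns 1 if a holder was found, 0 if there is none, and -1 if the data
 * is malformed or truncated (retry with a larger allocation length).
 */
static int find_holder(const uint8_t *buf, size_t len, PRHolder *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 8) {
        return -1;
    }
    size_t end = 8 + (size_t)get_be32(&buf[4]); /* header + ADDITIONAL LENGTH */
    if (end > len) {
        return -1;
    }

    size_t off = 8;
    while (off + 24 <= end) {
        const uint8_t *d = &buf[off];
        uint32_t tid_len = get_be32(&d[20]); /* ADDITIONAL DESCRIPTOR LENGTH */
        if (tid_len > end - off - 24) {
            return -1; /* TransportID runs past the reported data */
        }
        /* For all-registrants types every registrant is a holder; the
         * first one found is returned. */
        if (d[12] & 0x01) { /* R_HOLDER */
            out->key = get_be64(&d[0]);
            out->type = d[13] & 0x0f;
            out->transport_id = &d[24];
            out->transport_id_len = tid_len;
            return 1;
        }
        off += 24 + (size_t)tid_len;
    }
    return 0;
}
```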

Therefore, the assumption is that the guest configures persistent
reservations itself and, if it is shut down, reclaims the reservation
when it starts up again. It also means that an administrator cannot
configure persistent reservations outside the guest and expect live
migration to move those persistent reservations with the guest.

The clean way to handle persistent reservation migration is using Fibre
Channel N_Port ID Virtualization (NPIV) so that each guest is a distinct
initiator that can migrate to another host without disturbing the
persistent reservation state. NPIV requires the storage administrator to
set up the initiators and their zoning, which few users take the trouble
to do.

Having said all this, if someone knows about a better way or I'm wrong
about how this works, please let me know!

Stefan
Re: [RFC 0/4] scsi: persistent reservation live migration
Posted by Peter Krempa 3 weeks, 2 days ago
On Tue, Jan 13, 2026 at 16:53:15 -0500, Stefan Hajnoczi wrote:
> Live migration does not work for SCSI Persistent Reservations acquired on
> scsi-block devices. This patch series migrates the reservation key and
> reservation type so that the destination QEMU can take over the persistent
> reservation with the PREEMPT service action upon live migration.
> 
> The approach involves snooping PERSISTENT RESERVE OUT replies and tracking the
> scsi-block device's current reservation key and reservation type. In most cases
> this involves no additional SCSI commands. This approach isn't perfect: if
> another machine modifies the reservation on the physical LUN, then QEMU's state
> becomes stale. Persistent reservations are inherently cooperative, so this is
> acceptable as long as real applications don't run into problems.
> 
> I am also working on a test suite called pr-tests that runs sg_persist(8)
> commands across multiple machines in order to exercise various scenarios:
> https://gitlab.com/stefanha/pr-tests

I've also prepared libvirt RFC patches adding support for the feature:

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/KBZDAIQWFILAC4USJY3C3TDPYHI6K5WK/