[PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"

Stefano Garzarella posted 1 patch 9 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230705071523.15496-1-sgarzare@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>, "Michael S. Tsirkin" <mst@redhat.com>
include/hw/scsi/scsi.h |  1 -
hw/scsi/scsi-bus.c     | 18 ------------------
hw/scsi/virtio-scsi.c  |  2 --
3 files changed, 21 deletions(-)
[PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 3 weeks ago
This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.

That commit causes several problems in Linux as described in the BZ.
In particular, after a while, other devices on the bus are no longer
usable even if those devices are not affected by the hotunplug.
This may be a problem in Linux, but we have not been able to identify
it so far. So better to revert this patch until we find a solution.

Also, Oracle, which initially proposed this patch for a problem with
Solaris, seems to have already reversed it downstream:
    https://linux.oracle.com/errata/ELSA-2023-12065.html

Suggested-by: Thomas Huth <thuth@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Cc: qemu-stable@nongnu.org
Cc: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/hw/scsi/scsi.h |  1 -
 hw/scsi/scsi-bus.c     | 18 ------------------
 hw/scsi/virtio-scsi.c  |  2 --
 3 files changed, 21 deletions(-)

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index e2bb1a2fbf..7c8adf10b1 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
                                       BlockdevOnError rerror,
                                       BlockdevOnError werror,
                                       const char *serial, Error **errp);
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
 void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
 
 SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f80f4cb4fc..42a915f8b7 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
     return (sense.asc << 8) | sense.ascq;
 }
 
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
-{
-    int prec1, prec2;
-    if (sense.key != UNIT_ATTENTION) {
-        return;
-    }
-
-    /*
-     * Override a pre-existing unit attention condition, except for a more
-     * important reset condition.
-     */
-    prec1 = scsi_ua_precedence(bus->unit_attention);
-    prec2 = scsi_ua_precedence(sense);
-    if (prec2 < prec1) {
-        bus->unit_attention = sense;
-    }
-}
-
 void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
 {
     int prec1, prec2;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 45b95ea070..1f56607100 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
 
         virtio_scsi_acquire(s);
         virtio_scsi_push_event(s, &info);
-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
         virtio_scsi_release(s);
     }
 }
@@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
         virtio_scsi_acquire(s);
         virtio_scsi_push_event(s, &info);
-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
         virtio_scsi_release(s);
     }
 }
-- 
2.41.0
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Michael S. Tsirkin 8 months, 4 weeks ago
On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> 
> That commit causes several problems in Linux as described in the BZ.
> In particular, after a while, other devices on the bus are no longer
> usable even if those devices are not affected by the hotunplug.
> This may be a problem in Linux, but we have not been able to identify
> it so far. So better to revert this patch until we find a solution.
> 
> Also, Oracle, which initially proposed this patch for a problem with
> Solaris, seems to have already reversed it downstream:
>     https://linux.oracle.com/errata/ELSA-2023-12065.html
> 
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> Cc: qemu-stable@nongnu.org
> Cc: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

scsi maintainers, what should be done about this patch?
we don't want a regression in the release ...
revert for now and think what's the right thing to do is
after the release?


> ---
>  include/hw/scsi/scsi.h |  1 -
>  hw/scsi/scsi-bus.c     | 18 ------------------
>  hw/scsi/virtio-scsi.c  |  2 --
>  3 files changed, 21 deletions(-)
> 
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index e2bb1a2fbf..7c8adf10b1 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>                                        BlockdevOnError rerror,
>                                        BlockdevOnError werror,
>                                        const char *serial, Error **errp);
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>  
>  SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index f80f4cb4fc..42a915f8b7 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>      return (sense.asc << 8) | sense.ascq;
>  }
>  
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> -{
> -    int prec1, prec2;
> -    if (sense.key != UNIT_ATTENTION) {
> -        return;
> -    }
> -
> -    /*
> -     * Override a pre-existing unit attention condition, except for a more
> -     * important reset condition.
> -     */
> -    prec1 = scsi_ua_precedence(bus->unit_attention);
> -    prec2 = scsi_ua_precedence(sense);
> -    if (prec2 < prec1) {
> -        bus->unit_attention = sense;
> -    }
> -}
> -
>  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>  {
>      int prec1, prec2;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..1f56607100 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> -- 
> 2.41.0
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 8 months, 4 weeks ago
Hi Michael,

On Thu, Aug 3, 2023 at 10:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> > This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> >
> > That commit causes several problems in Linux as described in the BZ.
> > In particular, after a while, other devices on the bus are no longer
> > usable even if those devices are not affected by the hotunplug.
> > This may be a problem in Linux, but we have not been able to identify
> > it so far. So better to revert this patch until we find a solution.
> >
> > Also, Oracle, which initially proposed this patch for a problem with
> > Solaris, seems to have already reversed it downstream:
> >     https://linux.oracle.com/errata/ELSA-2023-12065.html
> >
> > Suggested-by: Thomas Huth <thuth@redhat.com>
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> > Cc: qemu-stable@nongnu.org
> > Cc: Mark Kanda <mark.kanda@oracle.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> scsi maintainers, what should be done about this patch?
> we don't want a regression in the release ...
> revert for now and think what's the right thing to do is
> after the release?

We found the issue and fixed it in this release with commit 9472083e64
("scsi: fetch unit attention when creating the request").

So we don't need to revert it anymore.

Thanks,
Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Michael S. Tsirkin 8 months, 4 weeks ago
On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> 
> That commit causes several problems in Linux as described in the BZ.
> In particular, after a while, other devices on the bus are no longer
> usable even if those devices are not affected by the hotunplug.
> This may be a problem in Linux, but we have not been able to identify
> it so far. So better to revert this patch until we find a solution.
> 
> Also, Oracle, which initially proposed this patch for a problem with
> Solaris, seems to have already reversed it downstream:
>     https://linux.oracle.com/errata/ELSA-2023-12065.html
> 
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> Cc: qemu-stable@nongnu.org
> Cc: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

OK guys we are reverting this?

> ---
>  include/hw/scsi/scsi.h |  1 -
>  hw/scsi/scsi-bus.c     | 18 ------------------
>  hw/scsi/virtio-scsi.c  |  2 --
>  3 files changed, 21 deletions(-)
> 
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index e2bb1a2fbf..7c8adf10b1 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>                                        BlockdevOnError rerror,
>                                        BlockdevOnError werror,
>                                        const char *serial, Error **errp);
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>  
>  SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index f80f4cb4fc..42a915f8b7 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>      return (sense.asc << 8) | sense.ascq;
>  }
>  
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> -{
> -    int prec1, prec2;
> -    if (sense.key != UNIT_ATTENTION) {
> -        return;
> -    }
> -
> -    /*
> -     * Override a pre-existing unit attention condition, except for a more
> -     * important reset condition.
> -     */
> -    prec1 = scsi_ua_precedence(bus->unit_attention);
> -    prec2 = scsi_ua_precedence(sense);
> -    if (prec2 < prec1) {
> -        bus->unit_attention = sense;
> -    }
> -}
> -
>  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>  {
>      int prec1, prec2;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..1f56607100 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> -- 
> 2.41.0
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Michael S. Tsirkin 8 months, 4 weeks ago
On Thu, Aug 03, 2023 at 03:36:44PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> > This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> > 
> > That commit causes several problems in Linux as described in the BZ.
> > In particular, after a while, other devices on the bus are no longer
> > usable even if those devices are not affected by the hotunplug.
> > This may be a problem in Linux, but we have not been able to identify
> > it so far. So better to revert this patch until we find a solution.
> > 
> > Also, Oracle, which initially proposed this patch for a problem with
> > Solaris, seems to have already reversed it downstream:
> >     https://linux.oracle.com/errata/ELSA-2023-12065.html
> > 
> > Suggested-by: Thomas Huth <thuth@redhat.com>
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> > Cc: qemu-stable@nongnu.org
> > Cc: Mark Kanda <mark.kanda@oracle.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> OK guys we are reverting this?


ignore me pls, replied to a wrong patch.

> > ---
> >  include/hw/scsi/scsi.h |  1 -
> >  hw/scsi/scsi-bus.c     | 18 ------------------
> >  hw/scsi/virtio-scsi.c  |  2 --
> >  3 files changed, 21 deletions(-)
> > 
> > diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> > index e2bb1a2fbf..7c8adf10b1 100644
> > --- a/include/hw/scsi/scsi.h
> > +++ b/include/hw/scsi/scsi.h
> > @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
> >                                        BlockdevOnError rerror,
> >                                        BlockdevOnError werror,
> >                                        const char *serial, Error **errp);
> > -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
> >  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
> >  
> >  SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > index f80f4cb4fc..42a915f8b7 100644
> > --- a/hw/scsi/scsi-bus.c
> > +++ b/hw/scsi/scsi-bus.c
> > @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
> >      return (sense.asc << 8) | sense.ascq;
> >  }
> >  
> > -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> > -{
> > -    int prec1, prec2;
> > -    if (sense.key != UNIT_ATTENTION) {
> > -        return;
> > -    }
> > -
> > -    /*
> > -     * Override a pre-existing unit attention condition, except for a more
> > -     * important reset condition.
> > -     */
> > -    prec1 = scsi_ua_precedence(bus->unit_attention);
> > -    prec2 = scsi_ua_precedence(sense);
> > -    if (prec2 < prec1) {
> > -        bus->unit_attention = sense;
> > -    }
> > -}
> > -
> >  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
> >  {
> >      int prec1, prec2;
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 45b95ea070..1f56607100 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >  
> >          virtio_scsi_acquire(s);
> >          virtio_scsi_push_event(s, &info);
> > -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> >          virtio_scsi_release(s);
> >      }
> >  }
> > @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> >          virtio_scsi_acquire(s);
> >          virtio_scsi_push_event(s, &info);
> > -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> >          virtio_scsi_release(s);
> >      }
> >  }
> > -- 
> > 2.41.0
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 3 weeks ago
CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
since I found a few things in the virtio-scsi driver...

FYI we have seen that Linux has problems with a QEMU patch for the
virtio-scsi device (details at the bottom of this email in the revert
commit message and BZ).


This is what I found when I looked at the Linux code:

In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
(sshdr->asc == 0x3f && sshdr->ascq == 0x0e).

When `sdev_target->expecting_lun_change = 1` is set and we call
scsi_check_sense(), for example to check the next UNIT ATTENTION, it
will return NEEDS_RETRY, that I think will cause the issues we are
seeing.

`sdev_target->expecting_lun_change` is reset only in
scsi_decide_disposition() when `REPORT_LUNS` command returns with
SAM_STAT_GOOD.
That command is issued in scsi_report_lun_scan() called by
__scsi_scan_target(), called for example by scsi_scan_target(),
scsi_scan_host(), etc.

So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
send also the UNIT ATTENTION.

In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
(hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
command to the device. This does not happen for
VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
UNIT ATTENTION from the hotunplug in QEMU, everything works well.

So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index bd5633667d01..c57658a63097 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -291,6 +291,7 @@ static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
                 }
                 break;
         case VIRTIO_SCSI_EVT_RESET_REMOVED:
+               scsi_scan_host(shost);
                 sdev = scsi_device_lookup(shost, 0, target, lun);
                 if (sdev) {
                         scsi_remove_device(sdev);

This somehow helps, now linux only breaks if the plug/unplug frequency
is really high. If I put a 5 second sleep between plug/unplug events, it
doesn't break (at least for the duration of my test which has been
running for about 30 minutes, before it used to break after about a
minute).

Another thing I noticed is that in QEMU maybe we should set the UNIT
ATTENTION first and then send the event on the virtqueue, because the
scan should happen after the unit attention, but I don't know if in any
case the unit attention is processed before the virtqueue.

I mean something like this:

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 45b95ea070..13db40f4f3 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
          };

          virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, &info);
          scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+        virtio_scsi_push_event(s, &info);
          virtio_scsi_release(s);
      }
  }
@@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,

      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
          virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, &info);
          scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+        virtio_scsi_push_event(s, &info);
          virtio_scsi_release(s);
      }
  }

At this point I think the problem is on the handling of the
VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
to be enough when the event rate is very high.

I don't know if along with this fix, we also need to limit the rate in
QEMU somehow.

Sorry for the length of this email, but I'm not familiar with SCSI and
wanted some suggestions on how to proceed.

Paolo, Stefan, Linux SCSI maintainers, any suggestion?


Thanks,
Stefano

On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
>This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>
>That commit causes several problems in Linux as described in the BZ.
>In particular, after a while, other devices on the bus are no longer
>usable even if those devices are not affected by the hotunplug.
>This may be a problem in Linux, but we have not been able to identify
>it so far. So better to revert this patch until we find a solution.
>
>Also, Oracle, which initially proposed this patch for a problem with
>Solaris, seems to have already reversed it downstream:
>    https://linux.oracle.com/errata/ELSA-2023-12065.html
>
>Suggested-by: Thomas Huth <thuth@redhat.com>
>Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
>Cc: qemu-stable@nongnu.org
>Cc: Mark Kanda <mark.kanda@oracle.com>
>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>---
> include/hw/scsi/scsi.h |  1 -
> hw/scsi/scsi-bus.c     | 18 ------------------
> hw/scsi/virtio-scsi.c  |  2 --
> 3 files changed, 21 deletions(-)
>
>diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
>index e2bb1a2fbf..7c8adf10b1 100644
>--- a/include/hw/scsi/scsi.h
>+++ b/include/hw/scsi/scsi.h
>@@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>                                       BlockdevOnError rerror,
>                                       BlockdevOnError werror,
>                                       const char *serial, Error **errp);
>-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
> void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>
> SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
>diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>index f80f4cb4fc..42a915f8b7 100644
>--- a/hw/scsi/scsi-bus.c
>+++ b/hw/scsi/scsi-bus.c
>@@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>     return (sense.asc << 8) | sense.ascq;
> }
>
>-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
>-{
>-    int prec1, prec2;
>-    if (sense.key != UNIT_ATTENTION) {
>-        return;
>-    }
>-
>-    /*
>-     * Override a pre-existing unit attention condition, except for a more
>-     * important reset condition.
>-     */
>-    prec1 = scsi_ua_precedence(bus->unit_attention);
>-    prec2 = scsi_ua_precedence(sense);
>-    if (prec2 < prec1) {
>-        bus->unit_attention = sense;
>-    }
>-}
>-
> void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
> {
>     int prec1, prec2;
>diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>index 45b95ea070..1f56607100 100644
>--- a/hw/scsi/virtio-scsi.c
>+++ b/hw/scsi/virtio-scsi.c
>@@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>
>         virtio_scsi_acquire(s);
>         virtio_scsi_push_event(s, &info);
>-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>         virtio_scsi_release(s);
>     }
> }
>@@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>         virtio_scsi_acquire(s);
>         virtio_scsi_push_event(s, &info);
>-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>         virtio_scsi_release(s);
>     }
> }
>-- 
>2.41.0
>
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Paolo Bonzini 9 months, 2 weeks ago
On 7/11/23 19:06, Stefano Garzarella wrote:
> CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
> since I found a few things in the virtio-scsi driver...
> 
> FYI we have seen that Linux has problems with a QEMU patch for the
> virtio-scsi device (details at the bottom of this email in the revert
> commit message and BZ).
> 
> 
> This is what I found when I looked at the Linux code:
> 
> In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
> scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
> 1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
> (sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
> 
> When `sdev_target->expecting_lun_change = 1` is set and we call
> scsi_check_sense(), for example to check the next UNIT ATTENTION, it
> will return NEEDS_RETRY, that I think will cause the issues we are
> seeing.
> 
> `sdev_target->expecting_lun_change` is reset only in
> scsi_decide_disposition() when `REPORT_LUNS` command returns with
> SAM_STAT_GOOD.
> That command is issued in scsi_report_lun_scan() called by
> __scsi_scan_target(), called for example by scsi_scan_target(),
> scsi_scan_host(), etc.
> 
> So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
> and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
> send also the UNIT ATTENTION.
> 
> In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
> (hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
> will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
> command to the device. This does not happen for
> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
> UNIT ATTENTION from the hotunplug in QEMU, everything works well.
> 
> So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:

The point of having the event queue is to avoid expensive scans of the 
entire host, so I don't think this is the right thing to do.

On the Linux side, one change we might do is to remove the printk for 
adapters that do process hotplug/hotunplug, using a new flag in 
scsi_host_template.  There are several callers of scsi_add_device() and 
scsi_remove_device() in adapter code, so at least these should not issue 
the printk:

drivers/scsi/aacraid/commsup.c
drivers/scsi/arcmsr/arcmsr_hba.c
drivers/scsi/esas2r/esas2r_main.c
drivers/scsi/hpsa.c
drivers/scsi/ipr.c
drivers/scsi/megaraid/megaraid_sas_base.c
drivers/scsi/mvumi.c
drivers/scsi/pmcraid.c
drivers/scsi/smartpqi/smartpqi_init.c
drivers/scsi/virtio_scsi.c
drivers/scsi/vmw_pvscsi.c
drivers/scsi/xen-scsifront.c

Paolo

> Another thing I noticed is that in QEMU maybe we should set the UNIT
> ATTENTION first and then send the event on the virtqueue, because the
> scan should happen after the unit attention, but I don't know if in any
> case the unit attention is processed before the virtqueue.
> 
> I mean something like this:
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..13db40f4f3 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>           };
> 
>           virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>           virtio_scsi_release(s);
>       }
>   }
> @@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
> 
>       if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>           virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>           virtio_scsi_release(s);
>       }
>   }
> 
> At this point I think the problem is on the handling of the
> VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
> somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
> to be enough when the event rate is very high.
> 
> I don't know if along with this fix, we also need to limit the rate in
> QEMU somehow.
> 
> Sorry for the length of this email, but I'm not familiar with SCSI and
> wanted some suggestions on how to proceed.
> 
> Paolo, Stefan, Linux SCSI maintainers, any suggestion?
> 
> 
> Thanks,
> Stefano
> 
> On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
>> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>>
>> That commit causes several problems in Linux as described in the BZ.
>> In particular, after a while, other devices on the bus are no longer
>> usable even if those devices are not affected by the hotunplug.
>> This may be a problem in Linux, but we have not been able to identify
>> it so far. So better to revert this patch until we find a solution.
>>
>> Also, Oracle, which initially proposed this patch for a problem with
>> Solaris, seems to have already reversed it downstream:
>>    https://linux.oracle.com/errata/ELSA-2023-12065.html
>>
>> Suggested-by: Thomas Huth <thuth@redhat.com>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
>> Cc: qemu-stable@nongnu.org
>> Cc: Mark Kanda <mark.kanda@oracle.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>> include/hw/scsi/scsi.h |  1 -
>> hw/scsi/scsi-bus.c     | 18 ------------------
>> hw/scsi/virtio-scsi.c  |  2 --
>> 3 files changed, 21 deletions(-)
>>
>> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
>> index e2bb1a2fbf..7c8adf10b1 100644
>> --- a/include/hw/scsi/scsi.h
>> +++ b/include/hw/scsi/scsi.h
>> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus 
>> *bus, BlockBackend *blk,
>>                                       BlockdevOnError rerror,
>>                                       BlockdevOnError werror,
>>                                       const char *serial, Error **errp);
>> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>> void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>>
>> SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
>> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>> index f80f4cb4fc..42a915f8b7 100644
>> --- a/hw/scsi/scsi-bus.c
>> +++ b/hw/scsi/scsi-bus.c
>> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>>     return (sense.asc << 8) | sense.ascq;
>> }
>>
>> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
>> -{
>> -    int prec1, prec2;
>> -    if (sense.key != UNIT_ATTENTION) {
>> -        return;
>> -    }
>> -
>> -    /*
>> -     * Override a pre-existing unit attention condition, except for a 
>> more
>> -     * important reset condition.
>> -     */
>> -    prec1 = scsi_ua_precedence(bus->unit_attention);
>> -    prec2 = scsi_ua_precedence(sense);
>> -    if (prec2 < prec1) {
>> -        bus->unit_attention = sense;
>> -    }
>> -}
>> -
>> void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>> {
>>     int prec1, prec2;
>> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>> index 45b95ea070..1f56607100 100644
>> --- a/hw/scsi/virtio-scsi.c
>> +++ b/hw/scsi/virtio-scsi.c
>> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler 
>> *hotplug_dev, DeviceState *dev,
>>
>>         virtio_scsi_acquire(s);
>>         virtio_scsi_push_event(s, &info);
>> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>>         virtio_scsi_release(s);
>>     }
>> }
>> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler 
>> *hotplug_dev, DeviceState *dev,
>>     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>>         virtio_scsi_acquire(s);
>>         virtio_scsi_push_event(s, &info);
>> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>>         virtio_scsi_release(s);
>>     }
>> }
>> -- 
>> 2.41.0
>>
> 
> 


Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 2 weeks ago
On Wed, Jul 12, 2023 at 10:35:48AM +0200, Paolo Bonzini wrote:
>On 7/11/23 19:06, Stefano Garzarella wrote:
>>CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
>>since I found a few things in the virtio-scsi driver...
>>
>>FYI we have seen that Linux has problems with a QEMU patch for the
>>virtio-scsi device (details at the bottom of this email in the revert
>>commit message and BZ).
>>
>>
>>This is what I found when I looked at the Linux code:
>>
>>In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
>>scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
>>1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
>>(sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
>>
>>When `sdev_target->expecting_lun_change = 1` is set and we call
>>scsi_check_sense(), for example to check the next UNIT ATTENTION, it
>>will return NEEDS_RETRY, that I think will cause the issues we are
>>seeing.
>>
>>`sdev_target->expecting_lun_change` is reset only in
>>scsi_decide_disposition() when `REPORT_LUNS` command returns with
>>SAM_STAT_GOOD.
>>That command is issued in scsi_report_lun_scan() called by
>>__scsi_scan_target(), called for example by scsi_scan_target(),
>>scsi_scan_host(), etc.
>>
>>So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
>>and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
>>send also the UNIT ATTENTION.
>>
>>In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
>>(hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
>>will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
>>command to the device. This does not happen for
>>VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
>>UNIT ATTENTION from the hotunplug in QEMU, everything works well.
>>
>>So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:
>
>The point of having the event queue is to avoid expensive scans of the 
>entire host, so I don't think this is the right thing to do.

I see. I'll follow your advice for QEMU changes ;-)

>
>On the Linux side, one change we might do is to remove the printk for 

Do you mean the "LUN assignments on this target have changed.
The Linux SCSI layer does not automatically remap LUN assignments."?

>adapters that do process hotplug/hotunplug, using a new flag in 
>scsi_host_template.  There are several callers of scsi_add_device() and 
>scsi_remove_device() in adapter code, so at least these should not 
>issue the printk:

I guess it makes sense since that message could be confusing in this
case. I'll try to send a patch for that.

>
>drivers/scsi/aacraid/commsup.c
>drivers/scsi/arcmsr/arcmsr_hba.c
>drivers/scsi/esas2r/esas2r_main.c
>drivers/scsi/hpsa.c
>drivers/scsi/ipr.c
>drivers/scsi/megaraid/megaraid_sas_base.c
>drivers/scsi/mvumi.c
>drivers/scsi/pmcraid.c
>drivers/scsi/smartpqi/smartpqi_init.c
>drivers/scsi/virtio_scsi.c
>drivers/scsi/vmw_pvscsi.c
>drivers/scsi/xen-scsifront.c
>

Thanks,
Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Mike Christie 9 months, 3 weeks ago
What was the issue you are seeing?

Was it something like you get the UA. We retry then on one of the
retries the sense is not setup correctly, so the scsi error handler
runs? That fails and the device goes offline?

If you turn on scsi debugging you would see:


[  335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
[  335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
[  335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
[  335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
[  335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense


I don't know the qemu scsi code well, but I scanned the code for my co-worker
and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.

How is locking done? when it is a bus level UA but there are multiple devices
on the bus?

Is it possible, devA is clearing the sense on devB. For example, thread1 for
devA  is doing scsi_clear_unit_attention but then thread2 for devB has seen that
bus->unit_attention so it set req ops to reqops_unit_attention. But when
we run reqops_unit_attention.send_command scsi_unit_attention does not see
req->bus->unit_attention set anymore so we get a CC with no sense.

If the linux kernel scsi layer sees a CC with no sense then we fire the SCSI
error handler like seen above in the logs.


On 7/11/23 12:06 PM, Stefano Garzarella wrote:
> CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
> since I found a few things in the virtio-scsi driver...
> 
> FYI we have seen that Linux has problems with a QEMU patch for the
> virtio-scsi device (details at the bottom of this email in the revert
> commit message and BZ).
> 
> 
> This is what I found when I looked at the Linux code:
> 
> In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
> scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
> 1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
> (sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
> 
> When `sdev_target->expecting_lun_change = 1` is set and we call
> scsi_check_sense(), for example to check the next UNIT ATTENTION, it
> will return NEEDS_RETRY, that I think will cause the issues we are
> seeing.
> 
> `sdev_target->expecting_lun_change` is reset only in
> scsi_decide_disposition() when `REPORT_LUNS` command returns with
> SAM_STAT_GOOD.
> That command is issued in scsi_report_lun_scan() called by
> __scsi_scan_target(), called for example by scsi_scan_target(),
> scsi_scan_host(), etc.
> 
> So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
> and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
> send also the UNIT ATTENTION.
> 
> In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
> (hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
> will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
> command to the device. This does not happen for
> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
> UNIT ATTENTION from the hotunplug in QEMU, everything works well.
> 
> So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index bd5633667d01..c57658a63097 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -291,6 +291,7 @@ static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
>                 }
>                 break;
>         case VIRTIO_SCSI_EVT_RESET_REMOVED:
> +               scsi_scan_host(shost);
>                 sdev = scsi_device_lookup(shost, 0, target, lun);
>                 if (sdev) {
>                         scsi_remove_device(sdev);
> 
> This somehow helps, now linux only breaks if the plug/unplug frequency
> is really high. If I put a 5 second sleep between plug/unplug events, it
> doesn't break (at least for the duration of my test which has been
> running for about 30 minutes, before it used to break after about a
> minute).
> 
> Another thing I noticed is that in QEMU maybe we should set the UNIT
> ATTENTION first and then send the event on the virtqueue, because the
> scan should happen after the unit attention, but I don't know if in any
> case the unit attention is processed before the virtqueue.
> 
> I mean something like this:
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..13db40f4f3 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          };
> 
>          virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>          scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>          virtio_scsi_release(s);
>      }
>  }
> @@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> 
>      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>          virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>          scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>          virtio_scsi_release(s);
>      }
>  }
> 
> At this point I think the problem is on the handling of the
> VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
> somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
> to be enough when the event rate is very high.
> 
> I don't know if along with this fix, we also need to limit the rate in
> QEMU somehow.
> 
> Sorry for the length of this email, but I'm not familiar with SCSI and
> wanted some suggestions on how to proceed.
> 
> Paolo, Stefan, Linux SCSI maintainers, any suggestion?
> 
> 
> Thanks,
> Stefano
> 
> On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
>> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>>
>> That commit causes several problems in Linux as described in the BZ.
>> In particular, after a while, other devices on the bus are no longer
>> usable even if those devices are not affected by the hotunplug.
>> This may be a problem in Linux, but we have not been able to identify
>> it so far. So better to revert this patch until we find a solution.
>>
>> Also, Oracle, which initially proposed this patch for a problem with
>> Solaris, seems to have already reversed it downstream:
>>    https://linux.oracle.com/errata/ELSA-2023-12065.html
>>
>> Suggested-by: Thomas Huth <thuth@redhat.com>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
>> Cc: qemu-stable@nongnu.org
>> Cc: Mark Kanda <mark.kanda@oracle.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>> include/hw/scsi/scsi.h |  1 -
>> hw/scsi/scsi-bus.c     | 18 ------------------
>> hw/scsi/virtio-scsi.c  |  2 --
>> 3 files changed, 21 deletions(-)
>>
>> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
>> index e2bb1a2fbf..7c8adf10b1 100644
>> --- a/include/hw/scsi/scsi.h
>> +++ b/include/hw/scsi/scsi.h
>> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>>                                       BlockdevOnError rerror,
>>                                       BlockdevOnError werror,
>>                                       const char *serial, Error **errp);
>> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>> void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>>
>> SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
>> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>> index f80f4cb4fc..42a915f8b7 100644
>> --- a/hw/scsi/scsi-bus.c
>> +++ b/hw/scsi/scsi-bus.c
>> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>>     return (sense.asc << 8) | sense.ascq;
>> }
>>
>> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
>> -{
>> -    int prec1, prec2;
>> -    if (sense.key != UNIT_ATTENTION) {
>> -        return;
>> -    }
>> -
>> -    /*
>> -     * Override a pre-existing unit attention condition, except for a more
>> -     * important reset condition.
>> -     */
>> -    prec1 = scsi_ua_precedence(bus->unit_attention);
>> -    prec2 = scsi_ua_precedence(sense);
>> -    if (prec2 < prec1) {
>> -        bus->unit_attention = sense;
>> -    }
>> -}
>> -
>> void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>> {
>>     int prec1, prec2;
>> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>> index 45b95ea070..1f56607100 100644
>> --- a/hw/scsi/virtio-scsi.c
>> +++ b/hw/scsi/virtio-scsi.c
>> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>
>>         virtio_scsi_acquire(s);
>>         virtio_scsi_push_event(s, &info);
>> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>>         virtio_scsi_release(s);
>>     }
>> }
>> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>>         virtio_scsi_acquire(s);
>>         virtio_scsi_push_event(s, &info);
>> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>>         virtio_scsi_release(s);
>>     }
>> }
>> -- 
>> 2.41.0
>>
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Paolo Bonzini 9 months, 2 weeks ago
On 7/11/23 22:21, Mike Christie wrote:
> What was the issue you are seeing?
> 
> Was it something like you get the UA. We retry then on one of the
> retries the sense is not setup correctly, so the scsi error handler
> runs? That fails and the device goes offline?
> 
> If you turn on scsi debugging you would see:
> 
> 
> [  335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
> [  335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> [  335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
> [  335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
> [  335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
> [  335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense

Does this log come from internal discussions within Oracle?

> I don't know the qemu scsi code well, but I scanned the code for my co-worker
> and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
> 
> How is locking done? when it is a bus level UA but there are multiple devices
> on the bus?

No locking should be necessary, the code is single threaded.  However, 
what can happen is that two consecutive calls to 
virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and 
then the second virtio_scsi_handle_cmd_req_submit finds no unit 
attention (see the loop in virtio_scsi_handle_cmd_vq).  That can 
definitely explain the log above.

Paolo
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 2 weeks ago
On Wed, Jul 12, 2023 at 10:06:56AM +0200, Paolo Bonzini wrote:
>On 7/11/23 22:21, Mike Christie wrote:
>>What was the issue you are seeing?
>>
>>Was it something like you get the UA. We retry then on one of the
>>retries the sense is not setup correctly, so the scsi error handler
>>runs? That fails and the device goes offline?
>>
>>If you turn on scsi debugging you would see:
>>
>>
>>[  335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
>>[  335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>[  335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
>>[  335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>[  335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
>>[  335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
>>[  335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense
>
>Does this log come from internal discussions within Oracle?
>
>>I don't know the qemu scsi code well, but I scanned the code for my co-worker
>>and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
>>
>>How is locking done? when it is a bus level UA but there are multiple devices
>>on the bus?
>
>No locking should be necessary, the code is single threaded.  However, 
>what can happen is that two consecutive calls to 
>virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and 
>then the second virtio_scsi_handle_cmd_req_submit finds no unit 
>attention (see the loop in virtio_scsi_handle_cmd_vq).  That can 
>definitely explain the log above.

Yes, this seems to be the case!
Thank you both for the help!

Following Paolo's advice, I'm preparing a series for QEMU to solve the
problem!

Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 2 weeks ago
On Wed, Jul 12, 2023 at 12:14 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 12, 2023 at 10:06:56AM +0200, Paolo Bonzini wrote:
> >On 7/11/23 22:21, Mike Christie wrote:
> >>What was the issue you are seeing?
> >>
> >>Was it something like you get the UA. We retry then on one of the
> >>retries the sense is not setup correctly, so the scsi error handler
> >>runs? That fails and the device goes offline?
> >>
> >>If you turn on scsi debugging you would see:
> >>
> >>
> >>[  335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
> >>[  335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> >>[  335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
> >>[  335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[  335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
> >>[  335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
> >>[  335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense
> >
> >Does this log come from internal discussions within Oracle?
> >
> >>I don't know the qemu scsi code well, but I scanned the code for my co-worker
> >>and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
> >>
> >>How is locking done? when it is a bus level UA but there are multiple devices
> >>on the bus?
> >
> >No locking should be necessary, the code is single threaded.  However,
> >what can happen is that two consecutive calls to
> >virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and
> >then the second virtio_scsi_handle_cmd_req_submit finds no unit
> >attention (see the loop in virtio_scsi_handle_cmd_vq).  That can
> >definitely explain the log above.
>
> Yes, this seems to be the case!
> Thank you both for the help!
>
> Following Paolo's advice, I'm preparing a series for QEMU to solve the
> problem!

Series posted here:
https://lore.kernel.org/qemu-devel/20230712134352.118655-1-sgarzare@redhat.com/

Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefan Hajnoczi 9 months, 3 weeks ago
On Tue, 11 Jul 2023 at 13:06, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
> since I found a few things in the virtio-scsi driver...
>
> FYI we have seen that Linux has problems with a QEMU patch for the
> virtio-scsi device (details at the bottom of this email in the revert
> commit message and BZ).
>
>
> This is what I found when I looked at the Linux code:
>
> In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
> scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
> 1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
> (sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
>
> When `sdev_target->expecting_lun_change = 1` is set and we call
> scsi_check_sense(), for example to check the next UNIT ATTENTION, it
> will return NEEDS_RETRY, that I think will cause the issues we are
> seeing.
>
> `sdev_target->expecting_lun_change` is reset only in
> scsi_decide_disposition() when `REPORT_LUNS` command returns with
> SAM_STAT_GOOD.
> That command is issued in scsi_report_lun_scan() called by
> __scsi_scan_target(), called for example by scsi_scan_target(),
> scsi_scan_host(), etc.
>
> So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
> and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
> send also the UNIT ATTENTION.
>
> In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
> (hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
> will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
> command to the device. This does not happen for
> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
> UNIT ATTENTION from the hotunplug in QEMU, everything works well.
>
> So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index bd5633667d01..c57658a63097 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -291,6 +291,7 @@ static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
>                  }
>                  break;
>          case VIRTIO_SCSI_EVT_RESET_REMOVED:
> +               scsi_scan_host(shost);
>                  sdev = scsi_device_lookup(shost, 0, target, lun);
>                  if (sdev) {
>                          scsi_remove_device(sdev);
>
> This somehow helps, now linux only breaks if the plug/unplug frequency
> is really high. If I put a 5 second sleep between plug/unplug events, it
> doesn't break (at least for the duration of my test which has been
> running for about 30 minutes, before it used to break after about a
> minute).
>
> Another thing I noticed is that in QEMU maybe we should set the UNIT
> ATTENTION first and then send the event on the virtqueue, because the
> scan should happen after the unit attention, but I don't know if in any
> case the unit attention is processed before the virtqueue.
>
> I mean something like this:
>
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..13db40f4f3 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>           };
>
>           virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>           virtio_scsi_release(s);
>       }
>   }
> @@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>
>       if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>           virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, &info);
>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_push_event(s, &info);
>           virtio_scsi_release(s);
>       }
>   }

That is racy. It's up to the guest whether the event virtqueue or the
UNIT ATTENTION will be processed first.

If the device wants to ensure ordering then it must withhold the event
until the driver has responded to the UNIT ATTENTION. That may not be
a good idea though.

I'd like to understand the root cause before choosing a solution.

> At this point I think the problem is on the handling of the
> VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
> somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
> to be enough when the event rate is very high.

Why is it necessary to rescan the whole bus instead of removing just
the device that has been unplugged?

> I don't know if along with this fix, we also need to limit the rate in
> QEMU somehow.

Why is a high rate problematic?

> Sorry for the length of this email, but I'm not familiar with SCSI and
> wanted some suggestions on how to proceed.
>
> Paolo, Stefan, Linux SCSI maintainers, any suggestion?

I don't know the Linux SCSI code well enough to say, sorry. I think we
need input from someone familiar with the code.

However, QEMU is not at liberty to make changes that break existing
guests. So even if it turns out the specs allow something or there is
an existing bug in virtio_scsi.ko, we still can't break existing
guests.

Stefan

>
>
> Thanks,
> Stefano
>
> On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> >This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> >
> >That commit causes several problems in Linux as described in the BZ.
> >In particular, after a while, other devices on the bus are no longer
> >usable even if those devices are not affected by the hotunplug.
> >This may be a problem in Linux, but we have not been able to identify
> >it so far. So better to revert this patch until we find a solution.
> >
> >Also, Oracle, which initially proposed this patch for a problem with
> >Solaris, seems to have already reversed it downstream:
> >    https://linux.oracle.com/errata/ELSA-2023-12065.html
> >
> >Suggested-by: Thomas Huth <thuth@redhat.com>
> >Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> >Cc: qemu-stable@nongnu.org
> >Cc: Mark Kanda <mark.kanda@oracle.com>
> >Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >---
> > include/hw/scsi/scsi.h |  1 -
> > hw/scsi/scsi-bus.c     | 18 ------------------
> > hw/scsi/virtio-scsi.c  |  2 --
> > 3 files changed, 21 deletions(-)
> >
> >diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> >index e2bb1a2fbf..7c8adf10b1 100644
> >--- a/include/hw/scsi/scsi.h
> >+++ b/include/hw/scsi/scsi.h
> >@@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
> >                                       BlockdevOnError rerror,
> >                                       BlockdevOnError werror,
> >                                       const char *serial, Error **errp);
> >-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
> > void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
> >
> > SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> >diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> >index f80f4cb4fc..42a915f8b7 100644
> >--- a/hw/scsi/scsi-bus.c
> >+++ b/hw/scsi/scsi-bus.c
> >@@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
> >     return (sense.asc << 8) | sense.ascq;
> > }
> >
> >-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> >-{
> >-    int prec1, prec2;
> >-    if (sense.key != UNIT_ATTENTION) {
> >-        return;
> >-    }
> >-
> >-    /*
> >-     * Override a pre-existing unit attention condition, except for a more
> >-     * important reset condition.
> >-     */
> >-    prec1 = scsi_ua_precedence(bus->unit_attention);
> >-    prec2 = scsi_ua_precedence(sense);
> >-    if (prec2 < prec1) {
> >-        bus->unit_attention = sense;
> >-    }
> >-}
> >-
> > void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
> > {
> >     int prec1, prec2;
> >diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> >index 45b95ea070..1f56607100 100644
> >--- a/hw/scsi/virtio-scsi.c
> >+++ b/hw/scsi/virtio-scsi.c
> >@@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >
> >         virtio_scsi_acquire(s);
> >         virtio_scsi_push_event(s, &info);
> >-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> >         virtio_scsi_release(s);
> >     }
> > }
> >@@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> >         virtio_scsi_acquire(s);
> >         virtio_scsi_push_event(s, &info);
> >-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> >         virtio_scsi_release(s);
> >     }
> > }
> >--
> >2.41.0
> >
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 2 weeks ago
On Tue, Jul 11, 2023 at 01:41:31PM -0400, Stefan Hajnoczi wrote:
>On Tue, 11 Jul 2023 at 13:06, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
>> since I found a few things in the virtio-scsi driver...
>>
>> FYI we have seen that Linux has problems with a QEMU patch for the
>> virtio-scsi device (details at the bottom of this email in the revert
>> commit message and BZ).
>>
>>
>> This is what I found when I looked at the Linux code:
>>
>> In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
>> scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
>> 1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
>> (sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
>>
>> When `sdev_target->expecting_lun_change = 1` is set and we call
>> scsi_check_sense(), for example to check the next UNIT ATTENTION, it
>> will return NEEDS_RETRY, that I think will cause the issues we are
>> seeing.
>>
>> `sdev_target->expecting_lun_change` is reset only in
>> scsi_decide_disposition() when `REPORT_LUNS` command returns with
>> SAM_STAT_GOOD.
>> That command is issued in scsi_report_lun_scan() called by
>> __scsi_scan_target(), called for example by scsi_scan_target(),
>> scsi_scan_host(), etc.
>>
>> So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
>> and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
>> send also the UNIT ATTENTION.
>>
>> In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
>> (hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
>> will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
>> command to the device. This does not happen for
>> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
>> UNIT ATTENTION from the hotunplug in QEMU, everything works well.
>>
>> So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index bd5633667d01..c57658a63097 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -291,6 +291,7 @@ static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
>>                  }
>>                  break;
>>          case VIRTIO_SCSI_EVT_RESET_REMOVED:
>> +               scsi_scan_host(shost);
>>                  sdev = scsi_device_lookup(shost, 0, target, lun);
>>                  if (sdev) {
>>                          scsi_remove_device(sdev);
>>
>> This somehow helps, now linux only breaks if the plug/unplug frequency
>> is really high. If I put a 5 second sleep between plug/unplug events, it
>> doesn't break (at least for the duration of my test which has been
>> running for about 30 minutes, before it used to break after about a
>> minute).
>>
>> Another thing I noticed is that in QEMU maybe we should set the UNIT
>> ATTENTION first and then send the event on the virtqueue, because the
>> scan should happen after the unit attention, but I don't know if in any
>> case the unit attention is processed before the virtqueue.
>>
>> I mean something like this:
>>
>> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>> index 45b95ea070..13db40f4f3 100644
>> --- a/hw/scsi/virtio-scsi.c
>> +++ b/hw/scsi/virtio-scsi.c
>> @@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>           };
>>
>>           virtio_scsi_acquire(s);
>> -        virtio_scsi_push_event(s, &info);
>>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>> +        virtio_scsi_push_event(s, &info);
>>           virtio_scsi_release(s);
>>       }
>>   }
>> @@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>
>>       if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>>           virtio_scsi_acquire(s);
>> -        virtio_scsi_push_event(s, &info);
>>           scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>> +        virtio_scsi_push_event(s, &info);
>>           virtio_scsi_release(s);
>>       }
>>   }
>
>That is racy. It's up to the guest whether the event virtqueue or the
>UNIT ATTENTION will be processed first.

Yep, agree. I wrote above that UA could be processed in a different
order. It was just another potential problem.

>
>If the device wants to ensure ordering then it must withhold the event
>until the driver has responded to the UNIT ATTENTION. That may not be
>a good idea though.
>
>I'd like to understand the root cause before choosing a solution.

This last patch is not the solution.

I think the root cause is in the Linux driver and SCSI subsystem.
When the SCSI code receive an UA with REPORTED LUN CHANGED, it seems
it expects that `REPORT_LUNS` command is issued (I tried to describe it
in the first part).

The problem is that the SCSI stack does not send this command, so we
should do it in the driver. In fact we do it for
VIRTIO_SCSI_EVT_RESET_RESCAN (hotplug), but not for
VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug).

I think that's where the problem is, but I don't know if that's what the
specification expects, I haven't found much information on that :-(

>
>> At this point I think the problem is on the handling of the
>> VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
>> somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
>> to be enough when the event rate is very high.
>
>Why is it necessary to rescan the whole bus instead of removing just
>the device that has been unplugged?

I hope I covered in the previous answer.

>
>> I don't know if along with this fix, we also need to limit the rate in
>> QEMU somehow.
>
>Why is a high rate problematic?

Could be related on the race that you mention before (also without that
untested diff there should be the race)

>
>> Sorry for the length of this email, but I'm not familiar with SCSI and
>> wanted some suggestions on how to proceed.
>>
>> Paolo, Stefan, Linux SCSI maintainers, any suggestion?
>
>I don't know the Linux SCSI code well enough to say, sorry. I think we
>need input from someone familiar with the code.

Thank you very much for the suggestions!
I will try to ping the SCSI maintainers.

>
>However, QEMU is not at liberty to make changes that break existing
>guests. So even if it turns out the specs allow something or there is
>an existing bug in virtio_scsi.ko, we still can't break existing
>guests.

Yes, I can see that. We need to revert or somehow fix the device in
QEMU.

Thanks,
Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Christoph Hellwig 9 months, 2 weeks ago
On Wed, Jul 12, 2023 at 10:28:00AM +0200, Stefano Garzarella wrote:
> The problem is that the SCSI stack does not send this command, so we
> should do it in the driver. In fact we do it for
> VIRTIO_SCSI_EVT_RESET_RESCAN (hotplug), but not for
> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug).

No, you should absolutely no do it in the driver.  The fact that
virtio-scsi even tries to do some of its own LUN scanning is
problematic and should have never happened.
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Paolo Bonzini 9 months, 2 weeks ago
On 7/12/23 15:40, Christoph Hellwig wrote:
>> The problem is that the SCSI stack does not send this command, so we
>> should do it in the driver. In fact we do it for
>> VIRTIO_SCSI_EVT_RESET_RESCAN (hotplug), but not for
>> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug).
>
> No, you should absolutely no do it in the driver.  The fact that
> virtio-scsi even tries to do some of its own LUN scanning is
> problematic and should have never happened.

I agree that it should not do it for hot-unplug.  However, for hot-plug 
the spec says that a hotplug event for LUN 0 represents the addition of 
an entire target, so why is it incorrect to start a REPORT LUNS scan if 
the host doesn't tell you the exact LUN(s) that have been added?

There is a similar case in mpi3mr/mpi3mr_os.c, though it's only scanning 
for newly added devices after a controller reset.

Paolo
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Michael S. Tsirkin 9 months, 3 weeks ago
On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> 
> That commit causes several problems in Linux as described in the BZ.
> In particular, after a while, other devices on the bus are no longer
> usable even if those devices are not affected by the hotunplug.
> This may be a problem in Linux, but we have not been able to identify
> it so far. So better to revert this patch until we find a solution.
> 
> Also, Oracle, which initially proposed this patch for a problem with
> Solaris, seems to have already reversed it downstream:
>     https://linux.oracle.com/errata/ELSA-2023-12065.html
> 
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> Cc: qemu-stable@nongnu.org
> Cc: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

Seems safest to revert, but I'll let storage guys decide whether to
queue this.

> ---
>  include/hw/scsi/scsi.h |  1 -
>  hw/scsi/scsi-bus.c     | 18 ------------------
>  hw/scsi/virtio-scsi.c  |  2 --
>  3 files changed, 21 deletions(-)
> 
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index e2bb1a2fbf..7c8adf10b1 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>                                        BlockdevOnError rerror,
>                                        BlockdevOnError werror,
>                                        const char *serial, Error **errp);
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>  
>  SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index f80f4cb4fc..42a915f8b7 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>      return (sense.asc << 8) | sense.ascq;
>  }
>  
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> -{
> -    int prec1, prec2;
> -    if (sense.key != UNIT_ATTENTION) {
> -        return;
> -    }
> -
> -    /*
> -     * Override a pre-existing unit attention condition, except for a more
> -     * important reset condition.
> -     */
> -    prec1 = scsi_ua_precedence(bus->unit_attention);
> -    prec2 = scsi_ua_precedence(sense);
> -    if (prec2 < prec1) {
> -        bus->unit_attention = sense;
> -    }
> -}
> -
>  void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>  {
>      int prec1, prec2;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..1f56607100 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>          virtio_scsi_acquire(s);
>          virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>          virtio_scsi_release(s);
>      }
>  }
> -- 
> 2.41.0
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Paolo Bonzini 9 months, 2 weeks ago
On 7/10/23 21:40, Michael S. Tsirkin wrote:
> 
> Acked-by: Michael S. Tsirkin<mst@redhat.com>
> 
> Seems safest to revert, but I'll let storage guys decide whether to
> queue this.

There are multiple possibilities:

1) it's a QEMU bug that can be fixed, so no need to revert.

2) there's both a QEMU and a Linux bug, but fixing the QEMU side is 
enough to hide the Linux bug.  The Linux bug can be fixed leisurely.

3) it's a Linux bug that is hard to fix, so we need to revert

4) it's a Linux bug that is worth fixing, but depending on how easy it 
is to trigger the issue we may or may not need to revert

But yeah, reverting is on the table.

Thanks,

Paolo
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Michael S. Tsirkin 8 months, 4 weeks ago
On Wed, Jul 12, 2023 at 10:12:13AM +0200, Paolo Bonzini wrote:
> On 7/10/23 21:40, Michael S. Tsirkin wrote:
> > 
> > Acked-by: Michael S. Tsirkin<mst@redhat.com>
> > 
> > Seems safest to revert, but I'll let storage guys decide whether to
> > queue this.
> 
> There are multiple possibilities:
> 
> 1) it's a QEMU bug that can be fixed, so no need to revert.
> 
> 2) there's both a QEMU and a Linux bug, but fixing the QEMU side is enough
> to hide the Linux bug.  The Linux bug can be fixed leisurely.
> 
> 3) it's a Linux bug that is hard to fix, so we need to revert
> 
> 4) it's a Linux bug that is worth fixing, but depending on how easy it is to
> trigger the issue we may or may not need to revert
> 
> But yeah, reverting is on the table.
> 
> Thanks,
> 
> Paolo

time to decide?

-- 
MST
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Mark Kanda 9 months, 3 weeks ago
On 7/5/2023 2:15 AM, Stefano Garzarella wrote:
> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>
> That commit causes several problems in Linux as described in the BZ.
> In particular, after a while, other devices on the bus are no longer
> usable even if those devices are not affected by the hotunplug.
> This may be a problem in Linux, but we have not been able to identify
> it so far. So better to revert this patch until we find a solution.
>
> Also, Oracle, which initially proposed this patch for a problem with
> Solaris, seems to have already reversed it downstream:
>      https://linux.oracle.com/errata/ELSA-2023-12065.html
>
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> Cc: qemu-stable@nongnu.org
> Cc: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>

Thanks/regards,
-Mark

> ---
>   include/hw/scsi/scsi.h |  1 -
>   hw/scsi/scsi-bus.c     | 18 ------------------
>   hw/scsi/virtio-scsi.c  |  2 --
>   3 files changed, 21 deletions(-)
>
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index e2bb1a2fbf..7c8adf10b1 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
>                                         BlockdevOnError rerror,
>                                         BlockdevOnError werror,
>                                         const char *serial, Error **errp);
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
>   void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
>   
>   SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index f80f4cb4fc..42a915f8b7 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
>       return (sense.asc << 8) | sense.ascq;
>   }
>   
> -void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
> -{
> -    int prec1, prec2;
> -    if (sense.key != UNIT_ATTENTION) {
> -        return;
> -    }
> -
> -    /*
> -     * Override a pre-existing unit attention condition, except for a more
> -     * important reset condition.
> -     */
> -    prec1 = scsi_ua_precedence(bus->unit_attention);
> -    prec2 = scsi_ua_precedence(sense);
> -    if (prec2 < prec1) {
> -        bus->unit_attention = sense;
> -    }
> -}
> -
>   void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
>   {
>       int prec1, prec2;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 45b95ea070..1f56607100 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>   
>           virtio_scsi_acquire(s);
>           virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>           virtio_scsi_release(s);
>       }
>   }
> @@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>       if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>           virtio_scsi_acquire(s);
>           virtio_scsi_push_event(s, &info);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
>           virtio_scsi_release(s);
>       }
>   }
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 3 weeks ago
Hi Mark,

On Wed, Jul 05, 2023 at 07:28:05AM -0500, Mark Kanda wrote:
>On 7/5/2023 2:15 AM, Stefano Garzarella wrote:
>>This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>>
>>That commit causes several problems in Linux as described in the BZ.
>>In particular, after a while, other devices on the bus are no longer
>>usable even if those devices are not affected by the hotunplug.
>>This may be a problem in Linux, but we have not been able to identify
>>it so far. So better to revert this patch until we find a solution.
>>
>>Also, Oracle, which initially proposed this patch for a problem with
>>Solaris, seems to have already reversed it downstream:
>>     https://linux.oracle.com/errata/ELSA-2023-12065.html
>>
>>Suggested-by: Thomas Huth <thuth@redhat.com>
>>Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
>>Cc: qemu-stable@nongnu.org
>>Cc: Mark Kanda <mark.kanda@oracle.com>
>>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
>

Thanks for the review.

By any chance do you have any information you can share regarding
[Orabug: 34905939] mentioned in the errata?

I'd like to better understand why this patch created problems in Linux,
but solved others in Solaris.

Thanks,
Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Mark Kanda 9 months, 3 weeks ago
Hi Stefano,

On 7/5/2023 7:36 AM, Stefano Garzarella wrote:
> Hi Mark,
>
> On Wed, Jul 05, 2023 at 07:28:05AM -0500, Mark Kanda wrote:
>> On 7/5/2023 2:15 AM, Stefano Garzarella wrote:
>>> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
>>>
>>> That commit causes several problems in Linux as described in the BZ.
>>> In particular, after a while, other devices on the bus are no longer
>>> usable even if those devices are not affected by the hotunplug.
>>> This may be a problem in Linux, but we have not been able to identify
>>> it so far. So better to revert this patch until we find a solution.
>>>
>>> Also, Oracle, which initially proposed this patch for a problem with
>>> Solaris, seems to have already reversed it downstream:
>>>     https://linux.oracle.com/errata/ELSA-2023-12065.html
>>>
>>> Suggested-by: Thomas Huth <thuth@redhat.com>
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
>>> Cc: qemu-stable@nongnu.org
>>> Cc: Mark Kanda <mark.kanda@oracle.com>
>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>
>> Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
>>
>
> Thanks for the review.
>
> By any chance do you have any information you can share regarding
> [Orabug: 34905939] mentioned in the errata?
>
> I'd like to better understand why this patch created problems in Linux,
> but solved others in Solaris.

Apologies for the delay. I unfortunately can't provide any useful details. We 
had a brief internal discussion about whether the Solaris or Linux driver was 
technically correct per SCSI spec (I'm not sure we came to a conclusion). In any 
case, we ultimately decided it didn't matter because we cannot tolerate a Linux 
regression, and therefore Solaris should change to behave like Linux.

Thanks/regards,
-Mark


Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Stefano Garzarella 9 months, 3 weeks ago
Hi Mark

On Fri, Jul 7, 2023 at 5:58 PM Mark Kanda <mark.kanda@oracle.com> wrote:

[...]

> >> On 7/5/2023 2:15 AM, Stefano Garzarella wrote:
> >>> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> >>>
> >>> That commit causes several problems in Linux as described in the BZ.
> >>> In particular, after a while, other devices on the bus are no longer
> >>> usable even if those devices are not affected by the hotunplug.
> >>> This may be a problem in Linux, but we have not been able to identify
> >>> it so far. So better to revert this patch until we find a solution.
> >>>
> >>> Also, Oracle, which initially proposed this patch for a problem with
> >>> Solaris, seems to have already reversed it downstream:
> >>>     https://linux.oracle.com/errata/ELSA-2023-12065.html
> >>>
> >>> Suggested-by: Thomas Huth <thuth@redhat.com>
> >>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> >>> Cc: qemu-stable@nongnu.org
> >>> Cc: Mark Kanda <mark.kanda@oracle.com>
> >>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >>
> >> Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
> >>
> >
> > Thanks for the review.
> >
> > By any chance do you have any information you can share regarding
> > [Orabug: 34905939] mentioned in the errata?
> >
> > I'd like to better understand why this patch created problems in Linux,
> > but solved others in Solaris.
>
> Apologies for the delay. I unfortunately can't provide any useful details. We
> had a brief internal discussion about whether the Solaris or Linux driver was
> technically correct per SCSI spec (I'm not sure we came to a conclusion). In any
> case, we ultimately decided it didn't matter because we cannot tolerate a Linux
> regression, and therefore Solaris should change to behave like Linux.

Okay, thanks for the update.

We'll try to figure out what goes wrong in Linux, but if we can't we
should merge this before the next release.

Thanks,
Stefano
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Posted by Thomas Huth 9 months, 3 weeks ago
On 05/07/2023 09.15, Stefano Garzarella wrote:
> This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
> 
> That commit causes several problems in Linux as described in the BZ.
> In particular, after a while, other devices on the bus are no longer
> usable even if those devices are not affected by the hotunplug.
> This may be a problem in Linux, but we have not been able to identify
> it so far. So better to revert this patch until we find a solution.
> 
> Also, Oracle, which initially proposed this patch for a problem with
> Solaris, seems to have already reversed it downstream:
>      https://linux.oracle.com/errata/ELSA-2023-12065.html
> 
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
> Cc: qemu-stable@nongnu.org
> Cc: Mark Kanda <mark.kanda@oracle.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>   include/hw/scsi/scsi.h |  1 -
>   hw/scsi/scsi-bus.c     | 18 ------------------
>   hw/scsi/virtio-scsi.c  |  2 --
>   3 files changed, 21 deletions(-)

Reviewed-by: Thomas Huth <thuth@redhat.com>