On 16/11/20 19:31, Hannes Reinecke wrote:
> Hi all,
>
> one of our customers reported an infinite guest hang following an FC link loss when using scsi-disk.
> The problem is that scsi-disk issues SG_IO commands with a timeout of UINT_MAX, which essentially signals
> 'no timeout' to the host kernel. So if a command gets lost, e.g. during an unexpected link loss, the
> HBA driver will never attempt to abort or return the command. Hence the guest will hang forever, and
> the only way to resolve things is to reboot the host.
>
> To solve this, the patchset adds an 'io_timeout' parameter to scsi-disk and scsi-generic, which allows
> the admin to specify a command timeout for SG_IO requests. It defaults to 30 seconds to avoid the
> infinite hang mentioned above.
>
> As usual, comments and reviews are welcome.
>
> Hannes Reinecke (3):
> virtio-scsi: trace events
> scsi: make io_timeout configurable
> scsi: add tracing for SG_IO commands
>
> hw/scsi/scsi-disk.c | 9 ++++++---
> hw/scsi/scsi-generic.c | 25 ++++++++++++++++++-------
> hw/scsi/trace-events | 13 +++++++++++++
> hw/scsi/virtio-scsi.c | 30 +++++++++++++++++++++++++++++-
> include/hw/scsi/scsi.h | 4 +++-
> 5 files changed, 69 insertions(+), 12 deletions(-)
>
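For context on the mechanism: on Linux the SG_IO timeout travels in the `timeout` field of `sg_io_hdr_t` (from `<scsi/sg.h>`), which is expressed in milliseconds, and `UINT_MAX` there means "no timeout". The snippet is a minimal sketch, not the patch itself: the helper names `example_sg_timeout_ms` and `example_fill_tur`, the seconds-based input, and the policy of mapping 0 to "no timeout" are all hypothetical. It only prepares a TEST UNIT READY request; actually submitting it would be `ioctl(fd, SG_IO, &hdr)` on an open sg or SCSI block device.

```c
#include <limits.h>
#include <string.h>
#include <scsi/sg.h>

/* Hypothetical default, mirroring the 30 seconds described in the cover letter. */
#define EXAMPLE_DEFAULT_IO_TIMEOUT_S 30u

/* Convert a timeout in seconds to the millisecond value that
 * sg_io_hdr_t.timeout expects. Assumed policy: 0 means "no timeout"
 * and maps to UINT_MAX (the value used unconditionally before).
 * Very large values saturate at UINT_MAX - 1 so they are never
 * silently turned into "infinite". */
static unsigned int example_sg_timeout_ms(unsigned int seconds)
{
    if (seconds == 0)
        return UINT_MAX;
    if (seconds > (UINT_MAX - 1u) / 1000u)
        return UINT_MAX - 1u;
    return seconds * 1000u;
}

/* Fill an SG_IO header for TEST UNIT READY (6-byte CDB, opcode 0x00,
 * no data transfer) with the given timeout in seconds. */
static void example_fill_tur(sg_io_hdr_t *hdr, unsigned char cdb[6],
                             unsigned char *sense, unsigned char sense_len,
                             unsigned int timeout_s)
{
    memset(cdb, 0, 6);               /* TEST UNIT READY is all zeros */
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';         /* required by the sg driver */
    hdr->dxfer_direction = SG_DXFER_NONE;
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->mx_sb_len = sense_len;      /* room for returned sense data */
    hdr->sbp = sense;
    hdr->timeout = example_sg_timeout_ms(timeout_s);
}
```

With a finite timeout the host kernel's SCSI midlayer can eventually abort the command and return an error to the guest instead of waiting forever; the trade-off is the one discussed in the reply, since an aborted command may still be sitting somewhere in the fabric.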
The UINT_MAX timeout predates me, but I think the idea was to make it
sort of like NFS's hard option. Without a timeout you cannot be quite
sure whether, or for how long, the command will stay in some buffer of
the HBA, the SAN or the target, and there could be unintended reordering
of writes.
Though I guess at some point you'll restart the VM on another host
anyway, and the same reordering can happen there, so I've queued the patch.
Paolo