include/block/aio.h | 7 ++++++- hw/scsi/virtio-scsi.c | 7 ++++++- hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+), 2 deletions(-)
Hi, When registering callbacks via aio_set_event_notifier_poll(), the io_poll_end() callback is only invoked when polling actually ends. If the notifiers are removed while in a polling section, it is not called. Therefore, io_poll_start() is not necessarily followed up by io_poll_end(). It is not entirely clear whether this is good or bad behavior. On one hand, it may be unexpected to callers. On the other, it may be counterproductive to call io_poll_end() when the polling section has not ended yet. Right now, there is only one user of aio_set_event_notifier(), which is virtio_queue_aio_attach_host_notifier(). It does not expect this behavior, which leads to virtqueue notifiers remaining disabled if virtio_queue_aio_detach_host_notifier() is called while polling. That can happen e.g. through virtio_scsi_drained_begin() or virtio_blk_drained_begin() (through virtio_blk_data_plane_detach()). In such a case, the virtqueue may not be processed for a while, letting the guest driver hang. This can be reproduced by repeatedly hot-plugging and -unplugging a virtio-scsi device with a scsi-hd disk, because the guest will try to enumerate the virtio-scsi device while we’re attaching the scsi-hd disk, which causes a drain, which can cause the virtio-scsi virtqueue to stall as described. Stefan has suggested ensuring we always follow up io_poll_start() by io_poll_end(): https://lists.nongnu.org/archive/html/qemu-block/2023-12/msg00163.html I prefer changing the caller instead, because I don’t think we actually want the virtqueue notifier to be force-enabled when removing our AIO notifiers. So I believe we actually only want to take care to force-enable it when we re-attach the AIO notifiers, and to kick virtqueue processing once, in case we missed any events while the AIO notifiers were not attached. That is done by patch 2. We have already discussed a prior version of it here: https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00001.html And compared to that, based on the discussion, there are some changes: 1. Used virtio_queue_notify() instead of virtio_queue_notify_vq(), as suggested by Paolo, because it’s thread-safe 2. Moved virtio_queue_notify() into virtio_queue_aio_attach_host_notifier*(), because we always want it 3. Dropped virtio_queue_set_notification(vq, 0) from virtio_queue_aio_detach_host_notifier(): Paolo wasn’t sure whether that was safe to do from any context. We don’t really need to call it anyway, so I just dropped it. 4. Added patch 1: Patch 1 fixes virtio_scsi_drained_end() so it won’t attach polling notifiers for the event virtqueue. That didn’t turn out to be an issue so far, but with patch 2, Fiona saw the virtqueue processing queue spinning in a loop as described in 38738f7dbbda90fbc161757b7f4be35b52205552 ("virtio-scsi: don't waste CPU polling the event virtqueue"). Note that as of eaad0fe26050c227dc5dad63205835bac4912a51 ("scsi: only access SCSIDevice->requests from one thread") there’s a different problem when trying to reproduce the bug via hot-plugging and -unplugging a virtio-scsi device, specifically, when unplugging, qemu may crash with an assertion failure[1]. I don’t have a full fix for that yet, but in case you need a work-around for the specific case of virtio-scsi hot-plugging and -unplugging, you can use this patch: https://czenczek.de/0001-DONTMERGE-Fix-crash-on-scsi-unplug.patch [1] https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00317.html Hanna Czenczek (2): virtio-scsi: Attach event vq notifier with no_poll virtio: Keep notifications disabled during drain include/block/aio.h | 7 ++++++- hw/scsi/virtio-scsi.c | 7 ++++++- hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+), 2 deletions(-) -- 2.43.0
On Wed, Jan 24, 2024 at 06:38:28PM +0100, Hanna Czenczek wrote: > Hi, > > When registering callbacks via aio_set_event_notifier_poll(), the > io_poll_end() callback is only invoked when polling actually ends. If > the notifiers are removed while in a polling section, it is not called. > Therefore, io_poll_start() is not necessarily followed up by > io_poll_end(). > > It is not entirely clear whether this is good or bad behavior. On one > hand, it may be unexpected to callers. On the other, it may be > counterproductive to call io_poll_end() when the polling section has not > ended yet. > > Right now, there is only one user of aio_set_event_notifier(), which is > virtio_queue_aio_attach_host_notifier(). It does not expect this > behavior, which leads to virtqueue notifiers remaining disabled if > virtio_queue_aio_detach_host_notifier() is called while polling. That > can happen e.g. through virtio_scsi_drained_begin() or > virtio_blk_drained_begin() (through virtio_blk_data_plane_detach()). > In such a case, the virtqueue may not be processed for a while, letting > the guest driver hang. This can be reproduced by repeatedly > hot-plugging and -unplugging a virtio-scsi device with a scsi-hd disk, > because the guest will try to enumerate the virtio-scsi device while > we’re attaching the scsi-hd disk, which causes a drain, which can cause > the virtio-scsi virtqueue to stall as described. > > Stefan has suggested ensuring we always follow up io_poll_start() by > io_poll_end(): > > https://lists.nongnu.org/archive/html/qemu-block/2023-12/msg00163.html > > I prefer changing the caller instead, because I don’t think we actually > want the virtqueue notifier to be force-enabled when removing our AIO > notifiers. So I believe we actually only want to take care to > force-enable it when we re-attach the AIO notifiers, and to kick > virtqueue processing once, in case we missed any events while the AIO > notifiers were not attached. > > That is done by patch 2. We have already discussed a prior version of > it here: > > https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00001.html > > And compared to that, based on the discussion, there are some changes: > 1. Used virtio_queue_notify() instead of virtio_queue_notify_vq(), as > suggested by Paolo, because it’s thread-safe > 2. Moved virtio_queue_notify() into > virtio_queue_aio_attach_host_notifier*(), because we always want it > 3. Dropped virtio_queue_set_notification(vq, 0) from > virtio_queue_aio_detach_host_notifier(): Paolo wasn’t sure whether > that was safe to do from any context. We don’t really need to call > it anyway, so I just dropped it. > 4. Added patch 1: > > Patch 1 fixes virtio_scsi_drained_end() so it won’t attach polling > notifiers for the event virtqueue. That didn’t turn out to be an issue > so far, but with patch 2, Fiona saw the virtqueue processing queue > spinning in a loop as described in > 38738f7dbbda90fbc161757b7f4be35b52205552 ("virtio-scsi: don't waste CPU > polling the event virtqueue"). > > > Note that as of eaad0fe26050c227dc5dad63205835bac4912a51 ("scsi: only > access SCSIDevice->requests from one thread") there’s a different > problem when trying to reproduce the bug via hot-plugging and > -unplugging a virtio-scsi device, specifically, when unplugging, qemu > may crash with an assertion failure[1]. I don’t have a full fix for > that yet, but in case you need a work-around for the specific case of > virtio-scsi hot-plugging and -unplugging, you can use this patch: > > https://czenczek.de/0001-DONTMERGE-Fix-crash-on-scsi-unplug.patch > > > [1] https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00317.html > > > Hanna Czenczek (2): > virtio-scsi: Attach event vq notifier with no_poll > virtio: Keep notifications disabled during drain > > include/block/aio.h | 7 ++++++- > hw/scsi/virtio-scsi.c | 7 ++++++- > hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 54 insertions(+), 2 deletions(-) This patch series also fixes RHEL-7356. Buglink: https://issues.redhat.com/browse/RHEL-7356. Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
© 2016 - 2024 Red Hat, Inc.