Various components register vm change state handlers to restart device
emulation when the guest is unpaused. These handlers run in an
arbitrary order since there is no way to specify explicit dependencies
or priorities.
Each SCSIDevice has a vm change state handler to restart failed I/O
requests when the guest is unpaused. It schedules a BH in the
AioContext of the BlockBackend.
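For reference, the pre-patch handler has roughly the following shape (a
simplified sketch of the hw/scsi/scsi-bus.c code being changed below; the
registration call is shown as a comment):

/* Simplified sketch of the pre-patch restart path: the vm change state
 * handler runs in the main loop when the guest resumes and schedules a BH
 * in the BlockBackend's AioContext to retry failed requests.
 */
static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
{
    SCSIDevice *s = opaque;

    if (!running) {
        return;
    }

    if (!s->bh) {
        AioContext *ctx = blk_get_aio_context(s->conf.blk);

        /* scsi_dma_restart_bh() retries or fails the pending requests */
        s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
        qemu_bh_schedule(s->bh);
    }
}

/* Registered per device at realize time, roughly:
 *   qemu_add_vm_change_state_handler(scsi_dma_restart_cb, dev);
 */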
When virtio-scsi is used with an iothread, the BH may execute in the
iothread while the main loop thread is invoking the remaining vm change
state handlers. In this case the virtio-scsi iothread may not be fully
started yet, leading to problems.
One symptom is that I/O request completion is processed in the iothread
before the virtio-scsi iothread is fully started and the MSI notify code
path takes the BQL. This violates QEMU's lock order and causes a
deadlock.
This patch defers restarting SCSIDevice requests until after all vm
change state handlers have completed. It's an ugly fix because we're
taking advantage of side-effects instead of explicitly introducing
dependencies that are visible in the source code, but I haven't found a
cleaner solution that isn't also complex and hard to understand.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
This is RFC because I am waiting for a test result on the system where
the bug was originally discovered. I'm also open to nicer solutions!
hw/scsi/scsi-bus.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c480553083..13b3823752 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -134,13 +134,10 @@ void scsi_req_retry(SCSIRequest *req)
req->retry = true;
}
-static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
+static void scsi_device_retry_reqs_bh(void *opaque)
{
SCSIDevice *s = opaque;
- if (!running) {
- return;
- }
if (!s->bh) {
AioContext *ctx = blk_get_aio_context(s->conf.blk);
s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
@@ -148,6 +145,22 @@ static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
}
}
+static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
+{
+ SCSIDevice *s = opaque;
+
+ if (!running) {
+ return;
+ }
+
+ /* Defer to a main loop BH since other vm change state handlers may need to
+ * run before the bus is ready for I/O activity (e.g. virtio-scsi's
+ * iothread setup).
+ */
+ aio_bh_schedule_oneshot(qemu_get_aio_context(),
+ scsi_device_retry_reqs_bh, s);
+}
+
static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
{
SCSIDevice *dev = SCSI_DEVICE(qdev);
--
2.21.0
On 21/05/19 12:36, Stefan Hajnoczi wrote:
> This is RFC because I am waiting for a test result on the system where
> the bug was originally discovered. I'm also open to nicer solutions!

I don't think it's too ugly; IDE is also using a bottom half for this.

Paolo
On 21.05.2019 at 13:04, Paolo Bonzini wrote:
> On 21/05/19 12:36, Stefan Hajnoczi wrote:
> > This is RFC because I am waiting for a test result on the system where
> > the bug was originally discovered. I'm also open to nicer solutions!
>
> I don't think it's too ugly; IDE is also using a bottom half for this.

I think the IDE case is different, see commit 213189ab65d. The case
we're protecting against there is stopping the VM from inside a VM state
handler, which can confuse other VM state callbacks that come later. The
actual order of the IDE callback vs. the other callback doesn't matter,
it's just important that all start callbacks are completed before stop
callbacks are called.

In our current case, the problem is not that we're confusing other
handlers, but that we rely on another handler to have completed resuming
something. If that other handler changes e.g. to use a BH itself, we get
an undefined order again.

The clean solution would probably be not to use a VM state handler in
scsi-bus, but a callback from the HBA that tells the bus that the HBA is
ready to receive requests again.

If we go with the not so clean solution, maybe at least a comment in
virtio-scsi would be in order.

Kevin
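A minimal sketch of the callback-based approach Kevin describes, assuming a
hypothetical scsi_bus_dma_restart() helper (illustrative name only, not an
existing QEMU API; it reuses the per-device BH from the current code):

/* Hypothetical bus-side helper: instead of each SCSIDevice registering its
 * own vm change state handler, the HBA tells the bus explicitly when it is
 * ready to handle I/O again.  Reuses the existing per-device BH that
 * retries failed requests (scsi_dma_restart_bh).
 */
void scsi_bus_dma_restart(SCSIBus *bus)
{
    BusChild *kid;

    QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
        SCSIDevice *dev = SCSI_DEVICE(kid->child);

        if (!dev->bh) {
            AioContext *ctx = blk_get_aio_context(dev->conf.blk);

            dev->bh = aio_bh_new(ctx, scsi_dma_restart_bh, dev);
            qemu_bh_schedule(dev->bh);
        }
    }
}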
On Tue, May 21, 2019 at 01:30:59PM +0200, Kevin Wolf wrote:
> On 21.05.2019 at 13:04, Paolo Bonzini wrote:
> > On 21/05/19 12:36, Stefan Hajnoczi wrote:
> > > This is RFC because I am waiting for a test result on the system where
> > > the bug was originally discovered. I'm also open to nicer solutions!
> >
> > I don't think it's too ugly; IDE is also using a bottom half for this.
>
> I think the IDE case is different, see commit 213189ab65d. The case
> we're protecting against there is stopping the VM from inside a VM state
> handler, which can confuse other VM state callbacks that come later. The
> actual order of the IDE callback vs. the other callback doesn't matter,
> it's just important that all start callbacks are completed before stop
> callbacks are called.
>
> In our current case, the problem is not that we're confusing other
> handlers, but that we rely on another handler to have completed resuming
> something. If that other handler changes e.g. to use a BH itself, we get
> an undefined order again.
>
> The clean solution would probably be not to use a VM state handler in
> scsi-bus, but a callback from the HBA that tells the bus that the HBA is
> ready to receive requests again.
>
> If we go with the not so clean solution, maybe at least a comment in
> virtio-scsi would be in order.

I explored this approach originally but found it hard to connect things
together in an easy-to-understand way. That's when I abandoned the idea
and used a BH as a hack, but I find it problematic in the long-term (too
many things could go wrong and cause a regression).

Time for another look at a proper callback for DMA restart...

Stefan
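To make that concrete, the HBA side of such a DMA-restart callback might
look roughly like this (hypothetical sketch; the function name is made up
and virtio-scsi has no such helper today):

/* Hypothetical call site (illustrative, not current QEMU code): virtio-scsi
 * would invoke the new bus callback at the point where it knows its
 * iothread is fully started, e.g. at the end of its dataplane start path.
 */
static void virtio_scsi_hba_ready(VirtIOSCSI *s)
{
    /* ... host notifiers assigned, iothread running ... */

    /* Now it is safe for request completions to run in the iothread */
    scsi_bus_dma_restart(&s->bus);
}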