[PATCH 13/17] hw/scsi: Convert to new bh API

Nicholas Piggin posted 17 patches 3 days, 12 hours ago
[PATCH 13/17] hw/scsi: Convert to new bh API
Posted by Nicholas Piggin 3 days, 12 hours ago
Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
which can specify the clock type, making it compatible with
record-replay.

Operations on SCSI reqs do affect target machine state, so it should
use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.

This fixes hangs in record/replay when using SCSI devices.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/scsi/scsi-bus.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 2f1678d51e7..16f294ce6b7 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -166,9 +166,17 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
 
     /* Paired with blk_dec_in_flight() in scsi_device_for_each_req_async_bh() */
     blk_inc_in_flight(s->conf.blk);
-    aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
-                            scsi_device_for_each_req_async_bh,
-                            data);
+
+    /*
+     * This is called by device reset and does not affect the observable state
+     * of the target (because it is being reset), and by scsi_dma_restart_cb
+     * to restart DMA on vmstate change which also should not affect the state
+     * of the target (XXX is this really true?), so QEMU_CLOCK_REALTIME should
+     * be used to avoid record-replay of the bh event.
+     */
+    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
+                                  scsi_device_for_each_req_async_bh,
+                                  data, QEMU_CLOCK_REALTIME);
 }
 
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
-- 
2.45.2
Re: [PATCH 13/17] hw/scsi: Convert to new bh API
Posted by Paolo Bonzini 2 days, 23 hours ago
Il ven 20 dic 2024, 11:44 Nicholas Piggin <npiggin@gmail.com> ha scritto:

> Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
> which can specify the clock type, making it compatible with
> record-replay.
>
> Operations on SCSI reqs do affect target machine state, so it should
> use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.
>

This does not seem to match the patch below?

Paolo

+    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
> +                                  scsi_device_for_each_req_async_bh,
> +                                  data, QEMU_CLOCK_REALTIME);
>  }
>
>  static void scsi_device_realize(SCSIDevice *s, Error **errp)
> --
> 2.45.2
>
>
Re: [PATCH 13/17] hw/scsi: Convert to new bh API
Posted by Nicholas Piggin 2 days, 19 hours ago
On Sat Dec 21, 2024 at 9:54 AM AEST, Paolo Bonzini wrote:
> Il ven 20 dic 2024, 11:44 Nicholas Piggin <npiggin@gmail.com> ha scritto:
>
> > Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
> > which can specify the clock type, making it compatible with
> > record-replay.
> >
> > Operations on SCSI reqs do affect target machine state, so it should
> > use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.
> >
>
> This does not seem to match the patch below?

Ah, good catch, I missed fixing the changelog.

I think scsi_device_purge_requests() does not affect target because
it is called on machine reset so the state is all going away anyway.

But initially I thought scsi_dma_restart_cb did, like the ide
restart (which was a real bug). But that caused record-replay hangs
because it is a vm_change_state handler, so I took another looks and
it seems like it perhaps just kicks the host DMA running again and
perhaps it is okay to be outside record-replay. I'm completely
confident of this though.

And now you make me look again, the IDE restart is also a vm change
state handler. So my patch for that does not solve all problems
either (it's better than nothing, but still has this bug).

AFAIK, vm state change (stop, continue) should ideally not affect
machine or emulated devices right? Is it possible to split out
the architectural SCSI/IDE restart from the DMA restart that is
reqiured by vm state change?

At least I will redo the patches and leave a comment and a qemu log
message if we hit the case of IDE vmstate change with record
replay active...

Thanks,
Nick

>
> Paolo
>
> +    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
> > +                                  scsi_device_for_each_req_async_bh,
> > +                                  data, QEMU_CLOCK_REALTIME);
> >  }
> >
> >  static void scsi_device_realize(SCSIDevice *s, Error **errp)
> > --
> > 2.45.2
> >
> >