Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
before main() quits") introduced iothread_stop_all() to avoid the
following virtio-scsi assertion failure:
assert(blk_get_aio_context(d->conf.blk) == s->ctx);
Back then the assertion failed because when bdrv_close_all() made
d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
instead of s->ctx.
The same assertion can still fail today when vcpus submit new I/O
requests after iothread_stop_all() has moved the BDS to the global
AioContext.
This patch hardens the iothread_stop_all() approach by pausing vcpus
before calling iothread_stop_all().
Note that the assertion failure is a race condition. It is not possible
to reproduce it reliably.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
vl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/vl.c b/vl.c
index e517a8d995..011f22ae20 100644
--- a/vl.c
+++ b/vl.c
@@ -4765,10 +4765,10 @@ int main(int argc, char **argv, char **envp)
os_setup_post();
main_loop();
+
replay_disable_events();
+ pause_all_vcpus();
iothread_stop_all();
-
- pause_all_vcpus();
bdrv_close_all();
res_free();
--
2.14.3
Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben: > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads > before main() quits") introduced iothread_stop_all() to avoid the > following virtio-scsi assertion failure: > > assert(blk_get_aio_context(d->conf.blk) == s->ctx); > > Back then the assertion failed because when bdrv_close_all() made > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext > instead of s->ctx. > > The same assertion can still fail today when vcpus submit new I/O > requests after iothread_stop_all() has moved the BDS to the global > AioContext. > > This patch hardens the iothread_stop_all() approach by pausing vcpus > before calling iothread_stop_all(). > > Note that the assertion failure is a race condition. It is not possible > to reproduce it reliably. > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Does pausing the vcpus actually make sure that the iothread isn't active any more, or do we still have a small window where the vcpu is already stopped, but the iothread is still processing requests? Essentially, I think the bdrv_set_aio_context() in iothread_stop_all() does either not have any effect, or if it does have an effect, it's wrong. You can't just force an in-use BDS into a different AioContext when the user that set the AioContext is still there. At the very least, do we need a blk_drain_all() before stopping the iothreads? It would still just be a hack, the proper way seens to be getting the virtio device out of dataplane mode so that the iothread is actually unused and doesn't just happen to not process something at the moment. Kevin > vl.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/vl.c b/vl.c > index e517a8d995..011f22ae20 100644 > --- a/vl.c > +++ b/vl.c > @@ -4765,10 +4765,10 @@ int main(int argc, char **argv, char **envp) > os_setup_post(); > > main_loop(); > + > replay_disable_events(); > + pause_all_vcpus(); > iothread_stop_all(); > - > - pause_all_vcpus(); > bdrv_close_all(); > res_free(); > > -- > 2.14.3 >
On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote: > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben: > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads > > before main() quits") introduced iothread_stop_all() to avoid the > > following virtio-scsi assertion failure: > > > > assert(blk_get_aio_context(d->conf.blk) == s->ctx); > > > > Back then the assertion failed because when bdrv_close_all() made > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext > > instead of s->ctx. > > > > The same assertion can still fail today when vcpus submit new I/O > > requests after iothread_stop_all() has moved the BDS to the global > > AioContext. > > > > This patch hardens the iothread_stop_all() approach by pausing vcpus > > before calling iothread_stop_all(). > > > > Note that the assertion failure is a race condition. It is not possible > > to reproduce it reliably. > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > Does pausing the vcpus actually make sure that the iothread isn't active > any more, or do we still have a small window where the vcpu is already > stopped, but the iothread is still processing requests? > > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all() > does either not have any effect, or if it does have an effect, it's > wrong. You can't just force an in-use BDS into a different AioContext > when the user that set the AioContext is still there. > > At the very least, do we need a blk_drain_all() before stopping the > iothreads? bdrv_set_aio_context() contains aio_disable_external() + bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all requests, even those sitting in a descriptor ring that hasn't been processed yet. > It would still just be a hack, the proper way seens to be > getting the virtio device out of dataplane mode so that the iothread is > actually unused and doesn't just happen to not process something at the > moment. Agreed, the existing approach is a hack. I'm not keen on implementing a proper device<->IOThread detach operation because vl.c:main() seems to be the only place that needs it - and it can get away with just quiescing requests and the IOThread instead. Stefan
Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben: > On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote: > > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben: > > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads > > > before main() quits") introduced iothread_stop_all() to avoid the > > > following virtio-scsi assertion failure: > > > > > > assert(blk_get_aio_context(d->conf.blk) == s->ctx); > > > > > > Back then the assertion failed because when bdrv_close_all() made > > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext > > > instead of s->ctx. > > > > > > The same assertion can still fail today when vcpus submit new I/O > > > requests after iothread_stop_all() has moved the BDS to the global > > > AioContext. > > > > > > This patch hardens the iothread_stop_all() approach by pausing vcpus > > > before calling iothread_stop_all(). > > > > > > Note that the assertion failure is a race condition. It is not possible > > > to reproduce it reliably. > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > Does pausing the vcpus actually make sure that the iothread isn't active > > any more, or do we still have a small window where the vcpu is already > > stopped, but the iothread is still processing requests? > > > > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all() > > does either not have any effect, or if it does have an effect, it's > > wrong. You can't just force an in-use BDS into a different AioContext > > when the user that set the AioContext is still there. > > > > At the very least, do we need a blk_drain_all() before stopping the > > iothreads? > > bdrv_set_aio_context() contains aio_disable_external() + > bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all > requests, even those sitting in a descriptor ring that hasn't been > processed yet. Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can have my R-b either way then: Reviewed-by: Kevin Wolf <kwolf@redhat.com> > > It would still just be a hack, the proper way seens to be > > getting the virtio device out of dataplane mode so that the iothread is > > actually unused and doesn't just happen to not process something at the > > moment. > > Agreed, the existing approach is a hack. I'm not keen on implementing > a proper device<->IOThread detach operation because vl.c:main() seems to > be the only place that needs it - and it can get away with just > quiescing requests and the IOThread instead. As long as we don't want to switch devices between iothreads at runtime (and create/delete iothreads over QMP), we probably won't really need it, yes. Kevin
On Wed, Jan 31, 2018 at 03:31:27PM +0100, Kevin Wolf wrote: > Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben: > > On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote: > > > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben: > > > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads > > > > before main() quits") introduced iothread_stop_all() to avoid the > > > > following virtio-scsi assertion failure: > > > > > > > > assert(blk_get_aio_context(d->conf.blk) == s->ctx); > > > > > > > > Back then the assertion failed because when bdrv_close_all() made > > > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext > > > > instead of s->ctx. > > > > > > > > The same assertion can still fail today when vcpus submit new I/O > > > > requests after iothread_stop_all() has moved the BDS to the global > > > > AioContext. > > > > > > > > This patch hardens the iothread_stop_all() approach by pausing vcpus > > > > before calling iothread_stop_all(). > > > > > > > > Note that the assertion failure is a race condition. It is not possible > > > > to reproduce it reliably. > > > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > Does pausing the vcpus actually make sure that the iothread isn't active > > > any more, or do we still have a small window where the vcpu is already > > > stopped, but the iothread is still processing requests? > > > > > > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all() > > > does either not have any effect, or if it does have an effect, it's > > > wrong. You can't just force an in-use BDS into a different AioContext > > > when the user that set the AioContext is still there. > > > > > > At the very least, do we need a blk_drain_all() before stopping the > > > iothreads? > > > > bdrv_set_aio_context() contains aio_disable_external() + > > bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all > > requests, even those sitting in a descriptor ring that hasn't been > > processed yet. > > Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can > have my R-b either way then: > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> Thanks for the review! I have sent v2 with a comment for you to review. Stefan
© 2016 - 2024 Red Hat, Inc.