[PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()

Kevin Wolf posted 2 patches 1 year, 2 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230905145002.46391-1-kwolf@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>
[PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
Posted by Kevin Wolf 1 year, 2 months ago
This fixes a recently introduced assertion failure that was reported to
happen when migrating virtio-net with a failover. The latent bug that
we're executing code in coroutine context that was never supposed to run
there has existed for a long time. However, the new assertion that
callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
makes it very visible because it's now always a crash.

Kevin Wolf (2):
  vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
  virtio: Drop out of coroutine context in virtio_load()

 include/migration/vmstate.h |  8 ++++---
 hw/virtio/virtio.c          | 45 ++++++++++++++++++++++++++++++++-----
 2 files changed, 45 insertions(+), 8 deletions(-)

-- 
2.41.0
Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
Posted by Stefan Hajnoczi 1 year, 2 months ago
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
> 
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
> 
>  include/migration/vmstate.h |  8 ++++---
>  hw/virtio/virtio.c          | 45 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 45 insertions(+), 8 deletions(-)
> 
> -- 
> 2.41.0
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
Posted by Stefan Hajnoczi 1 year, 2 months ago
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
> 
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
> 
>  include/migration/vmstate.h |  8 ++++---
>  hw/virtio/virtio.c          | 45 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 45 insertions(+), 8 deletions(-)

This looks like a bandaid for a specific instance of this problem rather
than a solution that takes care of the root cause.

Is it possible to make VMStateInfo.get/put() consistently coroutine_fn?

Stefan
Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()
Posted by Kevin Wolf 1 year, 2 months ago
On 07.09.2023 at 20:42, Stefan Hajnoczi wrote:
> On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> > This fixes a recently introduced assertion failure that was reported to
> > happen when migrating virtio-net with a failover. The latent bug that
> > we're executing code in coroutine context that was never supposed to run
> > there has existed for a long time. However, the new assertion that
> > callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> > makes it very visible because it's now always a crash.
> > 
> > Kevin Wolf (2):
> >   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
> >   virtio: Drop out of coroutine context in virtio_load()
> > 
> >  include/migration/vmstate.h |  8 ++++---
> >  hw/virtio/virtio.c          | 45 ++++++++++++++++++++++++++++++++-----
> >  2 files changed, 45 insertions(+), 8 deletions(-)
> 
> This looks like a bandaid for a specific instance of this problem
> rather than a solution that takes care of the root cause.
> 
> Is it possible to make VMStateInfo.get/put() consistently coroutine_fn?

I think it is. Note that this doesn't solve the problem, though: virtio_load()
calls functions that must run _outside_ coroutine context. So once the
migration code is cleaned up to consistently run in coroutine context,
you can remove the check and the one line for the !qemu_in_coroutine()
case from this series. The rest stays as it is.

It is not a solution that takes care of the root cause, but I also can't
think of one. The problem is that VMState callbacks both read/write the
migration stream (which should be done in coroutine context) and set the
device state (which can involve functions that must not run in coroutine
context). Untangling this, if possible at all, is not easy and certainly
not something for stable releases.

Kevin
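
For readers who want to see the shape of "dropping out of coroutine context", here is a minimal toy sketch. It is not the code from this series and it does not use QEMU's APIs: it models a single coroutine with POSIX ucontext, and every name in it (in_coroutine, pending_bh, LoadData, device_load(), set_device_state(), run_load_in_coroutine()) is hypothetical. The assumption is only the pattern described in the thread: if the load function finds itself in a coroutine, it hands the device-state part to a callback that the main loop runs, yields, and continues after being re-entered; otherwise it calls the same callback directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <ucontext.h>

/* Toy model only: one coroutine and one pending "bottom half" (BH). */
static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];
static bool in_coroutine;            /* stand-in for a qemu_in_coroutine()-style check */
static void (*pending_bh)(void *);   /* callback the main loop should run */
static void *pending_bh_opaque;
static int co_result;
static bool co_finished;

typedef struct LoadData { int ret; bool done; } LoadData;  /* hypothetical */

/* Models code that must never run in coroutine context. */
static int set_device_state(void)
{
    assert(!in_coroutine);
    return 0;
}

/* The part of the load that is moved out of the coroutine. */
static void load_bh(void *opaque)
{
    LoadData *d = opaque;
    d->ret = set_device_state();
    d->done = true;
}

/* Switch from the coroutine back to the main loop. */
static void toy_yield(void)
{
    swapcontext(&co_ctx, &main_ctx);
}

/* Callable from both contexts, in the spirit of coroutine_mixed_fn. */
static int device_load(void)
{
    LoadData d = { .ret = -1, .done = false };

    if (!in_coroutine) {
        load_bh(&d);             /* the one line for the non-coroutine case */
        return d.ret;
    }
    pending_bh = load_bh;        /* ask the main loop to run the callback */
    pending_bh_opaque = &d;
    toy_yield();                 /* resumed only after the callback has run */
    assert(d.done);
    return d.ret;
}

static void co_entry(void)
{
    co_result = device_load();
    co_finished = true;          /* returning resumes main_ctx via uc_link */
}

/* Enter (or re-enter) the coroutine until it yields or terminates. */
static void co_enter(void)
{
    in_coroutine = true;
    swapcontext(&main_ctx, &co_ctx);
    in_coroutine = false;
}

/* Run device_load() inside a coroutine, driven by a tiny main loop. */
static int run_load_in_coroutine(void)
{
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    co_finished = false;
    co_enter();
    while (!co_finished) {
        if (pending_bh) {        /* main loop: run the BH outside the coroutine */
            void (*bh)(void *) = pending_bh;
            pending_bh = NULL;
            bh(pending_bh_opaque);
        }
        co_enter();              /* wake the coroutine again */
    }
    return co_result;
}
```

Calling run_load_in_coroutine() exercises the yielding path, and calling device_load() directly from the main context exercises the direct path. In both cases set_device_state() runs with in_coroutine false, which is the property the assertion in the thread is about. If the callers were later changed to always run in a coroutine, only the `!in_coroutine` branch would become dead in this model.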