[PATCH 0/2] python/qemu/machine: fix potential hang in QMP accept

marcandre.lureau@redhat.com posted 2 patches 3 years, 10 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20220628134939.680174-1-marcandre.lureau@redhat.com
Maintainers: John Snow <jsnow@redhat.com>, Cleber Rosa <crosa@redhat.com>, Beraldo Leal <bleal@redhat.com>
There is a newer version of this series
Posted by marcandre.lureau@redhat.com 3 years, 10 months ago
From: Marc-André Lureau <marcandre.lureau@redhat.com>

Hi,

As reported earlier by Richard Henderson ("virgl avocado hang" thread), avocado
tests may hang when QEMU exits before the QMP connection is established.

My proposal to fix the problem here is to do both accept() and wait()
concurrently by turning some code async. Obviously, there is much larger
work to be done to turn more code into async and avoid _sync() wrappers, but
I do not intend to tackle that.
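
For illustration only, here is a minimal sketch of the idea, not the code
from the patches. It assumes a Unix socket listener and a child that connects
to it; launch_and_accept() and ChildExitedEarly are hypothetical names made up
for this example. The point is that the accept and the wait on the child are
awaited together, so an early exit is reported instead of hanging in accept().

```python
import asyncio


class ChildExitedEarly(Exception):
    """Raised when the child exits before it connects (hypothetical name)."""

    def __init__(self, returncode):
        super().__init__(f"child exited with status {returncode} before connecting")
        self.returncode = returncode


async def launch_and_accept(argv, sock_path, timeout=30.0):
    """Start argv and wait for either its connection or its exit."""
    loop = asyncio.get_running_loop()
    connected = loop.create_future()

    def on_connect(reader, writer):
        # Keep the first connection only; refuse any later one.
        if connected.done():
            writer.close()
        else:
            connected.set_result((reader, writer))

    # Listen first, so the child can never connect too early.
    server = await asyncio.start_unix_server(on_connect, path=sock_path)
    proc = await asyncio.create_subprocess_exec(*argv)
    exited = asyncio.ensure_future(proc.wait())
    try:
        # Wait for whichever happens first: a connection or the child's exit.
        done, _ = await asyncio.wait(
            {connected, exited},
            timeout=timeout,
            return_when=asyncio.FIRST_COMPLETED,
        )
    finally:
        server.close()   # stop listening; an accepted connection stays open
        exited.cancel()  # no-op if the child has already exited

    if connected in done:
        reader, writer = connected.result()
        return proc, reader, writer
    if exited in done:
        raise ChildExitedEarly(exited.result())

    # Neither happened within the timeout: clean up the child and report it.
    proc.kill()
    await proc.wait()
    raise asyncio.TimeoutError("child neither connected nor exited in time")
```

A synchronous caller would still need a wrapper such as asyncio.run() around
this coroutine, which is the kind of _sync() glue mentioned above.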

Please comment/review

Marc-André Lureau (2):
  python/qemu/machine: replace subprocess.Popen with asyncio
  python/qemu/machine: accept QMP connection asynchronously

 python/qemu/machine/machine.py | 58 ++++++++++++++++++++++++----------
 python/qemu/qmp/legacy.py      | 10 ++++++
 2 files changed, 51 insertions(+), 17 deletions(-)

-- 
2.37.0.rc0


Re: [PATCH 0/2] python/qemu/machine: fix potential hang in QMP accept
Posted by John Snow 3 years, 10 months ago
On Tue, Jun 28, 2022 at 9:49 AM <marcandre.lureau@redhat.com> wrote:
>
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> Hi,
>
> As reported earlier by Richard Henderson ("virgl avocado hang" thread), avocado
> tests may hang when QEMU exits before the QMP connection is established.
>
> My proposal to fix the problem here is to do both accept() and wait()
> concurrently by turning some code async. Obviously, there is much larger
> work to be done to turn more code into async and avoid _sync() wrappers, but
> I do not intend to tackle that.
>
> Please comment/review
>

This has been on my list, it's been a problem for a while. If this
series doesn't regress anything, I'm happy to take it. It'd be nice to
get a proper "idiomatic" asyncio Machine class, but that can wait. I
just got back from a vacation trip, please harass me in a few days if
I haven't cleared this off my to-do list.

Thanks,

--js

> Marc-André Lureau (2):
>   python/qemu/machine: replace subprocess.Popen with asyncio
>   python/qemu/machine: accept QMP connection asynchronously
>
>  python/qemu/machine/machine.py | 58 ++++++++++++++++++++++++----------
>  python/qemu/qmp/legacy.py      | 10 ++++++
>  2 files changed, 51 insertions(+), 17 deletions(-)
>
> --
> 2.37.0.rc0
>

Re: [PATCH 0/2] python/qemu/machine: fix potential hang in QMP accept
Posted by Marc-André Lureau 3 years, 10 months ago
Hi

On Tue, Jun 28, 2022 at 9:08 PM John Snow <jsnow@redhat.com> wrote:
>
> On Tue, Jun 28, 2022 at 9:49 AM <marcandre.lureau@redhat.com> wrote:
> >
> > From: Marc-André Lureau <marcandre.lureau@redhat.com>
> >
> > Hi,
> >
> > As reported earlier by Richard Henderson ("virgl avocado hang" thread), avocado
> > tests may hang when QEMU exits before the QMP connection is established.
> >
> > My proposal to fix the problem here is to do both accept() and wait()
> > concurrently by turning some code async. Obviously, there is much larger
> > work to be done to turn more code into async and avoid _sync() wrappers, but
> > I do not intend to tackle that.
> >
> > Please comment/review
> >
>
> This has been on my list, it's been a problem for a while. If this
> series doesn't regress anything, I'm happy to take it. It'd be nice to
> get a proper "idiomatic" asyncio Machine class, but that can wait. I
> just got back from a vacation trip, please harass me in a few days if
> I haven't cleared this off my to-do list.
>

It has a few regressions. Plus I am considering Daniel's suggestions.
Let me revisit first.

thanks

Re: [PATCH 0/2] python/qemu/machine: fix potential hang in QMP accept
Posted by Daniel P. Berrangé 3 years, 10 months ago
On Tue, Jun 28, 2022 at 05:49:37PM +0400, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Hi,
> 
> As reported earlier by Richard Henderson ("virgl avocado hang" thread), avocado
> tests may hang when QEMU exits before the QMP connection is established.
> 
> My proposal to fix the problem here is to do both accept() and wait()
> concurrently by turning some code async. Obviously, there is much larger
> work to be done to turn more code into async and avoid _sync() wrappers, but
> I do not intend to tackle that.

IIUC, in this case the Python API has a listener socket, and QEMU is
the client socket. As you say, this design has a built-in potential hang,
since there is no good way to be 100% sure whether a client connection is
still pending or not. The plus side is that QEMU should die when the
parent Python app goes away and the server end of the monitor FD closes.

The startup race, though, could be avoided by using FD passing with a
reversed relationship, i.e. Python opens a listener socket and passes
the pre-opened FD to the forked QEMU process. Python can then connect()
and be confident that either the connect will (eventually) succeed, or
it will definitely get a failure when QEMU exits (abnormally), because
the pre-opened listener FD will get closed.
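
A rough sketch of that flow, with a small Python child standing in for QEMU
(how QEMU itself would be told to use an inherited listener FD is not shown
and is left out on purpose). spawn_with_listener(), connect() and CHILD are
hypothetical names for this example only, and it assumes a Unix socket on a
POSIX host:

```python
import socket
import subprocess

# Stand-in for QEMU: adopt the inherited listener FD, accept one client,
# send a greeting, then exit.
CHILD = r"""
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
conn, _ = listener.accept()
conn.sendall(b"hello\n")
conn.close()
"""


def spawn_with_listener(argv_for_fd, sock_path):
    """Open the listener in the parent, hand its FD to the child, return the child."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(sock_path)
    listener.listen(1)
    fd = listener.fileno()
    # pass_fds keeps this FD open, under the same number, in the child.
    proc = subprocess.Popen(argv_for_fd(fd), pass_fds=(fd,))
    # Drop the parent's copy: from here on only the child holds the listener,
    # so the listener disappears as soon as the child exits.
    listener.close()
    return proc


def connect(sock_path):
    """Connect as the client; raises ConnectionRefusedError once the listener is gone."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(sock_path)
    except OSError:
        client.close()
        raise
    return client
```

Because the socket is already listening before the child starts, the parent
never has to guess when to connect. If the child dies after the connection
was queued but before it was accepted, the expectation is that the first read
fails or returns end-of-file rather than blocking forever.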

There would need to be another means of ensuring cleanup of QEMU
processes, though. Probably QEMU itself ought to support a flag on
the monitor to indicate that it is in "single connection" mode, such
that when the first client terminates, QEMU exits.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|