[Xen-devel] [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket

Anthony PERARD posted 4 patches 4 years, 6 months ago
Posted by Anthony PERARD 4 years, 6 months ago
Patch series available in this git branch:
https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git br.fix-ev_qmp-multi-connect-v1

Hi,

QEMU's QMP socket doesn't allow multiple concurrent connections. Also, it
calls listen() on the socket with a `backlog' of only 1. On Linux at least, once that
backlog is full, connect() will return EAGAIN if the socket fd is
non-blocking. libxl may attempt many concurrent connect() calls if, for
example, a guest is started with several PCI passthrough devices, and a
connect() failure leads to a failure to start the guest.

Since we can't change the listen() `backlog' that QEMU uses, we need other
ways to work around the issue. This patch series introduces a lock to acquire
before attempting to connect() to the QMP socket. Since the lock might be held
for too long, the series also introduces a way to cancel the acquisition of the
lock, which means killing the process that tries to get the lock.

As an alternative to this craziness, it might be possible to increase the `backlog'
value by having libxl open the QMP socket on behalf of QEMU. But this is
only possible with a recent version of QEMU (2.12 or newer, released in Apr
2018, or qemu-xen-4.12 or newer). It would involve discovering QEMU's
capabilities before we start the DM, which libxl isn't capable of yet.

Cheers,

Anthony PERARD (4):
  libxl: Introduce libxl__ev_child_kill
  libxl: Introduce libxl__ev_qmplock
  libxl: libxl__ev_qmp_send now takes an egc
  libxl_qmp: Have a lock for QMP socket access

 tools/libxl/libxl_disk.c        |  6 +--
 tools/libxl/libxl_dm.c          |  8 ++--
 tools/libxl/libxl_dom_save.c    |  2 +-
 tools/libxl/libxl_dom_suspend.c |  2 +-
 tools/libxl/libxl_domain.c      |  8 ++--
 tools/libxl/libxl_event.c       |  3 +-
 tools/libxl/libxl_fork.c        | 55 ++++++++++++++++++++++++
 tools/libxl/libxl_internal.c    | 31 +++++++++++++-
 tools/libxl/libxl_internal.h    | 53 +++++++++++++++++------
 tools/libxl/libxl_pci.c         |  8 ++--
 tools/libxl/libxl_qmp.c         | 75 +++++++++++++++++++++++++--------
 tools/libxl/libxl_usb.c         | 28 ++++++------
 12 files changed, 219 insertions(+), 60 deletions(-)

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket
Posted by Ian Jackson 4 years, 5 months ago
Hi.  Thanks for tackling this swamp.  All very unfortunate.

Anthony PERARD writes ("[RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket"):
> As an alternative to this craziness, it might be possible to increase
> the `backlog' value by having libxl open the QMP socket on behalf
> of QEMU. But this is only possible with a recent version of QEMU
> (2.12 or newer, released in Apr 2018, or qemu-xen-4.12 or newer). It
> would involve discovering QEMU's capabilities before we start the DM,
> which libxl isn't capable of yet.

I have an ancient unapplied patch somewhere which runs qemu --help
and greps the output.  If you would like, I can dig it out.

But one problem with that approach is this: without that feature in
qemu, what would we do ?  Live with the bug where domain creation
fails ?  Bodge it by serialising within domain create (awkwardating
the code) ?

I have some other suggestions which ought to be considered:


1. Send a patch to qemu upstream to allow specifying the socket listen
queue.

1(a) Expect distros to apply that patch to older qemus, if they ship
older qemus.  Have libxl unconditionally specify that argument.

1(b) grep the help output (as I propose above) and if the patch is not
present, use LD_PRELOAD to wrap listen(2).


2. Send a patch to qemu upstream to change the fixed queue length from
1 to 10000.  Expect distros to apply that patch to older qemus (even,
perhaps, if it is not accepted upstream!)  Change libxl to detect
EAGAIN from qmp connect() and print a message explaining what patch is
missing.
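For option 1(b), interposing listen(2) with LD_PRELOAD is a standard technique. A minimal sketch, assuming a hypothetical fixed minimum backlog of 128: the wrapper raises any smaller backlog and forwards to the real listen() found with dlsym(RTLD_NEXT, ...). It applies to every listening socket in the process, not only the QMP one. Build it with `gcc -shared -fPIC -o listen_shim.so listen_shim.c` (add -ldl on glibc older than 2.34; the file name is a placeholder) and start QEMU with LD_PRELOAD=./listen_shim.so. The optional self-test main() is compiled only with -DLISTEN_SHIM_SELFTEST, so the shared object does not export a main symbol.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical minimum backlog forced onto every listen() call. */
#define SHIM_MIN_BACKLOG 128

/* Interposed listen(): raise small backlogs, then call the real listen()
 * from the next object in lookup order (normally libc). The lazy
 * initialisation is not thread-safe; acceptable for a sketch. */
int listen(int sockfd, int backlog)
{
    static int (*real_listen)(int, int);

    if (!real_listen)
        real_listen = (int (*)(int, int))dlsym(RTLD_NEXT, "listen");
    if (!real_listen)
        abort(); /* cannot forward the call */
    if (backlog < SHIM_MIN_BACKLOG)
        backlog = SHIM_MIN_BACKLOG;
    return real_listen(sockfd, backlog);
}

#ifdef LISTEN_SHIM_SELFTEST
/* In an executable, the program's own listen() calls resolve to the
 * wrapper above, so a "backlog 1" listener queues several connects. */
int main(void)
{
    struct sockaddr_un addr;
    int lfd, queued = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/listen-shim-demo-%d.sock", (int)getpid());
    unlink(addr.sun_path);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0) { /* QEMU-style backlog of 1 */
        perror("setup");
        return 1;
    }
    for (int i = 0; i < 8; i++) {
        int c = socket(AF_UNIX, SOCK_STREAM, 0);
        if (c >= 0 && fcntl(c, F_SETFL, O_NONBLOCK) == 0 &&
            connect(c, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            queued++;
    }
    printf("%d of 8 non-blocking connects queued\n", queued);
    unlink(addr.sun_path);
    return 0;
}
#endif
```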


Since you have provided an implementation of the fork/lock strategy,
I'll now go and do a detailed review of that.

Thanks,
Ian.

Re: [Xen-devel] [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket
Posted by Anthony PERARD 4 years, 5 months ago
On Mon, Oct 28, 2019 at 11:25:26AM +0000, Ian Jackson wrote:
> Hi.  Thanks for tackling this swamp.  All very unfortunate.
> 
> Anthony PERARD writes ("[RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket"):
> > As an alternative to this craziness, it might be possible to increase
> > the `backlog' value by having libxl open the QMP socket on behalf
> > of QEMU. But this is only possible with a recent version of QEMU
> > (2.12 or newer, released in Apr 2018, or qemu-xen-4.12 or newer). It
> > would involve discovering QEMU's capabilities before we start the DM,
> > which libxl isn't capable of yet.
> 
> I have an ancient unapplied patch somewhere which runs qemu --help
> and greps the output.  If you would like, I can dig it out.
> 
> But one problem with that approach is this: without that feature in
> qemu, what would we do ?  Live with the bug where domain creation
> fails ?  Bodge it by serialising within domain create (awkwardating
> the code) ?
> 
> I have some other suggestions which ought to be considered:
> 
> 
> 1. Send a patch to qemu upstream to allow specifying the socket listen
> queue.
> 
> 1(a) Expect distros to apply that patch to older qemus, if they ship
> older qemus.  Have libxl unconditionally specify that argument.
> 
> 1(b) grep the help output (as I propose above) and if the patch is not
> present, use LD_PRELOAD to wrap listen(2).
> 
> 
> 2. Send a patch to qemu upstream to change the fixed queue length from
> 1 to 10000.  Expect distros to apply that patch to older qemus (even,
> perhaps, if it is not accepted upstream!)  Change libxl to detect
> EAGAIN from qmp connect() and print a message explaining what patch is
> missing.

Those suggestions are interesting ideas, but I would prefer libxl to
be able to deal with any version of QEMU, without having to patch
QEMU. Besides serialising QMP access in the code, the fork/lock strategy
might be the only other way. (Well, there is also fork/connect with a
blocking fd, but we already have code for fork/lock.)

So I'll keep working on the fork/lock strategy.

> Since you have provided an implementation of the fork/lock strategy,
> I'll now go and do a detailed review of that.

Thanks,

-- 
Anthony PERARD

Re: [Xen-devel] [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket
Posted by Ian Jackson 4 years, 5 months ago
Anthony PERARD writes ("Re: [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket"):
> Those suggestions are interesting ideas, but I would prefer libxl to
> be able to deal with any version of QEMU, without having to patch
> QEMU. Besides serialising QMP access in the code, the fork/lock strategy
> might be the only other way. (Well, there is also fork/connect with a
> blocking fd, but we already have code for fork/lock.)
> 
> So I'll keep working on the fork/lock strategy.

OK.  Thanks for the detailed reply, which makes sense to me.

Ian.

Re: [Xen-devel] [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket
Posted by Sander Eikelenboom 4 years, 6 months ago
On 25/10/2019 19:05, Anthony PERARD wrote:
> Patch series available in this git branch:
> https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git br.fix-ev_qmp-multi-connect-v1
> 
> Hi,
> 
> QEMU's QMP socket doesn't allow multiple concurrent connections. Also, it
> calls listen() on the socket with a `backlog' of only 1. On Linux at least, once that
> backlog is full, connect() will return EAGAIN if the socket fd is
> non-blocking. libxl may attempt many concurrent connect() calls if, for
> example, a guest is started with several PCI passthrough devices, and a
> connect() failure leads to a failure to start the guest.

Hi Anthony,

Just tested with the patch series and it fixes my issue with starting a
guest with several PCI passthrough devices.

Thanks,

Sander


-- 

Met vriendelijke groet,

Sander Eikelenboom
mailto:Sander@Eikelenboom.IT

Eikelenboom IT Services
Kaapseweg 70
5642 HK Eindhoven
M: 06-14387484

PGP public key for sander@eikelenboom.it:
key id: 0xC4B99EEDECF2AE69
fingerprint: 07BB B819 FF93 E54D 5F5C  0BDE C4B9 9EED ECF2 AE69
