[libvirt PATCH 00/17] qemu: Implement external limit manager feature

Andrea Bolognani posted 17 patches 3 years, 1 month ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20210305191404.529903-1-abologna@redhat.com
NEWS.rst                           |  10 +
src/conf/domain_conf.h             |   5 +-
src/qemu/libvirtd_qemu.aug         |   1 +
src/qemu/qemu.conf                 |  12 +
src/qemu/qemu_command.c            |   4 -
src/qemu/qemu_conf.c               |   4 +
src/qemu/qemu_conf.h               |   1 +
src/qemu/qemu_domain.c             |  47 ++--
src/qemu/qemu_migration.c          |   2 +
src/qemu/qemu_process.c            |  30 ++-
src/qemu/test_libvirtd_qemu.aug.in |   1 +
src/util/vircommand.c              |  21 +-
src/util/virprocess.c              | 340 ++++++++++++++++++++---------
src/util/virprocess.h              |   2 +-
tests/virprocessmock.c             |   7 +
15 files changed, 354 insertions(+), 133 deletions(-)
[libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Andrea Bolognani 3 years, 1 month ago
This feature has been requested by KubeVirt developers and will make
it possible for them to make some VFIO-related features, such as
migration and hotplug, work correctly.

  https://bugzilla.redhat.com/show_bug.cgi?id=1916346

The first part of the series, especially the first 9 patches, is
preparation work: it addresses a few annoying issues with our APIs
that deal with process limits, and makes them all nice, consistent
and easy to reason about while moving policy code from the generic
code to the QEMU driver where it belongs.

Andrea Bolognani (17):
  util: Document limit-related functions
  util: Simplify stubs
  util: Always pass a pid to virProcessSetMax*()
  util: Introduce virProcess{Get,Set}Limit()
  qemu: Make some minor tweaks
  qemu: Set all limits at the same time
  util: Have virCommand remember whether limits are set
  qemu: Set limits only when explicitly asked to do so
  util: Don't special-case setting a limit to zero
  conf: Rename original_memlock -> originalMemlock
  tests: Mock virProcessGetMaxMemLock()
  util: Try to get limits from /proc
  qemu: Don't ignore virProcessGetMaxMemLock() errors
  qemu: Refactor qemuDomainAdjustMaxMemLock()
  qemu: Add external_limit_manager config knob
  qemu: Wire up external limit manager
  news: Document external limit manager feature

 NEWS.rst                           |  10 +
 src/conf/domain_conf.h             |   5 +-
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf                 |  12 +
 src/qemu/qemu_command.c            |   4 -
 src/qemu/qemu_conf.c               |   4 +
 src/qemu/qemu_conf.h               |   1 +
 src/qemu/qemu_domain.c             |  47 ++--
 src/qemu/qemu_migration.c          |   2 +
 src/qemu/qemu_process.c            |  30 ++-
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 src/util/vircommand.c              |  21 +-
 src/util/virprocess.c              | 340 ++++++++++++++++++++---------
 src/util/virprocess.h              |   2 +-
 tests/virprocessmock.c             |   7 +
 15 files changed, 354 insertions(+), 133 deletions(-)

-- 
2.26.2


Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Daniel P. Berrangé 3 years, 1 month ago
On Fri, Mar 05, 2021 at 08:13:47PM +0100, Andrea Bolognani wrote:
> This feature has been requested by KubeVirt developers and will make
> it possible for them to make some VFIO-related features, such as
> migration and hotplug, work correctly.
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1916346
> 
> The first part of the series, especially the first 9 patches, is
> preparation work: it addresses a few annoying issues with our APIs
> that deal with process limits, and makes them all nice, consistent
> and easy to reason about while moving policy code from the generic
> code to the QEMU driver where it belongs.

IIUC, the code was already supposed to attempt to set the limits
and gracefully carry on if it was unable todo so. Can you outline
the actual problem being solved, as wading through the bug comments
to understand the precise detail of the problem  is not very clear.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Andrea Bolognani 3 years, 1 month ago
On Mon, 2021-03-08 at 10:54 +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 05, 2021 at 08:13:47PM +0100, Andrea Bolognani wrote:
> > This feature has been requested by KubeVirt developers and will make
> > it possible for them to make some VFIO-related features, such as
> > migration and hotplug, work correctly.
> > 
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1916346
> > 
> > The first part of the series, especially the first 9 patches, is
> > preparation work: it addresses a few annoying issues with our APIs
> > that deal with process limits, and makes them all nice, consistent
> > and easy to reason about while moving policy code from the generic
> > code to the QEMU driver where it belongs.
> 
> IIUC, the code was already supposed to attempt to set the limits
> and gracefully carry on if it was unable todo so. Can you outline
> the actual problem being solved, as wading through the bug comments
> to understand the precise detail of the problem  is not very clear.

Not being able to set the memory locking limit shouldn't be a soft
failure, because that might cause QEMU to start up apparently fine
and then error out later on during the lifetime of the VM, which is
much worse than not starting up at all. We're actually doing this
correctly for the most part, which is why hotplug requests for VFIO
devices in KubeVirt result in errors under certain conditions.

The reason why VFIO device assignment is currently not completely
broken in KubeVirt is that, when the QEMU process is initially
started, we set the memory locking limit after fork(), so we can do
that using setrlimit() which doesn't require additional capabilities,
but in the hotplug scenario libvirtd needs to change the limits of a
different process: in that case we are forced to use prlimit(), which
fails due to the lack of CAP_SYS_RESOURCE.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Daniel P. Berrangé 3 years, 1 month ago
On Mon, Mar 08, 2021 at 02:11:56PM +0100, Andrea Bolognani wrote:
> On Mon, 2021-03-08 at 10:54 +0000, Daniel P. Berrangé wrote:
> > On Fri, Mar 05, 2021 at 08:13:47PM +0100, Andrea Bolognani wrote:
> > > This feature has been requested by KubeVirt developers and will make
> > > it possible for them to make some VFIO-related features, such as
> > > migration and hotplug, work correctly.
> > > 
> > >   https://bugzilla.redhat.com/show_bug.cgi?id=1916346
> > > 
> > > The first part of the series, especially the first 9 patches, is
> > > preparation work: it addresses a few annoying issues with our APIs
> > > that deal with process limits, and makes them all nice, consistent
> > > and easy to reason about while moving policy code from the generic
> > > code to the QEMU driver where it belongs.
> > 
> > IIUC, the code was already supposed to attempt to set the limits
> > and gracefully carry on if it was unable todo so. Can you outline
> > the actual problem being solved, as wading through the bug comments
> > to understand the precise detail of the problem  is not very clear.
> 
> Not being able to set the memory locking limit shouldn't be a soft
> failure, because that might cause QEMU to start up apparently fine
> and then error out later on during the lifetime of the VM, which is
> much worse than not starting up at all. We're actually doing this
> correctly for the most part, which is why hotplug requests for VFIO
> devices in KubeVirt result in errors under certain conditions.

Yep, I agree we shoudln't have it be a mere warning.  IIUC, we would
try to set the limit, and if we fail, then we query the current limit
to see if its satisfactory and report a fatal error if not.

> The reason why VFIO device assignment is currently not completely
> broken in KubeVirt is that, when the QEMU process is initially
> started, we set the memory locking limit after fork(), so we can do
> that using setrlimit() which doesn't require additional capabilities,
> but in the hotplug scenario libvirtd needs to change the limits of a
> different process: in that case we are forced to use prlimit(), which
> fails due to the lack of CAP_SYS_RESOURCE.

Since you added code to parse existing limits from /proc, I'm wondering
if we can just do without the config option. Simply try to use prlimit
and if it fails, query existing limits to determine if we sould treat
the prlimit as fatal or ignore it.  Overall I'd prefer libvirt to
"just work" out of the box rather than requiring people to know about
setting a "make-vfio-hotplug-work=yes" flag in the config file.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Andrea Bolognani 3 years, 1 month ago
On Mon, 2021-03-08 at 13:17 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 08, 2021 at 02:11:56PM +0100, Andrea Bolognani wrote:
> > The reason why VFIO device assignment is currently not completely
> > broken in KubeVirt is that, when the QEMU process is initially
> > started, we set the memory locking limit after fork(), so we can do
> > that using setrlimit() which doesn't require additional capabilities,
> > but in the hotplug scenario libvirtd needs to change the limits of a
> > different process: in that case we are forced to use prlimit(), which
> > fails due to the lack of CAP_SYS_RESOURCE.
> 
> Since you added code to parse existing limits from /proc, I'm wondering
> if we can just do without the config option. Simply try to use prlimit
> and if it fails, query existing limits to determine if we sould treat
> the prlimit as fatal or ignore it.  Overall I'd prefer libvirt to
> "just work" out of the box rather than requiring people to know about
> setting a "make-vfio-hotplug-work=yes" flag in the config file.

The problem with that approach is what to do when *lowering* the
limit, for example as a consequence of hot-unplugging the last VFIO
device from the VM.

If we're controlling the memory locking limit ourselves, then failure
to lower it should be an error, because leaving the limit much higher
than necessary creates potential for DoS by a compromised QEMU; on
the other hand, if the limit is controlled by an external process,
all we can really do is assume they will do the right thing after
hot-unplugging has happened.

I don't think discoverability is too much of an issue, as anyone who
needs to use this option will already have needed to figure out a lot
more in order to effectively take over memory locking limit
management responsibilities from libvirt...

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Daniel P. Berrangé 3 years, 1 month ago
On Mon, Mar 08, 2021 at 04:32:26PM +0100, Andrea Bolognani wrote:
> On Mon, 2021-03-08 at 13:17 +0000, Daniel P. Berrangé wrote:
> > On Mon, Mar 08, 2021 at 02:11:56PM +0100, Andrea Bolognani wrote:
> > > The reason why VFIO device assignment is currently not completely
> > > broken in KubeVirt is that, when the QEMU process is initially
> > > started, we set the memory locking limit after fork(), so we can do
> > > that using setrlimit() which doesn't require additional capabilities,
> > > but in the hotplug scenario libvirtd needs to change the limits of a
> > > different process: in that case we are forced to use prlimit(), which
> > > fails due to the lack of CAP_SYS_RESOURCE.
> > 
> > Since you added code to parse existing limits from /proc, I'm wondering
> > if we can just do without the config option. Simply try to use prlimit
> > and if it fails, query existing limits to determine if we sould treat
> > the prlimit as fatal or ignore it.  Overall I'd prefer libvirt to
> > "just work" out of the box rather than requiring people to know about
> > setting a "make-vfio-hotplug-work=yes" flag in the config file.
> 
> The problem with that approach is what to do when *lowering* the
> limit, for example as a consequence of hot-unplugging the last VFIO
> device from the VM.
> 
> If we're controlling the memory locking limit ourselves, then failure
> to lower it should be an error, because leaving the limit much higher
> than necessary creates potential for DoS by a compromised QEMU; on
> the other hand, if the limit is controlled by an external process,
> all we can really do is assume they will do the right thing after
> hot-unplugging has happened.

IMHO once QEMU vCPUs start running, immediately assume QEMU is
compromised / hostile. IOW, the DoS risk arrived the moment it
was given the higher limit.  We're just failing to close off the
existing risk we've already accepted, which doesn't worry me much.

On unplug the only thing we actually do when memory lock reduce
fails is to log a warning message, it is never treated as a
fatal error.

So the only difference is whether we skip the warning message
when we get EPERM from prlimit(), or always emit the warning.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Andrea Bolognani 3 years, 1 month ago
On Mon, 2021-03-08 at 15:57 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 08, 2021 at 04:32:26PM +0100, Andrea Bolognani wrote:
> > On Mon, 2021-03-08 at 13:17 +0000, Daniel P. Berrangé wrote:
> > > Since you added code to parse existing limits from /proc, I'm wondering
> > > if we can just do without the config option. Simply try to use prlimit
> > > and if it fails, query existing limits to determine if we sould treat
> > > the prlimit as fatal or ignore it.  Overall I'd prefer libvirt to
> > > "just work" out of the box rather than requiring people to know about
> > > setting a "make-vfio-hotplug-work=yes" flag in the config file.
> > 
> > The problem with that approach is what to do when *lowering* the
> > limit, for example as a consequence of hot-unplugging the last VFIO
> > device from the VM.
> > 
> > If we're controlling the memory locking limit ourselves, then failure
> > to lower it should be an error, because leaving the limit much higher
> > than necessary creates potential for DoS by a compromised QEMU; on
> > the other hand, if the limit is controlled by an external process,
> > all we can really do is assume they will do the right thing after
> > hot-unplugging has happened.
> 
> IMHO once QEMU vCPUs start running, immediately assume QEMU is
> compromised / hostile. IOW, the DoS risk arrived the moment it
> was given the higher limit.  We're just failing to close off the
> existing risk we've already accepted, which doesn't worry me much.
> 
> On unplug the only thing we actually do when memory lock reduce
> fails is to log a warning message, it is never treated as a
> fatal error.
> 
> So the only difference is whether we skip the warning message
> when we get EPERM from prlimit(), or always emit the warning.

You're right, we're currently just soft-failing when we can't lower
the memlock limit on unplug. Given this and your assessment of the
security implications, which I trust, we should indeed be able to
avoid introducing the qemu.conf knob and just behave sanely in all
scenarios out of the box. I'll give it a try.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 00/17] qemu: Implement external limit manager feature
Posted by Michal Privoznik 3 years, 1 month ago
On 3/5/21 8:13 PM, Andrea Bolognani wrote:
> This feature has been requested by KubeVirt developers and will make
> it possible for them to make some VFIO-related features, such as
> migration and hotplug, work correctly.
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=1916346
> 
> The first part of the series, especially the first 9 patches, is
> preparation work: it addresses a few annoying issues with our APIs
> that deal with process limits, and makes them all nice, consistent
> and easy to reason about while moving policy code from the generic
> code to the QEMU driver where it belongs.
> 
> Andrea Bolognani (17):
>    util: Document limit-related functions
>    util: Simplify stubs
>    util: Always pass a pid to virProcessSetMax*()
>    util: Introduce virProcess{Get,Set}Limit()
>    qemu: Make some minor tweaks
>    qemu: Set all limits at the same time
>    util: Have virCommand remember whether limits are set
>    qemu: Set limits only when explicitly asked to do so
>    util: Don't special-case setting a limit to zero
>    conf: Rename original_memlock -> originalMemlock
>    tests: Mock virProcessGetMaxMemLock()
>    util: Try to get limits from /proc
>    qemu: Don't ignore virProcessGetMaxMemLock() errors
>    qemu: Refactor qemuDomainAdjustMaxMemLock()
>    qemu: Add external_limit_manager config knob
>    qemu: Wire up external limit manager
>    news: Document external limit manager feature
> 
>   NEWS.rst                           |  10 +
>   src/conf/domain_conf.h             |   5 +-
>   src/qemu/libvirtd_qemu.aug         |   1 +
>   src/qemu/qemu.conf                 |  12 +
>   src/qemu/qemu_command.c            |   4 -
>   src/qemu/qemu_conf.c               |   4 +
>   src/qemu/qemu_conf.h               |   1 +
>   src/qemu/qemu_domain.c             |  47 ++--
>   src/qemu/qemu_migration.c          |   2 +
>   src/qemu/qemu_process.c            |  30 ++-
>   src/qemu/test_libvirtd_qemu.aug.in |   1 +
>   src/util/vircommand.c              |  21 +-
>   src/util/virprocess.c              | 340 ++++++++++++++++++++---------
>   src/util/virprocess.h              |   2 +-
>   tests/virprocessmock.c             |   7 +
>   15 files changed, 354 insertions(+), 133 deletions(-)
> 

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal