[PATCH] qemu: Do not error out when setting affinity failed

Martin Kletzander posted 1 patch 3 years, 7 months ago
Test syntax-check failed
Patches applied successfully
git fetch https://github.com/patchew-project/libvirt tags/patchew/8d147bfcdfe9bc871d07d52aef28ef9de285c151.1599222134.git.mkletzan@redhat.com
[PATCH] qemu: Do not error out when setting affinity failed
Posted by Martin Kletzander 3 years, 7 months ago
At least in a particular scenario described in the code.  Basically when
libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
to set QEMU affinity to all CPUs (because there is no setting requested in the
XML), it fails.  But if we ignore the failure in this particular case, then you
can limit the CPUs used by controlling the affinity for libvirtd itself.

In any other case (anything requested in the XML, pinning a live domain, etc.)
the call is still considered fatal and the action errors out.
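The claim above, that restricting libvirtd's own affinity restricts the CPUs QEMU uses, rests on affinity masks being inherited across fork(). A minimal Linux-only sketch (not libvirt code; the helper name is made up for illustration):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and report how many CPUs its inherited mask allows. */
int forked_child_cpu_count(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        sched_getaffinity(0, sizeof(mask), &mask);
        _exit(CPU_COUNT(&mask));   /* exit status carries the count */
    }
    int status;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status);
}
```

If the parent first pins itself to a single CPU with sched_setaffinity() (allowed on one's own pid without CAP_SYS_NICE), the forked child also sees only that CPU, which is exactly why skipping the explicit setaffinity call on QEMU still leaves the operator in control.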

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index cfe09d632633..270bb37d3682 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
 static int
 qemuProcessInitCpuAffinity(virDomainObjPtr vm)
 {
+    bool settingAll = false;
     g_autoptr(virBitmap) cpumapToSet = NULL;
     virDomainNumatuneMemMode mem_mode;
     qemuDomainObjPrivatePtr priv = vm->privateData;
@@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
         if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
             return -1;
     } else {
+        settingAll = true;
         if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
             return -1;
     }
 
     if (cpumapToSet &&
         virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
-        return -1;
+        /*
+         * We only want to error out if we failed to set the affinity to
+         * user-requested mapping.  If we are just trying to reset the affinity
+         * to all CPUs and this fails it can only be an issue if:
+         *  1) libvirtd does not have CAP_SYS_NICE
+         *  2) libvirtd does not run on all CPUs
+         *
+         * However since this scenario is very improbable, we rather skip
+         * reporting the error because it helps running libvirtd in a a scenario
+         * where pinning is handled by someone else.
+         *
+         * See also: https://bugzilla.redhat.com/1819801#c2
+         */
+        if (settingAll)
+            virResetLastError();
+        else
+            return -1;
     }
 
     return 0;
@@ -2726,8 +2744,25 @@ qemuProcessSetupPid(virDomainObjPtr vm,
         affinity_cpumask = use_cpumask;
 
     /* Setup legacy affinity. */
-    if (affinity_cpumask && virProcessSetAffinity(pid, affinity_cpumask) < 0)
-        goto cleanup;
+    if (affinity_cpumask && virProcessSetAffinity(pid, affinity_cpumask) < 0) {
+        /*
+         * We only want to error out if we failed to set the affinity to
+         * user-requested mapping.  If we are just trying to reset the affinity
+         * to all CPUs and this fails it can only be an issue if:
+         *  1) libvirtd does not have CAP_SYS_NICE
+         *  2) libvirtd does not run on all CPUs
+         *
+         * However since this scenario is very improbable, we rather skip
+         * reporting the error because it helps running libvirtd in a a scenario
+         * where pinning is handled by someone else.
+         *
+         * See also: https://bugzilla.redhat.com/1819801#c2
+         */
+        if (affinity_cpumask == hostcpumap)
+            virResetLastError();
+        else
+            goto cleanup;
+    }
 
     /* Set scheduler type and priority, but not for the main thread. */
     if (sched &&
-- 
2.28.0

Re: [PATCH] qemu: Do not error out when setting affinity failed
Posted by Daniel P. Berrangé 3 years, 7 months ago
On Fri, Sep 04, 2020 at 02:22:14PM +0200, Martin Kletzander wrote:
> At least in a particular scenario described in the code.  Basically when
> libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
> to set QEMU affinity to all CPUs (because there is no setting requested in the
> XML) it fails.  But if we ignore the failure in this particular case than you
> can limit the CPUs used by controlling the affinity for libvirtd itself.
> 
> In any other case (anything requested in the XML, pinning a live domain, etc.)
> the call is still considered fatal and the action errors out.
> 
> Resolves: https://bugzilla.redhat.com/1819801

I'd prefer if this commit message outlined the reason why this change is
ok, instead of just pointing to the BZ

eg add the following text:


Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is available

Scenarios 5 & 6 work only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with the caveat in cases 4 and 6 that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.
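The outcomes in the scenario table can be modelled in a few lines. A toy sketch (not libvirt code; all names here are made up): when the setaffinity call succeeds QEMU gets the container's full mask, and when it fails and the failure is ignored, QEMU simply inherits libvirtd's mask.

```c
/* Effective CPU count QEMU ends up with under the proposed
 * "ignore the failure" policy. */
int qemu_cpus(int container_cpus, int libvirtd_cpus, int setaffinity_ok)
{
    if (setaffinity_ok)
        return container_cpus;   /* pinning to all CPUs worked */
    return libvirtd_cpus;        /* failure ignored: mask is inherited */
}
```

This makes the caveat concrete: in scenario 4 without CAP_SYS_NICE, qemu_cpus(8, 2, 0) is 2 rather than the possible 8, yet the domain still starts.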

> 
> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
> ---
>  src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index cfe09d632633..270bb37d3682 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
>  static int
>  qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>  {
> +    bool settingAll = false;
>      g_autoptr(virBitmap) cpumapToSet = NULL;
>      virDomainNumatuneMemMode mem_mode;
>      qemuDomainObjPrivatePtr priv = vm->privateData;
> @@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>          if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
>              return -1;
>      } else {
> +        settingAll = true;
>          if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
>              return -1;
>      }
>  
>      if (cpumapToSet &&
>          virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
> -        return -1;
> +        /*
> +         * We only want to error out if we failed to set the affinity to
> +         * user-requested mapping.  If we are just trying to reset the affinity
> +         * to all CPUs and this fails it can only be an issue if:
> +         *  1) libvirtd does not have CAP_SYS_NICE
> +         *  2) libvirtd does not run on all CPUs
> +         *
> +         * However since this scenario is very improbable, we rather skip
> +         * reporting the error because it helps running libvirtd in a a scenario
> +         * where pinning is handled by someone else.

I wouldn't call this scenario "improbable" - it is entirely expected
by some of our users. Replace these three lines with

  "This scenario can easily occurr when libvirtd is run
   inside a container with restrictive permissions and CPU
   pinning"


With the text changes

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] qemu: Do not error out when setting affinity failed
Posted by Martin Kletzander 3 years, 7 months ago
On Fri, Sep 04, 2020 at 01:32:24PM +0100, Daniel P. Berrangé wrote:
>On Fri, Sep 04, 2020 at 02:22:14PM +0200, Martin Kletzander wrote:
>> At least in a particular scenario described in the code.  Basically when
>> libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
>> to set QEMU affinity to all CPUs (because there is no setting requested in the
>> XML) it fails.  But if we ignore the failure in this particular case than you
>> can limit the CPUs used by controlling the affinity for libvirtd itself.
>>
>> In any other case (anything requested in the XML, pinning a live domain, etc.)
>> the call is still considered fatal and the action errors out.
>>
>> Resolves: https://bugzilla.redhat.com/1819801
>
>I'd prefer if this commit message outlined the reason why this change is
>ok, instead of just pointing to the BZ
>
>eg add the following text:
>
>
>Consider a host with 8 CPUs. There are the following possible scenarios
>
>1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs
>
>2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs
>
>3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
>   QEMU should get 8 CPUs
>
>4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
>   QEMU should get 8 CPUs
>
>5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
>   QEMU should get 4 CPUs
>
>6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
>   QEMU should get 4 CPUs
>
>Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.
>
>Scenario 3 works because libvirt checks current affinity first and
>skips the sched_setaffinity call, avoiding the SYS_NICE issue
>
>Scenario 4 works only if CAP_SYS_NICE is availalbe
>
>Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
>cpuset is not set on the container.
>
>If libvirt blindly ignores the sched_setaffinity failure, then scenarios
>4, 5 and 6 should all work, but with caveat in case 4 and 6, that
>QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
>This is still better than failing.
>
>Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
>ignore it when there was no affinity specified in the XML config.
>If user specified affinity explicitly, libvirt must report an error if
>it can't be honoured.
>

I replaced my commit message with yours.

>>
>> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
>> Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
>> ---
>>  src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index cfe09d632633..270bb37d3682 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
>>  static int
>>  qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>>  {
>> +    bool settingAll = false;
>>      g_autoptr(virBitmap) cpumapToSet = NULL;
>>      virDomainNumatuneMemMode mem_mode;
>>      qemuDomainObjPrivatePtr priv = vm->privateData;
>> @@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>>          if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
>>              return -1;
>>      } else {
>> +        settingAll = true;
>>          if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
>>              return -1;
>>      }
>>
>>      if (cpumapToSet &&
>>          virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
>> -        return -1;
>> +        /*
>> +         * We only want to error out if we failed to set the affinity to
>> +         * user-requested mapping.  If we are just trying to reset the affinity
>> +         * to all CPUs and this fails it can only be an issue if:
>> +         *  1) libvirtd does not have CAP_SYS_NICE
>> +         *  2) libvirtd does not run on all CPUs
>> +         *
>> +         * However since this scenario is very improbable, we rather skip
>> +         * reporting the error because it helps running libvirtd in a a scenario
>> +         * where pinning is handled by someone else.
>
>I wouldn't call this scenario "improbably" - it is entirely expected
>by some of our users. Replace these three lines with
>
>  "This scenario can easily occurr when libvirtd is run
>   inside a container with restrictive permissions and CPU
>   pinning"
>

Yeah, I meant "improbable on bare metal".  I replaced them with your explanation.

>
>With the text changes
>
>Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>

Thanks, pushed.

>
>Regards,
>Daniel
>-- 
>|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] qemu: Do not error out when setting affinity failed
Posted by Ján Tomko 3 years, 7 months ago
On a Friday in 2020, Martin Kletzander wrote:
>On Fri, Sep 04, 2020 at 01:32:24PM +0100, Daniel P. Berrangé wrote:
>>On Fri, Sep 04, 2020 at 02:22:14PM +0200, Martin Kletzander wrote:
>>>At least in a particular scenario described in the code.  Basically when
>>>libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
>>>to set QEMU affinity to all CPUs (because there is no setting requested in the
>>>XML) it fails.  But if we ignore the failure in this particular case than you
>>>can limit the CPUs used by controlling the affinity for libvirtd itself.
>>>
>>>In any other case (anything requested in the XML, pinning a live domain, etc.)
>>>the call is still considered fatal and the action errors out.
>>>
>>>Resolves: https://bugzilla.redhat.com/1819801
>>
>>I'd prefer if this commit message outlined the reason why this change is
>>ok, instead of just pointing to the BZ
>>
>>eg add the following text:
>>
>>
>>Consider a host with 8 CPUs. There are the following possible scenarios
>>
>>1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs
>>
>>2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs
>>
>>3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
>>  QEMU should get 8 CPUs
>>
>>4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
>>  QEMU should get 8 CPUs
>>
>>5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
>>  QEMU should get 4 CPUs
>>
>>6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
>>  QEMU should get 4 CPUs
>>
>>Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.
>>
>>Scenario 3 works because libvirt checks current affinity first and
>>skips the sched_setaffinity call, avoiding the SYS_NICE issue
>>
>>Scenario 4 works only if CAP_SYS_NICE is availalbe
>>
>>Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
>>cpuset is not set on the container.
>>
>>If libvirt blindly ignores the sched_setaffinity failure, then scenarios
>>4, 5 and 6 should all work, but with caveat in case 4 and 6, that
>>QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
>>This is still better than failing.
>>
>>Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
>>ignore it when there was no affinity specified in the XML config.
>>If user specified affinity explicitly, libvirt must report an error if
>>it can't be honoured.
>>
>
>I replaced my commit message with yours.
>
>>>
>>>Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
>>>Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
>>>---
>>> src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
>>> 1 file changed, 38 insertions(+), 3 deletions(-)
>>>
>>>diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>index cfe09d632633..270bb37d3682 100644
>>>--- a/src/qemu/qemu_process.c
>>>+++ b/src/qemu/qemu_process.c
>>>@@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
>>> static int
>>> qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>>> {
>>>+    bool settingAll = false;
>>>     g_autoptr(virBitmap) cpumapToSet = NULL;
>>>     virDomainNumatuneMemMode mem_mode;
>>>     qemuDomainObjPrivatePtr priv = vm->privateData;
>>>@@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
>>>         if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
>>>             return -1;
>>>     } else {
>>>+        settingAll = true;
>>>         if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
>>>             return -1;
>>>     }
>>>
>>>     if (cpumapToSet &&
>>>         virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
>>>-        return -1;
>>>+        /*
>>>+         * We only want to error out if we failed to set the affinity to
>>>+         * user-requested mapping.  If we are just trying to reset the affinity
>>>+         * to all CPUs and this fails it can only be an issue if:
>>>+         *  1) libvirtd does not have CAP_SYS_NICE
>>>+         *  2) libvirtd does not run on all CPUs
>>>+         *
>>>+         * However since this scenario is very improbable, we rather skip
>>>+         * reporting the error because it helps running libvirtd in a a scenario
>>>+         * where pinning is handled by someone else.

The patch as-is reports the error in all cases; it is merely ignored in
some of them.

virResetLastError clears the thread-local error structure, but it has no
power to remove the error from the log. It should mostly not be used
outside of the public APIs, where it makes sure we don't leave an error
behind from a previous API invocation.

Reporting harmless errors confuses users and we have made changes in the
past specifically to suppress harmless errors when libvirt is run
in a container.
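The way libvirt already avoids the harmless error in scenario 3 is to check the current affinity first and skip the syscall when nothing would change, so the error is never raised, rather than raised and then cleared. A standalone sketch of that pattern (not the libvirt helper; the function name is made up, Linux-only):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>

/* Set @pid's affinity to @want, but skip the syscall entirely if the
 * current mask already matches; pid 0 means the calling process. */
int set_affinity_if_needed(pid_t pid, const cpu_set_t *want, bool *skipped)
{
    cpu_set_t cur;

    if (sched_getaffinity(pid, sizeof(cur), &cur) < 0)
        return -1;

    if (CPU_EQUAL(&cur, want)) {
        *skipped = true;    /* already correct: no syscall, no error */
        return 0;
    }

    *skipped = false;
    return sched_setaffinity(pid, sizeof(*want), want);
}
```

With this shape, the no-op case needs neither CAP_SYS_NICE nor virResetLastError, and nothing misleading ever reaches the log.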

>>
>>I wouldn't call this scenario "improbably" - it is entirely expected
>>by some of our users. Replace these three lines with
>>
>> "This scenario can easily occurr when libvirtd is run

*occur

>>  inside a container with restrictive permissions and CPU
>>  pinning"
>>
>
>Yeah, I meant "improbable on bare metal".  I replaced them with your explanation.

Note that in the version pushed to master, you only replaced the text in
one of the two comments.

Jano

>>
>>With the text changes
>>
>>Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>>
>
>Thanks, pushed.
>
>>
>>Regards,
>>Daniel
>>-- 
>>|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>>|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>>|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] qemu: Do not error out when setting affinity failed
Posted by Martin Kletzander 3 years, 7 months ago
On Fri, Sep 04, 2020 at 04:37:06PM +0200, Ján Tomko wrote:
>virResetLastError clears the thread-local error structure, but it has no
>power to remove the error from the log. It mostly should not be used
>outside of public APIs where it makes sure we don't leave an error
>from a previous API invocation there.
>
>Reporting harmless errors confuses users and we have made changes in the
>past specifically to suppress harmless errors when libvirt is run
>in a container.
>

Like this?

   https://www.redhat.com/archives/libvir-list/2020-September/msg00311.html