[libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation

Martin Kletzander posted 1 patch 6 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/2b28b2c4a6bc9d8adb54631733373e11396511fe.1517571843.git.mkletzan@redhat.com
src/qemu/qemu_process.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
[libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
Posted by Martin Kletzander 6 years, 2 months ago
Since one of the things in capabilities (info from resctrl updated with data
about caches) can be change on the system by remounting the /sys/fs/resctrl with
different options, the capabilities need to be refreshed.  There is a better fix
in the works, but it's going to be way bigger than this (hence the XXX note
there), so for the time being let's workaround this.  And in order not to slow
down the domain starting, only get the capabilities if there are any cachetunes.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu_process.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index bcd4ac8ad699..6cdb9ebc7fce 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2514,9 +2514,15 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver,
 {
     int ret = -1;
     size_t i = 0;
-    virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false);
+    virCapsPtr caps = NULL;
     qemuDomainObjPrivatePtr priv = vm->privateData;
 
+    if (!vm->def->ncachetunes)
+        return 0;
+
+    /* Force capability refresh since resctrl info can change
+     * XXX: move cache info into virresctrl so caps are not needed */
+    caps = virQEMUDriverGetCapabilities(driver, true);
     if (!caps)
         return -1;
 
-- 
2.16.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
Posted by Peter Krempa 6 years, 2 months ago
On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> Since one of the things in capabilities (info from resctrl updated with data
> about caches) can be change on the system by remounting the /sys/fs/resctrl with
> different options, the capabilities need to be refreshed.  There is a better fix
> in the works, but it's going to be way bigger than this (hence the XXX note
> there), so for the time being let's workaround this.  And in order not to slow
> down the domain starting, only get the capabilities if there are any cachetunes.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780

This BZ describes a crash if the filesystem is remounted, but you are
attempting to fix this not by fixing the code that crashed but by
re-loading the information if possibly somebody remounted it.

This does not seem to be the correct fix since you still have a race
window, where the options can be changed after the refresh is executed
and prior to using them in the code where it actually crashed.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
Posted by Martin Kletzander 6 years, 2 months ago
On Fri, Feb 02, 2018 at 02:29:03PM +0100, Peter Krempa wrote:
>On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
>> Since one of the things in capabilities (info from resctrl updated with data
>> about caches) can be change on the system by remounting the /sys/fs/resctrl with
>> different options, the capabilities need to be refreshed.  There is a better fix
>> in the works, but it's going to be way bigger than this (hence the XXX note
>> there), so for the time being let's workaround this.  And in order not to slow
>> down the domain starting, only get the capabilities if there are any cachetunes.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
>
>This BZ describes a crash if the filesystem is remounted, but you are
>attempting to fix this not by fixing the code that crashed but by
>re-loading the information if possibly somebody remounted it.
>
>This does not seem to be the correct fix since you still have a race
>window, where the options can be changed after the refresh is executed
>and prior to using them in the code where it actually crashed.

Yeah, I'm looking at that as well.  It will need a restructuring (moving some
conf code to util - it'll also look nicer), but fix for exactly what is
happening here is enough for now.

Martin--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
Posted by Peter Krempa 6 years, 2 months ago
On Fri, Feb 02, 2018 at 14:57:54 +0100, Martin Kletzander wrote:
> On Fri, Feb 02, 2018 at 02:29:03PM +0100, Peter Krempa wrote:
> > On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> > > Since one of the things in capabilities (info from resctrl updated with data
> > > about caches) can be change on the system by remounting the /sys/fs/resctrl with
> > > different options, the capabilities need to be refreshed.  There is a better fix
> > > in the works, but it's going to be way bigger than this (hence the XXX note
> > > there), so for the time being let's workaround this.  And in order not to slow
> > > down the domain starting, only get the capabilities if there are any cachetunes.
> > > 
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
> > 
> > This BZ describes a crash if the filesystem is remounted, but you are
> > attempting to fix this not by fixing the code that crashed but by
> > re-loading the information if possibly somebody remounted it.
> > 
> > This does not seem to be the correct fix since you still have a race
> > window, where the options can be changed after the refresh is executed
> > and prior to using them in the code where it actually crashed.
> 
> Yeah, I'm looking at that as well.  It will need a restructuring (moving some
> conf code to util - it'll also look nicer), but fix for exactly what is
> happening here is enough for now.

Well, I'm not okay with selling this as a fix for the crash described in
the bugzilla. I might be okay with doing this as a mitigation for stale
data, but this is not a fix for the crash in any way.

We have the same kind of issue (minus the crash) with hugetlbfs mount
data or host cpu maps in the numa host description and we don't refresh
the capabilities at every start of those.

So there's the problem that the cache filesystem data is stale and your
host might fail to start which is inconvenient but not a big of a
problem.

The second issue is the crash if the data is stale and this certainly
does not fix that.

I will be okay with this patch (since it's in an unlikely code path) if
you rewrite the commit message and completely drop the mention of the
bugzilla above. This patch simply does not fix that BZ.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list