[libvirt] [PATCH RFC 0/3] Add mechanisms to force QEMU capabilities refetches

John Ferlan posted 3 patches 6 years ago
Test syntax-check passed
Patches applied successfully
git fetch https://github.com/patchew-project/libvirt tags/patchew/20181113202106.10130-1-jferlan@redhat.com
[libvirt] [PATCH RFC 0/3] Add mechanisms to force QEMU capabilities refetches
Posted by John Ferlan 6 years ago
Sending as an RFC primarily because I'm looking for feedback on whether
either or both of the mechanisms in the series is desired. Likewise,
if it's felt that the current process of telling customers to just
delete the cache is acceptable, then so be it. If there are other ideas,
I'm willing to give them a go too. I did consider adding a virsh
option to "virsh capabilities" (still possible) and/or a virt-admin
option to force the refresh. These were just "easier" and didn't
require an API adjustment to implement.

Patch1 is essentially a means to determine whether the kernel config
was changed to allow nested virtualization and to force a refresh
of the capabilities in that case. Without doing so, the CPU settings
for a guest may not add vmx=on, depending on configuration, and
to the user that doesn't make sense. There is a private bz on this,
so I won't bother posting it.

Patch2 and Patch3 make use of the 'service libvirtd reload' function
in order to invalidate all the entries in the internal QEMU capabilities
hash table and then force a reread. This perhaps has downsides related
to guest usage and the previous semantics of using reload without
refreshing while a guest was running. On the other hand, we tell people
to just clear the QEMU capabilities cache (e.g. rm
/var/cache/libvirt/qemu/capabilities/*.xml) and restart libvirtd, so
this is in essence the same result. It's not clear how frequently
reload is used (it's essentially a SIGHUP to libvirtd).

BTW: I did try deleting the files in the cache in one iteration, but
for some reason (and I didn't investigate for long) the files wouldn't
be recreated even though the cache was repopulated. They were recreated
after a libvirtd restart...

John Ferlan (3):
  qemu: Add check for whether nesting was enabled
  util: Introduce virFileCacheForEach
  qemu: Add ability to invalidate the capabilities cache

 src/libvirt_private.syms     |  1 +
 src/qemu/qemu_capabilities.c | 78 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_capabilities.h |  2 +
 src/qemu/qemu_capspriv.h     |  2 +
 src/qemu/qemu_driver.c       |  4 +-
 src/util/virfilecache.c      | 24 +++++++++++
 src/util/virfilecache.h      |  5 +++
 tests/qemucapabilitiestest.c |  3 ++
 8 files changed, 118 insertions(+), 1 deletion(-)

-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH RFC 0/3] Add mechanisms to force QEMU capabilities refetches
Posted by Daniel P. Berrangé 6 years ago
On Tue, Nov 13, 2018 at 03:21:03PM -0500, John Ferlan wrote:
> Sending as an RFC primarily because I'm looking for feedback on whether
> either or both of the mechanisms in the series is desired. Likewise,
> if it's felt that the current process of telling customers to just
> delete the cache is acceptable, then so be it. If there are other ideas,
> I'm willing to give them a go too. I did consider adding a virsh
> option to "virsh capabilities" (still possible) and/or a virt-admin
> option to force the refresh. These were just "easier" and didn't
> require an API adjustment to implement.
> 
> Patch1 is essentially a means to determine whether the kernel config
> was changed to allow nested virtualization and to force a refresh
> of the capabilities in that case. Without doing so, the CPU settings
> for a guest may not add vmx=on, depending on configuration, and
> to the user that doesn't make sense. There is a private bz on this,
> so I won't bother posting it.
> 
> Patch2 and Patch3 make use of the 'service libvirtd reload' function
> in order to invalidate all the entries in the internal QEMU capabilities
> hash table and then force a reread. This perhaps has downsides related
> to guest usage and the previous semantics of using reload without
> refreshing while a guest was running. On the other hand, we tell people
> to just clear the QEMU capabilities cache (e.g. rm
> /var/cache/libvirt/qemu/capabilities/*.xml) and restart libvirtd, so
> this is in essence the same result. It's not clear how frequently
> reload is used (it's essentially a SIGHUP to libvirtd).

IMHO the fact that we cache stuff should be completely invisible outside
of libvirt. Sure, we've had some bugs in this area, but they are not very
frequent, so I'm not enthusiastic about exposing any knob to force a
rebuild beyond just deleting files.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [PATCH RFC 0/3] Add mechanisms to force QEMU capabilities refetches
Posted by John Ferlan 6 years ago

On 11/14/18 4:25 AM, Daniel P. Berrangé wrote:
> On Tue, Nov 13, 2018 at 03:21:03PM -0500, John Ferlan wrote:
>> Sending as an RFC primarily because I'm looking for feedback on whether
>> either or both of the mechanisms in the series is desired. Likewise,
>> if it's felt that the current process of telling customers to just
>> delete the cache is acceptable, then so be it. If there are other ideas,
>> I'm willing to give them a go too. I did consider adding a virsh
>> option to "virsh capabilities" (still possible) and/or a virt-admin
>> option to force the refresh. These were just "easier" and didn't
>> require an API adjustment to implement.
>>
>> Patch1 is essentially a means to determine whether the kernel config
>> was changed to allow nested virtualization and to force a refresh
>> of the capabilities in that case. Without doing so, the CPU settings
>> for a guest may not add vmx=on, depending on configuration, and
>> to the user that doesn't make sense. There is a private bz on this,
>> so I won't bother posting it.
>>
>> Patch2 and Patch3 make use of the 'service libvirtd reload' function
>> in order to invalidate all the entries in the internal QEMU capabilities
>> hash table and then force a reread. This perhaps has downsides related
>> to guest usage and the previous semantics of using reload without
>> refreshing while a guest was running. On the other hand, we tell people
>> to just clear the QEMU capabilities cache (e.g. rm
>> /var/cache/libvirt/qemu/capabilities/*.xml) and restart libvirtd, so
>> this is in essence the same result. It's not clear how frequently
>> reload is used (it's essentially a SIGHUP to libvirtd).
> 
> IMHO the fact that we cache stuff should be completely invisible outside
> of libvirt. Sure, we've had some bugs in this area, but they are not very
> frequent, so I'm not enthusiastic about exposing any knob to force a
> rebuild beyond just deleting files.
> 

OK - so that more or less obviates patch2 and patch3...

Of course, the fact that we cache stuff hasn't been completely invisible,
and telling someone to fix the problem by "simply" removing the cache
files, then pointing them at the cache location, seems a bit "awkward"
once you've figured out that caching is the problem in the first place.
Many times it's just not intuitively obvious!

OTOH, baking in the idea that a "reload" could rule out caching as
the problem could be useful. I guess it just felt like a perhaps less
used option, assuming of course that most would use stop/start or
restart instead.

John

Re: [libvirt] [PATCH RFC 0/3] Add mechanisms to force QEMU capabilities refetches
Posted by Daniel P. Berrangé 6 years ago
On Wed, Nov 14, 2018 at 02:22:12PM -0500, John Ferlan wrote:
> 
> 
> On 11/14/18 4:25 AM, Daniel P. Berrangé wrote:
> > On Tue, Nov 13, 2018 at 03:21:03PM -0500, John Ferlan wrote:
> >> Sending as an RFC primarily because I'm looking for feedback on whether
> >> either or both of the mechanisms in the series is desired. Likewise,
> >> if it's felt that the current process of telling customers to just
> >> delete the cache is acceptable, then so be it. If there are other ideas,
> >> I'm willing to give them a go too. I did consider adding a virsh
> >> option to "virsh capabilities" (still possible) and/or a virt-admin
> >> option to force the refresh. These were just "easier" and didn't
> >> require an API adjustment to implement.
> >>
> >> Patch1 is essentially a means to determine whether the kernel config
> >> was changed to allow nested virtualization and to force a refresh
> >> of the capabilities in that case. Without doing so, the CPU settings
> >> for a guest may not add vmx=on, depending on configuration, and
> >> to the user that doesn't make sense. There is a private bz on this,
> >> so I won't bother posting it.
> >>
> >> Patch2 and Patch3 make use of the 'service libvirtd reload' function
> >> in order to invalidate all the entries in the internal QEMU capabilities
> >> hash table and then force a reread. This perhaps has downsides related
> >> to guest usage and the previous semantics of using reload without
> >> refreshing while a guest was running. On the other hand, we tell people
> >> to just clear the QEMU capabilities cache (e.g. rm
> >> /var/cache/libvirt/qemu/capabilities/*.xml) and restart libvirtd, so
> >> this is in essence the same result. It's not clear how frequently
> >> reload is used (it's essentially a SIGHUP to libvirtd).
> > 
> > IMHO the fact that we cache stuff should be completely invisible outside
> > of libvirt. Sure, we've had some bugs in this area, but they are not very
> > frequent, so I'm not enthusiastic about exposing any knob to force a
> > rebuild beyond just deleting files.
> > 
> 
> OK - so that more or less obviates patch2 and patch3...
> 
> Of course, the fact that we cache stuff hasn't been completely invisible,
> and telling someone to fix the problem by "simply" removing the cache
> files, then pointing them at the cache location, seems a bit "awkward"
> once you've figured out that caching is the problem in the first place.
> Many times it's just not intuitively obvious!
> 
> OTOH, baking in the idea that a "reload" could rule out caching as
> the problem could be useful. I guess it just felt like a perhaps less
> used option, assuming of course that most would use stop/start or
> restart instead.

Looking at this from a more general POV, cache invalidation bugs are
just one of many different bugs that can & have impacted libvirt over
the years. Adding a force-reload API is essentially saying we want to
have

   virConnectWhackPossibleBug()

because we think we might have a future bug. I don't think that is a
good design precedent or rationale - essentially it's admitting failure.
If caching really were so terribly implemented that this were considered
necessary, then I'd argue caching should be deleted. I don't think we
are in such a bad state though - the kinds of problems we've had have
been fairly niche in impact.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
