[libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information

Bjoern Walk posted 7 patches 5 years ago
Failed in applying to current master (apply log)
docs/news.xml                       |  11 +++
include/libvirt/libvirt-domain.h    |  76 +++++++++++++++++
src/driver-hypervisor.h             |   9 ++
src/libvirt-domain.c                |  62 ++++++++++++++
src/libvirt_public.syms             |   5 ++
src/qemu/qemu_domain.c              |  80 +++++++++++++++++-
src/qemu/qemu_domain.h              |  47 +++++++++++
src/qemu/qemu_driver.c              | 126 +++++++++++++++++++++++++++-
src/qemu/qemu_monitor.c             |  53 +-----------
src/qemu/qemu_monitor.h             |  48 ++---------
src/qemu/qemu_monitor_json.c        |  20 ++---
src/qemu/qemu_process.c             |   2 +-
src/remote/remote_daemon_dispatch.c |  50 +++++++++++
src/remote/remote_driver.c          |  48 +++++++++++
src/remote/remote_protocol.x        |  22 ++++-
src/remote_protocol-structs         |  13 +++
tools/virsh-domain-monitor.c        |  94 +++++++++++++++++++--
tools/virsh.pod                     |   5 +-
18 files changed, 651 insertions(+), 120 deletions(-)
[libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Bjoern Walk 5 years ago
This patch series introduces the ability to save additional information
for the domain state and exposes this information in virsh domstate.

For example in the case of QEMU guest panic events, we can provide additional
information like the crash reason or register state of the domain. This
information usually gets logged in the domain log but for debugging it is
useful to have it accessible from the client. Therefore, let's introduce a new
public API function, virDomainGetStateParams, an extensible version of
virDomainGetState, which returns the complete state of the domain, including
newly introduced additional information.

Let's also extend virsh domstate and introduce a new parameter --info to show
the domain state, reason and additional information when available.

    virsh # domstate --info guest-1
    crashed (panicked)
    s390.core     = 0
    s390.psw-mask = 0x0002000180000000
    s390.psw-addr = 0x000000000010f146
    s390.reason   = disabled-wait

v2 -> v3:
    * try to find a reasonable naming-scheme for parameters
    * make state/reason in virDomainGetStateParams optional
    * use new API only when requested
    * print additional information per line for easier consumption
v1 -> v2:
    * refactored the public API according to discussions to provide a
      more machine-parsable interface


Bjoern Walk (7):
  qemu: monitor: move event data structure to domain
  qemu: domain: store and update panic info
  lib: introduce virDomainGetStateParams function
  remote: implement remoteDomainGetStateParams
  qemu: implement qemuDomainGetStateParams
  virsh: domstate: report detailed state if available
  news: add entry for virDomainGetStateParams

 docs/news.xml                       |  11 +++
 include/libvirt/libvirt-domain.h    |  76 +++++++++++++++++
 src/driver-hypervisor.h             |   9 ++
 src/libvirt-domain.c                |  62 ++++++++++++++
 src/libvirt_public.syms             |   5 ++
 src/qemu/qemu_domain.c              |  80 +++++++++++++++++-
 src/qemu/qemu_domain.h              |  47 +++++++++++
 src/qemu/qemu_driver.c              | 126 +++++++++++++++++++++++++++-
 src/qemu/qemu_monitor.c             |  53 +-----------
 src/qemu/qemu_monitor.h             |  48 ++---------
 src/qemu/qemu_monitor_json.c        |  20 ++---
 src/qemu/qemu_process.c             |   2 +-
 src/remote/remote_daemon_dispatch.c |  50 +++++++++++
 src/remote/remote_driver.c          |  48 +++++++++++
 src/remote/remote_protocol.x        |  22 ++++-
 src/remote_protocol-structs         |  13 +++
 tools/virsh-domain-monitor.c        |  94 +++++++++++++++++++--
 tools/virsh.pod                     |   5 +-
 18 files changed, 651 insertions(+), 120 deletions(-)

-- 
2.19.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Daniel P. Berrangé 5 years ago
On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> This patch series introduces the ability to save additional information
> for the domain state and exposes this information in virsh domstate.
> 
> For example in the case of QEMU guest panic events, we can provide additional
> information like the crash reason or register state of the domain. This
> information usually gets logged in the domain log but for debugging it is
> useful to have it accessible from the client. Therefore, let's introduce a new
> public API function, virDomainGetStateParams, an extensible version of
> virDomainGetState, which returns the complete state of the domain, including
> newly introduced additional information.
> 
> Let's also extend virsh domstate and introduce a new parameter --info to show
> the domain state, reason and additional information when available.
> 
>     virsh # domstate --info guest-1
>     crashed (panicked)
>     s390.core     = 0
>     s390.psw-mask = 0x0002000180000000
>     s390.psw-addr = 0x000000000010f146
>     s390.reason   = disabled-wait

This info is all just guest panick related data, so I'm not covinced we
should overload  "domstate" for this random set of low level hardware
parameters.

Why not just have  virDomainGetPanicInfo() and "virsh dompanicinfo"

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Peter Krempa 5 years ago
On Thu, Apr 04, 2019 at 09:49:16 +0100, Daniel Berrange wrote:
> On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> > This patch series introduces the ability to save additional information
> > for the domain state and exposes this information in virsh domstate.
> > 
> > For example in the case of QEMU guest panic events, we can provide additional
> > information like the crash reason or register state of the domain. This
> > information usually gets logged in the domain log but for debugging it is
> > useful to have it accessible from the client. Therefore, let's introduce a new
> > public API function, virDomainGetStateParams, an extensible version of
> > virDomainGetState, which returns the complete state of the domain, including
> > newly introduced additional information.
> > 
> > Let's also extend virsh domstate and introduce a new parameter --info to show
> > the domain state, reason and additional information when available.
> > 
> >     virsh # domstate --info guest-1
> >     crashed (panicked)
> >     s390.core     = 0
> >     s390.psw-mask = 0x0002000180000000
> >     s390.psw-addr = 0x000000000010f146
> >     s390.reason   = disabled-wait
> 
> This info is all just guest panick related data, so I'm not covinced we
> should overload  "domstate" for this random set of low level hardware
> parameters.

I'm not even sure whether it's worth having an API to query it at all.
We don't really have means to store the data reliably across libvirtd
restarts as there is no status XML for inactive VMs. This means that the
data will get lost. It also will become instantly invalidated when
starting the VM.

Currently we log the data into the domain log file which in my opinion
feels good enough for this kind of low level information which is not of
much use for non-devs.

Additionally the advantage of logging is that bug reporting tools
usually capture the VM log files.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Daniel P. Berrangé 5 years ago
On Thu, Apr 04, 2019 at 11:48:40AM +0200, Peter Krempa wrote:
> On Thu, Apr 04, 2019 at 09:49:16 +0100, Daniel Berrange wrote:
> > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> > > This patch series introduces the ability to save additional information
> > > for the domain state and exposes this information in virsh domstate.
> > > 
> > > For example in the case of QEMU guest panic events, we can provide additional
> > > information like the crash reason or register state of the domain. This
> > > information usually gets logged in the domain log but for debugging it is
> > > useful to have it accessible from the client. Therefore, let's introduce a new
> > > public API function, virDomainGetStateParams, an extensible version of
> > > virDomainGetState, which returns the complete state of the domain, including
> > > newly introduced additional information.
> > > 
> > > Let's also extend virsh domstate and introduce a new parameter --info to show
> > > the domain state, reason and additional information when available.
> > > 
> > >     virsh # domstate --info guest-1
> > >     crashed (panicked)
> > >     s390.core     = 0
> > >     s390.psw-mask = 0x0002000180000000
> > >     s390.psw-addr = 0x000000000010f146
> > >     s390.reason   = disabled-wait
> > 
> > This info is all just guest panick related data, so I'm not covinced we
> > should overload  "domstate" for this random set of low level hardware
> > parameters.
> 
> I'm not even sure whether it's worth having an API to query it at all.
> We don't really have means to store the data reliably across libvirtd
> restarts as there is no status XML for inactive VMs. This means that the
> data will get lost. It also will become instantly invalidated when
> starting the VM.

I'm not bothered about loosing it when starting the VM. I tend to view
it as "point in time" data about CPU state.

I think the most immediately useful is to actually include this in an
async event when the crash happens.

It is possible to configure the panic action so that either QEMU is
killed (and optionally) restarted), or for QEMU to simply have its
CPUs stopped.

In the latter case I think it is reasonable to have an API to report
it and this lets us save it in the domain state XML too.

As soon as QEMU is actually stopped though, I think we should no
longer try to report it.

IOW, apps should use the event primarily. If they want to get it via
the API, they must configure QEMU to stop CPUs on panic, instead of
shutting down/starting.

I think this would fit well with my suggestion to consider this API
as a way to report live CPU registers (eg a libvirt API to expose
QEMU's  "info registers" data). Obviously that woudl require a QMP
impl first. Just saying this from a conceptual design POV.

> Currently we log the data into the domain log file which in my opinion
> feels good enough for this kind of low level information which is not of
> much use for non-devs.

We really should add an API via virStream to fetch logs

> Additionally the advantage of logging is that bug reporting tools
> usually capture the VM log files.

It is definitely good to have it in the logs, but at the same time this
is structured data and so it is good to preseve the structure for apps
if they want todo live analysis without needing human to read logs.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Bjoern Walk 5 years ago
Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]:
> On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> > This patch series introduces the ability to save additional information
> > for the domain state and exposes this information in virsh domstate.
> > 
> > For example in the case of QEMU guest panic events, we can provide additional
> > information like the crash reason or register state of the domain. This
> > information usually gets logged in the domain log but for debugging it is
> > useful to have it accessible from the client. Therefore, let's introduce a new
> > public API function, virDomainGetStateParams, an extensible version of
> > virDomainGetState, which returns the complete state of the domain, including
> > newly introduced additional information.
> > 
> > Let's also extend virsh domstate and introduce a new parameter --info to show
> > the domain state, reason and additional information when available.
> > 
> >     virsh # domstate --info guest-1
> >     crashed (panicked)
> >     s390.core     = 0
> >     s390.psw-mask = 0x0002000180000000
> >     s390.psw-addr = 0x000000000010f146
> >     s390.reason   = disabled-wait
> 
> This info is all just guest panick related data, so I'm not covinced we
> should overload  "domstate" for this random set of low level hardware
> parameters.

I want to have a flexible and extensible API function for all states
that provide any additional information. The crashed/panicked state just
happens to be the only one that does currently... We discussed the API
in v1 here: https://www.redhat.com/archives/libvir-list/2018-July/msg00690.html

> Why not just have  virDomainGetPanicInfo() and "virsh dompanicinfo"

Do we want to later add an additional public API function per state that
implements any additional information?

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Peter Krempa 5 years ago
On Thu, Apr 04, 2019 at 11:59:50 +0200, Bjoern Walk wrote:
> Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]:
> > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:

[...]

> > Why not just have  virDomainGetPanicInfo() and "virsh dompanicinfo"
> 
> Do we want to later add an additional public API function per state that
> implements any additional information?

Any elaboration on the additional information you'd want to report?

We have https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats
which is able to report any information including state info
(VIR_DOMAIN_STATS_STATE) using typed parameters. The API works even for
inactive VMs so should be usable in this case.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Daniel P. Berrangé 5 years ago
On Thu, Apr 04, 2019 at 11:59:50AM +0200, Bjoern Walk wrote:
> Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]:
> > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> > > This patch series introduces the ability to save additional information
> > > for the domain state and exposes this information in virsh domstate.
> > > 
> > > For example in the case of QEMU guest panic events, we can provide additional
> > > information like the crash reason or register state of the domain. This
> > > information usually gets logged in the domain log but for debugging it is
> > > useful to have it accessible from the client. Therefore, let's introduce a new
> > > public API function, virDomainGetStateParams, an extensible version of
> > > virDomainGetState, which returns the complete state of the domain, including
> > > newly introduced additional information.
> > > 
> > > Let's also extend virsh domstate and introduce a new parameter --info to show
> > > the domain state, reason and additional information when available.
> > > 
> > >     virsh # domstate --info guest-1
> > >     crashed (panicked)
> > >     s390.core     = 0
> > >     s390.psw-mask = 0x0002000180000000
> > >     s390.psw-addr = 0x000000000010f146
> > >     s390.reason   = disabled-wait
> > 
> > This info is all just guest panick related data, so I'm not covinced we
> > should overload  "domstate" for this random set of low level hardware
> > parameters.
> 
> I want to have a flexible and extensible API function for all states
> that provide any additional information. The crashed/panicked state just
> happens to be the only one that does currently... We discussed the API
> in v1 here: https://www.redhat.com/archives/libvir-list/2018-July/msg00690.html
> 
> > Why not just have  virDomainGetPanicInfo() and "virsh dompanicinfo"
> 
> Do we want to later add an additional public API function per state that
> implements any additional information?

I guess the obvious extra thing to want to report is CPU registers, since
the crash info is largely containing register and/or memory address info.

QEMU has "info registers" but no QMP variant of it at this time.

With this in mind though, the proposed API is not satisfactory. Specifically
the  field

  # define VIR_DOMAIN_STATE_PARAM_TYPE "info_type"

As that assumes an either / or reporting need.  If we added register
info, then we would potentially need to report crash info *and* register
info at the same time.

I think this is easy to solve though - just delete the
VIR_DOMAIN_STATE_PARAM_TYPE field as it is redundant. Apps can just
look at whatever named parameters exist & use those they care about.

The more critical thing is that in an SMP system, we'll need to report
registers per CPU, but this API is a per-VM level reporting.

This is something we do with virDomainGetCPUStats().

So I think we need a design closer to that API, and perhaps call it
"virDomainGetCPUState"  / virsh  domcpustate


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Bjoern Walk 5 years ago
So, what's my course of action here?

Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 11:32AM +0100]:
> I guess the obvious extra thing to want to report is CPU registers, since
> the crash info is largely containing register and/or memory address info.

Sounds reasonable if we go with the virDomainGetCPUState API.

> QEMU has "info registers" but no QMP variant of it at this time.

Can I still implement my stuff with the proposed API without the actual
register information until this is implemented in QEMU? Meaning, just
reporting the relevant register information from the panic event. While
it is nice to have the full feature set available, implementing the QMP
command for this would be a bit out of the scope of this proposal.

> With this in mind though, the proposed API is not satisfactory. Specifically
> the  field
> 
>   # define VIR_DOMAIN_STATE_PARAM_TYPE "info_type"
> 
> As that assumes an either / or reporting need.  If we added register
> info, then we would potentially need to report crash info *and* register
> info at the same time.
> 
> I think this is easy to solve though - just delete the
> VIR_DOMAIN_STATE_PARAM_TYPE field as it is redundant. Apps can just
> look at whatever named parameters exist & use those they care about.

Sure, the type parameter was basically an easy way for the client to
figure out what data it has retrieved, especially when the parameter
space was that large with all the (potential) different states.

> The more critical thing is that in an SMP system, we'll need to report
> registers per CPU, but this API is a per-VM level reporting.
> 
> This is something we do with virDomainGetCPUStats().
> 
> So I think we need a design closer to that API, and perhaps call it
> "virDomainGetCPUState"  / virsh  domcpustate

Any hint how this API actually should look like? Exactly like
virDomainGetCPUStats with the per-CPU-dance or should we just report
back all available information?

In the case of a panicked domain, how should we report the relevant
information (crashed core/reason)?

> Regards,
> Daniel

Thanks for the discussion and suggestions.

Bjoern

-- 
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v3 0/7] extend virsh domstate to show additional information
Posted by Bjoern Walk 5 years ago
Bjoern Walk <bwalk@linux.ibm.com> [2019-04-05, 12:35PM +0200]:
> So, what's my course of action here?

ping. Any advice would be much appreciated.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list