docs/news.xml | 11 +++ include/libvirt/libvirt-domain.h | 76 +++++++++++++++++ src/driver-hypervisor.h | 9 ++ src/libvirt-domain.c | 62 ++++++++++++++ src/libvirt_public.syms | 5 ++ src/qemu/qemu_domain.c | 80 +++++++++++++++++- src/qemu/qemu_domain.h | 47 +++++++++++ src/qemu/qemu_driver.c | 126 +++++++++++++++++++++++++++- src/qemu/qemu_monitor.c | 53 +----------- src/qemu/qemu_monitor.h | 48 ++--------- src/qemu/qemu_monitor_json.c | 20 ++--- src/qemu/qemu_process.c | 2 +- src/remote/remote_daemon_dispatch.c | 50 +++++++++++ src/remote/remote_driver.c | 48 +++++++++++ src/remote/remote_protocol.x | 22 ++++- src/remote_protocol-structs | 13 +++ tools/virsh-domain-monitor.c | 94 +++++++++++++++++++-- tools/virsh.pod | 5 +- 18 files changed, 651 insertions(+), 120 deletions(-)
This patch series introduces the ability to save additional information for the domain state and exposes this information in virsh domstate. For example in the case of QEMU guest panic events, we can provide additional information like the crash reason or register state of the domain. This information usually gets logged in the domain log but for debugging it is useful to have it accessible from the client. Therefore, let's introduce a new public API function, virDomainGetStateParams, an extensible version of virDomainGetState, which returns the complete state of the domain, including newly introduced additional information. Let's also extend virsh domstate and introduce a new parameter --info to show the domain state, reason and additional information when available. virsh # domstate --info guest-1 crashed (panicked) s390.core = 0 s390.psw-mask = 0x0002000180000000 s390.psw-addr = 0x000000000010f146 s390.reason = disabled-wait v2 -> v3: * try to find a reasonable naming-scheme for parameters * make state/reason in virDomainGetStateParams optional * use new API only when requested * print additional information per line for easier consumption v1 -> v2: * refactored the public API according to discussions to provide a more machine-parsable interface Bjoern Walk (7): qemu: monitor: move event data structure to domain qemu: domain: store and update panic info lib: introduce virDomainGetStateParams function remote: implement remoteDomainGetStateParams qemu: implement qemuDomainGetStateParams virsh: domstate: report detailed state if available news: add entry for virDomainGetStateParams docs/news.xml | 11 +++ include/libvirt/libvirt-domain.h | 76 +++++++++++++++++ src/driver-hypervisor.h | 9 ++ src/libvirt-domain.c | 62 ++++++++++++++ src/libvirt_public.syms | 5 ++ src/qemu/qemu_domain.c | 80 +++++++++++++++++- src/qemu/qemu_domain.h | 47 +++++++++++ src/qemu/qemu_driver.c | 126 +++++++++++++++++++++++++++- src/qemu/qemu_monitor.c | 53 +----------- src/qemu/qemu_monitor.h | 48 ++--------- src/qemu/qemu_monitor_json.c | 20 ++--- src/qemu/qemu_process.c | 2 +- src/remote/remote_daemon_dispatch.c | 50 +++++++++++ src/remote/remote_driver.c | 48 +++++++++++ src/remote/remote_protocol.x | 22 ++++- src/remote_protocol-structs | 13 +++ tools/virsh-domain-monitor.c | 94 +++++++++++++++++++-- tools/virsh.pod | 5 +- 18 files changed, 651 insertions(+), 120 deletions(-) -- 2.19.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: > This patch series introduces the ability to save additional information > for the domain state and exposes this information in virsh domstate. > > For example in the case of QEMU guest panic events, we can provide additional > information like the crash reason or register state of the domain. This > information usually gets logged in the domain log but for debugging it is > useful to have it accessible from the client. Therefore, let's introduce a new > public API function, virDomainGetStateParams, an extensible version of > virDomainGetState, which returns the complete state of the domain, including > newly introduced additional information. > > Let's also extend virsh domstate and introduce a new parameter --info to show > the domain state, reason and additional information when available. > > virsh # domstate --info guest-1 > crashed (panicked) > s390.core = 0 > s390.psw-mask = 0x0002000180000000 > s390.psw-addr = 0x000000000010f146 > s390.reason = disabled-wait This info is all just guest panick related data, so I'm not covinced we should overload "domstate" for this random set of low level hardware parameters. Why not just have virDomainGetPanicInfo() and "virsh dompanicinfo" Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Apr 04, 2019 at 09:49:16 +0100, Daniel Berrange wrote: > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: > > This patch series introduces the ability to save additional information > > for the domain state and exposes this information in virsh domstate. > > > > For example in the case of QEMU guest panic events, we can provide additional > > information like the crash reason or register state of the domain. This > > information usually gets logged in the domain log but for debugging it is > > useful to have it accessible from the client. Therefore, let's introduce a new > > public API function, virDomainGetStateParams, an extensible version of > > virDomainGetState, which returns the complete state of the domain, including > > newly introduced additional information. > > > > Let's also extend virsh domstate and introduce a new parameter --info to show > > the domain state, reason and additional information when available. > > > > virsh # domstate --info guest-1 > > crashed (panicked) > > s390.core = 0 > > s390.psw-mask = 0x0002000180000000 > > s390.psw-addr = 0x000000000010f146 > > s390.reason = disabled-wait > > This info is all just guest panick related data, so I'm not covinced we > should overload "domstate" for this random set of low level hardware > parameters. I'm not even sure whether it's worth having an API to query it at all. We don't really have means to store the data reliably across libvirtd restarts as there is no status XML for inactive VMs. This means that the data will get lost. It also will become instantly invalidated when starting the VM. Currently we log the data into the domain log file which in my opinion feels good enough for this kind of low level information which is not of much use for non-devs. Additionally the advantage of logging is that bug reporting tools usually capture the VM log files. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Apr 04, 2019 at 11:48:40AM +0200, Peter Krempa wrote: > On Thu, Apr 04, 2019 at 09:49:16 +0100, Daniel Berrange wrote: > > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: > > > This patch series introduces the ability to save additional information > > > for the domain state and exposes this information in virsh domstate. > > > > > > For example in the case of QEMU guest panic events, we can provide additional > > > information like the crash reason or register state of the domain. This > > > information usually gets logged in the domain log but for debugging it is > > > useful to have it accessible from the client. Therefore, let's introduce a new > > > public API function, virDomainGetStateParams, an extensible version of > > > virDomainGetState, which returns the complete state of the domain, including > > > newly introduced additional information. > > > > > > Let's also extend virsh domstate and introduce a new parameter --info to show > > > the domain state, reason and additional information when available. > > > > > > virsh # domstate --info guest-1 > > > crashed (panicked) > > > s390.core = 0 > > > s390.psw-mask = 0x0002000180000000 > > > s390.psw-addr = 0x000000000010f146 > > > s390.reason = disabled-wait > > > > This info is all just guest panick related data, so I'm not covinced we > > should overload "domstate" for this random set of low level hardware > > parameters. > > I'm not even sure whether it's worth having an API to query it at all. > We don't really have means to store the data reliably across libvirtd > restarts as there is no status XML for inactive VMs. This means that the > data will get lost. It also will become instantly invalidated when > starting the VM. I'm not bothered about loosing it when starting the VM. I tend to view it as "point in time" data about CPU state. I think the most immediately useful is to actually include this in an async event when the crash happens. It is possible to configure the panic action so that either QEMU is killed (and optionally) restarted), or for QEMU to simply have its CPUs stopped. In the latter case I think it is reasonable to have an API to report it and this lets us save it in the domain state XML too. As soon as QEMU is actually stopped though, I think we should no longer try to report it. IOW, apps should use the event primarily. If they want to get it via the API, they must configure QEMU to stop CPUs on panic, instead of shutting down/starting. I think this would fit well with my suggestion to consider this API as a way to report live CPU registers (eg a libvirt API to expose QEMU's "info registers" data). Obviously that woudl require a QMP impl first. Just saying this from a conceptual design POV. > Currently we log the data into the domain log file which in my opinion > feels good enough for this kind of low level information which is not of > much use for non-devs. We really should add an API via virStream to fetch logs > Additionally the advantage of logging is that bug reporting tools > usually capture the VM log files. It is definitely good to have it in the logs, but at the same time this is structured data and so it is good to preseve the structure for apps if they want todo live analysis without needing human to read logs. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]: > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: > > This patch series introduces the ability to save additional information > > for the domain state and exposes this information in virsh domstate. > > > > For example in the case of QEMU guest panic events, we can provide additional > > information like the crash reason or register state of the domain. This > > information usually gets logged in the domain log but for debugging it is > > useful to have it accessible from the client. Therefore, let's introduce a new > > public API function, virDomainGetStateParams, an extensible version of > > virDomainGetState, which returns the complete state of the domain, including > > newly introduced additional information. > > > > Let's also extend virsh domstate and introduce a new parameter --info to show > > the domain state, reason and additional information when available. > > > > virsh # domstate --info guest-1 > > crashed (panicked) > > s390.core = 0 > > s390.psw-mask = 0x0002000180000000 > > s390.psw-addr = 0x000000000010f146 > > s390.reason = disabled-wait > > This info is all just guest panick related data, so I'm not covinced we > should overload "domstate" for this random set of low level hardware > parameters. I want to have a flexible and extensible API function for all states that provide any additional information. The crashed/panicked state just happens to be the only one that does currently... We discussed the API in v1 here: https://www.redhat.com/archives/libvir-list/2018-July/msg00690.html > Why not just have virDomainGetPanicInfo() and "virsh dompanicinfo" Do we want to later add an additional public API function per state that implements any additional information? > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Apr 04, 2019 at 11:59:50 +0200, Bjoern Walk wrote: > Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]: > > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: [...] > > Why not just have virDomainGetPanicInfo() and "virsh dompanicinfo" > > Do we want to later add an additional public API function per state that > implements any additional information? Any elaboration on the additional information you'd want to report? We have https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats which is able to report any information including state info (VIR_DOMAIN_STATS_STATE) using typed parameters. The API works even for inactive VMs so should be usable in this case. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Apr 04, 2019 at 11:59:50AM +0200, Bjoern Walk wrote: > Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 09:49AM +0100]: > > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote: > > > This patch series introduces the ability to save additional information > > > for the domain state and exposes this information in virsh domstate. > > > > > > For example in the case of QEMU guest panic events, we can provide additional > > > information like the crash reason or register state of the domain. This > > > information usually gets logged in the domain log but for debugging it is > > > useful to have it accessible from the client. Therefore, let's introduce a new > > > public API function, virDomainGetStateParams, an extensible version of > > > virDomainGetState, which returns the complete state of the domain, including > > > newly introduced additional information. > > > > > > Let's also extend virsh domstate and introduce a new parameter --info to show > > > the domain state, reason and additional information when available. > > > > > > virsh # domstate --info guest-1 > > > crashed (panicked) > > > s390.core = 0 > > > s390.psw-mask = 0x0002000180000000 > > > s390.psw-addr = 0x000000000010f146 > > > s390.reason = disabled-wait > > > > This info is all just guest panick related data, so I'm not covinced we > > should overload "domstate" for this random set of low level hardware > > parameters. > > I want to have a flexible and extensible API function for all states > that provide any additional information. The crashed/panicked state just > happens to be the only one that does currently... We discussed the API > in v1 here: https://www.redhat.com/archives/libvir-list/2018-July/msg00690.html > > > Why not just have virDomainGetPanicInfo() and "virsh dompanicinfo" > > Do we want to later add an additional public API function per state that > implements any additional information? I guess the obvious extra thing to want to report is CPU registers, since the crash info is largely containing register and/or memory address info. QEMU has "info registers" but no QMP variant of it at this time. With this in mind though, the proposed API is not satisfactory. Specifically the field # define VIR_DOMAIN_STATE_PARAM_TYPE "info_type" As that assumes an either / or reporting need. If we added register info, then we would potentially need to report crash info *and* register info at the same time. I think this is easy to solve though - just delete the VIR_DOMAIN_STATE_PARAM_TYPE field as it is redundant. Apps can just look at whatever named parameters exist & use those they care about. The more critical thing is that in an SMP system, we'll need to report registers per CPU, but this API is a per-VM level reporting. This is something we do with virDomainGetCPUStats(). So I think we need a design closer to that API, and perhaps call it "virDomainGetCPUState" / virsh domcpustate Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
So, what's my course of action here? Daniel P. Berrangé <berrange@redhat.com> [2019-04-04, 11:32AM +0100]: > I guess the obvious extra thing to want to report is CPU registers, since > the crash info is largely containing register and/or memory address info. Sounds reasonable if we go with the virDomainGetCPUState API. > QEMU has "info registers" but no QMP variant of it at this time. Can I still implement my stuff with the proposed API without the actual register information until this is implemented in QEMU? Meaning, just reporting the relevant register information from the panic event. While it is nice to have the full feature set available, implementing the QMP command for this would be a bit out of the scope of this proposal. > With this in mind though, the proposed API is not satisfactory. Specifically > the field > > # define VIR_DOMAIN_STATE_PARAM_TYPE "info_type" > > As that assumes an either / or reporting need. If we added register > info, then we would potentially need to report crash info *and* register > info at the same time. > > I think this is easy to solve though - just delete the > VIR_DOMAIN_STATE_PARAM_TYPE field as it is redundant. Apps can just > look at whatever named parameters exist & use those they care about. Sure, the type parameter was basically an easy way for the client to figure out what data it has retrieved, especially when the parameter space was that large with all the (potential) different states. > The more critical thing is that in an SMP system, we'll need to report > registers per CPU, but this API is a per-VM level reporting. > > This is something we do with virDomainGetCPUStats(). > > So I think we need a design closer to that API, and perhaps call it > "virDomainGetCPUState" / virsh domcpustate Any hint how this API actually should look like? Exactly like virDomainGetCPUStats with the per-CPU-dance or should we just report back all available information? In the case of a panicked domain, how should we report the relevant information (crashed core/reason)? > Regards, > Daniel Thanks for the discussion and suggestions. Bjoern -- IBM Systems Linux on Z & Virtualization Development -------------------------------------------------- IBM Deutschland Research & Development GmbH Schönaicher Str. 220, 71032 Böblingen Phone: +49 7031 16 1819 -------------------------------------------------- Vorsitzende des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Bjoern Walk <bwalk@linux.ibm.com> [2019-04-05, 12:35PM +0200]: > So, what's my course of action here? ping. Any advice would be much appreciated. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
© 2016 - 2024 Red Hat, Inc.