FAULT INJECTION FRAMEWORK

[Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Damien Hedde 4 years, 10 months ago

Hi all,

This series adds a python framework that provides some ways to do fault
injection in a running vm. In its current state, it makes it easy to interact
with memory and to change gpios and qom properties.

The framework consists of a python script, based on the existing qmp module,
which allows interaction with the vm.

The series introduces a QMP command to schedule some virtual-clock-time-based
notifications. The notification is sent back to the python framework and can
be used to build time-driven fault scenarios.
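
As a rough illustration of the time-driven idea only, here is a minimal
sketch of a scheduler that fires callbacks once a simulated virtual clock
passes their deadline. It does not talk to QEMU: the VirtualClockNotifier
class, its advance() method and the nanosecond unit are assumptions made for
this sketch, not the interface added by the series.

```python
import heapq
import itertools


class VirtualClockNotifier:
    """Toy stand-in for a virtual-clock notification queue (no QEMU involved)."""

    def __init__(self):
        self.now_ns = 0                  # simulated virtual time, in nanoseconds
        self._pending = []               # min-heap of (deadline, order, callback)
        self._order = itertools.count()  # tie-breaker for equal deadlines

    def notify(self, delay_ns, callback):
        """Schedule callback to run delay_ns after the current virtual time."""
        deadline = self.now_ns + delay_ns
        heapq.heappush(self._pending, (deadline, next(self._order), callback))

    def advance(self, delta_ns):
        """Advance virtual time and fire every notification that became due."""
        self.now_ns += delta_ns
        fired = 0
        while self._pending and self._pending[0][0] <= self.now_ns:
            _, _, callback = heapq.heappop(self._pending)
            callback()
            fired += 1
        return fired


events = []
clock = VirtualClockNotifier()
clock.notify(2_000_000_000, lambda: events.append("glitch at 2s"))
clock.notify(1_000_000_000, lambda: events.append("glitch at 1s"))
clock.advance(1_500_000_000)  # only the 1s notification is due
clock.advance(1_000_000_000)  # now the 2s notification is due as well
print(events)
```

Callbacks run in deadline order regardless of the order in which they were
scheduled, which is the property a scripted fault scenario would rely on.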

Additionally, the series adds some new QMP commands:

Commands are added to read/write memory or memory-mapped registers. Arguments
are similar to the existing [p]memsave commands.

A command is added to set the value of a qdev gpio.

Here is a simple visual example which injects characters into the uart output
of the zynq platform:
$ # qemu must have been launched with -qmp unix:/tmp/qmpsock,server
$ # create the python framework object
$ import fault_injection
$ inj = fault_injection.FaultInjectionFramework("/tmp/qmpsock", 0)
$
$ # function which displays an 'X' on the first uart
$ # it accesses the register directly, using its physical address
$ def cb():
$   inj.write_pmem(0xe0000030, 4, 88)
$
$ # schedule the function on a notification in 10s
$ inj.notify(10 * 1000 * 1000 * 1000, cb, True)
$
$ # handle one notification
$ inj.run_once()
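
The two literals in the example can be checked with a few lines of plain
python. This sketch assumes that the first argument of notify() is a delay in
nanoseconds; that is inferred from the "in 10s" comment and is not stated
elsewhere in this message.

```python
# Values copied from the example above.
UART_VALUE = 88
DELAY = 10 * 1000 * 1000 * 1000

# Assumption: the notify() delay is expressed in nanoseconds.
NS_PER_SECOND = 1000 ** 3

char_written = chr(UART_VALUE)        # ASCII character for code 88
delay_seconds = DELAY // NS_PER_SECOND

print(char_written, delay_seconds)
```

Under that assumption the callback writes the ASCII code for 'X' and is
scheduled 10 seconds of virtual time ahead.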

The framework has been tested using python 2, on qemu running xilinx_zynq or
virt arm machines.

The series is organised as follows. Patches 1 and 2 add the memory and gpio
qmp commands. Patch 3 adds the notification mechanism. Patches 4 and 5 add
a python helper module and some documentation.

Thanks to Xilinx's QEMU team, who sponsored this work.

Damien Hedde (5):
  introduce [p]mem(read|write) qmp commands
  introduce a qmp command to set gpios
  add qmp time-notify event triggering system
  fault_injection: introduce Python scripting framework
  docs: add fault injection framework documentation

 cpus.c                         | 126 +++++++++++++++
 docs/fault_injection.txt       | 149 ++++++++++++++++++
 monitor/Makefile.objs          |   1 +
 monitor/qmp-cmd-time-notify.c  | 116 ++++++++++++++
 monitor/qmp-cmds.c             |  30 ++++
 monitor/trace-events           |   4 +
 qapi/misc.json                 | 196 +++++++++++++++++++++++
 scripts/qmp/fault_injection.py | 278 +++++++++++++++++++++++++++++++++
 8 files changed, 900 insertions(+)
 create mode 100644 docs/fault_injection.txt
 create mode 100644 monitor/qmp-cmd-time-notify.c
 create mode 100644 scripts/qmp/fault_injection.py

-- 
2.22.0

Re: [Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Stefan Hajnoczi 4 years, 10 months ago

On Fri, Jun 28, 2019 at 02:45:29PM +0200, Damien Hedde wrote:
> This series adds a python framework aiming to provide some ways to do fault
> injection in a running vm. In its current state, it allows to easily interact
> with memory, change gpios and qom properties.
> 
> The framework consists in a python script based on the qmp existing module
> which allows to interact with the vm.

How does this compare to qtest?  There seems to be a lot of overlap
between them.

Why is it called "fault injection"?  The commands seem to be
general-purpose device testing functions (like qtest and libqos), not
functions for testing error code paths as would be expected from a fault
injection framework.

Stefan

Re: [Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Philippe Mathieu-Daudé 4 years, 10 months ago

On 7/1/19 10:37 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 28, 2019 at 02:45:29PM +0200, Damien Hedde wrote:
>> This series adds a python framework aiming to provide some ways to do fault
>> injection in a running vm. In its current state, it allows to easily interact
>> with memory, change gpios and qom properties.
>>
>> The framework consists in a python script based on the qmp existing module
>> which allows to interact with the vm.
> 
> How does this compare to qtest?  There seems to be a lot of overlap
> between them.
> 
> Why is it called "fault injection"?  The commands seem to be
> general-purpose device testing functions (like qtest and libqos), not
> functions for testing error code paths as would be expected from a fault
> injection framework.

I understand qtest is meant to test QEMU, while this framework/command is
meant to test how the guest reacts to hardware faults.

To use the qtest_mem commands you need to run QEMU with the qtest
chardev backend, while this series exposes a QMP interface.

To avoid the overlap, a cleaner follow-up might be to have qtest wrap
these QMP commands (much like HMP commands do).

Another note from a quick glance: qtest uses the 1st cpu's address
space view, while this series allows selecting a specific cpu.

It makes sense to me to be able to select address spaces by name (more
generic, not restricted to a cpu view, since one might want to inject a
fault in device ram that is not always mapped to a cpu: dma, emac desc).
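
To make the "select by name" idea concrete, here is a minimal sketch using
plain python objects. The AddressSpaces class and the names "cpu0-memory" and
"dma-descriptors" are hypothetical and exist only for illustration; they are
neither QEMU's address space names nor an API from this series.

```python
class AddressSpaces:
    """Toy registry mapping an address space name to a backing buffer."""

    def __init__(self):
        self._spaces = {}

    def register(self, name, size):
        self._spaces[name] = bytearray(size)

    def _check(self, name, addr, size):
        space = self._spaces[name]  # raises KeyError for an unknown name
        if addr < 0 or addr + size > len(space):
            raise ValueError("access out of range for %s" % name)
        return space

    def write(self, name, addr, data):
        space = self._check(name, addr, len(data))
        space[addr:addr + len(data)] = data

    def read(self, name, addr, size):
        space = self._check(name, addr, size)
        return bytes(space[addr:addr + size])


spaces = AddressSpaces()
spaces.register("cpu0-memory", 64)      # hypothetical per-cpu view
spaces.register("dma-descriptors", 16)  # hypothetical device-only ram

# Corrupt two bytes of the device-only ram without going through a cpu view.
spaces.write("dma-descriptors", 4, b"\xde\xad")
corrupted = spaces.read("dma-descriptors", 4, 2)
untouched = spaces.read("cpu0-memory", 4, 2)
print(corrupted, untouched)
```

The point of keying on a name rather than a cpu index is visible in the last
lines: the write lands in ram that no cpu view needs to map.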

Re: [Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Stefan Hajnoczi 4 years, 9 months ago

On Mon, Jul 01, 2019 at 12:16:44PM +0200, Philippe Mathieu-Daudé wrote:
> On 7/1/19 10:37 AM, Stefan Hajnoczi wrote:
> > On Fri, Jun 28, 2019 at 02:45:29PM +0200, Damien Hedde wrote:
> >> This series adds a python framework aiming to provide some ways to do fault
> >> injection in a running vm. In its current state, it allows to easily interact
> >> with memory, change gpios and qom properties.
> >>
> >> The framework consists in a python script based on the qmp existing module
> >> which allows to interact with the vm.
> > 
> > How does this compare to qtest?  There seems to be a lot of overlap
> > between them.
> > 
> > Why is it called "fault injection"?  The commands seem to be
> > general-purpose device testing functions (like qtest and libqos), not
> > functions for testing error code paths as would be expected from a fault
> > injection framework.
> 
> I understand qtest is to test QEMU, while this framework/command is to
> test how the guest react to an hardware faults.

The commands seem to be equivalent to qtest commands, just implemented
as QMP commands.

Damien: Can you explain the use case more and show some example test
cases?

> To use the qtest_mem commands you need to run QEMU with the qtest
> chardev backend, while this series expose a QMP interface.
> 
> To avoid the overlap, a cleaner follow up might be to have qtest wrap
> these QMP commands (mostly like HMP commands do).
> 
> Another note while looking at a glance, qtest uses the 1st cpu address
> space view, this series allow to select a specific cpu.
> 
> It makes sense to me to be able to select address spaces by name (more
> generic, not restricted to a cpu view, since one might want to inject
> fault in a device ram not always mapped to a cpu: dma, emac desc).

The naming issue still stands: none of the commands actually perform
fault injection.  They can be used for other types of testing or even
non-testing purposes.

Fault injection commands would be "make the next watchdog expiry fail",
"return error code X on the next DMA request", "report an AHCI link
failure", etc.

These commands are lower-level.  Therefore, I think "fault injection
framework" is a misnomer and will age poorly if this API is extended in
the future.
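
To illustrate the distinction, here is a minimal sketch of what a command in
the style of "return error code X on the next DMA request" could look like
against a mock device. MockDmaEngine and inject_error() are invented for this
sketch; no such QEMU device model or command exists in this series.

```python
class MockDmaEngine:
    """Toy device model with a one-shot error injection hook."""

    def __init__(self):
        self._pending_errors = []

    def inject_error(self, code):
        """Make the next transfer fail with the given error code."""
        self._pending_errors.append(code)

    def transfer(self, payload):
        """Return (status, bytes_transferred); status 0 means success."""
        if self._pending_errors:
            return self._pending_errors.pop(0), 0
        return 0, len(payload)


dma = MockDmaEngine()
dma.inject_error(5)  # arbitrary error code chosen for the example
results = [dma.transfer(b"abcd") for _ in range(3)]
print(results)
```

Only the first transfer after the injection fails; later ones succeed again.
A raw memory write has no equivalent notion of "the next request", which is
the gap between the two levels of abstraction.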

Stefan

Re: [Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Damien Hedde 4 years, 9 months ago

On 7/3/19 11:29 AM, Stefan Hajnoczi wrote:
> On Mon, Jul 01, 2019 at 12:16:44PM +0200, Philippe Mathieu-Daudé wrote:
>> On 7/1/19 10:37 AM, Stefan Hajnoczi wrote:
>>> On Fri, Jun 28, 2019 at 02:45:29PM +0200, Damien Hedde wrote:
>>>> This series adds a python framework aiming to provide some ways to do fault
>>>> injection in a running vm. In its current state, it allows to easily interact
>>>> with memory, change gpios and qom properties.
>>>>
>>>> The framework consists in a python script based on the qmp existing module
>>>> which allows to interact with the vm.
>>>
>>> How does this compare to qtest?  There seems to be a lot of overlap
>>> between them.
>>>
>>> Why is it called "fault injection"?  The commands seem to be
>>> general-purpose device testing functions (like qtest and libqos), not
>>> functions for testing error code paths as would be expected from a fault
>>> injection framework.
>>
>> I understand qtest is to test QEMU, while this framework/command is to
>> test how the guest react to an hardware faults.
> 
> The commands seems to be equivalent to qtest commands, just implemented
> as QMP commands.
> 
> Damien: Can you explain the use case more and show some example test
> cases?

The goal is to test and validate the software running on the vp. We want
to generate some faults to test whether the software behaves correctly. We
target corner cases that do not happen otherwise on the vp. Basically we
would like, using some scripts, to run some specific scenarios and check
that the expected behavior happens.

Regarding qtest, I was not aware that it provided such commands. I'm
sorry I missed that. I just checked after reading your feedback, and the
commands do indeed seem equivalent. I don't know if running the vp with
qtest enabled has some hidden drawbacks. But if that's not the case, we
can work to extend the existing qtest commands (or switch some of them
to QMP like Philippe proposed, I don't know what's best).

> 
>> To use the qtest_mem commands you need to run QEMU with the qtest
>> chardev backend, while this series expose a QMP interface.
>>
>> To avoid the overlap, a cleaner follow up might be to have qtest wrap
>> these QMP commands (mostly like HMP commands do).
>>
>> Another note while looking at a glance, qtest uses the 1st cpu address
>> space view, this series allow to select a specific cpu.
>>
>> It makes sense to me to be able to select address spaces by name (more
>> generic, not restricted to a cpu view, since one might want to inject
>> fault in a device ram not always mapped to a cpu: dma, emac desc).

Good point.

> 
> The naming issue still stands: none of the commands actually perform
> fault injection.  They can be used for other types of testing or even
> non-testing purposes.
> 
> Fault injection commands would be "make the next watchdog expiry fail",
> "return error code X on the next DMA request", "report an AHCI link
> failure", etc.
> 
> These commands are lower-level.  Therefore, I think "fault injection
> framework" is a misnomer and will age poorly if this API is extended in
> the future.

The only fault injection naming was for the python module. I suppose
that if we just extend qtest, there is no need for a new module or
documentation file.

Thanks,

Damien

Re: [Qemu-devel] [RFC PATCH 0/5] FAULT INJECTION FRAMEWORK

Posted by Stefan Hajnoczi 4 years, 9 months ago

On Wed, Jul 03, 2019 at 05:47:47PM +0200, Damien Hedde wrote:
> On 7/3/19 11:29 AM, Stefan Hajnoczi wrote:
> > On Mon, Jul 01, 2019 at 12:16:44PM +0200, Philippe Mathieu-Daudé wrote:
> >> On 7/1/19 10:37 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Jun 28, 2019 at 02:45:29PM +0200, Damien Hedde wrote:
> >>>> This series adds a python framework aiming to provide some ways to do fault
> >>>> injection in a running vm. In its current state, it allows to easily interact
> >>>> with memory, change gpios and qom properties.
> >>>>
> >>>> The framework consists in a python script based on the qmp existing module
> >>>> which allows to interact with the vm.
> >>>
> >>> How does this compare to qtest?  There seems to be a lot of overlap
> >>> between them.
> >>>
> >>> Why is it called "fault injection"?  The commands seem to be
> >>> general-purpose device testing functions (like qtest and libqos), not
> >>> functions for testing error code paths as would be expected from a fault
> >>> injection framework.
> >>
> >> I understand qtest is to test QEMU, while this framework/command is to
> >> test how the guest react to an hardware faults.
> > 
> > The commands seems to be equivalent to qtest commands, just implemented
> > as QMP commands.
> > 
> > Damien: Can you explain the use case more and show some example test
> > cases?
> 
> The goal is to test and validate the software running on the vp. We want
> to generate some fault to test if the software behave correctly. We
> target corner cases that do not happen otherwise on the vp. Basically we
> would like, using some scripts, to run some specific scenarios and check
> that the expected behavior happens.
> 
> Regarding qtest, I was not aware that it provided such commands. I'm
> sorry I've missed that. Just checked after reading your feedback,
> commands seem indeed equivalent. I don't know if running the vp with
> qtest enabled has some hidden drawbacks. But if that's not the case, we
> can work to extend the existing qtest commands (or switch some of them
> to QMP like Philippe proposed, I don't know what's best).

I'm not 100% sure that qtest is the right tool for the job.  Maybe you
really need to add QMP commands as you have done.

Could you share some test cases so reviewers have an idea of how these
new commands are used for fault injection?

qtest is special in that no guest code executes.  QEMU allocates guest
RAM and initializes devices as usual but TCG/KVM do not execute guest
CPU instructions.  Does your use case require guest execution?

Here is a presentation on qtest if you want to get an overview:
https://www.youtube.com/watch?v=4TSaMmrnHy8

Stefan