[Qemu-devel] [PATCH 0/3] script for crash-testing -device

Eduardo Habkost posted 3 patches 7 years ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20170322160052.2820-1-ehabkost@redhat.com
Test checkpatch failed
Test docker passed
Test s390x passed
There is a newer version of this series
[Qemu-devel] [PATCH 0/3] script for crash-testing -device
Posted by Eduardo Habkost 7 years ago
This series adds scripts/device-crash-test.py, which can be used to
crash-test -device with multiple machine/accel/device
combinations.

The script found a few crashes on some machines/devices. A dump
of existing cases can be seen here:
  https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb

The script contains a whitelist that can also be useful as
documentation of existing ways -device can fail or crash.

Note that the script takes a few hours to run in the default mode
(testing all accel/machine/device combinations), but the "-r N"
option can be used to make it test only N random samples.

Example script output:

  $ ../scripts/device-crash-test.py -v --shuffle
  INFO: test case: machine=verdex binary=./aarch64-softmmu/qemu-system-aarch64 device=exynos4210-ehci-usb accel=tcg
  INFO: test case: machine=none binary=./aarch64-softmmu/qemu-system-aarch64 device=onenand accel=tcg
  INFO: test case: machine=pc-i440fx-2.2 binary=./x86_64-softmmu/qemu-system-x86_64 device=ide-cd accel=kvm
  INFO: success: ./x86_64-softmmu/qemu-system-x86_64 -S -machine pc-i440fx-2.2,accel=kvm -device ide-cd
  INFO: test case: machine=SPARCClassic binary=./sparc-softmmu/qemu-system-sparc device=memory accel=tcg
  qemu received signal 6: -S -machine SPARCClassic,accel=tcg -device memory
  ERROR: failed: machine=SPARCClassic binary=./sparc-softmmu/qemu-system-sparc device=memory accel=tcg
  ERROR: cmdline: ./sparc-softmmu/qemu-system-sparc -S -machine SPARCClassic,accel=tcg -device memory
  ERROR: log: qemu-system-sparc: /root/qemu-build/exec.c:1500: find_ram_offset: Assertion `size != 0' failed.
  ERROR: exit code: -6
  INFO: test case: machine=romulus-bmc binary=./arm-softmmu/qemu-system-arm device=ich9-usb-uhci6 accel=tcg
  INFO: test case: machine=ref405ep binary=./ppc-softmmu/qemu-system-ppc device=ivshmem-doorbell accel=tcg
  INFO: test case: machine=romulus-bmc binary=./aarch64-softmmu/qemu-system-aarch64 device=l2x0 accel=tcg
  INFO: test case: machine=pc-i440fx-1.7 binary=./x86_64-softmmu/qemu-system-x86_64 device=virtio-input-host-pci accel=tcg
  INFO: test case: machine=none binary=./ppc-softmmu/qemu-system-ppc device=virtio-tablet-pci accel=tcg
  INFO: test case: machine=terrier binary=./aarch64-softmmu/qemu-system-aarch64 device=sst25vf016b accel=tcg
  INFO: success: ./aarch64-softmmu/qemu-system-aarch64 -S -machine terrier,accel=tcg -device sst25vf016b
  INFO: test case: machine=none binary=./i386-softmmu/qemu-system-i386 device=intel-iommu accel=kvm
  qemu received signal 6: -S -machine none,accel=kvm -device intel-iommu
  ERROR: failed: machine=none binary=./i386-softmmu/qemu-system-i386 device=intel-iommu accel=kvm
  ERROR: cmdline: ./i386-softmmu/qemu-system-i386 -S -machine none,accel=kvm -device intel-iommu
  ERROR: log: /root/qemu-build/hw/i386/intel_iommu.c:2565:vtd_realize: Object 0x7fe117fabfb0 is not an instance of type generic-pc-machine
  ERROR: exit code: -6
  INFO: test case: machine=tosa binary=./aarch64-softmmu/qemu-system-aarch64 device=integrator_core accel=tcg
  INFO: test case: machine=isapc binary=./i386-softmmu/qemu-system-i386 device=i82550 accel=kvm
  INFO: test case: machine=xlnx-ep108 binary=./aarch64-softmmu/qemu-system-aarch64 device=digic accel=tcg
  qemu received signal 6: -S -machine xlnx-ep108,accel=tcg -device digic
  ERROR: failed: machine=xlnx-ep108 binary=./aarch64-softmmu/qemu-system-aarch64 device=digic accel=tcg
  ERROR: cmdline: ./aarch64-softmmu/qemu-system-aarch64 -S -machine xlnx-ep108,accel=tcg -device digic
  ERROR: log: audio: Could not init `oss' audio driver
  ERROR: log: Unexpected error in qemu_chr_fe_init() at /root/qemu-build/chardev/char.c:512:
  ERROR: log: qemu-system-aarch64: -device digic: Device 'serial0' is in use
  ERROR: exit code: -6
  INFO: test case: machine=raspi2 binary=./arm-softmmu/qemu-system-arm device=sd-card accel=tcg
  INFO: success: ./arm-softmmu/qemu-system-arm -S -machine raspi2,accel=tcg -device sd-card
  [...]

Eduardo Habkost (3):
  qemu.py: Always save QEMU exit code
  qtest.py: Support QTEST_LOG environment variable
  scripts: Test script to look for -device crashes

 scripts/device-crash-test.py | 486 +++++++++++++++++++++++++++++++++++++++++++
 scripts/qemu.py              |  10 +-
 scripts/qtest.py             |   6 +
 3 files changed, 499 insertions(+), 3 deletions(-)
 create mode 100755 scripts/device-crash-test.py

-- 
2.11.0.259.g40922b1


Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
Posted by Eduardo Habkost 7 years ago
On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
> This series adds scripts/device-crash-test.py, which can be used to
> crash-test -device with multiple machine/accel/device
> combinations.
> 
> The script found a few crashes on some machines/devices. A dump
> of existing cases can be seen here:
>   https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
> 
> The script contains a whitelist that can also be useful as
> documentation of existing ways -device can fail or crash.
> 
> Note that the script takes a few hours to run in the default mode
> (testing all accel/machine/device combinations), but the "-r N"
> option can be used to make it test only N random samples.

Something I forgot to mention: I would like to run some subset of
these tests on "make check", but I don't know how we could choose
that subset. We could run, e.g., 100 random samples, but I am not
sure we really want to make "make check" non-deterministic.

Ideas?
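One possible way to get a fixed subset, sketched here as an assumption rather than anything the thread settled on, is to select cases by a hash of their parameters instead of a random draw, so every run and every host picks the same cases. `in_subset` and `deterministic_subset` are hypothetical helpers.

```python
import hashlib

def in_subset(case, buckets=100):
    """Return True if `case` falls into the fixed 1-in-`buckets` subset.

    The decision depends only on the case's own strings, so it is the same
    on every run and every host; no shared random seed is needed.
    """
    key = " ".join(case).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0

def deterministic_subset(cases, buckets=100):
    """Keep, in the original order, the cases selected by in_subset()."""
    return [c for c in cases if in_subset(c, buckets)]
```

Because selection depends only on each case's own parameters, adding a new device or machine leaves the already-selected cases unchanged, which a seeded `random.sample` over a growing list would not guarantee.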

-- 
Eduardo

Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
Posted by Thomas Huth 7 years ago
On 22.03.2017 20:13, Eduardo Habkost wrote:
> On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
>> This series adds scripts/device-crash-test.py, which can be used to
>> crash-test -device with multiple machine/accel/device
>> combinations.
>>
>> The script found a few crashes on some machines/devices. A dump
>> of existing cases can be seen here:
>>   https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
>>
>> The script contains a whitelist that can also be useful as
>> documentation of existing ways -device can fail or crash.
>>
>> Note that the script takes a few hours to run in the default mode
>> (testing all accel/machine/device combinations), but the "-r N"
>> option can be used to make it test only N random samples.

Wow, impressive script, that must have been a lot of work 'til you've
got it in a usable shape with that huge whitelist!

> Something I forgot to mention: I would like to run some subset of
> these tests on "make check", but I don't know how we could choose
> that subset. We could run, e.g., 100 random samples, but I am not
> sure we really want to make "make check" non-deterministic.

Maybe limit the tests to the devices that have a high chance to work on
different machines? ... that means primarily PCI, ISA and USB devices, I
guess.

 Thomas


Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
Posted by Eduardo Habkost 7 years ago
On Thu, Mar 23, 2017 at 04:43:01PM +0100, Thomas Huth wrote:
> On 22.03.2017 20:13, Eduardo Habkost wrote:
> > On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
> >> This series adds scripts/device-crash-test.py, which can be used to
> >> crash-test -device with multiple machine/accel/device
> >> combinations.
> >>
> >> The script found a few crashes on some machines/devices. A dump
> >> of existing cases can be seen here:
> >>   https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
> >>
> >> The script contains a whitelist that can also be useful as
> >> documentation of existing ways -device can fail or crash.
> >>
> >> Note that the script takes a few hours to run in the default mode
> >> (testing all accel/machine/device combinations), but the "-r N"
> >> option can be used to make it test only N random samples.
> 
> Wow, impressive script, that must have been a lot of work 'til you've
> got it in a usable shape with that huge whitelist!
> 
> > Something I forgot to mention: I would like to run some subset of
> > these tests on "make check", but I don't know how we could choose
> > that subset. We could run, e.g., 100 random samples, but I am not
> > sure we really want to make "make check" non-deterministic.
> 
> Maybe limit the tests to the devices that have a high chance to work on
> different machines? ... that means primarily PCI, ISA and USB devices, I
> guess.

On the other hand, I believe the remaining devices are the ones
most likely to crash machines unexpectedly...

For reference, these are the numbers when trying to test every
single machine type:

Total: 89321 test cases
pci: 27749 test cases
usb: 5125 test cases
isa: 3948 test cases

Of those 89k test cases, 67k fail (cleanly). The top reasons for failure are:

Count | Whitelist entry
------+------------------------------------------------------------------------
20681 | {'log': "No '[\\w-]+' bus found for device '[\\w-]+'"}
13076 | {'log': "Option '-device [\\w.,-]+' cannot be handled by this machine"}
 4821 | {'log': '(Guest|ROM|Flash|Kernel) image must be specified'}
 4096 | {'device': '.*-(i386|x86_64)-cpu'}
 3200 | {'log': "images* must be given with the 'pflash' parameter"}
 3084 | {'log': "[cC]ould not load [\\w ]+ (BIOS|bios) '[\\w-]+\\.bin'"}
 1120 | {'log': 'Device [\\w.,-]+ can not be dynamically instantiated'}
  800 | {'log': "Couldn't find rom image '[\\w-]+\\.bin'"}
  607 | {'device': 'vhost-scsi.*'}
  551 | {'loglevel': 40, 'log': "Device 'serial0' is in use", 'exitcode': -6}
  476 | {'log': 'Device [\\w.,-]+ is not supported by this machine yet'}
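A table like the one above can be produced by matching each failed result against the whitelist entries and counting hits per entry. The sketch below assumes each entry maps a field name to a regular expression that must match via `re.search`; the real whitelist also has non-regex fields such as 'loglevel' and 'exitcode', which this sketch ignores. `WHITELIST`, `matching_entry` and `tally` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical subset of whitelist entries, modelled on the table above.
WHITELIST = [
    {'log': r"No '[\w-]+' bus found for device '[\w-]+'"},
    {'log': r"Option '-device [\w.,-]+' cannot be handled by this machine"},
    {'device': r'.*-(i386|x86_64)-cpu'},
]

def matching_entry(result, whitelist=WHITELIST):
    """Return the index of the first entry whose patterns all match, or None."""
    for i, entry in enumerate(whitelist):
        if all(re.search(pattern, str(result.get(field, '')))
               for field, pattern in entry.items()):
            return i
    return None

def tally(results, whitelist=WHITELIST):
    """Count how many failed results each whitelist entry accounts for."""
    counts = Counter()
    for result in results:
        idx = matching_entry(result, whitelist)
        if idx is not None:
            counts[idx] += 1
    return counts
```

Sorting `tally(results).most_common()` gives the "top reasons" ordering used in the table.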

So, a few things we can do:

1) Using query-device-slots: if the test code knew in advance
which buses/device-types are supported by each machine, we could
limit the number of devices being tested. That means the test
code will probably benefit from a query-device-slots command.

This would get rid of the following:

20681 | {'log': "No '[\\w-]+' bus found for device '[\\w-]+'"}
13076 | {'log': "Option '-device [\\w.,-]+' cannot be handled by this machine"}
 1120 | {'log': 'Device [\\w.,-]+ can not be dynamically instantiated'}
  476 | {'log': 'Device [\\w.,-]+ is not supported by this machine yet'}
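If the test code had per-machine slot information, the filtering step could look roughly like this. The input mappings stand in for whatever a query-device-slots-style command might report; their shape is an assumption, and `plausible_cases` is a hypothetical helper. Devices whose bus type is unknown are kept rather than silently skipped.

```python
def plausible_cases(machine_buses, device_bus, machines, devices):
    """Keep (machine, device) pairs where the machine offers the device's bus.

    machine_buses: machine name -> set of bus type names it provides
    device_bus:    device name  -> bus type name it plugs into
    Both are hypothetical stand-ins for query-device-slots output.
    """
    cases = []
    for m in machines:
        buses = machine_buses.get(m, set())
        for d in devices:
            bus = device_bus.get(d)
            # Unknown bus type: keep the case rather than drop it unseen.
            if bus is None or bus in buses:
                cases.append((m, d))
    return cases
```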

2) Don't keep trying to test machines that can't be tested out of
the box because they need ROM or kernel images. The script can
first try to run the machine with no -device arguments, to ensure
it is really usable, before trying to test it with all devices.

This will get rid of the following:

 4821 | {'log': '(Guest|ROM|Flash|Kernel) image must be specified'}
 3200 | {'log': "images* must be given with the 'pflash' parameter"}
 3084 | {'log': "[cC]ould not load [\\w ]+ (BIOS|bios) '[\\w-]+\\.bin'"}
  800 | {'log': "Couldn't find rom image '[\\w-]+\\.bin'"}
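The "probe the machine first" idea can be sketched as a cached check per (binary, machine, accel). Here `run` is a hypothetical callable that starts QEMU with the given -device arguments and returns True if it started cleanly; how it launches QEMU is left out. Test cases are assumed to be (binary, machine, device, accel) tuples.

```python
def make_usable_checker(run):
    """Wrap a hypothetical `run(binary, machine, accel, device_args)` so each
    (binary, machine, accel) is probed only once, with no -device options."""
    cache = {}

    def usable(binary, machine, accel):
        key = (binary, machine, accel)
        if key not in cache:
            cache[key] = run(binary, machine, accel, [])
        return cache[key]

    return usable

def cases_to_test(cases, usable):
    """Drop (binary, machine, device, accel) cases whose machine can't start bare."""
    return [c for c in cases if usable(c[0], c[1], c[3])]
```

Caching matters because each unusable machine would otherwise fail once per device; with the probe it costs a single QEMU start.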

3) Not testing the devices from the "devices that won't work out
   of the box" section. There are ~18k test cases matching those
   entries.

If I did the calculations right, all of the above would eliminate
more than 63k test cases.
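Summing the counts from the whitelist table for each option gives a quick check of the estimate. The ~18k figure for option 3 is approximate, and the thread does not say how much the three groups overlap, which may explain why the simple sum exceeds the 63k quoted above.

```python
# Counts copied from the whitelist table above.
option_1 = [20681, 13076, 1120, 476]  # filtering with query-device-slots
option_2 = [4821, 3200, 3084, 800]    # skipping machines that need images
option_3 = 18000                      # "~18k", approximate

print(sum(option_1))                              # 35353
print(sum(option_2))                              # 11905
print(sum(option_1) + sum(option_2) + option_3)   # 65258
```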

-- 
Eduardo

Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
Posted by Marcel Apfelbaum 7 years ago
On 03/23/2017 05:43 PM, Thomas Huth wrote:
> On 22.03.2017 20:13, Eduardo Habkost wrote:
>> On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
>>> This series adds scripts/device-crash-test.py, which can be used to
>>> crash-test -device with multiple machine/accel/device
>>> combinations.
>>>
>>> The script found a few crashes on some machines/devices. A dump
>>> of existing cases can be seen here:
>>>   https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
>>>
>>> The script contains a whitelist that can also be useful as
>>> documentation of existing ways -device can fail or crash.
>>>
>>> Note that the script takes a few hours to run in the default mode
>>> (testing all accel/machine/device combinations), but the "-r N"
>>> option can be used to make it test only N random samples.
>
> Wow, impressive script, that must have been a lot of work 'til you've
> got it in a usable shape with that huge whitelist!
>

+1

Great work Eduardo, thanks!

>> Something I forgot to mention: I would like to run some subset of
>> these tests on "make check", but I don't know how we could choose
>> that subset. We could run, e.g., 100 random samples, but I am not
>> sure we really want to make "make check" non-deterministic.
>
> Maybe limit the tests to the devices that have a high chance to work on
> different machines? ... that means primarily PCI, ISA and USB devices, I
> guess.
>

It is hard to maintain that list; it will miss new devices and so on.
We should have a "nightly" run or something, but maintaining the
whitelist of known errors is still problematic.

Thanks,
Marcel

>  Thomas
>