[PATCH for 7.2-rc3 v1 0/2] virtio fixes

Alex Bennée posted 2 patches 2 weeks, 1 day ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221123152134.179929-1-alex.bennee@linaro.org
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>
include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
2 files changed, 59 insertions(+), 9 deletions(-)
[PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 2 weeks, 1 day ago
Hi,

This hopefully fixes the problems with VirtIO migration caused by the
previous refactoring of virtio_device_started(). That introduced a
different order of checking which didn't give the VM state primacy but
wasn't noticed as we don't properly exercise VirtIO device migration
and caused issues when dev->started wasn't checked in the core code.
The introduction of virtio_device_should_start() split the overloaded
function up but the broken order still remained. The series finally
fixes that by restoring the original semantics but with the cleaned up
functions.

I've added more documentation to the various structures involved as
well as the functions. There is still some inconsistencies in the
VirtIO code between different devices but I think that can be looked
at over the 8.0 cycle.

Alex Bennée (2):
  include/hw: attempt to document VirtIO feature variables
  include/hw: VM state takes precedence in virtio_device_should_start

 include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
 include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
 2 files changed, 59 insertions(+), 9 deletions(-)

-- 
2.34.1


Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Michael S. Tsirkin 2 weeks, 1 day ago
On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> Hi,
> 
> This hopefully fixes the problems with VirtIO migration caused by the
> previous refactoring of virtio_device_started(). That introduced a
> different order of checking which didn't give the VM state primacy but
> wasn't noticed as we don't properly exercise VirtIO device migration
> and caused issues when dev->started wasn't checked in the core code.
> The introduction of virtio_device_should_start() split the overloaded
> function up but the broken order still remained. The series finally
> fixes that by restoring the original semantics but with the cleaned up
> functions.
> 
> I've added more documentation to the various structures involved as
> well as the functions. There is still some inconsistencies in the
> VirtIO code between different devices but I think that can be looked
> at over the 8.0 cycle.


Thanks a lot! Did you try this with gitlab CI? A patch similar to your
2/2 broke it previously ...

> Alex Bennée (2):
>   include/hw: attempt to document VirtIO feature variables
>   include/hw: VM state takes precedence in virtio_device_should_start
> 
>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>  2 files changed, 59 insertions(+), 9 deletions(-)
> 
> -- 
> 2.34.1
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 2 weeks, 1 day ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> Hi,
>> 
>> This hopefully fixes the problems with VirtIO migration caused by the
>> previous refactoring of virtio_device_started(). That introduced a
>> different order of checking which didn't give the VM state primacy but
>> wasn't noticed as we don't properly exercise VirtIO device migration
>> and caused issues when dev->started wasn't checked in the core code.
>> The introduction of virtio_device_should_start() split the overloaded
>> function up but the broken order still remained. The series finally
>> fixes that by restoring the original semantics but with the cleaned up
>> functions.
>> 
>> I've added more documentation to the various structures involved as
>> well as the functions. There is still some inconsistencies in the
>> VirtIO code between different devices but I think that can be looked
>> at over the 8.0 cycle.
>
>
> Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> 2/2 broke it previously ...

Looking into it now - so far hasn't broken locally but I guess there is
something different about the CI.

>
>> Alex Bennée (2):
>>   include/hw: attempt to document VirtIO feature variables
>>   include/hw: VM state takes precedence in virtio_device_should_start
>> 
>>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>>  2 files changed, 59 insertions(+), 9 deletions(-)
>> 
>> -- 
>> 2.34.1


-- 
Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Michael S. Tsirkin 2 weeks, 1 day ago
On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> Hi,
> >> 
> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> previous refactoring of virtio_device_started(). That introduced a
> >> different order of checking which didn't give the VM state primacy but
> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> and caused issues when dev->started wasn't checked in the core code.
> >> The introduction of virtio_device_should_start() split the overloaded
> >> function up but the broken order still remained. The series finally
> >> fixes that by restoring the original semantics but with the cleaned up
> >> functions.
> >> 
> >> I've added more documentation to the various structures involved as
> >> well as the functions. There is still some inconsistencies in the
> >> VirtIO code between different devices but I think that can be looked
> >> at over the 8.0 cycle.
> >
> >
> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> > 2/2 broke it previously ...
> 
> Looking into it now - so far hasn't broken locally but I guess there is
> something different about the CI.


yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2

Or with QEMU_CI set to 1 and then run fedora container and then clang-system manually.

> >
> >> Alex Bennée (2):
> >>   include/hw: attempt to document VirtIO feature variables
> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> 
> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> 
> >> -- 
> >> 2.34.1
> 
> 
> -- 
> Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 2 weeks ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> Hi,
>> >> 
>> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> previous refactoring of virtio_device_started(). That introduced a
>> >> different order of checking which didn't give the VM state primacy but
>> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> and caused issues when dev->started wasn't checked in the core code.
>> >> The introduction of virtio_device_should_start() split the overloaded
>> >> function up but the broken order still remained. The series finally
>> >> fixes that by restoring the original semantics but with the cleaned up
>> >> functions.
>> >> 
>> >> I've added more documentation to the various structures involved as
>> >> well as the functions. There is still some inconsistencies in the
>> >> VirtIO code between different devices but I think that can be looked
>> >> at over the 8.0 cycle.
>> >
>> >
>> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> > 2/2 broke it previously ...
>> 
>> Looking into it now - so far hasn't broken locally but I guess there is
>> something different about the CI.
>
>
> yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>
> Or with QEMU_CI set to 1 and then run fedora container and then
> clang-system manually.

I'm having trouble re-creating the failures in CI locally on my boxen. I
have triggered a bug on s390 but that looks like a pre-existing problem
with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
think that is a limitation of the test harness.

Will keep looking.

>
>> >
>> >> Alex Bennée (2):
>> >>   include/hw: attempt to document VirtIO feature variables
>> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> 
>> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> 
>> >> -- 
>> >> 2.34.1
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Michael S. Tsirkin 2 weeks ago
On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> >> 
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> >> Hi,
> >> >> 
> >> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> >> previous refactoring of virtio_device_started(). That introduced a
> >> >> different order of checking which didn't give the VM state primacy but
> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> >> and caused issues when dev->started wasn't checked in the core code.
> >> >> The introduction of virtio_device_should_start() split the overloaded
> >> >> function up but the broken order still remained. The series finally
> >> >> fixes that by restoring the original semantics but with the cleaned up
> >> >> functions.
> >> >> 
> >> >> I've added more documentation to the various structures involved as
> >> >> well as the functions. There is still some inconsistencies in the
> >> >> VirtIO code between different devices but I think that can be looked
> >> >> at over the 8.0 cycle.
> >> >
> >> >
> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> >> > 2/2 broke it previously ...
> >> 
> >> Looking into it now - so far hasn't broken locally but I guess there is
> >> something different about the CI.
> >
> >
> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
> >
> > Or with QEMU_CI set to 1 and then run fedora container and then
> > clang-system manually.
> 
> I'm having trouble re-creating the failures in CI locally on my boxen. I
> have triggered a bug on s390 but that looks like a pre-existing problem
> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
> think that is a limitation of the test harness.
> 
> Will keep looking.

Why not just trigger it on gitlab CI - it's very repeatable there?

> >
> >> >
> >> >> Alex Bennée (2):
> >> >>   include/hw: attempt to document VirtIO feature variables
> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> >> 
> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> >> 
> >> >> -- 
> >> >> 2.34.1
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> 
> 
> -- 
> Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 1 week, 6 days ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> function up but the broken order still remained. The series finally
>> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> functions.
>> >> >> 
>> >> >> I've added more documentation to the various structures involved as
>> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> at over the 8.0 cycle.
>> >> >
>> >> >
>> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> > 2/2 broke it previously ...
>> >> 
>> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> something different about the CI.
>> >
>> >
>> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >
>> > Or with QEMU_CI set to 1 and then run fedora container and then
>> > clang-system manually.
>> 
>> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> have triggered a bug on s390 but that looks like a pre-existing problem
>> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> think that is a limitation of the test harness.
>> 
>> Will keep looking.
>
> Why not just trigger it on gitlab CI - it's very repeatable there?

I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:

  # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
  # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#

And the command:

  env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess

Gives the following failure, while a leak may not be that exciting it
does point to a potential corruption issue. Unfortunately I don't get a
decent backtrace from the tool:

  # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
  # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
  ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
  # Start of arm tests
  # Start of virt tests
  # Start of virtio-mmio tests
  # Start of virtio-bus tests
  # Start of vhost-user-gpio-device tests
  # Start of vhost-user-gpio tests
  # Start of vhost-user-gpio-tests tests
  # Start of read-guest-mem tests
  # Start of memfile tests
  # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
  # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
  ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
  # set_protocol_features: 0x200
  # set_owner: start of session
  # vhost-user: un-handled message: 14
  # vhost-user: un-handled message: 14
  # set_vring_num: 0/1024
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
  qemu-system-arm: Failed to set msg fds.
  qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
  ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
  # End of memfile tests
  # End of read-guest-mem tests
  # End of vhost-user-gpio-tests tests
  # End of vhost-user-gpio tests
  # End of vhost-user-gpio-device tests
  # End of virtio-bus tests
  # End of virtio-mmio tests
  # End of virt tests
  # End of arm tests
  1..1

  =================================================================
  ==1024371==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
      #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
      #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
      #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
      #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
      #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
      #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
      #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
      #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
      #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
      #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
      #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
      #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5

  SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
  ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
  fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
  🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
  ✗



>> >
>> >> >
>> >> >> Alex Bennée (2):
>> >> >>   include/hw: attempt to document VirtIO feature variables
>> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> >> 
>> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> >> 
>> >> >> -- 
>> >> >> 2.34.1
>> >> 
>> >> 
>> >> -- 
>> >> Alex Bennée
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Michael S. Tsirkin 1 week, 6 days ago
On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
> >> 
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> 
> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
> >> >> 
> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> 
> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
> >> >> >> Hi,
> >> >> >> 
> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
> >> >> >> previous refactoring of virtio_device_started(). That introduced a
> >> >> >> different order of checking which didn't give the VM state primacy but
> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
> >> >> >> and caused issues when dev->started wasn't checked in the core code.
> >> >> >> The introduction of virtio_device_should_start() split the overloaded
> >> >> >> function up but the broken order still remained. The series finally
> >> >> >> fixes that by restoring the original semantics but with the cleaned up
> >> >> >> functions.
> >> >> >> 
> >> >> >> I've added more documentation to the various structures involved as
> >> >> >> well as the functions. There is still some inconsistencies in the
> >> >> >> VirtIO code between different devices but I think that can be looked
> >> >> >> at over the 8.0 cycle.
> >> >> >
> >> >> >
> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
> >> >> > 2/2 broke it previously ...
> >> >> 
> >> >> Looking into it now - so far hasn't broken locally but I guess there is
> >> >> something different about the CI.
> >> >
> >> >
> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
> >> >
> >> > Or with QEMU_CI set to 1 and then run fedora container and then
> >> > clang-system manually.
> >> 
> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
> >> have triggered a bug on s390 but that looks like a pre-existing problem
> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
> >> think that is a limitation of the test harness.
> >> 
> >> Will keep looking.
> >
> > Why not just trigger it on gitlab CI - it's very repeatable there?
> 
> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
> 
>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
> 
> And the command:
> 
>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
> 
> Gives the following failure, while a leak may not be that exciting it
> does point to a potential corruption issue. Unfortunately I don't get a
> decent backtrace from the tool:
> 
>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>   # Start of arm tests
>   # Start of virt tests
>   # Start of virtio-mmio tests
>   # Start of virtio-bus tests
>   # Start of vhost-user-gpio-device tests
>   # Start of vhost-user-gpio tests
>   # Start of vhost-user-gpio-tests tests
>   # Start of read-guest-mem tests
>   # Start of memfile tests
>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>   # set_protocol_features: 0x200
>   # set_owner: start of session
>   # vhost-user: un-handled message: 14
>   # vhost-user: un-handled message: 14
>   # set_vring_num: 0/1024
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>   qemu-system-arm: Failed to set msg fds.
>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>   # End of memfile tests
>   # End of read-guest-mem tests
>   # End of vhost-user-gpio-tests tests
>   # End of vhost-user-gpio tests
>   # End of vhost-user-gpio-device tests
>   # End of virtio-bus tests
>   # End of virtio-mmio tests
>   # End of virt tests
>   # End of arm tests
>   1..1
> 
>   =================================================================
>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
> 
>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
> 
>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>   ✗


ok ... was gpio always like this? from 1st commit? if not bisect?

> 
> 
> >> >
> >> >> >
> >> >> >> Alex Bennée (2):
> >> >> >>   include/hw: attempt to document VirtIO feature variables
> >> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
> >> >> >> 
> >> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
> >> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
> >> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
> >> >> >> 
> >> >> >> -- 
> >> >> >> 2.34.1
> >> >> 
> >> >> 
> >> >> -- 
> >> >> Alex Bennée
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> 
> 
> -- 
> Alex Bennée


Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 1 week, 6 days ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> >> 
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> >> 
>> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> >> function up but the broken order still remained. The series finally
>> >> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> >> functions.
>> >> >> >> 
>> >> >> >> I've added more documentation to the various structures involved as
>> >> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> >> at over the 8.0 cycle.
>> >> >> >
>> >> >> >
>> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> >> > 2/2 broke it previously ...
>> >> >> 
>> >> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> >> something different about the CI.
>> >> >
>> >> >
>> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >> >
>> >> > Or with QEMU_CI set to 1 and then run fedora container and then
>> >> > clang-system manually.
>> >> 
>> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> >> have triggered a bug on s390 but that looks like a pre-existing problem
>> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> >> think that is a limitation of the test harness.
>> >> 
>> >> Will keep looking.
>> >
>> > Why not just trigger it on gitlab CI - it's very repeatable there?
>> 
>> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
>> 
>>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
>> 
>> And the command:
>> 
>>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
>> 
>> Gives the following failure, while a leak may not be that exciting it
>> does point to a potential corruption issue. Unfortunately I don't get a
>> decent backtrace from the tool:
>> 
>>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # Start of arm tests
>>   # Start of virt tests
>>   # Start of virtio-mmio tests
>>   # Start of virtio-bus tests
>>   # Start of vhost-user-gpio-device tests
>>   # Start of vhost-user-gpio tests
>>   # Start of vhost-user-gpio-tests tests
>>   # Start of read-guest-mem tests
>>   # Start of memfile tests
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # set_protocol_features: 0x200
>>   # set_owner: start of session
>>   # vhost-user: un-handled message: 14
>>   # vhost-user: un-handled message: 14
>>   # set_vring_num: 0/1024
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>>   # End of memfile tests
>>   # End of read-guest-mem tests
>>   # End of vhost-user-gpio-tests tests
>>   # End of vhost-user-gpio tests
>>   # End of vhost-user-gpio-device tests
>>   # End of virtio-bus tests
>>   # End of virtio-mmio tests
>>   # End of virt tests
>>   # End of arm tests
>>   1..1
>> 
>>   =================================================================
>>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
>> 
>>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
>> 
>>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>>   ✗
>
>
> ok ... was gpio always like this? from 1st commit? if not bisect?
<snip>

I'm almost tempted to drop the mmio variant of the gpio test. I suspect
there is a reason the only other mmio vhost-user test is for virtio-net.

-- 
Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 1 week, 6 days ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 10:24:14PM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> >> 
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> >> 
>> >> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> >> function up but the broken order still remained. The series finally
>> >> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> >> functions.
>> >> >> >> 
>> >> >> >> I've added more documentation to the various structures involved as
>> >> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> >> at over the 8.0 cycle.
>> >> >> >
>> >> >> >
>> >> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> >> > 2/2 broke it previously ...
>> >> >> 
>> >> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> >> something different about the CI.
>> >> >
>> >> >
>> >> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >> >
>> >> > Or with QEMU_CI set to 1 and then run fedora container and then
>> >> > clang-system manually.
>> >> 
>> >> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> >> have triggered a bug on s390 but that looks like a pre-existing problem
>> >> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> >> think that is a limitation of the test harness.
>> >> 
>> >> Will keep looking.
>> >
>> > Why not just trigger it on gitlab CI - it's very repeatable there?
>> 
>> I can repeat a problem locally on Debian Bullseye and Ubuntu 22.04 with clang and leak sanitizer:
>> 
>>   # QEMU configure log Thu 24 Nov 16:02:56 GMT 2022
>>   # Configured with: '../../configure' '--cc=clang' '--cxx=clang++' '--enable-sanitizers' '--target-list=arm-softmmu,aarch64-softmmu,i386-softmmu,x86_64-softmmu,ppc64-softmmu'#
>> 
>> And the command:
>> 
>>   env QTEST_QEMU_BINARY=./qemu-system-arm QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=178 G_TEST_DBUS_DAEMON=/home/alex.bennee/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh ./tests/qtest/qos-test -p /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
>> 
>> Gives the following failure, while a leak may not be that exciting it
>> does point to a potential corruption issue. Unfortunately I don't get a
>> decent backtrace from the tool:
>> 
>>   # random seed: R02S071fe8d68317a8b01e5e7fadbf1ac60a
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>>   ==1024354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # Start of arm tests
>>   # Start of virt tests
>>   # Start of virtio-mmio tests
>>   # Start of virtio-bus tests
>>   # Start of vhost-user-gpio-device tests
>>   # Start of vhost-user-gpio tests
>>   # Start of vhost-user-gpio-tests tests
>>   # Start of read-guest-mem tests
>>   # Start of memfile tests
>>   # starting QEMU: exec ./qemu-system-arm -qtest unix:/tmp/qtest-1024352.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1024352.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt  -device vhost-user-gpio-device,id=gpio0,chardev=chr-vhost-user-test -m 256 -object memory-backend-memfd,id=mem,size=256M, -numa node,memdev=mem -chardev socket,id=chr-vhost-user-test,path=/tmp/vhost-test-8DD2V1/vhost-user-test.sock -accel qtest
>>   # GLib-DEBUG: setenv()/putenv() are not thread-safe and should not be used after threads are created
>>   ==1024371==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
>>   # set_protocol_features: 0x200
>>   # set_owner: start of session
>>   # vhost-user: un-handled message: 14
>>   # vhost-user: un-handled message: 14
>>   # set_vring_num: 0/1024
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   qemu-system-arm: Failed to set msg fds.
>>   qemu-system-arm: vhost_set_vring_call failed: Invalid argument (22)
>>   ok 1 /arm/virt/virtio-mmio/virtio-bus/vhost-user-gpio-device/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess # SKIP No memory at address 0x0
>>   # End of memfile tests
>>   # End of read-guest-mem tests
>>   # End of vhost-user-gpio-tests tests
>>   # End of vhost-user-gpio tests
>>   # End of vhost-user-gpio-device tests
>>   # End of virtio-bus tests
>>   # End of virtio-mmio tests
>>   # End of virt tests
>>   # End of arm tests
>>   1..1
>> 
>>   =================================================================
>>   ==1024371==ERROR: LeakSanitizer: detected memory leaks
>> 
>>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>>       #0 0x561d9a5d7a18 in __interceptor_calloc (/home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/qemu-system-arm+0x1d1fa18) (BuildId: 0bdc7c2ada2277089db16d57f17c314e9e53e41c)
>>       #1 0x7f46ee656c40 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5ec40) (BuildId: 0ab0b740e34eeb0c84656ba53737f4c440dfbed4)
>>       #2 0x561d9bf7875b in virtio_device_realize /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/virtio/virtio.c:4175:9
>>       #3 0x561d9c321bf4 in device_set_realized /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../hw/core/qdev.c:566:13
>>       #4 0x561d9c33dda8 in property_set_bool /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:2285:5
>>       #5 0x561d9c338fb3 in object_property_set /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/object.c:1420:5
>>       #6 0x561d9c344c7c in object_property_set_qobject /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../qom/qom-qobject.c:28:10
>>       #7 0x561d9b367954 in qdev_device_add /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/qdev-monitor.c:733:11
>>       #8 0x561d9b36f832 in qemu_create_cli_devices /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2536:5
>>       #9 0x561d9b36f832 in qmp_x_exit_preconfig /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:2604:5
>>       #10 0x561d9b37613f in qemu_init /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/vl.c:3601:9
>>       #11 0x561d9a6125a5 in main /home/alex.bennee/lsrc/qemu.git/builds/all.clang-sanitizers/../../softmmu/main.c:47:5
>> 
>>   SUMMARY: AddressSanitizer: 240 byte(s) leaked in 1 allocation(s).
>>   ../../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
>>   fish: Job 1, 'env QTEST_QEMU_BINARY=./qemu-sy…' terminated by signal SIGABRT (Abort)
>>   🕙22:26:18 alex.bennee@hackbox2:qemu.git/builds/all.clang-sanitizers  on  for-7.2/virtio-fixes [$?] [⚡ IOT]
>>   ✗
>
>
> ok ... was gpio always like this? from 1st commit? if not bisect?

I think so. I've managed to track down what the malloc is. It's the
memory allocated by:

  gpio->vhost_dev.vqs = g_new0(struct vhost_virtqueue, gpio->vhost_dev.nvqs);

It is never freed because we never get to vu_gpio_device_unrealize() but
as far as I can tell none of the qtests ever trigger the unrealize() so
I'm not sure why it is special.

>
>> 
>> 
>> >> >
>> >> >> >
>> >> >> >> Alex Bennée (2):
>> >> >> >>   include/hw: attempt to document VirtIO feature variables
>> >> >> >>   include/hw: VM state takes precedence in virtio_device_should_start
>> >> >> >> 
>> >> >> >>  include/hw/virtio/vhost.h  | 25 +++++++++++++++++++---
>> >> >> >>  include/hw/virtio/virtio.h | 43 ++++++++++++++++++++++++++++++++------
>> >> >> >>  2 files changed, 59 insertions(+), 9 deletions(-)
>> >> >> >> 
>> >> >> >> -- 
>> >> >> >> 2.34.1
>> >> >> 
>> >> >> 
>> >> >> -- 
>> >> >> Alex Bennée
>> >> 
>> >> 
>> >> -- 
>> >> Alex Bennée
>> 
>> 
>> -- 
>> Alex Bennée


-- 
Alex Bennée
Re: [PATCH for 7.2-rc3 v1 0/2] virtio fixes
Posted by Alex Bennée 2 weeks ago
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Nov 24, 2022 at 09:21:15AM +0000, Alex Bennée wrote:
>> 
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Nov 23, 2022 at 04:03:49PM +0000, Alex Bennée wrote:
>> >> 
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> 
>> >> > On Wed, Nov 23, 2022 at 03:21:32PM +0000, Alex Bennée wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> This hopefully fixes the problems with VirtIO migration caused by the
>> >> >> previous refactoring of virtio_device_started(). That introduced a
>> >> >> different order of checking which didn't give the VM state primacy but
>> >> >> wasn't noticed as we don't properly exercise VirtIO device migration
>> >> >> and caused issues when dev->started wasn't checked in the core code.
>> >> >> The introduction of virtio_device_should_start() split the overloaded
>> >> >> function up but the broken order still remained. The series finally
>> >> >> fixes that by restoring the original semantics but with the cleaned up
>> >> >> functions.
>> >> >> 
>> >> >> I've added more documentation to the various structures involved as
>> >> >> well as the functions. There is still some inconsistencies in the
>> >> >> VirtIO code between different devices but I think that can be looked
>> >> >> at over the 8.0 cycle.
>> >> >
>> >> >
>> >> > Thanks a lot! Did you try this with gitlab CI? A patch similar to your
>> >> > 2/2 broke it previously ...
>> >> 
>> >> Looking into it now - so far hasn't broken locally but I guess there is
>> >> something different about the CI.
>> >
>> >
>> > yes - pls push to gitlab, create pipeline e.g. with QEMU_CI set to 2
>> >
>> > Or with QEMU_CI set to 1 and then run fedora container and then
>> > clang-system manually.
>> 
>> I'm having trouble re-creating the failures in CI locally on my boxen. I
>> have triggered a bug on s390 but that looks like a pre-existing problem
>> with VRING_SET_ENDIAN being triggered for the vhost-user-gpio tests. I
>> think that is a limitation of the test harness.
>> 
>> Will keep looking.
>
> Why not just trigger it on gitlab CI - it's very repeatable there?

I've got a fix for gpio and am running it through CI now:

  https://gitlab.com/stsquad/qemu/-/pipelines/704285944

My main concern is I had to do something no other vhost-user device does
and I'm not sure if thats down to misunderstanding or the other devices
just getting lucky.

-- 
Alex Bennée