[PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

Maxim Levitsky posted 7 patches 3 years, 11 months ago
Test asan passed
Test docker-mingw@fedora passed
Test checkpatch failed
Test docker-quick@centos7 failed
Test FreeBSD passed
Failed in applying to current master (apply log)
hw/core/bus.c          | 36 ++++++++++++++++++-----------
hw/core/qdev.c         | 52 ++++++++++++++++++++++++++++++------------
hw/scsi/scsi-bus.c     | 48 +++++++++++++++++++++++++++++++++-----
hw/scsi/virtio-scsi.c  | 46 +++++++++++++++++++++++++++++--------
include/hw/qdev-core.h |  3 +++
include/hw/scsi/scsi.h |  2 ++
include/qemu/rcu.h     |  1 +
qdev-monitor.c         |  3 +++
util/rcu.c             | 33 +++++++++++++++++++++++++++
9 files changed, 181 insertions(+), 43 deletions(-)
[PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Posted by Maxim Levitsky 3 years, 11 months ago
Hi!

This is a patch series that is a result of my discussion with Paulo on
how to correctly fix the root cause of the BZ #1812399.

The root cause of this bug is the fact that IO thread is running mostly
unlocked versus main thread on which device hotplug is done.

qdev_device_add first creates the device object, then places it on the bus,
and only then realizes it.

However some drivers and currently only virtio-scsi enumerate its child bus
devices on each request that is received from the guest and that can happen on the IO
thread.

Thus we have a window when new device is on the bus but not realized and can be accessed
by the virtio-scsi driver in that state.

Fix that by doing two things:

1. Add partial RCU protection to the list of a bus's child devices.
This allows the scsi IO thread to safely enumerate the child devices
while it races with the hotplug placing the device on the bus.

2. Make the virtio-scsi driver check .realized property of the scsi device
and avoid touching the device if it isn't

Note that in the particular bug report the issue wasn't a race but rather due
to combination of things, the .realize code in the middle managed to trigger IO on the virtqueue
which caused the virtio-scsi driver to access the half realized device. However
since this can happen as well with real IO thread, this patch series was done,
which fixes this as well.

Changes from V1:
  * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the failing unit test,
    make check pass now

  * Patches 6,7 are new as well: I added scsi_device_get as suggested by Stefan as well, although
    this is more a refactoring that anything else as it doesn't solve
    an existing race.

  * Addressed most of the review feedback from V1
    - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK

The series were tested with make check and smoke tested by booting a VM and
adding/removing few virtio-scsi disks from it in a loop.

Best regards,
	Maxim Levitsky

Maxim Levitsky (7):
  scsi/scsi_bus: switch search direction in scsi_device_find
  Implement drain_call_rcu and use it in hmp_device_del
  device-core: use RCU for list of childs of a bus
  device-core: use atomic_set on .realized property
  virtio-scsi: don't touch scsi devices that are not yet realized or
    about to be un-realized
  scsi: Add scsi_device_get
  virtio-scsi: use scsi_device_get

 hw/core/bus.c          | 36 ++++++++++++++++++-----------
 hw/core/qdev.c         | 52 ++++++++++++++++++++++++++++++------------
 hw/scsi/scsi-bus.c     | 48 +++++++++++++++++++++++++++++++++-----
 hw/scsi/virtio-scsi.c  | 46 +++++++++++++++++++++++++++++--------
 include/hw/qdev-core.h |  3 +++
 include/hw/scsi/scsi.h |  2 ++
 include/qemu/rcu.h     |  1 +
 qdev-monitor.c         |  3 +++
 util/rcu.c             | 33 +++++++++++++++++++++++++++
 9 files changed, 181 insertions(+), 43 deletions(-)

-- 
2.17.2


Re: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Posted by no-reply@patchew.org 3 years, 11 months ago
Patchew URL: https://patchew.org/QEMU/20200511160951.8733-1-mlevitsk@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 111
  TEST    iotest-qcow2: 114
**
ERROR:/tmp/qemu-test/src/qom/object.c:1124:object_unref: assertion failed: (obj->ref > 0)
Broken pipe
/tmp/qemu-test/src/tests/qtest/libqtest.c:175: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
ERROR - too few tests run (expected 6, got 5)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 117
  TEST    iotest-qcow2: 120
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=f4bb7d21c4384634a322c2f5cd38b1bf', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-t97jyowh/src/docker-src.2020-05-11-13.48.16.8561:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=f4bb7d21c4384634a322c2f5cd38b1bf
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-t97jyowh/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    15m0.135s
user    0m8.625s


The full log is available at
http://patchew.org/logs/20200511160951.8733-1-mlevitsk@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Posted by no-reply@patchew.org 3 years, 11 months ago
Patchew URL: https://patchew.org/QEMU/20200511160951.8733-1-mlevitsk@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200511160951.8733-1-mlevitsk@redhat.com
Subject: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
f10acc6 virtio-scsi: use scsi_device_get
4b28e41 scsi: Add scsi_device_get
969c784 virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized
c95c33f device-core: use atomic_set on .realized property
239a8ce device-core: use RCU for list of childs of a bus
dd0c3a8 Implement drain_call_rcu and use it in hmp_device_del
cc7a085 scsi/scsi_bus: switch search direction in scsi_device_find

=== OUTPUT BEGIN ===
1/7 Checking commit cc7a085c2c59 (scsi/scsi_bus: switch search direction in scsi_device_find)
2/7 Checking commit dd0c3a8cfd9b (Implement drain_call_rcu and use it in hmp_device_del)
3/7 Checking commit 239a8cee3c60 (device-core: use RCU for list of childs of a bus)
4/7 Checking commit c95c33f4a7dc (device-core: use atomic_set on .realized property)
5/7 Checking commit 969c784b8d8e (virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized)
6/7 Checking commit 4b28e41ee772 (scsi: Add scsi_device_get)
WARNING: Block comments use a trailing */ on a separate line
#29: FILE: hw/scsi/scsi-bus.c:1592:
+ * */

ERROR: braces {} are necessary for all arms of this statement
#67: FILE: hw/scsi/scsi-bus.c:1630:
+    if (!dev)
[...]

total: 1 errors, 1 warnings, 66 lines checked

Patch 6/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/7 Checking commit f10acc631cf7 (virtio-scsi: use scsi_device_get)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200511160951.8733-1-mlevitsk@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com