hw/virtio/vhost-user-scmi.c | 7 +++++++ include/hw/virtio/vhost-user-scmi.h | 1 + 2 files changed, 8 insertions(+)
The QEMU CI fails in virtio-scmi test occasionally. As reported by
Thomas Huth, this happens most likely when the system is loaded and it
fails with the following error:
qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
As discovered by Fabiano Rosas, the cause is a duplicate invocation of
msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_disconnect
...
qemu_chr_write_buffer
msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_set_status
...
qemu_cleanup
While vu_scmi_stop calls are protected by vhost_dev_is_started()
check, it's apparently not enough. vhost-user-blk and vhost-user-gpio
use an extra protection, see f5b22d06fb (vhost: recheck dev state in
the vhost_migration_log routine) for the motivation. Let's use the
same in vhost-user-scmi, which fixes the failure above.
Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
---
hw/virtio/vhost-user-scmi.c | 7 +++++++
include/hw/virtio/vhost-user-scmi.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/hw/virtio/vhost-user-scmi.c b/hw/virtio/vhost-user-scmi.c
index d386fb2df9..918bb7dcf7 100644
--- a/hw/virtio/vhost-user-scmi.c
+++ b/hw/virtio/vhost-user-scmi.c
@@ -63,6 +63,7 @@ static int vu_scmi_start(VirtIODevice *vdev)
error_report("Error starting vhost-user-scmi: %d", ret);
goto err_guest_notifiers;
}
+ scmi->started_vu = true;
/*
* guest_notifier_mask/pending not used yet, so just unmask
@@ -90,6 +91,12 @@ static void vu_scmi_stop(VirtIODevice *vdev)
struct vhost_dev *vhost_dev = &scmi->vhost_dev;
int ret;
+ /* vhost_dev_is_started() check in the callers is not fully reliable. */
+ if (!scmi->started_vu) {
+ return;
+ }
+ scmi->started_vu = false;
+
if (!k->set_guest_notifiers) {
return;
}
diff --git a/include/hw/virtio/vhost-user-scmi.h b/include/hw/virtio/vhost-user-scmi.h
index 6175a74ebd..c90db77dd5 100644
--- a/include/hw/virtio/vhost-user-scmi.h
+++ b/include/hw/virtio/vhost-user-scmi.h
@@ -25,6 +25,7 @@ struct VHostUserSCMI {
VirtQueue *cmd_vq;
VirtQueue *event_vq;
bool connected;
+ bool started_vu;
};
#endif /* _QEMU_VHOST_USER_SCMI_H */
--
2.40.1
Milan Zamazal <mzamazal@redhat.com> writes: > The QEMU CI fails in virtio-scmi test occasionally. As reported by > Thomas Huth, this happens most likely when the system is loaded and it > fails with the following error: > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_disconnect > ... > qemu_chr_write_buffer > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_set_status > ... > qemu_cleanup > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > the vhost_migration_log routine) for the motivation. Let's use the > same in vhost-user-scmi, which fixes the failure above. > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Fabiano Rosas <farosas@suse.de> writes: > Milan Zamazal <mzamazal@redhat.com> writes: > >> The QEMU CI fails in virtio-scmi test occasionally. As reported by >> Thomas Huth, this happens most likely when the system is loaded and it >> fails with the following error: >> >> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >> msix_unset_vector_notifiers: Assertion >> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >> failed. >> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >> QEMU death from signal 6 (Aborted) (core dumped) >> >> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >> >> msix_unset_vector_notifiers >> virtio_pci_set_guest_notifiers >> vu_scmi_stop >> vu_scmi_disconnect >> ... >> qemu_chr_write_buffer >> >> msix_unset_vector_notifiers >> virtio_pci_set_guest_notifiers >> vu_scmi_stop >> vu_scmi_set_status >> ... >> qemu_cleanup >> >> While vu_scmi_stop calls are protected by vhost_dev_is_started() >> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >> the vhost_migration_log routine) for the motivation. Let's use the >> same in vhost-user-scmi, which fixes the failure above. >> >> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > > Reviewed-by: Fabiano Rosas <farosas@suse.de> Please note that this bug fix should IMO definitely go to 8.1, to not have a bug in vhost-user-scmi and to not have broken tests. Any chance to get it merged? Thanks, Milan
On 03/08/2023 09.10, Milan Zamazal wrote: > Fabiano Rosas <farosas@suse.de> writes: > >> Milan Zamazal <mzamazal@redhat.com> writes: >> >>> The QEMU CI fails in virtio-scmi test occasionally. As reported by >>> Thomas Huth, this happens most likely when the system is loaded and it >>> fails with the following error: >>> >>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >>> msix_unset_vector_notifiers: Assertion >>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >>> failed. >>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >>> QEMU death from signal 6 (Aborted) (core dumped) >>> >>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >>> >>> msix_unset_vector_notifiers >>> virtio_pci_set_guest_notifiers >>> vu_scmi_stop >>> vu_scmi_disconnect >>> ... >>> qemu_chr_write_buffer >>> >>> msix_unset_vector_notifiers >>> virtio_pci_set_guest_notifiers >>> vu_scmi_stop >>> vu_scmi_set_status >>> ... >>> qemu_cleanup >>> >>> While vu_scmi_stop calls are protected by vhost_dev_is_started() >>> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >>> the vhost_migration_log routine) for the motivation. Let's use the >>> same in vhost-user-scmi, which fixes the failure above. >>> >>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> >> >> Reviewed-by: Fabiano Rosas <farosas@suse.de> > > Please note that this bug fix should IMO definitely go to 8.1, to not > have a bug in vhost-user-scmi and to not have broken tests. Any chance > to get it merged? If nobody else is planning a pull request with this patch included, I can take it for my next PR (since it is fixing the CI, I just saw another failure here: https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) Thomas
On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote: > On 03/08/2023 09.10, Milan Zamazal wrote: > > Fabiano Rosas <farosas@suse.de> writes: > > > > > Milan Zamazal <mzamazal@redhat.com> writes: > > > > > > > The QEMU CI fails in virtio-scmi test occasionally. As reported by > > > > Thomas Huth, this happens most likely when the system is loaded and it > > > > fails with the following error: > > > > > > > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > > > > msix_unset_vector_notifiers: Assertion > > > > `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' > > > > failed. > > > > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected > > > > QEMU death from signal 6 (Aborted) (core dumped) > > > > > > > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > > > > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > > > > > > > msix_unset_vector_notifiers > > > > virtio_pci_set_guest_notifiers > > > > vu_scmi_stop > > > > vu_scmi_disconnect > > > > ... > > > > qemu_chr_write_buffer > > > > > > > > msix_unset_vector_notifiers > > > > virtio_pci_set_guest_notifiers > > > > vu_scmi_stop > > > > vu_scmi_set_status > > > > ... > > > > qemu_cleanup > > > > > > > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > > > > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > > > > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > > > > the vhost_migration_log routine) for the motivation. Let's use the > > > > same in vhost-user-scmi, which fixes the failure above. > > > > > > > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > > > > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > > > > > > Reviewed-by: Fabiano Rosas <farosas@suse.de> > > > > Please note that this bug fix should IMO definitely go to 8.1, to not > > have a bug in vhost-user-scmi and to not have broken tests. Any chance > > to get it merged? > > If nobody else is planning a pull request with this patch included, I can > take it for my next PR (since it is fixing the CI, I just saw another > failure here: > https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) > > Thomas > I picked it up but if you like I can drop it. -- MST
On 03/08/2023 11.50, Michael S. Tsirkin wrote: > On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote: >> On 03/08/2023 09.10, Milan Zamazal wrote: >>> Fabiano Rosas <farosas@suse.de> writes: >>> >>>> Milan Zamazal <mzamazal@redhat.com> writes: >>>> >>>>> The QEMU CI fails in virtio-scmi test occasionally. As reported by >>>>> Thomas Huth, this happens most likely when the system is loaded and it >>>>> fails with the following error: >>>>> >>>>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >>>>> msix_unset_vector_notifiers: Assertion >>>>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >>>>> failed. >>>>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >>>>> QEMU death from signal 6 (Aborted) (core dumped) >>>>> >>>>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >>>>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >>>>> >>>>> msix_unset_vector_notifiers >>>>> virtio_pci_set_guest_notifiers >>>>> vu_scmi_stop >>>>> vu_scmi_disconnect >>>>> ... >>>>> qemu_chr_write_buffer >>>>> >>>>> msix_unset_vector_notifiers >>>>> virtio_pci_set_guest_notifiers >>>>> vu_scmi_stop >>>>> vu_scmi_set_status >>>>> ... >>>>> qemu_cleanup >>>>> >>>>> While vu_scmi_stop calls are protected by vhost_dev_is_started() >>>>> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >>>>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >>>>> the vhost_migration_log routine) for the motivation. Let's use the >>>>> same in vhost-user-scmi, which fixes the failure above. >>>>> >>>>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >>>>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> >>>> >>>> Reviewed-by: Fabiano Rosas <farosas@suse.de> >>> >>> Please note that this bug fix should IMO definitely go to 8.1, to not >>> have a bug in vhost-user-scmi and to not have broken tests. Any chance >>> to get it merged? >> >> If nobody else is planning a pull request with this patch included, I can >> take it for my next PR (since it is fixing the CI, I just saw another >> failure here: >> https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) >> >> Thomas >> > > I picked it up but if you like I can drop it. I think it better fits into your tree, so if you plan a PR before the next RC, then please go ahead. I'll drop it from my branch again. Thomas
On 20/07/2023 12.10, Milan Zamazal wrote: > The QEMU CI fails in virtio-scmi test occasionally. As reported by > Thomas Huth, this happens most likely when the system is loaded and it > fails with the following error: > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_disconnect > ... > qemu_chr_write_buffer > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_set_status > ... > qemu_cleanup > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > the vhost_migration_log routine) for the motivation. Let's use the > same in vhost-user-scmi, which fixes the failure above. > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > --- > hw/virtio/vhost-user-scmi.c | 7 +++++++ > include/hw/virtio/vhost-user-scmi.h | 1 + > 2 files changed, 8 insertions(+) Thanks, that fixes the problem for me! Tested-by: Thomas Huth <thuth@redhat.com>
© 2016 - 2024 Red Hat, Inc.