In this version:

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This series packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. The more devices a
VM has, the greater the improvement.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM
  - 8 16-queue vhost-net devices
  - 16 4-queue vhost-user-blk devices

         time of loading non-iterable vmstate   downtime
before   about 150 ms                           740+ ms
after    about 30 ms                            630+ ms

(This result differs from that of v1; something may have changed on my
host, but it does not affect the optimization effect shown.)

In test2, we keep the number of devices the same as in test1 and reduce
the number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM
  - 8 1-queue vhost-net devices
  - 16 1-queue vhost-user-blk devices

         time of loading non-iterable vmstate   downtime
before   about 90 ms                            about 250 ms
after    about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as in test1
and reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device
         time of loading non-iterable vmstate   downtime
before   about 20 ms                            about 70 ms
after    about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading
non-iterable vmstate. More devices and queues lead to more memory region
commits, and the time spent on flatview reconstruction grows accordingly.

Please review, Chuang.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of the rcu read lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to
  virtio_vmstate_change().
- add a delay_check flag to VirtIODevice to make sure
  virtio_load_check_delay() is called when delay_check is true.

[v2]

- rebase to latest upstream.
- add a sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes.

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. The more devices a
VM has, the greater the improvement.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net devices
- 16 4-queue vhost-user-blk devices

         time of loading non-iterable vmstate
before   about 210 ms
after    about 40 ms
On Tue, Jan 17, 2023 at 07:55:08PM +0800, Chuang Xu wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.

Acked-by: Peter Xu <peterx@redhat.com>

--
Peter Xu
On 1/17/23 12:55, Chuang Xu wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.
> [...]
>          time of loading non-iterable vmstate
> before   about 210 ms
> after    about 40 ms

great improvements on the load times, congrats!

Claudio
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.

queued series.

> [...]
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.
> [...]
> Please review, Chuang.

Hi

As said in the review, I agree with the patches, but I am waiting for
Paolo to review the rcu change (which I think is trivial, but I am not
the rcu maintainer).

If it happens that you need to send another version, I think you can
change the RFC to PATCH.

Again, very good job.

Later, Juan.

> [...]
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:

Hi

I had to drop this. It breaks migration of dbus-vmstate.

144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.66s killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k

stderr:
qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe
../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)

Can you take a look at this?

I reproduced it with "make check" and qemu compiled with the configure
line attached.

Later, Juan.
configure --enable-trace-backends=log --prefix=/usr --sysconfdir=/etc/sysconfig/ --audio-drv-list=pa --target-list=x86_64-softmmu --with-coroutine=ucontext --with-git-submodules=validate --enable-fdt=system --enable-alsa --enable-attr --enable-auth-pam --enable-avx2 --enable-avx512f --enable-bochs --enable-bpf --enable-brlapi --disable-bsd-user --enable-bzip2 --enable-cap-ng --disable-capstone --disable-cfi --disable-cfi-debug --enable-cloop --disable-cocoa --enable-containers --disable-coreaudio --enable-coroutine-pool --enable-crypto-afalg --enable-curl --enable-curses --enable-dbus-display --enable-debug-info --disable-debug-mutex --disable-debug-stack-usage --disable-debug-tcg --enable-dmg --disable-docs --disable-dsound --enable-fdt --disable-fuse --disable-fuse-lseek --disable-fuzzing --disable-gcov --enable-gettext --enable-gio --enable-glusterfs --enable-gnutls --disable-gprof --enable-gtk --disable-guest-agent --disable-guest-agent-msi --disable-hax --disable-hvf --enable-iconv --enable-install-blobs --enable-jack --enable-keyring --enable-kvm --enable-l2tpv3 --enable-libdaxctl --enable-libiscsi --enable-libnfs --enable-libpmem --enable-libssh --enable-libudev --enable-libusb --enable-linux-aio --enable-linux-io-uring --disable-linux-user --enable-live-block-migration --disable-lto --disable-lzfse --enable-lzo --disable-malloc-trim --enable-membarrier --enable-module-upgrades --enable-modules --enable-mpath --enable-multiprocess --disable-netmap --enable-nettle --enable-numa --disable-nvmm --enable-opengl --enable-oss --enable-pa --enable-parallels --enable-pie --enable-plugins --enable-png --disable-profiler --enable-pvrdma --enable-qcow1 --enable-qed --disable-qom-cast-debug --enable-rbd --enable-rdma --enable-replication --enable-rng-none --disable-safe-stack --disable-sanitizers --enable-stack-protector --enable-sdl --enable-sdl-image --enable-seccomp --enable-selinux --enable-slirp --enable-slirp-smbd --enable-smartcard --enable-snappy --enable-sparse 
--enable-spice --enable-spice-protocol --enable-system --enable-tcg --disable-tcg-interpreter --disable-tools --enable-tpm --disable-tsan --disable-u2f --enable-usb-redir --disable-user --disable-vde --enable-vdi --enable-vhost-crypto --enable-vhost-kernel --enable-vhost-net --enable-vhost-user --enable-vhost-user-blk-server --enable-vhost-vdpa --enable-virglrenderer --enable-virtfs --enable-virtiofsd --enable-vnc --enable-vnc-jpeg --enable-vnc-sasl --enable-vte --enable-vvfat --enable-werror --disable-whpx --enable-xen --enable-xen-pci-passthrough --enable-xkbcommon --enable-zstd --disable-gcrypt
Hi, Juan

On 2023/2/16 3:10 AM, Juan Quintela wrote:
> Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
>> In this version:
> Hi
>
> I had to drop this. It breaks migration of dbus-vmstate.
> [...]
> Can you take a look at this?
>
> I reproduced it with "make check" and qemu compiled with the configure
> line attached.
>
> Later, Juan.
> [...]

I'll fix this error in v6.

In addition to the test mentioned in your email, are there any other
conditions that need to be tested? I would like to run a full test before
I send v6.

Thanks!
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > Hi, Juan >> --target-list=x86_64-softmmu Compile withouth this line, that will compile all system emulators. If you pass "make check" there, I would think that you have done your part. Thanks, Juan.
Hi, Juan

Thanks for your test results!

On 2023/2/16 3:10 AM, Juan Quintela wrote:
> Chuang Xu wrote:
>> In this version:
> Hi
>
> I had to drop this. It breaks migration of dbus-vmstate.

Previously, I only focused on precopy migration tests in the normal
scenario and did not run qtest, so I must apologize for my inexperience.

> [...]
> Can you take a look at this?
>
> I reproduced it with "make check" and qemu compiled with the configure
> line attached.
>
> Later, Juan.
> [...]

I ran qtest in an environment matching yours, and two errors were
reported.

Error 1 (the same as yours):

QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover --tap -k

stderr:
qemu-system-x86_64: /data00/migration/qemu-open/include/exec/memory.h:1114: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) (test program exited with status code -6) TAP parsing error: Too few tests run (expected 23, got 12) Coredump backtrace: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f3af64a8535 in __GI_abort () at abort.c:79 #2 0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=1114, function=) at assert.c:92 #3 0x00007f3af64b61a2 in __GI___assert_fail (assertion=assertion@entry=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=file@entry=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=line@entry=1114, function=function@entry=0x55d9426cdce0 <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101 #4 0x000055d942373853 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/exec/memory.h:1112 #5 0x000055d9423746f5 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126 #6 address_space_set_flatview (as=as@entry=0x55d944738648) at ../softmmu/memory.c:1029 #7 0x000055d94237ace3 in address_space_update_topology (as=0x55d944738648) at ../softmmu/memory.c:1080 #8 address_space_init (as=as@entry=0x55d944738648, root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 "virtio-net-pci") at ../softmmu/memory.c:3082 #9 0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, devfn=, name=0x55d9444b6c40 "virtio-net-pci", pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145 #10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at ../hw/pci/pci.c:2036 #11 0x000055d942404a8f in 
device_set_realized (obj=, value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510 #12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, v=, name=, opaque=0x55d9444c71d0, errp=0x7f3aef7feae0) at ../qom/object.c:2285 #13 0x000055d94240a0e3 in object_property_set (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1420 #14 0x000055d94240d15f in object_property_set_qobject (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=0x55d945306cb0, errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28 #15 0x000055d94240a354 in object_property_set_bool (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1489 #16 0x000055d94240427c in qdev_realize (dev=, bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at ../hw/core/qdev.c:292 #17 0x000055d9421ef4a0 in qdev_device_add_from_qdict (opts=0x55d945309c00, from_json=, errp=, errp@entry=0x7f3aef7feae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17 #18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, n=0x55d9454e8530) at ../hw/net/virtio-net.c:933 #19 virtio_net_set_features (vdev=, features=4611687122246533156) at ../hw/net/virtio-net.c:1004 #20 0x000055d94233d248 in virtio_set_features_nocheck (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851 #21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027 #22 0x000055d942207601 in vmstate_load_state (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 , opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137 #23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, se=0x55d94561b700) at ../migration/savevm.c:919 #24 0x000055d942222927 in qemu_loadvm_section_start_full (f=f@entry=0x55d944700de0, 
mis=0x55d9444c23e0) at ../migration/savevm.c:2503
#25 0x000055d942225cc8 in qemu_loadvm_state_main (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at ../migration/savevm.c:2729
#26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at ../migration/savevm.c:2816
#27 0x000055d94221480e in process_incoming_migration_co (opaque=) at ../migration/migration.c:606
#28 0x000055d94257d2ab in coroutine_trampoline (i0=, i1=) at ../util/coroutine-ucontext.c:177
#29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 , end=0x0, thousands=0 L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") at grouping.c:171
#30 0x0000000000000000 in ?? ()

It seems that when address_space_to_flatview() is called here, a memory region transaction is in progress and the RCU read lock is not held. I need to think further about the right conditions for the sanity check, or about whether we can take the RCU read lock before address_space_init() to solve the problem.

Error 2:

ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
180/180 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 146.32s killed by signal 6 SIGABRT
>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――― ✀ ―――――――――――――――――――――――
qemu-system-x86_64: ../softmmu/memory.c:1094: memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' failed.
** ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
(test program exited with status code -6)

Coredump backtrace:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fed5c14d535 in __GI_abort () at abort.c:79
#2 0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 "qemu_mutex_iothread_locked()", file=0x561bc41ae94b "../softmmu/memory.c", line=1094, function=) at assert.c:92
#3 0x00007fed5c15b1a2 in __GI___assert_fail (assertion=assertion@entry=0x561bc4152424 "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b "../softmmu/memory.c", line=line@entry=1094, function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> "memory_region_transaction_commit") at assert.c:101
#4 0x0000561bc3e5a053 in memory_region_transaction_commit () at ../softmmu/memory.c:1094
#5 0x0000561bc3d07b55 in qemu_loadvm_state_main (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at ../migration/savevm.c:2789
#6 0x0000561bc3d08e46 in postcopy_ram_listen_thread (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922
#7 0x0000561bc404b3da in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:505
#8 0x00007fed5c2f2fa3 in start_thread (arg=) at pthread_create.c:486
#9 0x00007fed5c22406f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Error 2 is related to postcopy: the batched commit is reached from postcopy_ram_listen_thread(), which does not hold the iothread lock that memory_region_transaction_commit() asserts. I am not very familiar with the postcopy code, so I need some time to study that part before deciding on a fix; I will send another email later to discuss it with Peter.

Copying Peter. Thanks!