Hi Gavin,

Let me figure out this. Have you also included the below patch along with the
architecture agnostic patch-set accepted in this Qemu cycle?

https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/

Thanks
Salil.

> From: Gavin Shan <gshan@redhat.com>
> Sent: Wednesday, August 7, 2024 10:54 AM
> To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; mst@redhat.com
>
> Hi Salil,
>
> With this series and latest upstream Linux kernel (host), I ran into core
> dump as below.
> I'm not sure if it's a known issue or not.
>
> # uname -r
> 6.11.0-rc2-gavin+
> # /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 -accel kvm \
>     -machine virt,gic-version=host,nvdimm=on -cpu host \
>     -smp maxcpus=2,cpus=1,sockets=2,clusters=1,cores=1,threads=1 \
>     -m 4096M,slots=16,maxmem=128G \
>     -object memory-backend-ram,id=mem0,size=2048M \
>     -object memory-backend-ram,id=mem1,size=2048M \
>     -numa node,nodeid=0,memdev=mem0,cpus=0-0 \
>     -numa node,nodeid=1,memdev=mem1,cpus=1-1 \
>     :
> qemu-system-aarch64: Failed to initialize host vcpu 1
> Aborted (core dumped)
>
> # gdb /var/lib/systemd/coredump/core.0 /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64
> (gdb) bt
> #0 0x0000ffff9eec42e8 in __pthread_kill_implementation () at /lib64/libc.so.6
> #1 0x0000ffff9ee7c73c in raise () at /lib64/libc.so.6
> #2 0x0000ffff9ee69034 in abort () at /lib64/libc.so.6
> #3 0x0000aaaac71152c0 in kvm_arm_create_host_vcpu (cpu=0xaaaae4c0cb80)
>     at ../target/arm/kvm.c:1093
> #4 0x0000aaaac7057520 in machvirt_init (machine=0xaaaae48198c0) at ../hw/arm/virt.c:2534
> #5 0x0000aaaac6b0d31c in machine_run_board_init (machine=0xaaaae48198c0, mem_path=0x0, errp=0xfffff754ee38)
>     at ../hw/core/machine.c:1576
> #6 0x0000aaaac6f58d70 in qemu_init_board () at ../system/vl.c:2620
> #7 0x0000aaaac6f590dc in qmp_x_exit_preconfig (errp=0xaaaac8911120 <error_fatal>)
>     at ../system/vl.c:2712
> #8 0x0000aaaac6f5b728 in qemu_init (argc=82, argv=0xfffff754f1d8) at ../system/vl.c:3758
> #9 0x0000aaaac6a5315c in main (argc=82, argv=0xfffff754f1d8) at ../system/main.c:47
>
> Thanks,
> Gavin
Hi Gavin,

I tested ARM arch specific patches with the latest Qemu which contains below mentioned
fix and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it booted successfully.
Though I did see a kernel crash on attempting to hotplug first vCPU.

(qemu) device_add host-arm-cpu,id=core4,core-id=4
(qemu) [ 365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
[ 365.126366] Mem abort info:
[ 365.126640] ESR = 0x000000009600004e
[ 365.127010] EC = 0x25: DABT (current EL), IL = 32 bits
[ 365.127524] SET = 0, FnV = 0
[ 365.127822] EA = 0, S1PTW = 0
[ 365.128130] FSC = 0x0e: level 2 permission fault
[ 365.128598] Data abort info:
[ 365.128881] ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
[ 365.129447] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 365.129943] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 365.130442] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000045830000
[ 365.131068] [ffff800081ba4190] pgd=0000000000000000, p4d=10000000467df003, pud=10000000467e0003, pmd=0060000045600781
[ 365.132069] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
[ 365.132661] Modules linked in:
[ 365.132952] CPU: 0 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.11.0-rc2 #228
[ 365.133699] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 365.134415] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 365.134969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 365.135679] pc : register_cpu+0x138/0x250
[ 365.136093] lr : register_cpu+0x120/0x250
[ 365.136506] sp : ffff800082cbba10
[ 365.136847] x29: ffff800082cbba10 x28: ffff8000826479c0 x27: ffff000000a7e098
[ 365.137575] x26: ffff8000827c2838 x25: 0000000000000004 x24: ffff80008264d9b0
[ 365.138311] x23: 0000000000000004 x22: ffff000012a482d0 x21: ffff800081e30a00
[ 365.139037] x20: 0000000000000000 x19: ffff800081ba4190 x18: ffffffffffffffff
[ 365.139764] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000001adaa1c
[ 365.140490] x14: ffffffffffffffff x13: ffff000001ada2e0 x12: 0000000000000000
[ 365.141216] x11: ffff800081e32780 x10: 0000000000000000 x9 : 0000000000000001
[ 365.141945] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 6f7274726e737460
[ 365.142668] x5 : ffff0000027b1920 x4 : ffff0000027b1b40 x3 : ffff0000027b1880
[ 365.143400] x2 : ffff0000001933c0 x1 : ffff800081ba4190 x0 : 0000000000000010
[ 365.144129] Call trace:
[ 365.144382] register_cpu+0x138/0x250
[ 365.144759] arch_register_cpu+0x7c/0xc4
[ 365.145166] acpi_processor_add+0x468/0x590
[ 365.145594] acpi_bus_attach+0x1ac/0x2dc
[ 365.146002] acpi_dev_for_one_check+0x34/0x40
[ 365.146449] device_for_each_child+0x5c/0xb0
[ 365.146887] acpi_dev_for_each_child+0x3c/0x64
[ 365.147341] acpi_bus_attach+0x78/0x2dc
[ 365.147734] acpi_bus_scan+0x68/0x208
[ 365.148110] acpi_scan_rescan_bus+0x4c/0x78
[ 365.148537] acpi_device_hotplug+0x1f8/0x460
[ 365.148975] acpi_hotplug_work_fn+0x24/0x3c
[ 365.149402] process_one_work+0x150/0x294
[ 365.149817] worker_thread+0x2e4/0x3ec
[ 365.150199] kthread+0x118/0x11c
[ 365.150536] ret_from_fork+0x10/0x20
[ 365.150903] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
[ 365.151527] ---[ end trace 0000000000000000 ]---

Do let me know how the Qemu with Arch specific patches goes.

Thanks
Salil.

> From: Salil Mehta
> Sent: Wednesday, August 7, 2024 2:27 PM
> To: 'Gavin Shan' <gshan@redhat.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; mst@redhat.com
>
> Hi Gavin,
>
> Let me figure out this. Have you also included the below patch along with
> the architecture agnostic patch-set accepted in this Qemu cycle?
>
> https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>
> Thanks
> Salil.
Hi Salil,

On 8/8/24 2:07 AM, Salil Mehta wrote:
> I tested ARM arch specific patches with the latest Qemu which contains below mentioned
> fix and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it booted successfully.
> Though I did see a kernel crash on attempting to hotplug first vCPU.
>
> (qemu) device_add host-arm-cpu,id=core4,core-id=4
> (qemu) [ 365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
> [...]
> [ 365.135679] pc : register_cpu+0x138/0x250
> [...]
>

Should be fixed by: https://lkml.org/lkml/2024/8/8/155

Thanks,
Gavin
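The syndrome fields printed in the oops above can be cross-checked by hand from the raw ESR value. The following is a minimal sketch and not kernel or QEMU code: the struct and function names are hypothetical, and the bit positions (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]) are assumed from the Arm architecture's ESR_ELx layout for data aborts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the data-abort fields of an ESR_ELx value. */
struct esr_dabt {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length flag, bit 25 (1 = 32-bit instruction) */
    unsigned wnr;  /* write-not-read, bit 6 (1 = the access was a write) */
    unsigned fsc;  /* data fault status code, bits [5:0] */
};

/* Extract `width` bits of `value` starting at bit `lo`. */
static uint64_t bits(uint64_t value, unsigned lo, unsigned width)
{
    assert(width > 0 && width < 64 && lo + width <= 64);
    return (value >> lo) & ((UINT64_C(1) << width) - 1);
}

/* Split a raw ESR value into the fields the kernel prints in "Mem abort info". */
static struct esr_dabt decode_esr_dabt(uint64_t esr)
{
    struct esr_dabt d;

    d.ec  = (unsigned)bits(esr, 26, 6);
    d.il  = (unsigned)bits(esr, 25, 1);
    d.wnr = (unsigned)bits(esr, 6, 1);
    d.fsc = (unsigned)bits(esr, 0, 6);
    return d;
}
```

Feeding in the logged value 0x9600004e gives EC = 0x25, IL = 1, WnR = 1 and FSC = 0x0e, which matches the "DABT (current EL)", "WnR = 1" and "level 2 permission fault" lines in the log: a write from the kernel itself to a read-only mapping.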
Hi Salil,

On 8/7/24 11:27 PM, Salil Mehta wrote:
>
> Let me figure out this. Have you also included the below patch along with the
> architecture agnostic patch-set accepted in this Qemu cycle?
>
> https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>

There are no vCPU fds to be parked and unparked when the core dump happens.
I tried it, but it didn't help. I added more debugging messages and the core
dump is triggered in the following path. It seems 'cpu->sve_vq.map' isn't
correct since it's populated in the CPU realization path, and those
non-cold-booted CPUs aren't realized in the booting stage.

# dmesg | grep "Scalable Vector Extension"
[ 0.117121] CPU features: detected: Scalable Vector Extension

# start_vm
===> machvirt_init: create CPU object (idx=0, type=[host-arm-cpu])
cpu_common_initfn
arm_cpu_initfn
aarch64_cpu_initfn
aarch64_cpu_instance_init
aarch64_host_initfn
arm_cpu_post_init
===> machvirt_init: realize CPU object (idx=0)
virt_cpu_pre_plug
arm_cpu_realizefn
cpu_common_realizefn
virt_cpu_plug
===> machvirt_init: create CPU object (idx=1, type=[host-arm-cpu])
cpu_common_initfn
arm_cpu_initfn
aarch64_cpu_initfn
aarch64_cpu_instance_init
aarch64_host_initfn
arm_cpu_post_init
kvm_arch_init_vcpu: Error -22 from kvm_arm_sve_set_vls()
qemu-system-aarch64: Failed to initialize host vcpu 1
Aborted (core dumped)

Thanks,
Gavin
Hi Gavin,

Thanks for the further information.

> From: Gavin Shan <gshan@redhat.com>
> Sent: Thursday, August 8, 2024 12:41 AM
> To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; mst@redhat.com
>
> Hi Salil,
>
> On 8/7/24 11:27 PM, Salil Mehta wrote:
> >
> > Let me figure out this. Have you also included the below patch along
> > with the architecture agnostic patch-set accepted in this Qemu cycle?
> >
> > https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
> >
>
> There are no vCPU fd to be parked and unparked when the core dump
> happens. I tried it, but didn't help. I added more debugging messages and
> the core dump is triggered in the following path. It seems
> 'cpu->sve_vq.map' isn't correct since it's populated in CPU realization path, and
> those non-cold-booted CPUs aren't realized in the booting stage.

Ah, I have to fix the SVE support. I'm already working on it and the fix will
be part of the RFC V4.

Have you tried booting the VM with SVE support disabled?

> # dmesg | grep "Scalable Vector Extension"
> [ 0.117121] CPU features: detected: Scalable Vector Extension
>
> # start_vm
> ===> machvirt_init: create CPU object (idx=0, type=[host-arm-cpu])
> cpu_common_initfn
> arm_cpu_initfn
> aarch64_cpu_initfn
> aarch64_cpu_instance_init
> aarch64_host_initfn
> arm_cpu_post_init
> ===> machvirt_init: realize CPU object (idx=0)
> virt_cpu_pre_plug
> arm_cpu_realizefn
> cpu_common_realizefn
> virt_cpu_plug
> ===> machvirt_init: create CPU object (idx=1, type=[host-arm-cpu])
> cpu_common_initfn
> arm_cpu_initfn
> aarch64_cpu_initfn
> aarch64_cpu_instance_init
> aarch64_host_initfn
> arm_cpu_post_init
> kvm_arch_init_vcpu: Error -22 from kvm_arm_sve_set_vls()
> qemu-system-aarch64: Failed to initialize host vcpu 1
> Aborted (core dumped)

Yes, sure.

Thanks
Salil.
Hi Salil,

On 8/8/24 9:48 AM, Salil Mehta wrote:
>> On 8/7/24 11:27 PM, Salil Mehta wrote:
>> >
>> > Let me figure out this. Have you also included the below patch along
>> > with the architecture agnostic patch-set accepted in this Qemu cycle?
>> >
>> > https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>> >
>>
>> There are no vCPU fd to be parked and unparked when the core dump
>> happens. I tried it, but didn't help. I added more debugging messages and
>> the core dump is triggered in the following path. It seems
>> 'cpu->sve_vq.map' isn't correct since it's populated in CPU realization path, and
>> those non-cold-booted CPUs aren't realized in the booting stage.
>
> Ah, I've to fix the SVE support. I'm already working on it and will be part of
> the RFC V4.
>
> Have you tried booting VM by disabling the SVE support?
>

I'm able to boot the guest after SVE is disabled by clearing the corresponding
bits in ID_AA64PFR0, as below.

static bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
{
    :

    /*
     * SVE is explicitly disabled. Otherwise, the non-cold-booted
     * CPUs can't be initialized in the vCPU hotplug scenario.
     */
    err = read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr0,
                         ARM64_SYS_REG(3, 0, 0, 4, 0));
    ahcf->isar.id_aa64pfr0 &= ~R_ID_AA64PFR0_SVE_MASK;
}

However, I'm unable to hot-add a vCPU and haven't had a chance to look at it closely.

(qemu) device_add host-arm-cpu,id=cpu,socket-id=1
(qemu) [ 258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
[ 258.901686] Mem abort info:
[ 258.901889] ESR = 0x000000009600004e
[ 258.902160] EC = 0x25: DABT (current EL), IL = 32 bits
[ 258.902543] SET = 0, FnV = 0
[ 258.902763] EA = 0, S1PTW = 0
[ 258.902991] FSC = 0x0e: level 2 permission fault
[ 258.903338] Data abort info:
[ 258.903547] ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
[ 258.903943] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 258.904304] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 258.904687] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e24000
[ 258.905258] [ffff800080fa7190] pgd=10000000b95b0003, p4d=10000000b95b0003, pud=10000000b95b1003, pmd=00600000b8c00781
[ 258.906026] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
[ 258.906474] Modules linked in:
[ 258.906705] CPU: 0 UID: 0 PID: 29 Comm: kworker/u8:1 Not tainted 6.11.0-rc2-gavin-gb446a2dae984 #7
[ 258.907338] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
[ 258.908009] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 258.908401] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 258.908899] pc : register_cpu+0x140/0x290
[ 258.909195] lr : register_cpu+0x128/0x290
[ 258.909487] sp : ffff8000817fba10
[ 258.909727] x29: ffff8000817fba10 x28: 0000000000000000 x27: ffff0000011f9098
[ 258.910246] x26: ffff80008167b1b0 x25: 0000000000000001 x24: ffff80008153dad0
[ 258.910762] x23: 0000000000000001 x22: ffff0000ff7de210 x21: ffff8000811b9a00
[ 258.911279] x20: 0000000000000000 x19: ffff800080fa7190 x18: ffffffffffffffff
[ 258.911798] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000005a46a1c
[ 258.912326] x14: ffffffffffffffff x13: ffff000005a4632b x12: 0000000000000000
[ 258.912854] x11: 0000000000000040 x10: 0000000000000000 x9 : ffff8000808a6cd4
[ 258.913382] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
[ 258.913906] x5 : ffff0000053fab40 x4 : ffff0000053fa920 x3 : ffff0000053fabb0
[ 258.914439] x2 : ffff000000de1100 x1 : ffff800080fa7190 x0 : 0000000000000002
[ 258.914968] Call trace:
[ 258.915154] register_cpu+0x140/0x290
[ 258.915429] arch_register_cpu+0x84/0xd8
[ 258.915726] acpi_processor_add+0x480/0x5b0
[ 258.916042] acpi_bus_attach+0x1c4/0x300
[ 258.916334] acpi_dev_for_one_check+0x3c/0x50
[ 258.916689] device_for_each_child+0x68/0xc8
[ 258.917012] acpi_dev_for_each_child+0x48/0x80
[ 258.917344] acpi_bus_attach+0x84/0x300
[ 258.917629] acpi_bus_scan+0x74/0x220
[ 258.917902] acpi_scan_rescan_bus+0x54/0x88
[ 258.918211] acpi_device_hotplug+0x208/0x478
[ 258.918529] acpi_hotplug_work_fn+0x2c/0x50
[ 258.918839] process_one_work+0x15c/0x3c0
[ 258.919139] worker_thread+0x2ec/0x400
[ 258.919417] kthread+0x120/0x130
[ 258.919658] ret_from_fork+0x10/0x20
[ 258.919924] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
[ 258.920373] ---[ end trace 0000000000000000 ]---

Thanks,
Gavin
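The workaround in the message above hides SVE by clearing one ID register field. As a standalone illustration of that masking step only, here is a minimal sketch. It is not the QEMU implementation: the macro and function names are hypothetical stand-ins for QEMU's R_ID_AA64PFR0_SVE_MASK, and the field position (bits [35:32] of ID_AA64PFR0_EL1) is assumed from the Arm architecture's register description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the SVE field is assumed to occupy bits [35:32]. */
#define PFR0_SVE_SHIFT 32
#define PFR0_SVE_MASK  (UINT64_C(0xf) << PFR0_SVE_SHIFT)

/* Return the 4-bit SVE field; 0 means SVE is not advertised. */
static unsigned pfr0_sve_field(uint64_t pfr0)
{
    return (unsigned)((pfr0 & PFR0_SVE_MASK) >> PFR0_SVE_SHIFT);
}

/* Clear only the SVE field, leaving every other feature field untouched. */
static uint64_t pfr0_clear_sve(uint64_t pfr0)
{
    uint64_t masked = pfr0 & ~PFR0_SVE_MASK;

    assert(pfr0_sve_field(masked) == 0);
    return masked;
}
```

With a sample value such as 0x0000000100001111, `pfr0_sve_field()` returns 1 before the call and 0 afterwards, while the low feature fields keep their values. This mirrors what the `&= ~R_ID_AA64PFR0_SVE_MASK` line does to the host-reported register before QEMU derives the vCPU features from it.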
Hi Gavin,

> From: Gavin Shan <gshan@redhat.com>
> Sent: Thursday, August 8, 2024 1:29 AM
> To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; mst@redhat.com
>
> [...]
>
> However, I'm unable to hot-add a vCPU and haven't get a chance to look at
> it closely.
>
> (qemu) device_add host-arm-cpu,id=cpu,socket-id=1
> (qemu) [ 258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
> [...]
> [ 258.908899] pc : register_cpu+0x140/0x290
> [...]

Yes, this crash. Thanks for confirming!
Hi Salil,

On 8/8/24 10:29 AM, Gavin Shan wrote:
> On 8/8/24 9:48 AM, Salil Mehta wrote:
>
> However, I'm unable to hot-add a vCPU and haven't get a chance to look
> at it closely.
>
> (qemu) device_add host-arm-cpu,id=cpu,socket-id=1
> (qemu) [ 258.901027] Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190
> [...]
> [ 258.908899] pc : register_cpu+0x140/0x290
> [...]
>

The fix [1] is needed by the guest kernel. With this, I'm able to hot add vCPU
and hot remove vCPU successfully.

[1] https://lkml.org/lkml/2024/8/8/155

Thanks,
Gavin
Hi Gavin,

> From: Gavin Shan <gshan@redhat.com>
> Sent: Thursday, August 8, 2024 5:15 AM
> To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; mst@redhat.com
>
> [...]
>
> The fix [1] is needed by the guest kernel. With this, I'm able to hot add vCPU
> and hot remove vCPU successfully.
>
> [1] https://lkml.org/lkml/2024/8/8/155

Good catch in the kernel. And many thanks for fixing as well.