[v3] RE: [PATCH RFC V3 00/29] Support of Virtual CPU Hotplug for ARMv8 Arch

RE: [PATCH RFC V3 00/29] Support of Virtual CPU Hotplug for ARMv8 Arch

Posted by Salil Mehta 3 months, 2 weeks ago

Hi Gavin,

I tested the ARM arch-specific patches with the latest QEMU, which contains the fix
mentioned below, and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it
booted successfully. However, I did see a kernel crash when attempting to hotplug the first vCPU.

(qemu) device_add host-arm-cpu,id=core4,core-id=4
(qemu) [  365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
[  365.126366] Mem abort info:
[  365.126640]   ESR = 0x000000009600004e
[  365.127010]   EC = 0x25: DABT (current EL), IL = 32 bits
[  365.127524]   SET = 0, FnV = 0
[  365.127822]   EA = 0, S1PTW = 0
[  365.128130]   FSC = 0x0e: level 2 permission fault
[  365.128598] Data abort info:
[  365.128881]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
[  365.129447]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  365.129943]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  365.130442] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000045830000
[  365.131068] [ffff800081ba4190] pgd=0000000000000000, p4d=10000000467df003, pud=10000000467e0003, pmd=0060000045600781
[  365.132069] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
[  365.132661] Modules linked in:
[  365.132952] CPU: 0 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.11.0-rc2 #228
[  365.133699] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  365.134415] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  365.134969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  365.135679] pc : register_cpu+0x138/0x250
[  365.136093] lr : register_cpu+0x120/0x250
[  365.136506] sp : ffff800082cbba10
[  365.136847] x29: ffff800082cbba10 x28: ffff8000826479c0 x27: ffff000000a7e098
[  365.137575] x26: ffff8000827c2838 x25: 0000000000000004 x24: ffff80008264d9b0
[  365.138311] x23: 0000000000000004 x22: ffff000012a482d0 x21: ffff800081e30a00
[  365.139037] x20: 0000000000000000 x19: ffff800081ba4190 x18: ffffffffffffffff
[  365.139764] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000001adaa1c
[  365.140490] x14: ffffffffffffffff x13: ffff000001ada2e0 x12: 0000000000000000
[  365.141216] x11: ffff800081e32780 x10: 0000000000000000 x9 : 0000000000000001
[  365.141945] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 6f7274726e737460
[  365.142668] x5 : ffff0000027b1920 x4 : ffff0000027b1b40 x3 : ffff0000027b1880
[  365.143400] x2 : ffff0000001933c0 x1 : ffff800081ba4190 x0 : 0000000000000010
[  365.144129] Call trace:
[  365.144382]  register_cpu+0x138/0x250
[  365.144759]  arch_register_cpu+0x7c/0xc4
[  365.145166]  acpi_processor_add+0x468/0x590
[  365.145594]  acpi_bus_attach+0x1ac/0x2dc
[  365.146002]  acpi_dev_for_one_check+0x34/0x40
[  365.146449]  device_for_each_child+0x5c/0xb0
[  365.146887]  acpi_dev_for_each_child+0x3c/0x64
[  365.147341]  acpi_bus_attach+0x78/0x2dc
[  365.147734]  acpi_bus_scan+0x68/0x208
[  365.148110]  acpi_scan_rescan_bus+0x4c/0x78
[  365.148537]  acpi_device_hotplug+0x1f8/0x460
[  365.148975]  acpi_hotplug_work_fn+0x24/0x3c
[  365.149402]  process_one_work+0x150/0x294
[  365.149817]  worker_thread+0x2e4/0x3ec
[  365.150199]  kthread+0x118/0x11c
[  365.150536]  ret_from_fork+0x10/0x20
[  365.150903] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
[  365.151527] ---[ end trace 0000000000000000 ]---
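
For reference, the decoded fields in the abort info above can be reproduced from the raw
values. The following is only a rough sketch, assuming the usual AArch64 layout of ESR_EL1
for a data abort (EC in bits [31:26], IL in bit 25, WnR in bit 6, fault status code in bits
[5:0]) and of a stage 1 block descriptor (AP[2:1] in bits [7:6], PXN in bit 53, UXN in bit 54).
The helper names are made up for illustration and do not come from the kernel or QEMU.

```python
# Illustrative helpers (hypothetical names), assuming the standard AArch64
# ESR_EL1 data-abort layout and stage 1 block descriptor layout.

def decode_data_abort_esr(esr: int) -> dict:
    """Split a data-abort ESR value into the fields shown in the oops."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,       # instruction length bit: bit 25
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # 1 = the faulting access was a write
        "FSC": iss & 0x3F,             # fault status code: bits [5:0]
    }


def decode_block_descriptor(desc: int) -> dict:
    """Pull the permission-related bits out of a level 2 (PMD) block entry."""
    return {
        "valid": bool(desc & 0x1),
        "is_block": (desc & 0x3) == 0x1,       # bits [1:0] == 0b01
        "AP": (desc >> 6) & 0x3,               # AP[2:1]: bits [7:6]
        "read_only": bool((desc >> 7) & 0x1),  # AP[2] set means read-only
        "PXN": (desc >> 53) & 0x1,
        "UXN": (desc >> 54) & 0x1,
    }


if __name__ == "__main__":
    esr = decode_data_abort_esr(0x9600004E)
    pmd = decode_block_descriptor(0x0060000045600781)
    print({k: hex(v) for k, v in esr.items()})
    print(pmd)
```

With the values from this log, the sketch gives EC = 0x25, WnR = 1 and FSC = 0x0e, and the
pmd entry decodes as a valid block mapping with AP[2] set, which is consistent with the
"write to read-only memory" and "level 2 permission fault" lines in the oops.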


Do let me know how QEMU with the arch-specific patches goes for you.

Thanks
Salil.

>  From: Salil Mehta
>  Sent: Wednesday, August 7, 2024 2:27 PM
>  To: 'Gavin Shan' <gshan@redhat.com>; qemu-devel@nongnu.org;
>  qemu-arm@nongnu.org; mst@redhat.com
>  
>  Hi Gavin,
>  
>  Let me look into this. Did you also include the patch below along with
>  the architecture-agnostic patch set accepted in this QEMU cycle?
>  
>  https://lore.kernel.org/all/20240801142322.3948866-3-peter.maydell@linaro.org/
>  
>  
>  Thanks
>  Salil.
>  
>  >  From: Gavin Shan <gshan@redhat.com>
>  >  Sent: Wednesday, August 7, 2024 10:54 AM
>  >  To: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  > qemu-arm@nongnu.org; mst@redhat.com
>  >
>  >  Hi Salil,
>  >
>  >  With this series and the latest upstream Linux kernel (host), I ran
>  >  into the core dump below. I'm not sure whether it's a known issue.
>  >
>  >  # uname -r
>  >  6.11.0-rc2-gavin+
>  >  # /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 -accel kvm \
>  >     -machine virt,gic-version=host,nvdimm=on -cpu host                 \
>  >     -smp maxcpus=2,cpus=1,sockets=2,clusters=1,cores=1,threads=1       \
>  >     -m 4096M,slots=16,maxmem=128G                                      \
>  >     -object memory-backend-ram,id=mem0,size=2048M                      \
>  >     -object memory-backend-ram,id=mem1,size=2048M                      \
>  >     -numa node,nodeid=0,memdev=mem0,cpus=0-0                           \
>  >     -numa node,nodeid=1,memdev=mem1,cpus=1-1                           \
>  >       :
>  >  qemu-system-aarch64: Failed to initialize host vcpu 1
>  >  Aborted (core dumped)
>  >
>  >  # gdb /var/lib/systemd/coredump/core.0 /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64
>  >  (gdb) bt
>  >  #0  0x0000ffff9eec42e8 in __pthread_kill_implementation () at /lib64/libc.so.6
>  >  #1  0x0000ffff9ee7c73c in raise () at /lib64/libc.so.6
>  >  #2  0x0000ffff9ee69034 in abort () at /lib64/libc.so.6
>  >  #3  0x0000aaaac71152c0 in kvm_arm_create_host_vcpu (cpu=0xaaaae4c0cb80) at ../target/arm/kvm.c:1093
>  >  #4  0x0000aaaac7057520 in machvirt_init (machine=0xaaaae48198c0) at ../hw/arm/virt.c:2534
>  >  #5  0x0000aaaac6b0d31c in machine_run_board_init (machine=0xaaaae48198c0, mem_path=0x0, errp=0xfffff754ee38) at ../hw/core/machine.c:1576
>  >  #6  0x0000aaaac6f58d70 in qemu_init_board () at ../system/vl.c:2620
>  >  #7  0x0000aaaac6f590dc in qmp_x_exit_preconfig (errp=0xaaaac8911120 <error_fatal>) at ../system/vl.c:2712
>  >  #8  0x0000aaaac6f5b728 in qemu_init (argc=82, argv=0xfffff754f1d8) at ../system/vl.c:3758
>  >  #9  0x0000aaaac6a5315c in main (argc=82, argv=0xfffff754f1d8) at ../system/main.c:47
>  >
>  >  Thanks,
>  >  Gavin
>  >

Re: [PATCH RFC V3 00/29] Support of Virtual CPU Hotplug for ARMv8 Arch

Posted by Gavin Shan 3 months, 2 weeks ago

Hi Salil,

On 8/8/24 2:07 AM, Salil Mehta wrote:
> I tested the ARM arch-specific patches with the latest QEMU, which contains the fix
> mentioned below, and I cannot reproduce the crash. I used kernel linux-6.11-rc2 and it
> booted successfully. However, I did see a kernel crash when attempting to hotplug the first vCPU.
> 
> (qemu) device_add host-arm-cpu,id=core4,core-id=4
> (qemu) [  365.125477] Unable to handle kernel write to read-only memory at virtual address ffff800081ba4190
> [  365.126366] Mem abort info:
> [  365.126640]   ESR = 0x000000009600004e
> [  365.127010]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  365.127524]   SET = 0, FnV = 0
> [  365.127822]   EA = 0, S1PTW = 0
> [  365.128130]   FSC = 0x0e: level 2 permission fault
> [  365.128598] Data abort info:
> [  365.128881]   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> [  365.129447]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> [  365.129943]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  365.130442] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000045830000
> [  365.131068] [ffff800081ba4190] pgd=0000000000000000, p4d=10000000467df003, pud=10000000467e0003, pmd=0060000045600781
> [  365.132069] Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
> [  365.132661] Modules linked in:
> [  365.132952] CPU: 0 UID: 0 PID: 11 Comm: kworker/u24:0 Not tainted 6.11.0-rc2 #228
> [  365.133699] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [  365.134415] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  365.134969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  365.135679] pc : register_cpu+0x138/0x250
> [  365.136093] lr : register_cpu+0x120/0x250
> [  365.136506] sp : ffff800082cbba10
> [  365.136847] x29: ffff800082cbba10 x28: ffff8000826479c0 x27: ffff000000a7e098
> [  365.137575] x26: ffff8000827c2838 x25: 0000000000000004 x24: ffff80008264d9b0
> [  365.138311] x23: 0000000000000004 x22: ffff000012a482d0 x21: ffff800081e30a00
> [  365.139037] x20: 0000000000000000 x19: ffff800081ba4190 x18: ffffffffffffffff
> [  365.139764] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000001adaa1c
> [  365.140490] x14: ffffffffffffffff x13: ffff000001ada2e0 x12: 0000000000000000
> [  365.141216] x11: ffff800081e32780 x10: 0000000000000000 x9 : 0000000000000001
> [  365.141945] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 6f7274726e737460
> [  365.142668] x5 : ffff0000027b1920 x4 : ffff0000027b1b40 x3 : ffff0000027b1880
> [  365.143400] x2 : ffff0000001933c0 x1 : ffff800081ba4190 x0 : 0000000000000010
> [  365.144129] Call trace:
> [  365.144382]  register_cpu+0x138/0x250
> [  365.144759]  arch_register_cpu+0x7c/0xc4
> [  365.145166]  acpi_processor_add+0x468/0x590
> [  365.145594]  acpi_bus_attach+0x1ac/0x2dc
> [  365.146002]  acpi_dev_for_one_check+0x34/0x40
> [  365.146449]  device_for_each_child+0x5c/0xb0
> [  365.146887]  acpi_dev_for_each_child+0x3c/0x64
> [  365.147341]  acpi_bus_attach+0x78/0x2dc
> [  365.147734]  acpi_bus_scan+0x68/0x208
> [  365.148110]  acpi_scan_rescan_bus+0x4c/0x78
> [  365.148537]  acpi_device_hotplug+0x1f8/0x460
> [  365.148975]  acpi_hotplug_work_fn+0x24/0x3c
> [  365.149402]  process_one_work+0x150/0x294
> [  365.149817]  worker_thread+0x2e4/0x3ec
> [  365.150199]  kthread+0x118/0x11c
> [  365.150536]  ret_from_fork+0x10/0x20
> [  365.150903] Code: 91064021 9ad72000 8b130c33 d503201f (f820327f)
> [  365.151527] ---[ end trace 0000000000000000 ]---
> 

This should be fixed by: https://lkml.org/lkml/2024/8/8/155

Thanks,
Gavin