[PATCH] kvm: Fix crash by initializing kvm_state early

Gavin Shan posted 1 patch 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230730234840.1989974-1-gshan@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
accel/kvm/kvm-all.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
[PATCH] kvm: Fix crash by initializing kvm_state early
Posted by Gavin Shan 9 months ago
Runs into core dump on arm64 and the backtrace extracted from the
core dump is shown as below. It's caused by accessing @kvm_state which
isn't initialized at that point due to commit 176d073029 ("hw/arm/virt:
Use machine_memory_devices_init()"), where the machine's memory region
is added ealier than before.

    main
    qemu_init
    configure_accelerators
    qemu_opts_foreach
    do_configure_accelerator
    accel_init_machine
    kvm_init
    virt_kvm_type
    virt_set_memmap
    machine_memory_devices_init
    memory_region_add_subregion
    memory_region_add_subregion_common
    memory_region_update_container_subregions
    memory_region_transaction_begin
    qemu_flush_coalesced_mmio_buffer
    kvm_flush_coalesced_mmio_buffer

Fix it by initializing @kvm_state early. With this applied, no crash
is observed on arm64.

Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()")
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 accel/kvm/kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 373d876c05..c825cba12f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2464,6 +2464,7 @@ static int kvm_init(MachineState *ms)
     qemu_mutex_init(&kml_slots_lock);
 
     s = KVM_STATE(ms->accelerator);
+    kvm_state = s;
 
     /*
      * On systems where the kernel can support different base page
@@ -2695,8 +2696,6 @@ static int kvm_init(MachineState *ms)
 #endif
     }
 
-    kvm_state = s;
-
     ret = kvm_arch_init(ms, s);
     if (ret < 0) {
         goto err;
-- 
2.41.0
Re: [PATCH] kvm: Fix crash by initializing kvm_state early
Posted by David Hildenbrand 9 months ago
On 31.07.23 01:48, Gavin Shan wrote:
> Runs into core dump on arm64 and the backtrace extracted from the
> core dump is shown as below. It's caused by accessing @kvm_state which
> isn't initialized at that point due to commit 176d073029 ("hw/arm/virt:
> Use machine_memory_devices_init()"), where the machine's memory region
> is added ealier than before.

s/ealier/earlier/

> 
>      main
>      qemu_init
>      configure_accelerators
>      qemu_opts_foreach
>      do_configure_accelerator
>      accel_init_machine
>      kvm_init
>      virt_kvm_type
>      virt_set_memmap
>      machine_memory_devices_init
>      memory_region_add_subregion
>      memory_region_add_subregion_common
>      memory_region_update_container_subregions
>      memory_region_transaction_begin
>      qemu_flush_coalesced_mmio_buffer
>      kvm_flush_coalesced_mmio_buffer
> 
> Fix it by initializing @kvm_state early. With this applied, no crash
> is observed on arm64.

Interestingly, we register memory listeners in kvm_init() after setting 
kvm_state, so in theory it should have worked fine.

But it's rather surprising that we see kvm_flush_coalesced_mmio_buffer() 
call even though we didn't even setup a listener with 
kvm_coalesce_mmio_region / kvm_uncoalesce_mmio_region.

Such a notifier-specific flush might have been better placed in the 
MemoryListener->begin() call. But that needs more thought, as 
qemu_flush_coalesced_mmio_buffer() is called from a couple of places.
> 
> Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()")
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>   accel/kvm/kvm-all.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 373d876c05..c825cba12f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2464,6 +2464,7 @@ static int kvm_init(MachineState *ms)
>       qemu_mutex_init(&kml_slots_lock);
>   
>       s = KVM_STATE(ms->accelerator);
> +    kvm_state = s;
>   
>       /*
>        * On systems where the kernel can support different base page
> @@ -2695,8 +2696,6 @@ static int kvm_init(MachineState *ms)
>   #endif
>       }
>   
> -    kvm_state = s;
> -
>       ret = kvm_arch_init(ms, s);
>       if (ret < 0) {
>           goto err;

As an alternative, we might simply do nothing in 
kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet. 
We don't have any notifier registered in that case.

Thanks!

-- 
Cheers,

David / dhildenb
Re: [PATCH] kvm: Fix crash by initializing kvm_state early
Posted by Peter Maydell 9 months ago
On Mon, 31 Jul 2023 at 08:18, David Hildenbrand <david@redhat.com> wrote:
>
> On 31.07.23 01:48, Gavin Shan wrote:
> > Runs into core dump on arm64 and the backtrace extracted from the
> > core dump is shown as below. It's caused by accessing @kvm_state which
> > isn't initialized at that point due to commit 176d073029 ("hw/arm/virt:
> > Use machine_memory_devices_init()"), where the machine's memory region
> > is added ealier than before.
>
> s/ealier/earlier/
>
> >
> >      main
> >      qemu_init
> >      configure_accelerators
> >      qemu_opts_foreach
> >      do_configure_accelerator
> >      accel_init_machine
> >      kvm_init
> >      virt_kvm_type
> >      virt_set_memmap
> >      machine_memory_devices_init
> >      memory_region_add_subregion
> >      memory_region_add_subregion_common
> >      memory_region_update_container_subregions
> >      memory_region_transaction_begin
> >      qemu_flush_coalesced_mmio_buffer
> >      kvm_flush_coalesced_mmio_buffer
> >
> > Fix it by initializing @kvm_state early. With this applied, no crash
> > is observed on arm64.

> As an alternative, we might simply do nothing in
> kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet.
> We don't have any notifier registered in that case.

Yes, this seems better I think -- conceptually kvm_init()
probably ought to first set up the accelerator state and
then set kvm_state last, so that other code that looks
at the kvm_state global either sees NULL or else a
completely valid state, not a possibly half-initialised
one. (We should probably also NULL the global in the
error-exit path, though I imagine we're about to exit
in that case.)

Is somebody able to write/test a patch for that today?
Ideally we'd fix this for tomorrow's rc...

thanks
-- PMM
Re: [PATCH] kvm: Fix crash by initializing kvm_state early
Posted by Gavin Shan 9 months ago

On 7/31/23 22:39, Peter Maydell wrote:
> On Mon, 31 Jul 2023 at 08:18, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 31.07.23 01:48, Gavin Shan wrote:
>>> Runs into core dump on arm64 and the backtrace extracted from the
>>> core dump is shown as below. It's caused by accessing @kvm_state which
>>> isn't initialized at that point due to commit 176d073029 ("hw/arm/virt:
>>> Use machine_memory_devices_init()"), where the machine's memory region
>>> is added ealier than before.
>>
>> s/ealier/earlier/
>>
>>>
>>>       main
>>>       qemu_init
>>>       configure_accelerators
>>>       qemu_opts_foreach
>>>       do_configure_accelerator
>>>       accel_init_machine
>>>       kvm_init
>>>       virt_kvm_type
>>>       virt_set_memmap
>>>       machine_memory_devices_init
>>>       memory_region_add_subregion
>>>       memory_region_add_subregion_common
>>>       memory_region_update_container_subregions
>>>       memory_region_transaction_begin
>>>       qemu_flush_coalesced_mmio_buffer
>>>       kvm_flush_coalesced_mmio_buffer
>>>
>>> Fix it by initializing @kvm_state early. With this applied, no crash
>>> is observed on arm64.
> 
>> As an alternative, we might simply do nothing in
>> kvm_flush_coalesced_mmio_buffer(), in case kvm_state is not setup yet.
>> We don't have any notifier registered in that case.
> 
> Yes, this seems better I think -- conceptually kvm_init()
> probably ought to first set up the accelerator state and
> then set kvm_state last, so that other code that looks
> at the kvm_state global either sees NULL or else a
> completely valid state, not a possibly half-initialised
> one. (We should probably also NULL the global in the
> error-exit path, though I imagine we're about to exit
> in that case.)
> 
> Is somebody able to write/test a patch for that today?
> Ideally we'd fix this for tomorrow's rc...
> 

Thanks for your comments, David and Peter. v2 was posted for a quick merge.

https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00702.html

Thanks,
Gavin