[PATCH] target/loongarch/kvm: Fix VM recovery from disk failures

Song Gao posted 1 patch 2 weeks, 4 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240430012356.2620763-1-gaosong@loongson.cn
Maintainers: Song Gao <gaosong@loongson.cn>
There is a newer version of this series
target/loongarch/machine.c | 2 ++
1 file changed, 2 insertions(+)
[PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Song Gao 2 weeks, 4 days ago
vmstate does not save kvm_state_conter,
which can cause VM recovery from disk to fail.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/machine.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index c7029fb9b4..4cd1bf06ff 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
         VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
                              0, vmstate_tlb, LoongArchTLB),
 
+        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
+
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * const []) {
-- 
2.25.1
Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Philippe Mathieu-Daudé 2 weeks, 3 days ago
(Cc'ing migration maintainers)

On 30/4/24 03:23, Song Gao wrote:
> vmstate does not save kvm_state_conter,
> which can cause VM recovery from disk to fail.

Cc: qemu-stable@nongnu.org
Fixes: d11681c94f ("target/loongarch: Implement kvm_arch_init_vcpu")

> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>   target/loongarch/machine.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
> index c7029fb9b4..4cd1bf06ff 100644
> --- a/target/loongarch/machine.c
> +++ b/target/loongarch/machine.c
> @@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
>           VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
>                                0, vmstate_tlb, LoongArchTLB),
>   
> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
> +
>           VMSTATE_END_OF_LIST()
>       },
>       .subsections = (const VMStateDescription * const []) {

The migration stream is versioned, so you should increase it,
but this field is only relevant for KVM (it shouldn't be there
in non-KVM builds). IMHO the correct migration way to fix that
is (untested):

-- >8 --
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index c7029fb9b4..08032c6d71 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,8 +8,27 @@
  #include "qemu/osdep.h"
  #include "cpu.h"
  #include "migration/cpu.h"
+#include "sysemu/kvm.h"
  #include "vec.h"

+#ifdef CONFIG_KVM
+static bool kvmcpu_needed(void *opaque)
+{
+    return kvm_enabled();
+}
+
+static const VMStateDescription vmstate_kvmtimer = {
+    .name = "cpu/kvmtimer",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = kvmcpu_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif /* CONFIG_KVM */
+
  static const VMStateDescription vmstate_fpu_reg = {
      .name = "fpu_reg",
      .version_id = 1,
@@ -194,6 +213,9 @@ const VMStateDescription vmstate_loongarch_cpu = {
          VMSTATE_END_OF_LIST()
      },
      .subsections = (const VMStateDescription * const []) {
+#ifdef CONFIG_KVM
+        &vmstate_kvmcpu,
+#endif
          &vmstate_fpu,
          &vmstate_lsx,
          &vmstate_lasx,
---
Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Fabiano Rosas 2 weeks, 3 days ago
Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> (Cc'ing migration maintainers)
>
> On 30/4/24 03:23, Song Gao wrote:
>> vmstate does not save kvm_state_conter,
>> which can cause VM recovery from disk to fail.
>
> Cc: qemu-stable@nongnu.org
> Fixes: d11681c94f ("target/loongarch: Implement kvm_arch_init_vcpu")
>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>>   target/loongarch/machine.c | 2 ++
>>   1 file changed, 2 insertions(+)
>> 
>> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
>> index c7029fb9b4..4cd1bf06ff 100644
>> --- a/target/loongarch/machine.c
>> +++ b/target/loongarch/machine.c
>> @@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
>>           VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
>>                                0, vmstate_tlb, LoongArchTLB),
>>   
>> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
>> +
>>           VMSTATE_END_OF_LIST()
>>       },
>>       .subsections = (const VMStateDescription * const []) {
>
> The migration stream is versioned, so you should increase it,
> but this field is only relevant for KVM (it shouldn't be there
> in non-KVM builds). IMHO the correct migration way to fix that
> is (untested):
>
> -- >8 --
> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
> index c7029fb9b4..08032c6d71 100644
> --- a/target/loongarch/machine.c
> +++ b/target/loongarch/machine.c
> @@ -8,8 +8,27 @@
>   #include "qemu/osdep.h"
>   #include "cpu.h"
>   #include "migration/cpu.h"
> +#include "sysemu/kvm.h"
>   #include "vec.h"
>
> +#ifdef CONFIG_KVM
> +static bool kvmcpu_needed(void *opaque)
> +{
> +    return kvm_enabled();
> +}
> +
> +static const VMStateDescription vmstate_kvmtimer = {
> +    .name = "cpu/kvmtimer",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = kvmcpu_needed,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +#endif /* CONFIG_KVM */
> +
>   static const VMStateDescription vmstate_fpu_reg = {
>       .name = "fpu_reg",
>       .version_id = 1,
> @@ -194,6 +213,9 @@ const VMStateDescription vmstate_loongarch_cpu = {
>           VMSTATE_END_OF_LIST()
>       },
>       .subsections = (const VMStateDescription * const []) {
> +#ifdef CONFIG_KVM
> +        &vmstate_kvmcpu,
> +#endif
>           &vmstate_fpu,
>           &vmstate_lsx,
>           &vmstate_lasx,
> ---

LGTM, I'd just leave only the .needed function under CONFIG_KVM instead
of the whole subsection.
Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Peter Xu 2 weeks, 2 days ago
On Tue, Apr 30, 2024 at 11:00:24AM -0300, Fabiano Rosas wrote:
> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
> 
> > (Cc'ing migration maintainers)
> >
> > On 30/4/24 03:23, Song Gao wrote:
> >> vmstate does not save kvm_state_conter,
> >> which can cause VM recovery from disk to fail.
> >
> > Cc: qemu-stable@nongnu.org
> > Fixes: d11681c94f ("target/loongarch: Implement kvm_arch_init_vcpu")
> >
> >> Signed-off-by: Song Gao <gaosong@loongson.cn>
> >> ---
> >>   target/loongarch/machine.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >> 
> >> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
> >> index c7029fb9b4..4cd1bf06ff 100644
> >> --- a/target/loongarch/machine.c
> >> +++ b/target/loongarch/machine.c
> >> @@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
> >>           VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
> >>                                0, vmstate_tlb, LoongArchTLB),
> >>   
> >> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
> >> +
> >>           VMSTATE_END_OF_LIST()
> >>       },
> >>       .subsections = (const VMStateDescription * const []) {
> >
> > The migration stream is versioned, so you should increase it,
> > but this field is only relevant for KVM (it shouldn't be there
> > in non-KVM builds). IMHO the correct migration way to fix that
> > is (untested):
> >
> > -- >8 --
> > diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
> > index c7029fb9b4..08032c6d71 100644
> > --- a/target/loongarch/machine.c
> > +++ b/target/loongarch/machine.c
> > @@ -8,8 +8,27 @@
> >   #include "qemu/osdep.h"
> >   #include "cpu.h"
> >   #include "migration/cpu.h"
> > +#include "sysemu/kvm.h"
> >   #include "vec.h"
> >
> > +#ifdef CONFIG_KVM
> > +static bool kvmcpu_needed(void *opaque)
> > +{
> > +    return kvm_enabled();
> > +}
> > +
> > +static const VMStateDescription vmstate_kvmtimer = {
> > +    .name = "cpu/kvmtimer",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = kvmcpu_needed,
> > +    .fields = (const VMStateField[]) {
> > +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +#endif /* CONFIG_KVM */
> > +
> >   static const VMStateDescription vmstate_fpu_reg = {
> >       .name = "fpu_reg",
> >       .version_id = 1,
> > @@ -194,6 +213,9 @@ const VMStateDescription vmstate_loongarch_cpu = {
> >           VMSTATE_END_OF_LIST()
> >       },
> >       .subsections = (const VMStateDescription * const []) {
> > +#ifdef CONFIG_KVM
> > +        &vmstate_kvmcpu,
> > +#endif
> >           &vmstate_fpu,
> >           &vmstate_lsx,
> >           &vmstate_lasx,
> > ---
> 
> LGTM, I'd just leave only the .needed function under CONFIG_KVM instead
> of the whole subsection.

But when !KVM it means there's no ".needed" and it'll still be migrated?

IMHO it depends on whether loongarch is in the state already of trying to
keep its ABI at all.  I think we should still try to enjoy the time when
that ABI is not required, then we can simply add whatever fields, and let
things break with no big deal.

Note that if with CONFIG_KVM it means it can break migration between kvm
v.s. tcg when only one qemu enabled kvm when compile.  Considering the
patch is from the maintainer (which seems to say "breaking that is all
fine!") I'd say the original patch looks good actually, as it allows kvm /
tcg migrations too as a baseline.

Thanks,

-- 
Peter Xu


Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Fabiano Rosas 2 weeks, 1 day ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Apr 30, 2024 at 11:00:24AM -0300, Fabiano Rosas wrote:
>> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
>> 
>> > (Cc'ing migration maintainers)
>> >
>> > On 30/4/24 03:23, Song Gao wrote:
>> >> vmstate does not save kvm_state_conter,
>> >> which can cause VM recovery from disk to fail.
>> >
>> > Cc: qemu-stable@nongnu.org
>> > Fixes: d11681c94f ("target/loongarch: Implement kvm_arch_init_vcpu")
>> >
>> >> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> >> ---
>> >>   target/loongarch/machine.c | 2 ++
>> >>   1 file changed, 2 insertions(+)
>> >> 
>> >> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
>> >> index c7029fb9b4..4cd1bf06ff 100644
>> >> --- a/target/loongarch/machine.c
>> >> +++ b/target/loongarch/machine.c
>> >> @@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
>> >>           VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
>> >>                                0, vmstate_tlb, LoongArchTLB),
>> >>   
>> >> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
>> >> +
>> >>           VMSTATE_END_OF_LIST()
>> >>       },
>> >>       .subsections = (const VMStateDescription * const []) {
>> >
>> > The migration stream is versioned, so you should increase it,
>> > but this field is only relevant for KVM (it shouldn't be there
>> > in non-KVM builds). IMHO the correct migration way to fix that
>> > is (untested):
>> >
>> > -- >8 --
>> > diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
>> > index c7029fb9b4..08032c6d71 100644
>> > --- a/target/loongarch/machine.c
>> > +++ b/target/loongarch/machine.c
>> > @@ -8,8 +8,27 @@
>> >   #include "qemu/osdep.h"
>> >   #include "cpu.h"
>> >   #include "migration/cpu.h"
>> > +#include "sysemu/kvm.h"
>> >   #include "vec.h"
>> >
>> > +#ifdef CONFIG_KVM
>> > +static bool kvmcpu_needed(void *opaque)
>> > +{
>> > +    return kvm_enabled();
>> > +}
>> > +
>> > +static const VMStateDescription vmstate_kvmtimer = {
>> > +    .name = "cpu/kvmtimer",
>> > +    .version_id = 1,
>> > +    .minimum_version_id = 1,
>> > +    .needed = kvmcpu_needed,
>> > +    .fields = (const VMStateField[]) {
>> > +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
>> > +        VMSTATE_END_OF_LIST()
>> > +    }
>> > +};
>> > +#endif /* CONFIG_KVM */
>> > +
>> >   static const VMStateDescription vmstate_fpu_reg = {
>> >       .name = "fpu_reg",
>> >       .version_id = 1,
>> > @@ -194,6 +213,9 @@ const VMStateDescription vmstate_loongarch_cpu = {
>> >           VMSTATE_END_OF_LIST()
>> >       },
>> >       .subsections = (const VMStateDescription * const []) {
>> > +#ifdef CONFIG_KVM
>> > +        &vmstate_kvmcpu,
>> > +#endif
>> >           &vmstate_fpu,
>> >           &vmstate_lsx,
>> >           &vmstate_lasx,
>> > ---
>> 
>> LGTM, I'd just leave only the .needed function under CONFIG_KVM instead
>> of the whole subsection.
>
> But when !KVM it means there's no ".needed" and it'll still be migrated?

I expressed myself poorly, I meant put the return from .needed under
CONFIG_KVM. But that is not even necessary, kvm_enabled() is enough.

>
> IMHO it depends on whether loongarch is in the state already of trying to
> keep its ABI at all.  I think we should still try to enjoy the time when
> that ABI is not required, then we can simply add whatever fields, and let
> things break with no big deal.
>
> Note that if with CONFIG_KVM it means it can break migration between kvm
> v.s. tcg when only one qemu enabled kvm when compile.  Considering the
> patch is from the maintainer (which seems to say "breaking that is all
> fine!") I'd say the original patch looks good actually, as it allows kvm /
> tcg migrations too as a baseline.

I'm fine with this approach as well.

>
> Thanks,
Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by gaosong 1 week, 3 days ago
Thanks for the comments !

在 2024/5/2 下午8:45, Fabiano Rosas 写道:
> Peter Xu <peterx@redhat.com> writes:
>
>> On Tue, Apr 30, 2024 at 11:00:24AM -0300, Fabiano Rosas wrote:
>>> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
>>>
>>>> (Cc'ing migration maintainers)
>>>>
>>>> On 30/4/24 03:23, Song Gao wrote:
>>>>> vmstate does not save kvm_state_conter,
>>>>> which can cause VM recovery from disk to fail.
>>>> Cc: qemu-stable@nongnu.org
>>>> Fixes: d11681c94f ("target/loongarch: Implement kvm_arch_init_vcpu")
>>>>
>>>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>>>> ---
>>>>>    target/loongarch/machine.c | 2 ++
>>>>>    1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
>>>>> index c7029fb9b4..4cd1bf06ff 100644
>>>>> --- a/target/loongarch/machine.c
>>>>> +++ b/target/loongarch/machine.c
>>>>> @@ -191,6 +191,8 @@ const VMStateDescription vmstate_loongarch_cpu = {
>>>>>            VMSTATE_STRUCT_ARRAY(env.tlb, LoongArchCPU, LOONGARCH_TLB_MAX,
>>>>>                                 0, vmstate_tlb, LoongArchTLB),
>>>>>    
>>>>> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
>>>>> +
>>>>>            VMSTATE_END_OF_LIST()
>>>>>        },
>>>>>        .subsections = (const VMStateDescription * const []) {
>>>> The migration stream is versioned, so you should increase it,
>>>> but this field is only relevant for KVM (it shouldn't be there
>>>> in non-KVM builds). IMHO the correct migration way to fix that
>>>> is (untested):
>>>>
>>>> -- >8 --
>>>> diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
>>>> index c7029fb9b4..08032c6d71 100644
>>>> --- a/target/loongarch/machine.c
>>>> +++ b/target/loongarch/machine.c
>>>> @@ -8,8 +8,27 @@
>>>>    #include "qemu/osdep.h"
>>>>    #include "cpu.h"
>>>>    #include "migration/cpu.h"
>>>> +#include "sysemu/kvm.h"
>>>>    #include "vec.h"
>>>>
>>>> +#ifdef CONFIG_KVM
>>>> +static bool kvmcpu_needed(void *opaque)
>>>> +{
>>>> +    return kvm_enabled();
>>>> +}
>>>> +
>>>> +static const VMStateDescription vmstate_kvmtimer = {
>>>> +    .name = "cpu/kvmtimer",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .needed = kvmcpu_needed,
>>>> +    .fields = (const VMStateField[]) {
>>>> +        VMSTATE_UINT64(kvm_state_counter, LoongArchCPU),
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    }
>>>> +};
>>>> +#endif /* CONFIG_KVM */
>>>> +
>>>>    static const VMStateDescription vmstate_fpu_reg = {
>>>>        .name = "fpu_reg",
>>>>        .version_id = 1,
>>>> @@ -194,6 +213,9 @@ const VMStateDescription vmstate_loongarch_cpu = {
>>>>            VMSTATE_END_OF_LIST()
>>>>        },
>>>>        .subsections = (const VMStateDescription * const []) {
>>>> +#ifdef CONFIG_KVM
>>>> +        &vmstate_kvmcpu,
>>>> +#endif
>>>>            &vmstate_fpu,
>>>>            &vmstate_lsx,
>>>>            &vmstate_lasx,
>>>> ---
>>> LGTM, I'd just leave only the .needed function under CONFIG_KVM instead
>>> of the whole subsection.
>> But when !KVM it means there's no ".needed" and it'll still be migrated?
> I expressed myself poorly, I meant put the return from .needed under
> CONFIG_KVM. But that is not even necessary, kvm_enabled() is enough.
>
>> IMHO it depends on whether loongarch is in the state already of trying to
>> keep its ABI at all.  I think we should still try to enjoy the time when
>> that ABI is not required, then we can simply add whatever fields, and let
>> things break with no big deal.
>>
>> Note that if with CONFIG_KVM it means it can break migration between kvm
>> v.s. tcg when only one qemu enabled kvm when compile.
Just remove CONIFG_KVM  would be OK?

Thanks.
Song Gao
>>    Considering the
>> patch is from the maintainer (which seems to say "breaking that is all
>> fine!") I'd say the original patch looks good actually, as it allows kvm /
>> tcg migrations too as a baseline.
> I'm fine with this approach as well.
>
>> Thanks,


Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Peter Xu 1 week, 3 days ago
On Tue, May 07, 2024 at 04:12:34PM +0800, gaosong wrote:
> Just remove CONIFG_KVM  would be OK?

You're the loongarch maintainer so I'd say your call. :)

If you're not yet sure, IMHO we should go with the simplest, which is the
original oneliner patch.

Thanks,

-- 
Peter Xu


Re: [PATCH] target/loongarch/kvm: Fix VM recovery from disk failures
Posted by Peter Maydell 1 week, 3 days ago
On Tue, 7 May 2024 at 16:47, Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, May 07, 2024 at 04:12:34PM +0800, gaosong wrote:
> > Just remove CONIFG_KVM  would be OK?
>
> You're the loongarch maintainer so I'd say your call. :)
>
> If you're not yet sure, IMHO we should go with the simplest, which is the
> original oneliner patch.

The original patch needs to also bump the version numbers when
it adds the new field.

Even when we do not wish to maintain migration compatibility,
bumping the version number means that users get a (more or less)
helpful error message if they try an unsupported cross-version
migration, rather than weird behaviour.

thanks
-- PMM