[RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state

Nicholas Piggin posted 3 patches 6 months ago
[RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by Nicholas Piggin 6 months ago
The flic pending state is not migrated, so if the machine is migrated
while an interrupt is pending, it can be lost. This shows up in
qtest migration test, an extint is pending (due to console writes?)
and the CPU waits via s390_cpu_set_psw and expects the interrupt to
wake it. However when the flic pending state is lost, s390_cpu_has_int
returns false, so s390_cpu_exec_interrupt falls through to halting
again.

Fix this by migrating pending. This prevents the qtest from hanging.
Does service_param need to be migrated? Or the IO lists?

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/intc/s390_flic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6771645699..b70cf2295a 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
     .fields = (const VMStateField[]) {
         VMSTATE_UINT8(simm, QEMUS390FLICState),
         VMSTATE_UINT8(nimm, QEMUS390FLICState),
+        VMSTATE_UINT32(pending, QEMUS390FLICState),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.43.0
Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by David Hildenbrand 6 months ago
Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> The flic pending state is not migrated, so if the machine is migrated
> while an interrupt is pending, it can be lost. This shows up in
> qtest migration test, an extint is pending (due to console writes?)
> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> wake it. However when the flic pending state is lost, s390_cpu_has_int
> returns false, so s390_cpu_exec_interrupt falls through to halting
> again.
> 
> Fix this by migrating pending. This prevents the qtest from hanging.
> Does service_param need to be migrated? Or the IO lists?
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/intc/s390_flic.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> index 6771645699..b70cf2295a 100644
> --- a/hw/intc/s390_flic.c
> +++ b/hw/intc/s390_flic.c
> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>       .fields = (const VMStateField[]) {
>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>           VMSTATE_END_OF_LIST()
>       }
>   };

Likely you have to handle this using QEMU compat machines.

-- 
Thanks,

David / dhildenb
Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by Richard Henderson 6 months ago
On 5/26/24 08:53, David Hildenbrand wrote:
> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>> The flic pending state is not migrated, so if the machine is migrated
>> while an interrupt is pending, it can be lost. This shows up in
>> qtest migration test, an extint is pending (due to console writes?)
>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>> returns false, so s390_cpu_exec_interrupt falls through to halting
>> again.
>>
>> Fix this by migrating pending. This prevents the qtest from hanging.
>> Does service_param need to be migrated? Or the IO lists?
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   hw/intc/s390_flic.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>> index 6771645699..b70cf2295a 100644
>> --- a/hw/intc/s390_flic.c
>> +++ b/hw/intc/s390_flic.c
>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>       .fields = (const VMStateField[]) {
>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>           VMSTATE_END_OF_LIST()
>>       }
>>   };
> 
> Likely you have to handle this using QEMU compat machines.

Well, since existing migration is broken, I don't think you have to preserve 
compatibility.  But you do have to bump the version number.


r~


Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by David Hildenbrand 6 months ago
Am 26.05.24 um 21:44 schrieb Richard Henderson:
> On 5/26/24 08:53, David Hildenbrand wrote:
>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>> The flic pending state is not migrated, so if the machine is migrated
>>> while an interrupt is pending, it can be lost. This shows up in
>>> qtest migration test, an extint is pending (due to console writes?)
>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>> again.
>>>
>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>> Does service_param need to be migrated? Or the IO lists?
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>   hw/intc/s390_flic.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>> index 6771645699..b70cf2295a 100644
>>> --- a/hw/intc/s390_flic.c
>>> +++ b/hw/intc/s390_flic.c
>>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
>>>       .fields = (const VMStateField[]) {
>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>           VMSTATE_END_OF_LIST()
>>>       }
>>>   };
>>
>> Likely you have to handle this using QEMU compat machines.
> 
> Well, since existing migration is broken, I don't think you have to preserve 

Migration is broken only in some case "while an interrupt is pending, it can be 
lost".

> compatibility.  But you do have to bump the version number.

Looking at it, this is TCG only, so likely we don't care that much about 
migration compatibility. But I have no idea what level of compatibility we want 
to support there.

-- 
Thanks,

David / dhildenb


Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by Thomas Huth 6 months ago
On 26/05/2024 22.33, David Hildenbrand wrote:
> Am 26.05.24 um 21:44 schrieb Richard Henderson:
>> On 5/26/24 08:53, David Hildenbrand wrote:
>>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
>>>> The flic pending state is not migrated, so if the machine is migrated
>>>> while an interrupt is pending, it can be lost. This shows up in
>>>> qtest migration test, an extint is pending (due to console writes?)
>>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
>>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
>>>> returns false, so s390_cpu_exec_interrupt falls through to halting
>>>> again.
>>>>
>>>> Fix this by migrating pending. This prevents the qtest from hanging.
>>>> Does service_param need to be migrated? Or the IO lists?
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>>   hw/intc/s390_flic.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
>>>> index 6771645699..b70cf2295a 100644
>>>> --- a/hw/intc/s390_flic.c
>>>> +++ b/hw/intc/s390_flic.c
>>>> @@ -369,6 +369,7 @@ static const VMStateDescription 
>>>> qemu_s390_flic_vmstate = {
>>>>       .fields = (const VMStateField[]) {
>>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
>>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
>>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
>>>>           VMSTATE_END_OF_LIST()
>>>>       }
>>>>   };
>>>
>>> Likely you have to handle this using QEMU compat machines.
>>
>> Well, since existing migration is broken, I don't think you have to preserve 
> 
> Migration is broken only in some case "while an interrupt is pending, it can 
> be lost".
> 
>> compatibility.  But you do have to bump the version number.
> 
> Looking at it, this is TCG only, so likely we don't care that much about 
> migration compatibility. But I have no idea what level of compatibility we 
> want to support there.

Yes, this seems to only affect the TCG-only flic device, where migration has 
never been working very reliably. So I think we don't really need the whole 
compat-machine dance here. But I think we should at least bump the 
version_id to 2 now and then use

    VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);

for the new field. That way we would at least support forward migrations 
without too much hassle.

  Thomas


Re: [RFC PATCH 1/3] hw/intc/s390_flic: Migrate pending state
Posted by Nicholas Piggin 6 months ago
On Mon May 27, 2024 at 3:51 PM AEST, Thomas Huth wrote:
> On 26/05/2024 22.33, David Hildenbrand wrote:
> > Am 26.05.24 um 21:44 schrieb Richard Henderson:
> >> On 5/26/24 08:53, David Hildenbrand wrote:
> >>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin:
> >>>> The flic pending state is not migrated, so if the machine is migrated
> >>>> while an interrupt is pending, it can be lost. This shows up in
> >>>> qtest migration test, an extint is pending (due to console writes?)
> >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to
> >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int
> >>>> returns false, so s390_cpu_exec_interrupt falls through to halting
> >>>> again.
> >>>>
> >>>> Fix this by migrating pending. This prevents the qtest from hanging.
> >>>> Does service_param need to be migrated? Or the IO lists?
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>>   hw/intc/s390_flic.c | 1 +
> >>>>   1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
> >>>> index 6771645699..b70cf2295a 100644
> >>>> --- a/hw/intc/s390_flic.c
> >>>> +++ b/hw/intc/s390_flic.c
> >>>> @@ -369,6 +369,7 @@ static const VMStateDescription 
> >>>> qemu_s390_flic_vmstate = {
> >>>>       .fields = (const VMStateField[]) {
> >>>>           VMSTATE_UINT8(simm, QEMUS390FLICState),
> >>>>           VMSTATE_UINT8(nimm, QEMUS390FLICState),
> >>>> +        VMSTATE_UINT32(pending, QEMUS390FLICState),
> >>>>           VMSTATE_END_OF_LIST()
> >>>>       }
> >>>>   };
> >>>
> >>> Likely you have to handle this using QEMU compat machines.
> >>
> >> Well, since existing migration is broken, I don't think you have to preserve 
> > 
> > Migration is broken only in some case "while an interrupt is pending, it can 
> > be lost".
> > 
> >> compatibility.  But you do have to bump the version number.
> > 
> > Looking at it, this is TCG only, so likely we don't care that much about 
> > migration compatibility. But I have no idea what level of compatibility we 
> > want to support there.
>
> Yes, this seems to only affect the TCG-only flic device, where migration has 
> never been working very reliably. So I think we don't really need the whole 
> compat-machine dance here. But I think we should at least bump the 
> version_id to 2 now and then use
>
>     VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2);
>
> for the new field. That way we would at least support forward migrations 
> without too much hassle.

Well if you could rebuild this state by checking sources or something
it might be possible to avoid. Or if you could always mark pending and
software can tolerate superflous. But that also seems like a lot of
headache for something that was always flaky.

Thanks,
Nick