The flic pending state is not migrated, so if the machine is migrated
while an interrupt is pending, it can be lost. This shows up in
qtest migration test, an extint is pending (due to console writes?)
and the CPU waits via s390_cpu_set_psw and expects the interrupt to
wake it. However when the flic pending state is lost, s390_cpu_has_int
returns false, so s390_cpu_exec_interrupt falls through to halting
again.
Fix this by migrating pending. This prevents the qtest from hanging.
Does service_param need to be migrated? Or the IO lists?
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/intc/s390_flic.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c
index 6771645699..b70cf2295a 100644
--- a/hw/intc/s390_flic.c
+++ b/hw/intc/s390_flic.c
@@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = {
.fields = (const VMStateField[]) {
VMSTATE_UINT8(simm, QEMUS390FLICState),
VMSTATE_UINT8(nimm, QEMUS390FLICState),
+ VMSTATE_UINT32(pending, QEMUS390FLICState),
VMSTATE_END_OF_LIST()
}
};
--
2.43.0
Am 25.05.24 um 15:12 schrieb Nicholas Piggin: > The flic pending state is not migrated, so if the machine is migrated > while an interrupt is pending, it can be lost. This shows up in > qtest migration test, an extint is pending (due to console writes?) > and the CPU waits via s390_cpu_set_psw and expects the interrupt to > wake it. However when the flic pending state is lost, s390_cpu_has_int > returns false, so s390_cpu_exec_interrupt falls through to halting > again. > > Fix this by migrating pending. This prevents the qtest from hanging. > Does service_param need to be migrated? Or the IO lists? > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > hw/intc/s390_flic.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c > index 6771645699..b70cf2295a 100644 > --- a/hw/intc/s390_flic.c > +++ b/hw/intc/s390_flic.c > @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = { > .fields = (const VMStateField[]) { > VMSTATE_UINT8(simm, QEMUS390FLICState), > VMSTATE_UINT8(nimm, QEMUS390FLICState), > + VMSTATE_UINT32(pending, QEMUS390FLICState), > VMSTATE_END_OF_LIST() > } > }; Likely you have to handle this using QEMU compat machines. -- Thanks, David / dhildenb
On 5/26/24 08:53, David Hildenbrand wrote: > Am 25.05.24 um 15:12 schrieb Nicholas Piggin: >> The flic pending state is not migrated, so if the machine is migrated >> while an interrupt is pending, it can be lost. This shows up in >> qtest migration test, an extint is pending (due to console writes?) >> and the CPU waits via s390_cpu_set_psw and expects the interrupt to >> wake it. However when the flic pending state is lost, s390_cpu_has_int >> returns false, so s390_cpu_exec_interrupt falls through to halting >> again. >> >> Fix this by migrating pending. This prevents the qtest from hanging. >> Does service_param need to be migrated? Or the IO lists? >> >> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >> --- >> hw/intc/s390_flic.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c >> index 6771645699..b70cf2295a 100644 >> --- a/hw/intc/s390_flic.c >> +++ b/hw/intc/s390_flic.c >> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = { >> .fields = (const VMStateField[]) { >> VMSTATE_UINT8(simm, QEMUS390FLICState), >> VMSTATE_UINT8(nimm, QEMUS390FLICState), >> + VMSTATE_UINT32(pending, QEMUS390FLICState), >> VMSTATE_END_OF_LIST() >> } >> }; > > Likely you have to handle this using QEMU compat machines. Well, since existing migration is broken, I don't think you have to preserve compatibility. But you do have to bump the version number. r~
Am 26.05.24 um 21:44 schrieb Richard Henderson: > On 5/26/24 08:53, David Hildenbrand wrote: >> Am 25.05.24 um 15:12 schrieb Nicholas Piggin: >>> The flic pending state is not migrated, so if the machine is migrated >>> while an interrupt is pending, it can be lost. This shows up in >>> qtest migration test, an extint is pending (due to console writes?) >>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to >>> wake it. However when the flic pending state is lost, s390_cpu_has_int >>> returns false, so s390_cpu_exec_interrupt falls through to halting >>> again. >>> >>> Fix this by migrating pending. This prevents the qtest from hanging. >>> Does service_param need to be migrated? Or the IO lists? >>> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >>> --- >>> hw/intc/s390_flic.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c >>> index 6771645699..b70cf2295a 100644 >>> --- a/hw/intc/s390_flic.c >>> +++ b/hw/intc/s390_flic.c >>> @@ -369,6 +369,7 @@ static const VMStateDescription qemu_s390_flic_vmstate = { >>> .fields = (const VMStateField[]) { >>> VMSTATE_UINT8(simm, QEMUS390FLICState), >>> VMSTATE_UINT8(nimm, QEMUS390FLICState), >>> + VMSTATE_UINT32(pending, QEMUS390FLICState), >>> VMSTATE_END_OF_LIST() >>> } >>> }; >> >> Likely you have to handle this using QEMU compat machines. > > Well, since existing migration is broken, I don't think you have to preserve Migration is broken only in some case "while an interrupt is pending, it can be lost". > compatibility. But you do have to bump the version number. Looking at it, this is TCG only, so likely we don't care that much about migration compatibility. But I have no idea what level of compatibility we want to support there. -- Thanks, David / dhildenb
On 26/05/2024 22.33, David Hildenbrand wrote: > Am 26.05.24 um 21:44 schrieb Richard Henderson: >> On 5/26/24 08:53, David Hildenbrand wrote: >>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin: >>>> The flic pending state is not migrated, so if the machine is migrated >>>> while an interrupt is pending, it can be lost. This shows up in >>>> qtest migration test, an extint is pending (due to console writes?) >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int >>>> returns false, so s390_cpu_exec_interrupt falls through to halting >>>> again. >>>> >>>> Fix this by migrating pending. This prevents the qtest from hanging. >>>> Does service_param need to be migrated? Or the IO lists? >>>> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >>>> --- >>>> hw/intc/s390_flic.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c >>>> index 6771645699..b70cf2295a 100644 >>>> --- a/hw/intc/s390_flic.c >>>> +++ b/hw/intc/s390_flic.c >>>> @@ -369,6 +369,7 @@ static const VMStateDescription >>>> qemu_s390_flic_vmstate = { >>>> .fields = (const VMStateField[]) { >>>> VMSTATE_UINT8(simm, QEMUS390FLICState), >>>> VMSTATE_UINT8(nimm, QEMUS390FLICState), >>>> + VMSTATE_UINT32(pending, QEMUS390FLICState), >>>> VMSTATE_END_OF_LIST() >>>> } >>>> }; >>> >>> Likely you have to handle this using QEMU compat machines. >> >> Well, since existing migration is broken, I don't think you have to preserve > > Migration is broken only in some case "while an interrupt is pending, it can > be lost". > >> compatibility. But you do have to bump the version number. > > Looking at it, this is TCG only, so likely we don't care that much about > migration compatibility. But I have no idea what level of compatibility we > want to support there. Yes, this seems to only affect the TCG-only flic device, where migration has never been working very reliably. So I think we don't really need the whole compat-machine dance here. But I think we should at least bump the version_id to 2 now and then use VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2); for the new field. That way we would at least support forward migrations without too much hassle. Thomas
On Mon May 27, 2024 at 3:51 PM AEST, Thomas Huth wrote: > On 26/05/2024 22.33, David Hildenbrand wrote: > > Am 26.05.24 um 21:44 schrieb Richard Henderson: > >> On 5/26/24 08:53, David Hildenbrand wrote: > >>> Am 25.05.24 um 15:12 schrieb Nicholas Piggin: > >>>> The flic pending state is not migrated, so if the machine is migrated > >>>> while an interrupt is pending, it can be lost. This shows up in > >>>> qtest migration test, an extint is pending (due to console writes?) > >>>> and the CPU waits via s390_cpu_set_psw and expects the interrupt to > >>>> wake it. However when the flic pending state is lost, s390_cpu_has_int > >>>> returns false, so s390_cpu_exec_interrupt falls through to halting > >>>> again. > >>>> > >>>> Fix this by migrating pending. This prevents the qtest from hanging. > >>>> Does service_param need to be migrated? Or the IO lists? > >>>> > >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > >>>> --- > >>>> hw/intc/s390_flic.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/hw/intc/s390_flic.c b/hw/intc/s390_flic.c > >>>> index 6771645699..b70cf2295a 100644 > >>>> --- a/hw/intc/s390_flic.c > >>>> +++ b/hw/intc/s390_flic.c > >>>> @@ -369,6 +369,7 @@ static const VMStateDescription > >>>> qemu_s390_flic_vmstate = { > >>>> .fields = (const VMStateField[]) { > >>>> VMSTATE_UINT8(simm, QEMUS390FLICState), > >>>> VMSTATE_UINT8(nimm, QEMUS390FLICState), > >>>> + VMSTATE_UINT32(pending, QEMUS390FLICState), > >>>> VMSTATE_END_OF_LIST() > >>>> } > >>>> }; > >>> > >>> Likely you have to handle this using QEMU compat machines. > >> > >> Well, since existing migration is broken, I don't think you have to preserve > > > > Migration is broken only in some case "while an interrupt is pending, it can > > be lost". > > > >> compatibility. But you do have to bump the version number. > > > > Looking at it, this is TCG only, so likely we don't care that much about > > migration compatibility. But I have no idea what level of compatibility we > > want to support there. > > Yes, this seems to only affect the TCG-only flic device, where migration has > never been working very reliably. So I think we don't really need the whole > compat-machine dance here. But I think we should at least bump the > version_id to 2 now and then use > > VMSTATE_UINT32_V(pending, QEMUS390FLICState, 2); > > for the new field. That way we would at least support forward migrations > without too much hassle. Well if you could rebuild this state by checking sources or something it might be possible to avoid. Or if you could always mark pending and software can tolerate superflous. But that also seems like a lot of headache for something that was always flaky. Thanks, Nick
© 2016 - 2024 Red Hat, Inc.