CPU hotplug relies on the guest having access to the legacy online CPU
bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
this causes the MADT to get corrupted due to spurious modifications of
the "online" flag in MADT entries and the table checksum during the
initial acpica passes.
Seeing how ACPI CPU hotplug is the only event delivered via GPE, remove
the GPE handler too.
This shrinks PVH's DSDT substantially and fixes the MADT corruption
problem.
Fixes: e9a8dc050f9a ("libacpi: Build DSDT for PVH guests")
Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
tools/libacpi/mk_dsdt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 8ac4f9d0b4..f71de6c8c6 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -218,6 +218,11 @@ int main(int argc, char **argv)
pop_block();
/**** Processor end ****/
#else
+ if (dm_version == QEMU_NONE) {
+ pop_block();
+ pop_block();
+ return 0;
+ }
/* Operation Region 'PRST': bitmask of online CPUs. */
stmt("OperationRegion", "PRST, SystemIO, %#x, %d",
@@ -264,10 +269,6 @@ int main(int argc, char **argv)
pop_block();
pop_block();
- if (dm_version == QEMU_NONE) {
- pop_block();
- return 0;
- }
/**** Processor end ****/
base-commit: 53c599cc33b61ae70d59572f3c1d843a3def84e2
--
2.43.0
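
For context on the shape of the change: mk_dsdt.c emits the DSDT as ASL
text, where push_block() opens a scope ("{") and pop_block() closes one,
so the early return for QEMU_NONE has to pop every block still open at
that point before exiting. A rough C sketch of that emit pattern, with
simplified helpers rather than the real mk_dsdt.c signatures:

#include <stdio.h>

static unsigned int indent;

/* Open an ASL block: print its header and a brace, deepen indentation. */
static void push_block(const char *hdr)
{
    printf("%*s%s {\n", indent * 4, "", hdr);
    indent++;
}

/* Close the innermost still-open ASL block. */
static void pop_block(void)
{
    indent--;
    printf("%*s}\n", indent * 4, "");
}

int main(void)
{
    push_block("Scope (\\_SB)");
    push_block("Device (PR00)");
    /* ... per-CPU methods would be emitted here ... */
    pop_block();    /* Device */
    pop_block();    /* Scope  */
    return 0;
}

Because the output is purely textual, a return that skips the closing
pop_block() calls would leave unbalanced braces in the generated ASL,
which is why the moved early-return pops both open blocks first.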
On 10.09.2025 16:49, Alejandro Vallejo wrote:
> CPU hotplug relies on the guest having access to the legacy online CPU
> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
> this causes the MADT to get corrupted due to spurious modifications of
> the "online" flag in MADT entries and the table checksum during the
> initial acpica passes.

I don't understand this MADT corruption aspect, which - aiui - is why
there's a Fixes: tag here. The code change itself looks plausible.

Jan
On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>> CPU hotplug relies on the guest having access to the legacy online CPU
>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>> this causes the MADT to get corrupted due to spurious modifications of
>> the "online" flag in MADT entries and the table checksum during the
>> initial acpica passes.
>
> I don't understand this MADT corruption aspect, which - aiui - is why
> there's a Fixes: tag here. The code change itself looks plausible.
>
> Jan

When there's no DM to provide a real and honest online CPU bitmap on
PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
confuses the GPE handler.

Somehow, the GPE handler is being triggered. Whether this is due to a
real SCI or to it being spuriously executed as part of the initial
acpica pass, I don't know.

Both statements combined mean the checksum and online flags in the MADT
get changed after initial parsing, making it appear as if all 128 CPUs
were plugged.

This patch makes the checksums correct after acpica init.

Grygorii noticed the checksum mismatch while validating an ACPI dump on
a PVH Linux system.

Cheers,
Alejandro
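
For reference, the checksum rule at play: an ACPI table must byte-sum
to 0 mod 256, checksum byte included, so AML that flips an online flag
also has to compensate in the checksum byte. A minimal C sketch of that
arithmetic (names and offsets are illustrative, not the actual
OperationRegion fields in the generated DSDT):

#include <stdint.h>
#include <stddef.h>

/* Conceptual equivalent of the DSDT's flag/checksum update. */
static void madt_set_online(uint8_t *madt, size_t flag_off,
                            size_t csum_off, uint8_t online)
{
    uint8_t old = madt[flag_off];

    madt[flag_off] = online;
    /* Keep the byte-wise sum of the table at 0 mod 256: whatever the
     * flag byte gained, the checksum byte must lose (8-bit wrap). */
    madt[csum_off] += old - online;
}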
On 10.09.25 18:16, Alejandro Vallejo wrote:
> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>> this causes the MADT to get corrupted due to spurious modifications of
>>> the "online" flag in MADT entries and the table checksum during the
>>> initial acpica passes.
>>
>> I don't understand this MADT corruption aspect, which - aiui - is why
>> there's a Fixes: tag here. The code change itself looks plausible.
>>
>> Jan
>
> When there's no DM to provide a real and honest online CPU bitmap on
> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
> confuses the GPE handler.
>
> Somehow, the GPE handler is being triggered. Whether this is due to a
> real SCI or to it being spuriously executed as part of the initial
> acpica pass, I don't know.
>
> Both statements combined mean the checksum and online flags in the MADT
> get changed after initial parsing, making it appear as if all 128 CPUs
> were plugged.
>
> This patch makes the checksums correct after acpica init.
>
> Grygorii noticed the checksum mismatch while validating an ACPI dump on
> a PVH Linux system.

Below is "acpidump -r 0xfc000000" from a PVH guest (not dom0) for the
MADT before/after this patch:

Before:

Firmware Warning (ACPI): Incorrect checksum in table [APIC] - 0x59,
should be 0xFFFFFFE3 (20250404/utcksum-208)

APIC @ 0x0000000000000000
0000: 41 50 49 43 52 00 00 00 02 59 58 65 6E 00 00 00  APICR....YXen...
                                 ^^ incorrect
0010: 48 56 4D 00 00 00 00 00 00 00 00 00 48 56 4D 4C  HVM.........HVML
0020: 00 00 00 00 00 00 E0 FE 00 00 00 00 02 0A 00 00  ................
0030: 02 00 00 00 00 00 01 0C 00 00 00 00 C0 FE 00 00  ................
0040: 00 00 00 08 00 00 01 00 00 00 00 08 01 02 01 00  ................
0050: 00 00

After:

APIC @ 0x0000000000000000
0000: 41 50 49 43 52 00 00 00 02 76 58 65 6E 00 00 00  APICR....vXen...
                                 ^^ correct
0010: 48 56 4D 00 00 00 00 00 00 00 00 00 48 56 4D 4C  HVM.........HVML
0020: 00 00 00 00 00 00 E0 FE 00 00 00 00 02 0A 00 00  ................
0030: 02 00 00 00 00 00 01 0C 00 00 00 00 C0 FE 00 00  ................
0040: 00 00 00 08 00 00 01 00 00 00 00 08 01 02 01 00  ................
0050: 00 00

--
Best regards,
-grygorii
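
The firmware warning above is ACPICA's byte-sum check failing.
Verifying a table from such a dump by hand is straightforward; a small
sketch (assuming a little-endian dump, with the table length at offset
4 of the standard ACPI header):

#include <stdint.h>
#include <string.h>

/* Returns nonzero iff the whole table, checksum byte included, sums
 * to 0 mod 256, as ACPI requires. */
static int acpi_table_checksum_ok(const uint8_t *table)
{
    uint32_t len;
    uint8_t sum = 0;

    memcpy(&len, table + 4, sizeof(len));
    for (uint32_t i = 0; i < len; i++)
        sum += table[i];

    return sum == 0;
}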
On 10.09.2025 17:16, Alejandro Vallejo wrote:
> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>> this causes the MADT to get corrupted due to spurious modifications of
>>> the "online" flag in MADT entries and the table checksum during the
>>> initial acpica passes.
>>
>> I don't understand this MADT corruption aspect, which - aiui - is why
>> there's a Fixes: tag here. The code change itself looks plausible.
>
> When there's no DM to provide a real and honest online CPU bitmap on
> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
> confuses the GPE handler.
>
> Somehow, the GPE handler is being triggered. Whether this is due to a
> real SCI or to it being spuriously executed as part of the initial
> acpica pass, I don't know.
>
> Both statements combined mean the checksum and online flags in the MADT
> get changed after initial parsing, making it appear as if all 128 CPUs
> were plugged.

I can follow this part (the online flags one, that is).

> This patch makes the checksums correct after acpica init.

I'm still in trouble with this one. If MADT is modified in the process,
there's only one of two possible options:
1) It's expected for the checksum to no longer be correct.
2) The checksum is being fixed up in the process.
That's independent of being HVM or PVH and independent of guest boot or
later. (Of course there's a sub-variant of 2, where the adjusting of
the checksum would be broken, but that wouldn't be covered by your
change.)

Jan
On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote:
> On 10.09.2025 17:16, Alejandro Vallejo wrote:
>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>>> this causes the MADT to get corrupted due to spurious modifications of
>>>> the "online" flag in MADT entries and the table checksum during the
>>>> initial acpica passes.
>>>
>>> I don't understand this MADT corruption aspect, which - aiui - is why
>>> there's a Fixes: tag here. The code change itself looks plausible.
>>
>> When there's no DM to provide a real and honest online CPU bitmap on
>> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
>> confuses the GPE handler.
>>
>> Somehow, the GPE handler is being triggered. Whether this is due to a
>> real SCI or to it being spuriously executed as part of the initial
>> acpica pass, I don't know.
>>
>> Both statements combined mean the checksum and online flags in the MADT
>> get changed after initial parsing, making it appear as if all 128 CPUs
>> were plugged.
>
> I can follow this part (the online flags one, that is).
>
>> This patch makes the checksums correct after acpica init.
>
> I'm still in trouble with this one. If MADT is modified in the process,
> there's only one of two possible options:
> 1) It's expected for the checksum to no longer be correct.
> 2) The checksum is being fixed up in the process.
> That's independent of being HVM or PVH and independent of guest boot or
> later. (Of course there's a sub-variant of 2, where the adjusting of
> the checksum would be broken, but that wouldn't be covered by your
> change.)
>
> Jan

I see what you mean now. The checksum correction code LOOKS correct.
But I wonder about the table length... We report a table as big as it
needs to be, but the checksum update is done irrespective of FLG being
inside the valid range of the MADT. If a guest with 2 vCPUs (in
max_vcpus) sees vCPU127 being signalled, that'd cause the (unseen)
online flag to be enabled and the checksum adjusted, except the
checksum must not be adjusted.

I could add even more AML to cover that, but that'd be QEMU misbehaving
(or being absent). This patch covers the latter case, but it might be
good to change the commit message to reflect the real problem.

Cheers,
Alejandro
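
In C terms, the guard described as missing here would look roughly like
the following (a sketch with hypothetical names and an illustrative
flag offset; the real logic lives in the generated AML, not C):

#include <stdint.h>
#include <stddef.h>

static void cpu_event(uint8_t *madt, size_t madt_len, size_t csum_off,
                      unsigned int cpu, unsigned int nr_vcpus,
                      uint8_t online)
{
    /* Offset of this CPU's flag byte within the MADT: 44-byte header,
     * 8-byte Local APIC entries, flags at entry offset 4 (illustrative
     * layout, assuming LAPIC entries only). */
    size_t flag_off = 44 + cpu * 8 + 4;

    /* Without this guard, an event for e.g. vCPU127 on a 2-vCPU guest
     * writes past the end of the table and "adjusts" the checksum for
     * a byte that isn't part of the table at all. */
    if (cpu >= nr_vcpus || flag_off >= madt_len)
        return;

    madt[csum_off] += madt[flag_off] - online;  /* compensate first */
    madt[flag_off] = online;
}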
On Wed Sep 10, 2025 at 7:01 PM CEST, Alejandro Vallejo wrote:
> On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote:
>> On 10.09.2025 17:16, Alejandro Vallejo wrote:
>>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>>>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>>>> this causes the MADT to get corrupted due to spurious modifications of
>>>>> the "online" flag in MADT entries and the table checksum during the
>>>>> initial acpica passes.
>>>>
>>>> I don't understand this MADT corruption aspect, which - aiui - is why
>>>> there's a Fixes: tag here. The code change itself looks plausible.
>>>
>>> When there's no DM to provide a real and honest online CPU bitmap on
>>> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
>>> confuses the GPE handler.
>>>
>>> Somehow, the GPE handler is being triggered. Whether this is due to a
>>> real SCI or to it being spuriously executed as part of the initial
>>> acpica pass, I don't know.
>>>
>>> Both statements combined mean the checksum and online flags in the MADT
>>> get changed after initial parsing, making it appear as if all 128 CPUs
>>> were plugged.
>>
>> I can follow this part (the online flags one, that is).
>>
>>> This patch makes the checksums correct after acpica init.
>>
>> I'm still in trouble with this one. If MADT is modified in the process,
>> there's only one of two possible options:
>> 1) It's expected for the checksum to no longer be correct.
>> 2) The checksum is being fixed up in the process.
>> That's independent of being HVM or PVH and independent of guest boot or
>> later. (Of course there's a sub-variant of 2, where the adjusting of
>> the checksum would be broken, but that wouldn't be covered by your
>> change.)
>>
>> Jan
>
> I see what you mean now. The checksum correction code LOOKS correct.
> But I wonder about the table length... We report a table as big as it
> needs to be, but the checksum update is done irrespective of FLG being
> inside the valid range of the MADT. If a guest with 2 vCPUs (in
> max_vcpus) sees vCPU127 being signalled, that'd cause the (unseen)
> online flag to be enabled and the checksum adjusted, except the
> checksum must not be adjusted.
>
> I could add even more AML to cover that, but that'd be QEMU misbehaving
> (or being absent). This patch covers the latter case, but it might be
> good to change the commit message to reflect the real problem.
>
> Cheers,
> Alejandro

The mismatch doesn't quite add up, though. There might be something
else lurking in there.

Regardless, I don't want this junk in PVH. Would a commit reword
suffice to have it acked?

Cheers,
Alejandro
On 10.09.2025 19:29, Alejandro Vallejo wrote:
> On Wed Sep 10, 2025 at 7:01 PM CEST, Alejandro Vallejo wrote:
>> On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote:
>>> On 10.09.2025 17:16, Alejandro Vallejo wrote:
>>>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>>>>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>>>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>>>>> this causes the MADT to get corrupted due to spurious modifications of
>>>>>> the "online" flag in MADT entries and the table checksum during the
>>>>>> initial acpica passes.
>>>>>
>>>>> I don't understand this MADT corruption aspect, which - aiui - is why
>>>>> there's a Fixes: tag here. The code change itself looks plausible.
>>>>
>>>> When there's no DM to provide a real and honest online CPU bitmap on
>>>> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
>>>> confuses the GPE handler.
>>>>
>>>> Somehow, the GPE handler is being triggered. Whether this is due to a
>>>> real SCI or to it being spuriously executed as part of the initial
>>>> acpica pass, I don't know.
>>>>
>>>> Both statements combined mean the checksum and online flags in the MADT
>>>> get changed after initial parsing, making it appear as if all 128 CPUs
>>>> were plugged.
>>>
>>> I can follow this part (the online flags one, that is).
>>>
>>>> This patch makes the checksums correct after acpica init.
>>>
>>> I'm still in trouble with this one. If MADT is modified in the process,
>>> there's only one of two possible options:
>>> 1) It's expected for the checksum to no longer be correct.
>>> 2) The checksum is being fixed up in the process.
>>> That's independent of being HVM or PVH and independent of guest boot or
>>> later. (Of course there's a sub-variant of 2, where the adjusting of
>>> the checksum would be broken, but that wouldn't be covered by your
>>> change.)
>>
>> I see what you mean now. The checksum correction code LOOKS correct.
>> But I wonder about the table length... We report a table as big as it
>> needs to be, but the checksum update is done irrespective of FLG being
>> inside the valid range of the MADT. If a guest with 2 vCPUs (in
>> max_vcpus) sees vCPU127 being signalled, that'd cause the (unseen)
>> online flag to be enabled and the checksum adjusted, except the
>> checksum must not be adjusted.
>>
>> I could add even more AML to cover that, but that'd be QEMU misbehaving
>> (or being absent). This patch covers the latter case, but it might be
>> good to change the commit message to reflect the real problem.
>
> The mismatch doesn't quite add up, though. There might be something
> else lurking in there.
>
> Regardless, I don't want this junk in PVH. Would a commit reword
> suffice to have it acked?

I think so, yes.

Jan
On Thu Sep 11, 2025 at 9:44 AM CEST, Jan Beulich wrote:
> On 10.09.2025 19:29, Alejandro Vallejo wrote:
>> On Wed Sep 10, 2025 at 7:01 PM CEST, Alejandro Vallejo wrote:
>>> On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote:
>>>> On 10.09.2025 17:16, Alejandro Vallejo wrote:
>>>>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>>>>>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>>>>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>>>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>>>>>> this causes the MADT to get corrupted due to spurious modifications of
>>>>>>> the "online" flag in MADT entries and the table checksum during the
>>>>>>> initial acpica passes.
>>>>>>
>>>>>> I don't understand this MADT corruption aspect, which - aiui - is why
>>>>>> there's a Fixes: tag here. The code change itself looks plausible.
>>>>>
>>>>> When there's no DM to provide a real and honest online CPU bitmap on
>>>>> PIO 0xAF00, we get all 1s (because there's no IOREQ server), which
>>>>> confuses the GPE handler.
>>>>>
>>>>> Somehow, the GPE handler is being triggered. Whether this is due to a
>>>>> real SCI or to it being spuriously executed as part of the initial
>>>>> acpica pass, I don't know.
>>>>>
>>>>> Both statements combined mean the checksum and online flags in the MADT
>>>>> get changed after initial parsing, making it appear as if all 128 CPUs
>>>>> were plugged.
>>>>
>>>> I can follow this part (the online flags one, that is).
>>>>
>>>>> This patch makes the checksums correct after acpica init.
>>>>
>>>> I'm still in trouble with this one. If MADT is modified in the process,
>>>> there's only one of two possible options:
>>>> 1) It's expected for the checksum to no longer be correct.
>>>> 2) The checksum is being fixed up in the process.
>>>> That's independent of being HVM or PVH and independent of guest boot or
>>>> later. (Of course there's a sub-variant of 2, where the adjusting of
>>>> the checksum would be broken, but that wouldn't be covered by your
>>>> change.)
>>>
>>> I see what you mean now. The checksum correction code LOOKS correct.
>>> But I wonder about the table length... We report a table as big as it
>>> needs to be, but the checksum update is done irrespective of FLG being
>>> inside the valid range of the MADT. If a guest with 2 vCPUs (in
>>> max_vcpus) sees vCPU127 being signalled, that'd cause the (unseen)
>>> online flag to be enabled and the checksum adjusted, except the
>>> checksum must not be adjusted.
>>>
>>> I could add even more AML to cover that, but that'd be QEMU misbehaving
>>> (or being absent). This patch covers the latter case, but it might be
>>> good to change the commit message to reflect the real problem.
>>
>> The mismatch doesn't quite add up, though. There might be something
>> else lurking in there.
>>
>> Regardless, I don't want this junk in PVH. Would a commit reword
>> suffice to have it acked?
>
> I think so, yes.
>
> Jan

The problem is present in HVM too, I think. It just clicked to me that
if AML overflows the MADT it WILL corrupt whatever is after it, and how
many times (and in what direction) the checksum changes is undefined,
because the memory after the MADT is undefined. I somehow thought we
allocated a full 128-CPU MADT, but that doesn't seem to be the case
(rightfully so).

So this is a general ACPI memory corruption bug. The saving grace is
that tables are already parsed by the time we execute AML, and the
corruption doesn't seem to reach the DSDT.

Modifying the DSDT to avoid the overflow seems unavoidable. That should
fix the root cause. I still want to remove it all on PVH, but HVM
should be fixed too.

Cheers,
Alejandro
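
In the spirit of Grygorii's acpidump validation, one way to see how far
such corruption spreads is to byte-sum every table in a dump and report
the ones that no longer sum to zero. A sketch, with the table list as a
hypothetical input (e.g. parsed out of an acpidump capture):

#include <stdint.h>
#include <stdio.h>

struct table {
    const char *sig;     /* 4-char signature, e.g. "APIC" */
    const uint8_t *data; /* raw table bytes */
    uint32_t len;        /* length from the table header */
};

static void report_bad_checksums(const struct table *tables, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t sum = 0;

        for (uint32_t j = 0; j < tables[i].len; j++)
            sum += tables[i].data[j];
        if (sum)
            printf("%s: bad checksum (off by 0x%02x)\n",
                   tables[i].sig, sum);
    }
}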