CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
by the device model. The GPE handler checks this and compares it against
the "online" flag on each MADT LAPIC entry, setting the flag to its
related bit in the bitmap and adjusting the table's checksum.
The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
reaches 128, even if that overflows the MADT into some other (hopefully
mapped) memory. The reading isn't as problematic as the writing though.
If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
then the bit where the "online" flag would be is flipped, thus
corrupting that memory. And the MADT checksum gets adjusted for a flip
that happened outside its range. It's all terrible.
Note that this corruption happens regardless of the device-model being
present or not, because even if the bitmap holds 0s, the overflowed
memory might not at the bits corresponding to the "online" flag.
This patch adjusts the DSDT so entries >=NCPUS are skipped.
Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
there.
v2:
* New patch with the general fix for HVM too. Turns out the correction
logic was buggy after all.
---
tools/libacpi/mk_dsdt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 8ac4f9d0b4..06a229e2e9 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -239,7 +239,8 @@ int main(int argc, char **argv)
/* Extract current CPU's status: 0=offline; 1=online. */
stmt("And", "Local1, 1, Local2");
/* Check if status is up-to-date in the relevant MADT LAPIC entry... */
- push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+ push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))",
+ cpu, cpu);
/* ...If not, update it and the MADT checksum, and notify OSPM. */
stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
push_block("If", "LEqual(Local2, 1)");
--
2.43.0
On 11.09.2025 13:53, Alejandro Vallejo wrote: > CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 > by the device model. The GPE handler checks this and compares it against > the "online" flag on each MADT LAPIC entry, setting the flag to its > related bit in the bitmap and adjusting the table's checksum. > > The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it > reaches 128, even if that overflows the MADT into some other (hopefully > mapped) memory. The reading isn't as problematic as the writing though. > > If an "entry" outside the MADT is deemed to disagree with the CPU bitmap > then the bit where the "online" flag would be is flipped, thus > corrupting that memory. And the MADT checksum gets adjusted for a flip > that happened outside its range. It's all terrible. > > Note that this corruption happens regardless of the device-model being > present or not, because even if the bitmap holds 0s, the overflowed > memory might not at the bits corresponding to the "online" flag. > > This patch adjusts the DSDT so entries >=NCPUS are skipped. > > Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU hotplug notify handler in ACPI DSDT"), though. Before that there was a different issue (as mentioned in the description). > --- a/tools/libacpi/mk_dsdt.c > +++ b/tools/libacpi/mk_dsdt.c > @@ -239,7 +239,8 @@ int main(int argc, char **argv) > /* Extract current CPU's status: 0=offline; 1=online. */ > stmt("And", "Local1, 1, Local2"); > /* Check if status is up-to-date in the relevant MADT LAPIC entry... */ > - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu); > + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))", > + cpu, cpu); Don't we need to use \\_SB.NCPU here? From the other two uses it's not quite clear; it might also be that the one using this form is actually needlessly doing so. Yet here it may be better if only for consistency's sake, as the LNotEqual() also operates on an absolute reference. The other thing is that I'm not fluent in AML operand evaluation rules. We want to avoid even the read access to FLG, and I'm unconvinced And() will avoid evaluating its 2nd argument when the first one is 0. IOW this may need to become a 2nd "If". I further think that strictly speaking you mean LAnd() here, not And() (but the above concern remains; all the ASL spec says is "Source1 and Source2 are evaluated as integers" for both And() and LAnd()). Jan
On Thu Sep 11, 2025 at 4:52 PM CEST, Jan Beulich wrote: > On 11.09.2025 13:53, Alejandro Vallejo wrote: >> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >> by the device model. The GPE handler checks this and compares it against >> the "online" flag on each MADT LAPIC entry, setting the flag to its >> related bit in the bitmap and adjusting the table's checksum. >> >> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >> reaches 128, even if that overflows the MADT into some other (hopefully >> mapped) memory. The reading isn't as problematic as the writing though. >> >> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >> then the bit where the "online" flag would be is flipped, thus >> corrupting that memory. And the MADT checksum gets adjusted for a flip >> that happened outside its range. It's all terrible. >> >> Note that this corruption happens regardless of the device-model being >> present or not, because even if the bitmap holds 0s, the overflowed >> memory might not at the bits corresponding to the "online" flag. >> >> This patch adjusts the DSDT so entries >=NCPUS are skipped. >> >> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") > > The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU > hotplug notify handler in ACPI DSDT"), though. Before that there was a > different issue (as mentioned in the description). As you mentioned elsewhere, it probably is 087543338924("hvmloader: limit CPUs exposed to guests") that matters. Until then the DSDT was correct. > >> --- a/tools/libacpi/mk_dsdt.c >> +++ b/tools/libacpi/mk_dsdt.c >> @@ -239,7 +239,8 @@ int main(int argc, char **argv) >> /* Extract current CPU's status: 0=offline; 1=online. */ >> stmt("And", "Local1, 1, Local2"); >> /* Check if status is up-to-date in the relevant MADT LAPIC entry... */ >> - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu); >> + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))", >> + cpu, cpu); > > Don't we need to use \\_SB.NCPU here? From the other two uses it's not > quite clear; it might also be that the one using this form is actually > needlessly doing so. Yet here it may be better if only for consistency's > sake, as the LNotEqual() also operates on an absolute reference. \SB.PMAT method does the same thing. I'll just change that too while at it. > The other thing is that I'm not fluent in AML operand evaluation rules. > We want to avoid even the read access to FLG, and I'm unconvinced And() > will avoid evaluating its 2nd argument when the first one is 0. IOW this > may need to become a 2nd "If". I don't think there are any rules, it's unspecified. While in practice it wouldn't matter a lot, it's indeed better not to rely on it not blowing up. After sending this, I wondered about having a separate if with an early return. > > I further think that strictly speaking you mean LAnd() here, not And() > (but the above concern remains; all the ASL spec says is "Source1 and > Source2 are evaluated as integers" for both And() and LAnd()). I very definitely did mean LAnd! Nice catch. As for > > Jan TL;DR: Will s/And/LAnd/ and move it to a separate If Cheers, Alejandro
On 11.09.2025 17:32, Alejandro Vallejo wrote: > On Thu Sep 11, 2025 at 4:52 PM CEST, Jan Beulich wrote: >> On 11.09.2025 13:53, Alejandro Vallejo wrote: >>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >>> by the device model. The GPE handler checks this and compares it against >>> the "online" flag on each MADT LAPIC entry, setting the flag to its >>> related bit in the bitmap and adjusting the table's checksum. >>> >>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >>> reaches 128, even if that overflows the MADT into some other (hopefully >>> mapped) memory. The reading isn't as problematic as the writing though. >>> >>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >>> then the bit where the "online" flag would be is flipped, thus >>> corrupting that memory. And the MADT checksum gets adjusted for a flip >>> that happened outside its range. It's all terrible. >>> >>> Note that this corruption happens regardless of the device-model being >>> present or not, because even if the bitmap holds 0s, the overflowed >>> memory might not at the bits corresponding to the "online" flag. >>> >>> This patch adjusts the DSDT so entries >=NCPUS are skipped. >>> >>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") >> >> The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU >> hotplug notify handler in ACPI DSDT"), though. Before that there was a >> different issue (as mentioned in the description). > > As you mentioned elsewhere, it probably is 087543338924("hvmloader: limit CPUs > exposed to guests") that matters. Until then the DSDT was correct. > >> >>> --- a/tools/libacpi/mk_dsdt.c >>> +++ b/tools/libacpi/mk_dsdt.c >>> @@ -239,7 +239,8 @@ int main(int argc, char **argv) >>> /* Extract current CPU's status: 0=offline; 1=online. */ >>> stmt("And", "Local1, 1, Local2"); >>> /* Check if status is up-to-date in the relevant MADT LAPIC entry... */ >>> - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu); >>> + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))", >>> + cpu, cpu); >> >> Don't we need to use \\_SB.NCPU here? From the other two uses it's not >> quite clear; it might also be that the one using this form is actually >> needlessly doing so. Yet here it may be better if only for consistency's >> sake, as the LNotEqual() also operates on an absolute reference. > > \SB.PMAT method does the same thing. I'll just change that too while at it. > >> The other thing is that I'm not fluent in AML operand evaluation rules. >> We want to avoid even the read access to FLG, and I'm unconvinced And() >> will avoid evaluating its 2nd argument when the first one is 0. IOW this >> may need to become a 2nd "If". > > I don't think there are any rules, it's unspecified. While in practice it > wouldn't matter a lot, it's indeed better not to rely on it not blowing up. > > After sending this, I wondered about having a separate if with an early return. > >> >> I further think that strictly speaking you mean LAnd() here, not And() >> (but the above concern remains; all the ASL spec says is "Source1 and >> Source2 are evaluated as integers" for both And() and LAnd()). > > I very definitely did mean LAnd! Nice catch. As for > >> >> Jan > > TL;DR: Will s/And/LAnd/ and move it to a separate If Except that once you use a separate If, no And() or LAnd() will be needed anymore. Jan
On 11/09/2025 12:53 pm, Alejandro Vallejo wrote: > CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 > by the device model. The GPE handler checks this and compares it against > the "online" flag on each MADT LAPIC entry, setting the flag to its > related bit in the bitmap and adjusting the table's checksum. > > The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it > reaches 128, even if that overflows the MADT into some other (hopefully > mapped) memory. The reading isn't as problematic as the writing though. > > If an "entry" outside the MADT is deemed to disagree with the CPU bitmap > then the bit where the "online" flag would be is flipped, thus > corrupting that memory. And the MADT checksum gets adjusted for a flip > that happened outside its range. It's all terrible. > > Note that this corruption happens regardless of the device-model being > present or not, because even if the bitmap holds 0s, the overflowed > memory might not at the bits corresponding to the "online" flag. > > This patch adjusts the DSDT so entries >=NCPUS are skipped. > > Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") > Reported-by: Grygorii Strashko <grygorii_strashko@epam.com> > Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com> > --- > Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested > there. > > v2: > * New patch with the general fix for HVM too. Turns out the correction > logic was buggy after all. Hmm, this does sound rather more serious. I have a nagging feeling that until recently we always wrote 128 MADT entries. So, while this looks like a good fix, I think we might want a second Fixes tag. ~Andrew
On 11.09.2025 14:03, Andrew Cooper wrote: > On 11/09/2025 12:53 pm, Alejandro Vallejo wrote: >> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >> by the device model. The GPE handler checks this and compares it against >> the "online" flag on each MADT LAPIC entry, setting the flag to its >> related bit in the bitmap and adjusting the table's checksum. >> >> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >> reaches 128, even if that overflows the MADT into some other (hopefully >> mapped) memory. The reading isn't as problematic as the writing though. >> >> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >> then the bit where the "online" flag would be is flipped, thus >> corrupting that memory. And the MADT checksum gets adjusted for a flip >> that happened outside its range. It's all terrible. >> >> Note that this corruption happens regardless of the device-model being >> present or not, because even if the bitmap holds 0s, the overflowed >> memory might not at the bits corresponding to the "online" flag. >> >> This patch adjusts the DSDT so entries >=NCPUS are skipped. >> >> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") >> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com> >> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com> >> --- >> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested >> there. >> >> v2: >> * New patch with the general fix for HVM too. Turns out the correction >> logic was buggy after all. > > Hmm, this does sound rather more serious. I have a nagging feeling that > until recently we always wrote 128 MADT entries. Not exactly recently, but looks like that's my fault then: 0875433389240 ("hvmloader: limit CPUs exposed to guests"). Jan
On Thu Sep 11, 2025 at 5:04 PM CEST, Jan Beulich wrote: > On 11.09.2025 14:03, Andrew Cooper wrote: >> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote: >>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >>> by the device model. The GPE handler checks this and compares it against >>> the "online" flag on each MADT LAPIC entry, setting the flag to its >>> related bit in the bitmap and adjusting the table's checksum. >>> >>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >>> reaches 128, even if that overflows the MADT into some other (hopefully >>> mapped) memory. The reading isn't as problematic as the writing though. >>> >>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >>> then the bit where the "online" flag would be is flipped, thus >>> corrupting that memory. And the MADT checksum gets adjusted for a flip >>> that happened outside its range. It's all terrible. >>> >>> Note that this corruption happens regardless of the device-model being >>> present or not, because even if the bitmap holds 0s, the overflowed >>> memory might not at the bits corresponding to the "online" flag. >>> >>> This patch adjusts the DSDT so entries >=NCPUS are skipped. >>> >>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") >>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com> >>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com> >>> --- >>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested >>> there. >>> >>> v2: >>> * New patch with the general fix for HVM too. Turns out the correction >>> logic was buggy after all. >> >> Hmm, this does sound rather more serious. I have a nagging feeling that >> until recently we always wrote 128 MADT entries. > > Not exactly recently, but looks like that's my fault then: 0875433389240 > ("hvmloader: limit CPUs exposed to guests"). > > Jan Very right. I got to that commit, but thought nr_processor_objects would match NCPUS. Wrong assumption. That sorts out wich fixes tag to attribute this to. Cheers, Alejandro
On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote: > On 11/09/2025 12:53 pm, Alejandro Vallejo wrote: >> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >> by the device model. The GPE handler checks this and compares it against >> the "online" flag on each MADT LAPIC entry, setting the flag to its >> related bit in the bitmap and adjusting the table's checksum. >> >> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >> reaches 128, even if that overflows the MADT into some other (hopefully >> mapped) memory. The reading isn't as problematic as the writing though. >> >> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >> then the bit where the "online" flag would be is flipped, thus >> corrupting that memory. And the MADT checksum gets adjusted for a flip >> that happened outside its range. It's all terrible. >> >> Note that this corruption happens regardless of the device-model being >> present or not, because even if the bitmap holds 0s, the overflowed >> memory might not at the bits corresponding to the "online" flag. >> >> This patch adjusts the DSDT so entries >=NCPUS are skipped. >> >> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") >> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com> >> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com> >> --- >> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested >> there. >> >> v2: >> * New patch with the general fix for HVM too. Turns out the correction >> logic was buggy after all. > > Hmm, this does sound rather more serious. I have a nagging feeling that > until recently we always wrote 128 MADT entries. If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when. Then it got extended to whatever it needed to be. I have the nagging feeling that rather opaque "some OSs (cough Windows cough) don't like more than 16 CPUs was actually this bug in action. Making the DSDTs with exactly 16 CPUs a particular kind of silly. > > So, while this looks like a good fix, I think we might want a second > Fixes tag. Happy to add it, but I really don't see anything like that in the git log. Cheers, Alejandro
On 11/09/2025 1:35 pm, Alejandro Vallejo wrote: > On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote: >> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote: >>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00 >>> by the device model. The GPE handler checks this and compares it against >>> the "online" flag on each MADT LAPIC entry, setting the flag to its >>> related bit in the bitmap and adjusting the table's checksum. >>> >>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it >>> reaches 128, even if that overflows the MADT into some other (hopefully >>> mapped) memory. The reading isn't as problematic as the writing though. >>> >>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap >>> then the bit where the "online" flag would be is flipped, thus >>> corrupting that memory. And the MADT checksum gets adjusted for a flip >>> that happened outside its range. It's all terrible. >>> >>> Note that this corruption happens regardless of the device-model being >>> present or not, because even if the bitmap holds 0s, the overflowed >>> memory might not at the bits corresponding to the "online" flag. >>> >>> This patch adjusts the DSDT so entries >=NCPUS are skipped. >>> >>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...") >>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com> >>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com> >>> --- >>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested >>> there. >>> >>> v2: >>> * New patch with the general fix for HVM too. Turns out the correction >>> logic was buggy after all. >> Hmm, this does sound rather more serious. I have a nagging feeling that >> until recently we always wrote 128 MADT entries. > If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when. > Then it got extended to whatever it needed to be. > > I have the nagging feeling that rather opaque "some OSs (cough Windows cough) > don't like more than 16 CPUs was actually this bug in action. Making the DSDTs > with exactly 16 CPUs a particular kind of silly. > >> So, while this looks like a good fix, I think we might want a second >> Fixes tag. > Happy to add it, but I really don't see anything like that in the git log. I can't find it either. Maybe my memory is getting faulty. ~Andrew
© 2016 - 2025 Red Hat, Inc.