Aim:
- To fix the failing assertion that checks whether CpuMpData->FinishedCount
equals (CpuMpData->CpuCount - 1). The assertion fails because of a timing
gap between the BSP completing its startup signal checks and
the APs incrementing FinishedCount.
- This patch also ensures that the APs report "finished" as
late as possible.
More specifically:
In the SwitchApContext() function, the BSP triggers
the startup signal and checks whether the APs have received it. After
completing this check, the BSP then verifies that FinishedCount is
equal to CpuCount-1.
On the AP side, upon receiving the startup signal, the APs invoke
SwitchContextPerAp() and increment FinishedCount to indicate their
activation. However, even when all APs have received the startup signal,
they might not have finished incrementing FinishedCount. This timing
gap is what triggers the assertion.
Solution:
Instead of asserting, use a while loop that waits until all the APs have
incremented FinishedCount.
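The handshake and the wait loop described above can be modeled in user space. The following is only a sketch under stated assumptions, not the MpInitLib implementation: POSIX threads stand in for the APs, C11 atomics stand in for the interlocked operations, sched_yield() stands in for CpuPause(), and every name prefixed with Model or MODEL_ is hypothetical. Each model AP acknowledges the signal first and increments the counter afterwards, which reproduces the window in which an equality assertion placed right after the signal check could fail.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MODEL_MAX_APS 16

/* Hypothetical stand-in for the shared MP data; not the edk2 layout. */
typedef struct {
  atomic_uint StartupSignal[MODEL_MAX_APS]; /* 1 = pending, 0 = received */
  atomic_uint FinishedCount;                /* APs that reported "finished" */
} MODEL_MP_DATA;

typedef struct {
  MODEL_MP_DATA *MpData;
  unsigned       Index;
} MODEL_AP_ARG;

static void *ModelApEntry(void *Context)
{
  MODEL_AP_ARG *Arg    = Context;
  atomic_uint  *Signal = &Arg->MpData->StartupSignal[Arg->Index];

  /* Wait until the BSP raises this AP's startup signal. */
  while (atomic_load(Signal) == 0) {
    sched_yield();
  }

  /* Acknowledge the signal: from here on the BSP sees it as received. */
  atomic_store(Signal, 0);

  /* Race window: the BSP may already have passed its signal check. */
  sched_yield();

  atomic_fetch_add(&Arg->MpData->FinishedCount, 1);
  return NULL;
}

/* Plays the BSP role; returns the FinishedCount observed after the wait. */
unsigned ModelSwitchApContext(unsigned ApCount)
{
  MODEL_MP_DATA MpData;
  MODEL_AP_ARG  Args[MODEL_MAX_APS];
  pthread_t     Threads[MODEL_MAX_APS];
  unsigned      Index;
  unsigned      Observed;

  assert(ApCount <= MODEL_MAX_APS);
  atomic_init(&MpData.FinishedCount, 0);

  for (Index = 0; Index < ApCount; Index++) {
    int Rc;

    atomic_init(&MpData.StartupSignal[Index], 0);
    Args[Index].MpData = &MpData;
    Args[Index].Index  = Index;
    Rc = pthread_create(&Threads[Index], NULL, ModelApEntry, &Args[Index]);
    assert(Rc == 0);
    (void)Rc;
  }

  /* Trigger the startup signal, then check that every AP received it. */
  for (Index = 0; Index < ApCount; Index++) {
    atomic_store(&MpData.StartupSignal[Index], 1);
  }
  for (Index = 0; Index < ApCount; Index++) {
    while (atomic_load(&MpData.StartupSignal[Index]) != 0) {
      sched_yield();
    }
  }

  /* An equality assertion here could fail; wait for the counter instead. */
  while (atomic_load(&MpData.FinishedCount) < ApCount) {
    sched_yield();
  }
  Observed = atomic_load(&MpData.FinishedCount);

  for (Index = 0; Index < ApCount; Index++) {
    pthread_join(Threads[Index], NULL);
  }
  return Observed;
}
```

In this model, ModelSwitchApContext (4) returns 4 regardless of how the threads are scheduled, because the BSP side does not read the counter until the loop condition has become false.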
Fixes: 964a4f032dcd
Signed-off-by: Yuanhao Xie <yuanhao.xie@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
---
UefiCpuPkg/Library/MpInitLib/MpLib.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 6f1456cfe1..9a6ec5db5c 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -913,8 +913,8 @@ DxeApEntryPoint (
   UINTN ProcessorNumber;
 
   GetProcessorNumber (CpuMpData, &ProcessorNumber);
-  InterlockedIncrement ((UINT32 *)&CpuMpData->FinishedCount);
   RestoreVolatileRegisters (&CpuMpData->CpuData[0].VolatileRegisters, FALSE);
+  InterlockedIncrement ((UINT32 *)&CpuMpData->FinishedCount);
   PlaceAPInMwaitLoopOrRunLoop (
     CpuMpData->ApLoopMode,
     CpuMpData->CpuData[ProcessorNumber].StartupApSignal,
@@ -2201,7 +2201,12 @@ MpInitLibInitialize (
     // looping process there.
     //
     SwitchApContext (MpHandOff);
-    ASSERT (CpuMpData->FinishedCount == (CpuMpData->CpuCount - 1));
+    //
+    // Wait for all APs finished initialization
+    //
+    while (CpuMpData->FinishedCount < (CpuMpData->CpuCount - 1)) {
+      CpuPause ();
+    }
 
     //
     // Set Apstate as Idle, otherwise Aps cannot be waken-up again.
--
2.36.1.windows.1
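
The first hunk moves the increment after RestoreVolatileRegisters(), so that "finished" is reported as late as possible. One way to read the benefit, offered here as an interpretation rather than something the commit message states, is that a BSP that has seen the full count can then assume the restore step is complete on every AP. The sketch below illustrates that ordering with POSIX threads and C11 atomics; all names prefixed with Model or MODEL_ are hypothetical, and a plain flag write stands in for the register restore.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MODEL_AP_COUNT 4

/* Plain (non-atomic) per-AP flags; stand-ins for restored register state. */
static int         mModelRestored[MODEL_AP_COUNT];
static atomic_uint mModelFinishedCount;

static void *ModelApRestoreThenReport(void *Context)
{
  int *Restored = Context;

  /* Step 1: do the work (stands in for RestoreVolatileRegisters). */
  *Restored = 1;

  /* Step 2: report "finished" last, so the report covers step 1. */
  atomic_fetch_add(&mModelFinishedCount, 1);
  return NULL;
}

/* Plays the BSP role; returns how many APs are seen as restored. */
unsigned ModelCountRestoredAfterWait(void)
{
  pthread_t Threads[MODEL_AP_COUNT];
  unsigned  Index;
  unsigned  Visible = 0;

  atomic_store(&mModelFinishedCount, 0);
  for (Index = 0; Index < MODEL_AP_COUNT; Index++) {
    mModelRestored[Index] = 0;
  }

  for (Index = 0; Index < MODEL_AP_COUNT; Index++) {
    int Rc = pthread_create(&Threads[Index], NULL,
                            ModelApRestoreThenReport, &mModelRestored[Index]);
    assert(Rc == 0);
    (void)Rc;
  }

  /* Same shape as the loop added by the patch. */
  while (atomic_load(&mModelFinishedCount) < MODEL_AP_COUNT) {
    sched_yield();
  }

  /*
   * Every increment happened after the matching flag write, and the
   * atomic read-modify-write operations order those writes before this
   * point, so all flags are visible here without joining first.
   */
  for (Index = 0; Index < MODEL_AP_COUNT; Index++) {
    Visible += (unsigned)mModelRestored[Index];
  }

  for (Index = 0; Index < MODEL_AP_COUNT; Index++) {
    pthread_join(Threads[Index], NULL);
  }
  return Visible;
}
```

If the two steps in ModelApRestoreThenReport() were swapped, the BSP side could leave its loop while some flags were still zero, which is the situation the reordering avoids in this model.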
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#110052): https://edk2.groups.io/g/devel/message/110052
Mute This Topic: https://groups.io/mt/102176057/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-
On 10/25/23 13:42, Yuanhao Xie wrote:
> [...]

https://github.com/tianocore/edk2/pull/4964

(in progress)

View/Reply Online (#110102): https://edk2.groups.io/g/devel/message/110102
On 10/26/23 15:41, Laszlo Ersek wrote:
> On 10/25/23 13:42, Yuanhao Xie wrote:
>> [...]
>
> https://github.com/tianocore/edk2/pull/4964
>
> (in progress)

Commit 74c687cc2f2f.

Laszlo

View/Reply Online (#110139): https://edk2.groups.io/g/devel/message/110139
On 10/25/23 13:42, Yuanhao Xie wrote:
> [...]

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

The change is not testable using OVMF, because OVMF (intentionally) uses
ApLoopMode=ApInHltLoop, and in that case, neither hunk is reachable.
(Accordingly, the log message reports WaitLoopExecutionMode as zero.)

I've still regression-tested this change, with my usual configs:

- OVMF IA32 with SMM_REQUIRE, on q35
- OVMF IA32X64 with SMM_REQUIRE, on q35
- OVMF X64 without SMM_REQUIRE, on pc (i440fx)

The test goes like

- boot with 1 cold-plugged plus 2 more hot-pluggable VCPUs
- [*]
- hotplug 2 VCPUs
- [*]
- hot-unplug 2 VCPUs
- [*]
- poweroff

where [*] stands for:

- run efibootmgr, bound to each online VCPU in separation
- ACPI S3 suspend/resume
- run efibootmgr, bound to each online VCPU in separation

I used Fedora and RHEL guests.

So:

Regression-tested-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo

View/Reply Online (#110101): https://edk2.groups.io/g/devel/message/110101