[edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection

Ankur Arora posted 9 patches 2 years ago
There is a newer version of this series
[edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Posted by Ankur Arora 2 years ago
Designate a worker CPU (we use the one executing the root MMI
handler), which will do the actual ejection via QEMU in CpuEject().

CpuEject(), on the worker CPU, ejects each marked CPU by first
selecting its APIC ID and then sending the QEMU "eject" command.
QEMU in-turn signals the remote VCPU thread which context-switches
it out of the SMI.

CpuEject(), on the CPU being ejected, spins around in its holding
area until this final context-switch. This does mean that there is
some CPU state that would ordinarily be restored (in SmiRendezvous()
and in SmiEntry.nasm::CommonHandler), but will not be anymore.
This unrestored state includes FPU state, CET enable, stuffing of
RSB and the final RSM. Since the CPU state is destroyed by QEMU,
this should be okay.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Aaron Young <aaron.young@oracle.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73 ++++++++++++++++++++++++++++++++++----
 1 file changed, 67 insertions(+), 6 deletions(-)

diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
index 526f51faf070..bf91344eef9c 100644
--- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
+++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
@@ -193,9 +193,12 @@ RevokeNewSlot:
   CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
   on each CPU at exit from SMM.
 
-  If, the executing CPU is not being ejected, nothing to be done.
+  If, the executing CPU is neither a worker, nor being ejected, nothing
+  to be done.
   If, the executing CPU is being ejected, wait in a CpuDeadLoop()
   until ejected.
+  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
+  for CPUs being ejected.
 
   @param[in] ProcessorNum      Index of executing CPU.
 
@@ -217,6 +220,56 @@ CpuEject (
     return;
   }
 
+  if (ApicId == CPU_EJECT_WORKER) {
+    UINT32 CpuIndex;
+
+    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength; CpuIndex++) {
+      UINT64 RemoveApicId;
+
+      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
+
+      if ((RemoveApicId != CPU_EJECT_INVALID &&
+           RemoveApicId != CPU_EJECT_WORKER)) {
+        //
+        // This to-be-ejected-CPU has already received the BSP's SMI exit
+        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
+        // followed by this callback or is already waiting in the
+        // CpuDeadLoop() below.
+        //
+        // Tell QEMU to context-switch it out.
+        //
+        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
+        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
+
+        //
+        // Compiler barrier to ensure the next store isn't reordered
+        //
+        MemoryFence ();
+
+        //
+        // Clear the eject status for CpuIndex to ensure that an invalid
+        // SMI later does not end up trying to eject it or a newly
+        // hotplugged CpuIndex does not go into the dead loop.
+        //
+        mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
+
+        DEBUG ((DEBUG_INFO, "%a: Unplugged CPU %u -> " FMT_APIC_ID "\n",
+               __FUNCTION__, CpuIndex, RemoveApicId));
+      }
+    }
+
+    //
+    // Clear our own worker status.
+    //
+    mCpuHotEjectData->ApicIdMap[ProcessorNum] = CPU_EJECT_INVALID;
+
+    //
+    // We are done until the next hot-unplug; clear the handler.
+    //
+    mCpuHotEjectData->Handler = NULL;
+    return;
+  }
+
   //
   // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
   // after having been cleared to exit the SMI by the monarch and thus have
@@ -327,6 +380,19 @@ UnplugCpus (
   }
 
   if (EjectCount != 0) {
+    UINTN  Worker;
+
+    Status = mMmCpuService->WhoAmI (mMmCpuService, &Worker);
+    ASSERT_EFI_ERROR (Status);
+    //
+    // UnplugCpus() is called via the root MMI handler and thus we are
+    // executing in the BSP context.
+    //
+    // Mark ourselves as the worker CPU.
+    //
+    ASSERT (mCpuHotEjectData->ApicIdMap[Worker] == CPU_EJECT_INVALID);
+    mCpuHotEjectData->ApicIdMap[Worker] = CPU_EJECT_WORKER;
+
     //
     // We have processors to be ejected; install the handler.
     //
@@ -451,11 +517,6 @@ CpuHotplugMmi (
   if (EFI_ERROR (Status)) {
     goto Fatal;
   }
-  if (ToUnplugCount > 0) {
-    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
-      __FUNCTION__));
-    goto Fatal;
-  }
 
   if (PluggedCount > 0) {
     Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
-- 
2.9.3



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#70874): https://edk2.groups.io/g/devel/message/70874
Mute This Topic: https://groups.io/mt/80199930/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Posted by Laszlo Ersek 1 year, 12 months ago
On 01/29/21 01:59, Ankur Arora wrote:
> Designate a worker CPU (we use the one executing the root MMI
> handler), which will do the actual ejection via QEMU in CpuEject().
>
> CpuEject(), on the worker CPU, ejects each marked CPU by first
> selecting its APIC ID and then sending the QEMU "eject" command.
> QEMU in-turn signals the remote VCPU thread which context-switches
> it out of the SMI.
>
> CpuEject(), on the CPU being ejected, spins around in its holding
> area until this final context-switch. This does mean that there is
> some CPU state that would ordinarily be restored (in SmiRendezvous()
> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
> This unrestored state includes FPU state, CET enable, stuffing of
> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
> this should be okay.
>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Aaron Young <aaron.young@oracle.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 67 insertions(+), 6 deletions(-)

(1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
and code too).


> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> index 526f51faf070..bf91344eef9c 100644
> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> @@ -193,9 +193,12 @@ RevokeNewSlot:
>    CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>    on each CPU at exit from SMM.
>
> -  If, the executing CPU is not being ejected, nothing to be done.
> +  If, the executing CPU is neither a worker, nor being ejected, nothing
> +  to be done.
>    If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>    until ejected.
> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
> +  for CPUs being ejected.
>
>    @param[in] ProcessorNum      Index of executing CPU.
>
> @@ -217,6 +220,56 @@ CpuEject (
>      return;
>    }
>
> +  if (ApicId == CPU_EJECT_WORKER) {

(2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
generality). I wish I understood this idea earlier in the patch set.

(2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
dropped.

(2b) In this patch, the question whether the executing CPU is the BSP or
not, should be decided with the same logic that is visible in
PlatformSmmBspElection()
[OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:

  MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
  BOOLEAN                     IsBsp;

  ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
  IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);

(2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
in UnplugCpus() below.

(2d) The "is worker" language (in comments etc) should be replaced with
direct "is BSP" language.


> +    UINT32 CpuIndex;
> +
> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength; CpuIndex++) {
> +      UINT64 RemoveApicId;
> +
> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
> +
> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
> +           RemoveApicId != CPU_EJECT_WORKER)) {
> +        //
> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
> +        // followed by this callback or is already waiting in the
> +        // CpuDeadLoop() below.
> +        //
> +        // Tell QEMU to context-switch it out.
> +        //
> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);

(3) While the QEMU CPU selector value *usually* matches the APIC ID,
it's not an invariant. APIC IDs have an internal structure, composed of
bit-fields, where each bit-field accommodates one hierarchy level in the
CPU topology (thread, core, die (maybe), and socket).

However, this mapping need not be surjective. QEMU lets you create
"pathological" CPU topologies, for example one with:
- 3 threads/core,
- 5 cores/socket,
- (say) 2 sockets.

Under that example, the bit-field standing for the "thread number" level
would have 2 bits, theoretically permitting *4* threads/core, and the
bit-field standing for the "core number" level would have 3 bits,
theoretically allowing for *8* cores/socket.

Considering the fully populated topology, you'd see the CPU selector
range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
processors in total). However, the APIC ID *image* of this CPU selector
*domain* would not be "contiguous" -- the APIC ID space, with the
above-described structure, would accommodate 4*8*2=64 logical
processors. For example, each APIC ID that stood for the nonexistent
"thread#3" on a particular core would be left unused (no CPU selector
would map to it).

All in all, you can't write the APIC ID to the CPU selector register,
for ejection. You need to select the CPU whose APIC ID is the APIC ID
you want to eject, and then initiate ejection.

This requires one of two alternatives:


(3a) The first option is to keep the change local to this patch.

This alternative is the more CPU-hungry (and uglier) one.

The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
there is a match, eject.


(3b) The second option is much more elegant (and it's faster too), but
it requires a much more intrusive update to the patch set.

First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
operates on, has to be changed from APIC_ID to a structure type that
pairs APIC_ID with the QEMU CPU selector. [*]

Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
also save the "CurrentSelector" value (in the other field of the output
array element structure).

Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
replaced with a structure type similar (or identical) to the one
described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
the QEMU selector for that processor, and the APIC ID.

Fourth, the actual ejection should use the selector.

Fifth, the debug message (below) should continue logging the APIC ID, to
mirror the DEBUG_INFO message in ProcessHotAddedCpus().


Would you be willing to implement (3b)?


> +
> +        //
> +        // Compiler barrier to ensure the next store isn't reordered
> +        //
> +        MemoryFence ();
> +
> +        //
> +        // Clear the eject status for CpuIndex to ensure that an invalid
> +        // SMI later does not end up trying to eject it or a newly
> +        // hotplugged CpuIndex does not go into the dead loop.
> +        //
> +        mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
> +
> +        DEBUG ((DEBUG_INFO, "%a: Unplugged CPU %u -> " FMT_APIC_ID "\n",
> +               __FUNCTION__, CpuIndex, RemoveApicId));

(4) The DEBUG_INFO log message is in the right place (and uses the right
debug mask), but it is afflicted by the usual warts (indentation, format
specifiers etc). Please reapply the comments I made elsewhere.


(5a) Please replace "CPU" with "ProcessorNumber" (so that we know it's
the protocol-assigned number, not the QEMU selector).

(5b) Please replace the arrow " -> " with the string " APIC ID ".


Thanks!
Laszlo

> +      }
> +    }
> +
> +    //
> +    // Clear our own worker status.
> +    //
> +    mCpuHotEjectData->ApicIdMap[ProcessorNum] = CPU_EJECT_INVALID;
> +
> +    //
> +    // We are done until the next hot-unplug; clear the handler.
> +    //
> +    mCpuHotEjectData->Handler = NULL;
> +    return;
> +  }
> +
>    //
>    // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
>    // after having been cleared to exit the SMI by the monarch and thus have
> @@ -327,6 +380,19 @@ UnplugCpus (
>    }
>
>    if (EjectCount != 0) {
> +    UINTN  Worker;
> +
> +    Status = mMmCpuService->WhoAmI (mMmCpuService, &Worker);
> +    ASSERT_EFI_ERROR (Status);
> +    //
> +    // UnplugCpus() is called via the root MMI handler and thus we are
> +    // executing in the BSP context.
> +    //
> +    // Mark ourselves as the worker CPU.
> +    //
> +    ASSERT (mCpuHotEjectData->ApicIdMap[Worker] == CPU_EJECT_INVALID);
> +    mCpuHotEjectData->ApicIdMap[Worker] = CPU_EJECT_WORKER;
> +
>      //
>      // We have processors to be ejected; install the handler.
>      //
> @@ -451,11 +517,6 @@ CpuHotplugMmi (
>    if (EFI_ERROR (Status)) {
>      goto Fatal;
>    }
> -  if (ToUnplugCount > 0) {
> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
> -      __FUNCTION__));
> -    goto Fatal;
> -  }
>
>    if (PluggedCount > 0) {
>      Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
>



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71024): https://edk2.groups.io/g/devel/message/71024
Mute This Topic: https://groups.io/mt/80199930/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Posted by Ankur Arora 1 year, 12 months ago
On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
> On 01/29/21 01:59, Ankur Arora wrote:
>> Designate a worker CPU (we use the one executing the root MMI
>> handler), which will do the actual ejection via QEMU in CpuEject().
>>
>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>> selecting its APIC ID and then sending the QEMU "eject" command.
>> QEMU in-turn signals the remote VCPU thread which context-switches
>> it out of the SMI.
>>
>> CpuEject(), on the CPU being ejected, spins around in its holding
>> area until this final context-switch. This does mean that there is
>> some CPU state that would ordinarily be restored (in SmiRendezvous()
>> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
>> This unrestored state includes FPU state, CET enable, stuffing of
>> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
>> this should be okay.
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Aaron Young <aaron.young@oracle.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>   OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73 ++++++++++++++++++++++++++++++++++----
>>   1 file changed, 67 insertions(+), 6 deletions(-)
> 
> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
> and code too).
> 
> 
>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> index 526f51faf070..bf91344eef9c 100644
>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>     CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>     on each CPU at exit from SMM.
>>
>> -  If, the executing CPU is not being ejected, nothing to be done.
>> +  If, the executing CPU is neither a worker, nor being ejected, nothing
>> +  to be done.
>>     If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>     until ejected.
>> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
>> +  for CPUs being ejected.
>>
>>     @param[in] ProcessorNum      Index of executing CPU.
>>
>> @@ -217,6 +220,56 @@ CpuEject (
>>       return;
>>     }
>>
>> +  if (ApicId == CPU_EJECT_WORKER) {
> 
> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
> generality). I wish I understood this idea earlier in the patch set.
> 
> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
> dropped.
> 
> (2b) In this patch, the question whether the executing CPU is the BSP or
> not, should be decided with the same logic that is visible in
> PlatformSmmBspElection()
> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
> 
>    MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>    BOOLEAN                     IsBsp;
> 
>    ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>    IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
> 
> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
> in UnplugCpus() below.
> 
> (2d) The "is worker" language (in comments etc) should be replaced with
> direct "is BSP" language.
> 
> 
>> +    UINT32 CpuIndex;
>> +
>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength; CpuIndex++) {
>> +      UINT64 RemoveApicId;
>> +
>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>> +
>> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
>> +           RemoveApicId != CPU_EJECT_WORKER)) {
>> +        //
>> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
>> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
>> +        // followed by this callback or is already waiting in the
>> +        // CpuDeadLoop() below.
>> +        //
>> +        // Tell QEMU to context-switch it out.
>> +        //
>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
> 
> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
> it's not an invariant. APIC IDs have an internal structure, composed of
> bit-fields, where each bit-field accommodates one hierarchy level in the
> CPU topology (thread, core, die (maybe), and socket).
> 
> However, this mapping need not be surjective. QEMU lets you create
> "pathological" CPU topologies, for example one with:
> - 3 threads/core,
> - 5 cores/socket,
> - (say) 2 sockets.
> 
> Under that example, the bit-field standing for the "thread number" level
> would have 2 bits, theoretically permitting *4* threads/core, and the
> bit-field standing for the "core number" level would have 3 bits,
> theoretically allowing for *8* cores/socket.
> 
> Considering the fully populated topology, you'd see the CPU selector
> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
> processors in total). However, the APIC ID *image* of this CPU selector
> *domain* would not be "contiguous" -- the APIC ID space, with the
> above-described structure, would accommodate 4*8*2=64 logical
> processors. For example, each APIC ID that stood for the nonexistent
> "thread#3" on a particular core would be left unused (no CPU selector
> would map to it).
> 
> All in all, you can't write the APIC ID to the CPU selector register,
> for ejection. You need to select the CPU whose APIC ID is the APIC ID
> you want to eject, and then initiate ejection.

Yeah, this is a clear bug. Should have seen it earlier. Thanks for
pointing it out.

> 
> This requires one of two alternatives:
> 
> 
> (3a) The first option is to keep the change local to this patch.
> 
> This alternative is the more CPU-hungry (and uglier) one.
> 
> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
> there is a match, eject.
> 
> 
> (3b) The second option is much more elegant (and it's faster too), but
> it requires a much more intrusive update to the patch set.
> 
> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
> operates on, has to be changed from APIC_ID to a structure type that
> pairs APIC_ID with the QEMU CPU selector. [*]
> 
> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
> also save the "CurrentSelector" value (in the other field of the output
> array element structure).
> 
> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
> replaced with a structure type similar (or identical) to the one
> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
> the QEMU selector for that processor, and the APIC ID.
> 
> Fourth, the actual ejection should use the selector.
> 
> Fifth, the debug message (below) should continue logging the APIC ID, to
> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
> 
> 
> Would you be willing to implement (3b)?

3b is clearly the better solution. However, is there enough value in
the print message containing APIC ID, that CPU_HOT_EJECT_DATA.ApicIdMap
carry both the cpu-selector and APIC ID?

As you say, the ejection itself just needs the ProcessorNum -> QEMU cpu-selector
mapping.

Ankur

> 
> 
>> +
>> +        //
>> +        // Compiler barrier to ensure the next store isn't reordered
>> +        //
>> +        MemoryFence ();
>> +
>> +        //
>> +        // Clear the eject status for CpuIndex to ensure that an invalid
>> +        // SMI later does not end up trying to eject it or a newly
>> +        // hotplugged CpuIndex does not go into the dead loop.
>> +        //
>> +        mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
>> +
>> +        DEBUG ((DEBUG_INFO, "%a: Unplugged CPU %u -> " FMT_APIC_ID "\n",
>> +               __FUNCTION__, CpuIndex, RemoveApicId));
> 
> (4) The DEBUG_INFO log message is in the right place (and uses the right
> debug mask), but it is afflicted by the usual warts (indentation, format
> specifiers etc). Please reapply the comments I made elsewhere.
> 
> 
> (5a) Please replace "CPU" with "ProcessorNumber" (so that we know it's
> the protocol-assigned number, not the QEMU selector).
> 
> (5b) Please replace the arrow " -> " with the string " APIC ID ".
> 
> 
> Thanks!
> Laszlo
> 
>> +      }
>> +    }
>> +
>> +    //
>> +    // Clear our own worker status.
>> +    //
>> +    mCpuHotEjectData->ApicIdMap[ProcessorNum] = CPU_EJECT_INVALID;
>> +
>> +    //
>> +    // We are done until the next hot-unplug; clear the handler.
>> +    //
>> +    mCpuHotEjectData->Handler = NULL;
>> +    return;
>> +  }
>> +
>>     //
>>     // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
>>     // after having been cleared to exit the SMI by the monarch and thus have
>> @@ -327,6 +380,19 @@ UnplugCpus (
>>     }
>>
>>     if (EjectCount != 0) {
>> +    UINTN  Worker;
>> +
>> +    Status = mMmCpuService->WhoAmI (mMmCpuService, &Worker);
>> +    ASSERT_EFI_ERROR (Status);
>> +    //
>> +    // UnplugCpus() is called via the root MMI handler and thus we are
>> +    // executing in the BSP context.
>> +    //
>> +    // Mark ourselves as the worker CPU.
>> +    //
>> +    ASSERT (mCpuHotEjectData->ApicIdMap[Worker] == CPU_EJECT_INVALID);
>> +    mCpuHotEjectData->ApicIdMap[Worker] = CPU_EJECT_WORKER;
>> +
>>       //
>>       // We have processors to be ejected; install the handler.
>>       //
>> @@ -451,11 +517,6 @@ CpuHotplugMmi (
>>     if (EFI_ERROR (Status)) {
>>       goto Fatal;
>>     }
>> -  if (ToUnplugCount > 0) {
>> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
>> -      __FUNCTION__));
>> -    goto Fatal;
>> -  }
>>
>>     if (PluggedCount > 0) {
>>       Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
>>
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71030): https://edk2.groups.io/g/devel/message/71030
Mute This Topic: https://groups.io/mt/80199930/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Posted by Laszlo Ersek 1 year, 12 months ago
On 02/01/21 20:21, Ankur Arora wrote:
> On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
>> On 01/29/21 01:59, Ankur Arora wrote:
>>> Designate a worker CPU (we use the one executing the root MMI
>>> handler), which will do the actual ejection via QEMU in CpuEject().
>>>
>>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>>> selecting its APIC ID and then sending the QEMU "eject" command.
>>> QEMU in-turn signals the remote VCPU thread which context-switches
>>> it out of the SMI.
>>>
>>> CpuEject(), on the CPU being ejected, spins around in its holding
>>> area until this final context-switch. This does mean that there is
>>> some CPU state that would ordinarily be restored (in SmiRendezvous()
>>> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
>>> This unrestored state includes FPU state, CET enable, stuffing of
>>> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
>>> this should be okay.
>>>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Cc: Aaron Young <aaron.young@oracle.com>
>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>> ---
>>>   OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73
>>> ++++++++++++++++++++++++++++++++++----
>>>   1 file changed, 67 insertions(+), 6 deletions(-)
>>
>> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
>> and code too).
>>
>>
>>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>> b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>> index 526f51faf070..bf91344eef9c 100644
>>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>>     CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>>     on each CPU at exit from SMM.
>>>
>>> -  If, the executing CPU is not being ejected, nothing to be done.
>>> +  If, the executing CPU is neither a worker, nor being ejected, nothing
>>> +  to be done.
>>>     If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>>     until ejected.
>>> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
>>> +  for CPUs being ejected.
>>>
>>>     @param[in] ProcessorNum      Index of executing CPU.
>>>
>>> @@ -217,6 +220,56 @@ CpuEject (
>>>       return;
>>>     }
>>>
>>> +  if (ApicId == CPU_EJECT_WORKER) {
>>
>> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
>> generality). I wish I understood this idea earlier in the patch set.
>>
>> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
>> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
>> dropped.
>>
>> (2b) In this patch, the question whether the executing CPU is the BSP or
>> not, should be decided with the same logic that is visible in
>> PlatformSmmBspElection()
>> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
>>
>>    MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>>    BOOLEAN                     IsBsp;
>>
>>    ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>>    IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>>
>> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
>> in UnplugCpus() below.
>>
>> (2d) The "is worker" language (in comments etc) should be replaced with
>> direct "is BSP" language.
>>
>>
>>> +    UINT32 CpuIndex;
>>> +
>>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength;
>>> CpuIndex++) {
>>> +      UINT64 RemoveApicId;
>>> +
>>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>>> +
>>> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
>>> +           RemoveApicId != CPU_EJECT_WORKER)) {
>>> +        //
>>> +        // This to-be-ejected-CPU has already received the BSP's SMI
>>> exit
>>> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
>>> +        // followed by this callback or is already waiting in the
>>> +        // CpuDeadLoop() below.
>>> +        //
>>> +        // Tell QEMU to context-switch it out.
>>> +        //
>>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
>>
>> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
>> it's not an invariant. APIC IDs have an internal structure, composed of
>> bit-fields, where each bit-field accommodates one hierarchy level in the
>> CPU topology (thread, core, die (maybe), and socket).
>>
>> However, this mapping need not be surjective. QEMU lets you create
>> "pathological" CPU topologies, for example one with:
>> - 3 threads/core,
>> - 5 cores/socket,
>> - (say) 2 sockets.
>>
>> Under that example, the bit-field standing for the "thread number" level
>> would have 2 bits, theoretically permitting *4* threads/core, and the
>> bit-field standing for the "core number" level would have 3 bits,
>> theoretically allowing for *8* cores/socket.
>>
>> Considering the fully populated topology, you'd see the CPU selector
>> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
>> processors in total). However, the APIC ID *image* of this CPU selector
>> *domain* would not be "contiguous" -- the APIC ID space, with the
>> above-described structure, would accommodate 4*8*2=64 logical
>> processors. For example, each APIC ID that stood for the nonexistent
>> "thread#3" on a particular core would be left unused (no CPU selector
>> would map to it).
>>
>> All in all, you can't write the APIC ID to the CPU selector register,
>> for ejection. You need to select the CPU whose APIC ID is the APIC ID
>> you want to eject, and then initiate ejection.
> 
> Yeah, this is a clear bug. Should have seen it earlier. Thanks for
> pointing it out.
> 
>>
>> This requires one of two alternatives:
>>
>>
>> (3a) The first option is to keep the change local to this patch.
>>
>> This alternative is the more CPU-hungry (and uglier) one.
>>
>> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
>> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
>> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
>> there is a match, eject.
>>
>>
>> (3b) The second option is much more elegant (and it's faster too), but
>> it requires a much more intrusive update to the patch set.
>>
>> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
>> operates on, has to be changed from APIC_ID to a structure type that
>> pairs APIC_ID with the QEMU CPU selector. [*]
>>
>> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
>> also save the "CurrentSelector" value (in the other field of the output
>> array element structure).
>>
>> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
>> replaced with a structure type similar (or identical) to the one
>> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
>> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
>> the QEMU selector for that processor, and the APIC ID.
>>
>> Fourth, the actual ejection should use the selector.
>>
>> Fifth, the debug message (below) should continue logging the APIC ID, to
>> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
>>
>>
>> Would you be willing to implement (3b)?
> 
> 3b is clearly the better solution. However, is there enough value in
> the print message containing APIC ID, that CPU_HOT_EJECT_DATA.ApicIdMap
> carry both the cpu-selector and APIC ID?
> 
> As you say, the ejection itself just needs the ProcessorNum -> QEMU
> cpu-selector mapping.

Good question, and I'm torn.

The default DEBUG build of OVMF enables INFO messages, and more severe
ones. It does not enable VERBOSE messages.

In such a DEBUG build (i.e., with otherwise default settings), the log
entries that relate to hot-plug do not report selectors. Conversely, on
hot-unplug, the log would report selectors *only* (not APIC IDs). I feel
this makes it more difficult to get useful OVMF debug logs in bug
reports, especially from users of distro-provided OVMF builds.

If we can ask the user to rebuild with VERBOSE enabled and re-run the
test, that's always great, but frequently users don't know how to
rebuild OVMF, plus usually OVMF is hidden so deep in their virt stack
that they have no idea how to make that stack use a custom OVMF build.

... How about this: in UnplugCpus(), we could emit an INFO message about
the ProcessorNumber <-> APIC ID <-> Selector correspondence, and when
the eject happens, it would suffice to log (at INFO level) only
ProcessorNumber <-> Selector.

Thanks,
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71064): https://edk2.groups.io/g/devel/message/71064
Mute This Topic: https://groups.io/mt/80199930/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Posted by Ankur Arora 1 year, 12 months ago
On 2021-02-02 5:23 a.m., Laszlo Ersek wrote:
> On 02/01/21 20:21, Ankur Arora wrote:
>> On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
>>> On 01/29/21 01:59, Ankur Arora wrote:
>>>> Designate a worker CPU (we use the one executing the root MMI
>>>> handler), which will do the actual ejection via QEMU in CpuEject().
>>>>
>>>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>>>> selecting its APIC ID and then sending the QEMU "eject" command.
>>>> QEMU in-turn signals the remote VCPU thread which context-switches
>>>> it out of the SMI.
>>>>
>>>> CpuEject(), on the CPU being ejected, spins around in its holding
>>>> area until this final context-switch. This does mean that there is
>>>> some CPU state that would ordinarily be restored (in SmiRendezvous()
>>>> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
>>>> This unrestored state includes FPU state, CET enable, stuffing of
>>>> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
>>>> this should be okay.
>>>>
>>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>> Cc: Aaron Young <aaron.young@oracle.com>
>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>> ---
>>>>    OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73
>>>> ++++++++++++++++++++++++++++++++++----
>>>>    1 file changed, 67 insertions(+), 6 deletions(-)
>>>
>>> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
>>> and code too).
>>>
>>>
>>>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> index 526f51faf070..bf91344eef9c 100644
>>>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>>>      CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>>>      on each CPU at exit from SMM.
>>>>
>>>> -  If, the executing CPU is not being ejected, nothing to be done.
>>>> +  If, the executing CPU is neither a worker, nor being ejected, nothing
>>>> +  to be done.
>>>>      If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>>>      until ejected.
>>>> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
>>>> +  for CPUs being ejected.
>>>>
>>>>      @param[in] ProcessorNum      Index of executing CPU.
>>>>
>>>> @@ -217,6 +220,56 @@ CpuEject (
>>>>        return;
>>>>      }
>>>>
>>>> +  if (ApicId == CPU_EJECT_WORKER) {
>>>
>>> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
>>> generality). I wish I understood this idea earlier in the patch set.
>>>
>>> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
>>> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
>>> dropped.
>>>
>>> (2b) In this patch, the question whether the executing CPU is the BSP or
>>> not, should be decided with the same logic that is visible in
>>> PlatformSmmBspElection()
>>> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
>>>
>>>     MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>>>     BOOLEAN                     IsBsp;
>>>
>>>     ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>>>     IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>>>
>>> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
>>> in UnplugCpus() below.
>>>
>>> (2d) The "is worker" language (in comments etc) should be replaced with
>>> direct "is BSP" language.
>>>
>>>
>>>> +    UINT32 CpuIndex;
>>>> +
>>>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength;
>>>> CpuIndex++) {
>>>> +      UINT64 RemoveApicId;
>>>> +
>>>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>>>> +
>>>> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
>>>> +           RemoveApicId != CPU_EJECT_WORKER)) {
>>>> +        //
>>>> +        // This to-be-ejected-CPU has already received the BSP's SMI
>>>> exit
>>>> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
>>>> +        // followed by this callback or is already waiting in the
>>>> +        // CpuDeadLoop() below.
>>>> +        //
>>>> +        // Tell QEMU to context-switch it out.
>>>> +        //
>>>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>>>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
>>>
>>> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
>>> it's not an invariant. APIC IDs have an internal structure, composed of
>>> bit-fields, where each bit-field accommodates one hierarchy level in the
>>> CPU topology (thread, core, die (maybe), and socket).
>>>
>>> However, this mapping need not be surjective. QEMU lets you create
>>> "pathological" CPU topologies, for example one with:
>>> - 3 threads/core,
>>> - 5 cores/socket,
>>> - (say) 2 sockets.
>>>
>>> Under that example, the bit-field standing for the "thread number" level
>>> would have 2 bits, theoretically permitting *4* threads/core, and the
>>> bit-field standing for the "core number" level would have 3 bits,
>>> theoretically allowing for *8* cores/socket.
>>>
>>> Considering the fully populated topology, you'd see the CPU selector
>>> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
>>> processors in total). However, the APIC ID *image* of this CPU selector
>>> *domain* would not be "contiguous" -- the APIC ID space, with the
>>> above-described structure, would accommodate 4*8*2=64 logical
>>> processors. For example, each APIC ID that stood for the nonexistent
>>> "thread#3" on a particular core would be left unused (no CPU selector
>>> would map to it).
>>>
>>> All in all, you can't write the APIC ID to the CPU selector register,
>>> for ejection. You need to select the CPU whose APIC ID is the APIC ID
>>> you want to eject, and then initiate ejection.
>>
>> Yeah, this is a clear bug. Should have seen it earlier. Thanks for
>> pointing it out.
>>
>>>
>>> This requires one of two alternatives:
>>>
>>>
>>> (3a) The first option is to keep the change local to this patch.
>>>
>>> This alternative is the more CPU-hungry (and uglier) one.
>>>
>>> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
>>> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
>>> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
>>> there is a match, eject.
>>>
>>>
>>> (3b) The second option is much more elegant (and it's faster too), but
>>> it requires a much more intrusive update to the patch set.
>>>
>>> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
>>> operates on, has to be changed from APIC_ID to a structure type that
>>> pairs APIC_ID with the QEMU CPU selector. [*]
>>>
>>> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
>>> also save the "CurrentSelector" value (in the other field of the output
>>> array element structure).
>>>
>>> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
>>> replaced with a structure type similar (or identical) to the one
>>> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
>>> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
>>> the QEMU selector for that processor, and the APIC ID.
>>>
>>> Fourth, the actual ejection should use the selector.
>>>
>>> Fifth, the debug message (below) should continue logging the APIC ID, to
>>> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
>>>
>>>
>>> Would you be willing to implement (3b)?
>>
>> 3b is clearly the better solution. However, is there enough value in
>> the print message containing APIC ID, that CPU_HOT_EJECT_DATA.ApicIdMap
>> carry both the cpu-selector and APIC ID?
>>
>> As you say, the ejection itself just needs the ProcessorNum -> QEMU
>> cpu-selector mapping.
> 
> Good question, and I'm torn.
> 
> The default DEBUG build of OVMF enables INFO messages, and more severe
> ones. It does not enable VERBOSE messages.
> 
> In such a DEBUG build (i.e., with otherwise default settings), the log
> entries that relate to hot-plug do not report selectors. Conversely, on
> hot-unplug, the log would report selectors *only* (not APIC IDs). I feel
> this makes it more difficult to get useful OVMF debug logs in bug
> reports, especially from users of distro-provided OVMF builds.
> 
> If we can ask the user to rebuild with VERBOSE enabled and re-run the
> test, that's always great, but frequently users don't know how to
> rebuild OVMF, plus usually OVMF is hidden so deep in their virt stack
> that they have no idea how to make that stack use a custom OVMF build.
> 
> ... How about this: in UnplugCpus(), we could emit an INFO message about
> the ProcessorNumber <-> APIC ID <-> Selector correspondence, and when
> the eject happens, it would suffice to log (at INFO level) only
> ProcessorNumber <-> Selector.

This makes sense to me. Will do.

Thanks
Ankur

> 
> Thanks,
> Laszlo
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71092): https://edk2.groups.io/g/devel/message/71092
Mute This Topic: https://groups.io/mt/80199930/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-