[edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject

Ankur Arora posted 10 patches 3 years, 8 months ago
There is a newer version of this series
[edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Posted by Ankur Arora 3 years, 8 months ago
Add logic in EjectCpu() to do the actual the CPU ejection.

On the BSP, ejection happens by first selecting the CPU via
its QemuSelector and then sending the QEMU "eject" command.
QEMU in-turn signals the remote VCPU thread which context-switches
the CPU out of the SMI handler.

Meanwhile the CPU being ejected, waits around in its holding
area until it is context-switched out. Note that it is possible
that a slow CPU gets ejected before it reaches the wait loop.
However, this would never happen before it has executed the
"AllCpusInSync" loop in SmiRendezvous().
It can mean that an ejected CPU does not execute code after
that point but given that the CPU state will be destroyed by
QEMU, the missed cleanup is no great loss.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Aaron Young <aaron.young@oracle.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---

Notes:
    Addresses the following reviewing comments from v6:
    (1) s/CpuEject/EjectCpu/g
    (2,2a,2c) Get rid of eject-worker and related.
    (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP.
    (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to
     do the actual ejection.
    (4,5a,5b) Fix the format etc in the final unplugged log message
    () Also as discussed elsewhere document the ordering requirements for
     mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler.
    () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this
     patch.
    () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/

 OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h  |   1 +
 OvmfPkg/CpuHotplugSmm/CpuHotplug.c                 | 127 +++++++++++++++++++--
 .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c  |  31 +++++
 3 files changed, 152 insertions(+), 7 deletions(-)

diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
index 2ec7a107a64d..d0e83102c13f 100644
--- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
+++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
@@ -34,6 +34,7 @@
 #define QEMU_CPUHP_STAT_ENABLED                BIT0
 #define QEMU_CPUHP_STAT_INSERT                 BIT1
 #define QEMU_CPUHP_STAT_REMOVE                 BIT2
+#define QEMU_CPUHP_STAT_EJECT                  BIT3
 #define QEMU_CPUHP_STAT_FW_REMOVE              BIT4
 
 #define QEMU_CPUHP_RW_CMD_DATA               0x8
diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
index 851e2b28aad9..0484be8fe43c 100644
--- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
+++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
@@ -18,6 +18,7 @@
 #include <Pcd/CpuHotEjectData.h>             // CPU_HOT_EJECT_DATA
 #include <Protocol/MmCpuIo.h>                // EFI_MM_CPU_IO_PROTOCOL
 #include <Protocol/SmmCpuService.h>          // EFI_SMM_CPU_SERVICE_PROTOCOL
+#include <Register/Intel/ArchitecturalMsr.h> // MSR_IA32_APIC_BASE_REGISTER
 #include <Uefi/UefiBaseType.h>               // EFI_STATUS
 
 #include "ApicId.h"                          // APIC_ID
@@ -191,12 +192,39 @@ RevokeNewSlot:
 }
 
 /**
+  EjectCpu needs to know the BSP at SMI exit at a point when
+  some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn
+  down.
+  Reuse the logic from OvmfPkg::PlatformSmmBspElection() to
+  do that.
+
+  @param[in] ProcessorNum      ProcessorNum denotes the processor handle number
+                               in EFI_SMM_CPU_SERVICE_PROTOCOL.
+**/
+STATIC
+BOOLEAN
+CheckIfBsp (
+  IN UINTN ProcessorNum
+  )
+{
+  MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
+  BOOLEAN IsBsp;
+
+  ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
+  IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
+  return IsBsp;
+}
+
+/**
   CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
   on each CPU at exit from SMM.
 
-  If, the executing CPU is not being ejected, nothing to be done.
+  If, the executing CPU is neither the BSP, nor being ejected, nothing
+  to be done.
   If, the executing CPU is being ejected, wait in a halted loop
   until ejected.
+  If, the executing CPU is the BSP, set QEMU CPU status to eject
+  for CPUs being ejected.
 
   @param[in] ProcessorNum      ProcessorNum denotes the CPU exiting SMM,
                                and will be used as an index into
@@ -211,9 +239,99 @@ EjectCpu (
   )
 {
   UINT64 QemuSelector;
+  BOOLEAN IsBsp;
 
+  IsBsp = CheckIfBsp (ProcessorNum);
+
+  //
+  // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated
+  // on the BSP in the ongoing SMI iteration at two places:
+  //
+  // - UnplugCpus() where the BSP determines if a CPU is under ejection
+  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
+  //   describes any such updates are guaranteed to be ordered-before the
+  //   dereference below.
+  //
+  // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for
+  //   CPUs after they have been hot-ejected.
+  //
+  //   The CPU under ejection: might be executing anywhere between the
+  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
+  //   dereference QemuSelectorMap[ProcessorNum].
+  //   Given that the BSP ensures that this store only happens after the
+  //   CPU has been ejected, this CPU would never see the after value.
+  //   (Note that any CPU that is already executing the CpuSleep() loop
+  //   below never raced any updates and always saw the before value.)
+  //
+  //   CPUs not-under ejection: never see any changes so they are fine.
+  //
+  //   Lastly, note that we are also guaranteed that any dereferencing
+  //   CPU only sees the before or after value and not an intermediate
+  //   value. This is because QemuSelectorMap[ProcessorNum] is aligned at
+  //   a natural boundary.
+  //
   QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
-  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
+  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) {
+    return;
+  }
+
+  if (IsBsp) {
+    UINT32 Idx;
+
+    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
+      UINT64 QemuSelector;
+
+      QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
+
+      if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
+        //
+        // This to-be-ejected-CPU has already received the BSP's SMI exit
+        // signal and, will execute SmmCpuFeaturesRendezvousExit()
+        // followed by this callback or is already waiting in the
+        // CpuSleep() loop below.
+        //
+        // Tell QEMU to context-switch it out.
+        //
+        QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
+        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
+
+        //
+        // We need a compiler barrier here to ensure that the compiler
+        // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
+        //
+        // A store fence is not strictly necessary on x86 which has
+        // TSO; however, both of these stores are in different address spaces
+        // so also add a Store Fence here.
+        //
+        MemoryFence ();
+
+        //
+        // Clear the eject status for this CPU Idx to ensure that an invalid
+        // SMI later does not end up trying to eject it or a newly
+        // hotplugged CPU Idx does not go into the dead loop.
+        //
+        mCpuHotEjectData->QemuSelectorMap[Idx] =
+          CPU_EJECT_QEMU_SELECTOR_INVALID;
+
+        DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, "
+          "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector));
+      }
+    }
+
+    //
+    // We are done until the next hot-unplug; clear the handler.
+    //
+    // By virtue of the MemoryFence() in the ejection loop above, the
+    // following store is ordered-after all the ejections are done.
+    // (We know that there is at least one CPU hot-eject handler if this
+    // handler was installed.)
+    //
+    // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this
+    // means that the only CPUs which might dereference
+    // mCpuHotEjectData->Handler are not under ejection, so we can
+    // safely reset.
+    //
+    mCpuHotEjectData->Handler = NULL;
     return;
   }
 
@@ -496,11 +614,6 @@ CpuHotplugMmi (
   if (EFI_ERROR (Status)) {
     goto Fatal;
   }
-  if (ToUnplugCount > 0) {
-    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
-      __FUNCTION__));
-    goto Fatal;
-  }
 
   if (PluggedCount > 0) {
     Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
index 99988285b6a2..ddfef05ee6cf 100644
--- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
+++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
@@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
   // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
   // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
   //
+  // mCpuHotEjectData itself is stable once setup so it can be
+  // dereferenced without needing any synchronization,
+  // but, mCpuHotEjectData->Handler is updated on the BSP in the
+  // ongoing SMI iteration at two places:
+  //
+  // - UnplugCpus() where the BSP determines if a CPU is under ejection
+  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
+  //   describes any such updates are guaranteed to be ordered-before the
+  //   dereference below.
+  //
+  // - EjectCpu() (which is called via the Handler below), on the BSP
+  //   updates mCpuHotEjectData->Handler once it is done with all ejections.
+  //
+  //   The CPU under ejection: might be executing anywhere between the
+  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
+  //   dereference the Handler field.
+  //   Given that the BSP ensures that this store only happens after all
+  //   CPUs under ejection have been ejected, this CPU would never see
+  //   the after value.
+  //   (Note that any CPU that is already executing the CpuSleep() loop
+  //   below never raced any updates and always saw the before value.)
+  //
+  //   CPUs not-under ejection: might see either value of the Handler
+  //   which is fine, because the Handler is a NOP for CPUs not-under
+  //   ejection.
+  //
+  //   Lastly, note that we are also guaranteed that any dereferencing
+  //   CPU only sees the before or after value and not an intermediate
+  //   value. This is because mCpuHotEjectData->Handler is aligned at a
+  //   natural boundary.
+  //
 
   if (mCpuHotEjectData != NULL) {
     CPU_HOT_EJECT_HANDLER Handler;
-- 
2.9.3



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71920): https://edk2.groups.io/g/devel/message/71920
Mute This Topic: https://groups.io/mt/80819864/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Posted by Laszlo Ersek 3 years, 8 months ago
On 02/22/21 08:19, Ankur Arora wrote:
> Add logic in EjectCpu() to do the actual the CPU ejection.
> 
> On the BSP, ejection happens by first selecting the CPU via
> its QemuSelector and then sending the QEMU "eject" command.
> QEMU in-turn signals the remote VCPU thread which context-switches
> the CPU out of the SMI handler.
> 
> Meanwhile the CPU being ejected, waits around in its holding
> area until it is context-switched out. Note that it is possible
> that a slow CPU gets ejected before it reaches the wait loop.
> However, this would never happen before it has executed the
> "AllCpusInSync" loop in SmiRendezvous().
> It can mean that an ejected CPU does not execute code after
> that point but given that the CPU state will be destroyed by
> QEMU, the missed cleanup is no great loss.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Aaron Young <aaron.young@oracle.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> 
> Notes:
>     Addresses the following reviewing comments from v6:
>     (1) s/CpuEject/EjectCpu/g
>     (2,2a,2c) Get rid of eject-worker and related.
>     (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP.
>     (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to
>      do the actual ejection.
>     (4,5a,5b) Fix the format etc in the final unplugged log message
>     () Also as discussed elsewhere document the ordering requirements for
>      mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler.
>     () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this
>      patch.
>     () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/
> 
>  OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h  |   1 +
>  OvmfPkg/CpuHotplugSmm/CpuHotplug.c                 | 127 +++++++++++++++++++--
>  .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c  |  31 +++++
>  3 files changed, 152 insertions(+), 7 deletions(-)
> 
> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> index 2ec7a107a64d..d0e83102c13f 100644
> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> @@ -34,6 +34,7 @@
>  #define QEMU_CPUHP_STAT_ENABLED                BIT0
>  #define QEMU_CPUHP_STAT_INSERT                 BIT1
>  #define QEMU_CPUHP_STAT_REMOVE                 BIT2
> +#define QEMU_CPUHP_STAT_EJECT                  BIT3
>  #define QEMU_CPUHP_STAT_FW_REMOVE              BIT4
>  
>  #define QEMU_CPUHP_RW_CMD_DATA               0x8
> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> index 851e2b28aad9..0484be8fe43c 100644
> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> @@ -18,6 +18,7 @@
>  #include <Pcd/CpuHotEjectData.h>             // CPU_HOT_EJECT_DATA
>  #include <Protocol/MmCpuIo.h>                // EFI_MM_CPU_IO_PROTOCOL
>  #include <Protocol/SmmCpuService.h>          // EFI_SMM_CPU_SERVICE_PROTOCOL
> +#include <Register/Intel/ArchitecturalMsr.h> // MSR_IA32_APIC_BASE_REGISTER
>  #include <Uefi/UefiBaseType.h>               // EFI_STATUS
>  
>  #include "ApicId.h"                          // APIC_ID
> @@ -191,12 +192,39 @@ RevokeNewSlot:
>  }
>  
>  /**
> +  EjectCpu needs to know the BSP at SMI exit at a point when
> +  some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn
> +  down.
> +  Reuse the logic from OvmfPkg::PlatformSmmBspElection() to
> +  do that.
> +
> +  @param[in] ProcessorNum      ProcessorNum denotes the processor handle number
> +                               in EFI_SMM_CPU_SERVICE_PROTOCOL.
> +**/
> +STATIC
> +BOOLEAN
> +CheckIfBsp (
> +  IN UINTN ProcessorNum
> +  )

(1a) Please remove the ProcessorNum parameter -- comment and parameter
list alike --; it's useless. In the parameter list, just replace the
line's contents with "VOID".

(1b) Additionally, please state:

  @retval TRUE   If the CPU executing this function is the BSP.

  @retval FALSE  If the CPU executing this function is an AP.




> +{
> +  MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
> +  BOOLEAN IsBsp;

(2) "IsBsp" should line up with "ApicBaseMsr".


> +
> +  ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
> +  IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
> +  return IsBsp;
> +}
> +
> +/**
>    CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
>    on each CPU at exit from SMM.
>  
> -  If, the executing CPU is not being ejected, nothing to be done.
> +  If, the executing CPU is neither the BSP, nor being ejected, nothing
> +  to be done.
>    If, the executing CPU is being ejected, wait in a halted loop
>    until ejected.
> +  If, the executing CPU is the BSP, set QEMU CPU status to eject
> +  for CPUs being ejected.
>  
>    @param[in] ProcessorNum      ProcessorNum denotes the CPU exiting SMM,
>                                 and will be used as an index into
> @@ -211,9 +239,99 @@ EjectCpu (
>    )
>  {
>    UINT64 QemuSelector;
> +  BOOLEAN IsBsp;
>  
> +  IsBsp = CheckIfBsp (ProcessorNum);
> +
> +  //
> +  // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated
> +  // on the BSP in the ongoing SMI iteration at two places:

(3) I feel "iteration" doesn't really apply; I'd simply drop that word.
"ongoing SMI" seems sufficient (or maybe "ongoing SMI handling").


> +  //
> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
> +  //   describes any such updates are guaranteed to be ordered-before the
> +  //   dereference below.
> +  //
> +  // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for
> +  //   CPUs after they have been hot-ejected.
> +  //
> +  //   The CPU under ejection: might be executing anywhere between the
> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
> +  //   dereference QemuSelectorMap[ProcessorNum].
> +  //   Given that the BSP ensures that this store only happens after the
> +  //   CPU has been ejected, this CPU would never see the after value.
> +  //   (Note that any CPU that is already executing the CpuSleep() loop
> +  //   below never raced any updates and always saw the before value.)
> +  //
> +  //   CPUs not-under ejection: never see any changes so they are fine.
> +  //
> +  //   Lastly, note that we are also guaranteed that any dereferencing
> +  //   CPU only sees the before or after value and not an intermediate
> +  //   value. This is because QemuSelectorMap[ProcessorNum] is aligned at
> +  //   a natural boundary.
> +  //

(4) Well, about the last paragraph -- when I saw that you used UINT64 in
QemuSelectorMap, to allow for a sentinel value, I immediately thought of
IA32. This module is built for IA32 too, and there I'm not so sure about
the atomicity of UINT64 accesses.

I didn't raise it at that point because I wasn't sure if we were
actually going to rely on the atomicity. And, I don't think we are.
Again, let me quote Paolo's example from <https://lwn.net/Articles/844224/>:

    thread 1                              thread 2
    --------------------------------      ------------------------
    a.x = 1;
    smp_wmb();
    WRITE_ONCE(message, &a);              datum = READ_ONCE(message);
                                          smp_rmb();
                                          if (datum != NULL)
                                            printk("%x\n", datum->x);

The access to object "x" is not restricted in any particular way. In
thread#1, we have an smp_wmb() which enforces both a compiler barrier
and a store-release fence, and then we have an atomic access to
"message". But the "a.x" assignment need not be atomic.

In our case, "a.x = 1" maps to (a) populating "QemuSelectorMap", and (b)
setting up the "Handler" function pointer, in UnplugCpus(). The
smp_wmb() is the "ReleaseMemoryFence" in UnplugCpus(). The WRITE_ONCE is
the final "AllCpusInSync = FALSE" assignment in BSPHandler()
[UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c].

Regarding thread#2, the READ_ONCE is matched by the "AllCpusInSync"
loop. smp_rmb() is the "AcquireMemoryFence" I called for, under PATCH v8
07/10, "OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler". And the
"datum" access happens (a) in SmmCpuFeaturesRendezvousExit(), where we
fetch/call Handler, and (b) here, where we read/modify "QemuSelectorMap".

So this seems to imply we can:

- drop the natural-alignment padding of "QemuSelectorMap", from PATCH v8
06/10, "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state",

- drop the last paragraph of the comment above.


>    QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
> -  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
> +  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) {
> +    return;
> +  }
> +
> +  if (IsBsp) {

(5) Can you reorder the high-level flow here? Such as:

  if (CheckIfBsp ()) {
    //
    // send the eject requests to QEMU here
    //
    return;
  }

  //
  // Reached only on APs:
  //
  QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
    return;
  }

  //
  // AP being hot-ejected; pen it here.
  //


> +    UINT32 Idx;
> +
> +    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
> +      UINT64 QemuSelector;
> +
> +      QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
> +
> +      if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
> +        //
> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
> +        // signal and, will execute SmmCpuFeaturesRendezvousExit()
> +        // followed by this callback or is already waiting in the
> +        // CpuSleep() loop below.
> +        //
> +        // Tell QEMU to context-switch it out.
> +        //
> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
> +
> +        //
> +        // We need a compiler barrier here to ensure that the compiler
> +        // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
> +        //
> +        // A store fence is not strictly necessary on x86 which has
> +        // TSO; however, both of these stores are in different address spaces
> +        // so also add a Store Fence here.
> +        //
> +        MemoryFence ();

(6) I wonder if this compiler barrier + comment block are helpful.
Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
didn't contain built-in fences, all hell would break lose. We're using
EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
ordering-wise, even without an explicit compiler barrier here.

To me personally, this particular fence only muddies the picture --
where we already have an acquire memory fence and a store memory fence
to couple with each other.

I'd recommend removing this. (If you disagree, I'm willing to listen to
arguments, of course!)


> +
> +        //
> +        // Clear the eject status for this CPU Idx to ensure that an invalid
> +        // SMI later does not end up trying to eject it or a newly
> +        // hotplugged CPU Idx does not go into the dead loop.
> +        //
> +        mCpuHotEjectData->QemuSelectorMap[Idx] =
> +          CPU_EJECT_QEMU_SELECTOR_INVALID;
> +
> +        DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, "
> +          "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector));

(7) Please log QemuSelector with "%Lu".


> +      }
> +    }
> +
> +    //
> +    // We are done until the next hot-unplug; clear the handler.
> +    //
> +    // By virtue of the MemoryFence() in the ejection loop above, the
> +    // following store is ordered-after all the ejections are done.
> +    // (We know that there is at least one CPU hot-eject handler if this
> +    // handler was installed.)
> +    //
> +    // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this
> +    // means that the only CPUs which might dereference
> +    // mCpuHotEjectData->Handler are not under ejection, so we can
> +    // safely reset.
> +    //
> +    mCpuHotEjectData->Handler = NULL;
>      return;
>    }
>  
> @@ -496,11 +614,6 @@ CpuHotplugMmi (
>    if (EFI_ERROR (Status)) {
>      goto Fatal;
>    }
> -  if (ToUnplugCount > 0) {
> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
> -      __FUNCTION__));
> -    goto Fatal;
> -  }
>  
>    if (PluggedCount > 0) {
>      Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> index 99988285b6a2..ddfef05ee6cf 100644
> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>    // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>    // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
>    //
> +  // mCpuHotEjectData itself is stable once setup so it can be
> +  // dereferenced without needing any synchronization,
> +  // but, mCpuHotEjectData->Handler is updated on the BSP in the
> +  // ongoing SMI iteration at two places:
> +  //
> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
> +  //   describes any such updates are guaranteed to be ordered-before the
> +  //   dereference below.
> +  //
> +  // - EjectCpu() (which is called via the Handler below), on the BSP
> +  //   updates mCpuHotEjectData->Handler once it is done with all ejections.
> +  //
> +  //   The CPU under ejection: might be executing anywhere between the
> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
> +  //   dereference the Handler field.
> +  //   Given that the BSP ensures that this store only happens after all
> +  //   CPUs under ejection have been ejected, this CPU would never see
> +  //   the after value.
> +  //   (Note that any CPU that is already executing the CpuSleep() loop
> +  //   below never raced any updates and always saw the before value.)
> +  //
> +  //   CPUs not-under ejection: might see either value of the Handler
> +  //   which is fine, because the Handler is a NOP for CPUs not-under
> +  //   ejection.
> +  //
> +  //   Lastly, note that we are also guaranteed that any dereferencing
> +  //   CPU only sees the before or after value and not an intermediate
> +  //   value. This is because mCpuHotEjectData->Handler is aligned at a
> +  //   natural boundary.
> +  //
>  
>    if (mCpuHotEjectData != NULL) {
>      CPU_HOT_EJECT_HANDLER Handler;
> 

(8) I can't really put my finger on it, I just feel that repeating
(open-coding) this wall of text here is not really productive.

Do you think that, after you add the "acquire memory fence" comment in
patch #7, we could avoid most of the text here? I think we should only
point out (in patch #7) the "release fence" that the logic here pairs with.

If you really want to present it all from both perspectives, I guess I'm
OK with that, but then I believe we should drop the last paragraph at
least (see point (4)).

Thanks
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72116): https://edk2.groups.io/g/devel/message/72116
Mute This Topic: https://groups.io/mt/80819864/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Posted by Ankur Arora 3 years, 8 months ago
On 2021-02-23 1:39 p.m., Laszlo Ersek wrote:
> On 02/22/21 08:19, Ankur Arora wrote:
>> Add logic in EjectCpu() to do the actual the CPU ejection.
>>
>> On the BSP, ejection happens by first selecting the CPU via
>> its QemuSelector and then sending the QEMU "eject" command.
>> QEMU in-turn signals the remote VCPU thread which context-switches
>> the CPU out of the SMI handler.
>>
>> Meanwhile the CPU being ejected, waits around in its holding
>> area until it is context-switched out. Note that it is possible
>> that a slow CPU gets ejected before it reaches the wait loop.
>> However, this would never happen before it has executed the
>> "AllCpusInSync" loop in SmiRendezvous().
>> It can mean that an ejected CPU does not execute code after
>> that point but given that the CPU state will be destroyed by
>> QEMU, the missed cleanup is no great loss.
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Aaron Young <aaron.young@oracle.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>
>> Notes:
>>      Addresses the following reviewing comments from v6:
>>      (1) s/CpuEject/EjectCpu/g
>>      (2,2a,2c) Get rid of eject-worker and related.
>>      (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP.
>>      (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to
>>       do the actual ejection.
>>      (4,5a,5b) Fix the format etc in the final unplugged log message
>>      () Also as discussed elsewhere document the ordering requirements for
>>       mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler.
>>      () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this
>>       patch.
>>      () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/
>>
>>   OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h  |   1 +
>>   OvmfPkg/CpuHotplugSmm/CpuHotplug.c                 | 127 +++++++++++++++++++--
>>   .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c  |  31 +++++
>>   3 files changed, 152 insertions(+), 7 deletions(-)
>>
>> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> index 2ec7a107a64d..d0e83102c13f 100644
>> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> @@ -34,6 +34,7 @@
>>   #define QEMU_CPUHP_STAT_ENABLED                BIT0
>>   #define QEMU_CPUHP_STAT_INSERT                 BIT1
>>   #define QEMU_CPUHP_STAT_REMOVE                 BIT2
>> +#define QEMU_CPUHP_STAT_EJECT                  BIT3
>>   #define QEMU_CPUHP_STAT_FW_REMOVE              BIT4
>>   
>>   #define QEMU_CPUHP_RW_CMD_DATA               0x8
>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> index 851e2b28aad9..0484be8fe43c 100644
>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> @@ -18,6 +18,7 @@
>>   #include <Pcd/CpuHotEjectData.h>             // CPU_HOT_EJECT_DATA
>>   #include <Protocol/MmCpuIo.h>                // EFI_MM_CPU_IO_PROTOCOL
>>   #include <Protocol/SmmCpuService.h>          // EFI_SMM_CPU_SERVICE_PROTOCOL
>> +#include <Register/Intel/ArchitecturalMsr.h> // MSR_IA32_APIC_BASE_REGISTER
>>   #include <Uefi/UefiBaseType.h>               // EFI_STATUS
>>   
>>   #include "ApicId.h"                          // APIC_ID
>> @@ -191,12 +192,39 @@ RevokeNewSlot:
>>   }
>>   
>>   /**
>> +  EjectCpu needs to know the BSP at SMI exit at a point when
>> +  some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn
>> +  down.
>> +  Reuse the logic from OvmfPkg::PlatformSmmBspElection() to
>> +  do that.
>> +
>> +  @param[in] ProcessorNum      ProcessorNum denotes the processor handle number
>> +                               in EFI_SMM_CPU_SERVICE_PROTOCOL.
>> +**/
>> +STATIC
>> +BOOLEAN
>> +CheckIfBsp (
>> +  IN UINTN ProcessorNum
>> +  )
> 
> (1a) Please remove the ProcessorNum parameter -- comment and parameter
> list alike --; it's useless. In the parameter list, just replace the
> line's contents with "VOID".

Heh. Yeah, I'm not certain why I added that.
  
> (1b) Additionally, please state:
> 
>    @retval TRUE   If the CPU executing this function is the BSP.
> 
>    @retval FALSE  If the CPU executing this function is an AP.
> 

Will add.

> 
> 
> 
>> +{
>> +  MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>> +  BOOLEAN IsBsp;
> 
> (2) "IsBsp" should line up with "ApicBaseMsr".
> 
> 
>> +
>> +  ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>> +  IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>> +  return IsBsp;
>> +}
>> +
>> +/**
>>     CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
>>     on each CPU at exit from SMM.
>>   
>> -  If, the executing CPU is not being ejected, nothing to be done.
>> +  If, the executing CPU is neither the BSP, nor being ejected, nothing
>> +  to be done.
>>     If, the executing CPU is being ejected, wait in a halted loop
>>     until ejected.
>> +  If, the executing CPU is the BSP, set QEMU CPU status to eject
>> +  for CPUs being ejected.
>>   
>>     @param[in] ProcessorNum      ProcessorNum denotes the CPU exiting SMM,
>>                                  and will be used as an index into
>> @@ -211,9 +239,99 @@ EjectCpu (
>>     )
>>   {
>>     UINT64 QemuSelector;
>> +  BOOLEAN IsBsp;
>>   
>> +  IsBsp = CheckIfBsp (ProcessorNum);
>> +
>> +  //
>> +  // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated
>> +  // on the BSP in the ongoing SMI iteration at two places:
> 
> (3) I feel "iteration" doesn't really apply; I'd simply drop that word.
> "ongoing SMI" seems sufficient (or maybe "ongoing SMI handling").

Yeah that reads better.

> 
> 
>> +  //
>> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
>> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
>> +  //   describes any such updates are guaranteed to be ordered-before the
>> +  //   dereference below.
>> +  //
>> +  // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for
>> +  //   CPUs after they have been hot-ejected.
>> +  //
>> +  //   The CPU under ejection: might be executing anywhere between the
>> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
>> +  //   dereference QemuSelectorMap[ProcessorNum].
>> +  //   Given that the BSP ensures that this store only happens after the
>> +  //   CPU has been ejected, this CPU would never see the after value.
>> +  //   (Note that any CPU that is already executing the CpuSleep() loop
>> +  //   below never raced any updates and always saw the before value.)
>> +  //
>> +  //   CPUs not-under ejection: never see any changes so they are fine.
>> +  //
>> +  //   Lastly, note that we are also guaranteed that any dereferencing
>> +  //   CPU only sees the before or after value and not an intermediate
>> +  //   value. This is because QemuSelectorMap[ProcessorNum] is aligned at
>> +  //   a natural boundary.
>> +  //
> 
> (4) Well, about the last paragraph -- when I saw that you used UINT64 in
> QemuSelectorMap, to allow for a sentinel value, I immediately thought of
> IA32. This module is built for IA32 too, and there I'm not so sure about
> the atomicity of UINT64 accesses.
> 
> I didn't raise it at that point because I wasn't sure if we were
> actually going to rely on the atomicity. And, I don't think we are.
> Again, let me quote Paolo's example from <https://lwn.net/Articles/844224/>:
> 
>      thread 1                              thread 2
>      --------------------------------      ------------------------
>      a.x = 1;
>      smp_wmb();
>      WRITE_ONCE(message, &a);              datum = READ_ONCE(message);
>                                            smp_rmb();
>                                            if (datum != NULL)
>                                              printk("%x\n", datum->x);
> 
> The access to object "x" is not restricted in any particular way. In
> thread#1, we have an smp_wmb() which enforces both a compiler barrier
> and a store-release fence, and then we have an atomic access to
> "message". But the "a.x" assignment need not be atomic.

I concur. I think I conflated atomic access to "message" with also
needing atomic access for the variables we need to ensure transitivity
on.
As you say below, this means that a bunch of this logic (and comments)
can be simplified.
  
> In our case, "a.x = 1" maps to (a) populating "QemuSelectorMap", and (b)
> setting up the "Handler" function pointer, in UnplugCpus(). The
> smp_wmb() is the "ReleaseMemoryFence" in UnplugCpus(). The WRITE_ONCE is
> the final "AllCpusInSync = FALSE" assignment in BSPHandler()
> [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c].
> 
> Regarding thread#2, the READ_ONCE is matched by the "AllCpusInSync"
> loop. smp_rmb() is the "AcquireMemoryFence" I called for, under PATCH v8
> 07/10, "OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler". And the
> "datum" access happens (a) in SmmCpuFeaturesRendezvousExit(), where we
> fetch/call Handler, and (b) here, where we read/modify "QemuSelectorMap".
> 
> So this seems to imply we can:
> 
> - drop the natural-alignment padding of "QemuSelectorMap", from PATCH v8
> 06/10, "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state",
> 
> - drop the last paragraph of the comment above.

Will do.

> 
> 
>>     QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
>> -  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
>> +  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) {
>> +    return;
>> +  }
>> +
>> +  if (IsBsp) {
> 
> (5) Can you reorder the high-level flow here? Such as:
> 
>    if (CheckIfBsp ()) {
>      //
>      // send the eject requests to QEMU here
>      //
>      return;
>    }
> 
>    //
>    // Reached only on APs:
>    //
>    QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
>    if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
>      return;
>    }
> 
>    //
>    // AP being hot-ejected; pen it here.
>    //

Yeah, this is clearer.

> 
> 
>> +    UINT32 Idx;
>> +
>> +    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
>> +      UINT64 QemuSelector;
>> +
>> +      QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
>> +
>> +      if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
>> +        //
>> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
>> +        // signal and, will execute SmmCpuFeaturesRendezvousExit()
>> +        // followed by this callback or is already waiting in the
>> +        // CpuSleep() loop below.
>> +        //
>> +        // Tell QEMU to context-switch it out.
>> +        //
>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
>> +
>> +        //
>> +        // We need a compiler barrier here to ensure that the compiler
>> +        // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
>> +        //
>> +        // A store fence is not strictly necessary on x86 which has
>> +        // TSO; however, both of these stores are in different address spaces
>> +        // so also add a Store Fence here.
>> +        //
>> +        MemoryFence ();
> 
> (6) I wonder if this compiler barrier + comment block are helpful.
> Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
> didn't contain built-in fences, all hell would break lose. We're using
> EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
> ordering-wise, even without an explicit compiler barrier here.
> 
> To me personally, this particular fence only muddies the picture --
> where we already have an acquire memory fence and a store memory fence
> to couple with each other.
> 
> I'd recommend removing this. (If you disagree, I'm willing to listen to
> arguments, of course!)

You are right that we don't need a memory fence here -- given that there
is an implicit fence due to the MMIO.

As for the compiler fence, I'm just now re-looking at handlers in
EFI_MM_CPU_IO_PROTOCOL and they do seem to include a compiler barrier.

So I agree with you that we have all the fences that we need. However,
I do think it's a good idea to document both of these here.

> 
>> +
>> +        //
>> +        // Clear the eject status for this CPU Idx to ensure that an invalid
>> +        // SMI later does not end up trying to eject it or a newly
>> +        // hotplugged CPU Idx does not go into the dead loop.
>> +        //
>> +        mCpuHotEjectData->QemuSelectorMap[Idx] =
>> +          CPU_EJECT_QEMU_SELECTOR_INVALID;
>> +
>> +        DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, "
>> +          "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector));
> 
> (7) Please log QemuSelector with "%Lu".

Ack.
> 
> 
>> +      }
>> +    }
>> +
>> +    //
>> +    // We are done until the next hot-unplug; clear the handler.
>> +    //
>> +    // By virtue of the MemoryFence() in the ejection loop above, the
>> +    // following store is ordered-after all the ejections are done.
>> +    // (We know that there is at least one CPU hot-eject handler if this
>> +    // handler was installed.)
>> +    //
>> +    // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this
>> +    // means that the only CPUs which might dereference
>> +    // mCpuHotEjectData->Handler are not under ejection, so we can
>> +    // safely reset.
>> +    //
>> +    mCpuHotEjectData->Handler = NULL;
>>       return;
>>     }
>>   
>> @@ -496,11 +614,6 @@ CpuHotplugMmi (
>>     if (EFI_ERROR (Status)) {
>>       goto Fatal;
>>     }
>> -  if (ToUnplugCount > 0) {
>> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
>> -      __FUNCTION__));
>> -    goto Fatal;
>> -  }
>>   
>>     if (PluggedCount > 0) {
>>       Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
>> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> index 99988285b6a2..ddfef05ee6cf 100644
>> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>>     // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>>     // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
>>     //
>> +  // mCpuHotEjectData itself is stable once setup so it can be
>> +  // dereferenced without needing any synchronization,
>> +  // but, mCpuHotEjectData->Handler is updated on the BSP in the
>> +  // ongoing SMI iteration at two places:
>> +  //
>> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
>> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
>> +  //   describes any such updates are guaranteed to be ordered-before the
>> +  //   dereference below.
>> +  //
>> +  // - EjectCpu() (which is called via the Handler below), on the BSP
>> +  //   updates mCpuHotEjectData->Handler once it is done with all ejections.
>> +  //
>> +  //   The CPU under ejection: might be executing anywhere between the
>> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
>> +  //   dereference the Handler field.
>> +  //   Given that the BSP ensures that this store only happens after all
>> +  //   CPUs under ejection have been ejected, this CPU would never see
>> +  //   the after value.
>> +  //   (Note that any CPU that is already executing the CpuSleep() loop
>> +  //   below never raced any updates and always saw the before value.)
>> +  //
>> +  //   CPUs not-under ejection: might see either value of the Handler
>> +  //   which is fine, because the Handler is a NOP for CPUs not-under
>> +  //   ejection.
>> +  //
>> +  //   Lastly, note that we are also guaranteed that any dereferencing
>> +  //   CPU only sees the before or after value and not an intermediate
>> +  //   value. This is because mCpuHotEjectData->Handler is aligned at a
>> +  //   natural boundary.
>> +  //
>>   
>>     if (mCpuHotEjectData != NULL) {
>>       CPU_HOT_EJECT_HANDLER Handler;
>>
> 
> (8) I can't really put my finger on it, I just feel that repeating
> (open-coding) this wall of text here is not really productive.

Part of the reason I wanted to document this here was to get your
opinion on it and figure out how much of it is useful and how
much might be overkill.

> 
> Do you think that, after you add the "acquire memory fence" comment in
> patch #7, we could avoid most of the text here? I think we should only
> point out (in patch #7) the "release fence" that the logic here pairs with.> 
> If you really want to present it all from both perspectives, I guess I'm
> OK with that, but then I believe we should drop the last paragraph at
> least (see point (4)).

Rereading it after a gap of a few days and given that most of this is
just a repeat, I'm also tending towards overkill. I think a comment
talking about acquire/release pairing is useful. Rest of it can probably
be met with just a pointer towards the comment in EjectCpus(). Does that
make sense?

Thanks
Ankur


> 
> Thanks
> Laszlo
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72144): https://edk2.groups.io/g/devel/message/72144
Mute This Topic: https://groups.io/mt/80819864/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Posted by Laszlo Ersek 3 years, 8 months ago
On 02/24/21 04:44, Ankur Arora wrote:
> On 2021-02-23 1:39 p.m., Laszlo Ersek wrote:
>> On 02/22/21 08:19, Ankur Arora wrote:

>>> +    UINT32 Idx;
>>> +
>>> +    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
>>> +      UINT64 QemuSelector;
>>> +
>>> +      QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
>>> +
>>> +      if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
>>> +        //
>>> +        // This to-be-ejected-CPU has already received the BSP's SMI
>>> exit
>>> +        // signal and, will execute SmmCpuFeaturesRendezvousExit()
>>> +        // followed by this callback or is already waiting in the
>>> +        // CpuSleep() loop below.
>>> +        //
>>> +        // Tell QEMU to context-switch it out.
>>> +        //
>>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
>>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
>>> +
>>> +        //
>>> +        // We need a compiler barrier here to ensure that the compiler
>>> +        // does not reorder the CpuStatus and QemuSelectorMap[Idx]
>>> stores.
>>> +        //
>>> +        // A store fence is not strictly necessary on x86 which has
>>> +        // TSO; however, both of these stores are in different
>>> address spaces
>>> +        // so also add a Store Fence here.
>>> +        //
>>> +        MemoryFence ();
>>
>> (6) I wonder if this compiler barrier + comment block are helpful.
>> Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
>> didn't contain built-in fences, all hell would break lose. We're using
>> EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
>> ordering-wise, even without an explicit compiler barrier here.
>>
>> To me personally, this particular fence only muddies the picture --
>> where we already have an acquire memory fence and a store memory fence
>> to couple with each other.
>>
>> I'd recommend removing this. (If you disagree, I'm willing to listen to
>> arguments, of course!)
> 
> You are right that we don't need a memory fence here -- given that there
> is an implicit fence due to the MMIO.
> 
> As for the compiler fence, I'm just now re-looking at handlers in
> EFI_MM_CPU_IO_PROTOCOL and they do seem to include a compiler barrier.
> 
> So I agree with you that we have all the fences that we need. However,
> I do think it's a good idea to document both of these here.

OK.

>>> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> index 99988285b6a2..ddfef05ee6cf 100644
>>> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>>> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>>>     // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>>>     // in this SMI exit (otherwise mCpuHotEjectData->Handler is not
>>> armed.)
>>>     //
>>> +  // mCpuHotEjectData itself is stable once setup so it can be
>>> +  // dereferenced without needing any synchronization,
>>> +  // but, mCpuHotEjectData->Handler is updated on the BSP in the
>>> +  // ongoing SMI iteration at two places:
>>> +  //
>>> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
>>> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
>>> +  //   describes any such updates are guaranteed to be
>>> ordered-before the
>>> +  //   dereference below.
>>> +  //
>>> +  // - EjectCpu() (which is called via the Handler below), on the BSP
>>> +  //   updates mCpuHotEjectData->Handler once it is done with all
>>> ejections.
>>> +  //
>>> +  //   The CPU under ejection: might be executing anywhere between the
>>> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
>>> +  //   dereference the Handler field.
>>> +  //   Given that the BSP ensures that this store only happens after
>>> all
>>> +  //   CPUs under ejection have been ejected, this CPU would never see
>>> +  //   the after value.
>>> +  //   (Note that any CPU that is already executing the CpuSleep() loop
>>> +  //   below never raced any updates and always saw the before value.)
>>> +  //
>>> +  //   CPUs not-under ejection: might see either value of the Handler
>>> +  //   which is fine, because the Handler is a NOP for CPUs not-under
>>> +  //   ejection.
>>> +  //
>>> +  //   Lastly, note that we are also guaranteed that any dereferencing
>>> +  //   CPU only sees the before or after value and not an intermediate
>>> +  //   value. This is because mCpuHotEjectData->Handler is aligned at a
>>> +  //   natural boundary.
>>> +  //
>>>       if (mCpuHotEjectData != NULL) {
>>>       CPU_HOT_EJECT_HANDLER Handler;
>>>
>>
>> (8) I can't really put my finger on it, I just feel that repeating
>> (open-coding) this wall of text here is not really productive.
> 
> Part of the reason I wanted to document this here was to get your
> opinion on it and figure out how much of it is useful and how
> much might be overkill.
> 
>>
>> Do you think that, after you add the "acquire memory fence" comment in
>> patch #7, we could avoid most of the text here? I think we should only
>> point out (in patch #7) the "release fence" that the logic here pairs
>> with.> If you really want to present it all from both perspectives, I
>> guess I'm
>> OK with that, but then I believe we should drop the last paragraph at
>> least (see point (4)).
> 
> Rereading it after a gap of a few days and given that most of this is
> just a repeat, I'm also tending towards overkill. I think a comment
> talking about acquire/release pairing is useful. Rest of it can probably
> be met with just a pointer towards the comment in EjectCpus(). Does that
> make sense?

Yes, absolutely. Short comment + pointer to the "other half" (which has
the large comment too) seem best.

Thanks
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72210): https://edk2.groups.io/g/devel/message/72210
Mute This Topic: https://groups.io/mt/80819864/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-