[v2] Optimize load uCode performance

[edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

Posted by Eric Dong 7 years, 7 months ago

Read uCode from memory has better performance than from flash.
But it needs extra effort to let BSP copy uCode from flash to
memory. Also BSP already enable cache in SEC phase, so it use
less time to relocate uCode from flash to memory. After
verification, if system has more than one processor, it will
reduce some time if load uCode from memory.

This change enable this optimization.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 108eea0a6f..c3cd6d7d51 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1520,6 +1520,7 @@ MpInitLibInitialize (
   UINTN                    ApResetVectorSize;
   UINTN                    BackupBufferAddr;
   UINTN                    ApIdtBase;
+  VOID                     *MicrocodePatchInRam;
 
   OldCpuMpData = GetCpuMpDataFromGuidedHob ();
   if (OldCpuMpData == NULL) {
@@ -1587,8 +1588,39 @@ MpInitLibInitialize (
   CpuMpData->SwitchBspFlag    = FALSE;
   CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
   CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
-  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
   CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
+  //
+  // If platform has more than one CPU, relocate microcode to memory to reduce
+  // loading microcode time.
+  //
+  MicrocodePatchInRam = NULL;
+  if (MaxLogicalProcessorNumber > 1) {
+    MicrocodePatchInRam = AllocatePages (
+                            EFI_SIZE_TO_PAGES (
+                              (UINTN)CpuMpData->MicrocodePatchRegionSize
+                              )
+                            );
+    ASSERT (MicrocodePatchInRam != NULL);
+  }
+  if (MicrocodePatchInRam == NULL) {
+    //
+    // there is only one processor, or no microcode patch is available, or
+    // memory allocation failed
+    //
+    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
+  } else {
+    //
+    // there are multiple processors, and a microcode patch is available, and
+    // memory allocation succeeded
+    //
+    CopyMem (
+      MicrocodePatchInRam,
+      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
+      (UINTN)CpuMpData->MicrocodePatchRegionSize
+      );
+    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
+  }
+
   InitializeSpinLock(&CpuMpData->MpLock);
 
   //
-- 
2.15.0.windows.1

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

Posted by Laszlo Ersek 7 years, 7 months ago

On 07/12/18 12:49, Eric Dong wrote:
> Read uCode from memory has better performance than from flash.
> But it needs extra effort to let BSP copy uCode from flash to
> memory. Also BSP already enable cache in SEC phase, so it use
> less time to relocate uCode from flash to memory. After
> verification, if system has more than one processor, it will
> reduce some time if load uCode from memory.
> 
> This change enable this optimization.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 108eea0a6f..c3cd6d7d51 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
>    UINTN                    ApResetVectorSize;
>    UINTN                    BackupBufferAddr;
>    UINTN                    ApIdtBase;
> +  VOID                     *MicrocodePatchInRam;
>  
>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
>    if (OldCpuMpData == NULL) {
> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
>    CpuMpData->SwitchBspFlag    = FALSE;
>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
> -  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
>    CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
> +  //
> +  // If platform has more than one CPU, relocate microcode to memory to reduce
> +  // loading microcode time.
> +  //
> +  MicrocodePatchInRam = NULL;
> +  if (MaxLogicalProcessorNumber > 1) {
> +    MicrocodePatchInRam = AllocatePages (
> +                            EFI_SIZE_TO_PAGES (
> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
> +                              )
> +                            );
> +    ASSERT (MicrocodePatchInRam != NULL);
> +  }
> +  if (MicrocodePatchInRam == NULL) {
> +    //
> +    // there is only one processor, or no microcode patch is available, or
> +    // memory allocation failed
> +    //
> +    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
> +  } else {
> +    //
> +    // there are multiple processors, and a microcode patch is available, and
> +    // memory allocation succeeded
> +    //
> +    CopyMem (
> +      MicrocodePatchInRam,
> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
> +      );
> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
> +  }
> +
>    InitializeSpinLock(&CpuMpData->MpLock);
>  
>    //
> 

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

Posted by Laszlo Ersek 7 years, 7 months ago

On 07/12/18 23:59, Laszlo Ersek wrote:
> On 07/12/18 12:49, Eric Dong wrote:
>> Read uCode from memory has better performance than from flash.
>> But it needs extra effort to let BSP copy uCode from flash to
>> memory. Also BSP already enable cache in SEC phase, so it use
>> less time to relocate uCode from flash to memory. After
>> verification, if system has more than one processor, it will
>> reduce some time if load uCode from memory.
>>
>> This change enable this optimization.
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Eric Dong <eric.dong@intel.com>
>> ---
>>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index 108eea0a6f..c3cd6d7d51 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
>>    UINTN                    ApResetVectorSize;
>>    UINTN                    BackupBufferAddr;
>>    UINTN                    ApIdtBase;
>> +  VOID                     *MicrocodePatchInRam;
>>  
>>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
>>    if (OldCpuMpData == NULL) {
>> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
>>    CpuMpData->SwitchBspFlag    = FALSE;
>>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
>>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
>> -  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
>>    CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
>> +  //
>> +  // If platform has more than one CPU, relocate microcode to memory to reduce
>> +  // loading microcode time.
>> +  //
>> +  MicrocodePatchInRam = NULL;
>> +  if (MaxLogicalProcessorNumber > 1) {
>> +    MicrocodePatchInRam = AllocatePages (
>> +                            EFI_SIZE_TO_PAGES (
>> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
>> +                              )
>> +                            );
>> +    ASSERT (MicrocodePatchInRam != NULL);
>> +  }
>> +  if (MicrocodePatchInRam == NULL) {
>> +    //
>> +    // there is only one processor, or no microcode patch is available, or
>> +    // memory allocation failed
>> +    //
>> +    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
>> +  } else {
>> +    //
>> +    // there are multiple processors, and a microcode patch is available, and
>> +    // memory allocation succeeded
>> +    //
>> +    CopyMem (
>> +      MicrocodePatchInRam,
>> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
>> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
>> +      );
>> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
>> +  }
>> +
>>    InitializeSpinLock(&CpuMpData->MpLock);
>>  
>>    //
>>
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 

Sorry, I have to take that back -- please do not commit this patch.

For this version of the patch:

Nacked-by: Laszlo Ersek <lersek@redhat.com>

The ASSERT() is wrong. Again, in the code above, AllocatePages() can
return NULL not only because of memory allocation failure, but also
because the number of pages to allocate can be zero! If the platform has
no microcode patch to apply.

I knew this full well when I suggested the code, but then I forgot about
it when you mentioned the ASSERT(). I think you also knew about it, and
forgot about it too. :)

In particular, this patch would make it *impossible* to boot OVMF with
multiple processors, because OVMF *never* provides a microcode update.

So, please remove the ASSERT.

Alternatively, you could modify the ASSERT() like this:

  //
  // if we attempt actual memory allocation, we expect it to succeed
  //
  ASSERT (
    (CpuMpData->MicrocodePatchRegionSize == 0) ||
    (MicrocodePatchInRam != NULL)
    );

Thanks,
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

Posted by Dong, Eric 7 years, 7 months ago

Hi Laszlo,

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Friday, July 13, 2018 6:14 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to
> memory to save time.
> 
> On 07/12/18 23:59, Laszlo Ersek wrote:
> > On 07/12/18 12:49, Eric Dong wrote:
> >> Read uCode from memory has better performance than from flash.
> >> But it needs extra effort to let BSP copy uCode from flash to memory.
> >> Also BSP already enable cache in SEC phase, so it use less time to
> >> relocate uCode from flash to memory. After verification, if system
> >> has more than one processor, it will reduce some time if load uCode
> >> from memory.
> >>
> >> This change enable this optimization.
> >>
> >> Cc: Laszlo Ersek <lersek@redhat.com>
> >> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> >> Contributed-under: TianoCore Contribution Agreement 1.1
> >> Signed-off-by: Eric Dong <eric.dong@intel.com>
> >> ---
> >>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34
> >> +++++++++++++++++++++++++++++++++-
> >>  1 file changed, 33 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> index 108eea0a6f..c3cd6d7d51 100644
> >> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
> >>    UINTN                    ApResetVectorSize;
> >>    UINTN                    BackupBufferAddr;
> >>    UINTN                    ApIdtBase;
> >> +  VOID                     *MicrocodePatchInRam;
> >>
> >>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
> >>    if (OldCpuMpData == NULL) {
> >> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
> >>    CpuMpData->SwitchBspFlag    = FALSE;
> >>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
> >>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData-
> >CpuData + MaxLogicalProcessorNumber);
> >> -  CpuMpData->MicrocodePatchAddress    = PcdGet64
> (PcdCpuMicrocodePatchAddress);
> >>    CpuMpData->MicrocodePatchRegionSize = PcdGet64
> >> (PcdCpuMicrocodePatchRegionSize);
> >> +  //
> >> +  // If platform has more than one CPU, relocate microcode to memory
> >> + to reduce  // loading microcode time.
> >> +  //
> >> +  MicrocodePatchInRam = NULL;
> >> +  if (MaxLogicalProcessorNumber > 1) {
> >> +    MicrocodePatchInRam = AllocatePages (
> >> +                            EFI_SIZE_TO_PAGES (
> >> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
> >> +                              )
> >> +                            );
> >> +    ASSERT (MicrocodePatchInRam != NULL);  }  if
> >> + (MicrocodePatchInRam == NULL) {
> >> +    //
> >> +    // there is only one processor, or no microcode patch is available, or
> >> +    // memory allocation failed
> >> +    //
> >> +    CpuMpData->MicrocodePatchAddress = PcdGet64
> >> + (PcdCpuMicrocodePatchAddress);  } else {
> >> +    //
> >> +    // there are multiple processors, and a microcode patch is available,
> and
> >> +    // memory allocation succeeded
> >> +    //
> >> +    CopyMem (
> >> +      MicrocodePatchInRam,
> >> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
> >> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
> >> +      );
> >> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
> >> + }
> >> +
> >>    InitializeSpinLock(&CpuMpData->MpLock);
> >>
> >>    //
> >>
> >
> > Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> >
> 
> Sorry, I have to take that back -- please do not commit this patch.
> 
> For this version of the patch:
> 
> Nacked-by: Laszlo Ersek <lersek@redhat.com>
> 
> The ASSERT() is wrong. Again, in the code above, AllocatePages() can return
> NULL not only because of memory allocation failure, but also because the
> number of pages to allocate can be zero! If the platform has no microcode
> patch to apply.
> 
> I knew this full well when I suggested the code, but then I forgot about it when
> you mentioned the ASSERT(). I think you also knew about it, and forgot about
> it too. :)
> 
> In particular, this patch would make it *impossible* to boot OVMF with
> multiple processors, because OVMF *never* provides a microcode update.
> 
> So, please remove the ASSERT.
> 

Agree, I removed ASSERT code and send V3 patches.

> Alternatively, you could modify the ASSERT() like this:
> 
>   //
>   // if we attempt actual memory allocation, we expect it to succeed
>   //
>   ASSERT (
>     (CpuMpData->MicrocodePatchRegionSize == 0) ||
>     (MicrocodePatchInRam != NULL)
>     );
> 
> Thanks,
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

[edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
[edk2] [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
[edk2] [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core.