[edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix #AC split lock

John E Lofgren posted 1 patch 4 years, 7 months ago
Failed in applying to current master (apply log)
UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
[edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix #AC split lock
Posted by John E Lofgren 4 years, 7 months ago
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2150

V2 changes:
Add xchg 16 bit instructions to handle sgdt and sidt base
63:48 bits and 47:32 bits.
Add comment to explain why xchg 64bit isnt being used

Fix #AC split lock's caused by seperating base and limit from sgdt
and sidt by changing xchg operands to 32-bit to stop from crossing
cacheline.

Signed-off-by: John E Lofgren <john.e.lofgren@intel.com>
---
 UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
index 4db1a09f28..7b7642b290 100644
--- a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
+++ b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
@@ -180,21 +180,29 @@ HasErrorCode:
     push    qword [rbp + 24]
 
 ;; UINT64  Gdtr[2], Idtr[2];
+    ; sidt and sgdt saves 10 bytes to memory, 8 bytes = base and 2 bytes = limit.
+    ; To avoid #AC split lock when separating base and limit into their
+    ; own separate 64 bit memory, we can’t use 64 bit xchg since base [63:48] bits
+    ; may cross the cache line.
     xor     rax, rax
     push    rax
     push    rax
     sidt    [rsp]
-    xchg    rax, [rsp + 2]
-    xchg    rax, [rsp]
-    xchg    rax, [rsp + 8]
+    xchg    eax, [rsp + 2]
+    xchg    eax, [rsp]
+    xchg    eax, [rsp + 8]
+    xchg     ax, [rsp + 6]
+    xchg     ax, [rsp + 4]
 
     xor     rax, rax
     push    rax
     push    rax
     sgdt    [rsp]
-    xchg    rax, [rsp + 2]
-    xchg    rax, [rsp]
-    xchg    rax, [rsp + 8]
+    xchg    eax, [rsp + 2]
+    xchg    eax, [rsp]
+    xchg    eax, [rsp + 8]
+    xchg     ax, [rsp + 6]
+    xchg     ax, [rsp + 4]
 
 ;; UINT64  Ldtr, Tr;
     xor     rax, rax
-- 
2.16.2.windows.1


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#47045): https://edk2.groups.io/g/devel/message/47045
Mute This Topic: https://groups.io/mt/34083544/1787277
Mute #ac: https://groups.io/mk?hashtag=ac&subid=3901457
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix #AC split lock
Posted by Dong, Eric 4 years, 7 months ago
Cc other reviewers.

Reviewed-by: Eric Dong <eric.dong@intel.com>

Thanks,
Eric
> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of John E
> Lofgren
> Sent: Tuesday, September 10, 2019 2:41 AM
> To: devel@edk2.groups.io
> Subject: [edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix
> #AC split lock
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2150
> 
> V2 changes:
> Add xchg 16 bit instructions to handle sgdt and sidt base
> 63:48 bits and 47:32 bits.
> Add comment to explain why xchg 64bit isnt being used
> 
> Fix #AC split lock's caused by seperating base and limit from sgdt and sidt by
> changing xchg operands to 32-bit to stop from crossing cacheline.
> 
> Signed-off-by: John E Lofgren <john.e.lofgren@intel.com>
> ---
> 
> UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nas
> m | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git
> a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.n
> asm
> b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.n
> asm
> index 4db1a09f28..7b7642b290 100644
> ---
> a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.n
> asm
> +++
> b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.
> +++ nasm
> @@ -180,21 +180,29 @@ HasErrorCode:
>      push    qword [rbp + 24]
> 
>  ;; UINT64  Gdtr[2], Idtr[2];
> +    ; sidt and sgdt saves 10 bytes to memory, 8 bytes = base and 2 bytes =
> limit.
> +    ; To avoid #AC split lock when separating base and limit into their
> +    ; own separate 64 bit memory, we can’t use 64 bit xchg since base
> [63:48] bits
> +    ; may cross the cache line.
>      xor     rax, rax
>      push    rax
>      push    rax
>      sidt    [rsp]
> -    xchg    rax, [rsp + 2]
> -    xchg    rax, [rsp]
> -    xchg    rax, [rsp + 8]
> +    xchg    eax, [rsp + 2]
> +    xchg    eax, [rsp]
> +    xchg    eax, [rsp + 8]
> +    xchg     ax, [rsp + 6]
> +    xchg     ax, [rsp + 4]
> 
>      xor     rax, rax
>      push    rax
>      push    rax
>      sgdt    [rsp]
> -    xchg    rax, [rsp + 2]
> -    xchg    rax, [rsp]
> -    xchg    rax, [rsp + 8]
> +    xchg    eax, [rsp + 2]
> +    xchg    eax, [rsp]
> +    xchg    eax, [rsp + 8]
> +    xchg     ax, [rsp + 6]
> +    xchg     ax, [rsp + 4]
> 
>  ;; UINT64  Ldtr, Tr;
>      xor     rax, rax
> --
> 2.16.2.windows.1
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#47182): https://edk2.groups.io/g/devel/message/47182
Mute This Topic: https://groups.io/mt/34083544/1787277
Mute #ac: https://groups.io/mk?hashtag=ac&subid=3901457
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix #AC split lock
Posted by Laszlo Ersek 4 years, 7 months ago
On 09/09/19 20:40, John E Lofgren wrote:
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2150
> 
> V2 changes:
> Add xchg 16 bit instructions to handle sgdt and sidt base
> 63:48 bits and 47:32 bits.
> Add comment to explain why xchg 64bit isnt being used
> 
> Fix #AC split lock's caused by seperating base and limit from sgdt
> and sidt by changing xchg operands to 32-bit to stop from crossing
> cacheline.
> 
> Signed-off-by: John E Lofgren <john.e.lofgren@intel.com>
> ---
>  UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)

The commit message (and the bug report) are very difficult to understand.

This is the first time I hear about "split lock" and "alignment check
exception", so please bear with me. This is my take on the problem.

(1) "split lock" is explained well here:

https://lwn.net/Articles/784864/

In short, we can consider it a performance anti-pattern. A locking
instruction (such as XCHG) is invoked on incorrectly aligned data, which
causes performance degradation for the whole system. In some cases this
can be a security issue even, because less privileged code can block
(slow down) more privileged code running on a different CPU.

(2) Alignment Check Exception is a way to detect the occurrence of
"split lock". It must be configured explicitly, when the system software
wishes to be informed explicitly about a split lock event.

Therefore, the "#AC split lock" expression in the commit message is very
confusing. Those are two different things. One is the problem ("split
lock"), and the other is the (kind of) solution ("alignment check
exception").

We don't care about #AC (the exception) here. My understanding is that
the open source edk2 tree does not enable #AC. The following include file:

  MdePkg/Include/Register/Intel/Msr/P6Msr.h

defines MSR_P6_TEST_CTL (value 0x33), which the above LWN article
references as MSR_TEST_CTL. The article refers to bit 29, but that seems
to be part of the "Reserved1" bit-field in the MSR_P6_TEST_CTL_REGISTER.

There seems to be a bit field that could be related (Disable_LOCK,
bit#31), but again, there is no reference to Disable_LOCK in the edk2
codebase.

So my whole point here is that we should clearly separate "#AC" from
"split lock" in the commit message (and in the code comment). Those are
separate things. And "split lock" may apply to edk2, but "#AC" does not
(to the open source tree anyway).

(3) OK, assuming this code indeed triggers a "split lock" event. Why is
that a problem? Yes, it may degrade performance for the system. Why do
we care? We are *already* handling an exception. That should not be a
very frequent event, for any processor (BSP or AP) in the system.

Is the problem that a closed source platform enables #AC, and then the
fault handler in CpuExceptionHandlerLib -- which gets invoked due to an
independent fault -- runs into a *double* fault, due to the split lock
raising #AC?

This should be clearly explained in the commit message. I'm not prying
at proprietary platform details, but the circumstances / symptoms of the
issue should be clearly described.

More comments below, regarding the original code:

> 
> diff --git a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
> index 4db1a09f28..7b7642b290 100644
> --- a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
> +++ b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm
> @@ -180,21 +180,29 @@ HasErrorCode:
>      push    qword [rbp + 24]
>  
>  ;; UINT64  Gdtr[2], Idtr[2];
> +    ; sidt and sgdt saves 10 bytes to memory, 8 bytes = base and 2 bytes = limit.
> +    ; To avoid #AC split lock when separating base and limit into their
> +    ; own separate 64 bit memory, we can’t use 64 bit xchg since base [63:48] bits
> +    ; may cross the cache line.
>      xor     rax, rax
>      push    rax
>      push    rax

So, the contents of RAX is 0 now, and we've made 16 bytes (filled with
0) room on the stack.

>      sidt    [rsp]

This instruction writes 10 bytes at the base of that "room" on the
stack. Offsets 0 and 1 contain the "limit", offsets 2-9 (inclusive)
contain the "base address".

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |L|L|B|B|B|B|B|B|B|B|0|0|0|0|0|0|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

> -    xchg    rax, [rsp + 2]

(I guess this is the instruction that splits the lock.)

Now RAX has the base address, and the stack looks like:

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |L|L|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

> -    xchg    rax, [rsp]

Now RAX has the limit, and the stack is:

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |B|B|B|B|B|B|B|B|0|0|0|0|0|0|0|0|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

> -    xchg    rax, [rsp + 8]

Now RAX is zero again, and the stack is:

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |B|B|B|B|B|B|B|B|L|L|0|0|0|0|0|0|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This is very "clever" and all, but why do we have to use a locking
instruction? We save RBX on the stack, earlier in CommonInterruptEntry,
and we restore it in the end. So what's wrong with:

  sidt    [rsp]
  mov     bx, word [rsp]       ; read limit into bx
  mov     rax, qword [rsp + 2] ; read base address into rax
  mov     qword [rsp], rax     ; write base address to qword#0
  mov     word [rsp + 8], bx   ; write limit to qword#1

The second MOV instruction may straddle a cache line, yes, but there is
no locking at all.

Thanks,
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#47220): https://edk2.groups.io/g/devel/message/47220
Mute This Topic: https://groups.io/mt/34083544/1787277
Mute #ac: https://groups.io/mk?hashtag=ac&subid=3901457
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix #AC split lock
Posted by John E Lofgren 4 years, 7 months ago
Hi Laszlo,
2. Yes, I can change commit message/comments to separate split lock and #AC.
3. Yes it’s a close platform that is enabling #AC which hits double fault because split lock inside CpuExceptionHandlerLib.

Code:	
I was wondering same thing, why are they using locking mechanism. I wasn’t sure so that’s why keep xchg instead of changing to mov.  I have yet to think of any reasons.  I think it's a good idea to use mov instead it accomplishes the same thing and easier to understand.

Thank you,
John

>-----Original Message-----
>From: Laszlo Ersek [mailto:lersek@redhat.com]
>Sent: Friday, September 13, 2019 9:32 AM
>To: devel@edk2.groups.io; Lofgren, John E <john.e.lofgren@intel.com>
>Cc: Ni, Ray <ray.ni@intel.com>; Gao, Liming <liming.gao@intel.com>; Dong,
>Eric <eric.dong@intel.com>
>Subject: Re: [edk2-devel] [Patch V2] UefiCpuPkg/CpuExceptionHandlerLib: Fix
>#AC split lock
>
>On 09/09/19 20:40, John E Lofgren wrote:
>> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2150
>>
>> V2 changes:
>> Add xchg 16 bit instructions to handle sgdt and sidt base
>> 63:48 bits and 47:32 bits.
>> Add comment to explain why xchg 64bit isnt being used
>>
>> Fix #AC split lock's caused by seperating base and limit from sgdt and
>> sidt by changing xchg operands to 32-bit to stop from crossing
>> cacheline.
>>
>> Signed-off-by: John E Lofgren <john.e.lofgren@intel.com>
>> ---
>>
>>
>UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nas
>m
>> | 20 ++++++++++++++------
>>  1 file changed, 14 insertions(+), 6 deletions(-)
>
>The commit message (and the bug report) are very difficult to understand.
>
>This is the first time I hear about "split lock" and "alignment check exception",
>so please bear with me. This is my take on the problem.
>
>(1) "split lock" is explained well here:
>
>https://lwn.net/Articles/784864/
>
>In short, we can consider it a performance anti-pattern. A locking instruction
>(such as XCHG) is invoked on incorrectly aligned data, which causes
>performance degradation for the whole system. In some cases this can be a
>security issue even, because less privileged code can block (slow down) more
>privileged code running on a different CPU.
>
>(2) Alignment Check Exception is a way to detect the occurrence of "split
>lock". It must be configured explicitly, when the system software wishes to be
>informed explicitly about a split lock event.
>
>Therefore, the "#AC split lock" expression in the commit message is very
>confusing. Those are two different things. One is the problem ("split lock"),
>and the other is the (kind of) solution ("alignment check exception").
>
>We don't care about #AC (the exception) here. My understanding is that the
>open source edk2 tree does not enable #AC. The following include file:
>
>  MdePkg/Include/Register/Intel/Msr/P6Msr.h
>
>defines MSR_P6_TEST_CTL (value 0x33), which the above LWN article
>references as MSR_TEST_CTL. The article refers to bit 29, but that seems to be
>part of the "Reserved1" bit-field in the MSR_P6_TEST_CTL_REGISTER.
>
>There seems to be a bit field that could be related (Disable_LOCK, bit#31), but
>again, there is no reference to Disable_LOCK in the edk2 codebase.
>
>So my whole point here is that we should clearly separate "#AC" from "split
>lock" in the commit message (and in the code comment). Those are separate
>things. And "split lock" may apply to edk2, but "#AC" does not (to the open
>source tree anyway).
>
>(3) OK, assuming this code indeed triggers a "split lock" event. Why is that a
>problem? Yes, it may degrade performance for the system. Why do we care?
>We are *already* handling an exception. That should not be a very frequent
>event, for any processor (BSP or AP) in the system.
>
>Is the problem that a closed source platform enables #AC, and then the fault
>handler in CpuExceptionHandlerLib -- which gets invoked due to an
>independent fault -- runs into a *double* fault, due to the split lock raising
>#AC?
>
>This should be clearly explained in the commit message. I'm not prying at
>proprietary platform details, but the circumstances / symptoms of the issue
>should be clearly described.
>
>More comments below, regarding the original code:
>
>>
>> diff --git
>>
>a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.na
>> sm
>>
>b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.na
>> sm
>> index 4db1a09f28..7b7642b290 100644
>> ---
>>
>a/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.na
>> sm
>> +++
>b/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAs
>> +++ m.nasm
>> @@ -180,21 +180,29 @@ HasErrorCode:
>>      push    qword [rbp + 24]
>>
>>  ;; UINT64  Gdtr[2], Idtr[2];
>> +    ; sidt and sgdt saves 10 bytes to memory, 8 bytes = base and 2 bytes =
>limit.
>> +    ; To avoid #AC split lock when separating base and limit into their
>> +    ; own separate 64 bit memory, we can’t use 64 bit xchg since base [63:48]
>bits
>> +    ; may cross the cache line.
>>      xor     rax, rax
>>      push    rax
>>      push    rax
>
>So, the contents of RAX is 0 now, and we've made 16 bytes (filled with
>0) room on the stack.
>
>>      sidt    [rsp]
>
>This instruction writes 10 bytes at the base of that "room" on the stack.
>Offsets 0 and 1 contain the "limit", offsets 2-9 (inclusive) contain the "base
>address".
>
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  |L|L|B|B|B|B|B|B|B|B|0|0|0|0|0|0|
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>> -    xchg    rax, [rsp + 2]
>
>(I guess this is the instruction that splits the lock.)
>
>Now RAX has the base address, and the stack looks like:
>
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  |L|L|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>> -    xchg    rax, [rsp]
>
>Now RAX has the limit, and the stack is:
>
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  |B|B|B|B|B|B|B|B|0|0|0|0|0|0|0|0|
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>> -    xchg    rax, [rsp + 8]
>
>Now RAX is zero again, and the stack is:
>
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  |B|B|B|B|B|B|B|B|L|L|0|0|0|0|0|0|
>  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>This is very "clever" and all, but why do we have to use a locking instruction?
>We save RBX on the stack, earlier in CommonInterruptEntry, and we restore it
>in the end. So what's wrong with:
>
>  sidt    [rsp]
>  mov     bx, word [rsp]       ; read limit into bx
>  mov     rax, qword [rsp + 2] ; read base address into rax
>  mov     qword [rsp], rax     ; write base address to qword#0
>  mov     word [rsp + 8], bx   ; write limit to qword#1
>
>The second MOV instruction may straddle a cache line, yes, but there is no
>locking at all.
>
>Thanks,
>Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#47232): https://edk2.groups.io/g/devel/message/47232
Mute This Topic: https://groups.io/mt/34083544/1787277
Mute #ac: https://groups.io/mk?hashtag=ac&subid=3901457
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-