[PATCH v2 16/16] x86/fsgsbase: Improve code generation in read_registers()

Andrew Cooper posted 16 patches 5 months, 4 weeks ago
[PATCH v2 16/16] x86/fsgsbase: Improve code generation in read_registers()
Posted by Andrew Cooper 5 months, 4 weeks ago
It turns out that using the higher level helpers adjacent like this leads to
terrible code generation.  Due to -fno-strict-alising, the store into state->
invalidates the read_cr4() address calculation (which is really cpu_info->cr4
under the hood), meaning that it can't be hoisted.

As a result we get "locate the top of stack block, get cr4, and see if
FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.

Hoist the calculation manually, which results in two basic blocks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Side-by-side disassembly: https://termbin.com/9xfq
---
 xen/arch/x86/traps.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 0c5393cb2166..8c261d219c07 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -118,9 +118,18 @@ static void read_registers(struct extra_state *state)
     state->cr3 = read_cr3();
     state->cr4 = read_cr4();
 
-    state->fsb = read_fs_base();
-    state->gsb = read_gs_base();
-    state->gss = read_gs_shadow();
+    if ( state->cr4 & X86_CR4_FSGSBASE )
+    {
+        state->fsb = __rdfsbase();
+        state->gsb = __rdgsbase();
+        state->gss = __rdgskern();
+    }
+    else
+    {
+        state->fsb = rdmsr(MSR_FS_BASE);
+        state->gsb = rdmsr(MSR_GS_BASE);
+        state->gss = rdmsr(MSR_SHADOW_GS_BASE);
+    }
 
     asm ( "mov %%ds, %0" : "=m" (state->ds) );
     asm ( "mov %%es, %0" : "=m" (state->es) );
-- 
2.39.5


Re: [PATCH v2 16/16] x86/fsgsbase: Improve code generation in read_registers()
Posted by Jan Beulich 5 months, 3 weeks ago
On 15.08.2025 22:41, Andrew Cooper wrote:
> It turns out that using the higher level helpers adjacent like this leads to
> terrible code generation.  Due to -fno-strict-alising, the store into state->
> invalidates the read_cr4() address calculation (which is really cpu_info->cr4
> under the hood), meaning that it can't be hoisted.
> 
> As a result we get "locate the top of stack block, get cr4, and see if
> FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.
> 
> Hoist the calculation manually, which results in two basic blocks.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Otoh the function here isn't really performance or size critical. I'm undecided
whether the undesirable open-coding or the bad code gen are the lesser evil.

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -118,9 +118,18 @@ static void read_registers(struct extra_state *state)
>      state->cr3 = read_cr3();
>      state->cr4 = read_cr4();
>  
> -    state->fsb = read_fs_base();
> -    state->gsb = read_gs_base();
> -    state->gss = read_gs_shadow();
> +    if ( state->cr4 & X86_CR4_FSGSBASE )
> +    {
> +        state->fsb = __rdfsbase();
> +        state->gsb = __rdgsbase();
> +        state->gss = __rdgskern();

This, btw, supports my desire to not use "kern" but "shadow" in the new helper's
name.

Jan
Re: [PATCH v2 16/16] x86/fsgsbase: Improve code generation in read_registers()
Posted by Andrew Cooper 5 months, 3 weeks ago
On 19/08/2025 2:19 pm, Jan Beulich wrote:
> On 15.08.2025 22:41, Andrew Cooper wrote:
>> It turns out that using the higher level helpers adjacent like this leads to
>> terrible code generation.  Due to -fno-strict-alising, the store into state->
>> invalidates the read_cr4() address calculation (which is really cpu_info->cr4
>> under the hood), meaning that it can't be hoisted.
>>
>> As a result we get "locate the top of stack block, get cr4, and see if
>> FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.
>>
>> Hoist the calculation manually, which results in two basic blocks.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Otoh the function here isn't really performance or size critical. I'm undecided
> whether the undesirable open-coding or the bad code gen are the lesser evil.

This function no, but every other place touching FS and GS is
performance critical.  They're all messy to start with, and get worse
under FRED.

~Andrew

Re: [PATCH v2 16/16] x86/fsgsbase: Improve code generation in read_registers()
Posted by Jan Beulich 5 months, 3 weeks ago
On 21.08.2025 22:52, Andrew Cooper wrote:
> On 19/08/2025 2:19 pm, Jan Beulich wrote:
>> On 15.08.2025 22:41, Andrew Cooper wrote:
>>> It turns out that using the higher level helpers adjacent like this leads to
>>> terrible code generation.  Due to -fno-strict-alising, the store into state->
>>> invalidates the read_cr4() address calculation (which is really cpu_info->cr4
>>> under the hood), meaning that it can't be hoisted.
>>>
>>> As a result we get "locate the top of stack block, get cr4, and see if
>>> FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.
>>>
>>> Hoist the calculation manually, which results in two basic blocks.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Otoh the function here isn't really performance or size critical. I'm undecided
>> whether the undesirable open-coding or the bad code gen are the lesser evil.
> 
> This function no, but every other place touching FS and GS is
> performance critical.  They're all messy to start with, and get worse
> under FRED.

Is there any (further) bad effect to the function here by the time all of the
FRED bits are in?

Jan