[PATCH v3 05/22] x86/fsgsbase: Improve code generation in read_registers()

Andrew Cooper posted 22 patches 4 months, 1 week ago
[PATCH v3 05/22] x86/fsgsbase: Improve code generation in read_registers()
Posted by Andrew Cooper 4 months, 1 week ago
It turns out that using the higher level helpers adjacent like this leads to
terrible code generation.  Due to -fno-strict-alising, the store into state->
invalidates the read_cr4() address calculation (which is really cpu_info->cr4
under the hood), meaning that it can't be hoisted.

As a result we get "locate the top of stack block, get cr4, and see if
FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.

Hoist the calculation manually, which results in two basic blocks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Side-by-side disassembly: https://termbin.com/9xfq
---
 xen/arch/x86/traps.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 0c5393cb2166..545c42a10862 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -118,9 +118,18 @@ static void read_registers(struct extra_state *state)
     state->cr3 = read_cr3();
     state->cr4 = read_cr4();
 
-    state->fsb = read_fs_base();
-    state->gsb = read_gs_base();
-    state->gss = read_gs_shadow();
+    if ( state->cr4 & X86_CR4_FSGSBASE )
+    {
+        state->fsb = __rdfsbase();
+        state->gsb = __rdgsbase();
+        state->gss = __rdgs_shadow();
+    }
+    else
+    {
+        state->fsb = rdmsr(MSR_FS_BASE);
+        state->gsb = rdmsr(MSR_GS_BASE);
+        state->gss = rdmsr(MSR_SHADOW_GS_BASE);
+    }
 
     asm ( "mov %%ds, %0" : "=m" (state->ds) );
     asm ( "mov %%es, %0" : "=m" (state->es) );
-- 
2.39.5


Re: [PATCH v3 05/22] x86/fsgsbase: Improve code generation in read_registers()
Posted by Jan Beulich 4 months ago
On 04.10.2025 00:53, Andrew Cooper wrote:
> It turns out that using the higher level helpers adjacent like this leads to
> terrible code generation.  Due to -fno-strict-alising, the store into state->
> invalidates the read_cr4() address calculation (which is really cpu_info->cr4
> under the hood), meaning that it can't be hoisted.
> 
> As a result we get "locate the top of stack block, get cr4, and see if
> FSGSBASE is set" repeated 3 times, and an unreasoanble number of basic blocks.
> 
> Hoist the calculation manually, which results in two basic blocks.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
albeit preferably with ...

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -118,9 +118,18 @@ static void read_registers(struct extra_state *state)
>      state->cr3 = read_cr3();
>      state->cr4 = read_cr4();
>  
> -    state->fsb = read_fs_base();
> -    state->gsb = read_gs_base();
> -    state->gss = read_gs_shadow();
> +    if ( state->cr4 & X86_CR4_FSGSBASE )
> +    {
> +        state->fsb = __rdfsbase();
> +        state->gsb = __rdgsbase();
> +        state->gss = __rdgs_shadow();
> +    }
> +    else
> +    {
> +        state->fsb = rdmsr(MSR_FS_BASE);
> +        state->gsb = rdmsr(MSR_GS_BASE);
> +        state->gss = rdmsr(MSR_SHADOW_GS_BASE);
> +    }

... a brief comment added towards the deliberate open-coding.

Jan
Re: [PATCH v3 05/22] x86/fsgsbase: Improve code generation in read_registers()
Posted by Demi Marie Obenour 4 months, 1 week ago
On 10/3/25 18:53, Andrew Cooper wrote:
> It turns out that using the higher level helpers adjacent like this leads to
> terrible code generation.  Due to -fno-strict-alising, the store into state->
> invalidates the read_cr4() address calculation (which is really cpu_info->cr4
> under the hood), meaning that it can't be hoisted.

It would be nice if compilers could warn when they
could not optimize because of something like this.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)