x86/cpu: Cleanup for NX adjustments

[PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Andrew Cooper 1 week, 4 days ago

All 64-bit CPUs have CLFLUSH.  AMD introduced it in the K8, Intel in the P4
Willamette core prior to 64-bit support, and VIA/Centaur in the Isaiah.
Furthermore, the reported cacheline size is 64 on all CPUs to date.

Arguably changeset 19435c10abf7 ("x86: consolidate/enhance TLB flushing
interface", 2007) should have initialised c->x86_clflush_size earlier, but
even at the time of changeset 3330013e6739 ("VT-d / x86: re-arrange cache
syncing", 2022), early_cpu_init() had CLFLUSH-parsing logic but simply failed
to record the size.

By removing get_cache_line_size() and assuming 16 bytes, the practical
consequence for early IOMMU initialisation of SandyBridge era systems is to
flush every cacheline 4 times (a pipeline stall too, as those CPUs could only
have one flush in flight at a single time).

Record c->x86_clflush_size in early_cpu_init(), and panic() if CLFLUSH isn't
found.  Drop the redundant initialisation of c->x86_cache_alignment.

Remove the fallback to 16 bytes in cache_{flush,writeback}(), opting instead
for an ASSERT() to confirm that the logic hasn't been re-arranged too early.

Fixes: 3330013e6739 ("VT-d / x86: re-arrange cache syncing")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Julian Vetter <julian.vetter@vates.tech>
CC: Teddy Astie <teddy.astie@vates.tech>
---
 xen/arch/x86/cpu/common.c |  7 ++++---
 xen/arch/x86/flushtlb.c   | 19 +++++++------------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index ebe2baf8b98a..f8c80db6eb1d 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
 	uint64_t val;
 	u32 eax, ebx, ecx, edx;
 
-	c->x86_cache_alignment = 32;
-
 	/* Get vendor name */
 	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
 	*(u32 *)&c->x86_vendor_id[0] = ebx;
@@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
 	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
 		unsigned int size = ((ebx >> 8) & 0xff) * 8;
 
+		c->x86_clflush_size = size;
 		c->x86_cache_alignment = size;
 
 		/*
@@ -380,7 +379,9 @@ void __init early_cpu_init(bool verbose)
 		}
 		else
 			setup_clear_cpu_cap(X86_FEATURE_CLZERO);
-	}
+	} else
+		panic("CLFLUSH information not available\n");
+
 	/* Leaf 0x1 capabilities filled in early for Xen. */
 	c->x86_capability[FEATURESET_1d] = edx;
 	c->x86_capability[FEATURESET_1c] = ecx;
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 23721bb52c90..1f8877dcab23 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -234,8 +234,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
 
         if ( (!(flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL)) ||
               (flags & FLUSH_VA_VALID)) &&
-             c->x86_clflush_size && c->x86_cache_size && sz &&
-             ((sz >> 10) < c->x86_cache_size) )
+             c->x86_cache_size && sz && ((sz >> 10) < c->x86_cache_size) )
         {
             if ( flags & FLUSH_CACHE_EVICT )
                 cache_flush(va, sz);
@@ -264,13 +263,11 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
  */
 void cache_flush(const void *addr, unsigned int size)
 {
-    /*
-     * This function may be called before current_cpu_data is established.
-     * Hence a fallback is needed to prevent the loop below becoming infinite.
-     */
-    unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16;
+    unsigned int clflush_size = current_cpu_data.x86_clflush_size;
     const void *end = addr + size;
 
+    ASSERT(clflush_size);
+
     alternative("", "mfence", X86_BUG_CLFLUSH_MFENCE);
 
     addr -= (unsigned long)addr & (clflush_size - 1);
@@ -301,11 +298,9 @@ void cache_writeback(const void *addr, unsigned int size)
     if ( !boot_cpu_has(X86_FEATURE_CLWB) )
         return cache_flush(addr, size);
 
-    /*
-     * This function may be called before current_cpu_data is established.
-     * Hence a fallback is needed to prevent the loop below becoming infinite.
-     */
-    clflush_size = current_cpu_data.x86_clflush_size ?: 16;
+    clflush_size = current_cpu_data.x86_clflush_size;
+    ASSERT(clflush_size);
+
     addr -= (unsigned long)addr & (clflush_size - 1);
     for ( ; addr < end; addr += clflush_size )
         clwb(addr);
-- 
2.39.5

Re: [PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Jan Beulich 1 week, 3 days ago

On 26.01.2026 18:53, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
>  	uint64_t val;
>  	u32 eax, ebx, ecx, edx;
>  
> -	c->x86_cache_alignment = 32;
> -
>  	/* Get vendor name */
>  	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
>  	*(u32 *)&c->x86_vendor_id[0] = ebx;
> @@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
>  	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
>  		unsigned int size = ((ebx >> 8) & 0xff) * 8;
>  
> +		c->x86_clflush_size = size;
>  		c->x86_cache_alignment = size;

With this change, can't the writing of the field in generic_identify()
go away? CPU_DATA_INIT() in particular doesn't invalidate it. Perferably
with that dropped (unless of course there is a reason not to):
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Tangentially, "cpuid=no-clflush" didn't have any effect on any of this so
far, and also isn't going to have with the changes you make.

Jan

Re: [PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Andrew Cooper 1 week, 3 days ago

On 27/01/2026 10:37 am, Jan Beulich wrote:
> On 26.01.2026 18:53, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
>>  	uint64_t val;
>>  	u32 eax, ebx, ecx, edx;
>>  
>> -	c->x86_cache_alignment = 32;
>> -
>>  	/* Get vendor name */
>>  	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
>>  	*(u32 *)&c->x86_vendor_id[0] = ebx;
>> @@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
>>  	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
>>  		unsigned int size = ((ebx >> 8) & 0xff) * 8;
>>  
>> +		c->x86_clflush_size = size;
>>  		c->x86_cache_alignment = size;
> With this change, can't the writing of the field in generic_identify()
> go away? CPU_DATA_INIT() in particular doesn't invalidate it.

No, it can't.  The value needs setting up on every AP, right now at least.

>  Perferably
> with that dropped (unless of course there is a reason not to):
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Hopefully this is a good enough reason.  I know we agreed to make it a
single global, but that's future work.

>
> Tangentially, "cpuid=no-clflush" didn't have any effect on any of this so
> far, and also isn't going to have with the changes you make.

The line immediately out of context above will applies the clear cap
mask, so will cause cpuid=no-clflush to take effect.

~Andrew

Re: [PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Jan Beulich 1 week, 3 days ago

On 27.01.2026 12:08, Andrew Cooper wrote:
> On 27/01/2026 10:37 am, Jan Beulich wrote:
>> On 26.01.2026 18:53, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu/common.c
>>> +++ b/xen/arch/x86/cpu/common.c
>>> @@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
>>>  	uint64_t val;
>>>  	u32 eax, ebx, ecx, edx;
>>>  
>>> -	c->x86_cache_alignment = 32;
>>> -
>>>  	/* Get vendor name */
>>>  	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
>>>  	*(u32 *)&c->x86_vendor_id[0] = ebx;
>>> @@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
>>>  	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
>>>  		unsigned int size = ((ebx >> 8) & 0xff) * 8;
>>>  
>>> +		c->x86_clflush_size = size;
>>>  		c->x86_cache_alignment = size;
>> With this change, can't the writing of the field in generic_identify()
>> go away? CPU_DATA_INIT() in particular doesn't invalidate it.
> 
> No, it can't.  The value needs setting up on every AP, right now at least.

Are you sure? APs inherit part of the BSP's data (initialize_cpu_data()),
and reset_cpuinfo() doesn't clear ->x86_clflush_size afaics.

>> Tangentially, "cpuid=no-clflush" didn't have any effect on any of this so
>> far, and also isn't going to have with the changes you make.
> 
> The line immediately out of context above will applies the clear cap
> mask, so will cause cpuid=no-clflush to take effect.

This concerns me. With your change, "cpuid=no-clflush" will lead to an
unconditional panic() then. Whereas previously, with cleared_caps[] being
applied by identify_cpu() only after generic_identify() has already
evaluated the CLFLUSH bit, there was no effect at all.

I don't think this panic()ing is desirable, but as an absolute minimum this
(drastic) change in behavior would want calling out in the description.

Further, if the panic() was to stay, there's no point having cpu_has_clflush
evaluate to anything other than constant true anymore.

Again tangentially (and partly the reason why I overlooked that aspect
originally): While early_cpu_init() respects cleared_caps[] for leaf 1, it
doesn't for any of leaf 7's subleaves, nor for ARCH_CAPS.

Jan

Re: [PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Andrew Cooper 1 week, 3 days ago

On 27/01/2026 11:35 am, Jan Beulich wrote:
> On 27.01.2026 12:08, Andrew Cooper wrote:
>> On 27/01/2026 10:37 am, Jan Beulich wrote:
>>> On 26.01.2026 18:53, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/cpu/common.c
>>>> +++ b/xen/arch/x86/cpu/common.c
>>>> @@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
>>>>  	uint64_t val;
>>>>  	u32 eax, ebx, ecx, edx;
>>>>  
>>>> -	c->x86_cache_alignment = 32;
>>>> -
>>>>  	/* Get vendor name */
>>>>  	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
>>>>  	*(u32 *)&c->x86_vendor_id[0] = ebx;
>>>> @@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
>>>>  	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
>>>>  		unsigned int size = ((ebx >> 8) & 0xff) * 8;
>>>>  
>>>> +		c->x86_clflush_size = size;
>>>>  		c->x86_cache_alignment = size;
>>> With this change, can't the writing of the field in generic_identify()
>>> go away? CPU_DATA_INIT() in particular doesn't invalidate it.
>> No, it can't.  The value needs setting up on every AP, right now at least.
> Are you sure? APs inherit part of the BSP's data (initialize_cpu_data()),
> and reset_cpuinfo() doesn't clear ->x86_clflush_size afaics.

Every time I look at that, it gets more insane.

For every CPU, initialize_cpu_data() clobbers boot_cpu_data, *then*
copies the result into cpu_data[] array.

This cannot possibly be correct.  Why on earth did I ack it?

>>> Tangentially, "cpuid=no-clflush" didn't have any effect on any of this so
>>> far, and also isn't going to have with the changes you make.
>> The line immediately out of context above will applies the clear cap
>> mask, so will cause cpuid=no-clflush to take effect.
> This concerns me. With your change, "cpuid=no-clflush" will lead to an
> unconditional panic() then.

So will no-cmpxchg8b.  The only reason no-lm won't is because that's
evaluated before parsing the cmdline.

> Whereas previously, with cleared_caps[] being
> applied by identify_cpu() only after generic_identify() has already
> evaluated the CLFLUSH bit, there was no effect at all.

That wasn't no effect.  The effect (upon request of an impossible thing)
would be that part of Xen would have ignored the request and functioned,
but another part of Xen would have propagated that to guests, which will
probably have equally rude things to say.

> I don't think this panic()ing is desirable, but as an absolute minimum this
> (drastic) change in behavior would want calling out in the description.
>
> Further, if the panic() was to stay, there's no point having cpu_has_clflush
> evaluate to anything other than constant true anymore.

I'm not overly interested in users complaining about a panic() if they
ask for an impossible thing.  Better that than the prior behaviour we had.

Talking of other impossible things, cpuid=no-$foo does nothing for
FPU/DE/PSE/PGE or MMX which are the features hard wired to 1 already,
and with 0 users in the tree.

Again, Xen will assume the safe thing, but pass the impossible request
on to guests.

> Again tangentially (and partly the reason why I overlooked that aspect
> originally): While early_cpu_init() respects cleared_caps[] for leaf 1, it
> doesn't for any of leaf 7's subleaves, nor for ARCH_CAPS.

No idea.  I stopped doing archaeology on all the wrong-looking things I
found, because there are just too many of them.

~Andrew

Re: [PATCH 01/16] x86/cpu: Fix boot time cache flushing

Posted by Jan Beulich 1 week, 2 days ago

On 27.01.2026 18:53, Andrew Cooper wrote:
> On 27/01/2026 11:35 am, Jan Beulich wrote:
>> On 27.01.2026 12:08, Andrew Cooper wrote:
>>> On 27/01/2026 10:37 am, Jan Beulich wrote:
>>>> On 26.01.2026 18:53, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/cpu/common.c
>>>>> +++ b/xen/arch/x86/cpu/common.c
>>>>> @@ -319,8 +319,6 @@ void __init early_cpu_init(bool verbose)
>>>>>  	uint64_t val;
>>>>>  	u32 eax, ebx, ecx, edx;
>>>>>  
>>>>> -	c->x86_cache_alignment = 32;
>>>>> -
>>>>>  	/* Get vendor name */
>>>>>  	cpuid(0x00000000, &c->cpuid_level, &ebx, &ecx, &edx);
>>>>>  	*(u32 *)&c->x86_vendor_id[0] = ebx;
>>>>> @@ -352,6 +350,7 @@ void __init early_cpu_init(bool verbose)
>>>>>  	if (edx & cpufeat_mask(X86_FEATURE_CLFLUSH)) {
>>>>>  		unsigned int size = ((ebx >> 8) & 0xff) * 8;
>>>>>  
>>>>> +		c->x86_clflush_size = size;
>>>>>  		c->x86_cache_alignment = size;
>>>> With this change, can't the writing of the field in generic_identify()
>>>> go away? CPU_DATA_INIT() in particular doesn't invalidate it.
>>> No, it can't.  The value needs setting up on every AP, right now at least.
>> Are you sure? APs inherit part of the BSP's data (initialize_cpu_data()),
>> and reset_cpuinfo() doesn't clear ->x86_clflush_size afaics.
> 
> Every time I look at that, it gets more insane.
> 
> For every CPU, initialize_cpu_data() clobbers boot_cpu_data, *then*
> copies the result into cpu_data[] array.
> 
> This cannot possibly be correct.  Why on earth did I ack it?

I wonder what you're looking at. My initialize_cpu_data() has

    struct cpuinfo_x86 c = boot_cpu_data;

which means a copy is being made, the address of which is then handed
to reset_cpuinfo().

>>>> Tangentially, "cpuid=no-clflush" didn't have any effect on any of this so
>>>> far, and also isn't going to have with the changes you make.
>>> The line immediately out of context above will applies the clear cap
>>> mask, so will cause cpuid=no-clflush to take effect.
>> This concerns me. With your change, "cpuid=no-clflush" will lead to an
>> unconditional panic() then.
> 
> So will no-cmpxchg8b.

Which doesn't make the situation any better. (I think you mean no-cmpxchg16b
though?)

>> Whereas previously, with cleared_caps[] being
>> applied by identify_cpu() only after generic_identify() has already
>> evaluated the CLFLUSH bit, there was no effect at all.
> 
> That wasn't no effect.  The effect (upon request of an impossible thing)
> would be that part of Xen would have ignored the request and functioned,
> but another part of Xen would have propagated that to guests, which will
> probably have equally rude things to say.

Well, I thought it was clear from context that I meant "no effect for Xen
itself". As to guests - as long as they're properly checking CPUID bits
and refrain from using insns which CPUID says aren't available, I don't
see why they should get upset.

When knowing one may run virtualized, the concept of "I know one feature
(e.g. LM) implies another (e.g. CLFLUSH)" is flawed. Any combination of
features can be surfaced, so long as true dependencies between them are
respected. IOW I disagree with "cpuid=no-clflush" requesting an impossible
thing. "cpuid=no-lm", otoh, does for a 64-bit target environment.

>> I don't think this panic()ing is desirable, but as an absolute minimum this
>> (drastic) change in behavior would want calling out in the description.
>>
>> Further, if the panic() was to stay, there's no point having cpu_has_clflush
>> evaluate to anything other than constant true anymore.
> 
> I'm not overly interested in users complaining about a panic() if they
> ask for an impossible thing.  Better that than the prior behaviour we had.
> 
> Talking of other impossible things, cpuid=no-$foo does nothing for
> FPU/DE/PSE/PGE or MMX which are the features hard wired to 1 already,
> and with 0 users in the tree.

Indeed, and while there is that "Currently accepted:" section in the doc,
I can't help thinking that even for the speculation control aspect that
it explicitly names it has already gone stale. Yes, in the past we said
we'd mean to not support use of arbitrary forms of this option, yet

"Unless otherwise noted, options only have any effect in their negative form,
 to hide the named feature(s).  Ignoring a feature using this mechanism will
 cause Xen not to use the feature, nor offer them as usable to guests."

to me really says otherwise. Even if intended to be thus restricted, it
would then feel rather odd that we implement support for an option with
hundreds of sub-options, out of which only a handful are supposed to be
possibly used.

On concrete example where a presently not explicitly permitted form
could be useful to people is "no-rdseed" on AMD hardware affected by
one of the two known issues (patches sadly still only pending). This
viable mitigation would be unsupported by your implied interpretation.

Jan