[PATCH] x86/mm: Skip global ASID broadcast TLB flush when PCID is disabled
Posted by Tom Lendacky 4 weeks, 1 day ago
Booting with "nopcid" clears X86_FEATURE_PCID in x86_nopcid_setup(),
but X86_FEATURE_INVLPGB is left intact. On AMD CPUs that support
INVLPGB, broadcast TLB flushing remains active even though CR4.PCIDE
is off.

Two checks decide whether the global ASID code runs, one in
mm_global_asid() and one in consider_global_asid(), and both key off
X86_FEATURE_INVLPGB alone. Once an mm becomes active on more than
three CPUs, consider_global_asid() assigns it a global ASID, after which
flush_tlb_mm_range() takes the broadcast_tlb_flush() path. The
broadcast_tlb_flush() function calls invlpgb_flush_single_pcid_nosync() or
invlpgb_flush_user_nr_nosync() with kern_pcid(asid), which expands to
(asid + 1).  Both helpers set INVLPGB_FLAG_PCID and place the non-zero
PCID into EDX[27:16] of the INVLPGB instruction. Issuing INVLPGB with
INVLPGB_FLAG_PCID (RAX[1]) and a non-zero PCID (EDX[27:16]) while
CR4.PCIDE is not set results in a #GP:

 Oops: general protection fault, kernel NULL pointer dereference 0x1: 0000 [#1] SMP NOPTI
 CPU: 158 UID: 0 PID: 3119 Comm: snap Not tainted 7.1.0-rc3 #1 PREEMPT(full)
 Hardware name: ...
 RIP: 0010:broadcast_tlb_flush+0x9d/0x210
 Code: ... 89 da 48 83 c8 07 <0f> 01 fe eb 08 cc cc cc ...
 RSP: 0000:ffa0000031217cb0 EFLAGS: 00010202
 RAX: 00007f8ee8540007 RBX: 0000000008070000 RCX: 0000000000000000
 RDX: 0000000000070000 RSI: 0000000000000001 RDI: 00007f8ee8540000
 RBP: ff110001189a0700 R08: 00007f8ee8541000 R09: 000000000000000c
 R10: 000000000000000c R11: 0000000000070000 R12: ff1100c0433aec80
 R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000007
 FS:  00007f8ed3fff6c0(0000) GS:ff1100c0b7b8a000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f8ee8540058 CR3: 00080001433af000 CR4: 0000000000751ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  flush_tlb_mm_range+0xd8/0x220
  ptep_clear_flush+0x56/0x60
  wp_page_copy+0x293/0x6c0
  ? _raw_spin_unlock+0x15/0x30
  __handle_mm_fault+0x512/0x6d0
  handle_mm_fault+0x145/0x2a0
  do_user_addr_fault+0x187/0x6f0
  exc_page_fault+0x6a/0x180
  asm_exc_page_fault+0x22/0x30
 RIP: 0033:0x55c553ebc639
 Code: ...
 RSP: 002b:00007f8ed3ffeaa8 EFLAGS: 00010246
 RAX: 00007f8ee8540058 RBX: 000055c554ff77b8 RCX: 00007f8ee853c1c8
 RDX: 0000000000000000 RSI: 000000c000062b20 RDI: 00007f8ee8540040
 RBP: 00007f8ed3ffeb00 R08: 0000000000006040 R09: 0000000000000005
 R10: 0000000000000004 R11: 00000000000001c0 R12: 0000000000000001
 R13: 0000005990100204 R14: 000000c000624fc0 R15: 3fffffffffffffff
  </TASK>
 Modules linked in: ...
 ---[ end trace 0000000000000000 ]---

Add an X86_FEATURE_PCID check to both mm_global_asid() and
consider_global_asid() so that the global ASID code is skipped when PCID
is disabled.  No global ASID is ever allocated under "nopcid",
mm_global_asid() always returns 0, and broadcast_tlb_flush() is not
reachable. Per-mm flushes fall back to the IPI-based path.  The remaining
INVLPGB users all pass a PCID value of zero without setting the
INVLPGB_FLAG_PCID flag and continue to use the broadcast TLB support.

Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes")
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/tlbflush.h | 3 ++-
 arch/x86/mm/tlb.c               | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 0545fe75c3fa..c4a9ec78fe69 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -272,7 +272,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    !cpu_feature_enabled(X86_FEATURE_PCID))
 		return 0;
 
 	asid = smp_load_acquire(&mm->context.global_asid);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index af43d177087e..e2ace9f056c9 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -445,7 +445,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    !cpu_feature_enabled(X86_FEATURE_PCID))
 		return;
 
 	/* Check every once in a while. */

base-commit: 1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524
-- 
2.51.1
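
For readers following the register dump in the oops above, here is a minimal
userspace sketch that decodes the INVLPGB operands and the CR4 value from
that trace. It is an illustration, not kernel code. The names
decode_invlpgb(), would_gp() and struct invlpgb_args are hypothetical. The
PCID flag position (RAX[1]) comes from the commit message, CR4.PCIDE being
bit 17 is architectural, and the meanings of RAX[0] and RAX[2], the 12-bit
flag area, and the ASID living in EDX[15:0] are my reading of the APM, so
treat them as assumptions. The fault condition follows the APM wording quoted
later in this thread (PCIDE clear and a non-zero PCID field).

```c
#include <stdint.h>

/* CR4.PCIDE is bit 17 of CR4. */
#define CR4_PCIDE          (1ULL << 17)

/* RAX[1] per the commit message; RAX[0] and RAX[2] are assumptions. */
#define INVLPGB_FLAG_VA    (1ULL << 0)
#define INVLPGB_FLAG_PCID  (1ULL << 1)
#define INVLPGB_FLAG_ASID  (1ULL << 2)

/* Hypothetical container for the decoded INVLPGB operands. */
struct invlpgb_args {
	uint64_t addr;   /* page-aligned virtual address from RAX */
	unsigned flags;  /* low 12 bits of RAX */
	unsigned asid;   /* EDX[15:0] (assumed layout) */
	unsigned pcid;   /* EDX[27:16] */
};

/* Split the RAX/RDX values seen in an oops into their fields. */
static struct invlpgb_args decode_invlpgb(uint64_t rax, uint64_t rdx)
{
	struct invlpgb_args a;

	a.addr  = rax & ~0xfffULL;
	a.flags = (unsigned)(rax & 0xfff);
	a.asid  = (unsigned)(rdx & 0xffff);
	a.pcid  = (unsigned)((rdx >> 16) & 0xfff);
	return a;
}

/* #GP condition as quoted from the APM: PCIDE clear, PCID field non-zero. */
static int would_gp(uint64_t cr4, const struct invlpgb_args *a)
{
	return !(cr4 & CR4_PCIDE) && a->pcid != 0;
}
```

Feeding in RAX=0x00007f8ee8540007, RDX=0x0000000000070000 and
CR4=0x0000000000751ef0 from the trace gives address 0x7f8ee8540000, flag
bits 0-2 set, ASID 0 and PCID 7, with CR4 bit 17 clear, so would_gp()
returns 1. With kern_pcid(asid) being (asid + 1) as stated above, PCID 7
would correspond to ASID 6.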
Re: [PATCH] x86/mm: Skip global ASID broadcast TLB flush when PCID is disabled
Posted by Dave Hansen 4 weeks, 1 day ago
On 5/13/26 11:11, Tom Lendacky wrote:
> Booting with "nopcid" clears X86_FEATURE_PCID in x86_nopcid_setup(),
> but X86_FEATURE_INVLPGB is left intact. On AMD CPUs that support
> INVLPGB, broadcast TLB flushing remains active even though CR4.PCIDE
> is off.
> 
> There are two checks that decide whether the global ASID code runs,
> mm_global_asid() and consider_global_asid(), that key off of the
> X86_FEATURE_INVLPGB feature. Once an mm becomes active on more than
> three CPUs, consider_global_asid() assigns it a global ASID, after which
> flush_tlb_mm_range() takes the broadcast_tlb_flush() path. The
> broadcast_tlb_flush() function calls invlpgb_flush_single_pcid_nosync() or
> invlpgb_flush_user_nr_nosync() with kern_pcid(asid), which expands to
> (asid + 1).  Both helpers set INVLPGB_FLAG_PCID and place the non-zero
> PCID into EDX[31:16] of the INVLPGB instruction. Issuing INVLPGB with
> INVLPGB_FLAG_PCID (RAX[1]) and a non-zero PCID (RDX[27:16]) while
> CR4.PCIDE is not set results in a #GP:

Just to be clear, the APM says there's a #GP if: "CR4.PCID =0 and
EDX[PCID] is not zero." I don't think INVLPGB_FLAG_PCID (RAX[1]) is
supposed to play a role.

Although the architecture doesn't care, it's also at least a _little_
funky to be setting INVLPGB_FLAG_PCID here:

        u8 flags = INVLPGB_FLAG_PCID | INVLPGB_FLAG_VA;

when PCIDs are disabled.

How about we add a:

static const struct cpuid_dep cpuid_deps[] = {
...
        { X86_FEATURE_INVLPGB,                 X86_FEATURE_PCID   },

and be done with it?

If you don't have PCIDs, you don't get INVLPGB at all. Yes, you could
theoretically use INVLPGB for kernel mappings without PCIDs. But all
actual CPUs that have INVLPGB presumably also have PCID support.

Then we never have to worry about cases where INVLPGB==1, but PCID==0 at
*all*.
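
To make the effect of such a dependency entry concrete, here is a small
userspace model of dependency propagation. It is only a sketch of the idea
and not the kernel's cpuid-deps implementation: the enum values, struct dep,
clear_feature() and uses_global_asid() are all hypothetical stand-ins. The
point it illustrates is that once INVLPGB is declared to depend on PCID,
clearing PCID (as "nopcid" does) also clears INVLPGB, so a check on INVLPGB
alone is enough to keep the global ASID path off.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for X86_FEATURE_PCID and X86_FEATURE_INVLPGB. */
enum feat { FEAT_PCID, FEAT_INVLPGB, FEAT_COUNT };

/* One entry means: 'feature' cannot be enabled unless 'depends' is. */
struct dep {
	enum feat feature;
	enum feat depends;
};

static const struct dep deps[] = {
	{ FEAT_INVLPGB, FEAT_PCID },
};

/* Clear a feature, then keep clearing anything that depended on it. */
static void clear_feature(bool enabled[FEAT_COUNT], enum feat f)
{
	bool changed;

	enabled[f] = false;
	do {
		changed = false;
		for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			if (!enabled[deps[i].depends] && enabled[deps[i].feature]) {
				enabled[deps[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}

/* Models the existing INVLPGB-only gate in front of the global ASID code. */
static bool uses_global_asid(const bool enabled[FEAT_COUNT])
{
	return enabled[FEAT_INVLPGB];
}
```

Starting with both features enabled and calling clear_feature(en, FEAT_PCID)
leaves both cleared, so uses_global_asid(en) is false without any extra PCID
check at the call sites.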
Re: [PATCH] x86/mm: Skip global ASID broadcast TLB flush when PCID is disabled
Posted by Tom Lendacky 4 weeks ago
On 5/13/26 13:29, Dave Hansen wrote:
> On 5/13/26 11:11, Tom Lendacky wrote:
>> Booting with "nopcid" clears X86_FEATURE_PCID in x86_nopcid_setup(),
>> but X86_FEATURE_INVLPGB is left intact. On AMD CPUs that support
>> INVLPGB, broadcast TLB flushing remains active even though CR4.PCIDE
>> is off.
>>
>> There are two checks that decide whether the global ASID code runs,
>> mm_global_asid() and consider_global_asid(), that key off of the
>> X86_FEATURE_INVLPGB feature. Once an mm becomes active on more than
>> three CPUs, consider_global_asid() assigns it a global ASID, after which
>> flush_tlb_mm_range() takes the broadcast_tlb_flush() path. The
>> broadcast_tlb_flush() function calls invlpgb_flush_single_pcid_nosync() or
>> invlpgb_flush_user_nr_nosync() with kern_pcid(asid), which expands to
>> (asid + 1).  Both helpers set INVLPGB_FLAG_PCID and place the non-zero
>> PCID into EDX[31:16] of the INVLPGB instruction. Issuing INVLPGB with
>> INVLPGB_FLAG_PCID (RAX[1]) and a non-zero PCID (RDX[27:16]) while
>> CR4.PCIDE is not set results in a #GP:
> 
> Just to be clear, the APM says there's a #GP if: "CR4.PCID =0 and
> EDX[PCID] is not zero." I don't think INVLPGB_FLAG_PCID (RAX[1]) is
> supposed to play a role.
> 
> Although the architecture doesn't care, it's also at least a _little_
> funky to be setting INVLPGB_FLAG_PCID here:
> 
>         u8 flags = INVLPGB_FLAG_PCID | INVLPGB_FLAG_VA;
> 
> when PCIDs are disabled.
> 
> How about we add a:
> 
> static const struct cpuid_dep cpuid_deps[] = {
> ...
>         { X86_FEATURE_INVLPGB,                 X86_FEATURE_PCID   },
> 
> and be done with it?
> 
> If you don't have PCIDs, you don't get INVLPGB at all. Yes, you could
> theoretically use INVLPGB for kernel mappings without PCIDs. But all
> actual CPUs that have INVLPGB presumably also have PCID support.
> 
> Then we never have to worry about cases where INVLPGB==1, but PCID==0 at
> *all*.

That's ok with me. Tested and confirmed working. I can submit a v2 if
everyone is ok with this approach. I'll give it a couple more days...

Thanks,
Tom
Re: [PATCH] x86/mm: Skip global ASID broadcast TLB flush when PCID is disabled
Posted by Tom Lendacky 4 weeks, 1 day ago
On 5/13/26 13:11, Tom Lendacky wrote:
> Booting with "nopcid" clears X86_FEATURE_PCID in x86_nopcid_setup(),
> but X86_FEATURE_INVLPGB is left intact. On AMD CPUs that support
> INVLPGB, broadcast TLB flushing remains active even though CR4.PCIDE
> is off.
> 
> There are two checks that decide whether the global ASID code runs,
> mm_global_asid() and consider_global_asid(), that key off of the
> X86_FEATURE_INVLPGB feature. Once an mm becomes active on more than
> three CPUs, consider_global_asid() assigns it a global ASID, after which
> flush_tlb_mm_range() takes the broadcast_tlb_flush() path. The
> broadcast_tlb_flush() function calls invlpgb_flush_single_pcid_nosync() or
> invlpgb_flush_user_nr_nosync() with kern_pcid(asid), which expands to
> (asid + 1).  Both helpers set INVLPGB_FLAG_PCID and place the non-zero
> PCID into EDX[31:16] of the INVLPGB instruction. Issuing INVLPGB with
> INVLPGB_FLAG_PCID (RAX[1]) and a non-zero PCID (RDX[27:16]) while
> CR4.PCIDE is not set results in a #GP:
> 
>  Oops: general protection fault, kernel NULL pointer dereference 0x1: 0000 [#1] SMP NOPTI
>  CPU: 158 UID: 0 PID: 3119 Comm: snap Not tainted 7.1.0-rc3 #1 PREEMPT(full)
>  Hardware name: ...
>  RIP: 0010:broadcast_tlb_flush+0x9d/0x210
>  Code: ... 89 da 48 83 c8 07 <0f> 01 fe eb 08 cc cc cc ...
>  RSP: 0000:ffa0000031217cb0 EFLAGS: 00010202
>  RAX: 00007f8ee8540007 RBX: 0000000008070000 RCX: 0000000000000000
>  RDX: 0000000000070000 RSI: 0000000000000001 RDI: 00007f8ee8540000
>  RBP: ff110001189a0700 R08: 00007f8ee8541000 R09: 000000000000000c
>  R10: 000000000000000c R11: 0000000000070000 R12: ff1100c0433aec80
>  R13: 0000000000000000 R14: 0000000000000008 R15: 0000000000000007
>  FS:  00007f8ed3fff6c0(0000) GS:ff1100c0b7b8a000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f8ee8540058 CR3: 00080001433af000 CR4: 0000000000751ef0
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   flush_tlb_mm_range+0xd8/0x220
>   ptep_clear_flush+0x56/0x60
>   wp_page_copy+0x293/0x6c0
>   ? _raw_spin_unlock+0x15/0x30
>   __handle_mm_fault+0x512/0x6d0
>   handle_mm_fault+0x145/0x2a0
>   do_user_addr_fault+0x187/0x6f0
>   exc_page_fault+0x6a/0x180
>   asm_exc_page_fault+0x22/0x30
>  RIP: 0033:0x55c553ebc639
>  Code: ...
>  RSP: 002b:00007f8ed3ffeaa8 EFLAGS: 00010246
>  RAX: 00007f8ee8540058 RBX: 000055c554ff77b8 RCX: 00007f8ee853c1c8
>  RDX: 0000000000000000 RSI: 000000c000062b20 RDI: 00007f8ee8540040
>  RBP: 00007f8ed3ffeb00 R08: 0000000000006040 R09: 0000000000000005
>  R10: 0000000000000004 R11: 00000000000001c0 R12: 0000000000000001
>  R13: 0000005990100204 R14: 000000c000624fc0 R15: 3fffffffffffffff
>   </TASK>
>  Modules linked in: ...
>  ---[ end trace 0000000000000000 ]---
> 
> Add an X86_FEATURE_PCID check to both mm_global_asid() and
> consider_global_asid() so that the global ASID code is skipped when PCID
> is disabled.  No global ASID is ever allocated under "nopcid",
> mm_global_asid() always returns 0, and broadcast_tlb_flush() is not
> reachable. Per-mm flushes fall back to the IPI-based path.  The remaining
> INVLPGB users all pass a PCID value of zero without setting the
> INVLPGB_FLAG_PCID flag and continue to use the broadcast TLB support.

I'm not an expert in the TLB-related support, so I'd appreciate it if a
bunch of eyes could be put on this to verify this is the proper approach.

Thanks!
Tom

> 
> Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes")
> Assisted-by: Claude:claude-opus-4.7
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/tlbflush.h | 3 ++-
>  arch/x86/mm/tlb.c               | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 0545fe75c3fa..c4a9ec78fe69 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -272,7 +272,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
>  {
>  	u16 asid;
>  
> -	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
> +	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
> +	    !cpu_feature_enabled(X86_FEATURE_PCID))
>  		return 0;
>  
>  	asid = smp_load_acquire(&mm->context.global_asid);
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index af43d177087e..e2ace9f056c9 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -445,7 +445,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
>   */
>  static void consider_global_asid(struct mm_struct *mm)
>  {
> -	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
> +	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
> +	    !cpu_feature_enabled(X86_FEATURE_PCID))
>  		return;
>  
>  	/* Check every once in a while. */
> 
> base-commit: 1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524