[PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing

Rik van Riel posted 14 patches 9 months, 3 weeks ago
There is a newer version of this series
[PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
In the page reclaim code, we only track the CPU(s) where the TLB needs
to be flushed, rather than all the individual mappings that may be getting
invalidated.

Use broadcast TLB flushing when that is available.

This is a temporary hack to ensure that the PCID context for
tasks in the next patch gets properly flushed from the page
reclaim code, because the IPI based flushing in arch_tlbbatch_flush
only flushes the currently loaded TLB context on each CPU.

Patch 10 replaces this with the actual mechanism used to do
broadcast TLB flushing from the page reclaim code.

Signed-off-by: Rik van Riel <riel@surriel.com>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Brendan Jackman <jackmanb@google.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/mm/tlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 2d7ed0fda61f..16839651f67f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1326,7 +1326,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+		invlpgb_flush_all_nonglobals();
+	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
-- 
2.47.1
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Borislav Petkov 9 months, 3 weeks ago
On Sun, Feb 23, 2025 at 02:48:56PM -0500, Rik van Riel wrote:
> In the page reclaim code, we only track the CPU(s) where the TLB needs
> to be flushed, rather than all the individual mappings that may be getting
> invalidated.
> 
> Use broadcast TLB flushing when that is available.
> 
> This is a temporary hack to ensure that the PCID context for
> tasks in the next patch gets properly flushed from the page

There's no "next patch" in git - just merge this three-liner with the next
one.

> reclaim code, because the IPI based flushing in arch_tlbbatch_flush
> only flushes the currently loaded TLB context on each CPU.
> 
> Patch 10 replaces this with the actual mechanism used to do

Same. No "patch 10" in git.

> broadcast TLB flushing from the page reclaim code.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Tested-by: Manali Shukla <Manali.Shukla@amd.com>
> Tested-by: Brendan Jackman <jackmanb@google.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  arch/x86/mm/tlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 2d7ed0fda61f..16839651f67f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1326,7 +1326,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>  	 * a local TLB flush is needed. Optimize this use-case by calling
>  	 * flush_tlb_func_local() directly in this case.
>  	 */
> -	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> +		invlpgb_flush_all_nonglobals();

I'm confused. The docs say rAX needs to be 0x6 to do "Invalidate all TLB
entries that match {ASID, PCID} excluding Global". But you're calling INVLPGB
with rAX==0 and the APM doesn't say what this does.

I'm guessing you want this to mean invalidate all non-globals for any ASID and
PCID. So I muss be missing the place in the docs where it says so...

Hmmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Mon, 2025-02-24 at 14:27 +0100, Borislav Petkov wrote:
> On Sun, Feb 23, 2025 at 02:48:56PM -0500, Rik van Riel wrote:
> > 
> > +++ b/arch/x86/mm/tlb.c
> > @@ -1326,7 +1326,9 @@ void arch_tlbbatch_flush(struct
> > arch_tlbflush_unmap_batch *batch)
> >  	 * a local TLB flush is needed. Optimize this use-case by
> > calling
> >  	 * flush_tlb_func_local() directly in this case.
> >  	 */
> > -	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> > +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> > +		invlpgb_flush_all_nonglobals();
> 
> I'm confused. The docs say rAX needs to be 0x6 to do "Invalidate all
> TLB
> entries that match {ASID, PCID} excluding Global". But you're calling
> INVLPGB
> with rAX==0 and the APM doesn't say what this does.
> 
> I'm guessing you want this to mean invalidate all non-globals for any
> ASID and
> PCID. So I muss be missing the place in the docs where it says so...

You are right that it does not explicitly say it.
However, it does strongly hint at it:

"The TLB control field is specified in rAX[5:0]. It determines
which components of the address (VA, PCID, ASID) are valid for
comparison in the TLB and whether to include global entries in 
the invalidation process.

...

rAX[3:0] provides for various types of invalidations. A few 
examples are listed in the following table, but all values 
are legal." 

This text, to me, suggests we can filter the TLB
invalidations by some combination of virtual address,
PCID, or ASID, or not filter by any of those keys,
and invalidate them all.

Who do we need to ask to confirm that reading?

-- 
All Rights Reversed.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Borislav Petkov 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
> Who do we need to ask to confirm that reading?

Lemme figure it out.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Borislav Petkov 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 09:38:03PM +0100, Borislav Petkov wrote:
> On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
> > Who do we need to ask to confirm that reading?
> 
> Lemme figure it out.

Confirmed - that really is the case.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Tom Lendacky 9 months, 3 weeks ago
On 2/25/25 15:03, Borislav Petkov wrote:
> On Tue, Feb 25, 2025 at 09:38:03PM +0100, Borislav Petkov wrote:
>> On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
>>> Who do we need to ask to confirm that reading?
>>
>> Lemme figure it out.
> 
> Confirmed - that really is the case.

Hmmm... since this is for host/hypervisor TLB handling, this makes me
think that we should always be setting the ASID valid bit with 0 for the
ASID value in EDX[15:0] on all INVLPGB instructions in this series.
Otherwise we will also be flushing any guest ASID TLB entries that match
the conditions, when that isn't intended.

Thanks,
Tom

> 
> Thx.
>
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Wed, 2025-02-26 at 11:00 -0600, Tom Lendacky wrote:
> On 2/25/25 15:03, Borislav Petkov wrote:
> > On Tue, Feb 25, 2025 at 09:38:03PM +0100, Borislav Petkov wrote:
> > > On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
> > > > Who do we need to ask to confirm that reading?
> > > 
> > > Lemme figure it out.
> > 
> > Confirmed - that really is the case.
> 
> Hmmm... since this is for host/hypervisor TLB handling, this makes me
> think that we should always be setting the ASID valid bit with 0 for
> the
> ASID value in EDX[15:0] on all INVLPGB instructions in this series.
> Otherwise we will also be flushing any guest ASID TLB entries that
> match
> the conditions, when that isn't intended.

The ASID is always zero in this series.

We only ever use the PCID for address space identification.

-- 
All Rights Reversed.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Tom Lendacky 9 months, 3 weeks ago
On 2/26/25 11:02, Rik van Riel wrote:
> On Wed, 2025-02-26 at 11:00 -0600, Tom Lendacky wrote:
>> On 2/25/25 15:03, Borislav Petkov wrote:
>>> On Tue, Feb 25, 2025 at 09:38:03PM +0100, Borislav Petkov wrote:
>>>> On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
>>>>> Who do we need to ask to confirm that reading?
>>>>
>>>> Lemme figure it out.
>>>
>>> Confirmed - that really is the case.
>>
>> Hmmm... since this is for host/hypervisor TLB handling, this makes me
>> think that we should always be setting the ASID valid bit with 0 for
>> the
>> ASID value in EDX[15:0] on all INVLPGB instructions in this series.
>> Otherwise we will also be flushing any guest ASID TLB entries that
>> match
>> the conditions, when that isn't intended.
> 
> The ASID is always zero in this series.
> 
> We only ever use the PCID for address space identification.

Right, but the ASID valid bit is not set, so the flushing may match more
than just host/hypervisor TLB entries.

Thanks,
Tom

>
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Wed, 2025-02-26 at 11:36 -0600, Tom Lendacky wrote:
> 
> Right, but the ASID valid bit is not set, so the flushing may match
> more
> than just host/hypervisor TLB entries.
> 
Good point, when using SVM these flushes could result
in flushing more TLB entries than we really want.

On the flip side, when SVM is not initialized, the
invlpgb instruction will fail with a general protection
fault if we have anything at all in the ASID field.

I don't know whether setting the ASID valid bit in
rAX will cause a system to crash when SVM is not
enabled or initialized.

-- 
All Rights Reversed.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Tom Lendacky 9 months, 3 weeks ago
On 2/26/25 11:46, Rik van Riel wrote:
> On Wed, 2025-02-26 at 11:36 -0600, Tom Lendacky wrote:
>>
>> Right, but the ASID valid bit is not set, so the flushing may match
>> more
>> than just host/hypervisor TLB entries.
>>
> Good point, when using SVM these flushes could result
> in flushing more TLB entries than we really want.
> 
> On the flip side, when SVM is not initialized, the
> invlpgb instruction will fail with a general protection
> fault if we have anything at all in the ASID field.
> 
> I don't know whether setting the ASID valid bit in
> rAX will cause a system to crash when SVM is not
> enabled or initialized.

As long as you keep the ASID value in EDX[15:0] as 0, then you won't
#GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.

Thanks,
Tom

>
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Wed, 2025-02-26 at 12:12 -0600, Tom Lendacky wrote:
> 
> As long as you keep the ASID value in EDX[15:0] as 0, then you won't
> #GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.
> 
I've been spending some time reading the KVM code,
and I don't think invlpgb would be currently useful
with KVM.

From reading pre_svm_run(), new_asid(), and svm_vcpu_run(),
it looks like the ASID number used might be different for
each VCPU, assigned on a per (physical host) CPU basis.

It would take some surgery to change that around.

Some googling around also suggests that the ASID address
space is even more limited than the PCID address space :(

-- 
All Rights Reversed.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Tom Lendacky 9 months, 3 weeks ago
On 2/27/25 19:13, Rik van Riel wrote:
> On Wed, 2025-02-26 at 12:12 -0600, Tom Lendacky wrote:
>>
>> As long as you keep the ASID value in EDX[15:0] as 0, then you won't
>> #GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.
>>
> I've been spending some time reading the KVM code,
> and I don't think invlpgb would be currently useful
> with KVM.
> 
> From reading pre_svm_run(), new_asid(), and svm_vcpu_run(),
> it looks like the ASID number used might be different for
> each VCPU, assigned on a per (physical host) CPU basis.
> 
> It would take some surgery to change that around.
> 
> Some googling around also suggests that the ASID address
> space is even more limited than the PCID address space :(

Right, to support using INVLPGB in guests you need a global ASID, which is
an ASID that doesn't change over the VMs lifetime and is used on all
vCPUs. Global ASIDs are only available and used today with SEV guests. At
that point you would not intercept the instruction and, based on APM vol
3, the ASID value is replaced with the guest ASID value.

"A guest that executes a legal INVLPGB that is not intercepted will have
the requested ASID field replaced by the current ASID and the valid ASID
bit set before doing the broadcast invalidation."

So I'm in the process of verifying that issuing INVLPLG in a guest with
the ASID valid bit set and an ASID value of 0 (EDX[15:0]) won't #GP, but
will just replace the specified ASID value with the guest ASID value in
hardware.

For non-SEV guests, INVLPGB would need to be intercepted and somehow
emulated or just not advertised to the guest so that the IPI path is used.

Thanks,
Tom

>
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Tom Lendacky 9 months, 3 weeks ago
On 2/28/25 09:02, Tom Lendacky wrote:
> On 2/27/25 19:13, Rik van Riel wrote:
>> On Wed, 2025-02-26 at 12:12 -0600, Tom Lendacky wrote:
>>>
>>> As long as you keep the ASID value in EDX[15:0] as 0, then you won't
>>> #GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.
>>>
>> I've been spending some time reading the KVM code,
>> and I don't think invlpgb would be currently useful
>> with KVM.
>>
>> From reading pre_svm_run(), new_asid(), and svm_vcpu_run(),
>> it looks like the ASID number used might be different for
>> each VCPU, assigned on a per (physical host) CPU basis.
>>
>> It would take some surgery to change that around.
>>
>> Some googling around also suggests that the ASID address
>> space is even more limited than the PCID address space :(
> 
> Right, to support using INVLPGB in guests you need a global ASID, which is
> an ASID that doesn't change over the VMs lifetime and is used on all
> vCPUs. Global ASIDs are only available and used today with SEV guests. At
> that point you would not intercept the instruction and, based on APM vol
> 3, the ASID value is replaced with the guest ASID value.
> 
> "A guest that executes a legal INVLPGB that is not intercepted will have
> the requested ASID field replaced by the current ASID and the valid ASID
> bit set before doing the broadcast invalidation."
> 
> So I'm in the process of verifying that issuing INVLPLG in a guest with
> the ASID valid bit set and an ASID value of 0 (EDX[15:0]) won't #GP, but
> will just replace the specified ASID value with the guest ASID value in
> hardware.

I verified that when (a non-intercepted) INVLPGB is issued in a guest,
hardware will set the ASID valid bit and use the guest ASID value
(regardless of the value specified in EDX[15:0]) before doing the
broadcast invalidation.

So the implementation of setting the ASID-valid bit and specifying ASID 0
is not incompatible in the guest.

Thanks,
Tom

> 
> For non-SEV guests, INVLPGB would need to be intercepted and somehow
> emulated or just not advertised to the guest so that the IPI path is used.
> 
> Thanks,
> Tom
> 
>>
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Sean Christopherson 9 months, 3 weeks ago
On Fri, Feb 28, 2025, Tom Lendacky wrote:
> On 2/27/25 19:13, Rik van Riel wrote:
> > On Wed, 2025-02-26 at 12:12 -0600, Tom Lendacky wrote:
> >>
> >> As long as you keep the ASID value in EDX[15:0] as 0, then you won't
> >> #GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.
> >>
> > I've been spending some time reading the KVM code,
> > and I don't think invlpgb would be currently useful
> > with KVM.
> > 
> > From reading pre_svm_run(), new_asid(), and svm_vcpu_run(),
> > it looks like the ASID number used might be different for
> > each VCPU, assigned on a per (physical host) CPU basis.
> > 
> > It would take some surgery to change that around.
> > 
> > Some googling around also suggests that the ASID address
> > space is even more limited than the PCID address space :(

KVM's mess of ASID handling isn't due to space limitations, it's because early
AMD hardware didn't support a targeted ASID flush.  To avoid flushing the entire
TLB, KVM fudged around lack of precise flushing by using a new ASID.  Under the
hood, hardware uses the new ASID so the previous entries are effectively"flushed",
and will eventually be flushed for real once their evicted due to TLB pressure.

Irrespective of INVLPGB support, I am all in favor of ripping out the ASID
shenanigans in favor of static, per-VM ASIDs for all VM types.  For CPUs that
don't support FLUSHBYASID, KVM can simply blast TLB_CONTROL_FLUSH_ALL_ASID.

FLUSHBYASID was added ~15 years ago.  If someone is still running hardware that's
that old, they can't possibly care about performance.

That would meaningfuly simplify KVM code, likely be a performance win on modern
hardware, and gives us direct line of sight to using INVLPGB (assuming it's a
performance win).
 
> Right, to support using INVLPGB in guests you need a global ASID, 

I'm pretty sure Rik is talking about using INVLPGB in the _host_, e.g. by doing
INVLPGB in kvm_arch_flush_remote_tlbs() instead of blasing an IPI to all vCPUs.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Wed, 2025-02-26 at 12:12 -0600, Tom Lendacky wrote:
> On 2/26/25 11:46, Rik van Riel wrote:
> > 
> > I don't know whether setting the ASID valid bit in
> > rAX will cause a system to crash when SVM is not
> > enabled or initialized.
> 
> As long as you keep the ASID value in EDX[15:0] as 0, then you won't
> #GP. ASID 0 is the host/hypervisor. An ASID > 0 belongs to a guest.

Thank you, that is good to know.

I think I'll wait for feedback on v14 of the series
to figure out whether to post that as part of a new
series, or as a follow-up enhancement.

-- 
All Rights Reversed.
Re: [PATCH v13 06/14] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing
Posted by Rik van Riel 9 months, 3 weeks ago
On Tue, 2025-02-25 at 22:03 +0100, Borislav Petkov wrote:
> On Tue, Feb 25, 2025 at 09:38:03PM +0100, Borislav Petkov wrote:
> > On Tue, Feb 25, 2025 at 02:17:20PM -0500, Rik van Riel wrote:
> > > Who do we need to ask to confirm that reading?
> > 
> > Lemme figure it out.
> 
> Confirmed - that really is the case.

Awesome, thank you for checking that!

-- 
All Rights Reversed.