[PATCH] RISC-V: KVM: flush VS-stage TLB after VCPU migration to prevent stale entries

Ben Zong-You Xie posted 1 patch 2 months, 2 weeks ago
There is a newer version of this series
arch/riscv/kvm/vmid.c | 6 ++++++
1 file changed, 6 insertions(+)
[PATCH] RISC-V: KVM: flush VS-stage TLB after VCPU migration to prevent stale entries
Posted by Ben Zong-You Xie 2 months, 2 weeks ago
From: Hui Min Mina Chou <minachou@andestech.com>

If multiple VCPUs of the same Guest/VM run on the same Host CPU,
hfence.vvma only flushes that Host CPU’s VS-stage TLB. Other Host CPUs
may retain stale VS-stage entries. When a VCPU later migrates to a
different Host CPU, it can hit these stale GVA to GPA mappings, causing
unexpected faults in the Guest.

To fix this, kvm_riscv_gstage_vmid_sanitize() is extended to flush both
G-stage and VS-stage TLBs whenever a VCPU migrates to a different Host CPU.
This ensures that no stale VS-stage mappings remain after VCPU migration.

Fixes: b79bf2025dbc ("RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()")
Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
---
 arch/riscv/kvm/vmid.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index 3b426c800480..38c6f532a6f8 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -146,4 +146,10 @@ void kvm_riscv_gstage_vmid_sanitize(struct kvm_vcpu *vcpu)
 
 	vmid = READ_ONCE(vcpu->kvm->arch.vmid.vmid);
 	kvm_riscv_local_hfence_gvma_vmid_all(vmid);
+
+	/*
+	 * Flush VS-stage TLBs entry after VCPU migration to avoid using
+	 * stale entries.
+	 */
+	kvm_riscv_local_hfence_vvma_all(vmid);
 }
-- 
2.34.1

Re: [PATCH] RISC-V: KVM: flush VS-stage TLB after VCPU migration to prevent stale entries
Posted by Radim Krčmář 2 months, 1 week ago
2025-10-02T11:34:02+08:00, Ben Zong-You Xie <ben717@andestech.com>:
> From: Hui Min Mina Chou <minachou@andestech.com>
>
> If multiple VCPUs of the same Guest/VM run on the same Host CPU,
> hfence.vvma only flushes that Host CPU’s VS-stage TLB. Other Host CPUs
> may retain stale VS-stage entries. When a VCPU later migrates to a
> different Host CPU, it can hit these stale GVA to GPA mappings, causing
> unexpected faults in the Guest.

The issue can also be hit with a single VCPU migrated over two harts:

  1) [hart A] accessing X as Y, caching X->Y in first stage TLB
  2) [hart B] remapping X to Z, sfence.vma
  3) [hart A] accessing X as Y, instead of correct Z

Migration from 2 to 1 does hfence.gvma, but that doesn't flush first
stage TLB, so the translation produces an error due to stale entries.

What RISC-V implementation are you using?  (And does the implementation
have the same memory access performance in V=0 and V=1 modes, even
though the latter has two levels of TLBs?)

> To fix this, kvm_riscv_gstage_vmid_sanitize() is extended to flush both
> G-stage and VS-stage TLBs whenever a VCPU migrates to a different Host CPU.
> This ensures that no stale VS-stage mappings remain after VCPU migration.
>
> Fixes: b79bf2025dbc ("RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()")

b79bf2025dbc does not change behavior.
The bug must have been introduced earlier.

> Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
> Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
> ---
> diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
> @@ -146,4 +146,10 @@ void kvm_riscv_gstage_vmid_sanitize(struct kvm_vcpu *vcpu)

The function is now doing more that sanitizing gstage.
Maybe we can again call it kvm_riscv_local_tlb_sanitize()?

>  
>  	vmid = READ_ONCE(vcpu->kvm->arch.vmid.vmid);
>  	kvm_riscv_local_hfence_gvma_vmid_all(vmid);
> +
> +	/*
> +	 * Flush VS-stage TLBs entry after VCPU migration to avoid using
> +	 * stale entries.
> +	 */
> +	kvm_riscv_local_hfence_vvma_all(vmid);
>  }

I had some nits, but the approach is sound,

Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com>

Thanks.

---
There is a room for a RISC-V extension that tells whether the two TLB
flushes are needed, or hfence.gvma is enough. :)
Re: [PATCH] RISC-V: KVM: flush VS-stage TLB after VCPU migration to prevent stale entries
Posted by Ben Zong-You Xie 2 months, 1 week ago
Hi Radim,

Thanks for the review and the detailed comments.

> What RISC-V implementation are you using?  (And does the implementation
> have the same memory access performance in V=0 and V=1 modes, even
> though the latter has two levels of TLBs?)
> 

The issue is found when validating our new AndesCore AX66.
The address translation performance is the same for U and VU-mode when the uTLB is hit.

> > To fix this, kvm_riscv_gstage_vmid_sanitize() is extended to flush both
> > G-stage and VS-stage TLBs whenever a VCPU migrates to a different Host CPU.
> > This ensures that no stale VS-stage mappings remain after VCPU migration.
> >
> > Fixes: b79bf2025dbc ("RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()")
> 
> b79bf2025dbc does not change behavior.
> The bug must have been introduced earlier.
> 

Will fix the incorrect Fixes tag in the next version.
Thanks for pointing that out, we'll change to the following:

    Fixes: 92e450507d56 ("RISC-V: KVM: Cleanup stale TLB entries when host CPU changes")

> > Signed-off-by: Hui Min Mina Chou <minachou@andestech.com>
> > Signed-off-by: Ben Zong-You Xie <ben717@andestech.com>
> > ---
> > diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
> > @@ -146,4 +146,10 @@ void kvm_riscv_gstage_vmid_sanitize(struct kvm_vcpu *vcpu)
> 
> The function is now doing more that sanitizing gstage.
> Maybe we can again call it kvm_riscv_local_tlb_sanitize()?
> 

As for the naming, your suggestion makes sense.
We’re also considering whether it should be moved back from vmid.c to tlb.c,
and we’d like to hear other maintainers’ opinions before doing so.

Thanks again for your feedback.

Best regards,
Ben