From nobody Sat Feb 7 19:04:12 2026 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C49624290D; Wed, 4 Feb 2026 13:45:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212730; cv=none; b=I/cBQK+YIqjMdB0wL3zyzmUqfAQaNp+cSSxudiGWWi3qBp0KXxNnB3V64RwmaQugrMOOuIuMmjuZ0A0A1Nie437WbyN3slfDYf5wz5zPn+cqmsY8DuzUrvfA/dyRHKIEwmqm3haQBIyyVyAkz9Hc7NuisPnozGI9TgXiEt9zDFg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212730; c=relaxed/simple; bh=l2QoocVh1MkjJ/OASEXnWhSzB3evRgqizqlv/TPBEXw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Y6z/dsk3IhjacWa20FWOEvgXoHH4rca1lcXfOkZoxxvcfRxl+LNdR5rIMFxpvdCvcche/GNtgC4Ro7oVC2eP8xV3ylNSwsVjzNx2rmHe59Eomt5LTezlF1cXEkwregXt32YjkpNjgMPJwOV1L7XW5deaPBIzM9GV5DZvrXdDvWs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=W7XfmUOO; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="W7XfmUOO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1770212720; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=UcyR3qVHV1UnQIcaNKvbBmMhNxOrCjCwN5PBTCop7RA=; 
b=W7XfmUOOkD0ttS2lj29fhqKkkTc4DfUvF0ehdCA27RmroEXc1qLRx8Wvs7GlQ1iDEEOy727+0oSqRpENIScmLARUjpNHEvX3Yu+XbLNvUMYADf8XdZW9htfPH0Q7cUuQGRfNzr/ABWzXRJ7Zkgp62PT2MxMX+zMLzVY9lsqjO8M= Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0WyXBz83_1770212716 cluster:ay36) by smtp.aliyun-inc.com; Wed, 04 Feb 2026 21:45:18 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v5 1/3] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode Date: Wed, 4 Feb 2026 21:45:05 +0800 Message-Id: <20260204134507.33912-2-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> References: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Introduce a per-VM architecture-specific field to support runtime configuration of the G-stage page table format: - kvm->arch.kvm_riscv_gstage_pgd_levels: the number of page table levels for the selected mode. This field replaces the previous global variables kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels; the HGATP mode is now derived from the level count by the kvm_riscv_gstage_mode() helper. Different virtual machines can therefore select their G-stage page table format independently instead of being forced to share the maximum mode detected by the kernel at boot time.
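As an illustration of what the per-VM level count controls, the following standalone sketch mirrors the arithmetic of kvm_riscv_gstage_gpa_bits() in userspace C. It assumes a 64-bit host, where HGATP_PAGE_SHIFT is taken to be 12 and kvm_riscv_gstage_index_bits to be 9. The names gstage_gpa_bits, gstage_mode_name, print_gstage_table and the GSTAGE_* macros are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed 64-bit values; check them against your kernel tree. */
#define GSTAGE_PAGE_SHIFT	12	/* stands in for HGATP_PAGE_SHIFT */
#define GSTAGE_INDEX_BITS	9	/* stands in for kvm_riscv_gstage_index_bits */
#define GSTAGE_PGD_XBITS	2	/* stands in for kvm_riscv_gstage_pgd_xbits */

/* Hypothetical userspace mirror of kvm_riscv_gstage_gpa_bits(). */
unsigned int gstage_gpa_bits(unsigned int pgd_levels)
{
	/* Only the 64-bit formats (3 to 5 levels) use 9 index bits. */
	assert(pgd_levels >= 3 && pgd_levels <= 5);
	return GSTAGE_PAGE_SHIFT + pgd_levels * GSTAGE_INDEX_BITS +
	       GSTAGE_PGD_XBITS;
}

/*
 * Level count to format name, as in the switch in riscv_kvm_init().
 * The "unknown" fallback is this sketch's own choice.
 */
const char *gstage_mode_name(unsigned int pgd_levels)
{
	switch (pgd_levels) {
	case 2: return "Sv32x4";
	case 3: return "Sv39x4";
	case 4: return "Sv48x4";
	case 5: return "Sv57x4";
	default: return "unknown";
	}
}

/* Print the GPA width for each 64-bit format. */
void print_gstage_table(void)
{
	for (unsigned int levels = 3; levels <= 5; levels++)
		printf("%s: %u levels, %u GPA bits\n",
		       gstage_mode_name(levels), levels,
		       gstage_gpa_bits(levels));
}
```

Under these assumptions, 3, 4 and 5 levels give 41, 50 and 59 guest physical address bits respectively, which is the value a VM would then report through KVM_CAP_VM_GPA_BITS.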
Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones --- arch/riscv/include/asm/kvm_gstage.h | 20 +++++---- arch/riscv/include/asm/kvm_host.h | 19 +++++++++ arch/riscv/kvm/gstage.c | 65 ++++++++++++++--------------- arch/riscv/kvm/main.c | 12 +++--- arch/riscv/kvm/mmu.c | 20 +++++---- arch/riscv/kvm/vm.c | 2 +- arch/riscv/kvm/vmid.c | 3 +- 7 files changed, 84 insertions(+), 57 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index 595e2183173e..b12605fbca44 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -29,16 +29,22 @@ struct kvm_gstage_mapping { #define kvm_riscv_gstage_index_bits 10 #endif =20 -extern unsigned long kvm_riscv_gstage_mode; -extern unsigned long kvm_riscv_gstage_pgd_levels; +extern unsigned long kvm_riscv_gstage_max_pgd_levels; =20 #define kvm_riscv_gstage_pgd_xbits 2 #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gs= tage_pgd_xbits)) -#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \ - (kvm_riscv_gstage_pgd_levels * \ - kvm_riscv_gstage_index_bits) + \ - kvm_riscv_gstage_pgd_xbits) -#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bi= ts)) + +static inline unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *ka) +{ + return (HGATP_PAGE_SHIFT + + ka->kvm_riscv_gstage_pgd_levels * kvm_riscv_gstage_index_bits + + kvm_riscv_gstage_pgd_xbits); +} + +static inline gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *ka) +{ + return BIT_ULL(kvm_riscv_gstage_gpa_bits(ka)); +} =20 bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr, pte_t **ptepp, u32 *ptep_level); diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm= _host.h index 24585304c02b..0ace5e98c133 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -87,6 +87,23 @@ struct kvm_vcpu_stat { struct kvm_arch_memory_slot { }; =20 +static inline unsigned long 
kvm_riscv_gstage_mode(unsigned long pgd_levels) +{ + switch (pgd_levels) { + case 2: + return HGATP_MODE_SV32X4; + case 3: + return HGATP_MODE_SV39X4; + case 4: + return HGATP_MODE_SV48X4; + case 5: + return HGATP_MODE_SV57X4; + default: + WARN_ON_ONCE(1); + return HGATP_MODE_OFF; + } +} + struct kvm_arch { /* G-stage vmid */ struct kvm_vmid vmid; @@ -103,6 +120,8 @@ struct kvm_arch { =20 /* KVM_CAP_RISCV_MP_STATE_RESET */ bool mp_state_reset; + + unsigned long kvm_riscv_gstage_pgd_levels; }; =20 struct kvm_cpu_trap { diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index b67d60d722c2..2d0045f502d1 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -12,22 +12,21 @@ #include =20 #ifdef CONFIG_64BIT -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV39X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 3; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 3; #else -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV32X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 2; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 2; #endif =20 #define gstage_pte_leaf(__ptep) \ (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) =20 -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level) +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage, + gpa_t addr, u32 level) { unsigned long mask; unsigned long shift =3D HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits *= level); =20 - if (level =3D=3D (kvm_riscv_gstage_pgd_levels - 1)) + if (level =3D=3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1) mask =3D (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1; else mask =3D PTRS_PER_PTE - 1; @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t= pte) return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte))); } =20 -static int gstage_page_size_to_level(unsigned long page_size, 
u32 *out_lev= el) +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned l= ong page_size, + u32 *out_level) { u32 i; unsigned long psz =3D 1UL << 12; =20 - for (i =3D 0; i < kvm_riscv_gstage_pgd_levels; i++) { + for (i =3D 0; i < gstage->kvm->arch.kvm_riscv_gstage_pgd_levels; i++) { if (page_size =3D=3D (psz << (i * kvm_riscv_gstage_index_bits))) { *out_level =3D i; return 0; @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page= _size, u32 *out_level) return -EINVAL; } =20 -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorde= r) +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgorder) { - if (kvm_riscv_gstage_pgd_levels < level) + if (gstage->kvm->arch.kvm_riscv_gstage_pgd_levels < level) return -EINVAL; =20 *out_pgorder =3D 12 + (level * kvm_riscv_gstage_index_bits); return 0; } =20 -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize) +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgsize) { int rc; unsigned long page_order =3D PAGE_SHIFT; =20 - rc =3D gstage_level_to_page_order(level, &page_order); + rc =3D gstage_level_to_page_order(gstage, level, &page_order); if (rc) return rc; =20 @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstag= e, gpa_t addr, pte_t **ptepp, u32 *ptep_level) { pte_t *ptep; - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1; =20 *ptep_level =3D current_level; ptep =3D (pte_t *)gstage->pgd; - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, addr, current_level)]; while (ptep && pte_val(ptep_get(ptep))) { if (gstage_pte_leaf(ptep)) { *ptep_level =3D current_level; @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage,= gpa_t addr, current_level--; *ptep_level =3D current_level; ptep =3D 
(pte_t *)gstage_pte_page_vaddr(ptep_get(ptep)); - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, addr, current_level)]; } else { ptep =3D NULL; } @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage,= u32 level, gpa_t addr) { unsigned long order =3D PAGE_SHIFT; =20 - if (gstage_level_to_page_order(level, &order)) + if (gstage_level_to_page_order(gstage, level, &order)) return; addr &=3D ~(BIT(order) - 1); =20 @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, struct kvm_mmu_memory_cache *pcache, const struct kvm_gstage_mapping *map) { - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1; pte_t *next_ptep =3D (pte_t *)gstage->pgd; - pte_t *ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + pte_t *ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_le= vel)]; =20 if (current_level < map->level) return -EINVAL; @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, } =20 current_level--; - ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_level)]; } =20 if (pte_val(*ptep) !=3D pte_val(map->pte)) { @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage, out_map->addr =3D gpa; out_map->level =3D 0; =20 - ret =3D gstage_page_size_to_level(page_size, &out_map->level); + ret =3D gstage_page_size_to_level(gstage, page_size, &out_map->level); if (ret) return ret; =20 @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, u32 next_ptep_level; unsigned long next_page_size, page_size; =20 - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) return; =20 @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, 
if (ptep_level && !gstage_pte_leaf(ptep)) { next_ptep =3D (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep)); next_ptep_level =3D ptep_level - 1; - ret =3D gstage_level_to_page_size(next_ptep_level, &next_page_size); + ret =3D gstage_level_to_page_size(gstage, next_ptep_level, &next_page_si= ze); if (ret) return; =20 @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gs= tage, =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstag= e, gpa_t start, gpa_t end =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void) /* Try Sv57x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV57X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV57X4; - kvm_riscv_gstage_pgd_levels =3D 5; + kvm_riscv_gstage_max_pgd_levels =3D 5; goto done; } =20 /* Try Sv48x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV48X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV48X4; - kvm_riscv_gstage_pgd_levels =3D 4; + kvm_riscv_gstage_max_pgd_levels =3D 4; goto done; } =20 /* Try Sv39x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV39X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV39X4; - kvm_riscv_gstage_pgd_levels =3D 3; + kvm_riscv_gstage_max_pgd_levels =3D 3; goto done; } #else 
/* CONFIG_32BIT */ /* Try Sv32x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV32X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV32X4; - kvm_riscv_gstage_pgd_levels =3D 2; + kvm_riscv_gstage_max_pgd_levels =3D 2; goto done; } #endif =20 /* KVM depends on !HGATP_MODE_OFF */ - kvm_riscv_gstage_mode =3D HGATP_MODE_OFF; - kvm_riscv_gstage_pgd_levels =3D 0; + kvm_riscv_gstage_max_pgd_levels =3D 0; =20 done: csr_write(CSR_HGATP, 0); diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 45536af521f0..786c0025e2c3 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void) return rc; =20 kvm_riscv_gstage_mode_detect(); - switch (kvm_riscv_gstage_mode) { - case HGATP_MODE_SV32X4: + switch (kvm_riscv_gstage_max_pgd_levels) { + case 2: str =3D "Sv32x4"; break; - case HGATP_MODE_SV39X4: + case 3: str =3D "Sv39x4"; break; - case HGATP_MODE_SV48X4: + case 4: str =3D "Sv48x4"; break; - case HGATP_MODE_SV57X4: + case 5: str =3D "Sv57x4"; break; default: @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void) (rc) ? 
slist : "no features"); } =20 - kvm_info("using %s G-stage page table format\n", str); + kvm_info("Max G-stage page table format %s\n", str); =20 kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits()); =20 diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 4ab06697bfc0..458a2ed98818 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phy= s_addr_t hpa, if (!writable) map.pte =3D pte_wrprotect(map.pte); =20 - ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm->arch.kvm_riscv_gstage_p= gd_levels); if (ret) goto out; =20 @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, * space addressable by the KVM guest GPA space. */ if ((new->base_gfn + new->npages) >=3D - (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT)) + kvm_riscv_gstage_gpa_size(&kvm->arch) >> PAGE_SHIFT) return -EFAULT; =20 hva =3D new->userspace_addr; @@ -332,7 +332,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _memory_slot *memslot, memset(out_map, 0, sizeof(*out_map)); =20 /* We need minimum second+third level pages */ - ret =3D kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(pcache, kvm->arch.kvm_riscv_gstage_pgd= _levels); if (ret) { kvm_err("Failed to topup G-stage cache\n"); return ret; @@ -431,6 +431,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm) return -ENOMEM; kvm->arch.pgd =3D page_to_virt(pgd_page); kvm->arch.pgd_phys =3D page_to_phys(pgd_page); + kvm->arch.kvm_riscv_gstage_pgd_levels =3D kvm_riscv_gstage_max_pgd_levels; =20 return 0; } @@ -446,10 +447,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) gstage.flags =3D 0; gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); gstage.pgd =3D kvm->arch.pgd; - kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, fa= lse); + kvm_riscv_gstage_unmap_range(&gstage, 0UL, + 
kvm_riscv_gstage_gpa_size(&kvm->arch), false); pgd =3D READ_ONCE(kvm->arch.pgd); kvm->arch.pgd =3D NULL; kvm->arch.pgd_phys =3D 0; + kvm->arch.kvm_riscv_gstage_pgd_levels =3D 0; } spin_unlock(&kvm->mmu_lock); =20 @@ -459,11 +462,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) =20 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu) { - unsigned long hgatp =3D kvm_riscv_gstage_mode << HGATP_MODE_SHIFT; - struct kvm_arch *k =3D &vcpu->kvm->arch; + struct kvm_arch *ka =3D &vcpu->kvm->arch; + unsigned long hgatp =3D kvm_riscv_gstage_mode(ka->kvm_riscv_gstage_pgd_le= vels) + << HGATP_MODE_SHIFT; =20 - hgatp |=3D (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; - hgatp |=3D (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; + hgatp |=3D (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; + hgatp |=3D (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; =20 ncsr_write(CSR_HGATP, hgatp); =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 66d91ae6e9b2..4b2156df40fc 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -200,7 +200,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) r =3D KVM_USER_MEM_SLOTS; break; case KVM_CAP_VM_GPA_BITS: - r =3D kvm_riscv_gstage_gpa_bits; + r =3D kvm_riscv_gstage_gpa_bits(&kvm->arch); break; default: r =3D 0; diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c index cf34d448289d..c15bdb1dd8be 100644 --- a/arch/riscv/kvm/vmid.c +++ b/arch/riscv/kvm/vmid.c @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock); void __init kvm_riscv_gstage_vmid_detect(void) { /* Figure-out number of VMID bits in HW */ - csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_= VMID); + csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_leve= ls) << + HGATP_MODE_SHIFT) | HGATP_VMID); vmid_bits =3D csr_read(CSR_HGATP); vmid_bits =3D (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT; vmid_bits =3D fls_long(vmid_bits); --=20 2.50.1 From nobody Sat Feb 7 19:04:12 2026 Received: from 
out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 773D515ECCC; Wed, 4 Feb 2026 13:45:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.101 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212728; cv=none; b=eqXQP/dgy4QaxScKmrQCpVY//3Uyymas9iigmrikRKq0ruR6cm7naQwpRyCjQ/KDG9JjDPlTBHAS9rhqfMnt0TLrhmGTNJ5XOcg/8YTysTOueHVKB2EpkU+SZNp2IsQx7f1EFKK8tN758FCBtUtHx9b0RljIZy4bYue9pnetpqo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212728; c=relaxed/simple; bh=87LIlEktxosYN3tpXjpSdcBf0foNHMP4J/TIDkS36u8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CDD/r1+UHeiI8uWax+SjzqzOvrJXlQAZYo9dNYGJJJiifb1z1HQ16gDCnV0pxjJ+2qDsOawiLPxB0LiLwnpbvlJfxhedVzPHICfvtDwBSau3j17mb8waGZCoCrjmGtiROTKfJQwtyvCJl61woFkUTE0vucq0IpGAOl8JkR57yWM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=SuvUr+FR; arc=none smtp.client-ip=115.124.30.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="SuvUr+FR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1770212722; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=yciz5NKAtbksgmsOTnGqD+95AQXuV0o1EGcHqIl5HB4=; 
b=SuvUr+FRZSFWmBPtdpd+I/J2gt1dSzIJmqebiF+u/YO0mHJYIL8x4YHdmwQWARtFKEz1W7V6SXpjQ/ANugJ3mvGI1U32XGI3JggIsXCQrBztvaQIJf6gm+jUfLg2xJC3d1PcxgRWQAds3V+3MgFiAkzmnwqxVDBXgtunhaLymj0= Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0WyXBz9k_1770212719 cluster:ay36) by smtp.aliyun-inc.com; Wed, 04 Feb 2026 21:45:20 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v5 2/3] RISC-V: KVM: Detect and expose supported HGATP G-stage modes Date: Wed, 4 Feb 2026 21:45:06 +0800 Message-Id: <20260204134507.33912-3-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> References: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values supported by the host and record them in a bitmask. Keep tracking the maximum supported G-stage page table level for existing internal users. Also provide lightweight helpers to retrieve the supported-mode bitmask and validate a requested HGATP.MODE against it. 
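The detection scheme described above can be sketched outside the kernel as follows. This is only a model: probe_fn is a hypothetical stand-in for the CSR write and read-back, and struct gstage_caps, detect_modes, mode_is_valid and probe_sv39_sv48 are illustrative names that do not appear in the patch. The sketch also bounds the mode before shifting, which is a choice made here because shifting by the type width or more is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HGATP.MODE encodings for the 64-bit formats. */
#define HGATP_MODE_SV39X4	8
#define HGATP_MODE_SV48X4	9
#define HGATP_MODE_SV57X4	10

struct gstage_caps {
	uint32_t mode_mask;		/* bit N set if HGATP.MODE == N works */
	unsigned int max_pgd_levels;	/* 0 if no mode is supported */
};

/* Hypothetical stand-in for the CSR write and read-back probe. */
typedef bool (*probe_fn)(unsigned int mode);

struct gstage_caps detect_modes(probe_fn probe)
{
	static const struct {
		unsigned int mode;
		unsigned int levels;
	} modes[] = {
		{ HGATP_MODE_SV39X4, 3 },
		{ HGATP_MODE_SV48X4, 4 },
		{ HGATP_MODE_SV57X4, 5 },
	};
	struct gstage_caps caps = { 0, 0 };

	assert(probe != NULL);

	/* Ascending order, so the last hit leaves the maximum level count. */
	for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (!probe(modes[i].mode))
			continue;
		caps.mode_mask |= UINT32_C(1) << modes[i].mode;
		caps.max_pgd_levels = modes[i].levels;
	}
	return caps;
}

bool mode_is_valid(const struct gstage_caps *caps, uint64_t mode)
{
	/* Bound the shift count first: shifting by 32 or more is undefined. */
	return mode < 32 && (caps->mode_mask & (UINT32_C(1) << mode)) != 0;
}

/* Example host model: Sv39x4 and Sv48x4 only. */
bool probe_sv39_sv48(unsigned int mode)
{
	return mode == HGATP_MODE_SV39X4 || mode == HGATP_MODE_SV48X4;
}
```

Probing every mode in ascending order, rather than stopping at the first success, is what lets the mask record all supported formats while the maximum level count is still tracked as a side effect.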
Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones --- arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++ arch/riscv/kvm/gstage.c | 43 +++++++++++++++-------------- 2 files changed, 34 insertions(+), 20 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index b12605fbca44..76c37b5dc02d 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -30,6 +30,7 @@ struct kvm_gstage_mapping { #endif =20 extern unsigned long kvm_riscv_gstage_max_pgd_levels; +extern u32 kvm_riscv_gstage_mode_mask; =20 #define kvm_riscv_gstage_pgd_xbits 2 #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gs= tage_pgd_xbits)) @@ -75,4 +76,14 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage= , gpa_t start, gpa_t end =20 void kvm_riscv_gstage_mode_detect(void); =20 +static inline u32 kvm_riscv_get_hgatp_mode_mask(void) +{ + return kvm_riscv_gstage_mode_mask; +} + +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode) +{ + return kvm_riscv_gstage_mode_mask & BIT(mode); +} + #endif diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index 2d0045f502d1..328d4138f162 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_= init =3D 3; #else unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 2; #endif +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). 
*/ +u32 kvm_riscv_gstage_mode_mask __ro_after_init; =20 #define gstage_pte_leaf(__ptep) \ (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gst= age, gpa_t start, gpa_t end } } =20 +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode) +{ + csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT); + return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D mode); +} + void __init kvm_riscv_gstage_mode_detect(void) { + kvm_riscv_gstage_mode_mask =3D 0; + kvm_riscv_gstage_max_pgd_levels =3D 0; + #ifdef CONFIG_64BIT - /* Try Sv57x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV57X4) { - kvm_riscv_gstage_max_pgd_levels =3D 5; - goto done; + /* Try Sv39x4 G-stage mode */ + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) { + kvm_riscv_gstage_mode_mask |=3D BIT(HGATP_MODE_SV39X4); + kvm_riscv_gstage_max_pgd_levels =3D 3; } =20 /* Try Sv48x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV48X4) { + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) { + kvm_riscv_gstage_mode_mask |=3D BIT(HGATP_MODE_SV48X4); kvm_riscv_gstage_max_pgd_levels =3D 4; - goto done; } =20 - /* Try Sv39x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV39X4) { - kvm_riscv_gstage_max_pgd_levels =3D 3; - goto done; + /* Try Sv57x4 G-stage mode */ + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) { + kvm_riscv_gstage_mode_mask |=3D BIT(HGATP_MODE_SV57X4); + kvm_riscv_gstage_max_pgd_levels =3D 5; } #else /* CONFIG_32BIT */ /* Try Sv32x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV32X4) { + if 
(kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) { + kvm_riscv_gstage_mode_mask |=3D BIT(HGATP_MODE_SV32X4); kvm_riscv_gstage_max_pgd_levels =3D 2; - goto done; } #endif =20 - /* KVM depends on !HGATP_MODE_OFF */ - kvm_riscv_gstage_max_pgd_levels =3D 0; - -done: csr_write(CSR_HGATP, 0); kvm_riscv_local_hfence_gvma_all(); } --=20 2.50.1 From nobody Sat Feb 7 19:04:12 2026 Received: from out30-99.freemail.mail.aliyun.com (out30-99.freemail.mail.aliyun.com [115.124.30.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 82073392827; Wed, 4 Feb 2026 13:45:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.99 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212736; cv=none; b=nHFzo9WfUF8T0EsbSL8XHkScJnFZZpEcCRhjkFwlC3/KP6pbkx7wB6O1MV3Vulg8f5Zabvf9TD1cp8nkYQu7TfclwYFXgEK9/1ltzlK/fWQLQOqjrbtN2XxBV8KyurGsmovq0y/qWczq+fUpiN5ChUuULBGwiKyj3hcLkFnFT7k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770212736; c=relaxed/simple; bh=6AoUWFoe1qTdaPzHfdR8hNXmzBqZ37rlpHoBFd6gM1M=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=T6z11rme227JXTTxJQeYbpF6aPt5Zstu+FNfNzL22wmfbyTUtPnZERwhuI4AMMJYzVmv/bKtxvkmlw1EJsPD8aAIpdbRzE38bAd8xXbXp2qpkN2k9pkhngK9Wf1fUnmQNbI9qi5d1aQS+/Pl083ly32tVnhRoxsZ3acc7ZXmYTk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=p8L3brfY; arc=none smtp.client-ip=115.124.30.99 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="p8L3brfY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1770212724; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=5R42yBenU3WsqPgksKfBN/gXqqDJrrdf2h4U39oXsd8=; b=p8L3brfYKi6VejGdkPtQhQstAeQ2lIpJfXKDrlXjdyndWuit/nXBGn/lM605LeSCM1tkTZy8Rs0IReLS0FJr1+VbCm3boFg2JrCNIA6Zelr6QTz++3/XotT3TYsjcaA11jKGNDKCWMW6GR0FHpfmozvARvKAauXaLgtiR7l5e7A= Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0WyXBzAg_1770212721 cluster:ay36) by smtp.aliyun-inc.com; Wed, 04 Feb 2026 21:45:22 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v5 3/3] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE Date: Wed, 4 Feb 2026 21:45:07 +0800 Message-Id: <20260204134507.33912-4-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> References: <20260204134507.33912-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Add a VM capability that allows userspace to select the G-stage page table format by setting HGATP.MODE on a per-VM basis. Userspace enables the capability via KVM_ENABLE_CAP, passing the requested HGATP.MODE in args[0]. 
The request is rejected with -EINVAL if the mode is not supported by the host, and with -EBUSY if the VM has already been committed (e.g. vCPUs have been created or any memslot is populated). KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the HGATP.MODE formats supported by the host. Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones --- Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++ arch/riscv/kvm/vm.c | 19 +++++++++++++++++-- include/uapi/linux/kvm.h | 1 + 3 files changed, 45 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 01a3abef8abb..62dc120857c1 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8765,6 +8765,33 @@ helpful if user space wants to emulate instructions = which are not This capability can be enabled dynamically even if VCPUs were already created and are running. =20 +7.47 KVM_CAP_RISCV_SET_HGATP_MODE +--------------------------------- + +:Architectures: riscv +:Type: VM +:Parameters: args[0] contains the requested HGATP mode +:Returns: + - 0 on success. + - -EINVAL if args[0] is not an HGATP mode supported by the + hardware. + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any + non-empty memslots. + +This capability allows userspace to explicitly select the HGATP mode for +the VM. The selected mode must be supported by both KVM and hardware. This +capability must be enabled before creating any vCPUs or memslots. + +If this capability is not enabled, KVM will select the default HGATP mode +automatically. The default is the highest HGATP.MODE value supported by +hardware. + +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of +HGATP.MODE values supported by the host. A return value of 0 indicates that +the capability is not supported.
The supported-mode bitmask uses HGATP.MODE +encodings as defined by the RISC-V privileged specification. For example, Sv39x4 +corresponds to HGATP.MODE=3D8, so userspace should test bitmask & BIT(8). + 8. Other capabilities. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 4b2156df40fc..7d1e1d257df5 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) case KVM_CAP_VM_GPA_BITS: r =3D kvm_riscv_gstage_gpa_bits(&kvm->arch); break; + case KVM_CAP_RISCV_SET_HGATP_MODE: + r =3D kvm_riscv_get_hgatp_mode_mask(); + break; default: r =3D 0; break; @@ -212,12 +215,24 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) =20 int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { + if (cap->flags) + return -EINVAL; + switch (cap->cap) { case KVM_CAP_RISCV_MP_STATE_RESET: - if (cap->flags) - return -EINVAL; kvm->arch.mp_state_reset =3D true; return 0; + case KVM_CAP_RISCV_SET_HGATP_MODE: + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0])) + return -EINVAL; + + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm)) + return -EBUSY; +#ifdef CONFIG_64BIT + kvm->arch.kvm_riscv_gstage_pgd_levels =3D + 3 + cap->args[0] - HGATP_MODE_SV39X4; +#endif + return 0; default: return -EINVAL; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index dddb781b0507..00c02a880518 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -974,6 +974,7 @@ struct kvm_enable_cap { #define KVM_CAP_GUEST_MEMFD_FLAGS 244 #define KVM_CAP_ARM_SEA_TO_USER 245 #define KVM_CAP_S390_USER_OPEREXEC 246 +#define KVM_CAP_RISCV_SET_HGATP_MODE 247 =20 struct kvm_irq_routing_irqchip { __u32 irqchip; --=20 2.50.1
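For readers wondering how a VMM might consume the capability added by this series, here is a minimal sketch of the mode selection step in userspace C. The policy shown, picking the smallest supported format that still covers the guest's address space, is an assumption of this example and is not mandated by the series. pick_hgatp_mode and hgatp_mode_gpa_bits are hypothetical helpers, and the GPA widths assume a 64-bit host with a 12-bit page shift and 9 index bits per level. The ioctl sequence is outlined only in a comment so that the block builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* HGATP.MODE encodings; Sv39x4 is 8, as the documentation text states. */
#define HGATP_MODE_SV39X4	8
#define HGATP_MODE_SV57X4	10

/* GPA width of a 64-bit mode: 12 + levels * 9 + 2 (assumed constants). */
unsigned int hgatp_mode_gpa_bits(unsigned int mode)
{
	assert(mode >= HGATP_MODE_SV39X4 && mode <= HGATP_MODE_SV57X4);
	/* Same mode-to-levels mapping as the KVM_ENABLE_CAP handler. */
	unsigned int levels = 3 + mode - HGATP_MODE_SV39X4;

	return 12 + levels * 9 + 2;
}

/*
 * Return the smallest mode set in mask whose GPA space is at least
 * gpa_bits_needed wide, or -1 if no supported mode is large enough.
 */
int pick_hgatp_mode(uint32_t mask, unsigned int gpa_bits_needed)
{
	for (unsigned int mode = HGATP_MODE_SV39X4;
	     mode <= HGATP_MODE_SV57X4; mode++) {
		if (!(mask & (UINT32_C(1) << mode)))
			continue;
		if (hgatp_mode_gpa_bits(mode) >= gpa_bits_needed)
			return (int)mode;
	}
	return -1;
}

/*
 * Outline of the surrounding ioctl sequence (not compiled here):
 *
 *   mask = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_RISCV_SET_HGATP_MODE);
 *   if (mask <= 0)
 *           keep the default mode, the capability is unavailable;
 *   mode = pick_hgatp_mode(mask, needed_bits);
 *   struct kvm_enable_cap cap = {
 *           .cap = KVM_CAP_RISCV_SET_HGATP_MODE,
 *           .args[0] = mode,
 *   };
 *   ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
 *
 * The KVM_ENABLE_CAP call has to happen before any vCPU is created or any
 * memslot is populated, otherwise it fails with EBUSY.
 */
```

A VMM that prefers the largest address space instead could simply scan the mask from the highest mode downwards; in that case the result matches the default KVM picks when the capability is never enabled.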