From nobody Sun Feb 8 06:22:05 2026
From: fangyu.yu@linux.alibaba.com
To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com
Cc: guoren@kernel.org, ajones@ventanamicro.com, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu
Subject: [PATCH v3 1/2] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
Date: Sun, 25 Jan 2026 23:04:49 +0800
Message-Id: <20260125150450.27068-2-fangyu.yu@linux.alibaba.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260125150450.27068-1-fangyu.yu@linux.alibaba.com>
References: <20260125150450.27068-1-fangyu.yu@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Fangyu Yu

Introduce a per-VM architecture-specific field to support runtime
configuration of the G-stage page table format:

- kvm->arch.kvm_riscv_gstage_pgd_levels: the number of page table
  levels for the selected mode.

This field replaces the previous global variables kvm_riscv_gstage_mode
and kvm_riscv_gstage_pgd_levels, allowing each virtual machine to
independently select its G-stage page table format instead of being
forced to share the maximum mode detected by the kernel at boot time.
Signed-off-by: Fangyu Yu --- arch/riscv/include/asm/kvm_gstage.h | 11 ++--- arch/riscv/include/asm/kvm_host.h | 19 +++++++ arch/riscv/kvm/gstage.c | 77 ++++++++++++++++------------- arch/riscv/kvm/main.c | 12 ++--- arch/riscv/kvm/mmu.c | 23 ++++++--- arch/riscv/kvm/vm.c | 2 +- arch/riscv/kvm/vmid.c | 3 +- 7 files changed, 90 insertions(+), 57 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index 595e2183173e..7993b15ebfcd 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -29,16 +29,10 @@ struct kvm_gstage_mapping { #define kvm_riscv_gstage_index_bits 10 #endif =20 -extern unsigned long kvm_riscv_gstage_mode; -extern unsigned long kvm_riscv_gstage_pgd_levels; +extern unsigned long kvm_riscv_gstage_max_pgd_levels; =20 #define kvm_riscv_gstage_pgd_xbits 2 #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gs= tage_pgd_xbits)) -#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \ - (kvm_riscv_gstage_pgd_levels * \ - kvm_riscv_gstage_index_bits) + \ - kvm_riscv_gstage_pgd_xbits) -#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bi= ts)) =20 bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr, pte_t **ptepp, u32 *ptep_level); @@ -69,4 +63,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage,= gpa_t start, gpa_t end =20 void kvm_riscv_gstage_mode_detect(void); =20 +gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *k); +unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *k); + #endif diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm= _host.h index 24585304c02b..a111bff965fa 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -87,6 +87,22 @@ struct kvm_vcpu_stat { struct kvm_arch_memory_slot { }; =20 +static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels) +{ + switch (pgd_levels) { + case 2: + return HGATP_MODE_SV32X4; 
+ case 3: + return HGATP_MODE_SV39X4; + case 4: + return HGATP_MODE_SV48X4; + case 5: + return HGATP_MODE_SV57X4; + default: + return HGATP_MODE_OFF; + } +} + struct kvm_arch { /* G-stage vmid */ struct kvm_vmid vmid; @@ -103,6 +119,9 @@ struct kvm_arch { =20 /* KVM_CAP_RISCV_MP_STATE_RESET */ bool mp_state_reset; + + unsigned long kvm_riscv_gstage_pgd_levels; + bool gstage_mode_user_initialized; }; =20 struct kvm_cpu_trap { diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index b67d60d722c2..2633f7df8866 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -12,22 +12,21 @@ #include =20 #ifdef CONFIG_64BIT -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV39X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 3; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 3; #else -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV32X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 2; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 2; #endif =20 #define gstage_pte_leaf(__ptep) \ (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) =20 -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level) +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage, + gpa_t addr, u32 level) { unsigned long mask; unsigned long shift =3D HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits *= level); =20 - if (level =3D=3D (kvm_riscv_gstage_pgd_levels - 1)) + if (level =3D=3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1) mask =3D (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1; else mask =3D PTRS_PER_PTE - 1; @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t= pte) return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte))); } =20 -static int gstage_page_size_to_level(unsigned long page_size, u32 *out_lev= el) +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned l= 
ong page_size, + u32 *out_level) { u32 i; unsigned long psz =3D 1UL << 12; =20 - for (i =3D 0; i < kvm_riscv_gstage_pgd_levels; i++) { + for (i =3D 0; i < gstage->kvm->arch.kvm_riscv_gstage_pgd_levels; i++) { if (page_size =3D=3D (psz << (i * kvm_riscv_gstage_index_bits))) { *out_level =3D i; return 0; @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page= _size, u32 *out_level) return -EINVAL; } =20 -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorde= r) +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgorder) { - if (kvm_riscv_gstage_pgd_levels < level) + if (gstage->kvm->arch.kvm_riscv_gstage_pgd_levels < level) return -EINVAL; =20 *out_pgorder =3D 12 + (level * kvm_riscv_gstage_index_bits); return 0; } =20 -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize) +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgsize) { int rc; unsigned long page_order =3D PAGE_SHIFT; =20 - rc =3D gstage_level_to_page_order(level, &page_order); + rc =3D gstage_level_to_page_order(gstage, level, &page_order); if (rc) return rc; =20 @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstag= e, gpa_t addr, pte_t **ptepp, u32 *ptep_level) { pte_t *ptep; - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1; =20 *ptep_level =3D current_level; ptep =3D (pte_t *)gstage->pgd; - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, addr, current_level)]; while (ptep && pte_val(ptep_get(ptep))) { if (gstage_pte_leaf(ptep)) { *ptep_level =3D current_level; @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage,= gpa_t addr, current_level--; *ptep_level =3D current_level; ptep =3D (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep)); - ptep =3D &ptep[gstage_pte_index(addr, 
current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, addr, current_level)]; } else { ptep =3D NULL; } @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage,= u32 level, gpa_t addr) { unsigned long order =3D PAGE_SHIFT; =20 - if (gstage_level_to_page_order(level, &order)) + if (gstage_level_to_page_order(gstage, level, &order)) return; addr &=3D ~(BIT(order) - 1); =20 @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, struct kvm_mmu_memory_cache *pcache, const struct kvm_gstage_mapping *map) { - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.kvm_riscv_gstage_pgd_levels - 1; pte_t *next_ptep =3D (pte_t *)gstage->pgd; - pte_t *ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + pte_t *ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_le= vel)]; =20 if (current_level < map->level) return -EINVAL; @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, } =20 current_level--; - ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_level)]; } =20 if (pte_val(*ptep) !=3D pte_val(map->pte)) { @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage, out_map->addr =3D gpa; out_map->level =3D 0; =20 - ret =3D gstage_page_size_to_level(page_size, &out_map->level); + ret =3D gstage_page_size_to_level(gstage, page_size, &out_map->level); if (ret) return ret; =20 @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, u32 next_ptep_level; unsigned long next_page_size, page_size; =20 - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) return; =20 @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, if (ptep_level && !gstage_pte_leaf(ptep)) { next_ptep =3D (pte_t 
*)gstage_pte_page_vaddr(ptep_get(ptep)); next_ptep_level =3D ptep_level - 1; - ret =3D gstage_level_to_page_size(next_ptep_level, &next_page_size); + ret =3D gstage_level_to_page_size(gstage, next_ptep_level, &next_page_si= ze); if (ret) return; =20 @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gs= tage, =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstag= e, gpa_t start, gpa_t end =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -319,41 +321,48 @@ void __init kvm_riscv_gstage_mode_detect(void) /* Try Sv57x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV57X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV57X4; - kvm_riscv_gstage_pgd_levels =3D 5; + kvm_riscv_gstage_max_pgd_levels =3D 5; goto done; } =20 /* Try Sv48x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV48X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV48X4; - kvm_riscv_gstage_pgd_levels =3D 4; + kvm_riscv_gstage_max_pgd_levels =3D 4; goto done; } =20 /* Try Sv39x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV39X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV39X4; - kvm_riscv_gstage_pgd_levels =3D 3; + kvm_riscv_gstage_max_pgd_levels =3D 3; goto done; } #else /* CONFIG_32BIT */ /* Try Sv32x4 G-stage mode */ 
csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV32X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV32X4; - kvm_riscv_gstage_pgd_levels =3D 2; + kvm_riscv_gstage_max_pgd_levels =3D 2; goto done; } #endif =20 /* KVM depends on !HGATP_MODE_OFF */ - kvm_riscv_gstage_mode =3D HGATP_MODE_OFF; - kvm_riscv_gstage_pgd_levels =3D 0; + kvm_riscv_gstage_max_pgd_levels =3D 0; =20 done: csr_write(CSR_HGATP, 0); kvm_riscv_local_hfence_gvma_all(); } + +unsigned long kvm_riscv_gstage_gpa_bits(struct kvm_arch *ka) +{ + return (HGATP_PAGE_SHIFT + + ka->kvm_riscv_gstage_pgd_levels * kvm_riscv_gstage_index_bits + + kvm_riscv_gstage_pgd_xbits); +} + +gpa_t kvm_riscv_gstage_gpa_size(struct kvm_arch *ka) +{ + return BIT_ULL(kvm_riscv_gstage_gpa_bits(ka)); +} diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 45536af521f0..786c0025e2c3 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void) return rc; =20 kvm_riscv_gstage_mode_detect(); - switch (kvm_riscv_gstage_mode) { - case HGATP_MODE_SV32X4: + switch (kvm_riscv_gstage_max_pgd_levels) { + case 2: str =3D "Sv32x4"; break; - case HGATP_MODE_SV39X4: + case 3: str =3D "Sv39x4"; break; - case HGATP_MODE_SV48X4: + case 4: str =3D "Sv48x4"; break; - case HGATP_MODE_SV57X4: + case 5: str =3D "Sv57x4"; break; default: @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void) (rc) ? 
slist : "no features"); } =20 - kvm_info("using %s G-stage page table format\n", str); + kvm_info("Max G-stage page table format %s\n", str); =20 kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits()); =20 diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 4ab06697bfc0..f91a25175305 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phy= s_addr_t hpa, if (!writable) map.pte =3D pte_wrprotect(map.pte); =20 - ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm->arch.kvm_riscv_gstage_p= gd_levels); if (ret) goto out; =20 @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, * space addressable by the KVM guest GPA space. */ if ((new->base_gfn + new->npages) >=3D - (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT)) + kvm_riscv_gstage_gpa_size(&kvm->arch) >> PAGE_SHIFT) return -EFAULT; =20 hva =3D new->userspace_addr; @@ -332,7 +332,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _memory_slot *memslot, memset(out_map, 0, sizeof(*out_map)); =20 /* We need minimum second+third level pages */ - ret =3D kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(pcache, kvm->arch.kvm_riscv_gstage_pgd= _levels); if (ret) { kvm_err("Failed to topup G-stage cache\n"); return ret; @@ -431,6 +431,10 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm) return -ENOMEM; kvm->arch.pgd =3D page_to_virt(pgd_page); kvm->arch.pgd_phys =3D page_to_phys(pgd_page); + if (!kvm->arch.gstage_mode_user_initialized) { + /* User-space didn't set KVM_CAP_RISCV_SET_HGATP_MODE capability.
*/ + kvm->arch.kvm_riscv_gstage_pgd_levels =3D kvm_riscv_gstage_max_pgd_level= s; + } =20 return 0; } @@ -446,10 +450,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) gstage.flags =3D 0; gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); gstage.pgd =3D kvm->arch.pgd; - kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, fa= lse); + kvm_riscv_gstage_unmap_range(&gstage, 0UL, + kvm_riscv_gstage_gpa_size(&kvm->arch), false); pgd =3D READ_ONCE(kvm->arch.pgd); kvm->arch.pgd =3D NULL; kvm->arch.pgd_phys =3D 0; + kvm->arch.kvm_riscv_gstage_pgd_levels =3D 0; } spin_unlock(&kvm->mmu_lock); =20 @@ -459,11 +465,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) =20 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu) { - unsigned long hgatp =3D kvm_riscv_gstage_mode << HGATP_MODE_SHIFT; - struct kvm_arch *k =3D &vcpu->kvm->arch; + struct kvm_arch *ka =3D &vcpu->kvm->arch; + unsigned long hgatp =3D kvm_riscv_gstage_mode(ka->kvm_riscv_gstage_pgd_le= vels) + << HGATP_MODE_SHIFT; =20 - hgatp |=3D (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; - hgatp |=3D (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; + hgatp |=3D (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; + hgatp |=3D (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; =20 ncsr_write(CSR_HGATP, hgatp); =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 66d91ae6e9b2..4b2156df40fc 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -200,7 +200,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) r =3D KVM_USER_MEM_SLOTS; break; case KVM_CAP_VM_GPA_BITS: - r =3D kvm_riscv_gstage_gpa_bits; + r =3D kvm_riscv_gstage_gpa_bits(&kvm->arch); break; default: r =3D 0; diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c index cf34d448289d..c15bdb1dd8be 100644 --- a/arch/riscv/kvm/vmid.c +++ b/arch/riscv/kvm/vmid.c @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock); void __init kvm_riscv_gstage_vmid_detect(void) { /* Figure-out number of VMID bits in HW */ - 
csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_= VMID); + csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_leve= ls) << + HGATP_MODE_SHIFT) | HGATP_VMID); vmid_bits =3D csr_read(CSR_HGATP); vmid_bits =3D (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT; vmid_bits =3D fls_long(vmid_bits); --=20 2.50.1

From nobody Sun Feb 8 06:22:05 2026
From: fangyu.yu@linux.alibaba.com
To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com
Cc: guoren@kernel.org, ajones@ventanamicro.com, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu
Subject: [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Date: Sun, 25 Jan 2026 23:04:50 +0800
Message-Id: <20260125150450.27068-3-fangyu.yu@linux.alibaba.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260125150450.27068-1-fangyu.yu@linux.alibaba.com>
References: <20260125150450.27068-1-fangyu.yu@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Fangyu Yu

This capability allows userspace to explicitly select the HGATP mode for
the VM. The selected mode must be less than or equal to the max HGATP
mode supported by the hardware. This capability must be enabled before
creating any vCPUs, and can only be set once per VM.
Signed-off-by: Fangyu Yu --- Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++ arch/riscv/kvm/vm.c | 26 ++++++++++++++++++++++++-- include/uapi/linux/kvm.h | 1 + 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 01a3abef8abb..9d0794b174c7 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8765,6 +8765,24 @@ helpful if user space wants to emulate instructions = which are not This capability can be enabled dynamically even if VCPUs were already created and are running. =20 +7.47 KVM_CAP_RISCV_SET_HGATP_MODE +--------------------------------- + +:Architectures: riscv +:Type: VM +:Parameters: args[0] contains the requested HGATP mode +:Returns: + - 0 on success. + - -EINVAL if args[0] is outside the range of HGATP modes supported by the + hardware. + - -EBUSY if vCPUs have already been created for the VM, if the VM has any + non-empty memslots, or if the capability has already been set for the = VM. + +This capability allows userspace to explicitly select the HGATP mode for +the VM. The selected mode must be less than or equal to the maximum HGATP +mode supported by the hardware. This capability must be enabled before +creating any vCPUs, and can only be set once per VM. + 8. Other capabilities. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 4b2156df40fc..7bc9b193dcaa 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) case KVM_CAP_VM_GPA_BITS: r =3D kvm_riscv_gstage_gpa_bits(&kvm->arch); break; + case KVM_CAP_RISCV_SET_HGATP_MODE: + r =3D IS_ENABLED(CONFIG_64BIT) ? 
1 : 0; + break; default: r =3D 0; break; @@ -212,12 +215,31 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) =20 int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { + if (cap->flags) + return -EINVAL; + switch (cap->cap) { case KVM_CAP_RISCV_MP_STATE_RESET: - if (cap->flags) - return -EINVAL; kvm->arch.mp_state_reset =3D true; return 0; + case KVM_CAP_RISCV_SET_HGATP_MODE: +#ifdef CONFIG_64BIT + if (cap->args[0] < HGATP_MODE_SV39X4 || + cap->args[0] > kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels= )) + return -EINVAL; + + if (kvm->arch.gstage_mode_user_initialized || kvm->created_vcpus || + !kvm_are_all_memslots_empty(kvm)) + return -EBUSY; + + kvm->arch.gstage_mode_user_initialized =3D true; + kvm->arch.kvm_riscv_gstage_pgd_levels =3D + 3 + cap->args[0] - HGATP_MODE_SV39X4; + kvm_debug("VM (vmid:%lu) using SV%lluX4 G-stage page table format\n", + kvm->arch.vmid.vmid, + 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9); +#endif + return 0; default: return -EINVAL; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index dddb781b0507..00c02a880518 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -974,6 +974,7 @@ struct kvm_enable_cap { #define KVM_CAP_GUEST_MEMFD_FLAGS 244 #define KVM_CAP_ARM_SEA_TO_USER 245 #define KVM_CAP_S390_USER_OPEREXEC 246 +#define KVM_CAP_RISCV_SET_HGATP_MODE 247 =20 struct kvm_irq_routing_irqchip { __u32 irqchip; --=20 2.50.1