From nobody Thu Apr 2 01:54:01 2026 Received: from out30-118.freemail.mail.aliyun.com (out30-118.freemail.mail.aliyun.com [115.124.30.118]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3CB0A3CD8AA; Mon, 30 Mar 2026 12:26:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.118 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873601; cv=none; b=jioHk1CpSpzQxVw8HAiHf2zDlsQYeYdsp9uevC2LNlH8rrlUIBAdiBo/zFbsIfuudTpf/vcqZsTn9SyGyG0icACKtbyXsgccINBn4PkMI4Ir4/f8qNB5Ddjy4Os5N4+tYxBDG93TlU1IhxDgZXjtYsU42grWUFwkaOjhGUgqClI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873601; c=relaxed/simple; bh=35d6NZTFNWLSn2ug/3HfrJHmzVoelEuuRcwF9MNGi18=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=aWwzwiRi7SfvVPnAuN0RLauiOhJYLYyu1slb0GfHUg/5wAUDnFurfDRtFdGHwrPNiDzKiURn+OSwerPXj1NsLqJ4DoYUt8ZYmWQG89JZa1F7Dy2BDUag6Jm8i9kCNUQAD7uiHLtGLHbAf/Mea1h+uNdSPKQamBEI+YLaFUkQT/U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=BHsKLxbJ; arc=none smtp.client-ip=115.124.30.118 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="BHsKLxbJ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1774873591; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=IP4OR0IOXc3EH919Yujr0f/z3szQm2ZfCPxkDR27Nqw=; 
b=BHsKLxbJzGss7bEy7qIhZTm0YiJLrraUZSNMMAZou7RoMe0OwtgpDgRPVvunJteI7OQ71nE/f+bGrb70Ng4LlCZ0DNaqMmQ8n37XmvVpOfBe76jbsHhyrFbaq+hnAvTICr8F/nxTPYoAnZUTsaF8hIPDVL5w5I5CN7UQzgY+vsQ= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033037033178;MF=fangyu.yu@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0X.zfcHu_1774873588; Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0X.zfcHu_1774873588 cluster:ay36) by smtp.aliyun-inc.com; Mon, 30 Mar 2026 20:26:29 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, skhan@linuxfoundation.org Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v6 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode Date: Mon, 30 Mar 2026 20:25:58 +0800 Message-Id: <20260330122601.22140-2-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> References: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Introduce a per-VM architecture-specific field to support runtime configuration of the G-stage page table format: - kvm->arch.pgd_levels: the corresponding number of page table levels for the selected mode. 
This field replaces the previous global variables kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different virtual machines to independently select their G-stage page table format instead of being forced to share the maximum mode detected by the kernel at boot time. Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones Reviewed-by: Anup Patel Reviewed-by: Guo Ren --- arch/riscv/include/asm/kvm_gstage.h | 37 ++++++++++++---- arch/riscv/include/asm/kvm_host.h | 1 + arch/riscv/kvm/gstage.c | 65 ++++++++++++++--------------- arch/riscv/kvm/main.c | 12 +++--- arch/riscv/kvm/mmu.c | 20 +++++---- arch/riscv/kvm/vm.c | 2 +- arch/riscv/kvm/vmid.c | 3 +- 7 files changed, 83 insertions(+), 57 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index 595e2183173e..5aa58d1f692a 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -29,16 +29,22 @@ struct kvm_gstage_mapping { #define kvm_riscv_gstage_index_bits 10 #endif =20 -extern unsigned long kvm_riscv_gstage_mode; -extern unsigned long kvm_riscv_gstage_pgd_levels; +extern unsigned long kvm_riscv_gstage_max_pgd_levels; =20 #define kvm_riscv_gstage_pgd_xbits 2 #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gs= tage_pgd_xbits)) -#define kvm_riscv_gstage_gpa_bits (HGATP_PAGE_SHIFT + \ - (kvm_riscv_gstage_pgd_levels * \ - kvm_riscv_gstage_index_bits) + \ - kvm_riscv_gstage_pgd_xbits) -#define kvm_riscv_gstage_gpa_size ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bi= ts)) + +static inline unsigned long kvm_riscv_gstage_gpa_bits(unsigned long pgd_le= vels) +{ + return (HGATP_PAGE_SHIFT + + pgd_levels * kvm_riscv_gstage_index_bits + + kvm_riscv_gstage_pgd_xbits); +} + +static inline gpa_t kvm_riscv_gstage_gpa_size(unsigned long pgd_levels) +{ + return BIT_ULL(kvm_riscv_gstage_gpa_bits(pgd_levels)); +} =20 bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr, pte_t **ptepp, u32 *ptep_level); @@ 
-69,4 +75,21 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage= , gpa_t start, gpa_t end =20 void kvm_riscv_gstage_mode_detect(void); =20 +static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels) +{ + switch (pgd_levels) { + case 2: + return HGATP_MODE_SV32X4; + case 3: + return HGATP_MODE_SV39X4; + case 4: + return HGATP_MODE_SV48X4; + case 5: + return HGATP_MODE_SV57X4; + default: + WARN_ON_ONCE(1); + return HGATP_MODE_OFF; + } +} + #endif diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm= _host.h index 24585304c02b..478f699e9dec 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -94,6 +94,7 @@ struct kvm_arch { /* G-stage page table */ pgd_t *pgd; phys_addr_t pgd_phys; + unsigned long pgd_levels; =20 /* Guest Timer */ struct kvm_guest_timer timer; diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index b67d60d722c2..4beb9322fe76 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -12,22 +12,21 @@ #include =20 #ifdef CONFIG_64BIT -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV39X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 3; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 3; #else -unsigned long kvm_riscv_gstage_mode __ro_after_init =3D HGATP_MODE_SV32X4; -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init =3D 2; +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 2; #endif =20 #define gstage_pte_leaf(__ptep) \ (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) =20 -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level) +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage, + gpa_t addr, u32 level) { unsigned long mask; unsigned long shift =3D HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits *= level); =20 - if (level =3D=3D (kvm_riscv_gstage_pgd_levels - 1)) + if (level =3D=3D gstage->kvm->arch.pgd_levels - 1) 
mask =3D (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1; else mask =3D PTRS_PER_PTE - 1; @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t= pte) return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte))); } =20 -static int gstage_page_size_to_level(unsigned long page_size, u32 *out_lev= el) +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned l= ong page_size, + u32 *out_level) { u32 i; unsigned long psz =3D 1UL << 12; =20 - for (i =3D 0; i < kvm_riscv_gstage_pgd_levels; i++) { + for (i =3D 0; i < gstage->kvm->arch.pgd_levels; i++) { if (page_size =3D=3D (psz << (i * kvm_riscv_gstage_index_bits))) { *out_level =3D i; return 0; @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page= _size, u32 *out_level) return -EINVAL; } =20 -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorde= r) +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgorder) { - if (kvm_riscv_gstage_pgd_levels < level) + if (gstage->kvm->arch.pgd_levels < level) return -EINVAL; =20 *out_pgorder =3D 12 + (level * kvm_riscv_gstage_index_bits); return 0; } =20 -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize) +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level, + unsigned long *out_pgsize) { int rc; unsigned long page_order =3D PAGE_SHIFT; =20 - rc =3D gstage_level_to_page_order(level, &page_order); + rc =3D gstage_level_to_page_order(gstage, level, &page_order); if (rc) return rc; =20 @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstag= e, gpa_t addr, pte_t **ptepp, u32 *ptep_level) { pte_t *ptep; - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.pgd_levels - 1; =20 *ptep_level =3D current_level; ptep =3D (pte_t *)gstage->pgd; - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, 
addr, current_level)]; while (ptep && pte_val(ptep_get(ptep))) { if (gstage_pte_leaf(ptep)) { *ptep_level =3D current_level; @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage,= gpa_t addr, current_level--; *ptep_level =3D current_level; ptep =3D (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep)); - ptep =3D &ptep[gstage_pte_index(addr, current_level)]; + ptep =3D &ptep[gstage_pte_index(gstage, addr, current_level)]; } else { ptep =3D NULL; } @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage,= u32 level, gpa_t addr) { unsigned long order =3D PAGE_SHIFT; =20 - if (gstage_level_to_page_order(level, &order)) + if (gstage_level_to_page_order(gstage, level, &order)) return; addr &=3D ~(BIT(order) - 1); =20 @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, struct kvm_mmu_memory_cache *pcache, const struct kvm_gstage_mapping *map) { - u32 current_level =3D kvm_riscv_gstage_pgd_levels - 1; + u32 current_level =3D gstage->kvm->arch.pgd_levels - 1; pte_t *next_ptep =3D (pte_t *)gstage->pgd; - pte_t *ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + pte_t *ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_le= vel)]; =20 if (current_level < map->level) return -EINVAL; @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, } =20 current_level--; - ptep =3D &next_ptep[gstage_pte_index(map->addr, current_level)]; + ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_level)]; } =20 if (pte_val(*ptep) !=3D pte_val(map->pte)) { @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage, out_map->addr =3D gpa; out_map->level =3D 0; =20 - ret =3D gstage_page_size_to_level(page_size, &out_map->level); + ret =3D gstage_page_size_to_level(gstage, page_size, &out_map->level); if (ret) return ret; =20 @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, u32 next_ptep_level; unsigned long next_page_size, 
page_size; =20 - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) return; =20 @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage,= gpa_t addr, if (ptep_level && !gstage_pte_leaf(ptep)) { next_ptep =3D (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep)); next_ptep_level =3D ptep_level - 1; - ret =3D gstage_level_to_page_size(next_ptep_level, &next_page_size); + ret =3D gstage_level_to_page_size(gstage, next_ptep_level, &next_page_si= ze); if (ret) return; =20 @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gs= tage, =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstag= e, gpa_t start, gpa_t end =20 while (addr < end) { found_leaf =3D kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_leve= l); - ret =3D gstage_level_to_page_size(ptep_level, &page_size); + ret =3D gstage_level_to_page_size(gstage, ptep_level, &page_size); if (ret) break; =20 @@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void) /* Try Sv57x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV57X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV57X4; - kvm_riscv_gstage_pgd_levels =3D 5; + kvm_riscv_gstage_max_pgd_levels =3D 5; goto done; } =20 /* Try Sv48x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV48X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV48X4; - kvm_riscv_gstage_pgd_levels =3D 4; + kvm_riscv_gstage_max_pgd_levels =3D 4; goto done; } =20 /* Try Sv39x4 G-stage mode */ csr_write(CSR_HGATP, 
HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV39X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV39X4; - kvm_riscv_gstage_pgd_levels =3D 3; + kvm_riscv_gstage_max_pgd_levels =3D 3; goto done; } #else /* CONFIG_32BIT */ /* Try Sv32x4 G-stage mode */ csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV32X4) { - kvm_riscv_gstage_mode =3D HGATP_MODE_SV32X4; - kvm_riscv_gstage_pgd_levels =3D 2; + kvm_riscv_gstage_max_pgd_levels =3D 2; goto done; } #endif =20 /* KVM depends on !HGATP_MODE_OFF */ - kvm_riscv_gstage_mode =3D HGATP_MODE_OFF; - kvm_riscv_gstage_pgd_levels =3D 0; + kvm_riscv_gstage_max_pgd_levels =3D 0; =20 done: csr_write(CSR_HGATP, 0); diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c index 0f3fe3986fc0..90ee0a032b9a 100644 --- a/arch/riscv/kvm/main.c +++ b/arch/riscv/kvm/main.c @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void) return rc; =20 kvm_riscv_gstage_mode_detect(); - switch (kvm_riscv_gstage_mode) { - case HGATP_MODE_SV32X4: + switch (kvm_riscv_gstage_max_pgd_levels) { + case 2: str =3D "Sv32x4"; break; - case HGATP_MODE_SV39X4: + case 3: str =3D "Sv39x4"; break; - case HGATP_MODE_SV48X4: + case 4: str =3D "Sv48x4"; break; - case HGATP_MODE_SV57X4: + case 5: str =3D "Sv57x4"; break; default: @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void) (rc) ? 
slist : "no features"); } =20 - kvm_info("using %s G-stage page table format\n", str); + kvm_info("highest G-stage page table mode is %s\n", str); =20 kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits()); =20 diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 088d33ba90ed..fbcdd75cb9af 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phy= s_addr_t hpa, if (!writable) map.pte =3D pte_wrprotect(map.pte); =20 - ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(&pcache, kvm->arch.pgd_levels); if (ret) goto out; =20 @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, * space addressable by the KVM guest GPA space. */ if ((new->base_gfn + new->npages) >=3D - (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT)) + kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels) >> PAGE_SHIFT) return -EFAULT; =20 hva =3D new->userspace_addr; @@ -472,7 +472,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _memory_slot *memslot, memset(out_map, 0, sizeof(*out_map)); =20 /* We need minimum second+third level pages */ - ret =3D kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels); + ret =3D kvm_mmu_topup_memory_cache(pcache, kvm->arch.pgd_levels); if (ret) { kvm_err("Failed to topup G-stage cache\n"); return ret; @@ -575,6 +575,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm) return -ENOMEM; kvm->arch.pgd =3D page_to_virt(pgd_page); kvm->arch.pgd_phys =3D page_to_phys(pgd_page); + kvm->arch.pgd_levels =3D kvm_riscv_gstage_max_pgd_levels; =20 return 0; } @@ -590,10 +591,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) gstage.flags =3D 0; gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); gstage.pgd =3D kvm->arch.pgd; - kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, fa= lse); + kvm_riscv_gstage_unmap_range(&gstage, 0UL, + kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), 
false); pgd =3D READ_ONCE(kvm->arch.pgd); kvm->arch.pgd =3D NULL; kvm->arch.pgd_phys =3D 0; + kvm->arch.pgd_levels =3D 0; } spin_unlock(&kvm->mmu_lock); =20 @@ -603,11 +606,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) =20 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu) { - unsigned long hgatp =3D kvm_riscv_gstage_mode << HGATP_MODE_SHIFT; - struct kvm_arch *k =3D &vcpu->kvm->arch; + struct kvm_arch *ka =3D &vcpu->kvm->arch; + unsigned long hgatp =3D kvm_riscv_gstage_mode(ka->pgd_levels) + << HGATP_MODE_SHIFT; =20 - hgatp |=3D (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; - hgatp |=3D (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; + hgatp |=3D (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID; + hgatp |=3D (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN; =20 ncsr_write(CSR_HGATP, hgatp); =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 13c63ae1a78b..4d82a886102c 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) r =3D KVM_USER_MEM_SLOTS; break; case KVM_CAP_VM_GPA_BITS: - r =3D kvm_riscv_gstage_gpa_bits; + r =3D kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels); break; default: r =3D 0; diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c index cf34d448289d..c15bdb1dd8be 100644 --- a/arch/riscv/kvm/vmid.c +++ b/arch/riscv/kvm/vmid.c @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock); void __init kvm_riscv_gstage_vmid_detect(void) { /* Figure-out number of VMID bits in HW */ - csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_= VMID); + csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_leve= ls) << + HGATP_MODE_SHIFT) | HGATP_VMID); vmid_bits =3D csr_read(CSR_HGATP); vmid_bits =3D (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT; vmid_bits =3D fls_long(vmid_bits); --=20 2.50.1 From nobody Thu Apr 2 01:54:01 2026 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com 
[115.124.30.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B3813CD8C5; Mon, 30 Mar 2026 12:26:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873602; cv=none; b=eu9eHcjPxE4O1NgaCYaYiAx3Vthf2GYMwXpr5oUAMghjii9qMbLA52LGPvtX0Pf1hNQRtHSpsvDwzWBQVzLnUbFS/fLFw1sJiyA4RVIguWnQ3izuFbRmHzDm6aZg9pxxP/Vv0Z7XIjSwQ/PiFIFE47bd/l4tnMtc3eUNd1ksRy8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873602; c=relaxed/simple; bh=4UX4zR/SoyFBXcevu5BS/rR4BpG0cIklXlPJTft38QE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ngi0UUxjgIueidswIVoN5v23UW3bHVfyIUIv3rIYla4Ekdex71j510M4UIF95QwHfbBcU3YCQBFNs9fz8XGJ8gmO527fWmlTWmsRviSIroqcoWZDGSxoaCdzcX3KH0yOwut+RIRvI5GZ2qE/TrjGtc9kNcBYvrvGImWai7XL5hA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=GD+8srE7; arc=none smtp.client-ip=115.124.30.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="GD+8srE7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1774873592; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=zoncOzDn/GwxdQdnVQulFi3wkN3Paqj7D9C+6rwYdQM=; 
b=GD+8srE7DgEnpbtQo1/tVXKTYfOlU7hYcLNLSGOyCdPlH7CGJT9nt09LVbrTsAt5KZyepUKNcwH8wdwgs3mOCphHeukMgnw/QwlJa2oLYpoToXic00Y7sjVIk42PbepSCe7eoBUokErHf0lEc5kSSzgs5cv52eZIu+ju6SX+qC8= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R791e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam011083073210;MF=fangyu.yu@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0X.zfcIY_1774873590; Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0X.zfcIY_1774873590 cluster:ay36) by smtp.aliyun-inc.com; Mon, 30 Mar 2026 20:26:30 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, skhan@linuxfoundation.org Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v6 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage Date: Mon, 30 Mar 2026 20:25:59 +0800 Message-Id: <20260330122601.22140-3-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> References: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Gstage page-table helpers frequently chase gstage->kvm->arch to fetch pgd_levels. This adds noise and repeats the same dereference chain in hot paths. Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch when setting up a gstage instance. Introduce kvm_riscv_gstage_init() to centralize initialization and switch gstage code to use gstage->pgd_levels. 
Suggested-by: Anup Patel Signed-off-by: Fangyu Yu Reviewed-by: Anup Patel --- arch/riscv/include/asm/kvm_gstage.h | 10 ++++++ arch/riscv/kvm/gstage.c | 10 +++--- arch/riscv/kvm/mmu.c | 50 ++++++----------------------- 3 files changed, 25 insertions(+), 45 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index 5aa58d1f692a..70d9d483365e 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -15,6 +15,7 @@ struct kvm_gstage { #define KVM_GSTAGE_FLAGS_LOCAL BIT(0) unsigned long vmid; pgd_t *pgd; + unsigned long pgd_levels; }; =20 struct kvm_gstage_mapping { @@ -92,4 +93,13 @@ static inline unsigned long kvm_riscv_gstage_mode(unsign= ed long pgd_levels) } } =20 +static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct= kvm *kvm) +{ + gstage->kvm =3D kvm; + gstage->flags =3D 0; + gstage->vmid =3D READ_ONCE(kvm->arch.vmid.vmid); + gstage->pgd =3D kvm->arch.pgd; + gstage->pgd_levels =3D kvm->arch.pgd_levels; +} + #endif diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index 4beb9322fe76..7c4c34bc191b 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -26,7 +26,7 @@ static inline unsigned long gstage_pte_index(struct kvm_g= stage *gstage, unsigned long mask; unsigned long shift =3D HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits *= level); =20 - if (level =3D=3D gstage->kvm->arch.pgd_levels - 1) + if (level =3D=3D gstage->pgd_levels - 1) mask =3D (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1; else mask =3D PTRS_PER_PTE - 1; @@ -45,7 +45,7 @@ static int gstage_page_size_to_level(struct kvm_gstage *g= stage, unsigned long pa u32 i; unsigned long psz =3D 1UL << 12; =20 - for (i =3D 0; i < gstage->kvm->arch.pgd_levels; i++) { + for (i =3D 0; i < gstage->pgd_levels; i++) { if (page_size =3D=3D (psz << (i * kvm_riscv_gstage_index_bits))) { *out_level =3D i; return 0; @@ -58,7 +58,7 @@ static int gstage_page_size_to_level(struct 
kvm_gstage *g= stage, unsigned long pa static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level, unsigned long *out_pgorder) { - if (gstage->kvm->arch.pgd_levels < level) + if (gstage->pgd_levels < level) return -EINVAL; =20 *out_pgorder =3D 12 + (level * kvm_riscv_gstage_index_bits); @@ -83,7 +83,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage,= gpa_t addr, pte_t **ptepp, u32 *ptep_level) { pte_t *ptep; - u32 current_level =3D gstage->kvm->arch.pgd_levels - 1; + u32 current_level =3D gstage->pgd_levels - 1; =20 *ptep_level =3D current_level; ptep =3D (pte_t *)gstage->pgd; @@ -127,7 +127,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage, struct kvm_mmu_memory_cache *pcache, const struct kvm_gstage_mapping *map) { - u32 current_level =3D gstage->kvm->arch.pgd_levels - 1; + u32 current_level =3D gstage->pgd_levels - 1; pte_t *next_ptep =3D (pte_t *)gstage->pgd; pte_t *ptep =3D &next_ptep[gstage_pte_index(gstage, map->addr, current_le= vel)]; =20 diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index fbcdd75cb9af..2d3def024270 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -24,10 +24,7 @@ static void mmu_wp_memory_region(struct kvm *kvm, int sl= ot) phys_addr_t end =3D (memslot->base_gfn + memslot->npages) << PAGE_SHIFT; struct kvm_gstage gstage; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 spin_lock(&kvm->mmu_lock); kvm_riscv_gstage_wp_range(&gstage, start, end); @@ -49,10 +46,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, ph= ys_addr_t hpa, struct kvm_gstage_mapping map; struct kvm_gstage gstage; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 end =3D (gpa + size + PAGE_SIZE - 1) & PAGE_MASK; pfn =3D __phys_to_pfn(hpa); @@ -89,10 +83,7 @@ 
void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, u= nsigned long size) { struct kvm_gstage gstage; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 spin_lock(&kvm->mmu_lock); kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false); @@ -109,10 +100,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kv= m *kvm, phys_addr_t end =3D (base_gfn + __fls(mask) + 1) << PAGE_SHIFT; struct kvm_gstage gstage; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 kvm_riscv_gstage_wp_range(&gstage, start, end); } @@ -141,10 +129,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm, phys_addr_t size =3D slot->npages << PAGE_SHIFT; struct kvm_gstage gstage; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 spin_lock(&kvm->mmu_lock); kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false); @@ -250,10 +235,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_g= fn_range *range) if (!kvm->arch.pgd) return false; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); mmu_locked =3D spin_trylock(&kvm->mmu_lock); kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT, (range->end - range->start) << PAGE_SHIFT, @@ -275,10 +257,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range= *range) =20 WARN_ON(size !=3D PAGE_SIZE && size !=3D PMD_SIZE && size !=3D PUD_SIZE); =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); if (!kvm_riscv_gstage_get_leaf(&gstage, 
range->start << PAGE_SHIFT, &ptep, &ptep_level)) return false; @@ -298,10 +277,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_= range *range) =20 WARN_ON(size !=3D PAGE_SIZE && size !=3D PMD_SIZE && size !=3D PUD_SIZE); =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); if (!kvm_riscv_gstage_get_leaf(&gstage, range->start << PAGE_SHIFT, &ptep, &ptep_level)) return false; @@ -463,10 +439,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_memory_slot *memslot, struct kvm_gstage gstage; struct page *page; =20 - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); =20 /* Setup initial state of output mapping */ memset(out_map, 0, sizeof(*out_map)); @@ -587,10 +560,7 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm) =20 spin_lock(&kvm->mmu_lock); if (kvm->arch.pgd) { - gstage.kvm =3D kvm; - gstage.flags =3D 0; - gstage.vmid =3D READ_ONCE(kvm->arch.vmid.vmid); - gstage.pgd =3D kvm->arch.pgd; + kvm_riscv_gstage_init(&gstage, kvm); kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false); pgd =3D READ_ONCE(kvm->arch.pgd); --=20 2.50.1 From nobody Thu Apr 2 01:54:01 2026 Received: from out30-110.freemail.mail.aliyun.com (out30-110.freemail.mail.aliyun.com [115.124.30.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1521F37CD57; Mon, 30 Mar 2026 12:26:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.110 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873597; cv=none; 
b=a6NEoNA8fPL3rmULtJj4TtfQ+9fLjzLlW1gp3daCKG9Gm+NiXtsQbVeq2SjNIznQkrryN8fvUprAR3Ow7MX+5085VCl8hZkujblWCQ1oKsDJmXyX9UGm3/AQw8IYDybn37X7sJ1Evtz/bdwWVrhkvCEiCn/cTa1CE5hmsqVg4q8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873597; c=relaxed/simple; bh=Xn/WZtZyMpPye9m8abp212wYw38KSOoY3nIsJV0J4GM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JojDiDFyvSEUjGM8vaGRz8JghZGX+recLLM2FkCJcLQ22rEUJc5ULtPmUBLoB33ncZPJYNYCo8HqHS4efUyy91vgdvu2ZNgy/UjTJRNiTpa5NOLlCf5ZLqbkzt6AHrmrQnqMm0eAuRxyt55C4iTLw/iRmiKfjiz5z/jkmd8NQhI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=Jfzrhjxl; arc=none smtp.client-ip=115.124.30.110 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="Jfzrhjxl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1774873593; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=PT+yx01BbG86omMromavfHwf6ELYE5YcV+gpW/gZMbk=; b=Jfzrhjxl6h90Y2n4R7N8OpzN2M8yrMIIEFmfAby5Ur791EvlDJZelFZZRY5VABVQsr83GXDbFhtWDQ4QyXOpKWWCtwzeGCu9krIVpBVas8Vg/LGzDv7erhOYZHPMW07zUfjdiNCZyFnpxYtiPxfZVXjhgsueBwpG2BtklrFWyc8= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R121e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam011083073210;MF=fangyu.yu@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0X.zfcJ9_1774873591; Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0X.zfcJ9_1774873591 cluster:ay36) by smtp.aliyun-inc.com; Mon, 30 Mar 2026 
20:26:32 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, skhan@linuxfoundation.org Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v6 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes Date: Mon, 30 Mar 2026 20:26:00 +0800 Message-Id: <20260330122601.22140-4-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> References: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values supported by the host and record them in a bitmask. Keep tracking the maximum supported G-stage page table level for existing internal users. Also provide lightweight helpers to retrieve the supported-mode bitmask and validate a requested HGATP.MODE against it. 
Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones Reviewed-by: Guo Ren --- arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++ arch/riscv/kvm/gstage.c | 43 +++++++++++++++-------------- 2 files changed, 34 insertions(+), 20 deletions(-) diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/k= vm_gstage.h index 70d9d483365e..bbf8f45c6563 100644 --- a/arch/riscv/include/asm/kvm_gstage.h +++ b/arch/riscv/include/asm/kvm_gstage.h @@ -31,6 +31,7 @@ struct kvm_gstage_mapping { #endif =20 extern unsigned long kvm_riscv_gstage_max_pgd_levels; +extern u32 kvm_riscv_gstage_supported_mode_mask; =20 #define kvm_riscv_gstage_pgd_xbits 2 #define kvm_riscv_gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gs= tage_pgd_xbits)) @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gs= tage *gstage, struct kvm * gstage->pgd_levels =3D kvm->arch.pgd_levels; } =20 +static inline u32 kvm_riscv_get_hgatp_mode_mask(void) +{ + return kvm_riscv_gstage_supported_mode_mask; +} + +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode) +{ + return kvm_riscv_gstage_supported_mode_mask & BIT(mode); +} + #endif diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c index 7c4c34bc191b..459041255c14 100644 --- a/arch/riscv/kvm/gstage.c +++ b/arch/riscv/kvm/gstage.c @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_= init =3D 3; #else unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init =3D 2; #endif +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). 
*/ +u32 kvm_riscv_gstage_supported_mode_mask __ro_after_init; =20 #define gstage_pte_leaf(__ptep) \ (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)) @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gst= age, gpa_t start, gpa_t end } } =20 +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode) +{ + csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT); + return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D mode); +} + void __init kvm_riscv_gstage_mode_detect(void) { + kvm_riscv_gstage_supported_mode_mask =3D 0; + kvm_riscv_gstage_max_pgd_levels =3D 0; + #ifdef CONFIG_64BIT - /* Try Sv57x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV57X4) { - kvm_riscv_gstage_max_pgd_levels =3D 5; - goto done; + /* Try Sv39x4 G-stage mode */ + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) { + kvm_riscv_gstage_supported_mode_mask |=3D BIT(HGATP_MODE_SV39X4); + kvm_riscv_gstage_max_pgd_levels =3D 3; } =20 /* Try Sv48x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV48X4) { + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) { + kvm_riscv_gstage_supported_mode_mask |=3D BIT(HGATP_MODE_SV48X4); kvm_riscv_gstage_max_pgd_levels =3D 4; - goto done; } =20 - /* Try Sv39x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV39X4) { - kvm_riscv_gstage_max_pgd_levels =3D 3; - goto done; + /* Try Sv57x4 G-stage mode */ + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) { + kvm_riscv_gstage_supported_mode_mask |=3D BIT(HGATP_MODE_SV57X4); + kvm_riscv_gstage_max_pgd_levels =3D 5; } #else /* CONFIG_32BIT */ /* Try Sv32x4 G-stage mode */ - csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT); - if ((csr_read(CSR_HGATP) >> 
HGATP_MODE_SHIFT) =3D=3D HGATP_MODE_SV32X4) { + if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) { + kvm_riscv_gstage_supported_mode_mask |=3D BIT(HGATP_MODE_SV32X4); kvm_riscv_gstage_max_pgd_levels =3D 2; - goto done; } #endif =20 - /* KVM depends on !HGATP_MODE_OFF */ - kvm_riscv_gstage_max_pgd_levels =3D 0; - -done: csr_write(CSR_HGATP, 0); kvm_riscv_local_hfence_gvma_all(); } --=20 2.50.1 From nobody Thu Apr 2 01:54:01 2026 Received: from out30-118.freemail.mail.aliyun.com (out30-118.freemail.mail.aliyun.com [115.124.30.118]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D83FA3CD8D7; Mon, 30 Mar 2026 12:26:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.118 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873605; cv=none; b=UByQ/M/2D1hd7z/irJZ6wKoJbXR9+CCUpL1IAtO3eNtLRCBWyQtNXzeuF3cPetYhTZ/2aoRAUAxIzfEk1hMsYTycPy8mbzVXjRZ8sFIzeatG9C0yy9CeY5R8YChtDEM66d0mdgSz6z8eSzEsCjryWJR7wN3HGNfctjuXRnDKZp0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774873605; c=relaxed/simple; bh=t+nBbtYRRgP5iBScihFqmbDfpM8X25wYYMjt+qeUvm0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=iV6ID3GE38bLNeFjCujc0arlznsl61D+OV1qmv3dXViX/6cWvoog4k57gSgpn0SLC80oGALeouqLVpejRisLzoIxJeYrvPtQBVxGSCebscFvuXxqKd/TVD8SKSGe4nLjhF2iKALlzShSVdMyiyfafhhsadKD1+PiNjI0PQaLkrE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=gAgl0YHo; arc=none smtp.client-ip=115.124.30.118 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="gAgl0YHo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1774873595; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=hFPV9OZsT6eJowdta/xFZt6DIyl+vW5qGVBuswXMeAQ=; b=gAgl0YHoRXUI4fqRUUraqNR7gl5evH2jb92yTK3Zo72XiGqGp3L7jU5oUdX1wrauvmb75qSXROgnS5d9vsElxXNMTfFFjxBF/AOg2SXRYas1CBYQ0JDwNllD1Kmfu6oOBc/kaHqLNuWdLqLd8Nz2Y4btnAIYFUwvgKiUZ0RCiSE= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033037033178;MF=fangyu.yu@linux.alibaba.com;NM=1;PH=DS;RN=18;SR=0;TI=SMTPD_---0X.zfcJd_1774873592; Received: from localhost.localdomain(mailfrom:fangyu.yu@linux.alibaba.com fp:SMTPD_---0X.zfcJd_1774873592 cluster:ay36) by smtp.aliyun-inc.com; Mon, 30 Mar 2026 20:26:33 +0800 From: fangyu.yu@linux.alibaba.com To: pbonzini@redhat.com, corbet@lwn.net, anup@brainfault.org, atish.patra@linux.dev, pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr, skhan@linuxfoundation.org Cc: guoren@kernel.org, radim.krcmar@oss.qualcomm.com, andrew.jones@oss.qualcomm.com, linux-doc@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Fangyu Yu Subject: [PATCH v6 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE Date: Mon, 30 Mar 2026 20:26:01 +0800 Message-Id: <20260330122601.22140-5-fangyu.yu@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> References: <20260330122601.22140-1-fangyu.yu@linux.alibaba.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Fangyu Yu Add a 
VM capability that allows userspace to select the G-stage page table format by setting HGATP.MODE on a per-VM basis. Userspace enables the capability via KVM_ENABLE_CAP, passing the requested HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is not supported by the host, and with -EBUSY if the VM has already been committed (e.g. vCPUs have been created or any memslot is populated). KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the HGATP.MODE formats supported by the host. Signed-off-by: Fangyu Yu Reviewed-by: Andrew Jones Reviewed-by: Guo Ren --- Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++ arch/riscv/kvm/vm.c | 18 ++++++++++++++++-- include/uapi/linux/kvm.h | 1 + 3 files changed, 44 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 032516783e96..9d7f6958fa81 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions = which are not This capability can be enabled dynamically even if VCPUs were already created and are running. =20 +7.47 KVM_CAP_RISCV_SET_HGATP_MODE +--------------------------------- + +:Architectures: riscv +:Type: VM +:Parameters: args[0] contains the requested HGATP mode +:Returns: + - 0 on success. + - -EINVAL if args[0] is outside the range of HGATP modes supported by the + hardware. + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any + non-empty memslots. + +This capability allows userspace to explicitly select the HGATP mode for +the VM. The selected mode must be supported by both KVM and hardware. This +capability must be enabled before creating any vCPUs or memslots. + +If this capability is not enabled, KVM will select the default HGATP mode +automatically. The default is the highest HGATP.MODE value supported by +hardware. 
+ +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of +HGATP.MODE values supported by the host. A return value of 0 indicates that +the capability is not supported. The supported-mode bitmask uses HGATP.MODE +encodings as defined by the RISC-V privileged specification. For example, Sv39x4 +corresponds to HGATP.MODE=3D8, so userspace should test bitmask & BIT(8). + 8. Other capabilities. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 4d82a886102c..5e82a3ad3ad0 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long = ext) case KVM_CAP_VM_GPA_BITS: r =3D kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels); break; + case KVM_CAP_RISCV_SET_HGATP_MODE: + r =3D kvm_riscv_get_hgatp_mode_mask(); + break; default: r =3D 0; break; @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) =20 int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { + if (cap->flags) + return -EINVAL; + switch (cap->cap) { case KVM_CAP_RISCV_MP_STATE_RESET: - if (cap->flags) - return -EINVAL; kvm->arch.mp_state_reset =3D true; return 0; + case KVM_CAP_RISCV_SET_HGATP_MODE: + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0])) + return -EINVAL; + + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm)) + return -EBUSY; +#ifdef CONFIG_64BIT + kvm->arch.pgd_levels =3D 3 + cap->args[0] - HGATP_MODE_SV39X4; +#endif + return 0; default: return -EINVAL; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 80364d4dbebb..a74a80fd4046 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -989,6 +989,7 @@ struct kvm_enable_cap { #define KVM_CAP_ARM_SEA_TO_USER 245 #define KVM_CAP_S390_USER_OPEREXEC 246 #define KVM_CAP_S390_KEYOP 247 +#define KVM_CAP_RISCV_SET_HGATP_MODE 248 =20 struct kvm_irq_routing_irqchip { __u32 irqchip; --=20 2.50.1
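For readers following the series from the userspace side, the sketch below shows one way a VMM might interpret the bitmask that KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns and pick a mode to pass in args[0]. It is a minimal illustration and not part of the patches: the helper names (hgatp_mode_in_mask, hgatp_highest_mode, hgatp_mode_to_pgd_levels) are hypothetical, the mode constants are redefined locally from the RISC-V privileged specification encodings for RV64, and the ioctl calls themselves are left out so the logic can be compiled without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HGATP.MODE encodings for RV64, per the RISC-V privileged specification.
 * Redefined here for illustration; a real VMM would share one definition. */
#define HGATP_MODE_SV39X4 8u
#define HGATP_MODE_SV48X4 9u
#define HGATP_MODE_SV57X4 10u

/* Hypothetical helper: is `mode` set in the supported-mode bitmask? */
static bool hgatp_mode_in_mask(uint32_t mask, unsigned int mode)
{
	/* The range check is this sketch's own precaution against shifting
	 * a 32-bit value by 32 or more, which is undefined in C. */
	return mode < 32 && (mask & (UINT32_C(1) << mode)) != 0;
}

/* Hypothetical helper: highest supported RV64 mode, or -1 if none is set. */
static int hgatp_highest_mode(uint32_t mask)
{
	for (int mode = HGATP_MODE_SV57X4; mode >= (int)HGATP_MODE_SV39X4; mode--)
		if (hgatp_mode_in_mask(mask, (unsigned int)mode))
			return mode;
	return -1;
}

/* Mirrors the arithmetic in patch 4/4:
 * pgd_levels = 3 + mode - HGATP_MODE_SV39X4 (64-bit hosts only). */
static unsigned int hgatp_mode_to_pgd_levels(unsigned int mode)
{
	assert(mode >= HGATP_MODE_SV39X4 && mode <= HGATP_MODE_SV57X4);
	return 3 + mode - HGATP_MODE_SV39X4;
}
```

In a real VMM the mask would come from ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_RISCV_SET_HGATP_MODE), and the chosen mode would go into args[0] of a struct kvm_enable_cap passed to KVM_ENABLE_CAP on the VM file descriptor, before any vCPU or memslot is created, as the commit message describes.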