From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
This capability allows userspace to explicitly select the HGATP mode
for the VM. The selected mode must be less than or equal to the max
HGATP mode supported by the hardware. This capability must be enabled
before creating any vCPUs, and can only be set once per VM.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++++
arch/riscv/kvm/vm.c | 26 ++++++++++++++++++++++++--
include/uapi/linux/kvm.h | 1 +
3 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..9e17788e3a9d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8765,6 +8765,20 @@ helpful if user space wants to emulate instructions which are not
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.47 KVM_CAP_RISCV_SET_HGATP_MODE
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: args[0] contains the requested HGATP mode
+:Returns: 0 on success, -EINVAL if arg[0] is outside the range of hgatp
+ modes supported by the hardware.
+
+This capability allows userspace to explicitly select the HGATP mode for
+the VM. The selected mode must be less than or equal to the maximum HGATP
+mode supported by the hardware. This capability must be enabled before
+creating any vCPUs, and can only be set once per VM.
+
8. Other capabilities.
======================
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 4b2156df40fc..e9275023a73a 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -202,6 +202,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_GPA_BITS:
r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
break;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+#ifdef CONFIG_64BIT
+ r = 1;
+#else/* CONFIG_32BIT */
+ r = 0;
+#endif
+ break;
default:
r = 0;
break;
@@ -212,12 +219,27 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
{
+ if (cap->flags)
+ return -EINVAL;
switch (cap->cap) {
case KVM_CAP_RISCV_MP_STATE_RESET:
- if (cap->flags)
- return -EINVAL;
kvm->arch.mp_state_reset = true;
return 0;
+ case KVM_CAP_RISCV_SET_HGATP_MODE:
+#ifdef CONFIG_64BIT
+ if (cap->args[0] < HGATP_MODE_SV39X4 ||
+ cap->args[0] > kvm_riscv_gstage_max_mode)
+ return -EINVAL;
+ if (kvm->arch.gstage_mode_initialized)
+ return 0;
+ kvm->arch.gstage_mode_initialized = true;
+ kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
+ kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
+ kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
+ kvm_info("using SV%lluX4 G-stage page table format\n",
+ 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
+#endif
+ return 0;
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..00c02a880518 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -974,6 +974,7 @@ struct kvm_enable_cap {
#define KVM_CAP_GUEST_MEMFD_FLAGS 244
#define KVM_CAP_ARM_SEA_TO_USER 245
#define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_RISCV_SET_HGATP_MODE 247
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.50.1
2026-01-05T22:32:32+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> This capability allows userspace to explicitly select the HGATP mode
> for the VM. The selected mode must be less than or equal to the max
> HGATP mode supported by the hardware. This capability must be enabled
> before creating any vCPUs, and can only be set once per VM.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> @@ -212,12 +219,27 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + if (cap->flags)
> + return -EINVAL;
> switch (cap->cap) {
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
> + cap->args[0] > kvm_riscv_gstage_max_mode)
> + return -EINVAL;
> + if (kvm->arch.gstage_mode_initialized)
> + return 0;
"must be enabled before creating any vCPUs" check is missing.
> + kvm->arch.gstage_mode_initialized = true;
> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
Even before creating VCPUs, I don't see enough protections to make this
work.
Userspace can only provide a hint about the physical address space size
before any other KVM code could have acted on the information.
It would be a serious issue if some code operated on hgatp as if it
were X while other code treated it as Y.
The simplest solution would be to ensure that the CAP_SET VM ioctl can
only be executed before any other IOCTL, but a change in generic code to
achieve it would be frowned upon...
I would recommend looking at kvm_are_all_memslots_empty() first, as it's
quite likely that it could be sufficient for the purposes of changing
hgatp.
Thanks.
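
For illustration only, the ordering constraints discussed above can be modelled in plain C outside the kernel. This is a minimal sketch, not the patch: struct vm_model and model_set_hgatp_mode() are hypothetical names, the memslots_empty field merely stands in for a call to kvm_are_all_memslots_empty(), and returning -EBUSY for a request that arrives too late is an assumption, since the thread does not name an error code for that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
enum {
	HGATP_MODE_SV39X4 = 8,
	HGATP_MODE_SV48X4 = 9,
	HGATP_MODE_SV57X4 = 10,
};

/* Hypothetical stand-in for the per-VM state the capability touches. */
struct vm_model {
	unsigned int created_vcpus;	/* mirrors kvm->created_vcpus */
	bool memslots_empty;		/* stands in for kvm_are_all_memslots_empty() */
	unsigned long gstage_mode;	/* currently selected G-stage mode */
	unsigned long max_mode;		/* stands in for kvm_riscv_gstage_max_mode */
};

/* Returns 0 on success or a negative errno, like a KVM ioctl handler. */
static int model_set_hgatp_mode(struct vm_model *vm, unsigned long mode)
{
	assert(vm != NULL);

	/* Same range check as the patch. */
	if (mode < HGATP_MODE_SV39X4 || mode > vm->max_mode)
		return -EINVAL;

	/*
	 * Refuse once anything could already depend on the mode.
	 * The -EBUSY value here is an assumption of this sketch.
	 */
	if (vm->created_vcpus || !vm->memslots_empty)
		return -EBUSY;

	vm->gstage_mode = mode;
	return 0;
}
```

The point of the model is only the order of the checks: the mode may change while no vCPU exists and no memslot has been installed, and is refused afterwards.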
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> This capability allows userspace to explicitly select the HGATP mode
>> for the VM. The selected mode must be less than or equal to the max
>> HGATP mode supported by the hardware. This capability must be enabled
>> before creating any vCPUs, and can only be set once per VM.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> @@ -212,12 +219,27 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>
>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> {
>> + if (cap->flags)
>> + return -EINVAL;
>> switch (cap->cap) {
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +#ifdef CONFIG_64BIT
>> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
>> + cap->args[0] > kvm_riscv_gstage_max_mode)
>> + return -EINVAL;
>> + if (kvm->arch.gstage_mode_initialized)
>> + return 0;
>
>"must be enabled before creating any vCPUs" check is missing.
Agreed, I'll add the missing "must be enabled before creating any vCPUs" check by
rejecting the capability once kvm->created_vcpus is non-zero.
>
>> + kvm->arch.gstage_mode_initialized = true;
>> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
>> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
>> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
>
>Even before creating VCPUs, I don't see enough protections to make this
>work.
>
>Userspace can only provide a hint about the physical address space size
>before any other KVM code could have acted on the information.
>It would be a serious issue if some code would operate on hgatp as if it
>were X and others as Y.
>
>The simplest solution would be to ensure that the CAP_SET VM ioctl can
>only be executed before any other IOCTL, but a change in generic code to
>achieve it would be frowned upon...
>I would recommend looking at kvm_are_all_memslots_empty() first, as it's
>quite likely that it could be sufficient for the purposes of changing
>hgatp.
Using kvm_are_all_memslots_empty() looks like a good idea, and I will add
a check based on it in v2.
>
>Thanks.
Thanks,
Fangyu
On Mon, Jan 05, 2026 at 10:32:32PM +0800, fangyu.yu@linux.alibaba.com wrote:
...
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
> + cap->args[0] > kvm_riscv_gstage_max_mode)
> + return -EINVAL;
> + if (kvm->arch.gstage_mode_initialized)
> + return 0;
> + kvm->arch.gstage_mode_initialized = true;
> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
> + kvm_info("using SV%lluX4 G-stage page table format\n",
> + 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
I don't think we want this kvm_info line, particularly if it doesn't also
include a VM ID in some form to allow readers to know which VM is using
the selected format. Let's either drop it or change it to kvm_debug and
include a VM ID.
Thanks,
drew
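
As a side note on the arithmetic in the quoted hunk, the expressions for the page-table depth and for the number printed in the message can be checked in isolation. The sketch below is a standalone model, assuming the hgatp.MODE encodings from the RISC-V privileged specification (8, 9, 10); the helper names gstage_pgd_levels() and gstage_sv_bits() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
#define HGATP_MODE_SV39X4	8UL
#define HGATP_MODE_SV48X4	9UL
#define HGATP_MODE_SV57X4	10UL

/* Page-table levels: each step up in mode adds one level. */
static unsigned long gstage_pgd_levels(unsigned long mode)
{
	assert(mode >= HGATP_MODE_SV39X4 && mode <= HGATP_MODE_SV57X4);
	return 3 + mode - HGATP_MODE_SV39X4;
}

/* The NN in "SVNNX4": each extra level translates 9 more address bits. */
static unsigned long gstage_sv_bits(unsigned long mode)
{
	assert(mode >= HGATP_MODE_SV39X4 && mode <= HGATP_MODE_SV57X4);
	return 39 + (mode - HGATP_MODE_SV39X4) * 9;
}
```

With these encodings the three modes map to 3, 4 and 5 levels and to the names SV39X4, SV48X4 and SV57X4.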
> +#endif
> + return 0;
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dddb781b0507..00c02a880518 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_GUEST_MEMFD_FLAGS 244
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.50.1
>
>
> --
> kvm-riscv mailing list
> kvm-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kvm-riscv
>...
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +#ifdef CONFIG_64BIT
>> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
>> + cap->args[0] > kvm_riscv_gstage_max_mode)
>> + return -EINVAL;
>> + if (kvm->arch.gstage_mode_initialized)
>> + return 0;
>> + kvm->arch.gstage_mode_initialized = true;
>> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
>> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
>> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
>> + kvm_info("using SV%lluX4 G-stage page table format\n",
>> + 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
>
>I don't think we want this kvm_info line, particularly if it doesn't also
>include a VM ID in some form to allow readers to know which VM is using
>the selected format. Let's either drop it or change it to kvm_debug and
>include a VM ID.
Agreed, I will switch it to kvm_debug() and include a VM ID.
>
>Thanks,
>drew
>
>> +#endif
>> + return 0;
>> default:
>> return -EINVAL;
>> }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index dddb781b0507..00c02a880518 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
>> #define KVM_CAP_GUEST_MEMFD_FLAGS 244
>> #define KVM_CAP_ARM_SEA_TO_USER 245
>> #define KVM_CAP_S390_USER_OPEREXEC 246
>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>>
>> struct kvm_irq_routing_irqchip {
>> __u32 irqchip;
>> --
>> 2.50.1
>
>
On Mon, Jan 05, 2026 at 10:32:32PM +0800, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> This capability allows userspace to explicitly select the HGATP mode
> for the VM. The selected mode must be less than or equal to the max
> HGATP mode supported by the hardware. This capability must be enabled
> before creating any vCPUs, and can only be set once per VM.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
> Documentation/virt/kvm/api.rst | 14 ++++++++++++++
> arch/riscv/kvm/vm.c | 26 ++++++++++++++++++++++++--
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 01a3abef8abb..9e17788e3a9d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8765,6 +8765,20 @@ helpful if user space wants to emulate instructions which are not
> This capability can be enabled dynamically even if VCPUs were already
> created and are running.
>
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns: 0 on success, -EINVAL if arg[0] is outside the range of hgatp
> + modes supported by the hardware.
> +
> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be less than or equal to the maximum HGATP
> +mode supported by the hardware. This capability must be enabled before
> +creating any vCPUs, and can only be set once per VM.
I think I would prefer a KVM_CAP_RISCV_SET_MAX_GPA type of capability. The
reason is that, while one of the results of setting the max-gpa will be
to set hgatp, there may be other reasons to track the guest's maximum
physical address too, and kvm userspace shouldn't need to think about each
individually.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 4b2156df40fc..e9275023a73a 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -202,6 +202,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_VM_GPA_BITS:
> r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
> break;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> + r = 1;
> +#else/* CONFIG_32BIT */
> + r = 0;
> +#endif
r = IS_ENABLED(CONFIG_64BIT) ? 1 : 0;
> + break;
> default:
> r = 0;
> break;
> @@ -212,12 +219,27 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> {
> + if (cap->flags)
> + return -EINVAL;
Add a blank line here.
> switch (cap->cap) {
> case KVM_CAP_RISCV_MP_STATE_RESET:
> - if (cap->flags)
> - return -EINVAL;
> kvm->arch.mp_state_reset = true;
> return 0;
> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
> + cap->args[0] > kvm_riscv_gstage_max_mode)
> + return -EINVAL;
> + if (kvm->arch.gstage_mode_initialized)
> + return 0;
I think we want to return -EBUSY here and it should be documented where it
already states "...can only be set once per VM"
> + kvm->arch.gstage_mode_initialized = true;
In the previous patch I thought we were missing this, but I see now it
means "user initialized". Let's rename it as such,
gstage_mode_user_initialized
> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
> + kvm_info("using SV%lluX4 G-stage page table format\n",
> + 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
> +#endif
> + return 0;
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dddb781b0507..00c02a880518 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_GUEST_MEMFD_FLAGS 244
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.50.1
>
Thanks,
drew
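
To make the set-once behaviour requested above concrete, here is a small standalone model in C. It is a sketch under assumptions, not the v2 code: struct gstage_model and model_user_set_mode() are hypothetical names, only the flag name gstage_mode_user_initialized and the -EBUSY return come from the review comments, and range validation is left out to keep the focus on the second call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
#define HGATP_MODE_SV39X4	8UL
#define HGATP_MODE_SV48X4	9UL
#define HGATP_MODE_SV57X4	10UL

/* Hypothetical slice of the per-VM arch state. */
struct gstage_model {
	bool gstage_mode_user_initialized;	/* set once userspace picked a mode */
	unsigned long gstage_mode;
};

/* First call wins; any later call reports -EBUSY and changes nothing. */
static int model_user_set_mode(struct gstage_model *g, unsigned long mode)
{
	assert(g != NULL);

	if (g->gstage_mode_user_initialized)
		return -EBUSY;

	g->gstage_mode_user_initialized = true;
	g->gstage_mode = mode;
	return 0;
}
```

Returning an error instead of 0 lets userspace tell a mode that was applied apart from a request that was silently ignored.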
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> This capability allows userspace to explicitly select the HGATP mode
>> for the VM. The selected mode must be less than or equal to the max
>> HGATP mode supported by the hardware. This capability must be enabled
>> before creating any vCPUs, and can only be set once per VM.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>> Documentation/virt/kvm/api.rst | 14 ++++++++++++++
>> arch/riscv/kvm/vm.c | 26 ++++++++++++++++++++++++--
>> include/uapi/linux/kvm.h | 1 +
>> 3 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 01a3abef8abb..9e17788e3a9d 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8765,6 +8765,20 @@ helpful if user space wants to emulate instructions which are not
>> This capability can be enabled dynamically even if VCPUs were already
>> created and are running.
>>
>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> +---------------------------------
>> +
>> +:Architectures: riscv
>> +:Type: VM
>> +:Parameters: args[0] contains the requested HGATP mode
>> +:Returns: 0 on success, -EINVAL if arg[0] is outside the range of hgatp
>> + modes supported by the hardware.
>> +
>> +This capability allows userspace to explicitly select the HGATP mode for
>> +the VM. The selected mode must be less than or equal to the maximum HGATP
>> +mode supported by the hardware. This capability must be enabled before
>> +creating any vCPUs, and can only be set once per VM.
>
>I think I would prefer a KVM_CAP_RISCV_SET_MAX_GPA type of capability. The
>reason is because, while one of the results of the max-gpa being set will
>be to set hgatp, there may be other reasons to track the guest's maximum
>physical address too and kvm userspace shouldn't need to think about each
>individually.
That makes sense, thanks.
>> +
>> 8. Other capabilities.
>> ======================
>>
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> index 4b2156df40fc..e9275023a73a 100644
>> --- a/arch/riscv/kvm/vm.c
>> +++ b/arch/riscv/kvm/vm.c
>> @@ -202,6 +202,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_VM_GPA_BITS:
>> r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>> break;
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +#ifdef CONFIG_64BIT
>> + r = 1;
>> +#else/* CONFIG_32BIT */
>> + r = 0;
>> +#endif
>
> r = IS_ENABLED(CONFIG_64BIT) ? 1 : 0;
Ack.
>> + break;
>> default:
>> r = 0;
>> break;
>> @@ -212,12 +219,27 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>
>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> {
>> + if (cap->flags)
>> + return -EINVAL;
>
>add blank line
Ack, will add a blank line after the flags check.
>
>> switch (cap->cap) {
>> case KVM_CAP_RISCV_MP_STATE_RESET:
>> - if (cap->flags)
>> - return -EINVAL;
>> kvm->arch.mp_state_reset = true;
>> return 0;
>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +#ifdef CONFIG_64BIT
>> + if (cap->args[0] < HGATP_MODE_SV39X4 ||
>> + cap->args[0] > kvm_riscv_gstage_max_mode)
>> + return -EINVAL;
>> + if (kvm->arch.gstage_mode_initialized)
>> + return 0;
>
>I think we want to return -EBUSY here and it should be documented where it
>already states "...can only be set once per VM"
Agreed.
>> + kvm->arch.gstage_mode_initialized = true;
>
>In the previous patch I thought we were missing this, but I see now it
>means "user initialized". Let's rename it as such,
>
> gstage_mode_user_initialized
Agreed.
>> + kvm->arch.kvm_riscv_gstage_mode = cap->args[0];
>> + kvm->arch.kvm_riscv_gstage_pgd_levels = 3 +
>> + kvm->arch.kvm_riscv_gstage_mode - HGATP_MODE_SV39X4;
>> + kvm_info("using SV%lluX4 G-stage page table format\n",
>> + 39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
>> +#endif
>> + return 0;
>> default:
>> return -EINVAL;
>> }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index dddb781b0507..00c02a880518 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -974,6 +974,7 @@ struct kvm_enable_cap {
>> #define KVM_CAP_GUEST_MEMFD_FLAGS 244
>> #define KVM_CAP_ARM_SEA_TO_USER 245
>> #define KVM_CAP_S390_USER_OPEREXEC 246
>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 247
>>
>> struct kvm_irq_routing_irqchip {
>> __u32 irqchip;
>> --
>> 2.50.1
>>
>
>Thanks,
>drew
>
Thanks,
Fangyu
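
For readers following the KVM_CAP_RISCV_SET_MAX_GPA suggestion, one possible way a requested guest physical address width could be turned into an hgatp mode is sketched below. Everything here is an assumption: the thread does not define the semantics of such a capability, and mode_for_gpa_bits() is a hypothetical helper. The only facts relied on are the hgatp.MODE encodings and that an SvNNx4 mode covers NN + 2 guest physical address bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
#define HGATP_MODE_SV39X4	8UL
#define HGATP_MODE_SV48X4	9UL
#define HGATP_MODE_SV57X4	10UL

/*
 * Pick the smallest mode whose guest physical address space holds
 * gpa_bits, without exceeding max_mode (the hardware limit).
 * Returns 0 and stores the mode, or -EINVAL if nothing fits.
 */
static int mode_for_gpa_bits(unsigned int gpa_bits, unsigned long max_mode,
			     unsigned long *mode_out)
{
	unsigned long mode;

	assert(mode_out != NULL);

	for (mode = HGATP_MODE_SV39X4; mode <= max_mode; mode++) {
		/* SvNNx4 widens the NN-bit space by 2 bits: 41, 50, 59. */
		unsigned long covered = 39 + (mode - HGATP_MODE_SV39X4) * 9 + 2;

		if (gpa_bits <= covered) {
			*mode_out = mode;
			return 0;
		}
	}

	return -EINVAL;
}
```

Under these assumptions userspace would state only the address width it needs, and the choice of hgatp mode would stay inside KVM.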