[PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE

fangyu.yu@linux.alibaba.com posted 2 patches 2 weeks ago
There is a newer version of this series
Posted by fangyu.yu@linux.alibaba.com 2 weeks ago
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

This capability allows userspace to explicitly select the HGATP mode
for the VM. The selected mode must be less than or equal to the max
HGATP mode supported by the hardware. This capability must be enabled
before creating any vCPUs, and can only be set once per VM.

Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
 Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
 arch/riscv/kvm/vm.c            | 26 ++++++++++++++++++++++++--
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..9d0794b174c7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8765,6 +8765,24 @@ helpful if user space wants to emulate instructions which are not
 This capability can be enabled dynamically even if VCPUs were already
 created and are running.
 
+7.47 KVM_CAP_RISCV_SET_HGATP_MODE
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: args[0] contains the requested HGATP mode
+:Returns:
+  - 0 on success.
+  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
+    hardware.
+  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
+    non-empty memslots, or if the capability has already been set for the VM.
+
+This capability allows userspace to explicitly select the HGATP mode for
+the VM. The selected mode must be less than or equal to the maximum HGATP
+mode supported by the hardware. This capability must be enabled before
+creating any vCPUs, and can only be set once per VM.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 4b2156df40fc..7bc9b193dcaa 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_GPA_BITS:
 		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
 		break;
+	case KVM_CAP_RISCV_SET_HGATP_MODE:
+		r = IS_ENABLED(CONFIG_64BIT) ? 1 : 0;
+		break;
 	default:
 		r = 0;
 		break;
@@ -212,12 +215,31 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 {
+	if (cap->flags)
+		return -EINVAL;
+
 	switch (cap->cap) {
 	case KVM_CAP_RISCV_MP_STATE_RESET:
-		if (cap->flags)
-			return -EINVAL;
 		kvm->arch.mp_state_reset = true;
 		return 0;
+	case KVM_CAP_RISCV_SET_HGATP_MODE:
+#ifdef CONFIG_64BIT
+		if (cap->args[0] < HGATP_MODE_SV39X4 ||
+		    cap->args[0] > kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels))
+			return -EINVAL;
+
+		if (kvm->arch.gstage_mode_user_initialized || kvm->created_vcpus ||
+		    !kvm_are_all_memslots_empty(kvm))
+			return -EBUSY;
+
+		kvm->arch.gstage_mode_user_initialized = true;
+		kvm->arch.kvm_riscv_gstage_pgd_levels =
+				3 + cap->args[0] - HGATP_MODE_SV39X4;
+		kvm_debug("VM (vmid:%lu) using SV%lluX4 G-stage page table format\n",
+			  kvm->arch.vmid.vmid,
+			  39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
+#endif
+		return 0;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..00c02a880518 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -974,6 +974,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
 #define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_RISCV_SET_HGATP_MODE 247
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.50.1
Re: [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Posted by Radim Krčmář 1 week, 3 days ago
2026-01-25T23:04:50+08:00, <fangyu.yu@linux.alibaba.com>:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> This capability allows userspace to explicitly select the HGATP mode
> for the VM. The selected mode must be less than or equal to the max
> HGATP mode supported by the hardware. This capability must be enabled
> before creating any vCPUs, and can only be set once per VM.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
>  Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
>  arch/riscv/kvm/vm.c            | 26 ++++++++++++++++++++++++--
>  include/uapi/linux/kvm.h       |  1 +
>  3 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> @@ -8765,6 +8765,24 @@ helpful if user space wants to emulate instructions which are not
> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> +---------------------------------
> +
> +:Architectures: riscv
> +:Type: VM
> +:Parameters: args[0] contains the requested HGATP mode
> +:Returns:
> +  - 0 on success.
> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> +    hardware.
> +  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
> +    non-empty memslots, or if the capability has already been set for the VM.
> +
> +This capability allows userspace to explicitly select the HGATP mode for
> +the VM. The selected mode must be less than or equal to the maximum HGATP
> +mode supported by the hardware.

"The selected mode must be supported by both KVM and hardware."

(The comparison is a technical detail, and incorrect too since the value
 is bounded from the bottom as well.)

>                                  This capability must be enabled before
> +creating any vCPUs, and can only be set once per VM.

                     ^ "or memslots"

> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VM_GPA_BITS:
>  		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>  		break;
> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
> +		r = IS_ENABLED(CONFIG_64BIT) ? 1 : 0;

Maybe we can return the currently selected mode for a bit of extra info?
Another nice option would be to return a bitmask of all supported modes.
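
To illustrate the bitmask idea, here is a minimal sketch, not a proposed
implementation. The helper name and the encoding (bit n set means hgatp
MODE value n is usable) are assumptions, as is the premise that every mode
between Sv39x4 and the detected maximum is supported, which is the same
premise the range check in this patch relies on.

```c
#include <stdint.h>

/* hgatp MODE encoding for Sv39x4 (8 in the RISC-V privileged spec). */
#define HGATP_MODE_SV39X4 8

/*
 * Hypothetical helper: build a mask in which bit n is set when hgatp
 * MODE value n can be selected. Assumes contiguous support from
 * Sv39x4 up to max_mode.
 */
static inline uint32_t hgatp_supported_mode_mask(unsigned int max_mode)
{
	uint32_t mask = 0;
	unsigned int mode;

	/* MODE is a 4-bit field on RV64, so clamp to 15. */
	if (max_mode > 15)
		max_mode = 15;
	for (mode = HGATP_MODE_SV39X4; mode <= max_mode; mode++)
		mask |= UINT32_C(1) << mode;
	return mask;
}
```

With a maximum of Sv57x4 (MODE 10) this yields bits 8, 9 and 10; a
maximum below Sv39x4 yields an empty mask.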

I think userspace otherwise has no reason to call it, since it's fine to
just try to enable the capability and treat -EINVAL as "don't care".
1 syscall instead of 2.
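
A rough userspace sketch of that single-call flow, under stated
assumptions: the capability number 247 is the value proposed in this
version of the patch and may change before merging, and the function name
and its return convention are made up for illustration. Only
KVM_ENABLE_CAP and struct kvm_enable_cap come from the existing uapi.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Value proposed by this patch; not in released uapi headers. */
#ifndef KVM_CAP_RISCV_SET_HGATP_MODE
#define KVM_CAP_RISCV_SET_HGATP_MODE 247
#endif

/*
 * Hypothetical helper. Returns 0 if the mode was set, 1 if KVM
 * answered EINVAL (unknown capability or unsupported mode, treated
 * as "keep the default"), or a negative errno for any other failure.
 */
static inline int try_set_hgatp_mode(int vm_fd, uint64_t mode)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));	/* flags must be zero */
	cap.cap = KVM_CAP_RISCV_SET_HGATP_MODE;
	cap.args[0] = mode;

	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) == 0)
		return 0;
	return errno == EINVAL ? 1 : -errno;
}
```

The caller would invoke this on the VM file descriptor before creating
vCPUs or memslots; -EBUSY comes back as a negative value so that ordering
mistakes are not silently ignored.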

> @@ -212,12 +215,31 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  
>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>  {
> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
> +#ifdef CONFIG_64BIT
> +		if (cap->args[0] < HGATP_MODE_SV39X4 ||
> +		    cap->args[0] > kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels))
> +			return -EINVAL;
> +
> +		if (kvm->arch.gstage_mode_user_initialized || kvm->created_vcpus ||
> +		    !kvm_are_all_memslots_empty(kvm))
> +			return -EBUSY;
> +
> +		kvm->arch.gstage_mode_user_initialized = true;

No need to have gstage_mode_user_initialized, since if the user could
have changed it once, there shouldn't be an issue in changing it again.
It's the other protections that must work.

> +		kvm->arch.kvm_riscv_gstage_pgd_levels =
> +				3 + cap->args[0] - HGATP_MODE_SV39X4;
> +		kvm_debug("VM (vmid:%lu) using SV%lluX4 G-stage page table format\n",
> +			  kvm->arch.vmid.vmid,
> +			  39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);

(I don't think this debug message is going to be useful after a short
 debugging period, and it would clog the log on each VM launch, so I'd
 rather get rid of it.)
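
For reference, the arithmetic in the quoted hunk can be checked in
isolation. This is only a standalone restatement of the two expressions
above; the function names are invented, and the MODE encodings 8, 9 and 10
are the values from the RISC-V privileged specification. Both helpers
assume the mode has already been range-checked.

```c
#include <stdint.h>

/* hgatp MODE encodings from the RISC-V privileged spec. */
#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV48X4 9
#define HGATP_MODE_SV57X4 10

/* G-stage page-table levels: 3 for Sv39x4, plus one per mode step. */
static inline unsigned int hgatp_mode_to_pgd_levels(uint64_t mode)
{
	return 3 + (unsigned int)(mode - HGATP_MODE_SV39X4);
}

/* The NN in "SvNNx4": each additional level translates 9 more bits. */
static inline unsigned int hgatp_mode_to_sv_bits(uint64_t mode)
{
	return 39 + (unsigned int)(mode - HGATP_MODE_SV39X4) * 9;
}
```

So Sv39x4 maps to 3 levels, Sv48x4 to 4 and Sv57x4 to 5.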

Thanks.
Re: Re: [PATCH v3 2/2] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
Posted by fangyu.yu@linux.alibaba.com 1 week, 2 days ago
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> This capability allows userspace to explicitly select the HGATP mode
>> for the VM. The selected mode must be less than or equal to the max
>> HGATP mode supported by the hardware. This capability must be enabled
>> before creating any vCPUs, and can only be set once per VM.
>>
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>>  Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
>>  arch/riscv/kvm/vm.c            | 26 ++++++++++++++++++++++++--
>>  include/uapi/linux/kvm.h       |  1 +
>>  3 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> @@ -8765,6 +8765,24 @@ helpful if user space wants to emulate instructions which are not
>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> +---------------------------------
>> +
>> +:Architectures: riscv
>> +:Type: VM
>> +:Parameters: args[0] contains the requested HGATP mode
>> +:Returns:
>> +  - 0 on success.
>> +  - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> +    hardware.
>> +  - -EBUSY if vCPUs have already been created for the VM, if the VM has any
>> +    non-empty memslots, or if the capability has already been set for the VM.
>> +
>> +This capability allows userspace to explicitly select the HGATP mode for
>> +the VM. The selected mode must be less than or equal to the maximum HGATP
>> +mode supported by the hardware.
>
>"The selected mode must be supported by both KVM and hardware."

This description is clear. Agreed.

>(The comparison is a technical detail, and incorrect too since the value
> is bounded from the bottom as well.)
>
>>                                  This capability must be enabled before
>> +creating any vCPUs, and can only be set once per VM.
>
>                     ^ "or memslots"

Right, thanks for catching that.

>
>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> @@ -202,6 +202,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  	case KVM_CAP_VM_GPA_BITS:
>>  		r = kvm_riscv_gstage_gpa_bits(&kvm->arch);
>>  		break;
>> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +		r = IS_ENABLED(CONFIG_64BIT) ? 1 : 0;
>
>Maybe we can return the currently selected mode for a bit of extra info?
>Another nice option would be to return a bitmask of all supported modes.
>
>I think userspace has otherwise no reason to call it, since it's fine to
>just try enable and handle the -EINVAL as "don't care".
>1 syscall instead of 2.

I’d prefer to keep an explicit KVM_CHECK_EXTENSION implementation for
KVM_CAP_RISCV_SET_HGATP_MODE. Userspace commonly uses CHECK_EXTENSION for
capability discovery, and returning 0 would make it assume the capability is
unsupported even though KVM_ENABLE_CAP works.

Returning 1/0 should be sufficient here, as the actual mode support is
validated by KVM_ENABLE_CAP itself (with -EINVAL for unsupported modes).

>> @@ -212,12 +215,31 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  
>>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>  {
>> +	case KVM_CAP_RISCV_SET_HGATP_MODE:
>> +#ifdef CONFIG_64BIT
>> +		if (cap->args[0] < HGATP_MODE_SV39X4 ||
>> +		    cap->args[0] > kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels))
>> +			return -EINVAL;
>> +
>> +		if (kvm->arch.gstage_mode_user_initialized || kvm->created_vcpus ||
>> +		    !kvm_are_all_memslots_empty(kvm))
>> +			return -EBUSY;
>> +
>> +		kvm->arch.gstage_mode_user_initialized = true;
>
>No need to have gstage_mode_user_initialized, since if the user could
>have changed it once, there shouldn't be an issue in changing it again.
>It's the other protections that must work.

Agreed, I'll drop gstage_mode_user_initialized. Userspace can then change
the mode multiple times before the VM is committed, and updates will be
gated by the existing protections: once vCPUs have been created or any
memslot is non-empty, the mode change will be rejected.
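
As a rough model of the gating without the flag (not the actual next
revision), the checks could look like the sketch below. struct vm_model
and model_set_hgatp_mode() are hypothetical userspace stand-ins for the
kernel state and handler; only the check order and the error codes follow
the v3 hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define HGATP_MODE_SV39X4 8

/* Hypothetical stand-in for the VM state the handler inspects. */
struct vm_model {
	unsigned int created_vcpus;
	bool memslots_empty;
	uint64_t max_mode;	/* highest mode the host supports */
	unsigned int pgd_levels;
};

static inline int model_set_hgatp_mode(struct vm_model *vm, uint64_t mode)
{
	/* Range check first, as in the v3 hunk. */
	if (mode < HGATP_MODE_SV39X4 || mode > vm->max_mode)
		return -EINVAL;

	/* No "already set" flag: only vCPUs and memslots gate changes. */
	if (vm->created_vcpus || !vm->memslots_empty)
		return -EBUSY;

	vm->pgd_levels = 3 + (unsigned int)(mode - HGATP_MODE_SV39X4);
	return 0;
}
```

In this model a second call with a different mode succeeds as long as no
vCPU exists and all memslots are empty, and fails with -EBUSY afterwards.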

>> +		kvm->arch.kvm_riscv_gstage_pgd_levels =
>> +				3 + cap->args[0] - HGATP_MODE_SV39X4;
>> +		kvm_debug("VM (vmid:%lu) using SV%lluX4 G-stage page table format\n",
>> +			  kvm->arch.vmid.vmid,
>> +			  39 + (cap->args[0] - HGATP_MODE_SV39X4) * 9);
>
>(I don't think this debug message is going to be useful after a short
> debugging period, and it would clog the log on each VM launch, so I'd
> rather get rid of it.)

I'll drop this kvm_debug() from the capability path to avoid spamming the
log on each VM creation.

>Thanks.

Thanks,
Fangyu