Documentation/virt/kvm/x86/intel-tdx.rst | 16 +++++++++++++++ arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 34 ++++++++++++++++++++++++-------- 3 files changed, 43 insertions(+), 8 deletions(-)
Hi Changes in V4: Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. Use KVM_BUG_ON() instead of WARN_ON(). Correct kvm_trylock_all_vcpus() return value. Changes in V3: Refer: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would trigger on the error path from __tdx_td_init() Put cpus_read_lock() handling back into tdx_mmu_release_hkid() Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let tdx_vm_ioctl() deal with kvm->lock The version 1 RFC: https://lore.kernel.org/all/20250313181629.17764-1-adrian.hunter@intel.com/ listed 3 options and implemented option 2. Sean replied with code for option 1, which tested out OK, so here it is plus a commit log. It depends upon kvm_trylock_all_vcpus(kvm) which is now upstream: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4a454ced74c0ac97c8bd32f086ee3ad74528780 Sean Christopherson (1): KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Documentation/virt/kvm/x86/intel-tdx.rst | 16 +++++++++++++++ arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 34 ++++++++++++++++++++++++-------- 3 files changed, 43 insertions(+), 8 deletions(-) Regards Adrian
On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: > Changes in V4: > > Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. > Use KVM_BUG_ON() instead of WARN_ON(). > Correct kvm_trylock_all_vcpus() return value. > > Changes in V3: > Refer: > https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com > > [...] Applied to kvm-x86 vmx, thanks! [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM https://github.com/kvm-x86/linux/commit/111a7311a016 -- https://github.com/kvm-x86/kvm-unit-tests/tree/next
On Wed, Jun 25, 2025, Sean Christopherson wrote: > On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: > > Changes in V4: > > > > Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. > > Use KVM_BUG_ON() instead of WARN_ON(). > > Correct kvm_trylock_all_vcpus() return value. > > > > Changes in V3: > > Refer: > > https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com > > > > [...] > > Applied to kvm-x86 vmx, thanks! > > [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM > https://github.com/kvm-x86/linux/commit/111a7311a016 Fixed up to address a docs goof[*], new hash: https://github.com/kvm-x86/linux/commit/e4775f57ad51 [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au
On 6/26/2025 11:58 PM, Sean Christopherson wrote: > On Wed, Jun 25, 2025, Sean Christopherson wrote: >> On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: >>> Changes in V4: >>> >>> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. >>> Use KVM_BUG_ON() instead of WARN_ON(). >>> Correct kvm_trylock_all_vcpus() return value. >>> >>> Changes in V3: >>> Refer: >>> https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com >>> >>> [...] >> >> Applied to kvm-x86 vmx, thanks! >> >> [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM >> https://github.com/kvm-x86/linux/commit/111a7311a016 > > Fixed up to address a docs goof[*], new hash: > > https://github.com/kvm-x86/linux/commit/e4775f57ad51 > > [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au Hi Sean, I think it's targeted for v6.17, right? If so, do we need the enumeration for the new TDX ioctl? Yes, the userspace could always try and ignore the failure. But since the ship has not sailed, I would like to report it and hear your opinion.
On Fri, Jul 11, 2025, Xiaoyao Li wrote: > On 6/26/2025 11:58 PM, Sean Christopherson wrote: > > On Wed, Jun 25, 2025, Sean Christopherson wrote: > > > On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: > > > > Changes in V4: > > > > > > > > Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. > > > > Use KVM_BUG_ON() instead of WARN_ON(). > > > > Correct kvm_trylock_all_vcpus() return value. > > > > > > > > Changes in V3: > > > > Refer: > > > > https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com > > > > > > > > [...] > > > > > > Applied to kvm-x86 vmx, thanks! > > > > > > [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM > > > https://github.com/kvm-x86/linux/commit/111a7311a016 > > > > Fixed up to address a docs goof[*], new hash: > > > > https://github.com/kvm-x86/linux/commit/e4775f57ad51 > > > > [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au > > Hi Sean, > > I think it's targeted for v6.17, right? > > If so, do we need the enumeration for the new TDX ioctl? Yes, the userspace > could always try and ignore the failure. But since the ship has not sailed, > I would like to report it and hear your opinion. Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it should be easy enough to tack on a capability. This? diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index f0d961436d0f..dcb879897cab 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -9147,6 +9147,13 @@ KVM exits with the register state of either the L1 or L2 guest depending on which executed at the time of an exit. Userspace must take care to differentiate between these cases. +8.46 KVM_CAP_TDX_TERMINATE_VM +----------------------------- + +:Architectures: x86 + +This capability indicates that KVM supports the KVM_TDX_TERMINATE_VM sub-ioctl. + 9. Known KVM API problems ========================= diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b58a74c1722d..e437a50429d3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4823,6 +4823,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_READONLY_MEM: r = kvm ? kvm_arch_has_readonly_mem(kvm) : 1; break; + case KVM_CAP_TDX_TERMINATE_VM: + r = !!(kvm_caps.supported_vm_types & BIT(KVM_X86_TDX_VM)); + break; default: break; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 7a4c35ff03fe..54293df4a342 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -960,6 +960,7 @@ struct kvm_enable_cap { #define KVM_CAP_ARM_EL2 240 #define KVM_CAP_ARM_EL2_E2H0 241 #define KVM_CAP_RISCV_MP_STATE_RESET 242 +#define KVM_CAP_TDX_TERMINATE_VM 243 struct kvm_irq_routing_irqchip { __u32 irqchip;
On 7/11/2025 9:05 PM, Sean Christopherson wrote: > On Fri, Jul 11, 2025, Xiaoyao Li wrote: >> On 6/26/2025 11:58 PM, Sean Christopherson wrote: >>> On Wed, Jun 25, 2025, Sean Christopherson wrote: >>>> On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: >>>>> Changes in V4: >>>>> >>>>> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. >>>>> Use KVM_BUG_ON() instead of WARN_ON(). >>>>> Correct kvm_trylock_all_vcpus() return value. >>>>> >>>>> Changes in V3: >>>>> Refer: >>>>> https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com >>>>> >>>>> [...] >>>> >>>> Applied to kvm-x86 vmx, thanks! >>>> >>>> [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM >>>> https://github.com/kvm-x86/linux/commit/111a7311a016 >>> >>> Fixed up to address a docs goof[*], new hash: >>> >>> https://github.com/kvm-x86/linux/commit/e4775f57ad51 >>> >>> [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au >> >> Hi Sean, >> >> I think it's targeted for v6.17, right? >> >> If so, do we need the enumeration for the new TDX ioctl? Yes, the userspace >> could always try and ignore the failure. But since the ship has not sailed, >> I would like to report it and hear your opinion. > > Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it should be > easy enough to tack on a capability. > > This? I'm wondering if we need a TDX centralized enumeration interface, e.g., new field in struct kvm_tdx_capabilities. I believe there will be more and more TDX new features, and assigning each a KVM_CAP seems wasteful. > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index f0d961436d0f..dcb879897cab 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -9147,6 +9147,13 @@ KVM exits with the register state of either the L1 or L2 guest > depending on which executed at the time of an exit. Userspace must > take care to differentiate between these cases. > > +8.46 KVM_CAP_TDX_TERMINATE_VM > +----------------------------- > + > +:Architectures: x86 > + > +This capability indicates that KVM supports the KVM_TDX_TERMINATE_VM sub-ioctl. > + > 9. Known KVM API problems > ========================= > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index b58a74c1722d..e437a50429d3 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4823,6 +4823,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_READONLY_MEM: > r = kvm ? kvm_arch_has_readonly_mem(kvm) : 1; > break; > + case KVM_CAP_TDX_TERMINATE_VM: > + r = !!(kvm_caps.supported_vm_types & BIT(KVM_X86_TDX_VM)); > + break; > default: > break; > } > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 7a4c35ff03fe..54293df4a342 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -960,6 +960,7 @@ struct kvm_enable_cap { > #define KVM_CAP_ARM_EL2 240 > #define KVM_CAP_ARM_EL2_E2H0 241 > #define KVM_CAP_RISCV_MP_STATE_RESET 242 > +#define KVM_CAP_TDX_TERMINATE_VM 243 > > struct kvm_irq_routing_irqchip { > __u32 irqchip;
On Fri, Jul 11, 2025, Xiaoyao Li wrote: > On 7/11/2025 9:05 PM, Sean Christopherson wrote: > > On Fri, Jul 11, 2025, Xiaoyao Li wrote: > > > On 6/26/2025 11:58 PM, Sean Christopherson wrote: > > > > On Wed, Jun 25, 2025, Sean Christopherson wrote: > > > > > On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: > > > > > > Changes in V4: > > > > > > > > > > > > Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. > > > > > > Use KVM_BUG_ON() instead of WARN_ON(). > > > > > > Correct kvm_trylock_all_vcpus() return value. > > > > > > > > > > > > Changes in V3: > > > > > > Refer: > > > > > > https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com > > > > > > > > > > > > [...] > > > > > > > > > > Applied to kvm-x86 vmx, thanks! > > > > > > > > > > [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM > > > > > https://github.com/kvm-x86/linux/commit/111a7311a016 > > > > > > > > Fixed up to address a docs goof[*], new hash: > > > > > > > > https://github.com/kvm-x86/linux/commit/e4775f57ad51 > > > > > > > > [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au > > > > > > Hi Sean, > > > > > > I think it's targeted for v6.17, right? > > > > > > If so, do we need the enumeration for the new TDX ioctl? Yes, the userspace > > > could always try and ignore the failure. But since the ship has not sailed, > > > I would like to report it and hear your opinion. > > > > Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it should be > > easy enough to tack on a capability. > > > > This? > > I'm wondering if we need a TDX centralized enumeration interface, e.g., new > field in struct kvm_tdx_capabilities. I believe there will be more and more > TDX new features, and assigning each a KVM_CAP seems wasteful. Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, LOL, and we certainly have the capacity in the structure: __u64 reserved[250]; Sans documentation, something like so? -- diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 13da87c05098..70ffe6e8d216 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -963,6 +963,8 @@ struct kvm_tdx_cmd { __u64 hw_error; }; +#define KVM_TDX_CAP_TERMINATE_VM _BITULL(0) + struct kvm_tdx_capabilities { __u64 supported_attrs; __u64 supported_xfam; @@ -972,7 +974,9 @@ struct kvm_tdx_capabilities { __u64 kernel_tdvmcallinfo_1_r12; __u64 user_tdvmcallinfo_1_r12; - __u64 reserved[250]; + __u64 supported_caps; + + __u64 reserved[249]; /* Configurable CPUID bits for userspace */ struct kvm_cpuid2 cpuid; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f4d4fd5cc6e8..783b1046f6c1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -189,6 +189,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, if (!caps->supported_xfam) return -EIO; + caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM; + caps->cpuid.nent = td_conf->num_cpuid_config; caps->user_tdvmcallinfo_1_r11 = -- Aha! And if we squeeze in a patch for 6.16. to zero out the reserved array, we can even avoid adding a capability to enumerate the TDX capability functionality. -- diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f4d4fd5cc6e8..9c2997665762 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -181,6 +181,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, { int i; + memset(caps->reserved, 0, sizeof(caps->reserved)); + caps->supported_attrs = tdx_get_supported_attrs(td_conf); if (!caps->supported_attrs) return -EIO; --
On 11.07.25 г. 17:19 ч., Sean Christopherson wrote: > On Fri, Jul 11, 2025, Xiaoyao Li wrote: >> On 7/11/2025 9:05 PM, Sean Christopherson wrote: >>> On Fri, Jul 11, 2025, Xiaoyao Li wrote: >>>> On 6/26/2025 11:58 PM, Sean Christopherson wrote: >>>>> On Wed, Jun 25, 2025, Sean Christopherson wrote: >>>>>> On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: >>>>>>> Changes in V4: >>>>>>> >>>>>>> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. >>>>>>> Use KVM_BUG_ON() instead of WARN_ON(). >>>>>>> Correct kvm_trylock_all_vcpus() return value. >>>>>>> >>>>>>> Changes in V3: >>>>>>> Refer: >>>>>>> https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com >>>>>>> >>>>>>> [...] >>>>>> >>>>>> Applied to kvm-x86 vmx, thanks! >>>>>> >>>>>> [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM >>>>>> https://github.com/kvm-x86/linux/commit/111a7311a016 >>>>> >>>>> Fixed up to address a docs goof[*], new hash: >>>>> >>>>> https://github.com/kvm-x86/linux/commit/e4775f57ad51 >>>>> >>>>> [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au >>>> >>>> Hi Sean, >>>> >>>> I think it's targeted for v6.17, right? >>>> >>>> If so, do we need the enumeration for the new TDX ioctl? Yes, the userspace >>>> could always try and ignore the failure. But since the ship has not sailed, >>>> I would like to report it and hear your opinion. >>> >>> Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it should be >>> easy enough to tack on a capability. >>> >>> This? >> >> I'm wondering if we need a TDX centralized enumeration interface, e.g., new >> field in struct kvm_tdx_capabilities. I believe there will be more and more >> TDX new features, and assigning each a KVM_CAP seems wasteful. > > Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, > > LOL, and we certainly have the capacity in the structure: > > __u64 reserved[250]; > > Sans documentation, something like so? > > -- > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h > index 13da87c05098..70ffe6e8d216 100644 > --- a/arch/x86/include/uapi/asm/kvm.h > +++ b/arch/x86/include/uapi/asm/kvm.h > @@ -963,6 +963,8 @@ struct kvm_tdx_cmd { > __u64 hw_error; > }; > > +#define KVM_TDX_CAP_TERMINATE_VM _BITULL(0) > + > struct kvm_tdx_capabilities { > __u64 supported_attrs; > __u64 supported_xfam; > @@ -972,7 +974,9 @@ struct kvm_tdx_capabilities { > __u64 kernel_tdvmcallinfo_1_r12; > __u64 user_tdvmcallinfo_1_r12; > > - __u64 reserved[250]; > + __u64 supported_caps; > + > + __u64 reserved[249]; > > /* Configurable CPUID bits for userspace */ > struct kvm_cpuid2 cpuid; > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f4d4fd5cc6e8..783b1046f6c1 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -189,6 +189,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > if (!caps->supported_xfam) > return -EIO; > > + caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM; nit: For the sake of consistency make that a |= so that all subsequent additions to it will be uniform with the first. > + > caps->cpuid.nent = td_conf->num_cpuid_config; > > caps->user_tdvmcallinfo_1_r11 = > -- > > > Aha! And if we squeeze in a patch for 6.16. to zero out the reserved array, we > can even avoid adding a capability to enumerate the TDX capability functionality. > > -- > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f4d4fd5cc6e8..9c2997665762 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -181,6 +181,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > { > int i; > > + memset(caps->reserved, 0, sizeof(caps->reserved)); > + > caps->supported_attrs = tdx_get_supported_attrs(td_conf); > if (!caps->supported_attrs) > return -EIO; > -- >
On Thu, Jul 17, 2025, Nikolay Borisov wrote: > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index f4d4fd5cc6e8..783b1046f6c1 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -189,6 +189,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > > if (!caps->supported_xfam) > > return -EIO; > > + caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM; > > nit: For the sake of consistency make that a |= so that all subsequent > additions to it will be uniform with the first. Objection, speculation, your honor. :-D That assumes that the predominate pattern will be "|=". But if we end up with a collection of capabilities that are unconditionally enumerated by KVM, then I definitely want to express that as: caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM | KVM_TDX_CAP_FANCY_THING_1 | KVM_TDX_CAP_FANCY_THING_2 | KVM_TDX_CAP_FANCY_THING_3; not as caps->supported_caps |= KVM_TDX_CAP_TERMINATE_VM; caps->supported_caps |= KVM_TDX_CAP_FANCY_THING1; caps->supported_caps |= KVM_TDX_CAP_FANCY_THING2; caps->supported_caps |= KVM_TDX_CAP_FANCY_THING3; I find the former to be much easier to read, and it provides some amount of defense-in-depth against uninitialized data. The downside is that if there are conditional capabilities, then we need to ensure that they are added after the unconditional set is initialized. But we absolutely should be able to detect such bugs via selftests. And again, I find that this: caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM | KVM_TDX_CAP_FANCY_THING_1 | KVM_TDX_CAP_FANCY_THING_2 | KVM_TDX_CAP_FANCY_THING_3; if (i_can_has_cheezburger()) caps->supported_caps |= KVM_TDX_CAP_CHEEZBURGER; makes it easy to identify which capabilities are unconditional versus those that are dependent on something else.
On 7/11/2025 10:19 PM, Sean Christopherson wrote: > On Fri, Jul 11, 2025, Xiaoyao Li wrote: ... >> >> I'm wondering if we need a TDX centralized enumeration interface, e.g., new >> field in struct kvm_tdx_capabilities. I believe there will be more and more >> TDX new features, and assigning each a KVM_CAP seems wasteful. > > Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, > > LOL, and we certainly have the capacity in the structure: > > __u64 reserved[250]; > > Sans documentation, something like so? I suppose it will be squashed into the original patch, so just gave Tested-by: Xiaoyao Li <xiaoyao.li@intel.com> > -- > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h > index 13da87c05098..70ffe6e8d216 100644 > --- a/arch/x86/include/uapi/asm/kvm.h > +++ b/arch/x86/include/uapi/asm/kvm.h > @@ -963,6 +963,8 @@ struct kvm_tdx_cmd { > __u64 hw_error; > }; > > +#define KVM_TDX_CAP_TERMINATE_VM _BITULL(0) > + > struct kvm_tdx_capabilities { > __u64 supported_attrs; > __u64 supported_xfam; > @@ -972,7 +974,9 @@ struct kvm_tdx_capabilities { > __u64 kernel_tdvmcallinfo_1_r12; > __u64 user_tdvmcallinfo_1_r12; > > - __u64 reserved[250]; > + __u64 supported_caps; > + > + __u64 reserved[249]; > > /* Configurable CPUID bits for userspace */ > struct kvm_cpuid2 cpuid; > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f4d4fd5cc6e8..783b1046f6c1 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -189,6 +189,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > if (!caps->supported_xfam) > return -EIO; > > + caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM; > + > caps->cpuid.nent = td_conf->num_cpuid_config; > > caps->user_tdvmcallinfo_1_r11 =
On Wed, Jul 16, 2025, Xiaoyao Li wrote: > On 7/11/2025 10:19 PM, Sean Christopherson wrote: > > On Fri, Jul 11, 2025, Xiaoyao Li wrote: > ... > > > > > > I'm wondering if we need a TDX centralized enumeration interface, e.g., new > > > field in struct kvm_tdx_capabilities. I believe there will be more and more > > > TDX new features, and assigning each a KVM_CAP seems wasteful. > > > > Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, > > > > LOL, and we certainly have the capacity in the structure: > > > > __u64 reserved[250]; > > > > Sans documentation, something like so? > > I suppose it will be squashed into the original patch, I dropped the commit from kvm-x86/vmx, and will send a full v5. There's enough moving parts that I don't want to risk going 'round in circles trying to squash fixes :-) > so just gave > > Tested-by: Xiaoyao Li <xiaoyao.li@intel.com> Thanks!
On Fri, 2025-07-11 at 07:19 -0700, Sean Christopherson wrote: > -- > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f4d4fd5cc6e8..9c2997665762 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -181,6 +181,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > { > int i; > > + memset(caps->reserved, 0, sizeof(caps->reserved)); > + > caps->supported_attrs = tdx_get_supported_attrs(td_conf); > if (!caps->supported_attrs) > return -EIO; > -- I started to try to help by chipping in a log for this, but I couldn't justify it very well. struct kvm_tdx_capabilities gets copied from userspace before being populated So a userspace that knows to look for something in the reserved area could know to zero it. If they left their own data in the reserved area, and then relied on that data to remain the same, and then we started setting a new field in it I guess it could disturb it. But that is strange, and I'm not sure it really reduces much risk. Anyway here is the attempt to justify it. KVM: TDX: Zero reserved reserved area in struct kvm_tdx_capabilities Zero the reserved area in struct kvm_tdx_capabilities so that fields added in the reserved area won't disturb any userspace that previously had garbage there. struct kvm_tdx_capabilities holds information about the combined support of KVM and the TDX module. For future growth, there is an area of the struct marked as reserved. This way fields can be added into that space without increasing the size of the struct. However, currently the reserved area is not zeroed, meaning any data that userspace left in the reserved area would be clobbered by a future field written in the reserved area. So zero the reserved area to reduce the risk that userspace might try to rely on some data there.
On Fri, Jul 11, 2025, Rick P Edgecombe wrote: > On Fri, 2025-07-11 at 07:19 -0700, Sean Christopherson wrote: > > -- > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index f4d4fd5cc6e8..9c2997665762 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -181,6 +181,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, > > { > > int i; > > > > + memset(caps->reserved, 0, sizeof(caps->reserved)); > > + > > caps->supported_attrs = tdx_get_supported_attrs(td_conf); > > if (!caps->supported_attrs) > > return -EIO; > > -- > > I started to try to help by chipping in a log for this, but I couldn't justify > it very well. struct kvm_tdx_capabilities gets copied from userspace before > being populated So a userspace that knows to look for something in the reserved > area could know to zero it. If they left their own data in the reserved area, > and then relied on that data to remain the same, and then we started setting a > new field in it I guess it could disturb it. But that is strange, and I'm not > sure it really reduces much risk. Anyway here is the attempt to justify it. > > > KVM: TDX: Zero reserved reserved area in struct kvm_tdx_capabilities > > Zero the reserved area in struct kvm_tdx_capabilities so that fields added in > the reserved area won't disturb any userspace that previously had garbage there. It's not only about disturbing userspace, it's also about actually being able to repurpose the reserved fields in the future without needing *another* flag to tell userspace that it's ok to read the previously-reserved fields. I care about this much more than I care about userspace using reserved fields as scratch space. > struct kvm_tdx_capabilities holds information about the combined support of KVM > and the TDX module. For future growth, there is an area of the struct marked as > reserved. This way fields can be added into that space without increasing the > size of the struct. > > However, currently the reserved area is not zeroed, meaning any data that > userspace left in the reserved area would be clobbered by a future field written > in the reserved area. So zero the reserved area to reduce the risk that > userspace might try to rely on some data there.
On Fri, 2025-07-11 at 16:05 -0700, Sean Christopherson wrote: > > Zero the reserved area in struct kvm_tdx_capabilities so that fields added > > in > > the reserved area won't disturb any userspace that previously had garbage > > there. > > It's not only about disturbing userspace, it's also about actually being able > to repurpose the reserved fields in the future without needing *another* flag > to tell userspace that it's ok to read the previously-reserved fields. I care > about this much more than I care about userspace using reserved fields as > scratch space. If, before calling KVM_TDX_CAPABILITIES, userspace zeros the new field that it knows about, but isn't sure if the kernel does, it's the same no? Did you see that the way KVM_TDX_CAPABILITIES is implemented today is a little weird? It actually copies the whole struct kvm_tdx_capabilities from userspace and then sets some fields (not reserved) and then copies it back. So userspace can zero any fields it wants to know about before calling KVM_TDX_CAPABILITIES. Then it could know the same things as if the kernel zeroed it. I was actually wondering if we want to change the kernel to zero reserved, if it might make more sense to just copy caps->cpuid.nent field from userspace, and then populate the whole thing starting from a zero'd buffer in the kernel.
On 7/12/2025 7:17 AM, Edgecombe, Rick P wrote: > On Fri, 2025-07-11 at 16:05 -0700, Sean Christopherson wrote: >>> Zero the reserved area in struct kvm_tdx_capabilities so that fields added >>> in >>> the reserved area won't disturb any userspace that previously had garbage >>> there. >> >> It's not only about disturbing userspace, it's also about actually being able >> to repurpose the reserved fields in the future without needing *another* flag >> to tell userspace that it's ok to read the previously-reserved fields. I care >> about this much more than I care about userspace using reserved fields as >> scratch space. > > If, before calling KVM_TDX_CAPABILITIES, userspace zeros the new field that it > knows about, but isn't sure if the kernel does, it's the same no? > > Did you see that the way KVM_TDX_CAPABILITIES is implemented today is a little > weird? It actually copies the whole struct kvm_tdx_capabilities from userspace > and then sets some fields (not reserved) and then copies it back. So userspace > can zero any fields it wants to know about before calling KVM_TDX_CAPABILITIES. > Then it could know the same things as if the kernel zeroed it. > > I was actually wondering if we want to change the kernel to zero reserved, if it > might make more sense to just copy caps->cpuid.nent field from userspace, and > then populate the whole thing starting from a zero'd buffer in the kernel. +1 to zero the whole buffer of *caps in the kernel. current code seems to have issue on the caps->kernel_tdvmcallinfo_1_r11/kernel_tdvmcallinfo_1_r12/user_tdvmcallinfo_1_r12, as KVM cannot guarantee zero'ed value are returned to userspace.
On Mon, Jul 14, 2025, Xiaoyao Li wrote: > On 7/12/2025 7:17 AM, Edgecombe, Rick P wrote: > > On Fri, 2025-07-11 at 16:05 -0700, Sean Christopherson wrote: > > > > Zero the reserved area in struct kvm_tdx_capabilities so that fields added > > > > in > > > > the reserved area won't disturb any userspace that previously had garbage > > > > there. > > > > > > It's not only about disturbing userspace, it's also about actually being able > > > to repurpose the reserved fields in the future without needing *another* flag > > > to tell userspace that it's ok to read the previously-reserved fields. I care > > > about this much more than I care about userspace using reserved fields as > > > scratch space. > > > > If, before calling KVM_TDX_CAPABILITIES, userspace zeros the new field that it > > knows about, but isn't sure if the kernel does, it's the same no? Heh, yeah, this crossed my mind about 5 minutes after I logged off :-) > > Did you see that the way KVM_TDX_CAPABILITIES is implemented today is a little > > weird? It actually copies the whole struct kvm_tdx_capabilities from userspace > > and then sets some fields (not reserved) and then copies it back. So userspace > > can zero any fields it wants to know about before calling KVM_TDX_CAPABILITIES. > > Then it could know the same things as if the kernel zeroed it. > > > > I was actually wondering if we want to change the kernel to zero reserved, if it > > might make more sense to just copy caps->cpuid.nent field from userspace, and > > then populate the whole thing starting from a zero'd buffer in the kernel. > > +1 to zero the whole buffer of *caps in the kernel. Ya, I almost suggested that, but assumed there was a reason for copying the entire structure. > current code seems to have issue on the caps->kernel_tdvmcallinfo_1_r11/kernel_tdvmcallinfo_1_r12/user_tdvmcallinfo_1_r12, > as KVM cannot guarantee zero'ed value are returned to userspace. This? (untested) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f4d4fd5cc6e8..42cb328d8a7d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2270,25 +2270,26 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; struct kvm_tdx_capabilities __user *user_caps; struct kvm_tdx_capabilities *caps = NULL; + u32 nr_user_entries; int ret = 0; /* flags is reserved for future use */ if (cmd->flags) return -EINVAL; - caps = kmalloc(sizeof(*caps) + + caps = kzalloc(sizeof(*caps) + sizeof(struct kvm_cpuid_entry2) * td_conf->num_cpuid_config, GFP_KERNEL); if (!caps) return -ENOMEM; user_caps = u64_to_user_ptr(cmd->data); - if (copy_from_user(caps, user_caps, sizeof(*caps))) { + if (get_user(nr_user_entries, &user_caps->cpuid.nent)) { ret = -EFAULT; goto out; } - if (caps->cpuid.nent < td_conf->num_cpuid_config) { + if (nr_user_entries < td_conf->num_cpuid_config) { ret = -E2BIG; goto out; }
On 7/14/2025 9:56 PM, Sean Christopherson wrote: > On Mon, Jul 14, 2025, Xiaoyao Li wrote: >> On 7/12/2025 7:17 AM, Edgecombe, Rick P wrote: >>> On Fri, 2025-07-11 at 16:05 -0700, Sean Christopherson wrote: >>>>> Zero the reserved area in struct kvm_tdx_capabilities so that fields added >>>>> in >>>>> the reserved area won't disturb any userspace that previously had garbage >>>>> there. >>>> >>>> It's not only about disturbing userspace, it's also about actually being able >>>> to repurpose the reserved fields in the future without needing *another* flag >>>> to tell userspace that it's ok to read the previously-reserved fields. I care >>>> about this much more than I care about userspace using reserved fields as >>>> scratch space. >>> >>> If, before calling KVM_TDX_CAPABILITIES, userspace zeros the new field that it >>> knows about, but isn't sure if the kernel does, it's the same no? > > Heh, yeah, this crossed my mind about 5 minutes after I logged off :-) > >>> Did you see that the way KVM_TDX_CAPABILITIES is implemented today is a little >>> weird? It actually copies the whole struct kvm_tdx_capabilities from userspace >>> and then sets some fields (not reserved) and then copies it back. So userspace >>> can zero any fields it wants to know about before calling KVM_TDX_CAPABILITIES. >>> Then it could know the same things as if the kernel zeroed it. >>> >>> I was actually wondering if we want to change the kernel to zero reserved, if it >>> might make more sense to just copy caps->cpuid.nent field from userspace, and >>> then populate the whole thing starting from a zero'd buffer in the kernel. >> >> +1 to zero the whole buffer of *caps in the kernel. > > Ya, I almost suggested that, but assumed there was a reason for copying the entire > structure. > >> current code seems to have issue on the caps->kernel_tdvmcallinfo_1_r11/kernel_tdvmcallinfo_1_r12/user_tdvmcallinfo_1_r12, >> as KVM cannot guarantee zero'ed value are returned to userspace. > > This? (untested) Tested-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f4d4fd5cc6e8..42cb328d8a7d 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -2270,25 +2270,26 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) > const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; > struct kvm_tdx_capabilities __user *user_caps; > struct kvm_tdx_capabilities *caps = NULL; > + u32 nr_user_entries; > int ret = 0; > > /* flags is reserved for future use */ > if (cmd->flags) > return -EINVAL; > > - caps = kmalloc(sizeof(*caps) + > + caps = kzalloc(sizeof(*caps) + > sizeof(struct kvm_cpuid_entry2) * td_conf->num_cpuid_config, > GFP_KERNEL); > if (!caps) > return -ENOMEM; > > user_caps = u64_to_user_ptr(cmd->data); > - if (copy_from_user(caps, user_caps, sizeof(*caps))) { > + if (get_user(nr_user_entries, &user_caps->cpuid.nent)) { > ret = -EFAULT; > goto out; > } > > - if (caps->cpuid.nent < td_conf->num_cpuid_config) { > + if (nr_user_entries < td_conf->num_cpuid_config) { > ret = -E2BIG; > goto out; > }
On Fri, 2025-07-11 at 07:19 -0700, Sean Christopherson wrote: > > > Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it > > > should be > > > easy enough to tack on a capability. > > > > > > This? > > > > I'm wondering if we need a TDX centralized enumeration interface, e.g., new > > field in struct kvm_tdx_capabilities. I believe there will be more and more > > TDX new features, and assigning each a KVM_CAP seems wasteful. > > Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, How do you guys see it as wasteful? The highest cap is currently 242. For 32 bit KVM that leaves 2147483405 caps. If we create an interface we grow some code and docs, and get 64 additional ones for TDX only. The less interfaces the better I say, so KVM_CAP_TDX_TERMINATE_VM seems better. Xiaoyao, is this something QEMU needs? Or more of a completeness kind of thing?
On Fri, Jul 11, 2025, Rick P Edgecombe wrote: > On Fri, 2025-07-11 at 07:19 -0700, Sean Christopherson wrote: > > > > Bugger, you're right. It's sitting at the top of 'kvm-x86 vmx', so it > > > > should be > > > > easy enough to tack on a capability. > > > > > > > > This? > > > > > > I'm wondering if we need a TDX centralized enumeration interface, e.g., new > > > field in struct kvm_tdx_capabilities. I believe there will be more and more > > > TDX new features, and assigning each a KVM_CAP seems wasteful. > > > > Oh, yeah, that's a much better idea. In addition to not polluting KVM_CAP, > > How do you guys see it as wasteful? The highest cap is currently 242. For 32 bit > KVM that leaves 2147483405 caps. If we create an interface we grow some code and > docs, and get 64 additional ones for TDX only. It bleeds TDX details into arch neutral code. > The less interfaces the better I say, so KVM_CAP_TDX_TERMINATE_VM seems better. But we already have KVM_TDX_CAPABILITIES. This isn't really a new interface, it's a new field in a pre-existing interface. > Xiaoyao, is this something QEMU needs? Or more of a completeness kind of thing? Required by VMMs. KVM always needs to be able enumerate its new features. We absolutely do not want userspace making guesses based on e.g. kernel version.
On Fri, 2025-07-11 at 15:54 -0700, Sean Christopherson wrote: > > How do you guys see it as wasteful? The highest cap is currently 242. For 32 > > bit > > KVM that leaves 2147483405 caps. If we create an interface we grow some code > > and > > docs, and get 64 additional ones for TDX only. > > It bleeds TDX details into arch neutral code. There are tons of arch specific caps. Can you help me understand this point a little more? Is TDX special compared to the other arch specific ones? > > > The less interfaces the better I say, so KVM_CAP_TDX_TERMINATE_VM seems > > better. > > But we already have KVM_TDX_CAPABILITIES. This isn't really a new interface, > it's > a new field in a pre-existing interface. I guess. It's new place to check for the same type of information that caps currently provides. Not a big deal either way to me though. > > > Xiaoyao, is this something QEMU needs? Or more of a completeness kind of > > thing? > > Required by VMMs. KVM always needs to be able enumerate its new features. We > absolutely do not want userspace making guesses based on e.g. kernel version. Ok.
On 26/06/2025 18:58, Sean Christopherson wrote: > On Wed, Jun 25, 2025, Sean Christopherson wrote: >> On Wed, 11 Jun 2025 12:51:57 +0300, Adrian Hunter wrote: >>> Changes in V4: >>> >>> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately. >>> Use KVM_BUG_ON() instead of WARN_ON(). >>> Correct kvm_trylock_all_vcpus() return value. >>> >>> Changes in V3: >>> Refer: >>> https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com >>> >>> [...] >> >> Applied to kvm-x86 vmx, thanks! >> >> [1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM >> https://github.com/kvm-x86/linux/commit/111a7311a016 > > Fixed up to address a docs goof[*], new hash: > > https://github.com/kvm-x86/linux/commit/e4775f57ad51 > > [*] https://lore.kernel.org/all/20250626171004.7a1a024b@canb.auug.org.au Thank you!
© 2016 - 2025 Red Hat, Inc.