Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been
created.
The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC
frequency that all subsequent created vCPUs use. It is only intended to
be called before any vCPU is created. Allowing it to be called after
that only results in confusion but nothing good.
Note this is an ABI change. But currently in Qemu (the de facto
userspace VMM) only TDX uses this VM ioctl, and it is only called once
before creating any vCPU, therefore the risk of breaking userspace is
pretty low.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
arch/x86/kvm/x86.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 699ca5e74bba..e5e55d549468 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
u32 user_tsc_khz;
r = -EINVAL;
+
+ if (kvm->created_vcpus)
+ goto out;
+
user_tsc_khz = (u32)arg;
if (kvm_caps.has_tsc_control &&
--
2.50.0
On 7/9/2025 11:08 AM, Kai Huang wrote: > Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > created. Probably the below is clear: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created > > The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > frequency that all subsequent created vCPUs use. It is only intended to > be called before any vCPU is created. Allowing it to be called after > that only results in confusion but nothing good. > > Note this is an ABI change. But currently in Qemu (the de facto > userspace VMM) only TDX uses this VM ioctl, and it is only called once > before creating any vCPU, therefore the risk of breaking userspace is > pretty low. > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> > --- > arch/x86/kvm/x86.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 699ca5e74bba..e5e55d549468 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > u32 user_tsc_khz; > > r = -EINVAL; > + > + if (kvm->created_vcpus) > + goto out; > + > user_tsc_khz = (u32)arg; > > if (kvm_caps.has_tsc_control &&
On Wed, 2025-07-09 at 14:21 +0530, Nikunj A. Dadhania wrote: > > On 7/9/2025 11:08 AM, Kai Huang wrote: > > Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > > created. > > Probably the below is clear: > > Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created Will do. Thanks. > > > > > The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > > frequency that all subsequent created vCPUs use. It is only intended to > > be called before any vCPU is created. Allowing it to be called after > > that only results in confusion but nothing good. > > > > Note this is an ABI change. But currently in Qemu (the de facto > > userspace VMM) only TDX uses this VM ioctl, and it is only called once > > before creating any vCPU, therefore the risk of breaking userspace is > > pretty low. > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Thanks. I'll keep you guys RB anyway after adding the mutex. Let me know if you are not OK.
On Wed, Jul 09, 2025 at 05:38:00PM +1200, Kai Huang wrote: >Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been >created. > >The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC >frequency that all subsequent created vCPUs use. It is only intended to >be called before any vCPU is created. Allowing it to be called after >that only results in confusion but nothing good. > >Note this is an ABI change. But currently in Qemu (the de facto >userspace VMM) only TDX uses this VM ioctl, and it is only called once >before creating any vCPU, therefore the risk of breaking userspace is >pretty low. > >Suggested-by: Sean Christopherson <seanjc@google.com> >Signed-off-by: Kai Huang <kai.huang@intel.com> >--- > arch/x86/kvm/x86.c | 4 ++++ > 1 file changed, 4 insertions(+) > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >index 699ca5e74bba..e5e55d549468 100644 >--- a/arch/x86/kvm/x86.c >+++ b/arch/x86/kvm/x86.c >@@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > u32 user_tsc_khz; > > r = -EINVAL; >+ >+ if (kvm->created_vcpus) >+ goto out; >+ shouldn't kvm->lock be held? > user_tsc_khz = (u32)arg; > > if (kvm_caps.has_tsc_control && >-- >2.50.0 > >
On Wed, Jul 09, 2025, Chao Gao wrote: > On Wed, Jul 09, 2025 at 05:38:00PM +1200, Kai Huang wrote: > >Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > >created. > > > >The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > >frequency that all subsequent created vCPUs use. It is only intended to > >be called before any vCPU is created. Allowing it to be called after > >that only results in confusion but nothing good. > > > >Note this is an ABI change. But currently in Qemu (the de facto > >userspace VMM) only TDX uses this VM ioctl, and it is only called once > >before creating any vCPU, therefore the risk of breaking userspace is > >pretty low. > > > >Suggested-by: Sean Christopherson <seanjc@google.com> > >Signed-off-by: Kai Huang <kai.huang@intel.com> > >--- > > arch/x86/kvm/x86.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index 699ca5e74bba..e5e55d549468 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > > u32 user_tsc_khz; > > > > r = -EINVAL; > >+ > >+ if (kvm->created_vcpus) > >+ goto out; > >+ > > shouldn't kvm->lock be held? Yep.
On Wed, 2025-07-09 at 06:55 -0700, Sean Christopherson wrote: > On Wed, Jul 09, 2025, Chao Gao wrote: > > On Wed, Jul 09, 2025 at 05:38:00PM +1200, Kai Huang wrote: > > > Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > > > created. > > > > > > The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > > > frequency that all subsequent created vCPUs use. It is only intended to > > > be called before any vCPU is created. Allowing it to be called after > > > that only results in confusion but nothing good. > > > > > > Note this is an ABI change. But currently in Qemu (the de facto > > > userspace VMM) only TDX uses this VM ioctl, and it is only called once > > > before creating any vCPU, therefore the risk of breaking userspace is > > > pretty low. > > > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > --- > > > arch/x86/kvm/x86.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 699ca5e74bba..e5e55d549468 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > > > u32 user_tsc_khz; > > > > > > r = -EINVAL; > > > + > > > + if (kvm->created_vcpus) > > > + goto out; > > > + > > > > shouldn't kvm->lock be held? > > Yep. My bad. I'll fixup and send out v2 soon, together with the doc update.
On Thu, 2025-07-10 at 22:53 +0000, Huang, Kai wrote: > On Wed, 2025-07-09 at 06:55 -0700, Sean Christopherson wrote: > > On Wed, Jul 09, 2025, Chao Gao wrote: > > > On Wed, Jul 09, 2025 at 05:38:00PM +1200, Kai Huang wrote: > > > > Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > > > > created. > > > > > > > > The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > > > > frequency that all subsequent created vCPUs use. It is only intended to > > > > be called before any vCPU is created. Allowing it to be called after > > > > that only results in confusion but nothing good. > > > > > > > > Note this is an ABI change. But currently in Qemu (the de facto > > > > userspace VMM) only TDX uses this VM ioctl, and it is only called once > > > > before creating any vCPU, therefore the risk of breaking userspace is > > > > pretty low. > > > > > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > --- > > > > arch/x86/kvm/x86.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > index 699ca5e74bba..e5e55d549468 100644 > > > > --- a/arch/x86/kvm/x86.c > > > > +++ b/arch/x86/kvm/x86.c > > > > @@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > > > > u32 user_tsc_khz; > > > > > > > > r = -EINVAL; > > > > + > > > > + if (kvm->created_vcpus) > > > > + goto out; > > > > + > > > > > > shouldn't kvm->lock be held? > > > > Yep. > > My bad. I'll fixup and send out v2 soon, together with the doc update. Sorry for multiple emails, but I ended up with below. AFAICT the actual updating of kvm->arch.default_tsc_khz needs to be in the kvm->lock mutex too. Please let me know if you found any issue? diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 43ed57e048a8..86ea1e2b2737 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2006,7 +2006,7 @@ frequency is KHz. If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also be used as a vm ioctl to set the initial tsc frequency of subsequently -created vCPUs. +created vCPUs. It must be called before any vCPU is created. 4.56 KVM_GET_TSC_KHZ -------------------- diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2806f7104295..4051c0cacb92 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7199,9 +7199,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (user_tsc_khz == 0) user_tsc_khz = tsc_khz; - WRITE_ONCE(kvm->arch.default_tsc_khz, user_tsc_khz); - r = 0; - + mutex_lock(&kvm->lock); + if (!kvm->created_vcpus) { + WRITE_ONCE(kvm->arch.default_tsc_khz, user_tsc_khz); + r = 0; + } + mutex_unlock(&kvm->lock); goto out; } case KVM_GET_TSC_KHZ: {
>AFAICT the actual updating of kvm->arch.default_tsc_khz needs to be in the >kvm->lock mutex too. Yep. >Please let me know if you found any issue? > >diff --git a/Documentation/virt/kvm/api.rst >b/Documentation/virt/kvm/api.rst >index 43ed57e048a8..86ea1e2b2737 100644 >--- a/Documentation/virt/kvm/api.rst >+++ b/Documentation/virt/kvm/api.rst >@@ -2006,7 +2006,7 @@ frequency is KHz. > > If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also > be used as a vm ioctl to set the initial tsc frequency of subsequently >-created vCPUs. >+created vCPUs. It must be called before any vCPU is created. ^^ remove one space here. "must be" sounds like a mandatory action, but IIUC the vm ioctl is optional for non-CC VMs. I'm not sure if this is just a problem of my interpretation. To make the API documentation super clear, how about: If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also be used as a vm ioctl to set the initial tsc frequency of vCPUs before any vCPU is created. Attempting to call this vm ioctl after vCPU creation will return an EINVAL error. > > 4.56 KVM_GET_TSC_KHZ > -------------------- >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >index 2806f7104295..4051c0cacb92 100644 >--- a/arch/x86/kvm/x86.c >+++ b/arch/x86/kvm/x86.c >@@ -7199,9 +7199,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned >int ioctl, unsigned long arg) > if (user_tsc_khz == 0) > user_tsc_khz = tsc_khz; > >- WRITE_ONCE(kvm->arch.default_tsc_khz, user_tsc_khz); >- r = 0; >- >+ mutex_lock(&kvm->lock); >+ if (!kvm->created_vcpus) { >+ WRITE_ONCE(kvm->arch.default_tsc_khz, >user_tsc_khz); >+ r = 0; >+ } >+ mutex_unlock(&kvm->lock); LGTM. > goto out; > } > case KVM_GET_TSC_KHZ: {
On Fri, 2025-07-11 at 10:17 +0800, Chao Gao wrote: > > AFAICT the actual updating of kvm->arch.default_tsc_khz needs to be in the > > kvm->lock mutex too. > > Yep. > > > Please let me know if you found any issue? > > > > diff --git a/Documentation/virt/kvm/api.rst > > b/Documentation/virt/kvm/api.rst > > index 43ed57e048a8..86ea1e2b2737 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -2006,7 +2006,7 @@ frequency is KHz. > > > > If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also > > be used as a vm ioctl to set the initial tsc frequency of subsequently > > -created vCPUs. > > +created vCPUs. It must be called before any vCPU is created. > > ^^ remove one space here. OK. > > "must be" sounds like a mandatory action, but IIUC the vm ioctl is optional for > non-CC VMs. I'm not sure if this is just a problem of my interpretation. The context of that paragraph has "can also be used ...", so to me it's implied that "if it is called", i.e., it's implied that it is optional for non-CC VMs. > > To make the API documentation super clear, how about: > > If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also > be used as a vm ioctl to set the initial tsc frequency of vCPUs before > any vCPU is created. Attempting to call this vm ioctl after vCPU creation > will return an EINVAL error. I am not sure we need to mention -EINVAL. This IOCTL is already returning -EINVAL for other errors (when invalid user_tsc_khz is supplied). IMHO: Userspace should just care about whether success or not. Being explicit is good, but sometimes it's better to have some room here. But I'll let Sean/Paolo to decide. > > > > > 4.56 KVM_GET_TSC_KHZ > > -------------------- > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 2806f7104295..4051c0cacb92 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -7199,9 +7199,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned > > int ioctl, unsigned long arg) > > if (user_tsc_khz == 0) > > user_tsc_khz = tsc_khz; > > > > - WRITE_ONCE(kvm->arch.default_tsc_khz, user_tsc_khz); > > - r = 0; > > - > > + mutex_lock(&kvm->lock); > > + if (!kvm->created_vcpus) { > > + WRITE_ONCE(kvm->arch.default_tsc_khz, > > user_tsc_khz); > > + r = 0; > > + } > > + mutex_unlock(&kvm->lock); > > LGTM. > > > goto out; > > } > > case KVM_GET_TSC_KHZ: {
On 7/9/2025 1:38 PM, Kai Huang wrote: > Reject the KVM_SET_TSC_KHZ VM ioctl when there's vCPU has already been > created. > > The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC > frequency that all subsequent created vCPUs use. It is only intended to > be called before any vCPU is created. Allowing it to be called after > that only results in confusion but nothing good. > > Note this is an ABI change. But currently in Qemu (the de facto > userspace VMM) only TDX uses this VM ioctl, and it is only called once > before creating any vCPU, therefore the risk of breaking userspace is > pretty low. > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/kvm/x86.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 699ca5e74bba..e5e55d549468 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -7194,6 +7194,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) > u32 user_tsc_khz; > > r = -EINVAL; > + > + if (kvm->created_vcpus) > + goto out; > + > user_tsc_khz = (u32)arg; > > if (kvm_caps.has_tsc_control &&
© 2016 - 2025 Red Hat, Inc.