On Intel platforms, the Bzy_MHz field of turbostat shows zero in a guest
because some bits of MSR_PLATFORM_INFO are missing.

Read the necessary bits from hardware and expose the following fields to
the guest, so that the guest gets correct turbo frequency information.

MSR_PLATFORM_INFO bits
  bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO)
  bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO)

Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
---
arch/x86/include/asm/msr-index.h | 4 ++++
arch/x86/kvm/x86.c | 25 ++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d11135..1c8a276 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -68,6 +68,10 @@
#define MSR_PLATFORM_INFO 0x000000ce
#define MSR_PLATFORM_INFO_CPUID_FAULT_BIT 31
#define MSR_PLATFORM_INFO_CPUID_FAULT BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
+/* MSR_PLATFORM_INFO bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO) */
+#define MSR_PLATFORM_INFO_MAX_NON_TURBO_LIM_RATIO 0x00000000ff00
+/* MSR_PLATFORM_INFO bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO) */
+#define MSR_PLATFORM_INFO_MAX_EFFICIENCY_RATIO 0xff0000000000
#define MSR_IA32_UMWAIT_CONTROL 0xe1
#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE BIT(0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c381770..621c3e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1679,6 +1679,29 @@ static u64 kvm_get_arch_capabilities(void)
return data;
}
+
+static u64 kvm_get_msr_platform_info(void)
+{
+	u64 msr_platform_info = 0;
+
+	rdmsrl_safe(MSR_PLATFORM_INFO, &msr_platform_info);
+	/*
+	 * MSR_PLATFORM_INFO bits:
+	 * bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO)
+	 * bit 31, CPUID Faulting Enabled (CPUID_FAULTING_EN)
+	 * bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO)
+	 *
+	 * Emulate part msr bits, expose above msr info to guest,
+	 * to make sure guest can get correct turbo frequency info.
+	 */
+
+	msr_platform_info &= (MSR_PLATFORM_INFO_MAX_NON_TURBO_LIM_RATIO |
+			      MSR_PLATFORM_INFO_MAX_EFFICIENCY_RATIO);
+	msr_platform_info |= MSR_PLATFORM_INFO_CPUID_FAULT;
+
+	return msr_platform_info;
+}
+
static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
@@ -11919,7 +11942,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
goto free_guest_fpu;
vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
- vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
+ vcpu->arch.msr_platform_info = kvm_get_msr_platform_info();
kvm_xen_init_vcpu(vcpu);
kvm_vcpu_mtrr_init(vcpu);
vcpu_load(vcpu);
--
1.8.3.1
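
As an illustration of the bit layout the patch describes, here is a minimal sketch of how a consumer might decode the two ratio fields and turn them into frequencies. The helper names are hypothetical, and the bus clock passed to ratio_to_mhz() is an assumption the caller must supply (it varies by CPU generation); only the masks and bit positions come from the patch.

```c
#include <stdint.h>

/* Field masks as defined by the patch above. */
#define MSR_PLATFORM_INFO_MAX_NON_TURBO_LIM_RATIO 0x00000000ff00ULL
#define MSR_PLATFORM_INFO_MAX_EFFICIENCY_RATIO    0xff0000000000ULL

/* Hypothetical helper: bits 15:8, Maximum Non-Turbo Ratio. */
static inline unsigned int platform_info_base_ratio(uint64_t msr)
{
	return (unsigned int)((msr & MSR_PLATFORM_INFO_MAX_NON_TURBO_LIM_RATIO) >> 8);
}

/* Hypothetical helper: bits 47:40, Maximum Efficiency Ratio. */
static inline unsigned int platform_info_min_ratio(uint64_t msr)
{
	return (unsigned int)((msr & MSR_PLATFORM_INFO_MAX_EFFICIENCY_RATIO) >> 40);
}

/*
 * Assumption: a ratio is a multiple of the bus clock, so the frequency
 * is ratio * bclk. The bus clock is not part of MSR_PLATFORM_INFO.
 */
static inline unsigned int ratio_to_mhz(unsigned int ratio, unsigned int bclk_mhz)
{
	return ratio * bclk_mhz;
}
```

With both fields zero, as in a guest that only sees the CPUID faulting bit, both helpers return 0, which matches the zero reading described in the commit message.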
On Mon, Aug 21, 2023 at 11:26:32AM +0800, Hao Xiang wrote:
>For intel platform, The BzyMhz field of Turbostat shows zero
>due to the missing of part msr bits of MSR_PLATFORM_INFO.
>
>Acquire necessary msr bits, and expose following msr info to guest,
>to make sure guest can get correct turbo frequency info.

Userspace VMM (e.g., QEMU) can configure this MSR for guests. Please refer to
tools/testing/selftests/kvm/x86_64/platform_info_test.c.

The question is why KVM needs this patch given KVM already provides interfaces
for QEMU to configure the MSR.
The reason is that the turbo frequency info depends on the specific machine
type, and the value of MSR_PLATFORM_INFO may differ between machine
generations.

It is more reasonable for KVM to get the following bits (needed by turbostat
on Intel platforms) directly by rdmsr of MSR_PLATFORM_INFO, and to use them as
the default value of vcpu->arch.msr_platform_info:
 - bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO)
 - bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO)

On 2023/8/21 15:52, Chao Gao wrote:
> Userspace VMM (e.g., QEMU) can configure this MSR for guests. Please refer to
> tools/testing/selftests/kvm/x86_64/platform_info_test.c.
>
> The question is why KVM needs this patch given KVM already provides interfaces
> for QEMU to configure the MSR.
On Mon, Aug 21, 2023 at 05:11:16PM +0800, Hao Xiang wrote:
>For reason that,
>
>The turbo frequency info depends on specific machine type. And the msr value
>of MSR_PLATFORM_INFO may be different on different generation machine.
>
>Get following msr bits (needed by turbostat on intel platform) by rdmsr
>MSR_PLATFORM_INFO directly in KVM is more reasonable. And set these msr bits
>as vcpu->arch.msr_platform_info default value.
> -bit 15:8, Maximum Non-Turbo Ratio (MAX_NON_TURBO_LIM_RATIO)
> -bit 47:40, Maximum Efficiency Ratio (MAX_EFFICIENCY_RATIO)

I don't get why QEMU cannot do this with the existing interface, e.g.,
KVM_SET_MSRS.

Will the MSR value be migrated during VM migration?

It looks like we are in a dilemma. On one side, if the value is migrated, it
can become inconsistent with the hardware value. On the other side, changing
the ratio bits at runtime isn't architectural behavior.

And the MSR is per-socket. In theory, a system can have two sockets with
different values of the MSR. What if a vCPU is created on one socket and then
later runs on the other socket?
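
For context, a rough sketch of what "the existing interface" could look like from a userspace VMM: composing a MSR_PLATFORM_INFO value and writing it to one vCPU with KVM_SET_MSRS. The function names are hypothetical and this is not how QEMU is implemented; the sketch assumes an x86 Linux build with <linux/kvm.h> available and an already-created vCPU file descriptor.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MSR_PLATFORM_INFO 0x000000ce

/* Hypothetical helper: build a value from the fields discussed in the thread. */
static inline uint64_t platform_info_value(uint8_t base_ratio, uint8_t min_ratio,
					   int cpuid_fault)
{
	uint64_t v = 0;

	v |= (uint64_t)base_ratio << 8;		/* bits 15:8 */
	v |= (uint64_t)min_ratio << 40;		/* bits 47:40 */
	if (cpuid_fault)
		v |= 1ULL << 31;		/* bit 31 */
	return v;
}

/*
 * Hypothetical helper: write MSR_PLATFORM_INFO for one vCPU.
 * Returns 0 if KVM accepted the MSR, -1 otherwise.
 */
static inline int set_platform_info(int vcpu_fd, uint64_t value)
{
	struct kvm_msrs *msrs;
	int ret;

	/* One header plus one entry in the flexible array. */
	msrs = calloc(1, sizeof(*msrs) + sizeof(msrs->entries[0]));
	if (!msrs)
		return -1;

	msrs->nmsrs = 1;
	msrs->entries[0].index = MSR_PLATFORM_INFO;
	msrs->entries[0].data = value;

	/* KVM_SET_MSRS returns the number of MSRs it set, or -1 on error. */
	ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
	free(msrs);

	return ret == 1 ? 0 : -1;
}
```

Which ratio values the VMM should pass is exactly the policy question raised above (migration, multiple sockets); the sketch leaves that choice to the caller.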
On 2023/8/21 18:44, Chao Gao wrote:
> I don't get why QEMU cannot do this with the existing interface, e.g.,
> KVM_SET_MSRS.
>
> will the MSR value be migrated during VM migration?
>
> looks we are in a dilemma. on one side, if the value is migrated, the value can
> become inconsistent with hardware value. On the other side, changing the ratio
> bits at runtime isn't the architectural behavior.
>
> And the MSR is per-socket. In theory, a system can have two sockets with
> different values of the MSR. what if a vCPU is created on a socket and then
> later runs on the other socket?
>

The patch only sets these MSR bits (needed by turbostat on Intel platforms) in
KVM by default. Of course, QEMU can still set the MSR value as needed. The two
do not conflict.
+Aaron

When resending a patch, e.g. to change To: or Cc:, tag it RESEND. I got three
copies of this...

On Mon, Aug 21, 2023, Hao Xiang wrote:
> Set these msr bits (needed by turbostat on intel platform) in KVM by
> default.
> Of course, QEMU can also set MSR value by need. It does not conflict.

It doesn't conflict per se, but it's still problematic. By stuffing a default
value, KVM _forces_ userspace to override the MSR to align with the topology and
CPUID defined by userspace. And if userspace uses KVM's "default" CPUID, or lack
thereof, using the underlying values from hardware is all but guaranteed to be
wrong.

The existing code that sets MSR_PLATFORM_INFO_CPUID_FAULT really should not exist,
i.e. KVM shouldn't assume userspace wants to expose CPUID faulting to
the guest. That particular one probably isn't worth trying to retroactively fix.

Ditto for setting MSR_IA32_ARCH_CAPABILITIES; KVM is overstepping, but doing so
likely doesn't cause problems.

MSR_IA32_PERF_CAPABILITIES is a different story. Setting a non-zero default value
is blatantly wrong, as KVM will advertise vPMU features even if userspace doesn't
advertise. Aaron is planning on sending a patch for this one (I'm hoping we can
get away with retroactively dropping the code without having to add a quirk).

*If* we need KVM to expose the ratios to userspace, then the correct way to
handle turbo and efficiency ratio information is by implementing support in
kvm_get_msr_feature(), i.e. KVM_GET_MSRS on /dev/kvm. Emphasis on "if", because
I would prefer to do nothing in KVM if that information is already surfaced to
userspace through other mechanisms in the kernel.
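
To make the "KVM_GET_MSRS on /dev/kvm" suggestion concrete, here is a hedged sketch of how userspace reads a feature MSR through the system file descriptor rather than a vCPU. The helper name is hypothetical. Per the discussion above, KVM would need new support in kvm_get_msr_feature() before MSR_PLATFORM_INFO could be read this way, so on the kernel under review the call is expected to report failure for that index.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: read one feature MSR via the system fd
 * (an open file descriptor for /dev/kvm).
 * Returns 0 and stores the value on success, -1 if KVM did not report it.
 */
static inline int read_feature_msr(int kvm_fd, uint32_t index, uint64_t *value)
{
	struct kvm_msrs *msrs;
	int ret;

	msrs = calloc(1, sizeof(*msrs) + sizeof(msrs->entries[0]));
	if (!msrs)
		return -1;

	msrs->nmsrs = 1;
	msrs->entries[0].index = index;

	/* On the system fd, KVM_GET_MSRS returns how many MSRs were read. */
	ret = ioctl(kvm_fd, KVM_GET_MSRS, msrs);
	if (ret == 1)
		*value = msrs->entries[0].data;
	free(msrs);

	return ret == 1 ? 0 : -1;
}
```

A VMM using this approach would read the host-derived value, decide whether it fits the vCPU model it defines, and only then write it per vCPU, which keeps the policy in userspace as argued above.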
On 8/22/2023 12:11 AM, Sean Christopherson wrote:
> On Mon, Aug 21, 2023, Hao Xiang wrote:
>> Set these msr bits (needed by turbostat on intel platform) in KVM by
>> default.
>> Of course, QEMU can also set MSR value by need. It does not conflict.
>
> It doesn't conflict per se, but it's still problematic. By stuffing a default
> value, KVM _forces_ userspace to override the MSR to align with the topology and
> CPUID defined by userspace.

I don't understand how this MSR is related to topology and CPUID.

> And if userspace uses KVM's "default" CPUID, or lack
> thereof, using the underlying values from hardware are all but guaranteed to be
> wrong.

Could you please elaborate?
On Wed, Aug 23, 2023, Xiaoyao Li wrote:
> On 8/22/2023 12:11 AM, Sean Christopherson wrote:
> > > Set these msr bits (needed by turbostat on intel platform) in KVM by
> > > default. Of course, QEMU can also set MSR value by need. It does not
> > > conflict.
> >
> > It doesn't conflict per se, but it's still problematic. By stuffing a default
> > value, KVM _forces_ userspace to override the MSR to align with the topology and
> > CPUID defined by userspace.
>
> I don't understand how this MSR is related to topology and CPUID?

Heh, looked at the SDM to double check myself, and the first hit when searching
for MSR_PLATFORM_INFO says:

  When TSC scaling is enabled for a guest using Intel PT, the VMM should ensure
  that the value of Maximum Non-Turbo Ratio[15:8] in MSR_PLATFORM_INFO (MSR 0CEH)
  and the TSC/"core crystal clock" ratio (EBX/EAX) in CPUID leaf 15H are set in
  a manner consistent with the resulting TSC rate that will be visible to the VM.

As Chao pointed out, the MSR is technically per package, so a weird setup could
have sockets with different frequencies, or enumerate a virtual topology to the
guest with such a configuration. I doubt/hope no one actually does something
like that, but it's theoretically possible, and one of the many reasons why KVM
needs to stay out of the way and let userspace define the vCPU model.

> > And if userspace uses KVM's "default" CPUID, or lack thereof, using the
> > underlying values from hardware are all but guaranteed to be wrong.
>
> Could you please elaborate?

I guess an empty CPUID would probably be ok? If there's no CPUID.0x15, it can't
be wrong. It's largely a moot point though, I highly doubt anyone runs a "real"
VM without populating _something_ in guest CPUID.
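
The consistency requirement quoted from the SDM can be sketched as a small check a VMM might run over the values it is about to expose. Everything here is illustrative: the helper names, the caller-supplied bus clock, and the tolerance parameter are assumptions, and the sketch only covers the case where CPUID.15H enumerates a non-zero crystal frequency in ECX.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * TSC frequency implied by CPUID leaf 15H: crystal_hz * EBX / EAX.
 * Returns 0 when the leaf does not enumerate enough information.
 */
static inline uint64_t cpuid15_tsc_hz(uint32_t eax, uint32_t ebx,
				      uint32_t ecx_crystal_hz)
{
	if (eax == 0 || ebx == 0 || ecx_crystal_hz == 0)
		return 0;
	return (uint64_t)ecx_crystal_hz * ebx / eax;
}

/* Frequency implied by Maximum Non-Turbo Ratio (bits 15:8). */
static inline uint64_t non_turbo_hz(uint64_t platform_info, uint64_t bclk_hz)
{
	return ((platform_info >> 8) & 0xff) * bclk_hz;
}

/* True if the two views of the frequency agree within tol_hz. */
static inline bool ratio_matches_tsc(uint64_t platform_info, uint64_t bclk_hz,
				     uint64_t tsc_hz, uint64_t tol_hz)
{
	uint64_t f = non_turbo_hz(platform_info, bclk_hz);
	uint64_t diff = f > tsc_hz ? f - tsc_hz : tsc_hz - f;

	return diff <= tol_hz;
}
```

A hardware-derived ratio paired with a userspace-chosen CPUID.15H (or a scaled TSC) is the kind of mismatch such a check would flag, which is the concern behind leaving the MSR to userspace.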
On 8/23/2023 10:31 PM, Sean Christopherson wrote:
> On Wed, Aug 23, 2023, Xiaoyao Li wrote:
>> I don't understand how this MSR is related to topology and CPUID?
>
> Heh, looked at the SDM to double check myself, and the first hit when searching
> for MSR_PLATFORM_INFO says:
>
>   When TSC scaling is enabled for a guest using Intel PT, the VMM should ensure
>   that the value of Maximum Non-Turbo Ratio[15:8] in MSR_PLATFORM_INFO (MSR 0CEH)
>   and the TSC/"core crystal clock" ratio (EBX/EAX) in CPUID leaf 15H are set in
>   a manner consistent with the resulting TSC rate that will be visible to the VM.

I see.

> As Chao pointed out, the MSR is technically per package, so a weird setup could
> have sockets with different frequencies, or enumerate a virtual topology to the
> guest with such a configuration.

Every feature might get into trouble if it is not consistent across packages,
no matter whether it is per-thread, per-core, or per-package.

> I doubt/hope no one actually does something
> like that, but it's theoretically possible, and one of the many reasons why KVM
> needs to stay out of the way and let userspace define the vCPU model.

For this specific case, the max non-turbo frequency needs to be consistent with
the TSC frequency. Because KVM defaults the TSC frequency to the host's tsc_khz,
for correctness it should also have a default value that matches KVM's default
TSC when userspace provides no explicit configuration.

But that's not the problem this patch targets. I'm OK with continuing to return
0 as-is until some bug is reported due to the inconsistency between the max
non-turbo frequency and the TSC frequency.

>>> And if userspace uses KVM's "default" CPUID, or lack thereof, using the
>>> underlying values from hardware are all but guaranteed to be wrong.
>>
>> Could you please elaborate?
>
> I guess an empty CPUID would probably be ok? If there's no CPUID.0x15, it can't
> be wrong. It's largely a moot point though, I highly doubt anyone runs a "real"
> VM without populating _something_ in guest CPUID.

Current QEMU doesn't configure CPUID leaf 0x15, nor does it configure
MSR_PLATFORM_INFO[15:8]. I need to take time to dig into how Linux gets the TSC
frequency.