Currently, if running "-overcommit cpu-pm=on" on hosts that don't have
MWAIT support, the MWAIT/MONITOR feature is advertised to the guest and
executing MWAIT/MONITOR on the guest triggers #UD.

V2:
- [PATCH 1]: took Thomas' suggestion for a more generic fix
- [PATCH 2/3]: no changes

Zide Chen (3):
  vl: Allow multiple -overcommit commands
  target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

 system/vl.c               |  4 ++--
 target/i386/cpu.c         | 24 ++++++++++++------------
 target/i386/host-cpu.c    | 12 ------------
 target/i386/kvm/kvm-cpu.c | 12 +++++++++---
 4 files changed, 23 insertions(+), 29 deletions(-)

-- 
2.34.1
On Fri, 24 May 2024 13:00:14 -0700
Zide Chen <zide.chen@intel.com> wrote:

> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> guest and executing MWAIT/MONITOR on the guest triggers #UD.

This is missing a proper description of how to trigger the issue, with a
reproducer, and a detailed explanation of why the guest sees MWAIT when the
host doesn't support it.

> 
> V2:
> - [PATCH 1]: took Thomas' suggestion for a more generic fix
> - [PATCH 2/3]: no changes
> 
> Zide Chen (3):
>   vl: Allow multiple -overcommit commands
>   target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
>   target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
> 
>  system/vl.c               |  4 ++--
>  target/i386/cpu.c         | 24 ++++++++++++------------
>  target/i386/host-cpu.c    | 12 ------------
>  target/i386/kvm/kvm-cpu.c | 12 +++++++++---
>  4 files changed, 23 insertions(+), 29 deletions(-)
> 
On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen <zide.chen@intel.com> wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> 
> This is missing a proper description of how to trigger the issue, with a
> reproducer, and a detailed explanation of why the guest sees MWAIT when the
> host doesn't support it.

If "-overcommit cpu-pm=on" and "-cpu host" are both present, then, as shown
below, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(). As a
result, MWAIT is never checked against the host features and is
advertised to the guest regardless of whether the host supports it.

  x86_cpu_realizefn()
    x86_cpu_filter_features()
    cpu_exec_realizefn()
      kvm_cpu_realizefn
        host_cpu_realizefn
          host_cpu_enable_cpu_pm
            env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;

If the host doesn't support it, executing MONITOR or MWAIT in the guest
triggers #UD, regardless of whether the MWAIT-exiting control is set.
On Tue, 28 May 2024 11:16:59 -0700
"Chen, Zide" <zide.chen@intel.com> wrote:

> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> > On Fri, 24 May 2024 13:00:14 -0700
> > Zide Chen <zide.chen@intel.com> wrote:
> > 
> >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> > 
> > This is missing a proper description of how to trigger the issue, with a
> > reproducer, and a detailed explanation of why the guest sees MWAIT when the
> > host doesn't support it.
> 
> If "-overcommit cpu-pm=on" and "-cpu host" are both present, then, as shown

It's better to provide the full QEMU CLI, the host/guest kernels used, and
what hardware was used if it's relevant, so others can reproduce the problem.

> below, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(). As a
> result, MWAIT is never checked against the host features and is
> advertised to the guest regardless of whether the host supports it.
> 
>   x86_cpu_realizefn()
>     x86_cpu_filter_features()
>     cpu_exec_realizefn()
>       kvm_cpu_realizefn
>         host_cpu_realizefn
>           host_cpu_enable_cpu_pm
>             env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> 
> If the host doesn't support it, executing MONITOR or MWAIT in the guest
> triggers #UD, regardless of whether the MWAIT-exiting control is set.

If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
So the question is why it leads to an exception instead.
On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> On Tue, 28 May 2024 11:16:59 -0700
> "Chen, Zide" <zide.chen@intel.com> wrote:
[...]
> It's better to provide the full QEMU CLI, the host/guest kernels used, and
> what hardware was used if it's relevant, so others can reproduce the problem.

I have reproduced this on an older Intel Ice Lake machine, a Sapphire Rapids
and a Sierra Forest, but I believe this is a generic x86 issue, not specific
to particular models.

For the CLI, I think the only command-line options that matter are:
  -overcommit cpu-pm=on: to set enable_cpu_pm
  -cpu host: so that cpu->max_features is set

For the QEMU version, it reproduces with any version after this commit:
662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")

The guest fails to boot:

[ 24.825568] smpboot: x86: Booting SMP configuration:
[ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #17
[ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #145
[ 25.136955] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
[ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
[ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
[ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX: 0000000000000000
[ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI: 0000000000000004
[ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09: 0000000000000001
[ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12: 0000000000000000
[ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15: 0000000000093ff0
[ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000) knlGS:0000000000000000
[ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4: 0000000000770ef0
[ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 25.137790] PKRU: 55555554
[ 25.137790] Call Trace:
[ 25.137790] <TASK>
[ 25.137790] ? die+0x37/0x90
[ 25.137790] ? do_trap+0xe3/0x110
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? do_error_trap+0x6a/0x90
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? exc_invalid_op+0x52/0x70
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] default_idle_call+0x30/0x100
[ 25.137790] cpuidle_idle_call+0x12c/0x170
[ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
[ 25.137790] do_idle+0x7f/0xd0
[ 25.137790] cpu_startup_entry+0x29/0x30
[ 25.137790] rest_init+0xcc/0xd0
[ 25.137790] start_kernel+0x396/0x5d0
[ 25.137790] x86_64_start_reservations+0x18/0x30
[ 25.137790] x86_64_start_kernel+0xe7/0xf0
[ 25.137790] common_startup_64+0x13e/0x148
[ 25.137790] </TASK>
[ 25.137790] Modules linked in:
[ 25.137790] --[ end trace 0000000000000000 ]--
[ 25.137790] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8

[...]
> If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
> So the question is why it leads to an exception instead.

KVM can come into play only if MWAIT/MONITOR trigger VM exits. I didn't find
explicit proof in the Intel SDM that #UD exceptions take precedence over
MWAIT/MONITOR VM exits, so this is my speculation. For example, on ancient
machines that don't support MWAIT at all, the only possible outcome is #UD,
not an MWAIT VM exit?
Hi Zide,

On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> Date: Wed, 29 May 2024 10:31:21 -0700
> From: "Chen, Zide" <zide.chen@intel.com>
> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> 
> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
[...]
> > If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
> > So the question is why it leads to an exception instead.
> 
> KVM can come into play only if MWAIT/MONITOR trigger VM exits. I didn't find
> explicit proof in the Intel SDM that #UD exceptions take precedence over
> MWAIT/MONITOR VM exits, so this is my speculation. For example, on ancient
> machines that don't support MWAIT at all, the only possible outcome is #UD,
> not an MWAIT VM exit?

For a host that doesn't support MWAIT, it shouldn't have the VMX control bit
for MWAIT exiting either, right?

Could you please check this on your machine? If VMX doesn't support this
exit event, then triggering an exception makes sense.
-Zhao
On 5/30/2024 6:54 AM, Zhao Liu wrote:
> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
[...]
>> KVM can come into play only if MWAIT/MONITOR trigger VM exits. I didn't find
>> explicit proof in the Intel SDM that #UD exceptions take precedence over
>> MWAIT/MONITOR VM exits, so this is my speculation. For example, on ancient
>> machines that don't support MWAIT at all, the only possible outcome is #UD,
>> not an MWAIT VM exit?
> 
> For a host that doesn't support MWAIT, it shouldn't have the VMX control bit
> for MWAIT exiting either, right?
> 
> Could you please check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception makes sense.

As Sean just confirmed, #UD takes priority over the MWAIT-exiting VM exit.
Thus, if the host doesn't support MWAIT, executing MWAIT in the guest triggers
#UD regardless of whether MWAIT exiting is set, and the guest doesn't boot.
This is not desired, and the VMM should not advertise MWAIT to the guest in
this case.
On Thu, 30 May 2024 21:54:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
[...]
> > KVM can come into play only if MWAIT/MONITOR trigger VM exits. I didn't find
> > explicit proof in the Intel SDM that #UD exceptions take precedence over
> > MWAIT/MONITOR VM exits, so this is my speculation. For example, on ancient
> > machines that don't support MWAIT at all, the only possible outcome is #UD,
> > not an MWAIT VM exit?
> 
> For a host that doesn't support MWAIT, it shouldn't have the VMX control bit
> for MWAIT exiting either, right?
> 
> Could you please check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception makes sense.

My assumption (probably wrong) was that KVM would emulate MWAIT if it's
unavailable, unless we have KVM_CAP_X86_DISABLE_EXITS enabled. In the latter
case it would explode as expected; however, then we shouldn't be able to set
KVM_CAP_X86_DISABLE_EXITS to begin with.

Recently Sean posted a patch related to that:
[PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
https://lkml.org/lkml/2024/5/17/729

This needs someone with KVM expertise to chime in.
Perhaps Paolo/Sean could clarify the expected behavior.

> 
> -Zhao
> 
On Thu, May 30, 2024 at 04:49:33PM +0200, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
[...]
> > Could you please check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception makes sense.
> 
> My assumption (probably wrong) was that KVM would emulate MWAIT if it's unavailable,

Emulating MWAIT correctly is very hard. KVM does not try.

> unless we have KVM_CAP_X86_DISABLE_EXITS enabled. In the latter case it would
> explode as expected; however, then we shouldn't be able to set
> KVM_CAP_X86_DISABLE_EXITS to begin with.
> 
> Recently Sean posted a patch related to that:
> [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
> https://lkml.org/lkml/2024/5/17/729
> 
> This needs someone with KVM expertise to chime in.
> Perhaps Paolo/Sean could clarify the expected behavior.
On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Zide,
>
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > Date: Wed, 29 May 2024 10:31:21 -0700
> > From: "Chen, Zide" <zide.chen@intel.com>
> > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> >
> >
> > On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> > > On Tue, 28 May 2024 11:16:59 -0700
> > > "Chen, Zide" <zide.chen@intel.com> wrote:
> > >
> > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> > >>> On Fri, 24 May 2024 13:00:14 -0700
> > >>> Zide Chen <zide.chen@intel.com> wrote:
> > >>>
> > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> > >>>
> > >>> this is missing proper description how do you trigger issue
> > >>> with reproducer and detailed description why guest sees MWAIT
> > >>> when it's not supported by host.
> > >>
> > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> > > it's better to provide full QEMU CLI and host/guest kernels used and what
> > > hardware was used if it's relevant so others can reproduce problem.
> >
> > I have reproduced this on an older Intel Icelake machine, a
> > Sapphire Rapids and a Sierra Forest, but I believe this is a generic x86
> > issue, not specific to particular models.
> >
> > For the CLI, I think the only command line options that matter are
> > -overcommit cpu-pm=on: to set enable_cpu_pm
> > -cpu host: so that cpu->max_features is set
> >
> > For QEMU version, as long as it's after this commit: 662175b91ff2
> > ("i386: reorder call to cpu_exec_realizefn")
> >
> > The guest fails to boot:
> >
> > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > [ 24.826377] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #17
> > [ 24.985799] .... node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #145
> > [ 25.136955] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > [ 25.137790] RSP: 0000:ffffffff91403e70 EFLAGS: 00010046
> > [ 25.137790] RAX: ffffffff9140a980 RBX: ffffffff9140a980 RCX: 0000000000000000
> > [ 25.137790] RDX: 0000000000000000 RSI: ffff97f1ade21b20 RDI: 0000000000000004
> > [ 25.137790] RBP: 0000000000000000 R08: 00000005da4709cb R09: 0000000000000001
> > [ 25.137790] R10: 0000000000005da4 R11: 0000000000000009 R12: 0000000000000000
> > [ 25.137790] R13: ffff98573ff90fc0 R14: ffffffff9140a038 R15: 0000000000093ff0
> > [ 25.137790] FS: 0000000000000000(0000) GS:ffff97f1ade00000(0000) knlGS:0000000000000000
> > [ 25.137790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 25.137790] CR2: ffff97d8aa801000 CR3: 00000049e9430001 CR4: 0000000000770ef0
> > [ 25.137790] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 25.137790] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> > [ 25.137790] PKRU: 55555554
> > [ 25.137790] Call Trace:
> > [ 25.137790] <TASK>
> > [ 25.137790] ? die+0x37/0x90
> > [ 25.137790] ? do_trap+0xe3/0x110
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? do_error_trap+0x6a/0x90
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] default_idle_call+0x30/0x100
> > [ 25.137790] cpuidle_idle_call+0x12c/0x170
> > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> > [ 25.137790] do_idle+0x7f/0xd0
> > [ 25.137790] cpu_startup_entry+0x29/0x30
> > [ 25.137790] rest_init+0xcc/0xd0
> > [ 25.137790] start_kernel+0x396/0x5d0
> > [ 25.137790] x86_64_start_reservations+0x18/0x30
> > [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> > [ 25.137790] common_startup_64+0x13e/0x148
> > [ 25.137790] </TASK>
> > [ 25.137790] Modules linked in:
> > [ 25.137790] ---[ end trace 0000000000000000 ]---
> > [ 25.137790] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> >
> > >
> > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > >> that it doesn't have a chance to check MWAIT against host features and
> > >> will be advertised to the guest regardless of whether it's supported by
> > >> the host or not.
> > >>
> > >> x86_cpu_realizefn()
> > >>   x86_cpu_filter_features()
> > >>   cpu_exec_realizefn()
> > >>     kvm_cpu_realizefn
> > >>       host_cpu_realizefn
> > >>         host_cpu_enable_cpu_pm
> > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > >>
> > >>
> > >> If it's not supported by the host, executing MONITOR or MWAIT
> > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > >> control is set or not.
> > >
> > > If I recall right, kvm was able to emulate mwait/monitor.
> > > So question is why it leads to exception instead?
> >
> > KVM can come into play only iff it can trigger MWAIT/MONITOR VM exits. I
> > didn't find explicit proof from Intel SDM that #UD exceptions take
> > precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
> > example, in ancient machines which don't support MWAIT yet, the only way
> > it can do is #UD, not MWAIT VM exit?
>
> For the Host which doesn't support MWAIT, it shouldn't have the VMX
> control bit for mwait exit either, right?
>
> Could you pls check this on your machine? If VMX doesn't support this
> exit event, then triggering an exception will make sense.

My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,
unless we have KVM_CAP_X86_DISABLE_EXITS enabled. And in the latter case it would
explode as expected; however, then we shouldn't be able to set KVM_CAP_X86_DISABLE_EXITS
to begin with.

Recently Sean posted a patch related to that:
[PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
https://lkml.org/lkml/2024/5/17/729

This needs someone with KVM expertise to chime in.
Perhaps Paolo/Sean could clarify expected behavior.
On Thu, May 30, 2024, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu <zhao1.liu@intel.com> wrote:

...

> > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > > >> that it doesn't have a chance to check MWAIT against host features and
> > > >> will be advertised to the guest regardless of whether it's supported by
> > > >> the host or not.
> > > >>
> > > >> x86_cpu_realizefn()
> > > >>   x86_cpu_filter_features()
> > > >>   cpu_exec_realizefn()
> > > >>     kvm_cpu_realizefn
> > > >>       host_cpu_realizefn
> > > >>         host_cpu_enable_cpu_pm
> > > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > > >>
> > > >>
> > > >> If it's not supported by the host, executing MONITOR or MWAIT
> > > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > > >> control is set or not.
> > > >
> > > > If I recall right, kvm was able to emulate mwait/monitor.
> > > > So question is why it leads to exception instead?

Because KVM doesn't emulate MONITOR/MWAIT on #UD.

> > > KVM can come into play only iff it can trigger MWAIT/MONITOR VM exits. I
> > > didn't find explicit proof from Intel SDM that #UD exceptions take
> > > precedence over MWAIT/MONITOR VM exits, but this is my speculation.

Yeah, typically #UD takes priority over VM-Exit interception checks. AMD's APM
is much more explicit and states that all exceptions are checked on MONITOR/MWAIT
before the interception check.

> > > For example, in ancient machines which don't support MWAIT yet, the only
> > > way it can do is #UD, not MWAIT VM exit?

Not really relevant, because such CPUs wouldn't have MWAIT-exiting.

> > For the Host which doesn't support MWAIT, it shouldn't have the VMX
> > control bit for mwait exit either, right?
> >
> > Could you pls check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception will make sense.
>
> My assumption (probably wrong) was that KVM would emulate mwait if it's unavailable,

Nope. In order to limit the attack surface of the emulator on modern CPUs, KVM
only emulates select instructions in response to a #UD.

But even if KVM did emulate MONITOR/MWAIT on #UD, this is inarguably a QEMU bug,
e.g. QEMU will effectively coerce the guest into using an idle-polling mechanism.