[PATCH] target/i386: introduce CPU property to work around Windows reset bug

Paolo Bonzini posted 1 patch 2 years, 1 month ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220324082346.72180-1-pbonzini@redhat.com
There is a newer version of this series
target/i386/cpu.c | 10 ++++++++++
target/i386/cpu.h |  3 +++
2 files changed, 13 insertions(+)
[PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Paolo Bonzini 2 years, 1 month ago
Some versions of Windows hang on reboot if their TSC value is greater
than 2^54.  The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.
As a workaround, reset the TSC to a small value.  Do not do this
unconditionally and require a special property to be set.

Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/cpu.c | 10 ++++++++++
 target/i386/cpu.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ec3b50bf6e..9b29cea8c4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5931,6 +5931,15 @@ static void x86_cpu_reset(DeviceState *dev)
     env->xstate_bv = 0;
 
     env->pat = 0x0007040600070406ULL;
+
+    /*
+     * Some versions of Windows hang on reboot if their TSC value is greater
+     * than 2^54.  As a workaround, reset the TSC to a small value.  Do not use
+     * zero, KVM applies special heuristics for CPU startup when TSC is cleared.
+     */
+    if (cpu->tsc_clear_on_reset) {
+        env->tsc = 1;
+    }
     env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
     if (env->features[FEAT_1_ECX] & CPUID_EXT_MONITOR) {
         env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_MWAIT;
@@ -6957,6 +6966,7 @@ static Property x86_cpu_properties[] = {
                      false),
     DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
     DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
+    DEFINE_PROP_BOOL("tsc-clear-on-reset", X86CPU, tsc_clear_on_reset, true),
     DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count,
                      true),
     /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e31e6bd8b8..66f7901729 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1864,6 +1864,9 @@ struct ArchCPU {
     /* Forcefully disable KVM PV features not exposed in guest CPUIDs */
     bool kvm_pv_enforce_cpuid;
 
+    /* Clear TSC on reset */
+    bool tsc_clear_on_reset;
+
     /* Number of physical address bits supported */
     uint32_t phys_bits;
 
-- 
2.35.1
Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Daniel P. Berrangé 2 years, 1 month ago
On Thu, Mar 24, 2022 at 09:23:46AM +0100, Paolo Bonzini wrote:
> Some versions of Windows hang on reboot if their TSC value is greater
> than 2^54.  The calibration of the Hyper-V reference time overflows
> and fails; as a result the processors' clock sources are out of sync.
> As a workaround, reset the TSC to a small value.  Do not do this
> unconditionally and require a special property to be set.

What's the problem with doing it unconditionally ?

Requiring this special niche property means that it'll have to be
enabled by management apps. Most will never learn it exists, and
of those that do, many will take years to get this enabled and
into usage by users, and many won't even bother.

IMHO, this is the kind of situation where we need the fix to be
enabled by default, or we might as well not bother.

> 
> Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target/i386/cpu.c | 10 ++++++++++
>  target/i386/cpu.h |  3 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index ec3b50bf6e..9b29cea8c4 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5931,6 +5931,15 @@ static void x86_cpu_reset(DeviceState *dev)
>      env->xstate_bv = 0;
>  
>      env->pat = 0x0007040600070406ULL;
> +
> +    /*
> +     * Some versions of Windows hang on reboot if their TSC value is greater
> +     * than 2^54.  As a workaround, reset the TSC to a small value.  Do not use
> +     * zero, KVM applies special heuristics for CPU startup when TSC is cleared.
> +     */
> +    if (cpu->tsc_clear_on_reset) {
> +        env->tsc = 1;
> +    }
>      env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
>      if (env->features[FEAT_1_ECX] & CPUID_EXT_MONITOR) {
>          env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_MWAIT;
> @@ -6957,6 +6966,7 @@ static Property x86_cpu_properties[] = {
>                       false),
>      DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
>      DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
> +    DEFINE_PROP_BOOL("tsc-clear-on-reset", X86CPU, tsc_clear_on_reset, true),
>      DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count,
>                       true),
>      /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index e31e6bd8b8..66f7901729 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1864,6 +1864,9 @@ struct ArchCPU {
>      /* Forcefully disable KVM PV features not exposed in guest CPUIDs */
>      bool kvm_pv_enforce_cpuid;
>  
> +    /* Clear TSC on reset */
> +    bool tsc_clear_on_reset;
> +
>      /* Number of physical address bits supported */
>      uint32_t phys_bits;
>  
> -- 
> 2.35.1
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Daniel P. Berrangé 2 years, 1 month ago
On Thu, Mar 24, 2022 at 09:13:12AM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 24, 2022 at 09:23:46AM +0100, Paolo Bonzini wrote:
> > Some versions of Windows hang on reboot if their TSC value is greater
> > than 2^54.  The calibration of the Hyper-V reference time overflows
> > and fails; as a result the processors' clock sources are out of sync.
> > As a workaround, reset the TSC to a small value.  Do not do this
> > unconditionally and require a special property to be set.
> 
> What's the problem with doing it unconditionally ?
> 
> Requiring this special niche property means that it'll have to be
> enabled by management apps. Most will never learn it exists, and
> of those that do, many will take years to get this enabled and
> into usage by users, and many won't even bother.
> 
> IMHO, this is the kind of situation where we need the fix to be
> enabled by default, or we might as well not bother.

Sigh, hit send too soon. I see the property is actually turned
on in the defaults below, so effectively it will always be on
unconditionally as no one will bother to add support for turning
it off.

> 
> > 
> > Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  target/i386/cpu.c | 10 ++++++++++
> >  target/i386/cpu.h |  3 +++
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index ec3b50bf6e..9b29cea8c4 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -5931,6 +5931,15 @@ static void x86_cpu_reset(DeviceState *dev)
> >      env->xstate_bv = 0;
> >  
> >      env->pat = 0x0007040600070406ULL;
> > +
> > +    /*
> > +     * Some versions of Windows hang on reboot if their TSC value is greater
> > +     * than 2^54.  As a workaround, reset the TSC to a small value.  Do not use
> > +     * zero, KVM applies special heuristics for CPU startup when TSC is cleared.
> > +     */
> > +    if (cpu->tsc_clear_on_reset) {
> > +        env->tsc = 1;
> > +    }
> >      env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
> >      if (env->features[FEAT_1_ECX] & CPUID_EXT_MONITOR) {
> >          env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_MWAIT;
> > @@ -6957,6 +6966,7 @@ static Property x86_cpu_properties[] = {
> >                       false),
> >      DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
> >      DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
> > +    DEFINE_PROP_BOOL("tsc-clear-on-reset", X86CPU, tsc_clear_on_reset, true),
> >      DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count,
> >                       true),
> >      /*
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index e31e6bd8b8..66f7901729 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1864,6 +1864,9 @@ struct ArchCPU {
> >      /* Forcefully disable KVM PV features not exposed in guest CPUIDs */
> >      bool kvm_pv_enforce_cpuid;
> >  
> > +    /* Clear TSC on reset */
> > +    bool tsc_clear_on_reset;
> > +
> >      /* Number of physical address bits supported */
> >      uint32_t phys_bits;
> >  
> > -- 
> > 2.35.1
> > 
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Paolo Bonzini 2 years, 1 month ago
On 3/24/22 10:15, Daniel P. Berrangé wrote:
> On Thu, Mar 24, 2022 at 09:13:12AM +0000, Daniel P. Berrangé wrote:
>> On Thu, Mar 24, 2022 at 09:23:46AM +0100, Paolo Bonzini wrote:
>>> Some versions of Windows hang on reboot if their TSC value is greater
>>> than 2^54.  The calibration of the Hyper-V reference time overflows
>>> and fails; as a result the processors' clock sources are out of sync.
>>> As a workaround, reset the TSC to a small value.  Do not do this
>>> unconditionally and require a special property to be set.
>>
>> What's the problem with doing it unconditionally ?
>>
>> Requiring this special niche property means that it'll have to be
>> enabled by management apps. Most will never learn it exists, and
>> of those that do, many will take years to get this enabled and
>> into usage by users, and many won't even bother.
>>
>> IMHO, this is the kind of situation where we need the fix to be
>> enabled by default, or we might as well not bother.
> 
> Sigh, hit send too soon. I see the property is actually turned
> on in the defaults below, so effectively it will always be on
> unconditionally as no one will bother to add support for turning
> it off.

Well, I have a patch to turn it on/off in Libvirt and I also planned to 
leave it off by default in RHEL patch updates (I'm not tying it to the 
machine type because it's not a guest ABI change).

I am myself conflicted on whether to leave it on or off in QEMU.  For 
example you could use the TSC to measure how long the VM has been up, 
but this patch makes that not work anymore.  Considering that the bug 
requires literally 2-3 months of VM uptime to manifest itself, it might 
be better to set up the property in libosinfo and only for Windows guests.

Also, since it is a bug in Windows, it will hopefully be fixed sooner or 
later.

Paolo

Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Daniel P. Berrangé 2 years, 1 month ago
On Thu, Mar 24, 2022 at 10:42:22AM +0100, Paolo Bonzini wrote:
> On 3/24/22 10:15, Daniel P. Berrangé wrote:
> > On Thu, Mar 24, 2022 at 09:13:12AM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Mar 24, 2022 at 09:23:46AM +0100, Paolo Bonzini wrote:
> > > > Some versions of Windows hang on reboot if their TSC value is greater
> > > > than 2^54.  The calibration of the Hyper-V reference time overflows
> > > > and fails; as a result the processors' clock sources are out of sync.
> > > > As a workaround, reset the TSC to a small value.  Do not do this
> > > > unconditionally and require a special property to be set.
> > > 
> > > What's the problem with doing it unconditionally ?
> > > 
> > > Requiring this special niche property means that it'll have to be
> > > enabled by management apps. Most will never learn it exists, and
> > > of those that do, many will take years to get this enabled and
> > > into usage by users, and many won't even bother.
> > > 
> > > IMHO, this is the kind of situation where we need the fix to be
> > > enabled by default, or we might as well not bother.
> > 
> > Sigh, hit send too soon. I see the property is actually turned
> > on in the defaults below, so effectively it will always be on
> > unconditionally as no one will bother to add support for turning
> > it off.
> 
> Well, I have a patch to turn it on/off in Libvirt and I also planned to
> leave it off by default in RHEL patch updates (I'm not tying it to the
> machine type because it's not a guest ABI change).
> 
> I am myself conflicted on whether to leave it on or off in QEMU.  For
> example you could use the TSC to measure how long the VM has been up, but
> this patch makes that not work anymore.  Considering that the bug requires
> literally 2-3 months of VM uptime to manifest itself, it might be better to
> set up the property in libosinfo and only for Windows guests.
> 
> Also, since it is a bug in Windows, it will hopefully be fixed sooner or
> later.

In the libvirt patches you mentioned VMWare has the same workaround
available, and not enabled by default. I looked up this VMWare setting
and it is interesting reading:

  https://kb.vmware.com/s/article/2092807

Of particular note

  "This only applies to virtual machine hardware version 10 as Windows
   resets the TSC on all CPUs on virtual machines with older hardware
   versions (which do not support hypervisor.cpuid.v2)."

do you know what they mean when they refer to 'hypervisor.cpuid.v2'
here ? I wonder if it gives any hints as to a root cause that could
be fixed ?

This hardware version 10 is well old - their current hardware version
is 19, so it seems to show the implemented some built-in fix in newer
hardware versions (their equiv of machine types). The vmware setting
dates from 2013, and if I read that kbase correctly isn't needed on
their modern hardware versions. Or maybe monitor_control.enable_softResetClearTSC
became the default in newer hardware versions ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Paolo Bonzini 2 years, 1 month ago
On 3/24/22 12:03, Daniel P. Berrangé wrote:
> 
>    "This only applies to virtual machine hardware version 10 as Windows
>     resets the TSC on all CPUs on virtual machines with older hardware
>     versions (which do not support hypervisor.cpuid.v2)."
> 
> do you know what they mean when they refer to 'hypervisor.cpuid.v2'
> here ? I wonder if it gives any hints as to a root cause that could
> be fixed ?

The difference between hardware versions probably is that older versions 
do not support Hyper-V enlightenments.  The bug does not happen if 
Windows uses the RTC for timekeeping, which is consistent with the 
description above.

> This hardware version 10 is well old - their current hardware version
> is 19, so it seems to show the implemented some built-in fix in newer
> hardware versions (their equiv of machine types). The vmware setting
> dates from 2013, and if I read that kbase correctly isn't needed on
> their modern hardware versions. Or maybe monitor_control.enable_softResetClearTSC
> became the default in newer hardware versions ?

Yeah, maybe it became the default but they didn't want to change 
previously-released hardware versions.  So that would be basically 
treating it as an ABI change, and only enable it in 7.0 machine types.

That said, the VMware kbase does paint a slightly different picture.  It 
implies that starting with hardware version 11 rebooting Windows is done 
through a hard reset instead of INIT.  I'm not sure how that would be 
done, but in the meanwhile our fix should take care of do_cpu_init as well.

Paolo

Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Paolo Bonzini 2 years, 1 month ago
On 3/24/22 12:24, Paolo Bonzini wrote:
> That said, the VMware kbase does paint a slightly different picture.  It 
> implies that starting with hardware version 11 rebooting Windows is done 
> through a hard reset instead of INIT.  I'm not sure how that would be 
> done, but in the meanwhile our fix should take care of do_cpu_init as well.

Ok, so here are my findings:

- Windows resets the system by writing 0xFE to port 0x64.  This is a 
hard reset on QEMU, but presumably it was a soft reset (INIT) on VMware 
until version 10.

- QEMU _does_ try to write 0 to the TSC on hard reset.  But KVM special 
cases 0 as "somebody is trying to hot-plug a new CPU" and wants to help 
out, so it keeps the CPU synchronized with the previous TSC.

So this is a pretty clear-cut QEMU bug.  It can be fixed by e.g. writing 
1 to the TSC instead of 0.

Paolo


Re: [PATCH] target/i386: introduce CPU property to work around Windows reset bug
Posted by Daniel P. Berrangé 2 years, 1 month ago
On Thu, Mar 24, 2022 at 06:13:03PM +0100, Paolo Bonzini wrote:
> On 3/24/22 12:24, Paolo Bonzini wrote:
> > That said, the VMware kbase does paint a slightly different picture.  It
> > implies that starting with hardware version 11 rebooting Windows is done
> > through a hard reset instead of INIT.  I'm not sure how that would be
> > done, but in the meanwhile our fix should take care of do_cpu_init as
> > well.
> 
> Ok, so here are my findings:
> 
> - Windows resets the system by writing 0xFE to port 0x64.  This is a hard
> reset on QEMU, but presumably it was a soft reset (INIT) on VMware until
> version 10.
> 
> - QEMU _does_ try to write 0 to the TSC on hard reset.  But KVM special
> cases 0 as "somebody is trying to hot-plug a new CPU" and wants to help out,
> so it keeps the CPU synchronized with the previous TSC.
> 
> So this is a pretty clear-cut QEMU bug.  It can be fixed by e.g. writing 1
> to the TSC instead of 0.

Ah, excellant findings, so we won't need a config knob after all.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|