[Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster

guangrong.xiao@gmail.com posted 5 patches 7 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170412095111.11728-1-xiaoguangrong@tencent.com
Test checkpatch failed
Test docker passed
Test s390x passed
There is a newer version of this series
hw/timer/mc146818rtc.c | 370 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 272 insertions(+), 98 deletions(-)
[Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by guangrong.xiao@gmail.com 7 years ago
From: Xiao Guangrong <xiaoguangrong@tencent.com>

We noticed that the clock on some Windows VMs, e.g. Windows 7 and Windows 8,
runs noticeably fast. The issue can be easily reproduced by starting the
VM with '-rtc base=localtime,clock=vm,driftfix=slew -no-hpet' and
running the attached code in the guest.

The root cause is that clock ticks are lost whenever the period of the
periodic timer is changed, because the current code computes the next
periodic interrupt time like this:
      next_irq_clock = (cur_clock & ~(period - 1)) + period;

Consider the case where cur_clock = 0x11FF and period = 0x100: the
next_irq_clock is 0x1200, so only 1 clock is left before the next irq
fires instead of a full period. Unfortunately, Windows guests (at least
Windows 7) change the period very frequently when running the attached
code, so the lost clocks accumulate and the guest wall time runs faster
and faster.

The main idea of the fix is to use an accurate clock period to
calculate the next irq:
    next_irq_clock = cur_clock + period;

After that, it is also convenient to compensate the clock if needed.

The code run in the Windows VM is attached:
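// TimeInternalTest.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#pragma comment(lib, "winmm")	// timeBeginPeriod/timeEndPeriod/timeGetTime
#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#include <windows.h>

// Every SWITCH_PERIOD iterations, add a short sleep at the default resolution.
#define SWITCH_PERIOD  13

int _tmain(int argc, _TCHAR* argv[])
{
	if (argc != 2)
	{
		printf("parameter error!\n");
		printf("USAGE: *.exe time(ms)\n");
		printf("example: *.exe 40\n");
		return 0;
	}
	else
	{
		// _ttoi parses the argument in both ANSI and Unicode builds;
		// casting a wide string to char * for atoi() would not.
		DWORD interval = (DWORD)_ttoi(argv[1]);
		DWORD count = 0;

		while (1)
		{
			count++;
			// Request 1 ms timer resolution around the sleep and release it
			// afterwards, so the guest keeps changing the timer period.
			timeBeginPeriod(1);
			DWORD start = timeGetTime();
			Sleep(interval);
			timeEndPeriod(1);
			if ((count % SWITCH_PERIOD) == 0) {
				Sleep(1);
			}
		}
	}
	return 0;
}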

Tai Yunfang (1):
  mc146818rtc: properly count the time for the next interrupt

Xiao Guangrong (4):
  mc146818rtc: update periodic timer only if it is needed
  mc146818rtc: fix clock lost after scaling coalesced irq
  mc146818rtc: move x86 specific code out of periodic_timer_update
  mc146818rtc: embrace all x86 specific code

 hw/timer/mc146818rtc.c | 370 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 272 insertions(+), 98 deletions(-)

-- 
2.9.3


Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Paolo Bonzini 7 years ago

On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
> The root cause is that the clock will be lost if the periodic period is
> changed as currently code counts the next periodic time like this:
>       next_irq_clock = (cur_clock & ~(period - 1)) + period;
> 
> consider the case if cur_clock = 0x11FF and period = 0x100, then the
> next_irq_clock is 0x1200, however, there is only 1 clock left to trigger
> the next irq. Unfortunately, Windows guests (at least Windows7) change
> the period very frequently if it runs the attached code, so that the
> lost clock is accumulated, the wall-time become faster and faster

Very interesting.

However, I think that the above should be exactly how the RTC should
work.  The original RTC circuit had 22 divider stages (see page 13 of
the datasheet[1], at the bottom right), and the periodic interrupt taps
the rising edge of one of the dividers (page 16, second paragraph).  The
datasheet also never mentions a comparator being used to trigger the
periodic interrupts.

Have you checked that this Windows bug doesn't happen on real hardware
too?  Or is it the combination of driftfix=slew and changing periods
that is the problem?

The series also does more than this fix (or workaround), so I will
review it anyway.

[1] http://www.nxp.com/assets/documents/data/en/data-sheets/MC146818.pdf

Thanks,

Paolo

Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>
>
> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>> The root cause is that the clock will be lost if the periodic period is
>> changed as currently code counts the next periodic time like this:
>>       next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>
>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>> next_irq_clock is 0x1200, however, there is only 1 clock left to trigger
>> the next irq. Unfortunately, Windows guests (at least Windows7) change
>> the period very frequently if it runs the attached code, so that the
>> lost clock is accumulated, the wall-time become faster and faster
>
> Very interesting.
>

Yes, indeed.

> However, I think that the above should be exactly how the RTC should
> work.  The original RTC circuit had 22 divider stages (see page 13 of
> the datasheet[1], at the bottom right), and the periodic interrupt taps
> the rising edge of one of the dividers (page 16, second paragraph).  The
> datasheet also never mentions a comparator being used to trigger the
> periodic interrupts.
>

That was my thought before; however, after more testing, I am not sure
whether re-configuring RegA changes these divider stages internally...

> Have you checked that this Windows bug doesn't happen on real hardware
> too?  Or is the combination of driftfix=slew and changing periods that
> is a problem?
>

I have two physical Windows 7 machines, both with
'useplatformclock = off' and NTP disabled, and their wall time is really
accurate. The difference is that the physical machines use the Intel
Q87 LPC chipset, which is mc146818rtc compatible. On a VM, however, the
issue is easily reproduced in just ~10 mins.

Our tests mostly focus on 'driftfix=slew', and after this patchset the
time is accurate and stable.

I will test without 'slew' and see what happens...

> The series also does more than this fix (or workaround), so I will
> review it anyway.
>

Thank you, Paolo!


Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>
>
> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>
>>
>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>> The root cause is that the clock will be lost if the periodic period is
>>> changed as currently code counts the next periodic time like this:
>>>       next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>
>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>>> next_irq_clock is 0x1200, however, there is only 1 clock left to trigger
>>> the next irq. Unfortunately, Windows guests (at least Windows7) change
>>> the period very frequently if it runs the attached code, so that the
>>> lost clock is accumulated, the wall-time become faster and faster
>>
>> Very interesting.
>>
>
> Yes, indeed.
>
>> However, I think that the above should be exactly how the RTC should
>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>> the datasheet[1], at the bottom right), and the periodic interrupt taps
>> the rising edge of one of the dividers (page 16, second paragraph).  The
>> datasheet also never mentions a comparator being used to trigger the
>> periodic interrupts.
>>
>
> That was my thought before, however, after more test, i am not sure if
> re-configuring RegA changes these divider stages internal...
>
>> Have you checked that this Windows bug doesn't happen on real hardware
>> too?  Or is the combination of driftfix=slew and changing periods that
>> is a problem?
>>
>
> I have two physical windows 7 machines, both of them have
> 'useplatformclock = off' and ntp disabled, the wall time is really
> accurate. The difference is that the physical machines are using Intel
> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
> issue is easily be reproduced just in ~10 mins.
>
> Our test mostly focus on 'driftfix=slew' and after this patchset the
> time is accurate and stable.
>
> I will do the test for dropping 'slew' and see what will happen...
>

Well, the time is easily observed to run fast if 'driftfix=slew' is
not used. :(


[Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Zhanghailiang 7 years ago
Hi,

-----Original Message-----
From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Xiao Guangrong
Sent: April 13, 2017 16:53
To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; yunfangtai@tencent.com; Xiao Guangrong
Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster



On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>
>
> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>
>>
>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>> The root cause is that the clock will be lost if the periodic period 
>>> is changed as currently code counts the next periodic time like this:
>>>       next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>
>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the 
>>> next_irq_clock is 0x1200, however, there is only 1 clock left to 
>>> trigger the next irq. Unfortunately, Windows guests (at least 
>>> Windows7) change the period very frequently if it runs the attached 
>>> code, so that the lost clock is accumulated, the wall-time become 
>>> faster and faster
>>
>> Very interesting.
>>
>
> Yes, indeed.
>
>> However, I think that the above should be exactly how the RTC should 
>> work.  The original RTC circuit had 22 divider stages (see page 13 of 
>> the datasheet[1], at the bottom right), and the periodic interrupt 
>> taps the rising edge of one of the dividers (page 16, second 
>> paragraph).  The datasheet also never mentions a comparator being 
>> used to trigger the periodic interrupts.
>>
>
> That was my thought before, however, after more test, i am not sure if 
> re-configuring RegA changes these divider stages internal...
>
>> Have you checked that this Windows bug doesn't happen on real 
>> hardware too?  Or is the combination of driftfix=slew and changing 
>> periods that is a problem?
>>
>
> I have two physical windows 7 machines, both of them have 
> 'useplatformclock = off' and ntp disabled, the wall time is really 
> accurate. The difference is that the physical machines are using Intel
> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the 
> issue is easily be reproduced just in ~10 mins.
>
> Our test mostly focus on 'driftfix=slew' and after this patchset the 
> time is accurate and stable.
>
> I will do the test for dropping 'slew' and see what will happen...
>

> Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :(

You mean it only fixes the case where 'driftfix=slew' is used?
We encountered this problem too; I tried to fix it a long time ago:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html
(It seems that your solution is more useful.)
But it seemed impossible to fix: we need to emulate the behavior of real hardware,
but we didn't find any clear description of it. And it seems that other virtualization platforms
have this problem too:
VMware:
https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
Hyper-V:
https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/


Hailiang
Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/13/2017 05:05 PM, Zhanghailiang wrote:
> Hi,
>
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Xiao Guangrong
> Sent: April 13, 2017 16:53
> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; yunfangtai@tencent.com; Xiao Guangrong
> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>
>
>
> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>
>>
>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>> The root cause is that the clock will be lost if the periodic period
>>>> is changed as currently code counts the next periodic time like this:
>>>>       next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>
>>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>>>> next_irq_clock is 0x1200, however, there is only 1 clock left to
>>>> trigger the next irq. Unfortunately, Windows guests (at least
>>>> Windows7) change the period very frequently if it runs the attached
>>>> code, so that the lost clock is accumulated, the wall-time become
>>>> faster and faster
>>>
>>> Very interesting.
>>>
>>
>> Yes, indeed.
>>
>>> However, I think that the above should be exactly how the RTC should
>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>> taps the rising edge of one of the dividers (page 16, second
>>> paragraph).  The datasheet also never mentions a comparator being
>>> used to trigger the periodic interrupts.
>>>
>>
>> That was my thought before, however, after more test, i am not sure if
>> re-configuring RegA changes these divider stages internal...
>>
>>> Have you checked that this Windows bug doesn't happen on real
>>> hardware too?  Or is the combination of driftfix=slew and changing
>>> periods that is a problem?
>>>
>>
>> I have two physical windows 7 machines, both of them have
>> 'useplatformclock = off' and ntp disabled, the wall time is really
>> accurate. The difference is that the physical machines are using Intel
>> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
>> issue is easily be reproduced just in ~10 mins.
>>
>> Our test mostly focus on 'driftfix=slew' and after this patchset the
>> time is accurate and stable.
>>
>> I will do the test for dropping 'slew' and see what will happen...
>>
>
>> Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :(
>
> You mean, it only fixes the one case which with the ' driftfix=slew ' is used ?

No, it works for both.

> We encountered this problem too, I have tried to fix it long time ago.
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html.
> (It seems that your solution is more useful)
> But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware,
> but we didn't find any clear description about it. And it seems that other virtualization platforms

That is the issue: the hardware spec does not detail how the clock is
counted when the timer interval is changed. What we can do at this time
is infer it from the observed behavior. The current RTC is completely
unusable anyway.


> have this problem too:
> VMware:
> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
> Heper-v:
> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/

Hmm, a slower clock is understandable, but does Windows 7 on Hyper-V
really have a faster clock? Have you seen it?


Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Hailiang Zhang 7 years ago
On 2017/4/13 17:18, Xiao Guangrong wrote:
>
> On 04/13/2017 05:05 PM, Zhanghailiang wrote:
>> Hi,
>>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Xiao Guangrong
>> Sent: April 13, 2017 16:53
>> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
>> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; yunfangtai@tencent.com; Xiao Guangrong
>> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>>
>>
>>
>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>>
>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>>
>>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>>> The root cause is that the clock will be lost if the periodic period
>>>>> is changed as currently code counts the next periodic time like this:
>>>>>        next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>>
>>>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>>>>> next_irq_clock is 0x1200, however, there is only 1 clock left to
>>>>> trigger the next irq. Unfortunately, Windows guests (at least
>>>>> Windows7) change the period very frequently if it runs the attached
>>>>> code, so that the lost clock is accumulated, the wall-time become
>>>>> faster and faster
>>>> Very interesting.
>>>>
>>> Yes, indeed.
>>>
>>>> However, I think that the above should be exactly how the RTC should
>>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>>> taps the rising edge of one of the dividers (page 16, second
>>>> paragraph).  The datasheet also never mentions a comparator being
>>>> used to trigger the periodic interrupts.
>>>>
>>> That was my thought before, however, after more test, i am not sure if
>>> re-configuring RegA changes these divider stages internal...
>>>
>>>> Have you checked that this Windows bug doesn't happen on real
>>>> hardware too?  Or is the combination of driftfix=slew and changing
>>>> periods that is a problem?
>>>>
>>> I have two physical windows 7 machines, both of them have
>>> 'useplatformclock = off' and ntp disabled, the wall time is really
>>> accurate. The difference is that the physical machines are using Intel
>>> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
>>> issue is easily be reproduced just in ~10 mins.
>>>
>>> Our test mostly focus on 'driftfix=slew' and after this patchset the
>>> time is accurate and stable.
>>>
>>> I will do the test for dropping 'slew' and see what will happen...
>>>
>>> Well, the time is easily observed to be faster if 'driftfix=slew' is not used. :(
>> You mean, it only fixes the one case which with the ' driftfix=slew ' is used ?
> No. for both.
>
>> We encountered this problem too, I have tried to fix it long time ago.
>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html.
>> (It seems that your solution is more useful)
>> But it seems that it is impossible to fix, we need to emulate the behaviors of real hardware,
>> but we didn't find any clear description about it. And it seems that other virtualization platforms
> That is the issue, the hardware spec does not detail how the clock is
> counted when the timer interval is changed. What we can do at this time
> is that speculate it from the behaviors. Current RTC is completely
> unusable anyway.
>
>
>> have this problem too:
>> VMware:
>> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
>> Heper-v:
>> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/
> Hmm, slower clock is understandable, does really the Windows7 on hyperV
> have faster clock? Did you meet it?

I don't know, we didn't test it. Besides, I'd like to know how long your test case ran before
you judged it stable with the 'driftfix=slew' option? (My previous patch can't fix it completely;
it only narrows the gap between the guest's timer and the real timer.)

Hailiang



Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/13/2017 05:29 PM, Hailiang Zhang wrote:
> On 2017/4/13 17:18, Xiao Guangrong wrote:
>>
>> On 04/13/2017 05:05 PM, Zhanghailiang wrote:
>>> Hi,
>>>
>>> -----Original Message-----
>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>> On Behalf Of Xiao Guangrong
>>> Sent: April 13, 2017 16:53
>>> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
>>> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org;
>>> yunfangtai@tencent.com; Xiao Guangrong
>>> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>>>
>>>
>>>
>>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>>>
>>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>>>
>>>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>>>> The root cause is that the clock will be lost if the periodic period
>>>>>> is changed as currently code counts the next periodic time like this:
>>>>>>        next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>>>
>>>>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>>>>>> next_irq_clock is 0x1200, however, there is only 1 clock left to
>>>>>> trigger the next irq. Unfortunately, Windows guests (at least
>>>>>> Windows7) change the period very frequently if it runs the attached
>>>>>> code, so that the lost clock is accumulated, the wall-time become
>>>>>> faster and faster
>>>>> Very interesting.
>>>>>
>>>> Yes, indeed.
>>>>
>>>>> However, I think that the above should be exactly how the RTC should
>>>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>>>> taps the rising edge of one of the dividers (page 16, second
>>>>> paragraph).  The datasheet also never mentions a comparator being
>>>>> used to trigger the periodic interrupts.
>>>>>
>>>> That was my thought before, however, after more test, i am not sure if
>>>> re-configuring RegA changes these divider stages internal...
>>>>
>>>>> Have you checked that this Windows bug doesn't happen on real
>>>>> hardware too?  Or is the combination of driftfix=slew and changing
>>>>> periods that is a problem?
>>>>>
>>>> I have two physical windows 7 machines, both of them have
>>>> 'useplatformclock = off' and ntp disabled, the wall time is really
>>>> accurate. The difference is that the physical machines are using Intel
>>>> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
>>>> issue is easily be reproduced just in ~10 mins.
>>>>
>>>> Our test mostly focus on 'driftfix=slew' and after this patchset the
>>>> time is accurate and stable.
>>>>
>>>> I will do the test for dropping 'slew' and see what will happen...
>>>>
>>>> Well, the time is easily observed to be faster if 'driftfix=slew' is
>>>> not used. :(
>>> You mean, it only fixes the one case which with the ' driftfix=slew '
>>> is used ?
>> No. for both.
>>
>>> We encountered this problem too, I have tried to fix it long time ago.
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html.
>>> (It seems that your solution is more useful)
>>> But it seems that it is impossible to fix, we need to emulate the
>>> behaviors of real hardware,
>>> but we didn't find any clear description about it. And it seems that
>>> other virtualization platforms
>> That is the issue, the hardware spec does not detail how the clock is
>> counted when the timer interval is changed. What we can do at this time
>> is that speculate it from the behaviors. Current RTC is completely
>> unusable anyway.
>>
>>
>>> have this problem too:
>>> VMware:
>>> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
>>> Heper-v:
>>> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/
>>>
>> Hmm, slower clock is understandable, does really the Windows7 on hyperV
>> have faster clock? Did you meet it?
>
> I don't know, we didn't test it, besides, I'd like to know how long did
> your testcase run before
> you judge it is stable with 'driftfix=slew'  option? (My previous patch
> can't fix it completely but
> only narrows the gap between timer in guest and real timer.)

More than 12 hours.



Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Hailiang Zhang 7 years ago
On 2017/4/13 17:35, Xiao Guangrong wrote:
>
> On 04/13/2017 05:29 PM, Hailiang Zhang wrote:
>> On 2017/4/13 17:18, Xiao Guangrong wrote:
>>> On 04/13/2017 05:05 PM, Zhanghailiang wrote:
>>>> Hi,
>>>>
>>>> -----Original Message-----
>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>> On Behalf Of Xiao Guangrong
>>>> Sent: April 13, 2017 16:53
>>>> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
>>>> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org;
>>>> yunfangtai@tencent.com; Xiao Guangrong
>>>> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>>>>
>>>>
>>>>
>>>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>>>>> The root cause is that the clock will be lost if the periodic period
>>>>>>> is changed as currently code counts the next periodic time like this:
>>>>>>>         next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>>>>
>>>>>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the
>>>>>>> next_irq_clock is 0x1200, however, there is only 1 clock left to
>>>>>>> trigger the next irq. Unfortunately, Windows guests (at least
>>>>>>> Windows7) change the period very frequently if it runs the attached
>>>>>>> code, so that the lost clock is accumulated, the wall-time become
>>>>>>> faster and faster
>>>>>> Very interesting.
>>>>>>
>>>>> Yes, indeed.
>>>>>
>>>>>> However, I think that the above should be exactly how the RTC should
>>>>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>>>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>>>>> taps the rising edge of one of the dividers (page 16, second
>>>>>> paragraph).  The datasheet also never mentions a comparator being
>>>>>> used to trigger the periodic interrupts.
>>>>>>
>>>>> That was my thought before, however, after more test, i am not sure if
>>>>> re-configuring RegA changes these divider stages internal...
>>>>>
>>>>>> Have you checked that this Windows bug doesn't happen on real
>>>>>> hardware too?  Or is the combination of driftfix=slew and changing
>>>>>> periods that is a problem?
>>>>>>
>>>>> I have two physical windows 7 machines, both of them have
>>>>> 'useplatformclock = off' and ntp disabled, the wall time is really
>>>>> accurate. The difference is that the physical machines are using Intel
>>>>> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
>>>>> issue is easily be reproduced just in ~10 mins.
>>>>>
>>>>> Our test mostly focus on 'driftfix=slew' and after this patchset the
>>>>> time is accurate and stable.
>>>>>
>>>>> I will do the test for dropping 'slew' and see what will happen...
>>>>>
>>>>> Well, the time is easily observed to be faster if 'driftfix=slew' is
>>>>> not used. :(
>>>> You mean, it only fixes the one case which with the ' driftfix=slew '
>>>> is used ?
>>> No. for both.
>>>
>>>> We encountered this problem too, I have tried to fix it long time ago.
>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html.
>>>> (It seems that your solution is more useful)
>>>> But it seems that it is impossible to fix, we need to emulate the
>>>> behaviors of real hardware,
>>>> but we didn't find any clear description about it. And it seems that
>>>> other virtualization platforms
>>> That is the issue, the hardware spec does not detail how the clock is
>>> counted when the timer interval is changed. What we can do at this time
>>> is that speculate it from the behaviors. Current RTC is completely
>>> unusable anyway.
>>>
>>>
>>>> have this problem too:
>>>> VMware:
>>>> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
>>>> Heper-v:
>>>> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/
>>>>
>>> Hmm, slower clock is understandable, does really the Windows7 on hyperV
>>> have faster clock? Did you meet it?
>> I don't know, we didn't test it, besides, I'd like to know how long did
>> your testcase run before
>> you judge it is stable with 'driftfix=slew'  option? (My previous patch
>> can't fix it completely but
>> only narrows the gap between timer in guest and real timer.)
> More than 12 hours.

Great, I'll test and look into it ... thanks.




Re: [Qemu-devel] Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/13/2017 05:38 PM, Hailiang Zhang wrote:
> On 2017/4/13 17:35, Xiao Guangrong wrote:
>>
>> On 04/13/2017 05:29 PM, Hailiang Zhang wrote:
>>> On 2017/4/13 17:18, Xiao Guangrong wrote:
>>>> On 04/13/2017 05:05 PM, Zhanghailiang wrote:
>>>>> Hi,
>>>>>
>>>>> -----Original Message-----
>>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>>> On Behalf Of Xiao Guangrong
>>>>> Sent: April 13, 2017 16:53
>>>>> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
>>>>> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org;
>>>>> yunfangtai@tencent.com; Xiao Guangrong
>>>>> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>>>>>
>>>>>
>>>>>
>>>>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>>>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>>>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>>>>>> The root cause is that the clock will be lost if the periodic
>>>>>>>> period
>>>>>>>> is changed as currently code counts the next periodic time like
>>>>>>>> this:
>>>>>>>>         next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>>>>>
>>>>>>>> consider the case if cur_clock = 0x11FF and period = 0x100, then
>>>>>>>> the
>>>>>>>> next_irq_clock is 0x1200, however, there is only 1 clock left to
>>>>>>>> trigger the next irq. Unfortunately, Windows guests (at least
>>>>>>>> Windows7) change the period very frequently if it runs the attached
>>>>>>>> code, so that the lost clock is accumulated, the wall-time become
>>>>>>>> faster and faster
>>>>>>> Very interesting.
>>>>>>>
>>>>>> Yes, indeed.
>>>>>>
>>>>>>> However, I think that the above should be exactly how the RTC should
>>>>>>> work.  The original RTC circuit had 22 divider stages (see page
>>>>>>> 13 of
>>>>>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>>>>>> taps the rising edge of one of the dividers (page 16, second
>>>>>>> paragraph).  The datasheet also never mentions a comparator being
>>>>>>> used to trigger the periodic interrupts.
>>>>>>>
>>>>>> That was my thought before, however, after more test, i am not
>>>>>> sure if
>>>>>> re-configuring RegA changes these divider stages internal...
>>>>>>
>>>>>>> Have you checked that this Windows bug doesn't happen on real
>>>>>>> hardware too?  Or is the combination of driftfix=slew and changing
>>>>>>> periods that is a problem?
>>>>>>>
>>>>>> I have two physical windows 7 machines, both of them have
>>>>>> 'useplatformclock = off' and ntp disabled, the wall time is really
>>>>>> accurate. The difference is that the physical machines are using
>>>>>> Intel
>>>>>> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the
>>>>>> issue is easily be reproduced just in ~10 mins.
>>>>>>
>>>>>> Our test mostly focus on 'driftfix=slew' and after this patchset the
>>>>>> time is accurate and stable.
>>>>>>
>>>>>> I will do the test for dropping 'slew' and see what will happen...
>>>>>>
>>>>>> Well, the time is easily observed to be faster if 'driftfix=slew' is
>>>>>> not used. :(
>>>>> You mean, it only fixes the one case which with the ' driftfix=slew '
>>>>> is used ?
>>>> No. for both.
>>>>
>>>>> We encountered this problem too; I tried to fix it a long time ago:
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html
>>>>> (It seems that your solution is more useful.)
>>>>> But it seems impossible to fix properly: we need to emulate the
>>>>> behavior of real hardware,
>>>>> but we didn't find any clear description of it. And it seems that
>>>>> other virtualization platforms
>>>> That is the issue: the hardware spec does not detail how the clock is
>>>> counted when the timer interval is changed. What we can do at this time
>>>> is infer it from the observed behavior. The current RTC is completely
>>>> unusable anyway.
>>>>
>>>>
>>>>> have this problem too:
>>>>> VMware:
>>>>> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
>>>>> Hyper-V:
>>>>> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/
>>>>>
>>>>>
>>>> Hmm, a slower clock is understandable, but does Windows 7 on Hyper-V
>>>> really have a faster clock? Have you seen it?
>>> I don't know, we didn't test it. Besides, I'd like to know how long
>>> your test case ran before you judged it stable with the 'driftfix=slew'
>>> option. (My previous patch can't fix it completely; it only narrows
>>> the gap between the guest timer and the real timer.)
>> More than 12 hours.
>
> Great, I'll test and look into it ... thanks.
>

Hi Hailiang,

Does this patchset work for you? :)



Re: [Qemu-devel] Reply: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Hailiang Zhang 7 years ago
On 2017/4/19 10:02, Xiao Guangrong wrote:
>
> On 04/13/2017 05:38 PM, Hailiang Zhang wrote:
>> On 2017/4/13 17:35, Xiao Guangrong wrote:
>>> On 04/13/2017 05:29 PM, Hailiang Zhang wrote:
>>>> On 2017/4/13 17:18, Xiao Guangrong wrote:
>>>>> On 04/13/2017 05:05 PM, Zhanghailiang wrote:
>>>>>> Hi,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>>>> On Behalf Of Xiao Guangrong
>>>>>> Sent: April 13, 2017 16:53
>>>>>> To: Paolo Bonzini; mst@redhat.com; mtosatti@redhat.com
>>>>>> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org;
>>>>>> yunfangtai@tencent.com; Xiao Guangrong
>>>>>> Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>>>>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>>>>>> On 12/04/2017 17:51, guangrong.xiao@gmail.com wrote:
>>>>>>>>> The root cause is that clocks are lost whenever the periodic
>>>>>>>>> interrupt period is changed, because the current code computes the
>>>>>>>>> next periodic time like this:
>>>>>>>>>          next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>>>>>>>
>>>>>>>>> Consider the case where cur_clock = 0x11FF and period = 0x100: the
>>>>>>>>> next_irq_clock is 0x1200, so there is only 1 clock left before the
>>>>>>>>> next irq is triggered. Unfortunately, Windows guests (at least
>>>>>>>>> Windows 7) change the period very frequently when running the
>>>>>>>>> attached code, so the lost clocks accumulate and the wall time
>>>>>>>>> runs faster and faster.
>>>>>>>> Very interesting.
>>>>>>>>
>>>>>>> Yes, indeed.
>>>>>>>
>>>>>>>> However, I think that the above should be exactly how the RTC should
>>>>>>>> work.  The original RTC circuit had 22 divider stages (see page
>>>>>>>> 13 of
>>>>>>>> the datasheet[1], at the bottom right), and the periodic interrupt
>>>>>>>> taps the rising edge of one of the dividers (page 16, second
>>>>>>>> paragraph).  The datasheet also never mentions a comparator being
>>>>>>>> used to trigger the periodic interrupts.
>>>>>>>>
>>>>>>> That was my thought before; however, after more testing, I am not
>>>>>>> sure whether re-configuring RegA changes these divider stages
>>>>>>> internally...
>>>>>>>
>>>>>>>> Have you checked that this Windows bug doesn't happen on real
>>>>>>>> hardware too?  Or is the combination of driftfix=slew and changing
>>>>>>>> periods that is a problem?
>>>>>>>>
>>>>>>> I have two physical Windows 7 machines, both of them have
>>>>>>> 'useplatformclock = off' and NTP disabled, and the wall time is really
>>>>>>> accurate. The difference is that the physical machines are using the
>>>>>>> Intel
>>>>>>> Q87 LPC chipset which is mc146818rtc compatible. However, on a VM, the
>>>>>>> issue is easily reproduced in just ~10 mins.
>>>>>>>
>>>>>>> Our tests mostly focus on 'driftfix=slew' and after this patchset the
>>>>>>> time is accurate and stable.
>>>>>>>
>>>>>>> I will do the test for dropping 'slew' and see what will happen...
>>>>>>>
>>>>>>> Well, the time is easily observed to be faster if 'driftfix=slew' is
>>>>>>> not used. :(
>>>>>> You mean it only fixes the case where 'driftfix=slew' is used?
>>>>> No, it fixes both cases.
>>>>>
>>>>>> We encountered this problem too; I tried to fix it a long time ago:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html
>>>>>> (It seems that your solution is more useful.)
>>>>>> But it seems impossible to fix properly: we need to emulate the
>>>>>> behavior of real hardware,
>>>>>> but we didn't find any clear description of it. And it seems that
>>>>>> other virtualization platforms
>>>>> That is the issue: the hardware spec does not detail how the clock is
>>>>> counted when the timer interval is changed. What we can do at this time
>>>>> is infer it from the observed behavior. The current RTC is completely
>>>>> unusable anyway.
>>>>>
>>>>>
>>>>>> have this problem too:
>>>>>> VMware:
>>>>>> https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
>>>>>> Hyper-V:
>>>>>> https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/
>>>>>>
>>>>>>
>>>>> Hmm, a slower clock is understandable, but does Windows 7 on Hyper-V
>>>>> really have a faster clock? Have you seen it?
>>>> I don't know, we didn't test it. Besides, I'd like to know how long
>>>> your test case ran before you judged it stable with the 'driftfix=slew'
>>>> option. (My previous patch can't fix it completely; it only narrows
>>>> the gap between the guest timer and the real timer.)
>>> More than 12 hours.
>> Great, I'll test and look into it ... thanks.
>>
> Hi Hailiang,
>
> Does this patchset work for you? :)

Yes, I think it works for us, nice work :)




Re: [Qemu-devel] Reply: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/19/2017 06:41 PM, Hailiang Zhang wrote:

>> Hi Hailiang,
>>
>> Does this patchset work for you? :)
>
> Yes, I think it works for us, nice work :)

I appreciate your testing, Hailiang!

Paolo, any comment? :)


Re: [Qemu-devel] Reply: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Paolo Bonzini 7 years ago

On 19/04/2017 13:13, Xiao Guangrong wrote:
> 
> 
> On 04/19/2017 06:41 PM, Hailiang Zhang wrote:
> 
>>> Hi Hailiang,
>>>
>>> Does this patchset work for you? :)
>>
>> Yes, I think it works for us, nice work :)
> 
> Appreciate your test, Hailiang!
> 
> Paolo, any comment? :)

Will review as soon as 2.9 gets out. :)

Paolo

Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Paolo Bonzini 7 years ago

On 13/04/2017 16:52, Xiao Guangrong wrote:
> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>> However, I think that the above should be exactly how the RTC should
>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>> the datasheet[1], at the bottom right), and the periodic interrupt taps
>>> the rising edge of one of the dividers (page 16, second paragraph).  The
>>> datasheet also never mentions a comparator being used to trigger the
>>> periodic interrupts.
>>>
>>
>> That was my thought before; however, after more testing, I am not sure
>> whether re-configuring RegA changes these divider stages internally...

It's unlikely because there is a separate divider reset command.  But
Hailiang found the same problem, and Bochs uses the same implementation
as yours.

It's even possible (not sure how likely) that the original MC146818 RTC
had the bug, but recent Super I/O chips fixed it to work around the
problem with Windows.

Thanks,

Paolo

>>> Have you checked that this Windows bug doesn't happen on real hardware
>>> too?  Or is the combination of driftfix=slew and changing periods that
>>> is a problem?
>>>
>>
>> I have two physical Windows 7 machines, both of them have
>> 'useplatformclock = off' and NTP disabled, and the wall time is really
>> accurate. The difference is that the physical machines are using the Intel
>> Q87 LPC chipset which is mc146818rtc compatible. However, on a VM, the
>> issue is easily reproduced in just ~10 mins.
>>
>> Our tests mostly focus on 'driftfix=slew' and after this patchset the
>> time is accurate and stable.
>>
>> I will do the test for dropping 'slew' and see what will happen...
>>
> 
> Well, the time is easily observed to be faster if 'driftfix=slew' is
> not used. :(
> 

Re: [Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster
Posted by Xiao Guangrong 7 years ago

On 04/14/2017 01:09 PM, Paolo Bonzini wrote:
>
>
> On 13/04/2017 16:52, Xiao Guangrong wrote:
>> On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>>> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>>> However, I think that the above should be exactly how the RTC should
>>>> work.  The original RTC circuit had 22 divider stages (see page 13 of
>>>> the datasheet[1], at the bottom right), and the periodic interrupt taps
>>>> the rising edge of one of the dividers (page 16, second paragraph).  The
>>>> datasheet also never mentions a comparator being used to trigger the
>>>> periodic interrupts.
>>>>
>>>
>>> That was my thought before; however, after more testing, I am not sure
>>> whether re-configuring RegA changes these divider stages internally...
>
> It's unlikely because there is a separate divider reset command.  But
> Hailiang found the same problem, and Bochs uses the same implementation
> as yours.
>

Happy to see the same approach is being used in Bochs. Thanks for
checking. :)