[v1] target/arm/kvm: Provide an option to adjust virtual time

[PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 6 months ago

v2:
 - move from RFC status to v1
 - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
 - add r-b's from Richard


This series is inspired by a series[1] posted by Bijan Mottahedeh about
a year ago.  The problem described in the cover letter of [1] is easily
reproducible and some users would like to have the option to avoid it.
However the solution, which is to adjust the virtual counter offset each
time the VM transitions to the running state, introduces a different
problem, which is that the virtual and physical counters diverge.  As
described in the cover letter of [1] this divergence is easily observed
when comparing the output of `date` and `hwclock` after suspending the
guest, waiting a while, and then resuming it.  Because this different
problem may actually be worse for some users, unlike [1], the series
posted here makes the virtual counter offset adjustment optional and not
even enabled by default.  Besides the adjustment being optional, this
series approaches the needed changes differently to apply them in more
appropriate locations.  Finally, unlike [1], this series doesn't attempt
to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
only ticks when the VM is not stopped, is sufficient.

I've based this series on the SVE series[2] because we're adding a new
CPU feature (kvm-adjvtime) and the SVE series introduces CPU feature
documentation, probing, and tests that we can then immediately apply
to kvm-adjvtime.

Additional notes
----------------

Note 1
------

As described above, when running a guest with kvm-adjtime enabled it
will be less likely the guest OS and guest applications get surprise
time jumps when they use the virtual counter.  However the counter will
no longer reflect real time.  It will lag behind.  If this is a problem
then the guest can resynchronize its time from an external source or
even from its physical counter.  If the suspend/resume is done with
libvirt's virsh, and the guest is running the guest agent, then it's
also possible to use a sequence like this

 $ virsh suspend $GUEST
 $ virsh resume $GUEST
 $ virsh domtime --sync $GUEST

in order to resynchronize a guest right after the resume.  Of course
there will still be time when the clock is not right, possibly creating
confusing timestamps in logs, for example, and the guest must still be
tolerant to the time synchronizations.

Note 2
------

Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
the KVM register ID is not correct.  This cannot be fixed because it's
UAPI and if the UAPI headers are used then it can't be a problem.
However, if a userspace attempts to create the ID themselves from the
register's specification, then they will get KVM_REG_ARM_TIMER_CVAL
instead, as the _CNT and _CVAL definitions have their register
parameters swapped.

Note 3
------

I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
are the same as kvm64.c. So what could go wrong? Test results would be
appreciated.
 

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
[2] https://patchew.org/QEMU/20191001125845.8793-1-drjones@redhat.com/

Thanks,
drew


Andrew Jones (5):
  target/arm/kvm64: kvm64 cpus have timer registers
  timer: arm: Introduce functions to get the host cntfrq
  target/arm/kvm: Implement cpu feature kvm-adjvtime
  tests/arm-cpu-features: Check feature default values
  target/arm/cpu: Add the kvm-adjvtime CPU property

 docs/arm-cpu-features.rst | 27 +++++++++++++++++-
 include/qemu/timer.h      | 16 +++++++++++
 target/arm/cpu.c          |  2 ++
 target/arm/cpu.h          |  3 ++
 target/arm/cpu64.c        |  1 +
 target/arm/kvm.c          | 59 +++++++++++++++++++++++++++++++++++++++
 target/arm/kvm32.c        |  3 ++
 target/arm/kvm64.c        |  4 +++
 target/arm/kvm_arm.h      | 25 +++++++++++++++++
 target/arm/monitor.c      |  1 +
 tests/arm-cpu-features.c  | 48 +++++++++++++++++++++++++------
 11 files changed, 179 insertions(+), 10 deletions(-)

-- 
2.21.0

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Peter Maydell 4 years, 4 months ago

On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
>
> v2:
>  - move from RFC status to v1
>  - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
>  - add r-b's from Richard
>
>
> This series is inspired by a series[1] posted by Bijan Mottahedeh about
> a year ago.  The problem described in the cover letter of [1] is easily
> reproducible and some users would like to have the option to avoid it.
> However the solution, which is to adjust the virtual counter offset each
> time the VM transitions to the running state, introduces a different
> problem, which is that the virtual and physical counters diverge.  As
> described in the cover letter of [1] this divergence is easily observed
> when comparing the output of `date` and `hwclock` after suspending the
> guest, waiting a while, and then resuming it.  Because this different
> problem may actually be worse for some users, unlike [1], the series
> posted here makes the virtual counter offset adjustment optional and not
> even enabled by default.  Besides the adjustment being optional, this
> series approaches the needed changes differently to apply them in more
> appropriate locations.  Finally, unlike [1], this series doesn't attempt
> to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> only ticks when the VM is not stopped, is sufficient.

So I guess my overall question is "what is the x86 solution to
this problem, and why is this all arm-specific?" It would also
be helpful to know how it fits into all the other proposals regarding
time in VMs.

thanks
-- PMM

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote:
> On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
> >
> > v2:
> >  - move from RFC status to v1
> >  - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
> >  - add r-b's from Richard
> >
> >
> > This series is inspired by a series[1] posted by Bijan Mottahedeh about
> > a year ago.  The problem described in the cover letter of [1] is easily
> > reproducible and some users would like to have the option to avoid it.
> > However the solution, which is to adjust the virtual counter offset each
> > time the VM transitions to the running state, introduces a different
> > problem, which is that the virtual and physical counters diverge.  As
> > described in the cover letter of [1] this divergence is easily observed
> > when comparing the output of `date` and `hwclock` after suspending the
> > guest, waiting a while, and then resuming it.  Because this different
> > problem may actually be worse for some users, unlike [1], the series
> > posted here makes the virtual counter offset adjustment optional and not
> > even enabled by default.  Besides the adjustment being optional, this
> > series approaches the needed changes differently to apply them in more
> > appropriate locations.  Finally, unlike [1], this series doesn't attempt
> > to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> > only ticks when the VM is not stopped, is sufficient.
> 
> So I guess my overall question is "what is the x86 solution to
> this problem, and why is this all arm-specific?" It would also

x86 adjusts the counter offset by default, and I don't think there's any
way to turn that behavior off. I think it's too late to follow that
default for arm, but this series provides a way to opt into the same
behavior.

> be helpful to know how it fits into all the other proposals regarding
> time in VMs.

I've been lightly following the other stuff, but haven't yet seen any
overlap.

BTW, this series needs to be rebased and reposted. I've been waiting for
4.2 though.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrea Bolognani 4 years, 4 months ago

On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote:
> On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote:
> > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
> > > This series is inspired by a series[1] posted by Bijan Mottahedeh about
> > > a year ago.  The problem described in the cover letter of [1] is easily
> > > reproducible and some users would like to have the option to avoid it.
> > > However the solution, which is to adjust the virtual counter offset each
> > > time the VM transitions to the running state, introduces a different
> > > problem, which is that the virtual and physical counters diverge.  As
> > > described in the cover letter of [1] this divergence is easily observed
> > > when comparing the output of `date` and `hwclock` after suspending the
> > > guest, waiting a while, and then resuming it.  Because this different
> > > problem may actually be worse for some users, unlike [1], the series
> > > posted here makes the virtual counter offset adjustment optional and not
> > > even enabled by default.  Besides the adjustment being optional, this
> > > series approaches the needed changes differently to apply them in more
> > > appropriate locations.  Finally, unlike [1], this series doesn't attempt
> > > to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> > > only ticks when the VM is not stopped, is sufficient.
> > 
> > So I guess my overall question is "what is the x86 solution to
> > this problem, and why is this all arm-specific?" It would also
> 
> x86 adjusts the counter offset by default, and I don't think there's any
> way to turn that behavior off. I think it's too late to follow that
> default for arm, but this series provides a way to opt into the same
> behavior.

My understanding is that turning kvm-adjvtime either on or off
results in a different set of advantages and drawbacks, with neither
begin a one-size-fits-all solution. So it's good that we offer a way
for the user to pick one or the other based on their specific needs.

The main difference is that kvm-adjvtime=on matches x86's behavior,
which is fairly well understood and thoroughly documented, along with
the corresponding workarounds, dos and don'ts.

With that in mind, in my opinion it would make sense to make
kvm-adjvtime=on the behavior for newer machine types so that people
coming from an x86 background and/or having to manage both at the
same time will get a consistent experience and will be able to draw,
even for aarch64, on the considerable amount of existing x86-centric
literature on the subject.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Peter Maydell 4 years, 4 months ago

On Tue, 10 Dec 2019 at 09:51, Andrea Bolognani <abologna@redhat.com> wrote:
>
> On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote:
> > On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote:
> > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
> > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about
> > > > a year ago.  The problem described in the cover letter of [1] is easily
> > > > reproducible and some users would like to have the option to avoid it.
> > > > However the solution, which is to adjust the virtual counter offset each
> > > > time the VM transitions to the running state, introduces a different
> > > > problem, which is that the virtual and physical counters diverge.  As
> > > > described in the cover letter of [1] this divergence is easily observed
> > > > when comparing the output of `date` and `hwclock` after suspending the
> > > > guest, waiting a while, and then resuming it.  Because this different
> > > > problem may actually be worse for some users, unlike [1], the series
> > > > posted here makes the virtual counter offset adjustment optional and not
> > > > even enabled by default.  Besides the adjustment being optional, this
> > > > series approaches the needed changes differently to apply them in more
> > > > appropriate locations.  Finally, unlike [1], this series doesn't attempt
> > > > to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> > > > only ticks when the VM is not stopped, is sufficient.
> > >
> > > So I guess my overall question is "what is the x86 solution to
> > > this problem, and why is this all arm-specific?" It would also
> >
> > x86 adjusts the counter offset by default, and I don't think there's any
> > way to turn that behavior off. I think it's too late to follow that
> > default for arm, but this series provides a way to opt into the same
> > behavior.
>
> My understanding is that turning kvm-adjvtime either on or off
> results in a different set of advantages and drawbacks, with neither
> begin a one-size-fits-all solution. So it's good that we offer a way
> for the user to pick one or the other based on their specific needs.

If this is the case, shouldn't we be looking at having the
option exist for all architectures, not just arm? Obviously
pre-existing behaviour would imply having the default have
to differ for some archs/machines.

thanks
-- PMM

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Tue, Dec 10, 2019 at 10:29:22AM +0000, Peter Maydell wrote:
> On Tue, 10 Dec 2019 at 09:51, Andrea Bolognani <abologna@redhat.com> wrote:
> >
> > On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote:
> > > On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote:
> > > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
> > > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about
> > > > > a year ago.  The problem described in the cover letter of [1] is easily
> > > > > reproducible and some users would like to have the option to avoid it.
> > > > > However the solution, which is to adjust the virtual counter offset each
> > > > > time the VM transitions to the running state, introduces a different
> > > > > problem, which is that the virtual and physical counters diverge.  As
> > > > > described in the cover letter of [1] this divergence is easily observed
> > > > > when comparing the output of `date` and `hwclock` after suspending the
> > > > > guest, waiting a while, and then resuming it.  Because this different
> > > > > problem may actually be worse for some users, unlike [1], the series
> > > > > posted here makes the virtual counter offset adjustment optional and not
> > > > > even enabled by default.  Besides the adjustment being optional, this
> > > > > series approaches the needed changes differently to apply them in more
> > > > > appropriate locations.  Finally, unlike [1], this series doesn't attempt
> > > > > to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> > > > > only ticks when the VM is not stopped, is sufficient.
> > > >
> > > > So I guess my overall question is "what is the x86 solution to
> > > > this problem, and why is this all arm-specific?" It would also
> > >
> > > x86 adjusts the counter offset by default, and I don't think there's any
> > > way to turn that behavior off. I think it's too late to follow that
> > > default for arm, but this series provides a way to opt into the same
> > > behavior.
> >
> > My understanding is that turning kvm-adjvtime either on or off
> > results in a different set of advantages and drawbacks, with neither
> > begin a one-size-fits-all solution. So it's good that we offer a way
> > for the user to pick one or the other based on their specific needs.
> 
> If this is the case, shouldn't we be looking at having the
> option exist for all architectures, not just arm? Obviously
> pre-existing behaviour would imply having the default have
> to differ for some archs/machines.
>

x86 developers could easily add this feature if/when they need a way to
disable their current default behavior. But, since the kvm-adjvtime
default would likely be 'on' for them, then they'd probably prefer the
feature be named kvm-no-adjvtime, and default 'off'. Should we try to
anticipate what x86 might want when naming this feature? IMO, we should
not, especially because I'm doubtful that x86 will ever want to implement
it. Also, what about the other KVM capable architectures? Which defaults
do they have now? And do we expect them to want to expose a switch to the
user to change it?

OTOH, I agree with Andrea that it would be nice if arm had the same
default as x86, allowing the documentation regarding this stuff to apply
to both. If we did choose to turn this feature on by default for virt-5.0,
then maybe we should introduce the feature with the name kvm-no-adjvtime
instead.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Peter Maydell 4 years, 4 months ago

On Tue, 10 Dec 2019 at 11:05, Andrew Jones <drjones@redhat.com> wrote:
> x86 developers could easily add this feature if/when they need a way to
> disable their current default behavior. But, since the kvm-adjvtime
> default would likely be 'on' for them, then they'd probably prefer the
> feature be named kvm-no-adjvtime, and default 'off'. Should we try to
> anticipate what x86 might want when naming this feature? IMO, we should
> not, especially because I'm doubtful that x86 will ever want to implement
> it. Also, what about the other KVM capable architectures? Which defaults
> do they have now? And do we expect them to want to expose a switch to the
> user to change it?

My perspective here is mostly that I don't really understand
the ins and outs of KVM and in particular handling of
time in VMs, beyond knowing that it's complicated. So I
prefer approaches that push back to "do everything the same
for all architectures rather than having something that's
arm-specific", because then things get more review from
the larger mass of non-arm KVM/QEMU developers. Arm-specific
switches/interfaces/designs just make arm more of a
special-snowflake. I don't really have a basis to be able
to review the patchset beyond those general biases.

thanks
-- PMM

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Tue, Dec 10, 2019 at 11:48:38AM +0000, Peter Maydell wrote:
> On Tue, 10 Dec 2019 at 11:05, Andrew Jones <drjones@redhat.com> wrote:
> > x86 developers could easily add this feature if/when they need a way to
> > disable their current default behavior. But, since the kvm-adjvtime
> > default would likely be 'on' for them, then they'd probably prefer the
> > feature be named kvm-no-adjvtime, and default 'off'. Should we try to
> > anticipate what x86 might want when naming this feature? IMO, we should
> > not, especially because I'm doubtful that x86 will ever want to implement
> > it. Also, what about the other KVM capable architectures? Which defaults
> > do they have now? And do we expect them to want to expose a switch to the
> > user to change it?
> 
> My perspective here is mostly that I don't really understand
> the ins and outs of KVM and in particular handling of
> time in VMs, beyond knowing that it's complicated. So I
> prefer approaches that push back to "do everything the same
> for all architectures rather than having something that's
> arm-specific", because then things get more review from
> the larger mass of non-arm KVM/QEMU developers. Arm-specific
> switches/interfaces/designs just make arm more of a
> special-snowflake. I don't really have a basis to be able
> to review the patchset beyond those general biases.
>

So the ins and outs of this particular timekeeping issue (to the best of
my knowledge) is that x86 has implemented this behavior since
00f4d64ee76e ("kvmclock: clock should count only if vm is running"), which
was committed over six years ago. Possibly x86 VM time would behave more
like arm VM time if kvmclock was disabled, but that's not a recommended
configuration.

PPC got an equivalent patch to the x86 one in 2017, 42043e4f1241 ("spapr:
clock should count only if vm is running"), but when stopping time during
pause on spapr they actually *keep* 'date' and 'hwclock' in synch. I guess
whatever clocksource 'hwclock' uses on spapr was already stopping when
paused? For x86 those values diverge, and for arm without this series they
stay the same but experience jumps, and with this series they diverge like
x86. I don't see any way to disable the behaviour 42043e4f1241 introduces.

s390x got what appears to be its equivalent patch last year 9bc9d3d1ae3b
("s390x/tod: Properly stop the KVM TOD while the guest is not running").
The commit message doesn't state how hwclock and date values change /
don't change, and I don't see any way to disable the behavior.

MIPS has had this implemented since KVM support was introduced. No way
to disable it that I know of.

So why is this arm-specific? arm is just trying to catch up, but also
offer a switch allowing the current behavior to be selected. If other
architectures see value in the switch then they're free to adopt it too.
After having done this git mining, it looks more and more like we should
at least consider naming this feature 'kvm-no-adjvtime' and probably
also change arm's default.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrea Bolognani 4 years, 4 months ago

On Tue, 2019-12-10 at 14:32 +0100, Andrew Jones wrote:
> After having done this git mining, it looks more and more like we should
> at least consider naming this feature 'kvm-no-adjvtime' and probably
> also change arm's default.

I agree with everything except the naming: why would

  kvm-no-adjvtime=off  vtime is adjusted      (default)
  kvm-no-adjvtime=on   vtime is not adjusted

be better than

  kvm-adjvtime=on   vtime is adjusted      (default)
  kvm-adjvtime=off  vtime is not adjusted

? Both offer the exact same amount of flexibility, but the latter has
the advantage of not introducing any unwieldy double negatives.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote:
> On Tue, 2019-12-10 at 14:32 +0100, Andrew Jones wrote:
> > After having done this git mining, it looks more and more like we should
> > at least consider naming this feature 'kvm-no-adjvtime' and probably
> > also change arm's default.
> 
> I agree with everything except the naming: why would
> 
>   kvm-no-adjvtime=off  vtime is adjusted      (default)
>   kvm-no-adjvtime=on   vtime is not adjusted
> 
> be better than
> 
>   kvm-adjvtime=on   vtime is adjusted      (default)
>   kvm-adjvtime=off  vtime is not adjusted
> 
> ? Both offer the exact same amount of flexibility, but the latter has
> the advantage of not introducing any unwieldy double negatives.
>

A default of 'off' == 0 means not setting anything at all. There's
already precedent for 'kvm-no*' prefixed cpu features,

kvm-no-smi-migration
kvm-nopiodelay

(Unless there's something called a 'nopio' then I guess that one is
missing a -, but whatever...)

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrea Bolognani 4 years, 4 months ago

On Tue, 2019-12-10 at 15:33 +0100, Andrew Jones wrote:
> On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote:
> > I agree with everything except the naming: why would
> > 
> >   kvm-no-adjvtime=off  vtime is adjusted      (default)
> >   kvm-no-adjvtime=on   vtime is not adjusted
> > 
> > be better than
> > 
> >   kvm-adjvtime=on   vtime is adjusted      (default)
> >   kvm-adjvtime=off  vtime is not adjusted
> > 
> > ? Both offer the exact same amount of flexibility, but the latter has
> > the advantage of not introducing any unwieldy double negatives.
> 
> A default of 'off' == 0 means not setting anything at all. There's
> already precedent for 'kvm-no*' prefixed cpu features,
> 
> kvm-no-smi-migration
> kvm-nopiodelay

Sorry, I'm not sure I understand. Do you mean that the array where
you store CPU features is 0-inizialized, so it's more convenient to
have off (0) as the default because it means you don't have to touch
it beforehand? Or something different?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Tue, Dec 10, 2019 at 04:47:49PM +0100, Andrea Bolognani wrote:
> On Tue, 2019-12-10 at 15:33 +0100, Andrew Jones wrote:
> > On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote:
> > > I agree with everything except the naming: why would
> > > 
> > >   kvm-no-adjvtime=off  vtime is adjusted      (default)
> > >   kvm-no-adjvtime=on   vtime is not adjusted
> > > 
> > > be better than
> > > 
> > >   kvm-adjvtime=on   vtime is adjusted      (default)
> > >   kvm-adjvtime=off  vtime is not adjusted
> > > 
> > > ? Both offer the exact same amount of flexibility, but the latter has
> > > the advantage of not introducing any unwieldy double negatives.
> > 
> > A default of 'off' == 0 means not setting anything at all. There's
> > already precedent for 'kvm-no*' prefixed cpu features,
> > 
> > kvm-no-smi-migration
> > kvm-nopiodelay
> 
> Sorry, I'm not sure I understand. Do you mean that the array where
> you store CPU features is 0-inizialized, so it's more convenient to
> have off (0) as the default because it means you don't have to touch
> it beforehand? Or something different?
>

Right. The CPU feature flag (a boolean member of the CPU state) will
be zero by default because C. It's not a big deal, though, because the
property default can easily be set to true while it's added to a cpu
type.

I don't have a strong enough opinion about kvm-adjvtime vs.
kvm-no-adjvtime to insist one way or another. I agree double inversions
are easier to mess up, but I also like the way the '-no-' better
communicates that the default is [probably] 'yes'.

All interested parties, please vote. I'll be sending v2 soon and I can
call this thing anything the majority (or the dominate minority) prefer.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrea Bolognani 4 years, 4 months ago

On Tue, 2019-12-10 at 17:08 +0100, Andrew Jones wrote:
> I don't have a strong enough opinion about kvm-adjvtime vs.
> kvm-no-adjvtime to insist one way or another. I agree double inversions
> are easier to mess up, but I also like the way the '-no-' better
> communicates that the default is [probably] 'yes'.
> 
> All interested parties, please vote. I'll be sending v2 soon and I can
> call this thing anything the majority (or the dominate minority) prefer.

I like kvm-adjvtime better because it avoids the double negative,
but on the other hand if the new default is be to adjust the time
then I don't expect many people will actually need to use the
parameter, so the name doesn't matter that much after all :)

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Peter Maydell 4 years, 4 months ago

On Tue, 10 Dec 2019 at 13:33, Andrew Jones <drjones@redhat.com> wrote:
> So the ins and outs of this particular timekeeping issue (to the best of
> my knowledge) is that x86 has implemented this behavior since
> 00f4d64ee76e ("kvmclock: clock should count only if vm is running"), which
> was committed over six years ago. Possibly x86 VM time would behave more
> like arm VM time if kvmclock was disabled, but that's not a recommended
> configuration.
>
> PPC got an equivalent patch to the x86 one in 2017, 42043e4f1241 ("spapr:
> clock should count only if vm is running"), but when stopping time during
> pause on spapr they actually *keep* 'date' and 'hwclock' in synch. I guess
> whatever clocksource 'hwclock' uses on spapr was already stopping when
> paused? For x86 those values diverge, and for arm without this series they
> stay the same but experience jumps, and with this series they diverge like
> x86. I don't see any way to disable the behaviour 42043e4f1241 introduces.
>
> s390x got what appears to be its equivalent patch last year 9bc9d3d1ae3b
> ("s390x/tod: Properly stop the KVM TOD while the guest is not running").
> The commit message doesn't state how hwclock and date values change /
> don't change, and I don't see any way to disable the behavior.
>
> MIPS has had this implemented since KVM support was introduced. No way
> to disable it that I know of.
>
> So why is this arm-specific? arm is just trying to catch up, but also
> offer a switch allowing the current behavior to be selected. If other
> architectures see value in the switch then they're free to adopt it too.
> After having done this git mining, it looks more and more like we should
> at least consider naming this feature 'kvm-no-adjvtime' and probably
> also change arm's default.

Thanks for pulling up the handling by other architectures.
I think I agree that we should change the arm default (ie
we should just call this a bug fix, since the old behaviour
seems unhelpful generally and is more random accident than
a deliberate choice), with a switch provided just in case
anybody had something depending on the old behaviour.

-- PMM

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Guoheyi 4 years, 4 months ago

在 2019/12/6 23:22, Peter Maydell 写道:
> On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
>> v2:
>>   - move from RFC status to v1
>>   - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
>>   - add r-b's from Richard
>>
>>
>> This series is inspired by a series[1] posted by Bijan Mottahedeh about
>> a year ago.  The problem described in the cover letter of [1] is easily
>> reproducible and some users would like to have the option to avoid it.
>> However the solution, which is to adjust the virtual counter offset each
>> time the VM transitions to the running state, introduces a different
>> problem, which is that the virtual and physical counters diverge.  As
>> described in the cover letter of [1] this divergence is easily observed
>> when comparing the output of `date` and `hwclock` after suspending the
>> guest, waiting a while, and then resuming it.  Because this different
>> problem may actually be worse for some users, unlike [1], the series
>> posted here makes the virtual counter offset adjustment optional and not
>> even enabled by default.  Besides the adjustment being optional, this
>> series approaches the needed changes differently to apply them in more
>> appropriate locations.  Finally, unlike [1], this series doesn't attempt
>> to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
>> only ticks when the VM is not stopped, is sufficient.
> So I guess my overall question is "what is the x86 solution to
> this problem, and why is this all arm-specific?" It would also
> be helpful to know how it fits into all the other proposals regarding
> time in VMs.

I also sent a RFC in March and ARM KVM experts were also invoved in the 
discussion:

https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html

According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to 
request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing 
VM kernel about external force stop.

Thanks,

Heyi


>
> thanks
> -- PMM
>
>

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 4 months ago

On Wed, Dec 11, 2019 at 04:02:52PM +0800, Guoheyi wrote:
> 
> 在 2019/12/6 23:22, Peter Maydell 写道:
> > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
> > > v2:
> > >   - move from RFC status to v1
> > >   - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
> > >   - add r-b's from Richard
> > > 
> > > 
> > > This series is inspired by a series[1] posted by Bijan Mottahedeh about
> > > a year ago.  The problem described in the cover letter of [1] is easily
> > > reproducible and some users would like to have the option to avoid it.
> > > However the solution, which is to adjust the virtual counter offset each
> > > time the VM transitions to the running state, introduces a different
> > > problem, which is that the virtual and physical counters diverge.  As
> > > described in the cover letter of [1] this divergence is easily observed
> > > when comparing the output of `date` and `hwclock` after suspending the
> > > guest, waiting a while, and then resuming it.  Because this different
> > > problem may actually be worse for some users, unlike [1], the series
> > > posted here makes the virtual counter offset adjustment optional and not
> > > even enabled by default.  Besides the adjustment being optional, this
> > > series approaches the needed changes differently to apply them in more
> > > appropriate locations.  Finally, unlike [1], this series doesn't attempt
> > > to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> > > only ticks when the VM is not stopped, is sufficient.
> > So I guess my overall question is "what is the x86 solution to
> > this problem, and why is this all arm-specific?" It would also
> > be helpful to know how it fits into all the other proposals regarding
> > time in VMs.
> 
> I also sent a RFC in March and ARM KVM experts were also invoved in the
> discussion:
> 
> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html
> 
> According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to
> request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing VM
> kernel about external force stop.
> 
> Thanks,
> 
> Heyi

Hi Heyi,

Apologies for having forgotten about that thread. I recall now having
followed it last March. Indeed it even addresses the issue in the way
we're coming around to now (save/restore vs. update with virtual time).

I just reread the whole thread, and my feeling is that, while there are
still many issues left to work, until we get a pvclock for arm, a patch
like this one, but with a way to opt-in/out in order to give users a
chance to choose their poison, is the best we can do. Also a patch like
this one is a step in the right direction, as it will be needed as part
of the bigger pvclock solution eventually, just as it is for x86.

One comment on the patch is that I would prefer to do the save/restore
for all VCPUs, even though KVM does currently handle synchronization.
Not only does it "feel" more correct to apply VCPU APIs to all VCPUs,
but I assume it will be less problematic to implement CPU hotplug at
some point.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Guoheyi 4 years, 4 months ago

在 2019/12/11 17:00, Andrew Jones 写道:
> On Wed, Dec 11, 2019 at 04:02:52PM +0800, Guoheyi wrote:
>> 在 2019/12/6 23:22, Peter Maydell 写道:
>>> On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote:
>>>> v2:
>>>>    - move from RFC status to v1
>>>>    - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
>>>>    - add r-b's from Richard
>>>>
>>>>
>>>> This series is inspired by a series[1] posted by Bijan Mottahedeh about
>>>> a year ago.  The problem described in the cover letter of [1] is easily
>>>> reproducible and some users would like to have the option to avoid it.
>>>> However the solution, which is to adjust the virtual counter offset each
>>>> time the VM transitions to the running state, introduces a different
>>>> problem, which is that the virtual and physical counters diverge.  As
>>>> described in the cover letter of [1] this divergence is easily observed
>>>> when comparing the output of `date` and `hwclock` after suspending the
>>>> guest, waiting a while, and then resuming it.  Because this different
>>>> problem may actually be worse for some users, unlike [1], the series
>>>> posted here makes the virtual counter offset adjustment optional and not
>>>> even enabled by default.  Besides the adjustment being optional, this
>>>> series approaches the needed changes differently to apply them in more
>>>> appropriate locations.  Finally, unlike [1], this series doesn't attempt
>>>> to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
>>>> only ticks when the VM is not stopped, is sufficient.
>>> So I guess my overall question is "what is the x86 solution to
>>> this problem, and why is this all arm-specific?" It would also
>>> be helpful to know how it fits into all the other proposals regarding
>>> time in VMs.
>> I also sent a RFC in March and ARM KVM experts were also invoved in the
>> discussion:
>>
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html
>>
>> According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to
>> request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing VM
>> kernel about external force stop.
>>
>> Thanks,
>>
>> Heyi
> Hi Heyi,
>
> Apologies for having forgotten about that thread. I recall now having
> followed it last March. Indeed it even addresses the issue in the way
> we're coming around to now (save/restore vs. update with virtual time).

Never mind :) We were also blocked by this issue when trying to support 
VM suspend/resume, save/restore, etc, so I'm happy to see your patches 
that we have the chance to fix it.

>
> I just reread the whole thread, and my feeling is that, while there are
> still many issues left to work, until we get a pvclock for arm, a patch
> like this one, but with a way to opt-in/out in order to give users a
> chance to choose their poison, is the best we can do. Also a patch like
> this one is a step in the right direction, as it will be needed as part
> of the bigger pvclock solution eventually, just as it is for x86.

Yes, it is simple and it works with current version of ARM64 OS 
distributions.

Thanks,

Heyi

>
> One comment on the patch is that I would prefer to do the save/restore
> for all VCPUs, even though KVM does currently handle synchronization.
> Not only does it "feel" more correct to apply VCPU APIs to all VCPUs,
> but I assume it will be less problematic to implement CPU hotplug at
> some point.
>
> Thanks,
> drew
>
>
> .

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Masayoshi Mizuma 4 years, 6 months ago

Hi Drew,

Thank you for posting the patches, they seems to work well
because the softlockup is gone and the timestamp jump of
dmesg and ftrace record also disappeared after the guest
is virsh resume'ed.

Let me add comments.
I think the kvm-adjvtime behavior should be the default.
How about introducing 'kvm-noadjvtime' parameter instead?
kvm-noadjvtime is to provide the old behavior.

That is because the time jump occurs timeout for timers even though
the timer doesn't reach the timeout in the guest time.

For example, below flow shows that user and/or kernel sets timer A
for +10 sec and B for +20 sec in Guest, then Guest is suspended and
it passes 60 sec, then Guest is resumed. Timer A and B go off because
the Guest timestamp (TS) is jumped to 63. That is unexpected timer
behavior for Guest.

 Host TS(sec) Guest TS(sec) Event
 ============ ============= =============================
 00           00            Guest: Set timer A for +10 sec
 01           01            Guest: Set timer B for +20 sec
 03           03            Host: virsh suspend Guest
 63           63            Host: virsh resume Guest
                            Guest: Timer A and B go off

I believe kvm-adjvtime behavior is as following. So Time A
and B go off as expected time. So, kvm-adjvtime behavior should
be the default.

 Host TS(sec) Guest TS(sec) Event
 ============ ============= =============================
 00           00            Guest: Set timer A for +10 sec
 01           01            Guest: Set timer B for +20 sec
 03           03            Host: virsh suspend guest
 63           03            Host: virsh resume guest
 70           10            Guest: Timer A goes off
 81           21            Guest: Timer B goes off

Thanks,
Masa

On Wed, Oct 16, 2019 at 04:34:05PM +0200, Andrew Jones wrote:
> v2:
>  - move from RFC status to v1
>  - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
>  - add r-b's from Richard
> 
> 
> This series is inspired by a series[1] posted by Bijan Mottahedeh about
> a year ago.  The problem described in the cover letter of [1] is easily
> reproducible and some users would like to have the option to avoid it.
> However the solution, which is to adjust the virtual counter offset each
> time the VM transitions to the running state, introduces a different
> problem, which is that the virtual and physical counters diverge.  As
> described in the cover letter of [1] this divergence is easily observed
> when comparing the output of `date` and `hwclock` after suspending the
> guest, waiting a while, and then resuming it.  Because this different
> problem may actually be worse for some users, unlike [1], the series
> posted here makes the virtual counter offset adjustment optional and not
> even enabled by default.  Besides the adjustment being optional, this
> series approaches the needed changes differently to apply them in more
> appropriate locations.  Finally, unlike [1], this series doesn't attempt
> to measure "pause time" itself.  Simply using QEMU_CLOCK_VIRTUAL, which
> only ticks when the VM is not stopped, is sufficient.
> 
> I've based this series on the SVE series[2] because we're adding a new
> CPU feature (kvm-adjvtime) and the SVE series introduces CPU feature
> documentation, probing, and tests that we can then immediately apply
> to kvm-adjvtime.
> 
> Additional notes
> ----------------
> 
> Note 1
> ------
> 
> As described above, when running a guest with kvm-adjtime enabled it
> will be less likely the guest OS and guest applications get surprise
> time jumps when they use the virtual counter.  However the counter will
> no longer reflect real time.  It will lag behind.  If this is a problem
> then the guest can resynchronize its time from an external source or
> even from its physical counter.  If the suspend/resume is done with
> libvirt's virsh, and the guest is running the guest agent, then it's
> also possible to use a sequence like this
> 
>  $ virsh suspend $GUEST
>  $ virsh resume $GUEST
>  $ virsh domtime --sync $GUEST
> 
> in order to resynchronize a guest right after the resume.  Of course
> there will still be time when the clock is not right, possibly creating
> confusing timestamps in logs, for example, and the guest must still be
> tolerant to the time synchronizations.
> 
> Note 2
> ------
> 
> Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
> the KVM register ID is not correct.  This cannot be fixed because it's
> UAPI and if the UAPI headers are used then it can't be a problem.
> However, if a userspace attempts to create the ID themselves from the
> register's specification, then they will get KVM_REG_ARM_TIMER_CVAL
> instead, as the _CNT and _CVAL definitions have their register
> parameters swapped.
> 
> Note 3
> ------
> 
> I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
> are the same as kvm64.c. So what could go wrong? Test results would be
> appreciated.
>  
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
> [2] https://patchew.org/QEMU/20191001125845.8793-1-drjones@redhat.com/
> 
> Thanks,
> drew
> 
> 
> Andrew Jones (5):
>   target/arm/kvm64: kvm64 cpus have timer registers
>   timer: arm: Introduce functions to get the host cntfrq
>   target/arm/kvm: Implement cpu feature kvm-adjvtime
>   tests/arm-cpu-features: Check feature default values
>   target/arm/cpu: Add the kvm-adjvtime CPU property
> 
>  docs/arm-cpu-features.rst | 27 +++++++++++++++++-
>  include/qemu/timer.h      | 16 +++++++++++
>  target/arm/cpu.c          |  2 ++
>  target/arm/cpu.h          |  3 ++
>  target/arm/cpu64.c        |  1 +
>  target/arm/kvm.c          | 59 +++++++++++++++++++++++++++++++++++++++
>  target/arm/kvm32.c        |  3 ++
>  target/arm/kvm64.c        |  4 +++
>  target/arm/kvm_arm.h      | 25 +++++++++++++++++
>  target/arm/monitor.c      |  1 +
>  tests/arm-cpu-features.c  | 48 +++++++++++++++++++++++++------
>  11 files changed, 179 insertions(+), 10 deletions(-)
> 
> -- 
> 2.21.0
> 
>

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Andrew Jones 4 years, 6 months ago

On Thu, Oct 17, 2019 at 05:17:59PM -0400, Masayoshi Mizuma wrote:
> Hi Drew,
> 
> Thank you for posting the patches, they seems to work well
> because the softlockup is gone and the timestamp jump of
> dmesg and ftrace record also disappeared after the guest
> is virsh resume'ed.
> 
> Let me add comments.
> I think the kvm-adjvtime behavior should be the default.
> How about introducing 'kvm-noadjvtime' parameter instead?
> kvm-noadjvtime is to provide the old behavior.
> 
> That is because the time jump occurs timeout for timers even though
> the timer doesn't reach the timeout in the guest time.
> 
> For example, below flow shows that user and/or kernel sets timer A
> for +10 sec and B for +20 sec in Guest, then Guest is suspended and
> it passes 60 sec, then Guest is resumed. Timer A and B go off because
> the Guest timestamp (TS) is jumped to 63. That is unexpected timer
> behavior for Guest.
> 
>  Host TS(sec) Guest TS(sec) Event
>  ============ ============= =============================
>  00           00            Guest: Set timer A for +10 sec
>  01           01            Guest: Set timer B for +20 sec
>  03           03            Host: virsh suspend Guest
>  63           63            Host: virsh resume Guest
>                             Guest: Timer A and B go off
> 
> I believe kvm-adjvtime behavior is as following. So Time A
> and B go off as expected time. So, kvm-adjvtime behavior should
> be the default.
> 
>  Host TS(sec) Guest TS(sec) Event
>  ============ ============= =============================
>  00           00            Guest: Set timer A for +10 sec
>  01           01            Guest: Set timer B for +20 sec
>  03           03            Host: virsh suspend guest
>  63           03            Host: virsh resume guest
>  70           10            Guest: Timer A goes off
>  81           21            Guest: Timer B goes off
> 

Thanks for the testing Masa. Your timer test is another good example of
what can happen when a guest is paused. I'm sure there are many other ways
a pause could be problematic as well, especially if the guest has devices
passed through to it that it's actively using. I also don't expect
kvm-adjvtime to solve all those problems (like, for example, potential
problems with passthrough devices, networks, etc.)  This means that guest
pausing should only be done by host admins that are intimately familiar
with the guest OS, workload, and network connections. They should be sure
that it can tolerate and recover from the pauses. Since the admins need to
make the decision to pause at all, then I think it's fair for them to also
decide if they want to try and mitigate some of the issues with
kvm-adjvtime, i.e. require them to enable it, rather than have it on by
default. Also, if we choose to enable it by default, then we'll need to
deal with the compatibility issues that come with changing a behavior.
That's not impossible, as this feature could be disabled for older
machine types, but it's messy.

All that said, I won't argue too hard against kvm-adjvtime being on by
default, but let's see if others like Peter or Marc want to chime in on
it too.

Thanks,
drew

Re: [PATCH v1 0/5] target/arm/kvm: Provide an option to adjust virtual time

Posted by Masayoshi Mizuma 4 years, 6 months ago

On Fri, Oct 18, 2019 at 02:02:52PM +0200, Andrew Jones wrote:
> On Thu, Oct 17, 2019 at 05:17:59PM -0400, Masayoshi Mizuma wrote:
> > Hi Drew,
> > 
> > Thank you for posting the patches, they seems to work well
> > because the softlockup is gone and the timestamp jump of
> > dmesg and ftrace record also disappeared after the guest
> > is virsh resume'ed.
> > 
> > Let me add comments.
> > I think the kvm-adjvtime behavior should be the default.
> > How about introducing 'kvm-noadjvtime' parameter instead?
> > kvm-noadjvtime is to provide the old behavior.
> > 
> > That is because the time jump occurs timeout for timers even though
> > the timer doesn't reach the timeout in the guest time.
> > 
> > For example, below flow shows that user and/or kernel sets timer A
> > for +10 sec and B for +20 sec in Guest, then Guest is suspended and
> > it passes 60 sec, then Guest is resumed. Timer A and B go off because
> > the Guest timestamp (TS) is jumped to 63. That is unexpected timer
> > behavior for Guest.
> > 
> >  Host TS(sec) Guest TS(sec) Event
> >  ============ ============= =============================
> >  00           00            Guest: Set timer A for +10 sec
> >  01           01            Guest: Set timer B for +20 sec
> >  03           03            Host: virsh suspend Guest
> >  63           63            Host: virsh resume Guest
> >                             Guest: Timer A and B go off
> > 
> > I believe kvm-adjvtime behavior is as following. So Time A
> > and B go off as expected time. So, kvm-adjvtime behavior should
> > be the default.
> > 
> >  Host TS(sec) Guest TS(sec) Event
> >  ============ ============= =============================
> >  00           00            Guest: Set timer A for +10 sec
> >  01           01            Guest: Set timer B for +20 sec
> >  03           03            Host: virsh suspend guest
> >  63           03            Host: virsh resume guest
> >  70           10            Guest: Timer A goes off
> >  81           21            Guest: Timer B goes off
> > 
> 
> Thanks for the testing Masa. Your timer test is another good example of
> what can happen when a guest is paused. I'm sure there are many other ways
> a pause could be problematic as well, especially if the guest has devices
> passed through to it that it's actively using. I also don't expect
> kvm-adjvtime to solve all those problems (like, for example, potential
> problems with passthrough devices, networks, etc.)  This means that guest
> pausing should only be done by host admins that are intimately familiar
> with the guest OS, workload, and network connections. They should be sure
> that it can tolerate and recover from the pauses. Since the admins need to
> make the decision to pause at all, then I think it's fair for them to also
> decide if they want to try and mitigate some of the issues with
> kvm-adjvtime, i.e. require them to enable it, rather than have it on by
> default. 

make sense to me, thanks.

>          Also, if we choose to enable it by default, then we'll need to
> deal with the compatibility issues that come with changing a behavior.
> That's not impossible, as this feature could be disabled for older
> machine types, but it's messy.

I agree with you, we shouldn't add such messy codes to resolve
the compatibility issues...

> 
> All that said, I won't argue too hard against kvm-adjvtime being on by
> default, but let's see if others like Peter or Marc want to chime in on
> it too.

Yeah, I look forward to their comments.

Thanks,
Masa