docs/arm-cpu-features.rst | 27 +++++++++++++++++- include/qemu/timer.h | 16 +++++++++++ target/arm/cpu.c | 2 ++ target/arm/cpu.h | 3 ++ target/arm/cpu64.c | 1 + target/arm/kvm.c | 59 +++++++++++++++++++++++++++++++++++++++ target/arm/kvm32.c | 3 ++ target/arm/kvm64.c | 4 +++ target/arm/kvm_arm.h | 25 +++++++++++++++++ target/arm/monitor.c | 1 + tests/arm-cpu-features.c | 48 +++++++++++++++++++++++++------ 11 files changed, 179 insertions(+), 10 deletions(-)
v2: - move from RFC status to v1 - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c - add r-b's from Richard This series is inspired by a series[1] posted by Bijan Mottahedeh about a year ago. The problem described in the cover letter of [1] is easily reproducible and some users would like to have the option to avoid it. However the solution, which is to adjust the virtual counter offset each time the VM transitions to the running state, introduces a different problem, which is that the virtual and physical counters diverge. As described in the cover letter of [1] this divergence is easily observed when comparing the output of `date` and `hwclock` after suspending the guest, waiting a while, and then resuming it. Because this different problem may actually be worse for some users, unlike [1], the series posted here makes the virtual counter offset adjustment optional and not even enabled by default. Besides the adjustment being optional, this series approaches the needed changes differently to apply them in more appropriate locations. Finally, unlike [1], this series doesn't attempt to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which only ticks when the VM is not stopped, is sufficient. I've based this series on the SVE series[2] because we're adding a new CPU feature (kvm-adjvtime) and the SVE series introduces CPU feature documentation, probing, and tests that we can then immediately apply to kvm-adjvtime. Additional notes ---------------- Note 1 ------ As described above, when running a guest with kvm-adjtime enabled it will be less likely the guest OS and guest applications get surprise time jumps when they use the virtual counter. However the counter will no longer reflect real time. It will lag behind. If this is a problem then the guest can resynchronize its time from an external source or even from its physical counter. If the suspend/resume is done with libvirt's virsh, and the guest is running the guest agent, then it's also possible to use a sequence like this $ virsh suspend $GUEST $ virsh resume $GUEST $ virsh domtime --sync $GUEST in order to resynchronize a guest right after the resume. Of course there will still be time when the clock is not right, possibly creating confusing timestamps in logs, for example, and the guest must still be tolerant to the time synchronizations. Note 2 ------ Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that the KVM register ID is not correct. This cannot be fixed because it's UAPI and if the UAPI headers are used then it can't be a problem. However, if a userspace attempts to create the ID themselves from the register's specification, then they will get KVM_REG_ARM_TIMER_CVAL instead, as the _CNT and _CVAL definitions have their register parameters swapped. Note 3 ------ I didn't test this with a 32-bit KVM host, but the changes to kvm32.c are the same as kvm64.c. So what could go wrong? Test results would be appreciated. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html [2] https://patchew.org/QEMU/20191001125845.8793-1-drjones@redhat.com/ Thanks, drew Andrew Jones (5): target/arm/kvm64: kvm64 cpus have timer registers timer: arm: Introduce functions to get the host cntfrq target/arm/kvm: Implement cpu feature kvm-adjvtime tests/arm-cpu-features: Check feature default values target/arm/cpu: Add the kvm-adjvtime CPU property docs/arm-cpu-features.rst | 27 +++++++++++++++++- include/qemu/timer.h | 16 +++++++++++ target/arm/cpu.c | 2 ++ target/arm/cpu.h | 3 ++ target/arm/cpu64.c | 1 + target/arm/kvm.c | 59 +++++++++++++++++++++++++++++++++++++++ target/arm/kvm32.c | 3 ++ target/arm/kvm64.c | 4 +++ target/arm/kvm_arm.h | 25 +++++++++++++++++ target/arm/monitor.c | 1 + tests/arm-cpu-features.c | 48 +++++++++++++++++++++++++------ 11 files changed, 179 insertions(+), 10 deletions(-) -- 2.21.0
On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > v2: > - move from RFC status to v1 > - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c > - add r-b's from Richard > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > a year ago. The problem described in the cover letter of [1] is easily > reproducible and some users would like to have the option to avoid it. > However the solution, which is to adjust the virtual counter offset each > time the VM transitions to the running state, introduces a different > problem, which is that the virtual and physical counters diverge. As > described in the cover letter of [1] this divergence is easily observed > when comparing the output of `date` and `hwclock` after suspending the > guest, waiting a while, and then resuming it. Because this different > problem may actually be worse for some users, unlike [1], the series > posted here makes the virtual counter offset adjustment optional and not > even enabled by default. Besides the adjustment being optional, this > series approaches the needed changes differently to apply them in more > appropriate locations. Finally, unlike [1], this series doesn't attempt > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > only ticks when the VM is not stopped, is sufficient. So I guess my overall question is "what is the x86 solution to this problem, and why is this all arm-specific?" It would also be helpful to know how it fits into all the other proposals regarding time in VMs. thanks -- PMM
On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote: > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > > > v2: > > - move from RFC status to v1 > > - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c > > - add r-b's from Richard > > > > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > > a year ago. The problem described in the cover letter of [1] is easily > > reproducible and some users would like to have the option to avoid it. > > However the solution, which is to adjust the virtual counter offset each > > time the VM transitions to the running state, introduces a different > > problem, which is that the virtual and physical counters diverge. As > > described in the cover letter of [1] this divergence is easily observed > > when comparing the output of `date` and `hwclock` after suspending the > > guest, waiting a while, and then resuming it. Because this different > > problem may actually be worse for some users, unlike [1], the series > > posted here makes the virtual counter offset adjustment optional and not > > even enabled by default. Besides the adjustment being optional, this > > series approaches the needed changes differently to apply them in more > > appropriate locations. Finally, unlike [1], this series doesn't attempt > > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > > only ticks when the VM is not stopped, is sufficient. > > So I guess my overall question is "what is the x86 solution to > this problem, and why is this all arm-specific?" It would also x86 adjusts the counter offset by default, and I don't think there's any way to turn that behavior off. I think it's too late to follow that default for arm, but this series provides a way to opt into the same behavior. > be helpful to know how it fits into all the other proposals regarding > time in VMs. I've been lightly following the other stuff, but haven't yet seen any overlap. BTW, this series needs to be rebased and reposted. I've been waiting for 4.2 though. Thanks, drew
On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote: > On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote: > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > > > a year ago. The problem described in the cover letter of [1] is easily > > > reproducible and some users would like to have the option to avoid it. > > > However the solution, which is to adjust the virtual counter offset each > > > time the VM transitions to the running state, introduces a different > > > problem, which is that the virtual and physical counters diverge. As > > > described in the cover letter of [1] this divergence is easily observed > > > when comparing the output of `date` and `hwclock` after suspending the > > > guest, waiting a while, and then resuming it. Because this different > > > problem may actually be worse for some users, unlike [1], the series > > > posted here makes the virtual counter offset adjustment optional and not > > > even enabled by default. Besides the adjustment being optional, this > > > series approaches the needed changes differently to apply them in more > > > appropriate locations. Finally, unlike [1], this series doesn't attempt > > > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > > > only ticks when the VM is not stopped, is sufficient. > > > > So I guess my overall question is "what is the x86 solution to > > this problem, and why is this all arm-specific?" It would also > > x86 adjusts the counter offset by default, and I don't think there's any > way to turn that behavior off. I think it's too late to follow that > default for arm, but this series provides a way to opt into the same > behavior. My understanding is that turning kvm-adjvtime either on or off results in a different set of advantages and drawbacks, with neither begin a one-size-fits-all solution. So it's good that we offer a way for the user to pick one or the other based on their specific needs. The main difference is that kvm-adjvtime=on matches x86's behavior, which is fairly well understood and thoroughly documented, along with the corresponding workarounds, dos and don'ts. With that in mind, in my opinion it would make sense to make kvm-adjvtime=on the behavior for newer machine types so that people coming from an x86 background and/or having to manage both at the same time will get a consistent experience and will be able to draw, even for aarch64, on the considerable amount of existing x86-centric literature on the subject. -- Andrea Bolognani / Red Hat / Virtualization
On Tue, 10 Dec 2019 at 09:51, Andrea Bolognani <abologna@redhat.com> wrote: > > On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote: > > On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote: > > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > > > > a year ago. The problem described in the cover letter of [1] is easily > > > > reproducible and some users would like to have the option to avoid it. > > > > However the solution, which is to adjust the virtual counter offset each > > > > time the VM transitions to the running state, introduces a different > > > > problem, which is that the virtual and physical counters diverge. As > > > > described in the cover letter of [1] this divergence is easily observed > > > > when comparing the output of `date` and `hwclock` after suspending the > > > > guest, waiting a while, and then resuming it. Because this different > > > > problem may actually be worse for some users, unlike [1], the series > > > > posted here makes the virtual counter offset adjustment optional and not > > > > even enabled by default. Besides the adjustment being optional, this > > > > series approaches the needed changes differently to apply them in more > > > > appropriate locations. Finally, unlike [1], this series doesn't attempt > > > > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > > > > only ticks when the VM is not stopped, is sufficient. > > > > > > So I guess my overall question is "what is the x86 solution to > > > this problem, and why is this all arm-specific?" It would also > > > > x86 adjusts the counter offset by default, and I don't think there's any > > way to turn that behavior off. I think it's too late to follow that > > default for arm, but this series provides a way to opt into the same > > behavior. > > My understanding is that turning kvm-adjvtime either on or off > results in a different set of advantages and drawbacks, with neither > begin a one-size-fits-all solution. So it's good that we offer a way > for the user to pick one or the other based on their specific needs. If this is the case, shouldn't we be looking at having the option exist for all architectures, not just arm? Obviously pre-existing behaviour would imply having the default have to differ for some archs/machines. thanks -- PMM
On Tue, Dec 10, 2019 at 10:29:22AM +0000, Peter Maydell wrote: > On Tue, 10 Dec 2019 at 09:51, Andrea Bolognani <abologna@redhat.com> wrote: > > > > On Fri, 2019-12-06 at 16:53 +0100, Andrew Jones wrote: > > > On Fri, Dec 06, 2019 at 03:22:58PM +0000, Peter Maydell wrote: > > > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > > > > > a year ago. The problem described in the cover letter of [1] is easily > > > > > reproducible and some users would like to have the option to avoid it. > > > > > However the solution, which is to adjust the virtual counter offset each > > > > > time the VM transitions to the running state, introduces a different > > > > > problem, which is that the virtual and physical counters diverge. As > > > > > described in the cover letter of [1] this divergence is easily observed > > > > > when comparing the output of `date` and `hwclock` after suspending the > > > > > guest, waiting a while, and then resuming it. Because this different > > > > > problem may actually be worse for some users, unlike [1], the series > > > > > posted here makes the virtual counter offset adjustment optional and not > > > > > even enabled by default. Besides the adjustment being optional, this > > > > > series approaches the needed changes differently to apply them in more > > > > > appropriate locations. Finally, unlike [1], this series doesn't attempt > > > > > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > > > > > only ticks when the VM is not stopped, is sufficient. > > > > > > > > So I guess my overall question is "what is the x86 solution to > > > > this problem, and why is this all arm-specific?" It would also > > > > > > x86 adjusts the counter offset by default, and I don't think there's any > > > way to turn that behavior off. I think it's too late to follow that > > > default for arm, but this series provides a way to opt into the same > > > behavior. > > > > My understanding is that turning kvm-adjvtime either on or off > > results in a different set of advantages and drawbacks, with neither > > begin a one-size-fits-all solution. So it's good that we offer a way > > for the user to pick one or the other based on their specific needs. > > If this is the case, shouldn't we be looking at having the > option exist for all architectures, not just arm? Obviously > pre-existing behaviour would imply having the default have > to differ for some archs/machines. > x86 developers could easily add this feature if/when they need a way to disable their current default behavior. But, since the kvm-adjvtime default would likely be 'on' for them, then they'd probably prefer the feature be named kvm-no-adjvtime, and default 'off'. Should we try to anticipate what x86 might want when naming this feature? IMO, we should not, especially because I'm doubtful that x86 will ever want to implement it. Also, what about the other KVM capable architectures? Which defaults do they have now? And do we expect them to want to expose a switch to the user to change it? OTOH, I agree with Andrea that it would be nice if arm had the same default as x86, allowing the documentation regarding this stuff to apply to both. If we did choose to turn this feature on by default for virt-5.0, then maybe we should introduce the feature with the name kvm-no-adjvtime instead. Thanks, drew
On Tue, 10 Dec 2019 at 11:05, Andrew Jones <drjones@redhat.com> wrote: > x86 developers could easily add this feature if/when they need a way to > disable their current default behavior. But, since the kvm-adjvtime > default would likely be 'on' for them, then they'd probably prefer the > feature be named kvm-no-adjvtime, and default 'off'. Should we try to > anticipate what x86 might want when naming this feature? IMO, we should > not, especially because I'm doubtful that x86 will ever want to implement > it. Also, what about the other KVM capable architectures? Which defaults > do they have now? And do we expect them to want to expose a switch to the > user to change it? My perspective here is mostly that I don't really understand the ins and outs of KVM and in particular handling of time in VMs, beyond knowing that it's complicated. So I prefer approaches that push back to "do everything the same for all architectures rather than having something that's arm-specific", because then things get more review from the larger mass of non-arm KVM/QEMU developers. Arm-specific switches/interfaces/designs just make arm more of a special-snowflake. I don't really have a basis to be able to review the patchset beyond those general biases. thanks -- PMM
On Tue, Dec 10, 2019 at 11:48:38AM +0000, Peter Maydell wrote: > On Tue, 10 Dec 2019 at 11:05, Andrew Jones <drjones@redhat.com> wrote: > > x86 developers could easily add this feature if/when they need a way to > > disable their current default behavior. But, since the kvm-adjvtime > > default would likely be 'on' for them, then they'd probably prefer the > > feature be named kvm-no-adjvtime, and default 'off'. Should we try to > > anticipate what x86 might want when naming this feature? IMO, we should > > not, especially because I'm doubtful that x86 will ever want to implement > > it. Also, what about the other KVM capable architectures? Which defaults > > do they have now? And do we expect them to want to expose a switch to the > > user to change it? > > My perspective here is mostly that I don't really understand > the ins and outs of KVM and in particular handling of > time in VMs, beyond knowing that it's complicated. So I > prefer approaches that push back to "do everything the same > for all architectures rather than having something that's > arm-specific", because then things get more review from > the larger mass of non-arm KVM/QEMU developers. Arm-specific > switches/interfaces/designs just make arm more of a > special-snowflake. I don't really have a basis to be able > to review the patchset beyond those general biases. > So the ins and outs of this particular timekeeping issue (to the best of my knowledge) is that x86 has implemented this behavior since 00f4d64ee76e ("kvmclock: clock should count only if vm is running"), which was committed over six years ago. Possibly x86 VM time would behave more like arm VM time if kvmclock was disabled, but that's not a recommended configuration. PPC got an equivalent patch to the x86 one in 2017, 42043e4f1241 ("spapr: clock should count only if vm is running"), but when stopping time during pause on spapr they actually *keep* 'date' and 'hwclock' in synch. I guess whatever clocksource 'hwclock' uses on spapr was already stopping when paused? For x86 those values diverge, and for arm without this series they stay the same but experience jumps, and with this series they diverge like x86. I don't see any way to disable the behaviour 42043e4f1241 introduces. s390x got what appears to be its equivalent patch last year 9bc9d3d1ae3b ("s390x/tod: Properly stop the KVM TOD while the guest is not running"). The commit message doesn't state how hwclock and date values change / don't change, and I don't see any way to disable the behavior. MIPS has had this implemented since KVM support was introduced. No way to disable it that I know of. So why is this arm-specific? arm is just trying to catch up, but also offer a switch allowing the current behavior to be selected. If other architectures see value in the switch then they're free to adopt it too. After having done this git mining, it looks more and more like we should at least consider naming this feature 'kvm-no-adjvtime' and probably also change arm's default. Thanks, drew
On Tue, 2019-12-10 at 14:32 +0100, Andrew Jones wrote: > After having done this git mining, it looks more and more like we should > at least consider naming this feature 'kvm-no-adjvtime' and probably > also change arm's default. I agree with everything except the naming: why would kvm-no-adjvtime=off vtime is adjusted (default) kvm-no-adjvtime=on vtime is not adjusted be better than kvm-adjvtime=on vtime is adjusted (default) kvm-adjvtime=off vtime is not adjusted ? Both offer the exact same amount of flexibility, but the latter has the advantage of not introducing any unwieldy double negatives. -- Andrea Bolognani / Red Hat / Virtualization
On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote: > On Tue, 2019-12-10 at 14:32 +0100, Andrew Jones wrote: > > After having done this git mining, it looks more and more like we should > > at least consider naming this feature 'kvm-no-adjvtime' and probably > > also change arm's default. > > I agree with everything except the naming: why would > > kvm-no-adjvtime=off vtime is adjusted (default) > kvm-no-adjvtime=on vtime is not adjusted > > be better than > > kvm-adjvtime=on vtime is adjusted (default) > kvm-adjvtime=off vtime is not adjusted > > ? Both offer the exact same amount of flexibility, but the latter has > the advantage of not introducing any unwieldy double negatives. > A default of 'off' == 0 means not setting anything at all. There's already precedent for 'kvm-no*' prefixed cpu features, kvm-no-smi-migration kvm-nopiodelay (Unless there's something called a 'nopio' then I guess that one is missing a -, but whatever...) Thanks, drew
On Tue, 2019-12-10 at 15:33 +0100, Andrew Jones wrote: > On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote: > > I agree with everything except the naming: why would > > > > kvm-no-adjvtime=off vtime is adjusted (default) > > kvm-no-adjvtime=on vtime is not adjusted > > > > be better than > > > > kvm-adjvtime=on vtime is adjusted (default) > > kvm-adjvtime=off vtime is not adjusted > > > > ? Both offer the exact same amount of flexibility, but the latter has > > the advantage of not introducing any unwieldy double negatives. > > A default of 'off' == 0 means not setting anything at all. There's > already precedent for 'kvm-no*' prefixed cpu features, > > kvm-no-smi-migration > kvm-nopiodelay Sorry, I'm not sure I understand. Do you mean that the array where you store CPU features is 0-inizialized, so it's more convenient to have off (0) as the default because it means you don't have to touch it beforehand? Or something different? -- Andrea Bolognani / Red Hat / Virtualization
On Tue, Dec 10, 2019 at 04:47:49PM +0100, Andrea Bolognani wrote: > On Tue, 2019-12-10 at 15:33 +0100, Andrew Jones wrote: > > On Tue, Dec 10, 2019 at 03:21:02PM +0100, Andrea Bolognani wrote: > > > I agree with everything except the naming: why would > > > > > > kvm-no-adjvtime=off vtime is adjusted (default) > > > kvm-no-adjvtime=on vtime is not adjusted > > > > > > be better than > > > > > > kvm-adjvtime=on vtime is adjusted (default) > > > kvm-adjvtime=off vtime is not adjusted > > > > > > ? Both offer the exact same amount of flexibility, but the latter has > > > the advantage of not introducing any unwieldy double negatives. > > > > A default of 'off' == 0 means not setting anything at all. There's > > already precedent for 'kvm-no*' prefixed cpu features, > > > > kvm-no-smi-migration > > kvm-nopiodelay > > Sorry, I'm not sure I understand. Do you mean that the array where > you store CPU features is 0-inizialized, so it's more convenient to > have off (0) as the default because it means you don't have to touch > it beforehand? Or something different? > Right. The CPU feature flag (a boolean member of the CPU state) will be zero by default because C. It's not a big deal, though, because the property default can easily be set to true while it's added to a cpu type. I don't have a strong enough opinion about kvm-adjvtime vs. kvm-no-adjvtime to insist one way or another. I agree double inversions are easier to mess up, but I also like the way the '-no-' better communicates that the default is [probably] 'yes'. All interested parties, please vote. I'll be sending v2 soon and I can call this thing anything the majority (or the dominate minority) prefer. Thanks, drew
On Tue, 2019-12-10 at 17:08 +0100, Andrew Jones wrote: > I don't have a strong enough opinion about kvm-adjvtime vs. > kvm-no-adjvtime to insist one way or another. I agree double inversions > are easier to mess up, but I also like the way the '-no-' better > communicates that the default is [probably] 'yes'. > > All interested parties, please vote. I'll be sending v2 soon and I can > call this thing anything the majority (or the dominate minority) prefer. I like kvm-adjvtime better because it avoids the double negative, but on the other hand if the new default is be to adjust the time then I don't expect many people will actually need to use the parameter, so the name doesn't matter that much after all :) -- Andrea Bolognani / Red Hat / Virtualization
On Tue, 10 Dec 2019 at 13:33, Andrew Jones <drjones@redhat.com> wrote: > So the ins and outs of this particular timekeeping issue (to the best of > my knowledge) is that x86 has implemented this behavior since > 00f4d64ee76e ("kvmclock: clock should count only if vm is running"), which > was committed over six years ago. Possibly x86 VM time would behave more > like arm VM time if kvmclock was disabled, but that's not a recommended > configuration. > > PPC got an equivalent patch to the x86 one in 2017, 42043e4f1241 ("spapr: > clock should count only if vm is running"), but when stopping time during > pause on spapr they actually *keep* 'date' and 'hwclock' in synch. I guess > whatever clocksource 'hwclock' uses on spapr was already stopping when > paused? For x86 those values diverge, and for arm without this series they > stay the same but experience jumps, and with this series they diverge like > x86. I don't see any way to disable the behaviour 42043e4f1241 introduces. > > s390x got what appears to be its equivalent patch last year 9bc9d3d1ae3b > ("s390x/tod: Properly stop the KVM TOD while the guest is not running"). > The commit message doesn't state how hwclock and date values change / > don't change, and I don't see any way to disable the behavior. > > MIPS has had this implemented since KVM support was introduced. No way > to disable it that I know of. > > So why is this arm-specific? arm is just trying to catch up, but also > offer a switch allowing the current behavior to be selected. If other > architectures see value in the switch then they're free to adopt it too. > After having done this git mining, it looks more and more like we should > at least consider naming this feature 'kvm-no-adjvtime' and probably > also change arm's default. Thanks for pulling up the handling by other architectures. I think I agree that we should change the arm default (ie we should just call this a bug fix, since the old behaviour seems unhelpful generally and is more random accident than a deliberate choice), with a switch provided just in case anybody had something depending on the old behaviour. -- PMM
在 2019/12/6 23:22, Peter Maydell 写道: > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: >> v2: >> - move from RFC status to v1 >> - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c >> - add r-b's from Richard >> >> >> This series is inspired by a series[1] posted by Bijan Mottahedeh about >> a year ago. The problem described in the cover letter of [1] is easily >> reproducible and some users would like to have the option to avoid it. >> However the solution, which is to adjust the virtual counter offset each >> time the VM transitions to the running state, introduces a different >> problem, which is that the virtual and physical counters diverge. As >> described in the cover letter of [1] this divergence is easily observed >> when comparing the output of `date` and `hwclock` after suspending the >> guest, waiting a while, and then resuming it. Because this different >> problem may actually be worse for some users, unlike [1], the series >> posted here makes the virtual counter offset adjustment optional and not >> even enabled by default. Besides the adjustment being optional, this >> series approaches the needed changes differently to apply them in more >> appropriate locations. Finally, unlike [1], this series doesn't attempt >> to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which >> only ticks when the VM is not stopped, is sufficient. > So I guess my overall question is "what is the x86 solution to > this problem, and why is this all arm-specific?" It would also > be helpful to know how it fits into all the other proposals regarding > time in VMs. I also sent a RFC in March and ARM KVM experts were also invoved in the discussion: https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing VM kernel about external force stop. Thanks, Heyi > > thanks > -- PMM > >
On Wed, Dec 11, 2019 at 04:02:52PM +0800, Guoheyi wrote: > > 在 2019/12/6 23:22, Peter Maydell 写道: > > On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: > > > v2: > > > - move from RFC status to v1 > > > - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c > > > - add r-b's from Richard > > > > > > > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > > > a year ago. The problem described in the cover letter of [1] is easily > > > reproducible and some users would like to have the option to avoid it. > > > However the solution, which is to adjust the virtual counter offset each > > > time the VM transitions to the running state, introduces a different > > > problem, which is that the virtual and physical counters diverge. As > > > described in the cover letter of [1] this divergence is easily observed > > > when comparing the output of `date` and `hwclock` after suspending the > > > guest, waiting a while, and then resuming it. Because this different > > > problem may actually be worse for some users, unlike [1], the series > > > posted here makes the virtual counter offset adjustment optional and not > > > even enabled by default. Besides the adjustment being optional, this > > > series approaches the needed changes differently to apply them in more > > > appropriate locations. Finally, unlike [1], this series doesn't attempt > > > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > > > only ticks when the VM is not stopped, is sufficient. > > So I guess my overall question is "what is the x86 solution to > > this problem, and why is this all arm-specific?" It would also > > be helpful to know how it fits into all the other proposals regarding > > time in VMs. > > I also sent a RFC in March and ARM KVM experts were also invoved in the > discussion: > > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html > > According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to > request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing VM > kernel about external force stop. > > Thanks, > > Heyi Hi Heyi, Apologies for having forgotten about that thread. I recall now having followed it last March. Indeed it even addresses the issue in the way we're coming around to now (save/restore vs. update with virtual time). I just reread the whole thread, and my feeling is that, while there are still many issues left to work, until we get a pvclock for arm, a patch like this one, but with a way to opt-in/out in order to give users a chance to choose their poison, is the best we can do. Also a patch like this one is a step in the right direction, as it will be needed as part of the bigger pvclock solution eventually, just as it is for x86. One comment on the patch is that I would prefer to do the save/restore for all VCPUs, even though KVM does currently handle synchronization. Not only does it "feel" more correct to apply VCPU APIs to all VCPUs, but I assume it will be less problematic to implement CPU hotplug at some point. Thanks, drew
在 2019/12/11 17:00, Andrew Jones 写道: > On Wed, Dec 11, 2019 at 04:02:52PM +0800, Guoheyi wrote: >> 在 2019/12/6 23:22, Peter Maydell 写道: >>> On Wed, 16 Oct 2019 at 15:34, Andrew Jones <drjones@redhat.com> wrote: >>>> v2: >>>> - move from RFC status to v1 >>>> - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c >>>> - add r-b's from Richard >>>> >>>> >>>> This series is inspired by a series[1] posted by Bijan Mottahedeh about >>>> a year ago. The problem described in the cover letter of [1] is easily >>>> reproducible and some users would like to have the option to avoid it. >>>> However the solution, which is to adjust the virtual counter offset each >>>> time the VM transitions to the running state, introduces a different >>>> problem, which is that the virtual and physical counters diverge. As >>>> described in the cover letter of [1] this divergence is easily observed >>>> when comparing the output of `date` and `hwclock` after suspending the >>>> guest, waiting a while, and then resuming it. Because this different >>>> problem may actually be worse for some users, unlike [1], the series >>>> posted here makes the virtual counter offset adjustment optional and not >>>> even enabled by default. Besides the adjustment being optional, this >>>> series approaches the needed changes differently to apply them in more >>>> appropriate locations. Finally, unlike [1], this series doesn't attempt >>>> to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which >>>> only ticks when the VM is not stopped, is sufficient. >>> So I guess my overall question is "what is the x86 solution to >>> this problem, and why is this all arm-specific?" It would also >>> be helpful to know how it fits into all the other proposals regarding >>> time in VMs. >> I also sent a RFC in March and ARM KVM experts were also invoved in the >> discussion: >> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035026.html >> >> According to the discussion, qemu on x86 is using KVM_KVMCLOCK_CTRL to >> request KVM to set a flag "PVCLOCK_GUEST_STOPPED" in pvclock, informing VM >> kernel about external force stop. >> >> Thanks, >> >> Heyi > Hi Heyi, > > Apologies for having forgotten about that thread. I recall now having > followed it last March. Indeed it even addresses the issue in the way > we're coming around to now (save/restore vs. update with virtual time). Never mind :) We were also blocked by this issue when trying to support VM suspend/resume, save/restore, etc, so I'm happy to see your patches that we have the chance to fix it. > > I just reread the whole thread, and my feeling is that, while there are > still many issues left to work, until we get a pvclock for arm, a patch > like this one, but with a way to opt-in/out in order to give users a > chance to choose their poison, is the best we can do. Also a patch like > this one is a step in the right direction, as it will be needed as part > of the bigger pvclock solution eventually, just as it is for x86. Yes, it is simple and it works with current version of ARM64 OS distributions. Thanks, Heyi > > One comment on the patch is that I would prefer to do the save/restore > for all VCPUs, even though KVM does currently handle synchronization. > Not only does it "feel" more correct to apply VCPU APIs to all VCPUs, > but I assume it will be less problematic to implement CPU hotplug at > some point. > > Thanks, > drew > > > .
Hi Drew, Thank you for posting the patches, they seems to work well because the softlockup is gone and the timestamp jump of dmesg and ftrace record also disappeared after the guest is virsh resume'ed. Let me add comments. I think the kvm-adjvtime behavior should be the default. How about introducing 'kvm-noadjvtime' parameter instead? kvm-noadjvtime is to provide the old behavior. That is because the time jump occurs timeout for timers even though the timer doesn't reach the timeout in the guest time. For example, below flow shows that user and/or kernel sets timer A for +10 sec and B for +20 sec in Guest, then Guest is suspended and it passes 60 sec, then Guest is resumed. Timer A and B go off because the Guest timestamp (TS) is jumped to 63. That is unexpected timer behavior for Guest. Host TS(sec) Guest TS(sec) Event ============ ============= ============================= 00 00 Guest: Set timer A for +10 sec 01 01 Guest: Set timer B for +20 sec 03 03 Host: virsh suspend Guest 63 63 Host: virsh resume Guest Guest: Timer A and B go off I believe kvm-adjvtime behavior is as following. So Time A and B go off as expected time. So, kvm-adjvtime behavior should be the default. Host TS(sec) Guest TS(sec) Event ============ ============= ============================= 00 00 Guest: Set timer A for +10 sec 01 01 Guest: Set timer B for +20 sec 03 03 Host: virsh suspend guest 63 03 Host: virsh resume guest 70 10 Guest: Timer A goes off 81 21 Guest: Timer B goes off Thanks, Masa On Wed, Oct 16, 2019 at 04:34:05PM +0200, Andrew Jones wrote: > v2: > - move from RFC status to v1 > - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c > - add r-b's from Richard > > > This series is inspired by a series[1] posted by Bijan Mottahedeh about > a year ago. The problem described in the cover letter of [1] is easily > reproducible and some users would like to have the option to avoid it. > However the solution, which is to adjust the virtual counter offset each > time the VM transitions to the running state, introduces a different > problem, which is that the virtual and physical counters diverge. As > described in the cover letter of [1] this divergence is easily observed > when comparing the output of `date` and `hwclock` after suspending the > guest, waiting a while, and then resuming it. Because this different > problem may actually be worse for some users, unlike [1], the series > posted here makes the virtual counter offset adjustment optional and not > even enabled by default. Besides the adjustment being optional, this > series approaches the needed changes differently to apply them in more > appropriate locations. Finally, unlike [1], this series doesn't attempt > to measure "pause time" itself. Simply using QEMU_CLOCK_VIRTUAL, which > only ticks when the VM is not stopped, is sufficient. > > I've based this series on the SVE series[2] because we're adding a new > CPU feature (kvm-adjvtime) and the SVE series introduces CPU feature > documentation, probing, and tests that we can then immediately apply > to kvm-adjvtime. > > Additional notes > ---------------- > > Note 1 > ------ > > As described above, when running a guest with kvm-adjtime enabled it > will be less likely the guest OS and guest applications get surprise > time jumps when they use the virtual counter. However the counter will > no longer reflect real time. It will lag behind. If this is a problem > then the guest can resynchronize its time from an external source or > even from its physical counter. If the suspend/resume is done with > libvirt's virsh, and the guest is running the guest agent, then it's > also possible to use a sequence like this > > $ virsh suspend $GUEST > $ virsh resume $GUEST > $ virsh domtime --sync $GUEST > > in order to resynchronize a guest right after the resume. Of course > there will still be time when the clock is not right, possibly creating > confusing timestamps in logs, for example, and the guest must still be > tolerant to the time synchronizations. > > Note 2 > ------ > > Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that > the KVM register ID is not correct. This cannot be fixed because it's > UAPI and if the UAPI headers are used then it can't be a problem. > However, if a userspace attempts to create the ID themselves from the > register's specification, then they will get KVM_REG_ARM_TIMER_CVAL > instead, as the _CNT and _CVAL definitions have their register > parameters swapped. > > Note 3 > ------ > > I didn't test this with a 32-bit KVM host, but the changes to kvm32.c > are the same as kvm64.c. So what could go wrong? Test results would be > appreciated. > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html > [2] https://patchew.org/QEMU/20191001125845.8793-1-drjones@redhat.com/ > > Thanks, > drew > > > Andrew Jones (5): > target/arm/kvm64: kvm64 cpus have timer registers > timer: arm: Introduce functions to get the host cntfrq > target/arm/kvm: Implement cpu feature kvm-adjvtime > tests/arm-cpu-features: Check feature default values > target/arm/cpu: Add the kvm-adjvtime CPU property > > docs/arm-cpu-features.rst | 27 +++++++++++++++++- > include/qemu/timer.h | 16 +++++++++++ > target/arm/cpu.c | 2 ++ > target/arm/cpu.h | 3 ++ > target/arm/cpu64.c | 1 + > target/arm/kvm.c | 59 +++++++++++++++++++++++++++++++++++++++ > target/arm/kvm32.c | 3 ++ > target/arm/kvm64.c | 4 +++ > target/arm/kvm_arm.h | 25 +++++++++++++++++ > target/arm/monitor.c | 1 + > tests/arm-cpu-features.c | 48 +++++++++++++++++++++++++------ > 11 files changed, 179 insertions(+), 10 deletions(-) > > -- > 2.21.0 > >
On Thu, Oct 17, 2019 at 05:17:59PM -0400, Masayoshi Mizuma wrote: > Hi Drew, > > Thank you for posting the patches, they seems to work well > because the softlockup is gone and the timestamp jump of > dmesg and ftrace record also disappeared after the guest > is virsh resume'ed. > > Let me add comments. > I think the kvm-adjvtime behavior should be the default. > How about introducing 'kvm-noadjvtime' parameter instead? > kvm-noadjvtime is to provide the old behavior. > > That is because the time jump occurs timeout for timers even though > the timer doesn't reach the timeout in the guest time. > > For example, below flow shows that user and/or kernel sets timer A > for +10 sec and B for +20 sec in Guest, then Guest is suspended and > it passes 60 sec, then Guest is resumed. Timer A and B go off because > the Guest timestamp (TS) is jumped to 63. That is unexpected timer > behavior for Guest. > > Host TS(sec) Guest TS(sec) Event > ============ ============= ============================= > 00 00 Guest: Set timer A for +10 sec > 01 01 Guest: Set timer B for +20 sec > 03 03 Host: virsh suspend Guest > 63 63 Host: virsh resume Guest > Guest: Timer A and B go off > > I believe kvm-adjvtime behavior is as following. So Time A > and B go off as expected time. So, kvm-adjvtime behavior should > be the default. > > Host TS(sec) Guest TS(sec) Event > ============ ============= ============================= > 00 00 Guest: Set timer A for +10 sec > 01 01 Guest: Set timer B for +20 sec > 03 03 Host: virsh suspend guest > 63 03 Host: virsh resume guest > 70 10 Guest: Timer A goes off > 81 21 Guest: Timer B goes off > Thanks for the testing Masa. Your timer test is another good example of what can happen when a guest is paused. I'm sure there are many other ways a pause could be problematic as well, especially if the guest has devices passed through to it that it's actively using. I also don't expect kvm-adjvtime to solve all those problems (like, for example, potential problems with passthrough devices, networks, etc.) This means that guest pausing should only be done by host admins that are intimately familiar with the guest OS, workload, and network connections. They should be sure that it can tolerate and recover from the pauses. Since the admins need to make the decision to pause at all, then I think it's fair for them to also decide if they want to try and mitigate some of the issues with kvm-adjvtime, i.e. require them to enable it, rather than have it on by default. Also, if we choose to enable it by default, then we'll need to deal with the compatibility issues that come with changing a behavior. That's not impossible, as this feature could be disabled for older machine types, but it's messy. All that said, I won't argue too hard against kvm-adjvtime being on by default, but let's see if others like Peter or Marc want to chime in on it too. Thanks, drew
On Fri, Oct 18, 2019 at 02:02:52PM +0200, Andrew Jones wrote: > On Thu, Oct 17, 2019 at 05:17:59PM -0400, Masayoshi Mizuma wrote: > > Hi Drew, > > > > Thank you for posting the patches, they seems to work well > > because the softlockup is gone and the timestamp jump of > > dmesg and ftrace record also disappeared after the guest > > is virsh resume'ed. > > > > Let me add comments. > > I think the kvm-adjvtime behavior should be the default. > > How about introducing 'kvm-noadjvtime' parameter instead? > > kvm-noadjvtime is to provide the old behavior. > > > > That is because the time jump occurs timeout for timers even though > > the timer doesn't reach the timeout in the guest time. > > > > For example, below flow shows that user and/or kernel sets timer A > > for +10 sec and B for +20 sec in Guest, then Guest is suspended and > > it passes 60 sec, then Guest is resumed. Timer A and B go off because > > the Guest timestamp (TS) is jumped to 63. That is unexpected timer > > behavior for Guest. > > > > Host TS(sec) Guest TS(sec) Event > > ============ ============= ============================= > > 00 00 Guest: Set timer A for +10 sec > > 01 01 Guest: Set timer B for +20 sec > > 03 03 Host: virsh suspend Guest > > 63 63 Host: virsh resume Guest > > Guest: Timer A and B go off > > > > I believe kvm-adjvtime behavior is as following. So Time A > > and B go off as expected time. So, kvm-adjvtime behavior should > > be the default. > > > > Host TS(sec) Guest TS(sec) Event > > ============ ============= ============================= > > 00 00 Guest: Set timer A for +10 sec > > 01 01 Guest: Set timer B for +20 sec > > 03 03 Host: virsh suspend guest > > 63 03 Host: virsh resume guest > > 70 10 Guest: Timer A goes off > > 81 21 Guest: Timer B goes off > > > > Thanks for the testing Masa. Your timer test is another good example of > what can happen when a guest is paused. I'm sure there are many other ways > a pause could be problematic as well, especially if the guest has devices > passed through to it that it's actively using. I also don't expect > kvm-adjvtime to solve all those problems (like, for example, potential > problems with passthrough devices, networks, etc.) This means that guest > pausing should only be done by host admins that are intimately familiar > with the guest OS, workload, and network connections. They should be sure > that it can tolerate and recover from the pauses. Since the admins need to > make the decision to pause at all, then I think it's fair for them to also > decide if they want to try and mitigate some of the issues with > kvm-adjvtime, i.e. require them to enable it, rather than have it on by > default. make sense to me, thanks. > Also, if we choose to enable it by default, then we'll need to > deal with the compatibility issues that come with changing a behavior. > That's not impossible, as this feature could be disabled for older > machine types, but it's messy. I agree with you, we shouldn't add such messy codes to resolve the compatibility issues... > > All that said, I won't argue too hard against kvm-adjvtime being on by > default, but let's see if others like Peter or Marc want to chime in on > it too. Yeah, I look forward to their comments. Thanks, Masa
© 2016 - 2024 Red Hat, Inc.