[Qemu-devel] [RFC QEMU v2 0/2] arm/virt: Account for guest pause time

Bijan Mottahedeh posted 2 patches 5 years, 4 months ago
Test asan passed
Test checkpatch failed
Test docker-quick@centos7 passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/1543352837-21529-1-git-send-email-bijan.mottahedeh@oracle.com
hw/arm/virt.c           | 17 +++++++++++++++++
hw/intc/arm_gicv3_kvm.c | 39 +++++++++++++++++++++++++++++++++++++++
target/arm/cpu.h        |  3 +++
target/arm/helper.c     | 19 ++++++++++++++++---
target/arm/internals.h  |  8 ++++++--
target/arm/kvm64.c      |  1 +
6 files changed, 82 insertions(+), 5 deletions(-)
[Qemu-devel] [RFC QEMU v2 0/2] arm/virt: Account for guest pause time
Posted by Bijan Mottahedeh 5 years, 4 months ago
v1 -> v2:

- Call the asm code only for kvm and always use the default value for TCG.

This patch series address two Qemu issues:

  - improper system clock frequency initialization
  - lack of pause (virtsh suspend) time accounting

A simple test to reproduce the problem executes one or more instances
of the following command in the guest:

dd if=/dev/zero of=/dev/null &

and then pauses and resumes the guest after a certain delay:

virsh suspend <guest>        # pauses the guest
sleep 120
virsh resume <guest>

After the guest is resumed, there are soft lockup warning messages
displayed on the console.

A comparison with x86 shows that hwclock and date values diverge after
the above pause and resume sequence for x86 but remain the same for Arm.

Patch 1 intializes the system clock frequency in Qemu similar to the
kernel.

Patch 2 accumulates the total guest pause time in QEMU and adjusts the
virtual offset counter accordingly before the guest is resumed.

The patches have been tested on an Ampere system.  With the patches the
time behavior is the same as x86 and the soft lockup messages go away.


Clock Frequency Initialization
==============================

Arm v8 provides the virtual counter (cntvct), virtual counter offset
(cntvoff), and counter frequency (cntfrq) registers for guest time
management.

Linux Arm platform code initializes the system clock frequency from
cntrfq_el0 register and sets the value into a statically created device
tree (DT) node.  It is not clear why the timer device node is created
with TIMER_OF_DECLARE().  The DT passed from Qemu to the kernel does not
contain a timer node.

drivers/clocksource/arm_arch_timer.c:

static inline u32 arch_timer_get_cntfrq(void)
{
        return read_sysreg(cntfrq_el0);
}

rate = arch_timer_get_cntfrq();
arch_timer_of_configure_rate(rate, np);

/*
 * For historical reasons, when probing with DT we use whichever (non-zero)
 * rate was probed first, and don't verify that others match. If the first node
 * probed has a clock-frequency property, this overrides the HW register.
 */
static void arch_timer_of_configure_rate(u32 rate, struct device_node *np)
{
...
   if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
          arch_timer_rate = rate;
...
}

TIMER_OF_DECLARE(armv7_arch_timer, "arm,armv7-timer", arch_timer_of_init);
TIMER_OF_DECLARE(armv8_arch_timer, "arm,armv8-timer", arch_timer_of_init);


Linux then initializes the clock frequency to 50MHZ.

Qemu however hard codes the clock frequency to 62.5MHZ.

target/arm/cpu.h:

/* Scale factor for generic timers, ie number of ns per tick.
 * This gives a 62.5MHz timer.
 */
#define GTIMER_SCALE 16

The suggested fix is to follow the kernel's arch_timer_get_cntfrq()
approach in order to set system_clock_scale to match the kernel's idea
of clock-frequency, rather than using a hard-coded value.

Ultimately, it seems that Qemu should construct the timer DT node and
pass the actual clock frequency value to the kernel that way but that
brings up an interface and backward compatibility considerations.
Furthermore, the implications for ACPI method of probing is not clear.


Pause Time Accounting
=====================

Linux registers two clock sources, a platform-independent jiffies
clocksource and a Arm-specific arch_sys_counter; the read interface
for the latter reads the virtual counter register:

static struct clocksource clocksource_jiffies = {
        .name           = "jiffies",
        .rating         = 1, /* lowest valid rating*/
        .read           = jiffies_read,
        .mask           = CLOCKSOURCE_MASK(32),
        .mult           = TICK_NSEC << JIFFIES_SHIFT, /* details above */
        .shift          = JIFFIES_SHIFT,
        .max_cycles     = 10,
};

static struct clocksource clocksource_counter = {
        .name   = "arch_sys_counter",
        .rating = 400,
        .read   = arch_counter_read,
        .mask   = CLOCKSOURCE_MASK(56),
        .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
};

arch_counter_read()
-> arch_timer_read_counter()
   -> arch_counter_get_cntvct()
      -> arch_timer_reg_read_stable(cntvct_el0)

The virtual counter offset register is set from:

kvm_timer_vcpu_load()
-> set_cntvoff()

The counter is zeroed from:

kvm_timer_vcpu_put()
-> set_cntvoff()

        /*
         * The kernel may decide to run userspace after calling vcpu_put, so
         * we reset cntvoff to 0 to ensure a consistent read between user
         * accesses to the virtual counter and kernel access to the physical
         * counter of non-VHE case. For VHE, the virtual counter uses a fixed
         * virtual offset of zero, so no need to zero CNTVOFF_EL2 register.
         */
        if (!has_vhe())
                set_cntvoff(0);

The virtual counter offset is not modified anywhere however to account
for pause time.  The suggested fix is to add pause time accounting to
Qemu.

One potential issue is whether modifying the virtual counter offset
breaks any assumptions, e.g., see the kvm_timer_vcpu_put() comment above.


hwclock vs. date
================

The hwclock on the ends up in drivers/rtc/rtc-pl031.c

Real Time Clock interface for ARM AMBA PrimeCell 031 RTC

ioctl("/dev/rtc", RTC_RD_TIME, ...)
-> rtc_dev_ioctl()
   -> rtc_read_time()
      -> __rtc_read_time()
         -> pl031_read_time()


The date command reads time from from a vdso page updated as follows:

irq_enter()
-> tick_irq_enter()
   -> tick_nohz_irq_enter()
      -> tick_nohz_update_jiffies()
         -> tick_do_update_jiffies64()
            -> tick_do_update_jiffies64()
               -> update_wall_time()
                  -> timekeeping_advance()
                     -> timekeeping_update()
                        -> update_vsyscall(struct timekeepr *tk)

The struct timekeeper uses the Arm platform clocksource_counter noted above:

(gdb) p tk->tkr_mono
$4 = {clock = 0xffff0000097ddad0 <clocksource_counter>,
  mask = 72057594037927935, cycle_last = 271809294296, mult = 335544320,
  shift = 24, xtime_nsec = 3456106496000000, base = 5435795172160,
  base_real = 1539126455000000000}

This would explain why without any pause time accounting, the both
hwclock and date command show the same time after the guest is resume
from a pause, e.g. with the following sequence:

virsh suspend <guest>
sleep <seconds>
virsh resume <guest>

Bijan Mottahedeh (2):
  arm/virt: Initialize generic timer scale factor dynamically
  arm/virt: Account for guest pause time

 hw/arm/virt.c           | 17 +++++++++++++++++
 hw/intc/arm_gicv3_kvm.c | 39 +++++++++++++++++++++++++++++++++++++++
 target/arm/cpu.h        |  3 +++
 target/arm/helper.c     | 19 ++++++++++++++++---
 target/arm/internals.h  |  8 ++++++--
 target/arm/kvm64.c      |  1 +
 6 files changed, 82 insertions(+), 5 deletions(-)

-- 
1.8.3.1


Re: [Qemu-devel] [RFC QEMU v2 0/2] arm/virt: Account for guest pause time
Posted by no-reply@patchew.org 5 years, 4 months ago
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC QEMU v2 0/2] arm/virt: Account for guest pause time
Message-id: 1543352837-21529-1-git-send-email-bijan.mottahedeh@oracle.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
bf7b6dc arm/virt: Account for guest pause time
21423f4 arm/virt: Initialize generic timer scale factor dynamically

=== OUTPUT BEGIN ===
Checking PATCH 1/2: arm/virt: Initialize generic timer scale factor dynamically...
ERROR: exactly one space required after that #ifdef
#33: FILE: hw/arm/virt.c:1757:
+#ifdef  CONFIG_KVM

total: 1 errors, 0 warnings, 106 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/2: arm/virt: Account for guest pause time...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com