[Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Peter Maydell posted 1 patch 11 weeks ago
Test docker-mingw@fedora passed
Test checkpatch passed
Test docker-clang@ubuntu passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20181206151401.13455-1-peter.maydell@linaro.org
target/arm/cpu.h     |  9 ++++++++-
target/arm/helper.c  | 27 +++++++++++++++++++++++++--
target/arm/kvm32.c   | 20 ++------------------
target/arm/kvm64.c   |  2 ++
target/arm/machine.c |  2 +-
5 files changed, 38 insertions(+), 22 deletions(-)

[Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by Peter Maydell 11 weeks ago
At the moment the Arm implementations of kvm_arch_{get,put}_registers()
don't support having QEMU change the values of system registers
(aka coprocessor registers for AArch32). This is because although
kvm_arch_get_registers() calls write_list_to_cpustate() to
update the CPU state struct fields (so QEMU code can read the
values in the usual way), kvm_arch_put_registers() does not
call write_cpustate_to_list(), meaning that any changes to
the CPU state struct fields will not be passed back to KVM.

The rationale for this design is documented in a comment in the
AArch32 kvm_arch_put_registers() -- writing the values in the
cpregs list into the CPU state struct is "lossy" because the
write of a register might not succeed, and so if we blindly
copy the CPU state values back again we will incorrectly
change register values for the guest. The assumption was that
no QEMU code would need to write to the registers.

However, when we implemented debug support for KVM guests, we
broke that assumption: the code to handle "set the guest up
to take a breakpoint exception" does so by updating various
guest registers including ESR_EL1.

Support this by making kvm_arch_put_registers() synchronize
CPU state back into the list. We sync only those registers
where the initial write succeeds, which should be sufficient.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I think this patch:
 * should be necessary for the current KVM debug code to be able
   to actually set the ESR_EL1 for the guest when it wants to
   deliver a breakpoint exception to it
 * will also be necessary for Dongjiu's patchset to inject
   external aborts on memory errors

I have tagged it "RFC" because I don't have a setup to test
that; I've just given it a quick smoke test that it runs a
VM OK. Please test this and check whether it really does
fix the bugs I think it does :-)

Opinions welcome on whether the "try the write-and-read-back"
approach in write_cpustate_to_list() is too hacky and it
would be better to actually record whether write_list_to_cpustate()
succeeded for each register. (That would require another array.)
---
 target/arm/cpu.h     |  9 ++++++++-
 target/arm/helper.c  | 27 +++++++++++++++++++++++++--
 target/arm/kvm32.c   | 20 ++------------------
 target/arm/kvm64.c   |  2 ++
 target/arm/machine.c |  2 +-
 5 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2a73fed9a01..c0c111378ea 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2321,18 +2321,25 @@ bool write_list_to_cpustate(ARMCPU *cpu);
 /**
  * write_cpustate_to_list:
  * @cpu: ARMCPU
+ * @kvm_sync: true if this is for syncing back to KVM
  *
  * For each register listed in the ARMCPU cpreg_indexes list, write
  * its value from the ARMCPUState structure into the cpreg_values list.
  * This is used to copy info from TCG's working data structures into
  * KVM or for outbound migration.
  *
+ * @kvm_sync is true if we are doing this in order to sync the
+ * register state back to KVM. In this case we will only update
+ * values in the list if the previous list->cpustate sync actually
+ * successfully wrote the CPU state. Otherwise we will keep the value
+ * that is in the list.
+ *
  * Returns: true if all register values were read correctly,
  * false if some register was unknown or could not be read.
  * Note that we do not stop early on failure -- we will attempt
  * reading all registers in the list.
  */
-bool write_cpustate_to_list(ARMCPU *cpu);
+bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
 
 #define ARM_CPUID_TI915T      0x54029152
 #define ARM_CPUID_TI925T      0x54029252
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0da1424f72d..bc2969eb4d8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -263,7 +263,7 @@ static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
     return true;
 }
 
-bool write_cpustate_to_list(ARMCPU *cpu)
+bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync)
 {
     /* Write the coprocessor state from cpu->env to the (index,value) list. */
     int i;
@@ -272,6 +272,7 @@ bool write_cpustate_to_list(ARMCPU *cpu)
     for (i = 0; i < cpu->cpreg_array_len; i++) {
         uint32_t regidx = kvm_to_cpreg_id(cpu->cpreg_indexes[i]);
         const ARMCPRegInfo *ri;
+        uint64_t newval;
 
         ri = get_arm_cp_reginfo(cpu->cp_regs, regidx);
         if (!ri) {
@@ -281,7 +282,29 @@ bool write_cpustate_to_list(ARMCPU *cpu)
         if (ri->type & ARM_CP_NO_RAW) {
             continue;
         }
-        cpu->cpreg_values[i] = read_raw_cp_reg(&cpu->env, ri);
+
+        newval = read_raw_cp_reg(&cpu->env, ri);
+        if (kvm_sync) {
+            /*
+             * Only sync if the previous list->cpustate sync succeeded.
+             * Rather than tracking the success/failure state for every
+             * item in the list, we just recheck "does the raw write we must
+             * have made in write_list_to_cpustate() read back OK" here.
+             */
+            uint64_t oldval = cpu->cpreg_values[i];
+
+            if (oldval == newval) {
+                continue;
+            }
+
+            write_raw_cp_reg(&cpu->env, ri, oldval);
+            if (read_raw_cp_reg(&cpu->env, ri) != oldval) {
+                continue;
+            }
+
+            write_raw_cp_reg(&cpu->env, ri, newval);
+        }
+        cpu->cpreg_values[i] = newval;
     }
     return ok;
 }
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index bd51eb43c86..a75e04cc8f3 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -387,24 +387,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
         return ret;
     }
 
-    /* Note that we do not call write_cpustate_to_list()
-     * here, so we are only writing the tuple list back to
-     * KVM. This is safe because nothing can change the
-     * CPUARMState cp15 fields (in particular gdb accesses cannot)
-     * and so there are no changes to sync. In fact syncing would
-     * be wrong at this point: for a constant register where TCG and
-     * KVM disagree about its value, the preceding write_list_to_cpustate()
-     * would not have had any effect on the CPUARMState value (since the
-     * register is read-only), and a write_cpustate_to_list() here would
-     * then try to write the TCG value back into KVM -- this would either
-     * fail or incorrectly change the value the guest sees.
-     *
-     * If we ever want to allow the user to modify cp15 registers via
-     * the gdb stub, we would need to be more clever here (for instance
-     * tracking the set of registers kvm_arch_get_registers() successfully
-     * managed to update the CPUARMState with, and only allowing those
-     * to be written back up into the kernel).
-     */
+    write_cpustate_to_list(cpu, true);
+
     if (!write_list_to_kvmstate(cpu, level)) {
         return EINVAL;
     }
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 0a502091e76..d8ac978d7c3 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -834,6 +834,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
         return ret;
     }
 
+    write_cpustate_to_list(cpu, true);
+
     if (!write_list_to_kvmstate(cpu, level)) {
         return EINVAL;
     }
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 7a22ebc2098..0dd0157f4d4 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -626,7 +626,7 @@ static int cpu_pre_save(void *opaque)
             abort();
         }
     } else {
-        if (!write_cpustate_to_list(cpu)) {
+        if (!write_cpustate_to_list(cpu, false)) {
             /* This should never fail. */
             abort();
         }
-- 
2.19.2


Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by Alex Bennée 1 week ago
Peter Maydell <peter.maydell@linaro.org> writes:

> At the moment the Arm implementations of kvm_arch_{get,put}_registers()
> don't support having QEMU change the values of system registers
> (aka coprocessor registers for AArch32). This is because although
> kvm_arch_get_registers() calls write_list_to_cpustate() to
> update the CPU state struct fields (so QEMU code can read the
> values in the usual way), kvm_arch_put_registers() does not
> call write_cpustate_to_list(), meaning that any changes to
> the CPU state struct fields will not be passed back to KVM.
>
> The rationale for this design is documented in a comment in the
> AArch32 kvm_arch_put_registers() -- writing the values in the
> cpregs list into the CPU state struct is "lossy" because the
> write of a register might not succeed, and so if we blindly
> copy the CPU state values back again we will incorrectly
> change register values for the guest. The assumption was that
> no QEMU code would need to write to the registers.
>
> However, when we implemented debug support for KVM guests, we
> broke that assumption: the code to handle "set the guest up
> to take a breakpoint exception" does so by updating various
> guest registers including ESR_EL1.
>
> Support this by making kvm_arch_put_registers() synchronize
> CPU state back into the list. We sync only those registers
> where the initial write succeeds, which should be sufficient.

So my test is boot the guest:

  qemu-system-aarch64 -nodefaults -cpu host \
                      -machine type=virt,gic-version=host,accel=kvm \
                      -display none -m 4096 -serial mon:stdio \
                      -netdev user,id=unet,hostfwd=tcp::2222-:22 \
                      -device virtio-net-pci,netdev=unet \
                      -device virtio-scsi-pci \
                      -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/idun-vg/debian-buster-arm64 \
                      -device scsi-hd,drive=hd \
                      -kernel ../linux.git/arch/arm64/boot/Image \
                      -append "console=ttyAMA0 root=/dev/sda2" -s -S

And then in another window:

  gdb vmlinux -ex "target remote localhost:1234"

and then:
  c
  disable 1
  <break>
  enable 1

This leaves a hardware breakpoint enabled and therefor KVM guest
debugging aware. Then I ssh into the guest and debug some of our test
cases:

  gdb ./aarch64-linux-user/tests/fcvt
  start

this works with a temporary SW bkpt (and comes through
kvm_arm_handle_debug and is delivered to the guest).

  (gdb) x/10i $pc
  => 0x400d2c <main+16>:  adrp    x0, 0x44f000 <dl_iterate_phdr+224>
     0x400d30 <main+20>:  add     x0, x0, #0xec0
     0x400d34 <main+24>:  bl      0x407998 <puts>
     0x400d38 <main+28>:  str     wzr, [sp, #44]
     0x400d3c <main+32>:  b       0x400da8 <main+140>
  (gdb) hbreak *0x400d38
  Hardware assisted breakpoint 2 at 0x400d38: file /home/alex/lsrc/qemu.git/tests/tcg/arm/fcvt.c, line 409.
  (gdb) b *0x400d3c
  Breakpoint 3 at 0x400d3c: file /home/alex/lsrc/qemu.git/tests/tcg/arm/fcvt.c, line 409.

Then I hit i (stepi):

  Breakpoint 3, main (argc=1, argv=0x7ffffff848) at /home/alex/lsrc/qemu.git/tests/tcg/arm/fcvt.c:409
  409         for (i = 0; i < ARRAY_SIZE(round_flags); ++i) {
  => 0x400d3c <main+32>:  b       0x400da8 <main+140>

stepi doesn't work but we do hit the SW breakpoint. It's expected that
hbreak won't work as while the host is doing guest debugging with
hardware breakpoints it uses them all. However if you switch the host to
use SW breakpoints then you get:

  Breakpoint 2, main (argc=1, argv=0x7ffffff848) at /home/alex/lsrc/qemu.git/tests/tcg/arm/fcvt.c:409
  409         for (i = 0; i < ARRAY_SIZE(round_flags); ++i) {
  => 0x400d38 <main+28>:  str     wzr, [sp, #44]
     0x400d3c <main+32>:  b       0x400da8 <main+140>
     0x400d40 <main+36>:  adrp    x0, 0x485000

As you can multiplex breakpoints between the host and guest fine
(although if the guest happened to somehow set a SW breakpoint at the
same place the host had set a SW breakpoint the host would "win").

If the host deletes all breakpoints the guest can use all the facilities
fine which is to be expected as nothing comes through
kvm_arm_handle_debug for that.

The reason single-step doesn't work is because the kernel suppresses the
guests ability to do it while single-stepping is in effect.

   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
           *vcpu_cpsr(vcpu) |=  DBG_SPSR_SS;
           mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
           mdscr |= DBG_MDSCR_SS;
           vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
   } else {
           mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
           mdscr &= ~DBG_MDSCR_SS;
           vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
   }

And of course because it does this we check in qemu:

  case EC_SOFTWARESTEP:
      if (cs->singlestep_enabled) {
          return true;
      } else {
          /*
           * The kernel should have suppressed the guest's ability to
           * single step at this point so something has gone wrong.
           */
          error_report("%s: guest single-step while debugging unsupported"
                       " (%"PRIx64", %"PRIx32")",
                       __func__, env->pc, debug_exit->hsr);
          return false;
      }
      break;

I wonder if this restriction is being too restrictive. I'll have to
experiment....

Anyway:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I think this patch:
>  * should be necessary for the current KVM debug code to be able
>    to actually set the ESR_EL1 for the guest when it wants to
>    deliver a breakpoint exception to it
>  * will also be necessary for Dongjiu's patchset to inject
>    external aborts on memory errors
>
> I have tagged it "RFC" because I don't have a setup to test
> that; I've just given it a quick smoke test that it runs a
> VM OK. Please test this and check whether it really does
> fix the bugs I think it does :-)
>
> Opinions welcome on whether the "try the write-and-read-back"
> approach in write_cpustate_to_list() is too hacky and it
> would be better to actually record whether write_list_to_cpustate()
> succeeded for each register. (That would require another array.)
> ---
>  target/arm/cpu.h     |  9 ++++++++-
>  target/arm/helper.c  | 27 +++++++++++++++++++++++++--
>  target/arm/kvm32.c   | 20 ++------------------
>  target/arm/kvm64.c   |  2 ++
>  target/arm/machine.c |  2 +-
>  5 files changed, 38 insertions(+), 22 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 2a73fed9a01..c0c111378ea 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2321,18 +2321,25 @@ bool write_list_to_cpustate(ARMCPU *cpu);
>  /**
>   * write_cpustate_to_list:
>   * @cpu: ARMCPU
> + * @kvm_sync: true if this is for syncing back to KVM
>   *
>   * For each register listed in the ARMCPU cpreg_indexes list, write
>   * its value from the ARMCPUState structure into the cpreg_values list.
>   * This is used to copy info from TCG's working data structures into
>   * KVM or for outbound migration.
>   *
> + * @kvm_sync is true if we are doing this in order to sync the
> + * register state back to KVM. In this case we will only update
> + * values in the list if the previous list->cpustate sync actually
> + * successfully wrote the CPU state. Otherwise we will keep the value
> + * that is in the list.
> + *
>   * Returns: true if all register values were read correctly,
>   * false if some register was unknown or could not be read.
>   * Note that we do not stop early on failure -- we will attempt
>   * reading all registers in the list.
>   */
> -bool write_cpustate_to_list(ARMCPU *cpu);
> +bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
>
>  #define ARM_CPUID_TI915T      0x54029152
>  #define ARM_CPUID_TI925T      0x54029252
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 0da1424f72d..bc2969eb4d8 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -263,7 +263,7 @@ static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
>      return true;
>  }
>
> -bool write_cpustate_to_list(ARMCPU *cpu)
> +bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync)
>  {
>      /* Write the coprocessor state from cpu->env to the (index,value) list. */
>      int i;
> @@ -272,6 +272,7 @@ bool write_cpustate_to_list(ARMCPU *cpu)
>      for (i = 0; i < cpu->cpreg_array_len; i++) {
>          uint32_t regidx = kvm_to_cpreg_id(cpu->cpreg_indexes[i]);
>          const ARMCPRegInfo *ri;
> +        uint64_t newval;
>
>          ri = get_arm_cp_reginfo(cpu->cp_regs, regidx);
>          if (!ri) {
> @@ -281,7 +282,29 @@ bool write_cpustate_to_list(ARMCPU *cpu)
>          if (ri->type & ARM_CP_NO_RAW) {
>              continue;
>          }
> -        cpu->cpreg_values[i] = read_raw_cp_reg(&cpu->env, ri);
> +
> +        newval = read_raw_cp_reg(&cpu->env, ri);
> +        if (kvm_sync) {
> +            /*
> +             * Only sync if the previous list->cpustate sync succeeded.
> +             * Rather than tracking the success/failure state for every
> +             * item in the list, we just recheck "does the raw write we must
> +             * have made in write_list_to_cpustate() read back OK" here.
> +             */

"in the last call to write_list_to_cpustate()"?

> +            uint64_t oldval = cpu->cpreg_values[i];
> +
> +            if (oldval == newval) {
> +                continue;
> +            }
> +
> +            write_raw_cp_reg(&cpu->env, ri, oldval);
> +            if (read_raw_cp_reg(&cpu->env, ri) != oldval) {
> +                continue;
> +            }
> +
> +            write_raw_cp_reg(&cpu->env, ri, newval);
> +        }
> +        cpu->cpreg_values[i] = newval;
>      }
>      return ok;
>  }
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index bd51eb43c86..a75e04cc8f3 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -387,24 +387,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>          return ret;
>      }
>
> -    /* Note that we do not call write_cpustate_to_list()
> -     * here, so we are only writing the tuple list back to
> -     * KVM. This is safe because nothing can change the
> -     * CPUARMState cp15 fields (in particular gdb accesses cannot)
> -     * and so there are no changes to sync. In fact syncing would
> -     * be wrong at this point: for a constant register where TCG and
> -     * KVM disagree about its value, the preceding write_list_to_cpustate()
> -     * would not have had any effect on the CPUARMState value (since the
> -     * register is read-only), and a write_cpustate_to_list() here would
> -     * then try to write the TCG value back into KVM -- this would either
> -     * fail or incorrectly change the value the guest sees.
> -     *
> -     * If we ever want to allow the user to modify cp15 registers via
> -     * the gdb stub, we would need to be more clever here (for instance
> -     * tracking the set of registers kvm_arch_get_registers() successfully
> -     * managed to update the CPUARMState with, and only allowing those
> -     * to be written back up into the kernel).
> -     */
> +    write_cpustate_to_list(cpu, true);
> +
>      if (!write_list_to_kvmstate(cpu, level)) {
>          return EINVAL;
>      }
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 0a502091e76..d8ac978d7c3 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -834,6 +834,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>          return ret;
>      }
>
> +    write_cpustate_to_list(cpu, true);
> +
>      if (!write_list_to_kvmstate(cpu, level)) {
>          return EINVAL;
>      }
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 7a22ebc2098..0dd0157f4d4 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -626,7 +626,7 @@ static int cpu_pre_save(void *opaque)
>              abort();
>          }
>      } else {
> -        if (!write_cpustate_to_list(cpu)) {
> +        if (!write_cpustate_to_list(cpu, false)) {
>              /* This should never fail. */
>              abort();
>          }

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by Peter Maydell 1 week ago
On Wed, 13 Feb 2019 at 16:29, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > At the moment the Arm implementations of kvm_arch_{get,put}_registers()
> > don't support having QEMU change the values of system registers
> > (aka coprocessor registers for AArch32). This is because although
> > kvm_arch_get_registers() calls write_list_to_cpustate() to
> > update the CPU state struct fields (so QEMU code can read the
> > values in the usual way), kvm_arch_put_registers() does not
> > call write_cpustate_to_list(), meaning that any changes to
> > the CPU state struct fields will not be passed back to KVM.
> >
> > The rationale for this design is documented in a comment in the
> > AArch32 kvm_arch_put_registers() -- writing the values in the
> > cpregs list into the CPU state struct is "lossy" because the
> > write of a register might not succeed, and so if we blindly
> > copy the CPU state values back again we will incorrectly
> > change register values for the guest. The assumption was that
> > no QEMU code would need to write to the registers.
> >
> > However, when we implemented debug support for KVM guests, we
> > broke that assumption: the code to handle "set the guest up
> > to take a breakpoint exception" does so by updating various
> > guest registers including ESR_EL1.
> >
> > Support this by making kvm_arch_put_registers() synchronize
> > CPU state back into the list. We sync only those registers
> > where the initial write succeeds, which should be sufficient.

> [snip a long test setup that I didn't really understand]

I'm confused -- I'm not sure if the things you're saying
didn't work:
 (a) didn't work before this patch and still don't
 (b) used to work and are broken by this patch
 (c) used to not work and are fixed by this patch

More generally:
 * this patch claims to fix the code path where QEMU sets
   up the guest to take a breakpoint exception (specifically
   making our change of ESR_EL1 actually take effect) -- so does
   it do that? Or is it impossible for that code path in QEMU
   to be used (if so, can we just remove it) ?

thanks
-- PMM

Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by Alex Bennée 1 week ago
Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 13 Feb 2019 at 16:29, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>>
>> > At the moment the Arm implementations of kvm_arch_{get,put}_registers()
>> > don't support having QEMU change the values of system registers
>> > (aka coprocessor registers for AArch32). This is because although
>> > kvm_arch_get_registers() calls write_list_to_cpustate() to
>> > update the CPU state struct fields (so QEMU code can read the
>> > values in the usual way), kvm_arch_put_registers() does not
>> > call write_cpustate_to_list(), meaning that any changes to
>> > the CPU state struct fields will not be passed back to KVM.
>> >
>> > The rationale for this design is documented in a comment in the
>> > AArch32 kvm_arch_put_registers() -- writing the values in the
>> > cpregs list into the CPU state struct is "lossy" because the
>> > write of a register might not succeed, and so if we blindly
>> > copy the CPU state values back again we will incorrectly
>> > change register values for the guest. The assumption was that
>> > no QEMU code would need to write to the registers.
>> >
>> > However, when we implemented debug support for KVM guests, we
>> > broke that assumption: the code to handle "set the guest up
>> > to take a breakpoint exception" does so by updating various
>> > guest registers including ESR_EL1.
>> >
>> > Support this by making kvm_arch_put_registers() synchronize
>> > CPU state back into the list. We sync only those registers
>> > where the initial write succeeds, which should be sufficient.
>
>> [snip a long test setup that I didn't really understand]

Sorry that was just making sure I'd documented the exact steps of the
manual test for posterity.

>
> I'm confused -- I'm not sure if the things you're saying
> didn't work:
>  (a) didn't work before this patch and still don't
>  (b) used to work and are broken by this patch
>  (c) used to not work and are fixed by this patch

option (c) - this patch has made things better hence the t-b and r-b
tags. I might send you a follow up if we can also have single-step
working as well but that requires me to test a kernel patch (and remove
the error leg for guest single step in kvm64).

>
> More generally:
>  * this patch claims to fix the code path where QEMU sets
>    up the guest to take a breakpoint exception (specifically
>    making our change of ESR_EL1 actually take effect) -- so does
>    it do that? Or is it impossible for that code path in QEMU
>    to be used (if so, can we just remove it) ?

No it's needed - I'm observing exceptions coming through the path and
then being delivered to the guest while the host is debugging - modulo
the you can't have active HW breakpoints in both guest and host at the
same time.

Even here now we are sure the guest state is correctly reflected we
could make the debug code smarter and try it's best to preserve guest
debug regs while using it's own entirely in userspace. But I suspect
that is a bit niche.

>
> thanks
> -- PMM


--
Alex Bennée

Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by Alex Bennée 1 week ago
Alex Bennée <alex.bennee@linaro.org> writes:

> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On Wed, 13 Feb 2019 at 16:29, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>>
>>> Peter Maydell <peter.maydell@linaro.org> writes:
>>>
>>> > At the moment the Arm implementations of kvm_arch_{get,put}_registers()
>>> > don't support having QEMU change the values of system registers
>>> > (aka coprocessor registers for AArch32). This is because although
>>> > kvm_arch_get_registers() calls write_list_to_cpustate() to
>>> > update the CPU state struct fields (so QEMU code can read the
>>> > values in the usual way), kvm_arch_put_registers() does not
>>> > call write_cpustate_to_list(), meaning that any changes to
>>> > the CPU state struct fields will not be passed back to KVM.
<snip>
>
>>
>> I'm confused -- I'm not sure if the things you're saying
>> didn't work:
>>  (a) didn't work before this patch and still don't
>>  (b) used to work and are broken by this patch
>>  (c) used to not work and are fixed by this patch
>
> option (c) - this patch has made things better hence the t-b and r-b
> tags. I might send you a follow up if we can also have single-step
> working as well but that requires me to test a kernel patch (and remove
> the error leg for guest single step in kvm64).

Oh well it looks like we need to keep the suppression of SS while guest
debug is happening. Otherwise we end up looping exceptions:

first SS from guest trigger and exception at:
kvm_arm_handle_debug: @0xffffff8010083fa8 mdscr_el1:0x9001 pstate:0x400003c5

then repeats:
kvm_arm_handle_debug: @0xffffff8010082200 mdscr_el1:0x9001 pstate:0x3c5
kvm_arm_handle_debug: @0xffffff8010082200 mdscr_el1:0x9001 pstate:0x3c5
kvm_arm_handle_debug: @0xffffff8010082200 mdscr_el1:0x9001 pstate:0x3c5
kvm_arm_handle_debug: @0xffffff8010082200 mdscr_el1:0x9001 pstate:0x3c5

This all looks like it is in the kernel which is slightly confusing:

  (gdb) x/6i 0xffffff8010083fa0
     0xffffff8010083fa0 <+24>:    orr     x2, x2, #0x1
     0xffffff8010083fa4 <+28>:    msr     mdscr_el1, x2
     0xffffff8010083fa8 <ret_to_user+32>: ldp     x21, x22, [sp, #256]
     0xffffff8010083fac <ret_to_user+36>: bl      0xffffff80101f1ad8  <context_tracking_user_enter>
     0xffffff8010083fb0 <ret_to_user+40>: ldr     x23, [sp, #248]
     0xffffff8010083fb4 <ret_to_user+44>: msr     sp_el0, x23

which is weird because we it seems the kernel is delivering a SOFTSTEP
exception to userspace on access to mdscr_el1 rather than trapping the
access and emulating it in kernel. The loop then just ends up in the
vectors table:

  (gdb) x/4i 0xffffff8010082200
=> 0xffffff8010082200 <vectors+512>:    sub     sp, sp, #0x140
   0xffffff8010082204 <vectors+516>:    add     sp, sp, x0
   0xffffff8010082208 <vectors+520>:    sub     x0, sp, x0
   0xffffff801008220c <vectors+524>:    tbnz    w0, #14, 0xffffff801008221c <vectors+540>

Anyway that doesn't affect the viability of this patch for the
breakpoint behaviour.

>
>>
>> More generally:
>>  * this patch claims to fix the code path where QEMU sets
>>    up the guest to take a breakpoint exception (specifically
>>    making our change of ESR_EL1 actually take effect) -- so does
>>    it do that? Or is it impossible for that code path in QEMU
>>    to be used (if so, can we just remove it) ?
>
> No it's needed - I'm observing exceptions coming through the path and
> then being delivered to the guest while the host is debugging - modulo
> the you can't have active HW breakpoints in both guest and host at the
> same time.
>
> Even here now we are sure the guest state is correctly reflected we
> could make the debug code smarter and try it's best to preserve guest
> debug regs while using it's own entirely in userspace. But I suspect
> that is a bit niche.
>
>>
>> thanks
>> -- PMM


--
Alex Bennée

Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by gengdongjiu 1 week ago
I think Peter's patch is good, I will resubmit my series patches based on this.

On 2019/2/14 4:18, Alex Bennée wrote:
> 
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On Wed, 13 Feb 2019 at 16:29, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>>
>>> Peter Maydell <peter.maydell@linaro.org> writes:
>>>
>>>> At the moment the Arm implementations of kvm_arch_{get,put}_registers()
>>>> don't support having QEMU change the values of system registers
>>>> (aka coprocessor registers for AArch32). This is because although
>>>> kvm_arch_get_registers() calls write_list_to_cpustate() to
>>>> update the CPU state struct fields (so QEMU code can read the
>>>> values in the usual way), kvm_arch_put_registers() does not
>>>> call write_cpustate_to_list(), meaning that any changes to
>>>> the CPU state struct fields will not be passed back to KVM.
>>>>
>>>> The rationale for this design is documented in a comment in the
>>>> AArch32 kvm_arch_put_registers() -- writing the values in the
>>>> cpregs list into the CPU state struct is "lossy" because the
>>>> write of a register might not succeed, and so if we blindly
>>>> copy the CPU state values back again we will incorrectly
>>>> change register values for the guest. The assumption was that
>>>> no QEMU code would need to write to the registers.
>>>>
>>>> However, when we implemented debug support for KVM guests, we
>>>> broke that assumption: the code to handle "set the guest up
>>>> to take a breakpoint exception" does so by updating various
>>>> guest registers including ESR_EL1.
>>>>
>>>> Support this by making kvm_arch_put_registers() synchronize
>>>> CPU state back into the list. We sync only those registers
>>>> where the initial write succeeds, which should be sufficient.
>>
>>> [snip a long test setup that I didn't really understand]
> 
> Sorry that was just making sure I'd documented the exact steps of the
> manual test for posterity.
> 
>>
>> I'm confused -- I'm not sure if the things you're saying
>> didn't work:
>>  (a) didn't work before this patch and still don't
>>  (b) used to work and are broken by this patch
>>  (c) used to not work and are fixed by this patch
> 
> option (c) - this patch has made things better hence the t-b and r-b
> tags. I might send you a follow up if we can also have single-step
> working as well but that requires me to test a kernel patch (and remove
> the error leg for guest single step in kvm64).
> 
>>
>> More generally:
>>  * this patch claims to fix the code path where QEMU sets
>>    up the guest to take a breakpoint exception (specifically
>>    making our change of ESR_EL1 actually take effect) -- so does
>>    it do that? Or is it impossible for that code path in QEMU
>>    to be used (if so, can we just remove it) ?
> 
> No it's needed - I'm observing exceptions coming through the path and
> then being delivered to the guest while the host is debugging - modulo
> the you can't have active HW breakpoints in both guest and host at the
> same time.
> 
> Even here now we are sure the guest state is correctly reflected we
> could make the debug code smarter and try it's best to preserve guest
> debug regs while using it's own entirely in userspace. But I suspect
> that is a bit niche.
> 
>>
>> thanks
>> -- PMM
> 
> 
> --
> Alex Bennée
> 
> .
> 


Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by gengdongjiu 10 weeks ago
On 2018/12/6 23:14, Peter Maydell wrote:
> At the moment the Arm implementations of kvm_arch_{get,put}_registers()
> don't support having QEMU change the values of system registers
> (aka coprocessor registers for AArch32). This is because although
> kvm_arch_get_registers() calls write_list_to_cpustate() to
> update the CPU state struct fields (so QEMU code can read the
> values in the usual way), kvm_arch_put_registers() does not
> call write_cpustate_to_list(), meaning that any changes to
> the CPU state struct fields will not be passed back to KVM.
> 
> The rationale for this design is documented in a comment in the
> AArch32 kvm_arch_put_registers() -- writing the values in the
> cpregs list into the CPU state struct is "lossy" because the
> write of a register might not succeed, and so if we blindly
> copy the CPU state values back again we will incorrectly
> change register values for the guest. The assumption was that
> no QEMU code would need to write to the registers.
> 
> However, when we implemented debug support for KVM guests, we
> broke that assumption: the code to handle "set the guest up
> to take a breakpoint exception" does so by updating various
> guest registers including ESR_EL1.
> 
> Support this by making kvm_arch_put_registers() synchronize
> CPU state back into the list. We sync only those registers
> where the initial write succeeds, which should be sufficient.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Tested-by: Dongjiu Geng <gengdongjiu@huawei.com>

> ---
> I think this patch:
>  * should be necessary for the current KVM debug code to be able
>    to actually set the ESR_EL1 for the guest when it wants to
>    deliver a breakpoint exception to it
>  * will also be necessary for Dongjiu's patchset to inject
>    external aborts on memory errors
> 
> I have tagged it "RFC" because I don't have a setup to test
> that; I've just given it a quick smoke test that it runs a
> VM OK. Please test this and check whether it really does
> fix the bugs I think it does :-)

I have tested this patch in my aarch64 platform,
it works well to inject external aborts on memory errors.
Thanks for this patch.

> 
> Opinions welcome on whether the "try the write-and-read-back"
> approach in write_cpustate_to_list() is too hacky and it
> would be better to actually record whether write_list_to_cpustate()
> succeeded for each register. (That would require another array.)
> ---
>  target/arm/cpu.h     |  9 ++++++++-
>  target/arm/helper.c  | 27 +++++++++++++++++++++++++--
>  target/arm/kvm32.c   | 20 ++------------------
>  target/arm/kvm64.c   |  2 ++
>  target/arm/machine.c |  2 +-
>  5 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 2a73fed9a01..c0c111378ea 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2321,18 +2321,25 @@ bool write_list_to_cpustate(ARMCPU *cpu);
>  /**
>   * write_cpustate_to_list:
>   * @cpu: ARMCPU
> + * @kvm_sync: true if this is for syncing back to KVM
>   *
>   * For each register listed in the ARMCPU cpreg_indexes list, write
>   * its value from the ARMCPUState structure into the cpreg_values list.
>   * This is used to copy info from TCG's working data structures into
>   * KVM or for outbound migration.
>   *
> + * @kvm_sync is true if we are doing this in order to sync the
> + * register state back to KVM. In this case we will only update
> + * values in the list if the previous list->cpustate sync actually
> + * successfully wrote the CPU state. Otherwise we will keep the value
> + * that is in the list.
> + *
>   * Returns: true if all register values were read correctly,
>   * false if some register was unknown or could not be read.
>   * Note that we do not stop early on failure -- we will attempt
>   * reading all registers in the list.
>   */
> -bool write_cpustate_to_list(ARMCPU *cpu);
> +bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
>  
>  #define ARM_CPUID_TI915T      0x54029152
>  #define ARM_CPUID_TI925T      0x54029252
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 0da1424f72d..bc2969eb4d8 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -263,7 +263,7 @@ static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
>      return true;
>  }
>  
> -bool write_cpustate_to_list(ARMCPU *cpu)
> +bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync)
>  {
>      /* Write the coprocessor state from cpu->env to the (index,value) list. */
>      int i;
> @@ -272,6 +272,7 @@ bool write_cpustate_to_list(ARMCPU *cpu)
>      for (i = 0; i < cpu->cpreg_array_len; i++) {
>          uint32_t regidx = kvm_to_cpreg_id(cpu->cpreg_indexes[i]);
>          const ARMCPRegInfo *ri;
> +        uint64_t newval;
>  
>          ri = get_arm_cp_reginfo(cpu->cp_regs, regidx);
>          if (!ri) {
> @@ -281,7 +282,29 @@ bool write_cpustate_to_list(ARMCPU *cpu)
>          if (ri->type & ARM_CP_NO_RAW) {
>              continue;
>          }
> -        cpu->cpreg_values[i] = read_raw_cp_reg(&cpu->env, ri);
> +
> +        newval = read_raw_cp_reg(&cpu->env, ri);
> +        if (kvm_sync) {
> +            /*
> +             * Only sync if the previous list->cpustate sync succeeded.
> +             * Rather than tracking the success/failure state for every
> +             * item in the list, we just recheck "does the raw write we must
> +             * have made in write_list_to_cpustate() read back OK" here.
> +             */
> +            uint64_t oldval = cpu->cpreg_values[i];
> +
> +            if (oldval == newval) {
> +                continue;
> +            }
> +
> +            write_raw_cp_reg(&cpu->env, ri, oldval);
> +            if (read_raw_cp_reg(&cpu->env, ri) != oldval) {
> +                continue;
> +            }
> +
> +            write_raw_cp_reg(&cpu->env, ri, newval);
> +        }
> +        cpu->cpreg_values[i] = newval;
>      }
>      return ok;
>  }
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index bd51eb43c86..a75e04cc8f3 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -387,24 +387,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>          return ret;
>      }
>  
> -    /* Note that we do not call write_cpustate_to_list()
> -     * here, so we are only writing the tuple list back to
> -     * KVM. This is safe because nothing can change the
> -     * CPUARMState cp15 fields (in particular gdb accesses cannot)
> -     * and so there are no changes to sync. In fact syncing would
> -     * be wrong at this point: for a constant register where TCG and
> -     * KVM disagree about its value, the preceding write_list_to_cpustate()
> -     * would not have had any effect on the CPUARMState value (since the
> -     * register is read-only), and a write_cpustate_to_list() here would
> -     * then try to write the TCG value back into KVM -- this would either
> -     * fail or incorrectly change the value the guest sees.
> -     *
> -     * If we ever want to allow the user to modify cp15 registers via
> -     * the gdb stub, we would need to be more clever here (for instance
> -     * tracking the set of registers kvm_arch_get_registers() successfully
> -     * managed to update the CPUARMState with, and only allowing those
> -     * to be written back up into the kernel).
> -     */
> +    write_cpustate_to_list(cpu, true);
> +
>      if (!write_list_to_kvmstate(cpu, level)) {
>          return EINVAL;
>      }
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 0a502091e76..d8ac978d7c3 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -834,6 +834,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>          return ret;
>      }
>  
> +    write_cpustate_to_list(cpu, true);
> +
>      if (!write_list_to_kvmstate(cpu, level)) {
>          return EINVAL;
>      }
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 7a22ebc2098..0dd0157f4d4 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -626,7 +626,7 @@ static int cpu_pre_save(void *opaque)
>              abort();
>          }
>      } else {
> -        if (!write_cpustate_to_list(cpu)) {
> +        if (!write_cpustate_to_list(cpu, false)) {
>              /* This should never fail. */
>              abort();
>          }
> 


Re: [Qemu-devel] [RFC] arm: Allow system registers for KVM guests to be changed by QEMU code

Posted by gengdongjiu 11 weeks ago
On 2018/12/6 23:14, Peter Maydell wrote:
> ---
> I think this patch:
>  * should be necessary for the current KVM debug code to be able
>    to actually set the ESR_EL1 for the guest when it wants to
>    deliver a breakpoint exception to it
>  * will also be necessary for Dongjiu's patchset to inject
>    external aborts on memory errors
> 
> I have tagged it "RFC" because I don't have a setup to test
> that; I've just given it a quick smoke test that it runs a
> VM OK. Please test this and check whether it really does
> fix the bugs I think it does :-)

Great! thanks Peter follow up this issue and patch to fix it.
I will test it and give response.

> 
> Opinions welcome on whether the "try the write-and-read-back"
> approach in write_cpustate_to_list() is too hacky and it
> would be better to actually record whether write_list_to_cpustate()
> succeeded for each register. (That would require another array.)