[PATCH 00/35] target/arm: Implement emulation of nested virtualization

Peter Maydell posted 35 patches 11 months, 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20231218113305.2511480-1-peter.maydell@linaro.org
Maintainers: Peter Maydell <peter.maydell@linaro.org>
[PATCH 00/35] target/arm: Implement emulation of nested virtualization
Posted by Peter Maydell 11 months, 2 weeks ago
This patchset adds support for emulating the Arm architectural features
FEAT_NV and FEAT_NV2 which allow nested virtualization, i.e. where a
hypervisor can run a guest which thinks it is running at EL2.

Nominally FEAT_NV is sufficient for this and FEAT_NV2 merely improves
the performance in the nested-virt setup, but in practice hypervisors
such as KVM are going to require FEAT_NV2 and not bother to support
the FEAT_NV-only case, so I have implemented them one after the other
in this single patchset.

The feature is essentially a collection of changes that allow the
hypervisor to lie to the guest so that it thinks it is running in EL2
when it's really at EL1. The best summary of what all the changes are
is in section D8.11 "Nested virtualization" in the Arm ARM, but the
short summary is:
 * EL2 system registers etc trap to EL2 rather than UNDEFing
 * ERET traps to EL2
 * the CurrentEL register reports "EL2" when NV is enabled
 * on exception entry, SPSR_EL1.M may report "EL2" as the EL the
   exception was taken from
 * when HCR_EL2.NV1 is also set, then there are some extra tweaks
   (NV1 == 1 means "guest thinks it is running with HCR_EL2.E2H == 0")
 * some AT S1 address translation insns can be trapped to EL2
and FEAT_NV2 adds:
 * accesses to some system registers are transformed into memory
   accesses instead of trapping to EL2
 * accesses to a few EL2 system registers are redirected to the
   equivalent EL1 registers
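
To make the summary above a little more concrete, here is a minimal toy
model in C of three of the behaviours listed: the CurrentEL read, the fate
of an EL1 access to an EL2 system register, and the FEAT_NV2 conversion of
a register access into a memory address. It is only a sketch and is not
code from this series. The function names, the NVAccess enum and the
has_vncr_offset flag are made up for illustration; the few EL2 registers
that FEAT_NV2 redirects to EL1 registers are ignored; and the VNCR_EL2
base is simply masked to a 4K page rather than following the full
architected field layout. The HCR_EL2 bit positions are the ones given in
the Arm ARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 control bits used by FEAT_NV / FEAT_NV2 (positions per the Arm ARM). */
#define HCR_NV   (1ULL << 42)
#define HCR_NV1  (1ULL << 43)
#define HCR_NV2  (1ULL << 45)

/* Hypothetical result of an EL1 access to an EL2 system register. */
typedef enum {
    ACCESS_UNDEF,     /* no nested virt: the instruction UNDEFs */
    ACCESS_TRAP_EL2,  /* FEAT_NV: trapped to the real EL2 */
    ACCESS_MEMORY,    /* FEAT_NV2: turned into a load or store */
} NVAccess;

/*
 * CurrentEL holds the exception level in bits [3:2].
 * 'el' is the real exception level (1..3). With HCR_EL2.NV set,
 * code really running at EL1 is told that it is at EL2.
 */
static uint64_t read_currentel(int el, uint64_t hcr_el2)
{
    if (el == 1 && (hcr_el2 & HCR_NV)) {
        el = 2;
    }
    return (uint64_t)el << 2;
}

/*
 * Decide what happens when EL1 touches an EL2 system register.
 * 'has_vncr_offset' is an assumption of this model: it stands for
 * "this register has a slot in the VNCR_EL2 memory page".
 */
static NVAccess el2_reg_access_from_el1(uint64_t hcr_el2, bool has_vncr_offset)
{
    if (!(hcr_el2 & HCR_NV)) {
        return ACCESS_UNDEF;
    }
    if ((hcr_el2 & HCR_NV2) && has_vncr_offset) {
        return ACCESS_MEMORY;
    }
    return ACCESS_TRAP_EL2;
}

/*
 * Simplified address of a register's slot in the VNCR page:
 * page-aligned base taken from VNCR_EL2 plus the register's offset.
 */
static uint64_t vncr_address(uint64_t vncr_el2, uint32_t offset)
{
    assert(offset < 0x1000);  /* toy-model sanity check: stay within 4K */
    return (vncr_el2 & ~0xfffULL) + offset;
}
```

In this model, read_currentel(1, HCR_NV) reports EL2 while
read_currentel(1, 0) reports EL1, and an EL2 register access from EL1
moves from UNDEF to a trap to a memory access as NV and then NV2 are
enabled.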

This patchset is sufficient that you can run an L0 guest kernel that
has support for FEAT_NV/FEAT_NV2 in its KVM code, and then
inside that start a nested L1 guest that thinks it has EL2 access,
and then, under that, run an inner-nested L2 guest that can get
to running userspace code. To do that you'll need some not-yet-upstream
patches for both Linux and kvmtool:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
https://gitlab.arm.com/linux-arm/kvmtool/-/commits/nv-v6.6

You'll also want to turn off SVE and SME emulation in QEMU
(-cpu max,sve=off,sme=off) because at the moment the KVM patchset
doesn't handle SVE and nested-virt together (the other option
is to hack kvmtool to make it not ask for both at once, but this
is easier).

(kvmtool is needed here to run the L1 because QEMU itself as a VMM
doesn't yet support asking KVM for an EL2 guest.)

The first three patches in the series aren't strictly part of FEAT_NV:
 * patch 1 is already reviewed; I put it here to avoid having
   to deal with textual conflicts between it and this series
 * patch 2 sets CTR_EL0.{IDC,DIC} for '-cpu max', which is a good
   idea anyway and also works around what Marc Z and I think is
   a KVM bug that otherwise causes boot of the L2 kernel to hang
 * patch 3 is a GIC bug which is not FEAT_NV specific but for
   some reason only manifests when booting an L1 kernel under NV

thanks
-- PMM

Peter Maydell (35):
  target/arm: Don't implement *32_EL2 registers when EL1 is AArch64 only
  target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
  hw/intc/arm_gicv3_cpuif: handle LPIs in the list registers
  target/arm: Handle HCR_EL2 accesses for bits introduced with FEAT_NV
  target/arm: Implement HCR_EL2.AT handling
  target/arm: Enable trapping of ERET for FEAT_NV
  target/arm: Always honour HCR_EL2.TSC when HCR_EL2.NV is set
  target/arm: Allow use of upper 32 bits of TBFLAG_A64
  target/arm: Record correct opcode fields in cpreg for E2H aliases
  target/arm: *_EL12 registers should UNDEF when HCR_EL2.E2H is 0
  target/arm: Make EL2 cpreg accessfns safe for FEAT_NV EL1 accesses
  target/arm: Move FPU/SVE/SME access checks up above
    ARM_CP_SPECIAL_MASK check
  target/arm: Trap sysreg accesses for FEAT_NV
  target/arm: Make NV reads of CurrentEL return EL2
  target/arm: Set SPSR_EL1.M correctly when nested virt is enabled
  target/arm: Trap registers when HCR_EL2.{NV,NV1} == {1,1}
  target/arm: Always use arm_pan_enabled() when checking if PAN is
    enabled
  target/arm: Don't honour PSTATE.PAN when HCR_EL2.{NV,NV1} == {1,1}
  target/arm: Treat LDTR* and STTR* as LDR/STR when NV,NV1 is 1,1
  target/arm: Handle FEAT_NV page table attribute changes
  target/arm: Add FEAT_NV to max, neoverse-n2, neoverse-v1 CPUs
  target/arm: Handle HCR_EL2 accesses for FEAT_NV2 bits
  target/arm: Implement VNCR_EL2 register
  target/arm: Handle FEAT_NV2 changes to when SPSR_EL1.M reports EL2
  target/arm: Handle FEAT_NV2 redirection of SPSR_EL2, ELR_EL2, ESR_EL2,
    FAR_EL2
  target/arm: Implement FEAT_NV2 redirection of sysregs to RAM
  target/arm: Report VNCR_EL2 based faults correctly
  target/arm: Mark up VNCR offsets (offsets 0x0..0xff)
  target/arm: Mark up VNCR offsets (offsets 0x100..0x160)
  target/arm: Mark up VNCR offsets (offsets 0x168..0x1f8)
  target/arm: Mark up VNCR offsets (offsets >= 0x200, except GIC)
  hw/intc/arm_gicv3_cpuif: Mark up VNCR offsets for GIC CPU registers
  target/arm: Report HCR_EL2.{NV,NV1,NV2} in cpu dumps
  target/arm: Enhance CPU_LOG_INT to show SPSR on AArch64
    exception-entry
  target/arm: Add FEAT_NV2 to max, neoverse-n2, neoverse-v1 CPUs

 docs/system/arm/emulation.rst  |   2 +
 target/arm/cpregs.h            |  54 ++++-
 target/arm/cpu-features.h      |  10 +
 target/arm/cpu.h               |  24 ++-
 target/arm/syndrome.h          |  20 +-
 target/arm/tcg/translate.h     |  16 +-
 hw/intc/arm_gicv3_cpuif.c      |  28 ++-
 target/arm/cpu.c               |   8 +-
 target/arm/debug_helper.c      |  34 +++-
 target/arm/helper.c            | 360 ++++++++++++++++++++++++++++-----
 target/arm/ptw.c               |  21 ++
 target/arm/tcg/cpu64.c         |  11 +
 target/arm/tcg/hflags.c        |  30 ++-
 target/arm/tcg/op_helper.c     |  16 +-
 target/arm/tcg/tlb_helper.c    |  27 ++-
 target/arm/tcg/translate-a64.c | 162 +++++++++++++--
 16 files changed, 725 insertions(+), 98 deletions(-)

-- 
2.34.1
Re: [PATCH 00/35] target/arm: Implement emulation of nested virtualization
Posted by Miguel Luis 11 months, 1 week ago
Hi Peter,

> On 18 Dec 2023, at 10:32, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> This patchset adds support for emulating the Arm architectural features
> FEAT_NV and FEAT_NV2 which allow nested virtualization, i.e. where a
> hypervisor can run a guest which thinks it is running at EL2.
> 
> Nominally FEAT_NV is sufficient for this and FEAT_NV2 merely improves
> the performance in the nested-virt setup, but in practice hypervisors
> such as KVM are going to require FEAT_NV2 and not bother to support
> the FEAT_NV-only case, so I have implemented them one after the other
> in this single patchset.
> 
> The feature is essentially a collection of changes that allow the
> hypervisor to lie to the guest so that it thinks it is running in EL2
> when it's really at EL1. The best summary of what all the changes are
> is in section D8.11 "Nested virtualization" in the Arm ARM, but the
> short summary is:
> * EL2 system registers etc trap to EL2 rather than UNDEFing
> * ERET traps to EL2
> * the CurrentEL register reports "EL2" when NV is enabled
> * on exception entry, SPSR_EL1.M may report "EL2" as the EL the
>   exception was taken from
> * when HCR_EL2.NV1 is also set, then there are some extra tweaks
>   (NV1 == 1 means "guest thinks it is running with HCR_EL2.E2H == 0")
> * some AT S1 address translation insns can be trapped to EL2
> and FEAT_NV2 adds:
> * accesses to some system registers are transformed into memory
>   accesses instead of trapping to EL2
> * accesses to a few EL2 system registers are redirected to the
>   equivalent EL1 registers
> 
> This patchset is sufficient that you can run an L0 guest kernel that
> has support for FEAT_NV/FEAT_NV2 in its KVM code, and then
> inside that start a nested L1 guest that thinks it has EL2 access,
> and then run an inner-nested L2 guest under that that can get
> to running userspace code. To do that you'll need some not-yet-upstream
> patches for both Linux and kvmtool:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.8-nv2-only
> https://gitlab.arm.com/linux-arm/kvmtool/-/commits/nv-v6.6
> 
> You'll also want to turn off SVE and SME emulation in QEMU
> (-cpu max,sve=off,sme=off) because at the moment the KVM patchset
> doesn't handle SVE and nested-virt together (the other option
> is to hack kvmtool to make it not ask for both at once, but this
> is easier).
> 
> (kvmtool is needed here to run the L1 because QEMU itself as a VMM
> doesn't yet support asking KVM for an EL2 guest.)
> 
> The first three patches in the series aren't strictly part of FEAT_NV:
> * patch 1 is already reviewed; I put it here to avoid having
>   to deal with textual conflicts between it and this series
> * patch 2 sets CTR_EL0.{IDC,DIC} for '-cpu max', which is a good
>   idea anyway and also works around what Marc Z and I think is
>   a KVM bug that otherwise causes boot of the L2 kernel to hang
> * patch 3 is a GIC bug which is not FEAT_NV specific but for
>   some reason only manifests when booting an L1 kernel under NV
> 

I've successfully replicated this setup and reached inner-nested L2 guest 
userspace.

FWIW, feel free to add

Tested-by: Miguel Luis <miguel.luis@oracle.com>

(I've been working on having QEMU ask KVM for an EL2 guest on top of this
series, although there is still some debugging to do.)

Thank you!

Miguel
