[PATCH v10 00/43] arm64: Support for Arm CCA in KVM

Steven Price posted 43 patches 3 weeks, 3 days ago
[PATCH v10 00/43] arm64: Support for Arm CCA in KVM
Posted by Steven Price 3 weeks, 3 days ago
This series adds support for running protected VMs using KVM under the
Arm Confidential Compute Architecture (CCA).

The related guest support was merged for v6.14-rc1 so you no longer need
that separately.

There are a few changes since v9, many thanks for the review
comments. The highlights are below, and individual patches have a changelog.

 * Fix a potential issue where the host was walking the stage 2 page tables on
   realm destruction. If the RMM didn't zero them when undelegated (which it
   isn't required to) then the kernel would attempt to walk the junk values
   and crash.

 * Avoid RCU stall warnings by correctly setting may_block in
   kvm_free_stage2_pgd().

 * Rebased onto v6.17-rc1.

Things to note:

 * The magic numbers for capabilities and ioctls have been updated. So
   you'll need to update your VMM. See below for the updated kvmtool branch.

 * This series doesn't attempt to integrate with the guest-memfd changes that
   are being discussed (see below).

 * Vishal raised an important question about what to do in the case of
   undelegate failures (also see below).

guest-memfd
===========

This series still implements the same API as previous versions, which means
only the private memory of the guest is backed by guest-memfd, and the shared
portion is accessed using VMAs from the VMM. This approach is compatible with
the proposed changes to support guest-memfd backing shared memory but would
require the VMM to mmap() the shared memory at the appropriate time so that
KVM can access via the VMM's VMAs.

Changing to access both shared and private through the guest-memfd API
shouldn't be difficult and could be done in a backwards compatible manner
(with the VMM choosing which method to use). It's not clear to me whether we
want to hold things up waiting for the full guest-memfd (and only support that
uAPI) or if it would be fine to just support both approaches and let VMMs
switch when they are ready.

There is a slight wrinkle around the 'populate' uAPI
(KVM_CAP_ARM_RME_POPULATE). At the moment this expects 'double mapping' - the
contents to be populated should be in the shared memory region, but the
physical pages for the private region are also allocated from the guest_memfd.
Arm CCA doesn't support a true 'in-place' conversion so this is required in
some form. However it may make sense to allow the populate call to take a user
space pointer for the data to be copied rather than require it to be in the
shared memory region. We do already have a "flags" argument so there's also
scope for easily supporting both options. The current approach integrates
quite nicely in kvmtool with the existing logic for setting up a normal
(non-CoCo) guest. But I'm aware kvmtool isn't entirely representative of what
VMMs do, so I'd welcome feedback on what a good populate uAPI would look like.

Undelegation failure
====================

When the kernel is tearing down a CCA VM, it will attempt to "undelegate"
pages allowing them to be used by the Normal World again. Assuming no bugs
then these operations will always succeed. However, the RMM has the ability to
return a failure if it considers these pages to still be in use. A failure
like this is always a bug in the code, but it would be good to be able to
handle these without resorting to an immediate BUG_ON().

The current approach in the code is to simply WARN() and use get_page() to
take an extra reference to the page to stop it being reused (as it will
immediately cause a GPF if the Normal World attempts to access the page).

However this will cause problems when we start supporting huge pages. It may
be possible to use the HWPOISON infrastructure to flag the page as unusable
(although note there is no way of 'scrubbing' a page to recover from this
situation). The other option is to just accept that this "should never happen"
and upgrade the WARN() to a BUG_ON() so we don't have to deal with the
situation. A third option is to do nothing (other than WARN) and let the
system run until the inevitable GPF which will probably bring it down (and
hope there's enough time for the user to save work etc).

I'd welcome thoughts on the best solution.

====================

The ABI to the RMM (the RMI) is based on RMM v1.0-rel0 specification[1].

This series is based on v6.17-rc1. It is also available as a git
repository:

https://gitlab.arm.com/linux-arm/linux-cca cca-host/v10

Work in progress changes for kvmtool are available from the git
repository below:

https://gitlab.arm.com/linux-arm/kvmtool-cca cca/v8

[1] https://developer.arm.com/documentation/den0137/1-0rel0/
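
A minimal sketch of fetching the two branches named above, for anyone who
wants to try the series. The shallow clone and the target directory names
(linux-cca, kvmtool-cca) are illustrative choices, not something the series
requires, and the run() helper is hypothetical: it only prints each command
unless RUN=1 is set in the environment.

```shell
#!/bin/sh
# Print each command by default; set RUN=1 in the environment to execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Host kernel with this series applied (branch cca-host/v10).
run git clone --depth 1 --branch cca-host/v10 \
    https://gitlab.arm.com/linux-arm/linux-cca linux-cca

# Work-in-progress kvmtool matching the updated capability and ioctl
# numbers (branch cca/v8).
run git clone --depth 1 --branch cca/v8 \
    https://gitlab.arm.com/linux-arm/kvmtool-cca kvmtool-cca
```

Running it as "RUN=1 sh fetch.sh" (script name assumed) performs the clones.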

Jean-Philippe Brucker (7):
  arm64: RME: Propagate number of breakpoints and watchpoints to
    userspace
  arm64: RME: Set breakpoint parameters through SET_ONE_REG
  arm64: RME: Initialize PMCR.N with number counter supported by RMM
  arm64: RME: Propagate max SVE vector length from RMM
  arm64: RME: Configure max SVE vector length for a Realm
  arm64: RME: Provide register list for unfinalized RME RECs
  arm64: RME: Provide accurate register list

Joey Gouly (2):
  arm64: RME: allow userspace to inject aborts
  arm64: RME: support RSI_HOST_CALL

Steven Price (31):
  arm64: RME: Handle Granule Protection Faults (GPFs)
  arm64: RME: Add SMC definitions for calling the RMM
  arm64: RME: Add wrappers for RMI calls
  arm64: RME: Check for RME support at KVM init
  arm64: RME: Define the user ABI
  arm64: RME: ioctls to create and configure realms
  KVM: arm64: Allow passing machine type in KVM creation
  arm64: RME: RTT tear down
  arm64: RME: Allocate/free RECs to match vCPUs
  KVM: arm64: vgic: Provide helper for number of list registers
  arm64: RME: Support for the VGIC in realms
  KVM: arm64: Support timers in realm RECs
  arm64: RME: Allow VMM to set RIPAS
  arm64: RME: Handle realm enter/exit
  arm64: RME: Handle RMI_EXIT_RIPAS_CHANGE
  KVM: arm64: Handle realm MMIO emulation
  arm64: RME: Allow populating initial contents
  arm64: RME: Runtime faulting of memory
  KVM: arm64: Handle realm VCPU load
  KVM: arm64: Validate register access for a Realm VM
  KVM: arm64: Handle Realm PSCI requests
  KVM: arm64: WARN on injected undef exceptions
  arm64: Don't expose stolen time for realm guests
  arm64: RME: Always use 4k pages for realms
  arm64: RME: Prevent Device mappings for Realms
  arm_pmu: Provide a mechanism for disabling the physical IRQ
  arm64: RME: Enable PMU support with a realm guest
  arm64: RME: Hide KVM_CAP_READONLY_MEM for realm guests
  KVM: arm64: Expose support for private memory
  KVM: arm64: Expose KVM_ARM_VCPU_REC to user space
  KVM: arm64: Allow activating realms

Suzuki K Poulose (3):
  kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h
  kvm: arm64: Don't expose debug capabilities for realm guests
  arm64: RME: Allow checking SVE on VM instance

 Documentation/virt/kvm/api.rst       |   92 +-
 arch/arm64/include/asm/kvm_emulate.h |   38 +
 arch/arm64/include/asm/kvm_host.h    |   13 +-
 arch/arm64/include/asm/kvm_rme.h     |  135 ++
 arch/arm64/include/asm/rmi_cmds.h    |  508 ++++++++
 arch/arm64/include/asm/rmi_smc.h     |  269 ++++
 arch/arm64/include/asm/virt.h        |    1 +
 arch/arm64/include/uapi/asm/kvm.h    |   49 +
 arch/arm64/kvm/Kconfig               |    1 +
 arch/arm64/kvm/Makefile              |    2 +-
 arch/arm64/kvm/arch_timer.c          |   44 +-
 arch/arm64/kvm/arm.c                 |  169 ++-
 arch/arm64/kvm/guest.c               |  108 +-
 arch/arm64/kvm/hypercalls.c          |    4 +-
 arch/arm64/kvm/inject_fault.c        |    5 +-
 arch/arm64/kvm/mmio.c                |   16 +-
 arch/arm64/kvm/mmu.c                 |  209 ++-
 arch/arm64/kvm/pmu-emul.c            |    6 +
 arch/arm64/kvm/psci.c                |   30 +
 arch/arm64/kvm/reset.c               |   23 +-
 arch/arm64/kvm/rme-exit.c            |  207 +++
 arch/arm64/kvm/rme.c                 | 1746 ++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |   53 +-
 arch/arm64/kvm/vgic/vgic-init.c      |    2 +-
 arch/arm64/kvm/vgic/vgic.c           |   61 +-
 arch/arm64/mm/fault.c                |   31 +-
 drivers/perf/arm_pmu.c               |   15 +
 include/kvm/arm_arch_timer.h         |    2 +
 include/kvm/arm_pmu.h                |    4 +
 include/kvm/arm_psci.h               |    2 +
 include/linux/perf/arm_pmu.h         |    5 +
 include/uapi/linux/kvm.h             |   29 +-
 32 files changed, 3778 insertions(+), 101 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_rme.h
 create mode 100644 arch/arm64/include/asm/rmi_cmds.h
 create mode 100644 arch/arm64/include/asm/rmi_smc.h
 create mode 100644 arch/arm64/kvm/rme-exit.c
 create mode 100644 arch/arm64/kvm/rme.c

-- 
2.43.0
Re: [PATCH v10 00/43] arm64: Support for Arm CCA in KVM
Posted by Gavin Shan 1 week, 2 days ago
On 8/21/25 12:55 AM, Steven Price wrote:
> This series adds support for running protected VMs using KVM under the
> Arm Confidential Compute Architecture (CCA).
> 
> The related guest support was merged for v6.14-rc1 so you no longer need
> that separately.
> 
> There are a few changes since v9, many thanks for the review
> comments. The highlights are below, and individual patches have a changelog.
> 
>   * Fix a potential issue where the host was walking the stage 2 page tables on
>     realm destruction. If the RMM didn't zero them when undelegated (which it
>     isn't required to) then the kernel would attempt to walk the junk values
>     and crash.
> 
>   * Avoid RCU stall warnings by correctly setting may_block in
>     kvm_free_stage2_pgd().
> 
>   * Rebased onto v6.17-rc1.
> 
> Things to note:
> 
>   * The magic numbers for capabilities and ioctls have been updated. So
>     you'll need to update your VMM. See below for the updated kvmtool branch.
> 
>   * This series doesn't attempt to integrate with the guest-memfd changes that
>     are being discussed (see below).
> 
>   * Vishal raised an important question about what to do in the case of
>     undelegate failures (also see below).
> 

[...]

I tried booting a guest with the following combination of components. Nothing
obvious went wrong apart from several long-standing issues (described below), so
feel free to add:

Tested-by: Gavin Shan <gshan@redhat.com>

Combination
===========
host.tf-a        https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git      (v2.13-rc0)
host.tf-rmm      https://git.codelinaro.org/linaro/dcap/rmm                       (cca/v8)
host.edk2        git@github.com:tianocore/edk2.git                                (edk2-stable202411)
host.kernel      git@github.com:gwshan/linux.git                                  (cca/host-v10) (this series)
host.qemu        https://git.qemu.org/git/qemu.git                                (stable-9.2)
host.buildroot   https://github.com/buildroot/buildroot                           (master)
guest.qemu       https://git.codelinaro.org/linaro/dcap/qemu.git                  (cca/latest) (with linux-headers sync'ed)
guest.kvmtool    https://gitlab.arm.com/linux-arm/kvmtool-cca                     (cca/latest)
guest.edk2       https://git.codelinaro.org/linaro/dcap/edk2                      (cca/latest)
guest.kernel     git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (v6.17-rc3)
guest.buildroot  https://github.com/buildroot/buildroot                           (master)

Script to start the host
========================
gshan@nvidia-grace-hopper-01:~/sandbox/qemu/host$ cat start.sh
#!/bin/sh
HOST_PATH=/home/gshan/sandbox/qemu/host
GUEST_PATH=/home/gshan/sandbox/qemu/guest
IF_UP_SCRIPT=/etc/qemu-ifup-gshan
IF_DOWN_SCRIPT=/etc/qemu-ifdown-gshan

sudo ${HOST_PATH}/qemu/build/qemu-system-aarch64                        \
-M virt,virtualization=on,secure=on,gic-version=3,acpi=off              \
-cpu max,x-rme=on -m 3G -smp 8                                          \
-serial mon:stdio -monitor none -nographic -nodefaults                  \
-bios ${HOST_PATH}/tf-a/flash.bin                                       \
-kernel ${HOST_PATH}/linux/arch/arm64/boot/Image                        \
-initrd ${HOST_PATH}/buildroot/output/images/rootfs.cpio.xz             \
-device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1                   \
-device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2                   \
-device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3                   \
-device pcie-root-port,bus=pcie.0,chassis=4,id=pcie.4                   \
-device virtio-9p-device,fsdev=shr0,mount_tag=shr0                      \
-fsdev local,security_model=none,path=${GUEST_PATH},id=shr0             \
-netdev tap,id=tap1,script=${IF_UP_SCRIPT},downscript=${IF_DOWN_SCRIPT} \
-device virtio-net-pci,bus=pcie.2,netdev=tap1,mac=b8:3f:d2:1d:3e:f1

Script to start the guest
=========================
gshan@nvidia-grace-hopper-01:~/sandbox/qemu/guest$ cat start_full.sh
#!/bin/sh
key="VGhlIHJlYWxtIGd1ZXN0IHBlcnNvbmFsaXphdGlvbiBrZXkgaW4gZm9ybWF0IG9mIGJhc2U2NCAgICAgICAgIA=="
IF_UP_SCRIPT=/etc/qemu-ifup
IF_DOWN_SCRIPT=/etc/qemu-ifdown

qemu-system-aarch64 -enable-kvm \
-object rme-guest,id=rme0,measurement-algorithm=sha512,personalization-value=${key} \
-M virt,gic-version=3,confidential-guest-support=rme0                               \
-cpu host -smp 4 -m 2G -boot c                                                      \
-serial mon:stdio -monitor none -nographic -nodefaults                              \
-bios /mnt/edk2/Build/ArmVirtQemu-AARCH64/RELEASE_GCC5/FV/QEMU_EFI.fd               \
-device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1                               \
-device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2                               \
-drive file=/mnt/rhel10.qcow2,if=none,id=drive0                                     \
-device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4             \
-netdev tap,id=tap0,script=${IF_UP_SCRIPT},downscript=${IF_DOWN_SCRIPT}             \
-device virtio-net-pci,bus=pcie.2,netdev=tap0,mac=b8:3f:d2:1d:3e:f9
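
For reference, the personalization value in the guest script above decodes to
a readable label padded with trailing spaces to exactly 64 bytes. A minimal
sketch of how such a value could be generated is below. The space padding is
inferred from decoding that key rather than taken from QEMU documentation, the
"label" variable is only illustrative, and a base64 utility is assumed to be
installed.

```shell
#!/bin/sh
# Human-readable label; it must not exceed 64 bytes.
label="The realm guest personalization key in format of base64"

# Left-justify the label in a 64-byte field (padding with spaces), then
# base64-encode it. tr removes the line wrapping some base64 tools add.
key=$(printf '%-64s' "$label" | base64 | tr -d '\n')

echo "$key"
```

The resulting string can be used in place of the hard-coded key= assignment.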

Issues
======
1. virtio-iommu isn't supported by QEMU. The guest kernel gets stuck at IOMMU
probe time, when it queries the endpoint's capabilities by sending a request over
the virtio device's vring and expects QEMU to supply the response. QEMU never sees
the request because it uses the wrong IOMMU address translation: virtio-iommu
provides its own address translation operations that override the platform ones,
so the DMA address (in the shared space) isn't recognized properly. The
information has been shared with Jean.

2. The 'reboot' command doesn't work in the guest. QEMU complains that some
registers aren't accessible. I haven't sorted out a workaround for this.

3. The HMP command 'dump-guest-memory' causes QEMU to exit abnormally. The cause is
that the realm is reconfigured when the VM is resumed after the guest memory is
dumped. The host rejects the reconfiguration, leading to QEMU's abnormal exit. The
fix would be to avoid reconfiguring the realm. The issue was originally reported by
Fujitsu and all the information has been shared with Fujitsu.

4. In QEMU, the CPU property 'kvm-no-adjvtime' can't be set to off. Otherwise, QEMU
tries to access the timer registers, which have been hidden by the host. So we need
to pass the following parameter to QEMU to bypass it: "-cpu host,kvm-no-adjvtime=on".

5. I didn't try virtio-mem or the memory balloon, which aren't expected to work,
especially when guest memory is hot-added or hot-removed.
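
As a sketch of the workaround in issue 4, the -cpu argument of the guest start
script can be built from variables so the property is easy to spot. The
variable names (CPU_MODEL, CPU_FLAGS, CPU_ARG) are illustrative and not part
of the original script.

```shell
#!/bin/sh
# CPU model used by the guest start script.
CPU_MODEL=host
# Keep kvm-no-adjvtime on so QEMU does not touch the hidden timer registers.
CPU_FLAGS=kvm-no-adjvtime=on

CPU_ARG="${CPU_MODEL},${CPU_FLAGS}"

# Print the option that replaces "-cpu host" on the QEMU command line.
echo "-cpu ${CPU_ARG}"
```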

Thanks,
Gavin