Hi all,
This series moves the stage-2 page-table management of non-protected
guests to EL2 when pKVM is enabled. This is only intended as an
incremental step towards a 'feature-complete' pKVM; a lot more still
needs to come on top.
With this series applied, pKVM provides near-parity with standard KVM
from a functional perspective, all while Linux no longer touches the
stage-2 page-tables itself at EL1. The majority of mm-related KVM
features work out of the box, including MMU notifiers, dirty logging
and RO memslots. There are however two gotchas:
- We don't support mapping devices into guests: this requires
additional hypervisor support for tracking the 'state' of devices,
which will come in a later series. No device assignment until then.
- Stage-2 mappings are forced to page-granularity even when backed by a
huge page, for the sake of simplicity of this series. I'm only aiming
at functional parity-ish (from userspace's PoV) for now; support for
huge pages can be added on top later as a perf improvement.
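
To put a rough number on the second gotcha, here is a small user-space
sketch of what page-granularity costs for one huge page. It assumes
4 KiB base pages and 2 MiB huge pages, and map_one_page() is a
hypothetical stand-in that only counts calls. It is arithmetic only and
says nothing about how the fault path in the series is structured.

```c
/* User-space sketch only: all names and sizes here are assumptions. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u              /* assumes 4 KiB base pages */
#define SKETCH_PMD_SIZE  (2u * 1024 * 1024) /* assumes 2 MiB huge pages */

static size_t nr_map_calls;

/* Hypothetical per-page map operation; it only counts invocations. */
static int map_one_page(uint64_t gfn, uint64_t pfn)
{
	(void)gfn;
	(void)pfn;
	nr_map_calls++;
	return 0;
}

/* Map [gpa, gpa + size) one base page at a time, never as a block. */
static int map_range_page_granular(uint64_t gpa, uint64_t pa, uint64_t size)
{
	uint64_t off;

	assert(size % SKETCH_PAGE_SIZE == 0);
	for (off = 0; off < size; off += SKETCH_PAGE_SIZE) {
		int ret = map_one_page((gpa + off) / SKETCH_PAGE_SIZE,
				       (pa + off) / SKETCH_PAGE_SIZE);
		if (ret)
			return ret;
	}
	return 0;
}
```

Under these assumptions, covering one huge page takes 512 page-sized
mappings where a block mapping would need a single entry.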
Please note that the approach taken in this series is a departure from
the existing out-of-tree implementation in Android, which relies on
long-term GUP pinning even for non-protected guests, something I felt
was a non-starter for upstream. Android will obviously migrate to the
upstream implementation when that lands.
The last two patches are likely to be the most 'controversial' ones as
they do the integration into KVM (please see the notes in the commit
messages), but obviously feedback is more than welcome throughout. The
overall idea is to use the KVM/arm pgtable API as the 'contract' between
the standard KVM and pKVM backend implementations. With pKVM, we use
wrappers at EL1 that have the exact same prototypes as the
kvm_pgtable_stage2_*() functions but end up simply doing hypercalls,
which allows easy-ish plumbing in kvm/mmu.c. The pKVM EL1 helpers use a
simple RB-tree to maintain the GFN->PFN mappings as we need to be able
to walk those from EL1. For the record, I have tried a few other
data-structures. A maple tree doesn't lend itself very well to the
use-case as we need to pre-allocate the nodes from outside the mmu_lock
critical section. I also considered using a 'dummy' s2 page-table at EL1
purely for tracking purposes, and hooking deep into pgtable.c to issue
hypercalls to notify pKVM directly from there. That has very nice
properties, but gets horrible as we try to elide the TLB invalidation /
CMO logic from the dummy page-table.
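
As a rough illustration of the 'contract' idea, the user-space sketch
below gives two backends the same prototype and picks one at the call
site. Every name in it is hypothetical and none of it is taken from the
patches: the hypercall is a counter, and a plain unbalanced binary
search tree stands in for the kernel RB-tree.

```c
/* User-space sketch only: every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One tracked GFN->PFN mapping; a plain BST stands in for the RB-tree. */
struct mapping {
	uint64_t gfn, pfn;
	struct mapping *left, *right;
};

struct s2_pgt {
	bool is_pkvm;            /* selects the backend */
	struct mapping *root;    /* EL1-side tracking, pKVM backend only */
	unsigned int hypercalls; /* number of stubbed hypercalls issued */
	unsigned int local_ptes; /* number of stubbed EL1 page-table writes */
};

/* Stand-in for trapping to EL2, where the real mapping would be installed. */
static int fake_hypercall_share_guest(struct s2_pgt *pgt, uint64_t pfn,
				      uint64_t gfn)
{
	(void)pfn;
	(void)gfn;
	pgt->hypercalls++;
	return 0;
}

/* Standard backend: EL1 updates the stage-2 page-table itself (stubbed). */
static int std_stage2_map(struct s2_pgt *pgt, uint64_t gfn, uint64_t pfn)
{
	(void)gfn;
	(void)pfn;
	pgt->local_ptes++;
	return 0;
}

/* pKVM backend: same prototype, but a hypercall plus EL1 bookkeeping. */
static int pkvm_stage2_map(struct s2_pgt *pgt, uint64_t gfn, uint64_t pfn)
{
	struct mapping **link = &pgt->root, *m;
	int ret;

	while (*link) {
		if (gfn == (*link)->gfn)
			return -1; /* already tracked */
		link = gfn < (*link)->gfn ? &(*link)->left : &(*link)->right;
	}

	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;

	ret = fake_hypercall_share_guest(pgt, pfn, gfn);
	if (ret) {
		free(m);
		return ret;
	}

	m->gfn = gfn;
	m->pfn = pfn;
	*link = m;
	return 0;
}

/* Callers see a single entry point regardless of the backend in use. */
static int stage2_map(struct s2_pgt *pgt, uint64_t gfn, uint64_t pfn)
{
	assert(pgt != NULL);
	return pgt->is_pkvm ? pkvm_stage2_map(pgt, gfn, pfn)
			    : std_stage2_map(pgt, gfn, pfn);
}

/* Walk the EL1-side tree to recover the PFN backing a given GFN. */
static bool pkvm_lookup(const struct s2_pgt *pgt, uint64_t gfn, uint64_t *pfn)
{
	const struct mapping *m = pgt->root;

	while (m) {
		if (gfn == m->gfn) {
			*pfn = m->pfn;
			return true;
		}
		m = gfn < m->gfn ? m->left : m->right;
	}
	return false;
}
```

The sketch allocates its tree node inside the map call for brevity; it
does not model the mmu_lock constraint discussed above.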
The series is organized as follows:
- Patches 01 to 04 move the host ownership state tracking from the
host's stage-2 page-table to the hypervisor's vmemmap. This avoids
fragmenting the host stage-2 for shared pages, where fragmentation is
only needed to store an annotation in the SW bits of the corresponding
PTE. All pages mapped into non-protected guests are shared from pKVM's
PoV, so the cost of stage-2 fragmentation would otherwise increase
massively as we start tracking that at EL2. Note that these patches
also help with the existing sharing for e.g. FF-A, so they could
possibly be merged separately from the rest of the series.
- Patches 05 to 07 implement a minor refactoring of the pgtable code to
ease the integration of the pKVM MMU later on.
- Patches 08 to 16 introduce all the infrastructure needed on the pKVM
side for handling guest stage-2 page-tables at EL2.
- Patches 17 and 18 plumb the newly introduced pKVM support into
KVM/arm64.
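
For readers less familiar with the vmemmap idea behind patches 01 to
04, here is a minimal user-space sketch of per-page ownership state
kept in an array indexed by PFN rather than in PTE software bits. The
struct layout, enum values and function names are assumptions made for
illustration; they are not the definitions from the series.

```c
/* User-space sketch only: names and layout are illustrative assumptions. */
#include <assert.h>
#include <stdint.h>

enum page_state {
	PAGE_OWNED = 0,        /* exclusively owned by the host */
	PAGE_SHARED_OWNED = 1, /* owned by the host, shared with someone else */
};

/* One small metadata entry per physical page, vmemmap-style. */
struct page_meta {
	uint16_t refcount;
	uint8_t order;
	uint8_t host_state; /* an enum page_state, stored outside any PTE */
};

#define NR_PAGES 1024u /* arbitrary size for the sketch */
static struct page_meta vmemmap[NR_PAGES];

static struct page_meta *pfn_to_meta(uint64_t pfn)
{
	assert(pfn < NR_PAGES);
	return &vmemmap[pfn];
}

/* Share one page: a state update in the array, no page-table write. */
static int host_share_page(uint64_t pfn)
{
	struct page_meta *p = pfn_to_meta(pfn);

	if (p->host_state != PAGE_OWNED)
		return -1;
	p->host_state = PAGE_SHARED_OWNED;
	return 0;
}

/* Undo the share, again without touching any page-table entry. */
static int host_unshare_page(uint64_t pfn)
{
	struct page_meta *p = pfn_to_meta(pfn);

	if (p->host_state != PAGE_SHARED_OWNED)
		return -1;
	p->host_state = PAGE_OWNED;
	return 0;
}
```

Because the state lives next to the page's other metadata, sharing a
single page in the middle of a large block mapping leaves that block
mapping intact in this model.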
Patches based on 6.12-rc5, tested on Pixel 6 and QEMU.
Thanks!
Quentin
Marc Zyngier (1):
KVM: arm64: Introduce pkvm_vcpu_{load,put}()
Quentin Perret (17):
KVM: arm64: Change the layout of enum pkvm_page_state
KVM: arm64: Move enum pkvm_page_state to memory.h
KVM: arm64: Make hyp_page::order a u8
KVM: arm64: Move host page ownership tracking to the hyp vmemmap
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_mkyoung
KVM: arm64: Pass walk flags to kvm_pgtable_stage2_relax_perms
KVM: arm64: Make kvm_pgtable_stage2_init() a static inline function
KVM: arm64: Introduce {get,put}_pkvm_hyp_vm() helpers
KVM: arm64: Introduce __pkvm_host_share_guest()
KVM: arm64: Introduce __pkvm_host_unshare_guest()
KVM: arm64: Introduce __pkvm_host_relax_guest_perms()
KVM: arm64: Introduce __pkvm_host_wrprotect_guest()
KVM: arm64: Introduce __pkvm_host_test_clear_young_guest()
KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
KVM: arm64: Introduce __pkvm_tlb_flush_vmid()
KVM: arm64: Introduce the EL1 pKVM MMU
KVM: arm64: Plumb the pKVM MMU in KVM
arch/arm64/include/asm/kvm_asm.h | 9 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/kvm_pgtable.h | 42 ++-
arch/arm64/include/asm/kvm_pkvm.h | 28 ++
arch/arm64/kvm/arm.c | 23 +-
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 6 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 38 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 43 ++-
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 15 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 210 ++++++++++-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 333 ++++++++++++++++--
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 +-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 55 +++
arch/arm64/kvm/hyp/nvhe/setup.c | 7 +-
arch/arm64/kvm/hyp/pgtable.c | 13 +-
arch/arm64/kvm/mmu.c | 110 ++++--
arch/arm64/kvm/pkvm.c | 194 ++++++++++
arch/arm64/kvm/vgic/vgic-v3.c | 6 +-
18 files changed, 1007 insertions(+), 143 deletions(-)
--
2.47.0.163.g1226f6d8fa-goog