From: Quentin Perret <qperret@google.com>
Date: Wed, 18 Dec 2024 19:40:58 +0000
Subject: [PATCH v4 17/18] KVM: arm64: Introduce the EL1 pKVM MMU
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
    Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org
Message-ID: <20241218194059.3670226-18-qperret@google.com>
In-Reply-To: <20241218194059.3670226-1-qperret@google.com>
References: <20241218194059.3670226-1-qperret@google.com>
X-Mailer: git-send-email 2.47.1.613.gc27f4b7a9f-goog
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce a set of helper functions for manipulating the pKVM guest
stage-2 page-tables from EL1 using pKVM's HVC interface.

Each helper has an exact one-to-one correspondence with the traditional
kvm_pgtable_stage2_*() functions from pgtable.c, with a strictly
matching prototype. This will simplify the plumbing in mmu.c later in
the series.

These callbacks track the gfn->pfn mappings in a simple rb-tree indexed
by IPA, in lieu of a page-table. This rb-tree is kept in sync with
pKVM's state and is protected by the mmu_lock, like a traditional
stage-2 page-table.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_host.h    |   1 +
 arch/arm64/include/asm/kvm_pgtable.h |  23 +--
 arch/arm64/include/asm/kvm_pkvm.h    |  26 ++++
 arch/arm64/kvm/pkvm.c                | 201 +++++++++++++++++++++++++++
 4 files changed, 242 insertions(+), 9 deletions(-)
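A note on the matching prototypes: because each helper mirrors its
kvm_pgtable_stage2_*() counterpart exactly, a caller can select a
backend with a trivial wrapper and no argument massaging. A minimal
sketch of the kind of plumbing mmu.c could use (stage2_map_dispatch()
is a hypothetical name, not part of this series; the real wiring lands
later in the series):

static int stage2_map_dispatch(struct kvm_pgtable *pgt, u64 addr, u64 size,
			       u64 phys, enum kvm_pgtable_prot prot, void *mc,
			       enum kvm_pgtable_walk_flags flags)
{
	/* pKVM guests use the rb-tree backend, everything else walks a real page-table */
	if (is_protected_kvm_enabled())
		return pkvm_pgtable_stage2_map(pgt, addr, size, phys, prot, mc, flags);

	return kvm_pgtable_stage2_map(pgt, addr, size, phys, prot, mc, flags);
}
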
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1246f1d01dbf..f23f4ea9ec8b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -85,6 +85,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 struct kvm_hyp_memcache {
 	phys_addr_t head;
 	unsigned long nr_pages;
+	struct pkvm_mapping *mapping; /* only used from EL1 */
 };
 
 static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 04418b5e3004..6b9d274052c7 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -412,15 +412,20 @@ static inline bool kvm_pgtable_walk_lock_held(void)
  * be used instead of block mappings.
  */
 struct kvm_pgtable {
-	u32					ia_bits;
-	s8					start_level;
-	kvm_pteref_t				pgd;
-	struct kvm_pgtable_mm_ops		*mm_ops;
-
-	/* Stage-2 only */
-	struct kvm_s2_mmu			*mmu;
-	enum kvm_pgtable_stage2_flags		flags;
-	kvm_pgtable_force_pte_cb_t		force_pte_cb;
+	union {
+		struct rb_root				pkvm_mappings;
+		struct {
+			u32				ia_bits;
+			s8				start_level;
+			kvm_pteref_t			pgd;
+			struct kvm_pgtable_mm_ops	*mm_ops;
+
+			/* Stage-2 only */
+			enum kvm_pgtable_stage2_flags	flags;
+			kvm_pgtable_force_pte_cb_t	force_pte_cb;
+		};
+	};
+	struct kvm_s2_mmu			*mmu;
 };
 
 /**
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index cd56acd9a842..65f988b6fe0d 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -137,4 +137,30 @@ static inline size_t pkvm_host_sve_state_size(void)
 		       SVE_SIG_REGS_SIZE(sve_vq_from_vl(kvm_host_sve_max_vl)));
 }
 
+struct pkvm_mapping {
+	struct rb_node node;
+	u64 gfn;
+	u64 pfn;
+};
+
+int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
+			     struct kvm_pgtable_mm_ops *mm_ops);
+void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
+int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
+			    enum kvm_pgtable_prot prot, void *mc,
+			    enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
+int pkvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
+bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold);
+int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+				    enum kvm_pgtable_walk_flags flags);
+void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+				 enum kvm_pgtable_walk_flags flags);
+int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			      struct kvm_mmu_memory_cache *mc);
+void pkvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
+kvm_pte_t *pkvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+					       enum kvm_pgtable_prot prot, void *mc,
+					       bool force_pte);
 #endif /* __ARM64_KVM_PKVM_H__ */
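For completeness, a point lookup in this gfn-indexed tree would use the
rb_find() half of the same rbtree API. This is an illustrative sketch
only; cmp_gfn_key() and pkvm_find_mapping() are hypothetical names that
do not exist in this series:

static int cmp_gfn_key(const void *key, const struct rb_node *node)
{
	u64 gfn = *(const u64 *)key;
	struct pkvm_mapping *m = rb_entry(node, struct pkvm_mapping, node);

	if (gfn < m->gfn)
		return -1;
	if (gfn > m->gfn)
		return 1;
	return 0;
}

/* Must be called with the mmu_lock held, as for a real stage-2 walk. */
static struct pkvm_mapping *pkvm_find_mapping(struct kvm_pgtable *pgt, u64 gfn)
{
	struct rb_node *node = rb_find(&gfn, &pgt->pkvm_mappings, cmp_gfn_key);

	return node ? rb_entry(node, struct pkvm_mapping, node) : NULL;
}
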
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 85117ea8f351..930b677eb9b0 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -7,6 +7,7 @@
 #include <linux/init.h>
 #include <linux/kmemleak.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_mmu.h>
 #include <linux/memblock.h>
 #include <linux/mutex.h>
 #include <linux/sort.h>
@@ -268,3 +269,203 @@ static int __init finalize_pkvm(void)
 	return ret;
 }
 device_initcall_sync(finalize_pkvm);
+
+static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
+{
+	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
+	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
+
+	if (a->gfn < b->gfn)
+		return -1;
+	if (a->gfn > b->gfn)
+		return 1;
+	return 0;
+}
+
+static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
+{
+	struct rb_node *node = root->rb_node, *prev = NULL;
+	struct pkvm_mapping *mapping;
+
+	while (node) {
+		mapping = rb_entry(node, struct pkvm_mapping, node);
+		if (mapping->gfn == gfn)
+			return node;
+		prev = node;
+		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
+	}
+
+	return prev;
+}
+
+/*
+ * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing
+ * of __map inline.
+ */
+#define for_each_mapping_in_range_safe(__pgt, __start, __end, __map)			\
+	for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings,	\
+							     ((__start) >> PAGE_SHIFT));\
+	     __tmp && ({								\
+			__map = rb_entry(__tmp, struct pkvm_mapping, node);		\
+			__tmp = rb_next(__tmp);						\
+			true;								\
+		       });								\
+	    )										\
+		if (__map->gfn < ((__start) >> PAGE_SHIFT))				\
+			continue;							\
+		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))				\
+			break;								\
+		else
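/*
 * Usage sketch for the iterator above (illustrative only, not part of
 * the diff): because __tmp is advanced before the body runs, the body
 * may erase and free the current node, e.g.:
 *
 *	struct pkvm_mapping *mapping;
 *
 *	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
 *		rb_erase(&mapping->node, &pgt->pkvm_mappings);
 *		kfree(mapping);
 *	}
 */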
+
+int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
+			     struct kvm_pgtable_mm_ops *mm_ops)
+{
+	pgt->pkvm_mappings = RB_ROOT;
+	pgt->mmu = mmu;
+
+	return 0;
+}
+
+void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+	struct pkvm_mapping *mapping;
+	struct rb_node *node;
+
+	if (!handle)
+		return;
+
+	node = rb_first(&pgt->pkvm_mappings);
+	while (node) {
+		mapping = rb_entry(node, struct pkvm_mapping, node);
+		kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+		node = rb_next(node);
+		rb_erase(&mapping->node, &pgt->pkvm_mappings);
+		kfree(mapping);
+	}
+}
+
+int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			    u64 phys, enum kvm_pgtable_prot prot,
+			    void *mc, enum kvm_pgtable_walk_flags flags)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	struct pkvm_mapping *mapping = NULL;
+	struct kvm_hyp_memcache *cache = mc;
+	u64 gfn = addr >> PAGE_SHIFT;
+	u64 pfn = phys >> PAGE_SHIFT;
+	int ret;
+
+	if (size != PAGE_SIZE)
+		return -EINVAL;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot);
+	if (ret) {
+		/* Is the gfn already mapped due to a racing vCPU? */
+		if (ret == -EPERM)
+			return -EAGAIN;
+	}
+
+	swap(mapping, cache->mapping);
+	mapping->gfn = gfn;
+	mapping->pfn = pfn;
+	WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm_mappings, cmp_mappings));
+
+	return ret;
+}
+
+int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+	struct pkvm_mapping *mapping;
+	int ret = 0;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
+		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+		if (WARN_ON(ret))
+			break;
+		rb_erase(&mapping->node, &pgt->pkvm_mappings);
+		kfree(mapping);
+	}
+
+	return ret;
+}
+
+int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+	struct pkvm_mapping *mapping;
+	int ret = 0;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
+		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
+		if (WARN_ON(ret))
+			break;
+	}
+
+	return ret;
+}
+
+int pkvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	struct pkvm_mapping *mapping;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
+		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
+
+	return 0;
+}
+
+bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64 size, bool mkold)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+	struct pkvm_mapping *mapping;
+	bool young = false;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
+		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
+					   mkold);
+
+	return young;
+}
+
+int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
+				    enum kvm_pgtable_walk_flags flags)
+{
+	return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
+}
+
+void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
+				 enum kvm_pgtable_walk_flags flags)
+{
+	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
+}
+
+void pkvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
+{
+	WARN_ON_ONCE(1);
+}
+
+kvm_pte_t *pkvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt, u64 phys, s8 level,
+					       enum kvm_pgtable_prot prot, void *mc,
+					       bool force_pte)
+{
+	WARN_ON_ONCE(1);
+	return NULL;
+}
+
+int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			      struct kvm_mmu_memory_cache *mc)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
-- 
2.47.1.613.gc27f4b7a9f-goog
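
A caller-side note on the memcache plumbing above: pkvm_pgtable_stage2_map()
consumes the struct pkvm_mapping hanging off the memcache rather than
allocating under the mmu_lock. A minimal sketch of what a caller could
look like (pkvm_map_one_page() is a hypothetical helper, not part of
this series):

static int pkvm_map_one_page(struct kvm *kvm, struct kvm_pgtable *pgt,
			     u64 ipa, u64 phys, enum kvm_pgtable_prot prot)
{
	struct kvm_hyp_memcache cache = {};
	int ret;

	/* Allocate outside the lock; the rb-tree node may be consumed below. */
	cache.mapping = kzalloc(sizeof(*cache.mapping), GFP_KERNEL_ACCOUNT);
	if (!cache.mapping)
		return -ENOMEM;

	write_lock(&kvm->mmu_lock);
	ret = pkvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE, phys, prot, &cache, 0);
	write_unlock(&kvm->mmu_lock);

	/* No-op if the node was swapped into the tree (cache.mapping == NULL). */
	kfree(cache.mapping);

	return ret;
}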