From nobody Sat Jun 13 04:48:55 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2FE0939C64E for ; Sun, 10 May 2026 14:54:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778424865; cv=none; b=hAUUV5WUembzeVskSoImZu81/reqhibjs8qraOsf26v1MrhKTQCFmqXFXhnv4Fozx8GeeQJAiQ26lBdCABqFYcIJcB0AR5NavL0YbKIKCXqPbD8QQ0af637NWu2zDpcALeH4uoiDigLliZiT378MRAeuSMListbdopV949d5SqI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778424865; c=relaxed/simple; bh=lwnpRiOgHSXv6Wdc+Q1IaXvZ4XNXi6T0Q1XkzBl0AqM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gcySf26Mr1X77hYCqUFfBDZc6Wp+4wFwxNqmvomKI9YcjzoKOia3pmeD0gK+q43glN95YsNRBOh3iuNVwE5bcikJDxJpntv1vfBNm6AO+z8aBJP124lLwHDde8Z0m5N7sfWkCx5ehPXu/LPxr7w16EbZCa2gBvEIkRShvTQ0+PI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=n+eSL8vZ; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="n+eSL8vZ" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8CE5627B5; Sun, 10 May 2026 07:54:11 -0700 (PDT) Received: from workstation-e142269.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 74EA23F836; Sun, 10 May 2026 07:54:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; 
s=foss; t=1778424856; bh=lwnpRiOgHSXv6Wdc+Q1IaXvZ4XNXi6T0Q1XkzBl0AqM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=n+eSL8vZC3Yo8PcRWxnD55+5D0P+TgPrOL/fco4Em4zvoDbZin1x1JPbfJHL4/BuY NYhbP8b1fhny6BbLMVVcMzYnJp1kuzrB5Bmu/8Pj6o5C0s7jx5JvA01oGaerU5zO25 sGMj7p7jy9Rt6cESpP6/92tuylxEIv+NKa7jxYXA= From: Wei-Lin Chang To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org Cc: Marc Zyngier , Oliver Upton , Joey Gouly , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Wei-Lin Chang Subject: [PATCH v3 1/5] KVM: arm64: Use a variable for the canonical GPA in kvm_s2_fault_map() Date: Sun, 10 May 2026 15:53:34 +0100 Message-ID: <20260510145338.322962-2-weilin.chang@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260510145338.322962-1-weilin.chang@arm.com> References: <20260510145338.322962-1-weilin.chang@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Create a variable to store the canonical GPA, instead of calculating it when needed. This will be useful when we need to use the canonical GPA for the nested reverse map later. 
Signed-off-by: Wei-Lin Chang
---
 arch/arm64/kvm/mmu.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d089c107d9b7..e4becd5cdf36 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1981,6 +1981,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	long mapping_size;
 	kvm_pfn_t pfn;
 	gfn_t gfn;
+	phys_addr_t canonical_gpa;
 	int ret;
 
 	kvm_fault_lock(kvm);
@@ -1994,6 +1995,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	mapping_size = s2vi->vma_pagesize;
 	pfn = s2vi->pfn;
 	gfn = s2vi->gfn;
+	canonical_gpa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
 
 	/*
 	 * If we are not forced to use page mapping, check if we are
@@ -2012,6 +2014,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 				goto out_unlock;
 			}
 		}
+		canonical_gpa = ALIGN_DOWN(canonical_gpa, mapping_size);
 	}
 
 	if (!perm_fault_granule && !s2vi->map_non_cacheable && kvm_has_mte(kvm))
@@ -2045,11 +2048,9 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	 * making sure we adjust the canonical IPA if the mapping size has
 	 * been updated (via a THP upgrade, for example).
 	 */
-	if (writable && !ret) {
-		phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
-		ipa &= ~(mapping_size - 1);
-		mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
-	}
+	if (writable && !ret)
+		mark_page_dirty_in_slot(kvm, s2fd->memslot,
+					gpa_to_gfn(canonical_gpa));
 
 	if (ret != -EAGAIN)
 		return ret;
-- 
2.43.0

From nobody Sat Jun 13 04:48:55 2026
From: Wei-Lin Chang
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Wei-Lin Chang
Subject: [PATCH v3 2/5] KVM: arm64: Move shadow_pt_debugfs_dentry to reduce holes in kvm_s2_mmu
Date: Sun, 10 May 2026 15:53:35 +0100
Message-ID: <20260510145338.322962-3-weilin.chang@arm.com>
In-Reply-To: <20260510145338.322962-1-weilin.chang@arm.com>
References: <20260510145338.322962-1-weilin.chang@arm.com>

The dentry pointer shadow_pt_debugfs_dentry was placed between two
booleans in kvm_s2_mmu, which created unnecessary holes in the struct.
Move it so that the two booleans are adjacent.
Signed-off-by: Wei-Lin Chang
---
 arch/arm64/include/asm/kvm_host.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 851f6171751c..1a56d137df10 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -217,16 +217,16 @@ struct kvm_s2_mmu {
 	 */
 	bool nested_stage2_enabled;
 
-#ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
-	struct dentry *shadow_pt_debugfs_dentry;
-#endif
-
 	/*
 	 * true when this MMU needs to be unmapped before being used for a new
 	 * purpose.
 	 */
 	bool pending_unmap;
 
+#ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
+	struct dentry *shadow_pt_debugfs_dentry;
+#endif
+
 	/*
 	 * 0: Nobody is currently using this, check vttbr for validity
 	 * >0: Somebody is actively using this.
-- 
2.43.0

From nobody Sat Jun 13 04:48:55 2026
From: Wei-Lin Chang
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Wei-Lin Chang
Subject: [PATCH v3 3/5] KVM: arm64: nv: Avoid full shadow s2 unmap
Date: Sun, 10 May 2026 15:53:36 +0100
Message-ID: <20260510145338.322962-4-weilin.chang@arm.com>
In-Reply-To: <20260510145338.322962-1-weilin.chang@arm.com>
References: <20260510145338.322962-1-weilin.chang@arm.com>

Currently we are forced to fully unmap all shadow stage-2 for a VM when
unmapping a page from the canonical
stage-2, for example during an MMU notifier call. This is because we are
not tracking which canonical IPAs are mapped in the shadow stage-2 page
tables, hence there is no way to know what to unmap.

Create a per-kvm_s2_mmu maple tree to track canonical IPA range -> nested
IPA range, so that it is possible to partially unmap shadow stage-2 when a
canonical IPA range is unmapped.

The algorithm is simple and conservative:

At each shadow stage-2 map, insert the nested IPA range into the maple
tree, with the canonical IPA range as the key. If the canonical IPA range
doesn't overlap with existing ranges in the tree, insert as is, and a
reverse mapping for this range is established. But if the canonical IPA
range overlaps with any existing ranges in the tree, create a new range
that spans all the overlapping ranges including the input range and
replace those existing ranges. At the same time, mark this new spanning
canonical IPA range with an "UNKNOWN_IPA" bit, indicating we give up
tracking the nested IPA ranges that map to this canonical IPA range.

The maple tree's 64-bit entry is enough to store the nested IPA and the
UNKNOWN_IPA status, therefore besides the maple tree's internal
operations, memory allocation is avoided.
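As a reading aid only (not part of the patch), the insertion rule described
above can be modelled in userspace C. This is a minimal sketch under stated
assumptions: a small unsorted array stands in for the maple tree, and
`struct range`, `struct revmap`, `revmap_record()` and `MAX_RANGES` are
hypothetical names invented for illustration. Only the three payload bits
follow the layout the patch defines in nested.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload bits, mirroring the layout the patch documents in nested.c. */
#define VALID_ENTRY (1ULL << 62)
#define ADDR_MASK   (((1ULL << 56) - 1) & ~((1ULL << 12) - 1)) /* bits 55..12 */
#define UNKNOWN_IPA (1ULL << 0)

/* Hypothetical stand-in for the maple tree: a tiny set of disjoint ranges. */
#define MAX_RANGES 16

struct range {
	uint64_t first, last;	/* canonical IPA range, inclusive */
	uint64_t entry;		/* payload: nested IPA | flags */
};

struct revmap {
	struct range r[MAX_RANGES];
	size_t n;
};

/*
 * Record canonical [ipa, ipa + size - 1] -> nested_ipa.
 * Returns 0 on success, -1 if the model runs out of room (the patch
 * would set nested_revmap_broken in the equivalent failure case).
 */
static int revmap_record(struct revmap *m, uint64_t ipa,
			 uint64_t nested_ipa, uint64_t size)
{
	const uint64_t in_first = ipa, in_last = ipa + size - 1;
	const uint64_t entry = VALID_ENTRY | (nested_ipa & ADDR_MASK);
	uint64_t first = in_first, last = in_last;
	size_t i, kept = 0;
	int overlapped = 0;

	assert(size > 0 && m->n <= MAX_RANGES);

	for (i = 0; i < m->n; i++) {
		struct range cur = m->r[i];

		if (cur.first > in_last || cur.last < in_first) {
			m->r[kept++] = cur;	/* disjoint: keep as is */
			continue;
		}
		/* Same range, same target: nothing to update. */
		if (cur.first == in_first && cur.last == in_last &&
		    cur.entry == entry)
			return 0;
		/* Overlap: drop this range and grow the spanning range. */
		overlapped = 1;
		if (cur.first < first)
			first = cur.first;
		if (cur.last > last)
			last = cur.last;
	}

	if (kept == MAX_RANGES)
		return -1;	/* nothing overlapped and no free slot */

	m->r[kept].first = first;
	m->r[kept].last = last;
	m->r[kept].entry = overlapped ? UNKNOWN_IPA : entry;
	m->n = kept + 1;
	return 0;
}
```

Because the stored ranges are disjoint, testing each one against the
original input range is enough to find every range the spanning range must
absorb. Recording two separate pages and then a range covering both leaves
a single entry whose payload is just UNKNOWN_IPA.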
Example:
|||| means existing range, ---- means empty range

input:            $$$$$$$$$$$$$$$$$$$$$$$$$$
tree:  --||||-----|||||||---------||||||||||-----------

insert spanning range and replace overlapping ones:
       --||||-----||||||||||||||||||||||||||-----------
                  ^^^^marked UNKNOWN_IPA^^^^

With the reverse map created, when a canonical IPA range gets unmapped,
look into each s2 mmu's maple tree for the canonical IPA ranges affected
and, based on their UNKNOWN_IPA status:

UNKNOWN_IPA     -> fall back and fully unmap the current shadow stage-2,
                   also clear the tree
not UNKNOWN_IPA -> unmap the nested IPA range, and remove the reverse map
                   entry

Suggested-by: Marc Zyngier
Signed-off-by: Wei-Lin Chang
---
 arch/arm64/include/asm/kvm_host.h   |   4 +
 arch/arm64/include/asm/kvm_nested.h |   4 +
 arch/arm64/kvm/mmu.c                |  27 ++++--
 arch/arm64/kvm/nested.c             | 140 +++++++++++++++++++++++++++-
 4 files changed, 167 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1a56d137df10..dc4c0bce1bbb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -223,6 +223,10 @@ struct kvm_s2_mmu {
 	 */
 	bool pending_unmap;
 
+	bool nested_revmap_broken;
+	/* canonical IPA to nested IPA range lookup */
+	struct maple_tree nested_revmap_mt;
+
 #ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
 	struct dentry *shadow_pt_debugfs_dentry;
 #endif
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 091544e6af44..5cbf78dfc685 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -76,6 +76,8 @@ extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
 				       const union tlbi_info *info,
 				       void (*)(struct kvm_s2_mmu *,
 						const union tlbi_info *));
+extern void kvm_record_nested_revmap(gpa_t gpa, struct kvm_s2_mmu *mmu,
+				     gpa_t fault_ipa, size_t map_size);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
@@ -164,6 +166,8 @@ extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
 				    struct kvm_s2_trans *trans);
 extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 extern void kvm_nested_s2_wp(struct kvm *kvm);
+extern void kvm_unmap_gfn_range_nested(struct kvm *kvm, gpa_t gpa, size_t size,
+				       bool may_block);
 extern void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block);
 extern void kvm_nested_s2_flush(struct kvm *kvm);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e4becd5cdf36..ce0bd88cd3c1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -5,6 +5,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -1099,6 +1100,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	struct kvm_pgtable *pgt = NULL;
+	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
 
 	write_lock(&kvm->mmu_lock);
 	pgt = mmu->pgt;
@@ -1108,8 +1110,11 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 		free_percpu(mmu->last_vcpu_ran);
 	}
 
-	if (kvm_is_nested_s2_mmu(kvm, mmu))
+	if (kvm_is_nested_s2_mmu(kvm, mmu)) {
+		if (!mtree_empty(revmap_mt))
+			mtree_destroy(revmap_mt);
 		kvm_init_nested_s2_mmu(mmu);
+	}
 
 	write_unlock(&kvm->mmu_lock);
 
@@ -1631,6 +1636,10 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
 		goto out_unlock;
 	}
 
+	if (s2fd->nested)
+		kvm_record_nested_revmap(gfn << PAGE_SHIFT, pgt->mmu,
+					 s2fd->fault_ipa, PAGE_SIZE);
+
 	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
 						 __pfn_to_phys(pfn), prot,
 						 memcache, flags);
@@ -2034,6 +2043,10 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
 								 prot, flags);
 	} else {
+		if (s2fd->nested)
+			kvm_record_nested_revmap(canonical_gpa, pgt->mmu,
+						 gfn_to_gpa(gfn), mapping_size);
+
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
 							 __pfn_to_phys(pfn), prot,
 							 memcache, flags);
@@ -2389,14 +2402,16 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
+	gpa_t gpa = range->start << PAGE_SHIFT;
+	size_t size = (range->end - range->start) << PAGE_SHIFT;
+	bool may_block = range->may_block;
+
 	if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
 		return false;
 
-	__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
-			     (range->end - range->start) << PAGE_SHIFT,
-			     range->may_block);
+	__unmap_stage2_range(&kvm->arch.mmu, gpa, size, may_block);
+	kvm_unmap_gfn_range_nested(kvm, gpa, size, may_block);
 
-	kvm_nested_s2_unmap(kvm, range->may_block);
 	return false;
 }
 
@@ -2674,7 +2689,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	write_lock(&kvm->mmu_lock);
 	kvm_stage2_unmap_range(&kvm->arch.mmu, gpa, size, true);
-	kvm_nested_s2_unmap(kvm, true);
+	kvm_unmap_gfn_range_nested(kvm, gpa, size, true);
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 883b6c1008fb..35b5d5f21a23 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -43,6 +44,20 @@ struct vncr_tlb {
  */
 #define S2_MMU_PER_VCPU 2
 
+/*
+ * Per shadow S2 reverse map (IPA -> nested IPA range) maple tree payload
+ * layout:
+ *
+ * bit 62: valid, prevents the case where the nested IPA is 0 and turning
+ *         the whole value to 0
+ * bits 55-12: nested IPA bits 55-12
+ * bit 0: UNKNOWN_IPA bit, 1 indicates we give up on tracking what nested
+ *        IPA maps to this canonical IPA in the shadow stage-2
+ */
+#define VALID_ENTRY BIT(62)
+#define ADDR_MASK GENMASK_ULL(55, 12)
+#define UNKNOWN_IPA BIT(0)
+
 void kvm_init_nested(struct kvm *kvm)
 {
 	kvm->arch.nested_mmus = NULL;
@@ -769,12 +784,57 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
 	return s2_mmu;
 }
 
+void kvm_record_nested_revmap(gpa_t ipa, struct kvm_s2_mmu *mmu,
+			      gpa_t fault_ipa, size_t map_size)
+{
+	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	gpa_t ipa_end = ipa + map_size - 1;
+	u64 entry, new_entry = 0;
+	MA_STATE(mas_rev, revmap_mt, ipa, ipa_end);
+
+	if (mmu->nested_revmap_broken)
+		return;
+
+	mtree_lock(revmap_mt);
+	entry = xa_to_value(mas_find_range(&mas_rev, ipa_end));
+
+	if (entry) {
+		/* maybe just a perm update... */
+		if (!(entry & UNKNOWN_IPA) && mas_rev.index == ipa &&
+		    mas_rev.last == ipa_end &&
+		    fault_ipa == (entry & ADDR_MASK))
+			goto unlock;
+		/*
+		 * Create a "UNKNOWN_IPA" range that spans all the overlapping
+		 * ranges and store it.
+		 */
+		while (entry && mas_rev.index <= ipa_end) {
+			ipa = min(mas_rev.index, ipa);
+			ipa_end = max(mas_rev.last, ipa_end);
+			entry = xa_to_value(mas_find_range(&mas_rev, ipa_end));
+		}
+		new_entry |= UNKNOWN_IPA;
+	} else {
+		new_entry |= fault_ipa;
+		new_entry |= VALID_ENTRY;
+	}
+
+	mas_set_range(&mas_rev, ipa, ipa_end);
+	if (mas_store_gfp(&mas_rev, xa_mk_value(new_entry),
+			  GFP_NOWAIT | __GFP_ACCOUNT))
+		mmu->nested_revmap_broken = true;
+unlock:
+	mtree_unlock(revmap_mt);
+}
+
 void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
 {
 	/* CnP being set denotes an invalid entry */
 	mmu->tlb_vttbr = VTTBR_CNP_BIT;
 	mmu->nested_stage2_enabled = false;
 	atomic_set(&mmu->refcnt, 0);
+	mt_init(&mmu->nested_revmap_mt);
+	mmu->nested_revmap_broken = false;
 }
 
 void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
@@ -1150,6 +1210,82 @@ void kvm_nested_s2_wp(struct kvm *kvm)
 	kvm_invalidate_vncr_ipa(kvm, 0, BIT(kvm->arch.mmu.pgt->ia_bits));
 }
 
+static void reset_revmap_and_unmap(struct kvm_s2_mmu *mmu, bool may_block)
+{
+	mtree_destroy(&mmu->nested_revmap_mt);
+	mmu->nested_revmap_broken = false;
+	kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), may_block);
+}
+
+static void unmap_mmu_ipa_range(struct kvm_s2_mmu *mmu, gpa_t gpa,
+				size_t unmap_size, bool may_block)
+{
+	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	gpa_t ipa = gpa;
+	gpa_t ipa_end = gpa + unmap_size - 1;
+	u64 entry;
+	size_t entry_size;
+	MA_STATE(mas_rev, revmap_mt, gpa, ipa_end);
+
+	if (mmu->nested_revmap_broken) {
+		reset_revmap_and_unmap(mmu, may_block);
+		return;
+	}
+
+	mtree_lock(revmap_mt);
+	entry = xa_to_value(mas_find_range(&mas_rev, ipa_end));
+
+	while (entry && mas_rev.index <= ipa_end) {
+		ipa = mas_rev.last + 1;
+		entry_size = mas_rev.last - mas_rev.index + 1;
+		/*
+		 * Give up and invalidate this s2 mmu if the unmap range
+		 * touches any UNKNOWN_IPA range.
+		 */
+		if (entry & UNKNOWN_IPA) {
+			mtree_unlock(revmap_mt);
+			reset_revmap_and_unmap(mmu, may_block);
+			return;
+		}
+
+		/*
+		 * Ignore result, it is okay if a reverse mapping erase
+		 * fails.
+		 */
+		mas_store_gfp(&mas_rev, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
+
+		mtree_unlock(revmap_mt);
+		kvm_stage2_unmap_range(mmu, entry & ADDR_MASK, entry_size,
+				       may_block);
+		mtree_lock(revmap_mt);
+		/*
+		 * Other maple tree operations during preemption could render
+		 * this ma_state invalid, so reset it.
+		 */
+		mas_set_range(&mas_rev, ipa, ipa_end);
+		entry = xa_to_value(mas_find_range(&mas_rev, ipa_end));
+	}
+	mtree_unlock(revmap_mt);
+}
+
+void kvm_unmap_gfn_range_nested(struct kvm *kvm, gpa_t gpa, size_t size,
+				bool may_block)
+{
+	int i;
+
+	if (!kvm->arch.nested_mmus_size)
+		return;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			unmap_mmu_ipa_range(mmu, gpa, size, may_block);
+	}
+
+	kvm_invalidate_vncr_ipa(kvm, gpa, gpa + size);
+}
+
 void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block)
 {
 	int i;
@@ -1163,7 +1299,7 @@ void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block)
 		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
 
 		if (kvm_s2_mmu_valid(mmu))
-			kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), may_block);
+			reset_revmap_and_unmap(mmu, may_block);
 	}
 
 	kvm_invalidate_vncr_ipa(kvm, 0, BIT(kvm->arch.mmu.pgt->ia_bits));
@@ -1848,7 +1984,7 @@ void check_nested_vcpu_requests(struct kvm_vcpu *vcpu)
 
 	write_lock(&vcpu->kvm->mmu_lock);
 	if (mmu->pending_unmap) {
-		kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), true);
+		reset_revmap_and_unmap(mmu, true);
 		mmu->pending_unmap = false;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
-- 
2.43.0

From nobody Sat Jun 13 04:48:55 2026
From: Wei-Lin Chang
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Wei-Lin Chang
Subject: [PATCH v3 4/5] KVM: arm64: nv: Remove reverse map entries during TLBI handling
Date: Sun, 10 May 2026 15:53:37 +0100
Message-ID: <20260510145338.322962-5-weilin.chang@arm.com>
In-Reply-To: <20260510145338.322962-1-weilin.chang@arm.com>
References: <20260510145338.322962-1-weilin.chang@arm.com>

When a guest hypervisor issues a TLBI for a specific IPA range, KVM unmaps
that range from all the affected shadow stage-2s. During this we get the
opportunity to remove the corresponding reverse map entries, and lower the
probability of creating UNKNOWN_IPA reverse map ranges at subsequent
stage-2 faults.

However, the TLBI ranges are specified in nested IPA, so in order to
locate the affected ranges in the reverse map maple tree, which is a
mapping from canonical IPA to nested IPA, we can only iterate through the
entire tree and check each entry.

Suggested-by: Marc Zyngier
Signed-off-by: Wei-Lin Chang
---
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/nested.c             | 38 +++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 5cbf78dfc685..b11925826b25 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -76,6 +76,8 @@ extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
 				       const union tlbi_info *info,
 				       void (*)(struct kvm_s2_mmu *,
 						const union tlbi_info *));
+extern void kvm_remove_nested_revmap(struct kvm_s2_mmu *mmu, u64 nested_ipa,
+				     size_t size);
 extern void kvm_record_nested_revmap(gpa_t gpa, struct kvm_s2_mmu *mmu,
 				     gpa_t fault_ipa, size_t map_size);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 35b5d5f21a23..96b88d9c0c2a 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -784,6 +784,44 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
 	return s2_mmu;
 }
 
+void kvm_remove_nested_revmap(struct kvm_s2_mmu *mmu, u64 nested_ipa, size_t size)
+{
+	/*
+	 * Iterate through the mt of this mmu, remove all canonical ipa ranges
+	 * with !UNKNOWN_IPA that maps to ranges that are strictly within
+	 * [addr, addr + size).
+	 */
+	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	void *entry;
+	u64 entry_val, nested_ipa_end = nested_ipa + size;
+	u64 this_nested_ipa, this_nested_ipa_end;
+	size_t revmap_size;
+
+	MA_STATE(mas_rev, revmap_mt, 0, ULONG_MAX);
+
+	mtree_lock(revmap_mt);
+	mas_for_each(&mas_rev, entry, ULONG_MAX) {
+		entry_val = xa_to_value(entry);
+		if (entry_val & UNKNOWN_IPA)
+			continue;
+
+		revmap_size = mas_rev.last - mas_rev.index + 1;
+		this_nested_ipa = entry_val & ADDR_MASK;
+		this_nested_ipa_end = this_nested_ipa + revmap_size;
+
+		if (this_nested_ipa >= nested_ipa &&
+		    this_nested_ipa_end <= nested_ipa_end) {
+			/*
+			 * As the shadow stage-2 is about to be unmapped
+			 * after this function, it doesn't matter whether the
+			 * removal of the reverse map failed or not.
+			 */
+			mas_store_gfp(&mas_rev, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
+		}
+	}
+	mtree_unlock(revmap_mt);
+}
+
 void kvm_record_nested_revmap(gpa_t ipa, struct kvm_s2_mmu *mmu,
 			      gpa_t fault_ipa, size_t map_size)
 {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6a96cb7ba9a3..a97304680cee 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -4006,6 +4006,7 @@ union tlbi_info {
 static void s2_mmu_unmap_range(struct kvm_s2_mmu *mmu,
 			       const union tlbi_info *info)
 {
+	kvm_remove_nested_revmap(mmu, info->range.start, info->range.size);
 	/*
 	 * The unmap operation is allowed to drop the MMU lock and block, which
 	 * means that @mmu could be used for a different context than the one
@@ -4104,6 +4105,8 @@ static void s2_mmu_unmap_ipa(struct kvm_s2_mmu *mmu,
 	max_size = compute_tlb_inval_range(mmu, info->ipa.addr);
 	base_addr &= ~(max_size - 1);
 
+	kvm_remove_nested_revmap(mmu, base_addr, max_size);
+
 	/*
 	 * See comment in s2_mmu_unmap_range() for why this is allowed to
 	 * reschedule.
-- 
2.43.0

From nobody Sat Jun 13 04:48:55 2026
From: Wei-Lin Chang
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Wei-Lin Chang
Subject: [PATCH v3 5/5] KVM: arm64: nv: Create nested IPA direct map to speed up reverse map removal
Date: Sun, 10 May 2026 15:53:38 +0100
Message-ID: <20260510145338.322962-6-weilin.chang@arm.com>
In-Reply-To: <20260510145338.322962-1-weilin.chang@arm.com>
References: <20260510145338.322962-1-weilin.chang@arm.com>

Iterating through the whole reverse map to find which entries to remove
when handling guest hypervisor TLBIs is not efficient. Create a direct
map that goes from nested IPA to canonical IPA so that the canonical IPA
range affected by the TLBI can be quickly determined, then remove the
entries in the reverse map accordingly.
Suggested-by: Marc Zyngier
Signed-off-by: Wei-Lin Chang
---
 arch/arm64/include/asm/kvm_host.h |   5 ++
 arch/arm64/kvm/mmu.c              |   9 ++-
 arch/arm64/kvm/nested.c           | 124 ++++++++++++++++++++++--------
 3 files changed, 104 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dc4c0bce1bbb..f9e95a023ec4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -226,6 +226,11 @@ struct kvm_s2_mmu {
 	bool nested_revmap_broken;
 	/* canonical IPA to nested IPA range lookup */
 	struct maple_tree nested_revmap_mt;
+	/*
+	 * Nested IPA to canonical IPA range lookup, essentially a cache of
+	 * the guest's stage-2.
+	 */
+	struct maple_tree nested_direct_mt;
 
 #ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
 	struct dentry *shadow_pt_debugfs_dentry;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ce0bd88cd3c1..77146431be6d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1101,6 +1101,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	struct kvm_pgtable *pgt = NULL;
 	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	struct maple_tree *direct_mt = &mmu->nested_direct_mt;
 
 	write_lock(&kvm->mmu_lock);
 	pgt = mmu->pgt;
@@ -1111,8 +1112,12 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 
 	if (kvm_is_nested_s2_mmu(kvm, mmu)) {
-		if (!mtree_empty(revmap_mt))
-			mtree_destroy(revmap_mt);
+		if (!mtree_empty(revmap_mt) || !mtree_empty(direct_mt)) {
+			mtree_lock(revmap_mt);
+			__mt_destroy(revmap_mt);
+			__mt_destroy(direct_mt);
+			mtree_unlock(revmap_mt);
+		}
 		kvm_init_nested_s2_mmu(mmu);
 	}
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 96b88d9c0c2a..fcb6a88047e1 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -45,14 +45,14 @@ struct vncr_tlb {
 #define S2_MMU_PER_VCPU		2
 
 /*
- * Per shadow S2 reverse map (IPA -> nested IPA range) maple tree payload
- * layout:
+ * Per shadow S2 reverse & direct map maple tree payload layout:
  *
- * bit 62: valid, prevents the case where the nested IPA is 0 and turning
+ * bit 62: valid, prevents the case where the address is 0 and turning
  *         the whole value to 0
- * bits 55-12: nested IPA bits 55-12
+ * bits 55-12: {nested, canonical} IPA bits 55-12
  * bit 0: UNKNOWN_IPA bit, 1 indicates we give up on tracking what nested
- *        IPA maps to this canonical IPA in the shadow stage-2
+ *        IPA maps to this canonical IPA in the shadow stage-2, only used
+ *        in reverse map
  */
 #define VALID_ENTRY		BIT(62)
 #define ADDR_MASK		GENMASK_ULL(55, 12)
@@ -787,37 +787,67 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
 void kvm_remove_nested_revmap(struct kvm_s2_mmu *mmu, u64 nested_ipa, size_t size)
 {
 	/*
-	 * Iterate through the mt of this mmu, remove all canonical ipa ranges
-	 * with !UNKNOWN_IPA that maps to ranges that are strictly within
-	 * [addr, addr + size).
+	 * For all ranges in direct_mt that are completely covered by the range
+	 * we are TLBIing [gpa, gpa + size), remove the reverse map and its
+	 * corresponding direct map together, when these conditions are met:
+	 *
+	 * 1. The reverse map is not UNKNOWN_IPA.
+	 * 2. The reverse map is completely covered by the TLBI range.
+	 * 3. The reverse map and the direct map are symmetric i.e. they map to
+	 *    each other, with the same size.
+	 *
+	 * Symmetry must be checked because there are three places where the
+	 * direct map could become inconsistent:
+	 *
+	 * 1. Direct map removal failure during an mmu notifier in
+	 *    unmap_mmu_ipa_range().
+	 * 2. Direct map insertion failure during an s2 fault in
+	 *    kvm_record_nested_revmap().
+	 * 3. Direct map removal failure during a previous call of this very
+	 *    function.
 	 */
 	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
-	void *entry;
-	u64 entry_val, nested_ipa_end = nested_ipa + size;
-	u64 this_nested_ipa, this_nested_ipa_end;
-	size_t revmap_size;
-
-	MA_STATE(mas_rev, revmap_mt, 0, ULONG_MAX);
-
+	struct maple_tree *direct_mt = &mmu->nested_direct_mt;
+	gpa_t nested_ipa_end = nested_ipa + size - 1;
+	u64 entry_dir;
+	struct mapping {
+		u64 from;
+		u64 to;
+		size_t size;
+	};
+
+	MA_STATE(mas_dir, direct_mt, nested_ipa, nested_ipa_end);
 	mtree_lock(revmap_mt);
-	mas_for_each(&mas_rev, entry, ULONG_MAX) {
-		entry_val = xa_to_value(entry);
-		if (entry_val & UNKNOWN_IPA)
-			continue;
-
-		revmap_size = mas_rev.last - mas_rev.index + 1;
-		this_nested_ipa = entry_val & ADDR_MASK;
-		this_nested_ipa_end = this_nested_ipa + revmap_size;
-
-		if (this_nested_ipa >= nested_ipa &&
-		    this_nested_ipa_end <= nested_ipa_end) {
-			/*
-			 * As the shadow stage-2 is about to be unmapped
-			 * after this function, it doesn't matter whether the
-			 * removal of the reverse map failed or not.
-			 */
+	entry_dir = xa_to_value(mas_find_range(&mas_dir, nested_ipa_end));
+
+	while (entry_dir && mas_dir.index <= nested_ipa_end) {
+		struct mapping dir, rev;
+		u64 entry_rev;
+
+		dir.from = mas_dir.index;
+		dir.to = entry_dir & ADDR_MASK;
+		dir.size = mas_dir.last - mas_dir.index + 1;
+
+		/* Use ipa range to find the corresponding entry in revmap. */
+		MA_STATE(mas_rev, revmap_mt, dir.to, dir.to + dir.size - 1);
+		entry_rev = xa_to_value(mas_find_range(&mas_rev,
+						       dir.to + dir.size - 1));
+
+		rev.from = mas_rev.index;
+		rev.to = entry_rev & ADDR_MASK;
+		rev.size = mas_rev.last - mas_rev.index + 1;
+
+		/* The three conditions outlined above. */
+		if (entry_rev && !(entry_rev & UNKNOWN_IPA) &&
+		    dir.from >= nested_ipa &&
+		    dir.from + dir.size - 1 <= nested_ipa_end &&
+		    dir.from == rev.to &&
+		    rev.from == dir.to &&
+		    dir.size == rev.size) {
+			mas_store_gfp(&mas_dir, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
 			mas_store_gfp(&mas_rev, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
 		}
+		entry_dir = xa_to_value(mas_find_range(&mas_dir, nested_ipa_end));
 	}
 	mtree_unlock(revmap_mt);
 }
@@ -826,9 +856,12 @@ void kvm_record_nested_revmap(gpa_t ipa, struct kvm_s2_mmu *mmu,
 			      gpa_t fault_ipa, size_t map_size)
 {
 	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	struct maple_tree *direct_mt = &mmu->nested_direct_mt;
 	gpa_t ipa_end = ipa + map_size - 1;
+	gpa_t fault_ipa_end = fault_ipa + map_size - 1;
 	u64 entry, new_entry = 0;
 	MA_STATE(mas_rev, revmap_mt, ipa, ipa_end);
+	MA_STATE(mas_dir, direct_mt, fault_ipa, fault_ipa_end);
 
 	if (mmu->nested_revmap_broken)
 		return;
@@ -861,6 +894,15 @@ void kvm_record_nested_revmap(gpa_t ipa, struct kvm_s2_mmu *mmu,
 	if (mas_store_gfp(&mas_rev, xa_mk_value(new_entry),
 			  GFP_NOWAIT | __GFP_ACCOUNT))
 		mmu->nested_revmap_broken = true;
+
+	/*
+	 * Add direct map but ignore the result, missing a direct map does not
+	 * affect correctness.
+	 */
+	if (new_entry & VALID_ENTRY && !mmu->nested_revmap_broken)
+		mas_store_gfp(&mas_dir, xa_mk_value(ipa | VALID_ENTRY),
+			      GFP_NOWAIT | __GFP_ACCOUNT);
+
 unlock:
 	mtree_unlock(revmap_mt);
 }
@@ -872,6 +914,8 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
 	mmu->nested_stage2_enabled = false;
 	atomic_set(&mmu->refcnt, 0);
 	mt_init(&mmu->nested_revmap_mt);
+	mt_init_flags(&mmu->nested_direct_mt, MT_FLAGS_LOCK_EXTERN);
+	mt_set_external_lock(&mmu->nested_direct_mt, &mmu->nested_revmap_mt.ma_lock);
 	mmu->nested_revmap_broken = false;
 }
 
@@ -1250,7 +1294,10 @@ void kvm_nested_s2_wp(struct kvm *kvm)
 
 static void reset_revmap_and_unmap(struct kvm_s2_mmu *mmu, bool may_block)
 {
-	mtree_destroy(&mmu->nested_revmap_mt);
+	mtree_lock(&mmu->nested_revmap_mt);
+	__mt_destroy(&mmu->nested_revmap_mt);
+	__mt_destroy(&mmu->nested_direct_mt);
+	mtree_unlock(&mmu->nested_revmap_mt);
 	mmu->nested_revmap_broken = false;
 	kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), may_block);
 }
@@ -1259,11 +1306,14 @@ static void unmap_mmu_ipa_range(struct kvm_s2_mmu *mmu, gpa_t gpa,
 				size_t unmap_size, bool may_block)
 {
 	struct maple_tree *revmap_mt = &mmu->nested_revmap_mt;
+	struct maple_tree *direct_mt = &mmu->nested_direct_mt;
 	gpa_t ipa = gpa;
 	gpa_t ipa_end = gpa + unmap_size - 1;
+	gpa_t nested_ipa, nested_ipa_end;
 	u64 entry;
 	size_t entry_size;
 	MA_STATE(mas_rev, revmap_mt, gpa, ipa_end);
+	MA_STATE(mas_dir, direct_mt, 0, ULONG_MAX);
 
 	if (mmu->nested_revmap_broken) {
 		reset_revmap_and_unmap(mmu, may_block);
@@ -1292,6 +1342,16 @@ static void unmap_mmu_ipa_range(struct kvm_s2_mmu *mmu, gpa_t gpa,
 		 */
 		mas_store_gfp(&mas_rev, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
 
+		/*
+		 * Try to also remove the direct map, it is okay if this fails,
+		 * as we check for direct map consistency in
+		 * kvm_remove_nested_revmap().
+		 */
+		nested_ipa = entry & ADDR_MASK;
+		nested_ipa_end = nested_ipa + entry_size - 1;
+		mas_set_range(&mas_dir, nested_ipa, nested_ipa_end);
+		mas_store_gfp(&mas_dir, NULL, GFP_NOWAIT | __GFP_ACCOUNT);
+
 		mtree_unlock(revmap_mt);
 		kvm_stage2_unmap_range(mmu, entry & ADDR_MASK, entry_size,
 				       may_block);
-- 
2.43.0
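
For readers following the series, the removal conditions listed in the
kvm_remove_nested_revmap() comment can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not part of the patch:
can_remove_pair() and its parameters are hypothetical names, struct mapping
here mirrors the local struct in the patch but uses fixed-width types, and the
maple tree lookups are replaced by already-decoded entries passed in by the
caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one decoded maple tree range entry. */
struct mapping {
	uint64_t from;	/* first address of the source range */
	uint64_t to;	/* first address of the target range */
	uint64_t size;	/* length of the range in bytes */
};

/*
 * Return true when a direct entry (nested IPA -> canonical IPA) and the
 * reverse entry found for it (canonical IPA -> nested IPA) may be removed
 * together for a TLBI covering [tlbi_start, tlbi_start + tlbi_size).
 * A NULL rev models "no reverse entry found" (assumption of this sketch).
 */
static bool can_remove_pair(const struct mapping *dir,
			    const struct mapping *rev,
			    bool rev_unknown_ipa,
			    uint64_t tlbi_start, uint64_t tlbi_size)
{
	uint64_t tlbi_end = tlbi_start + tlbi_size - 1;

	/* Condition 1: a reverse entry exists and is not UNKNOWN_IPA. */
	if (!rev || rev_unknown_ipa)
		return false;

	/* Condition 2: the nested IPA range lies fully inside the TLBI range. */
	if (dir->from < tlbi_start || dir->from + dir->size - 1 > tlbi_end)
		return false;

	/* Condition 3: both entries map to each other with the same size. */
	return dir->from == rev->to &&
	       rev->from == dir->to &&
	       dir->size == rev->size;
}
```

A stale direct entry, for example one left behind by a failed removal, fails
the third check because the reverse entry it points at no longer maps back to
it, so neither entry is touched in that case.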