From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 14/23] KVM: s390: New gmap code
Date: Thu, 20 Nov 2025 18:15:35 +0100
Message-ID: <20251120171544.96841-15-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

New gmap (guest map) code. This new gmap code will only be used by KVM.
This will replace the existing gmap.
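For illustration only (not part of the diff below): a minimal sketch of the
intended create/remove/dispose lifecycle, based on the declarations added in
gmap.h by this patch. The function example_gmap_lifecycle is made up for the
example.

static int example_gmap_lifecycle(struct kvm *kvm)
{
	struct gmap *parent, *child;

	/* A limit of 0 creates a region-1 gmap covering the whole space. */
	parent = gmap_new(kvm, 0);
	if (!parent)
		return -ENOMEM;

	/* Child gmaps (e.g. per-vcpu ucontrol gmaps) hang off the parent. */
	child = gmap_new_child(parent, 0);
	if (!child) {
		gmap_dispose(parent);
		return -ENOMEM;
	}

	/* Children must be unlinked from their parent before disposal. */
	scoped_guard(spinlock, &parent->children_lock)
		gmap_remove_child(child);
	gmap_dispose(child);
	gmap_dispose(parent);
	return 0;
}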
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/Makefile |    2 +-
 arch/s390/kvm/gmap.c   | 1062 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/gmap.h   |  165 +++++++
 3 files changed, 1228 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/kvm/gmap.c
 create mode 100644 arch/s390/kvm/gmap.h

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 84315d2f75fb..21088265402c 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o
+kvm-y += dat.o gmap.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
new file mode 100644
index 000000000000..29ce8df697dd
--- /dev/null
+++ b/arch/s390/kvm/gmap.c
@@ -0,0 +1,1062 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Guest memory management for KVM/s390
+ *
+ * Copyright IBM Corp. 2008, 2020, 2024
+ *
+ * Author(s): Claudio Imbrenda
+ *            Martin Schwidefsky
+ *            David Hildenbrand
+ *            Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "dat.h"
+#include "gmap.h"
+#include "kvm-s390.h"
+
+static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.sie_block->prog0c & PROG_IN_SIE;
+}
+
+static int gmap_limit_to_type(gfn_t limit)
+{
+	if (!limit)
+		return TABLE_TYPE_REGION1;
+	if (limit <= _REGION3_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_SEGMENT;
+	if (limit <= _REGION2_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION3;
+	if (limit <= _REGION1_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION2;
+	return TABLE_TYPE_REGION1;
+}
+
+/**
+ * gmap_new - allocate and initialize a guest address space
+ * @kvm: the vm this guest address space belongs to
+ * @limit: maximum address of the gmap address space
+ *
+ * Returns a guest address space structure, or NULL if out of memory.
+ */
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit)
+{
+	struct crst_table *table;
+	struct gmap *gmap;
+	int type;
+
+	type = gmap_limit_to_type(limit);
+
+	gmap = kzalloc(sizeof(*gmap), GFP_KERNEL_ACCOUNT);
+	if (!gmap)
+		return NULL;
+	INIT_LIST_HEAD(&gmap->children);
+	INIT_LIST_HEAD(&gmap->list);
+	INIT_LIST_HEAD(&gmap->scb_users);
+	INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_KVM_S390_MMU_CACHE);
+	spin_lock_init(&gmap->children_lock);
+	spin_lock_init(&gmap->host_to_rmap_lock);
+
+	table = dat_alloc_crst_sleepable(_CRSTE_EMPTY(type).val);
+	if (!table) {
+		kfree(gmap);
+		return NULL;
+	}
+
+	gmap->asce.val = __pa(table);
+	gmap->asce.dt = type;
+	gmap->asce.tl = _ASCE_TABLE_LENGTH;
+	gmap->asce.x = 1;
+	gmap->asce.p = 1;
+	gmap->asce.s = 1;
+	gmap->kvm = kvm;
+	gmap->owns_page_tables = 1;
+
+	return gmap;
+}
+
+static void gmap_add_child(struct gmap *parent, struct gmap *child)
+{
+	KVM_BUG_ON(parent && parent->is_ucontrol && parent->parent, parent->kvm);
+	KVM_BUG_ON(parent && parent->is_ucontrol && !parent->owns_page_tables, parent->kvm);
+	lockdep_assert_held(&parent->children_lock);
+
+	child->parent = parent;
+	child->is_ucontrol = parent->is_ucontrol;
+	child->allow_hpage_1m = parent->allow_hpage_1m;
+	if (kvm_is_ucontrol(parent->kvm))
+		child->owns_page_tables = 0;
+	list_add(&child->list, &parent->children);
+}
+
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit)
+{
+	struct gmap *res;
+
+	lockdep_assert_not_held(&parent->children_lock);
+	res = gmap_new(parent->kvm, limit);
+	if (res) {
+		scoped_guard(spinlock, &parent->children_lock)
+			gmap_add_child(parent, res);
+	}
+	return res;
+}
+
+int gmap_set_limit(struct gmap *gmap, gfn_t limit)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc, type;
+
+	type = gmap_limit_to_type(limit);
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			break;
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			rc = dat_set_asce_limit(mc, &gmap->asce, type);
+	} while (rc == -ENOMEM);
+
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct radix_tree_iter iter;
+	unsigned long indices[16];
+	unsigned long index;
+	void __rcu **slot;
+	int i, nr;
+
+	/* A radix tree is freed by deleting all of its entries */
+	index = 0;
+	do {
+		nr = 0;
+		radix_tree_for_each_slot(slot, root, &iter, index) {
+			indices[nr] = iter.index;
+			if (++nr == 16)
+				break;
+		}
+		for (i = 0; i < nr; i++) {
+			index = indices[i];
+			head = radix_tree_delete(root, index);
+			gmap_for_each_rmap_safe(rmap, rnext, head)
+				kfree(rmap);
+		}
+	} while (nr > 0);
+}
+
+void gmap_remove_child(struct gmap *child)
+{
+	if (KVM_BUG_ON(!child->parent, child->kvm))
+		return;
+	lockdep_assert_held(&child->parent->children_lock);
+
+	list_del(&child->list);
+	child->parent = NULL;
+}
+
+/**
+ * gmap_dispose - remove and free a guest address space
+ * @gmap: pointer to the guest address space structure
+ */
+void gmap_dispose(struct gmap *gmap)
+{
+	/* The gmap must have been removed from the parent beforehand */
+	KVM_BUG_ON(gmap->parent, gmap->kvm);
+	/* All children of this gmap must have been removed beforehand */
+	KVM_BUG_ON(!list_empty(&gmap->children), gmap->kvm);
+	/* No VSIE shadow block is allowed to use this gmap */
+	KVM_BUG_ON(!list_empty(&gmap->scb_users), gmap->kvm);
+	KVM_BUG_ON(!gmap->asce.val, gmap->kvm);
+
+	/* Flush tlb of all gmaps */
+	asce_flush_tlb(gmap->asce);
+
+	/* Free all DAT tables. */
+	dat_free_level(dereference_asce(gmap->asce), gmap->owns_page_tables);
+
+	/* Free additional data for a shadow gmap */
+	if (gmap->is_shadow)
+		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
+
+	kfree(gmap);
+}
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the ASCE is a SEGMENT type then this function will return -EINVAL,
+ * otherwise the pointers in the host_to_guest radix tree will keep pointing
+ * to the wrong pages, causing use-after-free and memory corruption.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	struct crst_table *table;
+	union asce asce;
+
+	/* Replacing segment type ASCEs would cause serious issues */
+	if (gmap->asce.dt == ASCE_TYPE_SEGMENT)
+		return -EINVAL;
+
+	table = dat_alloc_crst_sleepable(0);
+	if (!table)
+		return -ENOMEM;
+	memcpy(table, dereference_asce(gmap->asce), sizeof(*table));
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = gmap->asce;
+	asce.rsto = virt_to_pfn(table);
+	WRITE_ONCE(gmap->asce, asce);
+
+	return 0;
+}
+
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint)
+{
+	struct kvm *kvm = gmap->kvm;
+	struct kvm_vcpu *vcpu;
+	gfn_t prefix_gfn;
+	unsigned long i;
+
+	if (gmap->is_shadow)
+		return false;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/* match against both prefix pages */
+		prefix_gfn = gpa_to_gfn(kvm_s390_get_prefix(vcpu));
+		if (prefix_gfn < end && gfn <= prefix_gfn + 1) {
+			if (hint && kvm_s390_is_in_sie(vcpu))
+				return false;
+			VCPU_EVENT(vcpu, 2, "gmap notifier for %llx-%llx",
+				   gfn_to_gpa(gfn), gfn_to_gpa(end));
+			kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
+		}
+	}
+	return true;
+}
+
+struct clear_young_pte_priv {
+	struct gmap *gmap;
+	bool young;
+};
+
+static long gmap_clear_young_pte(union pte *ptep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *p = walk->priv;
+	union pgste pgste;
+	union pte pte, new;
+
+	pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (!pte.s.y && pte.h.i))
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (!pgste.prefix_notif || gmap_mkold_prefix(p->gmap, gfn, end)) {
+		new = pte;
+		new.h.i = 1;
+		new.s.y = 0;
+		if ((new.s.d || !new.h.p) && !new.s.s)
+			folio_set_dirty(pfn_folio(pte.h.pfra));
+		new.s.d = 0;
+		new.h.p = 1;
+
+		pgste.prefix_notif = 0;
+		pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, walk->asce, p->gmap->uses_skeys);
+	}
+	p->young = 1;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *priv = walk->priv;
+	union crste crste, new;
+
+	crste = READ_ONCE(*crstep);
+
+	if (!crste.h.fc)
+		return 0;
+	if (!crste.s.fc1.y && crste.h.i)
+		return 0;
+	if (!crste_prefix(crste) || gmap_mkold_prefix(priv->gmap, gfn, end)) {
+		new = crste;
+		new.h.i = 1;
+		new.s.fc1.y = 0;
+		new.s.fc1.prefix_notif = 0;
+		if (new.s.fc1.d || !new.h.p)
+			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
+		new.s.fc1.d = 0;
+		new.h.p = 1;
+		dat_crstep_xchg(crstep, new, gfn, walk->asce);
+	}
+	priv->young = 1;
+	return 0;
+}
+
+/**
+ * gmap_age_gfn() - clear the young state of a range of guest pages
+ * @gmap: the guest gmap
+ * @start: the first gfn to test
+ * @end: the gfn after the last one to test
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = gmap_clear_young_pte,
+		.pmd_entry = gmap_clear_young_crste,
+		.pud_entry = gmap_clear_young_crste,
+	};
+	struct clear_young_pte_priv priv = {
+		.gmap = gmap,
+		.young = false,
+	};
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+
+	return priv.young;
+}
+
+struct gmap_unmap_priv {
+	struct gmap *gmap;
+	struct kvm_memory_slot *slot;
+};
+
+static long _gmap_unmap_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *w)
+{
+	struct gmap_unmap_priv *priv = w->priv;
+	unsigned long vmaddr;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (ptep->s.pr && pgste.usage == PGSTE_GPS_USAGE_UNUSED) {
+		vmaddr = __gfn_to_hva_memslot(priv->slot, gfn);
+		gmap_helper_try_set_pte_unused(priv->gmap->kvm->mm, vmaddr);
+	}
+	pgste = gmap_ptep_xchg(priv->gmap, ptep, _PTE_EMPTY, pgste, gfn);
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long _gmap_unmap_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap_unmap_priv *priv = walk->priv;
+
+	if (crstep->h.fc)
+		gmap_crstep_xchg(priv->gmap, crstep, _CRSTE_EMPTY(crstep->h.tt), gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_unmap_gfn_range() - Unmap a range of guest addresses
+ * @gmap: the gmap to act on
+ * @slot: the memslot covering the range
+ * @start: the first gfn to unmap
+ * @end: the gfn after the last one to unmap
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: false
+ */
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _gmap_unmap_pte,
+		.pmd_entry = _gmap_unmap_crste,
+		.pud_entry = _gmap_unmap_crste,
+	};
+	struct gmap_unmap_priv priv = {
+		.gmap = gmap,
+		.slot = slot,
+	};
+
+	lockdep_assert_held_write(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+	return false;
+}
+
+static union pgste __pte_test_and_clear_softdirty(union pte *ptep, union pgste pgste, gfn_t gfn,
+						  struct gmap *gmap)
+{
+	union pte pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (pte.h.p && !pte.s.sd))
+		return pgste;
+
+	/*
+	 * If this page contains one or more prefixes of vCPUs that are currently
+	 * running, do not reset the protection, leave it marked as dirty.
+	 */
+	if (!pgste.prefix_notif || gmap_mkold_prefix(gmap, gfn, gfn + 1)) {
+		pte.h.p = 1;
+		pte.s.sd = 0;
+		pgste = gmap_ptep_xchg(gmap, ptep, pte, pgste, gfn);
+	}
+
+	mark_page_dirty(gmap->kvm, gfn);
+
+	return pgste;
+}
+
+static long _pte_test_and_clear_softdirty(union pte *ptep, gfn_t gfn, gfn_t end,
+					  struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __pte_test_and_clear_softdirty(ptep, pgste, gfn, gmap);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long _crste_test_and_clear_softdirty(union crste *table, gfn_t gfn, gfn_t end,
+					    struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, new;
+
+	if (fatal_signal_pending(current))
+		return 1;
+	crste = READ_ONCE(*table);
+	if (!crste.h.fc)
+		return 0;
+	if (crste.h.p && !crste.s.fc1.sd)
+		return 0;
+
+	/*
+	 * If this large page contains one or more prefixes of vCPUs that are
+	 * currently running, do not reset the protection, leave it marked as
+	 * dirty.
+	 */
+	if (!crste.s.fc1.prefix_notif || gmap_mkold_prefix(gmap, gfn, end)) {
+		new = crste;
+		new.h.p = 1;
+		new.s.fc1.sd = 0;
+		gmap_crstep_xchg(gmap, table, new, gfn);
+	}
+
+	for ( ; gfn < end; gfn++)
+		mark_page_dirty(gmap->kvm, gfn);
+
+	return 0;
+}
+
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops walk_ops = {
+		.pte_entry = _pte_test_and_clear_softdirty,
+		.pmd_entry = _crste_test_and_clear_softdirty,
+		.pud_entry = _crste_test_and_clear_softdirty,
+	};
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &walk_ops, 0, gmap);
+}
+
+static int gmap_handle_minor_crste_fault(union asce asce, struct guest_fault *f)
+{
+	union crste newcrste, oldcrste = READ_ONCE(*f->crstep);
+
+	/* Somehow the crste is not large anymore, let the slow path deal with it */
+	if (!oldcrste.h.fc)
+		return 1;
+
+	f->pfn = PHYS_PFN(large_crste_to_phys(oldcrste, f->gfn));
+	f->writable = oldcrste.s.fc1.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldcrste.h.i && !(f->write_attempt && oldcrste.h.p))
+		return 0;
+
+	if (!f->write_attempt || oldcrste.s.fc1.w) {
+		f->write_attempt |= oldcrste.s.fc1.w && oldcrste.s.fc1.d;
+		newcrste = oldcrste;
+		newcrste.h.i = 0;
+		newcrste.s.fc1.y = 1;
+		if (f->write_attempt) {
+			newcrste.h.p = 0;
+			newcrste.s.fc1.d = 1;
+			newcrste.s.fc1.sd = 1;
+		}
+		if (!oldcrste.s.fc1.d && newcrste.s.fc1.d)
+			SetPageDirty(phys_to_page(crste_origin_large(newcrste)));
+		/* In case of races, let the slow path deal with it */
+		return !dat_crstep_xchg_atomic(f->crstep, oldcrste, newcrste, f->gfn, asce);
+	}
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	return 1;
+}
+
+static int _gmap_handle_minor_pte_fault(struct gmap *gmap, union pgste *pgste,
+					struct guest_fault *f)
+{
+	union pte newpte, oldpte = READ_ONCE(*f->ptep);
+
+	f->pfn = oldpte.h.pfra;
+	f->writable = oldpte.s.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldpte.h.i && !(f->write_attempt && oldpte.h.p))
+		return 0;
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	if (!oldpte.s.pr || (f->write_attempt && !oldpte.s.w))
+		return 1;
+
+	newpte = oldpte;
+	newpte.h.i = 0;
+	newpte.s.y = 1;
+	if (f->write_attempt) {
+		newpte.h.p = 0;
+		newpte.s.d = 1;
+		newpte.s.sd = 1;
+	}
+	if (!oldpte.s.d && newpte.s.d)
+		SetPageDirty(pfn_to_page(newpte.h.pfra));
+	*pgste = gmap_ptep_xchg(gmap, f->ptep, newpte, *pgste, f->gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_try_fixup_minor() - Try to fixup a minor gmap fault.
+ * @gmap: the gmap whose fault needs to be resolved.
+ * @fault: the guest fault to be resolved; contains the faulting address and
+ *         whether the fault was caused by a write access.
+ *
+ * A minor fault is a fault that can be resolved quickly within gmap.
+ * The page is already mapped, the fault is only due to dirty/young tracking.
+ *
+ * Return: 0 in case of success, < 0 in case of error, > 0 if the fault could
+ *         not be resolved and needs to go through the slow path.
+ */
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault)
+{
+	union pgste pgste;
+	int rc;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(NULL, fault->gfn, gmap->asce, DAT_WALK_LEAF, TABLE_TYPE_PAGE_TABLE,
+			    &fault->crstep, &fault->ptep);
+	/* If a PTE or a leaf CRSTE could not be reached, slow path */
+	if (rc)
+		return 1;
+
+	if (fault->ptep) {
+		pgste = pgste_get_lock(fault->ptep);
+		rc = _gmap_handle_minor_pte_fault(gmap, &pgste, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+		pgste_set_unlock(fault->ptep, pgste);
+	} else {
+		rc = gmap_handle_minor_crste_fault(gmap->asce, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+	}
+	return rc;
+}
+
+static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
+{
+	unsigned int order;
+	int rc, level;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	level = TABLE_TYPE_PAGE_TABLE;
+	if (f->page) {
+		order = folio_order(page_folio(f->page));
+		if (order >= get_order(_REGION3_SIZE) && gmap_2g_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_REGION3;
+		else if (order >= get_order(_SEGMENT_SIZE) && gmap_1m_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_SEGMENT;
+	}
+	rc = dat_link(mc, gmap->asce, level, gmap->uses_skeys, f);
+	KVM_BUG_ON(rc == -EINVAL, gmap->kvm);
+	return rc;
+}
+
+static int gmap_ucas_map_one(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+			     gfn_t p_gfn, gfn_t c_gfn)
+{
+	struct page_table *pt;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(mc, p_gfn, gmap->parent->asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	pt = pte_table_start(ptep);
+	dat_set_ptval(pt, PTVAL_VMADDR, p_gfn >> (_SEGMENT_SHIFT - PAGE_SHIFT));
+
+	rc = dat_entry_walk(mc, c_gfn, gmap->asce, DAT_WALK_ALLOC, TABLE_TYPE_SEGMENT,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	dat_crstep_xchg(crstep, _crste_fc0(virt_to_pfn(pt), TABLE_TYPE_SEGMENT), c_gfn, gmap->asce);
+	return 0;
+}
+
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc = 0;
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	while (count) {
+		rc = gmap_ucas_map_one(mc, gmap, p_gfn, c_gfn);
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				break;
+			continue;
+		}
+		if (rc)
+			break;
+
+		count--;
+		c_gfn += _PAGE_ENTRIES;
+		p_gfn += _PAGE_ENTRIES;
+	}
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_ucas_unmap_one(struct gmap *gmap, gfn_t c_gfn)
+{
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, c_gfn, gmap->asce, 0, TABLE_TYPE_SEGMENT, &crstep, &ptep);
+	if (!rc)
+		dat_crstep_xchg(crstep, _PMD_EMPTY, c_gfn, gmap->asce);
+}
+
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count)
+{
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	for ( ; count; count--, c_gfn += _PAGE_ENTRIES)
+		gmap_ucas_unmap_one(gmap, c_gfn);
+}
+
+static long _gmap_split_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, newcrste;
+
+	crste = READ_ONCE(*crstep);
+	newcrste = _CRSTE_EMPTY(crste.h.tt);
+
+	while (crste_leaf(crste)) {
+		if (crste_prefix(crste))
+			gmap_unmap_prefix(gmap, gfn, next);
+		if (crste.s.fc1.vsie_notif)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		if (dat_crstep_xchg_atomic(crstep, crste, newcrste, gfn, walk->asce))
+			break;
+		crste = READ_ONCE(*crstep);
+	}
+
+	if (need_resched())
+		return next;
+
+	return 0;
+}
+
+void gmap_split_huge_pages(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = {
+		.pmd_entry = _gmap_split_crste,
+		.pud_entry = _gmap_split_crste,
+	};
+	gfn_t start = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, asce_end(gmap->asce), gmap->asce,
+						    &ops, DAT_WALK_IGN_HOLES, gmap);
+		cond_resched();
+	} while (start);
+}
+
+static int _gmap_enable_skeys(struct gmap *gmap)
+{
+	gfn_t start = 0;
+	int rc;
+
+	if (mm_uses_skeys(gmap->kvm->mm))
+		return 0;
+
+	gmap->kvm->mm->context.uses_skeys = 1;
+	rc = gmap_helper_disable_cow_sharing();
+	if (rc) {
+		gmap->kvm->mm->context.uses_skeys = 0;
+		return rc;
+	}
+
+	do {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			start = dat_reset_skeys(gmap->asce, start);
+		cond_resched();
+	} while (start);
+	return 0;
+}
+
+int gmap_enable_skeys(struct gmap *gmap)
+{
+	int rc;
+
+	mmap_write_lock(gmap->kvm->mm);
+	rc = _gmap_enable_skeys(gmap);
+	mmap_write_unlock(gmap->kvm->mm);
+	return rc;
+}
+
+static long _destroy_pages_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	if (!ptep->s.pr)
+		return 0;
+	__kvm_s390_pv_destroy_page(phys_to_page(pte_origin(*ptep)));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long _destroy_pages_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t origin, cur, end;
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	origin = crste_origin_large(*crstep);
+	cur = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	for ( ; cur < end; cur += PAGE_SIZE)
+		__kvm_s390_pv_destroy_page(phys_to_page(cur));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _destroy_pages_pte,
+		.pmd_entry = _destroy_pages_crste,
+		.pud_entry = _destroy_pages_crste,
+	};
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, end, gmap->asce, &ops,
+						    DAT_WALK_IGN_HOLES, NULL);
+		if (interruptible && fatal_signal_pending(current))
+			return -EINTR;
+		cond_resched();
+	} while (start && start < end);
+	return 0;
+}
+
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level)
+{
+	struct vsie_rmap *temp, *rmap;
+	void __rcu **slot;
+	int rc;
+
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	lockdep_assert_held(&sg->host_to_rmap_lock);
+
+	rc = -ENOMEM;
+	rmap = kzalloc(sizeof(*rmap), GFP_ATOMIC);
+	if (!rmap)
+		goto out;
+
+	rc = 0;
+	rmap->r_gfn = r_gfn;
+	rmap->level = level;
+	slot = radix_tree_lookup_slot(&sg->host_to_rmap, p_gfn);
+	if (slot) {
+		rmap->next = radix_tree_deref_slot_protected(slot, &sg->host_to_rmap_lock);
+		for (temp = rmap->next; temp; temp = temp->next) {
+			if (temp->val == rmap->val)
+				goto out;
+		}
+		radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap);
+	} else {
+		rmap->next = NULL;
+		rc = radix_tree_insert(&sg->host_to_rmap, p_gfn, rmap);
+		if (rc)
+			goto out;
+	}
+	rmap = NULL;
+out:
+	kfree(rmap);
+	return rc;
+}
+
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	union pte pte;
+	int flags, rc;
+
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	flags = DAT_WALK_SPLIT_ALLOC | (sg->parent->uses_skeys ? DAT_WALK_USES_SKEYS : 0);
+	rc = dat_entry_walk(mc, p_gfn, sg->parent->asce, flags,
+			    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+	if (level <= TABLE_TYPE_REGION1) {
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			rc = gmap_insert_rmap(sg, p_gfn, r_gfn, level);
+	}
+	if (rc)
+		return rc;
+
+	pgste = pgste_get_lock(ptep);
+	pte = ptep->s.pr ? *ptep : _pte(pfn, wr, false, false);
+	pte.h.p = 1;
+	if (pgste.vsie_notif) {
+		_gmap_handle_vsie_unshadow_event(sg->parent, p_gfn);
+		pgste.vsie_notif = 0;
+	}
+	pgste = gmap_ptep_xchg(sg->parent, ptep, pte, pgste, p_gfn);
+	pgste.vsie_notif = 1;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+void gmap_set_cmma_all_dirty(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __set_cmma_dirty_pte, };
+	gfn_t gfn = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			gfn = _dat_walk_gfn_range(gfn, asce_end(gmap->asce), gmap->asce, &ops,
+						  DAT_WALK_IGN_HOLES, NULL);
+		cond_resched();
+	} while (gfn);
+}
+
+static void gmap_unshadow_level(struct gmap *sg, gfn_t r_gfn, int level)
+{
+	unsigned long align = PAGE_SIZE;
+	gpa_t gaddr = gfn_to_gpa(r_gfn);
+	union crste *crstep;
+	union crste crste;
+	union pte *ptep;
+
+	if (level > TABLE_TYPE_PAGE_TABLE)
+		align = 1UL << (11 * level + _SEGMENT_SHIFT);
+	kvm_s390_vsie_gmap_notifier(sg, ALIGN_DOWN(gaddr, align), ALIGN(gaddr + 1, align));
+	if (dat_entry_walk(NULL, r_gfn, sg->asce, 0, level, &crstep, &ptep))
+		return;
+	if (ptep) {
+		dat_ptep_xchg(ptep, _PTE_EMPTY, r_gfn, sg->asce, sg->uses_skeys);
+		return;
+	}
+	crste = READ_ONCE(*crstep);
+	dat_crstep_clear(crstep, r_gfn, sg->asce);
+	if (is_pmd(crste))
+		dat_free_pt(dereference_pmd(crste.pmd));
+	else
+		dat_free_level(dereference_crste(crste), true);
+}
+
+static void gmap_unshadow(struct gmap *sg)
+{
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	KVM_BUG_ON(!sg->parent, sg->kvm);
+	KVM_BUG_ON(sg->removed, sg->kvm);
+
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	sg->removed = 1;
+	kvm_s390_vsie_gmap_notifier(sg, 0, -1UL);
+
+	if (list_empty(&sg->scb_users)) {
+		gmap_remove_child(sg);
+		gmap_dispose(sg);
+	}
+}
+
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct gmap *sg, *next;
+	gfn_t start, end;
+
+	list_for_each_entry_safe(sg, next, &parent->children, list) {
+		start = sg->guest_asce.rsto;
+		end = start + sg->guest_asce.tl + 1;
+		if (!sg->guest_asce.r && gfn >= start && gfn < end) {
+			gmap_unshadow(sg);
+			continue;
+		}
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			head = radix_tree_delete(&sg->host_to_rmap, gfn);
+		gmap_for_each_rmap_safe(rmap, rnext, head)
+			gmap_unshadow_level(sg, rmap->r_gfn, rmap->level);
+	}
+}
+
+/**
+ * gmap_find_shadow - find a specific asce in the list of shadow tables
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns the pointer to a gmap if a shadow table with the given asce is
+ * already available, ERR_PTR(-EAGAIN) if another one is just being created,
+ * otherwise NULL
+ *
+ * Context: Called with parent->children_lock held
+ */
+static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int edat_level)
+{
+	struct gmap *sg;
+
+	lockdep_assert_held(&parent->children_lock);
+	list_for_each_entry(sg, &parent->children, list) {
+		if (!gmap_is_shadow_valid(sg, asce, edat_level))
+			continue;
+		if (!sg->initialized)
+			return ERR_PTR(-EAGAIN);
+		return sg;
+	}
+	return NULL;
+}
+
+static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
+{
+	KVM_BUG_ON(1, sg->kvm);
+	return -EINVAL;
+}
+
+/**
+ * gmap_create_shadow() - create/find a shadow guest address space
+ * @mc: the mmu cache to use for allocations
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * The pages of the top level page table referred by the asce parameter
+ * will be set to read-only and marked in the PGSTEs of the kvm process.
+ * The shadow table will be removed automatically on any change to the
+ * PTE mapping for the source table.
+ *
+ * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
+ * parent gmap table could not be protected.
+ */
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *parent,
+				union asce asce, int edat_level)
+{
+	struct gmap *sg, *new;
+	int rc;
+
+	scoped_guard(spinlock, &parent->children_lock)
+		sg = gmap_find_shadow(parent, asce, edat_level);
+	if (sg)
+		return sg;
+	/* Create a new shadow gmap */
+	new = gmap_new(parent->kvm, asce.r ? 1UL << (64 - PAGE_SHIFT) : asce_end(asce));
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+	new->guest_asce = asce;
+	new->edat_level = edat_level;
+	new->initialized = false;
+	new->is_shadow = true;
+	new->parent = parent;
+
+	scoped_guard(spinlock, &parent->children_lock) {
+		/* Recheck if another CPU created the same shadow */
+		sg = gmap_find_shadow(parent, asce, edat_level);
+		if (sg) {
+			gmap_dispose(new);
+			return sg;
+		}
+		if (asce.r) {
+			/* only allow one real-space gmap shadow */
+			list_for_each_entry(sg, &parent->children, list) {
+				if (sg->guest_asce.r) {
+					scoped_guard(write_lock, &parent->kvm->mmu_lock)
+						gmap_unshadow(sg);
+					break;
+				}
+			}
+			new->initialized = true;
+			gmap_add_child(parent, new);
+			/* nothing to protect, return right away */
+			return new;
+		}
+	}
+
+	/* protect after insertion, so it will get properly invalidated */
+	rc = gmap_protect_asce_top_level(mc, new);
+	if (rc) {
+		gmap_dispose(new);
+		return ERR_PTR(rc);
+	}
+	return new;
+}
diff --git a/arch/s390/kvm/gmap.h b/arch/s390/kvm/gmap.h
new file mode 100644
index 000000000000..dcfd8d213321
--- /dev/null
+++ b/arch/s390/kvm/gmap.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2016, 2025
+ * Author(s): Martin Schwidefsky
+ *            Claudio Imbrenda
+ */
+
+#ifndef ARCH_KVM_S390_GMAP_H
+#define ARCH_KVM_S390_GMAP_H
+
+#include "dat.h"
+
+/**
+ * struct gmap - guest address space
+ * @is_shadow: whether this gmap is a vsie shadow gmap
+ * @owns_page_tables: whether this gmap owns all dat levels; normally 1, is 0
+ *                    only for ucontrol per-cpu gmaps, since they share the page
+ *                    tables with the main gmap.
+ * @is_ucontrol: whether this gmap is ucontrol (main gmap or per-cpu gmap)
+ * @allow_hpage_1m: whether 1M hugepages are allowed for this gmap,
+ *                  independently of whatever page size is used by userspace
+ * @allow_hpage_2g: whether 2G hugepages are allowed for this gmap,
+ *                  independently of whatever page size is used by userspace
+ * @pfault_enabled: whether pfault is enabled for this gmap
+ * @removed: whether this shadow gmap is about to be disposed of
+ * @initialized: flag to indicate if a shadow guest address space can be used
+ * @uses_skeys: indicates if the guest uses storage keys
+ * @uses_cmm: indicates if the guest uses cmm
+ * @edat_level: the edat level of this shadow gmap
+ * @kvm: the vm
+ * @asce: the ASCE used by this gmap
+ * @list: list head used in children gmaps for the children gmap list
+ * @children_lock: protects children and scb_users
+ * @children: list of child gmaps of this gmap
+ * @scb_users: list of vsie_scb that use this shadow gmap
+ * @parent: parent gmap of a child gmap
+ * @guest_asce: original ASCE of this shadow gmap
+ * @host_to_rmap_lock: protects host_to_rmap
+ * @host_to_rmap: radix tree mapping host addresses to guest addresses
+ */
+struct gmap {
+	unsigned char is_shadow:1;
+	unsigned char owns_page_tables:1;
+	unsigned char is_ucontrol:1;
+	bool allow_hpage_1m;
+	bool allow_hpage_2g;
+	bool pfault_enabled;
+	bool removed;
+	bool initialized;
+	bool uses_skeys;
+	bool uses_cmm;
+	unsigned char edat_level;
+	struct kvm *kvm;
+	union asce asce;
+	struct list_head list;
+	spinlock_t children_lock;	/* protects: children, scb_users */
+	struct list_head children;
+	struct list_head scb_users;
+	struct gmap *parent;
+	union asce guest_asce;
+	spinlock_t host_to_rmap_lock;	/* protects host_to_rmap */
+	struct radix_tree_root host_to_rmap;
+};
+
+#define gmap_for_each_rmap_safe(pos, n, head) \
+	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
+
+int s390_replace_asce(struct gmap *gmap);
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint);
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end);
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault);
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit);
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit);
+void gmap_remove_child(struct gmap *child);
+void gmap_dispose(struct gmap *gmap);
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *fault);
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end);
+int gmap_set_limit(struct gmap *gmap, gfn_t limit);
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count);
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count);
+int gmap_enable_skeys(struct gmap *gmap);
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible);
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level);
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr);
+void gmap_set_cmma_all_dirty(struct gmap *gmap);
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn);
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+				union asce asce, int edat_level);
+void gmap_split_huge_pages(struct gmap *gmap);
+
+static inline void gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	scoped_guard(spinlock, &parent->children_lock)
+		_gmap_handle_vsie_unshadow_event(parent, gfn);
+}
+
+static inline bool gmap_mkold_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, true);
+}
+
+static inline bool gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, false);
+}
+
+static inline union pgste gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					 union pgste pgste, gfn_t gfn)
+{
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	if (pgste.prefix_notif && (newpte.h.p || newpte.h.i)) {
+		pgste.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + 1);
+	}
+	if (pgste.vsie_notif && (ptep->h.p != newpte.h.p || newpte.h.i)) {
+		pgste.vsie_notif = 0;
+		gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	return __dat_ptep_xchg(ptep, pgste, newpte, gfn, gmap->asce, gmap->uses_skeys);
+}
+
+static inline void gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				    gfn_t gfn)
+{
+	unsigned long align = 1UL << (8 + (is_pmd(*crstep) ? 0 : 11));
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	gfn = ALIGN_DOWN(gfn, align);
+	if (crste_prefix(*crstep) && (ne.h.p || ne.h.i || !crste_prefix(ne))) {
+		ne.s.fc1.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + align);
+	}
+	if (crste_leaf(*crstep) && crstep->s.fc1.vsie_notif &&
+	    (ne.h.p || ne.h.i || !ne.s.fc1.vsie_notif)) {
+		ne.s.fc1.vsie_notif = 0;
+		gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	dat_crstep_xchg(crstep, ne, gfn, gmap->asce);
+}
+
+/**
+ * gmap_is_shadow_valid() - check if a shadow guest address space matches the
+ *			    given properties and is still valid
+ * @sg: pointer to the shadow guest address space structure
+ * @asce: ASCE for which the shadow table is requested
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns true if the gmap shadow is still valid and matches the given
+ * properties, the caller can continue using it. Returns false otherwise; the
+ * caller has to request a new shadow gmap in this case.
+ */
+static inline bool gmap_is_shadow_valid(struct gmap *sg, union asce asce, int edat_level)
+{
+	if (sg->removed)
+		return false;
+	return sg->guest_asce.val == asce.val && sg->edat_level == edat_level;
+}
+
+#endif /* ARCH_KVM_S390_GMAP_H */
-- 
2.51.1