From nobody Tue Dec 2 02:05:53 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 13/23] KVM: s390: KVM page table management functions: CMMA
Date: Thu, 20 Nov 2025 18:15:34 +0100
Message-ID: <20251120171544.96841-14-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.51.1
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Add page table management functions to be used for KVM guest (gmap) page
tables.

This patch adds functions to handle CMMA and the ESSA instruction.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/dat.c | 262 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  27 +++++
 2 files changed, 289 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index d6b03ba58c93..d31d059e9996 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1100,3 +1100,265 @@ int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn)
 		return -EAGAIN;
 	return 0;
 }
+
+/**
+ * dat_perform_essa() - perform ESSA actions on the PGSTE.
+ * @asce: the asce to operate on.
+ * @gfn: the guest page frame to operate on.
+ * @orc: the specific action to perform, see the ESSA_SET_* macros.
+ * @state: the storage attributes to be returned to the guest.
+ * @dirty: returns whether the function dirtied a previously clean entry.
+ *
+ * Context: Called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * 1 if the page state has been altered and the page is to be added to the CBRL
+ * * 0 if the page state has been altered, but the page is not to be added to the CBRL
+ * * -1 if the page state has not been altered and the page is not to be added to the CBRL
+ */
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	int res = 0;
+
+	if (dat_entry_walk(NULL, gfn, asce, 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) {
+		*state = (union essa_state) { .exception = 1 };
+		return -1;
+	}
+
+	pgste = pgste_get_lock(ptep);
+
+	*state = (union essa_state) {
+		.content = (ptep->h.i << 1) + (ptep->h.i && pgste.zero),
+		.nodat = pgste.nodat,
+		.usage = pgste.usage,
+	};
+
+	switch (orc) {
+	case ESSA_GET_STATE:
+		res = -1;
+		break;
+	case ESSA_SET_STABLE:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 0;
+		break;
+	case ESSA_SET_UNUSED:
+		pgste.usage = PGSTE_GPS_USAGE_UNUSED;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_VOLATILE:
+		pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_POT_VOLATILE:
+		if (!ptep->h.i) {
+			pgste.usage = PGSTE_GPS_USAGE_POT_VOLATILE;
+		} else if (pgste.zero) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		} else if (!pgste.gc) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+			res = 1;
+		}
+		break;
+	case ESSA_SET_STABLE_RESIDENT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		/*
+		 * Since the resident state can go away any time after this
+		 * call, we will not make this page resident. We can revisit
+		 * this decision if a guest will ever start using this.
+		 */
+		break;
+	case ESSA_SET_STABLE_IF_RESIDENT:
+		if (!ptep->h.i)
+			pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		break;
+	case ESSA_SET_STABLE_NODAT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 1;
+		break;
+	default:
+		WARN_ONCE(1, "Invalid ORC!");
+		res = -1;
+		break;
+	}
+	/* If we are discarding a page, set it to logical zero */
+	pgste.zero = res == 1;
+	if (orc > 0) {
+		*dirty = !pgste.cmma_d;
+		pgste.cmma_d = 1;
+	}
+
+	pgste_set_unlock(ptep, pgste);
+
+	return res;
+}
+
+static long dat_reset_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = 0;
+	pgste.nodat = 0;
+	pgste.cmma_d = 0;
+	pgste_set_unlock(ptep, pgste);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_cmma(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops dat_reset_cmma_ops = {
+		.pte_entry = dat_reset_cmma_pte,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &dat_reset_cmma_ops,
+				   DAT_WALK_IGN_HOLES, NULL);
+}
+
+struct dat_get_cmma_state {
+	gfn_t start;
+	gfn_t end;
+	unsigned int count;
+	u8 *values;
+	atomic64_t *remaining;
+};
+
+static long __dat_peek_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	state->values[gfn - walk->start] = pgste.usage | (pgste.nodat << 6);
+	pgste_set_unlock(ptep, pgste);
+	state->end = next;
+
+	return 0;
+}
+
+static long __dat_peek_cmma_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+
+	if (crstep->h.i)
+		state->end = min(walk->end, next);
+	return 0;
+}
+
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = __dat_peek_cmma_pte,
+		.pmd_entry = __dat_peek_cmma_crste,
+		.pud_entry = __dat_peek_cmma_crste,
+		.p4d_entry = __dat_peek_cmma_crste,
+		.pgd_entry = __dat_peek_cmma_crste,
+	};
+	struct dat_get_cmma_state state = { .values = values, };
+	int rc;
+
+	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
+	*count = state.end - start;
+	/* Return success if at least one value was saved, otherwise an error. */
+	return (rc == -EFAULT && *count > 0) ? 0 : rc;
+}
+
+static long __dat_get_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	if (state->start != -1) {
+		if ((gfn - state->end) > KVM_S390_MAX_BIT_DISTANCE)
+			return 1;
+		if (gfn - state->start >= state->count)
+			return 1;
+	}
+
+	if (!READ_ONCE(*pgste_of(ptep)).cmma_d)
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (pgste.cmma_d) {
+		if (state->start == -1)
+			state->start = gfn;
+		pgste.cmma_d = 0;
+		atomic64_dec(state->remaining);
+		state->values[gfn - state->start] = pgste.usage | pgste.nodat << 6;
+		state->end = next;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_get_cmma_pte, };
+	struct dat_get_cmma_state state = {
+		.remaining = rem,
+		.values = values,
+		.count = *count,
+		.start = -1,
+	};
+
+	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
+
+	if (state.start == -1) {
+		*count = 0;
+	} else {
+		*count = state.end - state.start;
+		*start = state.start;
+	}
+
+	return 0;
+}
+
+struct dat_set_cmma_state {
+	unsigned long mask;
+	const u8 *bits;
+};
+
+static long __dat_set_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_set_cmma_state *state = walk->priv;
+	union pgste pgste, tmp;
+
+	tmp.val = (state->bits[gfn - walk->start] << 24) & state->mask;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = tmp.usage;
+	pgste.nodat = tmp.nodat;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+/*
+ * This function sets the CMMA attributes for the given pages. If the input
+ * buffer has zero length, no action is taken, otherwise the attributes are
+ * set and the mm->context.uses_cmm flag is set.
+ */
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_set_cmma_pte, };
+	struct dat_set_cmma_state state = { .mask = mask, .bits = bits, };
+	union crste *crstep;
+	union pte *ptep;
+	gfn_t cur;
+	int rc;
+
+	for (cur = ALIGN_DOWN(gfn, _PAGE_ENTRIES); cur < gfn + count; cur += _PAGE_ENTRIES) {
+		rc = dat_entry_walk(mc, cur, asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+				    &crstep, &ptep);
+		if (rc)
+			return rc;
+	}
+	return _dat_walk_gfn_range(gfn, gfn + count, asce, &ops, DAT_WALK_IGN_HOLES, &state);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index c8df33f95160..4190a54224c0 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -17,6 +17,15 @@
 #include
 #include
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
+ */
+#define KVM_S390_MAX_BIT_DISTANCE	(2 * sizeof(void *))
+/* For consistency */
+#define KVM_S390_CMMA_SIZE_MAX	((u32)KVM_S390_SKEYS_MAX)
+
 #define _ASCE(x) ((union asce) { .val = (x), })
 #define NULL_ASCE _ASCE(0)
 
@@ -433,6 +442,17 @@ static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool
 	return res;
 }
 
+union essa_state {
+	unsigned char val;
+	struct {
+		unsigned char		: 2;
+		unsigned char nodat	: 1;
+		unsigned char exception	: 1;
+		unsigned char usage	: 2;
+		unsigned char content	: 2;
+	};
+};
+
 /**
  * struct vsie_rmap - reverse mapping for shadow page table entries
  * @next: pointer to next rmap in the list
@@ -522,6 +542,13 @@ bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
 int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
 	     bool uses_skeys, struct guest_fault *f);
 
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty);
+long dat_reset_cmma(union asce asce, gfn_t start_gfn);
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values);
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem);
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
-- 
2.51.1