From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 01/29] KVM: s390: Refactor pgste lock and unlock functions
Date: Wed, 4 Feb 2026 16:02:30 +0100
Message-ID: <20260204150259.60425-2-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Move the pgste lock and unlock functions back into mm/pgtable.c and
duplicate them in mm/gmap_helpers.c to avoid function name collisions
later on.
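
For context, the two helpers are always used as a bracketing pair around
an update of a PGSTE entry. The sketch below is illustrative only: the
function name example_update_pgste is hypothetical, and it assumes
CONFIG_PGSTE is enabled.

static void example_update_pgste(pte_t *ptep)
{
	pgste_t pgste;

	/* Spin until the PCL bit could be set; returns the locked value. */
	pgste = pgste_get_lock(ptep);
	/* ... inspect or modify pgste_val(pgste) here ... */
	/* Write the value back with the PCL bit cleared. */
	pgste_set_unlock(ptep, pgste);
}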
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 22 ----------------------
 arch/s390/mm/gmap_helpers.c     | 23 ++++++++++++++++++++++-
 arch/s390/mm/pgtable.c          | 23 ++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index bca9b29778c3..8194a2b12ecf 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -2040,26 +2040,4 @@ static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
 	return res;
 }
 
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-	unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
 #endif /* _S390_PAGE_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index d41b19925a5a..4fba13675950 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -15,7 +15,6 @@
 #include
 #include
 #include
-#include
 
 /**
  * ptep_zap_softleaf_entry() - discard a software leaf entry.
@@ -35,6 +34,28 @@ static void ptep_zap_softleaf_entry(struct mm_struct *mm, softleaf_t entry)
 	free_swap_and_cache(entry);
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 /**
  * gmap_helper_zap_one_page() - discard a page if it was swapped.
  * @mm: the mm
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 666adcd681ab..08743c1dac2f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
@@ -116,6 +115,28 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 static inline pgste_t pgste_get(pte_t *ptep)
 {
 	unsigned long pgste = 0;
-- 
2.52.0
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 02/29] KVM: s390: Add P bit in table entry bitfields, move union vaddress
Date: Wed, 4 Feb 2026 16:02:31 +0100
Message-ID: <20260204150259.60425-3-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add the P bit to the hardware definition of region 3 and segment table
entries. Move union vaddress from kvm/gaccess.c to asm/dat-bits.h.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/dat-bits.h | 32 ++++++++++++++++++++++++++++++--
 arch/s390/kvm/gaccess.c          | 26 --------------------------
 2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/s390/include/asm/dat-bits.h b/arch/s390/include/asm/dat-bits.h
index 8d65eec2f124..c40874e0e426 100644
--- a/arch/s390/include/asm/dat-bits.h
+++ b/arch/s390/include/asm/dat-bits.h
@@ -9,6 +9,32 @@
 #ifndef _S390_DAT_BITS_H
 #define _S390_DAT_BITS_H
 
+/*
+ * vaddress union in order to easily decode a virtual address into its
+ * region first index, region second index etc. parts.
+ */
+union vaddress {
+	unsigned long addr;
+	struct {
+		unsigned long rfx : 11;
+		unsigned long rsx : 11;
+		unsigned long rtx : 11;
+		unsigned long sx  : 11;
+		unsigned long px  : 8;
+		unsigned long bx  : 12;
+	};
+	struct {
+		unsigned long rfx01 : 2;
+		unsigned long       : 9;
+		unsigned long rsx01 : 2;
+		unsigned long       : 9;
+		unsigned long rtx01 : 2;
+		unsigned long       : 9;
+		unsigned long sx01  : 2;
+		unsigned long       : 29;
+	};
+};
+
 union asce {
 	unsigned long val;
 	struct {
@@ -98,7 +124,8 @@ union region3_table_entry {
 	struct {
 		unsigned long    : 53;
 		unsigned long fc : 1;	/* Format-Control */
-		unsigned long    : 4;
+		unsigned long p  : 1;	/* DAT-Protection Bit */
+		unsigned long    : 3;
 		unsigned long i  : 1;	/* Region-Invalid Bit */
 		unsigned long cr : 1;	/* Common-Region Bit */
 		unsigned long tt : 2;	/* Table-Type Bits */
@@ -140,7 +167,8 @@ union segment_table_entry {
 	struct {
 		unsigned long    : 53;
 		unsigned long fc : 1;	/* Format-Control */
-		unsigned long    : 4;
+		unsigned long p  : 1;	/* DAT-Protection Bit */
+		unsigned long    : 3;
 		unsigned long i  : 1;	/* Segment-Invalid Bit */
 		unsigned long cs : 1;	/* Common-Segment Bit */
 		unsigned long tt : 2;	/* Table-Type Bits */
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 41ca6b0ee7a9..d8347f7cbe51 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -20,32 +20,6 @@
 
 #define GMAP_SHADOW_FAKE_TABLE 1ULL
 
-/*
- * vaddress union in order to easily decode a virtual address into its
- * region first index, region second index etc. parts.
- */
-union vaddress {
-	unsigned long addr;
-	struct {
-		unsigned long rfx : 11;
-		unsigned long rsx : 11;
-		unsigned long rtx : 11;
-		unsigned long sx  : 11;
-		unsigned long px  : 8;
-		unsigned long bx  : 12;
-	};
-	struct {
-		unsigned long rfx01 : 2;
-		unsigned long       : 9;
-		unsigned long rsx01 : 2;
-		unsigned long       : 9;
-		unsigned long rtx01 : 2;
-		unsigned long       : 9;
-		unsigned long sx01  : 2;
-		unsigned long       : 29;
-	};
-};
-
 /*
  * raddress union which will contain the result (real or absolute address)
  * after a page table walk. The rfaa, sfaa and pfra members are used to
-- 
2.52.0
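
As an aside, the first bitfield variant of union vaddress moved in the
patch above splits a 64-bit virtual address into the six translation
indexes. The sketch below is illustrative only: vaddress_example and
decode() are hypothetical names, and the bitfield order shown is only
valid on a big-endian machine such as s390x.

#include <stdio.h>

union vaddress_example {
	unsigned long addr;
	struct {			/* big-endian (s390x) bit order */
		unsigned long rfx : 11;	/* region-first index,  bits 0-10  */
		unsigned long rsx : 11;	/* region-second index, bits 11-21 */
		unsigned long rtx : 11;	/* region-third index,  bits 22-32 */
		unsigned long sx  : 11;	/* segment index,       bits 33-43 */
		unsigned long px  : 8;	/* page index,          bits 44-51 */
		unsigned long bx  : 12;	/* byte index,          bits 52-63 */
	};
};

/* Print the translation-table indexes of a virtual address. */
static void decode(unsigned long a)
{
	union vaddress_example va = { .addr = a };

	printf("rfx=%u rsx=%u rtx=%u sx=%u px=%u bx=%u\n",
	       (unsigned)va.rfx, (unsigned)va.rsx, (unsigned)va.rtx,
	       (unsigned)va.sx, (unsigned)va.px, (unsigned)va.bx);
}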
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 03/29] s390: Make UV folio operations work on whole folio
Date: Wed, 4 Feb 2026 16:02:32 +0100
Message-ID: <20260204150259.60425-4-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

uv_destroy_folio() and uv_convert_from_secure_folio() should work on
all pages in the folio, not just the first one. This was fine until
now, but it will become a problem with upcoming patches.
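
The per-page loop introduced below follows a common pattern: an
order-2 folio spans 1 << 2 = 4 pages, so the loop visits physical
addresses P, P + PAGE_SIZE, P + 2*PAGE_SIZE and P + 3*PAGE_SIZE. A
minimal sketch of the pattern (for_each_page_phys and fn are
hypothetical names, not part of the patch):

static int for_each_page_phys(struct folio *folio,
			      int (*fn)(unsigned long paddr))
{
	unsigned long i;
	int rc = 0;

	/* folio_order() is the log2 of the number of pages in the folio. */
	for (i = 0; i < (1UL << folio_order(folio)); i++) {
		rc = fn(folio_to_phys(folio) + i * PAGE_SIZE);
		if (rc)
			break;
	}
	return rc;
}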
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/kernel/uv.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index ed46950be86f..ca0849008c0d 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -134,14 +134,15 @@ static int uv_destroy(unsigned long paddr)
  */
 int uv_destroy_folio(struct folio *folio)
 {
+	unsigned long i;
 	int rc;
 
-	/* Large folios cannot be secure */
-	if (unlikely(folio_test_large(folio)))
-		return 0;
-
 	folio_get(folio);
-	rc = uv_destroy(folio_to_phys(folio));
+	for (i = 0; i < (1 << folio_order(folio)); i++) {
+		rc = uv_destroy(folio_to_phys(folio) + i * PAGE_SIZE);
+		if (rc)
+			break;
+	}
 	if (!rc)
 		clear_bit(PG_arch_1, &folio->flags.f);
 	folio_put(folio);
@@ -183,14 +184,15 @@ EXPORT_SYMBOL_GPL(uv_convert_from_secure);
  */
 int uv_convert_from_secure_folio(struct folio *folio)
 {
+	unsigned long i;
 	int rc;
 
-	/* Large folios cannot be secure */
-	if (unlikely(folio_test_large(folio)))
-		return 0;
-
 	folio_get(folio);
-	rc = uv_convert_from_secure(folio_to_phys(folio));
+	for (i = 0; i < (1 << folio_order(folio)); i++) {
+		rc = uv_convert_from_secure(folio_to_phys(folio) + i * PAGE_SIZE);
+		if (rc)
+			break;
+	}
 	if (!rc)
 		clear_bit(PG_arch_1, &folio->flags.f);
 	folio_put(folio);
-- 
2.52.0
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 04/29] s390: Move sske_frame() to a header
Date: Wed, 4 Feb 2026 16:02:33 +0100
Message-ID: <20260204150259.60425-5-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Move the sske_frame() function to asm/pgtable.h, so it can be used in
other modules too. Opportunistically convert the .insn opcode
specification to the appropriate mnemonic.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 7 +++++++
 arch/s390/mm/pageattr.c         | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 8194a2b12ecf..73c30b811b98 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1136,6 +1136,13 @@ static inline pte_t pte_mkhuge(pte_t pte)
 }
 #endif
 
+static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
+{
+	asm volatile("sske %[skey],%[addr],1"
+		     : [addr] "+a" (addr) : [skey] "d" (skey));
+	return addr;
+}
+
 #define IPTE_GLOBAL	0
 #define IPTE_LOCAL	1
 
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index d3ce04a4b248..bb29c38ae624 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -16,13 +16,6 @@
 #include
 #include
 
-static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
-{
-	asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],1,0"
-		     : [addr] "+a" (addr) : [skey] "d" (skey));
-	return addr;
-}
-
 void __storage_key_init_range(unsigned long start, unsigned long end)
 {
 	unsigned long boundary, size;
-- 
2.52.0
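
For reference, __storage_key_init_range() in mm/pageattr.c is the
existing caller of sske_frame(). A simplified sketch of the calling
pattern (set_keys_simple is a hypothetical name; it assumes the
multiple-block form of SSKE advances the address operand, as the "+a"
output constraint indicates, and it omits the block-boundary handling
of the real function):

static void set_keys_simple(unsigned long start, unsigned long end,
			    unsigned char skey)
{
	/* sske_frame() returns the address of the next frame to process. */
	while (start < end)
		start = sske_frame(start, skey);
}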
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 05/29] KVM: s390: Add gmap_helper_set_unused()
Date: Wed, 4 Feb 2026 16:02:34 +0100
Message-ID: <20260204150259.60425-6-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add gmap_helper_try_set_pte_unused() to mark userspace ptes as unused.
Core mm code will use that information to discard unused pages instead
of attempting to swap them.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Tested-by: Nico Boehr <nrb@linux.ibm.com>
Acked-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/gmap_helpers.h |  1 +
 arch/s390/mm/gmap_helpers.c          | 79 ++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/s390/include/asm/gmap_helpers.h b/arch/s390/include/asm/gmap_helpers.h
index 5356446a61c4..2d3ae421077e 100644
--- a/arch/s390/include/asm/gmap_helpers.h
+++ b/arch/s390/include/asm/gmap_helpers.h
@@ -11,5 +11,6 @@
 void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr);
 void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned long end);
 int gmap_helper_disable_cow_sharing(void);
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr);
 
 #endif /* _ASM_S390_GMAP_HELPERS_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 4fba13675950..4864cb35fc25 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -129,6 +129,85 @@ void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned lo
 }
 EXPORT_SYMBOL_GPL(gmap_helper_discard);
 
+/**
+ * gmap_helper_try_set_pte_unused() - mark a pte entry as unused
+ * @mm: the mm
+ * @vmaddr: the userspace address whose pte is to be marked
+ *
+ * Mark the pte corresponding to the given address as unused. This will
+ * cause core mm code to just drop this page instead of swapping it.
+ *
+ * This function needs to be called with interrupts disabled (for example
+ * while holding a spinlock), or while holding the mmap lock. Normally this
+ * function is called as a result of an unmap operation, and thus KVM common
+ * code will already hold kvm->mmu_lock in write mode.
+ *
+ * Context: Needs to be called while holding the mmap lock or with interrupts
+ *	    disabled.
+ */
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
+{
+	pmd_t *pmdp, pmd, pmdval;
+	pud_t *pudp, pud;
+	p4d_t *p4dp, p4d;
+	pgd_t *pgdp, pgd;
+	spinlock_t *ptl;	/* Lock for the host (userspace) page table */
+	pte_t *ptep;
+
+	pgdp = pgd_offset(mm, vmaddr);
+	pgd = pgdp_get(pgdp);
+	if (pgd_none(pgd) || !pgd_present(pgd))
+		return;
+
+	p4dp = p4d_offset(pgdp, vmaddr);
+	p4d = p4dp_get(p4dp);
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		return;
+
+	pudp = pud_offset(p4dp, vmaddr);
+	pud = pudp_get(pudp);
+	if (pud_none(pud) || pud_leaf(pud) || !pud_present(pud))
+		return;
+
+	pmdp = pmd_offset(pudp, vmaddr);
+	pmd = pmdp_get_lockless(pmdp);
+	if (pmd_none(pmd) || pmd_leaf(pmd) || !pmd_present(pmd))
+		return;
+
+	ptep = pte_offset_map_rw_nolock(mm, pmdp, vmaddr, &pmdval, &ptl);
+	if (!ptep)
+		return;
+
+	/*
+	 * Several paths exist that take the ptl lock and then call the
+	 * mmu_notifier, which takes the mmu_lock. The unmap path, instead,
+	 * takes the mmu_lock in write mode first, and then potentially
+	 * calls this function, which takes the ptl lock. This can lead to a
+	 * deadlock.
+	 * The unused page mechanism is only an optimization: if the
+	 * _PAGE_UNUSED bit is not set, the unused page is swapped out as
+	 * normal instead of being discarded.
+	 * If the lock is contended, the bit is not set and the deadlock is
+	 * avoided.
+	 */
+	if (spin_trylock(ptl)) {
+		/*
+		 * Make sure the pte we are touching is still the correct
+		 * one. In theory this check should not be needed, but
+		 * better safe than sorry.
+		 * Disabling interrupts or holding the mmap lock is enough to
+		 * guarantee that no concurrent updates to the page tables
+		 * are possible.
+		 */
+		if (likely(pmd_same(pmdval, pmdp_get_lockless(pmdp))))
+			__atomic64_or(_PAGE_UNUSED, (long *)ptep);
+		spin_unlock(ptl);
+	}
+
+	pte_unmap(ptep);
+}
+EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
+
 static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
 				   unsigned long end, struct mm_walk *walk)
 {
-- 
2.52.0
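
The payoff of setting _PAGE_UNUSED comes in the reclaim path: on s390,
pte_unused() reports the bit and core mm can then drop the page outright
instead of writing it to swap. A minimal sketch of a call site, assuming
(as the kernel-doc above states) that the surrounding unmap path already
satisfies the locking contract (example_on_unmap is a hypothetical name):

static void example_on_unmap(struct mm_struct *mm, unsigned long vmaddr)
{
	/*
	 * Safe here only because the caller holds the mmap lock or runs
	 * with interrupts disabled, e.g. under kvm->mmu_lock in write
	 * mode on the unmap path.
	 */
	gmap_helper_try_set_pte_unused(mm, vmaddr);
}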
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 06/29] KVM: s390: Introduce import_lock
Date: Wed, 4 Feb 2026 16:02:35 +0100
Message-ID: <20260204150259.60425-7-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Introduce import_lock to avoid future races when converting pages to
secure.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 2 ++
 arch/s390/kvm/kvm-s390.c         | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ae1223264d3c..3dbddb7c60a9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -630,6 +630,8 @@ struct kvm_s390_pv {
 	void *set_aside;
 	struct list_head need_cleanup;
 	struct mmu_notifier mmu_notifier;
+	/* Protects against concurrent import-like operations */
+	struct mutex import_lock;
 };
 
 struct kvm_arch {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 56a50524b3ee..cd39b2f099ca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3330,6 +3330,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	char debug_name[16];
 	int i, rc;
 
+	mutex_init(&kvm->arch.pv.import_lock);
+
 	rc = -EINVAL;
#ifdef CONFIG_KVM_S390_UCONTROL
 	if (type & ~KVM_VM_S390_UCONTROL)
-- 
2.52.0
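
Since the users of import_lock only arrive in later patches of this
series, the intended pattern can only be sketched here, purely as an
illustration (example_import is a hypothetical name, and the real call
sites may differ, e.g. in how the folio is located and locked):

static int example_import(struct kvm *kvm, struct folio *folio,
			  struct uv_cb_header *uvcb)
{
	int rc;

	/* Serialize against other import-like (convert-to-secure) ops. */
	mutex_lock(&kvm->arch.pv.import_lock);
	rc = __make_folio_secure(folio, uvcb);
	mutex_unlock(&kvm->arch.pv.import_lock);
	return rc;
}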
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 07/29] KVM: s390: Export two functions
Date: Wed, 4 Feb 2026 16:02:36 +0100
Message-ID: <20260204150259.60425-8-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Export __make_folio_secure() and s390_wiggle_split_folio(), as they
will need to be used by KVM.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 2 ++
 arch/s390/kernel/uv.c      | 6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 8018549a1ad2..0744874ca6df 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -632,6 +632,8 @@ int uv_destroy_folio(struct folio *folio);
 int uv_destroy_pte(pte_t pte);
 int uv_convert_from_secure_pte(pte_t pte);
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb);
+int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio);
+int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb);
 int uv_convert_from_secure(unsigned long paddr);
 int uv_convert_from_secure_folio(struct folio *folio);
 
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index ca0849008c0d..cb4e8089fbca 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -281,7 +281,7 @@ static int expected_folio_refs(struct folio *folio)
  * (it's the same logic as split_folio()), and the folio must be
  * locked.
  */
-static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
+int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;
 
@@ -311,6 +311,7 @@ static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 		return -EAGAIN;
 	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
+EXPORT_SYMBOL(__make_folio_secure);
 
 static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct uv_cb_header *uvcb)
 {
@@ -339,7 +340,7 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
  * but another attempt can be made;
  * -EINVAL in case of other folio splitting errors. See split_folio().
*/ -static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *fol= io) +int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio) { int rc, tried_splits; =20 @@ -411,6 +412,7 @@ static int s390_wiggle_split_folio(struct mm_struct *mm= , struct folio *folio) } return -EAGAIN; } +EXPORT_SYMBOL_GPL(s390_wiggle_split_folio); =20 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_= header *uvcb) { --=20 2.52.0 From nobody Sat Feb 7 06:34:15 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A75D421890; Wed, 4 Feb 2026 15:03:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770217390; cv=none; b=OYBQRqZpONn+OfuTMBfWwz53tmpmjmiQULUkXrxgNqOq+dSUUegCKm2KREsJMT095t749CUgZ2mhGLSIOETQxBL0lwg+BCB/9qCNXoTcx/HECqKGAr/iPPk0BI0OZOeip46v+Cbk8BPueeQy7byITtELuna6TnJv7zR9kAf/U0g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770217390; c=relaxed/simple; bh=9J579RODsBY3NHU28XOdCR6DAB1tRTgjQOLebDa/iz0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=utkspyojGFOiwOfaJkUKS1642aQ0MPZHDK3xbTcA+N8ScllixBC7YN78nq/ggBGcaeI6LDKlQQtFdshFbE/hnzf643FcGa5ST2RdLOCTKRlyrWy6aZ1EhikfVw46TnAd1hECPJbWV6PP0mZp0pq8zTrQCC3PYdUnMgWCMT7ufTM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=L8LwRJ6v; arc=none smtp.client-ip=148.163.156.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="L8LwRJ6v" Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 614F228Q014475; Wed, 4 Feb 2026 15:03:07 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=oOx4bRoIpElNi+6/4 tyln67C51SNBcTtwkDt4dOQoUw=; b=L8LwRJ6vDQvKHjeBWZF0K8O+LSn/YuHRi TjjZqGJRVAdfEk/a7iiegK6047WY2PCVxEAK9NbYRzrG0a64ZbKhAtn4+w9Ao+jv dMJ1DhztUzD5aYf0Fmo1VniH35OL102Wdih+cVbo0gw7wRnebCUxWGSniNHTz2AR z2S54AdfNRQwhizduVDIM02KTHj1G6i39h4Sk5F3qqUBMDvI43x6x09WFo2BMP8N gEdHzf6tsKXD5mMy7PGSsOaCMWfldwLw9OsmglmoCLxCV5ohtXf2+EqQKofOhCYA 6Gr1YeMTPdga7CSy4JIpFM3wmg0B9plI9wSwAk/sHf2dR92fgvzVQ== Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4c19dtad6h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 2026 15:03:07 +0000 (GMT) Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 614D1fOZ004411; Wed, 4 Feb 2026 15:03:06 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4c1wjjwk8p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 
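
How the two exports are meant to be consumed only becomes visible later
in the series; here is a minimal sketch of a caller, assuming the folio
is locked around __make_folio_secure() as its comment requires. The
caller name and the retry policy are illustrative, not taken from this
series:

	/* Hypothetical caller, for illustration only. */
	static int kvm_try_make_folio_secure(struct mm_struct *mm,
					     struct folio *folio,
					     struct uv_cb_header *uvcb)
	{
		int rc;

		folio_lock(folio);
		rc = __make_folio_secure(folio, uvcb);
		folio_unlock(folio);
		/* assumed retry path: split the folio, then try again */
		if (rc == -EAGAIN)
			rc = s390_wiggle_split_folio(mm, folio);
		return rc;
	}
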
From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 08/29] s390/mm: Warn if uv_convert_from_secure_pte() fails
Date: Wed, 4 Feb 2026 16:02:37 +0100
Message-ID: <20260204150259.60425-9-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

If uv_convert_from_secure_pte() fails, the page becomes unusable by
the host. The failure can only occur in case of hardware malfunction
or a serious KVM bug. When the unusable page is reused, the system
can run into issues and hang. Print a warning to aid debugging such
unlikely scenarios.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 73c30b811b98..04335f5e7f47 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1239,7 +1239,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	/* At this point the reference through the mapping is still present */
 	if (mm_is_protected(mm) && pte_present(res))
-		uv_convert_from_secure_pte(res);
+		WARN_ON_ONCE(uv_convert_from_secure_pte(res));
 	return res;
 }
 
@@ -1257,7 +1257,7 @@ static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
 	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
 	/* At this point the reference through the mapping is still present */
 	if (mm_is_protected(vma->vm_mm) && pte_present(res))
-		uv_convert_from_secure_pte(res);
+		WARN_ON_ONCE(uv_convert_from_secure_pte(res));
 	return res;
 }
 
@@ -1294,9 +1294,10 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 	/*
 	 * If something went wrong and the page could not be destroyed, or
	 * if this is not a mm teardown, the slower export is used as
-	 * fallback instead.
+	 * fallback instead. If even that fails, print a warning and leak
+	 * the page, to avoid crashing the whole system.
 	 */
-	uv_convert_from_secure_pte(res);
+	WARN_ON_ONCE(uv_convert_from_secure_pte(res));
 	return res;
 }
 
-- 
2.52.0
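
A note on the idiom, with a minimal sketch (illustrative, not part of
the patch): WARN_ON_ONCE() evaluates its condition on every call and
only rate-limits the warning itself, so the conversion attempt is
unchanged and only the first observed failure produces a backtrace:

	int rc = uv_convert_from_secure_pte(res);	/* always executed */

	WARN_ON_ONCE(rc);	/* backtrace printed only the first time */
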
From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 09/29] KVM: s390: vsie: Pass gmap explicitly as parameter
Date: Wed, 4 Feb 2026 16:02:38 +0100
Message-ID: <20260204150259.60425-10-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Pass the gmap explicitly as a parameter, instead of just using
vsie_page->gmap. This will be used in upcoming patches.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/kvm/vsie.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index b526621d2a1b..1dd54ca3070a 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -652,7 +652,7 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
  * - -EAGAIN if the caller can retry immediately
  * - -ENOMEM if out of memory
  */
-static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
 	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
 	u64 prefix = scb_s->prefix << GUEST_PREFIX_SHIFT;
@@ -667,10 +667,9 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	/* with mso/msl, the prefix lies at offset *mso* */
 	prefix += scb_s->mso;
 
-	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL);
+	rc = kvm_s390_shadow_fault(vcpu, sg, prefix, NULL);
 	if (!rc && (scb_s->ecb & ECB_TE))
-		rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-					   prefix + PAGE_SIZE, NULL);
+		rc = kvm_s390_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL);
 	/*
 	 * We don't have to mprotect, we will be called for all unshadows.
 	 * SIE will detect if protection applies and trigger a validity.
@@ -951,7 +950,7 @@ static int inject_fault(struct kvm_vcpu *vcpu, __u16 code, __u64 vaddr,
  * - > 0 if control has to be given to guest 2
  * - < 0 if an error occurred
  */
-static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
 	int rc;
 
@@ -960,8 +959,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		return inject_fault(vcpu, PGM_PROTECTION,
 				    current->thread.gmap_teid.addr * PAGE_SIZE, 1);
 
-	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				   current->thread.gmap_teid.addr * PAGE_SIZE, NULL);
+	rc = kvm_s390_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * PAGE_SIZE, NULL);
 	if (rc > 0) {
 		rc = inject_fault(vcpu, rc,
 				  current->thread.gmap_teid.addr * PAGE_SIZE,
@@ -978,12 +976,10 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 *
 * Will ignore any errors. The next SIE fault will do proper fault handling.
 */
-static void handle_last_fault(struct kvm_vcpu *vcpu,
-			      struct vsie_page *vsie_page)
+static void handle_last_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
 	if (vsie_page->fault_addr)
-		kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				      vsie_page->fault_addr, NULL);
+		kvm_s390_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL);
 	vsie_page->fault_addr = 0;
 }
 
@@ -1065,7 +1061,7 @@ static u64 vsie_get_register(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page,
 	}
 }
 
-static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
 	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
 	unsigned long pei_dest, pei_src, src, dest, mask, prefix;
@@ -1083,8 +1079,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	src = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask;
 	src = _kvm_s390_real_to_abs(prefix, src) + scb_s->mso;
 
-	rc_dest = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest);
-	rc_src = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src);
+	rc_dest = kvm_s390_shadow_fault(vcpu, sg, dest, &pei_dest);
+	rc_src = kvm_s390_shadow_fault(vcpu, sg, src, &pei_src);
 	/*
 	 * Either everything went well, or something non-critical went wrong
 	 * e.g. because of a race. In either case, simply retry.
@@ -1144,7 +1140,7 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 * - > 0 if control has to be given to guest 2
 * - < 0 if an error occurred
 */
-static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 	__releases(vcpu->kvm->srcu)
 	__acquires(vcpu->kvm->srcu)
 {
@@ -1153,7 +1149,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	int guest_bp_isolation;
 	int rc = 0;
 
-	handle_last_fault(vcpu, vsie_page);
+	handle_last_fault(vcpu, vsie_page, sg);
 
 	kvm_vcpu_srcu_read_unlock(vcpu);
 
@@ -1191,7 +1187,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		goto xfer_to_guest_mode_check;
 	}
 	guest_timing_enter_irqoff();
-	rc = kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, vsie_page->gmap->asce);
+	rc = kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce);
 	guest_timing_exit_irqoff();
 	local_irq_enable();
 }
@@ -1215,7 +1211,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	if (rc > 0)
 		rc = 0; /* we could still have an icpt */
 	else if (current->thread.gmap_int_code)
-		return handle_fault(vcpu, vsie_page);
+		return handle_fault(vcpu, vsie_page, sg);
 
 	switch (scb_s->icptcode) {
 	case ICPT_INST:
@@ -1233,7 +1229,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		break;
 	case ICPT_PARTEXEC:
 		if (scb_s->ipa == 0xb254)
-			rc = vsie_handle_mvpg(vcpu, vsie_page);
+			rc = vsie_handle_mvpg(vcpu, vsie_page, sg);
 		break;
 	}
 	return rc;
@@ -1330,15 +1326,17 @@ static void unregister_shadow_scb(struct kvm_vcpu *vcpu)
 static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 {
 	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+	struct gmap *sg;
 	int rc = 0;
 
 	while (1) {
 		rc = acquire_gmap_shadow(vcpu, vsie_page);
+		sg = vsie_page->gmap;
 		if (!rc)
-			rc = map_prefix(vcpu, vsie_page);
+			rc = map_prefix(vcpu, vsie_page, sg);
 		if (!rc) {
 			update_intervention_requests(vsie_page);
-			rc = do_vsie_run(vcpu, vsie_page);
+			rc = do_vsie_run(vcpu, vsie_page, sg);
 		}
 		atomic_andnot(PROG_BLOCK_SIE, &scb_s->prog20);
 
-- 
2.52.0
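
Condensed from the vsie_run() hunk above (intervention-request handling
and the error paths dropped for brevity): the shadow gmap is now read
from vsie_page exactly once per loop iteration and handed down
explicitly, so every callee within one iteration operates on the same
gmap:

	while (1) {
		rc = acquire_gmap_shadow(vcpu, vsie_page);
		sg = vsie_page->gmap;	/* single read per iteration */
		if (!rc)
			rc = map_prefix(vcpu, vsie_page, sg);
		if (!rc)
			rc = do_vsie_run(vcpu, vsie_page, sg);
	}
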
From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 10/29] KVM: s390: Enable KVM_GENERIC_MMU_NOTIFIER
Date: Wed, 4 Feb 2026 16:02:39 +0100
Message-ID: <20260204150259.60425-11-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Enable KVM_GENERIC_MMU_NOTIFIER, for now with empty placeholder
callbacks. Also enable KVM_MMU_LOCKLESS_AGING and define
KVM_HAVE_MMU_RWLOCK.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/Kconfig            |  2 ++
 arch/s390/kvm/kvm-s390.c         | 45 +++++++++++++++++++++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3dbddb7c60a9..6ba99870fc32 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include
 #include
 
+#define KVM_HAVE_MMU_RWLOCK
 #define KVM_MAX_VCPUS 255
 
 #define KVM_INTERNAL_MEM_SLOTS 1
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index f4ec8c1ce214..917ac740513e 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -30,6 +30,8 @@ config KVM
 	select KVM_VFIO
 	select MMU_NOTIFIER
 	select VIRT_XFER_TO_GUEST_WORK
+	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_LOCKLESS_AGING
 	help
 	  Support hosting paravirtualized guest machines using the SIE
 	  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index cd39b2f099ca..ec92e6361eab 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4805,7 +4805,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
 	rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);
 	if (!rc)
 		rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);
-	scoped_guard(spinlock, &vcpu->kvm->mmu_lock) {
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
 		kvm_release_faultin_page(vcpu->kvm, page, false, writable);
 	}
 	mmap_read_unlock(vcpu->arch.gmap->mm);
@@ -6021,6 +6021,49 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	return;
 }
 
+/**
+ * kvm_test_age_gfn() - test young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status needs to be tested
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range is young, otherwise false.
+ */
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_age_gfn() - clear young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status needs to be cleared
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_unmap_gfn_range() - Unmap a range of guest addresses
+ * @kvm: the kvm instance
+ * @range: the range of guest page frames to invalidate
+ *
+ * This function always returns false because every DAT table modification
+ * has to use the appropriate DAT table manipulation instructions, which will
+ * keep the TLB coherent, hence no additional TLB flush is ever required.
+ *
+ * Context: called by KVM common code with the kvm mmu write lock held
+ * Return: false
+ */
+bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
 static inline unsigned long nonhyp_mask(int i)
 {
 	unsigned int nonhyp_fai = (sclp.hmfai << i * 2) >> 30;
-- 
2.52.0
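
For context, a rough sketch of how KVM common code consumes the return
value of the unmap hook (heavily simplified; the real dispatch lives in
virt/kvm/kvm_main.c): a true return requests a remote TLB flush, which
is why unconditionally returning false is correct on s390, where the
DAT manipulation instructions already keep the TLBs coherent:

	/* Simplified consumer, for illustration only. */
	if (kvm_unmap_gfn_range(kvm, &range))
		kvm_flush_remote_tlbs(kvm);	/* never taken on s390 */
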
From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 11/29] KVM: s390: Rename some functions in gaccess.c
Date: Wed, 4 Feb 2026 16:02:40 +0100
Message-ID: <20260204150259.60425-12-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Rename some functions in gaccess.c to add a _gva or _gpa suffix to
indicate whether the function accepts a virtual or a guest-absolute
address. This makes it easier to understand the code.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/kvm/gaccess.c | 51 +++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d8347f7cbe51..9df868bddf9a 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -397,7 +397,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 }
 
 /**
- * guest_translate - translate a guest virtual into a guest absolute address
+ * guest_translate_gva() - translate a guest virtual into a guest absolute address
  * @vcpu: virtual cpu
  * @gva: guest virtual address
  * @gpa: points to where guest physical (absolute) address should be stored
@@ -417,9 +417,9 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 * the returned value is the program interruption code as defined
 * by the architecture
 */
-static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
-				     unsigned long *gpa, const union asce asce,
-				     enum gacc_mode mode, enum prot_type *prot)
+static unsigned long guest_translate_gva(struct kvm_vcpu *vcpu, unsigned long gva,
+					 unsigned long *gpa, const union asce asce,
+					 enum gacc_mode mode, enum prot_type *prot)
 {
 	union vaddress vaddr = {.addr = gva};
 	union raddress raddr = {.addr = gva};
@@ -600,8 +600,8 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
-static int vm_check_access_key(struct kvm *kvm, u8 access_key,
-			       enum gacc_mode mode, gpa_t gpa)
+static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key,
+				   enum gacc_mode mode, gpa_t gpa)
 {
 	u8 storage_key, access_control;
 	bool fetch_protected;
@@ -663,9 +663,9 @@ static bool storage_prot_override_applies(u8 access_control)
 	return access_control == PAGE_SPO_ACC;
 }
 
-static int vcpu_check_access_key(struct kvm_vcpu *vcpu, u8 access_key,
-				 enum gacc_mode mode, union asce asce, gpa_t gpa,
-				 unsigned long ga, unsigned int len)
+static int vcpu_check_access_key_gpa(struct kvm_vcpu *vcpu, u8 access_key,
+				     enum gacc_mode mode, union asce asce, gpa_t gpa,
+				     unsigned long ga, unsigned int len)
 {
 	u8 storage_key, access_control;
 	unsigned long hva;
@@ -757,7 +757,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 			return trans_exc(vcpu, PGM_PROTECTION, ga, ar, mode, PROT_TYPE_LA);
 		if (psw_bits(*psw).dat) {
-			rc = guest_translate(vcpu, ga, &gpa, asce, mode, &prot);
+			rc = guest_translate_gva(vcpu, ga, &gpa, asce, mode, &prot);
 			if (rc < 0)
 				return rc;
 		} else {
@@ -769,8 +769,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 		}
 		if (rc)
 			return trans_exc(vcpu, rc, ga, ar, mode, prot);
-		rc = vcpu_check_access_key(vcpu, access_key, mode, asce, gpa, ga,
-					   fragment_len);
+		rc = vcpu_check_access_key_gpa(vcpu, access_key, mode, asce, gpa, ga, fragment_len);
 		if (rc)
 			return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_KEYC);
 		if (gpas)
@@ -782,8 +781,8 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 	return 0;
 }
 
-static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			     void *data, unsigned int len)
+static int access_guest_page_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+				 void *data, unsigned int len)
 {
 	const unsigned int offset = offset_in_page(gpa);
 	const gfn_t gfn = gpa_to_gfn(gpa);
@@ -798,9 +797,8 @@ static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
 	return rc;
 }
 
-static int
-access_guest_page_with_key(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			   void *data, unsigned int len, u8 access_key)
+static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+					  void *data, unsigned int len, u8 access_key)
 {
 	struct kvm_memory_slot *slot;
 	bool writable;
 	gfn_t gfn;
 	hva_t hva;
 	int rc;
 
-	gfn = gpa >> PAGE_SHIFT;
+	gfn = gpa_to_gfn(gpa);
 	slot = gfn_to_memslot(kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
 
@@ -841,7 +839,7 @@ int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
 
 	while (min(PAGE_SIZE - offset, len) > 0) {
 		fragment_len = min(PAGE_SIZE - offset, len);
-		rc = access_guest_page_with_key(kvm, mode, gpa, data, fragment_len, access_key);
+		rc = access_guest_page_with_key_gpa(kvm, mode, gpa, data, fragment_len, access_key);
 		if (rc)
 			return rc;
 		offset = 0;
@@ -901,15 +899,14 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 	for (idx = 0; idx < nr_pages; idx++) {
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpas[idx]), len);
 		if (try_fetch_prot_override && fetch_prot_override_applies(ga, fragment_len)) {
-			rc = access_guest_page(vcpu->kvm, mode, gpas[idx],
-					       data, fragment_len);
+			rc = access_guest_page_gpa(vcpu->kvm, mode, gpas[idx], data, fragment_len);
 		} else {
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, access_key);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, access_key);
 		}
 		if (rc == PGM_PROTECTION && try_storage_prot_override)
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, PAGE_SPO_ACC);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, PAGE_SPO_ACC);
 		if (rc)
 			break;
 		len -= fragment_len;
@@ -943,7 +940,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 	while (len && !rc) {
 		gpa = kvm_s390_real_to_abs(vcpu, gra);
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), len);
-		rc = access_guest_page(vcpu->kvm, mode, gpa, data, fragment_len);
+		rc = access_guest_page_gpa(vcpu->kvm, mode, gpa, data, fragment_len);
 		len -= fragment_len;
 		gra += fragment_len;
 		data += fragment_len;
@@ -1134,7 +1131,7 @@ int check_gpa_range(struct kvm *kvm, unsigned long gpa, unsigned long length,
 
 	while (length && !rc) {
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), length);
-		rc = vm_check_access_key(kvm, access_key, mode, gpa);
+		rc = vm_check_access_key_gpa(kvm, access_key, mode, gpa);
 		length -= fragment_len;
 		gpa += fragment_len;
 	}
-- 
2.52.0
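
One hunk above also replaces an open-coded shift with the typed helper;
the two forms are equivalent, the helper merely documents the unit
conversion (a two-line sketch of that change):

	gfn = gpa >> PAGE_SHIFT;	/* before: open-coded gpa-to-gfn */
	gfn = gpa_to_gfn(gpa);		/* after: same result, self-documenting */
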
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770217395; c=relaxed/simple; bh=MhtDSpaX0tH4IhJ1TP7pk0wsnsh98p4+Ub5xq2OXzkU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uC2a8/9KpF47nk+CTzjWYFiIu9xNZVbpgFgmrP3w2l7yhtaltAziVyfmWH8el4Hrp5Cx3PD6HE13wRnbEhGmAGD1fCdHbiC909dQXYVpx0JWHtxao3X5VBPJXFbaNcYSnWsjLR3J1/YaTiGb0Jvbk7Oaa/lmFn+XK8YTUsZIYhA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=sX42BbmV; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="sX42BbmV" Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 6142U4Yg002790; Wed, 4 Feb 2026 15:03:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=eIIWaeMzgmhDevSfn 4vyLA52tgZlq9qAgslSUN52sVs=; b=sX42BbmVwfivathFkyTLpNJqIKvb67RP8 StWQP/A3/QekhFgtfr18JGy7q8P/OYOqEEUcesu7/RmDlxoEEgDtEaEvb1fwemf1 TbUoZf5huGtaXuXaKrPHUKTTYDO0fDWPZHBBq7zp/UbUtf5LS9VH7aBoX77x/Tv7 3Uu4uwFnJ2xszbCSE41+fCs7dLeGY2NVXdM26ZXSvJ9kFrDUiZOz1DDMm55TsWD1 x6VPBvqKNav/y2ze7jVxTo26Y0fzS18xf+9w0AjlvA7JyL76E5ezGOPjubHwNc3B 5LJwbLSTfgaeZCnmhBrYoVVHJqgp7B+E/9OCeRPeJXopCCo3KAJWQ== Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4c185gyw7a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 2026 15:03:08 +0000 (GMT) Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 614C9Ovl025696; Wed, 4 Feb 2026 15:03:07 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4c1w2mwnjw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 2026 15:03:07 +0000 Received: from smtpav03.fra02v.mail.ibm.com (smtpav03.fra02v.mail.ibm.com [10.20.54.102]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 614F33go22085892 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 4 Feb 2026 15:03:03 GMT Received: from smtpav03.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3ED0C20043; Wed, 4 Feb 2026 15:03:03 +0000 (GMT) Received: from smtpav03.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E00962004B; Wed, 4 Feb 2026 15:03:02 +0000 (GMT) Received: from p-imbrenda.aag-de.ibm.com (unknown [9.52.223.175]) by smtpav03.fra02v.mail.ibm.com (Postfix) with ESMTP; Wed, 4 Feb 2026 15:03:02 +0000 (GMT) From: Claudio Imbrenda To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com Subject: [PATCH v7 
12/29] KVM: s390: KVM-specific bitfields and helper functions Date: Wed, 4 Feb 2026 16:02:41 +0100 Message-ID: <20260204150259.60425-13-imbrenda@linux.ibm.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com> References: <20260204150259.60425-1-imbrenda@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Authority-Analysis: v=2.4 cv=UdxciaSN c=1 sm=1 tr=0 ts=69835fac cx=c_pps a=GFwsV6G8L6GxiO2Y/PsHdQ==:117 a=GFwsV6G8L6GxiO2Y/PsHdQ==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=VnNF1IyMAAAA:8 a=3TdPgYH1icVPsE-fAXsA:9 X-Proofpoint-GUID: NizWbzzSAvTPK6lWIwolQR09PrUZJqM8 X-Proofpoint-ORIG-GUID: NizWbzzSAvTPK6lWIwolQR09PrUZJqM8 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjA0MDExMyBTYWx0ZWRfX2qvtbJfd6HB7 jU/6H+mkWR736duav2Dtxhl5eXTqvn9QgDEpRyKk2xuNGdo2AvfL7UdmBGZ/ddSrNGJSU+Ponug DSzDuwxQa6U4D2JPcRpJRuRBXkXxKGNDUooRxAEB9RWf/NG/svfSqW/8/C0NZoARA0eeL/fTiv9 n8eMIUwD8UlBbK66Oq5FXSjmzXzQDJW7RWv4w3waVEkJe5gNBJa5e1+fNRTGdTbyeCnJifIJAeD YsJf0avhBv9VP0C+/g3/1bVTmrpSWp0Vo/Pp0Y1VgUwatyDglOC0TsOCHs1E20pGqBztUP0EIoh hctZgzhjaHCXfi74PTjKRYRbYpssyoJw+j9CgiwknM/9P5outhc5nfYoQKl8+q6tZF8vnFfNgd2 KgpiM0EQEz7sw/bAHNbOuyBC3JHLU9X84tosLxFWeHIRN+OxGI0YPjNIvypc+UNKWLu9if/P7LG mICK8dUuVpxnpFVJRog== X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-04_04,2026-02-04_01,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 impostorscore=0 lowpriorityscore=0 suspectscore=0 clxscore=1015 bulkscore=0 spamscore=0 phishscore=0 adultscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2601150000 definitions=main-2602040113 Content-Type: text/plain; charset="utf-8" Add KVM-s390 specific bitfields and helper functions to manipulate DAT tables. Signed-off-by: Claudio Imbrenda Acked-by: Heiko Carstens --- arch/s390/kvm/dat.h | 720 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 720 insertions(+) create mode 100644 arch/s390/kvm/dat.h diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h new file mode 100644 index 000000000000..d5e1a45813bc --- /dev/null +++ b/arch/s390/kvm/dat.h @@ -0,0 +1,720 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * KVM guest address space mapping code + * + * Copyright IBM Corp. 
2024, 2025 + * Author(s): Claudio Imbrenda + */ + +#ifndef __KVM_S390_DAT_H +#define __KVM_S390_DAT_H + +#include +#include +#include +#include +#include +#include +#include + +#define _ASCE(x) ((union asce) { .val =3D (x), }) +#define NULL_ASCE _ASCE(0) + +enum { + _DAT_TOKEN_NONE =3D 0, + _DAT_TOKEN_PIC, +}; + +#define _CRSTE_TOK(l, t, p) ((union crste) { \ + .tok.i =3D 1, \ + .tok.tt =3D (l), \ + .tok.type =3D (t), \ + .tok.par =3D (p) \ + }) +#define _CRSTE_PIC(l, p) _CRSTE_TOK(l, _DAT_TOKEN_PIC, p) + +#define _CRSTE_HOLE(l) _CRSTE_PIC(l, PGM_ADDRESSING) +#define _CRSTE_EMPTY(l) _CRSTE_TOK(l, _DAT_TOKEN_NONE, 0) + +#define _PMD_EMPTY _CRSTE_EMPTY(TABLE_TYPE_SEGMENT) + +#define _PTE_TOK(t, p) ((union pte) { .tok.i =3D 1, .tok.type =3D (t), .to= k.par =3D (p) }) +#define _PTE_EMPTY _PTE_TOK(_DAT_TOKEN_NONE, 0) + +/* This fake table type is used for page table walks (both for normal page= tables and vSIE) */ +#define TABLE_TYPE_PAGE_TABLE -1 + +enum dat_walk_flags { + DAT_WALK_CONTINUE =3D 0x20, + DAT_WALK_IGN_HOLES =3D 0x10, + DAT_WALK_SPLIT =3D 0x08, + DAT_WALK_ALLOC =3D 0x04, + DAT_WALK_ANY =3D 0x02, + DAT_WALK_LEAF =3D 0x01, + DAT_WALK_DEFAULT =3D 0 +}; + +#define DAT_WALK_SPLIT_ALLOC (DAT_WALK_SPLIT | DAT_WALK_ALLOC) +#define DAT_WALK_ALLOC_CONTINUE (DAT_WALK_CONTINUE | DAT_WALK_ALLOC) +#define DAT_WALK_LEAF_ALLOC (DAT_WALK_LEAF | DAT_WALK_ALLOC) + +union pte { + unsigned long val; + union page_table_entry h; + struct { + unsigned long :56; /* Hardware bits */ + unsigned long u : 1; /* Page unused */ + unsigned long s : 1; /* Special */ + unsigned long w : 1; /* Writable */ + unsigned long r : 1; /* Readable */ + unsigned long d : 1; /* Dirty */ + unsigned long y : 1; /* Young */ + unsigned long sd: 1; /* Soft dirty */ + unsigned long pr: 1; /* Present */ + } s; + struct { + unsigned char hwbytes[7]; + unsigned char swbyte; + }; + union { + struct { + unsigned long type :16; /* Token type */ + unsigned long par :16; /* Token parameter */ + unsigned long :20; + unsigned long : 1; /* Must be 0 */ + unsigned long i : 1; /* Must be 1 */ + unsigned long : 2; + unsigned long : 7; + unsigned long pr : 1; /* Must be 0 */ + }; + struct { + unsigned long token:32; /* Token and parameter */ + unsigned long :32; + }; + } tok; +}; + +/* Soft dirty, needed as macro for atomic operations on ptes */ +#define _PAGE_SD 0x002 + +/* Needed as macro to perform atomic operations */ +#define PGSTE_CMMA_D_BIT 0x0000000000008000UL /* CMMA dirty soft-bit */ + +enum pgste_gps_usage { + PGSTE_GPS_USAGE_STABLE =3D 0, + PGSTE_GPS_USAGE_UNUSED, + PGSTE_GPS_USAGE_POT_VOLATILE, + PGSTE_GPS_USAGE_VOLATILE, +}; + +union pgste { + unsigned long val; + struct { + unsigned long acc : 4; + unsigned long fp : 1; + unsigned long : 3; + unsigned long pcl : 1; + unsigned long hr : 1; + unsigned long hc : 1; + unsigned long : 2; + unsigned long gr : 1; + unsigned long gc : 1; + unsigned long : 1; + unsigned long :16; /* val16 */ + unsigned long zero : 1; + unsigned long nodat : 1; + unsigned long : 4; + unsigned long usage : 2; + unsigned long : 8; + unsigned long cmma_d : 1; /* Dirty flag for CMMA bits */ + unsigned long prefix_notif : 1; /* Guest prefix invalidation notificatio= n */ + unsigned long vsie_notif : 1; /* Referenced in a shadow table */ + unsigned long : 5; + unsigned long : 8; + }; + struct { + unsigned short hwbytes0; + unsigned short val16; /* Used to store chunked values, see dat_{s,g}et_p= tval() */ + unsigned short hwbytes4; + unsigned char flags; /* Maps to the software bits */ + unsigned char hwbyte7; + 
} __packed; +}; + +union pmd { + unsigned long val; + union segment_table_entry h; + struct { + struct { + unsigned long :44; /* HW */ + unsigned long : 3; /* Unused */ + unsigned long : 1; /* HW */ + unsigned long w : 1; /* Writable soft-bit */ + unsigned long r : 1; /* Readable soft-bit */ + unsigned long d : 1; /* Dirty */ + unsigned long y : 1; /* Young */ + unsigned long prefix_notif : 1; /* Guest prefix invalidation notificati= on */ + unsigned long : 3; /* HW */ + unsigned long vsie_notif : 1; /* Referenced in a shadow table */ + unsigned long : 1; /* Unused */ + unsigned long : 4; /* HW */ + unsigned long sd : 1; /* Soft-Dirty */ + unsigned long pr : 1; /* Present */ + } fc1; + } s; +}; + +union pud { + unsigned long val; + union region3_table_entry h; + struct { + struct { + unsigned long :33; /* HW */ + unsigned long :14; /* Unused */ + unsigned long : 1; /* HW */ + unsigned long w : 1; /* Writable soft-bit */ + unsigned long r : 1; /* Readable soft-bit */ + unsigned long d : 1; /* Dirty */ + unsigned long y : 1; /* Young */ + unsigned long prefix_notif : 1; /* Guest prefix invalidation notificati= on */ + unsigned long : 3; /* HW */ + unsigned long vsie_notif : 1; /* Referenced in a shadow table */ + unsigned long : 1; /* Unused */ + unsigned long : 4; /* HW */ + unsigned long sd : 1; /* Soft-Dirty */ + unsigned long pr : 1; /* Present */ + } fc1; + } s; +}; + +union p4d { + unsigned long val; + union region2_table_entry h; +}; + +union pgd { + unsigned long val; + union region1_table_entry h; +}; + +union crste { + unsigned long val; + union { + struct { + unsigned long :52; + unsigned long : 1; + unsigned long fc: 1; + unsigned long p : 1; + unsigned long : 1; + unsigned long : 2; + unsigned long i : 1; + unsigned long : 1; + unsigned long tt: 2; + unsigned long : 2; + }; + struct { + unsigned long to:52; + unsigned long : 1; + unsigned long fc: 1; + unsigned long p : 1; + unsigned long : 1; + unsigned long tf: 2; + unsigned long i : 1; + unsigned long : 1; + unsigned long tt: 2; + unsigned long tl: 2; + } fc0; + struct { + unsigned long :47; + unsigned long av : 1; /* ACCF-Validity Control */ + unsigned long acc: 4; /* Access-Control Bits */ + unsigned long f : 1; /* Fetch-Protection Bit */ + unsigned long fc : 1; /* Format-Control */ + unsigned long p : 1; /* DAT-Protection Bit */ + unsigned long iep: 1; /* Instruction-Execution-Protection */ + unsigned long : 2; + unsigned long i : 1; /* Segment-Invalid Bit */ + unsigned long cs : 1; /* Common-Segment Bit */ + unsigned long tt : 2; /* Table-Type Bits */ + unsigned long : 2; + } fc1; + } h; + struct { + struct { + unsigned long :47; + unsigned long : 1; /* HW (should be 0) */ + unsigned long w : 1; /* Writable */ + unsigned long r : 1; /* Readable */ + unsigned long d : 1; /* Dirty */ + unsigned long y : 1; /* Young */ + unsigned long prefix_notif : 1; /* Guest prefix invalidation notificati= on */ + unsigned long : 3; /* HW */ + unsigned long vsie_notif : 1; /* Referenced in a shadow table */ + unsigned long : 1; + unsigned long : 4; /* HW */ + unsigned long sd : 1; /* Soft-Dirty */ + unsigned long pr : 1; /* Present */ + } fc1; + } s; + union { + struct { + unsigned long type :16; /* Token type */ + unsigned long par :16; /* Token parameter */ + unsigned long :26; + unsigned long i : 1; /* Must be 1 */ + unsigned long : 1; + unsigned long tt : 2; + unsigned long : 1; + unsigned long pr : 1; /* Must be 0 */ + }; + struct { + unsigned long token:32; /* Token and parameter */ + unsigned long :32; + }; + } tok; + union pmd 
+	union pmd pmd;
+	union pud pud;
+	union p4d p4d;
+	union pgd pgd;
+};
+
+union skey {
+	unsigned char skey;
+	struct {
+		unsigned char acc :4;
+		unsigned char fp  :1;
+		unsigned char r   :1;
+		unsigned char c   :1;
+		unsigned char zero:1;
+	};
+};
+
+static_assert(sizeof(union pgste) == sizeof(unsigned long));
+static_assert(sizeof(union pte) == sizeof(unsigned long));
+static_assert(sizeof(union pmd) == sizeof(unsigned long));
+static_assert(sizeof(union pud) == sizeof(unsigned long));
+static_assert(sizeof(union p4d) == sizeof(unsigned long));
+static_assert(sizeof(union pgd) == sizeof(unsigned long));
+static_assert(sizeof(union crste) == sizeof(unsigned long));
+static_assert(sizeof(union skey) == sizeof(char));
+
+struct segment_table {
+	union pmd pmds[_CRST_ENTRIES];
+};
+
+struct region3_table {
+	union pud puds[_CRST_ENTRIES];
+};
+
+struct region2_table {
+	union p4d p4ds[_CRST_ENTRIES];
+};
+
+struct region1_table {
+	union pgd pgds[_CRST_ENTRIES];
+};
+
+struct crst_table {
+	union {
+		union crste crstes[_CRST_ENTRIES];
+		struct segment_table segment;
+		struct region3_table region3;
+		struct region2_table region2;
+		struct region1_table region1;
+	};
+};
+
+struct page_table {
+	union pte ptes[_PAGE_ENTRIES];
+	union pgste pgstes[_PAGE_ENTRIES];
+};
+
+static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
+static_assert(sizeof(struct page_table) == PAGE_SIZE);
+
+/**
+ * _pte() - Useful constructor for union pte
+ * @pfn: the pfn this pte should point to.
+ * @writable: whether the pte should be writable.
+ * @dirty: whether the pte should be dirty.
+ * @special: whether the pte should be marked as special.
+ *
+ * The pte is also marked as young and present. If the pte is marked as dirty,
+ * it gets marked as soft-dirty too. If the pte is not dirty, the hardware
+ * protect bit is set (independently of the write softbit); this way proper
+ * dirty tracking can be performed.
+ *
+ * Return: a union pte value.
+ */
+static inline union pte _pte(kvm_pfn_t pfn, bool writable, bool dirty, bool special)
+{
+	union pte res = { .val = PFN_PHYS(pfn) };
+
+	res.h.p = !dirty;
+	res.s.y = 1;
+	res.s.pr = 1;
+	res.s.w = writable;
+	res.s.d = dirty;
+	res.s.sd = dirty;
+	res.s.s = special;
+	return res;
+}
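/*
 * Editorial note (not part of the original patch): a brief illustration of
 * the dirty-tracking convention implemented by _pte() above, under the
 * semantics described in its kernel-doc:
 *
 *	union pte clean = _pte(pfn, true, false, false);
 *	// clean.s.w == 1 but clean.h.p == 1: the writable softbit is set,
 *	// yet the hardware protect bit forces a fault on the first store,
 *	// so the fault handler can record the dirty state before making
 *	// the page actually writable.
 *
 *	union pte dirty = _pte(pfn, true, true, false);
 *	// dirty.h.p == 0 and dirty.s.d == dirty.s.sd == 1: stores proceed
 *	// without faulting, and the dirty state is already recorded.
 */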
+
+static inline union crste _crste_fc0(kvm_pfn_t pfn, int tt)
+{
+	union crste res = { .val = PFN_PHYS(pfn) };
+
+	res.h.tt = tt;
+	res.h.fc0.tl = _REGION_ENTRY_LENGTH;
+	res.h.fc0.tf = 0;
+	return res;
+}
+
+/**
+ * _crste_fc1() - Useful constructor for union crste with FC=1
+ * @pfn: the pfn this crste should point to.
+ * @tt: the table type.
+ * @writable: whether the crste should be writable.
+ * @dirty: whether the crste should be dirty.
+ *
+ * The crste is also marked as young and present. If the crste is marked as
+ * dirty, it gets marked as soft-dirty too. If the crste is not dirty, the
+ * hardware protect bit is set (independently of the write softbit); this way
+ * proper dirty tracking can be performed.
+ *
+ * Return: a union crste value.
+ */
+static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool dirty)
+{
+	union crste res = { .val = PFN_PHYS(pfn) & _SEGMENT_MASK };
+
+	res.h.tt = tt;
+	res.h.p = !dirty;
+	res.h.fc = 1;
+	res.s.fc1.y = 1;
+	res.s.fc1.pr = 1;
+	res.s.fc1.w = writable;
+	res.s.fc1.d = dirty;
+	res.s.fc1.sd = dirty;
+	return res;
+}
+
+/**
+ * struct vsie_rmap - reverse mapping for shadow page table entries
+ * @next: pointer to next rmap in the list
+ * @r_gfn: virtual rmap address in the shadow guest address space
+ */
+struct vsie_rmap {
+	struct vsie_rmap *next;
+	union {
+		unsigned long val;
+		struct {
+			long level: 8;
+			unsigned long      : 4;
+			unsigned long r_gfn:52;
+		};
+	};
+};
+
+static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
+
+static inline struct crst_table *crste_table_start(union crste *crstep)
+{
+	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
+}
+
+static inline struct page_table *pte_table_start(union pte *ptep)
+{
+	return (struct page_table *)ALIGN_DOWN((unsigned long)ptep, _PAGE_TABLE_SIZE);
+}
+
+static inline bool crdte_crste(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			       union asce asce)
+{
+	unsigned long dtt = 0x10 | new.h.tt << 2;
+	void *table = crste_table_start(crstep);
+
+	return crdte(old.val, new.val, table, dtt, gfn_to_gpa(gfn), asce.val);
+}
+
+/**
+ * idte_crste() - invalidate a crste entry using idte
+ * @crstep: pointer to the crste to be invalidated
+ * @gfn: a gfn mapped by the crste
+ * @opt: options for the idte instruction
+ * @asce: the asce
+ * @local: whether the operation is cpu-local
+ */
+static __always_inline void idte_crste(union crste *crstep, gfn_t gfn, unsigned long opt,
+				       union asce asce, int local)
+{
+	unsigned long table_origin = __pa(crste_table_start(crstep));
+	unsigned long gaddr = gfn_to_gpa(gfn) & HPAGE_MASK;
+
+	if (__builtin_constant_p(opt) && opt == 0) {
+		/* flush without guest asce */
+		asm volatile("idte %[table_origin],0,%[gaddr],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr] "a" (gaddr),
+			       [local] "i" (local)
+			     : "cc");
+	} else {
+		/* flush with guest asce */
+		asm volatile("idte %[table_origin],%[asce],%[gaddr_opt],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr_opt] "a" (gaddr | opt),
+			       [asce] "a" (asce.val), [local] "i" (local)
+			     : "cc");
+	}
+}
+
+static inline void dat_init_pgstes(struct page_table *pt, unsigned long val)
+{
+	memset64((void *)pt->pgstes, val, PTRS_PER_PTE);
+}
+
+static inline void dat_init_page_table(struct page_table *pt, unsigned long ptes,
+				       unsigned long pgstes)
+{
+	memset64((void *)pt->ptes, ptes, PTRS_PER_PTE);
+	dat_init_pgstes(pt, pgstes);
+}
+
+static inline gfn_t asce_end(union asce asce)
+{
+	return 1ULL << ((asce.dt + 1) * 11 + _SEGMENT_SHIFT - PAGE_SHIFT);
+}
+
+#define _CRSTE(x) ((union crste) { .val = _Generic((x),	\
+		union pgd : (x).val,	\
+		union p4d : (x).val,	\
+		union pud : (x).val,	\
+		union pmd : (x).val,	\
+		union crste : (x).val)})
+
+#define _CRSTEP(x) ((union crste *)_Generic((*(x)),	\
+		union pgd : (x),	\
+		union p4d : (x),	\
+		union pud : (x),	\
+		union pmd : (x),	\
+		union crste : (x)))
+
+#define _CRSTP(x) ((struct crst_table *)_Generic((*(x)),	\
+		struct crst_table : (x),	\
+		struct segment_table : (x),	\
+		struct region3_table : (x),	\
+		struct region2_table : (x),	\
+		struct region1_table : (x)))
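/*
 * Editorial note (not part of the original patch): a sketch of how the
 * _Generic() helpers above are meant to be used.  Given the definitions
 * in this header:
 *
 *	union pmd pmd;
 *	union crste e = _CRSTE(pmd);	// value-cast: any level to crste
 *	union crste *p = _CRSTEP(&pmd);	// pointer-cast, type-checked
 *
 * Passing any other type fails to compile (there is no default: case),
 * which is the point: the five entry types stay distinct in the type
 * system while sharing a single set of generic helpers.
 */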
+
+static inline bool asce_contains_gfn(union asce asce, gfn_t gfn)
+{
+	return gfn < asce_end(asce);
+}
+
+static inline bool is_pmd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_SEGMENT;
+}
+
+static inline bool is_pud(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION3;
+}
+
+static inline bool is_p4d(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION2;
+}
+
+static inline bool is_pgd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION1;
+}
+
+static inline phys_addr_t pmd_origin_large(union pmd pmd)
+{
+	return pmd.val & _SEGMENT_ENTRY_ORIGIN_LARGE;
+}
+
+static inline phys_addr_t pud_origin_large(union pud pud)
+{
+	return pud.val & _REGION3_ENTRY_ORIGIN_LARGE;
+}
+
+/**
+ * crste_origin_large() - Return the large frame origin of a large crste
+ * @crste: The crste whose origin is to be returned. Should be either a
+ *         region-3 table entry or a segment table entry, in both cases with
+ *         FC set to 1 (large pages).
+ *
+ * Return: The origin of the large frame pointed to by @crste, or -1 if the
+ *         crste was not large (wrong table type, or FC==0)
+ */
+static inline phys_addr_t crste_origin_large(union crste crste)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return pmd_origin_large(crste.pmd);
+	return pud_origin_large(crste.pud);
+}
+
+#define crste_origin(x) (_Generic((x),	\
+		union pmd : (x).val & _SEGMENT_ENTRY_ORIGIN,	\
+		union pud : (x).val & _REGION_ENTRY_ORIGIN,	\
+		union p4d : (x).val & _REGION_ENTRY_ORIGIN,	\
+		union pgd : (x).val & _REGION_ENTRY_ORIGIN))
+
+static inline unsigned long pte_origin(union pte pte)
+{
+	return pte.val & PAGE_MASK;
+}
+
+static inline bool pmd_prefix(union pmd pmd)
+{
+	return pmd.h.fc && pmd.s.fc1.prefix_notif;
+}
+
+static inline bool pud_prefix(union pud pud)
+{
+	return pud.h.fc && pud.s.fc1.prefix_notif;
+}
+
+static inline bool crste_leaf(union crste crste)
+{
+	return (crste.h.tt <= TABLE_TYPE_REGION3) && crste.h.fc;
+}
+
+static inline bool crste_prefix(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.prefix_notif;
+}
+
+static inline bool crste_dirty(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.d;
+}
+
+static inline union pgste *pgste_of(union pte *pte)
+{
+	return (union pgste *)(pte + _PAGE_ENTRIES);
+}
+
+static inline bool pte_hole(union pte pte)
+{
+	return pte.h.i && !pte.tok.pr && pte.tok.type != _DAT_TOKEN_NONE;
+}
+
+static inline bool _crste_hole(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type != _DAT_TOKEN_NONE;
+}
+
+#define crste_hole(x) _crste_hole(_CRSTE(x))
+
+static inline bool _crste_none(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type == _DAT_TOKEN_NONE;
+}
+
+#define crste_none(x) _crste_none(_CRSTE(x))
+
+static inline phys_addr_t large_pud_to_phys(union pud pud, gfn_t gfn)
+{
+	return pud_origin_large(pud) | (gfn_to_gpa(gfn) & ~_REGION3_MASK);
+}
+
+static inline phys_addr_t large_pmd_to_phys(union pmd pmd, gfn_t gfn)
+{
+	return pmd_origin_large(pmd) | (gfn_to_gpa(gfn) & ~_SEGMENT_MASK);
+}
+
+static inline phys_addr_t large_crste_to_phys(union crste crste, gfn_t gfn)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return large_pmd_to_phys(crste.pmd, gfn);
+	return large_pud_to_phys(crste.pud, gfn);
+}
+
+static inline bool cspg_crste(union crste *crstep, union crste old, union crste new)
+{
+	return cspg(&crstep->val, old.val, new.val);
+}
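/*
 * Editorial note (not part of the original patch): pgste_of() above relies
 * on the struct page_table layout declared earlier in this header: one 4K
 * page holds _PAGE_ENTRIES ptes immediately followed by _PAGE_ENTRIES
 * pgstes, so the pgste belonging to a pte lives exactly _PAGE_ENTRIES
 * slots after it:
 *
 *	struct page_table *pt = pte_table_start(ptep);
 *	union pgste *pgstep = pgste_of(ptep);
 *	// pgstep == &pt->pgstes[ptep - pt->ptes]
 */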
+
+static inline struct page_table *dereference_pmd(union pmd pmd)
+{
+	return phys_to_virt(crste_origin(pmd));
+}
+
+static inline struct segment_table *dereference_pud(union pud pud)
+{
+	return phys_to_virt(crste_origin(pud));
+}
+
+static inline struct region3_table *dereference_p4d(union p4d p4d)
+{
+	return phys_to_virt(crste_origin(p4d));
+}
+
+static inline struct region2_table *dereference_pgd(union pgd pgd)
+{
+	return phys_to_virt(crste_origin(pgd));
+}
+
+static inline struct crst_table *_dereference_crste(union crste crste)
+{
+	if (unlikely(is_pmd(crste)))
+		return NULL;
+	return phys_to_virt(crste_origin(crste.pud));
+}
+
+#define dereference_crste(x) (_Generic((x),	\
+		union pud : _dereference_crste(_CRSTE(x)),	\
+		union p4d : _dereference_crste(_CRSTE(x)),	\
+		union pgd : _dereference_crste(_CRSTE(x)),	\
+		union crste : _dereference_crste(_CRSTE(x))))
+
+static inline struct crst_table *dereference_asce(union asce asce)
+{
+	return phys_to_virt(asce.val & _ASCE_ORIGIN);
+}
+
+static inline void asce_flush_tlb(union asce asce)
+{
+	__tlb_flush_idte(asce.val);
+}
+
+static inline bool pgste_get_trylock(union pte *ptep, union pgste *res)
+{
+	union pgste *pgstep = pgste_of(ptep);
+	union pgste old_pgste;
+
+	if (READ_ONCE(pgstep->val) & PGSTE_PCL_BIT)
+		return false;
+	old_pgste.val = __atomic64_or_barrier(PGSTE_PCL_BIT, &pgstep->val);
+	if (old_pgste.pcl)
+		return false;
+	old_pgste.pcl = 1;
+	*res = old_pgste;
+	return true;
+}
+
+static inline union pgste pgste_get_lock(union pte *ptep)
+{
+	union pgste res;
+
+	while (!pgste_get_trylock(ptep, &res))
+		cpu_relax();
+	return res;
+}
+
+static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
+{
+	pgste.pcl = 0;
+	barrier();
+	WRITE_ONCE(*pgste_of(ptep), pgste);
+}
+
+#endif /* __KVM_S390_DAT_H */
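[Editorial note: the PGSTE lock bit (pcl) above implements a per-pte
spinlock stored in the pgste itself. A minimal usage sketch, assuming a
valid ptep into a struct page_table; the dat_ptep_xchg() helper added by
a later patch in this series follows exactly this pattern:

	union pgste pgste;

	pgste = pgste_get_lock(ptep);	/* spins until pcl is acquired */
	/* ... inspect or modify *ptep and the pgste software bits ... */
	pgste_set_unlock(ptep, pgste);	/* clears pcl, publishes pgste */

Note that pgste_get_lock() returns the pgste value with pcl already set,
and pgste_set_unlock() writes back the (possibly modified) pgste with pcl
cleared.]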
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 13/29] KVM: s390: KVM page table management functions: allocation
Date: Wed, 4 Feb 2026 16:02:42 +0100
Message-ID: <20260204150259.60425-14-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds the boilerplate and functions for the allocation and
deallocation of DAT tables.

Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/Makefile     |   1 +
 arch/s390/kvm/dat.c        | 103 +++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h        |  77 +++++++++++++++++++++++++++
 arch/s390/mm/page-states.c |   1 +
 4 files changed, 182 insertions(+)
 create mode 100644 arch/s390/kvm/dat.c

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 9a723c48b05a..84315d2f75fb 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
+kvm-y += dat.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
new file mode 100644
index 000000000000..c324a27f379f
--- /dev/null
+++ b/arch/s390/kvm/dat.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2020, 2024
+ * Author(s): Claudio Imbrenda
+ *	      Martin Schwidefsky
+ *	      David Hildenbrand
+ *	      Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include "dat.h"
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc)
+{
+	void *o;
+
+	for ( ; mc->n_crsts < KVM_S390_MMU_CACHE_N_CRSTS; mc->n_crsts++) {
+		o = (void *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+		if (!o)
+			return -ENOMEM;
+		mc->crsts[mc->n_crsts] = o;
+	}
+	for ( ; mc->n_pts < KVM_S390_MMU_CACHE_N_PTS; mc->n_pts++) {
+		o = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->pts[mc->n_pts] = o;
+	}
+	for ( ; mc->n_rmaps < KVM_S390_MMU_CACHE_N_RMAPS; mc->n_rmaps++) {
+		o = kzalloc(sizeof(*mc->rmaps[0]), GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->rmaps[mc->n_rmaps] = o;
+	}
+	return 0;
+}
+
+static inline struct page_table *dat_alloc_pt_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct page_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_pt(mc);
+	if (res)
+		__arch_set_page_dat(res, 1);
+	return res;
+}
+
+static inline struct crst_table *dat_alloc_crst_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct crst_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_crst(mc);
+	if (res)
+		__arch_set_page_dat(res, 1UL << CRST_ALLOC_ORDER);
+	return res;
+}
+
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init)
+{
+	struct page *page;
+	void *virt;
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+	if (!page)
+		return NULL;
+	virt = page_to_virt(page);
+	__arch_set_page_dat(virt, 1UL << CRST_ALLOC_ORDER);
+	crst_table_init(virt, init);
+	return virt;
+}
+
+void dat_free_level(struct crst_table *table, bool owns_ptes)
+{
+	unsigned int i;
+
+	for (i = 0; i < _CRST_ENTRIES; i++) {
+		if (table->crstes[i].h.fc || table->crstes[i].h.i)
+			continue;
+		if (!is_pmd(table->crstes[i]))
+			dat_free_level(dereference_crste(table->crstes[i]), owns_ptes);
+		else if (owns_ptes)
+			dat_free_pt(dereference_pmd(table->crstes[i].pmd));
+	}
+	dat_free_crst(table);
+}
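[Editorial note: kvm_s390_mmu_cache_topup() above implements the usual
KVM MMU pattern of pre-filling a per-operation cache while sleeping is
still allowed, so that later table allocations can happen under the
non-sleepable kvm->mmu_lock. A minimal sketch of the intended call
sequence, with error handling elided:

	rc = kvm_s390_mmu_cache_topup(mc);	/* may sleep */
	if (rc)
		return rc;
	/* ... take kvm->mmu_lock (cannot sleep from here on) ... */
	pt = kvm_s390_mmu_cache_alloc_pt(mc);	/* served from the cache */
	/* ... install pt, then release the lock ... */

The cache consumers (kvm_s390_mmu_cache_alloc_pt() and friends, declared
in dat.h below) fall back to GFP_ATOMIC allocations when the cache is
empty.]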
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index d5e1a45813bc..a053f0d49bae 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -418,6 +418,46 @@ struct vsie_rmap {
 
 static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
 
+#define KVM_S390_MMU_CACHE_N_CRSTS 6
+#define KVM_S390_MMU_CACHE_N_PTS 2
+#define KVM_S390_MMU_CACHE_N_RMAPS 16
+struct kvm_s390_mmu_cache {
+	void *crsts[KVM_S390_MMU_CACHE_N_CRSTS];
+	void *pts[KVM_S390_MMU_CACHE_N_PTS];
+	void *rmaps[KVM_S390_MMU_CACHE_N_RMAPS];
+	short int n_crsts;
+	short int n_pts;
+	short int n_rmaps;
+};
+
+void dat_free_level(struct crst_table *table, bool owns_ptes);
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
+
+#define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
+
+static inline struct page_table *kvm_s390_mmu_cache_alloc_pt(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_pts)
+		return mc->pts[--mc->n_pts];
+	return (void *)__get_free_page(GFP_KVM_S390_MMU_CACHE);
+}
+
+static inline struct crst_table *kvm_s390_mmu_cache_alloc_crst(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_crsts)
+		return mc->crsts[--mc->n_crsts];
+	return (void *)__get_free_pages(GFP_KVM_S390_MMU_CACHE | __GFP_COMP, CRST_ALLOC_ORDER);
+}
+
+static inline struct vsie_rmap *kvm_s390_mmu_cache_alloc_rmap(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_rmaps)
+		return mc->rmaps[--mc->n_rmaps];
+	return kzalloc(sizeof(struct vsie_rmap), GFP_KVM_S390_MMU_CACHE);
+}
+
 static inline struct crst_table *crste_table_start(union crste *crstep)
 {
 	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
@@ -717,4 +757,41 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_free_pt(struct page_table *pt)
+{
+	free_page((unsigned long)pt);
+}
+
+static inline void _dat_free_crst(struct crst_table *table)
+{
+	free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+}
+
+#define dat_free_crst(x) _dat_free_crst(_CRSTP(x))
+
+static inline void kvm_s390_free_mmu_cache(struct kvm_s390_mmu_cache *mc)
+{
+	if (!mc)
+		return;
+	while (mc->n_pts)
+		dat_free_pt(mc->pts[--mc->n_pts]);
+	while (mc->n_crsts)
+		_dat_free_crst(mc->crsts[--mc->n_crsts]);
+	while (mc->n_rmaps)
+		kfree(mc->rmaps[--mc->n_rmaps]);
+	kfree(mc);
+}
+
+DEFINE_FREE(kvm_s390_mmu_cache, struct kvm_s390_mmu_cache *, if (_T) kvm_s390_free_mmu_cache(_T))
+
+static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
+{
+	struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache) = NULL;
+
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL_ACCOUNT);
+	if (mc && !kvm_s390_mmu_cache_topup(mc))
+		return_ptr(mc);
+	return NULL;
+}
+
 #endif /* __KVM_S390_DAT_H */
diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
index 01f9b39e65f5..5bee173db72e 100644
--- a/arch/s390/mm/page-states.c
+++ b/arch/s390/mm/page-states.c
@@ -13,6 +13,7 @@
 #include
 
 int __bootdata_preserved(cmma_flag);
+EXPORT_SYMBOL(cmma_flag);
 
 void arch_free_page(struct page *page, int order)
 {
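[Editorial note: kvm_s390_new_mmu_cache() above uses the scope-based
cleanup helpers from <linux/cleanup.h>: the __free(kvm_s390_mmu_cache)
annotation ties kvm_s390_free_mmu_cache() to the variable's scope, so a
partially filled cache is freed automatically if kvm_s390_mmu_cache_topup()
fails, while return_ptr() disarms the cleanup on success. A sketch of the
resulting caller-side usage, assuming the caller owns the returned cache:

	struct kvm_s390_mmu_cache *mc;

	mc = kvm_s390_new_mmu_cache();
	if (!mc)
		return -ENOMEM;
	/* ... use mc ... */
	kvm_s390_free_mmu_cache(mc);
]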
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 14/29] KVM: s390: KVM page table management functions: clear and replace
Date: Wed, 4 Feb 2026 16:02:43 +0100
Message-ID: <20260204150259.60425-15-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to clear, replace or exchange DAT table
entries.

Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/dat.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  40 +++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index c324a27f379f..e38b1a139fbb 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -101,3 +101,118 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	}
 	dat_free_crst(table);
 }
+
+/**
+ * dat_crstep_xchg() - Exchange a gmap CRSTE with another.
+ * @crstep: Pointer to the CRST entry.
+ * @new: Replacement entry.
+ * @gfn: The affected guest address.
+ * @asce: The ASCE of the address space.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ */
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce)
+{
+	if (crstep->h.i) {
+		WRITE_ONCE(*crstep, new);
+		return;
+	} else if (cpu_has_edat2()) {
+		crdte_crste(crstep, *crstep, new, gfn, asce);
+		return;
+	}
+
+	if (machine_has_tlb_guest())
+		idte_crste(crstep, gfn, IDTE_GUEST_ASCE, asce, IDTE_GLOBAL);
+	else
+		idte_crste(crstep, gfn, 0, NULL_ASCE, IDTE_GLOBAL);
+	WRITE_ONCE(*crstep, new);
+}
+
+/**
+ * dat_crstep_xchg_atomic() - Atomically exchange a gmap CRSTE with another.
+ * @crstep: Pointer to the CRST entry.
+ * @old: Expected old value.
+ * @new: Replacement entry.
+ * @gfn: The affected guest address.
+ * @asce: The asce of the address space.
+ *
+ * This function is needed to atomically exchange a CRSTE that potentially
+ * maps a prefix area, without having to invalidate it in between.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: %true if the exchange was successful.
+ */
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce)
+{
+	if (old.h.i)
+		return arch_try_cmpxchg((long *)crstep, &old.val, new.val);
+	if (cpu_has_edat2())
+		return crdte_crste(crstep, old, new, gfn, asce);
+	return cspg_crste(crstep, old, new);
+}
+
+static void dat_set_storage_key_from_pgste(union pte pte, union pgste pgste)
+{
+	union skey nkey = { .acc = pgste.acc, .fp = pgste.fp };
+
+	page_set_storage_key(pte_origin(pte), nkey.skey, 0);
+}
+
+static void dat_move_storage_key(union pte old, union pte new)
+{
+	page_set_storage_key(pte_origin(new), page_get_storage_key(pte_origin(old)), 1);
+}
+
+static union pgste dat_save_storage_key_into_pgste(union pte pte, union pgste pgste)
+{
+	union skey skey;
+
+	skey.skey = page_get_storage_key(pte_origin(pte));
+
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gr |= skey.r;
+	pgste.gc |= skey.c;
+
+	return pgste;
+}
+
+union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, gfn_t gfn,
+			    union asce asce, bool uses_skeys)
+{
+	union pte old = READ_ONCE(*ptep);
+
+	/* Updating only the software bits while holding the pgste lock. */
+	if (!((ptep->val ^ new.val) & ~_PAGE_SW_BITS)) {
+		WRITE_ONCE(ptep->swbyte, new.swbyte);
+		return pgste;
+	}
+
+	if (!old.h.i) {
+		unsigned long opts = IPTE_GUEST_ASCE | (pgste.nodat ? IPTE_NODAT : 0);
+
+		if (machine_has_tlb_guest())
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, opts, asce.val, IPTE_GLOBAL);
+		else
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, 0, 0, IPTE_GLOBAL);
+	}
+
+	if (uses_skeys) {
+		if (old.h.i && !new.h.i)
+			/* Invalid to valid: restore storage keys from PGSTE. */
+			dat_set_storage_key_from_pgste(new, pgste);
+		else if (!old.h.i && new.h.i)
+			/* Valid to invalid: save storage keys to PGSTE. */
+			pgste = dat_save_storage_key_into_pgste(old, pgste);
+		else if (!old.h.i && !new.h.i)
+			/* Valid to valid: move storage keys. */
+			if (old.h.pfra != new.h.pfra)
+				dat_move_storage_key(old, new);
+		/* Invalid to invalid: nothing to do. */
+	}
+
+	WRITE_ONCE(*ptep, new);
+	return pgste;
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index a053f0d49bae..ee070d18bd36 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -430,6 +430,12 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
+					 gfn_t gfn, union asce asce, bool uses_skeys);
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce);
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
+
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
 
@@ -757,6 +763,21 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_ptep_xchg(union pte *ptep, union pte new, gfn_t gfn, union asce asce,
+				 bool has_skeys)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, asce, has_skeys);
+	pgste_set_unlock(ptep, pgste);
+}
+
+static inline void dat_ptep_clear(union pte *ptep, gfn_t gfn, union asce asce, bool has_skeys)
+{
+	dat_ptep_xchg(ptep, _PTE_EMPTY, gfn, asce, has_skeys);
+}
+
 static inline void dat_free_pt(struct page_table *pt)
 {
 	free_page((unsigned long)pt);
@@ -794,4 +815,23 @@ static inline struct kvm_s390_new_mmu_cache(void)
 	return NULL;
 }
 
+static inline bool dat_pmdp_xchg_atomic(union pmd *pmdp, union pmd old, union pmd new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pmdp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline bool dat_pudp_xchg_atomic(union pud *pudp, union pud old, union pud new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pudp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline void dat_crstep_clear(union crste *crstep, gfn_t gfn, union asce asce)
+{
+	union crste newcrste = _CRSTE_EMPTY(crstep->h.tt);
+
+	dat_crstep_xchg(crstep, newcrste, gfn, asce);
+}
+
 #endif /* __KVM_S390_DAT_H */
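[Editorial note: dat_ptep_xchg() above is the lock-protected counterpart
of __dat_ptep_xchg(): it wraps the exchange in the pgste lock introduced
earlier in the series. A minimal sketch of invalidating one guest page,
assuming ptep/gfn/asce were obtained from a walk and storage keys are in
use:

	dat_ptep_clear(ptep, gfn, asce, true);

which is equivalent to exchanging the entry with _PTE_EMPTY: the old
entry is flushed with IPTE if it was valid, and since the new entry is
invalid, the storage key of the old page is saved into the PGSTE.]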
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 15/29] KVM: s390: KVM page table management functions: walks
Date: Wed, 4 Feb 2026 16:02:44 +0100
Message-ID: <20260204150259.60425-16-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to walk to specific table entries, or to
perform actions on a range of entries.

Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/dat.c | 386 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  39 +++++
 2 files changed, 425 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index e38b1a139fbb..c31838107752 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -216,3 +216,389 @@ union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, g
 	WRITE_ONCE(*ptep, new);
 	return pgste;
 }
+
+/*
+ * dat_split_ste() - Split a segment table entry into page table entries.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: 0 in case of success, -ENOMEM if running out of memory.
+ */
+static int dat_split_ste(struct kvm_s390_mmu_cache *mc, union pmd *pmdp, gfn_t gfn,
+			 union asce asce, bool uses_skeys)
+{
+	union pgste pgste_init;
+	struct page_table *pt;
+	union pmd new, old;
+	union pte init;
+	int i;
+
+	BUG_ON(!mc);
+	old = READ_ONCE(*pmdp);
+
+	/* Already split, nothing to do. */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	pt = dat_alloc_pt_noinit(mc);
+	if (!pt)
+		return -ENOMEM;
+	new.val = virt_to_phys(pt);
+
+	while (old.h.i || old.h.fc) {
+		init.val = pmd_origin_large(old);
+		init.h.p = old.h.p;
+		init.h.i = old.h.i;
+		init.s.d = old.s.fc1.d;
+		init.s.w = old.s.fc1.w;
+		init.s.y = old.s.fc1.y;
+		init.s.sd = old.s.fc1.sd;
+		init.s.pr = old.s.fc1.pr;
+		pgste_init.val = 0;
+		if (old.h.fc) {
+			for (i = 0; i < _PAGE_ENTRIES; i++)
+				pt->ptes[i].val = init.val | i * PAGE_SIZE;
+			/* No need to take locks as the page table is not installed yet. */
+			pgste_init.prefix_notif = old.s.fc1.prefix_notif;
+			pgste_init.pcl = uses_skeys && init.h.i;
+			dat_init_pgstes(pt, pgste_init.val);
+		} else {
+			dat_init_page_table(pt, init.val, 0);
+		}
+
+		if (dat_pmdp_xchg_atomic(pmdp, old, new, gfn, asce)) {
+			if (!pgste_init.pcl)
+				return 0;
+			for (i = 0; i < _PAGE_ENTRIES; i++) {
+				union pgste pgste = pt->pgstes[i];
+
+				pgste = dat_save_storage_key_into_pgste(pt->ptes[i], pgste);
+				pgste_set_unlock(pt->ptes + i, pgste);
+			}
+			return 0;
+		}
+		old = READ_ONCE(*pmdp);
+	}
+
+	dat_free_pt(pt);
+	return 0;
+}
+
+/*
+ * dat_split_crste() - Split a crste into smaller crstes.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: %0 in case of success, %-ENOMEM if running out of memory.
+ */
+static int dat_split_crste(struct kvm_s390_mmu_cache *mc, union crste *crstep,
+			   gfn_t gfn, union asce asce, bool uses_skeys)
+{
+	struct crst_table *table;
+	union crste old, new, init;
+	int i;
+
+	old = READ_ONCE(*crstep);
+	if (is_pmd(old))
+		return dat_split_ste(mc, &crstep->pmd, gfn, asce, uses_skeys);
+
+	BUG_ON(!mc);
+
+	/* Already split, nothing to do. */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	table = dat_alloc_crst_noinit(mc);
+	if (!table)
+		return -ENOMEM;
+
+	new.val = virt_to_phys(table);
+	new.h.tt = old.h.tt;
+	new.h.fc0.tl = _REGION_ENTRY_LENGTH;
+
+	while (old.h.i || old.h.fc) {
+		init = old;
+		init.h.tt--;
+		if (old.h.fc) {
+			for (i = 0; i < _CRST_ENTRIES; i++)
+				table->crstes[i].val = init.val | i * HPAGE_SIZE;
+		} else {
+			crst_table_init((void *)table, init.val);
+		}
+		if (dat_crstep_xchg_atomic(crstep, old, new, gfn, asce))
+			return 0;
+		old = READ_ONCE(*crstep);
+	}
+
+	dat_free_crst(table);
+	return 0;
+}
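[Editorial note: the address fan-out in the two split functions above can
be illustrated with a worked example. Splitting a 1M segment entry with
origin O into a page table produces _PAGE_ENTRIES (256) ptes, where pte i
maps O + i * PAGE_SIZE; splitting a large region-3 entry produces
_CRST_ENTRIES (2048) segment entries, where entry i maps O + i *
HPAGE_SIZE. The split preserves the protection, validity, dirty/young and
soft bits of the original large entry, and the atomic exchange retries if
the entry changed concurrently.]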
+
+/**
+ * dat_entry_walk() - Walk the gmap page tables.
+ * @mc: Cache to use to allocate dat tables, if needed; can be NULL if neither
+ *      %DAT_WALK_SPLIT nor %DAT_WALK_ALLOC is specified in @flags.
+ * @gfn: Guest frame.
+ * @asce: The ASCE of the address space.
+ * @flags: Flags from WALK_* macros.
+ * @walk_level: Level to walk to, from LEVEL_* macros.
+ * @last: Will be filled with the last visited non-pte DAT entry.
+ * @ptepp: Will be filled with the last visited pte entry, if any, otherwise NULL.
+ *
+ * Returns a table entry pointer for the given guest address and @walk_level.
+ *
+ * The @flags have the following meanings:
+ * * %DAT_WALK_IGN_HOLES: consider holes as normal table entries
+ * * %DAT_WALK_ALLOC: allocate new tables to reach the requested level, if needed
+ * * %DAT_WALK_SPLIT: split existing large pages to reach the requested level, if needed
+ * * %DAT_WALK_LEAF: return successfully whenever a large page is encountered
+ * * %DAT_WALK_ANY: return successfully even if the requested level could not be reached
+ * * %DAT_WALK_CONTINUE: walk to the requested level with the specified flags, and then try to
+ *   continue walking to ptes with only DAT_WALK_ANY
+ * * %DAT_WALK_USES_SKEYS: storage keys are in use
+ *
+ * Context: called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * %PGM_ADDRESSING if the requested address lies outside memory
+ * * a PIC number if the requested address lies in a memory hole of type _DAT_TOKEN_PIC
+ * * %-EFAULT if the requested address lies inside a memory hole of a different type
+ * * %-EINVAL if the given ASCE is not compatible with the requested level
+ * * %-EFBIG if the requested level could not be reached because a larger frame was found
+ * * %-ENOENT if the requested level could not be reached for other reasons
+ * * %-ENOMEM if running out of memory while allocating or splitting a table
+ */
+int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
+		   int walk_level, union crste **last, union pte **ptepp)
+{
+	union vaddress vaddr = { .addr = gfn_to_gpa(gfn) };
+	bool continue_anyway = flags & DAT_WALK_CONTINUE;
+	bool uses_skeys = flags & DAT_WALK_USES_SKEYS;
+	bool ign_holes = flags & DAT_WALK_IGN_HOLES;
+	bool allocate = flags & DAT_WALK_ALLOC;
+	bool split = flags & DAT_WALK_SPLIT;
+	bool leaf = flags & DAT_WALK_LEAF;
+	bool any = flags & DAT_WALK_ANY;
+	struct page_table *pgtable;
+	struct crst_table *table;
+	union crste entry;
+	int rc;
+
+	*last = NULL;
+	*ptepp = NULL;
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (WARN_ON_ONCE(unlikely(walk_level > asce.dt)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, gfn))
+		return PGM_ADDRESSING;
+
+	table = dereference_asce(asce);
+	if (asce.dt >= ASCE_TYPE_REGION1) {
+		*last = table->crstes + vaddr.rfx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION1))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION1)
+			return 0;
+		if (entry.pgd.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pgd);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION2) {
+		*last = table->crstes + vaddr.rsx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION2))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION2)
+			return 0;
+		if (entry.p4d.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.p4d);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION3) {
+		*last = table->crstes + vaddr.rtx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION3))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION3 &&
+		    continue_anyway && !entry.pud.h.fc && !entry.h.i) {
+			walk_level = TABLE_TYPE_PAGE_TABLE;
+			allocate = false;
+		}
+		if (walk_level == TABLE_TYPE_REGION3 || ((leaf || any) && entry.pud.h.fc))
+			return 0;
+		if (entry.pud.h.i && !entry.pud.h.fc) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		if (walk_level <= TABLE_TYPE_SEGMENT && entry.pud.h.fc) {
+			if (!split)
+				return -EFBIG;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pud);
+	}
+
+	*last = table->crstes + vaddr.sx;
+	entry = READ_ONCE(**last);
+	if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_SEGMENT))
+		return -EINVAL;
+	if (crste_hole(entry) && !ign_holes)
+		return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+	if (continue_anyway && !entry.pmd.h.fc && !entry.h.i) {
+		walk_level = TABLE_TYPE_PAGE_TABLE;
+		allocate = false;
+	}
+	if (walk_level == TABLE_TYPE_SEGMENT || ((leaf || any) && entry.pmd.h.fc))
+		return 0;
+
+	if (entry.pmd.h.i && !entry.pmd.h.fc) {
+		if (!allocate)
+			return any ? 0 : -ENOENT;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	if (walk_level <= TABLE_TYPE_PAGE_TABLE && entry.pmd.h.fc) {
+		if (!split)
+			return -EFBIG;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	pgtable = dereference_pmd(entry.pmd);
+	*ptepp = pgtable->ptes + vaddr.px;
+	if (pte_hole(**ptepp) && !ign_holes)
+		return (*ptepp)->tok.type == _DAT_TOKEN_PIC ? (*ptepp)->tok.par : -EFAULT;
+	return 0;
+}
+
+static long dat_pte_walk_range(gfn_t gfn, gfn_t end, struct page_table *table, struct dat_walk *w)
+{
+	unsigned int idx = gfn & (_PAGE_ENTRIES - 1);
+	long rc = 0;
+
+	for ( ; gfn < end; idx++, gfn++) {
+		if (pte_hole(READ_ONCE(table->ptes[idx]))) {
+			if (!(w->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(w->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		rc = w->ops->pte_entry(table->ptes + idx, gfn, gfn + 1, w);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+static long dat_crste_walk_range(gfn_t start, gfn_t end, struct crst_table *table,
+				 struct dat_walk *walk)
+{
+	unsigned long idx, cur_shift, cur_size;
+	dat_walk_op the_op;
+	union crste crste;
+	gfn_t cur, next;
+	long rc = 0;
+
+	cur_shift = 8 + table->crstes[0].h.tt * 11;
+	idx = (start >> cur_shift) & (_CRST_ENTRIES - 1);
+	cur_size = 1UL << cur_shift;
+
+	for (cur = ALIGN_DOWN(start, cur_size); cur < end; idx++, cur = next) {
+		next = cur + cur_size;
+		walk->last = table->crstes + idx;
+		crste = READ_ONCE(*walk->last);
+
+		if (crste_hole(crste)) {
+			if (!(walk->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(walk->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		the_op = walk->ops->crste_ops[crste.h.tt];
+		if (the_op) {
+			rc = the_op(walk->last, cur, next, walk);
+			crste = READ_ONCE(*walk->last);
+		}
+		if (rc)
+			break;
+		if (!crste.h.i && !crste.h.fc) {
+			if (!is_pmd(crste))
+				rc = dat_crste_walk_range(max(start, cur), min(end, next),
+							  _dereference_crste(crste), walk);
+			else if (walk->ops->pte_entry)
+				rc = dat_pte_walk_range(max(start, cur), min(end, next),
+							dereference_pmd(crste.pmd), walk);
+		}
+	}
+	return rc;
+}
+
+/**
+ * _dat_walk_gfn_range() - Walk DAT tables.
+ * @start: The first guest page frame to walk.
+ * @end: The guest page frame immediately after the last one to walk.
+ * @asce: The ASCE of the guest mapping.
+ * @ops: The dat_walk_ops that will be used to perform the walk.
+ * @flags: Flags from DAT_WALK_* (currently only DAT_WALK_IGN_HOLES is supported).
+ * @priv: Will be passed as-is to the callbacks.
+ *
+ * Any callback returning non-zero causes the walk to stop immediately.
+ *
+ * Return: %-EINVAL in case of error, %-EFAULT if @start is too high for the
+ *         given ASCE unless the DAT_WALK_IGN_HOLES flag is specified,
+ *         otherwise it returns whatever the callbacks return.
+ */
+long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
+			 const struct dat_walk_ops *ops, int flags, void *priv)
+{
+	struct crst_table *table = dereference_asce(asce);
+	struct dat_walk walk = {
+		.ops = ops,
+		.asce = asce,
+		.priv = priv,
+		.flags = flags,
+		.start = start,
+		.end = end,
+	};
+
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, start))
+		return (flags & DAT_WALK_IGN_HOLES) ? 0 : -EFAULT;
+
+	return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index ee070d18bd36..409064bd20a5 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -45,6 +45,7 @@ enum {
 #define TABLE_TYPE_PAGE_TABLE -1
 
 enum dat_walk_flags {
+	DAT_WALK_USES_SKEYS = 0x40,
 	DAT_WALK_CONTINUE = 0x20,
 	DAT_WALK_IGN_HOLES = 0x10,
 	DAT_WALK_SPLIT = 0x08,
@@ -332,6 +333,34 @@ struct page_table {
 static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
 static_assert(sizeof(struct page_table) == PAGE_SIZE);
 
+struct dat_walk;
+
+typedef long (*dat_walk_op)(union crste *crste, gfn_t gfn, gfn_t next, struct dat_walk *w);
+
+struct dat_walk_ops {
+	union {
+		dat_walk_op crste_ops[4];
+		struct {
+			dat_walk_op pmd_entry;
+			dat_walk_op pud_entry;
+			dat_walk_op p4d_entry;
+			dat_walk_op pgd_entry;
+		};
+	};
+	long (*pte_entry)(union pte *pte, gfn_t gfn, gfn_t next, struct dat_walk *w);
+};
+
+struct dat_walk {
+	const struct dat_walk_ops *ops;
+	union crste *last;
+	union pte *last_pte;
+	union asce asce;
+	gfn_t start;
+	gfn_t end;
+	int flags;
+	void *priv;
+};
+
 /**
  * _pte() - Useful constructor for union pte
  * @pfn: the pfn this pte should point to.
@@ -436,6 +465,11 @@ bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste ne
 			    union asce asce);
 void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
 
+long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
+			 const struct dat_walk_ops *ops, int flags, void *priv);
+
+int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
+		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
 
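[Editorial note: a minimal sketch of a walker built on the dat_walk_ops
interface above, counting present ptes in a gfn range; the function and
counter names are hypothetical. Only the pte_entry callback is filled in,
so the walk descends through all CRST levels and acts only on ptes:

	static long count_present_pte(union pte *pte, gfn_t gfn, gfn_t next,
				      struct dat_walk *w)
	{
		if (pte->s.pr)
			(*(unsigned long *)w->priv)++;
		return 0;	/* non-zero would stop the walk */
	}

	static const struct dat_walk_ops count_ops = {
		.pte_entry = count_present_pte,
	};

	unsigned long n = 0;

	_dat_walk_gfn_range(start, end, asce, &count_ops, DAT_WALK_IGN_HOLES, &n);
]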
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 16/29] KVM: s390: KVM page table management functions: storage keys
Date: Wed, 4 Feb 2026 16:02:45 +0100
Message-ID: <20260204150259.60425-17-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions related to storage key handling.
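For illustration, a possible caller of the new helpers might look like
this (a minimal sketch, not part of the patch: example_copy_skey() is a
hypothetical helper, and the mmu cache "mc" is assumed to have been
allocated and topped up by the caller, as done elsewhere in this series):

	/* Copy the storage key of one guest page to another. */
	static int example_copy_skey(struct kvm_s390_mmu_cache *mc, union asce asce,
				     gfn_t from, gfn_t to)
	{
		union skey skey;
		int rc;

		/* Reads from the PGSTE or from the backing page, as appropriate. */
		rc = dat_get_storage_key(asce, from, &skey);
		if (rc)
			return rc;
		/* nq=false: do not use the non-quiescing variant of SSKE. */
		return dat_set_storage_key(mc, asce, to, skey, false);
	}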
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/dat.c | 223 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |   7 ++
 2 files changed, 230 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index c31838107752..99682792d9ee 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -602,3 +602,226 @@ long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
 
 	return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk);
 }
+
+int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	int rc;
+
+	skey->skey = 0;
+	rc = dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		union crste crste;
+
+		crste = READ_ONCE(*crstep);
+		if (!crste.h.fc || !crste.s.fc1.pr)
+			return 0;
+		skey->skey = page_get_storage_key(large_crste_to_phys(crste, gfn));
+		return 0;
+	}
+	pgste = pgste_get_lock(ptep);
+	if (ptep->h.i) {
+		skey->acc = pgste.acc;
+		skey->fp = pgste.fp;
+	} else {
+		skey->skey = page_get_storage_key(pte_origin(*ptep));
+	}
+	skey->r |= pgste.gr;
+	skey->c |= pgste.gc;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static void dat_update_ptep_sd(union pgste old, union pgste pgste, union pte *ptep)
+{
+	if (pgste.acc != old.acc || pgste.fp != old.fp || pgste.gr != old.gr || pgste.gc != old.gc)
+		__atomic64_or(_PAGE_SD, &ptep->val);
+}
+
+int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+			union skey skey, bool nq)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(mc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		page_set_storage_key(large_crste_to_phys(*crstep, gfn), skey.skey, !nq);
+		return 0;
+	}
+
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gc = skey.c;
+	pgste.gr = skey.r;
+
+	if (!ptep->h.i) {
+		union skey old_skey;
+
+		old_skey.skey = page_get_storage_key(pte_origin(*ptep));
+		pgste.hc |= old_skey.c;
+		pgste.hr |= old_skey.r;
+		old_skey.c = old.gc;
+		old_skey.r = old.gr;
+		skey.r = 0;
+		skey.c = 0;
+		page_set_storage_key(pte_origin(*ptep), skey.skey, !nq);
+	}
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static bool page_cond_set_storage_key(phys_addr_t paddr, union skey skey, union skey *oldkey,
+				      bool nq, bool mr, bool mc)
+{
+	oldkey->skey = page_get_storage_key(paddr);
+	if (oldkey->acc == skey.acc && oldkey->fp == skey.fp &&
+	    (oldkey->r == skey.r || mr) && (oldkey->c == skey.c || mc))
+		return false;
+	page_set_storage_key(paddr, skey.skey, !nq);
+	return true;
+}
+
+int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
+			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union skey prev;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(mmc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep)
+		return page_cond_set_storage_key(large_crste_to_phys(*crstep, gfn), skey, oldkey,
+						 nq, mr, mc);
+
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	rc = 1;
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gc = skey.c;
+	pgste.gr = skey.r;
+
+	if (!ptep->h.i) {
+		rc = page_cond_set_storage_key(pte_origin(*ptep), skey, &prev, nq, mr, mc);
+		pgste.hc |= prev.c;
+		pgste.hr |= prev.r;
+		prev.c |= old.gc;
+		prev.r |= old.gr;
+	} else {
+		prev.acc = old.acc;
+		prev.fp = old.fp;
+		prev.c = old.gc;
+		prev.r = old.gr;
+	}
+	if (oldkey)
+		*oldkey = prev;
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+int dat_reset_reference_bit(union asce asce, gfn_t gfn)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		union crste crste = READ_ONCE(*crstep);
+
+		if (!crste.h.fc || !crste.s.fc1.pr)
+			return 0;
+		return page_reset_referenced(large_crste_to_phys(*crstep, gfn));
+	}
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	if (!ptep->h.i) {
+		rc = page_reset_referenced(pte_origin(*ptep));
+		pgste.hr = rc >> 1;
+	}
+	rc |= (pgste.gr << 1) | pgste.gc;
+	pgste.gr = 0;
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+static long dat_reset_skeys_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.acc = 0;
+	pgste.fp = 0;
+	pgste.gr = 0;
+	pgste.gc = 0;
+	if (ptep->s.pr)
+		page_set_storage_key(pte_origin(*ptep), PAGE_DEFAULT_KEY, 1);
+	pgste_set_unlock(ptep, pgste);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long dat_reset_skeys_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t addr, end, origin = crste_origin_large(*crstep);
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	addr = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	while (ALIGN(addr + 1, _SEGMENT_SIZE) <= end)
+		addr = sske_frame(addr, PAGE_DEFAULT_KEY);
+	for ( ; addr < end; addr += PAGE_SIZE)
+		page_set_storage_key(addr, PAGE_DEFAULT_KEY, 1);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_skeys(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = dat_reset_skeys_pte,
+		.pmd_entry = dat_reset_skeys_crste,
+		.pud_entry = dat_reset_skeys_crste,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 409064bd20a5..6c7815d4f07f 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -472,6 +472,13 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
+int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+			union skey skey, bool nq);
+int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
+			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc);
+int dat_reset_reference_bit(union asce asce, gfn_t gfn);
+long dat_reset_skeys(union asce asce, gfn_t start);
 
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 17/29] KVM: s390: KVM page table management functions: lifecycle management
Date: Wed, 4 Feb 2026 16:02:46 +0100
Message-ID: <20260204150259.60425-18-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle memslot creation and destruction,
additional per-pagetable data stored in the PGSTEs, mapping physical
addresses into the gmap, and marking address ranges as prefix.
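As a rough sketch of how the per-pagetable PGSTE storage added here is
meant to be used (not part of the patch: example_tag_page_table() is a
hypothetical caller; PTVAL_VMADDR and the dat_{get,set}_ptval() accessors
are introduced below):

	/*
	 * Stash a host virtual address in the spare PGSTE bytes of a gmap
	 * page table and read it back.  PTVAL_VMADDR spans three PGSTEs,
	 * 16 bits each, so only the low 48 bits are stored.
	 */
	static unsigned long example_tag_page_table(struct page_table *pt, unsigned long vmaddr)
	{
		dat_set_ptval(pt, PTVAL_VMADDR, vmaddr);
		return dat_get_ptval(pt, PTVAL_VMADDR);
	}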
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/dat.c | 289 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  59 +++++++++
 2 files changed, 348 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index 99682792d9ee..bc27405cdea1 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -102,6 +102,38 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	dat_free_crst(table);
 }
 
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype)
+{
+	struct crst_table *table;
+	union crste crste;
+
+	while (asce->dt > newtype) {
+		table = dereference_asce(*asce);
+		crste = table->crstes[0];
+		if (crste.h.fc)
+			return 0;
+		if (!crste.h.i) {
+			asce->rsto = crste.h.fc0.to;
+			dat_free_crst(table);
+		} else {
+			crste.h.tt--;
+			crst_table_init((void *)table, crste.val);
+		}
+		asce->dt--;
+	}
+	while (asce->dt < newtype) {
+		crste = _crste_fc0(asce->rsto, asce->dt + 1);
+		table = dat_alloc_crst_noinit(mc);
+		if (!table)
+			return -ENOMEM;
+		crst_table_init((void *)table, _CRSTE_HOLE(crste.h.tt).val);
+		table->crstes[0] = crste;
+		asce->rsto = __pa(table) >> PAGE_SHIFT;
+		asce->dt++;
+	}
+	return 0;
+}
+
 /**
  * dat_crstep_xchg() - Exchange a gmap CRSTE with another.
  * @crstep:	Pointer to the CRST entry
@@ -825,3 +857,260 @@ long dat_reset_skeys(union asce asce, gfn_t start)
 
 	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
 }
+
+struct slot_priv {
+	unsigned long token;
+	struct kvm_s390_mmu_cache *mc;
+};
+
+static long _dat_slot_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct slot_priv *p = walk->priv;
+	union crste dummy = { .val = p->token };
+	union pte new_pte, pte = READ_ONCE(*ptep);
+
+	new_pte = _PTE_TOK(dummy.tok.type, dummy.tok.par);
+
+	/* Table entry already in the desired state. */
+	if (pte.val == new_pte.val)
+		return 0;
+
+	dat_ptep_xchg(ptep, new_pte, gfn, walk->asce, false);
+	return 0;
+}
+
+static long _dat_slot_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union crste new_crste, crste = READ_ONCE(*crstep);
+	struct slot_priv *p = walk->priv;
+
+	new_crste.val = p->token;
+	new_crste.h.tt = crste.h.tt;
+
+	/* Table entry already in the desired state. */
+	if (crste.val == new_crste.val)
+		return 0;
+
+	/* This table entry needs to be updated. */
+	if (walk->start <= gfn && walk->end >= next) {
+		dat_crstep_xchg_atomic(crstep, crste, new_crste, gfn, walk->asce);
+		/* A lower level table was present, needs to be freed. */
+		if (!crste.h.fc && !crste.h.i) {
+			if (is_pmd(crste))
+				dat_free_pt(dereference_pmd(crste.pmd));
+			else
+				dat_free_level(dereference_crste(crste), true);
+		}
+		return 0;
+	}
+
+	/* A lower level table is present, things will be handled there. */
+	if (!crste.h.fc && !crste.h.i)
+		return 0;
+	/* Split (install a lower level table), and handle things there. */
+	return dat_split_crste(p->mc, crstep, gfn, walk->asce, false);
+}
+
+static const struct dat_walk_ops dat_slot_ops = {
+	.pte_entry = _dat_slot_pte,
+	.crste_ops = { _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, },
+};
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param)
+{
+	struct slot_priv priv = {
+		.token = _CRSTE_TOK(0, type, param).val,
+		.mc = mc,
+	};
+
+	return _dat_walk_gfn_range(start, end, asce, &dat_slot_ops,
+				   DAT_WALK_IGN_HOLES | DAT_WALK_ANY, &priv);
+}
+
+static void pgste_set_unlock_multiple(union pte *first, int n, union pgste *pgstes)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!pgstes[i].pcl)
+			break;
+		pgste_set_unlock(first + i, pgstes[i]);
+	}
+}
+
+static bool pgste_get_trylock_multiple(union pte *first, int n, union pgste *pgstes)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!pgste_get_trylock(first + i, pgstes + i))
+			break;
+	}
+	if (i == n)
+		return true;
+	pgste_set_unlock_multiple(first, n, pgstes);
+	return false;
+}
+
+unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param)
+{
+	union pgste pgstes[4] = {};
+	unsigned long res = 0;
+	int i, n;
+
+	n = param.len + 1;
+
+	while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes))
+		cpu_relax();
+
+	for (i = 0; i < n; i++)
+		res = res << 16 | pgstes[i].val16;
+
+	pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes);
+	return res;
+}
+
+void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val)
+{
+	union pgste pgstes[4] = {};
+	int i, n;
+
+	n = param.len + 1;
+
+	while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes))
+		cpu_relax();
+
+	for (i = param.len; i >= 0; i--) {
+		pgstes[i].val16 = val;
+		val = val >> 16;
+	}
+
+	pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes);
+}
+
+static long _dat_test_young_pte(union pte *ptep, gfn_t start, gfn_t end, struct dat_walk *walk)
+{
+	return ptep->s.y;
+}
+
+static long _dat_test_young_crste(union crste *crstep, gfn_t start, gfn_t end,
+				  struct dat_walk *walk)
+{
+	return crstep->h.fc && crstep->s.fc1.y;
+}
+
+static const struct dat_walk_ops test_age_ops = {
+	.pte_entry = _dat_test_young_pte,
+	.pmd_entry = _dat_test_young_crste,
+	.pud_entry = _dat_test_young_crste,
+};
+
+/**
+ * dat_test_age_gfn() - Test young.
+ * @asce: The ASCE whose address range is to be tested.
+ * @start: The first guest frame of the range to check.
+ * @end: The guest frame after the last in the range.
+ *
+ * Context: called by KVM common code with the kvm mmu write lock held.
+ *
+ * Return: %true if any page in the given range is young, otherwise %false.
+ */
+bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end)
+{
+	return _dat_walk_gfn_range(start, end, asce, &test_age_ops, 0, NULL) > 0;
+}
+
+int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
+	     bool uses_skeys, struct guest_fault *f)
+{
+	union crste oldval, newval;
+	union pte newpte, oldpte;
+	union pgste pgste;
+	int rc = 0;
+
+	rc = dat_entry_walk(mc, f->gfn, asce, DAT_WALK_ALLOC_CONTINUE, level, &f->crstep, &f->ptep);
+	if (rc == -EINVAL || rc == -ENOMEM)
+		return rc;
+	if (rc)
+		return -EAGAIN;
+
+	if (WARN_ON_ONCE(unlikely(get_level(f->crstep, f->ptep) > level)))
+		return -EINVAL;
+
+	if (f->ptep) {
+		pgste = pgste_get_lock(f->ptep);
+		oldpte = *f->ptep;
+		newpte = _pte(f->pfn, f->writable, f->write_attempt | oldpte.s.d, !f->page);
+		newpte.s.sd = oldpte.s.sd;
+		oldpte.s.sd = 0;
+		if (oldpte.val == _PTE_EMPTY.val || oldpte.h.pfra == f->pfn) {
+			pgste = __dat_ptep_xchg(f->ptep, pgste, newpte, f->gfn, asce, uses_skeys);
+			if (f->callback)
+				f->callback(f);
+		} else {
+			rc = -EAGAIN;
+		}
+		pgste_set_unlock(f->ptep, pgste);
+	} else {
+		oldval = READ_ONCE(*f->crstep);
+		newval = _crste_fc1(f->pfn, oldval.h.tt, f->writable,
+				    f->write_attempt | oldval.s.fc1.d);
+		newval.s.fc1.sd = oldval.s.fc1.sd;
+		if (oldval.val != _CRSTE_EMPTY(oldval.h.tt).val &&
+		    crste_origin_large(oldval) != crste_origin_large(newval))
+			return -EAGAIN;
+		if (!dat_crstep_xchg_atomic(f->crstep, oldval, newval, f->gfn, asce))
+			return -EAGAIN;
+		if (f->callback)
+			f->callback(f);
+	}
+
+	return rc;
+}
+
+static long dat_set_pn_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union crste crste = READ_ONCE(*crstep);
+	int *n = walk->priv;
+
+	if (!crste.h.fc || crste.h.i || crste.h.p)
+		return 0;
+
+	*n = 2;
+	if (crste.s.fc1.prefix_notif)
+		return 0;
+	crste.s.fc1.prefix_notif = 1;
+	dat_crstep_xchg(crstep, crste, gfn, walk->asce);
+	return 0;
+}
+
+static long dat_set_pn_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	int *n = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (!ptep->h.i && !ptep->h.p) {
+		pgste.prefix_notif = 1;
+		*n += 1;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn)
+{
+	static const struct dat_walk_ops ops = {
+		.pte_entry = dat_set_pn_pte,
+		.pmd_entry = dat_set_pn_crste,
+		.pud_entry = dat_set_pn_crste,
+	};
+	int n = 0;
+
+	_dat_walk_gfn_range(gfn, gfn + 2, asce, &ops, DAT_WALK_IGN_HOLES, &n);
+	if (n != 2)
+		return -EAGAIN;
+	return 0;
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 6c7815d4f07f..fe8d790a297d 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -361,6 +361,11 @@ struct dat_walk {
 	void *priv;
 };
 
+struct ptval_param {
+	unsigned char offset : 6;
+	unsigned char len : 2;
+};
+
 /**
  * _pte() - Useful constructor for union pte
  * @pfn: the pfn this pte should point to.
@@ -459,6 +464,32 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+struct guest_fault {
+	gfn_t gfn;			/* Guest frame */
+	kvm_pfn_t pfn;			/* Host PFN */
+	struct page *page;		/* Host page */
+	union pte *ptep;		/* Used to resolve the fault, or NULL */
+	union crste *crstep;		/* Used to resolve the fault, or NULL */
+	bool writable;			/* Mapping is writable */
+	bool write_attempt;		/* Write access attempted */
+	bool attempt_pfault;		/* Attempt a pfault first */
+	bool valid;			/* This entry contains valid data */
+	void (*callback)(struct guest_fault *f);
+	void *priv;
+};
+
+/*
+ *        0       1       2       3       4       5       6       7
+ *    +-------+-------+-------+-------+-------+-------+-------+-------+
+ *  0 |                               |            PGT_ADDR           |
+ *  8 |         VMADDR        |                                       |
+ * 16 |                                                               |
+ * 24 |                                                               |
+ */
+#define MKPTVAL(o, l) ((struct ptval_param) { .offset = (o), .len = ((l) + 1) / 2 - 1})
+#define PTVAL_PGT_ADDR	MKPTVAL(4, 8)
+#define PTVAL_VMADDR	MKPTVAL(8, 6)
+
 union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
					  gfn_t gfn, union asce asce, bool uses_skeys);
 bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
@@ -472,6 +503,7 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype);
 int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
 int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
 			union skey skey, bool nq);
@@ -480,6 +512,16 @@ int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
 int dat_reset_reference_bit(union asce asce, gfn_t gfn);
 long dat_reset_skeys(union asce asce, gfn_t start);
 
+unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param);
+void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val);
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param);
+int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn);
+bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
+int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
+	     bool uses_skeys, struct guest_fault *f);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
@@ -880,4 +922,21 @@ static inline int get_level(union crste *crstep, union pte *ptep)
 	return ptep ? TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
 }
 
+static inline int dat_delete_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_PIC, PGM_ADDRESSING);
+}
+
+static inline int dat_create_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_NONE, 0);
+}
+
+static inline bool crste_is_ucas(union crste crste)
+{
+	return is_pmd(crste) && crste.h.i && crste.h.fc0.tl == 1 && crste.h.fc == 0;
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 18/29] KVM: s390: KVM page table management functions: CMMA
Date: Wed, 4 Feb 2026 16:02:47 +0100
Message-ID: <20260204150259.60425-19-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle CMMA and the ESSA instruction.
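For illustration, an ESSA intercept handler would be expected to drive the
new helper roughly like this (a sketch only, not part of the patch:
cbrl_add() and account_dirty_cmma_entry() are hypothetical stand-ins for
appending the frame to the guest's collaborative-memory block list and for
migration bookkeeping):

	union essa_state state;
	bool dirty = false;
	int rc;

	/* rc: 1 = altered, goes to the CBRL; 0 = altered; -1 = unchanged */
	rc = dat_perform_essa(asce, gfn, ESSA_SET_STABLE, &state, &dirty);
	if (rc == 1)
		cbrl_add(gfn);
	if (dirty)
		account_dirty_cmma_entry();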
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/dat.c | 275 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  27 +++++
 2 files changed, 302 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index bc27405cdea1..129dc55a4a0d 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1114,3 +1114,278 @@ int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn)
 		return -EAGAIN;
 	return 0;
 }
+
+/**
+ * dat_perform_essa() - Perform ESSA actions on the PGSTE.
+ * @asce: The asce to operate on.
+ * @gfn: The guest page frame to operate on.
+ * @orc: The specific action to perform, see the ESSA_SET_* macros.
+ * @state: The storage attributes to be returned to the guest.
+ * @dirty: Returns whether the function dirtied a previously clean entry.
+ *
+ * Context: Called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * %1 if the page state has been altered and the page is to be added to the CBRL
+ * * %0 if the page state has been altered, but the page is not to be added to the CBRL
+ * * %-1 if the page state has not been altered and the page is not to be added to the CBRL
+ */
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	int res = 0;
+
+	if (dat_entry_walk(NULL, gfn, asce, 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) {
+		*state = (union essa_state) { .exception = 1 };
+		return -1;
+	}
+
+	pgste = pgste_get_lock(ptep);
+
+	*state = (union essa_state) {
+		.content = (ptep->h.i << 1) + (ptep->h.i && pgste.zero),
+		.nodat = pgste.nodat,
+		.usage = pgste.usage,
+	};
+
+	switch (orc) {
+	case ESSA_GET_STATE:
+		res = -1;
+		break;
+	case ESSA_SET_STABLE:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 0;
+		break;
+	case ESSA_SET_UNUSED:
+		pgste.usage = PGSTE_GPS_USAGE_UNUSED;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_VOLATILE:
+		pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_POT_VOLATILE:
+		if (!ptep->h.i) {
+			pgste.usage = PGSTE_GPS_USAGE_POT_VOLATILE;
+		} else if (pgste.zero) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		} else if (!pgste.gc) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+			res = 1;
+		}
+		break;
+	case ESSA_SET_STABLE_RESIDENT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		/*
+		 * Since the resident state can go away any time after this
+		 * call, we will not make this page resident. We can revisit
+		 * this decision if a guest will ever start using this.
+		 */
+		break;
+	case ESSA_SET_STABLE_IF_RESIDENT:
+		if (!ptep->h.i)
+			pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		break;
+	case ESSA_SET_STABLE_NODAT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 1;
+		break;
+	default:
+		WARN_ONCE(1, "Invalid ORC!");
+		res = -1;
+		break;
+	}
+	/* If we are discarding a page, set it to logical zero. */
+	pgste.zero = res == 1;
+	if (orc > 0) {
+		*dirty = !pgste.cmma_d;
+		pgste.cmma_d = 1;
+	}
+
+	pgste_set_unlock(ptep, pgste);
+
+	return res;
+}
+
+static long dat_reset_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = 0;
+	pgste.nodat = 0;
+	pgste.cmma_d = 0;
+	pgste_set_unlock(ptep, pgste);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_cmma(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops dat_reset_cmma_ops = {
+		.pte_entry = dat_reset_cmma_pte,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &dat_reset_cmma_ops,
+				   DAT_WALK_IGN_HOLES, NULL);
+}
+
+struct dat_get_cmma_state {
+	gfn_t start;
+	gfn_t end;
+	unsigned int count;
+	u8 *values;
+	atomic64_t *remaining;
+};
+
+static long __dat_peek_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	state->values[gfn - walk->start] = pgste.usage | (pgste.nodat << 6);
+	pgste_set_unlock(ptep, pgste);
+	state->end = next;
+
+	return 0;
+}
+
+static long __dat_peek_cmma_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+
+	if (crstep->h.i)
+		state->end = min(walk->end, next);
+	return 0;
+}
+
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = __dat_peek_cmma_pte,
+		.pmd_entry = __dat_peek_cmma_crste,
+		.pud_entry = __dat_peek_cmma_crste,
+		.p4d_entry = __dat_peek_cmma_crste,
+		.pgd_entry = __dat_peek_cmma_crste,
+	};
+	struct dat_get_cmma_state state = { .values = values, };
+	int rc;
+
+	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
+	*count = state.end - start;
+	/* Return success if at least one value was saved, otherwise an error. */
+	return (rc == -EFAULT && *count > 0) ? 0 : rc;
+}
+
+static long __dat_get_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	if (state->start != -1) {
+		if ((gfn - state->end) > KVM_S390_MAX_BIT_DISTANCE)
+			return 1;
+		if (gfn - state->start >= state->count)
+			return 1;
+	}
+
+	if (!READ_ONCE(*pgste_of(ptep)).cmma_d)
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (pgste.cmma_d) {
+		if (state->start == -1)
+			state->start = gfn;
+		pgste.cmma_d = 0;
+		atomic64_dec(state->remaining);
+		state->values[gfn - state->start] = pgste.usage | pgste.nodat << 6;
+		state->end = next;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_get_cmma_pte, };
+	struct dat_get_cmma_state state = {
+		.remaining = rem,
+		.values = values,
+		.count = *count,
+		.start = -1,
+	};
+
+	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
+
+	if (state.start == -1) {
+		*count = 0;
+	} else {
+		*count = state.end - state.start;
+		*start = state.start;
+	}
+
+	return 0;
+}
+
+struct dat_set_cmma_state {
+	unsigned long mask;
+	const u8 *bits;
+};
+
+static long __dat_set_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_set_cmma_state *state = walk->priv;
+	union pgste pgste, tmp;
+
+	tmp.val = (state->bits[gfn - walk->start] << 24) & state->mask;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = tmp.usage;
+	pgste.nodat = tmp.nodat;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+/**
+ * dat_set_cmma_bits() - Set CMMA bits for a range of guest pages.
+ * @mc: Cache used for allocations.
+ * @asce: The ASCE of the guest.
+ * @gfn: The guest frame of the first page whose CMMA bits are to be set.
+ * @count: How many pages need to be processed.
+ * @mask: Which PGSTE bits should be set.
+ * @bits: Points to an array with the CMMA attributes.
+ *
+ * This function sets the CMMA attributes for the given pages. If the input
+ * buffer has zero length, no action is taken, otherwise the attributes are
+ * set and the mm->context.uses_cmm flag is set.
+ *
+ * Each byte in @bits contains new values for bits 32-39 of the PGSTE.
+ * Currently, only the fields NT and US are applied.
+ *
+ * Return: %0 in case of success, a negative error value otherwise.
+ */
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_set_cmma_pte, };
+	struct dat_set_cmma_state state = { .mask = mask, .bits = bits, };
+	union crste *crstep;
+	union pte *ptep;
+	gfn_t cur;
+	int rc;
+
+	for (cur = ALIGN_DOWN(gfn, _PAGE_ENTRIES); cur < gfn + count; cur += _PAGE_ENTRIES) {
+		rc = dat_entry_walk(mc, cur, asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+				    &crstep, &ptep);
+		if (rc)
+			return rc;
+	}
+	return _dat_walk_gfn_range(gfn, gfn + count, asce, &ops, DAT_WALK_IGN_HOLES, &state);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index fe8d790a297d..358b756ca8c9 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -17,6 +17,15 @@
 #include
 #include
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
+ */
+#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
+/* For consistency */
+#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
+
 #define _ASCE(x) ((union asce) { .val = (x), })
 #define NULL_ASCE _ASCE(0)
 
@@ -433,6 +442,17 @@ static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool dirty)
 	return res;
 }
 
+union essa_state {
+	unsigned char val;
+	struct {
+		unsigned char		: 2;
+		unsigned char nodat	: 1;
+		unsigned char exception	: 1;
+		unsigned char usage	: 2;
+		unsigned char content	: 2;
+	};
+};
+
 /**
  * struct vsie_rmap - reverse mapping for shadow page table entries
  * @next: pointer to next rmap in the list
@@ -522,6 +542,13 @@ bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
 int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
 	     bool uses_skeys, struct guest_fault *f);
 
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty);
+long dat_reset_cmma(union asce asce, gfn_t start_gfn);
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values);
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem);
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 19/29] KVM: s390: New gmap code
Date: Wed, 4 Feb 2026 16:02:48 +0100
Message-ID: <20260204150259.60425-20-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

New gmap (guest map) code. This new gmap code will only be used by KVM.
This will replace the existing gmap.

Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/Makefile |    2 +-
 arch/s390/kvm/gmap.c   | 1165 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/gmap.h   |  244 +++++++++
 3 files changed, 1410 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/kvm/gmap.c
 create mode 100644 arch/s390/kvm/gmap.h

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 84315d2f75fb..21088265402c 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o
+kvm-y += dat.o gmap.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
new file mode 100644
index 000000000000..5736145ef53a
--- /dev/null
+++ b/arch/s390/kvm/gmap.c
@@ -0,0 +1,1165 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Guest memory management for KVM/s390
+ *
+ * Copyright IBM Corp. 2008, 2020, 2024
+ *
+ * Author(s): Claudio Imbrenda
+ *	      Martin Schwidefsky
+ *	      David Hildenbrand
+ *	      Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "dat.h"
+#include "gmap.h"
+#include "kvm-s390.h"
+
+static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.sie_block->prog0c & PROG_IN_SIE;
+}
+
+static int gmap_limit_to_type(gfn_t limit)
+{
+	if (!limit)
+		return TABLE_TYPE_REGION1;
+	if (limit <= _REGION3_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_SEGMENT;
+	if (limit <= _REGION2_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION3;
+	if (limit <= _REGION1_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION2;
+	return TABLE_TYPE_REGION1;
+}
+
+/**
+ * gmap_new() - Allocate and initialize a guest address space.
+ * @kvm: The kvm owning the guest.
+ * @limit: Maximum address of the gmap address space.
+ *
+ * Return: A guest address space structure.
+ */
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit)
+{
+	struct crst_table *table;
+	struct gmap *gmap;
+	int type;
+
+	type = gmap_limit_to_type(limit);
+
+	gmap = kzalloc(sizeof(*gmap), GFP_KERNEL_ACCOUNT);
+	if (!gmap)
+		return NULL;
+	INIT_LIST_HEAD(&gmap->children);
+	INIT_LIST_HEAD(&gmap->list);
+	INIT_LIST_HEAD(&gmap->scb_users);
+	INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_KVM_S390_MMU_CACHE);
+	spin_lock_init(&gmap->children_lock);
+	spin_lock_init(&gmap->host_to_rmap_lock);
+	refcount_set(&gmap->refcount, 1);
+
+	table = dat_alloc_crst_sleepable(_CRSTE_EMPTY(type).val);
+	if (!table) {
+		kfree(gmap);
+		return NULL;
+	}
+
+	gmap->asce.val = __pa(table);
+	gmap->asce.dt = type;
+	gmap->asce.tl = _ASCE_TABLE_LENGTH;
+	gmap->asce.x = 1;
+	gmap->asce.p = 1;
+	gmap->asce.s = 1;
+	gmap->kvm = kvm;
+	set_bit(GMAP_FLAG_OWNS_PAGETABLES, &gmap->flags);
+
+	return gmap;
+}
+
+static void gmap_add_child(struct gmap *parent, struct gmap *child)
+{
+	KVM_BUG_ON(is_ucontrol(parent) && parent->parent, parent->kvm);
+	KVM_BUG_ON(is_ucontrol(parent) && !owns_page_tables(parent), parent->kvm);
+	KVM_BUG_ON(!refcount_read(&child->refcount), parent->kvm);
+	lockdep_assert_held(&parent->children_lock);
+
+	child->parent = parent;
+
+	if (is_ucontrol(parent))
+		set_bit(GMAP_FLAG_IS_UCONTROL, &child->flags);
+	else
+		clear_bit(GMAP_FLAG_IS_UCONTROL, &child->flags);
+
+	if (test_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &parent->flags))
+		set_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &child->flags);
+	else
+		clear_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &child->flags);
+
+	if (kvm_is_ucontrol(parent->kvm))
+		clear_bit(GMAP_FLAG_OWNS_PAGETABLES, &child->flags);
+	list_add(&child->list, &parent->children);
+}
+
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit)
+{
+	struct gmap *res;
+
+	lockdep_assert_not_held(&parent->children_lock);
+	res = gmap_new(parent->kvm, limit);
+	if (res) {
+		scoped_guard(spinlock, &parent->children_lock)
+			gmap_add_child(parent, res);
+	}
+	return res;
+}
+
+int gmap_set_limit(struct gmap *gmap, gfn_t limit)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc, type;
+
+	type = gmap_limit_to_type(limit);
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			break;
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			rc = dat_set_asce_limit(mc, &gmap->asce, type);
+	} while (rc == -ENOMEM);
+
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct radix_tree_iter iter;
+
+static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct radix_tree_iter iter;
+	unsigned long indices[16];
+	unsigned long index;
+	void __rcu **slot;
+	int i, nr;
+
+	/* A radix tree is freed by deleting all of its entries */
+	index = 0;
+	do {
+		nr = 0;
+		radix_tree_for_each_slot(slot, root, &iter, index) {
+			indices[nr] = iter.index;
+			if (++nr == 16)
+				break;
+		}
+		for (i = 0; i < nr; i++) {
+			index = indices[i];
+			head = radix_tree_delete(root, index);
+			gmap_for_each_rmap_safe(rmap, rnext, head)
+				kfree(rmap);
+		}
+	} while (nr > 0);
+}
+
+void gmap_remove_child(struct gmap *child)
+{
+	if (KVM_BUG_ON(!child->parent, child->kvm))
+		return;
+	lockdep_assert_held(&child->parent->children_lock);
+
+	list_del(&child->list);
+	child->parent = NULL;
+}
+
+/**
+ * gmap_dispose() - Remove and free a guest address space and its children.
+ * @gmap: Pointer to the guest address space structure.
+ */
+void gmap_dispose(struct gmap *gmap)
+{
+	/* The gmap must have been removed from the parent beforehand */
+	KVM_BUG_ON(gmap->parent, gmap->kvm);
+	/* All children of this gmap must have been removed beforehand */
+	KVM_BUG_ON(!list_empty(&gmap->children), gmap->kvm);
+	/* No VSIE shadow block is allowed to use this gmap */
+	KVM_BUG_ON(!list_empty(&gmap->scb_users), gmap->kvm);
+	/* The ASCE must be valid */
+	KVM_BUG_ON(!gmap->asce.val, gmap->kvm);
+	/* The refcount must be 0 */
+	KVM_BUG_ON(refcount_read(&gmap->refcount), gmap->kvm);
+
+	/* Flush tlb of all gmaps */
+	asce_flush_tlb(gmap->asce);
+
+	/* Free all DAT tables. */
+	dat_free_level(dereference_asce(gmap->asce), owns_page_tables(gmap));
+
+	/* Free additional data for a shadow gmap */
+	if (is_shadow(gmap))
+		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
+
+	kfree(gmap);
+}
+
+/**
+ * s390_replace_asce() - Try to replace the current ASCE of a gmap with a copy.
+ * @gmap: The gmap whose ASCE needs to be replaced.
+ *
+ * If the ASCE is a SEGMENT type then this function will return -EINVAL;
+ * otherwise the pointers in the host_to_guest radix tree would keep pointing
+ * to the wrong pages, causing use-after-free and memory corruption.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ *
+ * Return: 0 in case of success, -EINVAL if the ASCE is a segment type ASCE,
+ * -ENOMEM if running out of memory.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	struct crst_table *table;
+	union asce asce;
+
+	/* Replacing segment type ASCEs would cause serious issues */
+	if (gmap->asce.dt == ASCE_TYPE_SEGMENT)
+		return -EINVAL;
+
+	table = dat_alloc_crst_sleepable(0);
+	if (!table)
+		return -ENOMEM;
+	memcpy(table, dereference_asce(gmap->asce), sizeof(*table));
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = gmap->asce;
+	asce.rsto = virt_to_pfn(table);
+	WRITE_ONCE(gmap->asce, asce);
+
+	return 0;
+}
+
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint)
+{
+	struct kvm *kvm = gmap->kvm;
+	struct kvm_vcpu *vcpu;
+	gfn_t prefix_gfn;
+	unsigned long i;
+
+	if (is_shadow(gmap))
+		return false;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/* Match against both prefix pages */
+		prefix_gfn = gpa_to_gfn(kvm_s390_get_prefix(vcpu));
+		if (prefix_gfn < end && gfn <= prefix_gfn + 1) {
+			if (hint && kvm_s390_is_in_sie(vcpu))
+				return false;
+			VCPU_EVENT(vcpu, 2, "gmap notifier for %llx-%llx",
+				   gfn_to_gpa(gfn), gfn_to_gpa(end));
+			kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
+		}
+	}
+	return true;
+}
+
+struct clear_young_pte_priv {
+	struct gmap *gmap;
+	bool young;
+};
+
+static long gmap_clear_young_pte(union pte *ptep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *p = walk->priv;
+	union pgste pgste;
+	union pte pte, new;
+
+	pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (!pte.s.y && pte.h.i))
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (!pgste.prefix_notif || gmap_mkold_prefix(p->gmap, gfn, end)) {
+		new = pte;
+		new.h.i = 1;
+		new.s.y = 0;
+		if ((new.s.d || !new.h.p) && !new.s.s)
+			folio_set_dirty(pfn_folio(pte.h.pfra));
+		new.s.d = 0;
+		new.h.p = 1;
+
+		pgste.prefix_notif = 0;
+		pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, walk->asce, uses_skeys(p->gmap));
+	}
+	p->young = 1;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *priv = walk->priv;
+	union crste crste, new;
+
+	crste = READ_ONCE(*crstep);
+
+	if (!crste.h.fc)
+		return 0;
+	if (!crste.s.fc1.y && crste.h.i)
+		return 0;
+	if (!crste_prefix(crste) || gmap_mkold_prefix(priv->gmap, gfn, end)) {
+		new = crste;
+		new.h.i = 1;
+		new.s.fc1.y = 0;
+		new.s.fc1.prefix_notif = 0;
+		if (new.s.fc1.d || !new.h.p)
+			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
+		new.s.fc1.d = 0;
+		new.h.p = 1;
+		dat_crstep_xchg(crstep, new, gfn, walk->asce);
+	}
+	priv->young = 1;
+	return 0;
+}
+
+/**
+ * gmap_age_gfn() - Clear young.
+ * @gmap: The guest gmap.
+ * @start: The first gfn to test.
+ * @end: The gfn after the last one to test.
+ *
+ * Context: Called with the kvm mmu write lock held.
+ * Return: 1 if any page in the given range was young, otherwise 0.
+ */
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = gmap_clear_young_pte,
+		.pmd_entry = gmap_clear_young_crste,
+		.pud_entry = gmap_clear_young_crste,
+	};
+	struct clear_young_pte_priv priv = {
+		.gmap = gmap,
+		.young = false,
+	};
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+
+	return priv.young;
+}
+
+struct gmap_unmap_priv {
+	struct gmap *gmap;
+	struct kvm_memory_slot *slot;
+};
+
+static long _gmap_unmap_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *w)
+{
+	struct gmap_unmap_priv *priv = w->priv;
+	struct folio *folio = NULL;
+	unsigned long vmaddr;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (ptep->s.pr && pgste.usage == PGSTE_GPS_USAGE_UNUSED) {
+		vmaddr = __gfn_to_hva_memslot(priv->slot, gfn);
+		gmap_helper_try_set_pte_unused(priv->gmap->kvm->mm, vmaddr);
+	}
+	if (ptep->s.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
+		folio = pfn_folio(ptep->h.pfra);
+	pgste = gmap_ptep_xchg(priv->gmap, ptep, _PTE_EMPTY, pgste, gfn);
+	pgste_set_unlock(ptep, pgste);
+	if (folio)
+		uv_convert_from_secure_folio(folio);
+
+	return 0;
+}
+
+static long _gmap_unmap_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap_unmap_priv *priv = walk->priv;
+	struct folio *folio = NULL;
+
+	if (crstep->h.fc) {
+		if (crstep->s.fc1.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
+			folio = phys_to_folio(crste_origin_large(*crstep));
+		gmap_crstep_xchg(priv->gmap, crstep, _CRSTE_EMPTY(crstep->h.tt), gfn);
+		if (folio)
+			uv_convert_from_secure_folio(folio);
+	}
+
+	return 0;
+}
+
+/**
+ * gmap_unmap_gfn_range() - Unmap a range of guest addresses.
+ * @gmap: The gmap to act on.
+ * @slot: The memslot in which the range is located.
+ * @start: The first gfn to unmap.
+ * @end: The gfn after the last one to unmap.
+ *
+ * Context: Called with the kvm mmu write lock held.
+ * Return: false
+ */
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _gmap_unmap_pte,
+		.pmd_entry = _gmap_unmap_crste,
+		.pud_entry = _gmap_unmap_crste,
+	};
+	struct gmap_unmap_priv priv = {
+		.gmap = gmap,
+		.slot = slot,
+	};
+
+	lockdep_assert_held_write(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+	return false;
+}
+
+static union pgste __pte_test_and_clear_softdirty(union pte *ptep, union pgste pgste, gfn_t gfn,
+						  struct gmap *gmap)
+{
+	union pte pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (pte.h.p && !pte.s.sd))
+		return pgste;
+
+	/*
+	 * If this page contains one or more prefixes of vCPUs that are currently
+	 * running, do not reset the protection, leave it marked as dirty.
+	 */
+	if (!pgste.prefix_notif || gmap_mkold_prefix(gmap, gfn, gfn + 1)) {
+		pte.h.p = 1;
+		pte.s.sd = 0;
+		pgste = gmap_ptep_xchg(gmap, ptep, pte, pgste, gfn);
+	}
+
+	mark_page_dirty(gmap->kvm, gfn);
+
+	return pgste;
+}
+
+static long _pte_test_and_clear_softdirty(union pte *ptep, gfn_t gfn, gfn_t end,
+					  struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __pte_test_and_clear_softdirty(ptep, pgste, gfn, gmap);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long _crste_test_and_clear_softdirty(union crste *table, gfn_t gfn, gfn_t end,
+					    struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, new;
+
+	if (fatal_signal_pending(current))
+		return 1;
+	crste = READ_ONCE(*table);
+	if (!crste.h.fc)
+		return 0;
+	if (crste.h.p && !crste.s.fc1.sd)
+		return 0;
+
+	/*
+	 * If this large page contains one or more prefixes of vCPUs that are
+	 * currently running, do not reset the protection, leave it marked as
+	 * dirty.
+	 */
+	if (!crste.s.fc1.prefix_notif || gmap_mkold_prefix(gmap, gfn, end)) {
+		new = crste;
+		new.h.p = 1;
+		new.s.fc1.sd = 0;
+		gmap_crstep_xchg(gmap, table, new, gfn);
+	}
+
+	for ( ; gfn < end; gfn++)
+		mark_page_dirty(gmap->kvm, gfn);
+
+	return 0;
+}
+
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops walk_ops = {
+		.pte_entry = _pte_test_and_clear_softdirty,
+		.pmd_entry = _crste_test_and_clear_softdirty,
+		.pud_entry = _crste_test_and_clear_softdirty,
+	};
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &walk_ops, 0, gmap);
+}
+
+static int gmap_handle_minor_crste_fault(union asce asce, struct guest_fault *f)
+{
+	union crste newcrste, oldcrste = READ_ONCE(*f->crstep);
+
+	/* Somehow the crste is not large anymore, let the slow path deal with it. */
+	if (!oldcrste.h.fc)
+		return 1;
+
+	f->pfn = PHYS_PFN(large_crste_to_phys(oldcrste, f->gfn));
+	f->writable = oldcrste.s.fc1.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do. */
+	if (!oldcrste.h.i && !(f->write_attempt && oldcrste.h.p))
+		return 0;
+
+	if (!f->write_attempt || oldcrste.s.fc1.w) {
+		f->write_attempt |= oldcrste.s.fc1.w && oldcrste.s.fc1.d;
+		newcrste = oldcrste;
+		newcrste.h.i = 0;
+		newcrste.s.fc1.y = 1;
+		if (f->write_attempt) {
+			newcrste.h.p = 0;
+			newcrste.s.fc1.d = 1;
+			newcrste.s.fc1.sd = 1;
+		}
+		if (!oldcrste.s.fc1.d && newcrste.s.fc1.d)
+			SetPageDirty(phys_to_page(crste_origin_large(newcrste)));
+		/* In case of races, let the slow path deal with it. */
+		return !dat_crstep_xchg_atomic(f->crstep, oldcrste, newcrste, f->gfn, asce);
+	}
+	/* Trying to write on a read-only page, let the slow path deal with it. */
+	return 1;
+}
+
+static int _gmap_handle_minor_pte_fault(struct gmap *gmap, union pgste *pgste,
+					struct guest_fault *f)
+{
+	union pte newpte, oldpte = READ_ONCE(*f->ptep);
+
+	f->pfn = oldpte.h.pfra;
+	f->writable = oldpte.s.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do. */
+	if (!oldpte.h.i && !(f->write_attempt && oldpte.h.p))
+		return 0;
+	/* Trying to write on a read-only page, let the slow path deal with it. */
+	if (!oldpte.s.pr || (f->write_attempt && !oldpte.s.w))
+		return 1;
+
+	newpte = oldpte;
+	newpte.h.i = 0;
+	newpte.s.y = 1;
+	if (f->write_attempt) {
+		newpte.h.p = 0;
+		newpte.s.d = 1;
+		newpte.s.sd = 1;
+	}
+	if (!oldpte.s.d && newpte.s.d)
+		SetPageDirty(pfn_to_page(newpte.h.pfra));
+	*pgste = gmap_ptep_xchg(gmap, f->ptep, newpte, *pgste, f->gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_try_fixup_minor() - Try to fixup a minor gmap fault.
+ * @gmap: The gmap whose fault needs to be resolved.
+ * @fault: Describes the fault that is being resolved.
+ *
+ * A minor fault is a fault that can be resolved quickly within gmap.
+ * The page is already mapped, the fault is only due to dirty/young tracking.
+ *
+ * Return: 0 in case of success, < 0 in case of error, > 0 if the fault could
+ * not be resolved and needs to go through the slow path.
+ */
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault)
+{
+	union pgste pgste;
+	int rc;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(NULL, fault->gfn, gmap->asce, DAT_WALK_LEAF, TABLE_TYPE_PAGE_TABLE,
+			    &fault->crstep, &fault->ptep);
+	/* If a PTE or a leaf CRSTE could not be reached, slow path. */
+	if (rc)
+		return 1;
+
+	if (fault->ptep) {
+		pgste = pgste_get_lock(fault->ptep);
+		rc = _gmap_handle_minor_pte_fault(gmap, &pgste, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+		pgste_set_unlock(fault->ptep, pgste);
+	} else {
+		rc = gmap_handle_minor_crste_fault(gmap->asce, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+	}
+	return rc;
+}
+
+static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
+{
+	unsigned int order;
+	int rc, level;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	level = TABLE_TYPE_PAGE_TABLE;
+	if (f->page) {
+		order = folio_order(page_folio(f->page));
+		if (order >= get_order(_REGION3_SIZE) && gmap_2g_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_REGION3;
+		else if (order >= get_order(_SEGMENT_SIZE) && gmap_1m_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_SEGMENT;
+	}
+	rc = dat_link(mc, gmap->asce, level, uses_skeys(gmap), f);
+	KVM_BUG_ON(rc == -EINVAL, gmap->kvm);
+	return rc;
+}
+
+static int gmap_ucas_map_one(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+			     gfn_t p_gfn, gfn_t c_gfn, bool force_alloc)
+{
+	struct page_table *pt;
+	union crste newcrste;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	if (force_alloc)
+		rc = dat_entry_walk(mc, p_gfn, gmap->parent->asce, DAT_WALK_ALLOC,
+				    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	else
+		rc = dat_entry_walk(mc, p_gfn, gmap->parent->asce, DAT_WALK_ALLOC_CONTINUE,
+				    TABLE_TYPE_SEGMENT, &crstep, &ptep);
+	if (rc)
+		return rc;
+	if (!ptep) {
+		newcrste = _crste_fc0(p_gfn, TABLE_TYPE_SEGMENT);
+		newcrste.h.i = 1;
+		newcrste.h.fc0.tl = 1;
+	} else {
+		pt = pte_table_start(ptep);
+		dat_set_ptval(pt, PTVAL_VMADDR, p_gfn >> (_SEGMENT_SHIFT - PAGE_SHIFT));
+		newcrste = _crste_fc0(virt_to_pfn(pt), TABLE_TYPE_SEGMENT);
+	}
+	rc = dat_entry_walk(mc, c_gfn, gmap->asce, DAT_WALK_ALLOC, TABLE_TYPE_SEGMENT,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	dat_crstep_xchg(crstep, newcrste, c_gfn, gmap->asce);
+	return 0;
+}
+
+static int gmap_ucas_translate_simple(struct gmap *gmap, gpa_t *gaddr, union crste **crstepp)
+{
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, gpa_to_gfn(*gaddr), gmap->asce, DAT_WALK_CONTINUE,
+			    TABLE_TYPE_SEGMENT, crstepp, &ptep);
+	if (rc || (!ptep && !crste_is_ucas(**crstepp)))
+		return -EREMOTE;
+	if (!ptep)
+		return 1;
+	*gaddr &= ~_SEGMENT_MASK;
+	*gaddr |= dat_get_ptval(pte_table_start(ptep), PTVAL_VMADDR) << _SEGMENT_SHIFT;
+	return 0;
+}
+
+/**
+ * gmap_ucas_translate() - Translate a vcpu address into a host gmap address.
+ * @mc: The memory cache to be used for allocations.
+ * @gmap: The per-cpu gmap.
+ * @gaddr: Pointer to the address to be translated, will get overwritten with
+ *         the translated address in case of success.
+ *
+ * Translates the per-vCPU guest address into a fake guest address, which can
+ * then be used with the fake memslots that are identity mapping userspace.
+ * This allows ucontrol VMs to use the normal fault resolution path, like
+ * normal VMs.
+ *
+ * Return: %0 in case of success, otherwise %-EREMOTE.
+ */
+int gmap_ucas_translate(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, gpa_t *gaddr)
+{
+	gpa_t translated_address;
+	union crste *crstep;
+	gfn_t gfn;
+	int rc;
+
+	gfn = gpa_to_gfn(*gaddr);
+
+	scoped_guard(read_lock, &gmap->kvm->mmu_lock) {
+		rc = gmap_ucas_translate_simple(gmap, gaddr, &crstep);
+		if (rc <= 0)
+			return rc;
+	}
+	do {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock) {
+			rc = gmap_ucas_translate_simple(gmap, gaddr, &crstep);
+			if (rc <= 0)
+				return rc;
+			translated_address = (*gaddr & ~_SEGMENT_MASK) |
+					     (crstep->val & _SEGMENT_MASK);
+			rc = gmap_ucas_map_one(mc, gmap, gpa_to_gfn(translated_address), gfn, true);
+		}
+		if (!rc) {
+			*gaddr = translated_address;
+			return 0;
+		}
+		if (rc != -ENOMEM)
+			return -EREMOTE;
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			return rc;
+	} while (1);
+	return 0;
+}
+
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc = 0;
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	while (count) {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			rc = gmap_ucas_map_one(mc, gmap, p_gfn, c_gfn, false);
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				break;
+			continue;
+		}
+		if (rc)
+			break;
+
+		count--;
+		c_gfn += _PAGE_ENTRIES;
+		p_gfn += _PAGE_ENTRIES;
+	}
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_ucas_unmap_one(struct gmap *gmap, gfn_t c_gfn)
+{
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, c_gfn, gmap->asce, 0, TABLE_TYPE_SEGMENT, &crstep, &ptep);
+	if (!rc)
+		dat_crstep_xchg(crstep, _PMD_EMPTY, c_gfn, gmap->asce);
+}
+
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count)
+{
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	for ( ; count; count--, c_gfn += _PAGE_ENTRIES)
+		gmap_ucas_unmap_one(gmap, c_gfn);
+}
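+
+/*
+ * A minimal usage sketch for the ucas interface above (hypothetical
+ * values, error handling elided): @count is in segments, i.e. each unit
+ * covers _PAGE_ENTRIES pages (1 MB). Mapping one segment of the parent
+ * starting at parent gfn 0x1000 to child gfn 0, and unmapping it again:
+ *
+ *	rc = gmap_ucas_map(gmap, 0x1000, 0x0, 1);
+ *	...
+ *	gmap_ucas_unmap(gmap, 0x0, 1);
+ */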
+
+static long _gmap_split_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, newcrste;
+
+	crste = READ_ONCE(*crstep);
+	newcrste = _CRSTE_EMPTY(crste.h.tt);
+
+	while (crste_leaf(crste)) {
+		if (crste_prefix(crste))
+			gmap_unmap_prefix(gmap, gfn, next);
+		if (crste.s.fc1.vsie_notif)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		if (dat_crstep_xchg_atomic(crstep, crste, newcrste, gfn, walk->asce))
+			break;
+		crste = READ_ONCE(*crstep);
+	}
+
+	if (need_resched())
+		return next;
+
+	return 0;
+}
+
+void gmap_split_huge_pages(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = {
+		.pmd_entry = _gmap_split_crste,
+		.pud_entry = _gmap_split_crste,
+	};
+	gfn_t start = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, asce_end(gmap->asce), gmap->asce,
+						    &ops, DAT_WALK_IGN_HOLES, gmap);
+		cond_resched();
+	} while (start);
+}
+
+static int _gmap_enable_skeys(struct gmap *gmap)
+{
+	gfn_t start = 0;
+	int rc;
+
+	if (uses_skeys(gmap))
+		return 0;
+
+	set_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+	rc = gmap_helper_disable_cow_sharing();
+	if (rc) {
+		clear_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+		return rc;
+	}
+
+	do {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			start = dat_reset_skeys(gmap->asce, start);
+		cond_resched();
+	} while (start);
+	return 0;
+}
+
+int gmap_enable_skeys(struct gmap *gmap)
+{
+	int rc;
+
+	mmap_write_lock(gmap->kvm->mm);
+	rc = _gmap_enable_skeys(gmap);
+	mmap_write_unlock(gmap->kvm->mm);
+	return rc;
+}
+
+static long _destroy_pages_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	if (!ptep->s.pr)
+		return 0;
+	__kvm_s390_pv_destroy_page(phys_to_page(pte_origin(*ptep)));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long _destroy_pages_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t origin, cur, end;
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	origin = crste_origin_large(*crstep);
+	cur = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	for ( ; cur < end; cur += PAGE_SIZE)
+		__kvm_s390_pv_destroy_page(phys_to_page(cur));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _destroy_pages_pte,
+		.pmd_entry = _destroy_pages_crste,
+		.pud_entry = _destroy_pages_crste,
+	};
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, end, gmap->asce, &ops,
+						    DAT_WALK_IGN_HOLES, NULL);
+		if (interruptible && fatal_signal_pending(current))
+			return -EINTR;
+		cond_resched();
+	} while (start && start < end);
+	return 0;
+}
+
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level)
+{
+	struct vsie_rmap *rmap __free(kvfree) = NULL;
+	struct vsie_rmap *temp;
+	void __rcu **slot;
+	int rc = 0;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	lockdep_assert_held(&sg->host_to_rmap_lock);
+
+	rmap = kzalloc(sizeof(*rmap), GFP_ATOMIC);
+	if (!rmap)
+		return -ENOMEM;
+
+	rmap->r_gfn = r_gfn;
+	rmap->level = level;
+	slot = radix_tree_lookup_slot(&sg->host_to_rmap, p_gfn);
+	if (slot) {
+		rmap->next = radix_tree_deref_slot_protected(slot, &sg->host_to_rmap_lock);
+		for (temp = rmap->next; temp; temp = temp->next) {
+			if (temp->val == rmap->val)
+				return 0;
+		}
+		radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap);
+	} else {
+		rmap->next = NULL;
+		rc = radix_tree_insert(&sg->host_to_rmap, p_gfn, rmap);
+		if (rc)
+			return rc;
+	}
+	rmap = NULL;
+
+	return 0;
+}
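+
+/*
+ * A rough sketch of the resulting data structure (illustration only,
+ * hypothetical gfns): host_to_rmap maps a parent gfn to a singly linked
+ * chain of vsie_rmap entries, one per shadow DAT entry that has to be
+ * invalidated when the parent page changes:
+ *
+ *	p_gfn -> rmap(r_gfn=A, level) -> rmap(r_gfn=B, level) -> NULL
+ */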
+
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	union pte pte;
+	int flags, rc;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	flags = DAT_WALK_SPLIT_ALLOC | (uses_skeys(sg->parent) ? DAT_WALK_USES_SKEYS : 0);
+	rc = dat_entry_walk(mc, p_gfn, sg->parent->asce, flags,
+			    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+	if (level <= TABLE_TYPE_REGION1) {
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			rc = gmap_insert_rmap(sg, p_gfn, r_gfn, level);
+	}
+	if (rc)
+		return rc;
+
+	if (!pgste_get_trylock(ptep, &pgste))
+		return -EAGAIN;
+	pte = ptep->s.pr ? *ptep : _pte(pfn, wr, false, false);
+	pte.h.p = 1;
+	pgste = _gmap_ptep_xchg(sg->parent, ptep, pte, pgste, p_gfn, false);
+	pgste.vsie_notif = 1;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+void gmap_set_cmma_all_dirty(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __set_cmma_dirty_pte, };
+	gfn_t gfn = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			gfn = _dat_walk_gfn_range(gfn, asce_end(gmap->asce), gmap->asce, &ops,
+						  DAT_WALK_IGN_HOLES, NULL);
+		cond_resched();
+	} while (gfn);
+}
+
+static void gmap_unshadow_level(struct gmap *sg, gfn_t r_gfn, int level)
+{
+	unsigned long align = PAGE_SIZE;
+	gpa_t gaddr = gfn_to_gpa(r_gfn);
+	union crste *crstep;
+	union crste crste;
+	union pte *ptep;
+
+	if (level > TABLE_TYPE_PAGE_TABLE)
+		align = 1UL << (11 * level + _SEGMENT_SHIFT);
+	kvm_s390_vsie_gmap_notifier(sg, ALIGN_DOWN(gaddr, align), ALIGN(gaddr + 1, align));
+	if (dat_entry_walk(NULL, r_gfn, sg->asce, 0, level, &crstep, &ptep))
+		return;
+	if (ptep) {
+		if (READ_ONCE(*ptep).val != _PTE_EMPTY.val)
+			dat_ptep_xchg(ptep, _PTE_EMPTY, r_gfn, sg->asce, uses_skeys(sg));
+		return;
+	}
+	crste = READ_ONCE(*crstep);
+	dat_crstep_clear(crstep, r_gfn, sg->asce);
+	if (crste_leaf(crste) || crste.h.i)
+		return;
+	if (is_pmd(crste))
+		dat_free_pt(dereference_pmd(crste.pmd));
+	else
+		dat_free_level(dereference_crste(crste), true);
+}
+
+static void gmap_unshadow(struct gmap *sg)
+{
+	struct gmap_cache *gmap_cache, *next;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	KVM_BUG_ON(!sg->parent, sg->kvm);
+
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	gmap_remove_child(sg);
+	kvm_s390_vsie_gmap_notifier(sg, 0, -1UL);
+
+	list_for_each_entry_safe(gmap_cache, next, &sg->scb_users, list) {
+		gmap_cache->gmap = NULL;
+		list_del(&gmap_cache->list);
+	}
+
+	gmap_put(sg);
+}
+
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct gmap *sg, *next;
+	gfn_t start, end;
+
+	list_for_each_entry_safe(sg, next, &parent->children, list) {
+		start = sg->guest_asce.rsto;
+		end = start + sg->guest_asce.tl + 1;
+		if (!sg->guest_asce.r && gfn >= start && gfn < end) {
+			gmap_unshadow(sg);
+			continue;
+		}
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			head = radix_tree_delete(&sg->host_to_rmap, gfn);
+		gmap_for_each_rmap_safe(rmap, rnext, head)
+			gmap_unshadow_level(sg, rmap->r_gfn, rmap->level);
+	}
+}
+
+/**
+ * gmap_find_shadow() - Find a specific ASCE in the list of shadow tables.
+ * @parent: Pointer to the parent gmap.
+ * @asce: ASCE for which the shadow table is created.
+ * @edat_level: Edat level to be used for the shadow translation.
+ *
+ * Context: Called with parent->children_lock held.
+ *
+ * Return: The pointer to a gmap if a shadow table with the given asce is
+ * already available, ERR_PTR(-EAGAIN) if another one is just being created,
+ * otherwise NULL.
+ */
+static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int edat_level)
+{
+	struct gmap *sg;
+
+	lockdep_assert_held(&parent->children_lock);
+	list_for_each_entry(sg, &parent->children, list) {
+		if (!gmap_is_shadow_valid(sg, asce, edat_level))
+			continue;
+		return sg;
+	}
+	return NULL;
+}
+
+static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
+{
+	KVM_BUG_ON(1, sg->kvm);
+	return -EINVAL;
+}
+
+/**
+ * gmap_create_shadow() - Create/find a shadow guest address space.
+ * @mc: The cache to use to allocate dat tables.
+ * @parent: Pointer to the parent gmap.
+ * @asce: ASCE for which the shadow table is created.
+ * @edat_level: Edat level to be used for the shadow translation.
+ *
+ * The pages of the top level page table referred by the asce parameter
+ * will be set to read-only and marked in the PGSTEs of the kvm process.
+ * The shadow table will be removed automatically on any change to the
+ * PTE mapping for the source table.
+ *
+ * Return: A guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
+ * parent gmap table could not be protected.
+ */
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *parent,
+				union asce asce, int edat_level)
+{
+	struct gmap *sg, *new;
+	int rc;
+
+	scoped_guard(spinlock, &parent->children_lock)
+		sg = gmap_find_shadow(parent, asce, edat_level);
+	if (sg)
+		return sg;
+	/* Create a new shadow gmap. */
+	new = gmap_new(parent->kvm, asce.r ? 1UL << (64 - PAGE_SHIFT) : asce_end(asce));
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+	new->guest_asce = asce;
+	new->edat_level = edat_level;
+	set_bit(GMAP_FLAG_SHADOW, &new->flags);
+
+	scoped_guard(spinlock, &parent->children_lock) {
+		/* Recheck if another CPU created the same shadow. */
+		sg = gmap_find_shadow(parent, asce, edat_level);
+		if (sg) {
+			gmap_put(new);
+			return sg;
+		}
+		if (asce.r) {
+			/* Only allow one real-space gmap shadow. */
+			list_for_each_entry(sg, &parent->children, list) {
+				if (sg->guest_asce.r) {
+					scoped_guard(write_lock, &parent->kvm->mmu_lock)
+						gmap_unshadow(sg);
+					break;
+				}
+			}
+			gmap_add_child(parent, new);
+			/* Nothing to protect, return right away. */
+			return new;
+		}
+	}
+
+	new->parent = parent;
+	/* Protect while inserting, protects against invalidation races. */
+	rc = gmap_protect_asce_top_level(mc, new);
+	if (rc) {
+		new->parent = NULL;
+		gmap_put(new);
+		return ERR_PTR(rc);
+	}
+	return new;
+}
diff --git a/arch/s390/kvm/gmap.h b/arch/s390/kvm/gmap.h
new file mode 100644
index 000000000000..ccb5cd751e31
--- /dev/null
+++ b/arch/s390/kvm/gmap.h
@@ -0,0 +1,244 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2016, 2025
+ * Author(s): Martin Schwidefsky
+ *            Claudio Imbrenda
+ */
+
+#ifndef ARCH_KVM_S390_GMAP_H
+#define ARCH_KVM_S390_GMAP_H
+
+#include "dat.h"
+
+/**
+ * enum gmap_flags - Flags of a gmap.
+ *
+ * @GMAP_FLAG_SHADOW: The gmap is a vsie shadow gmap.
+ * @GMAP_FLAG_OWNS_PAGETABLES: The gmap owns all dat levels; normally 1, is 0
+ *                             only for ucontrol per-cpu gmaps, since they
+ *                             share the page tables with the main gmap.
+ * @GMAP_FLAG_IS_UCONTROL: The gmap is ucontrol (main gmap or per-cpu gmap).
+ * @GMAP_FLAG_ALLOW_HPAGE_1M: 1M hugepages are allowed for this gmap,
+ *                            independently of the page size used by userspace.
+ * @GMAP_FLAG_ALLOW_HPAGE_2G: 2G hugepages are allowed for this gmap,
+ *                            independently of the page size used by userspace.
+ * @GMAP_FLAG_PFAULT_ENABLED: Pfault is enabled for the gmap.
+ * @GMAP_FLAG_USES_SKEYS: If the guest uses storage keys.
+ * @GMAP_FLAG_USES_CMM: Whether the guest uses CMMA.
+ * @GMAP_FLAG_EXPORT_ON_UNMAP: Whether to export guest pages when unmapping.
+ */
+enum gmap_flags {
+	GMAP_FLAG_SHADOW = 0,
+	GMAP_FLAG_OWNS_PAGETABLES,
+	GMAP_FLAG_IS_UCONTROL,
+	GMAP_FLAG_ALLOW_HPAGE_1M,
+	GMAP_FLAG_ALLOW_HPAGE_2G,
+	GMAP_FLAG_PFAULT_ENABLED,
+	GMAP_FLAG_USES_SKEYS,
+	GMAP_FLAG_USES_CMM,
+	GMAP_FLAG_EXPORT_ON_UNMAP,
+};
+
+/**
+ * struct gmap - Guest address space.
+ *
+ * @flags: GMAP_FLAG_* flags.
+ * @edat_level: The edat level of this shadow gmap.
+ * @kvm: The vm.
+ * @asce: The ASCE used by this gmap.
+ * @list: List head used in children gmaps for the children gmap list.
+ * @children_lock: Protects children and scb_users.
+ * @children: List of child gmaps of this gmap.
+ * @scb_users: List of vsie_scb that use this shadow gmap.
+ * @parent: Parent gmap of a child gmap.
+ * @guest_asce: Original ASCE of this shadow gmap.
+ * @host_to_rmap_lock: Protects host_to_rmap.
+ * @host_to_rmap: Radix tree mapping host addresses to guest addresses.
+ * @refcount: Reference count of this gmap.
+ */
+struct gmap {
+	unsigned long flags;
+	unsigned char edat_level;
+	struct kvm *kvm;
+	union asce asce;
+	struct list_head list;
+	spinlock_t children_lock;	/* Protects: children, scb_users */
+	struct list_head children;
+	struct list_head scb_users;
+	struct gmap *parent;
+	union asce guest_asce;
+	spinlock_t host_to_rmap_lock;	/* Protects host_to_rmap */
+	struct radix_tree_root host_to_rmap;
+	refcount_t refcount;
+};
+
+struct gmap_cache {
+	struct list_head list;
+	struct gmap *gmap;
+};
+
+#define gmap_for_each_rmap_safe(pos, n, head) \
+	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
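+
+/*
+ * A minimal usage sketch for gmap_for_each_rmap_safe() (hypothetical
+ * variables; the body may free @pos because @n is fetched first):
+ *
+ *	struct vsie_rmap *rmap, *rnext;
+ *
+ *	gmap_for_each_rmap_safe(rmap, rnext, head)
+ *		kfree(rmap);
+ */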
+
+int s390_replace_asce(struct gmap *gmap);
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint);
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end);
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault);
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit);
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit);
+void gmap_remove_child(struct gmap *child);
+void gmap_dispose(struct gmap *gmap);
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *fault);
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end);
+int gmap_set_limit(struct gmap *gmap, gfn_t limit);
+int gmap_ucas_translate(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, gpa_t *gaddr);
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count);
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count);
+int gmap_enable_skeys(struct gmap *gmap);
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible);
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level);
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr);
+void gmap_set_cmma_all_dirty(struct gmap *gmap);
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn);
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+				union asce asce, int edat_level);
+void gmap_split_huge_pages(struct gmap *gmap);
+
+static inline bool uses_skeys(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+}
+
+static inline bool uses_cmm(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_USES_CMM, &gmap->flags);
+}
+
+static inline bool pfault_enabled(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_PFAULT_ENABLED, &gmap->flags);
+}
+
+static inline bool is_ucontrol(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_IS_UCONTROL, &gmap->flags);
+}
+
+static inline bool is_shadow(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_SHADOW, &gmap->flags);
+}
+
+static inline bool owns_page_tables(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_OWNS_PAGETABLES, &gmap->flags);
+}
+
+static inline struct gmap *gmap_put(struct gmap *gmap)
+{
+	if (refcount_dec_and_test(&gmap->refcount))
+		gmap_dispose(gmap);
+	return NULL;
+}
+
+static inline void gmap_get(struct gmap *gmap)
+{
+	WARN_ON_ONCE(unlikely(!refcount_inc_not_zero(&gmap->refcount)));
+}
+
+static inline void gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	scoped_guard(spinlock, &parent->children_lock)
+		_gmap_handle_vsie_unshadow_event(parent, gfn);
+}
+
+static inline bool gmap_mkold_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, true);
+}
+
+static inline bool gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, false);
+}
+
+static inline union pgste _gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					  union pgste pgste, gfn_t gfn, bool needs_lock)
+{
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+	if (!needs_lock)
+		lockdep_assert_held(&gmap->children_lock);
+	else
+		lockdep_assert_not_held(&gmap->children_lock);
+
+	if (pgste.prefix_notif && (newpte.h.p || newpte.h.i)) {
+		pgste.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + 1);
+	}
+	if (pgste.vsie_notif && (ptep->h.p != newpte.h.p || newpte.h.i)) {
+		pgste.vsie_notif = 0;
+		if (needs_lock)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		else
+			_gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	return __dat_ptep_xchg(ptep, pgste, newpte, gfn, gmap->asce, uses_skeys(gmap));
+}
+
+static inline union pgste gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					 union pgste pgste, gfn_t gfn)
+{
+	return _gmap_ptep_xchg(gmap, ptep, newpte, pgste, gfn, true);
+}
+
+static inline void _gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				     gfn_t gfn, bool needs_lock)
+{
+	unsigned long align = 8 + (is_pmd(*crstep) ? 0 : 11);
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+	if (!needs_lock)
+		lockdep_assert_held(&gmap->children_lock);
+
+	gfn = ALIGN_DOWN(gfn, align);
+	if (crste_prefix(*crstep) && (ne.h.p || ne.h.i || !crste_prefix(ne))) {
+		ne.s.fc1.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + align);
+	}
+	if (crste_leaf(*crstep) && crstep->s.fc1.vsie_notif &&
+	    (ne.h.p || ne.h.i || !ne.s.fc1.vsie_notif)) {
+		ne.s.fc1.vsie_notif = 0;
+		if (needs_lock)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		else
+			_gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	dat_crstep_xchg(crstep, ne, gfn, gmap->asce);
+}
+
+static inline void gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				    gfn_t gfn)
+{
+	return _gmap_crstep_xchg(gmap, crstep, ne, gfn, true);
+}
+
+/**
+ * gmap_is_shadow_valid() - Check if a shadow guest address space matches the
+ *                          given properties and is still valid.
+ * @sg: Pointer to the shadow guest address space structure.
+ * @asce: ASCE for which the shadow table is requested.
+ * @edat_level: Edat level to be used for the shadow translation.
+ *
+ * Return: true if the gmap shadow is still valid and matches the given
+ * properties and the caller can continue using it; false otherwise, the
+ * caller has to request a new shadow gmap in this case.
+ */
+static inline bool gmap_is_shadow_valid(struct gmap *sg, union asce asce, int edat_level)
+{
+	return sg->guest_asce.val == asce.val && sg->edat_level == edat_level;
+}
+
+#endif /* ARCH_KVM_S390_GMAP_H */
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 20/29] KVM: s390: Add helper functions for fault handling
Date: Wed, 4 Feb 2026 16:02:49 +0100
Message-ID: <20260204150259.60425-21-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add some helper functions for handling multiple guest faults at the
same time. This will be needed for VSIE, where a nested guest access
also needs to access all the page tables that map it.
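A rough sketch of the intended call pattern (hypothetical caller;
kvm->srcu held, mm lock not held, error handling trimmed), using the
kvm_s390_faultin_gfn_simple() wrapper introduced below:

	int rc;

	/* resolve a write fault on @gfn in the vCPU's gmap */
	do {
		rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn, true);
	} while (rc == -EAGAIN);
	if (rc > 0)	/* guest exception, e.g. PGM_ADDRESSING */
		rc = kvm_s390_inject_program_int(vcpu, rc);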
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/include/asm/kvm_host.h |   1 +
 arch/s390/kvm/Makefile           |   2 +-
 arch/s390/kvm/faultin.c          | 148 +++++++++++++++++++++++++++++++
 arch/s390/kvm/faultin.h          |  92 +++++++++++++++++++
 arch/s390/kvm/kvm-s390.c         |   2 +-
 arch/s390/kvm/kvm-s390.h         |   2 +
 6 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 arch/s390/kvm/faultin.c
 create mode 100644 arch/s390/kvm/faultin.h

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 6ba99870fc32..816776a8a8e3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -442,6 +442,7 @@ struct kvm_vcpu_arch {
 	bool acrs_loaded;
 	struct kvm_s390_pv_vcpu pv;
 	union diag318_info diag318_info;
+	void *mc; /* Placeholder */
 };
 
 struct kvm_vm_stat {
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 21088265402c..1e2dcd3e2436 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o gmap.o
+kvm-y += dat.o gmap.o faultin.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/faultin.c b/arch/s390/kvm/faultin.c
new file mode 100644
index 000000000000..e37cd18200f5
--- /dev/null
+++ b/arch/s390/kvm/faultin.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+#include
+#include
+
+#include "gmap.h"
+#include "trace.h"
+#include "faultin.h"
+
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
+/*
+ * kvm_s390_faultin_gfn() - Handle a DAT fault.
+ * @vcpu: The vCPU whose gmap is to be fixed up, or NULL if operating on the VM.
+ * @kvm: The VM whose gmap is to be fixed up, or NULL if operating on a vCPU.
+ * @f: The guest fault that needs to be resolved.
+ *
+ * Return:
+ * * 0 on success
+ * * < 0 in case of error
+ * * > 0 in case of guest exceptions
+ *
+ * Context:
+ * * The mm lock must not be held before calling
+ * * kvm->srcu must be held
+ * * may sleep
+ */
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f)
+{
+	struct kvm_s390_mmu_cache *local_mc __free(kvm_s390_mmu_cache) = NULL;
+	struct kvm_s390_mmu_cache *mc = NULL;
+	struct kvm_memory_slot *slot;
+	unsigned long inv_seq;
+	int foll, rc = 0;
+
+	foll = f->write_attempt ? FOLL_WRITE : 0;
+	foll |= f->attempt_pfault ? FOLL_NOWAIT : 0;
+
+	if (vcpu) {
+		kvm = vcpu->kvm;
+		mc = vcpu->arch.mc;
+	}
+
+	lockdep_assert_held(&kvm->srcu);
+
+	scoped_guard(read_lock, &kvm->mmu_lock) {
+		if (gmap_try_fixup_minor(kvm->arch.gmap, f) == 0)
+			return 0;
+	}
+
+	while (1) {
+		f->valid = false;
+		inv_seq = kvm->mmu_invalidate_seq;
+		/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+		smp_rmb();
+
+		if (vcpu)
+			slot = kvm_vcpu_gfn_to_memslot(vcpu, f->gfn);
+		else
+			slot = gfn_to_memslot(kvm, f->gfn);
+		f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+
+		/* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT). */
+		if (f->pfn == KVM_PFN_ERR_NEEDS_IO) {
+			if (unlikely(!f->attempt_pfault))
+				return -EAGAIN;
+			if (unlikely(!vcpu))
+				return -EINVAL;
+			trace_kvm_s390_major_guest_pfault(vcpu);
+			if (kvm_arch_setup_async_pf(vcpu))
+				return 0;
+			vcpu->stat.pfault_sync++;
+			/* Could not setup async pfault, try again synchronously. */
+			foll &= ~FOLL_NOWAIT;
+			f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+		}
+
+		/* Access outside memory, addressing exception. */
+		if (is_noslot_pfn(f->pfn))
+			return PGM_ADDRESSING;
+		/* Signal pending: try again. */
+		if (f->pfn == KVM_PFN_ERR_SIGPENDING)
+			return -EAGAIN;
+		/* Check if it's read-only memory; don't try to actually handle that case. */
+		if (f->pfn == KVM_PFN_ERR_RO_FAULT)
+			return -EOPNOTSUPP;
+		/* Any other error. */
+		if (is_error_pfn(f->pfn))
+			return -EFAULT;
+
+		if (!mc) {
+			local_mc = kvm_s390_new_mmu_cache();
+			if (!local_mc)
+				return -ENOMEM;
+			mc = local_mc;
+		}
+
+		/* Loop, will automatically release the faulted page. */
+		if (mmu_invalidate_retry_gfn_unsafe(kvm, inv_seq, f->gfn)) {
+			kvm_release_faultin_page(kvm, f->page, true, false);
+			continue;
+		}
+
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			if (!mmu_invalidate_retry_gfn(kvm, inv_seq, f->gfn)) {
+				f->valid = true;
+				rc = gmap_link(mc, kvm->arch.gmap, f);
+				kvm_release_faultin_page(kvm, f->page, !!rc, f->write_attempt);
+				f->page = NULL;
+			}
+		}
+		kvm_release_faultin_page(kvm, f->page, true, false);
+
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				return rc;
+		} else if (rc != -EAGAIN) {
+			return rc;
+		}
+	}
+}
+
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+	int foll = w ? FOLL_WRITE : 0;
+
+	f->write_attempt = w;
+	f->gfn = gfn;
+	f->pfn = __kvm_faultin_pfn(slot, gfn, foll, &f->writable, &f->page);
+	if (is_noslot_pfn(f->pfn))
+		return PGM_ADDRESSING;
+	if (is_sigpending_pfn(f->pfn))
+		return -EINTR;
+	if (f->pfn == KVM_PFN_ERR_NEEDS_IO)
+		return -EAGAIN;
+	if (is_error_pfn(f->pfn))
+		return -EFAULT;
+
+	f->valid = true;
+	return 0;
+}
diff --git a/arch/s390/kvm/faultin.h b/arch/s390/kvm/faultin.h
new file mode 100644
index 000000000000..f86176d2769c
--- /dev/null
+++ b/arch/s390/kvm/faultin.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+
+#ifndef __KVM_S390_FAULTIN_H
+#define __KVM_S390_FAULTIN_H
+
+#include
+
+#include "dat.h"
+
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f);
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w);
+
+static inline int kvm_s390_faultin_gfn_simple(struct kvm_vcpu *vcpu, struct kvm *kvm,
+					      gfn_t gfn, bool wr)
+{
+	struct guest_fault f = { .gfn = gfn, .write_attempt = wr, };
+
+	return kvm_s390_faultin_gfn(vcpu, kvm, &f);
+}
+
+static inline int kvm_s390_get_guest_page_and_read_gpa(struct kvm *kvm, struct guest_fault *f,
+						       gpa_t gaddr, unsigned long *val)
+{
+	int rc;
+
+	rc = kvm_s390_get_guest_page(kvm, f, gpa_to_gfn(gaddr), false);
+	if (rc)
+		return rc;
+
+	*val = *(unsigned long *)phys_to_virt(pfn_to_phys(f->pfn) | offset_in_page(gaddr));
+
+	return 0;
+}
+
+static inline void kvm_s390_release_multiple(struct kvm *kvm, struct guest_fault *guest_faults,
+					     int n, bool ignore)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		kvm_release_faultin_page(kvm, guest_faults[i].page, ignore,
+					 guest_faults[i].write_attempt);
+		guest_faults[i].page = NULL;
+	}
+}
+
+static inline bool kvm_s390_multiple_faults_need_retry(struct kvm *kvm, unsigned long seq,
+						       struct guest_fault *guest_faults, int n,
+						       bool unsafe)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!guest_faults[i].valid)
+			continue;
+		if (unsafe && mmu_invalidate_retry_gfn_unsafe(kvm, seq, guest_faults[i].gfn))
+			return true;
+		if (!unsafe && mmu_invalidate_retry_gfn(kvm, seq, guest_faults[i].gfn))
+			return true;
+	}
+	return false;
+}
+
+static inline int kvm_s390_get_guest_pages(struct kvm *kvm, struct guest_fault *guest_faults,
+					   gfn_t start, int n_pages, bool write_attempt)
+{
+	int i, rc = 0;
+
+	for (i = 0; i < n_pages; i++) {
+		rc = kvm_s390_get_guest_page(kvm, guest_faults + i, start + i, write_attempt);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+#define kvm_s390_release_faultin_array(kvm, array, ignore) \
+	kvm_s390_release_multiple(kvm, array, ARRAY_SIZE(array), ignore)
+
+#define kvm_s390_array_needs_retry_unsafe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), true)
+
+#define kvm_s390_array_needs_retry_safe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), false)
+
+#endif /* __KVM_S390_FAULTIN_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ec92e6361eab..2b5ecdc3814e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4637,7 +4637,7 @@ bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
 {
 	hva_t hva;
 	struct kvm_arch_async_pf arch;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 65c950760993..9ce71c8433a1 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -470,6 +470,8 @@ static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t gaddr,
 	return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags);
 }
 
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
 
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 21/29] KVM: s390: Add some helper functions needed for vSIE
Date: Wed, 4 Feb 2026 16:02:50 +0100
Message-ID: <20260204150259.60425-22-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Implement gmap_protect_asce_top_level(), which so far was only a stub
because of cross-dependencies with other patches in this series.
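The write-protection of the guest's top level DAT table follows the
usual mmu_invalidate_seq pattern; roughly (sketch only, the details
are in the patch below):

	context.seq = kvm->mmu_invalidate_seq;
	smp_rmb();	/* pairs with smp_wmb() in kvm_mmu_invalidate_end() */
	/* fault in the top level table pages, mmu_lock not held */
	...
	/* under mmu_lock: if the seq changed, drop the pages and retry */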
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/gmap.c | 74 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 5736145ef53a..fea1c66fcabe 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -22,6 +22,7 @@
 #include "dat.h"
 #include "gmap.h"
 #include "kvm-s390.h"
+#include "faultin.h"
 
 static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
 {
@@ -1091,10 +1092,79 @@ static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int e
 	return NULL;
 }
 
+#define CRST_TABLE_PAGES (_CRST_TABLE_SIZE / PAGE_SIZE)
+struct gmap_protect_asce_top_level {
+	unsigned long seq;
+	struct guest_fault f[CRST_TABLE_PAGES];
+};
+
+static inline int __gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+						struct gmap_protect_asce_top_level *context)
+{
+	int rc, i;
+
+	guard(write_lock)(&sg->kvm->mmu_lock);
+
+	if (kvm_s390_array_needs_retry_safe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+
+	scoped_guard(spinlock, &sg->parent->children_lock) {
+		for (i = 0; i < CRST_TABLE_PAGES; i++) {
+			if (!context->f[i].valid)
+				continue;
+			rc = gmap_protect_rmap(mc, sg, context->f[i].gfn, 0, context->f[i].pfn,
+					       TABLE_TYPE_REGION1 + 1, context->f[i].writable);
+			if (rc)
+				return rc;
+		}
+		gmap_add_child(sg->parent, sg);
+	}
+
+	kvm_s390_release_faultin_array(sg->kvm, context->f, false);
+	return 0;
+}
+
+static inline int _gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+					       struct gmap_protect_asce_top_level *context)
+{
+	int rc;
+
+	if (kvm_s390_array_needs_retry_unsafe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			return rc;
+		rc = radix_tree_preload(GFP_KERNEL);
+		if (rc)
+			return rc;
+		rc = __gmap_protect_asce_top_level(mc, sg, context);
+		radix_tree_preload_end();
+	} while (rc == -ENOMEM);
+
+	return rc;
+}
+
 static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
 {
-	KVM_BUG_ON(1, sg->kvm);
-	return -EINVAL;
+	struct gmap_protect_asce_top_level context = {};
+	union asce asce = sg->guest_asce;
+	int rc;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+
+	context.seq = sg->kvm->mmu_invalidate_seq;
+	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
+
+	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + 1, false);
+	if (rc > 0)
+		rc = -EFAULT;
+	if (!rc)
+		rc = _gmap_protect_asce_top_level(mc, sg, &context);
+	if (rc)
+		kvm_s390_release_faultin_array(sg->kvm, context.f, true);
+	return rc;
 }
 
 /**
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 22/29] KVM: s390: Stop using CONFIG_PGSTE
Date: Wed, 4 Feb 2026 16:02:51 +0100
Message-ID: <20260204150259.60425-23-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Switch to using IS_ENABLED(CONFIG_KVM) instead of CONFIG_PGSTE, since the
latter will be removed soon. Many uses of CONFIG_PGSTE are left untouched
here because they will be removed completely in upcoming patches; the ones
replaced now are mostly the ones that will stay.
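For context: unlike a plain #ifdef, IS_ENABLED() also evaluates to true
when the option is modular, which matters here because CONFIG_KVM is
tristate while CONFIG_PGSTE was a bool. Schematically:

	#ifdef CONFIG_PGSTE		/* compiled in only for CONFIG_PGSTE=y */
	...
	#endif

	#if IS_ENABLED(CONFIG_KVM)	/* compiled in for CONFIG_KVM=y and =m */
	...
	#endif

This way the guarded mm context fields stay available whenever KVM could
be loaded, including as a module.
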
Signed-off-by: Claudio Imbrenda
Reviewed-by: Steffen Eiden
Acked-by: Heiko Carstens
---
 arch/s390/include/asm/mmu_context.h | 2 +-
 arch/s390/include/asm/pgtable.h     | 4 ++--
 arch/s390/mm/fault.c                | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d9b8501bc93d..48e548c01daa 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -29,7 +29,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	atomic_set(&mm->context.protected_count, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	mm->context.has_pgste = 0;
 	mm->context.uses_skeys = 0;
 	mm->context.uses_cmm = 0;
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 04335f5e7f47..cd4d135c4503 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -577,7 +577,7 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 
 static inline int mm_is_protected(struct mm_struct *mm)
 {
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	if (unlikely(atomic_read(&mm->context.protected_count)))
 		return 1;
 #endif
@@ -632,7 +632,7 @@ static inline pud_t set_pud_bit(pud_t pud, pgprot_t prot)
 #define mm_forbids_zeropage mm_forbids_zeropage
 static inline int mm_forbids_zeropage(struct mm_struct *mm)
 {
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	if (!mm->context.allow_cow_sharing)
 		return 1;
 #endif
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e2e13778c36a..a52aa7a99b6b 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -403,7 +403,7 @@ void do_dat_exception(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_dat_exception);
 
-#if IS_ENABLED(CONFIG_PGSTE)
+#if IS_ENABLED(CONFIG_KVM)
 
 void do_secure_storage_access(struct pt_regs *regs)
 {
@@ -470,4 +470,4 @@ void do_secure_storage_access(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_secure_storage_access);
 
-#endif /* CONFIG_PGSTE */
+#endif /* CONFIG_KVM */
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 23/29] KVM: s390: Storage key functions refactoring
Date: Wed, 4 Feb 2026 16:02:52 +0100
Message-ID: <20260204150259.60425-24-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Refactor some storage key functions to improve readability, and introduce
helper functions that will be used in the next patches.
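The central helper is the new union kvm_s390_quad (added to kvm-s390.h
below): since all members of a union start at offset zero, copying
mop->size bytes to the start of the union lines up with the member of the
matching size, which is what allows dropping the old off_in_quad offset
arithmetic. A hypothetical 8-byte caller, for illustration only (expected,
desired, gpa and key are made-up variables; kvm->srcu must be held, as in
kvm_s390_vm_mem_op_cmpxchg()):

	union kvm_s390_quad old = { .sixteen = 0 };
	union kvm_s390_quad new = { .sixteen = 0 };
	bool success;
	int r;

	old.eight = expected;	/* value we think is at gpa */
	new.eight = desired;	/* value to store if the compare succeeds */
	r = cmpxchg_guest_abs_with_key(kvm, gpa, 8, &old, new, key, &success);
	/* on a failed compare, old.eight holds the value found at gpa */
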
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/kvm/gaccess.c  | 38 +++++++++----------
 arch/s390/kvm/gaccess.h  |  4 +-
 arch/s390/kvm/kvm-s390.c | 80 +++++++++++++++-------------------------
 arch/s390/kvm/kvm-s390.h |  8 ++++
 4 files changed, 59 insertions(+), 71 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 9df868bddf9a..2649365bf054 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -961,7 +961,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  *	      *@old_addr contains the value at @gpa before the attempt to
  *	      exchange the value.
  * @new: The value to place at @gpa.
- * @access_key: The access key to use for the guest access.
+ * @acc: The access key to use for the guest access.
  * @success: output value indicating if an exchange occurred.
  *
  * Atomically exchange the value at @gpa by @new, if it contains *@old.
@@ -974,9 +974,8 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  * * -EAGAIN: transient failure (len 1 or 2)
  * * -EOPNOTSUPP: read-only memslot (should never occur)
  */
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
-			       __uint128_t *old_addr, __uint128_t new,
-			       u8 access_key, bool *success)
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+			       union kvm_s390_quad new, u8 acc, bool *success)
 {
 	gfn_t gfn = gpa_to_gfn(gpa);
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
@@ -1008,41 +1007,42 @@ int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
 	case 1: {
 		u8 old;
 
-		ret = cmpxchg_user_key((u8 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one, acc);
+		*success = !ret && old == old_addr->one;
+		old_addr->one = old;
 		break;
 	}
 	case 2: {
 		u16 old;
 
-		ret = cmpxchg_user_key((u16 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two, acc);
+		*success = !ret && old == old_addr->two;
+		old_addr->two = old;
 		break;
 	}
 	case 4: {
 		u32 old;
 
-		ret = cmpxchg_user_key((u32 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.four, acc);
+		*success = !ret && old == old_addr->four;
+		old_addr->four = old;
 		break;
 	}
 	case 8: {
 		u64 old;
 
-		ret = cmpxchg_user_key((u64 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.eight, acc);
+		*success = !ret && old == old_addr->eight;
+		old_addr->eight = old;
 		break;
 	}
 	case 16: {
 		__uint128_t old;
 
-		ret = cmpxchg_user_key((__uint128_t __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixteen,
+				       new.sixteen, acc);
+		*success = !ret && old == old_addr->sixteen;
+		old_addr->sixteen = old;
 		break;
 	}
 	default:
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 3fde45a151f2..774cdf19998f 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -206,8 +206,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 		      void *data, unsigned long len, enum gacc_mode mode);
 
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, __uint128_t *old,
-			       __uint128_t new, u8 access_key, bool *success);
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+			       union kvm_s390_quad new, u8 access_key, bool *success);
 
 /**
  * write_guest_with_key - copy data from kernel space to guest space
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2b5ecdc3814e..f5411e093fb5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2900,9 +2900,9 @@ static int mem_op_validate_common(struct kvm_s390_mem_op *mop, u64 supported_fla
 static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 {
 	void __user *uaddr = (void __user *)mop->buf;
+	void *tmpbuf __free(kvfree) = NULL;
 	enum gacc_mode acc_mode;
-	void *tmpbuf = NULL;
-	int r, srcu_idx;
+	int r;
 
 	r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION |
 				   KVM_S390_MEMOP_F_CHECK_ONLY);
@@ -2915,52 +2915,36 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 		return -ENOMEM;
 	}
 
-	srcu_idx = srcu_read_lock(&kvm->srcu);
+	acc_mode = mop->op == KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : GACC_STORE;
 
-	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
-		r = PGM_ADDRESSING;
-		goto out_unlock;
-	}
+	scoped_guard(srcu, &kvm->srcu) {
+		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
+			return PGM_ADDRESSING;
 
-	acc_mode = mop->op == KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : GACC_STORE;
-	if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
-		r = check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key);
-		goto out_unlock;
-	}
-	if (acc_mode == GACC_FETCH) {
+		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)
+			return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key);
+
+		if (acc_mode == GACC_STORE && copy_from_user(tmpbuf, uaddr, mop->size))
+			return -EFAULT;
 		r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
-					      mop->size, GACC_FETCH, mop->key);
+					      mop->size, acc_mode, mop->key);
 		if (r)
-			goto out_unlock;
-		if (copy_to_user(uaddr, tmpbuf, mop->size))
-			r = -EFAULT;
-	} else {
-		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
-			r = -EFAULT;
-			goto out_unlock;
-		}
-		r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
-					      mop->size, GACC_STORE, mop->key);
+			return r;
+		if (acc_mode != GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size))
+			return -EFAULT;
 	}
 
-out_unlock:
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-
-	vfree(tmpbuf);
-	return r;
+	return 0;
 }
 
 static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 {
 	void __user *uaddr = (void __user *)mop->buf;
 	void __user *old_addr = (void __user *)mop->old_addr;
-	union {
-		__uint128_t quad;
-		char raw[sizeof(__uint128_t)];
-	} old = { .quad = 0}, new = { .quad = 0 };
-	unsigned int off_in_quad = sizeof(new) - mop->size;
-	int r, srcu_idx;
-	bool success;
+	union kvm_s390_quad old = { .sixteen = 0 };
+	union kvm_s390_quad new = { .sixteen = 0 };
+	bool success = false;
+	int r;
 
 	r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION);
 	if (r)
@@ -2972,25 +2956,21 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *m
 	 */
 	if (mop->size > sizeof(new))
 		return -EINVAL;
-	if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
+	if (copy_from_user(&new, uaddr, mop->size))
 		return -EFAULT;
-	if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
+	if (copy_from_user(&old, old_addr, mop->size))
 		return -EFAULT;
 
-	srcu_idx = srcu_read_lock(&kvm->srcu);
+	scoped_guard(srcu, &kvm->srcu) {
+		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
+			return PGM_ADDRESSING;
 
-	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
-		r = PGM_ADDRESSING;
-		goto out_unlock;
-	}
-
-	r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old.quad,
-				       new.quad, mop->key, &success);
-	if (!success && copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
-		r = -EFAULT;
+		r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old, new,
+					       mop->key, &success);
 
-out_unlock:
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
+		if (!success && copy_to_user(old_addr, &old, mop->size))
+			return -EFAULT;
+	}
 	return r;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 9ce71c8433a1..c44c52266e26 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -22,6 +22,14 @@
 
 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0)
 
+union kvm_s390_quad {
+	__uint128_t sixteen;
+	unsigned long eight;
+	unsigned int four;
+	unsigned short two;
+	unsigned char one;
+};
+
 static inline void kvm_s390_fpu_store(struct kvm_run *run)
 {
 	fpu_stfpc(&run->s.regs.fpc);
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 24/29] KVM: s390: Switch to new gmap
Date: Wed, 4 Feb 2026 16:02:53 +0100
Message-ID: <20260204150259.60425-25-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Switch KVM/s390 to use the new gmap code. Remove includes of the old gmap
header and include "gmap.h" instead; fix all existing users of the old gmap
functions to use the new ones. Fix the guest storage key access functions
to work with the new gmap.
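The recurring conversion pattern in this patch: accesses that used to
resolve a host virtual address and go through the user-copy primitives
now describe the access in a small context struct plus a callback, and
kvm_s390_faultin_gfn() runs the callback on the pinned guest page.
Roughly (simplified from the gaccess.c hunks below, not literal code):

	struct guest_fault fault = {
		.gfn = gpa_to_gfn(gpa),
		.priv = &context,		/* what to copy, where, how much */
		.write_attempt = mode == GACC_STORE,
		.callback = do_the_access,	/* runs with the page pinned */
	};

	/* fault the page in, then invoke the callback on it */
	rc = kvm_s390_faultin_gfn(NULL, kvm, &fault);
	if (rc)
		return rc;			/* translation or faultin error */
	return context.exception;		/* PGM code set by the callback */

Here do_the_access and context stand for the per-caller callback and
context struct, e.g. _access_guest_page_with_key_gpa() with its
acc_page_key_context below.
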
Signed-off-by: Claudio Imbrenda Acked-by: Heiko Carstens --- arch/s390/Kconfig | 2 +- arch/s390/include/asm/kvm_host.h | 5 +- arch/s390/include/asm/mmu_context.h | 4 - arch/s390/include/asm/tlb.h | 3 - arch/s390/include/asm/uaccess.h | 70 +-- arch/s390/include/asm/uv.h | 1 - arch/s390/kernel/uv.c | 114 +--- arch/s390/kvm/Makefile | 2 +- arch/s390/kvm/diag.c | 2 +- arch/s390/kvm/gaccess.c | 885 +++++++++++++++++----------- arch/s390/kvm/gaccess.h | 18 +- arch/s390/kvm/gmap-vsie.c | 141 ----- arch/s390/kvm/intercept.c | 15 +- arch/s390/kvm/interrupt.c | 6 +- arch/s390/kvm/kvm-s390.c | 781 +++++++----------------- arch/s390/kvm/kvm-s390.h | 19 +- arch/s390/kvm/priv.c | 213 +++---- arch/s390/kvm/pv.c | 174 ++++-- arch/s390/kvm/vsie.c | 168 +++--- arch/s390/lib/uaccess.c | 184 +----- arch/s390/mm/gmap_helpers.c | 38 +- 21 files changed, 1119 insertions(+), 1726 deletions(-) delete mode 100644 arch/s390/kvm/gmap-vsie.c diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 0e5fad5f06ca..8270754985e9 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -33,7 +33,7 @@ config GENERIC_LOCKBREAK def_bool y if PREEMPTION =20 config PGSTE - def_bool y if KVM + def_bool n =20 config AUDIT_ARCH def_bool y diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_h= ost.h index 816776a8a8e3..64a50f0862aa 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -442,7 +442,7 @@ struct kvm_vcpu_arch { bool acrs_loaded; struct kvm_s390_pv_vcpu pv; union diag318_info diag318_info; - void *mc; /* Placeholder */ + struct kvm_s390_mmu_cache *mc; }; =20 struct kvm_vm_stat { @@ -636,6 +636,8 @@ struct kvm_s390_pv { struct mutex import_lock; }; =20 +struct kvm_s390_mmu_cache; + struct kvm_arch { struct esca_block *sca; debug_info_t *dbf; @@ -675,6 +677,7 @@ struct kvm_arch { struct kvm_s390_pv pv; struct list_head kzdev_list; spinlock_t kzdev_list_lock; + struct kvm_s390_mmu_cache *mc; }; =20 #define KVM_HVA_ERR_BAD (-1UL) diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mm= u_context.h index 48e548c01daa..bd1ef5e2d2eb 100644 --- a/arch/s390/include/asm/mmu_context.h +++ b/arch/s390/include/asm/mmu_context.h @@ -30,11 +30,7 @@ static inline int init_new_context(struct task_struct *t= sk, mm->context.gmap_asce =3D 0; mm->context.flush_mm =3D 0; #if IS_ENABLED(CONFIG_KVM) - mm->context.has_pgste =3D 0; - mm->context.uses_skeys =3D 0; - mm->context.uses_cmm =3D 0; mm->context.allow_cow_sharing =3D 1; - mm->context.allow_gmap_hpage_1m =3D 0; #endif switch (mm->context.asce_limit) { default: diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h index 1e50f6f1ad9d..7354b42ee994 100644 --- a/arch/s390/include/asm/tlb.h +++ b/arch/s390/include/asm/tlb.h @@ -36,7 +36,6 @@ static inline bool __tlb_remove_folio_pages(struct mmu_ga= ther *tlb, =20 #include #include -#include =20 /* * Release the page cache reference for a pte removed by @@ -85,8 +84,6 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, p= gtable_t pte, tlb->mm->context.flush_mm =3D 1; tlb->freed_tables =3D 1; tlb->cleared_pmds =3D 1; - if (mm_has_pgste(tlb->mm)) - gmap_unlink(tlb->mm, (unsigned long *)pte, address); tlb_remove_ptdesc(tlb, virt_to_ptdesc(pte)); } =20 diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uacces= s.h index c5e02addcd67..dff035372601 100644 --- a/arch/s390/include/asm/uaccess.h +++ b/arch/s390/include/asm/uaccess.h @@ -471,65 +471,15 @@ do { \ #define arch_get_kernel_nofault __mvc_kernel_nofault #define 
arch_put_kernel_nofault __mvc_kernel_nofault =20 -void __cmpxchg_user_key_called_with_bad_pointer(void); - -int __cmpxchg_user_key1(unsigned long address, unsigned char *uval, - unsigned char old, unsigned char new, unsigned long key); -int __cmpxchg_user_key2(unsigned long address, unsigned short *uval, - unsigned short old, unsigned short new, unsigned long key); -int __cmpxchg_user_key4(unsigned long address, unsigned int *uval, - unsigned int old, unsigned int new, unsigned long key); -int __cmpxchg_user_key8(unsigned long address, unsigned long *uval, - unsigned long old, unsigned long new, unsigned long key); -int __cmpxchg_user_key16(unsigned long address, __uint128_t *uval, - __uint128_t old, __uint128_t new, unsigned long key); - -static __always_inline int _cmpxchg_user_key(unsigned long address, void *= uval, - __uint128_t old, __uint128_t new, - unsigned long key, int size) -{ - switch (size) { - case 1: return __cmpxchg_user_key1(address, uval, old, new, key); - case 2: return __cmpxchg_user_key2(address, uval, old, new, key); - case 4: return __cmpxchg_user_key4(address, uval, old, new, key); - case 8: return __cmpxchg_user_key8(address, uval, old, new, key); - case 16: return __cmpxchg_user_key16(address, uval, old, new, key); - default: __cmpxchg_user_key_called_with_bad_pointer(); - } - return 0; -} - -/** - * cmpxchg_user_key() - cmpxchg with user space target, honoring storage k= eys - * @ptr: User space address of value to compare to @old and exchange with - * @new. Must be aligned to sizeof(*@ptr). - * @uval: Address where the old value of *@ptr is written to. - * @old: Old value. Compared to the content pointed to by @ptr in order to - * determine if the exchange occurs. The old value read from *@ptr is - * written to *@uval. - * @new: New value to place at *@ptr. - * @key: Access key to use for checking storage key protection. - * - * Perform a cmpxchg on a user space target, honoring storage key protecti= on. - * @key alone determines how key checking is performed, neither - * storage-protection-override nor fetch-protection-override apply. - * The caller must compare *@uval and @old to determine if values have been - * exchanged. In case of an exception *@uval is set to zero. 
- * - * Return: 0: cmpxchg executed - * -EFAULT: an exception happened when trying to access *@ptr - * -EAGAIN: maxed out number of retries (byte and short only) - */ -#define cmpxchg_user_key(ptr, uval, old, new, key) \ -({ \ - __typeof__(ptr) __ptr =3D (ptr); \ - __typeof__(uval) __uval =3D (uval); \ - \ - BUILD_BUG_ON(sizeof(*(__ptr)) !=3D sizeof(*(__uval))); \ - might_fault(); \ - __chk_user_ptr(__ptr); \ - _cmpxchg_user_key((unsigned long)(__ptr), (void *)(__uval), \ - (old), (new), (key), sizeof(*(__ptr))); \ -}) +int __cmpxchg_key1(void *address, unsigned char *uval, unsigned char old, + unsigned char new, unsigned long key); +int __cmpxchg_key2(void *address, unsigned short *uval, unsigned short old, + unsigned short new, unsigned long key); +int __cmpxchg_key4(void *address, unsigned int *uval, unsigned int old, + unsigned int new, unsigned long key); +int __cmpxchg_key8(void *address, unsigned long *uval, unsigned long old, + unsigned long new, unsigned long key); +int __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_t old, + __uint128_t new, unsigned long key); =20 #endif /* __S390_UACCESS_H */ diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h index 0744874ca6df..d919e69662f5 100644 --- a/arch/s390/include/asm/uv.h +++ b/arch/s390/include/asm/uv.h @@ -631,7 +631,6 @@ int uv_pin_shared(unsigned long paddr); int uv_destroy_folio(struct folio *folio); int uv_destroy_pte(pte_t pte); int uv_convert_from_secure_pte(pte_t pte); -int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_= header *uvcb); int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio); int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb); int uv_convert_from_secure(unsigned long paddr); diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index cb4e8089fbca..a284f98d9716 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -209,39 +209,6 @@ int uv_convert_from_secure_pte(pte_t pte) return uv_convert_from_secure_folio(pfn_folio(pte_pfn(pte))); } =20 -/** - * should_export_before_import - Determine whether an export is needed - * before an import-like operation - * @uvcb: the Ultravisor control block of the UVC to be performed - * @mm: the mm of the process - * - * Returns whether an export is needed before every import-like operation. - * This is needed for shared pages, which don't trigger a secure storage - * exception when accessed from a different guest. - * - * Although considered as one, the Unpin Page UVC is not an actual import, - * so it is not affected. - * - * No export is needed also when there is only one protected VM, because t= he - * page cannot belong to the wrong VM in that case (there is no "other VM" - * it can belong to). - * - * Return: true if an export is needed before every import, otherwise fals= e. - */ -static bool should_export_before_import(struct uv_cb_header *uvcb, struct = mm_struct *mm) -{ - /* - * The misc feature indicates, among other things, that importing a - * shared page from a different protected VM will automatically also - * transfer its ownership. - */ - if (uv_has_feature(BIT_UV_FEAT_MISC)) - return false; - if (uvcb->cmd =3D=3D UVC_CMD_UNPIN_PAGE_SHARED) - return false; - return atomic_read(&mm->context.protected_count) > 1; -} - /* * Calculate the expected ref_count for a folio that would otherwise have = no * further pins. 
This was cribbed from similar functions in other places in @@ -313,20 +280,6 @@ int __make_folio_secure(struct folio *folio, struct uv= _cb_header *uvcb) } EXPORT_SYMBOL(__make_folio_secure); =20 -static int make_folio_secure(struct mm_struct *mm, struct folio *folio, st= ruct uv_cb_header *uvcb) -{ - int rc; - - if (!folio_trylock(folio)) - return -EAGAIN; - if (should_export_before_import(uvcb, mm)) - uv_convert_from_secure(folio_to_phys(folio)); - rc =3D __make_folio_secure(folio, uvcb); - folio_unlock(folio); - - return rc; -} - /** * s390_wiggle_split_folio() - try to drain extra references to a folio and * split the folio if it is large. @@ -414,56 +367,6 @@ int s390_wiggle_split_folio(struct mm_struct *mm, stru= ct folio *folio) } EXPORT_SYMBOL_GPL(s390_wiggle_split_folio); =20 -int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_= header *uvcb) -{ - struct vm_area_struct *vma; - struct folio_walk fw; - struct folio *folio; - int rc; - - mmap_read_lock(mm); - vma =3D vma_lookup(mm, hva); - if (!vma) { - mmap_read_unlock(mm); - return -EFAULT; - } - folio =3D folio_walk_start(&fw, vma, hva, 0); - if (!folio) { - mmap_read_unlock(mm); - return -ENXIO; - } - - folio_get(folio); - /* - * Secure pages cannot be huge and userspace should not combine both. - * In case userspace does it anyway this will result in an -EFAULT for - * the unpack. The guest is thus never reaching secure mode. - * If userspace plays dirty tricks and decides to map huge pages at a - * later point in time, it will receive a segmentation fault or - * KVM_RUN will return -EFAULT. - */ - if (folio_test_hugetlb(folio)) - rc =3D -EFAULT; - else if (folio_test_large(folio)) - rc =3D -E2BIG; - else if (!pte_write(fw.pte) || (pte_val(fw.pte) & _PAGE_INVALID)) - rc =3D -ENXIO; - else - rc =3D make_folio_secure(mm, folio, uvcb); - folio_walk_end(&fw, vma); - mmap_read_unlock(mm); - - if (rc =3D=3D -E2BIG || rc =3D=3D -EBUSY) { - rc =3D s390_wiggle_split_folio(mm, folio); - if (!rc) - rc =3D -EAGAIN; - } - folio_put(folio); - - return rc; -} -EXPORT_SYMBOL_GPL(make_hva_secure); - /* * To be called with the folio locked or with an extra reference! This will * prevent kvm_s390_pv_make_secure() from touching the folio concurrently. @@ -474,21 +377,18 @@ int arch_make_folio_accessible(struct folio *folio) { int rc =3D 0; =20 - /* Large folios cannot be secure */ - if (unlikely(folio_test_large(folio))) - return 0; - /* - * PG_arch_1 is used in 2 places: - * 1. for storage keys of hugetlb folios and KVM - * 2. As an indication that this small folio might be secure. This can - * overindicate, e.g. we set the bit before calling - * convert_to_secure. - * As secure pages are never large folios, both variants can co-exists. + * PG_arch_1 is used as an indication that this small folio might be + * secure. This can overindicate, e.g. we set the bit before calling + * convert_to_secure. */ if (!test_bit(PG_arch_1, &folio->flags.f)) return 0; =20 + /* Large folios cannot be secure. 
*/ + if (WARN_ON_ONCE(folio_test_large(folio))) + return -EFAULT; + rc =3D uv_pin_shared(folio_to_phys(folio)); if (!rc) { clear_bit(PG_arch_1, &folio->flags.f); diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile index 1e2dcd3e2436..dac9d53b23d8 100644 --- a/arch/s390/kvm/Makefile +++ b/arch/s390/kvm/Makefile @@ -8,7 +8,7 @@ include $(srctree)/virt/kvm/Makefile.kvm ccflags-y :=3D -Ivirt/kvm -Iarch/s390/kvm =20 kvm-y +=3D kvm-s390.o intercept.o interrupt.o priv.o sigp.o -kvm-y +=3D diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o +kvm-y +=3D diag.o gaccess.o guestdbg.o vsie.o pv.o kvm-y +=3D dat.o gmap.o faultin.o =20 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D pci.o diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index 53233dec8cad..d89d1c381522 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -10,13 +10,13 @@ =20 #include #include -#include #include #include #include "kvm-s390.h" #include "trace.h" #include "trace-s390.h" #include "gaccess.h" +#include "gmap.h" =20 static void do_discard_gfn_range(struct kvm_vcpu *vcpu, gfn_t gfn_start, g= fn_t gfn_end) { diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 2649365bf054..67de47a81a87 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -11,15 +11,43 @@ #include #include #include +#include +#include +#include #include #include -#include #include #include "kvm-s390.h" +#include "dat.h" +#include "gmap.h" #include "gaccess.h" +#include "faultin.h" =20 #define GMAP_SHADOW_FAKE_TABLE 1ULL =20 +union dat_table_entry { + unsigned long val; + union region1_table_entry pgd; + union region2_table_entry p4d; + union region3_table_entry pud; + union segment_table_entry pmd; + union page_table_entry pte; +}; + +#define WALK_N_ENTRIES 7 +#define LEVEL_MEM -2 +struct pgtwalk { + struct guest_fault raw_entries[WALK_N_ENTRIES]; + gpa_t last_addr; + int level; + bool p; +}; + +static inline struct guest_fault *get_entries(struct pgtwalk *w) +{ + return w->raw_entries - LEVEL_MEM; +} + /* * raddress union which will contain the result (real or absolute address) * after a page table walk. The rfaa, sfaa and pfra members are used to @@ -81,6 +109,28 @@ struct aste { /* .. 
more fields there */ }; =20 +union oac { + unsigned int val; + struct { + struct { + unsigned short key : 4; + unsigned short : 4; + unsigned short as : 2; + unsigned short : 4; + unsigned short k : 1; + unsigned short a : 1; + } oac1; + struct { + unsigned short key : 4; + unsigned short : 4; + unsigned short as : 2; + unsigned short : 4; + unsigned short k : 1; + unsigned short a : 1; + } oac2; + }; +}; + int ipte_lock_held(struct kvm *kvm) { if (sclp.has_siif) @@ -603,28 +653,16 @@ static int low_address_protection_enabled(struct kvm_= vcpu *vcpu, static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key, enum gacc_mode mode, gpa_t gpa) { - u8 storage_key, access_control; - bool fetch_protected; - unsigned long hva; + union skey storage_key; int r; =20 - if (access_key =3D=3D 0) - return 0; - - hva =3D gfn_to_hva(kvm, gpa_to_gfn(gpa)); - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - - mmap_read_lock(current->mm); - r =3D get_guest_storage_key(current->mm, hva, &storage_key); - mmap_read_unlock(current->mm); + scoped_guard(read_lock, &kvm->mmu_lock) + r =3D dat_get_storage_key(kvm->arch.gmap->asce, gpa_to_gfn(gpa), &storag= e_key); if (r) return r; - access_control =3D FIELD_GET(_PAGE_ACC_BITS, storage_key); - if (access_control =3D=3D access_key) + if (access_key =3D=3D 0 || storage_key.acc =3D=3D access_key) return 0; - fetch_protected =3D storage_key & _PAGE_FP_BIT; - if ((mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) && !fetch_protect= ed) + if ((mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) && !storage_key.f= p) return 0; return PGM_PROTECTION; } @@ -667,8 +705,7 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu *v= cpu, u8 access_key, enum gacc_mode mode, union asce asce, gpa_t gpa, unsigned long ga, unsigned int len) { - u8 storage_key, access_control; - unsigned long hva; + union skey storage_key; int r; =20 /* access key 0 matches any storage key -> allow */ @@ -678,26 +715,23 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu = *vcpu, u8 access_key, * caller needs to ensure that gfn is accessible, so we can * assume that this cannot fail */ - hva =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gpa)); - mmap_read_lock(current->mm); - r =3D get_guest_storage_key(current->mm, hva, &storage_key); - mmap_read_unlock(current->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + r =3D dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gpa), &stora= ge_key); if (r) return r; - access_control =3D FIELD_GET(_PAGE_ACC_BITS, storage_key); /* access key matches storage key -> allow */ - if (access_control =3D=3D access_key) + if (storage_key.acc =3D=3D access_key) return 0; if (mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) { /* it is a fetch and fetch protection is off -> allow */ - if (!(storage_key & _PAGE_FP_BIT)) + if (!storage_key.fp) return 0; if (fetch_prot_override_applicable(vcpu, mode, asce) && fetch_prot_override_applies(ga, len)) return 0; } if (storage_prot_override_applicable(vcpu) && - storage_prot_override_applies(access_control)) + storage_prot_override_applies(storage_key.acc)) return 0; return PGM_PROTECTION; } @@ -797,37 +831,79 @@ static int access_guest_page_gpa(struct kvm *kvm, enu= m gacc_mode mode, gpa_t gpa return rc; } =20 +static int mvcos_key(void *to, const void *from, unsigned long size, u8 ds= t_key, u8 src_key) +{ + union oac spec =3D { + .oac1.key =3D dst_key, + .oac1.k =3D !!dst_key, + .oac2.key =3D src_key, + .oac2.k =3D !!src_key, + }; + int exception =3D PGM_PROTECTION; + + asm_inline volatile( + " lr %%r0,%[spec]\n" 
+ "0: mvcos %[to],%[from],%[size]\n" + "1: lhi %[exc],0\n" + "2:\n" + EX_TABLE(0b, 2b) + EX_TABLE(1b, 2b) + : [size] "+d" (size), [to] "=3DQ" (*(char *)to), [exc] "+d" (exception) + : [spec] "d" (spec.val), [from] "Q" (*(const char *)from) + : "memory", "cc", "0"); + return exception; +} + +struct acc_page_key_context { + void *data; + int exception; + unsigned short offset; + unsigned short len; + bool store; + u8 access_key; +}; + +static void _access_guest_page_with_key_gpa(struct guest_fault *f) +{ + struct acc_page_key_context *context =3D f->priv; + void *ptr; + int r; + + ptr =3D __va(PFN_PHYS(f->pfn) | context->offset); + + if (context->store) + r =3D mvcos_key(ptr, context->data, context->len, context->access_key, 0= ); + else + r =3D mvcos_key(context->data, ptr, context->len, 0, context->access_key= ); + + context->exception =3D r; +} + static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode = mode, gpa_t gpa, - void *data, unsigned int len, u8 access_key) + void *data, unsigned int len, u8 acc) { - struct kvm_memory_slot *slot; - bool writable; - gfn_t gfn; - hva_t hva; + struct acc_page_key_context context =3D { + .offset =3D offset_in_page(gpa), + .len =3D len, + .data =3D data, + .access_key =3D acc, + .store =3D mode =3D=3D GACC_STORE, + }; + struct guest_fault fault =3D { + .gfn =3D gpa_to_gfn(gpa), + .priv =3D &context, + .write_attempt =3D mode =3D=3D GACC_STORE, + .callback =3D _access_guest_page_with_key_gpa, + }; int rc; =20 - gfn =3D gpa_to_gfn(gpa); - slot =3D gfn_to_memslot(kvm, gfn); - hva =3D gfn_to_hva_memslot_prot(slot, gfn, &writable); + if (KVM_BUG_ON((len + context.offset) > PAGE_SIZE, kvm)) + return -EINVAL; =20 - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - /* - * Check if it's a ro memslot, even tho that can't occur (they're unsuppo= rted). - * Don't try to actually handle that case. - */ - if (!writable && mode =3D=3D GACC_STORE) - return -EOPNOTSUPP; - hva +=3D offset_in_page(gpa); - if (mode =3D=3D GACC_STORE) - rc =3D copy_to_user_key((void __user *)hva, data, len, access_key); - else - rc =3D copy_from_user_key(data, (void __user *)hva, len, access_key); + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &fault); if (rc) - return PGM_PROTECTION; - if (mode =3D=3D GACC_STORE) - mark_page_dirty_in_slot(kvm, slot, gfn); - return 0; + return rc; + return context.exception; } =20 int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data, @@ -950,16 +1026,100 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsign= ed long gra, return rc; } =20 +/** + * __cmpxchg_with_key() - Perform cmpxchg, honoring storage keys. + * @ptr: Address of value to compare to *@old and exchange with + * @new. Must be aligned to @size. + * @old: Old value. Compared to the content pointed to by @ptr in order to + * determine if the exchange occurs. The old value read from *@ptr is + * written here. + * @new: New value to place at *@ptr. + * @size: Size of the operation in bytes, may only be a power of two up to= 16. + * @access_key: Access key to use for checking storage key protection. + * + * Perform a cmpxchg on guest memory, honoring storage key protection. + * @access_key alone determines how key checking is performed, neither + * storage-protection-override nor fetch-protection-override apply. + * In case of an exception *@uval is set to zero. 
+ * + * Return: + * * %0: cmpxchg executed successfully + * * %1: cmpxchg executed unsuccessfully + * * %PGM_PROTECTION: an exception happened when trying to access *@ptr + * * %-EAGAIN: maxed out number of retries (byte and short only) + * * %-EINVAL: invalid value for @size + */ +static int __cmpxchg_with_key(union kvm_s390_quad *ptr, union kvm_s390_qua= d *old, + union kvm_s390_quad new, int size, u8 access_key) +{ + union kvm_s390_quad tmp =3D { .sixteen =3D 0 }; + int rc; + + /* + * The cmpxchg_key macro depends on the type of "old", so we need + * a case for each valid length and get some code duplication as long + * as we don't introduce a new macro. + */ + switch (size) { + case 1: + rc =3D __cmpxchg_key1(&ptr->one, &tmp.one, old->one, new.one, access_key= ); + break; + case 2: + rc =3D __cmpxchg_key2(&ptr->two, &tmp.two, old->two, new.two, access_key= ); + break; + case 4: + rc =3D __cmpxchg_key4(&ptr->four, &tmp.four, old->four, new.four, access= _key); + break; + case 8: + rc =3D __cmpxchg_key8(&ptr->eight, &tmp.eight, old->eight, new.eight, ac= cess_key); + break; + case 16: + rc =3D __cmpxchg_key16(&ptr->sixteen, &tmp.sixteen, old->sixteen, new.si= xteen, + access_key); + break; + default: + return -EINVAL; + } + if (!rc && memcmp(&tmp, old, size)) + rc =3D 1; + *old =3D tmp; + /* + * Assume that the fault is caused by protection, either key protection + * or user page write protection. + */ + if (rc =3D=3D -EFAULT) + rc =3D PGM_PROTECTION; + return rc; +} + +struct cmpxchg_key_context { + union kvm_s390_quad new; + union kvm_s390_quad *old; + int exception; + unsigned short offset; + u8 access_key; + u8 len; +}; + +static void _cmpxchg_guest_abs_with_key(struct guest_fault *f) +{ + struct cmpxchg_key_context *context =3D f->priv; + + context->exception =3D __cmpxchg_with_key(__va(PFN_PHYS(f->pfn) | context= ->offset), + context->old, context->new, context->len, + context->access_key); +} + /** * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute addres= s. * @kvm: Virtual machine instance. * @gpa: Absolute guest address of the location to be changed. * @len: Operand length of the cmpxchg, required: 1 <=3D len <=3D 16. Prov= iding a * non power of two will result in failure. - * @old_addr: Pointer to old value. If the location at @gpa contains this = value, - * the exchange will succeed. After calling cmpxchg_guest_abs_w= ith_key() - * *@old_addr contains the value at @gpa before the attempt to - * exchange the value. + * @old: Pointer to old value. If the location at @gpa contains this value, + * the exchange will succeed. After calling cmpxchg_guest_abs_with_k= ey() + * *@old contains the value at @gpa before the attempt to + * exchange the value. * @new: The value to place at @gpa. * @acc: The access key to use for the guest access. * @success: output value indicating if an exchange occurred. 
@@ -974,89 +1134,36 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigne= d long gra, * * -EAGAIN: transient failure (len 1 or 2) * * -EOPNOTSUPP: read-only memslot (should never occur) */ -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old, union kvm_s390_quad new, u8 acc, bool *success) { - gfn_t gfn =3D gpa_to_gfn(gpa); - struct kvm_memory_slot *slot =3D gfn_to_memslot(kvm, gfn); - bool writable; - hva_t hva; - int ret; - - if (!IS_ALIGNED(gpa, len)) - return -EINVAL; - - hva =3D gfn_to_hva_memslot_prot(slot, gfn, &writable); - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - /* - * Check if it's a read-only memslot, even though that cannot occur - * since those are unsupported. - * Don't try to actually handle that case. - */ - if (!writable) - return -EOPNOTSUPP; - - hva +=3D offset_in_page(gpa); - /* - * The cmpxchg_user_key macro depends on the type of "old", so we need - * a case for each valid length and get some code duplication as long - * as we don't introduce a new macro. - */ - switch (len) { - case 1: { - u8 old; - - ret =3D cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one,= acc); - *success =3D !ret && old =3D=3D old_addr->one; - old_addr->one =3D old; - break; - } - case 2: { - u16 old; - - ret =3D cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two= , acc); - *success =3D !ret && old =3D=3D old_addr->two; - old_addr->two =3D old; - break; - } - case 4: { - u32 old; - - ret =3D cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.fo= ur, acc); - *success =3D !ret && old =3D=3D old_addr->four; - old_addr->four =3D old; - break; - } - case 8: { - u64 old; + struct cmpxchg_key_context context =3D { + .old =3D old, + .new =3D new, + .offset =3D offset_in_page(gpa), + .len =3D len, + .access_key =3D acc, + }; + struct guest_fault fault =3D { + .gfn =3D gpa_to_gfn(gpa), + .priv =3D &context, + .write_attempt =3D true, + .callback =3D _cmpxchg_guest_abs_with_key, + }; + int rc; =20 - ret =3D cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.e= ight, acc); - *success =3D !ret && old =3D=3D old_addr->eight; - old_addr->eight =3D old; - break; - } - case 16: { - __uint128_t old; + lockdep_assert_held(&kvm->srcu); =20 - ret =3D cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixt= een, - new.sixteen, acc); - *success =3D !ret && old =3D=3D old_addr->sixteen; - old_addr->sixteen =3D old; - break; - } - default: + if (len > 16 || !IS_ALIGNED(gpa, len)) return -EINVAL; - } - if (*success) - mark_page_dirty_in_slot(kvm, slot, gfn); - /* - * Assume that the fault is caused by protection, either key protection - * or user page write protection. 
- */ - if (ret =3D=3D -EFAULT) - ret =3D PGM_PROTECTION; - return ret; + + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &fault); + if (rc) + return rc; + *success =3D !context.exception; + if (context.exception =3D=3D 1) + return 0; + return context.exception; } =20 /** @@ -1158,304 +1265,372 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_= vcpu *vcpu, unsigned long gra) } =20 /** - * kvm_s390_shadow_tables - walk the guest page table and create shadow ta= bles - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @pgt: pointer to the beginning of the page table for the given address = if - * successful (return value 0), or to the first invalid DAT entry in - * case of exceptions (return value > 0) - * @dat_protection: referenced memory is write protected - * @fake: pgt references contiguous guest memory block, not a pgtable + * walk_guest_tables() - Walk the guest page table and pin the DAT tables. + * @sg: Pointer to the shadow guest address space structure. + * @saddr: Faulting address in the shadow gmap. + * @w: Will be filled with information on the pinned pages. + * @wr: Indicates a write access if true. + * + * Return: + * * %0 in case of success + * * a PIC code > 0 in case the address translation fails + * * an error code < 0 if other errors happen in the host */ -static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr, - unsigned long *pgt, int *dat_protection, - int *fake) +static int walk_guest_tables(struct gmap *sg, unsigned long saddr, struct = pgtwalk *w, bool wr) { - struct kvm *kvm; - struct gmap *parent; - union asce asce; + struct gmap *parent =3D sg->parent; + struct guest_fault *entries; + union dat_table_entry table; union vaddress vaddr; unsigned long ptr; + struct kvm *kvm; + union asce asce; int rc; =20 - *fake =3D 0; - *dat_protection =3D 0; - kvm =3D sg->private; - parent =3D sg->parent; + kvm =3D parent->kvm; + asce =3D sg->guest_asce; + entries =3D get_entries(w); + + w->level =3D LEVEL_MEM; + w->last_addr =3D saddr; + if (asce.r) + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, gpa_to_gfn(sadd= r), false); + vaddr.addr =3D saddr; - asce.val =3D sg->orig_asce; ptr =3D asce.rsto * PAGE_SIZE; - if (asce.r) { - *fake =3D 1; - ptr =3D 0; - asce.dt =3D ASCE_TYPE_REGION1; - } + + if (!asce_contains_gfn(asce, gpa_to_gfn(saddr))) + return PGM_ASCE_TYPE; switch (asce.dt) { case ASCE_TYPE_REGION1: - if (vaddr.rfx01 > asce.tl && !*fake) + if (vaddr.rfx01 > asce.tl) return PGM_REGION_FIRST_TRANS; break; case ASCE_TYPE_REGION2: - if (vaddr.rfx) - return PGM_ASCE_TYPE; if (vaddr.rsx01 > asce.tl) return PGM_REGION_SECOND_TRANS; break; case ASCE_TYPE_REGION3: - if (vaddr.rfx || vaddr.rsx) - return PGM_ASCE_TYPE; if (vaddr.rtx01 > asce.tl) return PGM_REGION_THIRD_TRANS; break; case ASCE_TYPE_SEGMENT: - if (vaddr.rfx || vaddr.rsx || vaddr.rtx) - return PGM_ASCE_TYPE; if (vaddr.sx01 > asce.tl) return PGM_SEGMENT_TRANSLATION; break; } =20 + w->level =3D asce.dt; switch (asce.dt) { - case ASCE_TYPE_REGION1: { - union region1_table_entry rfte; - - if (*fake) { - ptr +=3D vaddr.rfx * _REGION1_SIZE; - rfte.val =3D ptr; - goto shadow_r2t; - } - *pgt =3D ptr + vaddr.rfx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val); + case ASCE_TYPE_REGION1: + w->last_addr =3D ptr + vaddr.rfx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rfte.i) + if (table.pgd.i) return PGM_REGION_FIRST_TRANS; - if (rfte.tt !=3D 
TABLE_TYPE_REGION1) + if (table.pgd.tt !=3D TABLE_TYPE_REGION1) return PGM_TRANSLATION_SPEC; - if (vaddr.rsx01 < rfte.tf || vaddr.rsx01 > rfte.tl) + if (vaddr.rsx01 < table.pgd.tf || vaddr.rsx01 > table.pgd.tl) return PGM_REGION_SECOND_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rfte.p; - ptr =3D rfte.rto * PAGE_SIZE; -shadow_r2t: - rc =3D gmap_shadow_r2t(sg, saddr, rfte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r1_entry++; - } + w->p |=3D table.pgd.p; + ptr =3D table.pgd.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION2: { - union region2_table_entry rste; - - if (*fake) { - ptr +=3D vaddr.rsx * _REGION2_SIZE; - rste.val =3D ptr; - goto shadow_r3t; - } - *pgt =3D ptr + vaddr.rsx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val); + case ASCE_TYPE_REGION2: + w->last_addr =3D ptr + vaddr.rsx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rste.i) + if (table.p4d.i) return PGM_REGION_SECOND_TRANS; - if (rste.tt !=3D TABLE_TYPE_REGION2) + if (table.p4d.tt !=3D TABLE_TYPE_REGION2) return PGM_TRANSLATION_SPEC; - if (vaddr.rtx01 < rste.tf || vaddr.rtx01 > rste.tl) + if (vaddr.rtx01 < table.p4d.tf || vaddr.rtx01 > table.p4d.tl) return PGM_REGION_THIRD_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rste.p; - ptr =3D rste.rto * PAGE_SIZE; -shadow_r3t: - rste.p |=3D *dat_protection; - rc =3D gmap_shadow_r3t(sg, saddr, rste.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r2_entry++; - } + w->p |=3D table.p4d.p; + ptr =3D table.p4d.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION3: { - union region3_table_entry rtte; - - if (*fake) { - ptr +=3D vaddr.rtx * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; - } - *pgt =3D ptr + vaddr.rtx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val); + case ASCE_TYPE_REGION3: + w->last_addr =3D ptr + vaddr.rtx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rtte.i) + if (table.pud.i) return PGM_REGION_THIRD_TRANS; - if (rtte.tt !=3D TABLE_TYPE_REGION3) + if (table.pud.tt !=3D TABLE_TYPE_REGION3) return PGM_TRANSLATION_SPEC; - if (rtte.cr && asce.p && sg->edat_level >=3D 2) + if (table.pud.cr && asce.p && sg->edat_level >=3D 2) return PGM_TRANSLATION_SPEC; - if (rtte.fc && sg->edat_level >=3D 2) { - *dat_protection |=3D rtte.fc0.p; - *fake =3D 1; - ptr =3D rtte.fc1.rfaa * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; + if (sg->edat_level >=3D 1) + w->p |=3D table.pud.p; + if (table.pud.fc && sg->edat_level >=3D 2) { + table.val =3D u64_replace_bits(table.val, saddr, ~_REGION3_MASK); + goto edat_applies; } - if (vaddr.sx01 < rtte.fc0.tf || vaddr.sx01 > rtte.fc0.tl) + if (vaddr.sx01 < table.pud.fc0.tf || vaddr.sx01 > table.pud.fc0.tl) return PGM_SEGMENT_TRANSLATION; - if (sg->edat_level >=3D 1) - *dat_protection |=3D rtte.fc0.p; - ptr =3D rtte.fc0.sto * PAGE_SIZE; -shadow_sgt: - rtte.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_sgt(sg, saddr, rtte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r3_entry++; - } + ptr =3D table.pud.fc0.sto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_SEGMENT: { - union segment_table_entry ste; - - if (*fake) { - ptr +=3D vaddr.sx * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; - } - *pgt =3D ptr + vaddr.sx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val); + case ASCE_TYPE_SEGMENT: + 
w->last_addr =3D ptr + vaddr.sx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (ste.i) + if (table.pmd.i) return PGM_SEGMENT_TRANSLATION; - if (ste.tt !=3D TABLE_TYPE_SEGMENT) + if (table.pmd.tt !=3D TABLE_TYPE_SEGMENT) return PGM_TRANSLATION_SPEC; - if (ste.cs && asce.p) + if (table.pmd.cs && asce.p) return PGM_TRANSLATION_SPEC; - *dat_protection |=3D ste.fc0.p; - if (ste.fc && sg->edat_level >=3D 1) { - *fake =3D 1; - ptr =3D ste.fc1.sfaa * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; + w->p |=3D table.pmd.p; + if (table.pmd.fc && sg->edat_level >=3D 1) { + table.val =3D u64_replace_bits(table.val, saddr, ~_SEGMENT_MASK); + goto edat_applies; } - ptr =3D ste.fc0.pto * (PAGE_SIZE / 2); -shadow_pgt: - ste.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_pgt(sg, saddr, ste.val, *fake); + ptr =3D table.pmd.fc0.pto * (PAGE_SIZE / 2); + w->level--; + } + w->last_addr =3D ptr + vaddr.px * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); + if (rc) + return rc; + if (table.pte.i) + return PGM_PAGE_TRANSLATION; + if (table.pte.z) + return PGM_TRANSLATION_SPEC; + w->p |=3D table.pte.p; +edat_applies: + if (wr && w->p) + return PGM_PROTECTION; + + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, table.pte.pfra, = wr); +} + +static int _do_shadow_pte(struct gmap *sg, gpa_t raddr, union pte *ptep_h,= union pte *ptep, + struct guest_fault *f, bool p) +{ + union pgste pgste; + union pte newpte; + int rc; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + scoped_guard(spinlock, &sg->host_to_rmap_lock) + rc =3D gmap_insert_rmap(sg, f->gfn, gpa_to_gfn(raddr), TABLE_TYPE_PAGE_T= ABLE); + if (rc) + return rc; + + pgste =3D pgste_get_lock(ptep_h); + newpte =3D _pte(f->pfn, f->writable, !p, 0); + newpte.s.d |=3D ptep->s.d; + newpte.s.sd |=3D ptep->s.sd; + newpte.h.p &=3D ptep->h.p; + pgste =3D _gmap_ptep_xchg(sg->parent, ptep_h, newpte, pgste, f->gfn, fals= e); + pgste.vsie_notif =3D 1; + pgste_set_unlock(ptep_h, pgste); + + newpte =3D _pte(f->pfn, 0, !p, 0); + pgste =3D pgste_get_lock(ptep); + pgste =3D __dat_ptep_xchg(ptep, pgste, newpte, gpa_to_gfn(raddr), sg->asc= e, uses_skeys(sg)); + pgste_set_unlock(ptep, pgste); + + return 0; +} + +static int _do_shadow_crste(struct gmap *sg, gpa_t raddr, union crste *hos= t, union crste *table, + struct guest_fault *f, bool p) +{ + union crste newcrste; + gfn_t gfn; + int rc; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + gfn =3D f->gfn & gpa_to_gfn(is_pmd(*table) ? 
_SEGMENT_MASK : _REGION3_MAS= K); + scoped_guard(spinlock, &sg->host_to_rmap_lock) + rc =3D gmap_insert_rmap(sg, gfn, gpa_to_gfn(raddr), host->h.tt); + if (rc) + return rc; + + newcrste =3D _crste_fc1(f->pfn, host->h.tt, f->writable, !p); + newcrste.s.fc1.d |=3D host->s.fc1.d; + newcrste.s.fc1.sd |=3D host->s.fc1.sd; + newcrste.h.p &=3D host->h.p; + newcrste.s.fc1.vsie_notif =3D 1; + newcrste.s.fc1.prefix_notif =3D host->s.fc1.prefix_notif; + _gmap_crstep_xchg(sg->parent, host, newcrste, f->gfn, false); + + newcrste =3D _crste_fc1(f->pfn, host->h.tt, 0, !p); + dat_crstep_xchg(table, newcrste, gpa_to_gfn(raddr), sg->asce); + return 0; +} + +static int _gaccess_do_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *= sg, + unsigned long saddr, struct pgtwalk *w) +{ + struct guest_fault *entries; + int flags, i, hl, gl, l, rc; + union crste *table, *host; + union pte *ptep, *ptep_h; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + entries =3D get_entries(w); + ptep_h =3D NULL; + ptep =3D NULL; + + rc =3D dat_entry_walk(NULL, gpa_to_gfn(saddr), sg->asce, DAT_WALK_ANY, TA= BLE_TYPE_PAGE_TABLE, + &table, &ptep); + if (rc) + return rc; + + /* A race occurred. The shadow mapping is already valid, nothing to do */ + if ((ptep && !ptep->h.i) || (!ptep && crste_leaf(*table))) + return 0; + + gl =3D get_level(table, ptep); + + /* + * Skip levels that are already protected. For each level, protect + * only the page containing the entry, not the whole table. + */ + for (i =3D gl; i >=3D w->level; i--) { + rc =3D gmap_protect_rmap(mc, sg, entries[i - 1].gfn, gpa_to_gfn(saddr), + entries[i - 1].pfn, i, entries[i - 1].writable); + if (rc) + return rc; + } + + rc =3D dat_entry_walk(NULL, entries[LEVEL_MEM].gfn, sg->parent->asce, DAT= _WALK_LEAF, + TABLE_TYPE_PAGE_TABLE, &host, &ptep_h); + if (rc) + return rc; + + hl =3D get_level(host, ptep_h); + /* Get the smallest granularity */ + l =3D min3(gl, hl, w->level); + + flags =3D DAT_WALK_SPLIT_ALLOC | (uses_skeys(sg->parent) ? DAT_WALK_USES_= SKEYS : 0); + /* If necessary, create the shadow mapping */ + if (l < gl) { + rc =3D dat_entry_walk(mc, gpa_to_gfn(saddr), sg->asce, flags, l, &table,= &ptep); if (rc) return rc; - kvm->stat.gmap_shadow_sg_entry++; } + if (l < hl) { + rc =3D dat_entry_walk(mc, entries[LEVEL_MEM].gfn, sg->parent->asce, + flags, l, &host, &ptep_h); + if (rc) + return rc; } - /* Return the parent address of the page table */ - *pgt =3D ptr; - return 0; + + if (KVM_BUG_ON(l > TABLE_TYPE_REGION3, sg->kvm)) + return -EFAULT; + if (l =3D=3D TABLE_TYPE_PAGE_TABLE) + return _do_shadow_pte(sg, saddr, ptep_h, ptep, entries + LEVEL_MEM, w->p= ); + return _do_shadow_crste(sg, saddr, host, table, entries + LEVEL_MEM, w->p= ); } =20 -/** - * shadow_pgt_lookup() - find a shadow page table - * @sg: pointer to the shadow guest address space structure - * @saddr: the address in the shadow aguest address space - * @pgt: parent gmap address of the page table to get shadowed - * @dat_protection: if the pgtable is marked as protected by dat - * @fake: pgt references contiguous guest memory block, not a pgtable - * - * Returns 0 if the shadow page table was found and -EAGAIN if the page - * table was not found. - * - * Called with sg->mm->mmap_lock in read. 
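A note on the granularity selection in _gaccess_do_shadow() above, which replaces the old per-level goto chain: the shadow entry is created at the finest level that any of the three walks reached, and a coarser mapping on either side is presumably split down by dat_entry_walk() when DAT_WALK_SPLIT_ALLOC is passed. A worked illustration (the values are only an example):

	/*
	 * Example: the guest maps saddr through a 1M segment (gl), the
	 * host backs the target with a 4k page (hl), and the guest table
	 * walk ended at a pte (w->level). min3() then selects the page
	 * table level, so both sides end up shadowed at 4k granularity:
	 */
	l =3D min3(gl, hl, w->level);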
- */ -static int shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, unsigne= d long *pgt, - int *dat_protection, int *fake) +static inline int _gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap= *sg, gpa_t saddr, + unsigned long seq, struct pgtwalk *walk) { - unsigned long pt_index; - unsigned long *table; - struct page *page; + struct gmap *parent; int rc; =20 - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 1); /* get segment pointer */ - if (table && !(*table & _SEGMENT_ENTRY_INVALID)) { - /* Shadow page tables are full pages (pte+pgste) */ - page =3D pfn_to_page(*table >> PAGE_SHIFT); - pt_index =3D gmap_pgste_get_pgt_addr(page_to_virt(page)); - *pgt =3D pt_index & ~GMAP_SHADOW_FAKE_TABLE; - *dat_protection =3D !!(*table & _SEGMENT_ENTRY_PROTECT); - *fake =3D !!(pt_index & GMAP_SHADOW_FAKE_TABLE); - rc =3D 0; - } else { - rc =3D -EAGAIN; + if (kvm_s390_array_needs_retry_unsafe(vcpu->kvm, seq, walk->raw_entries)) + return -EAGAIN; +again: + rc =3D kvm_s390_mmu_cache_topup(vcpu->arch.mc); + if (rc) + return rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + if (kvm_s390_array_needs_retry_safe(vcpu->kvm, seq, walk->raw_entries)) + return -EAGAIN; + parent =3D READ_ONCE(sg->parent); + if (!parent) + return -EAGAIN; + scoped_guard(spinlock, &parent->children_lock) { + if (READ_ONCE(sg->parent) !=3D parent) + return -EAGAIN; + rc =3D _gaccess_do_shadow(vcpu->arch.mc, sg, saddr, walk); + } + if (rc =3D=3D -ENOMEM) + goto again; + if (!rc) + kvm_s390_release_faultin_array(vcpu->kvm, walk->raw_entries, false); } - spin_unlock(&sg->guest_table_lock); return rc; } =20 /** - * kvm_s390_shadow_fault - handle fault on a shadow page table - * @vcpu: virtual cpu - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @datptr: will contain the address of the faulting DAT table entry, or of - * the valid leaf, plus some flags + * __gaccess_shadow_fault() - Handle fault on a shadow page table. + * @vcpu: Virtual cpu that triggered the action. + * @sg: The shadow guest address space structure. + * @saddr: Faulting address in the shadow gmap. + * @datptr: Will contain the address of the faulting DAT table entry, or of + * the valid leaf, plus some flags. + * @wr: Whether this is a write access. * - * Returns: - 0 if the shadow fault was successfully resolved - * - > 0 (pgm exception code) on exceptions while faulting - * - -EAGAIN if the caller can retry immediately - * - -EFAULT when accessing invalid guest addresses - * - -ENOMEM if out of memory + * Return: + * * %0 if the shadow fault was successfully resolved + * * > 0 (pgm exception code) on exceptions while faulting + * * %-EAGAIN if the caller can retry immediately + * * %-EFAULT when accessing invalid guest addresses + * * %-ENOMEM if out of memory */ -int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, - unsigned long saddr, unsigned long *datptr) +static int __gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, = gpa_t saddr, + union mvpg_pei *datptr, bool wr) { - union vaddress vaddr; - union page_table_entry pte; - unsigned long pgt =3D 0; - int dat_protection, fake; + struct pgtwalk walk =3D { .p =3D false, }; + unsigned long seq; int rc; =20 - if (KVM_BUG_ON(!gmap_is_shadow(sg), vcpu->kvm)) - return -EFAULT; + seq =3D vcpu->kvm->mmu_invalidate_seq; + /* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). 
*/ + smp_rmb(); =20 - mmap_read_lock(sg->mm); - /* - * We don't want any guest-2 tables to change - so the parent - * tables/pointers we read stay valid - unshadowing is however - * always possible - only guest_table_lock protects us. - */ - ipte_lock(vcpu->kvm); - - rc =3D shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake); + rc =3D walk_guest_tables(sg, saddr, &walk, wr); + if (datptr) { + datptr->val =3D walk.last_addr; + datptr->dat_prot =3D wr && walk.p; + datptr->not_pte =3D walk.level > TABLE_TYPE_PAGE_TABLE; + datptr->real =3D sg->guest_asce.r; + } + if (!rc) + rc =3D _gaccess_shadow_fault(vcpu, sg, saddr, seq, &walk); if (rc) - rc =3D kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection, - &fake); + kvm_s390_release_faultin_array(vcpu->kvm, walk.raw_entries, true); + return rc; +} =20 - vaddr.addr =3D saddr; - if (fake) { - pte.val =3D pgt + vaddr.px * PAGE_SIZE; - goto shadow_page; - } +int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t sad= dr, + union mvpg_pei *datptr, bool wr) +{ + int rc; =20 - switch (rc) { - case PGM_SEGMENT_TRANSLATION: - case PGM_REGION_THIRD_TRANS: - case PGM_REGION_SECOND_TRANS: - case PGM_REGION_FIRST_TRANS: - pgt |=3D PEI_NOT_PTE; - break; - case 0: - pgt +=3D vaddr.px * 8; - rc =3D gmap_read_table(sg->parent, pgt, &pte.val); - } - if (datptr) - *datptr =3D pgt | dat_protection * PEI_DAT_PROT; - if (!rc && pte.i) - rc =3D PGM_PAGE_TRANSLATION; - if (!rc && pte.z) - rc =3D PGM_TRANSLATION_SPEC; -shadow_page: - pte.p |=3D dat_protection; - if (!rc) - rc =3D gmap_shadow_page(sg, saddr, __pte(pte.val)); - vcpu->kvm->stat.gmap_shadow_pg_entry++; + if (KVM_BUG_ON(!test_bit(GMAP_FLAG_SHADOW, &sg->flags), vcpu->kvm)) + return -EFAULT; + + rc =3D kvm_s390_mmu_cache_topup(vcpu->arch.mc); + if (rc) + return rc; + + ipte_lock(vcpu->kvm); + rc =3D __gaccess_shadow_fault(vcpu, sg, saddr, datptr, wr || sg->guest_as= ce.r); ipte_unlock(vcpu->kvm); - mmap_read_unlock(sg->mm); + return rc; } diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 774cdf19998f..b5385cec60f4 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,7 +206,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsign= ed long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); =20 -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old, union kvm_s390_quad new, u8 access_key, bool *success); =20 /** @@ -450,11 +450,17 @@ void ipte_unlock(struct kvm *kvm); int ipte_lock_held(struct kvm *kvm); int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long= gra); =20 -/* MVPG PEI indication bits */ -#define PEI_DAT_PROT 2 -#define PEI_NOT_PTE 4 +union mvpg_pei { + unsigned long val; + struct { + unsigned long addr : 61; + unsigned long not_pte : 1; + unsigned long dat_prot: 1; + unsigned long real : 1; + }; +}; =20 -int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow, - unsigned long saddr, unsigned long *datptr); +int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t sad= dr, + union mvpg_pei *datptr, bool wr); =20 #endif /* __KVM_S390_GACCESS_H */ diff --git a/arch/s390/kvm/gmap-vsie.c b/arch/s390/kvm/gmap-vsie.c deleted file mode 100644 index 56ef153eb8fe..000000000000 --- a/arch/s390/kvm/gmap-vsie.c +++ /dev/null @@ -1,141 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 
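One remark on union mvpg_pei, which replaces the PEI_* defines in the gaccess.h hunk above: s390 is big endian and bitfields are allocated from the most significant bit, so addr covers bits 63..3 (table entries are 8-byte aligned), not_pte lands on the old PEI_NOT_PTE bit (value 4) and dat_prot on the old PEI_DAT_PROT bit (value 2). A small sketch of the equivalence:

	union mvpg_pei pei =3D { .val =3D 0 };

	pei.not_pte =3D 1;	/* pei.val is now 4, the old PEI_NOT_PTE */
	pei.dat_prot =3D 1;	/* adds 2, the old PEI_DAT_PROT */
	pei.real =3D 1;		/* new bit 0: set for real-space ASCEs */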
-/* - * Guest memory management for KVM/s390 nested VMs. - * - * Copyright IBM Corp. 2008, 2020, 2024 - * - * Author(s): Claudio Imbrenda - * Martin Schwidefsky - * David Hildenbrand - * Janosch Frank - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -#include "kvm-s390.h" - -/** - * gmap_find_shadow - find a specific asce in the list of shadow tables - * @parent: pointer to the parent gmap - * @asce: ASCE for which the shadow table is created - * @edat_level: edat level to be used for the shadow translation - * - * Returns the pointer to a gmap if a shadow table with the given asce is - * already available, ERR_PTR(-EAGAIN) if another one is just being create= d, - * otherwise NULL - * - * Context: Called with parent->shadow_lock held - */ -static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long as= ce, int edat_level) -{ - struct gmap *sg; - - lockdep_assert_held(&parent->shadow_lock); - list_for_each_entry(sg, &parent->children, list) { - if (!gmap_shadow_valid(sg, asce, edat_level)) - continue; - if (!sg->initialized) - return ERR_PTR(-EAGAIN); - refcount_inc(&sg->ref_count); - return sg; - } - return NULL; -} - -/** - * gmap_shadow - create/find a shadow guest address space - * @parent: pointer to the parent gmap - * @asce: ASCE for which the shadow table is created - * @edat_level: edat level to be used for the shadow translation - * - * The pages of the top level page table referred by the asce parameter - * will be set to read-only and marked in the PGSTEs of the kvm process. - * The shadow table will be removed automatically on any change to the - * PTE mapping for the source table. - * - * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of mem= ory, - * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the - * parent gmap table could not be protected. 
- */ -struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat= _level) -{ - struct gmap *sg, *new; - unsigned long limit; - int rc; - - if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)par= ent->private) || - KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private)) - return ERR_PTR(-EFAULT); - spin_lock(&parent->shadow_lock); - sg =3D gmap_find_shadow(parent, asce, edat_level); - spin_unlock(&parent->shadow_lock); - if (sg) - return sg; - /* Create a new shadow gmap */ - limit =3D -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11)); - if (asce & _ASCE_REAL_SPACE) - limit =3D -1UL; - new =3D gmap_alloc(limit); - if (!new) - return ERR_PTR(-ENOMEM); - new->mm =3D parent->mm; - new->parent =3D gmap_get(parent); - new->private =3D parent->private; - new->orig_asce =3D asce; - new->edat_level =3D edat_level; - new->initialized =3D false; - spin_lock(&parent->shadow_lock); - /* Recheck if another CPU created the same shadow */ - sg =3D gmap_find_shadow(parent, asce, edat_level); - if (sg) { - spin_unlock(&parent->shadow_lock); - gmap_free(new); - return sg; - } - if (asce & _ASCE_REAL_SPACE) { - /* only allow one real-space gmap shadow */ - list_for_each_entry(sg, &parent->children, list) { - if (sg->orig_asce & _ASCE_REAL_SPACE) { - spin_lock(&sg->guest_table_lock); - gmap_unshadow(sg); - spin_unlock(&sg->guest_table_lock); - list_del(&sg->list); - gmap_put(sg); - break; - } - } - } - refcount_set(&new->ref_count, 2); - list_add(&new->list, &parent->children); - if (asce & _ASCE_REAL_SPACE) { - /* nothing to protect, return right away */ - new->initialized =3D true; - spin_unlock(&parent->shadow_lock); - return new; - } - spin_unlock(&parent->shadow_lock); - /* protect after insertion, so it will get properly invalidated */ - mmap_read_lock(parent->mm); - rc =3D __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN, - ((asce & _ASCE_TABLE_LENGTH) + 1), - PROT_READ, GMAP_NOTIFY_SHADOW); - mmap_read_unlock(parent->mm); - spin_lock(&parent->shadow_lock); - new->initialized =3D true; - if (rc) { - list_del(&new->list); - gmap_free(new); - new =3D ERR_PTR(rc); - } - spin_unlock(&parent->shadow_lock); - return new; -} diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c index 420ae62977e2..39aff324203e 100644 --- a/arch/s390/kvm/intercept.c +++ b/arch/s390/kvm/intercept.c @@ -21,6 +21,7 @@ #include "gaccess.h" #include "trace.h" #include "trace-s390.h" +#include "faultin.h" =20 u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu) { @@ -367,8 +368,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu) reg2, &srcaddr, GACC_FETCH, 0); if (rc) return kvm_s390_inject_prog_cond(vcpu, rc); - rc =3D kvm_s390_handle_dat_fault(vcpu, srcaddr, 0); - if (rc !=3D 0) + + do { + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(srcaddr), fals= e); + } while (rc =3D=3D -EAGAIN); + if (rc) return rc; =20 /* Ensure that the source is paged-in, no actual access -> no key checkin= g */ @@ -376,8 +380,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu) reg1, &dstaddr, GACC_STORE, 0); if (rc) return kvm_s390_inject_prog_cond(vcpu, rc); - rc =3D kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE); - if (rc !=3D 0) + + do { + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(dstaddr), true= ); + } while (rc =3D=3D -EAGAIN); + if (rc) return rc; =20 kvm_s390_retry_instr(vcpu); diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 249cdc822ec5..f55eca9aa638 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ 
-26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -34,6 +33,7 @@ #include "gaccess.h" #include "trace-s390.h" #include "pci.h" +#include "gmap.h" =20 #define PFAULT_INIT 0x0600 #define PFAULT_DONE 0x0680 @@ -2632,12 +2632,12 @@ static int flic_set_attr(struct kvm_device *dev, st= ruct kvm_device_attr *attr) case KVM_DEV_FLIC_APF_ENABLE: if (kvm_is_ucontrol(dev->kvm)) return -EINVAL; - dev->kvm->arch.gmap->pfault_enabled =3D 1; + set_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags); break; case KVM_DEV_FLIC_APF_DISABLE_WAIT: if (kvm_is_ucontrol(dev->kvm)) return -EINVAL; - dev->kvm->arch.gmap->pfault_enabled =3D 0; + clear_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags); /* * Make sure no async faults are in transition when * clearing the queues. So we don't need to worry diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index f5411e093fb5..bde55761bf8a 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include @@ -53,6 +52,8 @@ #include #include "kvm-s390.h" #include "gaccess.h" +#include "gmap.h" +#include "faultin.h" #include "pci.h" =20 #define CREATE_TRACE_POINTS @@ -264,16 +265,11 @@ static DECLARE_BITMAP(kvm_s390_available_cpu_feat, KV= M_S390_VM_CPU_FEAT_NR_BITS) /* available subfunctions indicated via query / "test bit" */ static struct kvm_s390_vm_cpu_subfunc kvm_s390_available_subfunc; =20 -static struct gmap_notifier gmap_notifier; -static struct gmap_notifier vsie_gmap_notifier; debug_info_t *kvm_s390_dbf; debug_info_t *kvm_s390_dbf_uv; =20 /* Section: not file related */ /* forward declarations */ -static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end); - static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta) { u8 delta_idx =3D 0; @@ -529,10 +525,6 @@ static int __init __kvm_s390_init(void) if (rc) goto err_gib; =20 - gmap_notifier.notifier_call =3D kvm_gmap_notifier; - gmap_register_pte_notifier(&gmap_notifier); - vsie_gmap_notifier.notifier_call =3D kvm_s390_vsie_gmap_notifier; - gmap_register_pte_notifier(&vsie_gmap_notifier); atomic_notifier_chain_register(&s390_epoch_delta_notifier, &kvm_clock_notifier); =20 @@ -552,8 +544,6 @@ static int __init __kvm_s390_init(void) =20 static void __kvm_s390_exit(void) { - gmap_unregister_pte_notifier(&gmap_notifier); - gmap_unregister_pte_notifier(&vsie_gmap_notifier); atomic_notifier_chain_unregister(&s390_epoch_delta_notifier, &kvm_clock_notifier); =20 @@ -569,7 +559,7 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { if (ioctl =3D=3D KVM_S390_ENABLE_SIE) - return s390_enable_sie(); + return 0; return -EINVAL; } =20 @@ -698,32 +688,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) =20 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *mems= lot) { - int i; - gfn_t cur_gfn, last_gfn; - unsigned long gaddr, vmaddr; - struct gmap *gmap =3D kvm->arch.gmap; - DECLARE_BITMAP(bitmap, _PAGE_ENTRIES); - - /* Loop over all guest segments */ - cur_gfn =3D memslot->base_gfn; - last_gfn =3D memslot->base_gfn + memslot->npages; - for (; cur_gfn <=3D last_gfn; cur_gfn +=3D _PAGE_ENTRIES) { - gaddr =3D gfn_to_gpa(cur_gfn); - vmaddr =3D gfn_to_hva_memslot(memslot, cur_gfn); - if (kvm_is_error_hva(vmaddr)) - continue; - - bitmap_zero(bitmap, _PAGE_ENTRIES); - gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr); - for (i =3D 0; i < _PAGE_ENTRIES; i++) { - if 
(test_bit(i, bitmap)) - mark_page_dirty(kvm, cur_gfn + i); - } + gfn_t last_gfn =3D memslot->base_gfn + memslot->npages; =20 - if (fatal_signal_pending(current)) - return; - cond_resched(); - } + scoped_guard(read_lock, &kvm->mmu_lock) + gmap_sync_dirty_log(kvm->arch.gmap, memslot->base_gfn, last_gfn); } =20 /* Section: vm related */ @@ -883,9 +851,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm= _enable_cap *cap) r =3D -EINVAL; else { r =3D 0; - mmap_write_lock(kvm->mm); - kvm->mm->context.allow_gmap_hpage_1m =3D 1; - mmap_write_unlock(kvm->mm); /* * We might have to create fake 4k page * tables. To avoid that the hardware works on @@ -958,7 +923,7 @@ static int kvm_s390_get_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_att= r *attr) { int ret; - unsigned int idx; + switch (attr->attr) { case KVM_S390_VM_MEM_ENABLE_CMMA: ret =3D -ENXIO; @@ -969,8 +934,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att mutex_lock(&kvm->lock); if (kvm->created_vcpus) ret =3D -EBUSY; - else if (kvm->mm->context.allow_gmap_hpage_1m) - ret =3D -EINVAL; else { kvm->arch.use_cmma =3D 1; /* Not compatible with cmma. */ @@ -979,7 +942,9 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att } mutex_unlock(&kvm->lock); break; - case KVM_S390_VM_MEM_CLR_CMMA: + case KVM_S390_VM_MEM_CLR_CMMA: { + gfn_t start_gfn =3D 0; + ret =3D -ENXIO; if (!sclp.has_cmma) break; @@ -988,13 +953,13 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, = struct kvm_device_attr *att break; =20 VM_EVENT(kvm, 3, "%s", "RESET: CMMA states"); - mutex_lock(&kvm->lock); - idx =3D srcu_read_lock(&kvm->srcu); - s390_reset_cmma(kvm->arch.gmap->mm); - srcu_read_unlock(&kvm->srcu, idx); - mutex_unlock(&kvm->lock); + do { + start_gfn =3D dat_reset_cmma(kvm->arch.gmap->asce, start_gfn); + cond_resched(); + } while (start_gfn); ret =3D 0; break; + } case KVM_S390_VM_MEM_LIMIT_SIZE: { unsigned long new_limit; =20 @@ -1011,29 +976,12 @@ static int kvm_s390_set_mem_control(struct kvm *kvm,= struct kvm_device_attr *att if (!new_limit) return -EINVAL; =20 - /* gmap_create takes last usable address */ - if (new_limit !=3D KVM_S390_NO_MEM_LIMIT) - new_limit -=3D 1; - ret =3D -EBUSY; - mutex_lock(&kvm->lock); - if (!kvm->created_vcpus) { - /* gmap_create will round the limit up */ - struct gmap *new =3D gmap_create(current->mm, new_limit); - - if (!new) { - ret =3D -ENOMEM; - } else { - gmap_remove(kvm->arch.gmap); - new->private =3D kvm; - kvm->arch.gmap =3D new; - ret =3D 0; - } - } - mutex_unlock(&kvm->lock); + if (!kvm->created_vcpus) + ret =3D gmap_set_limit(kvm->arch.gmap, gpa_to_gfn(new_limit)); VM_EVENT(kvm, 3, "SET: max guest address: %lu", new_limit); VM_EVENT(kvm, 3, "New guest asce: 0x%p", - (void *) kvm->arch.gmap->asce); + (void *)kvm->arch.gmap->asce.val); break; } default: @@ -1198,19 +1146,13 @@ static int kvm_s390_vm_start_migration(struct kvm *= kvm) kvm->arch.migration_mode =3D 1; return 0; } - /* mark all the pages in active slots as dirty */ kvm_for_each_memslot(ms, bkt, slots) { if (!ms->dirty_bitmap) return -EINVAL; - /* - * The second half of the bitmap is only used on x86, - * and would be wasted otherwise, so we put it to good - * use here to keep track of the state of the storage - * attributes. 
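For context on the flag conversion above (and in later hunks): pfault_enabled(), uses_cmm() and uses_skeys() are presumably thin test_bit() wrappers in the new gmap.h, along the following lines. This is an assumption for illustration, gmap.h itself is not part of this hunk:

	static inline bool pfault_enabled(struct gmap *gmap)
	{
		return test_bit(GMAP_FLAG_PFAULT_ENABLED, &gmap->flags);
	}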
- */ - memset(kvm_second_dirty_bitmap(ms), 0xff, kvm_dirty_bitmap_bytes(ms)); ram_pages +=3D ms->npages; } + /* mark all the pages as dirty */ + gmap_set_cmma_all_dirty(kvm->arch.gmap); atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages); kvm->arch.migration_mode =3D 1; kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION); @@ -2116,40 +2058,32 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, st= ruct kvm_device_attr *attr) =20 static int kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) { - uint8_t *keys; - uint64_t hva; - int srcu_idx, i, r =3D 0; + union skey *keys; + int i, r =3D 0; =20 if (args->flags !=3D 0) return -EINVAL; =20 /* Is this guest using storage keys? */ - if (!mm_uses_skeys(current->mm)) + if (!uses_skeys(kvm->arch.gmap)) return KVM_S390_GET_SKEYS_NONE; =20 /* Enforce sane limit on memory allocation */ if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; =20 - keys =3D kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT); + keys =3D kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT); if (!keys) return -ENOMEM; =20 - mmap_read_lock(current->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - for (i =3D 0; i < args->count; i++) { - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; - break; + scoped_guard(read_lock, &kvm->mmu_lock) { + for (i =3D 0; i < args->count; i++) { + r =3D dat_get_storage_key(kvm->arch.gmap->asce, + args->start_gfn + i, keys + i); + if (r) + break; } - - r =3D get_guest_storage_key(current->mm, hva, &keys[i]); - if (r) - break; } - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(current->mm); =20 if (!r) { r =3D copy_to_user((uint8_t __user *)args->skeydata_addr, keys, @@ -2164,10 +2098,9 @@ static int kvm_s390_get_skeys(struct kvm *kvm, struc= t kvm_s390_skeys *args) =20 static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) { - uint8_t *keys; - uint64_t hva; - int srcu_idx, i, r =3D 0; - bool unlocked; + struct kvm_s390_mmu_cache *mc; + union skey *keys; + int i, r =3D 0; =20 if (args->flags !=3D 0) return -EINVAL; @@ -2176,7 +2109,7 @@ static int kvm_s390_set_skeys(struct kvm *kvm, struct= kvm_s390_skeys *args) if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; =20 - keys =3D kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT); + keys =3D kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT); if (!keys) return -ENOMEM; =20 @@ -2188,159 +2121,41 @@ static int kvm_s390_set_skeys(struct kvm *kvm, str= uct kvm_s390_skeys *args) } =20 /* Enable storage key handling for the guest */ - r =3D s390_enable_skey(); + r =3D gmap_enable_skeys(kvm->arch.gmap); if (r) goto out; =20 - i =3D 0; - mmap_read_lock(current->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - while (i < args->count) { - unlocked =3D false; - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; - break; - } - + r =3D -EINVAL; + for (i =3D 0; i < args->count; i++) { /* Lowest order bit is reserved */ - if (keys[i] & 0x01) { - r =3D -EINVAL; - break; - } - - r =3D set_guest_storage_key(current->mm, hva, keys[i], 0); - if (r) { - r =3D fixup_user_fault(current->mm, hva, - FAULT_FLAG_WRITE, &unlocked); - if (r) - break; - } - if (!r) - i++; - } - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(current->mm); -out: - kvfree(keys); - return r; -} - -/* - * Base address and length must be sent at the start of each block, theref= ore - * it's cheaper to send 
some clean data, as long as it's less than the siz= e of - * two longs. - */ -#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *)) -/* for consistency */ -#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX) - -static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *a= rgs, - u8 *res, unsigned long bufsize) -{ - unsigned long pgstev, hva, cur_gfn =3D args->start_gfn; - - args->count =3D 0; - while (args->count < bufsize) { - hva =3D gfn_to_hva(kvm, cur_gfn); - /* - * We return an error if the first value was invalid, but we - * return successfully if at least one value was copied. - */ - if (kvm_is_error_hva(hva)) - return args->count ? 0 : -EFAULT; - if (get_pgste(kvm->mm, hva, &pgstev) < 0) - pgstev =3D 0; - res[args->count++] =3D (pgstev >> 24) & 0x43; - cur_gfn++; - } - - return 0; -} - -static struct kvm_memory_slot *gfn_to_memslot_approx(struct kvm_memslots *= slots, - gfn_t gfn) -{ - return ____gfn_to_memslot(slots, gfn, true); -} - -static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots, - unsigned long cur_gfn) -{ - struct kvm_memory_slot *ms =3D gfn_to_memslot_approx(slots, cur_gfn); - unsigned long ofs =3D cur_gfn - ms->base_gfn; - struct rb_node *mnode =3D &ms->gfn_node[slots->node_idx]; - - if (ms->base_gfn + ms->npages <=3D cur_gfn) { - mnode =3D rb_next(mnode); - /* If we are above the highest slot, wrap around */ - if (!mnode) - mnode =3D rb_first(&slots->gfn_tree); - - ms =3D container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_= idx]); - ofs =3D 0; + if (keys[i].zero) + goto out; } =20 - if (cur_gfn < ms->base_gfn) - ofs =3D 0; - - ofs =3D find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs); - while (ofs >=3D ms->npages && (mnode =3D rb_next(mnode))) { - ms =3D container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_= idx]); - ofs =3D find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages); + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) { + r =3D -ENOMEM; + goto out; } - return ms->base_gfn + ofs; -} =20 -static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *ar= gs, - u8 *res, unsigned long bufsize) -{ - unsigned long mem_end, cur_gfn, next_gfn, hva, pgstev; - struct kvm_memslots *slots =3D kvm_memslots(kvm); - struct kvm_memory_slot *ms; - - if (unlikely(kvm_memslots_empty(slots))) - return 0; - - cur_gfn =3D kvm_s390_next_dirty_cmma(slots, args->start_gfn); - ms =3D gfn_to_memslot(kvm, cur_gfn); - args->count =3D 0; - args->start_gfn =3D cur_gfn; - if (!ms) - return 0; - next_gfn =3D kvm_s390_next_dirty_cmma(slots, cur_gfn + 1); - mem_end =3D kvm_s390_get_gfn_end(slots); - - while (args->count < bufsize) { - hva =3D gfn_to_hva(kvm, cur_gfn); - if (kvm_is_error_hva(hva)) - return 0; - /* Decrement only if we actually flipped the bit to 0 */ - if (test_and_clear_bit(cur_gfn - ms->base_gfn, kvm_second_dirty_bitmap(m= s))) - atomic64_dec(&kvm->arch.cmma_dirty_pages); - if (get_pgste(kvm->mm, hva, &pgstev) < 0) - pgstev =3D 0; - /* Save the value */ - res[args->count++] =3D (pgstev >> 24) & 0x43; - /* If the next bit is too far away, stop. */ - if (next_gfn > cur_gfn + KVM_S390_MAX_BIT_DISTANCE) - return 0; - /* If we reached the previous "next", find the next one */ - if (cur_gfn =3D=3D next_gfn) - next_gfn =3D kvm_s390_next_dirty_cmma(slots, cur_gfn + 1); - /* Reached the end of memory or of the buffer, stop */ - if ((next_gfn >=3D mem_end) || - (next_gfn - args->start_gfn >=3D bufsize)) - return 0; - cur_gfn++; - /* Reached the end of the current memslot, take the next one. 
*/ - if (cur_gfn - ms->base_gfn >=3D ms->npages) { - ms =3D gfn_to_memslot(kvm, cur_gfn); - if (!ms) - return 0; + r =3D 0; + do { + r =3D kvm_s390_mmu_cache_topup(mc); + if (r =3D=3D -ENOMEM) + break; + scoped_guard(read_lock, &kvm->mmu_lock) { + for (i =3D 0 ; i < args->count; i++) { + r =3D dat_set_storage_key(mc, kvm->arch.gmap->asce, + args->start_gfn + i, keys[i], 0); + if (r) + break; + } } - } - return 0; + } while (r =3D=3D -ENOMEM); + kvm_s390_free_mmu_cache(mc); +out: + kvfree(keys); + return r; } =20 /* @@ -2354,8 +2169,7 @@ static int kvm_s390_get_cmma(struct kvm *kvm, struct = kvm_s390_cmma_log *args, static int kvm_s390_get_cmma_bits(struct kvm *kvm, struct kvm_s390_cmma_log *args) { - unsigned long bufsize; - int srcu_idx, peek, ret; + int peek, ret; u8 *values; =20 if (!kvm->arch.use_cmma) @@ -2368,8 +2182,8 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, if (!peek && !kvm->arch.migration_mode) return -EINVAL; /* CMMA is disabled or was not used, or the buffer has length zero */ - bufsize =3D min(args->count, KVM_S390_CMMA_SIZE_MAX); - if (!bufsize || !kvm->mm->context.uses_cmm) { + args->count =3D min(args->count, KVM_S390_CMMA_SIZE_MAX); + if (!args->count || !uses_cmm(kvm->arch.gmap)) { memset(args, 0, sizeof(*args)); return 0; } @@ -2379,18 +2193,18 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, return 0; } =20 - values =3D vmalloc(bufsize); + values =3D vmalloc(args->count); if (!values) return -ENOMEM; =20 - mmap_read_lock(kvm->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - if (peek) - ret =3D kvm_s390_peek_cmma(kvm, args, values, bufsize); - else - ret =3D kvm_s390_get_cmma(kvm, args, values, bufsize); - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(kvm->mm); + scoped_guard(read_lock, &kvm->mmu_lock) { + if (peek) + ret =3D dat_peek_cmma(args->start_gfn, kvm->arch.gmap->asce, &args->cou= nt, + values); + else + ret =3D dat_get_cmma(kvm->arch.gmap->asce, &args->start_gfn, &args->cou= nt, + values, &kvm->arch.cmma_dirty_pages); + } =20 if (kvm->arch.migration_mode) args->remaining =3D atomic64_read(&kvm->arch.cmma_dirty_pages); @@ -2412,11 +2226,9 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, static int kvm_s390_set_cmma_bits(struct kvm *kvm, const struct kvm_s390_cmma_log *args) { - unsigned long hva, mask, pgstev, i; - uint8_t *bits; - int srcu_idx, r =3D 0; - - mask =3D args->mask; + struct kvm_s390_mmu_cache *mc; + u8 *bits =3D NULL; + int r =3D 0; =20 if (!kvm->arch.use_cmma) return -ENXIO; @@ -2430,9 +2242,12 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm, if (args->count =3D=3D 0) return 0; =20 + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) + return -ENOMEM; bits =3D vmalloc(array_size(sizeof(*bits), args->count)); if (!bits) - return -ENOMEM; + goto out; =20 r =3D copy_from_user(bits, (void __user *)args->values, args->count); if (r) { @@ -2440,29 +2255,19 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm, goto out; } =20 - mmap_read_lock(kvm->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - for (i =3D 0; i < args->count; i++) { - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; + do { + r =3D kvm_s390_mmu_cache_topup(mc); + if (r) break; + scoped_guard(read_lock, &kvm->mmu_lock) { + r =3D dat_set_cmma_bits(mc, kvm->arch.gmap->asce, args->start_gfn, + args->count, args->mask, bits); } + } while (r =3D=3D -ENOMEM); =20 - pgstev =3D bits[i]; - pgstev =3D pgstev << 24; - mask &=3D _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT; - set_pgste_bits(kvm->mm, hva, mask, pgstev); - } - 
srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(kvm->mm); - - if (!kvm->mm->context.uses_cmm) { - mmap_write_lock(kvm->mm); - kvm->mm->context.uses_cmm =3D 1; - mmap_write_unlock(kvm->mm); - } + set_bit(GMAP_FLAG_USES_CMM, &kvm->arch.gmap->flags); out: + kvm_s390_free_mmu_cache(mc); vfree(bits); return r; } @@ -2671,6 +2476,13 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struc= t kvm_pv_cmd *cmd) break; =20 mmap_write_lock(kvm->mm); + /* + * Disable creation of new THPs. Existing THPs can stay, they + * will be split when any part of them gets imported. + */ + mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, kvm->mm); + mm_flags_set(MMF_DISABLE_THP_COMPLETELY, kvm->mm); + set_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &kvm->arch.gmap->flags); r =3D gmap_helper_disable_cow_sharing(); mmap_write_unlock(kvm->mm); if (r) @@ -2918,9 +2730,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, st= ruct kvm_s390_mem_op *mop) acc_mode =3D mop->op =3D=3D KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : G= ACC_STORE; =20 scoped_guard(srcu, &kvm->srcu) { - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) - return PGM_ADDRESSING; - if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key); =20 @@ -2933,7 +2742,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, st= ruct kvm_s390_mem_op *mop) if (acc_mode !=3D GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size)) return -EFAULT; } - return 0; } =20 @@ -2962,9 +2770,6 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm= , struct kvm_s390_mem_op *m return -EFAULT; =20 scoped_guard(srcu, &kvm->srcu) { - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) - return PGM_ADDRESSING; - r =3D cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old, new, mop->key, &success); =20 @@ -3322,11 +3127,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long = type) if (type) goto out_err; #endif - - rc =3D s390_enable_sie(); - if (rc) - goto out_err; - rc =3D -ENOMEM; =20 if (!sclp.has_64bscao) @@ -3400,6 +3200,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long = type) debug_register_view(kvm->arch.dbf, &debug_sprintf_view); VM_EVENT(kvm, 3, "vm created with type %lu", type); =20 + kvm->arch.mem_limit =3D type & KVM_VM_S390_UCONTROL ? 
KVM_S390_NO_MEM_LIM= IT : sclp.hamax + 1; + kvm->arch.gmap =3D gmap_new(kvm, gpa_to_gfn(kvm->arch.mem_limit)); + if (!kvm->arch.gmap) + goto out_err; + clear_bit(GMAP_FLAG_PFAULT_ENABLED, &kvm->arch.gmap->flags); + if (type & KVM_VM_S390_UCONTROL) { struct kvm_userspace_memory_region2 fake_memslot =3D { .slot =3D KVM_S390_UCONTROL_MEMSLOT, @@ -3409,23 +3215,15 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long= type) .flags =3D 0, }; =20 - kvm->arch.gmap =3D NULL; - kvm->arch.mem_limit =3D KVM_S390_NO_MEM_LIMIT; /* one flat fake memslot covering the whole address-space */ mutex_lock(&kvm->slots_lock); KVM_BUG_ON(kvm_set_internal_memslot(kvm, &fake_memslot), kvm); mutex_unlock(&kvm->slots_lock); + set_bit(GMAP_FLAG_IS_UCONTROL, &kvm->arch.gmap->flags); } else { - if (sclp.hamax =3D=3D U64_MAX) - kvm->arch.mem_limit =3D TASK_SIZE_MAX; - else - kvm->arch.mem_limit =3D min_t(unsigned long, TASK_SIZE_MAX, - sclp.hamax + 1); - kvm->arch.gmap =3D gmap_create(current->mm, kvm->arch.mem_limit - 1); - if (!kvm->arch.gmap) - goto out_err; - kvm->arch.gmap->private =3D kvm; - kvm->arch.gmap->pfault_enabled =3D 0; + struct crst_table *table =3D dereference_asce(kvm->arch.gmap->asce); + + crst_table_init((void *)table, _CRSTE_HOLE(table->crstes[0].h.tt).val); } =20 kvm->arch.use_pfmfi =3D sclp.has_pfmfi; @@ -3459,8 +3257,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) sca_del_vcpu(vcpu); kvm_s390_update_topology_change_report(vcpu->kvm, 1); =20 - if (kvm_is_ucontrol(vcpu->kvm)) - gmap_remove(vcpu->arch.gmap); + if (kvm_is_ucontrol(vcpu->kvm)) { + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) + gmap_remove_child(vcpu->arch.gmap); + vcpu->arch.gmap =3D gmap_put(vcpu->arch.gmap); + } =20 if (vcpu->kvm->arch.use_cmma) kvm_s390_vcpu_unsetup_cmma(vcpu); @@ -3468,6 +3269,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) if (kvm_s390_pv_cpu_get_handle(vcpu)) kvm_s390_pv_destroy_cpu(vcpu, &rc, &rrc); free_page((unsigned long)(vcpu->arch.sie_block)); + kvm_s390_free_mmu_cache(vcpu->arch.mc); } =20 void kvm_arch_destroy_vm(struct kvm *kvm) @@ -3494,25 +3296,14 @@ void kvm_arch_destroy_vm(struct kvm *kvm) =20 debug_unregister(kvm->arch.dbf); free_page((unsigned long)kvm->arch.sie_page2); - if (!kvm_is_ucontrol(kvm)) - gmap_remove(kvm->arch.gmap); kvm_s390_destroy_adapters(kvm); kvm_s390_clear_float_irqs(kvm); kvm_s390_vsie_destroy(kvm); + kvm->arch.gmap =3D gmap_put(kvm->arch.gmap); KVM_EVENT(3, "vm 0x%p destroyed", kvm); } =20 /* Section: vcpu related */ -static int __kvm_ucontrol_vcpu_init(struct kvm_vcpu *vcpu) -{ - vcpu->arch.gmap =3D gmap_create(current->mm, -1UL); - if (!vcpu->arch.gmap) - return -ENOMEM; - vcpu->arch.gmap->private =3D vcpu->kvm; - - return 0; -} - static void sca_del_vcpu(struct kvm_vcpu *vcpu) { struct esca_block *sca =3D vcpu->kvm->arch.sca; @@ -3853,9 +3644,15 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) int rc; =20 BUILD_BUG_ON(sizeof(struct sie_page) !=3D 4096); + vcpu->arch.mc =3D kvm_s390_new_mmu_cache(); + if (!vcpu->arch.mc) + return -ENOMEM; sie_page =3D (struct sie_page *) get_zeroed_page(GFP_KERNEL_ACCOUNT); - if (!sie_page) + if (!sie_page) { + kvm_s390_free_mmu_cache(vcpu->arch.mc); + vcpu->arch.mc =3D NULL; return -ENOMEM; + } =20 vcpu->arch.sie_block =3D &sie_page->sie_block; vcpu->arch.sie_block->itdba =3D virt_to_phys(&sie_page->itdb); @@ -3897,8 +3694,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->run->kvm_valid_regs |=3D KVM_SYNC_FPRS; =20 if (kvm_is_ucontrol(vcpu->kvm)) { - rc =3D __kvm_ucontrol_vcpu_init(vcpu); - if (rc) 
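Regarding the gmap_put() calls in the teardown paths above: the helper apparently returns NULL so that dropping the reference and clearing the owner's pointer is a single statement. A sketch of that convention, under the assumption that gmap.h implements it roughly like this (the gmap_free() release path is hypothetical):

	static inline struct gmap *gmap_put(struct gmap *gmap)
	{
		if (gmap && refcount_dec_and_test(&gmap->ref_count))
			gmap_free(gmap);	/* hypothetical release function */
		return NULL;
	}

	/* usage, as in kvm_arch_destroy_vm() above: */
	kvm->arch.gmap =3D gmap_put(kvm->arch.gmap);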
+ rc =3D -ENOMEM; + vcpu->arch.gmap =3D gmap_new_child(vcpu->kvm->arch.gmap, -1UL); + if (!vcpu->arch.gmap) goto out_free_sie_block; } =20 @@ -3914,8 +3712,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) return 0; =20 out_ucontrol_uninit: - if (kvm_is_ucontrol(vcpu->kvm)) - gmap_remove(vcpu->arch.gmap); + if (kvm_is_ucontrol(vcpu->kvm)) { + gmap_remove_child(vcpu->arch.gmap); + vcpu->arch.gmap =3D gmap_put(vcpu->arch.gmap); + } out_free_sie_block: free_page((unsigned long)(vcpu->arch.sie_block)); return rc; @@ -3979,32 +3779,6 @@ void kvm_s390_sync_request(int req, struct kvm_vcpu = *vcpu) kvm_s390_vcpu_request(vcpu); } =20 -static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) -{ - struct kvm *kvm =3D gmap->private; - struct kvm_vcpu *vcpu; - unsigned long prefix; - unsigned long i; - - trace_kvm_s390_gmap_notifier(start, end, gmap_is_shadow(gmap)); - - if (gmap_is_shadow(gmap)) - return; - if (start >=3D 1UL << 31) - /* We are only interested in prefix pages */ - return; - kvm_for_each_vcpu(i, vcpu, kvm) { - /* match against both prefix pages */ - prefix =3D kvm_s390_get_prefix(vcpu); - if (prefix <=3D end && start <=3D prefix + 2*PAGE_SIZE - 1) { - VCPU_EVENT(vcpu, 2, "gmap notifier for %lx-%lx", - start, end); - kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu); - } - } -} - bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) { /* do not poll with more than halt_poll_max_steal percent of steal time */ @@ -4386,72 +4160,41 @@ static bool ibs_enabled(struct kvm_vcpu *vcpu) return kvm_s390_test_cpuflags(vcpu, CPUSTAT_IBS); } =20 -static int __kvm_s390_fixup_fault_sync(struct gmap *gmap, gpa_t gaddr, uns= igned int flags) -{ - struct kvm *kvm =3D gmap->private; - gfn_t gfn =3D gpa_to_gfn(gaddr); - bool unlocked; - hva_t vmaddr; - gpa_t tmp; - int rc; - - if (kvm_is_ucontrol(kvm)) { - tmp =3D __gmap_translate(gmap, gaddr); - gfn =3D gpa_to_gfn(tmp); - } - - vmaddr =3D gfn_to_hva(kvm, gfn); - rc =3D fixup_user_fault(gmap->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked); - if (!rc) - rc =3D __gmap_link(gmap, gaddr, vmaddr); - return rc; -} - -/** - * __kvm_s390_mprotect_many() - Apply specified protection to guest pages - * @gmap: the gmap of the guest - * @gpa: the starting guest address - * @npages: how many pages to protect - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: pgste notification bits to set - * - * Returns: 0 in case of success, < 0 in case of error - see gmap_protect_= one() - * - * Context: kvm->srcu and gmap->mm need to be held in read mode - */ -int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsi= gned int prot, - unsigned long bits) +static int vcpu_ucontrol_translate(struct kvm_vcpu *vcpu, gpa_t *gaddr) { - unsigned int fault_flag =3D (prot & PROT_WRITE) ? 
FAULT_FLAG_WRITE : 0; - gpa_t end =3D gpa + npages * PAGE_SIZE; int rc; =20 - for (; gpa < end; gpa =3D ALIGN(gpa + 1, rc)) { - rc =3D gmap_protect_one(gmap, gpa, prot, bits); - if (rc =3D=3D -EAGAIN) { - __kvm_s390_fixup_fault_sync(gmap, gpa, fault_flag); - rc =3D gmap_protect_one(gmap, gpa, prot, bits); + if (kvm_is_ucontrol(vcpu->kvm)) { + rc =3D gmap_ucas_translate(vcpu->arch.mc, vcpu->arch.gmap, gaddr); + if (rc =3D=3D -EREMOTE) { + vcpu->run->exit_reason =3D KVM_EXIT_S390_UCONTROL; + vcpu->run->s390_ucontrol.trans_exc_code =3D *gaddr; + vcpu->run->s390_ucontrol.pgm_code =3D PGM_SEGMENT_TRANSLATION; } - if (rc < 0) - return rc; + return rc; } - return 0; } =20 -static int kvm_s390_mprotect_notify_prefix(struct kvm_vcpu *vcpu) +static int kvm_s390_fixup_prefix(struct kvm_vcpu *vcpu) { gpa_t gaddr =3D kvm_s390_get_prefix(vcpu); - int idx, rc; - - idx =3D srcu_read_lock(&vcpu->kvm->srcu); - mmap_read_lock(vcpu->arch.gmap->mm); + gfn_t gfn; + int rc; =20 - rc =3D __kvm_s390_mprotect_many(vcpu->arch.gmap, gaddr, 2, PROT_WRITE, GM= AP_NOTIFY_MPROT); + if (vcpu_ucontrol_translate(vcpu, &gaddr)) + return -EREMOTE; + gfn =3D gpa_to_gfn(gaddr); =20 - mmap_read_unlock(vcpu->arch.gmap->mm); - srcu_read_unlock(&vcpu->kvm->srcu, idx); + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn, true); + if (rc) + return rc; + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn + 1, true); + if (rc) + return rc; =20 + scoped_guard(write_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_set_prefix_notif_bit(vcpu->kvm->arch.gmap->asce, gfn); return rc; } =20 @@ -4471,7 +4214,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *= vcpu) if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) { int rc; =20 - rc =3D kvm_s390_mprotect_notify_prefix(vcpu); + rc =3D kvm_s390_fixup_prefix(vcpu); if (rc) { kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu); return rc; @@ -4520,8 +4263,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *= vcpu) * Re-enable CMM virtualization if CMMA is available and * CMM has been used. */ - if ((vcpu->kvm->arch.use_cmma) && - (vcpu->kvm->mm->context.uses_cmm)) + if (vcpu->kvm->arch.use_cmma && uses_cmm(vcpu->arch.gmap)) vcpu->arch.sie_block->ecb2 |=3D ECB2_CMMA; goto retry; } @@ -4633,7 +4375,7 @@ bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu) return false; if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) return false; - if (!vcpu->arch.gmap->pfault_enabled) + if (!pfault_enabled(vcpu->arch.gmap)) return false; =20 hva =3D gfn_to_hva(vcpu->kvm, current->thread.gmap_teid.addr); @@ -4726,98 +4468,25 @@ static void kvm_s390_assert_primary_as(struct kvm_v= cpu *vcpu) current->thread.gmap_int_code, current->thread.gmap_teid.val); } =20 -/* - * __kvm_s390_handle_dat_fault() - handle a dat fault for the gmap of a vc= pu - * @vcpu: the vCPU whose gmap is to be fixed up - * @gfn: the guest frame number used for memslots (including fake memslots) - * @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps - * @foll: FOLL_* flags - * - * Return: 0 on success, < 0 in case of error. - * Context: The mm lock must not be held before calling. May sleep. 
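A readability note on the scoped_guard() uses in kvm_s390_fixup_prefix() above and elsewhere in this patch: the guards come from <linux/cleanup.h>, and the write_lock guard is equivalent to the explicit form:

	write_lock(&vcpu->kvm->mmu_lock);
	rc =3D dat_set_prefix_notif_bit(vcpu->kvm->arch.gmap->asce, gfn);
	write_unlock(&vcpu->kvm->mmu_lock);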
- */ -int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t ga= ddr, unsigned int foll) -{ - struct kvm_memory_slot *slot; - unsigned int fault_flags; - bool writable, unlocked; - unsigned long vmaddr; - struct page *page; - kvm_pfn_t pfn; +static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, gpa_t gaddr, bool= wr) +{ + struct guest_fault f =3D { + .write_attempt =3D wr, + .attempt_pfault =3D pfault_enabled(vcpu->arch.gmap), + }; int rc; =20 - slot =3D kvm_vcpu_gfn_to_memslot(vcpu, gfn); - if (!slot || slot->flags & KVM_MEMSLOT_INVALID) - return vcpu_post_run_addressing_exception(vcpu); - - fault_flags =3D foll & FOLL_WRITE ? FAULT_FLAG_WRITE : 0; - if (vcpu->arch.gmap->pfault_enabled) - foll |=3D FOLL_NOWAIT; - vmaddr =3D __gfn_to_hva_memslot(slot, gfn); - -try_again: - pfn =3D __kvm_faultin_pfn(slot, gfn, foll, &writable, &page); + if (vcpu_ucontrol_translate(vcpu, &gaddr)) + return -EREMOTE; + f.gfn =3D gpa_to_gfn(gaddr); =20 - /* Access outside memory, inject addressing exception */ - if (is_noslot_pfn(pfn)) + rc =3D kvm_s390_faultin_gfn(vcpu, NULL, &f); + if (rc <=3D 0) + return rc; + if (rc =3D=3D PGM_ADDRESSING) return vcpu_post_run_addressing_exception(vcpu); - /* Signal pending: try again */ - if (pfn =3D=3D KVM_PFN_ERR_SIGPENDING) - return -EAGAIN; - - /* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) = */ - if (pfn =3D=3D KVM_PFN_ERR_NEEDS_IO) { - trace_kvm_s390_major_guest_pfault(vcpu); - if (kvm_arch_setup_async_pf(vcpu)) - return 0; - vcpu->stat.pfault_sync++; - /* Could not setup async pfault, try again synchronously */ - foll &=3D ~FOLL_NOWAIT; - goto try_again; - } - /* Any other error */ - if (is_error_pfn(pfn)) - return -EFAULT; - - /* Success */ - mmap_read_lock(vcpu->arch.gmap->mm); - /* Mark the userspace PTEs as young and/or dirty, to avoid page fault loo= ps */ - rc =3D fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlock= ed); - if (!rc) - rc =3D __gmap_link(vcpu->arch.gmap, gaddr, vmaddr); - scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { - kvm_release_faultin_page(vcpu->kvm, page, false, writable); - } - mmap_read_unlock(vcpu->arch.gmap->mm); - return rc; -} - -static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gad= dr, unsigned int foll) -{ - unsigned long gaddr_tmp; - gfn_t gfn; - - gfn =3D gpa_to_gfn(gaddr); - if (kvm_is_ucontrol(vcpu->kvm)) { - /* - * This translates the per-vCPU guest address into a - * fake guest address, which can then be used with the - * fake memslots that are identity mapping userspace. - * This allows ucontrol VMs to use the normal fault - * resolution path, like normal VMs. 
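The struct guest_fault descriptor used in vcpu_dat_fault_handler() above also supports the callback mode used earlier by the cmpxchg path, where work runs on the resolved page while it is still pinned. A minimal sketch of such a user; peek_byte_cb() and the priv payload are hypothetical, only the descriptor fields and kvm_s390_faultin_gfn() come from this series:

	static void peek_byte_cb(struct guest_fault *f)
	{
		u8 *dst =3D f->priv;

		*dst =3D *(u8 *)__va(PFN_PHYS(f->pfn));
	}

	/* caller side, with kvm->srcu held as for the cmpxchg case: */
	u8 byte;
	struct guest_fault f =3D {
		.gfn =3D gpa_to_gfn(gaddr),
		.priv =3D &byte,
		.callback =3D peek_byte_cb,
	};
	int rc =3D kvm_s390_faultin_gfn(NULL, kvm, &f);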
- */ - mmap_read_lock(vcpu->arch.gmap->mm); - gaddr_tmp =3D __gmap_translate(vcpu->arch.gmap, gaddr); - mmap_read_unlock(vcpu->arch.gmap->mm); - if (gaddr_tmp =3D=3D -EFAULT) { - vcpu->run->exit_reason =3D KVM_EXIT_S390_UCONTROL; - vcpu->run->s390_ucontrol.trans_exc_code =3D gaddr; - vcpu->run->s390_ucontrol.pgm_code =3D PGM_SEGMENT_TRANSLATION; - return -EREMOTE; - } - gfn =3D gpa_to_gfn(gaddr_tmp); - } - return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, foll); + KVM_BUG_ON(rc, vcpu->kvm); + return -EINVAL; } =20 static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu) @@ -4994,7 +4663,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) =20 exit_reason =3D kvm_s390_enter_exit_sie(vcpu->arch.sie_block, vcpu->run->s.regs.gprs, - vcpu->arch.gmap->asce); + vcpu->arch.gmap->asce.val); =20 __enable_cpu_timer_accounting(vcpu); guest_timing_exit_irqoff(); @@ -5529,8 +5198,8 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcp= u, struct kvm_s390_mem_op *mop) { void __user *uaddr =3D (void __user *)mop->buf; + void *tmpbuf __free(kvfree) =3D NULL; enum gacc_mode acc_mode; - void *tmpbuf =3D NULL; int r; =20 r =3D mem_op_validate_common(mop, KVM_S390_MEMOP_F_INJECT_EXCEPTION | @@ -5552,32 +5221,21 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *v= cpu, if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r =3D check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size, acc_mode, mop->key); - goto out_inject; - } - if (acc_mode =3D=3D GACC_FETCH) { + } else if (acc_mode =3D=3D GACC_FETCH) { r =3D read_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size, mop->key); - if (r) - goto out_inject; - if (copy_to_user(uaddr, tmpbuf, mop->size)) { - r =3D -EFAULT; - goto out_free; - } + if (!r && copy_to_user(uaddr, tmpbuf, mop->size)) + return -EFAULT; } else { - if (copy_from_user(tmpbuf, uaddr, mop->size)) { - r =3D -EFAULT; - goto out_free; - } + if (copy_from_user(tmpbuf, uaddr, mop->size)) + return -EFAULT; r =3D write_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size, mop->key); } =20 -out_inject: if (r > 0 && (mop->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION) !=3D 0) kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm); =20 -out_free: - vfree(tmpbuf); return r; } =20 @@ -5767,37 +5425,39 @@ long kvm_arch_vcpu_ioctl(struct file *filp, } #ifdef CONFIG_KVM_S390_UCONTROL case KVM_S390_UCAS_MAP: { - struct kvm_s390_ucas_mapping ucasmap; + struct kvm_s390_ucas_mapping ucas; =20 - if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) { - r =3D -EFAULT; + r =3D -EFAULT; + if (copy_from_user(&ucas, argp, sizeof(ucas))) break; - } =20 - if (!kvm_is_ucontrol(vcpu->kvm)) { - r =3D -EINVAL; + r =3D -EINVAL; + if (!kvm_is_ucontrol(vcpu->kvm)) + break; + if (!IS_ALIGNED(ucas.user_addr | ucas.vcpu_addr | ucas.length, _SEGMENT_= SIZE)) break; - } =20 - r =3D gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr, - ucasmap.vcpu_addr, ucasmap.length); + r =3D gmap_ucas_map(vcpu->arch.gmap, gpa_to_gfn(ucas.user_addr), + gpa_to_gfn(ucas.vcpu_addr), + ucas.length >> _SEGMENT_SHIFT); break; } case KVM_S390_UCAS_UNMAP: { - struct kvm_s390_ucas_mapping ucasmap; + struct kvm_s390_ucas_mapping ucas; =20 - if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) { - r =3D -EFAULT; + r =3D -EFAULT; + if (copy_from_user(&ucas, argp, sizeof(ucas))) break; - } =20 - if (!kvm_is_ucontrol(vcpu->kvm)) { - r =3D -EINVAL; + r =3D -EINVAL; + if (!kvm_is_ucontrol(vcpu->kvm)) + break; + if (!IS_ALIGNED(ucas.vcpu_addr | ucas.length, _SEGMENT_SIZE)) break; - } =20 - r =3D gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr, - 
ucasmap.length); + gmap_ucas_unmap(vcpu->arch.gmap, gpa_to_gfn(ucas.vcpu_addr), + ucas.length >> _SEGMENT_SHIFT); + r =3D 0; break; } #endif @@ -5970,34 +5630,41 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_memory_slot *new, enum kvm_mr_change change) { + struct kvm_s390_mmu_cache *mc =3D NULL; int rc =3D 0; =20 - if (kvm_is_ucontrol(kvm)) + if (change =3D=3D KVM_MR_FLAGS_ONLY) return; =20 - switch (change) { - case KVM_MR_DELETE: - rc =3D gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE, - old->npages * PAGE_SIZE); - break; - case KVM_MR_MOVE: - rc =3D gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE, - old->npages * PAGE_SIZE); - if (rc) + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) { + rc =3D -ENOMEM; + goto out; + } + + scoped_guard(write_lock, &kvm->mmu_lock) { + switch (change) { + case KVM_MR_DELETE: + rc =3D dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->np= ages); break; - fallthrough; - case KVM_MR_CREATE: - rc =3D gmap_map_segment(kvm->arch.gmap, new->userspace_addr, - new->base_gfn * PAGE_SIZE, - new->npages * PAGE_SIZE); - break; - case KVM_MR_FLAGS_ONLY: - break; - default: - WARN(1, "Unknown KVM MR CHANGE: %d\n", change); + case KVM_MR_MOVE: + rc =3D dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->np= ages); + if (rc) + break; + fallthrough; + case KVM_MR_CREATE: + rc =3D dat_create_slot(mc, kvm->arch.gmap->asce, new->base_gfn, new->np= ages); + break; + case KVM_MR_FLAGS_ONLY: + break; + default: + WARN(1, "Unknown KVM MR CHANGE: %d\n", change); + } } +out: if (rc) pr_warn("failed to commit memory region\n"); + kvm_s390_free_mmu_cache(mc); return; } =20 @@ -6011,7 +5678,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, */ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + scoped_guard(read_lock, &kvm->mmu_lock) + return dat_test_age_gfn(kvm->arch.gmap->asce, range->start, range->end); } =20 /** @@ -6024,7 +5692,8 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn= _range *range) */ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + scoped_guard(read_lock, &kvm->mmu_lock) + return gmap_age_gfn(kvm->arch.gmap, range->start, range->end); } =20 /** @@ -6041,7 +5710,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_rang= e *range) */ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + return gmap_unmap_gfn_range(kvm->arch.gmap, range->slot, range->start, ra= nge->end); } =20 static inline unsigned long nonhyp_mask(int i) diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index c44c52266e26..bf1d7798c1af 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -19,6 +19,8 @@ #include #include #include +#include "dat.h" +#include "gmap.h" =20 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0) =20 @@ -114,9 +116,7 @@ static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) static inline int kvm_is_ucontrol(struct kvm *kvm) { #ifdef CONFIG_KVM_S390_UCONTROL - if (kvm->arch.gmap) - return 0; - return 1; + return test_bit(GMAP_FLAG_IS_UCONTROL, &kvm->arch.gmap->flags); #else return 0; #endif @@ -440,14 +440,9 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu); /* implemented in vsie.c */ int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu); void kvm_s390_vsie_kick(struct kvm_vcpu *vcpu); -void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end); +void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, 
gpa_t end= ); void kvm_s390_vsie_init(struct kvm *kvm); void kvm_s390_vsie_destroy(struct kvm *kvm); -int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level); - -/* implemented in gmap-vsie.c */ -struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat= _level); =20 /* implemented in sigp.c */ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu); @@ -469,15 +464,9 @@ void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu); void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm); __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu); int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rc, u16 *rrc); -int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t ga= ddr, unsigned int flags); int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsi= gned int prot, unsigned long bits); =20 -static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t g= addr, unsigned int flags) -{ - return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags); -} - bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu); =20 /* implemented in diag.c */ diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 0b14d894f38a..a3250ad83a8e 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -21,13 +21,14 @@ #include #include #include -#include #include #include #include +#include #include "gaccess.h" #include "kvm-s390.h" #include "trace.h" +#include "gmap.h" =20 static int handle_ri(struct kvm_vcpu *vcpu) { @@ -222,7 +223,7 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu) if (vcpu->arch.skey_enabled) return 0; =20 - rc =3D s390_enable_skey(); + rc =3D gmap_enable_skeys(vcpu->arch.gmap); VCPU_EVENT(vcpu, 3, "enabling storage keys for guest: %d", rc); if (rc) return rc; @@ -255,10 +256,9 @@ static int try_handle_skey(struct kvm_vcpu *vcpu) =20 static int handle_iske(struct kvm_vcpu *vcpu) { - unsigned long gaddr, vmaddr; - unsigned char key; + unsigned long gaddr; int reg1, reg2; - bool unlocked; + union skey key; int rc; =20 vcpu->stat.instruction_iske++; @@ -275,37 +275,21 @@ static int handle_iske(struct kvm_vcpu *vcpu) gaddr =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; gaddr =3D kvm_s390_logical_to_effective(vcpu, gaddr); gaddr =3D kvm_s390_real_to_abs(vcpu, gaddr); - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); -retry: - unlocked =3D false; - mmap_read_lock(current->mm); - rc =3D get_guest_storage_key(current->mm, vmaddr, &key); - - if (rc) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - if (!rc) { - mmap_read_unlock(current->mm); - goto retry; - } - } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr), &ke= y); + if (rc > 0) + return kvm_s390_inject_program_int(vcpu, rc); if (rc < 0) return rc; vcpu->run->s.regs.gprs[reg1] &=3D ~0xff; - vcpu->run->s.regs.gprs[reg1] |=3D key; + vcpu->run->s.regs.gprs[reg1] |=3D key.skey; return 0; } =20 static int handle_rrbe(struct kvm_vcpu *vcpu) { - unsigned long vmaddr, gaddr; + unsigned long gaddr; int reg1, reg2; - bool unlocked; int rc; =20 vcpu->stat.instruction_rrbe++; @@ -322,24 +306,10 @@ static int handle_rrbe(struct kvm_vcpu *vcpu) gaddr =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; gaddr =3D kvm_s390_logical_to_effective(vcpu, gaddr); gaddr 
=3D kvm_s390_real_to_abs(vcpu, gaddr); - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); -retry: - unlocked =3D false; - mmap_read_lock(current->mm); - rc =3D reset_guest_reference_bit(current->mm, vmaddr); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - if (!rc) { - mmap_read_unlock(current->mm); - goto retry; - } - } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_reset_reference_bit(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr)); + if (rc > 0) + return kvm_s390_inject_program_int(vcpu, rc); if (rc < 0) return rc; kvm_s390_set_psw_cc(vcpu, rc); @@ -354,9 +324,8 @@ static int handle_sske(struct kvm_vcpu *vcpu) { unsigned char m3 =3D vcpu->arch.sie_block->ipb >> 28; unsigned long start, end; - unsigned char key, oldkey; + union skey key, oldkey; int reg1, reg2; - bool unlocked; int rc; =20 vcpu->stat.instruction_sske++; @@ -377,7 +346,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) =20 kvm_s390_get_regs_rre(vcpu, ®1, ®2); =20 - key =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); if (m3 & SSKE_MB) { @@ -389,27 +358,17 @@ static int handle_sske(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - unlocked =3D false; - - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, key, &oldkey, - m3 & SSKE_NQ, m3 & SSKE_MR, - m3 & SSKE_MC); - - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? 
-EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, &oldkey, + m3 & SSKE_NQ, m3 & SSKE_MR, m3 & SSKE_MC); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) + if (rc > 1) return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; start +=3D PAGE_SIZE; @@ -422,7 +381,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) } else { kvm_s390_set_psw_cc(vcpu, rc); vcpu->run->s.regs.gprs[reg1] &=3D ~0xff00UL; - vcpu->run->s.regs.gprs[reg1] |=3D (u64) oldkey << 8; + vcpu->run->s.regs.gprs[reg1] |=3D (u64)oldkey.skey << 8; } } if (m3 & SSKE_MB) { @@ -1082,7 +1041,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) bool mr =3D false, mc =3D false, nq; int reg1, reg2; unsigned long start, end; - unsigned char key; + union skey key; =20 vcpu->stat.instruction_pfmf++; =20 @@ -1110,7 +1069,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 nq =3D vcpu->run->s.regs.gprs[reg1] & PFMF_NQ; - key =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); =20 @@ -1141,14 +1100,6 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr; - bool unlocked =3D false; - - /* Translate guest address to host address */ - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (vcpu->run->s.regs.gprs[reg1] & PFMF_CF) { if (kvm_clear_guest(vcpu->kvm, start, PAGE_SIZE)) return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); @@ -1159,19 +1110,17 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) =20 if (rc) return rc; - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, - key, NULL, nq, mr, mc); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? -EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, + NULL, nq, mr, mc); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc > 1) + return kvm_s390_inject_program_int(vcpu, rc); + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; } @@ -1195,8 +1144,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc) { int r1, r2, nappended, entries; - unsigned long gfn, hva, res, pgstev, ptev; + union essa_state state; unsigned long *cbrlo; + unsigned long gfn; + bool dirtied; =20 /* * We don't need to set SD.FPF.SK to 1 here, because if we have a @@ -1205,33 +1156,12 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) =20 kvm_s390_get_regs_rre(vcpu, &r1, &r2); gfn =3D vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT; - hva =3D gfn_to_hva(vcpu->kvm, gfn); entries =3D (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; =20 - if (kvm_is_error_hva(hva)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - nappended =3D pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev); - if (nappended < 0) { - res =3D orc ? 
0x10 : 0; - vcpu->run->s.regs.gprs[r1] =3D res; /* Exception Indication */ + nappended =3D dat_perform_essa(vcpu->arch.gmap->asce, gfn, orc, &state, &= dirtied); + vcpu->run->s.regs.gprs[r1] =3D state.val; + if (nappended < 0) return 0; - } - res =3D (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22; - /* - * Set the block-content state part of the result. 0 means resident, so - * nothing to do if the page is valid. 2 is for preserved pages - * (non-present and non-zero), and 3 for zero pages (non-present and - * zero). - */ - if (ptev & _PAGE_INVALID) { - res |=3D 2; - if (pgstev & _PGSTE_GPS_ZERO) - res |=3D 1; - } - if (pgstev & _PGSTE_GPS_NODAT) - res |=3D 0x20; - vcpu->run->s.regs.gprs[r1] =3D res; /* * It is possible that all the normal 511 slots were full, in which case * we will now write in the 512th slot, which is reserved for host use. @@ -1243,17 +1173,34 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) cbrlo[entries] =3D gfn << PAGE_SHIFT; } =20 - if (orc) { - struct kvm_memory_slot *ms =3D gfn_to_memslot(vcpu->kvm, gfn); - - /* Increment only if we are really flipping the bit */ - if (ms && !test_and_set_bit(gfn - ms->base_gfn, kvm_second_dirty_bitmap(= ms))) - atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); - } + if (dirtied) + atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); =20 return nappended; } =20 +static void _essa_clear_cbrl(struct kvm_vcpu *vcpu, unsigned long *cbrl, i= nt len) +{ + union crste *crstep; + union pgste pgste; + union pte *ptep; + int i; + + lockdep_assert_held(&vcpu->kvm->mmu_lock); + + for (i =3D 0; i < len; i++) { + if (dat_entry_walk(NULL, gpa_to_gfn(cbrl[i]), vcpu->arch.gmap->asce, + 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) + continue; + if (!ptep || ptep->s.pr) + continue; + pgste =3D pgste_get_lock(ptep); + if (pgste.usage =3D=3D PGSTE_GPS_USAGE_UNUSED || pgste.zero) + gmap_helper_zap_one_page(vcpu->kvm->mm, cbrl[i]); + pgste_set_unlock(ptep, pgste); + } +} + static int handle_essa(struct kvm_vcpu *vcpu) { lockdep_assert_held(&vcpu->kvm->srcu); @@ -1261,11 +1208,9 @@ static int handle_essa(struct kvm_vcpu *vcpu) /* entries expected to be 1FF */ int entries =3D (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; unsigned long *cbrlo; - struct gmap *gmap; int i, orc; =20 VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries); - gmap =3D vcpu->arch.gmap; vcpu->stat.instruction_essa++; if (!vcpu->kvm->arch.use_cmma) return kvm_s390_inject_program_int(vcpu, PGM_OPERATION); @@ -1289,11 +1234,7 @@ static int handle_essa(struct kvm_vcpu *vcpu) * value really needs to be written to; if the value is * already correct, we do nothing and avoid the lock. */ - if (vcpu->kvm->mm->context.uses_cmm =3D=3D 0) { - mmap_write_lock(vcpu->kvm->mm); - vcpu->kvm->mm->context.uses_cmm =3D 1; - mmap_write_unlock(vcpu->kvm->mm); - } + set_bit(GMAP_FLAG_USES_CMM, &vcpu->arch.gmap->flags); /* * If we are here, we are supposed to have CMMA enabled in * the SIE block. 
Enabling CMMA works on a per-CPU basis, @@ -1307,20 +1248,22 @@ static int handle_essa(struct kvm_vcpu *vcpu) /* Retry the ESSA instruction */ kvm_s390_retry_instr(vcpu); } else { - mmap_read_lock(vcpu->kvm->mm); - i =3D __do_essa(vcpu, orc); - mmap_read_unlock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + i =3D __do_essa(vcpu, orc); if (i < 0) return i; /* Account for the possible extra cbrl entry */ entries +=3D i; } - vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; /* reset nceo */ + /* reset nceo */ + vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; cbrlo =3D phys_to_virt(vcpu->arch.sie_block->cbrlo); - mmap_read_lock(gmap->mm); - for (i =3D 0; i < entries; ++i) - __gmap_zap(gmap, cbrlo[i]); - mmap_read_unlock(gmap->mm); + + mmap_read_lock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + _essa_clear_cbrl(vcpu, cbrlo, entries); + mmap_read_unlock(vcpu->kvm->mm); + return 0; } =20 diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index 6ba5a0305e25..a48a8afd40df 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -12,13 +12,16 @@ #include #include #include -#include #include #include #include #include #include #include "kvm-s390.h" +#include "dat.h" +#include "gaccess.h" +#include "gmap.h" +#include "faultin.h" =20 bool kvm_s390_pv_is_protected(struct kvm *kvm) { @@ -34,6 +37,85 @@ bool kvm_s390_pv_cpu_is_protected(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_s390_pv_cpu_is_protected); =20 +/** + * should_export_before_import() - Determine whether an export is needed + * before an import-like operation. + * @uvcb: The Ultravisor control block of the UVC to be performed. + * @mm: The mm of the process. + * + * Returns whether an export is needed before every import-like operation. + * This is needed for shared pages, which don't trigger a secure storage + * exception when accessed from a different guest. + * + * Although considered as one, the Unpin Page UVC is not an actual import, + * so it is not affected. + * + * No export is needed also when there is only one protected VM, because t= he + * page cannot belong to the wrong VM in that case (there is no "other VM" + * it can belong to). + * + * Return: %true if an export is needed before every import, otherwise %fa= lse. + */ +static bool should_export_before_import(struct uv_cb_header *uvcb, struct = mm_struct *mm) +{ + /* + * The misc feature indicates, among other things, that importing a + * shared page from a different protected VM will automatically also + * transfer its ownership. 
+ */ + if (uv_has_feature(BIT_UV_FEAT_MISC)) + return false; + if (uvcb->cmd =3D=3D UVC_CMD_UNPIN_PAGE_SHARED) + return false; + return atomic_read(&mm->context.protected_count) > 1; +} + +struct pv_make_secure { + void *uvcb; + struct folio *folio; + int rc; + bool needs_export; +}; + +static int __kvm_s390_pv_make_secure(struct guest_fault *f, struct folio *= folio) +{ + struct pv_make_secure *priv =3D f->priv; + int rc; + + if (priv->needs_export) + uv_convert_from_secure(folio_to_phys(folio)); + + if (folio_test_hugetlb(folio)) + return -EFAULT; + if (folio_test_large(folio)) + return -E2BIG; + + if (!f->page) + folio_get(folio); + rc =3D __make_folio_secure(folio, priv->uvcb); + if (!f->page) + folio_put(folio); + + return rc; +} + +static void _kvm_s390_pv_make_secure(struct guest_fault *f) +{ + struct pv_make_secure *priv =3D f->priv; + struct folio *folio; + + folio =3D pfn_folio(f->pfn); + priv->rc =3D -EAGAIN; + if (folio_trylock(folio)) { + priv->rc =3D __kvm_s390_pv_make_secure(f, folio); + if (priv->rc =3D=3D -E2BIG || priv->rc =3D=3D -EBUSY) { + priv->folio =3D folio; + folio_get(folio); + } + folio_unlock(folio); + } +} + /** * kvm_s390_pv_make_secure() - make one guest page secure * @kvm: the guest @@ -45,14 +127,34 @@ EXPORT_SYMBOL_GPL(kvm_s390_pv_cpu_is_protected); */ int kvm_s390_pv_make_secure(struct kvm *kvm, unsigned long gaddr, void *uv= cb) { - unsigned long vmaddr; + struct pv_make_secure priv =3D { .uvcb =3D uvcb }; + struct guest_fault f =3D { + .write_attempt =3D true, + .gfn =3D gpa_to_gfn(gaddr), + .callback =3D _kvm_s390_pv_make_secure, + .priv =3D &priv, + }; + int rc; =20 lockdep_assert_held(&kvm->srcu); =20 - vmaddr =3D gfn_to_hva(kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return -EFAULT; - return make_hva_secure(kvm->mm, vmaddr, uvcb); + priv.needs_export =3D should_export_before_import(uvcb, kvm->mm); + + scoped_guard(mutex, &kvm->arch.pv.import_lock) { + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &f); + + if (!rc) { + rc =3D priv.rc; + if (priv.folio) { + rc =3D s390_wiggle_split_folio(kvm->mm, priv.folio); + if (!rc) + rc =3D -EAGAIN; + } + } + } + if (priv.folio) + folio_put(priv.folio); + return rc; } =20 int kvm_s390_pv_convert_to_secure(struct kvm *kvm, unsigned long gaddr) @@ -299,35 +401,6 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm= *kvm, return 0; } =20 -/** - * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest me= mory. - * @kvm: the VM whose memory is to be cleared. - * - * Destroy the first 2GB of guest memory, to avoid prefix issues after reb= oot. - * The CPUs of the protected VM need to be destroyed beforehand. 
- */ -static void kvm_s390_destroy_lower_2g(struct kvm *kvm) -{ - const unsigned long pages_2g =3D SZ_2G / PAGE_SIZE; - struct kvm_memory_slot *slot; - unsigned long len; - int srcu_idx; - - srcu_idx =3D srcu_read_lock(&kvm->srcu); - - /* Take the memslot containing guest absolute address 0 */ - slot =3D gfn_to_memslot(kvm, 0); - /* Clear all slots or parts thereof that are below 2GB */ - while (slot && slot->base_gfn < pages_2g) { - len =3D min_t(u64, slot->npages, pages_2g - slot->base_gfn) * PAGE_SIZE; - s390_uv_destroy_range(kvm->mm, slot->userspace_addr, slot->userspace_add= r + len); - /* Take the next memslot */ - slot =3D gfn_to_memslot(kvm, slot->base_gfn + slot->npages); - } - - srcu_read_unlock(&kvm->srcu, srcu_idx); -} - static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, u16 *rc, u16 *rrc) { struct uv_cb_destroy_fast uvcb =3D { @@ -342,7 +415,6 @@ static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, = u16 *rc, u16 *rrc) *rc =3D uvcb.header.rc; if (rrc) *rrc =3D uvcb.header.rrc; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM FAST: rc %x rrc %x", uvcb.header.rc, uvcb.header.rrc); WARN_ONCE(cc && uvcb.header.rc !=3D 0x104, @@ -391,7 +463,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return -EINVAL; =20 /* Guest with segment type ASCE, refuse to destroy asynchronously */ - if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) =3D=3D _ASCE_TYPE_SEGMENT) + if (kvm->arch.gmap->asce.dt =3D=3D TABLE_TYPE_SEGMENT) return -EINVAL; =20 priv =3D kzalloc(sizeof(*priv), GFP_KERNEL); @@ -404,8 +476,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) priv->stor_var =3D kvm->arch.pv.stor_var; priv->stor_base =3D kvm->arch.pv.stor_base; priv->handle =3D kvm_s390_pv_get_handle(kvm); - priv->old_gmap_table =3D (unsigned long)kvm->arch.gmap->table; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); + priv->old_gmap_table =3D (unsigned long)dereference_asce(kvm->arch.gmap-= >asce); if (s390_replace_asce(kvm->arch.gmap)) res =3D -ENOMEM; } @@ -415,7 +486,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return res; } =20 - kvm_s390_destroy_lower_2g(kvm); + gmap_pv_destroy_range(kvm->arch.gmap, 0, gpa_to_gfn(SZ_2G), false); kvm_s390_clear_pv_state(kvm); kvm->arch.pv.set_aside =3D priv; =20 @@ -449,7 +520,6 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16= *rrc) =20 cc =3D uv_cmd_nodata(kvm_s390_pv_get_handle(kvm), UVC_CMD_DESTROY_SEC_CONF, rc, rrc); - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); if (!cc) { atomic_dec(&kvm->mm->context.protected_count); kvm_s390_pv_dealloc_vm(kvm); @@ -532,7 +602,7 @@ int kvm_s390_pv_deinit_cleanup_all(struct kvm *kvm, u16= *rc, u16 *rrc) * cleanup has been performed. 
*/ if (need_zap && mmget_not_zero(kvm->mm)) { - s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE); + gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asce),= false); mmput(kvm->mm); } =20 @@ -570,7 +640,7 @@ int kvm_s390_pv_deinit_aside_vm(struct kvm *kvm, u16 *r= c, u16 *rrc) return -EINVAL; =20 /* When a fatal signal is received, stop immediately */ - if (s390_uv_destroy_range_interruptible(kvm->mm, 0, TASK_SIZE_MAX)) + if (gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asc= e), true)) goto done; if (kvm_s390_pv_dispose_one_leftover(kvm, p, rc, rrc)) ret =3D -EIO; @@ -609,6 +679,7 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu= _notifier *subscription, r =3D kvm_s390_cpus_from_pv(kvm, &dummy, &dummy); if (!r && is_destroy_fast_available() && kvm_s390_pv_get_handle(kvm)) kvm_s390_pv_deinit_vm_fast(kvm, &dummy, &dummy); + set_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &kvm->arch.gmap->flags); } =20 static const struct mmu_notifier_ops kvm_s390_pv_mmu_notifier_ops =3D { @@ -642,7 +713,7 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) /* Inputs */ uvcb.guest_stor_origin =3D 0; /* MSO is 0 for KVM */ uvcb.guest_stor_len =3D kvm->arch.pv.guest_len; - uvcb.guest_asce =3D kvm->arch.gmap->asce; + uvcb.guest_asce =3D kvm->arch.gmap->asce.val; uvcb.guest_sca =3D virt_to_phys(kvm->arch.sca); uvcb.conf_base_stor_origin =3D virt_to_phys((void *)kvm->arch.pv.stor_base); @@ -669,7 +740,6 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) } return -EIO; } - kvm->arch.gmap->guest_handle =3D uvcb.guest_handle; return 0; } =20 @@ -704,26 +774,14 @@ static int unpack_one(struct kvm *kvm, unsigned long = addr, u64 tweak, .tweak[1] =3D offset, }; int ret =3D kvm_s390_pv_make_secure(kvm, addr, &uvcb); - unsigned long vmaddr; - bool unlocked; =20 *rc =3D uvcb.header.rc; *rrc =3D uvcb.header.rrc; =20 if (ret =3D=3D -ENXIO) { - mmap_read_lock(kvm->mm); - vmaddr =3D gfn_to_hva(kvm, gpa_to_gfn(addr)); - if (kvm_is_error_hva(vmaddr)) { - ret =3D -EFAULT; - } else { - ret =3D fixup_user_fault(kvm->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked); - if (!ret) - ret =3D __gmap_link(kvm->arch.gmap, addr, vmaddr); - } - mmap_read_unlock(kvm->mm); + ret =3D kvm_s390_faultin_gfn_simple(NULL, kvm, gpa_to_gfn(addr), true); if (!ret) return -EAGAIN; - return ret; } =20 if (ret && ret !=3D -EAGAIN) diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 1dd54ca3070a..faf8b01fa672 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -15,7 +15,6 @@ #include #include =20 -#include #include #include #include @@ -23,6 +22,7 @@ #include #include "kvm-s390.h" #include "gaccess.h" +#include "gmap.h" =20 enum vsie_page_flags { VSIE_PAGE_IN_USE =3D 0, @@ -41,8 +41,11 @@ struct vsie_page { * are reused conditionally, should be accessed via READ_ONCE. */ struct kvm_s390_sie_block *scb_o; /* 0x0218 */ - /* the shadow gmap in use by the vsie_page */ - struct gmap *gmap; /* 0x0220 */ + /* + * Flags: must be set/cleared atomically after the vsie page can be + * looked up by other CPUs. + */ + unsigned long flags; /* 0x0220 */ /* address of the last reported fault to guest2 */ unsigned long fault_addr; /* 0x0228 */ /* calculated guest addresses of satellite control blocks */ @@ -57,33 +60,14 @@ struct vsie_page { * radix tree. */ gpa_t scb_gpa; /* 0x0258 */ - /* - * Flags: must be set/cleared atomically after the vsie page can be - * looked up by other CPUs. 
- */ - unsigned long flags; /* 0x0260 */ - __u8 reserved[0x0700 - 0x0268]; /* 0x0268 */ + /* the shadow gmap in use by the vsie_page */ + struct gmap_cache gmap_cache; /* 0x0260 */ + __u8 reserved[0x0700 - 0x0278]; /* 0x0278 */ struct kvm_s390_crypto_cb crycb; /* 0x0700 */ __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */ }; =20 -/** - * gmap_shadow_valid() - check if a shadow guest address space matches the - * given properties and is still valid - * @sg: pointer to the shadow guest address space structure - * @asce: ASCE for which the shadow table is requested - * @edat_level: edat level to be used for the shadow translation - * - * Returns 1 if the gmap shadow is still valid and matches the given - * properties, the caller can continue using it. Returns 0 otherwise; the - * caller has to request a new shadow gmap in this case. - */ -int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level) -{ - if (sg->removed) - return 0; - return sg->orig_asce =3D=3D asce && sg->edat_level =3D=3D edat_level; -} +static_assert(sizeof(struct vsie_page) =3D=3D PAGE_SIZE); =20 /* trigger a validity icpt for the given scb */ static int set_validity_icpt(struct kvm_s390_sie_block *scb, @@ -612,26 +596,17 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page) return rc; } =20 -void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) +void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, gpa_t end) { - struct kvm *kvm =3D gmap->private; - struct vsie_page *cur; + struct vsie_page *cur, *next; unsigned long prefix; - int i; =20 - if (!gmap_is_shadow(gmap)) - return; + KVM_BUG_ON(!test_bit(GMAP_FLAG_SHADOW, &gmap->flags), gmap->kvm); /* * Only new shadow blocks are added to the list during runtime, * therefore we can safely reference them all the time. */ - for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { - cur =3D READ_ONCE(kvm->arch.vsie.pages[i]); - if (!cur) - continue; - if (READ_ONCE(cur->gmap) !=3D gmap) - continue; + list_for_each_entry_safe(cur, next, &gmap->scb_users, gmap_cache.list) { prefix =3D cur->scb_s.prefix << GUEST_PREFIX_SHIFT; /* with mso/msl, the prefix lies at an offset */ prefix +=3D cur->scb_s.mso; @@ -667,9 +642,9 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsi= e_page *vsie_page, struct /* with mso/msl, the prefix lies at offset *mso* */ prefix +=3D scb_s->mso; =20 - rc =3D kvm_s390_shadow_fault(vcpu, sg, prefix, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, prefix, NULL, true); if (!rc && (scb_s->ecb & ECB_TE)) - rc =3D kvm_s390_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL, true); /* * We don't have to mprotect, we will be called for all unshadows. * SIE will detect if protection applies and trigger a validity. 
@@ -952,6 +927,7 @@ static int inject_fault(struct kvm_vcpu *vcpu, __u16 co= de, __u64 vaddr, */ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page= , struct gmap *sg) { + bool wr =3D kvm_s390_cur_gmap_fault_is_write(); int rc; =20 if ((current->thread.gmap_int_code & PGM_INT_CODE_MASK) =3D=3D PGM_PROTEC= TION) @@ -959,11 +935,10 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct= vsie_page *vsie_page, stru return inject_fault(vcpu, PGM_PROTECTION, current->thread.gmap_teid.addr * PAGE_SIZE, 1); =20 - rc =3D kvm_s390_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * P= AGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * PA= GE_SIZE, NULL, wr); if (rc > 0) { rc =3D inject_fault(vcpu, rc, - current->thread.gmap_teid.addr * PAGE_SIZE, - kvm_s390_cur_gmap_fault_is_write()); + current->thread.gmap_teid.addr * PAGE_SIZE, wr); if (rc >=3D 0) vsie_page->fault_addr =3D current->thread.gmap_teid.addr * PAGE_SIZE; } @@ -979,7 +954,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page, stru static void handle_last_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsi= e_page, struct gmap *sg) { if (vsie_page->fault_addr) - kvm_s390_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL); + gaccess_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL, true); vsie_page->fault_addr =3D 0; } =20 @@ -1064,8 +1039,9 @@ static u64 vsie_get_register(struct kvm_vcpu *vcpu, s= truct vsie_page *vsie_page, static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_= page, struct gmap *sg) { struct kvm_s390_sie_block *scb_s =3D &vsie_page->scb_s; - unsigned long pei_dest, pei_src, src, dest, mask, prefix; + unsigned long src, dest, mask, prefix; u64 *pei_block =3D &vsie_page->scb_o->mcic; + union mvpg_pei pei_dest, pei_src; int edat, rc_dest, rc_src; union ctlreg0 cr0; =20 @@ -1079,8 +1055,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page, src =3D vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask; src =3D _kvm_s390_real_to_abs(prefix, src) + scb_s->mso; =20 - rc_dest =3D kvm_s390_shadow_fault(vcpu, sg, dest, &pei_dest); - rc_src =3D kvm_s390_shadow_fault(vcpu, sg, src, &pei_src); + rc_dest =3D gaccess_shadow_fault(vcpu, sg, dest, &pei_dest, true); + rc_src =3D gaccess_shadow_fault(vcpu, sg, src, &pei_src, false); /* * Either everything went well, or something non-critical went wrong * e.g. because of a race. In either case, simply retry. @@ -1115,8 +1091,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page, rc_src =3D rc_src !=3D PGM_PAGE_TRANSLATION ? 
rc_src : 0; } if (!rc_dest && !rc_src) { - pei_block[0] =3D pei_dest; - pei_block[1] =3D pei_src; + pei_block[0] =3D pei_dest.val; + pei_block[1] =3D pei_src.val; return 1; } =20 @@ -1187,7 +1163,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct = vsie_page *vsie_page, struc goto xfer_to_guest_mode_check; } guest_timing_enter_irqoff(); - rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce); + rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce.v= al); guest_timing_exit_irqoff(); local_irq_enable(); } @@ -1237,43 +1213,63 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struc= t vsie_page *vsie_page, struc =20 static void release_gmap_shadow(struct vsie_page *vsie_page) { - if (vsie_page->gmap) - gmap_put(vsie_page->gmap); - WRITE_ONCE(vsie_page->gmap, NULL); + struct gmap *gmap =3D vsie_page->gmap_cache.gmap; + + lockdep_assert_held(&gmap->kvm->arch.gmap->children_lock); + + list_del(&vsie_page->gmap_cache.list); + vsie_page->gmap_cache.gmap =3D NULL; prefix_unmapped(vsie_page); + + if (list_empty(&gmap->scb_users)) { + gmap_remove_child(gmap); + gmap_put(gmap); + } } =20 -static int acquire_gmap_shadow(struct kvm_vcpu *vcpu, - struct vsie_page *vsie_page) +static struct gmap *acquire_gmap_shadow(struct kvm_vcpu *vcpu, struct vsie= _page *vsie_page) { - unsigned long asce; union ctlreg0 cr0; struct gmap *gmap; + union asce asce; int edat; =20 - asce =3D vcpu->arch.sie_block->gcr[1]; + asce.val =3D vcpu->arch.sie_block->gcr[1]; cr0.val =3D vcpu->arch.sie_block->gcr[0]; edat =3D cr0.edat && test_kvm_facility(vcpu->kvm, 8); edat +=3D edat && test_kvm_facility(vcpu->kvm, 78); =20 - /* - * ASCE or EDAT could have changed since last icpt, or the gmap - * we're holding has been unshadowed. If the gmap is still valid, - * we can safely reuse it. - */ - if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat)) { - vcpu->kvm->stat.gmap_shadow_reuse++; - return 0; + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + gmap =3D vsie_page->gmap_cache.gmap; + if (gmap) { + /* + * ASCE or EDAT could have changed since last icpt, or the gmap + * we're holding has been unshadowed. If the gmap is still valid, + * we can safely reuse it. 
+ */ + if (gmap_is_shadow_valid(gmap, asce, edat)) { + vcpu->kvm->stat.gmap_shadow_reuse++; + gmap_get(gmap); + return gmap; + } + /* release the old shadow and mark the prefix as unmapped */ + release_gmap_shadow(vsie_page); + } } - - /* release the old shadow - if any, and mark the prefix as unmapped */ - release_gmap_shadow(vsie_page); - gmap =3D gmap_shadow(vcpu->arch.gmap, asce, edat); + gmap =3D gmap_create_shadow(vcpu->arch.mc, vcpu->kvm->arch.gmap, asce, ed= at); if (IS_ERR(gmap)) - return PTR_ERR(gmap); - vcpu->kvm->stat.gmap_shadow_create++; - WRITE_ONCE(vsie_page->gmap, gmap); - return 0; + return gmap; + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + /* unlikely race condition, remove the previous shadow */ + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); + vcpu->kvm->stat.gmap_shadow_create++; + list_add(&vsie_page->gmap_cache.list, &gmap->scb_users); + vsie_page->gmap_cache.gmap =3D gmap; + prefix_unmapped(vsie_page); + gmap_get(gmap); + } + return gmap; } =20 /* @@ -1330,8 +1326,11 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vs= ie_page *vsie_page) int rc =3D 0; =20 while (1) { - rc =3D acquire_gmap_shadow(vcpu, vsie_page); - sg =3D vsie_page->gmap; + sg =3D acquire_gmap_shadow(vcpu, vsie_page); + if (IS_ERR(sg)) { + rc =3D PTR_ERR(sg); + sg =3D NULL; + } if (!rc) rc =3D map_prefix(vcpu, vsie_page, sg); if (!rc) { @@ -1359,6 +1358,9 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsi= e_page *vsie_page) kvm_s390_rewind_psw(vcpu, 4); break; } + if (sg) + sg =3D gmap_put(sg); + cond_resched(); } =20 if (rc =3D=3D -EFAULT) { @@ -1455,8 +1457,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kv= m, unsigned long addr) vsie_page->scb_gpa =3D ULONG_MAX; =20 /* Double use of the same address or allocation failure. 
*/ - if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, - vsie_page)) { + if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, vsie_page)= ) { put_vsie_page(vsie_page); mutex_unlock(&kvm->arch.vsie.mutex); return NULL; @@ -1465,7 +1466,12 @@ static struct vsie_page *get_vsie_page(struct kvm *k= vm, unsigned long addr) mutex_unlock(&kvm->arch.vsie.mutex); =20 memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block)); - release_gmap_shadow(vsie_page); + if (vsie_page->gmap_cache.gmap) { + scoped_guard(spinlock, &kvm->arch.gmap->children_lock) + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); + } + prefix_unmapped(vsie_page); vsie_page->fault_addr =3D 0; vsie_page->scb_s.ihcpu =3D 0xffffU; return vsie_page; @@ -1541,8 +1547,10 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) mutex_lock(&kvm->arch.vsie.mutex); for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { vsie_page =3D kvm->arch.vsie.pages[i]; + scoped_guard(spinlock, &kvm->arch.gmap->children_lock) + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); kvm->arch.vsie.pages[i] =3D NULL; - release_gmap_shadow(vsie_page); /* free the radix tree entry */ if (vsie_page->scb_gpa !=3D ULONG_MAX) radix_tree_delete(&kvm->arch.vsie.addr_to_page, diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c index 1a6ba105e071..0ac2f3998b14 100644 --- a/arch/s390/lib/uaccess.c +++ b/arch/s390/lib/uaccess.c @@ -34,136 +34,19 @@ void debug_user_asce(int exit) } #endif /*CONFIG_DEBUG_ENTRY */ =20 -union oac { - unsigned int val; - struct { - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac1; - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac2; - }; -}; - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_from_user_key(void *to, const void __user *from, unsigned long si= ze, unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac2.key =3D key, - .oac2.as =3D PSW_BITS_AS_SECONDARY, - .oac2.k =3D 1, - .oac2.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr %%r0,%[spec]\n" - "0: mvcos %[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_FROM(0b, 0b) - EX_TABLE_UA_MVCOS_FROM(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char __user *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_from_user_key(void *to, const void __user *from, - unsigned long n, unsigned long key) -{ - unsigned long res =3D n; - - might_fault(); - if (!should_fail_usercopy()) { - instrument_copy_from_user_before(to, from, n); - res =3D raw_copy_from_user_key(to, from, n, key); - instrument_copy_from_user_after(to, from, n, res); - } - if (unlikely(res)) - memset(to + (n - res), 0, res); - return res; -} -EXPORT_SYMBOL(_copy_from_user_key); - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_to_user_key(void __user *to, const void *from, unsigned long size= , unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac1.key =3D key, - .oac1.as =3D PSW_BITS_AS_SECONDARY, - .oac1.k =3D 1, - .oac1.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr 
%%r0,%[spec]\n" - "0: mvcos %[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_TO(0b, 0b) - EX_TABLE_UA_MVCOS_TO(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char __user *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_to_user_key(void __user *to, const void *from, - unsigned long n, unsigned long key) -{ - might_fault(); - if (should_fail_usercopy()) - return n; - instrument_copy_to_user(to, from, n); - return raw_copy_to_user_key(to, from, n, key); -} -EXPORT_SYMBOL(_copy_to_user_key); - #define CMPXCHG_USER_KEY_MAX_LOOPS 128 =20 -static nokprobe_inline int __cmpxchg_user_key_small(unsigned long address,= unsigned int *uval, - unsigned int old, unsigned int new, - unsigned int mask, unsigned long key) +static nokprobe_inline int __cmpxchg_key_small(void *address, unsigned int= *uval, + unsigned int old, unsigned int new, + unsigned int mask, unsigned long key) { unsigned long count; unsigned int prev; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" " llill %[count],%[max_loops]\n" "0: l %[prev],%[address]\n" "1: nr %[prev],%[mask]\n" @@ -178,8 +61,7 @@ static nokprobe_inline int __cmpxchg_user_key_small(unsi= gned long address, unsig " nr %[tmp],%[mask]\n" " jnz 5f\n" " brct %[count],2b\n" - "5: sacf 768\n" - " spka %[default_key]\n" + "5: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 5b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 5b, %[rc], %[prev]) @@ -197,16 +79,16 @@ static nokprobe_inline int __cmpxchg_user_key_small(un= signed long address, unsig [default_key] "J" (PAGE_DEFAULT_KEY), [max_loops] "J" (CMPXCHG_USER_KEY_MAX_LOOPS) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; if (!count) rc =3D -EAGAIN; return rc; } =20 -int __kprobes __cmpxchg_user_key1(unsigned long address, unsigned char *uv= al, - unsigned char old, unsigned char new, unsigned long key) +int __kprobes __cmpxchg_key1(void *addr, unsigned char *uval, unsigned cha= r old, + unsigned char new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -215,15 +97,16 @@ int __kprobes __cmpxchg_user_key1(unsigned long addres= s, unsigned char *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D __cmpxchg_key_small((void *)address, &prev, _old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key1); +EXPORT_SYMBOL(__cmpxchg_key1); =20 -int __kprobes __cmpxchg_user_key2(unsigned long address, unsigned short *u= val, - unsigned short old, unsigned short new, unsigned long key) +int __kprobes __cmpxchg_key2(void *addr, unsigned short *uval, unsigned sh= ort old, + unsigned short new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -232,27 +115,23 @@ int __kprobes __cmpxchg_user_key2(unsigned long addre= ss, unsigned short *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xffff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D 
__cmpxchg_key_small((void *)address, &prev, _old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key2); +EXPORT_SYMBOL(__cmpxchg_key2); =20 -int __kprobes __cmpxchg_user_key4(unsigned long address, unsigned int *uva= l, - unsigned int old, unsigned int new, unsigned long key) +int __kprobes __cmpxchg_key4(void *address, unsigned int *uval, unsigned i= nt old, + unsigned int new, unsigned long key) { unsigned int prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cs %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -264,27 +143,22 @@ int __kprobes __cmpxchg_user_key4(unsigned long addre= ss, unsigned int *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key4); +EXPORT_SYMBOL(__cmpxchg_key4); =20 -int __kprobes __cmpxchg_user_key8(unsigned long address, unsigned long *uv= al, - unsigned long old, unsigned long new, unsigned long key) +int __kprobes __cmpxchg_key8(void *address, unsigned long *uval, unsigned = long old, + unsigned long new, unsigned long key) { unsigned long prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: csg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -296,27 +170,22 @@ int __kprobes __cmpxchg_user_key8(unsigned long addre= ss, unsigned long *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key8); +EXPORT_SYMBOL(__cmpxchg_key8); =20 -int __kprobes __cmpxchg_user_key16(unsigned long address, __uint128_t *uva= l, - __uint128_t old, __uint128_t new, unsigned long key) +int __kprobes __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_= t old, + __uint128_t new, unsigned long key) { __uint128_t prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cdsg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REGPAIR(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REGPAIR(1b, 1b, %[rc], %[prev]) @@ -328,8 +197,7 @@ int __kprobes __cmpxchg_user_key16(unsigned long addres= s, __uint128_t *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key16); +EXPORT_SYMBOL(__cmpxchg_key16); diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c index 4864cb35fc25..d653c64b869a 100644 --- a/arch/s390/mm/gmap_helpers.c +++ b/arch/s390/mm/gmap_helpers.c @@ -34,28 +34,6 @@ static void ptep_zap_softleaf_entry(struct mm_struct *mm= , softleaf_t entry) free_swap_and_cache(entry); } =20 -static inline pgste_t pgste_get_lock(pte_t *ptep) -{ - unsigned long value =3D 0; -#ifdef CONFIG_PGSTE - unsigned long *ptr 
=3D (unsigned long *)(ptep + PTRS_PER_PTE); - - do { - value =3D __atomic64_or_barrier(PGSTE_PCL_BIT, ptr); - } while (value & PGSTE_PCL_BIT); - value |=3D PGSTE_PCL_BIT; -#endif - return __pgste(value); -} - -static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - barrier(); - WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~P= GSTE_PCL_BIT); -#endif -} - /** * gmap_helper_zap_one_page() - discard a page if it was swapped. * @mm: the mm @@ -68,9 +46,7 @@ static inline void pgste_set_unlock(pte_t *ptep, pgste_t = pgste) void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr) { struct vm_area_struct *vma; - unsigned long pgstev; spinlock_t *ptl; - pgste_t pgste; pte_t *ptep; =20 mmap_assert_locked(mm); @@ -85,18 +61,8 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsi= gned long vmaddr) if (unlikely(!ptep)) return; if (pte_swap(*ptep)) { - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgstev =3D pgste_val(pgste); - - if ((pgstev & _PGSTE_GPS_USAGE_MASK) =3D=3D _PGSTE_GPS_USAGE_UNUSED || - (pgstev & _PGSTE_GPS_ZERO)) { - ptep_zap_softleaf_entry(mm, softleaf_from_pte(*ptep)); - pte_clear(mm, vmaddr, ptep); - } - - pgste_set_unlock(ptep, pgste); - preempt_enable(); + ptep_zap_softleaf_entry(mm, softleaf_from_pte(*ptep)); + pte_clear(mm, vmaddr, ptep); } pte_unmap_unlock(ptep, ptl); }
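
The KVM_S390_UCAS_MAP hunk earlier in this patch boils down to a small amount of address arithmetic: check that all user-supplied values are segment aligned, then convert guest physical addresses to frame numbers and the byte length to a segment count before calling gmap_ucas_map(). The following is a minimal, self-contained sketch of that arithmetic only; the constants are assumptions standing in for the kernel's PAGE_SHIFT and _SEGMENT_SHIFT definitions, and struct ucas_mapping is a hypothetical stand-in for the uapi struct kvm_s390_ucas_mapping:

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12                      /* assumed: 4 KiB pages */
#define SEGMENT_SHIFT 20                      /* assumed: 1 MiB segments */
#define SEGMENT_SIZE  (1UL << SEGMENT_SHIFT)

struct ucas_mapping {                         /* stand-in for the uapi struct */
	uint64_t user_addr;
	uint64_t vcpu_addr;
	uint64_t length;
};

/* OR-ing the three fields together lets one alignment test cover all of
 * them, mirroring the IS_ALIGNED(a | b | c, _SEGMENT_SIZE) check above. */
static bool ucas_args_aligned(const struct ucas_mapping *m)
{
	return ((m->user_addr | m->vcpu_addr | m->length) & (SEGMENT_SIZE - 1)) == 0;
}

/* The reworked gmap_ucas_map() takes frame numbers plus a segment count,
 * not byte addresses and a byte length. */
static void ucas_to_gmap_args(const struct ucas_mapping *m, uint64_t *user_gfn,
			      uint64_t *vcpu_gfn, uint64_t *nr_segs)
{
	*user_gfn = m->user_addr >> PAGE_SHIFT;   /* gpa_to_gfn() */
	*vcpu_gfn = m->vcpu_addr >> PAGE_SHIFT;
	*nr_segs = m->length >> SEGMENT_SHIFT;
}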
--=20 2.52.0 From nobody Sat Feb 7 06:34:15 2026 From: Claudio Imbrenda To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com Subject: [PATCH v7 25/29] KVM: s390: Remove gmap from s390/mm Date: Wed, 4 Feb 2026 16:02:54 +0100 Message-ID: <20260204150259.60425-26-imbrenda@linux.ibm.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com> References: <20260204150259.60425-1-imbrenda@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable

Remove the now unused include/asm/gmap.h and mm/gmap.c files.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 MAINTAINERS                     |    2 -
 arch/s390/include/asm/gmap.h    |  174 ---
 arch/s390/include/asm/pgtable.h |    8 -
 arch/s390/mm/Makefile           |    1 -
 arch/s390/mm/gmap.c             | 2436 -------------------------------
 arch/s390/mm/pgtable.c          |    8 -
 6 files changed, 2629 deletions(-)
 delete mode 100644 arch/s390/include/asm/gmap.h
 delete mode 100644 arch/s390/mm/gmap.c

diff --git a/MAINTAINERS b/MAINTAINERS
index dc731d37c8fe..95448b485fd2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13899,14 +13899,12 @@ L:	kvm@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
 F:	Documentation/virt/kvm/s390*
-F:	arch/s390/include/asm/gmap.h
 F:	arch/s390/include/asm/gmap_helpers.h
 F:	arch/s390/include/asm/kvm*
 F:	arch/s390/include/uapi/asm/kvm*
 F:	arch/s390/include/uapi/asm/uvdevice.h
 F:	arch/s390/kernel/uv.c
 F:	arch/s390/kvm/
-F:	arch/s390/mm/gmap.c
 F:	arch/s390/mm/gmap_helpers.c
 F:	drivers/s390/char/uvdevice.c
 F:	tools/testing/selftests/drivers/s390x/uvdevice/
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
deleted file mode 100644
index 66c5808fd011..000000000000
--- a/arch/s390/include/asm/gmap.h
+++ /dev/null
@@ -1,174 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * KVM guest address space mapping code
- *
- * Copyright IBM Corp. 2007, 2016
- *    Author(s): Martin Schwidefsky
- */
-
-#ifndef _ASM_S390_GMAP_H
-#define _ASM_S390_GMAP_H
-
-#include
-#include
-
-/* Generic bits for GMAP notification on DAT table entry changes. */
-#define GMAP_NOTIFY_SHADOW	0x2
-#define GMAP_NOTIFY_MPROT	0x1
-
-/* Status bits only for huge segment entries */
-#define _SEGMENT_ENTRY_GMAP_IN		0x0800	/* invalidation notify bit */
-#define _SEGMENT_ENTRY_GMAP_UC		0x0002	/* dirty (migration) */
-
-/**
- * struct gmap - guest address space
- * @list: list head for the mm->context gmap list
- * @mm: pointer to the parent mm_struct
- * @guest_to_host: radix tree with guest to host address translation
- * @host_to_guest: radix tree with pointer to segment table entries
- * @guest_table_lock: spinlock to protect all entries in the guest page table
- * @ref_count: reference counter for the gmap structure
- * @table: pointer to the page directory
- * @asce: address space control element for gmap page table
- * @pfault_enabled: defines if pfaults are applicable for the guest
- * @guest_handle: protected virtual machine handle for the ultravisor
- * @host_to_rmap: radix tree with gmap_rmap lists
- * @children: list of shadow gmap structures
- * @shadow_lock: spinlock to protect the shadow gmap list
- * @parent: pointer to the parent gmap for shadow guest address spaces
- * @orig_asce: ASCE for which the shadow page table has been created
- * @edat_level: edat level to be used for the shadow translation
- * @removed: flag to indicate if a shadow guest address space has been removed
- * @initialized: flag to indicate if a shadow guest address space can be used
- */
-struct gmap {
-	struct list_head list;
-	struct mm_struct *mm;
-	struct radix_tree_root guest_to_host;
-	struct radix_tree_root host_to_guest;
-	spinlock_t guest_table_lock;
-	refcount_t ref_count;
-	unsigned long *table;
-	unsigned long asce;
-	unsigned long asce_end;
-	void *private;
-	bool pfault_enabled;
-	/* only set for protected virtual machines */
-	unsigned long guest_handle;
-	/* Additional data for shadow guest address spaces */
-	struct radix_tree_root host_to_rmap;
-	struct list_head children;
-	spinlock_t shadow_lock;
-	struct gmap *parent;
-	unsigned long orig_asce;
-	int edat_level;
-	bool removed;
-	bool initialized;
-};
-
-/**
- * struct gmap_rmap - reverse mapping for shadow page table entries
- * @next: pointer to next rmap in the list
- * @raddr: virtual rmap address in the shadow guest address space
- */
-struct gmap_rmap {
-	struct gmap_rmap *next;
-	unsigned long raddr;
-};
-
-#define gmap_for_each_rmap(pos, head) \
-	for (pos = (head); pos; pos = pos->next)
-
-#define gmap_for_each_rmap_safe(pos, n, head) \
-	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
-
-/**
- * struct gmap_notifier - notify function block for page invalidation
- * @notifier_call: address of callback function
- */
-struct gmap_notifier {
-	struct list_head list;
-	struct rcu_head rcu;
-	void (*notifier_call)(struct gmap *gmap, unsigned long start,
-			      unsigned long end);
-};
-
-static inline int gmap_is_shadow(struct gmap *gmap)
-{
-	return !!gmap->parent;
-}
-
-struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit);
-void gmap_remove(struct gmap *gmap);
-struct gmap *gmap_get(struct gmap *gmap);
-void gmap_put(struct gmap *gmap);
-void gmap_free(struct gmap *gmap);
-struct gmap *gmap_alloc(unsigned long limit);
-
-int gmap_map_segment(struct gmap *gmap, unsigned long from,
-		     unsigned long to, unsigned long len);
-int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len);
-unsigned long __gmap_translate(struct gmap *, unsigned long gaddr);
-int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr);
-void __gmap_zap(struct gmap *, unsigned long gaddr);
-void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);
-
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
-
-void gmap_unshadow(struct gmap *sg);
-int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
-		    int fake);
-int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
-		    int fake);
-int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
-		    int fake);
-int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
-		    int fake);
-int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
-
-void gmap_register_pte_notifier(struct gmap_notifier *);
-void gmap_unregister_pte_notifier(struct gmap_notifier *);
-
-int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits);
-
-void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
-			     unsigned long gaddr, unsigned long vmaddr);
-int s390_replace_asce(struct gmap *gmap);
-void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
-int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool interruptible);
-unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);
-
-/**
- * s390_uv_destroy_range - Destroy a range of pages in the given mm.
- * @mm: the mm to operate on
- * @start: the start of the range
- * @end: the end of the range
- *
- * This function will call cond_resched(), so it should not generate stalls, but
- * it will otherwise only return when it completed.
- */
-static inline void s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-					 unsigned long end)
-{
-	(void)__s390_uv_destroy_range(mm, start, end, false);
-}
-
-/**
- * s390_uv_destroy_range_interruptible - Destroy a range of pages in the
- * given mm, but stop when a fatal signal is received.
- * @mm: the mm to operate on
- * @start: the start of the range
- * @end: the end of the range
- *
- * This function will call cond_resched(), so it should not generate stalls. If
- * a fatal signal is received, it will return with -EINTR immediately,
- * without finishing destroying the whole range. Upon successful
- * completion, 0 is returned.
- */
-static inline int s390_uv_destroy_range_interruptible(struct mm_struct *mm, unsigned long start,
-						      unsigned long end)
-{
-	return __s390_uv_destroy_range(mm, start, end, true);
-}
-#endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index cd4d135c4503..45f13697cf9e 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1369,8 +1369,6 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, pte_t entry);
 void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-void ptep_notify(struct mm_struct *mm, unsigned long addr,
-		 pte_t *ptep, unsigned long bits);
 int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr, pte_t *ptep,
 		    int prot, unsigned long bit);
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
@@ -1396,9 +1394,6 @@ int set_pgste_bits(struct mm_struct *mm, unsigned long addr,
 int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
 int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
 		       unsigned long *oldpte, unsigned long *oldpgste);
-void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr);
 
 #define pgprot_writecombine pgprot_writecombine
 pgprot_t pgprot_writecombine(pgprot_t prot);
@@ -2023,9 +2018,6 @@ extern int __vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t p
 extern int vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t prot);
 extern void vmem_unmap_4k_page(unsigned long addr);
 extern pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc);
-extern int s390_enable_sie(void);
-extern int s390_enable_skey(void);
-extern void s390_reset_cmma(struct mm_struct *mm);
 
 /* s390 has a private copy of get unmapped area to deal with cache synonyms */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile
index bd0401cc7ca5..193899c39ca7 100644
--- a/arch/s390/mm/Makefile
+++ b/arch/s390/mm/Makefile
@@ -10,7 +10,6 @@ obj-$(CONFIG_CMM)		+= cmm.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_PTDUMP)		+= dump_pagetables.o
-obj-$(CONFIG_PGSTE)		+= gmap.o
 obj-$(CONFIG_PFAULT)		+= pfault.o
 
 obj-$(subst m,y,$(CONFIG_KVM))	+= gmap_helpers.o
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
deleted file mode 100644
index dd85bcca817d..000000000000
--- a/arch/s390/mm/gmap.c
+++ /dev/null
@@ -1,2436 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * KVM guest address space mapping code
- *
- * Copyright IBM Corp. 2007, 2020
- *    Author(s): Martin Schwidefsky
- *		 David Hildenbrand
- *		 Janosch Frank
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-/*
- * The address is saved in a radix tree directly; NULL would be ambiguous,
- * since 0 is a valid address, and NULL is returned when nothing was found.
- * The lower bits are ignored by all users of the macro, so it can be used
- * to distinguish a valid address 0 from a NULL.
- */ -#define VALID_GADDR_FLAG 1 -#define IS_GADDR_VALID(gaddr) ((gaddr) & VALID_GADDR_FLAG) -#define MAKE_VALID_GADDR(gaddr) (((gaddr) & HPAGE_MASK) | VALID_GADDR_FLAG) - -#define GMAP_SHADOW_FAKE_TABLE 1ULL - -static struct page *gmap_alloc_crst(void) -{ - struct page *page; - - page =3D alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); - if (!page) - return NULL; - __arch_set_page_dat(page_to_virt(page), 1UL << CRST_ALLOC_ORDER); - return page; -} - -/** - * gmap_alloc - allocate and initialize a guest address space - * @limit: maximum address of the gmap address space - * - * Returns a guest address space structure. - */ -struct gmap *gmap_alloc(unsigned long limit) -{ - struct gmap *gmap; - struct page *page; - unsigned long *table; - unsigned long etype, atype; - - if (limit < _REGION3_SIZE) { - limit =3D _REGION3_SIZE - 1; - atype =3D _ASCE_TYPE_SEGMENT; - etype =3D _SEGMENT_ENTRY_EMPTY; - } else if (limit < _REGION2_SIZE) { - limit =3D _REGION2_SIZE - 1; - atype =3D _ASCE_TYPE_REGION3; - etype =3D _REGION3_ENTRY_EMPTY; - } else if (limit < _REGION1_SIZE) { - limit =3D _REGION1_SIZE - 1; - atype =3D _ASCE_TYPE_REGION2; - etype =3D _REGION2_ENTRY_EMPTY; - } else { - limit =3D -1UL; - atype =3D _ASCE_TYPE_REGION1; - etype =3D _REGION1_ENTRY_EMPTY; - } - gmap =3D kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT); - if (!gmap) - goto out; - INIT_LIST_HEAD(&gmap->children); - INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT); - INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT); - INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT); - spin_lock_init(&gmap->guest_table_lock); - spin_lock_init(&gmap->shadow_lock); - refcount_set(&gmap->ref_count, 1); - page =3D gmap_alloc_crst(); - if (!page) - goto out_free; - table =3D page_to_virt(page); - crst_table_init(table, etype); - gmap->table =3D table; - gmap->asce =3D atype | _ASCE_TABLE_LENGTH | - _ASCE_USER_BITS | __pa(table); - gmap->asce_end =3D limit; - return gmap; - -out_free: - kfree(gmap); -out: - return NULL; -} -EXPORT_SYMBOL_GPL(gmap_alloc); - -/** - * gmap_create - create a guest address space - * @mm: pointer to the parent mm_struct - * @limit: maximum size of the gmap address space - * - * Returns a guest address space structure. 
- */ -struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit) -{ - struct gmap *gmap; - unsigned long gmap_asce; - - gmap =3D gmap_alloc(limit); - if (!gmap) - return NULL; - gmap->mm =3D mm; - spin_lock(&mm->context.lock); - list_add_rcu(&gmap->list, &mm->context.gmap_list); - if (list_is_singular(&mm->context.gmap_list)) - gmap_asce =3D gmap->asce; - else - gmap_asce =3D -1UL; - WRITE_ONCE(mm->context.gmap_asce, gmap_asce); - spin_unlock(&mm->context.lock); - return gmap; -} -EXPORT_SYMBOL_GPL(gmap_create); - -static void gmap_flush_tlb(struct gmap *gmap) -{ - __tlb_flush_idte(gmap->asce); -} - -static void gmap_radix_tree_free(struct radix_tree_root *root) -{ - struct radix_tree_iter iter; - unsigned long indices[16]; - unsigned long index; - void __rcu **slot; - int i, nr; - - /* A radix tree is freed by deleting all of its entries */ - index =3D 0; - do { - nr =3D 0; - radix_tree_for_each_slot(slot, root, &iter, index) { - indices[nr] =3D iter.index; - if (++nr =3D=3D 16) - break; - } - for (i =3D 0; i < nr; i++) { - index =3D indices[i]; - radix_tree_delete(root, index); - } - } while (nr > 0); -} - -static void gmap_rmap_radix_tree_free(struct radix_tree_root *root) -{ - struct gmap_rmap *rmap, *rnext, *head; - struct radix_tree_iter iter; - unsigned long indices[16]; - unsigned long index; - void __rcu **slot; - int i, nr; - - /* A radix tree is freed by deleting all of its entries */ - index =3D 0; - do { - nr =3D 0; - radix_tree_for_each_slot(slot, root, &iter, index) { - indices[nr] =3D iter.index; - if (++nr =3D=3D 16) - break; - } - for (i =3D 0; i < nr; i++) { - index =3D indices[i]; - head =3D radix_tree_delete(root, index); - gmap_for_each_rmap_safe(rmap, rnext, head) - kfree(rmap); - } - } while (nr > 0); -} - -static void gmap_free_crst(unsigned long *table, bool free_ptes) -{ - bool is_segment =3D (table[0] & _SEGMENT_ENTRY_TYPE_MASK) =3D=3D 0; - int i; - - if (is_segment) { - if (!free_ptes) - goto out; - for (i =3D 0; i < _CRST_ENTRIES; i++) - if (!(table[i] & _SEGMENT_ENTRY_INVALID)) - page_table_free_pgste(page_ptdesc(phys_to_page(table[i]))); - } else { - for (i =3D 0; i < _CRST_ENTRIES; i++) - if (!(table[i] & _REGION_ENTRY_INVALID)) - gmap_free_crst(__va(table[i] & PAGE_MASK), free_ptes); - } - -out: - free_pages((unsigned long)table, CRST_ALLOC_ORDER); -} - -/** - * gmap_free - free a guest address space - * @gmap: pointer to the guest address space structure - * - * No locks required. There are no references to this gmap anymore. - */ -void gmap_free(struct gmap *gmap) -{ - /* Flush tlb of all gmaps (if not already done for shadows) */ - if (!(gmap_is_shadow(gmap) && gmap->removed)) - gmap_flush_tlb(gmap); - /* Free all segment & region tables. 
*/ - gmap_free_crst(gmap->table, gmap_is_shadow(gmap)); - - gmap_radix_tree_free(&gmap->guest_to_host); - gmap_radix_tree_free(&gmap->host_to_guest); - - /* Free additional data for a shadow gmap */ - if (gmap_is_shadow(gmap)) { - gmap_rmap_radix_tree_free(&gmap->host_to_rmap); - /* Release reference to the parent */ - gmap_put(gmap->parent); - } - - kfree(gmap); -} -EXPORT_SYMBOL_GPL(gmap_free); - -/** - * gmap_get - increase reference counter for guest address space - * @gmap: pointer to the guest address space structure - * - * Returns the gmap pointer - */ -struct gmap *gmap_get(struct gmap *gmap) -{ - refcount_inc(&gmap->ref_count); - return gmap; -} -EXPORT_SYMBOL_GPL(gmap_get); - -/** - * gmap_put - decrease reference counter for guest address space - * @gmap: pointer to the guest address space structure - * - * If the reference counter reaches zero the guest address space is freed. - */ -void gmap_put(struct gmap *gmap) -{ - if (refcount_dec_and_test(&gmap->ref_count)) - gmap_free(gmap); -} -EXPORT_SYMBOL_GPL(gmap_put); - -/** - * gmap_remove - remove a guest address space but do not free it yet - * @gmap: pointer to the guest address space structure - */ -void gmap_remove(struct gmap *gmap) -{ - struct gmap *sg, *next; - unsigned long gmap_asce; - - /* Remove all shadow gmaps linked to this gmap */ - if (!list_empty(&gmap->children)) { - spin_lock(&gmap->shadow_lock); - list_for_each_entry_safe(sg, next, &gmap->children, list) { - list_del(&sg->list); - gmap_put(sg); - } - spin_unlock(&gmap->shadow_lock); - } - /* Remove gmap from the pre-mm list */ - spin_lock(&gmap->mm->context.lock); - list_del_rcu(&gmap->list); - if (list_empty(&gmap->mm->context.gmap_list)) - gmap_asce =3D 0; - else if (list_is_singular(&gmap->mm->context.gmap_list)) - gmap_asce =3D list_first_entry(&gmap->mm->context.gmap_list, - struct gmap, list)->asce; - else - gmap_asce =3D -1UL; - WRITE_ONCE(gmap->mm->context.gmap_asce, gmap_asce); - spin_unlock(&gmap->mm->context.lock); - synchronize_rcu(); - /* Put reference */ - gmap_put(gmap); -} -EXPORT_SYMBOL_GPL(gmap_remove); - -/* - * gmap_alloc_table is assumed to be called with mmap_lock held - */ -static int gmap_alloc_table(struct gmap *gmap, unsigned long *table, - unsigned long init, unsigned long gaddr) -{ - struct page *page; - unsigned long *new; - - /* since we dont free the gmap table until gmap_free we can unlock */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - new =3D page_to_virt(page); - crst_table_init(new, init); - spin_lock(&gmap->guest_table_lock); - if (*table & _REGION_ENTRY_INVALID) { - *table =3D __pa(new) | _REGION_ENTRY_LENGTH | - (*table & _REGION_ENTRY_TYPE_MASK); - page =3D NULL; - } - spin_unlock(&gmap->guest_table_lock); - if (page) - __free_pages(page, CRST_ALLOC_ORDER); - return 0; -} - -static unsigned long host_to_guest_lookup(struct gmap *gmap, unsigned long= vmaddr) -{ - return (unsigned long)radix_tree_lookup(&gmap->host_to_guest, vmaddr >> P= MD_SHIFT); -} - -static unsigned long host_to_guest_delete(struct gmap *gmap, unsigned long= vmaddr) -{ - return (unsigned long)radix_tree_delete(&gmap->host_to_guest, vmaddr >> P= MD_SHIFT); -} - -static pmd_t *host_to_guest_pmd_delete(struct gmap *gmap, unsigned long vm= addr, - unsigned long *gaddr) -{ - *gaddr =3D host_to_guest_delete(gmap, vmaddr); - if (IS_GADDR_VALID(*gaddr)) - return (pmd_t *)gmap_table_walk(gmap, *gaddr, 1); - return NULL; -} - -/** - * __gmap_unlink_by_vmaddr - unlink a single segment via a host address - * @gmap: pointer to the guest address 
space structure - * @vmaddr: address in the host process address space - * - * Returns 1 if a TLB flush is required - */ -static int __gmap_unlink_by_vmaddr(struct gmap *gmap, unsigned long vmaddr) -{ - unsigned long gaddr; - int flush =3D 0; - pmd_t *pmdp; - - BUG_ON(gmap_is_shadow(gmap)); - spin_lock(&gmap->guest_table_lock); - - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - flush =3D (pmd_val(*pmdp) !=3D _SEGMENT_ENTRY_EMPTY); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - - spin_unlock(&gmap->guest_table_lock); - return flush; -} - -/** - * __gmap_unmap_by_gaddr - unmap a single segment via a guest address - * @gmap: pointer to the guest address space structure - * @gaddr: address in the guest address space - * - * Returns 1 if a TLB flush is required - */ -static int __gmap_unmap_by_gaddr(struct gmap *gmap, unsigned long gaddr) -{ - unsigned long vmaddr; - - vmaddr =3D (unsigned long) radix_tree_delete(&gmap->guest_to_host, - gaddr >> PMD_SHIFT); - return vmaddr ? __gmap_unlink_by_vmaddr(gmap, vmaddr) : 0; -} - -/** - * gmap_unmap_segment - unmap segment from the guest address space - * @gmap: pointer to the guest address space structure - * @to: address in the guest address space - * @len: length of the memory area to unmap - * - * Returns 0 if the unmap succeeded, -EINVAL if not. - */ -int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long = len) -{ - unsigned long off; - int flush; - - BUG_ON(gmap_is_shadow(gmap)); - if ((to | len) & (PMD_SIZE - 1)) - return -EINVAL; - if (len =3D=3D 0 || to + len < to) - return -EINVAL; - - flush =3D 0; - mmap_write_lock(gmap->mm); - for (off =3D 0; off < len; off +=3D PMD_SIZE) - flush |=3D __gmap_unmap_by_gaddr(gmap, to + off); - mmap_write_unlock(gmap->mm); - if (flush) - gmap_flush_tlb(gmap); - return 0; -} -EXPORT_SYMBOL_GPL(gmap_unmap_segment); - -/** - * gmap_map_segment - map a segment to the guest address space - * @gmap: pointer to the guest address space structure - * @from: source address in the parent address space - * @to: target address in the guest address space - * @len: length of the memory area to map - * - * Returns 0 if the mmap succeeded, -EINVAL or -ENOMEM if not. - */ -int gmap_map_segment(struct gmap *gmap, unsigned long from, - unsigned long to, unsigned long len) -{ - unsigned long off; - int flush; - - BUG_ON(gmap_is_shadow(gmap)); - if ((from | to | len) & (PMD_SIZE - 1)) - return -EINVAL; - if (len =3D=3D 0 || from + len < from || to + len < to || - from + len - 1 > TASK_SIZE_MAX || to + len - 1 > gmap->asce_end) - return -EINVAL; - - flush =3D 0; - mmap_write_lock(gmap->mm); - for (off =3D 0; off < len; off +=3D PMD_SIZE) { - /* Remove old translation */ - flush |=3D __gmap_unmap_by_gaddr(gmap, to + off); - /* Store new translation */ - if (radix_tree_insert(&gmap->guest_to_host, - (to + off) >> PMD_SHIFT, - (void *) from + off)) - break; - } - mmap_write_unlock(gmap->mm); - if (flush) - gmap_flush_tlb(gmap); - if (off >=3D len) - return 0; - gmap_unmap_segment(gmap, to, len); - return -ENOMEM; -} -EXPORT_SYMBOL_GPL(gmap_map_segment); - -/** - * __gmap_translate - translate a guest address to a user space address - * @gmap: pointer to guest mapping meta data structure - * @gaddr: guest address - * - * Returns user space address which corresponds to the guest address or - * -EFAULT if no such mapping exists. - * This function does not establish potentially missing page table entries. 
- * The mmap_lock of the mm that belongs to the address space must be held - * when this function gets called. - * - * Note: Can also be called for shadow gmaps. - */ -unsigned long __gmap_translate(struct gmap *gmap, unsigned long gaddr) -{ - unsigned long vmaddr; - - vmaddr =3D (unsigned long) - radix_tree_lookup(&gmap->guest_to_host, gaddr >> PMD_SHIFT); - /* Note: guest_to_host is empty for a shadow gmap */ - return vmaddr ? (vmaddr | (gaddr & ~PMD_MASK)) : -EFAULT; -} -EXPORT_SYMBOL_GPL(__gmap_translate); - -/** - * gmap_unlink - disconnect a page table from the gmap shadow tables - * @mm: pointer to the parent mm_struct - * @table: pointer to the host page table - * @vmaddr: vm address associated with the host page table - */ -void gmap_unlink(struct mm_struct *mm, unsigned long *table, - unsigned long vmaddr) -{ - struct gmap *gmap; - int flush; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - flush =3D __gmap_unlink_by_vmaddr(gmap, vmaddr); - if (flush) - gmap_flush_tlb(gmap); - } - rcu_read_unlock(); -} - -static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new, - unsigned long gaddr); - -/** - * __gmap_link - set up shadow page tables to connect a host to a guest ad= dress - * @gmap: pointer to guest mapping meta data structure - * @gaddr: guest address - * @vmaddr: vm address - * - * Returns 0 on success, -ENOMEM for out of memory conditions, and -EFAULT - * if the vm address is already mapped to a different guest segment. - * The mmap_lock of the mm that belongs to the address space must be held - * when this function gets called. - */ -int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmad= dr) -{ - struct mm_struct *mm; - unsigned long *table; - spinlock_t *ptl; - pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - u64 unprot; - int rc; - - BUG_ON(gmap_is_shadow(gmap)); - /* Create higher level tables in the gmap page table */ - table =3D gmap->table; - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION1) { - table +=3D (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _REGION2_ENTRY_EMPTY, - gaddr & _REGION1_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION2) { - table +=3D (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _REGION3_ENTRY_EMPTY, - gaddr & _REGION2_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION3) { - table +=3D (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _SEGMENT_ENTRY_EMPTY, - gaddr & _REGION3_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - table +=3D (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT; - /* Walk the parent mm page table */ - mm =3D gmap->mm; - pgd =3D pgd_offset(mm, vmaddr); - VM_BUG_ON(pgd_none(*pgd)); - p4d =3D p4d_offset(pgd, vmaddr); - VM_BUG_ON(p4d_none(*p4d)); - pud =3D pud_offset(p4d, vmaddr); - VM_BUG_ON(pud_none(*pud)); - /* large puds cannot yet be handled */ - if (pud_leaf(*pud)) - return -EFAULT; - pmd =3D pmd_offset(pud, vmaddr); - VM_BUG_ON(pmd_none(*pmd)); - /* Are we allowed to use huge pages? */ - if (pmd_leaf(*pmd) && !gmap->mm->context.allow_gmap_hpage_1m) - return -EFAULT; - /* Link gmap segment table entry location to page table. 
*/ - rc =3D radix_tree_preload(GFP_KERNEL_ACCOUNT); - if (rc) - return rc; - ptl =3D pmd_lock(mm, pmd); - spin_lock(&gmap->guest_table_lock); - if (*table =3D=3D _SEGMENT_ENTRY_EMPTY) { - rc =3D radix_tree_insert(&gmap->host_to_guest, - vmaddr >> PMD_SHIFT, - (void *)MAKE_VALID_GADDR(gaddr)); - if (!rc) { - if (pmd_leaf(*pmd)) { - *table =3D (pmd_val(*pmd) & - _SEGMENT_ENTRY_HARDWARE_BITS_LARGE) - | _SEGMENT_ENTRY_GMAP_UC - | _SEGMENT_ENTRY; - } else - *table =3D (pmd_val(*pmd) & - _SEGMENT_ENTRY_HARDWARE_BITS) - | _SEGMENT_ENTRY; - } - } else if (*table & _SEGMENT_ENTRY_PROTECT && - !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) { - unprot =3D (u64)*table; - unprot &=3D ~_SEGMENT_ENTRY_PROTECT; - unprot |=3D _SEGMENT_ENTRY_GMAP_UC; - gmap_pmdp_xchg(gmap, (pmd_t *)table, __pmd(unprot), gaddr); - } - spin_unlock(&gmap->guest_table_lock); - spin_unlock(ptl); - radix_tree_preload_end(); - return rc; -} -EXPORT_SYMBOL(__gmap_link); - -/* - * this function is assumed to be called with mmap_lock held - */ -void __gmap_zap(struct gmap *gmap, unsigned long gaddr) -{ - unsigned long vmaddr; - - mmap_assert_locked(gmap->mm); - - /* Find the vm address for the guest address */ - vmaddr =3D (unsigned long) radix_tree_lookup(&gmap->guest_to_host, - gaddr >> PMD_SHIFT); - if (vmaddr) { - vmaddr |=3D gaddr & ~PMD_MASK; - gmap_helper_zap_one_page(gmap->mm, vmaddr); - } -} -EXPORT_SYMBOL_GPL(__gmap_zap); - -static LIST_HEAD(gmap_notifier_list); -static DEFINE_SPINLOCK(gmap_notifier_lock); - -/** - * gmap_register_pte_notifier - register a pte invalidation callback - * @nb: pointer to the gmap notifier block - */ -void gmap_register_pte_notifier(struct gmap_notifier *nb) -{ - spin_lock(&gmap_notifier_lock); - list_add_rcu(&nb->list, &gmap_notifier_list); - spin_unlock(&gmap_notifier_lock); -} -EXPORT_SYMBOL_GPL(gmap_register_pte_notifier); - -/** - * gmap_unregister_pte_notifier - remove a pte invalidation callback - * @nb: pointer to the gmap notifier block - */ -void gmap_unregister_pte_notifier(struct gmap_notifier *nb) -{ - spin_lock(&gmap_notifier_lock); - list_del_rcu(&nb->list); - spin_unlock(&gmap_notifier_lock); - synchronize_rcu(); -} -EXPORT_SYMBOL_GPL(gmap_unregister_pte_notifier); - -/** - * gmap_call_notifier - call all registered invalidation callbacks - * @gmap: pointer to guest mapping meta data structure - * @start: start virtual address in the guest address space - * @end: end virtual address in the guest address space - */ -static void gmap_call_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) -{ - struct gmap_notifier *nb; - - list_for_each_entry(nb, &gmap_notifier_list, list) - nb->notifier_call(gmap, start, end); -} - -/** - * gmap_table_walk - walk the gmap page tables - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @level: page table level to stop at - * - * Returns a table entry pointer for the given guest address and @level - * @level=3D0 : returns a pointer to a page table table entry (or NULL) - * @level=3D1 : returns a pointer to a segment table entry (or NULL) - * @level=3D2 : returns a pointer to a region-3 table entry (or NULL) - * @level=3D3 : returns a pointer to a region-2 table entry (or NULL) - * @level=3D4 : returns a pointer to a region-1 table entry (or NULL) - * - * Returns NULL if the gmap page tables could not be walked to the - * requested level. - * - * Note: Can also be called for shadow gmaps. 
- */ -unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int= level) -{ - const int asce_type =3D gmap->asce & _ASCE_TYPE_MASK; - unsigned long *table =3D gmap->table; - - if (gmap_is_shadow(gmap) && gmap->removed) - return NULL; - - if (WARN_ON_ONCE(level > (asce_type >> 2) + 1)) - return NULL; - - if (asce_type !=3D _ASCE_TYPE_REGION1 && - gaddr & (-1UL << (31 + (asce_type >> 2) * 11))) - return NULL; - - switch (asce_type) { - case _ASCE_TYPE_REGION1: - table +=3D (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT; - if (level =3D=3D 4) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_REGION2: - table +=3D (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT; - if (level =3D=3D 3) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_REGION3: - table +=3D (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT; - if (level =3D=3D 2) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_SEGMENT: - table +=3D (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT; - if (level =3D=3D 1) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _SEGMENT_ENTRY_ORIGIN); - table +=3D (gaddr & _PAGE_INDEX) >> PAGE_SHIFT; - } - return table; -} -EXPORT_SYMBOL(gmap_table_walk); - -/** - * gmap_pte_op_walk - walk the gmap page table, get the page table lock - * and return the pte pointer - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @ptl: pointer to the spinlock pointer - * - * Returns a pointer to the locked pte for a guest address, or NULL - */ -static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr, - spinlock_t **ptl) -{ - unsigned long *table; - - BUG_ON(gmap_is_shadow(gmap)); - /* Walk the gmap page table, lock and get pte pointer */ - table =3D gmap_table_walk(gmap, gaddr, 1); /* get segment pointer */ - if (!table || *table & _SEGMENT_ENTRY_INVALID) - return NULL; - return pte_alloc_map_lock(gmap->mm, (pmd_t *) table, gaddr, ptl); -} - -/** - * gmap_pte_op_fixup - force a page in and connect the gmap page table - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @vmaddr: address in the host process address space - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * - * Returns 0 if the caller can retry __gmap_translate (might fail again), - * -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing - * up or connecting the gmap page table. - */ -static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr, - unsigned long vmaddr, int prot) -{ - struct mm_struct *mm =3D gmap->mm; - unsigned int fault_flags; - bool unlocked =3D false; - - BUG_ON(gmap_is_shadow(gmap)); - fault_flags =3D (prot =3D=3D PROT_WRITE) ? 
FAULT_FLAG_WRITE : 0; - if (fixup_user_fault(mm, vmaddr, fault_flags, &unlocked)) - return -EFAULT; - if (unlocked) - /* lost mmap_lock, caller has to retry __gmap_translate */ - return 0; - /* Connect the page tables */ - return __gmap_link(gmap, gaddr, vmaddr); -} - -/** - * gmap_pte_op_end - release the page table lock - * @ptep: pointer to the locked pte - * @ptl: pointer to the page table spinlock - */ -static void gmap_pte_op_end(pte_t *ptep, spinlock_t *ptl) -{ - pte_unmap_unlock(ptep, ptl); -} - -/** - * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock - * and return the pmd pointer - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * - * Returns a pointer to the pmd for a guest address, or NULL - */ -static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gad= dr) -{ - pmd_t *pmdp; - - BUG_ON(gmap_is_shadow(gmap)); - pmdp =3D (pmd_t *) gmap_table_walk(gmap, gaddr, 1); - if (!pmdp) - return NULL; - - /* without huge pages, there is no need to take the table lock */ - if (!gmap->mm->context.allow_gmap_hpage_1m) - return pmd_none(*pmdp) ? NULL : pmdp; - - spin_lock(&gmap->guest_table_lock); - if (pmd_none(*pmdp)) { - spin_unlock(&gmap->guest_table_lock); - return NULL; - } - - /* 4k page table entries are locked via the pte (pte_alloc_map_lock). */ - if (!pmd_leaf(*pmdp)) - spin_unlock(&gmap->guest_table_lock); - return pmdp; -} - -/** - * gmap_pmd_op_end - release the guest_table_lock if needed - * @gmap: pointer to the guest mapping meta data structure - * @pmdp: pointer to the pmd - */ -static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp) -{ - if (pmd_leaf(*pmdp)) - spin_unlock(&gmap->guest_table_lock); -} - -/* - * gmap_protect_pmd - remove access rights to memory and set pmd notificat= ion bits - * @pmdp: pointer to the pmd to be protected - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: notification bits to set - * - * Returns: - * 0 if successfully protected - * -EAGAIN if a fixup is needed - * -EINVAL if unsupported notifier bits have been specified - * - * Expected to be called with sg->mm->mmap_lock in read and - * guest_table_lock held. 
- */ -static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr, - pmd_t *pmdp, int prot, unsigned long bits) -{ - int pmd_i =3D pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID; - int pmd_p =3D pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT; - pmd_t new =3D *pmdp; - - /* Fixup needed */ - if ((pmd_i && (prot !=3D PROT_NONE)) || (pmd_p && (prot =3D=3D PROT_WRITE= ))) - return -EAGAIN; - - if (prot =3D=3D PROT_NONE && !pmd_i) { - new =3D set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID)); - gmap_pmdp_xchg(gmap, pmdp, new, gaddr); - } - - if (prot =3D=3D PROT_READ && !pmd_p) { - new =3D clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID)); - new =3D set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_PROTECT)); - gmap_pmdp_xchg(gmap, pmdp, new, gaddr); - } - - if (bits & GMAP_NOTIFY_MPROT) - set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN))); - - /* Shadow GMAP protection needs split PMDs */ - if (bits & GMAP_NOTIFY_SHADOW) - return -EINVAL; - - return 0; -} - -/* - * gmap_protect_pte - remove access rights to memory and set pgste bits - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @pmdp: pointer to the pmd associated with the pte - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: notification bits to set - * - * Returns 0 if successfully protected, -ENOMEM if out of memory and - * -EAGAIN if a fixup is needed. - * - * Expected to be called with sg->mm->mmap_lock in read - */ -static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr, - pmd_t *pmdp, int prot, unsigned long bits) -{ - int rc; - pte_t *ptep; - spinlock_t *ptl; - unsigned long pbits =3D 0; - - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID) - return -EAGAIN; - - ptep =3D pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl); - if (!ptep) - return -ENOMEM; - - pbits |=3D (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0; - pbits |=3D (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0; - /* Protect and unlock. */ - rc =3D ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits); - gmap_pte_op_end(ptep, ptl); - return rc; -} - -/* - * gmap_protect_range - remove access rights to memory and set pgste bits - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @len: size of area - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: pgste notification bits to set - * - * Returns: - * PAGE_SIZE if a small page was successfully protected; - * HPAGE_SIZE if a large page was successfully protected; - * -ENOMEM if out of memory; - * -EFAULT if gaddr is invalid (or mapping for shadows is missing); - * -EAGAIN if the guest mapping is missing and should be fixed by the ca= ller. - * - * Context: Called with sg->mm->mmap_lock in read. - */ -int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, uns= igned long bits) -{ - pmd_t *pmdp; - int rc =3D 0; - - BUG_ON(gmap_is_shadow(gmap)); - - pmdp =3D gmap_pmd_op_walk(gmap, gaddr); - if (!pmdp) - return -EAGAIN; - - if (!pmd_leaf(*pmdp)) { - rc =3D gmap_protect_pte(gmap, gaddr, pmdp, prot, bits); - if (!rc) - rc =3D PAGE_SIZE; - } else { - rc =3D gmap_protect_pmd(gmap, gaddr, pmdp, prot, bits); - if (!rc) - rc =3D HPAGE_SIZE; - } - gmap_pmd_op_end(gmap, pmdp); - - return rc; -} -EXPORT_SYMBOL_GPL(gmap_protect_one); - -/** - * gmap_read_table - get an unsigned long value from a guest page table us= ing - * absolute addressing, without marking the page referen= ced. 
- * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @val: pointer to the unsigned long value to return - * - * Returns 0 if the value was read, -ENOMEM if out of memory and -EFAULT - * if reading using the virtual address failed. -EINVAL if called on a gmap - * shadow. - * - * Called with gmap->mm->mmap_lock in read. - */ -int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long = *val) -{ - unsigned long address, vmaddr; - spinlock_t *ptl; - pte_t *ptep, pte; - int rc; - - if (gmap_is_shadow(gmap)) - return -EINVAL; - - while (1) { - rc =3D -EAGAIN; - ptep =3D gmap_pte_op_walk(gmap, gaddr, &ptl); - if (ptep) { - pte =3D *ptep; - if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) { - address =3D pte_val(pte) & PAGE_MASK; - address +=3D gaddr & ~PAGE_MASK; - *val =3D *(unsigned long *)__va(address); - set_pte(ptep, set_pte_bit(*ptep, __pgprot(_PAGE_YOUNG))); - /* Do *NOT* clear the _PAGE_INVALID bit! */ - rc =3D 0; - } - gmap_pte_op_end(ptep, ptl); - } - if (!rc) - break; - vmaddr =3D __gmap_translate(gmap, gaddr); - if (IS_ERR_VALUE(vmaddr)) { - rc =3D vmaddr; - break; - } - rc =3D gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ); - if (rc) - break; - } - return rc; -} -EXPORT_SYMBOL_GPL(gmap_read_table); - -/** - * gmap_insert_rmap - add a rmap to the host_to_rmap radix tree - * @sg: pointer to the shadow guest address space structure - * @vmaddr: vm address associated with the rmap - * @rmap: pointer to the rmap structure - * - * Called with the sg->guest_table_lock - */ -static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr, - struct gmap_rmap *rmap) -{ - struct gmap_rmap *temp; - void __rcu **slot; - - BUG_ON(!gmap_is_shadow(sg)); - slot =3D radix_tree_lookup_slot(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT); - if (slot) { - rmap->next =3D radix_tree_deref_slot_protected(slot, - &sg->guest_table_lock); - for (temp =3D rmap->next; temp; temp =3D temp->next) { - if (temp->raddr =3D=3D rmap->raddr) { - kfree(rmap); - return; - } - } - radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap); - } else { - rmap->next =3D NULL; - radix_tree_insert(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT, - rmap); - } -} - -/** - * gmap_protect_rmap - restrict access rights to memory (RO) and create an= rmap - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow gmap - * @paddr: address in the parent guest address space - * @len: length of the memory area to protect - * - * Returns 0 if successfully protected and the rmap was created, -ENOMEM - * if out of memory and -EFAULT if paddr is invalid. 
- */ -static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr, - unsigned long paddr, unsigned long len) -{ - struct gmap *parent; - struct gmap_rmap *rmap; - unsigned long vmaddr; - spinlock_t *ptl; - pte_t *ptep; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - parent =3D sg->parent; - while (len) { - vmaddr =3D __gmap_translate(parent, paddr); - if (IS_ERR_VALUE(vmaddr)) - return vmaddr; - rmap =3D kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT); - if (!rmap) - return -ENOMEM; - rmap->raddr =3D raddr; - rc =3D radix_tree_preload(GFP_KERNEL_ACCOUNT); - if (rc) { - kfree(rmap); - return rc; - } - rc =3D -EAGAIN; - ptep =3D gmap_pte_op_walk(parent, paddr, &ptl); - if (ptep) { - spin_lock(&sg->guest_table_lock); - rc =3D ptep_force_prot(parent->mm, paddr, ptep, PROT_READ, - PGSTE_VSIE_BIT); - if (!rc) - gmap_insert_rmap(sg, vmaddr, rmap); - spin_unlock(&sg->guest_table_lock); - gmap_pte_op_end(ptep, ptl); - } - radix_tree_preload_end(); - if (rc) { - kfree(rmap); - rc =3D gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ); - if (rc) - return rc; - continue; - } - paddr +=3D PAGE_SIZE; - len -=3D PAGE_SIZE; - } - return 0; -} - -#define _SHADOW_RMAP_MASK 0x7 -#define _SHADOW_RMAP_REGION1 0x5 -#define _SHADOW_RMAP_REGION2 0x4 -#define _SHADOW_RMAP_REGION3 0x3 -#define _SHADOW_RMAP_SEGMENT 0x2 -#define _SHADOW_RMAP_PGTABLE 0x1 - -/** - * gmap_idte_one - invalidate a single region or segment table entry - * @asce: region or segment table *origin* + table-type bits - * @vaddr: virtual address to identify the table entry to flush - * - * The invalid bit of a single region or segment table entry is set - * and the associated TLB entries depending on the entry are flushed. - * The table-type of the @asce identifies the portion of the @vaddr - * that is used as the invalidation index. 
- */ -static inline void gmap_idte_one(unsigned long asce, unsigned long vaddr) -{ - asm volatile( - " idte %0,0,%1" - : : "a" (asce), "a" (vaddr) : "cc", "memory"); -} - -/** - * gmap_unshadow_page - remove a page from a shadow page table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_page(struct gmap *sg, unsigned long raddr) -{ - unsigned long *table; - - BUG_ON(!gmap_is_shadow(sg)); - table =3D gmap_table_walk(sg, raddr, 0); /* get page table pointer */ - if (!table || *table & _PAGE_INVALID) - return; - gmap_call_notifier(sg, raddr, raddr + PAGE_SIZE - 1); - ptep_unshadow_pte(sg->mm, raddr, (pte_t *) table); -} - -/** - * __gmap_unshadow_pgt - remove all entries from a shadow page table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @pgt: pointer to the start of a shadow page table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr, - unsigned long *pgt) -{ - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _PAGE_ENTRIES; i++, raddr +=3D PAGE_SIZE) - pgt[i] =3D _PAGE_INVALID; -} - -/** - * gmap_unshadow_pgt - remove a shadow page table from a segment entry - * @sg: pointer to the shadow guest address space structure - * @raddr: address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr) -{ - unsigned long *ste; - phys_addr_t sto, pgt; - struct ptdesc *ptdesc; - - BUG_ON(!gmap_is_shadow(sg)); - ste =3D gmap_table_walk(sg, raddr, 1); /* get segment pointer */ - if (!ste || !(*ste & _SEGMENT_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _SEGMENT_SIZE - 1); - sto =3D __pa(ste - ((raddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT)); - gmap_idte_one(sto | _ASCE_TYPE_SEGMENT, raddr); - pgt =3D *ste & _SEGMENT_ENTRY_ORIGIN; - *ste =3D _SEGMENT_ENTRY_EMPTY; - __gmap_unshadow_pgt(sg, raddr, __va(pgt)); - /* Free page table */ - ptdesc =3D page_ptdesc(phys_to_page(pgt)); - page_table_free_pgste(ptdesc); -} - -/** - * __gmap_unshadow_sgt - remove all entries from a shadow segment table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @sgt: pointer to the start of a shadow segment table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr, - unsigned long *sgt) -{ - struct ptdesc *ptdesc; - phys_addr_t pgt; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _SEGMENT_SIZE) { - if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN)) - continue; - pgt =3D sgt[i] & _REGION_ENTRY_ORIGIN; - sgt[i] =3D _SEGMENT_ENTRY_EMPTY; - __gmap_unshadow_pgt(sg, raddr, __va(pgt)); - /* Free page table */ - ptdesc =3D page_ptdesc(phys_to_page(pgt)); - page_table_free_pgste(ptdesc); - } -} - -/** - * gmap_unshadow_sgt - remove a shadow segment table from a region-3 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the shadow->guest_table_lock - */ -static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr) -{ - unsigned long r3o, *r3e; - phys_addr_t sgt; - struct page *page; - - BUG_ON(!gmap_is_shadow(sg)); - r3e =3D gmap_table_walk(sg, 
raddr, 2); /* get region-3 pointer */ - if (!r3e || !(*r3e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION3_SIZE - 1); - r3o =3D (unsigned long) (r3e - ((raddr & _REGION3_INDEX) >> _REGION3_SHIF= T)); - gmap_idte_one(__pa(r3o) | _ASCE_TYPE_REGION3, raddr); - sgt =3D *r3e & _REGION_ENTRY_ORIGIN; - *r3e =3D _REGION3_ENTRY_EMPTY; - __gmap_unshadow_sgt(sg, raddr, __va(sgt)); - /* Free segment table */ - page =3D phys_to_page(sgt); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r3t - remove all entries from a shadow region-3 table - * @sg: pointer to the shadow guest address space structure - * @raddr: address in the shadow guest address space - * @r3t: pointer to the start of a shadow region-3 table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr, - unsigned long *r3t) -{ - struct page *page; - phys_addr_t sgt; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION3_SIZE) { - if (!(r3t[i] & _REGION_ENTRY_ORIGIN)) - continue; - sgt =3D r3t[i] & _REGION_ENTRY_ORIGIN; - r3t[i] =3D _REGION3_ENTRY_EMPTY; - __gmap_unshadow_sgt(sg, raddr, __va(sgt)); - /* Free segment table */ - page =3D phys_to_page(sgt); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow_r3t - remove a shadow region-3 table from a region-2 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr) -{ - unsigned long r2o, *r2e; - phys_addr_t r3t; - struct page *page; - - BUG_ON(!gmap_is_shadow(sg)); - r2e =3D gmap_table_walk(sg, raddr, 3); /* get region-2 pointer */ - if (!r2e || !(*r2e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION2_SIZE - 1); - r2o =3D (unsigned long) (r2e - ((raddr & _REGION2_INDEX) >> _REGION2_SHIF= T)); - gmap_idte_one(__pa(r2o) | _ASCE_TYPE_REGION2, raddr); - r3t =3D *r2e & _REGION_ENTRY_ORIGIN; - *r2e =3D _REGION2_ENTRY_EMPTY; - __gmap_unshadow_r3t(sg, raddr, __va(r3t)); - /* Free region 3 table */ - page =3D phys_to_page(r3t); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r2t - remove all entries from a shadow region-2 table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @r2t: pointer to the start of a shadow region-2 table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr, - unsigned long *r2t) -{ - phys_addr_t r3t; - struct page *page; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION2_SIZE) { - if (!(r2t[i] & _REGION_ENTRY_ORIGIN)) - continue; - r3t =3D r2t[i] & _REGION_ENTRY_ORIGIN; - r2t[i] =3D _REGION2_ENTRY_EMPTY; - __gmap_unshadow_r3t(sg, raddr, __va(r3t)); - /* Free region 3 table */ - page =3D phys_to_page(r3t); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow_r2t - remove a shadow region-2 table from a region-1 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr) -{ - unsigned long r1o, *r1e; - struct page *page; - phys_addr_t r2t; - - 
BUG_ON(!gmap_is_shadow(sg)); - r1e =3D gmap_table_walk(sg, raddr, 4); /* get region-1 pointer */ - if (!r1e || !(*r1e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION1_SIZE - 1); - r1o =3D (unsigned long) (r1e - ((raddr & _REGION1_INDEX) >> _REGION1_SHIF= T)); - gmap_idte_one(__pa(r1o) | _ASCE_TYPE_REGION1, raddr); - r2t =3D *r1e & _REGION_ENTRY_ORIGIN; - *r1e =3D _REGION1_ENTRY_EMPTY; - __gmap_unshadow_r2t(sg, raddr, __va(r2t)); - /* Free region 2 table */ - page =3D phys_to_page(r2t); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r1t - remove all entries from a shadow region-1 table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @r1t: pointer to the start of a shadow region-1 table - * - * Called with the shadow->guest_table_lock - */ -static void __gmap_unshadow_r1t(struct gmap *sg, unsigned long raddr, - unsigned long *r1t) -{ - unsigned long asce; - struct page *page; - phys_addr_t r2t; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - asce =3D __pa(r1t) | _ASCE_TYPE_REGION1; - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION1_SIZE) { - if (!(r1t[i] & _REGION_ENTRY_ORIGIN)) - continue; - r2t =3D r1t[i] & _REGION_ENTRY_ORIGIN; - __gmap_unshadow_r2t(sg, raddr, __va(r2t)); - /* Clear entry and flush translation r1t -> r2t */ - gmap_idte_one(asce, raddr); - r1t[i] =3D _REGION1_ENTRY_EMPTY; - /* Free region 2 table */ - page =3D phys_to_page(r2t); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow - remove a shadow page table completely - * @sg: pointer to the shadow guest address space structure - * - * Called with sg->guest_table_lock - */ -void gmap_unshadow(struct gmap *sg) -{ - unsigned long *table; - - BUG_ON(!gmap_is_shadow(sg)); - if (sg->removed) - return; - sg->removed =3D 1; - gmap_call_notifier(sg, 0, -1UL); - gmap_flush_tlb(sg); - table =3D __va(sg->asce & _ASCE_ORIGIN); - switch (sg->asce & _ASCE_TYPE_MASK) { - case _ASCE_TYPE_REGION1: - __gmap_unshadow_r1t(sg, 0, table); - break; - case _ASCE_TYPE_REGION2: - __gmap_unshadow_r2t(sg, 0, table); - break; - case _ASCE_TYPE_REGION3: - __gmap_unshadow_r3t(sg, 0, table); - break; - case _ASCE_TYPE_SEGMENT: - __gmap_unshadow_sgt(sg, 0, table); - break; - } -} -EXPORT_SYMBOL(gmap_unshadow); - -/** - * gmap_shadow_r2t - create an empty shadow region 2 table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @r2t: parent gmap address of the region 2 table to get shadowed - * @fake: r2t references contiguous guest memory block, not a r2t - * - * The r2t parameter specifies the address of the source table. The - * four pages of the source table are made read-only in the parent gmap - * address space. A write to the source table area @r2t will automatically - * remove the shadow r2 table and all of its descendants. - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
- */ -int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2= t, - int fake) -{ - unsigned long raddr, origin, offset, len; - unsigned long *table; - phys_addr_t s_r2t; - struct page *page; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - /* Allocate a shadow region second table */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - s_r2t =3D page_to_phys(page); - /* Install shadow region second table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 4); /* get region-1 pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _REGION_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _REGION_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - crst_table_init(__va(s_r2t), _REGION2_ENTRY_EMPTY); - /* mark as invalid as long as the parent table is not protected */ - *table =3D s_r2t | _REGION_ENTRY_LENGTH | - _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID; - if (sg->edat_level >=3D 1) - *table |=3D (r2t & _REGION_ENTRY_PROTECT); - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_REGION_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make r2t read-only in parent gmap page table */ - raddr =3D (saddr & _REGION1_MASK) | _SHADOW_RMAP_REGION1; - origin =3D r2t & _REGION_ENTRY_ORIGIN; - offset =3D ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE; - len =3D ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset; - rc =3D gmap_protect_rmap(sg, raddr, origin + offset, len); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 4); - if (!table || (*table & _REGION_ENTRY_ORIGIN) !=3D s_r2t) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_REGION_ENTRY_INVALID; - } else { - gmap_unshadow_r2t(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_r2t); - -/** - * gmap_shadow_r3t - create a shadow region 3 table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @r3t: parent gmap address of the region 3 table to get shadowed - * @fake: r3t references contiguous guest memory block, not a r3t - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
- */ -int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3= t, - int fake) -{ - unsigned long raddr, origin, offset, len; - unsigned long *table; - phys_addr_t s_r3t; - struct page *page; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - /* Allocate a shadow region second table */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - s_r3t =3D page_to_phys(page); - /* Install shadow region second table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 3); /* get region-2 pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _REGION_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _REGION_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - crst_table_init(__va(s_r3t), _REGION3_ENTRY_EMPTY); - /* mark as invalid as long as the parent table is not protected */ - *table =3D s_r3t | _REGION_ENTRY_LENGTH | - _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID; - if (sg->edat_level >=3D 1) - *table |=3D (r3t & _REGION_ENTRY_PROTECT); - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_REGION_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make r3t read-only in parent gmap page table */ - raddr =3D (saddr & _REGION2_MASK) | _SHADOW_RMAP_REGION2; - origin =3D r3t & _REGION_ENTRY_ORIGIN; - offset =3D ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE; - len =3D ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset; - rc =3D gmap_protect_rmap(sg, raddr, origin + offset, len); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 3); - if (!table || (*table & _REGION_ENTRY_ORIGIN) !=3D s_r3t) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_REGION_ENTRY_INVALID; - } else { - gmap_unshadow_r3t(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_r3t); - -/** - * gmap_shadow_sgt - create a shadow segment table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @sgt: parent gmap address of the segment table to get shadowed - * @fake: sgt references contiguous guest memory block, not a sgt - * - * Returns: 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
- */
-int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
-		    int fake)
-{
-	unsigned long raddr, origin, offset, len;
-	unsigned long *table;
-	phys_addr_t s_sgt;
-	struct page *page;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg) || (sgt & _REGION3_ENTRY_LARGE));
-	/* Allocate a shadow segment table */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	s_sgt = page_to_phys(page);
-	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 2); /* get region-3 pointer */
-	if (!table) {
-		rc = -EAGAIN;		/* Race with unshadow */
-		goto out_free;
-	}
-	if (!(*table & _REGION_ENTRY_INVALID)) {
-		rc = 0;			/* Already established */
-		goto out_free;
-	} else if (*table & _REGION_ENTRY_ORIGIN) {
-		rc = -EAGAIN;		/* Race with shadow */
-		goto out_free;
-	}
-	crst_table_init(__va(s_sgt), _SEGMENT_ENTRY_EMPTY);
-	/* mark as invalid as long as the parent table is not protected */
-	*table = s_sgt | _REGION_ENTRY_LENGTH |
-		 _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID;
-	if (sg->edat_level >= 1)
-		*table |= sgt & _REGION_ENTRY_PROTECT;
-	if (fake) {
-		/* nothing to protect for fake tables */
-		*table &= ~_REGION_ENTRY_INVALID;
-		spin_unlock(&sg->guest_table_lock);
-		return 0;
-	}
-	spin_unlock(&sg->guest_table_lock);
-	/* Make sgt read-only in parent gmap page table */
-	raddr = (saddr & _REGION3_MASK) | _SHADOW_RMAP_REGION3;
-	origin = sgt & _REGION_ENTRY_ORIGIN;
-	offset = ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
-	len = ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
-	rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
-	spin_lock(&sg->guest_table_lock);
-	if (!rc) {
-		table = gmap_table_walk(sg, saddr, 2);
-		if (!table || (*table & _REGION_ENTRY_ORIGIN) != s_sgt)
-			rc = -EAGAIN;		/* Race with unshadow */
-		else
-			*table &= ~_REGION_ENTRY_INVALID;
-	} else {
-		gmap_unshadow_sgt(sg, raddr);
-	}
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-out_free:
-	spin_unlock(&sg->guest_table_lock);
-	__free_pages(page, CRST_ALLOC_ORDER);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_sgt);
-
-static void gmap_pgste_set_pgt_addr(struct ptdesc *ptdesc, unsigned long pgt_addr)
-{
-	unsigned long *pgstes = page_to_virt(ptdesc_page(ptdesc));
-
-	pgstes += _PAGE_ENTRIES;
-
-	pgstes[0] &= ~PGSTE_ST2_MASK;
-	pgstes[1] &= ~PGSTE_ST2_MASK;
-	pgstes[2] &= ~PGSTE_ST2_MASK;
-	pgstes[3] &= ~PGSTE_ST2_MASK;
-
-	pgstes[0] |= (pgt_addr >> 16) & PGSTE_ST2_MASK;
-	pgstes[1] |= pgt_addr & PGSTE_ST2_MASK;
-	pgstes[2] |= (pgt_addr << 16) & PGSTE_ST2_MASK;
-	pgstes[3] |= (pgt_addr << 32) & PGSTE_ST2_MASK;
-}
-
-/**
- * gmap_shadow_pgt - instantiate a shadow page table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @pgt: parent gmap address of the page table to get shadowed
- * @fake: pgt references contiguous guest memory block, not a pgtable
- *
- * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory,
- * -EFAULT if an address in the parent gmap could not be resolved and
- *
- * Called with gmap->mm->mmap_lock in read
- */
-int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
-		    int fake)
-{
-	unsigned long raddr, origin;
-	unsigned long *table;
-	struct ptdesc *ptdesc;
-	phys_addr_t s_pgt;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg) || (pgt & _SEGMENT_ENTRY_LARGE));
-	/* Allocate a shadow page table */
-	ptdesc = page_table_alloc_pgste(sg->mm);
-	if (!ptdesc)
-		return -ENOMEM;
-	origin = pgt & _SEGMENT_ENTRY_ORIGIN;
-	if (fake)
-		origin |= GMAP_SHADOW_FAKE_TABLE;
-	gmap_pgste_set_pgt_addr(ptdesc, origin);
-	s_pgt = page_to_phys(ptdesc_page(ptdesc));
-	/* Install shadow page table */
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
-	if (!table) {
-		rc = -EAGAIN;		/* Race with unshadow */
-		goto out_free;
-	}
-	if (!(*table & _SEGMENT_ENTRY_INVALID)) {
-		rc = 0;			/* Already established */
-		goto out_free;
-	} else if (*table & _SEGMENT_ENTRY_ORIGIN) {
-		rc = -EAGAIN;		/* Race with shadow */
-		goto out_free;
-	}
-	/* mark as invalid as long as the parent table is not protected */
-	*table = (unsigned long) s_pgt | _SEGMENT_ENTRY |
-		 (pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID;
-	if (fake) {
-		/* nothing to protect for fake tables */
-		*table &= ~_SEGMENT_ENTRY_INVALID;
-		spin_unlock(&sg->guest_table_lock);
-		return 0;
-	}
-	spin_unlock(&sg->guest_table_lock);
-	/* Make pgt read-only in parent gmap page table (not the pgste) */
-	raddr = (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT;
-	origin = pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK;
-	rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE);
-	spin_lock(&sg->guest_table_lock);
-	if (!rc) {
-		table = gmap_table_walk(sg, saddr, 1);
-		if (!table || (*table & _SEGMENT_ENTRY_ORIGIN) != s_pgt)
-			rc = -EAGAIN;		/* Race with unshadow */
-		else
-			*table &= ~_SEGMENT_ENTRY_INVALID;
-	} else {
-		gmap_unshadow_pgt(sg, raddr);
-	}
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-out_free:
-	spin_unlock(&sg->guest_table_lock);
-	page_table_free_pgste(ptdesc);
-	return rc;
-
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
-
-/**
- * gmap_shadow_page - create a shadow page mapping
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @pte: pte in parent gmap address space to get shadowed
- *
- * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory and
- * -EFAULT if an address in the parent gmap could not be resolved.
- *
- * Called with sg->mm->mmap_lock in read.
- */
-int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
-{
-	struct gmap *parent;
-	struct gmap_rmap *rmap;
-	unsigned long vmaddr, paddr;
-	spinlock_t *ptl;
-	pte_t *sptep, *tptep;
-	int prot;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	parent = sg->parent;
-	prot = (pte_val(pte) & _PAGE_PROTECT) ? PROT_READ : PROT_WRITE;
-
-	rmap = kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT);
-	if (!rmap)
-		return -ENOMEM;
-	rmap->raddr = (saddr & PAGE_MASK) | _SHADOW_RMAP_PGTABLE;
-
-	while (1) {
-		paddr = pte_val(pte) & PAGE_MASK;
-		vmaddr = __gmap_translate(parent, paddr);
-		if (IS_ERR_VALUE(vmaddr)) {
-			rc = vmaddr;
-			break;
-		}
-		rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
-		if (rc)
-			break;
-		rc = -EAGAIN;
-		sptep = gmap_pte_op_walk(parent, paddr, &ptl);
-		if (sptep) {
-			spin_lock(&sg->guest_table_lock);
-			/* Get page table pointer */
-			tptep = (pte_t *) gmap_table_walk(sg, saddr, 0);
-			if (!tptep) {
-				spin_unlock(&sg->guest_table_lock);
-				gmap_pte_op_end(sptep, ptl);
-				radix_tree_preload_end();
-				break;
-			}
-			rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
-			if (rc > 0) {
-				/* Success and a new mapping */
-				gmap_insert_rmap(sg, vmaddr, rmap);
-				rmap = NULL;
-				rc = 0;
-			}
-			gmap_pte_op_end(sptep, ptl);
-			spin_unlock(&sg->guest_table_lock);
-		}
-		radix_tree_preload_end();
-		if (!rc)
-			break;
-		rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
-		if (rc)
-			break;
-	}
-	kfree(rmap);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_page);
-
-/*
- * gmap_shadow_notify - handle notifications for shadow gmap
- *
- * Called with sg->parent->shadow_lock.
- */
-static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
-			       unsigned long gaddr)
-{
-	struct gmap_rmap *rmap, *rnext, *head;
-	unsigned long start, end, bits, raddr;
-
-	BUG_ON(!gmap_is_shadow(sg));
-
-	spin_lock(&sg->guest_table_lock);
-	if (sg->removed) {
-		spin_unlock(&sg->guest_table_lock);
-		return;
-	}
-	/* Check for top level table */
-	start = sg->orig_asce & _ASCE_ORIGIN;
-	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE;
-	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
-	    gaddr < end) {
-		/* The complete shadow table has to go */
-		gmap_unshadow(sg);
-		spin_unlock(&sg->guest_table_lock);
-		list_del(&sg->list);
-		gmap_put(sg);
-		return;
-	}
-	/* Remove the page table tree from on specific entry */
-	head = radix_tree_delete(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT);
-	gmap_for_each_rmap_safe(rmap, rnext, head) {
-		bits = rmap->raddr & _SHADOW_RMAP_MASK;
-		raddr = rmap->raddr ^ bits;
-		switch (bits) {
-		case _SHADOW_RMAP_REGION1:
-			gmap_unshadow_r2t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION2:
-			gmap_unshadow_r3t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION3:
-			gmap_unshadow_sgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_SEGMENT:
-			gmap_unshadow_pgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_PGTABLE:
-			gmap_unshadow_page(sg, raddr);
-			break;
-		}
-		kfree(rmap);
-	}
-	spin_unlock(&sg->guest_table_lock);
-}
-
-/**
- * ptep_notify - call all invalidation callbacks for a specific pte.
- * @mm: pointer to the process mm_struct
- * @vmaddr: virtual address in the process address space
- * @pte: pointer to the page table entry
- * @bits: bits from the pgste that caused the notify call
- *
- * This function is assumed to be called with the page table lock held
- * for the pte to notify.
- */
-void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
-		 pte_t *pte, unsigned long bits)
-{
-	unsigned long offset, gaddr = 0;
-	struct gmap *gmap, *sg, *next;
-
-	offset = ((unsigned long) pte) & (255 * sizeof(pte_t));
-	offset = offset * (PAGE_SIZE / sizeof(pte_t));
-	rcu_read_lock();
-	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
-		gaddr = host_to_guest_lookup(gmap, vmaddr) + offset;
-		spin_unlock(&gmap->guest_table_lock);
-		if (!IS_GADDR_VALID(gaddr))
-			continue;
-
-		if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) {
-			spin_lock(&gmap->shadow_lock);
-			list_for_each_entry_safe(sg, next,
-						 &gmap->children, list)
-				gmap_shadow_notify(sg, vmaddr, gaddr);
-			spin_unlock(&gmap->shadow_lock);
-		}
-		if (bits & PGSTE_IN_BIT)
-			gmap_call_notifier(gmap, gaddr, gaddr + PAGE_SIZE - 1);
-	}
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL_GPL(ptep_notify);
-
-static void pmdp_notify_gmap(struct gmap *gmap, pmd_t *pmdp,
-			     unsigned long gaddr)
-{
-	set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN)));
-	gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
-}
-
-/**
- * gmap_pmdp_xchg - exchange a gmap pmd with another
- * @gmap: pointer to the guest address space structure
- * @pmdp: pointer to the pmd entry
- * @new: replacement entry
- * @gaddr: the affected guest address
- *
- * This function is assumed to be called with the guest_table_lock
- * held.
- */
-static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new,
-			   unsigned long gaddr)
-{
-	gaddr &= HPAGE_MASK;
-	pmdp_notify_gmap(gmap, pmdp, gaddr);
-	new = clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_GMAP_IN));
-	if (machine_has_tlb_guest())
-		__pmdp_idte(gaddr, (pmd_t *)pmdp, IDTE_GUEST_ASCE, gmap->asce,
-			    IDTE_GLOBAL);
-	else
-		__pmdp_idte(gaddr, (pmd_t *)pmdp, 0, 0, IDTE_GLOBAL);
-	set_pmd(pmdp, new);
-}
-
-static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr,
-			    int purge)
-{
-	pmd_t *pmdp;
-	struct gmap *gmap;
-	unsigned long gaddr;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
-		pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);
-		if (pmdp) {
-			pmdp_notify_gmap(gmap, pmdp, gaddr);
-			WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
-						   _SEGMENT_ENTRY_GMAP_UC |
-						   _SEGMENT_ENTRY));
-			if (purge)
-				__pmdp_cspg(pmdp);
-			set_pmd(pmdp, __pmd(_SEGMENT_ENTRY_EMPTY));
-		}
-		spin_unlock(&gmap->guest_table_lock);
-	}
-	rcu_read_unlock();
-}
-
-/**
- * gmap_pmdp_invalidate - invalidate all affected guest pmd entries without
- *                        flushing
- * @mm: pointer to the process mm_struct
- * @vmaddr: virtual address in the process address space
- */
-void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr)
-{
-	gmap_pmdp_clear(mm, vmaddr, 0);
-}
-EXPORT_SYMBOL_GPL(gmap_pmdp_invalidate);
-
-/**
- * gmap_pmdp_idte_local - invalidate and clear a guest pmd entry
- * @mm: pointer to the process mm_struct
- * @vmaddr: virtual address in the process address space
- */
-void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)
-{
-	unsigned long gaddr;
-	struct gmap *gmap;
-	pmd_t *pmdp;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
-		pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);
-		if (pmdp) {
-			pmdp_notify_gmap(gmap, pmdp, gaddr);
-			WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
-						   _SEGMENT_ENTRY_GMAP_UC |
-						   _SEGMENT_ENTRY));
-			if (machine_has_tlb_guest())
-				__pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,
-					    gmap->asce, IDTE_LOCAL);
-			else
-				__pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL);
-			*pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);
-		}
-		spin_unlock(&gmap->guest_table_lock);
-	}
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL_GPL(gmap_pmdp_idte_local);
-
-/**
- * gmap_pmdp_idte_global - invalidate and clear a guest pmd entry
- * @mm: pointer to the process mm_struct
- * @vmaddr: virtual address in the process address space
- */
-void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
-{
-	unsigned long gaddr;
-	struct gmap *gmap;
-	pmd_t *pmdp;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
-		pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);
-		if (pmdp) {
-			pmdp_notify_gmap(gmap, pmdp, gaddr);
-			WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
-						   _SEGMENT_ENTRY_GMAP_UC |
-						   _SEGMENT_ENTRY));
-			if (machine_has_tlb_guest())
-				__pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,
-					    gmap->asce, IDTE_GLOBAL);
-			else
-				__pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL);
-			*pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);
-		}
-		spin_unlock(&gmap->guest_table_lock);
-	}
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global);
-
-/**
- * gmap_test_and_clear_dirty_pmd - test and reset segment dirty status
- * @gmap: pointer to guest address space
- * @pmdp: pointer to the pmd to be tested
- * @gaddr: virtual address in the guest address space
- *
- * This function is assumed to be called with the guest_table_lock
- * held.
- */
-static bool gmap_test_and_clear_dirty_pmd(struct gmap *gmap, pmd_t *pmdp,
-					  unsigned long gaddr)
-{
-	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
-		return false;
-
-	/* Already protected memory, which did not change is clean */
-	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT &&
-	    !(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_UC))
-		return false;
-
-	/* Clear UC indication and reset protection */
-	set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_UC)));
-	gmap_protect_pmd(gmap, gaddr, pmdp, PROT_READ, 0);
-	return true;
-}
-
-/**
- * gmap_sync_dirty_log_pmd - set bitmap based on dirty status of segment
- * @gmap: pointer to guest address space
- * @bitmap: dirty bitmap for this pmd
- * @gaddr: virtual address in the guest address space
- * @vmaddr: virtual address in the host address space
- *
- * This function is assumed to be called with the guest_table_lock
- * held.
- */
-void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
-			     unsigned long gaddr, unsigned long vmaddr)
-{
-	int i;
-	pmd_t *pmdp;
-	pte_t *ptep;
-	spinlock_t *ptl;
-
-	pmdp = gmap_pmd_op_walk(gmap, gaddr);
-	if (!pmdp)
-		return;
-
-	if (pmd_leaf(*pmdp)) {
-		if (gmap_test_and_clear_dirty_pmd(gmap, pmdp, gaddr))
-			bitmap_fill(bitmap, _PAGE_ENTRIES);
-	} else {
-		for (i = 0; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
-			ptep = pte_alloc_map_lock(gmap->mm, pmdp, vmaddr, &ptl);
-			if (!ptep)
-				continue;
-			if (ptep_test_and_clear_uc(gmap->mm, vmaddr, ptep))
-				set_bit(i, bitmap);
-			pte_unmap_unlock(ptep, ptl);
-		}
-	}
-	gmap_pmd_op_end(gmap, pmdp);
-}
-EXPORT_SYMBOL_GPL(gmap_sync_dirty_log_pmd);
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
-				    unsigned long end, struct mm_walk *walk)
-{
-	struct vm_area_struct *vma = walk->vma;
-
-	split_huge_pmd(vma, pmd, addr);
-	return 0;
-}
-
-static const struct mm_walk_ops thp_split_walk_ops = {
-	.pmd_entry	= thp_split_walk_pmd_entry,
-	.walk_lock	= PGWALK_WRLOCK_VERIFY,
-};
-
-static inline void thp_split_mm(struct mm_struct *mm)
-{
-	struct vm_area_struct *vma;
-	VMA_ITERATOR(vmi, mm, 0);
-
-	for_each_vma(vmi, vma) {
-		vm_flags_mod(vma, VM_NOHUGEPAGE, VM_HUGEPAGE);
-		walk_page_vma(vma, &thp_split_walk_ops, NULL);
-	}
-	mm->def_flags |= VM_NOHUGEPAGE;
-}
-#else
-static inline void thp_split_mm(struct mm_struct *mm)
-{
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
-/*
- * switch on pgstes for its userspace process (for kvm)
- */
-int s390_enable_sie(void)
-{
-	struct mm_struct *mm = current->mm;
-
-	/* Do we have pgstes? if yes, we are done */
-	if (mm_has_pgste(mm))
-		return 0;
-	mmap_write_lock(mm);
-	mm->context.has_pgste = 1;
-	/* split thp mappings and disable thp for future mappings */
-	thp_split_mm(mm);
-	mmap_write_unlock(mm);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(s390_enable_sie);
-
-/*
- * Enable storage key handling from now on and initialize the storage
- * keys with the default key.
- */
-static int __s390_enable_skey_pte(pte_t *pte, unsigned long addr,
-				  unsigned long next, struct mm_walk *walk)
-{
-	/* Clear storage key */
-	ptep_zap_key(walk->mm, addr, pte);
-	return 0;
-}
-
-/*
- * Give a chance to schedule after setting a key to 256 pages.
- * We only hold the mm lock, which is a rwsem and the kvm srcu.
- * Both can sleep.
- */
-static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
-				  unsigned long next, struct mm_walk *walk)
-{
-	cond_resched();
-	return 0;
-}
-
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
-				      unsigned long hmask, unsigned long next,
-				      struct mm_walk *walk)
-{
-	pmd_t *pmd = (pmd_t *)pte;
-	unsigned long start, end;
-	struct folio *folio = page_folio(pmd_page(*pmd));
-
-	/*
-	 * The write check makes sure we do not set a key on shared
-	 * memory. This is needed as the walker does not differentiate
-	 * between actual guest memory and the process executable or
-	 * shared libraries.
-	 */
-	if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID ||
-	    !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
-		return 0;
-
-	start = pmd_val(*pmd) & HPAGE_MASK;
-	end = start + HPAGE_SIZE;
-	__storage_key_init_range(start, end);
-	set_bit(PG_arch_1, &folio->flags.f);
-	cond_resched();
-	return 0;
-}
-
-static const struct mm_walk_ops enable_skey_walk_ops = {
-	.hugetlb_entry	= __s390_enable_skey_hugetlb,
-	.pte_entry	= __s390_enable_skey_pte,
-	.pmd_entry	= __s390_enable_skey_pmd,
-	.walk_lock	= PGWALK_WRLOCK,
-};
-
-int s390_enable_skey(void)
-{
-	struct mm_struct *mm = current->mm;
-	int rc = 0;
-
-	mmap_write_lock(mm);
-	if (mm_uses_skeys(mm))
-		goto out_up;
-
-	mm->context.uses_skeys = 1;
-	rc = gmap_helper_disable_cow_sharing();
-	if (rc) {
-		mm->context.uses_skeys = 0;
-		goto out_up;
-	}
-	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
-
-out_up:
-	mmap_write_unlock(mm);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(s390_enable_skey);
-
-/*
- * Reset CMMA state, make all pages stable again.
- */
-static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	ptep_zap_unused(walk->mm, addr, pte, 1);
-	return 0;
-}
-
-static const struct mm_walk_ops reset_cmma_walk_ops = {
-	.pte_entry	= __s390_reset_cmma,
-	.walk_lock	= PGWALK_WRLOCK,
-};
-
-void s390_reset_cmma(struct mm_struct *mm)
-{
-	mmap_write_lock(mm);
-	walk_page_range(mm, 0, TASK_SIZE, &reset_cmma_walk_ops, NULL);
-	mmap_write_unlock(mm);
-}
-EXPORT_SYMBOL_GPL(s390_reset_cmma);
-
-#define GATHER_GET_PAGES 32
-
-struct reset_walk_state {
-	unsigned long next;
-	unsigned long count;
-	unsigned long pfns[GATHER_GET_PAGES];
-};
-
-static int s390_gather_pages(pte_t *ptep, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	struct reset_walk_state *p = walk->private;
-	pte_t pte = READ_ONCE(*ptep);
-
-	if (pte_present(pte)) {
-		/* we have a reference from the mapping, take an extra one */
-		get_page(phys_to_page(pte_val(pte)));
-		p->pfns[p->count] = phys_to_pfn(pte_val(pte));
-		p->next = next;
-		p->count++;
-	}
-	return p->count >= GATHER_GET_PAGES;
-}
-
-static const struct mm_walk_ops gather_pages_ops = {
-	.pte_entry	= s390_gather_pages,
-	.walk_lock	= PGWALK_RDLOCK,
-};
-
-/*
- * Call the Destroy secure page UVC on each page in the given array of PFNs.
- * Each page needs to have an extra reference, which will be released here.
- */
-void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns)
-{
-	struct folio *folio;
-	unsigned long i;
-
-	for (i = 0; i < count; i++) {
-		folio = pfn_folio(pfns[i]);
-		/* we always have an extra reference */
-		uv_destroy_folio(folio);
-		/* get rid of the extra reference */
-		folio_put(folio);
-		cond_resched();
-	}
-}
-EXPORT_SYMBOL_GPL(s390_uv_destroy_pfns);
-
-/**
- * __s390_uv_destroy_range - Call the destroy secure page UVC on each page
- * in the given range of the given address space.
- * @mm: the mm to operate on
- * @start: the start of the range
- * @end: the end of the range
- * @interruptible: if not 0, stop when a fatal signal is received
- *
- * Walk the given range of the given address space and call the destroy
- * secure page UVC on each page. Optionally exit early if a fatal signal is
- * pending.
- *
- * Return: 0 on success, -EINTR if the function stopped before completing
- */
-int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool interruptible)
-{
-	struct reset_walk_state state = { .next = start };
-	int r = 1;
-
-	while (r > 0) {
-		state.count = 0;
-		mmap_read_lock(mm);
-		r = walk_page_range(mm, state.next, end, &gather_pages_ops, &state);
-		mmap_read_unlock(mm);
-		cond_resched();
-		s390_uv_destroy_pfns(state.count, state.pfns);
-		if (interruptible && fatal_signal_pending(current))
-			return -EINTR;
-	}
-	return 0;
-}
-EXPORT_SYMBOL_GPL(__s390_uv_destroy_range);
-
-/**
- * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
- * @gmap: the gmap whose ASCE needs to be replaced
- *
- * If the ASCE is a SEGMENT type then this function will return -EINVAL,
- * otherwise the pointers in the host_to_guest radix tree will keep pointing
- * to the wrong pages, causing use-after-free and memory corruption.
- * If the allocation of the new top level page table fails, the ASCE is not
- * replaced.
- * In any case, the old ASCE is always removed from the gmap CRST list.
- * Therefore the caller has to make sure to save a pointer to it
- * beforehand, unless a leak is actually intended.
- */
-int s390_replace_asce(struct gmap *gmap)
-{
-	unsigned long asce;
-	struct page *page;
-	void *table;
-
-	/* Replacing segment type ASCEs would cause serious issues */
-	if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
-		return -EINVAL;
-
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	table = page_to_virt(page);
-	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
-
-	/* Set new table origin while preserving existing ASCE control bits */
-	asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);
-	WRITE_ONCE(gmap->asce, asce);
-	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
-	WRITE_ONCE(gmap->table, table);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(s390_replace_asce);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 08743c1dac2f..eced1dc5214f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -369,8 +369,6 @@ static inline void pmdp_idte_local(struct mm_struct *mm,
 			    mm->context.asce, IDTE_LOCAL);
 	else
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
-	if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-		gmap_pmdp_idte_local(mm, addr);
 }
 
 static inline void pmdp_idte_global(struct mm_struct *mm,
@@ -379,12 +377,8 @@ static inline void pmdp_idte_global(struct mm_struct *mm,
 	if (machine_has_tlb_guest()) {
 		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
 			    mm->context.asce, IDTE_GLOBAL);
-		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-			gmap_pmdp_idte_global(mm, addr);
 	} else {
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL);
-		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-			gmap_pmdp_idte_global(mm, addr);
 	}
 }
 
@@ -419,8 +413,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
 	    cpumask_of(smp_processor_id()))) {
 		set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_INVALID)));
 		mm->context.flush_mm = 1;
-		if (mm_has_pgste(mm))
-			gmap_pmdp_invalidate(mm, addr);
 	} else {
 		pmdp_idte_global(mm, addr, pmdp);
 	}
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org, gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 26/29] KVM: S390: Remove PGSTE code from linux/s390 mm
Date: Wed, 4 Feb 2026 16:02:55 +0100
Message-ID: <20260204150259.60425-27-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Remove the PGSTE config option. Remove all code from linux/s390 mm that
involves PGSTEs.
Signed-off-by: Claudio Imbrenda
Acked-by: Heiko Carstens
---
 arch/s390/Kconfig               |   3 -
 arch/s390/include/asm/hugetlb.h |   6 -
 arch/s390/include/asm/mmu.h     |  13 -
 arch/s390/include/asm/page.h    |   4 -
 arch/s390/include/asm/pgalloc.h |   4 -
 arch/s390/include/asm/pgtable.h | 121 +----
 arch/s390/kvm/dat.h             |   1 +
 arch/s390/mm/hugetlbpage.c      |  24 -
 arch/s390/mm/pgalloc.c          |  24 -
 arch/s390/mm/pgtable.c          | 827 +-------------------------------
 mm/khugepaged.c                 |   9 -
 11 files changed, 15 insertions(+), 1021 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 8270754985e9..961cbf023c1b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -32,9 +32,6 @@ config GENERIC_BUG_RELATIVE_POINTERS
 config GENERIC_LOCKBREAK
 	def_bool y if PREEMPTION
 
-config PGSTE
-	def_bool n
-
 config AUDIT_ARCH
 	def_bool y
 
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 69131736daaa..6983e52eaf81 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	return __huge_ptep_get_and_clear(mm, addr, ptep);
 }
 
-static inline void arch_clear_hugetlb_flags(struct folio *folio)
-{
-	clear_bit(PG_arch_1, &folio->flags.f);
-}
-#define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
-
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 				  pte_t *ptep, unsigned long sz)
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index f07e49b419ab..d4fd7bf3692e 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -18,24 +18,11 @@ typedef struct {
 	unsigned long vdso_base;
 	/* The mmu context belongs to a secure guest. */
 	atomic_t protected_count;
-	/*
-	 * The following bitfields need a down_write on the mm
-	 * semaphore when they are written to. As they are only
-	 * written once, they can be read without a lock.
-	 */
-	/* The mmu context uses extended page tables. */
-	unsigned int has_pgste:1;
-	/* The mmu context uses storage keys. */
-	unsigned int uses_skeys:1;
-	/* The mmu context uses CMM. */
-	unsigned int uses_cmm:1;
 	/*
 	 * The mmu context allows COW-sharing of memory pages (KSM, zeropage).
 	 * Note that COW-sharing during fork() is currently always allowed.
 	 */
 	unsigned int allow_cow_sharing:1;
-	/* The gmaps associated with this context are allowed to use huge pages. */
-	unsigned int allow_gmap_hpage_1m:1;
 } mm_context_t;
 
 #define INIT_MM_CONTEXT(name)						\
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index c1d63b613bf9..6de2f4d25b63 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -78,7 +78,6 @@ static inline void copy_page(void *to, void *from)
 #ifdef STRICT_MM_TYPECHECKS
 
 typedef struct { unsigned long pgprot; } pgprot_t;
-typedef struct { unsigned long pgste; } pgste_t;
 typedef struct { unsigned long pte; } pte_t;
 typedef struct { unsigned long pmd; } pmd_t;
 typedef struct { unsigned long pud; } pud_t;
@@ -94,7 +93,6 @@ static __always_inline unsigned long name ## _val(name ## _t name) \
 #else /* STRICT_MM_TYPECHECKS */
 
 typedef unsigned long pgprot_t;
-typedef unsigned long pgste_t;
 typedef unsigned long pte_t;
 typedef unsigned long pmd_t;
 typedef unsigned long pud_t;
@@ -110,7 +108,6 @@ static __always_inline unsigned long name ## _val(name ## _t name) \
 #endif /* STRICT_MM_TYPECHECKS */
 
 DEFINE_PGVAL_FUNC(pgprot)
-DEFINE_PGVAL_FUNC(pgste)
 DEFINE_PGVAL_FUNC(pte)
 DEFINE_PGVAL_FUNC(pmd)
 DEFINE_PGVAL_FUNC(pud)
@@ -120,7 +117,6 @@ DEFINE_PGVAL_FUNC(pgd)
 typedef pte_t *pgtable_t;
 
 #define __pgprot(x)	((pgprot_t) { (x) } )
-#define __pgste(x)	((pgste_t) { (x) } )
 #define __pte(x)	((pte_t) { (x) } )
 #define __pmd(x)	((pmd_t) { (x) } )
 #define __pud(x)	((pud_t) { (x) } )
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index a16e65072371..a5de9e61ea9e 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -27,10 +27,6 @@ unsigned long *page_table_alloc_noprof(struct mm_struct *);
 #define page_table_alloc(...)	alloc_hooks(page_table_alloc_noprof(__VA_ARGS__))
 void page_table_free(struct mm_struct *, unsigned long *);
 
-struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm);
-#define page_table_alloc_pgste(...)	alloc_hooks(page_table_alloc_pgste_noprof(__VA_ARGS__))
-void page_table_free_pgste(struct ptdesc *ptdesc);
-
 static inline void crst_table_init(unsigned long *crst, unsigned long entry)
 {
 	memset64((u64 *)crst, entry, _CRST_ENTRIES);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 45f13697cf9e..1c3c3be93be9 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -413,28 +413,6 @@ void setup_protection_map(void);
  * SW-bits: y young, d dirty, r read, w write
  */
 
-/* Page status table bits for virtualization */
-#define PGSTE_ACC_BITS	0xf000000000000000UL
-#define PGSTE_FP_BIT	0x0800000000000000UL
-#define PGSTE_PCL_BIT	0x0080000000000000UL
-#define PGSTE_HR_BIT	0x0040000000000000UL
-#define PGSTE_HC_BIT	0x0020000000000000UL
-#define PGSTE_GR_BIT	0x0004000000000000UL
-#define PGSTE_GC_BIT	0x0002000000000000UL
-#define PGSTE_ST2_MASK	0x0000ffff00000000UL
-#define PGSTE_UC_BIT	0x0000000000008000UL	/* user dirty (migration) */
-#define PGSTE_IN_BIT	0x0000000000004000UL	/* IPTE notify bit */
-#define PGSTE_VSIE_BIT	0x0000000000002000UL	/* ref'd in a shadow table */
-
-/* Guest Page State used for virtualization */
-#define _PGSTE_GPS_ZERO			0x0000000080000000UL
-#define _PGSTE_GPS_NODAT		0x0000000040000000UL
-#define _PGSTE_GPS_USAGE_MASK		0x0000000003000000UL
-#define _PGSTE_GPS_USAGE_STABLE		0x0000000000000000UL
-#define _PGSTE_GPS_USAGE_UNUSED		0x0000000001000000UL
-#define _PGSTE_GPS_USAGE_POT_VOLATILE	0x0000000002000000UL
-#define _PGSTE_GPS_USAGE_VOLATILE	_PGSTE_GPS_USAGE_MASK
-
 /*
  * A user page table pointer has the space-switch-event bit, the
  * private-space-control bit and the storage-alteration-event-control
@@ -566,15 +544,6 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
 }
 #define mm_pmd_folded(mm) mm_pmd_folded(mm)
 
-static inline int mm_has_pgste(struct mm_struct *mm)
-{
-#ifdef CONFIG_PGSTE
-	if (unlikely(mm->context.has_pgste))
-		return 1;
-#endif
-	return 0;
-}
-
 static inline int mm_is_protected(struct mm_struct *mm)
 {
 #if IS_ENABLED(CONFIG_KVM)
@@ -584,16 +553,6 @@ static inline int mm_is_protected(struct mm_struct *mm)
 	return 0;
 }
 
-static inline pgste_t clear_pgste_bit(pgste_t pgste, unsigned long mask)
-{
-	return __pgste(pgste_val(pgste) & ~mask);
-}
-
-static inline pgste_t set_pgste_bit(pgste_t pgste, unsigned long mask)
-{
-	return __pgste(pgste_val(pgste) | mask);
-}
-
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
 	return __pte(pte_val(pte) & ~pgprot_val(prot));
@@ -639,15 +598,6 @@ static inline int mm_forbids_zeropage(struct mm_struct *mm)
 	return 0;
 }
 
-static inline int mm_uses_skeys(struct mm_struct *mm)
-{
-#ifdef CONFIG_PGSTE
-	if (mm->context.uses_skeys)
-		return 1;
-#endif
-	return 0;
-}
-
 /**
  * cspg() - Compare and Swap and Purge (CSPG)
  * @ptr: Pointer to the value to be exchanged
@@ -1356,45 +1306,13 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 {
 	if (pte_same(*ptep, entry))
 		return 0;
-	if (cpu_has_rdp() && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, entry))
+	if (cpu_has_rdp() && pte_allow_rdp(*ptep, entry))
 		ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry);
 	else
 		ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
 	return 1;
 }
 
-/*
- * Additional functions to handle KVM guest page tables
- */
-void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, pte_t entry);
-void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr,
-		    pte_t *ptep, int prot, unsigned long bit);
-void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep , int reset);
-void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr,
-		    pte_t *sptep, pte_t *tptep, pte_t pte);
-void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep);
-
-bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long address,
-			    pte_t *ptep);
-int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			  unsigned char key, bool nq);
-int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			       unsigned char key, unsigned char *oldkey,
-			       bool nq, bool mr, bool mc);
-int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr);
-int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			  unsigned char *key);
-
-int set_pgste_bits(struct mm_struct *mm, unsigned long addr,
-		   unsigned long bits, unsigned long value);
-int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
-int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
-		       unsigned long *oldpte, unsigned long *oldpgste);
-
 #define pgprot_writecombine pgprot_writecombine
 pgprot_t pgprot_writecombine(pgprot_t prot);
 
@@ -1409,23 +1327,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 {
 	if (pte_present(entry))
 		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
-	if (mm_has_pgste(mm)) {
-		for (;;) {
-			ptep_set_pte_at(mm, addr, ptep, entry);
-			if (--nr == 0)
-				break;
-			ptep++;
-			entry = __pte(pte_val(entry) + PAGE_SIZE);
-			addr += PAGE_SIZE;
-		}
-	} else {
-		for (;;) {
-			set_pte(ptep, entry);
-			if (--nr == 0)
-				break;
-			ptep++;
-			entry = __pte(pte_val(entry) + PAGE_SIZE);
-		}
+	for (;;) {
+		set_pte(ptep, entry);
+		if (--nr == 0)
+			break;
+		ptep++;
+		entry = __pte(pte_val(entry) + PAGE_SIZE);
 	}
 }
 #define set_ptes set_ptes
@@ -2026,18 +1933,4 @@ extern pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc);
 #define pmd_pgtable(pmd) \
 	((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE))
 
-static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
-{
-	unsigned long *pgstes, res;
-
-	pgstes = pgt + _PAGE_ENTRIES;
-
-	res = (pgstes[0] & PGSTE_ST2_MASK) << 16;
-	res |= pgstes[1] & PGSTE_ST2_MASK;
-	res |= (pgstes[2] & PGSTE_ST2_MASK) >> 16;
-	res |= (pgstes[3] & PGSTE_ST2_MASK) >> 32;
-
-	return res;
-}
-
 #endif /* _S390_PAGE_H */
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 358b756ca8c9..8c7ae07dcc28 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -108,6 +108,7 @@ union pte {
 #define _PAGE_SD	0x002
 
 /* Needed as macro to perform atomic operations */
+#define PGSTE_PCL_BIT	0x0080000000000000UL	/* PCL lock, HW bit */
 #define PGSTE_CMMA_D_BIT 0x0000000000008000UL	/* CMMA dirty soft-bit */
 
 enum pgste_gps_usage {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index d42e61c7594e..35a898e15b1c 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -135,29 +135,6 @@ static inline pte_t __rste_to_pte(unsigned long rste)
 	return __pte(pteval);
 }
 
-static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
-{
-	struct folio *folio;
-	unsigned long size, paddr;
-
-	if (!mm_uses_skeys(mm) ||
-	    rste & _SEGMENT_ENTRY_INVALID)
-		return;
-
-	if ((rste & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) {
-		folio = page_folio(pud_page(__pud(rste)));
-		size = PUD_SIZE;
-		paddr = rste & PUD_MASK;
-	} else {
-		folio = page_folio(pmd_page(__pmd(rste)));
-		size = PMD_SIZE;
-		paddr = rste & PMD_MASK;
-	}
-
-	if (!test_and_set_bit(PG_arch_1, &folio->flags.f))
-		__storage_key_init_range(paddr, paddr + size);
-}
-
 void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 		       pte_t *ptep, pte_t pte)
 {
@@ -173,7 +150,6 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	} else if (likely(pte_present(pte)))
 		rste |= _SEGMENT_ENTRY_LARGE;
 
-	clear_huge_pte_skeys(mm, rste);
 	set_pte(ptep, __pte(rste));
 }
 
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 7df23528c01b..7ac44543e051 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -114,30 +114,6 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 	return -ENOMEM;
 }
 
-#ifdef CONFIG_PGSTE
-
-struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm)
-{
-	struct ptdesc *ptdesc;
-	u64 *table;
-
-	ptdesc = pagetable_alloc_noprof(GFP_KERNEL_ACCOUNT, 0);
-	if (ptdesc) {
-		table = (u64 *)ptdesc_address(ptdesc);
-		__arch_set_page_dat(table, 1);
-		memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
-		memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
-	}
-	return ptdesc;
-}
-
-void page_table_free_pgste(struct ptdesc *ptdesc)
-{
-	pagetable_free(ptdesc);
-}
-
-#endif /* CONFIG_PGSTE */
-
 unsigned long *page_table_alloc_noprof(struct mm_struct *mm)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index eced1dc5214f..4acd8b140c4b 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -115,171 +115,14 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-	unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
-static inline pgste_t pgste_get(pte_t *ptep)
-{
-	unsigned long pgste = 0;
-#ifdef CONFIG_PGSTE
-	pgste = *(unsigned long *)(ptep + PTRS_PER_PTE);
-#endif
-	return __pgste(pgste);
-}
-
-static inline void pgste_set(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	*(pgste_t *)(ptep + PTRS_PER_PTE) = pgste;
-#endif
-}
-
-static inline pgste_t pgste_update_all(pte_t pte, pgste_t pgste,
-				       struct mm_struct *mm)
-{
-#ifdef CONFIG_PGSTE
-	unsigned long address, bits, skey;
-
-	if (!mm_uses_skeys(mm) || pte_val(pte) & _PAGE_INVALID)
-		return pgste;
-	address = pte_val(pte) & PAGE_MASK;
-	skey = (unsigned long) page_get_storage_key(address);
-	bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
-	/* Transfer page changed & referenced bit to guest bits in pgste */
-	pgste = set_pgste_bit(pgste, bits << 48);	/* GR bit & GC bit */
-	/* Copy page access key and fetch protection bit to pgste */
-	pgste = clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT);
-	pgste = set_pgste_bit(pgste, (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56);
-#endif
-	return pgste;
-
-}
-
-static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry,
-				 struct mm_struct *mm)
-{
-#ifdef CONFIG_PGSTE
-	unsigned long address;
-	unsigned long nkey;
-
-	if (!mm_uses_skeys(mm) || pte_val(entry) & _PAGE_INVALID)
-		return;
-	VM_BUG_ON(!(pte_val(*ptep) & _PAGE_INVALID));
-	address = pte_val(entry) & PAGE_MASK;
-	/*
-	 * Set page access key and fetch protection bit from pgste.
-	 * The guest C/R information is still in the PGSTE, set real
-	 * key C/R to 0.
-	 */
-	nkey = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
-	nkey |= (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
-	page_set_storage_key(address, nkey, 0);
-#endif
-}
-
-static inline pgste_t pgste_set_pte(pte_t *ptep, pgste_t pgste, pte_t entry)
-{
-#ifdef CONFIG_PGSTE
-	if ((pte_val(entry) & _PAGE_PRESENT) &&
-	    (pte_val(entry) & _PAGE_WRITE) &&
-	    !(pte_val(entry) & _PAGE_INVALID)) {
-		if (!machine_has_esop()) {
-			/*
-			 * Without enhanced suppression-on-protection force
-			 * the dirty bit on for all writable ptes.
-			 */
-			entry = set_pte_bit(entry, __pgprot(_PAGE_DIRTY));
-			entry = clear_pte_bit(entry, __pgprot(_PAGE_PROTECT));
-		}
-		if (!(pte_val(entry) & _PAGE_PROTECT))
-			/* This pte allows write access, set user-dirty */
-			pgste = set_pgste_bit(pgste, PGSTE_UC_BIT);
-	}
-#endif
-	set_pte(ptep, entry);
-	return pgste;
-}
-
-static inline pgste_t pgste_pte_notify(struct mm_struct *mm,
-				       unsigned long addr,
-				       pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	unsigned long bits;
-
-	bits = pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
-	if (bits) {
-		pgste = __pgste(pgste_val(pgste) ^ bits);
-		ptep_notify(mm, addr, ptep, bits);
-	}
-#endif
-	return pgste;
-}
-
-static inline pgste_t ptep_xchg_start(struct mm_struct *mm,
-				      unsigned long addr, pte_t *ptep)
-{
-	pgste_t pgste = __pgste(0);
-
-	if (mm_has_pgste(mm)) {
-		pgste = pgste_get_lock(ptep);
-		pgste = pgste_pte_notify(mm, addr, ptep, pgste);
-	}
-	return pgste;
-}
-
-static inline pte_t ptep_xchg_commit(struct mm_struct *mm,
-				     unsigned long addr, pte_t *ptep,
-				     pgste_t pgste, pte_t old, pte_t new)
-{
-	if (mm_has_pgste(mm)) {
-		if (pte_val(old) & _PAGE_INVALID)
-			pgste_set_key(ptep, pgste, new, mm);
-		if (pte_val(new) & _PAGE_INVALID) {
-			pgste = pgste_update_all(old, pgste, mm);
-			if ((pgste_val(pgste) & _PGSTE_GPS_USAGE_MASK) ==
-			    _PGSTE_GPS_USAGE_UNUSED)
-				old = set_pte_bit(old, __pgprot(_PAGE_UNUSED));
-		}
-		pgste = pgste_set_pte(ptep, pgste, new);
-		pgste_set_unlock(ptep, pgste);
-	} else {
-		set_pte(ptep, new);
-	}
-	return old;
-}
-
 pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
 		       pte_t *ptep, pte_t new)
 {
-	pgste_t pgste;
 	pte_t old;
-	int nodat;
 
 	preempt_disable();
-	pgste = ptep_xchg_start(mm, addr, ptep);
-	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-	old = ptep_flush_direct(mm, addr, ptep, nodat);
-	old = ptep_xchg_commit(mm, addr, ptep, pgste, old, new);
+	old = ptep_flush_direct(mm, addr, ptep, 1);
+	set_pte(ptep, new);
 	preempt_enable();
 	return old;
 }
@@ -313,15 +156,11 @@ EXPORT_SYMBOL(ptep_reset_dat_prot);
 pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, pte_t new)
 {
-	pgste_t pgste;
 	pte_t old;
-	int nodat;
 
 	preempt_disable();
-	pgste = ptep_xchg_start(mm, addr, ptep);
-	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-	old = ptep_flush_lazy(mm, addr, ptep, nodat);
-	old = ptep_xchg_commit(mm, addr, ptep, pgste, old, new);
+	old = ptep_flush_lazy(mm, addr, ptep, 1);
+	set_pte(ptep, new);
 	preempt_enable();
 	return old;
 }
@@ -330,43 +169,20 @@ EXPORT_SYMBOL(ptep_xchg_lazy);
 pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep)
 {
-	pgste_t pgste;
-	pte_t old;
-	int nodat;
-	struct mm_struct *mm = vma->vm_mm;
-
-	pgste = ptep_xchg_start(mm, addr, ptep);
-	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-	old = ptep_flush_lazy(mm, addr, ptep, nodat);
-	if (mm_has_pgste(mm)) {
-		pgste = pgste_update_all(old, pgste, mm);
-		pgste_set(ptep, pgste);
-	}
-	return old;
+	return ptep_flush_lazy(vma->vm_mm, addr, ptep, 1);
 }
 
 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep, pte_t old_pte, pte_t pte)
 {
-	pgste_t pgste;
-	struct mm_struct *mm = vma->vm_mm;
-
-	if (mm_has_pgste(mm)) {
-		pgste = pgste_get(ptep);
-		pgste_set_key(ptep, pgste, pte, mm);
-		pgste = pgste_set_pte(ptep, pgste, pte);
-		pgste_set_unlock(ptep, pgste);
-	} else {
-		set_pte(ptep, pte);
-	}
+	set_pte(ptep, pte);
 }
 
 static inline void pmdp_idte_local(struct mm_struct *mm,
 				   unsigned long addr, pmd_t *pmdp)
 {
 	if (machine_has_tlb_guest())
-		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
-			    mm->context.asce, IDTE_LOCAL);
+		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, mm->context.asce, IDTE_LOCAL);
 	else
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
 }
@@ -420,40 +236,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
-#ifdef CONFIG_PGSTE
-static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
-{
-	struct vm_area_struct *vma;
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-
-	/* We need a valid VMA, otherwise this is clearly a fault. */
-	vma = vma_lookup(mm, addr);
-	if (!vma)
-		return -EFAULT;
-
-	pgd = pgd_offset(mm, addr);
-	if (!pgd_present(*pgd))
-		return -ENOENT;
-
-	p4d = p4d_offset(pgd, addr);
-	if (!p4d_present(*p4d))
-		return -ENOENT;
-
-	pud = pud_offset(p4d, addr);
-	if (!pud_present(*pud))
-		return -ENOENT;
-
-	/* Large PUDs are not supported yet. */
-	if (pud_leaf(*pud))
-		return -EFAULT;
-
-	*pmdp = pmd_offset(pud, addr);
-	return 0;
-}
-#endif
-
 pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr,
 		       pmd_t *pmdp, pmd_t new)
 {
@@ -571,598 +353,3 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	return pgtable;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
-#ifdef CONFIG_PGSTE
-void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, pte_t entry)
-{
-	pgste_t pgste;
-
-	/* the mm_has_pgste() check is done in set_pte_at() */
-	preempt_disable();
-	pgste = pgste_get_lock(ptep);
-	pgste = clear_pgste_bit(pgste, _PGSTE_GPS_ZERO);
-	pgste_set_key(ptep, pgste, entry, mm);
-	pgste = pgste_set_pte(ptep, pgste, entry);
-	pgste_set_unlock(ptep, pgste);
-	preempt_enable();
-}
-
-void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	pgste_t pgste;
-
-	preempt_disable();
-	pgste = pgste_get_lock(ptep);
-	pgste = set_pgste_bit(pgste, PGSTE_IN_BIT);
-	pgste_set_unlock(ptep, pgste);
-	preempt_enable();
-}
-
-/**
- * ptep_force_prot - change access rights of a locked pte
- * @mm: pointer to the process mm_struct
- * @addr: virtual address in the guest address space
- * @ptep: pointer to the page table entry
- * @prot: indicates guest access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bit: pgste bit to set (e.g. for notification)
- *
- * Returns 0 if the access rights were changed and -EAGAIN if the current
- * and requested access rights are incompatible.
- */
-int ptep_force_prot(struct mm_struct *mm, unsigned long addr,
-		    pte_t *ptep, int prot, unsigned long bit)
-{
-	pte_t entry;
-	pgste_t pgste;
-	int pte_i, pte_p, nodat;
-
-	pgste = pgste_get_lock(ptep);
-	entry = *ptep;
-	/* Check pte entry after all locks have been acquired */
-	pte_i = pte_val(entry) & _PAGE_INVALID;
-	pte_p = pte_val(entry) & _PAGE_PROTECT;
-	if ((pte_i && (prot != PROT_NONE)) ||
-	    (pte_p && (prot & PROT_WRITE))) {
-		pgste_set_unlock(ptep, pgste);
-		return -EAGAIN;
-	}
-	/* Change access rights and set pgste bit */
-	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-	if (prot == PROT_NONE && !pte_i) {
-		ptep_flush_direct(mm, addr, ptep, nodat);
-		pgste = pgste_update_all(entry, pgste, mm);
-		entry = set_pte_bit(entry, __pgprot(_PAGE_INVALID));
-	}
-	if (prot == PROT_READ && !pte_p) {
-		ptep_flush_direct(mm, addr, ptep, nodat);
-		entry = clear_pte_bit(entry, __pgprot(_PAGE_INVALID));
-		entry = set_pte_bit(entry, __pgprot(_PAGE_PROTECT));
-	}
-	pgste = set_pgste_bit(pgste, bit);
-	pgste = pgste_set_pte(ptep, pgste, entry);
-	pgste_set_unlock(ptep, pgste);
-	return 0;
-}
-
-int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr,
-		    pte_t *sptep, pte_t *tptep, pte_t pte)
-{
-	pgste_t spgste, tpgste;
-	pte_t spte, tpte;
-	int rc = -EAGAIN;
-
-	if (!(pte_val(*tptep) & _PAGE_INVALID))
-		return 0;	/* already shadowed */
-	spgste = pgste_get_lock(sptep);
-	spte = *sptep;
-	if (!(pte_val(spte) & _PAGE_INVALID) &&
-	    !((pte_val(spte) & _PAGE_PROTECT) &&
-	      !(pte_val(pte) & _PAGE_PROTECT))) {
-		spgste = set_pgste_bit(spgste, PGSTE_VSIE_BIT);
-		tpgste = pgste_get_lock(tptep);
-		tpte = __pte((pte_val(spte) & PAGE_MASK) |
-			     (pte_val(pte) & _PAGE_PROTECT));
-		/* don't touch the storage key - it belongs to parent pgste */
-		tpgste = pgste_set_pte(tptep, tpgste, tpte);
-		pgste_set_unlock(tptep, tpgste);
-		rc = 1;
-	}
-	pgste_set_unlock(sptep, spgste);
-	return rc;
-}
-
-void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep)
-{
-	pgste_t pgste;
-	int nodat;
-
-	pgste = pgste_get_lock(ptep);
-	/* notifier is called by the caller */
-	nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-	ptep_flush_direct(mm, saddr, ptep, nodat);
-	/* don't touch the storage key - it belongs to parent pgste */
-	pgste = pgste_set_pte(ptep, pgste, __pte(_PAGE_INVALID));
-	pgste_set_unlock(ptep, pgste);
-}
-
-static void ptep_zap_softleaf_entry(struct mm_struct *mm, softleaf_t entry)
-{
-	if (softleaf_is_swap(entry))
-		dec_mm_counter(mm, MM_SWAPENTS);
-	else if (softleaf_is_migration(entry)) {
-		struct folio *folio = softleaf_to_folio(entry);
-
-		dec_mm_counter(mm, mm_counter(folio));
-	}
-	free_swap_and_cache(entry);
-}
-
-void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, int reset)
-{
-	unsigned long pgstev;
-	pgste_t pgste;
-	pte_t pte;
-
-	/* Zap unused and logically-zero pages */
-	preempt_disable();
-	pgste = pgste_get_lock(ptep);
-	pgstev = pgste_val(pgste);
-	pte = *ptep;
-	if (!reset && pte_swap(pte) &&
-	    ((pgstev & _PGSTE_GPS_USAGE_MASK) == _PGSTE_GPS_USAGE_UNUSED ||
-	     (pgstev & _PGSTE_GPS_ZERO))) {
-		ptep_zap_softleaf_entry(mm, softleaf_from_pte(pte));
-		pte_clear(mm, addr, ptep);
-	}
-	if (reset)
-		pgste = clear_pgste_bit(pgste, _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT);
-	pgste_set_unlock(ptep, pgste);
-	preempt_enable();
-}
-
-void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	unsigned long ptev;
-	pgste_t pgste;
-
-	/* Clear storage key ACC and F, but set R/C */
-	preempt_disable();
-	pgste = pgste_get_lock(ptep);
-	pgste = clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT);
-	pgste = set_pgste_bit(pgste, PGSTE_GR_BIT | PGSTE_GC_BIT);
-	ptev = pte_val(*ptep);
-	if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE))
-		page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 0);
-	pgste_set_unlock(ptep, pgste);
-	preempt_enable();
-}
-
-/*
- * Test and reset if a guest page is dirty
- */
-bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long addr,
-			    pte_t *ptep)
-{
-	pgste_t pgste;
-	pte_t pte;
-	bool dirty;
-	int nodat;
-
-	pgste = pgste_get_lock(ptep);
-	dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
-	pgste = clear_pgste_bit(pgste, PGSTE_UC_BIT);
-	pte = *ptep;
-	if (dirty && (pte_val(pte) & _PAGE_PRESENT)) {
-		pgste = pgste_pte_notify(mm, addr, ptep, pgste);
-		nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-		ptep_ipte_global(mm, addr, ptep, nodat);
-		if (machine_has_esop() || !(pte_val(pte) & _PAGE_WRITE))
-			pte = set_pte_bit(pte, __pgprot(_PAGE_PROTECT));
-		else
-			pte = set_pte_bit(pte, __pgprot(_PAGE_INVALID));
-		set_pte(ptep, pte);
-	}
-	pgste_set_unlock(ptep, pgste);
-	return dirty;
-}
-EXPORT_SYMBOL_GPL(ptep_test_and_clear_uc);
-
-int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			  unsigned char key, bool nq)
-{
-	unsigned long keyul, paddr;
-	spinlock_t *ptl;
-	pgste_t old, new;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	/*
-	 * If we don't have a PTE table and if there is no huge page mapped,
-	 * we can ignore attempts to set the key to 0, because it already is 0.
-	 */
-	switch (pmd_lookup(mm, addr, &pmdp)) {
-	case -ENOENT:
-		return key ? -EFAULT : 0;
-	case 0:
-		break;
-	default:
-		return -EFAULT;
-	}
-again:
-	ptl = pmd_lock(mm, pmdp);
-	if (!pmd_present(*pmdp)) {
-		spin_unlock(ptl);
-		return key ? -EFAULT : 0;
-	}
-
-	if (pmd_leaf(*pmdp)) {
-		paddr = pmd_val(*pmdp) & HPAGE_MASK;
-		paddr |= addr & ~HPAGE_MASK;
-		/*
-		 * Huge pmds need quiescing operations, they are
-		 * always mapped.
-		 */
-		page_set_storage_key(paddr, key, 1);
-		spin_unlock(ptl);
-		return 0;
-	}
-	spin_unlock(ptl);
-
-	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-	if (!ptep)
-		goto again;
-	new = old = pgste_get_lock(ptep);
-	new = clear_pgste_bit(new, PGSTE_GR_BIT | PGSTE_GC_BIT |
-			      PGSTE_ACC_BITS | PGSTE_FP_BIT);
-	keyul = (unsigned long) key;
-	new = set_pgste_bit(new, (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) << 48);
-	new = set_pgste_bit(new, (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56);
-	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-		unsigned long bits, skey;
-
-		paddr = pte_val(*ptep) & PAGE_MASK;
-		skey = (unsigned long) page_get_storage_key(paddr);
-		bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
-		skey = key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
-		/* Set storage key ACC and FP */
-		page_set_storage_key(paddr, skey, !nq);
-		/* Merge host changed & referenced into pgste */
-		new = set_pgste_bit(new, bits << 52);
-	}
-	/* changing the guest storage key is considered a change of the page */
-	if ((pgste_val(new) ^ pgste_val(old)) &
-	    (PGSTE_ACC_BITS | PGSTE_FP_BIT | PGSTE_GR_BIT | PGSTE_GC_BIT))
-		new = set_pgste_bit(new, PGSTE_UC_BIT);
-
-	pgste_set_unlock(ptep, new);
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(set_guest_storage_key);
-
-/*
- * Conditionally set a guest storage key (handling csske).
- * oldkey will be updated when either mr or mc is set and a pointer is given.
- * - * Returns 0 if a guests storage key update wasn't necessary, 1 if the gue= st - * storage key was updated and -EFAULT on access errors. - */ -int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, unsigned char *oldkey, - bool nq, bool mr, bool mc) -{ - unsigned char tmp, mask =3D _PAGE_ACC_BITS | _PAGE_FP_BIT; - int rc; - - /* we can drop the pgste lock between getting and setting the key */ - if (mr | mc) { - rc =3D get_guest_storage_key(current->mm, addr, &tmp); - if (rc) - return rc; - if (oldkey) - *oldkey =3D tmp; - if (!mr) - mask |=3D _PAGE_REFERENCED; - if (!mc) - mask |=3D _PAGE_CHANGED; - if (!((tmp ^ key) & mask)) - return 0; - } - rc =3D set_guest_storage_key(current->mm, addr, key, nq); - return rc < 0 ? rc : 1; -} -EXPORT_SYMBOL(cond_set_guest_storage_key); - -/* - * Reset a guest reference bit (rrbe), returning the reference and changed= bit. - * - * Returns < 0 in case of error, otherwise the cc to be reported to the gu= est. - */ -int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr) -{ - spinlock_t *ptl; - unsigned long paddr; - pgste_t old, new; - pmd_t *pmdp; - pte_t *ptep; - int cc =3D 0; - - /* - * If we don't have a PTE table and if there is no huge page mapped, - * the storage key is 0 and there is nothing for us to do. - */ - switch (pmd_lookup(mm, addr, &pmdp)) { - case -ENOENT: - return 0; - case 0: - break; - default: - return -EFAULT; - } -again: - ptl =3D pmd_lock(mm, pmdp); - if (!pmd_present(*pmdp)) { - spin_unlock(ptl); - return 0; - } - - if (pmd_leaf(*pmdp)) { - paddr =3D pmd_val(*pmdp) & HPAGE_MASK; - paddr |=3D addr & ~HPAGE_MASK; - cc =3D page_reset_referenced(paddr); - spin_unlock(ptl); - return cc; - } - spin_unlock(ptl); - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - new =3D old =3D pgste_get_lock(ptep); - /* Reset guest reference bit only */ - new =3D clear_pgste_bit(new, PGSTE_GR_BIT); - - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - paddr =3D pte_val(*ptep) & PAGE_MASK; - cc =3D page_reset_referenced(paddr); - /* Merge real referenced bit into host-set */ - new =3D set_pgste_bit(new, ((unsigned long)cc << 53) & PGSTE_HR_BIT); - } - /* Reflect guest's logical view, not physical */ - cc |=3D (pgste_val(old) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 49; - /* Changing the guest storage key is considered a change of the page */ - if ((pgste_val(new) ^ pgste_val(old)) & PGSTE_GR_BIT) - new =3D set_pgste_bit(new, PGSTE_UC_BIT); - - pgste_set_unlock(ptep, new); - pte_unmap_unlock(ptep, ptl); - return cc; -} -EXPORT_SYMBOL(reset_guest_reference_bit); - -int get_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char *key) -{ - unsigned long paddr; - spinlock_t *ptl; - pgste_t pgste; - pmd_t *pmdp; - pte_t *ptep; - - /* - * If we don't have a PTE table and if there is no huge page mapped, - * the storage key is 0. 
- */ - *key =3D 0; - - switch (pmd_lookup(mm, addr, &pmdp)) { - case -ENOENT: - return 0; - case 0: - break; - default: - return -EFAULT; - } -again: - ptl =3D pmd_lock(mm, pmdp); - if (!pmd_present(*pmdp)) { - spin_unlock(ptl); - return 0; - } - - if (pmd_leaf(*pmdp)) { - paddr =3D pmd_val(*pmdp) & HPAGE_MASK; - paddr |=3D addr & ~HPAGE_MASK; - *key =3D page_get_storage_key(paddr); - spin_unlock(ptl); - return 0; - } - spin_unlock(ptl); - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - pgste =3D pgste_get_lock(ptep); - *key =3D (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56; - paddr =3D pte_val(*ptep) & PAGE_MASK; - if (!(pte_val(*ptep) & _PAGE_INVALID)) - *key =3D page_get_storage_key(paddr); - /* Reflect guest's logical view, not physical */ - *key |=3D (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48; - pgste_set_unlock(ptep, pgste); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(get_guest_storage_key); - -/** - * pgste_perform_essa - perform ESSA actions on the PGSTE. - * @mm: the memory context. It must have PGSTEs, no check is performed her= e! - * @hva: the host virtual address of the page whose PGSTE is to be process= ed - * @orc: the specific action to perform, see the ESSA_SET_* macros. - * @oldpte: the PTE will be saved there if the pointer is not NULL. - * @oldpgste: the old PGSTE will be saved there if the pointer is not NULL. - * - * Return: 1 if the page is to be added to the CBRL, otherwise 0, - * or < 0 in case of error. -EINVAL is returned for invalid values - * of orc, -EFAULT for invalid addresses. - */ -int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, - unsigned long *oldpte, unsigned long *oldpgste) -{ - struct vm_area_struct *vma; - unsigned long pgstev; - spinlock_t *ptl; - pgste_t pgste; - pte_t *ptep; - int res =3D 0; - - WARN_ON_ONCE(orc > ESSA_MAX); - if (unlikely(orc > ESSA_MAX)) - return -EINVAL; - - vma =3D vma_lookup(mm, hva); - if (!vma || is_vm_hugetlb_page(vma)) - return -EFAULT; - ptep =3D get_locked_pte(mm, hva, &ptl); - if (unlikely(!ptep)) - return -EFAULT; - pgste =3D pgste_get_lock(ptep); - pgstev =3D pgste_val(pgste); - if (oldpte) - *oldpte =3D pte_val(*ptep); - if (oldpgste) - *oldpgste =3D pgstev; - - switch (orc) { - case ESSA_GET_STATE: - break; - case ESSA_SET_STABLE: - pgstev &=3D ~(_PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT); - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - break; - case ESSA_SET_UNUSED: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_UNUSED; - if (pte_val(*ptep) & _PAGE_INVALID) - res =3D 1; - break; - case ESSA_SET_VOLATILE: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - if (pte_val(*ptep) & _PAGE_INVALID) - res =3D 1; - break; - case ESSA_SET_POT_VOLATILE: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - pgstev |=3D _PGSTE_GPS_USAGE_POT_VOLATILE; - break; - } - if (pgstev & _PGSTE_GPS_ZERO) { - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - break; - } - if (!(pgstev & PGSTE_GC_BIT)) { - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - res =3D 1; - break; - } - break; - case ESSA_SET_STABLE_RESIDENT: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - /* - * Since the resident state can go away any time after this - * call, we will not make this page resident. We can revisit - * this decision if a guest will ever start using this. 
- */ - break; - case ESSA_SET_STABLE_IF_RESIDENT: - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - } - break; - case ESSA_SET_STABLE_NODAT: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE | _PGSTE_GPS_NODAT; - break; - default: - /* we should never get here! */ - break; - } - /* If we are discarding a page, set it to logical zero */ - if (res) - pgstev |=3D _PGSTE_GPS_ZERO; - - pgste =3D __pgste(pgstev); - pgste_set_unlock(ptep, pgste); - pte_unmap_unlock(ptep, ptl); - return res; -} -EXPORT_SYMBOL(pgste_perform_essa); - -/** - * set_pgste_bits - set specific PGSTE bits. - * @mm: the memory context. It must have PGSTEs, no check is performed her= e! - * @hva: the host virtual address of the page whose PGSTE is to be process= ed - * @bits: a bitmask representing the bits that will be touched - * @value: the values of the bits to be written. Only the bits in the mask - * will be written. - * - * Return: 0 on success, < 0 in case of error. - */ -int set_pgste_bits(struct mm_struct *mm, unsigned long hva, - unsigned long bits, unsigned long value) -{ - struct vm_area_struct *vma; - spinlock_t *ptl; - pgste_t new; - pte_t *ptep; - - vma =3D vma_lookup(mm, hva); - if (!vma || is_vm_hugetlb_page(vma)) - return -EFAULT; - ptep =3D get_locked_pte(mm, hva, &ptl); - if (unlikely(!ptep)) - return -EFAULT; - new =3D pgste_get_lock(ptep); - - new =3D clear_pgste_bit(new, bits); - new =3D set_pgste_bit(new, value & bits); - - pgste_set_unlock(ptep, new); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(set_pgste_bits); - -/** - * get_pgste - get the current PGSTE for the given address. - * @mm: the memory context. It must have PGSTEs, no check is performed her= e! - * @hva: the host virtual address of the page whose PGSTE is to be process= ed - * @pgstep: will be written with the current PGSTE for the given address. - * - * Return: 0 on success, < 0 in case of error. - */ -int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgst= ep) -{ - struct vm_area_struct *vma; - spinlock_t *ptl; - pte_t *ptep; - - vma =3D vma_lookup(mm, hva); - if (!vma || is_vm_hugetlb_page(vma)) - return -EFAULT; - ptep =3D get_locked_pte(mm, hva, &ptl); - if (unlikely(!ptep)) - return -EFAULT; - *pgstep =3D pgste_val(pgste_get(ptep)); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(get_pgste); -#endif diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 97d1b2824386..be3a2a603fb1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -343,15 +343,6 @@ int hugepage_madvise(struct vm_area_struct *vma, { switch (advice) { case MADV_HUGEPAGE: -#ifdef CONFIG_S390 - /* - * qemu blindly sets MADV_HUGEPAGE on all allocations, but s390 - * can't handle this properly after s390_enable_sie, so we simply - * ignore the madvise to prevent qemu from causing a SIGSEGV. 
- */ - if (mm_has_pgste(vma->vm_mm)) - return 0; -#endif *vm_flags &=3D ~VM_NOHUGEPAGE; *vm_flags |=3D VM_HUGEPAGE; /* --=20 2.52.0 From nobody Sat Feb 7 06:34:15 2026 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C685B426D18; Wed, 4 Feb 2026 15:03:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770217397; cv=none; b=aakU0BjhDcOiRSyq6tlEUZPUb6knmiddMHywVAAeA0S2761AdNJ+ySyT/hxoCDOLfXcHQySl6AtqtPXVpMdNSe4atZyNkBt0/CGBtyoDlk0+nnvmDKIb5bIS0KDfZS75zcP3CxhnxRRZY0h4lod9VN98KOJzLSSHqghnzxUcmWs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770217397; c=relaxed/simple; bh=iaDeeAoqyQexpm0v5nF88NpyzLFFGn3GmxMmLhX7GXA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JkxqX0gEqixzKVsUqspR2fpEc1Yj/rnT+MUJPyDuAUmV+D4EdtL9d4A/zqTWlUzG7WlZLvh1W+j7LAe61hnJ3oKt8BgzqXL//h8Vm5SBLrU0U7nb87tCBEuvTHOEa1bM3kmcfULZzhsNRPybBJj3eXFKGGJbJXXnTF5nks7pLXw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=OiHvoJwN; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="OiHvoJwN" Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 6142HcSH026415; Wed, 4 Feb 2026 15:03:13 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=ns863W8Wv5bMVsucA hesthm4tlT4MBzu/kPxzIiikfo=; b=OiHvoJwN7KNfeYGsoFD/MIWsjKaZXevJY Vk+qLM1tpZEK7H0gJPqFSS/M2IDnJtqQ/UwDPQS1NMd7wpTBeeJ0GI9SsIvFEB2H FOGKzZOC23JIxHwwk5crmjRluyRGPkYd7uGTNOhkQ+SuBTRwF9QTsmoJczKD1xsk bbeRH6Mgw5BJcc5w1/Z1z7tXcCVlQ4J0w1ylolJMLAuvehn4LdOKYEkq04Pf0hp1 aVTpIf1lcISrkCpu1FdZXtV1J9SSMa1nit3juYrEw659+HqvVWd8xx4/V1BICXpH 16Sj5zaLmFAvAlYoYJo+IEJcP38vfwdgVSwTPX0Te7ajyKBBwG24w== Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4c175n00fb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 2026 15:03:12 +0000 (GMT) Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 614C9rYJ029053; Wed, 4 Feb 2026 15:03:12 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4c1v2sdsq0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Feb 2026 15:03:12 +0000 Received: from smtpav03.fra02v.mail.ibm.com (smtpav03.fra02v.mail.ibm.com [10.20.54.102]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 614F386M15598010 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 4 Feb 2026 15:03:08 GMT Received: from 
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 27/29] KVM: s390: Enable 1M pages for gmap
Date: Wed, 4 Feb 2026 16:02:56 +0100
Message-ID: <20260204150259.60425-28-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

While userspace is allowed to map pages of any size, the new gmap would
always use 4k pages to back the guest.

Enable 1M pages for gmap. This allows 1M pages to back a guest when
userspace uses 1M pages for the corresponding addresses (e.g. THP or
hugetlbfs).

Remove the limitation that disallowed having nested guests and hugepages
at the same time.
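For reference, a minimal userspace sketch of opting a VM into 1M backing.
It assumes the capability wired to GMAP_FLAG_ALLOW_HPAGE_1M in the
enable_cap hunk below is the existing KVM_CAP_S390_HPAGE_1M (the hunk does
not show the case label) and that the VM has no memslots or vCPUs yet:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/*
	 * Sketch: ask KVM to back this VM with 1M pages where possible.
	 * Assumes KVM_CAP_S390_HPAGE_1M is the capability handled by the
	 * enable_cap hunk below; vm_fd must be a freshly created VM.
	 */
	static int enable_1m_backing(int vm_fd)
	{
		struct kvm_enable_cap cap = { .cap = KVM_CAP_S390_HPAGE_1M };

		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}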
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/kvm/gmap.c     | 2 +-
 arch/s390/kvm/kvm-s390.c | 6 +-----
 arch/s390/kvm/pv.c       | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index fea1c66fcabe..da222962ef6d 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -620,7 +620,7 @@ static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
 
 static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
 {
-	return false;
+	return test_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &gmap->flags);
 }
 
 int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index bde55761bf8a..ac7b5f56f0b5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -851,6 +851,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		r = -EINVAL;
 	else {
 		r = 0;
+		set_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &kvm->arch.gmap->flags);
 		/*
 		 * We might have to create fake 4k page
 		 * tables. To avoid that the hardware works on
@@ -5729,11 +5730,6 @@ static int __init kvm_s390_init(void)
 		return -ENODEV;
 	}
 
-	if (nested && hpage) {
-		pr_info("A KVM host that supports nesting cannot back its KVM guests with huge pages\n");
-		return -EINVAL;
-	}
-
 	for (i = 0; i < 16; i++)
 		kvm_s390_fac_base[i] |= stfle_fac_list[i] & nonhyp_mask(i);
 
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index a48a8afd40df..461b413c76a3 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -721,6 +721,9 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	uvcb.flags.ap_allow_instr = kvm->arch.model.uv_feat_guest.ap;
 	uvcb.flags.ap_instr_intr = kvm->arch.model.uv_feat_guest.ap_intr;
 
+	clear_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &kvm->arch.gmap->flags);
+	gmap_split_huge_pages(kvm->arch.gmap);
+
 	cc = uv_call_sched(0, (u64)&uvcb);
 	*rc = uvcb.header.rc;
 	*rrc = uvcb.header.rrc;
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 28/29] KVM: s390: Storage key manipulation IOCTL
Date: Wed, 4 Feb 2026 16:02:57 +0100
Message-ID: <20260204150259.60425-29-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>
Add a new IOCTL that allows userspace to manipulate guest storage keys
directly. This makes it easier to write selftests related to storage keys.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst | 42 ++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c       | 58 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       | 11 +++++++
 3 files changed, 111 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..72e04dedb068 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6517,6 +6517,40 @@ the capability to be present.
 
 `flags` must currently be zero.
 
+4.144 KVM_S390_KEYOP
+--------------------
+
+:Capability: KVM_CAP_S390_KEYOP
+:Architectures: s390
+:Type: vm ioctl
+:Parameters: struct kvm_s390_keyop (in/out)
+:Returns: 0 in case of success, < 0 on error
+
+The specified key operation is performed on the given guest address. The
+previous storage key (or the relevant part thereof) will be returned in
+`key`.
+
+::
+
+  struct kvm_s390_keyop {
+	__u64 guest_addr;
+	__u8 key;
+	__u8 operation;
+  };
+
+Currently supported values for ``operation``:
+
+KVM_S390_KEYOP_ISKE
+  Returns the storage key for the guest address ``guest_addr`` in ``key``.
+
+KVM_S390_KEYOP_RRBE
+  Resets the reference bit for the guest address ``guest_addr``, returning the
+  R and C bits of the old storage key in ``key``; the remaining fields of
+  the storage key will be set to 0.
+
+KVM_S390_KEYOP_SSKE
+  Sets the storage key for the guest address ``guest_addr`` to the key
+  specified in ``key``, returning the previous value in ``key``.
 
 .. _kvm_run:
 
@@ -9287,6 +9321,14 @@ The presence of this capability indicates that KVM_RUN will update the
 KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the vCPU was
 executing nested guest code when it exited.
 
+8.46 KVM_CAP_S390_KEYOP
+-----------------------
+
+:Architectures: s390
+
+The presence of this capability indicates that the KVM_S390_KEYOP ioctl is
+available.
+
 KVM exits with the register state of either the L1 or L2 guest depending
 on which executed at the time of an exit. Userspace must take care to
 differentiate between these cases.
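A minimal userspace sketch of the documented operations, assuming vm_fd is
a VM file descriptor on a kernel that advertises KVM_CAP_S390_KEYOP and
that the page at gaddr is a valid guest address:

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/*
	 * Sketch: read a page's storage key via ISKE, then replace it via
	 * SSKE, using only the struct and operation values documented above.
	 */
	static int rewrite_key(int vm_fd, __u64 gaddr, __u8 new_key)
	{
		struct kvm_s390_keyop kop = {
			.guest_addr = gaddr,
			.operation = KVM_S390_KEYOP_ISKE,
		};

		if (ioctl(vm_fd, KVM_S390_KEYOP, &kop))
			return -1;
		printf("old key: %#x\n", kop.key);

		kop.key = new_key;
		kop.operation = KVM_S390_KEYOP_SSKE;
		return ioctl(vm_fd, KVM_S390_KEYOP, &kop);
	}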
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ac7b5f56f0b5..9f24252775dd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -554,6 +554,37 @@ static void __kvm_s390_exit(void)
 	debug_unregister(kvm_s390_dbf_uv);
 }
 
+static int kvm_s390_keyop(struct kvm_s390_mmu_cache *mc, struct kvm *kvm, int op,
+			  unsigned long addr, union skey skey)
+{
+	union asce asce = kvm->arch.gmap->asce;
+	gfn_t gfn = gpa_to_gfn(addr);
+	int r;
+
+	guard(read_lock)(&kvm->mmu_lock);
+
+	switch (op) {
+	case KVM_S390_KEYOP_SSKE:
+		r = dat_cond_set_storage_key(mc, asce, gfn, skey, &skey, 0, 0, 0);
+		if (r >= 0)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_ISKE:
+		r = dat_get_storage_key(asce, gfn, &skey);
+		if (!r)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_RRBE:
+		r = dat_reset_reference_bit(asce, gfn);
+		if (r > 0)
+			return r << 1;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return r;
+}
+
 /* Section: device related */
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
@@ -598,6 +629,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_DIAG318:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_S390_USER_OPEREXEC:
+	case KVM_CAP_S390_KEYOP:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -2931,6 +2963,32 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		r = -EFAULT;
 		break;
 	}
+	case KVM_S390_KEYOP: {
+		struct kvm_s390_mmu_cache *mc;
+		struct kvm_s390_keyop kop;
+		union skey skey;
+
+		if (copy_from_user(&kop, argp, sizeof(kop))) {
+			r = -EFAULT;
+			break;
+		}
+		skey.skey = kop.key;
+
+		mc = kvm_s390_new_mmu_cache();
+		if (!mc)
+			return -ENOMEM;
+
+		r = kvm_s390_keyop(mc, kvm, kop.operation, kop.guest_addr, skey);
+		kvm_s390_free_mmu_cache(mc);
+		if (r < 0)
+			break;
+
+		kop.key = r;
+		r = 0;
+		if (copy_to_user(argp, &kop, sizeof(kop)))
+			r = -EFAULT;
+		break;
+	}
 	case KVM_S390_ZPCI_OP: {
 		struct kvm_s390_zpci_op args;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..ab3d3d96e75f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -974,6 +974,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
 #define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_S390_KEYOP 247
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1219,6 +1220,15 @@ struct kvm_vfio_spapr_tce {
 	__s32 tablefd;
 };
 
+#define KVM_S390_KEYOP_ISKE 0x01
+#define KVM_S390_KEYOP_RRBE 0x02
+#define KVM_S390_KEYOP_SSKE 0x03
+struct kvm_s390_keyop {
+	__u64 guest_addr;
+	__u8 key;
+	__u8 operation;
+};
+
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
@@ -1238,6 +1248,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_S390_UCAS_MAP        _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP      _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
 #define KVM_S390_VCPU_FAULT      _IOW(KVMIO, 0x52, unsigned long)
+#define KVM_S390_KEYOP           _IOWR(KVMIO, 0x53, struct kvm_s390_keyop)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP       _IO(KVMIO, 0x60)
-- 
2.52.0

From nobody Sat Feb 7 06:34:15 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@kernel.org,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v7 29/29] KVM: s390: selftests: Add selftest for the KVM_S390_KEYOP ioctl
Date: Wed, 4 Feb 2026 16:02:58 +0100
Message-ID: <20260204150259.60425-30-imbrenda@linux.ibm.com>
In-Reply-To: <20260204150259.60425-1-imbrenda@linux.ibm.com>
References: <20260204150259.60425-1-imbrenda@linux.ibm.com>

Add a selftest that exercises the various storage key handling functions.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
---
 tools/testing/selftests/kvm/Makefile.kvm |   1 +
 tools/testing/selftests/kvm/s390/keyop.c | 299 +++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/s390/keyop.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index ba5c2b643efa..2e4774666723 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -199,6 +199,7 @@ TEST_GEN_PROGS_s390 += s390/cpumodel_subfuncs_test
 TEST_GEN_PROGS_s390 += s390/shared_zeropage_test
 TEST_GEN_PROGS_s390 += s390/ucontrol_test
 TEST_GEN_PROGS_s390 += s390/user_operexec
+TEST_GEN_PROGS_s390 += s390/keyop
 TEST_GEN_PROGS_s390 += rseq_test
 
 TEST_GEN_PROGS_riscv = $(TEST_GEN_PROGS_COMMON)
diff --git a/tools/testing/selftests/kvm/s390/keyop.c b/tools/testing/selftests/kvm/s390/keyop.c
new file mode 100644
index 000000000000..c7805e87d12c
--- /dev/null
+++ b/tools/testing/selftests/kvm/s390/keyop.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test for s390x KVM_S390_KEYOP
+ *
+ * Copyright IBM Corp. 2026
+ *
+ * Authors:
+ *  Claudio Imbrenda <imbrenda@linux.ibm.com>
+ */
+#include
+#include
+#include
+#include
+
+#include
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "kselftest.h"
+#include "processor.h"
+
+#define BUF_PAGES	128UL
+#define GUEST_PAGES	256UL
+
+#define BUF_START_GFN	(GUEST_PAGES - BUF_PAGES)
+#define BUF_START_ADDR	(BUF_START_GFN << PAGE_SHIFT)
+
+#define KEY_BITS_ACC	0xf0
+#define KEY_BIT_F	0x08
+#define KEY_BIT_R	0x04
+#define KEY_BIT_C	0x02
+
+#define KEY_BITS_RC	(KEY_BIT_R | KEY_BIT_C)
+#define KEY_BITS_ALL	(KEY_BITS_ACC | KEY_BIT_F | KEY_BITS_RC)
+
+static unsigned char tmp[BUF_PAGES];
+static unsigned char old[BUF_PAGES];
+static unsigned char expected[BUF_PAGES];
+
+static int _get_skeys(struct kvm_vcpu *vcpu, unsigned char skeys[])
+{
+	struct kvm_s390_skeys skeys_ioctl = {
+		.start_gfn = BUF_START_GFN,
+		.count = BUF_PAGES,
+		.skeydata_addr = (unsigned long)skeys,
+	};
+
+	return __vm_ioctl(vcpu->vm, KVM_S390_GET_SKEYS, &skeys_ioctl);
+}
+
+static void get_skeys(struct kvm_vcpu *vcpu, unsigned char skeys[])
+{
+	int r = _get_skeys(vcpu, skeys);
+
+	TEST_ASSERT(!r, "Failed to get storage keys, r=%d", r);
+}
+
+static void set_skeys(struct kvm_vcpu *vcpu, unsigned char skeys[])
+{
+	struct kvm_s390_skeys skeys_ioctl = {
+		.start_gfn = BUF_START_GFN,
+		.count = BUF_PAGES,
+		.skeydata_addr = (unsigned long)skeys,
+	};
+	int r;
+
+	r = __vm_ioctl(vcpu->vm, KVM_S390_SET_SKEYS, &skeys_ioctl);
+	TEST_ASSERT(!r, "Failed to set storage keys, r=%d", r);
+}
+
+static int do_keyop(struct kvm_vcpu *vcpu, int op, unsigned long page_idx, unsigned char skey)
+{
+	struct kvm_s390_keyop keyop = {
+		.guest_addr = BUF_START_ADDR + page_idx * PAGE_SIZE,
+		.key = skey,
+		.operation = op,
+	};
+	int r;
+
+	r = __vm_ioctl(vcpu->vm, KVM_S390_KEYOP, &keyop);
+	TEST_ASSERT(!r, "Failed to perform keyop, r=%d", r);
+	TEST_ASSERT((keyop.key & 1) == 0,
+		    "Last bit of key is 1, should be 0! page %lu, new key=%#x, old key=%#x",
+		    page_idx, skey, keyop.key);
+
+	return keyop.key;
+}
+
+static void fault_in_buffer(struct kvm_vcpu *vcpu, int where, int cur_loc)
+{
+	unsigned long i;
+	int r;
+
+	if (where != cur_loc)
+		return;
+
+	for (i = 0; i < BUF_PAGES; i++) {
+		r = ioctl(vcpu->fd, KVM_S390_VCPU_FAULT, BUF_START_ADDR + i * PAGE_SIZE);
+		TEST_ASSERT(!r, "Faulting in buffer page %lu, r=%d", i, r);
+	}
+}
+
+static inline void set_pattern(unsigned char skeys[])
+{
+	int i;
+
+	for (i = 0; i < BUF_PAGES; i++)
+		skeys[i] = i << 1;
+}
+
+static void dump_sk(const unsigned char skeys[], const char *descr)
+{
+	int i, j;
+
+	fprintf(stderr, "# %s:\n", descr);
+	for (i = 0; i < BUF_PAGES; i += 32) {
+		fprintf(stderr, "# %3d: ", i);
+		for (j = 0; j < 32; j++)
+			fprintf(stderr, "%02x ", skeys[i + j]);
+		fprintf(stderr, "\n");
+	}
+}
+
+static inline void compare(const unsigned char what[], const unsigned char expected[],
+			   const char *descr, int fault_in_loc)
+{
+	int i;
+
+	for (i = 0; i < BUF_PAGES; i++) {
+		if (expected[i] != what[i]) {
+			dump_sk(expected, "Expected");
+			dump_sk(what, "Got");
+		}
+		TEST_ASSERT(expected[i] == what[i],
+			    "%s! fault-in location %d, page %d, expected %#x, got %#x",
+			    descr, fault_in_loc, i, expected[i], what[i]);
+	}
+}
+
+static inline void clear_all(void)
+{
+	memset(tmp, 0, BUF_PAGES);
+	memset(old, 0, BUF_PAGES);
+	memset(expected, 0, BUF_PAGES);
+}
+
+static void test_init(struct kvm_vcpu *vcpu, int fault_in)
+{
+	/* Set all storage keys to zero */
+	fault_in_buffer(vcpu, fault_in, 1);
+	set_skeys(vcpu, expected);
+
+	fault_in_buffer(vcpu, fault_in, 2);
+	get_skeys(vcpu, tmp);
+	compare(tmp, expected, "Setting keys not zero", fault_in);
+
+	/* Set storage keys to a sequential pattern */
+	fault_in_buffer(vcpu, fault_in, 3);
+	set_pattern(expected);
+	set_skeys(vcpu, expected);
+
+	fault_in_buffer(vcpu, fault_in, 4);
+	get_skeys(vcpu, tmp);
+	compare(tmp, expected, "Setting storage keys failed", fault_in);
+}
+
+static void test_rrbe(struct kvm_vcpu *vcpu, int fault_in)
+{
+	unsigned char k;
+	int i;
+
+	/* Set storage keys to a sequential pattern */
+	fault_in_buffer(vcpu, fault_in, 1);
+	set_pattern(expected);
+	set_skeys(vcpu, expected);
+
+	/* Call the RRBE KEYOP ioctl on each page and verify the result */
+	fault_in_buffer(vcpu, fault_in, 2);
+	for (i = 0; i < BUF_PAGES; i++) {
+		k = do_keyop(vcpu, KVM_S390_KEYOP_RRBE, i, 0xff);
+		TEST_ASSERT((expected[i] & KEY_BITS_RC) == k,
+			    "Old R or C value mismatch! expected: %#x, got %#x",
+			    expected[i] & KEY_BITS_RC, k);
+		if (i == BUF_PAGES / 2)
+			fault_in_buffer(vcpu, fault_in, 3);
+	}
+
+	for (i = 0; i < BUF_PAGES; i++)
+		expected[i] &= ~KEY_BIT_R;
+
+	/* Verify that only the R bit has been cleared */
+	fault_in_buffer(vcpu, fault_in, 4);
+	get_skeys(vcpu, tmp);
+	compare(tmp, expected, "New value mismatch", fault_in);
+}
+
+static void test_iske(struct kvm_vcpu *vcpu, int fault_in)
+{
+	int i;
+
+	/* Set storage keys to a sequential pattern */
+	fault_in_buffer(vcpu, fault_in, 1);
+	set_pattern(expected);
+	set_skeys(vcpu, expected);
+
+	/* Call the ISKE KEYOP ioctl on each page and verify the result */
+	fault_in_buffer(vcpu, fault_in, 2);
+	for (i = 0; i < BUF_PAGES; i++) {
+		tmp[i] = do_keyop(vcpu, KVM_S390_KEYOP_ISKE, i, 0xff);
+		if (i == BUF_PAGES / 2)
+			fault_in_buffer(vcpu, fault_in, 3);
+	}
+	compare(tmp, expected, "Old value mismatch", fault_in);
+
+	/* Check storage keys have not changed */
+	fault_in_buffer(vcpu, fault_in, 4);
+	get_skeys(vcpu, tmp);
+	compare(tmp, expected, "Storage keys values changed", fault_in);
+}
+
+static void test_sske(struct kvm_vcpu *vcpu, int fault_in)
+{
+	int i;
+
+	/* Set storage keys to a sequential pattern */
+	fault_in_buffer(vcpu, fault_in, 1);
+	set_pattern(tmp);
+	set_skeys(vcpu, tmp);
+
+	/* Call the SSKE KEYOP ioctl on each page and verify the result */
+	fault_in_buffer(vcpu, fault_in, 2);
+	for (i = 0; i < BUF_PAGES; i++) {
+		expected[i] = ~tmp[i] & KEY_BITS_ALL;
+		/* Set the new storage keys to be the bit-inversion of the previous ones */
+		old[i] = do_keyop(vcpu, KVM_S390_KEYOP_SSKE, i, expected[i] | 1);
+		if (i == BUF_PAGES / 2)
+			fault_in_buffer(vcpu, fault_in, 3);
+	}
+	compare(old, tmp, "Old value mismatch", fault_in);
+
+	/* Verify that the storage keys have been set correctly */
+	fault_in_buffer(vcpu, fault_in, 4);
+	get_skeys(vcpu, tmp);
+	compare(tmp, expected, "New value mismatch", fault_in);
+}
+
+static struct testdef {
+	const char *name;
+	void (*test)(struct kvm_vcpu *vcpu, int fault_in_location);
+	int n_fault_in_locations;
+} testplan[] = {
+	{ "Initialization", test_init, 5 },
+	{ "RRBE", test_rrbe, 5 },
+	{ "ISKE", test_iske, 5 },
+	{ "SSKE", test_sske, 5 },
+};
+
+static void run_test(void (*the_test)(struct kvm_vcpu *, int), int fault_in_location)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	int r;
+
+	vm = vm_create_barebones();
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, GUEST_PAGES, 0);
+	vcpu = __vm_vcpu_add(vm, 0);
+
+	r = _get_skeys(vcpu, tmp);
+	TEST_ASSERT(r == KVM_S390_GET_SKEYS_NONE,
+		    "Storage keys are not disabled initially, r=%d", r);
+
+	clear_all();
+
+	the_test(vcpu, fault_in_location);
+
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	int i, f;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_KEYOP));
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_UCONTROL));
+
+	ksft_print_header();
+	for (i = 0, f = 0; i < ARRAY_SIZE(testplan); i++)
+		f += testplan[i].n_fault_in_locations;
+	ksft_set_plan(f);
+
+	for (i = 0; i < ARRAY_SIZE(testplan); i++) {
+		for (f = 0; f < testplan[i].n_fault_in_locations; f++) {
+			run_test(testplan[i].test, f);
+			ksft_test_result_pass("%s (fault-in location %d)\n", testplan[i].name, f);
+		}
+	}
+
+	ksft_finished();	/* Print results and exit() accordingly */
+}
-- 
2.52.0