From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 01/28] KVM: s390: Refactor pgste lock and unlock functions
Date: Mon, 22 Dec 2025 17:50:06 +0100
Message-ID: <20251222165033.162329-2-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Move the pgste lock and unlock functions back into mm/pgtable.c and
duplicate them in mm/gmap_helpers.c to avoid function name collisions
later on.
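
For illustration, a minimal usage sketch (hypothetical caller, not part
of this patch): the two helpers implement a bit spinlock on the PCL bit
of the PGSTE that sits behind each pte, and any PGSTE update is
expected to be bracketed by them.

	/* Hypothetical caller, assuming a valid ptep and CONFIG_PGSTE=y. */
	static void example_update_pgste(pte_t *ptep)
	{
		pgste_t pgste;

		/* Spins until PGSTE_PCL_BIT is clear, then sets it. */
		pgste = pgste_get_lock(ptep);
		/* ... read or modify the PGSTE value here ... */
		/* Stores the value back with PGSTE_PCL_BIT cleared. */
		pgste_set_unlock(ptep, pgste);
	}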

Signed-off-by: Claudio Imbrenda
---
 arch/s390/include/asm/pgtable.h | 22 ----------------------
 arch/s390/mm/gmap_helpers.c     | 23 ++++++++++++++++++++++-
 arch/s390/mm/pgtable.c          | 23 ++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index bca9b29778c3..8194a2b12ecf 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -2040,26 +2040,4 @@ static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
 	return res;
 }
 
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-	unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
 #endif /* _S390_PAGE_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index d41b19925a5a..4fba13675950 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -15,7 +15,6 @@
 #include
 #include
 #include
-#include
 
 /**
  * ptep_zap_softleaf_entry() - discard a software leaf entry.
@@ -35,6 +34,28 @@ static void ptep_zap_softleaf_entry(struct mm_struct *mm, softleaf_t entry)
 		free_swap_and_cache(entry);
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 /**
  * gmap_helper_zap_one_page() - discard a page if it was swapped.
  * @mm: the mm
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 666adcd681ab..08743c1dac2f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
@@ -116,6 +115,28 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 static inline pgste_t pgste_get(pte_t *ptep)
 {
 	unsigned long pgste = 0;
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 02/28] KVM: s390: add P bit in table entry bitfields, move union vaddress
Date: Mon, 22 Dec 2025 17:50:07 +0100
Message-ID: <20251222165033.162329-3-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add the P bit to the hardware definition of region 3 and segment table
entries. Move union vaddress from kvm/gaccess.c to asm/dat-bits.h.
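
For illustration, a sketch of what the union is for (hypothetical
snippet with a made-up address, not part of this patch): the first set
of bitfields splits a 64-bit virtual address into the per-level DAT
table indexes, so a table walker can read each index without explicit
shifts and masks. On s390 the bitfields start at the most significant
bit.

	/* Illustrative only: decode a virtual address into DAT indexes. */
	union vaddress vaddr = { .addr = 0x0000123456789abcUL };
	unsigned long region1_index = vaddr.rfx;	/* bits 0-10  */
	unsigned long segment_index = vaddr.sx;		/* bits 33-43 */
	unsigned long byte_offset   = vaddr.bx;		/* bits 52-63 */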

Signed-off-by: Claudio Imbrenda
Reviewed-by: Christian Borntraeger
Reviewed-by: Steffen Eiden
Reviewed-by: Christoph Schlameuss
---
 arch/s390/include/asm/dat-bits.h | 32 ++++++++++++++++++++++++++++++--
 arch/s390/kvm/gaccess.c          | 26 --------------------------
 2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/s390/include/asm/dat-bits.h b/arch/s390/include/asm/dat-bits.h
index 8d65eec2f124..c40874e0e426 100644
--- a/arch/s390/include/asm/dat-bits.h
+++ b/arch/s390/include/asm/dat-bits.h
@@ -9,6 +9,32 @@
 #ifndef _S390_DAT_BITS_H
 #define _S390_DAT_BITS_H
 
+/*
+ * vaddress union in order to easily decode a virtual address into its
+ * region first index, region second index etc. parts.
+ */
+union vaddress {
+	unsigned long addr;
+	struct {
+		unsigned long rfx : 11;
+		unsigned long rsx : 11;
+		unsigned long rtx : 11;
+		unsigned long sx  : 11;
+		unsigned long px  : 8;
+		unsigned long bx  : 12;
+	};
+	struct {
+		unsigned long rfx01 : 2;
+		unsigned long	    : 9;
+		unsigned long rsx01 : 2;
+		unsigned long	    : 9;
+		unsigned long rtx01 : 2;
+		unsigned long	    : 9;
+		unsigned long sx01  : 2;
+		unsigned long	    : 29;
+	};
+};
+
 union asce {
 	unsigned long val;
 	struct {
@@ -98,7 +124,8 @@ union region3_table_entry {
 	struct {
 		unsigned long	: 53;
 		unsigned long fc: 1;	/* Format-Control */
-		unsigned long	: 4;
+		unsigned long p	: 1;	/* DAT-Protection Bit */
+		unsigned long	: 3;
 		unsigned long i	: 1;	/* Region-Invalid Bit */
 		unsigned long cr: 1;	/* Common-Region Bit */
 		unsigned long tt: 2;	/* Table-Type Bits */
@@ -140,7 +167,8 @@ union segment_table_entry {
 	struct {
 		unsigned long	: 53;
 		unsigned long fc: 1;	/* Format-Control */
-		unsigned long	: 4;
+		unsigned long p	: 1;	/* DAT-Protection Bit */
+		unsigned long	: 3;
 		unsigned long i	: 1;	/* Segment-Invalid Bit */
 		unsigned long cs: 1;	/* Common-Segment Bit */
 		unsigned long tt: 2;	/* Table-Type Bits */
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 41ca6b0ee7a9..d8347f7cbe51 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -20,32 +20,6 @@
 
 #define GMAP_SHADOW_FAKE_TABLE 1ULL
 
-/*
- * vaddress union in order to easily decode a virtual address into its
- * region first index, region second index etc. parts.
- */
-union vaddress {
-	unsigned long addr;
-	struct {
-		unsigned long rfx : 11;
-		unsigned long rsx : 11;
-		unsigned long rtx : 11;
-		unsigned long sx  : 11;
-		unsigned long px  : 8;
-		unsigned long bx  : 12;
-	};
-	struct {
-		unsigned long rfx01 : 2;
-		unsigned long	    : 9;
-		unsigned long rsx01 : 2;
-		unsigned long	    : 9;
-		unsigned long rtx01 : 2;
-		unsigned long	    : 9;
-		unsigned long sx01  : 2;
-		unsigned long	    : 29;
-	};
-};
-
 /*
  * raddress union which will contain the result (real or absolute address)
  * after a page table walk. The rfaa, sfaa and pfra members are used to
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 03/28] s390: Make UV folio operations work on whole folio
Date: Mon, 22 Dec 2025 17:50:08 +0100
Message-ID: <20251222165033.162329-4-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

uv_destroy_folio() and uv_convert_from_secure_folio() should work on
all pages in the folio, not just the first one. This was fine until
now, but it will become a problem with upcoming patches.
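
In other words (an illustrative sketch; the diff below is what actually
changes): a folio of order N covers 2^N pages, and the UV call is now
issued once per 4K page instead of once for the first page only.

	/* E.g. an order-2 folio covers 1 << 2 == 4 pages. */
	unsigned long i, nr = 1UL << folio_order(folio);

	for (i = 0; i < nr; i++)
		rc = uv_destroy(folio_to_phys(folio) + i * PAGE_SIZE);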

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kernel/uv.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index ed46950be86f..ca0849008c0d 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -134,14 +134,15 @@ static int uv_destroy(unsigned long paddr)
  */
 int uv_destroy_folio(struct folio *folio)
 {
+	unsigned long i;
 	int rc;
 
-	/* Large folios cannot be secure */
-	if (unlikely(folio_test_large(folio)))
-		return 0;
-
 	folio_get(folio);
-	rc = uv_destroy(folio_to_phys(folio));
+	for (i = 0; i < (1 << folio_order(folio)); i++) {
+		rc = uv_destroy(folio_to_phys(folio) + i * PAGE_SIZE);
+		if (rc)
+			break;
+	}
 	if (!rc)
 		clear_bit(PG_arch_1, &folio->flags.f);
 	folio_put(folio);
@@ -183,14 +184,15 @@ EXPORT_SYMBOL_GPL(uv_convert_from_secure);
  */
 int uv_convert_from_secure_folio(struct folio *folio)
 {
+	unsigned long i;
 	int rc;
 
-	/* Large folios cannot be secure */
-	if (unlikely(folio_test_large(folio)))
-		return 0;
-
 	folio_get(folio);
-	rc = uv_convert_from_secure(folio_to_phys(folio));
+	for (i = 0; i < (1 << folio_order(folio)); i++) {
+		rc = uv_convert_from_secure(folio_to_phys(folio) + i * PAGE_SIZE);
+		if (rc)
+			break;
+	}
 	if (!rc)
 		clear_bit(PG_arch_1, &folio->flags.f);
 	folio_put(folio);
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 04/28] s390: Move sske_frame() to a header
Date: Mon, 22 Dec 2025 17:50:09 +0100
Message-ID: <20251222165033.162329-5-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Move the sske_frame() function to asm/pgtable.h, so it can be used in
other modules too. Opportunistically convert the .insn opcode
specification to the appropriate mnemonic.
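
For reference, a sketch of the kind of caller this enables (an
assumption based on the existing user in mm/pageattr.c, not literal
kernel code): SSKE with the multiple-block control sets the storage
key of one or more frames and returns the address of the next frame
still to be processed, so key initialization is a simple loop.

	/* Illustrative only: set the default storage key over
	 * an assumed range [start, end). */
	unsigned long addr = start;

	while (addr < end)
		addr = sske_frame(addr, PAGE_DEFAULT_KEY);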

Signed-off-by: Claudio Imbrenda
Reviewed-by: Christian Borntraeger
Reviewed-by: Steffen Eiden
Reviewed-by: Christoph Schlameuss
Reviewed-by: Nina Schoetterl-Glausch
---
 arch/s390/include/asm/pgtable.h | 7 +++++++
 arch/s390/mm/pageattr.c         | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 8194a2b12ecf..73c30b811b98 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1136,6 +1136,13 @@ static inline pte_t pte_mkhuge(pte_t pte)
 }
 #endif
 
+static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
+{
+	asm volatile("sske %[skey],%[addr],1"
+		     : [addr] "+a" (addr) : [skey] "d" (skey));
+	return addr;
+}
+
 #define IPTE_GLOBAL	0
 #define IPTE_LOCAL	1
 
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index d3ce04a4b248..bb29c38ae624 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -16,13 +16,6 @@
 #include
 #include
 
-static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
-{
-	asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],1,0"
-		     : [addr] "+a" (addr) : [skey] "d" (skey));
-	return addr;
-}
-
 void __storage_key_init_range(unsigned long start, unsigned long end)
 {
 	unsigned long boundary, size;
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 05/28] KVM: s390: Add gmap_helper_try_set_pte_unused()
Date: Mon, 22 Dec 2025 17:50:10 +0100
Message-ID: <20251222165033.162329-6-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add gmap_helper_try_set_pte_unused() to mark userspace ptes as unused.
Core mm code will use that information to discard unused pages instead
of attempting to swap them.
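
The intended call pattern, going by the kernel-doc added below
(hypothetical caller; the helper itself only best-effort sets
_PAGE_UNUSED and backs off when the pte lock is contended):

	/* Hypothetical hook on the unmap path. */
	static void example_unmap_hook(struct mm_struct *mm, unsigned long vmaddr)
	{
		/* kvm->mmu_lock is already held in write mode here,
		 * which satisfies the context requirement below. */
		gmap_helper_try_set_pte_unused(mm, vmaddr);
	}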

Signed-off-by: Claudio Imbrenda
Reviewed-by: Nico Boehr
Tested-by: Nico Boehr
Acked-by: Christoph Schlameuss
---
 arch/s390/include/asm/gmap_helpers.h |  1 +
 arch/s390/mm/gmap_helpers.c          | 79 ++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/s390/include/asm/gmap_helpers.h b/arch/s390/include/asm/gmap_helpers.h
index 5356446a61c4..2d3ae421077e 100644
--- a/arch/s390/include/asm/gmap_helpers.h
+++ b/arch/s390/include/asm/gmap_helpers.h
@@ -11,5 +11,6 @@
 void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr);
 void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned long end);
 int gmap_helper_disable_cow_sharing(void);
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr);
 
 #endif /* _ASM_S390_GMAP_HELPERS_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 4fba13675950..4864cb35fc25 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -129,6 +129,85 @@ void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned lo
 }
 EXPORT_SYMBOL_GPL(gmap_helper_discard);
 
+/**
+ * gmap_helper_try_set_pte_unused() - mark a pte entry as unused
+ * @mm: the mm
+ * @vmaddr: the userspace address whose pte is to be marked
+ *
+ * Mark the pte corresponding the given address as unused. This will cause
+ * core mm code to just drop this page instead of swapping it.
+ *
+ * This function needs to be called with interrupts disabled (for example
+ * while holding a spinlock), or while holding the mmap lock. Normally this
+ * function is called as a result of an unmap operation, and thus KVM common
+ * code will already hold kvm->mmu_lock in write mode.
+ *
+ * Context: Needs to be called while holding the mmap lock or with interrupts
+ *          disabled.
+ */
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
+{
+	pmd_t *pmdp, pmd, pmdval;
+	pud_t *pudp, pud;
+	p4d_t *p4dp, p4d;
+	pgd_t *pgdp, pgd;
+	spinlock_t *ptl;	/* Lock for the host (userspace) page table */
+	pte_t *ptep;
+
+	pgdp = pgd_offset(mm, vmaddr);
+	pgd = pgdp_get(pgdp);
+	if (pgd_none(pgd) || !pgd_present(pgd))
+		return;
+
+	p4dp = p4d_offset(pgdp, vmaddr);
+	p4d = p4dp_get(p4dp);
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		return;
+
+	pudp = pud_offset(p4dp, vmaddr);
+	pud = pudp_get(pudp);
+	if (pud_none(pud) || pud_leaf(pud) || !pud_present(pud))
+		return;
+
+	pmdp = pmd_offset(pudp, vmaddr);
+	pmd = pmdp_get_lockless(pmdp);
+	if (pmd_none(pmd) || pmd_leaf(pmd) || !pmd_present(pmd))
+		return;
+
+	ptep = pte_offset_map_rw_nolock(mm, pmdp, vmaddr, &pmdval, &ptl);
+	if (!ptep)
+		return;
+
+	/*
+	 * Several paths exists that takes the ptl lock and then call the
+	 * mmu_notifier, which takes the mmu_lock. The unmap path, instead,
+	 * takes the mmu_lock in write mode first, and then potentially
+	 * calls this function, which takes the ptl lock. This can lead to a
+	 * deadlock.
+	 * The unused page mechanism is only an optimization, if the
+	 * _PAGE_UNUSED bit is not set, the unused page is swapped as normal
+	 * instead of being discarded.
+	 * If the lock is contended the bit is not set and the deadlock is
+	 * avoided.
+	 */
+	if (spin_trylock(ptl)) {
+		/*
+		 * Make sure the pte we are touching is still the correct
+		 * one. In theory this check should not be needed, but
+		 * better safe than sorry.
+		 * Disabling interrupts or holding the mmap lock is enough to
+		 * guarantee that no concurrent updates to the page tables
+		 * are possible.
+		 */
+		if (likely(pmd_same(pmdval, pmdp_get_lockless(pmdp))))
+			__atomic64_or(_PAGE_UNUSED, (long *)ptep);
+		spin_unlock(ptl);
+	}
+
+	pte_unmap(ptep);
+}
+EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
+
 static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
 				   unsigned long end, struct mm_walk *walk)
 {
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 06/28] KVM: s390: Introduce import_lock
Date: Mon, 22 Dec 2025 17:50:11 +0100
Message-ID: <20251222165033.162329-7-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Introduce import_lock to avoid future races when converting pages to
secure.
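
A sketch of how the lock is meant to be used by later patches in this
series (hypothetical; this patch only adds and initializes the mutex):

	/* Hypothetical import-like operation, serialized by the new mutex. */
	mutex_lock(&kvm->arch.pv.import_lock);
	/* ... convert guest pages to secure without racing other imports ... */
	mutex_unlock(&kvm->arch.pv.import_lock);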

Signed-off-by: Claudio Imbrenda
---
 arch/s390/include/asm/kvm_host.h | 2 ++
 arch/s390/kvm/kvm-s390.c         | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ae1223264d3c..3dbddb7c60a9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -630,6 +630,8 @@ struct kvm_s390_pv {
 	void *set_aside;
 	struct list_head need_cleanup;
 	struct mmu_notifier mmu_notifier;
+	/* Protects against concurrent import-like operations */
+	struct mutex import_lock;
 };
 
 struct kvm_arch {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 56a50524b3ee..cd39b2f099ca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3330,6 +3330,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	char debug_name[16];
 	int i, rc;
 
+	mutex_init(&kvm->arch.pv.import_lock);
+
 	rc = -EINVAL;
#ifdef CONFIG_KVM_S390_UCONTROL
 	if (type & ~KVM_VM_S390_UCONTROL)
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 07/28] KVM: s390: Export two functions
Date: Mon, 22 Dec 2025 17:50:12 +0100
Message-ID: <20251222165033.162329-8-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Export __make_folio_secure() and s390_wiggle_split_folio(), as they
will be needed by KVM.
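
A rough sketch of the expected KVM-side usage (hypothetical; the
locking requirement comes from the comment on __make_folio_secure()
visible in the diff below, and mm and uvcb are assumed to be set up by
the caller):

	/* Hypothetical caller: the folio must be locked for the UV call. */
	folio_lock(folio);
	rc = __make_folio_secure(folio, uvcb);
	folio_unlock(folio);
	if (rc == -EAGAIN)
		rc = s390_wiggle_split_folio(mm, folio);	/* then retry */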

Signed-off-by: Claudio Imbrenda
---
 arch/s390/include/asm/uv.h | 2 ++
 arch/s390/kernel/uv.c      | 6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 8018549a1ad2..0744874ca6df 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -632,6 +632,8 @@ int uv_destroy_folio(struct folio *folio);
 int uv_destroy_pte(pte_t pte);
 int uv_convert_from_secure_pte(pte_t pte);
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb);
+int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio);
+int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb);
 int uv_convert_from_secure(unsigned long paddr);
 int uv_convert_from_secure_folio(struct folio *folio);
 
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index ca0849008c0d..cb4e8089fbca 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -281,7 +281,7 @@ static int expected_folio_refs(struct folio *folio)
  * (it's the same logic as split_folio()), and the folio must be
  * locked.
  */
-static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
+int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;
 
@@ -311,6 +311,7 @@ static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 		return -EAGAIN;
 	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
+EXPORT_SYMBOL(__make_folio_secure);
 
 static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct uv_cb_header *uvcb)
 {
@@ -339,7 +340,7 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
  *	but another attempt can be made;
  * -EINVAL in case of other folio splitting errors. See split_folio().
  */
-static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
+int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
 {
	int rc, tried_splits;
 
@@ -411,6 +412,7 @@ static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
	}
	return -EAGAIN;
 }
+EXPORT_SYMBOL_GPL(s390_wiggle_split_folio);
 
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb)
 {
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 08/28] s390/mm: Warn if uv_convert_from_secure_pte() fails
Date: Mon, 22 Dec 2025 17:50:13 +0100
Message-ID: <20251222165033.162329-9-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

If uv_convert_from_secure_pte() fails, the page becomes unusable by
the host. Such a failure can only occur in case of a hardware
malfunction or a serious KVM bug. When the unusable page is later
reused, the system can run into problems and hang. Print a warning to
aid debugging such unlikely scenarios.
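Condensed, the pattern in all three hunks below is the same "warn once
and deliberately leak" compromise; the helper here is made up for
illustration, while mm_is_protected(), pte_present() and
uv_convert_from_secure_pte() are the real interfaces:

	/* Illustrative only: report the failure, intentionally leak the page */
	static inline void convert_on_clear_sketch(struct mm_struct *mm, pte_t res)
	{
		if (mm_is_protected(mm) && pte_present(res))
			WARN_ON_ONCE(uv_convert_from_secure_pte(res));
	}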
Signed-off-by: Claudio Imbrenda
---
 arch/s390/include/asm/pgtable.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 73c30b811b98..04335f5e7f47 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1239,7 +1239,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
	/* At this point the reference through the mapping is still present */
	if (mm_is_protected(mm) && pte_present(res))
-		uv_convert_from_secure_pte(res);
+		WARN_ON_ONCE(uv_convert_from_secure_pte(res));
	return res;
 }
 
@@ -1257,7 +1257,7 @@ static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
	/* At this point the reference through the mapping is still present */
	if (mm_is_protected(vma->vm_mm) && pte_present(res))
-		uv_convert_from_secure_pte(res);
+		WARN_ON_ONCE(uv_convert_from_secure_pte(res));
	return res;
 }
 
@@ -1294,9 +1294,10 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
	/*
	 * If something went wrong and the page could not be destroyed, or
	 * if this is not a mm teardown, the slower export is used as
-	 * fallback instead.
+	 * fallback instead. If even that fails, print a warning and leak
+	 * the page, to avoid crashing the whole system.
	 */
-	uv_convert_from_secure_pte(res);
+	WARN_ON_ONCE(uv_convert_from_secure_pte(res));
	return res;
 }
 
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 09/28] KVM: s390: vsie: Pass gmap explicitly as parameter
Date: Mon, 22 Dec 2025 17:50:14 +0100
Message-ID: <20251222165033.162329-10-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Pass the gmap explicitly as a parameter, instead of just using
vsie_page->gmap. This will be used in upcoming patches.
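The resulting calling convention, condensed into a sketch (the
function names are from the patch, the body is abridged and
illustrative only): the shadow gmap is read from vsie_page once and
then handed down explicitly:

	/* Abridged sketch of vsie_run() below; not the real function body */
	static int vsie_run_once_sketch(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
	{
		struct gmap *sg = vsie_page->gmap;	/* read once... */
		int rc;

		rc = map_prefix(vcpu, vsie_page, sg);	/* ...then pass down */
		if (!rc)
			rc = do_vsie_run(vcpu, vsie_page, sg);
		return rc;
	}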
Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/vsie.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index b526621d2a1b..1dd54ca3070a 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -652,7 +652,7 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
  * - -EAGAIN if the caller can retry immediately
  * - -ENOMEM if out of memory
  */
-static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
	u64 prefix = scb_s->prefix << GUEST_PREFIX_SHIFT;
@@ -667,10 +667,9 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
	/* with mso/msl, the prefix lies at offset *mso* */
	prefix += scb_s->mso;
 
-	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL);
+	rc = kvm_s390_shadow_fault(vcpu, sg, prefix, NULL);
	if (!rc && (scb_s->ecb & ECB_TE))
-		rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-					   prefix + PAGE_SIZE, NULL);
+		rc = kvm_s390_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL);
	/*
	 * We don't have to mprotect, we will be called for all unshadows.
	 * SIE will detect if protection applies and trigger a validity.
@@ -951,7 +950,7 @@ static int inject_fault(struct kvm_vcpu *vcpu, __u16 code, __u64 vaddr,
  * - > 0 if control has to be given to guest 2
  * - < 0 if an error occurred
  */
-static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
	int rc;
 
@@ -960,8 +959,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
		return inject_fault(vcpu, PGM_PROTECTION,
				    current->thread.gmap_teid.addr * PAGE_SIZE, 1);
 
-	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				   current->thread.gmap_teid.addr * PAGE_SIZE, NULL);
+	rc = kvm_s390_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * PAGE_SIZE, NULL);
	if (rc > 0) {
		rc = inject_fault(vcpu, rc,
				  current->thread.gmap_teid.addr * PAGE_SIZE,
@@ -978,12 +976,10 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
  *
  * Will ignore any errors. The next SIE fault will do proper fault handling.
  */
-static void handle_last_fault(struct kvm_vcpu *vcpu,
-			      struct vsie_page *vsie_page)
+static void handle_last_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
	if (vsie_page->fault_addr)
-		kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				      vsie_page->fault_addr, NULL);
+		kvm_s390_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL);
	vsie_page->fault_addr = 0;
 }
 
@@ -1065,7 +1061,7 @@ static u64 vsie_get_register(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page,
	}
 }
 
-static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
 {
	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
	unsigned long pei_dest, pei_src, src, dest, mask, prefix;
@@ -1083,8 +1079,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
	src = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask;
	src = _kvm_s390_real_to_abs(prefix, src) + scb_s->mso;
 
-	rc_dest = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest);
-	rc_src = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src);
+	rc_dest = kvm_s390_shadow_fault(vcpu, sg, dest, &pei_dest);
+	rc_src = kvm_s390_shadow_fault(vcpu, sg, src, &pei_src);
	/*
	 * Either everything went well, or something non-critical went wrong
	 * e.g. because of a race. In either case, simply retry.
@@ -1144,7 +1140,7 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
  * - > 0 if control has to be given to guest 2
  * - < 0 if an error occurred
  */
-static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, struct gmap *sg)
	__releases(vcpu->kvm->srcu)
	__acquires(vcpu->kvm->srcu)
 {
@@ -1153,7 +1149,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
	int guest_bp_isolation;
	int rc = 0;
 
-	handle_last_fault(vcpu, vsie_page);
+	handle_last_fault(vcpu, vsie_page, sg);
 
	kvm_vcpu_srcu_read_unlock(vcpu);
 
@@ -1191,7 +1187,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
		goto xfer_to_guest_mode_check;
	}
	guest_timing_enter_irqoff();
-	rc = kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, vsie_page->gmap->asce);
+	rc = kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce);
	guest_timing_exit_irqoff();
	local_irq_enable();
 }
@@ -1215,7 +1211,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
	if (rc > 0)
		rc = 0; /* we could still have an icpt */
	else if (current->thread.gmap_int_code)
-		return handle_fault(vcpu, vsie_page);
+		return handle_fault(vcpu, vsie_page, sg);
 
	switch (scb_s->icptcode) {
	case ICPT_INST:
@@ -1233,7 +1229,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
		break;
	case ICPT_PARTEXEC:
		if (scb_s->ipa == 0xb254)
-			rc = vsie_handle_mvpg(vcpu, vsie_page);
+			rc = vsie_handle_mvpg(vcpu, vsie_page, sg);
		break;
	}
	return rc;
@@ -1330,15 +1326,17 @@ static void unregister_shadow_scb(struct kvm_vcpu *vcpu)
 static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 {
	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+	struct gmap *sg;
	int rc = 0;
 
	while (1) {
		rc = acquire_gmap_shadow(vcpu, vsie_page);
+		sg = vsie_page->gmap;
		if (!rc)
-			rc = map_prefix(vcpu, vsie_page);
+			rc = map_prefix(vcpu, vsie_page, sg);
		if (!rc) {
			update_intervention_requests(vsie_page);
-			rc = do_vsie_run(vcpu, vsie_page);
+			rc = do_vsie_run(vcpu, vsie_page, sg);
		}
		atomic_andnot(PROG_BLOCK_SIE, &scb_s->prog20);
 
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 10/28] KVM: s390: Enable KVM_GENERIC_MMU_NOTIFIER
Date: Mon, 22 Dec 2025 17:50:15 +0100
Message-ID: <20251222165033.162329-11-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Enable KVM_GENERIC_MMU_NOTIFIER, for now with empty placeholder
callbacks. Also enable KVM_MMU_LOCKLESS_AGING and define
KVM_HAVE_MMU_RWLOCK.
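With KVM_HAVE_MMU_RWLOCK defined, kvm->mmu_lock becomes an rwlock_t,
which is why the fault path below switches from the spinlock guard to
the read side of the lock. A condensed sketch (the helper is
illustrative only; scoped_guard() and kvm_release_faultin_page() are
the real interfaces used in the hunk):

	/* Illustrative only: release a faulted-in page under the read side
	 * of the (now rwlock) mmu_lock, as __kvm_s390_handle_dat_fault() does */
	static void release_faultin_sketch(struct kvm_vcpu *vcpu, struct page *page,
					   bool writable)
	{
		scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
			kvm_release_faultin_page(vcpu->kvm, page, false, writable);
		}
	}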
Signed-off-by: Claudio Imbrenda
Acked-by: Christian Borntraeger
Reviewed-by: Steffen Eiden
Reviewed-by: Christoph Schlameuss
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/Kconfig            |  2 ++
 arch/s390/kvm/kvm-s390.c         | 45 +++++++++++++++++++++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3dbddb7c60a9..6ba99870fc32 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include <...>
 #include <...>
 
+#define KVM_HAVE_MMU_RWLOCK
 #define KVM_MAX_VCPUS 255
 
 #define KVM_INTERNAL_MEM_SLOTS 1
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index f4ec8c1ce214..917ac740513e 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -30,6 +30,8 @@ config KVM
	select KVM_VFIO
	select MMU_NOTIFIER
	select VIRT_XFER_TO_GUEST_WORK
+	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_LOCKLESS_AGING
	help
	  Support hosting paravirtualized guest machines using the SIE
	  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index cd39b2f099ca..ec92e6361eab 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4805,7 +4805,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
	rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);
	if (!rc)
		rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);
-	scoped_guard(spinlock, &vcpu->kvm->mmu_lock) {
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
		kvm_release_faultin_page(vcpu->kvm, page, false, writable);
	}
	mmap_read_unlock(vcpu->arch.gmap->mm);
@@ -6021,6 +6021,49 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
	return;
 }
 
+/**
+ * kvm_test_age_gfn() - test young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status needs to be tested
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range is young, otherwise false.
+ */
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_age_gfn() - clear young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status needs to be cleared
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_unmap_gfn_range() - Unmap a range of guest addresses
+ * @kvm: the kvm instance
+ * @range: the range of guest page frames to invalidate
+ *
+ * This function always returns false because every DAT table modification
+ * has to use the appropriate DAT table manipulation instructions, which will
+ * keep the TLB coherent, hence no additional TLB flush is ever required.
+ *
+ * Context: called by KVM common code with the kvm mmu write lock held
+ * Return: false
+ */
+bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
 static inline unsigned long nonhyp_mask(int i)
 {
	unsigned int nonhyp_fai = (sclp.hmfai << i * 2) >> 30;
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 11/28] KVM: s390: Rename some functions in gaccess.c
Date: Mon, 22 Dec 2025 17:50:16 +0100
Message-ID: <20251222165033.162329-12-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Rename some functions in gaccess.c, adding a _gva or _gpa suffix to
indicate whether a function accepts a guest virtual or a guest
absolute address. This makes the code easier to understand.
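The convention at a glance, as a hypothetical read path (both renamed
functions are static in gaccess.c; the wrapper and the simplified
error handling are made up for illustration):

	/* Hypothetical illustration of the suffix convention */
	static int read_guest_sketch(struct kvm_vcpu *vcpu, unsigned long gva,
				     union asce asce, void *data, unsigned int len)
	{
		enum prot_type prot;
		unsigned long gpa;
		int rc;

		/* _gva: takes a guest virtual address and translates it */
		rc = guest_translate_gva(vcpu, gva, &gpa, asce, GACC_FETCH, &prot);
		if (rc)
			return rc;	/* pic or -errno; real code raises trans_exc() */
		/* _gpa: operates on the translated guest absolute address */
		return access_guest_page_gpa(vcpu->kvm, GACC_FETCH, gpa, data, len);
	}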
Signed-off-by: Claudio Imbrenda
Reviewed-by: Steffen Eiden
Reviewed-by: Christoph Schlameuss
---
 arch/s390/kvm/gaccess.c | 51 +++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d8347f7cbe51..9df868bddf9a 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -397,7 +397,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 }
 
 /**
- * guest_translate - translate a guest virtual into a guest absolute address
+ * guest_translate_gva() - translate a guest virtual into a guest absolute address
  * @vcpu: virtual cpu
  * @gva: guest virtual address
  * @gpa: points to where guest physical (absolute) address should be stored
@@ -417,9 +417,9 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  * the returned value is the program interruption code as defined
  * by the architecture
  */
-static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
-				     unsigned long *gpa, const union asce asce,
-				     enum gacc_mode mode, enum prot_type *prot)
+static unsigned long guest_translate_gva(struct kvm_vcpu *vcpu, unsigned long gva,
+					 unsigned long *gpa, const union asce asce,
+					 enum gacc_mode mode, enum prot_type *prot)
 {
	union vaddress vaddr = {.addr = gva};
	union raddress raddr = {.addr = gva};
@@ -600,8 +600,8 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
	return 1;
 }
 
-static int vm_check_access_key(struct kvm *kvm, u8 access_key,
-			       enum gacc_mode mode, gpa_t gpa)
+static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key,
+				   enum gacc_mode mode, gpa_t gpa)
 {
	u8 storage_key, access_control;
	bool fetch_protected;
@@ -663,9 +663,9 @@ static bool storage_prot_override_applies(u8 access_control)
	return access_control == PAGE_SPO_ACC;
 }
 
-static int vcpu_check_access_key(struct kvm_vcpu *vcpu, u8 access_key,
-				 enum gacc_mode mode, union asce asce, gpa_t gpa,
-				 unsigned long ga, unsigned int len)
+static int vcpu_check_access_key_gpa(struct kvm_vcpu *vcpu, u8 access_key,
+				     enum gacc_mode mode, union asce asce, gpa_t gpa,
+				     unsigned long ga, unsigned int len)
 {
	u8 storage_key, access_control;
	unsigned long hva;
@@ -757,7 +757,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
			return trans_exc(vcpu, PGM_PROTECTION, ga, ar, mode, PROT_TYPE_LA);
		if (psw_bits(*psw).dat) {
-			rc = guest_translate(vcpu, ga, &gpa, asce, mode, &prot);
+			rc = guest_translate_gva(vcpu, ga, &gpa, asce, mode, &prot);
			if (rc < 0)
				return rc;
		} else {
@@ -769,8 +769,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
		}
		if (rc)
			return trans_exc(vcpu, rc, ga, ar, mode, prot);
-		rc = vcpu_check_access_key(vcpu, access_key, mode, asce, gpa, ga,
-					   fragment_len);
+		rc = vcpu_check_access_key_gpa(vcpu, access_key, mode, asce, gpa, ga, fragment_len);
		if (rc)
			return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_KEYC);
		if (gpas)
@@ -782,8 +781,8 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
	return 0;
 }
 
-static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			     void *data, unsigned int len)
+static int access_guest_page_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+				 void *data, unsigned int len)
 {
	const unsigned int offset = offset_in_page(gpa);
	const gfn_t gfn = gpa_to_gfn(gpa);
@@ -798,9 +797,8 @@ static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
	return rc;
 }
 
-static int
-access_guest_page_with_key(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			   void *data, unsigned int len, u8 access_key)
+static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+					  void *data, unsigned int len, u8 access_key)
 {
	struct kvm_memory_slot *slot;
	bool writable;
	gfn_t gfn;
	hva_t hva;
	int rc;
 
-	gfn = gpa >> PAGE_SHIFT;
+	gfn = gpa_to_gfn(gpa);
	slot = gfn_to_memslot(kvm, gfn);
	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
 
@@ -841,7 +839,7 @@ int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
 
	while (min(PAGE_SIZE - offset, len) > 0) {
		fragment_len = min(PAGE_SIZE - offset, len);
-		rc = access_guest_page_with_key(kvm, mode, gpa, data, fragment_len, access_key);
+		rc = access_guest_page_with_key_gpa(kvm, mode, gpa, data, fragment_len, access_key);
		if (rc)
			return rc;
		offset = 0;
@@ -901,15 +899,14 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
	for (idx = 0; idx < nr_pages; idx++) {
		fragment_len = min(PAGE_SIZE - offset_in_page(gpas[idx]), len);
		if (try_fetch_prot_override && fetch_prot_override_applies(ga, fragment_len)) {
-			rc = access_guest_page(vcpu->kvm, mode, gpas[idx],
-					       data, fragment_len);
+			rc = access_guest_page_gpa(vcpu->kvm, mode, gpas[idx], data, fragment_len);
		} else {
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, access_key);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, access_key);
		}
		if (rc == PGM_PROTECTION && try_storage_prot_override)
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, PAGE_SPO_ACC);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, PAGE_SPO_ACC);
		if (rc)
			break;
		len -= fragment_len;
@@ -943,7 +940,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
	while (len && !rc) {
		gpa = kvm_s390_real_to_abs(vcpu, gra);
		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), len);
-		rc = access_guest_page(vcpu->kvm, mode, gpa, data, fragment_len);
+		rc = access_guest_page_gpa(vcpu->kvm, mode, gpa, data, fragment_len);
		len -= fragment_len;
		gra += fragment_len;
		data += fragment_len;
@@ -1134,7 +1131,7 @@ int check_gpa_range(struct kvm *kvm, unsigned long gpa, unsigned long length,
 
	while (length && !rc) {
		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), length);
-		rc = vm_check_access_key(kvm, access_key, mode, gpa);
+		rc = vm_check_access_key_gpa(kvm, access_key, mode, gpa);
		length -= fragment_len;
		gpa += fragment_len;
	}
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 12/28] KVM: s390: KVM-specific bitfields and helper functions
Date: Mon, 22 Dec 2025 17:50:17 +0100
Message-ID: <20251222165033.162329-13-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add KVM-specific s390 bitfields and helper functions to manipulate DAT
tables.
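A small usage sketch of the new types (the caller below is
hypothetical; _pte() and the bitfield names are from dat.h as added by
this patch): building a present, writable, not-yet-dirty pte leaves
the hardware DAT-protection bit set, so the first guest store still
faults and dirty tracking can work:

	/* Hypothetical usage, for illustration only */
	static union pte example_clean_writable_pte(kvm_pfn_t pfn)
	{
		/* writable, not dirty, not special */
		union pte pte = _pte(pfn, true, false, false);

		/* young + present, write-protected until the first store */
		WARN_ON(!pte.s.w || !pte.s.y || !pte.s.pr || !pte.h.p);
		return pte;
	}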
Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/dat.h | 720 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 720 insertions(+)
 create mode 100644 arch/s390/kvm/dat.h

diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
new file mode 100644
index 000000000000..4d2b7a7bf898
--- /dev/null
+++ b/arch/s390/kvm/dat.h
@@ -0,0 +1,720 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2024, 2025
+ * Author(s): Claudio Imbrenda
+ */
+
+#ifndef __KVM_S390_DAT_H
+#define __KVM_S390_DAT_H
+
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+
+#define _ASCE(x) ((union asce) { .val = (x), })
+#define NULL_ASCE _ASCE(0)
+
+enum {
+	_DAT_TOKEN_NONE = 0,
+	_DAT_TOKEN_PIC,
+};
+
+#define _CRSTE_TOK(l, t, p) ((union crste) {	\
+		.tok.i = 1,			\
+		.tok.tt = (l),			\
+		.tok.type = (t),		\
+		.tok.par = (p)			\
+	})
+#define _CRSTE_PIC(l, p) _CRSTE_TOK(l, _DAT_TOKEN_PIC, p)
+
+#define _CRSTE_HOLE(l) _CRSTE_PIC(l, PGM_ADDRESSING)
+#define _CRSTE_EMPTY(l) _CRSTE_TOK(l, _DAT_TOKEN_NONE, 0)
+
+#define _PMD_EMPTY _CRSTE_EMPTY(TABLE_TYPE_SEGMENT)
+
+#define _PTE_TOK(t, p) ((union pte) { .tok.i = 1, .tok.type = (t), .tok.par = (p) })
+#define _PTE_EMPTY _PTE_TOK(_DAT_TOKEN_NONE, 0)
+
+/* This fake table type is used for page table walks (both for normal page tables and vSIE) */
+#define TABLE_TYPE_PAGE_TABLE -1
+
+enum dat_walk_flags {
+	DAT_WALK_CONTINUE = 0x20,
+	DAT_WALK_IGN_HOLES = 0x10,
+	DAT_WALK_SPLIT = 0x08,
+	DAT_WALK_ALLOC = 0x04,
+	DAT_WALK_ANY = 0x02,
+	DAT_WALK_LEAF = 0x01,
+	DAT_WALK_DEFAULT = 0
+};
+
+#define DAT_WALK_SPLIT_ALLOC (DAT_WALK_SPLIT | DAT_WALK_ALLOC)
+#define DAT_WALK_ALLOC_CONTINUE (DAT_WALK_CONTINUE | DAT_WALK_ALLOC)
+#define DAT_WALK_LEAF_ALLOC (DAT_WALK_LEAF | DAT_WALK_ALLOC)
+
+union pte {
+	unsigned long val;
+	union page_table_entry h;
+	struct {
+		unsigned long    :56;	/* Hardware bits */
+		unsigned long u  : 1;	/* Page unused */
+		unsigned long s  : 1;	/* Special */
+		unsigned long w  : 1;	/* Writable */
+		unsigned long r  : 1;	/* Readable */
+		unsigned long d  : 1;	/* Dirty */
+		unsigned long y  : 1;	/* Young */
+		unsigned long sd : 1;	/* Soft dirty */
+		unsigned long pr : 1;	/* Present */
+	} s;
+	struct {
+		unsigned char hwbytes[7];
+		unsigned char swbyte;
+	};
+	union {
+		struct {
+			unsigned long type :16;	/* Token type */
+			unsigned long par  :16;	/* Token parameter */
+			unsigned long      :20;
+			unsigned long      : 1;	/* Must be 0 */
+			unsigned long i    : 1;	/* Must be 1 */
+			unsigned long      : 2;
+			unsigned long      : 7;
+			unsigned long pr   : 1;	/* Must be 0 */
+		};
+		struct {
+			unsigned long token:32;	/* Token and parameter */
+			unsigned long      :32;
+		};
+	} tok;
+};
+
+/* Soft dirty, needed as macro for atomic operations on ptes */
+#define _PAGE_SD 0x002
+
+/* Needed as macro to perform atomic operations */
+#define PGSTE_CMMA_D_BIT 0x0000000000008000UL /* CMMA dirty soft-bit */
+
+enum pgste_gps_usage {
+	PGSTE_GPS_USAGE_STABLE = 0,
+	PGSTE_GPS_USAGE_UNUSED,
+	PGSTE_GPS_USAGE_POT_VOLATILE,
+	PGSTE_GPS_USAGE_VOLATILE,
+};
+
+union pgste {
+	unsigned long val;
+	struct {
+		unsigned long acc          : 4;
+		unsigned long fp           : 1;
+		unsigned long              : 3;
+		unsigned long pcl          : 1;
+		unsigned long hr           : 1;
+		unsigned long hc           : 1;
+		unsigned long              : 2;
+		unsigned long gr           : 1;
+		unsigned long gc           : 1;
+		unsigned long              : 1;
+		unsigned long              :16;	/* val16 */
+		unsigned long zero         : 1;
+		unsigned long nodat        : 1;
+		unsigned long              : 4;
+		unsigned long usage        : 2;
+		unsigned long              : 8;
+		unsigned long cmma_d       : 1;	/* Dirty flag for CMMA bits */
+		unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+		unsigned long vsie_notif   : 1;	/* Referenced in a shadow table */
+		unsigned long              : 5;
+		unsigned long              : 8;
+	};
+	struct {
+		unsigned short hwbytes0;
+		unsigned short val16;	/* used to store chunked values, see dat_{s,g}et_ptval() */
+		unsigned short hwbytes4;
+		unsigned char flags;	/* maps to the software bits */
+		unsigned char hwbyte7;
+	} __packed;
+};
+
+union pmd {
+	unsigned long val;
+	union segment_table_entry h;
+	struct {
+		struct {
+			unsigned long              :44;	/* HW */
+			unsigned long              : 3;	/* Unused */
+			unsigned long              : 1;	/* HW */
+			unsigned long w            : 1;	/* Writable soft-bit */
+			unsigned long r            : 1;	/* Readable soft-bit */
+			unsigned long d            : 1;	/* Dirty */
+			unsigned long y            : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long              : 3;	/* HW */
+			unsigned long vsie_notif   : 1;	/* Referenced in a shadow table */
+			unsigned long              : 1;	/* Unused */
+			unsigned long              : 4;	/* HW */
+			unsigned long sd           : 1;	/* Soft-Dirty */
+			unsigned long pr           : 1;	/* Present */
+		} fc1;
+	} s;
+};
+
+union pud {
+	unsigned long val;
+	union region3_table_entry h;
+	struct {
+		struct {
+			unsigned long              :33;	/* HW */
+			unsigned long              :14;	/* Unused */
+			unsigned long              : 1;	/* HW */
+			unsigned long w            : 1;	/* Writable soft-bit */
+			unsigned long r            : 1;	/* Readable soft-bit */
+			unsigned long d            : 1;	/* Dirty */
+			unsigned long y            : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long              : 3;	/* HW */
+			unsigned long vsie_notif   : 1;	/* Referenced in a shadow table */
+			unsigned long              : 1;	/* Unused */
+			unsigned long              : 4;	/* HW */
+			unsigned long sd           : 1;	/* Soft-Dirty */
+			unsigned long pr           : 1;	/* Present */
+		} fc1;
+	} s;
+};
+
+union p4d {
+	unsigned long val;
+	union region2_table_entry h;
+};
+
+union pgd {
+	unsigned long val;
+	union region1_table_entry h;
+};
+
+union crste {
+	unsigned long val;
+	union {
+		struct {
+			unsigned long   :52;
+			unsigned long   : 1;
+			unsigned long fc: 1;
+			unsigned long p : 1;
+			unsigned long   : 1;
+			unsigned long   : 2;
+			unsigned long i : 1;
+			unsigned long   : 1;
+			unsigned long tt: 2;
+			unsigned long   : 2;
+		};
+		struct {
+			unsigned long to:52;
+			unsigned long   : 1;
+			unsigned long fc: 1;
+			unsigned long p : 1;
+			unsigned long   : 1;
+			unsigned long tf: 2;
+			unsigned long i : 1;
+			unsigned long   : 1;
+			unsigned long tt: 2;
+			unsigned long tl: 2;
+		} fc0;
+		struct {
+			unsigned long    :47;
+			unsigned long av : 1;	/* ACCF-Validity Control */
+			unsigned long acc: 4;	/* Access-Control Bits */
+			unsigned long f  : 1;	/* Fetch-Protection Bit */
+			unsigned long fc : 1;	/* Format-Control */
+			unsigned long p  : 1;	/* DAT-Protection Bit */
+			unsigned long iep: 1;	/* Instruction-Execution-Protection */
+			unsigned long    : 2;
+			unsigned long i  : 1;	/* Segment-Invalid Bit */
+			unsigned long cs : 1;	/* Common-Segment Bit */
+			unsigned long tt : 2;	/* Table-Type Bits */
+			unsigned long    : 2;
+		} fc1;
+	} h;
+	struct {
+		struct {
+			unsigned long              :47;
+			unsigned long              : 1;	/* HW (should be 0) */
+			unsigned long w            : 1;	/* Writable */
+			unsigned long r            : 1;	/* Readable */
+			unsigned long d            : 1;	/* Dirty */
+			unsigned long y            : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long              : 3;	/* HW */
+			unsigned long vsie_notif   : 1;	/* Referenced in a shadow table */
+			unsigned long              : 1;
+			unsigned long              : 4;	/* HW */
+			unsigned long sd           : 1;	/* Soft-Dirty */
+			unsigned long pr           : 1;	/* Present */
+		} fc1;
+	} s;
+	union {
+		struct {
+			unsigned long type :16;	/* Token type */
+			unsigned long par  :16;	/* Token parameter */
+			unsigned long      :26;
+			unsigned long i    : 1;	/* Must be 1 */
+			unsigned long      : 1;
+			unsigned long tt   : 2;
+			unsigned long      : 1;
+			unsigned long pr   : 1;	/* Must be 0 */
+		};
+		struct {
+			unsigned long token:32;	/* Token and parameter */
+			unsigned long      :32;
+		};
+	} tok;
+	union pmd pmd;
+	union pud pud;
+	union p4d p4d;
+	union pgd pgd;
+};
+
+union skey {
+	unsigned char skey;
+	struct {
+		unsigned char acc :4;
+		unsigned char fp  :1;
+		unsigned char r   :1;
+		unsigned char c   :1;
+		unsigned char zero:1;
+	};
+};
+
+static_assert(sizeof(union pgste) == sizeof(unsigned long));
+static_assert(sizeof(union pte) == sizeof(unsigned long));
+static_assert(sizeof(union pmd) == sizeof(unsigned long));
+static_assert(sizeof(union pud) == sizeof(unsigned long));
+static_assert(sizeof(union p4d) == sizeof(unsigned long));
+static_assert(sizeof(union pgd) == sizeof(unsigned long));
+static_assert(sizeof(union crste) == sizeof(unsigned long));
+static_assert(sizeof(union skey) == sizeof(char));
+
+struct segment_table {
+	union pmd pmds[_CRST_ENTRIES];
+};
+
+struct region3_table {
+	union pud puds[_CRST_ENTRIES];
+};
+
+struct region2_table {
+	union p4d p4ds[_CRST_ENTRIES];
+};
+
+struct region1_table {
+	union pgd pgds[_CRST_ENTRIES];
+};
+
+struct crst_table {
+	union {
+		union crste crstes[_CRST_ENTRIES];
+		struct segment_table segment;
+		struct region3_table region3;
+		struct region2_table region2;
+		struct region1_table region1;
+	};
+};
+
+struct page_table {
+	union pte ptes[_PAGE_ENTRIES];
+	union pgste pgstes[_PAGE_ENTRIES];
+};
+
+static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
+static_assert(sizeof(struct page_table) == PAGE_SIZE);
+
+/**
+ * _pte() - Useful constructor for union pte
+ * @pfn: the pfn this pte should point to.
+ * @writable: whether the pte should be writable.
+ * @dirty: whether the pte should be dirty.
+ * @special: whether the pte should be marked as special
+ *
+ * The pte is also marked as young and present. If the pte is marked as dirty,
+ * it gets marked as soft-dirty too. If the pte is not dirty, the hardware
+ * protect bit is set (independently of the write softbit); this way proper
+ * dirty tracking can be performed.
+ *
+ * Return: a union pte value.
+ */
+static inline union pte _pte(kvm_pfn_t pfn, bool writable, bool dirty, bool special)
+{
+	union pte res = { .val = PFN_PHYS(pfn) };
+
+	res.h.p = !dirty;
+	res.s.y = 1;
+	res.s.pr = 1;
+	res.s.w = writable;
+	res.s.d = dirty;
+	res.s.sd = dirty;
+	res.s.s = special;
+	return res;
+}
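A minimal usage sketch (illustrative only; 'pfn' is assumed to come from the
caller) of the dirty-tracking behaviour the comment describes:

	/* A writable but initially clean mapping. */
	union pte pte = _pte(pfn, true, false, false);

	/* Here pte.s.w == 1 but pte.h.p == 1: the entry stays hardware
	 * write-protected until the first write fault, so dirtying can
	 * still be tracked despite the write softbit being set. */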
+
+static inline union crste _crste_fc0(kvm_pfn_t pfn, int tt)
+{
+	union crste res = { .val = PFN_PHYS(pfn) };
+
+	res.h.tt = tt;
+	res.h.fc0.tl = _REGION_ENTRY_LENGTH;
+	res.h.fc0.tf = 0;
+	return res;
+}
+
+/**
+ * _crste_fc1() - Useful constructor for union crste with FC=1
+ * @pfn: the pfn this crste should point to.
+ * @tt: the table type
+ * @writable: whether the crste should be writable.
+ * @dirty: whether the crste should be dirty.
+ *
+ * The crste is also marked as young and present. If the crste is marked as
+ * dirty, it gets marked as soft-dirty too. If the crste is not dirty, the
+ * hardware protect bit is set (independently of the write softbit); this way
+ * proper dirty tracking can be performed.
+ *
+ * Return: a union crste value.
+ */
+static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool dirty)
+{
+	union crste res = { .val = PFN_PHYS(pfn) & _SEGMENT_MASK };
+
+	res.h.tt = tt;
+	res.h.p = !dirty;
+	res.h.fc = 1;
+	res.s.fc1.y = 1;
+	res.s.fc1.pr = 1;
+	res.s.fc1.w = writable;
+	res.s.fc1.d = dirty;
+	res.s.fc1.sd = dirty;
+	return res;
+}
+
+/**
+ * struct vsie_rmap - reverse mapping for shadow page table entries
+ * @next: pointer to next rmap in the list
+ * @r_gfn: virtual rmap address in the shadow guest address space
+ */
+struct vsie_rmap {
+	struct vsie_rmap *next;
+	union {
+		unsigned long val;
+		struct {
+			long level: 8;
+			unsigned long : 4;
+			unsigned long r_gfn:52;
+		};
+	};
+};
+
+static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
+
+static inline struct crst_table *crste_table_start(union crste *crstep)
+{
+	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
+}
+
+static inline struct page_table *pte_table_start(union pte *ptep)
+{
+	return (struct page_table *)ALIGN_DOWN((unsigned long)ptep, _PAGE_TABLE_SIZE);
+}
+
+static inline bool crdte_crste(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			       union asce asce)
+{
+	unsigned long dtt = 0x10 | new.h.tt << 2;
+	void *table = crste_table_start(crstep);
+
+	return crdte(old.val, new.val, table, dtt, gfn_to_gpa(gfn), asce.val);
+}
+
+/**
+ * idte_crste() - invalidate a crste entry using idte
+ * @crstep: pointer to the crste to be invalidated
+ * @gfn: a gfn mapped by the crste
+ * @opt: options for the idte instruction
+ * @asce: the ASCE
+ * @local: whether the operation is cpu-local
+ */
+static __always_inline void idte_crste(union crste *crstep, gfn_t gfn, unsigned long opt,
+				       union asce asce, int local)
+{
+	unsigned long table_origin = __pa(crste_table_start(crstep));
+	unsigned long gaddr = gfn_to_gpa(gfn) & HPAGE_MASK;
+
+	if (__builtin_constant_p(opt) && opt == 0) {
+		/* flush without guest asce */
+		asm volatile("idte %[table_origin],0,%[gaddr],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr] "a" (gaddr),
+			       [local] "i" (local)
+			     : "cc");
+	} else {
+		/* flush with guest asce */
+		asm volatile("idte %[table_origin],%[asce],%[gaddr_opt],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr_opt] "a" (gaddr | opt),
+			       [asce] "a" (asce.val), [local] "i" (local)
+			     : "cc");
+	}
+}
+
+static inline void dat_init_pgstes(struct page_table *pt, unsigned long val)
+{
+	memset64((void *)pt->pgstes, val, PTRS_PER_PTE);
+}
+
+static inline void dat_init_page_table(struct page_table *pt, unsigned long ptes,
+				       unsigned long pgstes)
+{
+	memset64((void *)pt->ptes, ptes, PTRS_PER_PTE);
+	dat_init_pgstes(pt, pgstes);
+}
+
+static inline gfn_t asce_end(union asce asce)
+{
+	return 1ULL << ((asce.dt + 1) * 11 + _SEGMENT_SHIFT - PAGE_SHIFT);
+}
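As a sanity check of the arithmetic in asce_end(): for a segment-table ASCE
(asce.dt == 0) it evaluates to 1ULL << (1 * 11 + 20 - 12) = 1ULL << 19 page
frames, i.e. 2 GiB, and for a region-first-table ASCE (asce.dt == 3) to
1ULL << (4 * 11 + 8) = 1ULL << 52 frames, i.e. the full 2^64-byte address
space, matching the coverage of the respective table types.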
+
+#define _CRSTE(x) ((union crste) { .val = _Generic((x),	\
+	union pgd : (x).val,					\
+	union p4d : (x).val,					\
+	union pud : (x).val,					\
+	union pmd : (x).val,					\
+	union crste : (x).val)})
+
+#define _CRSTEP(x) ((union crste *)_Generic((*(x)),		\
+	union pgd : (x),					\
+	union p4d : (x),					\
+	union pud : (x),					\
+	union pmd : (x),					\
+	union crste : (x)))
+
+#define _CRSTP(x) ((struct crst_table *)_Generic((*(x)),	\
+	struct crst_table : (x),				\
+	struct segment_table : (x),				\
+	struct region3_table : (x),				\
+	struct region2_table : (x),				\
+	struct region1_table : (x)))
+
+static inline bool asce_contains_gfn(union asce asce, gfn_t gfn)
+{
+	return gfn < asce_end(asce);
+}
+
+static inline bool is_pmd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_SEGMENT;
+}
+
+static inline bool is_pud(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION3;
+}
+
+static inline bool is_p4d(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION2;
+}
+
+static inline bool is_pgd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION1;
+}
+
+static inline phys_addr_t pmd_origin_large(union pmd pmd)
+{
+	return pmd.val & _SEGMENT_ENTRY_ORIGIN_LARGE;
+}
+
+static inline phys_addr_t pud_origin_large(union pud pud)
+{
+	return pud.val & _REGION3_ENTRY_ORIGIN_LARGE;
+}
+
+/**
+ * crste_origin_large() - Return the large frame origin of a large crste
+ * @crste: The crste whose origin is to be returned. Should be either a
+ *         region-3 table entry or a segment table entry, in both cases with
+ *         FC set to 1 (large pages).
+ *
+ * Return: The origin of the large frame pointed to by @crste, or -1 if the
+ *         crste was not large (wrong table type, or FC==0)
+ */
+static inline phys_addr_t crste_origin_large(union crste crste)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return pmd_origin_large(crste.pmd);
+	return pud_origin_large(crste.pud);
+}
+
+#define crste_origin(x) (_Generic((x),				\
+	union pmd : (x).val & _SEGMENT_ENTRY_ORIGIN,		\
+	union pud : (x).val & _REGION_ENTRY_ORIGIN,		\
+	union p4d : (x).val & _REGION_ENTRY_ORIGIN,		\
+	union pgd : (x).val & _REGION_ENTRY_ORIGIN))
+
+static inline unsigned long pte_origin(union pte pte)
+{
+	return pte.val & PAGE_MASK;
+}
+
+static inline bool pmd_prefix(union pmd pmd)
+{
+	return pmd.h.fc && pmd.s.fc1.prefix_notif;
+}
+
+static inline bool pud_prefix(union pud pud)
+{
+	return pud.h.fc && pud.s.fc1.prefix_notif;
+}
+
+static inline bool crste_leaf(union crste crste)
+{
+	return (crste.h.tt <= TABLE_TYPE_REGION3) && crste.h.fc;
+}
+
+static inline bool crste_prefix(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.prefix_notif;
+}
+
+static inline bool crste_dirty(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.d;
+}
+
+static inline union pgste *pgste_of(union pte *pte)
+{
+	return (union pgste *)(pte + _PAGE_ENTRIES);
+}
+
+static inline bool pte_hole(union pte pte)
+{
+	return pte.h.i && !pte.tok.pr && pte.tok.type != _DAT_TOKEN_NONE;
+}
+
+static inline bool _crste_hole(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type != _DAT_TOKEN_NONE;
+}
+
+#define crste_hole(x) _crste_hole(_CRSTE(x))
+
+static inline bool _crste_none(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type == _DAT_TOKEN_NONE;
+}
+
+#define crste_none(x) _crste_none(_CRSTE(x))
+
+static inline phys_addr_t large_pud_to_phys(union pud pud, gfn_t gfn)
+{
+	return pud_origin_large(pud) | (gfn_to_gpa(gfn) & ~_REGION3_MASK);
+}
+
+static inline phys_addr_t large_pmd_to_phys(union pmd pmd, gfn_t gfn)
+{
+	return pmd_origin_large(pmd) | (gfn_to_gpa(gfn) & ~_SEGMENT_MASK);
+}
+
+static inline phys_addr_t large_crste_to_phys(union crste crste, gfn_t gfn)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return large_pmd_to_phys(crste.pmd, gfn);
+	return large_pud_to_phys(crste.pud, gfn);
+}
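For instance (illustrative only; 'crste' is assumed to be a large, FC=1
segment entry covering the 1 MiB segment that contains 'gfn'):

	phys_addr_t pa = large_crste_to_phys(crste, gfn);
	/* pa == large-frame origin | byte offset of gfn within the segment */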
+
+static inline bool cspg_crste(union crste *crstep, union crste old, union crste new)
+{
+	return cspg(&crstep->val, old.val, new.val);
+}
+
+static inline struct page_table *dereference_pmd(union pmd pmd)
+{
+	return phys_to_virt(crste_origin(pmd));
+}
+
+static inline struct segment_table *dereference_pud(union pud pud)
+{
+	return phys_to_virt(crste_origin(pud));
+}
+
+static inline struct region3_table *dereference_p4d(union p4d p4d)
+{
+	return phys_to_virt(crste_origin(p4d));
+}
+
+static inline struct region2_table *dereference_pgd(union pgd pgd)
+{
+	return phys_to_virt(crste_origin(pgd));
+}
+
+static inline struct crst_table *_dereference_crste(union crste crste)
+{
+	if (unlikely(is_pmd(crste)))
+		return NULL;
+	return phys_to_virt(crste_origin(crste.pud));
+}
+
+#define dereference_crste(x) (_Generic((x),			\
+	union pud : _dereference_crste(_CRSTE(x)),		\
+	union p4d : _dereference_crste(_CRSTE(x)),		\
+	union pgd : _dereference_crste(_CRSTE(x)),		\
+	union crste : _dereference_crste(_CRSTE(x))))
+
+static inline struct crst_table *dereference_asce(union asce asce)
+{
+	return phys_to_virt(asce.val & _ASCE_ORIGIN);
+}
+
+static inline void asce_flush_tlb(union asce asce)
+{
+	__tlb_flush_idte(asce.val);
+}
+
+static inline bool pgste_get_trylock(union pte *ptep, union pgste *res)
+{
+	union pgste *pgstep = pgste_of(ptep);
+	union pgste old_pgste;
+
+	if (READ_ONCE(pgstep->val) & PGSTE_PCL_BIT)
+		return false;
+	old_pgste.val = __atomic64_or_barrier(PGSTE_PCL_BIT, &pgstep->val);
+	if (old_pgste.pcl)
+		return false;
+	old_pgste.pcl = 1;
+	*res = old_pgste;
+	return true;
+}
+
+static inline union pgste pgste_get_lock(union pte *ptep)
+{
+	union pgste res;
+
+	while (!pgste_get_trylock(ptep, &res))
+		cpu_relax();
+	return res;
+}
+
+static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
+{
+	pgste.pcl = 0;
+	barrier();
+	WRITE_ONCE(*pgste_of(ptep), pgste);
+}
+
+#endif /* __KVM_S390_DAT_H */
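The intended usage of the three pgste helpers above is a plain
lock/modify/publish sequence; a minimal sketch (illustrative only, 'ptep'
assumed to point into a struct page_table):

	union pgste pgste;

	pgste = pgste_get_lock(ptep);	/* spins until PGSTE_PCL_BIT is ours */
	pgste.cmma_d = 1;		/* e.g. modify a software bit */
	pgste_set_unlock(ptep, pgste);	/* clears pcl and publishes the pgste */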
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 13/28] KVM: s390: KVM page table management functions: allocation
Date: Mon, 22 Dec 2025 17:50:18 +0100
Message-ID: <20251222165033.162329-14-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds the boilerplate and functions for the allocation and
deallocation of DAT tables.
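As a usage sketch (illustrative only, based on the helpers this patch
introduces; error handling abbreviated), a caller fills the cache while it
may still sleep, then allocates from it in atomic context:

	struct kvm_s390_mmu_cache *mc;
	struct crst_table *crst;

	mc = kvm_s390_new_mmu_cache();	/* allocates and tops up the cache */
	if (!mc)
		return -ENOMEM;
	/* ... take kvm->mmu_lock ... */
	crst = kvm_s390_mmu_cache_alloc_crst(mc); /* atomic GFP fallback */
	/* ... release kvm->mmu_lock ... */
	kvm_s390_free_mmu_cache(mc);	/* frees mc and any unused entries */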

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/Makefile     |   1 +
 arch/s390/kvm/dat.c        | 103 +++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h        |  77 +++++++++++++++++++++++++++
 arch/s390/mm/page-states.c |   1 +
 4 files changed, 182 insertions(+)
 create mode 100644 arch/s390/kvm/dat.c

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 9a723c48b05a..84315d2f75fb 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
+kvm-y += dat.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
new file mode 100644
index 000000000000..c324a27f379f
--- /dev/null
+++ b/arch/s390/kvm/dat.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2020, 2024
+ * Author(s): Claudio Imbrenda
+ *            Martin Schwidefsky
+ *            David Hildenbrand
+ *            Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include "dat.h"
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc)
+{
+	void *o;
+
+	for ( ; mc->n_crsts < KVM_S390_MMU_CACHE_N_CRSTS; mc->n_crsts++) {
+		o = (void *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+		if (!o)
+			return -ENOMEM;
+		mc->crsts[mc->n_crsts] = o;
+	}
+	for ( ; mc->n_pts < KVM_S390_MMU_CACHE_N_PTS; mc->n_pts++) {
+		o = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->pts[mc->n_pts] = o;
+	}
+	for ( ; mc->n_rmaps < KVM_S390_MMU_CACHE_N_RMAPS; mc->n_rmaps++) {
+		o = kzalloc(sizeof(*mc->rmaps[0]), GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->rmaps[mc->n_rmaps] = o;
+	}
+	return 0;
+}
+
+static inline struct page_table *dat_alloc_pt_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct page_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_pt(mc);
+	if (res)
+		__arch_set_page_dat(res, 1);
+	return res;
+}
+
+static inline struct crst_table *dat_alloc_crst_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct crst_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_crst(mc);
+	if (res)
+		__arch_set_page_dat(res, 1UL << CRST_ALLOC_ORDER);
+	return res;
+}
+
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init)
+{
+	struct page *page;
+	void *virt;
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+	if (!page)
+		return NULL;
+	virt = page_to_virt(page);
+	__arch_set_page_dat(virt, 1UL << CRST_ALLOC_ORDER);
+	crst_table_init(virt, init);
+	return virt;
+}
+
+void dat_free_level(struct crst_table *table, bool owns_ptes)
+{
+	unsigned int i;
+
+	for (i = 0; i < _CRST_ENTRIES; i++) {
+		if (table->crstes[i].h.fc || table->crstes[i].h.i)
+			continue;
+		if (!is_pmd(table->crstes[i]))
+			dat_free_level(dereference_crste(table->crstes[i]), owns_ptes);
+		else if (owns_ptes)
+			dat_free_pt(dereference_pmd(table->crstes[i].pmd));
+	}
+	dat_free_crst(table);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 4d2b7a7bf898..4c75d3f75b33 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -418,6 +418,46 @@ struct vsie_rmap {
 
 static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
 
+#define KVM_S390_MMU_CACHE_N_CRSTS 6
+#define KVM_S390_MMU_CACHE_N_PTS 2
+#define KVM_S390_MMU_CACHE_N_RMAPS 16
+struct kvm_s390_mmu_cache {
+	void *crsts[KVM_S390_MMU_CACHE_N_CRSTS];
+	void *pts[KVM_S390_MMU_CACHE_N_PTS];
+	void *rmaps[KVM_S390_MMU_CACHE_N_RMAPS];
+	short int n_crsts;
+	short int n_pts;
+	short int n_rmaps;
+};
+
+void dat_free_level(struct crst_table *table, bool owns_ptes);
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
+
+#define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
+
+static inline struct page_table *kvm_s390_mmu_cache_alloc_pt(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_pts)
+		return mc->pts[--mc->n_pts];
+	return (void *)__get_free_page(GFP_KVM_S390_MMU_CACHE);
+}
+
+static inline struct crst_table *kvm_s390_mmu_cache_alloc_crst(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_crsts)
+		return mc->crsts[--mc->n_crsts];
+	return (void *)__get_free_pages(GFP_KVM_S390_MMU_CACHE | __GFP_COMP, CRST_ALLOC_ORDER);
+}
+
+static inline struct vsie_rmap *kvm_s390_mmu_cache_alloc_rmap(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_rmaps)
+		return mc->rmaps[--mc->n_rmaps];
+	return kzalloc(sizeof(struct vsie_rmap), GFP_KVM_S390_MMU_CACHE);
+}
+
 static inline struct crst_table *crste_table_start(union crste *crstep)
 {
 	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
@@ -717,4 +757,41 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_free_pt(struct page_table *pt)
+{
+	free_page((unsigned long)pt);
+}
+
+static inline void _dat_free_crst(struct crst_table *table)
+{
+	free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+}
+
+#define dat_free_crst(x) _dat_free_crst(_CRSTP(x))
+
+static inline void kvm_s390_free_mmu_cache(struct kvm_s390_mmu_cache *mc)
+{
+	if (!mc)
+		return;
+	while (mc->n_pts)
+		dat_free_pt(mc->pts[--mc->n_pts]);
+	while (mc->n_crsts)
+		_dat_free_crst(mc->crsts[--mc->n_crsts]);
+	while (mc->n_rmaps)
+		kfree(mc->rmaps[--mc->n_rmaps]);
+	kfree(mc);
+}
+
+DEFINE_FREE(kvm_s390_mmu_cache, struct kvm_s390_mmu_cache *, if (_T) kvm_s390_free_mmu_cache(_T))
+
+static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
+{
+	struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache) = NULL;
+
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL_ACCOUNT);
+	if (mc && !kvm_s390_mmu_cache_topup(mc))
+		return_ptr(mc);
+	return NULL;
+}
+
 #endif /* __KVM_S390_DAT_H */
diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
index 01f9b39e65f5..5bee173db72e 100644
--- a/arch/s390/mm/page-states.c
+++ b/arch/s390/mm/page-states.c
@@ -13,6 +13,7 @@
 #include
 
 int __bootdata_preserved(cmma_flag);
+EXPORT_SYMBOL(cmma_flag);
 
 void arch_free_page(struct page *page, int order)
 {
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 14/28] KVM: s390: KVM page table management functions: clear and replace
Date: Mon, 22 Dec 2025 17:50:19 +0100
Message-ID: <20251222165033.162329-15-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to clear, replace or exchange DAT table
entries.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/dat.c | 118 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  40 +++++++++++++++
 2 files changed, 158 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index c324a27f379f..a9d5b49ac411 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -101,3 +101,121 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	}
 	dat_free_crst(table);
 }
+
+/**
+ * dat_crstep_xchg - exchange a gmap CRSTE with another
+ * @crstep: pointer to the CRST entry
+ * @new: replacement entry
+ * @gfn: the affected guest address
+ * @asce: the ASCE of the address space
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ */
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce)
+{
+	if (crstep->h.i) {
+		WRITE_ONCE(*crstep, new);
+		return;
+	} else if (cpu_has_edat2()) {
+		crdte_crste(crstep, *crstep, new, gfn, asce);
+		return;
+	}
+
+	if (machine_has_tlb_guest())
+		idte_crste(crstep, gfn, IDTE_GUEST_ASCE, asce, IDTE_GLOBAL);
+	else
+		idte_crste(crstep, gfn, 0, NULL_ASCE, IDTE_GLOBAL);
+	WRITE_ONCE(*crstep, new);
+}
+
+/**
+ * dat_crstep_xchg_atomic - atomically exchange a gmap CRSTE with another
+ * @crstep: pointer to the CRST entry
+ * @old: expected old value
+ * @new: replacement entry
+ * @gfn: the affected guest address
+ * @asce: the ASCE of the address space
+ *
+ * This function should only be called on invalid crstes, or on crstes with
+ * FC = 1, as that guarantees the presence of CSPG.
+ *
+ * This function is needed to atomically exchange a CRSTE that potentially
+ * maps a prefix area, without having to invalidate it in between.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: true if the exchange was successful.
+ */
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce)
+{
+	if (old.h.i)
+		return arch_try_cmpxchg((long *)crstep, &old.val, new.val);
+	if (cpu_has_edat2())
+		return crdte_crste(crstep, old, new, gfn, asce);
+	return cspg_crste(crstep, old, new);
+}
+
+static void dat_set_storage_key_from_pgste(union pte pte, union pgste pgste)
+{
+	union skey nkey = { .acc = pgste.acc, .fp = pgste.fp };
+
+	page_set_storage_key(pte_origin(pte), nkey.skey, 0);
+}
+
+static void dat_move_storage_key(union pte old, union pte new)
+{
+	page_set_storage_key(pte_origin(new), page_get_storage_key(pte_origin(old)), 1);
+}
+
+static union pgste dat_save_storage_key_into_pgste(union pte pte, union pgste pgste)
+{
+	union skey skey;
+
+	skey.skey = page_get_storage_key(pte_origin(pte));
+
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gr |= skey.r;
+	pgste.gc |= skey.c;
+
+	return pgste;
+}
+
+union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, gfn_t gfn,
+			    union asce asce, bool uses_skeys)
+{
+	union pte old = READ_ONCE(*ptep);
+
+	/* Updating only the software bits while holding the pgste lock */
+	if (!((ptep->val ^ new.val) & ~_PAGE_SW_BITS)) {
+		WRITE_ONCE(ptep->swbyte, new.swbyte);
+		return pgste;
+	}
+
+	if (!old.h.i) {
+		unsigned long opts = IPTE_GUEST_ASCE | (pgste.nodat ? IPTE_NODAT : 0);
+
+		if (machine_has_tlb_guest())
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, opts, asce.val, IPTE_GLOBAL);
+		else
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, 0, 0, IPTE_GLOBAL);
+	}
+
+	if (uses_skeys) {
+		if (old.h.i && !new.h.i)
+			/* Invalid to valid: restore storage keys from PGSTE */
+			dat_set_storage_key_from_pgste(new, pgste);
+		else if (!old.h.i && new.h.i)
+			/* Valid to invalid: save storage keys to PGSTE */
+			pgste = dat_save_storage_key_into_pgste(old, pgste);
+		else if (!old.h.i && !new.h.i)
+			/* Valid to valid: move storage keys */
+			if (old.h.pfra != new.h.pfra)
+				dat_move_storage_key(old, new);
+		/* Invalid to invalid: nothing to do */
+	}
+
+	WRITE_ONCE(*ptep, new);
+	return pgste;
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 4c75d3f75b33..c0644021c92f 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -430,6 +430,12 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
+					 gfn_t gfn, union asce asce, bool uses_skeys);
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce);
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
+
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
 
@@ -757,6 +763,21 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_ptep_xchg(union pte *ptep, union pte new, gfn_t gfn, union asce asce,
+				 bool has_skeys)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, asce, has_skeys);
+	pgste_set_unlock(ptep, pgste);
+}
+
+static inline void dat_ptep_clear(union pte *ptep, gfn_t gfn, union asce asce, bool has_skeys)
+{
+	dat_ptep_xchg(ptep, _PTE_EMPTY, gfn, asce, has_skeys);
+}
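A minimal usage sketch of the wrapper above (illustrative only; 'pfn' and
'uses_skeys' are assumed to come from the caller, kvm->mmu_lock is held):

	union pte new = _pte(pfn, true, false, false);

	dat_ptep_xchg(ptep, new, gfn, asce, uses_skeys);
	/* takes the pgste lock, calls __dat_ptep_xchg(), then unlocks */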
 
 static inline void dat_free_pt(struct page_table *pt)
 {
 	free_page((unsigned long)pt);
@@ -794,4 +815,23 @@ static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
 	return NULL;
 }
 
+static inline bool dat_pmdp_xchg_atomic(union pmd *pmdp, union pmd old, union pmd new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pmdp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline bool dat_pudp_xchg_atomic(union pud *pudp, union pud old, union pud new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pudp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline void dat_crstep_clear(union crste *crstep, gfn_t gfn, union asce asce)
+{
+	union crste newcrste = _CRSTE_EMPTY(crstep->h.tt);
+
+	dat_crstep_xchg(crstep, newcrste, gfn, asce);
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.52.0

From nobody Sat Feb 7 06:13:31 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 15/28] KVM: s390: KVM page table management functions: walks
Date: Mon, 22 Dec 2025 17:50:20 +0100
Message-ID: <20251222165033.162329-16-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.
This patch adds functions to walk to specific table entries, or to
perform actions on a range of entries.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/dat.c | 386 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  39 +++++
 2 files changed, 425 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index a9d5b49ac411..3158652dbe8e 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -219,3 +219,389 @@ union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, g
 	WRITE_ONCE(*ptep, new);
 	return pgste;
 }
+
+/*
+ * dat_split_ste - Split a segment table entry into page table entries
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: 0 in case of success, -ENOMEM if running out of memory.
+ */
+static int dat_split_ste(struct kvm_s390_mmu_cache *mc, union pmd *pmdp, gfn_t gfn,
+			 union asce asce, bool uses_skeys)
+{
+	union pgste pgste_init;
+	struct page_table *pt;
+	union pmd new, old;
+	union pte init;
+	int i;
+
+	BUG_ON(!mc);
+	old = READ_ONCE(*pmdp);
+
+	/* Already split, nothing to do */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	pt = dat_alloc_pt_noinit(mc);
+	if (!pt)
+		return -ENOMEM;
+	new.val = virt_to_phys(pt);
+
+	while (old.h.i || old.h.fc) {
+		init.val = pmd_origin_large(old);
+		init.h.p = old.h.p;
+		init.h.i = old.h.i;
+		init.s.d = old.s.fc1.d;
+		init.s.w = old.s.fc1.w;
+		init.s.y = old.s.fc1.y;
+		init.s.sd = old.s.fc1.sd;
+		init.s.pr = old.s.fc1.pr;
+		pgste_init.val = 0;
+		if (old.h.fc) {
+			for (i = 0; i < _PAGE_ENTRIES; i++)
+				pt->ptes[i].val = init.val | i * PAGE_SIZE;
+			/* no need to take locks as the page table is not installed yet */
+			pgste_init.prefix_notif = old.s.fc1.prefix_notif;
+			pgste_init.pcl = uses_skeys && init.h.i;
+			dat_init_pgstes(pt, pgste_init.val);
+		} else {
+			dat_init_page_table(pt, init.val, 0);
+		}
+
+		if (dat_pmdp_xchg_atomic(pmdp, old, new, gfn, asce)) {
+			if (!pgste_init.pcl)
+				return 0;
+			for (i = 0; i < _PAGE_ENTRIES; i++) {
+				union pgste pgste = pt->pgstes[i];
+
+				pgste = dat_save_storage_key_into_pgste(pt->ptes[i], pgste);
+				pgste_set_unlock(pt->ptes + i, pgste);
+			}
+			return 0;
+		}
+		old = READ_ONCE(*pmdp);
+	}
+
+	dat_free_pt(pt);
+	return 0;
+}
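Concretely, splitting here means replacing one segment entry that maps a
1 MiB large page with a page table of 256 ptes, each mapping 4 KiB of the
same frame; as the code above shows, the protection, dirty, young,
soft-dirty and prefix-notification state of the large entry is propagated
into the new ptes and pgstes before the new table is atomically installed.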
+
+/*
+ * dat_split_crste - Split a crste into smaller crstes
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: 0 in case of success, -ENOMEM if running out of memory.
+ */
+static int dat_split_crste(struct kvm_s390_mmu_cache *mc, union crste *crstep,
+			   gfn_t gfn, union asce asce, bool uses_skeys)
+{
+	struct crst_table *table;
+	union crste old, new, init;
+	int i;
+
+	old = READ_ONCE(*crstep);
+	if (is_pmd(old))
+		return dat_split_ste(mc, &crstep->pmd, gfn, asce, uses_skeys);
+
+	BUG_ON(!mc);
+
+	/* Already split, nothing to do */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	table = dat_alloc_crst_noinit(mc);
+	if (!table)
+		return -ENOMEM;
+
+	new.val = virt_to_phys(table);
+	new.h.tt = old.h.tt;
+	new.h.fc0.tl = _REGION_ENTRY_LENGTH;
+
+	while (old.h.i || old.h.fc) {
+		init = old;
+		init.h.tt--;
+		if (old.h.fc) {
+			for (i = 0; i < _CRST_ENTRIES; i++)
+				table->crstes[i].val = init.val | i * HPAGE_SIZE;
+		} else {
+			crst_table_init((void *)table, init.val);
+		}
+		if (dat_crstep_xchg_atomic(crstep, old, new, gfn, asce))
+			return 0;
+		old = READ_ONCE(*crstep);
+	}
+
+	dat_free_crst(table);
+	return 0;
+}
+
+/**
+ * dat_entry_walk() - walk the gmap page tables
+ * @mc: cache to use to allocate DAT tables, if needed; can be NULL if neither
+ *      DAT_WALK_SPLIT nor DAT_WALK_ALLOC is specified.
+ * @gfn: guest frame
+ * @asce: the ASCE of the address space
+ * @flags: flags from the DAT_WALK_* macros
+ * @walk_level: level to walk to, from the TABLE_TYPE_* macros
+ * @last: will be filled with the last visited non-pte DAT entry
+ * @ptepp: will be filled with the last visited pte entry, if any, otherwise NULL
+ *
+ * Walk the DAT tables and return pointers to the table entries for the given
+ * guest address and @walk_level.
+ *
+ * The @flags have the following meanings:
+ * * @DAT_WALK_IGN_HOLES: consider holes as normal table entries
+ * * @DAT_WALK_ALLOC: allocate new tables to reach the requested level, if needed
+ * * @DAT_WALK_SPLIT: split existing large pages to reach the requested level, if needed
+ * * @DAT_WALK_LEAF: return successfully whenever a large page is encountered
+ * * @DAT_WALK_ANY: return successfully even if the requested level could not be reached
+ * * @DAT_WALK_CONTINUE: walk to the requested level with the specified flags, and then try to
+ *                       continue walking to ptes with only DAT_WALK_ANY
+ * * @DAT_WALK_USES_SKEYS: storage keys are in use
+ *
+ * Context: called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * PGM_ADDRESSING if the requested address lies outside memory
+ * * a PIC number if the requested address lies in a memory hole of type _DAT_TOKEN_PIC
+ * * -EFAULT if the requested address lies inside a memory hole of a different type
+ * * -EINVAL if the given ASCE is not compatible with the requested level
+ * * -EFBIG if the requested level could not be reached because a larger frame was found
+ * * -ENOENT if the requested level could not be reached for other reasons
+ * * -ENOMEM if running out of memory while allocating or splitting a table
+ */
+int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
+		   int walk_level, union crste **last, union pte **ptepp)
+{
+	union vaddress vaddr = { .addr = gfn_to_gpa(gfn) };
+	bool continue_anyway = flags & DAT_WALK_CONTINUE;
+	bool uses_skeys = flags & DAT_WALK_USES_SKEYS;
+	bool ign_holes = flags & DAT_WALK_IGN_HOLES;
+	bool allocate = flags & DAT_WALK_ALLOC;
+	bool split = flags & DAT_WALK_SPLIT;
+	bool leaf = flags & DAT_WALK_LEAF;
+	bool any = flags & DAT_WALK_ANY;
+	struct page_table *pgtable;
+	struct crst_table *table;
+	union crste entry;
+	int rc;
+
+	*last = NULL;
+	*ptepp = NULL;
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (WARN_ON_ONCE(unlikely(walk_level > asce.dt)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, gfn))
+		return PGM_ADDRESSING;
+
+	table = dereference_asce(asce);
+	if (asce.dt >= ASCE_TYPE_REGION1) {
+		*last = table->crstes + vaddr.rfx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION1))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION1)
+			return 0;
+		if (entry.pgd.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pgd);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION2) {
+		*last = table->crstes + vaddr.rsx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION2))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION2)
+			return 0;
+		if (entry.p4d.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.p4d);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION3) {
+		*last = table->crstes + vaddr.rtx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION3))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION3 &&
+		    continue_anyway && !entry.pud.h.fc && !entry.h.i) {
+			walk_level = TABLE_TYPE_PAGE_TABLE;
+			allocate = false;
+		}
+		if (walk_level == TABLE_TYPE_REGION3 || ((leaf || any) && entry.pud.h.fc))
+			return 0;
+		if (entry.pud.h.i && !entry.pud.h.fc) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		if (walk_level <= TABLE_TYPE_SEGMENT && entry.pud.h.fc) {
+			if (!split)
+				return -EFBIG;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pud);
+	}
+
+	*last = table->crstes + vaddr.sx;
+	entry = READ_ONCE(**last);
+	if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_SEGMENT))
+		return -EINVAL;
+	if (crste_hole(entry) && !ign_holes)
+		return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+	if (continue_anyway && !entry.pmd.h.fc && !entry.h.i) {
+		walk_level = TABLE_TYPE_PAGE_TABLE;
+		allocate = false;
+	}
+	if (walk_level == TABLE_TYPE_SEGMENT || ((leaf || any) && entry.pmd.h.fc))
+		return 0;
+
+	if (entry.pmd.h.i && !entry.pmd.h.fc) {
+		if (!allocate)
+			return any ? 0 : -ENOENT;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	if (walk_level <= TABLE_TYPE_PAGE_TABLE && entry.pmd.h.fc) {
+		if (!split)
+			return -EFBIG;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	pgtable = dereference_pmd(entry.pmd);
+	*ptepp = pgtable->ptes + vaddr.px;
+	if (pte_hole(**ptepp) && !ign_holes)
+		return (*ptepp)->tok.type == _DAT_TOKEN_PIC ? (*ptepp)->tok.par : -EFAULT;
+	return 0;
+}
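A usage sketch of the walker (hypothetical caller; kvm->mmu_lock held and
'mc' a topped-up kvm_s390_mmu_cache; error handling abbreviated):

	union crste *crstep;
	union pte *ptep;
	int rc;

	rc = dat_entry_walk(mc, gfn, asce, DAT_WALK_SPLIT_ALLOC,
			    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
	if (!rc && ptep)
		dat_ptep_clear(ptep, gfn, asce, false); /* e.g. zap the mapping */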
+
+static long dat_pte_walk_range(gfn_t gfn, gfn_t end, struct page_table *table, struct dat_walk *w)
+{
+	unsigned int idx = gfn & (_PAGE_ENTRIES - 1);
+	long rc = 0;
+
+	for ( ; gfn < end; idx++, gfn++) {
+		if (pte_hole(READ_ONCE(table->ptes[idx]))) {
+			if (!(w->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(w->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		rc = w->ops->pte_entry(table->ptes + idx, gfn, gfn + 1, w);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+static long dat_crste_walk_range(gfn_t start, gfn_t end, struct crst_table *table,
+				 struct dat_walk *walk)
+{
+	unsigned long idx, cur_shift, cur_size;
+	dat_walk_op the_op;
+	union crste crste;
+	gfn_t cur, next;
+	long rc = 0;
+
+	cur_shift = 8 + table->crstes[0].h.tt * 11;
+	idx = (start >> cur_shift) & (_CRST_ENTRIES - 1);
+	cur_size = 1UL << cur_shift;
+
+	for (cur = ALIGN_DOWN(start, cur_size); cur < end; idx++, cur = next) {
+		next = cur + cur_size;
+		walk->last = table->crstes + idx;
+		crste = READ_ONCE(*walk->last);
+
+		if (crste_hole(crste)) {
+			if (!(walk->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(walk->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		the_op = walk->ops->crste_ops[crste.h.tt];
+		if (the_op) {
+			rc = the_op(walk->last, cur, next, walk);
+			crste = READ_ONCE(*walk->last);
+		}
+		if (rc)
+			break;
+		if (!crste.h.i && !crste.h.fc) {
+			if (!is_pmd(crste))
+				rc = dat_crste_walk_range(max(start, cur), min(end, next),
+							  _dereference_crste(crste), walk);
+			else if (walk->ops->pte_entry)
+				rc = dat_pte_walk_range(max(start, cur), min(end, next),
+							dereference_pmd(crste.pmd), walk);
+		}
+	}
+	return rc;
+}
+
+/**
+ * _dat_walk_gfn_range() - walk DAT tables
+ * @start: the first guest page frame to walk
+ * @end: the guest page frame immediately after the last one to walk
+ * @asce: the ASCE of the guest mapping
+ * @ops: the dat_walk_ops that will be used to perform the walk
+ * @flags: flags from DAT_WALK_* (currently only DAT_WALK_IGN_HOLES is supported)
+ * @priv: will be passed as-is to the callbacks
+ *
+ * Any callback returning non-zero causes the walk to stop immediately.
+ *
+ * Return: -EINVAL in case of error, -EFAULT if @start is too high for the given
+ *         ASCE unless the DAT_WALK_IGN_HOLES flag is specified, otherwise it
+ *         returns whatever the callbacks return.
+ */
+long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
+			 const struct dat_walk_ops *ops, int flags, void *priv)
+{
+	struct crst_table *table = dereference_asce(asce);
+	struct dat_walk walk = {
+		.ops = ops,
+		.asce = asce,
+		.priv = priv,
+		.flags = flags,
+		.start = start,
+		.end = end,
+	};
+
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, start))
+		return (flags & DAT_WALK_IGN_HOLES) ? 0 : -EFAULT;
+
+	return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk);
+}
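As an illustration of the callback interface (hypothetical callback, not
part of this series), a walk that counts present ptes in a range could
look like this:

	/* Illustrative sketch only. */
	static long count_pte(union pte *pte, gfn_t gfn, gfn_t next, struct dat_walk *w)
	{
		if (pte->s.pr)
			(*(unsigned long *)w->priv)++;
		return 0;	/* non-zero would stop the walk */
	}

	static const struct dat_walk_ops count_ops = { .pte_entry = count_pte };

	/* usage, with kvm->mmu_lock held: */
	unsigned long n = 0;
	_dat_walk_gfn_range(start, end, asce, &count_ops, DAT_WALK_IGN_HOLES, &n);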
TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
+}
+
 #endif /* __KVM_S390_DAT_H */
--
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 16/28] KVM: s390: KVM page table management functions: storage keys
Date: Mon, 22 Dec 2025 17:50:21 +0100
Message-ID: <20251222165033.162329-17-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions related to storage key handling.
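As an illustration (not part of the patch itself), a caller such as the
SSKE/ISKE intercept code might combine these primitives as follows; the
helper name is hypothetical, and it assumes kvm->mmu_lock is held and
that @mc has been topped up, since dat_set_storage_key() may need to
allocate page tables:

/*
 * Hypothetical sketch, not part of the patch: copy the storage key of
 * guest frame @from to guest frame @to.  Assumes kvm->mmu_lock is held
 * and @mc has been topped up via kvm_s390_mmu_cache_topup().
 */
static int sketch_copy_storage_key(struct kvm_s390_mmu_cache *mc,
				   union asce asce, gfn_t from, gfn_t to)
{
	union skey skey;
	int rc;

	rc = dat_get_storage_key(asce, from, &skey);
	if (rc)
		return rc;
	/* nq == false: do not bypass quiescing, as a normal SSKE would */
	return dat_set_storage_key(mc, asce, to, skey, false);
}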
Signed-off-by: Claudio Imbrenda --- arch/s390/kvm/dat.c | 215 ++++++++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/dat.h | 7 ++ 2 files changed, 222 insertions(+) diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c index 3158652dbe8e..d75329cf1f47 100644 --- a/arch/s390/kvm/dat.c +++ b/arch/s390/kvm/dat.c @@ -605,3 +605,218 @@ long _dat_walk_gfn_range(gfn_t start, gfn_t end, unio= n asce asce, =20 return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk= ); } + +int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey) +{ + union crste *crstep; + union pgste pgste; + union pte *ptep; + int rc; + + skey->skey =3D 0; + rc =3D dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABL= E, &crstep, &ptep); + if (rc) + return rc; + + if (!ptep) { + union crste crste; + + crste =3D READ_ONCE(*crstep); + if (!crste.h.fc || !crste.s.fc1.pr) + return 0; + skey->skey =3D page_get_storage_key(large_crste_to_phys(crste, gfn)); + return 0; + } + pgste =3D pgste_get_lock(ptep); + if (ptep->h.i) { + skey->acc =3D pgste.acc; + skey->fp =3D pgste.fp; + } else { + skey->skey =3D page_get_storage_key(pte_origin(*ptep)); + } + skey->r |=3D pgste.gr; + skey->c |=3D pgste.gc; + pgste_set_unlock(ptep, pgste); + return 0; +} + +static void dat_update_ptep_sd(union pgste old, union pgste pgste, union p= te *ptep) +{ + if (pgste.acc !=3D old.acc || pgste.fp !=3D old.fp || pgste.gr !=3D old.g= r || pgste.gc !=3D old.gc) + __atomic64_or(_PAGE_SD, &ptep->val); +} + +int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gf= n_t gfn, + union skey skey, bool nq) +{ + union pgste pgste, old; + union crste *crstep; + union pte *ptep; + int rc; + + rc =3D dat_entry_walk(mc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAGE= _TABLE, + &crstep, &ptep); + if (rc) + return rc; + + if (!ptep) { + page_set_storage_key(large_crste_to_phys(*crstep, gfn), skey.skey, !nq); + return 0; + } + + old =3D pgste_get_lock(ptep); + pgste =3D old; + + pgste.acc =3D skey.acc; + pgste.fp =3D skey.fp; + pgste.gc =3D skey.c; + pgste.gr =3D skey.r; + + if (!ptep->h.i) { + union skey old_skey; + + old_skey.skey =3D page_get_storage_key(pte_origin(*ptep)); + pgste.hc |=3D old_skey.c; + pgste.hr |=3D old_skey.r; + skey.r =3D 0; + skey.c =3D 0; + page_set_storage_key(pte_origin(*ptep), skey.skey, !nq); + } + + dat_update_ptep_sd(old, pgste, ptep); + pgste_set_unlock(ptep, pgste); + return 0; +} + +static bool page_cond_set_storage_key(phys_addr_t paddr, union skey skey, = union skey *oldkey, + bool nq, bool mr, bool mc) +{ + oldkey->skey =3D page_get_storage_key(paddr); + if (oldkey->acc =3D=3D skey.acc && oldkey->fp =3D=3D skey.fp && + (oldkey->r =3D=3D skey.r || mr) && (oldkey->c =3D=3D skey.c || mc)) + return false; + page_set_storage_key(paddr, skey.skey, !nq); + return true; +} + +int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce as= ce, gfn_t gfn, + union skey skey, union skey *oldkey, bool nq, bool mr, bool mc) +{ + union pgste pgste, old; + union crste *crstep; + union pte *ptep; + int rc; + + rc =3D dat_entry_walk(mmc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAG= E_TABLE, + &crstep, &ptep); + if (rc) + return rc; + + if (!ptep) + return page_cond_set_storage_key(large_crste_to_phys(*crstep, gfn), skey= , oldkey, + nq, mr, mc); + + old =3D pgste_get_lock(ptep); + pgste =3D old; + + rc =3D 1; + pgste.acc =3D skey.acc; + pgste.fp =3D skey.fp; + pgste.gc =3D skey.c; + pgste.gr =3D skey.r; + + if (!ptep->h.i) { + union skey prev; + + rc =3D 
page_cond_set_storage_key(pte_origin(*ptep), skey, &prev, nq, mr, mc);
+		pgste.hc |= prev.c;
+		pgste.hr |= prev.r;
+		if (oldkey)
+			*oldkey = prev;
+	}
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+int dat_reset_reference_bit(union asce asce, gfn_t gfn)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		union crste crste = READ_ONCE(*crstep);
+
+		if (!crste.h.fc || !crste.s.fc1.pr)
+			return 0;
+		return page_reset_referenced(large_crste_to_phys(*crstep, gfn));
+	}
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	if (!ptep->h.i) {
+		rc = page_reset_referenced(pte_origin(*ptep));
+		pgste.hr = rc >> 1;
+	}
+	rc |= (pgste.gr << 1) | pgste.gc;
+	pgste.gr = 0;
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+static long dat_reset_skeys_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.acc = 0;
+	pgste.fp = 0;
+	pgste.gr = 0;
+	pgste.gc = 0;
+	if (ptep->s.pr)
+		page_set_storage_key(pte_origin(*ptep), PAGE_DEFAULT_KEY, 1);
+	pgste_set_unlock(ptep, pgste);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long dat_reset_skeys_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t addr, end, origin = crste_origin_large(*crstep);
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	addr = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	while (ALIGN(addr + 1, _SEGMENT_SIZE) <= end)
+		addr = sske_frame(addr, PAGE_DEFAULT_KEY);
+	for ( ; addr < end; addr += PAGE_SIZE)
+		page_set_storage_key(addr, PAGE_DEFAULT_KEY, 1);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_skeys(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = dat_reset_skeys_pte,
+		.pmd_entry = dat_reset_skeys_crste,
+		.pud_entry = dat_reset_skeys_crste,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 902d4a044067..4cd573fcbdd6 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -472,6 +472,13 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, in
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
+int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+			union skey skey, bool nq);
+int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
+			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc);
+int dat_reset_reference_bit(union asce asce, gfn_t gfn);
+long dat_reset_skeys(union asce asce, gfn_t start);
 
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
--
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 17/28] KVM: s390: KVM page table management functions: lifecycle management
Date: Mon, 22 Dec 2025 17:50:22 +0100
Message-ID: <20251222165033.162329-18-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle memslot creation and destruction,
additional per-pagetable data stored in the PGSTEs, mapping physical
addresses into the gmap, and marking address ranges as prefix.
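As an illustration (not part of the patch itself), memslot creation and
deletion could be wired up roughly as follows, using the
dat_create_slot()/dat_delete_slot() helpers introduced below; the
wrapper name is hypothetical, and it assumes kvm->mmu_lock is held and
@mc has been topped up:

/*
 * Hypothetical sketch, not part of the patch: reflect a memslot change
 * in the gmap DAT tables.  Deletion installs addressing-exception
 * tokens for the range, creation installs empty entries.
 */
static int sketch_update_slot(struct kvm_s390_mmu_cache *mc, union asce asce,
			      struct kvm_memory_slot *slot, bool create)
{
	if (create)
		return dat_create_slot(mc, asce, slot->base_gfn, slot->npages);
	return dat_delete_slot(mc, asce, slot->base_gfn, slot->npages);
}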
Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/dat.c | 284 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  54 +++++++++
 2 files changed, 338 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index d75329cf1f47..cafbcb665da4 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -102,6 +102,38 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	dat_free_crst(table);
 }
 
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype)
+{
+	struct crst_table *table;
+	union crste crste;
+
+	while (asce->dt > newtype) {
+		table = dereference_asce(*asce);
+		crste = table->crstes[0];
+		if (crste.h.fc)
+			return 0;
+		if (!crste.h.i) {
+			asce->rsto = crste.h.fc0.to;
+			dat_free_crst(table);
+		} else {
+			crste.h.tt--;
+			crst_table_init((void *)table, crste.val);
+		}
+		asce->dt--;
+	}
+	while (asce->dt < newtype) {
+		crste = _crste_fc0(asce->rsto, asce->dt + 1);
+		table = dat_alloc_crst_noinit(mc);
+		if (!table)
+			return -ENOMEM;
+		crst_table_init((void *)table, _CRSTE_HOLE(crste.h.tt).val);
+		table->crstes[0] = crste;
+		asce->rsto = __pa(table) >> PAGE_SHIFT;
+		asce->dt++;
+	}
+	return 0;
+}
+
 /**
  * dat_crstep_xchg - exchange a gmap CRSTE with another
  * @crstep: pointer to the CRST entry
@@ -820,3 +852,255 @@ long dat_reset_skeys(union asce asce, gfn_t start)
 
 	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
 }
+
+struct slot_priv {
+	unsigned long token;
+	struct kvm_s390_mmu_cache *mc;
+};
+
+static long _dat_slot_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct slot_priv *p = walk->priv;
+	union crste dummy = { .val = p->token };
+	union pte new_pte, pte = READ_ONCE(*ptep);
+
+	new_pte = _PTE_TOK(dummy.tok.type, dummy.tok.par);
+
+	/* Table entry already in the desired state */
+	if (pte.val == new_pte.val)
+		return 0;
+
+	dat_ptep_xchg(ptep, new_pte, gfn, walk->asce, false);
+	return 0;
+}
+
+static long _dat_slot_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union crste new_crste, crste = READ_ONCE(*crstep);
+	struct slot_priv *p = walk->priv;
+
+	new_crste.val = p->token;
+	new_crste.h.tt = crste.h.tt;
+
+	/* Table entry already in the desired state */
+	if (crste.val == new_crste.val)
+		return 0;
+
+	/* This table entry needs to be updated */
+	if (walk->start <= gfn && walk->end >= next) {
+		dat_crstep_xchg_atomic(crstep, crste, new_crste, gfn, walk->asce);
+		/* A lower level table was present, needs to be freed */
+		if (!crste.h.fc && !crste.h.i)
+			dat_free_level(dereference_crste(crste), true);
+		return 0;
+	}
+
+	/* A lower level table is present, things will be handled there */
+	if (!crste.h.fc && !crste.h.i)
+		return 0;
+	/* Split (install a lower level table), and handle things there */
+	return dat_split_crste(p->mc, crstep, gfn, walk->asce, false);
+}
+
+static const struct dat_walk_ops dat_slot_ops = {
+	.pte_entry = _dat_slot_pte,
+	.crste_ops = { _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, },
+};
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param)
+{
+	struct slot_priv priv = {
+		.token = _CRSTE_TOK(0, type, param).val,
+		.mc = mc,
+	};
+
+	return _dat_walk_gfn_range(start, end, asce, &dat_slot_ops,
+				   DAT_WALK_IGN_HOLES | DAT_WALK_ANY, &priv);
+}
+
+static void pgste_set_unlock_multiple(union pte *first, int n, union pgste *pgstes)
+{
+	int i;
+ + for (i =3D 0; i < n; i++) { + if (!pgstes[i].pcl) + break; + pgste_set_unlock(first + i, pgstes[i]); + } +} + +static bool pgste_get_trylock_multiple(union pte *first, int n, union pgst= e *pgstes) +{ + int i; + + for (i =3D 0; i < n; i++) { + if (!pgste_get_trylock(first + i, pgstes + i)) + break; + } + if (i =3D=3D n) + return true; + pgste_set_unlock_multiple(first, n, pgstes); + return false; +} + +unsigned long dat_get_ptval(struct page_table *table, struct ptval_param p= aram) +{ + union pgste pgstes[4] =3D {}; + unsigned long res =3D 0; + int i, n; + + n =3D param.len + 1; + + while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes)) + cpu_relax(); + + for (i =3D 0; i < n; i++) + res =3D res << 16 | pgstes[i].val16; + + pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes); + return res; +} + +void dat_set_ptval(struct page_table *table, struct ptval_param param, uns= igned long val) +{ + union pgste pgstes[4] =3D {}; + int i, n; + + n =3D param.len + 1; + + while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes)) + cpu_relax(); + + for (i =3D param.len; i >=3D 0; i--) { + pgstes[i].val16 =3D val; + val =3D val >> 16; + } + + pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes); +} + +static long _dat_test_young_pte(union pte *ptep, gfn_t start, gfn_t end, s= truct dat_walk *walk) +{ + return ptep->s.y; +} + +static long _dat_test_young_crste(union crste *crstep, gfn_t start, gfn_t = end, + struct dat_walk *walk) +{ + return crstep->h.fc && crstep->s.fc1.y; +} + +static const struct dat_walk_ops test_age_ops =3D { + .pte_entry =3D _dat_test_young_pte, + .pmd_entry =3D _dat_test_young_crste, + .pud_entry =3D _dat_test_young_crste, +}; + +/** + * dat_test_age_gfn() - test young + * @asce: the ASCE whose address range is to be tested + * @start: the first guest frame of the range to check + * @end: the guest frame after the last in the range + * + * Context: called by KVM common code with the kvm mmu write lock held + * Return: 1 if any page in the given range is young, otherwise 0. 
+ */ +bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end) +{ + return _dat_walk_gfn_range(start, end, asce, &test_age_ops, 0, NULL) > 0; +} + +int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level, + bool uses_skeys, struct guest_fault *f) +{ + union crste oldval, newval; + union pte newpte, oldpte; + union pgste pgste; + int rc =3D 0; + + rc =3D dat_entry_walk(mc, f->gfn, asce, DAT_WALK_ALLOC_CONTINUE, level, &= f->crstep, &f->ptep); + if (rc =3D=3D -EINVAL || rc =3D=3D -ENOMEM) + return rc; + if (rc) + return -EAGAIN; + + if (WARN_ON_ONCE(unlikely(get_level(f->crstep, f->ptep) > level))) + return -EINVAL; + + if (f->ptep) { + pgste =3D pgste_get_lock(f->ptep); + oldpte =3D *f->ptep; + newpte =3D _pte(f->pfn, f->writable, f->write_attempt | oldpte.s.d, !f->= page); + newpte.s.sd =3D oldpte.s.sd; + oldpte.s.sd =3D 0; + if (oldpte.val =3D=3D _PTE_EMPTY.val || oldpte.h.pfra =3D=3D f->pfn) { + pgste =3D __dat_ptep_xchg(f->ptep, pgste, newpte, f->gfn, asce, uses_sk= eys); + if (f->callback) + f->callback(f); + } else { + rc =3D -EAGAIN; + } + pgste_set_unlock(f->ptep, pgste); + } else { + oldval =3D READ_ONCE(*f->crstep); + newval =3D _crste_fc1(f->pfn, oldval.h.tt, f->writable, + f->write_attempt | oldval.s.fc1.d); + newval.s.fc1.sd =3D oldval.s.fc1.sd; + if (oldval.val !=3D _CRSTE_EMPTY(oldval.h.tt).val && + crste_origin_large(oldval) !=3D crste_origin_large(newval)) + return -EAGAIN; + if (!dat_crstep_xchg_atomic(f->crstep, oldval, newval, f->gfn, asce)) + return -EAGAIN; + if (f->callback) + f->callback(f); + } + + return rc; +} + +static long dat_set_pn_crste(union crste *crstep, gfn_t gfn, gfn_t next, s= truct dat_walk *walk) +{ + union crste crste =3D READ_ONCE(*crstep); + int *n =3D walk->priv; + + if (!crste.h.fc || crste.h.i || crste.h.p) + return 0; + + *n =3D 2; + if (crste.s.fc1.prefix_notif) + return 0; + crste.s.fc1.prefix_notif =3D 1; + dat_crstep_xchg(crstep, crste, gfn, walk->asce); + return 0; +} + +static long dat_set_pn_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct = dat_walk *walk) +{ + int *n =3D walk->priv; + union pgste pgste; + + pgste =3D pgste_get_lock(ptep); + if (!ptep->h.i && !ptep->h.p) { + pgste.prefix_notif =3D 1; + *n +=3D 1; + } + pgste_set_unlock(ptep, pgste); + return 0; +} + +int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn) +{ + static const struct dat_walk_ops ops =3D { + .pte_entry =3D dat_set_pn_pte, + .pmd_entry =3D dat_set_pn_crste, + .pud_entry =3D dat_set_pn_crste, + }; + + int n =3D 0; + + _dat_walk_gfn_range(gfn, gfn + 2, asce, &ops, DAT_WALK_IGN_HOLES, &n); + if (n !=3D 2) + return -EAGAIN; + return 0; +} diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h index 4cd573fcbdd6..11573b1002f6 100644 --- a/arch/s390/kvm/dat.h +++ b/arch/s390/kvm/dat.h @@ -361,6 +361,11 @@ struct dat_walk { void *priv; }; =20 +struct ptval_param { + unsigned char offset : 6; + unsigned char len : 2; +}; + /** * _pte() - Useful constructor for union pte * @pfn: the pfn this pte should point to. 
@@ -459,6 +464,32 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+struct guest_fault {
+	gfn_t gfn;		/* Guest frame */
+	kvm_pfn_t pfn;		/* Host PFN */
+	struct page *page;	/* Host page */
+	union pte *ptep;	/* Used to resolve the fault, or NULL */
+	union crste *crstep;	/* Used to resolve the fault, or NULL */
+	bool writable;		/* Mapping is writable */
+	bool write_attempt;	/* Write access attempted */
+	bool attempt_pfault;	/* Attempt a pfault first */
+	bool valid;		/* This entry contains valid data */
+	void (*callback)(struct guest_fault *f);
+	void *priv;
+};
+
+/*
+ *      0       1       2       3       4       5       6       7
+ *  +-------+-------+-------+-------+-------+-------+-------+-------+
+ *  0 |                             |           PGT_ADDR            |
+ *  8 |            VMADDR           |                               |
+ * 16 |                                                             |
+ * 24 |                                                             |
+ */
+#define MKPTVAL(o, l) ((struct ptval_param) { .offset = (o), .len = ((l) + 1) / 2 - 1})
+#define PTVAL_PGT_ADDR	MKPTVAL(4, 8)
+#define PTVAL_VMADDR	MKPTVAL(8, 6)
+
 union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
					 gfn_t gfn, union asce asce, bool uses_skeys);
 bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
@@ -472,6 +503,7 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, in
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype);
 int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
 int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
 			union skey skey, bool nq);
@@ -480,6 +512,16 @@ int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gf
 			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc);
 int dat_reset_reference_bit(union asce asce, gfn_t gfn);
 long dat_reset_skeys(union asce asce, gfn_t start);
 
+unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param);
+void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val);
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param);
+int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn);
+bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
+int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
+	     bool uses_skeys, struct guest_fault *f);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
@@ -880,4 +922,16 @@ static inline int get_level(union crste *crstep, union pte *ptep)
 	return ptep ? TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
 }
 
+static inline int dat_delete_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_PIC, PGM_ADDRESSING);
+}
+
+static inline int dat_create_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_NONE, 0);
+}
+
 #endif /* __KVM_S390_DAT_H */
--
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 18/28] KVM: s390: KVM page table management functions: CMMA
Date: Mon, 22 Dec 2025 17:50:23 +0100
Message-ID: <20251222165033.162329-19-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle CMMA and the ESSA instruction.
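As an illustration (not part of the patch itself), the ESSA intercept
handler could drive dat_perform_essa() roughly as follows; the helper
and the cbrl_add() callback are hypothetical stand-ins:

/*
 * Hypothetical sketch, not part of the patch: handle one ESSA request
 * for @gfn and return the state byte to be placed in the guest
 * register.  Assumes kvm->mmu_lock is held; cbrl_add() stands in for
 * queueing the frame on the collaborative-memory buffer (CBRL).
 */
static unsigned char sketch_handle_essa(union asce asce, gfn_t gfn, int orc,
					bool *dirty, void (*cbrl_add)(gfn_t gfn))
{
	union essa_state state;

	/* rc == 1: page state altered and the frame belongs on the CBRL */
	if (dat_perform_essa(asce, gfn, orc, &state, dirty) == 1)
		cbrl_add(gfn);
	return state.val;
}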
Signed-off-by: Claudio Imbrenda --- arch/s390/kvm/dat.c | 262 ++++++++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/dat.h | 27 +++++ 2 files changed, 289 insertions(+) diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c index cafbcb665da4..458e725315cc 100644 --- a/arch/s390/kvm/dat.c +++ b/arch/s390/kvm/dat.c @@ -1104,3 +1104,265 @@ int dat_set_prefix_notif_bit(union asce asce, gfn_t= gfn) return -EAGAIN; return 0; } + +/** + * dat_perform_essa() - perform ESSA actions on the PGSTE. + * @asce: the asce to operate on. + * @gfn: the guest page frame to operate on. + * @orc: the specific action to perform, see the ESSA_SET_* macros. + * @state: the storage attributes to be returned to the guest. + * @dirty: returns whether the function dirtied a previously clean entry. + * + * Context: Called with kvm->mmu_lock held. + * + * Return: + * * 1 if the page state has been altered and the page is to be added to t= he CBRL + * * 0 if the page state has been altered, but the page is not to be added= to the CBRL + * * -1 if the page state has not been altered and the page is not to be a= dded to the CBRL + */ +int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state= *state, bool *dirty) +{ + union crste *crstep; + union pgste pgste; + union pte *ptep; + int res =3D 0; + + if (dat_entry_walk(NULL, gfn, asce, 0, TABLE_TYPE_PAGE_TABLE, &crstep, &p= tep)) { + *state =3D (union essa_state) { .exception =3D 1 }; + return -1; + } + + pgste =3D pgste_get_lock(ptep); + + *state =3D (union essa_state) { + .content =3D (ptep->h.i << 1) + (ptep->h.i && pgste.zero), + .nodat =3D pgste.nodat, + .usage =3D pgste.usage, + }; + + switch (orc) { + case ESSA_GET_STATE: + res =3D -1; + break; + case ESSA_SET_STABLE: + pgste.usage =3D PGSTE_GPS_USAGE_STABLE; + pgste.nodat =3D 0; + break; + case ESSA_SET_UNUSED: + pgste.usage =3D PGSTE_GPS_USAGE_UNUSED; + if (ptep->h.i) + res =3D 1; + break; + case ESSA_SET_VOLATILE: + pgste.usage =3D PGSTE_GPS_USAGE_VOLATILE; + if (ptep->h.i) + res =3D 1; + break; + case ESSA_SET_POT_VOLATILE: + if (!ptep->h.i) { + pgste.usage =3D PGSTE_GPS_USAGE_POT_VOLATILE; + } else if (pgste.zero) { + pgste.usage =3D PGSTE_GPS_USAGE_VOLATILE; + } else if (!pgste.gc) { + pgste.usage =3D PGSTE_GPS_USAGE_VOLATILE; + res =3D 1; + } + break; + case ESSA_SET_STABLE_RESIDENT: + pgste.usage =3D PGSTE_GPS_USAGE_STABLE; + /* + * Since the resident state can go away any time after this + * call, we will not make this page resident. We can revisit + * this decision if a guest will ever start using this. 
+ */ + break; + case ESSA_SET_STABLE_IF_RESIDENT: + if (!ptep->h.i) + pgste.usage =3D PGSTE_GPS_USAGE_STABLE; + break; + case ESSA_SET_STABLE_NODAT: + pgste.usage =3D PGSTE_GPS_USAGE_STABLE; + pgste.nodat =3D 1; + break; + default: + WARN_ONCE(1, "Invalid ORC!"); + res =3D -1; + break; + } + /* If we are discarding a page, set it to logical zero */ + pgste.zero =3D res =3D=3D 1; + if (orc > 0) { + *dirty =3D !pgste.cmma_d; + pgste.cmma_d =3D 1; + } + + pgste_set_unlock(ptep, pgste); + + return res; +} + +static long dat_reset_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, str= uct dat_walk *walk) +{ + union pgste pgste; + + pgste =3D pgste_get_lock(ptep); + pgste.usage =3D 0; + pgste.nodat =3D 0; + pgste.cmma_d =3D 0; + pgste_set_unlock(ptep, pgste); + if (need_resched()) + return next; + return 0; +} + +long dat_reset_cmma(union asce asce, gfn_t start) +{ + const struct dat_walk_ops dat_reset_cmma_ops =3D { + .pte_entry =3D dat_reset_cmma_pte, + }; + + return _dat_walk_gfn_range(start, asce_end(asce), asce, &dat_reset_cmma_o= ps, + DAT_WALK_IGN_HOLES, NULL); +} + +struct dat_get_cmma_state { + gfn_t start; + gfn_t end; + unsigned int count; + u8 *values; + atomic64_t *remaining; +}; + +static long __dat_peek_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, st= ruct dat_walk *walk) +{ + struct dat_get_cmma_state *state =3D walk->priv; + union pgste pgste; + + pgste =3D pgste_get_lock(ptep); + state->values[gfn - walk->start] =3D pgste.usage | (pgste.nodat << 6); + pgste_set_unlock(ptep, pgste); + state->end =3D next; + + return 0; +} + +static long __dat_peek_cmma_crste(union crste *crstep, gfn_t gfn, gfn_t ne= xt, struct dat_walk *walk) +{ + struct dat_get_cmma_state *state =3D walk->priv; + + if (crstep->h.i) + state->end =3D min(walk->end, next); + return 0; +} + +int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *v= alues) +{ + const struct dat_walk_ops ops =3D { + .pte_entry =3D __dat_peek_cmma_pte, + .pmd_entry =3D __dat_peek_cmma_crste, + .pud_entry =3D __dat_peek_cmma_crste, + .p4d_entry =3D __dat_peek_cmma_crste, + .pgd_entry =3D __dat_peek_cmma_crste, + }; + struct dat_get_cmma_state state =3D { .values =3D values, }; + int rc; + + rc =3D _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DE= FAULT, &state); + *count =3D state.end - start; + /* Return success if at least one value was saved, otherwise an error. */ + return (rc =3D=3D -EFAULT && *count > 0) ? 
0 : rc; +} + +static long __dat_get_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, str= uct dat_walk *walk) +{ + struct dat_get_cmma_state *state =3D walk->priv; + union pgste pgste; + + if (state->start !=3D -1) { + if ((gfn - state->end) > KVM_S390_MAX_BIT_DISTANCE) + return 1; + if (gfn - state->start >=3D state->count) + return 1; + } + + if (!READ_ONCE(*pgste_of(ptep)).cmma_d) + return 0; + + pgste =3D pgste_get_lock(ptep); + if (pgste.cmma_d) { + if (state->start =3D=3D -1) + state->start =3D gfn; + pgste.cmma_d =3D 0; + atomic64_dec(state->remaining); + state->values[gfn - state->start] =3D pgste.usage | pgste.nodat << 6; + state->end =3D next; + } + pgste_set_unlock(ptep, pgste); + return 0; +} + +int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *v= alues, atomic64_t *rem) +{ + const struct dat_walk_ops ops =3D { .pte_entry =3D __dat_get_cmma_pte, }; + struct dat_get_cmma_state state =3D { + .remaining =3D rem, + .values =3D values, + .count =3D *count, + .start =3D -1, + }; + + _dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLE= S, &state); + + if (state.start =3D=3D -1) { + *count =3D 0; + } else { + *count =3D state.end - state.start; + *start =3D state.start; + } + + return 0; +} + +struct dat_set_cmma_state { + unsigned long mask; + const u8 *bits; +}; + +static long __dat_set_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, str= uct dat_walk *walk) +{ + struct dat_set_cmma_state *state =3D walk->priv; + union pgste pgste, tmp; + + tmp.val =3D (state->bits[gfn - walk->start] << 24) & state->mask; + + pgste =3D pgste_get_lock(ptep); + pgste.usage =3D tmp.usage; + pgste.nodat =3D tmp.nodat; + pgste_set_unlock(ptep, pgste); + + return 0; +} + +/* + * This function sets the CMMA attributes for the given pages. If the input + * buffer has zero length, no action is taken, otherwise the attributes are + * set and the mm->context.uses_cmm flag is set. + */ +int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_= t gfn, + unsigned long count, unsigned long mask, const uint8_t *bits) +{ + const struct dat_walk_ops ops =3D { .pte_entry =3D __dat_set_cmma_pte, }; + struct dat_set_cmma_state state =3D { .mask =3D mask, .bits =3D bits, }; + union crste *crstep; + union pte *ptep; + gfn_t cur; + int rc; + + for (cur =3D ALIGN_DOWN(gfn, _PAGE_ENTRIES); cur < gfn + count; cur +=3D = _PAGE_ENTRIES) { + rc =3D dat_entry_walk(mc, cur, asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TAB= LE, + &crstep, &ptep); + if (rc) + return rc; + } + return _dat_walk_gfn_range(gfn, gfn + count, asce, &ops, DAT_WALK_IGN_HOL= ES, &state); +} diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h index 11573b1002f6..b3ac58d10d6c 100644 --- a/arch/s390/kvm/dat.h +++ b/arch/s390/kvm/dat.h @@ -17,6 +17,15 @@ #include #include =20 +/* + * Base address and length must be sent at the start of each block, theref= ore + * it's cheaper to send some clean data, as long as it's less than the siz= e of + * two longs. 
+ */
+#define KVM_S390_MAX_BIT_DISTANCE	(2 * sizeof(void *))
+/* For consistency */
+#define KVM_S390_CMMA_SIZE_MAX	((u32)KVM_S390_SKEYS_MAX)
+
 #define _ASCE(x) ((union asce) { .val = (x), })
 #define NULL_ASCE _ASCE(0)
 
@@ -433,6 +442,17 @@ static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool
 	return res;
 }
 
+union essa_state {
+	unsigned char val;
+	struct {
+		unsigned char		: 2;
+		unsigned char nodat	: 1;
+		unsigned char exception	: 1;
+		unsigned char usage	: 2;
+		unsigned char content	: 2;
+	};
+};
+
 /**
  * struct vsie_rmap - reverse mapping for shadow page table entries
  * @next: pointer to next rmap in the list
@@ -522,6 +542,13 @@ bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
 int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
 	     bool uses_skeys, struct guest_fault *f);
 
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty);
+long dat_reset_cmma(union asce asce, gfn_t start_gfn);
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values);
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem);
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
--
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 19/28] KVM: s390: New gmap code
Date: Mon, 22 Dec 2025 17:50:24 +0100
Message-ID: <20251222165033.162329-20-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

New gmap (guest map) code. This new gmap code will only be used by KVM.
This will replace the existing gmap.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/Makefile |    2 +-
 arch/s390/kvm/gmap.c   | 1083 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/gmap.h   |  240 +++++++++
 3 files changed, 1324 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/kvm/gmap.c
 create mode 100644 arch/s390/kvm/gmap.h

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 84315d2f75fb..21088265402c 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o
+kvm-y += dat.o gmap.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
new file mode 100644
index 000000000000..f4f47e5fc3d5
--- /dev/null
+++ b/arch/s390/kvm/gmap.c
@@ -0,0 +1,1083 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Guest memory management for KVM/s390
+ *
+ * Copyright IBM Corp. 2008, 2020, 2024
+ *
+ * Author(s): Claudio Imbrenda
+ *	      Martin Schwidefsky
+ *	      David Hildenbrand
+ *	      Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "dat.h"
+#include "gmap.h"
+#include "kvm-s390.h"
+
+static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.sie_block->prog0c & PROG_IN_SIE;
+}
+
+static int gmap_limit_to_type(gfn_t limit)
+{
+	if (!limit)
+		return TABLE_TYPE_REGION1;
+	if (limit <= _REGION3_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_SEGMENT;
+	if (limit <= _REGION2_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION3;
+	if (limit <= _REGION1_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION2;
+	return TABLE_TYPE_REGION1;
+}
+
+/**
+ * gmap_new() - allocate and initialize a guest address space
+ * @kvm: the kvm owning the guest
+ * @limit: maximum address of the gmap address space
+ *
+ * Returns a guest address space structure.
+ */ +struct gmap *gmap_new(struct kvm *kvm, gfn_t limit) +{ + struct crst_table *table; + struct gmap *gmap; + int type; + + type =3D gmap_limit_to_type(limit); + + gmap =3D kzalloc(sizeof(*gmap), GFP_KERNEL_ACCOUNT); + if (!gmap) + return NULL; + INIT_LIST_HEAD(&gmap->children); + INIT_LIST_HEAD(&gmap->list); + INIT_LIST_HEAD(&gmap->scb_users); + INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_KVM_S390_MMU_CACHE); + spin_lock_init(&gmap->children_lock); + spin_lock_init(&gmap->host_to_rmap_lock); + refcount_set(&gmap->refcount, 1); + + table =3D dat_alloc_crst_sleepable(_CRSTE_EMPTY(type).val); + if (!table) { + kfree(gmap); + return NULL; + } + + gmap->asce.val =3D __pa(table); + gmap->asce.dt =3D type; + gmap->asce.tl =3D _ASCE_TABLE_LENGTH; + gmap->asce.x =3D 1; + gmap->asce.p =3D 1; + gmap->asce.s =3D 1; + gmap->kvm =3D kvm; + set_bit(GMAP_FLAG_OWNS_PAGETABLES, &gmap->flags); + + return gmap; +} + +static void gmap_add_child(struct gmap *parent, struct gmap *child) +{ + KVM_BUG_ON(is_ucontrol(parent) && parent->parent, parent->kvm); + KVM_BUG_ON(is_ucontrol(parent) && !owns_page_tables(parent), parent->kvm); + KVM_BUG_ON(!refcount_read(&child->refcount), parent->kvm); + lockdep_assert_held(&parent->children_lock); + + child->parent =3D parent; + + if (is_ucontrol(parent)) + set_bit(GMAP_FLAG_IS_UCONTROL, &child->flags); + else + clear_bit(GMAP_FLAG_IS_UCONTROL, &child->flags); + + if (test_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &parent->flags)) + set_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &child->flags); + else + clear_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &child->flags); + + if (kvm_is_ucontrol(parent->kvm)) + clear_bit(GMAP_FLAG_OWNS_PAGETABLES, &child->flags); + list_add(&child->list, &parent->children); +} + +struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit) +{ + struct gmap *res; + + lockdep_assert_not_held(&parent->children_lock); + res =3D gmap_new(parent->kvm, limit); + if (res) { + scoped_guard(spinlock, &parent->children_lock) + gmap_add_child(parent, res); + } + return res; +} + +int gmap_set_limit(struct gmap *gmap, gfn_t limit) +{ + struct kvm_s390_mmu_cache *mc; + int rc, type; + + type =3D gmap_limit_to_type(limit); + + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) + return -ENOMEM; + + do { + rc =3D kvm_s390_mmu_cache_topup(mc); + if (rc) + return rc; + scoped_guard(write_lock, &gmap->kvm->mmu_lock) + rc =3D dat_set_asce_limit(mc, &gmap->asce, type); + } while (rc =3D=3D -ENOMEM); + + kvm_s390_free_mmu_cache(mc); + return 0; +} + +static void gmap_rmap_radix_tree_free(struct radix_tree_root *root) +{ + struct vsie_rmap *rmap, *rnext, *head; + struct radix_tree_iter iter; + unsigned long indices[16]; + unsigned long index; + void __rcu **slot; + int i, nr; + + /* A radix tree is freed by deleting all of its entries */ + index =3D 0; + do { + nr =3D 0; + radix_tree_for_each_slot(slot, root, &iter, index) { + indices[nr] =3D iter.index; + if (++nr =3D=3D 16) + break; + } + for (i =3D 0; i < nr; i++) { + index =3D indices[i]; + head =3D radix_tree_delete(root, index); + gmap_for_each_rmap_safe(rmap, rnext, head) + kfree(rmap); + } + } while (nr > 0); +} + +void gmap_remove_child(struct gmap *child) +{ + if (KVM_BUG_ON(!child->parent, child->kvm)) + return; + lockdep_assert_held(&child->parent->children_lock); + + list_del(&child->list); + child->parent =3D NULL; +} + +/** + * gmap_dispose - remove and free a guest address space and its children + * @gmap: pointer to the guest address space structure + */ +void gmap_dispose(struct gmap *gmap) +{ + /* The gmap must have been removed from the 
+	KVM_BUG_ON(gmap->parent, gmap->kvm);
+	/* All children of this gmap must have been removed beforehand */
+	KVM_BUG_ON(!list_empty(&gmap->children), gmap->kvm);
+	/* No VSIE shadow block is allowed to use this gmap */
+	KVM_BUG_ON(!list_empty(&gmap->scb_users), gmap->kvm);
+	/* The ASCE must be valid */
+	KVM_BUG_ON(!gmap->asce.val, gmap->kvm);
+	/* The refcount must be 0 */
+	KVM_BUG_ON(refcount_read(&gmap->refcount), gmap->kvm);
+
+	/* Flush tlb of all gmaps */
+	asce_flush_tlb(gmap->asce);
+
+	/* Free all DAT tables. */
+	dat_free_level(dereference_asce(gmap->asce), owns_page_tables(gmap));
+
+	/* Free additional data for a shadow gmap */
+	if (is_shadow(gmap))
+		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
+
+	kfree(gmap);
+}
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the ASCE is a SEGMENT type then this function will return -EINVAL,
+ * otherwise the pointers in the host_to_guest radix tree will keep pointing
+ * to the wrong pages, causing use-after-free and memory corruption.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	struct crst_table *table;
+	union asce asce;
+
+	/* Replacing segment type ASCEs would cause serious issues */
+	if (gmap->asce.dt == ASCE_TYPE_SEGMENT)
+		return -EINVAL;
+
+	table = dat_alloc_crst_sleepable(0);
+	if (!table)
+		return -ENOMEM;
+	memcpy(table, dereference_asce(gmap->asce), sizeof(*table));
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = gmap->asce;
+	asce.rsto = virt_to_pfn(table);
+	WRITE_ONCE(gmap->asce, asce);
+
+	return 0;
+}
+
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint)
+{
+	struct kvm *kvm = gmap->kvm;
+	struct kvm_vcpu *vcpu;
+	gfn_t prefix_gfn;
+	unsigned long i;
+
+	if (is_shadow(gmap))
+		return false;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/* match against both prefix pages */
+		prefix_gfn = gpa_to_gfn(kvm_s390_get_prefix(vcpu));
+		if (prefix_gfn < end && gfn <= prefix_gfn + 1) {
+			if (hint && kvm_s390_is_in_sie(vcpu))
+				return false;
+			VCPU_EVENT(vcpu, 2, "gmap notifier for %llx-%llx",
+				   gfn_to_gpa(gfn), gfn_to_gpa(end));
+			kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
+		}
+	}
+	return true;
+}
+
+struct clear_young_pte_priv {
+	struct gmap *gmap;
+	bool young;
+};
+
+static long gmap_clear_young_pte(union pte *ptep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *p = walk->priv;
+	union pgste pgste;
+	union pte pte, new;
+
+	pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (!pte.s.y && pte.h.i))
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (!pgste.prefix_notif || gmap_mkold_prefix(p->gmap, gfn, end)) {
+		new = pte;
+		new.h.i = 1;
+		new.s.y = 0;
+		if ((new.s.d || !new.h.p) && !new.s.s)
+			folio_set_dirty(pfn_folio(pte.h.pfra));
+		new.s.d = 0;
+		new.h.p = 1;
+
+		pgste.prefix_notif = 0;
+		pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, walk->asce, uses_skeys(p->gmap));
+	}
+	p->young = 1;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *priv = walk->priv;
+	union crste crste, new;
+
+	crste = READ_ONCE(*crstep);
+
+	if (!crste.h.fc)
+		return 0;
+	if (!crste.s.fc1.y && crste.h.i)
+		return 0;
+	if (!crste_prefix(crste) || gmap_mkold_prefix(priv->gmap, gfn, end)) {
+		new = crste;
+		new.h.i = 1;
+		new.s.fc1.y = 0;
+		new.s.fc1.prefix_notif = 0;
+		if (new.s.fc1.d || !new.h.p)
+			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
+		new.s.fc1.d = 0;
+		new.h.p = 1;
+		dat_crstep_xchg(crstep, new, gfn, walk->asce);
+	}
+	priv->young = 1;
+	return 0;
+}
+
+/**
+ * gmap_age_gfn() - clear young
+ * @gmap: the guest gmap
+ * @start: the first gfn to test
+ * @end: the gfn after the last one to test
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = gmap_clear_young_pte,
+		.pmd_entry = gmap_clear_young_crste,
+		.pud_entry = gmap_clear_young_crste,
+	};
+	struct clear_young_pte_priv priv = {
+		.gmap = gmap,
+		.young = false,
+	};
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+
+	return priv.young;
+}
+
+struct gmap_unmap_priv {
+	struct gmap *gmap;
+	struct kvm_memory_slot *slot;
+};
+
+static long _gmap_unmap_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *w)
+{
+	struct gmap_unmap_priv *priv = w->priv;
+	struct folio *folio = NULL;
+	unsigned long vmaddr;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (ptep->s.pr && pgste.usage == PGSTE_GPS_USAGE_UNUSED) {
+		vmaddr = __gfn_to_hva_memslot(priv->slot, gfn);
+		gmap_helper_try_set_pte_unused(priv->gmap->kvm->mm, vmaddr);
+	}
+	if (ptep->s.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
+		folio = pfn_folio(ptep->h.pfra);
+	pgste = gmap_ptep_xchg(priv->gmap, ptep, _PTE_EMPTY, pgste, gfn);
+	pgste_set_unlock(ptep, pgste);
+	if (folio)
+		uv_convert_from_secure_folio(folio);
+
+	return 0;
+}
+
+static long _gmap_unmap_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap_unmap_priv *priv = walk->priv;
+	struct folio *folio = NULL;
+
+	if (crstep->h.fc) {
+		if (crstep->s.fc1.pr && test_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &priv->gmap->flags))
+			folio = phys_to_folio(crste_origin_large(*crstep));
+		gmap_crstep_xchg(priv->gmap, crstep, _CRSTE_EMPTY(crstep->h.tt), gfn);
+		if (folio)
+			uv_convert_from_secure_folio(folio);
+	}
+
+	return 0;
+}
+
+/**
+ * gmap_unmap_gfn_range() - Unmap a range of guest addresses
+ * @gmap: the gmap to act on
+ * @slot: the memslot in which the range is located
+ * @start: the first gfn to unmap
+ * @end: the gfn after the last one to unmap
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: false
+ */
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _gmap_unmap_pte,
+		.pmd_entry = _gmap_unmap_crste,
+		.pud_entry = _gmap_unmap_crste,
+	};
+	struct gmap_unmap_priv priv = {
+		.gmap = gmap,
+		.slot = slot,
+	};
+
+	lockdep_assert_held_write(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+	return false;
+}
+
+static union pgste __pte_test_and_clear_softdirty(union pte *ptep, union pgste pgste, gfn_t gfn,
+						  struct gmap *gmap)
+{
+	union pte pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (pte.h.p && !pte.s.sd))
+		return pgste;
+
+	/*
+	 * If this page contains one or more prefixes of vCPUs that are
+	 * currently running, do not reset the protection, leave it marked
+	 * as dirty.
+	 */
+	if (!pgste.prefix_notif || gmap_mkold_prefix(gmap, gfn, gfn + 1)) {
+		pte.h.p = 1;
+		pte.s.sd = 0;
+		pgste = gmap_ptep_xchg(gmap, ptep, pte, pgste, gfn);
+	}
+
+	mark_page_dirty(gmap->kvm, gfn);
+
+	return pgste;
+}
+
+static long _pte_test_and_clear_softdirty(union pte *ptep, gfn_t gfn, gfn_t end,
+					  struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __pte_test_and_clear_softdirty(ptep, pgste, gfn, gmap);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long _crste_test_and_clear_softdirty(union crste *table, gfn_t gfn, gfn_t end,
+					    struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, new;
+
+	if (fatal_signal_pending(current))
+		return 1;
+	crste = READ_ONCE(*table);
+	if (!crste.h.fc)
+		return 0;
+	if (crste.h.p && !crste.s.fc1.sd)
+		return 0;
+
+	/*
+	 * If this large page contains one or more prefixes of vCPUs that are
+	 * currently running, do not reset the protection, leave it marked as
+	 * dirty.
+	 */
+	if (!crste.s.fc1.prefix_notif || gmap_mkold_prefix(gmap, gfn, end)) {
+		new = crste;
+		new.h.p = 1;
+		new.s.fc1.sd = 0;
+		gmap_crstep_xchg(gmap, table, new, gfn);
+	}
+
+	for ( ; gfn < end; gfn++)
+		mark_page_dirty(gmap->kvm, gfn);
+
+	return 0;
+}
+
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops walk_ops = {
+		.pte_entry = _pte_test_and_clear_softdirty,
+		.pmd_entry = _crste_test_and_clear_softdirty,
+		.pud_entry = _crste_test_and_clear_softdirty,
+	};
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &walk_ops, 0, gmap);
+}
+
+static int gmap_handle_minor_crste_fault(union asce asce, struct guest_fault *f)
+{
+	union crste newcrste, oldcrste = READ_ONCE(*f->crstep);
+
+	/* Somehow the crste is not large anymore, let the slow path deal with it */
+	if (!oldcrste.h.fc)
+		return 1;
+
+	f->pfn = PHYS_PFN(large_crste_to_phys(oldcrste, f->gfn));
+	f->writable = oldcrste.s.fc1.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldcrste.h.i && !(f->write_attempt && oldcrste.h.p))
+		return 0;
+
+	if (!f->write_attempt || oldcrste.s.fc1.w) {
+		f->write_attempt |= oldcrste.s.fc1.w && oldcrste.s.fc1.d;
+		newcrste = oldcrste;
+		newcrste.h.i = 0;
+		newcrste.s.fc1.y = 1;
+		if (f->write_attempt) {
+			newcrste.h.p = 0;
+			newcrste.s.fc1.d = 1;
+			newcrste.s.fc1.sd = 1;
+		}
+		if (!oldcrste.s.fc1.d && newcrste.s.fc1.d)
+			SetPageDirty(phys_to_page(crste_origin_large(newcrste)));
+		/* In case of races, let the slow path deal with it */
+		return !dat_crstep_xchg_atomic(f->crstep, oldcrste, newcrste, f->gfn, asce);
+	}
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	return 1;
+}
+
+static int _gmap_handle_minor_pte_fault(struct gmap *gmap, union pgste *pgste,
+					struct guest_fault *f)
+{
+	union pte newpte, oldpte = READ_ONCE(*f->ptep);
+
+	f->pfn = oldpte.h.pfra;
+	f->writable = oldpte.s.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldpte.h.i && !(f->write_attempt && oldpte.h.p))
+		return 0;
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	if (!oldpte.s.pr || (f->write_attempt && !oldpte.s.w))
+		return 1;
+
+	newpte = oldpte;
+	newpte.h.i = 0;
+	newpte.s.y = 1;
+	if (f->write_attempt) {
+		newpte.h.p = 0;
+		newpte.s.d = 1;
+		newpte.s.sd = 1;
+	}
+	if (!oldpte.s.d && newpte.s.d)
+		SetPageDirty(pfn_to_page(newpte.h.pfra));
+	*pgste = gmap_ptep_xchg(gmap, f->ptep, newpte, *pgste, f->gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_try_fixup_minor() - Try to fixup a minor gmap fault.
+ * @gmap: the gmap whose fault needs to be resolved.
+ * @fault: describes the fault that is being resolved.
+ *
+ * A minor fault is a fault that can be resolved quickly within gmap.
+ * The page is already mapped, the fault is only due to dirty/young tracking.
+ *
+ * Return: 0 in case of success, < 0 in case of error, > 0 if the fault could
+ * not be resolved and needs to go through the slow path.
+ */
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault)
+{
+	union pgste pgste;
+	int rc;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(NULL, fault->gfn, gmap->asce, DAT_WALK_LEAF, TABLE_TYPE_PAGE_TABLE,
+			    &fault->crstep, &fault->ptep);
+	/* If a PTE or a leaf CRSTE could not be reached, slow path */
+	if (rc)
+		return 1;
+
+	if (fault->ptep) {
+		pgste = pgste_get_lock(fault->ptep);
+		rc = _gmap_handle_minor_pte_fault(gmap, &pgste, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+		pgste_set_unlock(fault->ptep, pgste);
+	} else {
+		rc = gmap_handle_minor_crste_fault(gmap->asce, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+	}
+	return rc;
+}
+
+static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
+{
+	unsigned int order;
+	int rc, level;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	level = TABLE_TYPE_PAGE_TABLE;
+	if (f->page) {
+		order = folio_order(page_folio(f->page));
+		if (order >= get_order(_REGION3_SIZE) && gmap_2g_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_REGION3;
+		else if (order >= get_order(_SEGMENT_SIZE) && gmap_1m_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_SEGMENT;
+	}
+	rc = dat_link(mc, gmap->asce, level, uses_skeys(gmap), f);
+	KVM_BUG_ON(rc == -EINVAL, gmap->kvm);
+	return rc;
+}
+
+static int gmap_ucas_map_one(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+			     gfn_t p_gfn, gfn_t c_gfn)
+{
+	struct page_table *pt;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(mc, p_gfn, gmap->parent->asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	pt = pte_table_start(ptep);
+	dat_set_ptval(pt, PTVAL_VMADDR, p_gfn >> (_SEGMENT_SHIFT - PAGE_SHIFT));
+
+	rc = dat_entry_walk(mc, c_gfn, gmap->asce, DAT_WALK_ALLOC, TABLE_TYPE_SEGMENT,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	dat_crstep_xchg(crstep, _crste_fc0(virt_to_pfn(pt), TABLE_TYPE_SEGMENT), c_gfn, gmap->asce);
+	return 0;
+}
+
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc = 0;
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	while (count) {
+		rc = gmap_ucas_map_one(mc, gmap, p_gfn, c_gfn);
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				break;
+			continue;
+		}
+		if (rc)
+			break;
+
+		count--;
+		c_gfn += _PAGE_ENTRIES;
+		p_gfn += _PAGE_ENTRIES;
+	}
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_ucas_unmap_one(struct gmap *gmap, gfn_t c_gfn)
+{
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, c_gfn, gmap->asce, 0, TABLE_TYPE_SEGMENT, &crstep, &ptep);
+	if (!rc)
+		dat_crstep_xchg(crstep, _PMD_EMPTY, c_gfn, gmap->asce);
+}
+
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count)
+{
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	for ( ; count; count--, c_gfn += _PAGE_ENTRIES)
+		gmap_ucas_unmap_one(gmap, c_gfn);
+}
+
+static long _gmap_split_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, newcrste;
+
+	crste = READ_ONCE(*crstep);
+	newcrste = _CRSTE_EMPTY(crste.h.tt);
+
+	while (crste_leaf(crste)) {
+		if (crste_prefix(crste))
+			gmap_unmap_prefix(gmap, gfn, next);
+		if (crste.s.fc1.vsie_notif)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		if (dat_crstep_xchg_atomic(crstep, crste, newcrste, gfn, walk->asce))
+			break;
+		crste = READ_ONCE(*crstep);
+	}
+
+	if (need_resched())
+		return next;
+
+	return 0;
+}
+
+void gmap_split_huge_pages(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = {
+		.pmd_entry = _gmap_split_crste,
+		.pud_entry = _gmap_split_crste,
+	};
+	gfn_t start = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, asce_end(gmap->asce), gmap->asce,
+						    &ops, DAT_WALK_IGN_HOLES, gmap);
+		cond_resched();
+	} while (start);
+}
+
+static int _gmap_enable_skeys(struct gmap *gmap)
+{
+	gfn_t start = 0;
+	int rc;
+
+	if (uses_skeys(gmap))
+		return 0;
+
+	set_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+	rc = gmap_helper_disable_cow_sharing();
+	if (rc) {
+		clear_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+		return rc;
+	}
+
+	do {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			start = dat_reset_skeys(gmap->asce, start);
+		cond_resched();
+	} while (start);
+	return 0;
+}
+
+int gmap_enable_skeys(struct gmap *gmap)
+{
+	int rc;
+
+	mmap_write_lock(gmap->kvm->mm);
+	rc = _gmap_enable_skeys(gmap);
+	mmap_write_unlock(gmap->kvm->mm);
+	return rc;
+}
+
+static long _destroy_pages_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	if (!ptep->s.pr)
+		return 0;
+	__kvm_s390_pv_destroy_page(phys_to_page(pte_origin(*ptep)));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long _destroy_pages_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t origin, cur, end;
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	origin = crste_origin_large(*crstep);
+	cur = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	for ( ; cur < end; cur += PAGE_SIZE)
+		__kvm_s390_pv_destroy_page(phys_to_page(cur));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _destroy_pages_pte,
+		.pmd_entry = _destroy_pages_crste,
+		.pud_entry = _destroy_pages_crste,
+	};
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, end, gmap->asce, &ops,
+						    DAT_WALK_IGN_HOLES, NULL);
+		if (interruptible && fatal_signal_pending(current))
+			return -EINTR;
+		cond_resched();
+	} while (start && start < end);
+	return 0;
+}
+
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level)
+{
+	struct vsie_rmap *rmap __free(kvfree) = NULL;
+	struct vsie_rmap *temp;
+	void __rcu **slot;
+	int rc = 0;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	lockdep_assert_held(&sg->host_to_rmap_lock);
+
+	rmap = kzalloc(sizeof(*rmap), GFP_ATOMIC);
+	if (!rmap)
+		return -ENOMEM;
+
+	rmap->r_gfn = r_gfn;
+	rmap->level = level;
+	slot = radix_tree_lookup_slot(&sg->host_to_rmap, p_gfn);
+	if (slot) {
+		rmap->next = radix_tree_deref_slot_protected(slot, &sg->host_to_rmap_lock);
+		for (temp = rmap->next; temp; temp = temp->next) {
+			if (temp->val == rmap->val)
+				return 0;
+		}
+		radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap);
+	} else {
+		rmap->next = NULL;
+		rc = radix_tree_insert(&sg->host_to_rmap, p_gfn, rmap);
+		if (rc)
+			return rc;
+	}
+	rmap = NULL;
+
+	return 0;
+}
+
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	union pte pte;
+	int flags, rc;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	flags = DAT_WALK_SPLIT_ALLOC | (uses_skeys(sg->parent) ? DAT_WALK_USES_SKEYS : 0);
+	rc = dat_entry_walk(mc, p_gfn, sg->parent->asce, flags,
+			    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+	if (level <= TABLE_TYPE_REGION1) {
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			rc = gmap_insert_rmap(sg, p_gfn, r_gfn, level);
+	}
+	if (rc)
+		return rc;
+
+	pgste = pgste_get_lock(ptep);
+	pte = ptep->s.pr ? *ptep : _pte(pfn, wr, false, false);
+	pte.h.p = 1;
+	pgste = _gmap_ptep_xchg(sg->parent, ptep, pte, pgste, p_gfn, false);
+	pgste.vsie_notif = 1;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+void gmap_set_cmma_all_dirty(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __set_cmma_dirty_pte, };
+	gfn_t gfn = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			gfn = _dat_walk_gfn_range(gfn, asce_end(gmap->asce), gmap->asce, &ops,
+						  DAT_WALK_IGN_HOLES, NULL);
+		cond_resched();
+	} while (gfn);
+}
+
+static void gmap_unshadow_level(struct gmap *sg, gfn_t r_gfn, int level)
+{
+	unsigned long align = PAGE_SIZE;
+	gpa_t gaddr = gfn_to_gpa(r_gfn);
+	union crste *crstep;
+	union crste crste;
+	union pte *ptep;
+
+	if (level > TABLE_TYPE_PAGE_TABLE)
+		align = 1UL << (11 * level + _SEGMENT_SHIFT);
+	kvm_s390_vsie_gmap_notifier(sg, ALIGN_DOWN(gaddr, align), ALIGN(gaddr + 1, align));
+	if (dat_entry_walk(NULL, r_gfn, sg->asce, 0, level, &crstep, &ptep))
+		return;
+	if (ptep) {
+		dat_ptep_xchg(ptep, _PTE_EMPTY, r_gfn, sg->asce, uses_skeys(sg));
+		return;
+	}
+	crste = READ_ONCE(*crstep);
+	dat_crstep_clear(crstep, r_gfn, sg->asce);
+	if (is_pmd(crste))
+		dat_free_pt(dereference_pmd(crste.pmd));
+	else
+		dat_free_level(dereference_crste(crste), true);
+}
+
+static void gmap_unshadow(struct gmap *sg)
+{
+	struct gmap_cache *gmap_cache, *next;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+	KVM_BUG_ON(!sg->parent, sg->kvm);
+
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	gmap_remove_child(sg);
+	kvm_s390_vsie_gmap_notifier(sg, 0, -1UL);
+
+	list_for_each_entry_safe(gmap_cache, next, &sg->scb_users, list) {
+		gmap_cache->gmap = NULL;
+		list_del(&gmap_cache->list);
+	}
+
+	gmap_put(sg);
+}
+
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct gmap *sg, *next;
+	gfn_t start, end;
+
+	list_for_each_entry_safe(sg, next, &parent->children, list) {
+		start = sg->guest_asce.rsto;
+		end = start + sg->guest_asce.tl + 1;
+		if (!sg->guest_asce.r && gfn >= start && gfn < end) {
+			gmap_unshadow(sg);
+			continue;
+		}
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			head = radix_tree_delete(&sg->host_to_rmap, gfn);
+		gmap_for_each_rmap_safe(rmap, rnext, head)
+			gmap_unshadow_level(sg, rmap->r_gfn, rmap->level);
+	}
+}
+
+/**
+ * gmap_find_shadow - find a specific asce in the list of shadow tables
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns the pointer to the gmap if a shadow table with the given asce
+ * and edat level is already available, otherwise NULL.
+ *
+ * Context: Called with parent->children_lock held
+ */
+static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int edat_level)
+{
+	struct gmap *sg;
+
+	lockdep_assert_held(&parent->children_lock);
+	list_for_each_entry(sg, &parent->children, list) {
+		if (!gmap_is_shadow_valid(sg, asce, edat_level))
+			continue;
+		return sg;
+	}
+	return NULL;
+}
+
+static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
+{
+	KVM_BUG_ON(1, sg->kvm);
+	return -EINVAL;
+}
+
+/**
+ * gmap_create_shadow() - create/find a shadow guest address space
+ * @mc: the cache to use to allocate dat tables
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * The pages of the top level page table referred by the asce parameter
+ * will be set to read-only and marked in the PGSTEs of the kvm process.
+ * The shadow table will be removed automatically on any change to the
+ * PTE mapping for the source table.
+ *
+ * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
+ * parent gmap table could not be protected.
+ */
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *parent,
+				union asce asce, int edat_level)
+{
+	struct gmap *sg, *new;
+	int rc;
+
+	scoped_guard(spinlock, &parent->children_lock)
+		sg = gmap_find_shadow(parent, asce, edat_level);
+	if (sg)
+		return sg;
+	/* Create a new shadow gmap */
+	new = gmap_new(parent->kvm, asce.r ? 1UL << (64 - PAGE_SHIFT) : asce_end(asce));
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+	new->guest_asce = asce;
+	new->edat_level = edat_level;
+	set_bit(GMAP_FLAG_SHADOW, &new->flags);
+
+	scoped_guard(spinlock, &parent->children_lock) {
+		/* Recheck if another CPU created the same shadow */
+		sg = gmap_find_shadow(parent, asce, edat_level);
+		if (sg) {
+			gmap_put(new);
+			return sg;
+		}
+		if (asce.r) {
+			/* only allow one real-space gmap shadow */
+			list_for_each_entry(sg, &parent->children, list) {
+				if (sg->guest_asce.r) {
+					scoped_guard(write_lock, &parent->kvm->mmu_lock)
+						gmap_unshadow(sg);
+					break;
+				}
+			}
+			gmap_add_child(parent, new);
+			/* nothing to protect, return right away */
+			return new;
+		}
+	}
+
+	new->parent = parent;
+	/* protect while inserting, protects against invalidation races */
+	rc = gmap_protect_asce_top_level(mc, new);
+	if (rc) {
+		new->parent = NULL;
+		gmap_put(new);
+		return ERR_PTR(rc);
+	}
+	return new;
+}
diff --git a/arch/s390/kvm/gmap.h b/arch/s390/kvm/gmap.h
new file mode 100644
index 000000000000..f217d560234a
--- /dev/null
+++ b/arch/s390/kvm/gmap.h
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2016, 2025
+ * Author(s): Martin Schwidefsky
+ *            Claudio Imbrenda
+ */
+
+#ifndef ARCH_KVM_S390_GMAP_H
+#define ARCH_KVM_S390_GMAP_H
+
+#include "dat.h"
+
+/**
+ * enum gmap_flags - Flags of a gmap
+ *
+ * @GMAP_FLAG_SHADOW: the gmap is a vsie shadow gmap.
+ * @GMAP_FLAG_OWNS_PAGETABLES: the gmap owns all dat levels; normally 1, is 0
+ *                             only for ucontrol per-cpu gmaps, since they
+ *                             share the page tables with the main gmap.
+ * @GMAP_FLAG_IS_UCONTROL: the gmap is ucontrol (main gmap or per-cpu gmap).
+ * @GMAP_FLAG_ALLOW_HPAGE_1M: 1M hugepages are allowed for this gmap,
+ *                            independently of the page size used by userspace.
+ * @GMAP_FLAG_ALLOW_HPAGE_2G: 2G hugepages are allowed for this gmap,
+ *                            independently of the page size used by userspace.
+ * @GMAP_FLAG_PFAULT_ENABLED: pfault is enabled for the gmap.
+ * @GMAP_FLAG_USES_SKEYS: if the guest uses storage keys.
+ * @GMAP_FLAG_USES_CMM: whether the guest uses CMMA.
+ * @GMAP_FLAG_EXPORT_ON_UNMAP: whether to export guest pages when unmapping.
+ */
+enum gmap_flags {
+	GMAP_FLAG_SHADOW = 0,
+	GMAP_FLAG_OWNS_PAGETABLES,
+	GMAP_FLAG_IS_UCONTROL,
+	GMAP_FLAG_ALLOW_HPAGE_1M,
+	GMAP_FLAG_ALLOW_HPAGE_2G,
+	GMAP_FLAG_PFAULT_ENABLED,
+	GMAP_FLAG_USES_SKEYS,
+	GMAP_FLAG_USES_CMM,
+	GMAP_FLAG_EXPORT_ON_UNMAP,
+};
+
+/**
+ * struct gmap - guest address space
+ * @flags: GMAP_FLAG_* flags
+ * @edat_level: the edat level of this shadow gmap
+ * @kvm: the vm
+ * @asce: the ASCE used by this gmap
+ * @list: list head used in children gmaps for the children gmap list
+ * @children_lock: protects children and scb_users
+ * @children: list of child gmaps of this gmap
+ * @scb_users: list of vsie_scb that use this shadow gmap
+ * @parent: parent gmap of a child gmap
+ * @guest_asce: original ASCE of this shadow gmap
+ * @host_to_rmap_lock: protects host_to_rmap
+ * @host_to_rmap: radix tree mapping host addresses to guest addresses
+ * @refcount: reference counter
+ */
+struct gmap {
+	unsigned long flags;
+	unsigned char edat_level;
+	struct kvm *kvm;
+	union asce asce;
+	struct list_head list;
+	spinlock_t children_lock;	/* protects: children, scb_users */
+	struct list_head children;
+	struct list_head scb_users;
+	struct gmap *parent;
+	union asce guest_asce;
+	spinlock_t host_to_rmap_lock;	/* protects host_to_rmap */
+	struct radix_tree_root host_to_rmap;
+	refcount_t refcount;
+};
+
+struct gmap_cache {
+	struct list_head list;
+	struct gmap *gmap;
+};
+
+#define gmap_for_each_rmap_safe(pos, n, head) \
+	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
+
+int s390_replace_asce(struct gmap *gmap);
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint);
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end);
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault);
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit);
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit);
+void gmap_remove_child(struct gmap *child);
+void gmap_dispose(struct gmap *gmap);
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *fault);
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end);
+int gmap_set_limit(struct gmap *gmap, gfn_t limit);
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count);
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count);
+int gmap_enable_skeys(struct gmap *gmap);
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible);
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level);
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr);
+void gmap_set_cmma_all_dirty(struct gmap *gmap);
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn);
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+				union asce asce, int edat_level);
+void gmap_split_huge_pages(struct gmap *gmap);
+
+static inline bool uses_skeys(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
+}
+
+static inline bool uses_cmm(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_USES_CMM, &gmap->flags);
+}
+
+static inline bool pfault_enabled(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_PFAULT_ENABLED, &gmap->flags);
+}
+
+static inline bool is_ucontrol(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_IS_UCONTROL, &gmap->flags);
+}
+
+static inline bool is_shadow(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_SHADOW, &gmap->flags);
+}
+
+static inline bool owns_page_tables(struct gmap *gmap)
+{
+	return test_bit(GMAP_FLAG_OWNS_PAGETABLES, &gmap->flags);
+}
+
+static inline struct gmap *gmap_put(struct gmap *gmap)
+{
+	if (refcount_dec_and_test(&gmap->refcount))
+		gmap_dispose(gmap);
+	return NULL;
+}
+
+static inline void gmap_get(struct gmap *gmap)
+{
+	WARN_ON_ONCE(unlikely(!refcount_inc_not_zero(&gmap->refcount)));
+}
+
+static inline void gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	scoped_guard(spinlock, &parent->children_lock)
+		_gmap_handle_vsie_unshadow_event(parent, gfn);
+}
+
+static inline bool gmap_mkold_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, true);
+}
+
+static inline bool gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, false);
+}
+
+static inline union pgste _gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					  union pgste pgste, gfn_t gfn, bool needs_lock)
+{
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+	if (!needs_lock)
+		lockdep_assert_held(&gmap->children_lock);
+
+	if (pgste.prefix_notif && (newpte.h.p || newpte.h.i)) {
+		pgste.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + 1);
+	}
+	if (pgste.vsie_notif && (ptep->h.p != newpte.h.p || newpte.h.i)) {
+		pgste.vsie_notif = 0;
+		if (needs_lock)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		else
+			_gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	return __dat_ptep_xchg(ptep, pgste, newpte, gfn, gmap->asce, uses_skeys(gmap));
+}
+
+static inline union pgste gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					 union pgste pgste, gfn_t gfn)
+{
+	return _gmap_ptep_xchg(gmap, ptep, newpte, pgste, gfn, true);
+}
+
+static inline void _gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				     gfn_t gfn, bool needs_lock)
+{
+	/* number of pages covered by the leaf entry: 2^8 (1M) or 2^19 (2G) */
+	unsigned long align = 1UL << (8 + (is_pmd(*crstep) ? 0 : 11));
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+	if (!needs_lock)
+		lockdep_assert_held(&gmap->children_lock);
+
+	gfn = ALIGN_DOWN(gfn, align);
+	if (crste_prefix(*crstep) && (ne.h.p || ne.h.i || !crste_prefix(ne))) {
+		ne.s.fc1.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + align);
+	}
+	if (crste_leaf(*crstep) && crstep->s.fc1.vsie_notif &&
+	    (ne.h.p || ne.h.i || !ne.s.fc1.vsie_notif)) {
+		ne.s.fc1.vsie_notif = 0;
+		if (needs_lock)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		else
+			_gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	dat_crstep_xchg(crstep, ne, gfn, gmap->asce);
+}
+
+static inline void gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				    gfn_t gfn)
+{
+	return _gmap_crstep_xchg(gmap, crstep, ne, gfn, true);
+}
+
+/**
+ * gmap_is_shadow_valid() - check if a shadow guest address space matches the
+ *                          given properties and is still valid
+ * @sg: pointer to the shadow guest address space structure
+ * @asce: ASCE for which the shadow table is requested
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns true if the gmap shadow is still valid and matches the given
+ * properties, the caller can continue using it. Returns false otherwise; the
+ * caller has to request a new shadow gmap in this case.
+ */
+static inline bool gmap_is_shadow_valid(struct gmap *sg, union asce asce, int edat_level)
+{
+	return sg->guest_asce.val == asce.val && sg->edat_level == edat_level;
+}
+
+#endif /* ARCH_KVM_S390_GMAP_H */
-- 
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 20/28] KVM: s390: Add helper functions for fault handling
Date: Mon, 22 Dec 2025 17:50:25 +0100
Message-ID: <20251222165033.162329-21-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add some helper functions for handling multiple guest faults at the
same time. This will be needed for VSIE, where a nested guest access
also needs to access all of the page tables that map it.
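
The intended calling pattern for the array helpers is snapshot, fault
in, recheck. A minimal usage sketch (illustrative only, not part of
this patch; the caller context, start_gfn and the batch size of four
are assumptions):

	struct guest_fault f[4] = {};	/* hypothetical batch of four pages */
	unsigned long seq;
	int rc;

	/* snapshot the invalidate sequence before faulting anything in */
	seq = kvm->mmu_invalidate_seq;
	smp_rmb();	/* pairs with the smp_wmb() in kvm_mmu_invalidate_end() */

	rc = kvm_s390_get_guest_pages(kvm, f, start_gfn, ARRAY_SIZE(f), false);
	/* if any gfn was invalidated concurrently, the caller must retry */
	if (!rc && kvm_s390_array_needs_retry_unsafe(kvm, seq, f))
		rc = -EAGAIN;
	/* drop the page references; on error nothing is marked dirty */
	kvm_s390_release_faultin_array(kvm, f, rc != 0);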
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   1 +
 arch/s390/kvm/Makefile           |   2 +-
 arch/s390/kvm/faultin.c          | 148 +++++++++++++++++++++++++++++++
 arch/s390/kvm/faultin.h          |  92 +++++++++++++++++++
 arch/s390/kvm/kvm-s390.c         |   2 +-
 arch/s390/kvm/kvm-s390.h         |   2 +
 6 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 arch/s390/kvm/faultin.c
 create mode 100644 arch/s390/kvm/faultin.h

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 6ba99870fc32..816776a8a8e3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -442,6 +442,7 @@ struct kvm_vcpu_arch {
 	bool acrs_loaded;
 	struct kvm_s390_pv_vcpu pv;
 	union diag318_info diag318_info;
+	void *mc; /* Placeholder */
 };
 
 struct kvm_vm_stat {
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 21088265402c..1e2dcd3e2436 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o gmap.o
+kvm-y += dat.o gmap.o faultin.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/faultin.c b/arch/s390/kvm/faultin.c
new file mode 100644
index 000000000000..9795ed429097
--- /dev/null
+++ b/arch/s390/kvm/faultin.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+#include
+#include
+
+#include "gmap.h"
+#include "trace.h"
+#include "faultin.h"
+
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
+/*
+ * kvm_s390_faultin_gfn() - handle a dat fault.
+ * @vcpu: the vCPU whose gmap is to be fixed up, or NULL if operating on the VM.
+ * @kvm: the VM whose gmap is to be fixed up, or NULL if operating on a vCPU.
+ * @f: the guest fault that needs to be resolved.
+ *
+ * Return:
+ * * 0 on success
+ * * < 0 in case of error
+ * * > 0 in case of guest exceptions
+ *
+ * Context:
+ * * The mm lock must not be held before calling
+ * * kvm->srcu must be held
+ * * may sleep
+ */
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f)
+{
+	struct kvm_s390_mmu_cache *local_mc __free(kvm_s390_mmu_cache) = NULL;
+	struct kvm_s390_mmu_cache *mc = NULL;
+	struct kvm_memory_slot *slot;
+	unsigned long inv_seq;
+	int foll, rc = 0;
+
+	foll = f->write_attempt ? FOLL_WRITE : 0;
+	foll |= f->attempt_pfault ? FOLL_NOWAIT : 0;
+
+	if (vcpu) {
+		kvm = vcpu->kvm;
+		mc = vcpu->arch.mc;
+	}
+
+	lockdep_assert_held(&kvm->srcu);
+
+	scoped_guard(read_lock, &kvm->mmu_lock) {
+		if (gmap_try_fixup_minor(kvm->arch.gmap, f) == 0)
+			return 0;
+	}
+
+	while (1) {
+		f->valid = false;
+		inv_seq = kvm->mmu_invalidate_seq;
+		/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+		smp_rmb();
+
+		if (vcpu)
+			slot = kvm_vcpu_gfn_to_memslot(vcpu, f->gfn);
+		else
+			slot = gfn_to_memslot(kvm, f->gfn);
+		f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+
+		/* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) */
+		if (f->pfn == KVM_PFN_ERR_NEEDS_IO) {
+			if (unlikely(!f->attempt_pfault))
+				return -EAGAIN;
+			if (unlikely(!vcpu))
+				return -EINVAL;
+			trace_kvm_s390_major_guest_pfault(vcpu);
+			if (kvm_arch_setup_async_pf(vcpu))
+				return 0;
+			vcpu->stat.pfault_sync++;
+			/* Could not setup async pfault, try again synchronously */
+			foll &= ~FOLL_NOWAIT;
+			f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+		}
+
+		/* Access outside memory, addressing exception */
+		if (is_noslot_pfn(f->pfn))
+			return PGM_ADDRESSING;
+		/* Signal pending: try again */
+		if (f->pfn == KVM_PFN_ERR_SIGPENDING)
+			return -EAGAIN;
+		/* Check if it's read-only memory; don't try to actually handle that case. */
+		if (f->pfn == KVM_PFN_ERR_RO_FAULT)
+			return -EOPNOTSUPP;
+		/* Any other error */
+		if (is_error_pfn(f->pfn))
+			return -EFAULT;
+
+		if (!mc) {
+			local_mc = kvm_s390_new_mmu_cache();
+			if (!local_mc)
+				return -ENOMEM;
+			mc = local_mc;
+		}
+
+		/* Loop, will automatically release the faulted page */
+		if (mmu_invalidate_retry_gfn_unsafe(kvm, inv_seq, f->gfn)) {
+			kvm_release_faultin_page(kvm, f->page, true, false);
+			continue;
+		}
+
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			if (!mmu_invalidate_retry_gfn(kvm, inv_seq, f->gfn)) {
+				f->valid = true;
+				rc = gmap_link(mc, kvm->arch.gmap, f);
+				kvm_release_faultin_page(kvm, f->page, !!rc, f->write_attempt);
+				f->page = NULL;
+			}
+		}
+		kvm_release_faultin_page(kvm, f->page, true, false);
+
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				return rc;
+		} else if (rc != -EAGAIN) {
+			return rc;
+		}
+	}
+}
+
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+	int foll = w ? FOLL_WRITE : 0;
+
+	f->write_attempt = w;
+	f->gfn = gfn;
+	f->pfn = __kvm_faultin_pfn(slot, gfn, foll, &f->writable, &f->page);
+	if (is_noslot_pfn(f->pfn))
+		return PGM_ADDRESSING;
+	if (is_sigpending_pfn(f->pfn))
+		return -EINTR;
+	if (f->pfn == KVM_PFN_ERR_NEEDS_IO)
+		return -EAGAIN;
+	if (is_error_pfn(f->pfn))
+		return -EFAULT;
+
+	f->valid = true;
+	return 0;
+}
diff --git a/arch/s390/kvm/faultin.h b/arch/s390/kvm/faultin.h
new file mode 100644
index 000000000000..f86176d2769c
--- /dev/null
+++ b/arch/s390/kvm/faultin.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+
+#ifndef __KVM_S390_FAULTIN_H
+#define __KVM_S390_FAULTIN_H
+
+#include
+
+#include "dat.h"
+
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f);
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w);
+
+static inline int kvm_s390_faultin_gfn_simple(struct kvm_vcpu *vcpu, struct kvm *kvm,
+					      gfn_t gfn, bool wr)
+{
+	struct guest_fault f = { .gfn = gfn, .write_attempt = wr, };
+
+	return kvm_s390_faultin_gfn(vcpu, kvm, &f);
+}
+
+static inline int kvm_s390_get_guest_page_and_read_gpa(struct kvm *kvm, struct guest_fault *f,
+						       gpa_t gaddr, unsigned long *val)
+{
+	int rc;
+
+	rc = kvm_s390_get_guest_page(kvm, f, gpa_to_gfn(gaddr), false);
+	if (rc)
+		return rc;
+
+	*val = *(unsigned long *)phys_to_virt(pfn_to_phys(f->pfn) | offset_in_page(gaddr));
+
+	return 0;
+}
+
+static inline void kvm_s390_release_multiple(struct kvm *kvm, struct guest_fault *guest_faults,
+					     int n, bool ignore)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		kvm_release_faultin_page(kvm, guest_faults[i].page, ignore,
+					 guest_faults[i].write_attempt);
+		guest_faults[i].page = NULL;
+	}
+}
+
+static inline bool kvm_s390_multiple_faults_need_retry(struct kvm *kvm, unsigned long seq,
+						       struct guest_fault *guest_faults, int n,
+						       bool unsafe)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!guest_faults[i].valid)
+			continue;
+		if (unsafe && mmu_invalidate_retry_gfn_unsafe(kvm, seq, guest_faults[i].gfn))
+			return true;
+		if (!unsafe && mmu_invalidate_retry_gfn(kvm, seq, guest_faults[i].gfn))
+			return true;
+	}
+	return false;
+}
+
+static inline int kvm_s390_get_guest_pages(struct kvm *kvm, struct guest_fault *guest_faults,
+					   gfn_t start, int n_pages, bool write_attempt)
+{
+	int i, rc = 0;
+
+	for (i = 0; i < n_pages; i++) {
+		rc = kvm_s390_get_guest_page(kvm, guest_faults + i, start + i, write_attempt);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+#define kvm_s390_release_faultin_array(kvm, array, ignore) \
+	kvm_s390_release_multiple(kvm, array, ARRAY_SIZE(array), ignore)
+
+#define kvm_s390_array_needs_retry_unsafe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), true)
+
+#define kvm_s390_array_needs_retry_safe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), false)
+
+#endif /* __KVM_S390_FAULTIN_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ec92e6361eab..2b5ecdc3814e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4637,7 +4637,7 @@ bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
 {
 	hva_t hva;
 	struct kvm_arch_async_pf arch;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 65c950760993..9ce71c8433a1 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -470,6 +470,8 @@ static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t gaddr,
 	return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags);
 }
 
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
 
-- 
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 21/28] KVM: s390: Add some helper functions needed for vSIE
Date: Mon, 22 Dec 2025 17:50:26 +0100
Message-ID: <20251222165033.162329-22-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Implement gmap_protect_asce_top_level(), which until now was a stub; it
could not be implemented earlier because of cross dependencies with
other patches in this series.
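
For orientation, a worked sketch of the sizing used below (assuming the
usual s390 DAT constants of 2048 eight-byte entries per region/segment
table and 4 KiB pages; not part of the patch):

	/* a top-level CRST table is 2048 * 8 = 16 KiB, i.e. four 4 KiB pages */
	#define _CRST_TABLE_SIZE	(2048 * 8)
	#define CRST_TABLE_PAGES	(_CRST_TABLE_SIZE / PAGE_SIZE)	/* == 4 */

Each of these up to four guest pages backing the top-level table is
faulted in and write-protected in the parent gmap, so that any guest
write to its own top-level table triggers an unshadow event.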
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/gmap.c | 72 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index f4f47e5fc3d5..0abed178dde0 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -22,6 +22,7 @@
 #include "dat.h"
 #include "gmap.h"
 #include "kvm-s390.h"
+#include "faultin.h"
 
 static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
 {
@@ -1009,10 +1010,77 @@ static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int e
 	return NULL;
 }
 
+#define CRST_TABLE_PAGES (_CRST_TABLE_SIZE / PAGE_SIZE)
+struct gmap_protect_asce_top_level {
+	unsigned long seq;
+	struct guest_fault f[CRST_TABLE_PAGES];
+};
+
+static inline int __gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+						struct gmap_protect_asce_top_level *context)
+{
+	int rc, i;
+
+	guard(write_lock)(&sg->kvm->mmu_lock);
+
+	if (kvm_s390_array_needs_retry_safe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+
+	scoped_guard(spinlock, &sg->parent->children_lock) {
+		for (i = 0; i < CRST_TABLE_PAGES; i++) {
+			rc = gmap_protect_rmap(mc, sg, context->f[i].gfn, 0, context->f[i].pfn,
+					       TABLE_TYPE_REGION1 + 1, context->f[i].writable);
+			if (rc)
+				return rc;
+		}
+		gmap_add_child(sg->parent, sg);
+	}
+
+	kvm_s390_release_faultin_array(sg->kvm, context->f, false);
+	return 0;
+}
+
+static inline int _gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+					       struct gmap_protect_asce_top_level *context)
+{
+	int rc;
+
+	if (kvm_s390_array_needs_retry_unsafe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			return rc;
+		rc = radix_tree_preload(GFP_KERNEL);
+		if (rc)
+			return rc;
+		rc = __gmap_protect_asce_top_level(mc, sg, context);
+		radix_tree_preload_end();
+	} while (rc == -ENOMEM);
+
+	return rc;
+}
+
 static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
 {
-	KVM_BUG_ON(1, sg->kvm);
-	return -EINVAL;
+	struct gmap_protect_asce_top_level context = {};
+	union asce asce = sg->guest_asce;
+	int rc;
+
+	KVM_BUG_ON(!is_shadow(sg), sg->kvm);
+
+	context.seq = sg->kvm->mmu_invalidate_seq;
+	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
*/ + smp_rmb(); + + rc =3D kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + = 1, false); + if (rc > 0) + rc =3D -EFAULT; + if (!rc) + rc =3D _gmap_protect_asce_top_level(mc, sg, &context); + if (rc) + kvm_s390_release_faultin_array(sg->kvm, context.f, true); + return rc; } =20 /** --=20 2.52.0
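As an aside, the patch makes heavy use of the scope-based locking helpers from <linux/cleanup.h>: guard(write_lock)(...) keeps the lock held until the end of the enclosing scope, while scoped_guard(spinlock, ...) holds it only for the attached block, so every early return drops the lock automatically. A minimal sketch of the pattern, with do_work() as a hypothetical callee:

	#include <linux/cleanup.h>
	#include <linux/spinlock.h>

	static int locked_work(spinlock_t *lock)
	{
		scoped_guard(spinlock, lock) {
			/* the spinlock is held throughout this block */
			if (do_work() < 0)
				return -EINVAL;	/* unlocks automatically */
		}
		/* the spinlock has been released at this point */
		return 0;
	}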
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 22/28] KVM: s390: Stop using CONFIG_PGSTE
Date: Mon, 22 Dec 2025 17:50:27 +0100
Message-ID: <20251222165033.162329-23-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Switch to using IS_ENABLED(CONFIG_KVM) instead of CONFIG_PGSTE, since the latter will be removed soon. Many uses of CONFIG_PGSTE are left untouched here, because they will be removed completely by upcoming patches; the ones converted now are mostly the ones that will stay.
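For context: unlike a plain #ifdef, IS_ENABLED() also covers tristate options built as modules, which matters here because KVM on s390 can be built with CONFIG_KVM=m. In short:

	#ifdef CONFIG_KVM		/* true only for CONFIG_KVM=y */
	#if IS_ENABLED(CONFIG_KVM)	/* true for CONFIG_KVM=y and CONFIG_KVM=m */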
Signed-off-by: Claudio Imbrenda Reviewed-by: Steffen Eiden --- arch/s390/include/asm/mmu_context.h | 2 +- arch/s390/include/asm/pgtable.h | 4 ++-- arch/s390/mm/fault.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mm= u_context.h index d9b8501bc93d..48e548c01daa 100644 --- a/arch/s390/include/asm/mmu_context.h +++ b/arch/s390/include/asm/mmu_context.h @@ -29,7 +29,7 @@ static inline int init_new_context(struct task_struct *ts= k, atomic_set(&mm->context.protected_count, 0); mm->context.gmap_asce =3D 0; mm->context.flush_mm =3D 0; -#ifdef CONFIG_PGSTE +#if IS_ENABLED(CONFIG_KVM) mm->context.has_pgste =3D 0; mm->context.uses_skeys =3D 0; mm->context.uses_cmm =3D 0; diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 04335f5e7f47..cd4d135c4503 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -577,7 +577,7 @@ static inline int mm_has_pgste(struct mm_struct *mm) =20 static inline int mm_is_protected(struct mm_struct *mm) { -#ifdef CONFIG_PGSTE +#if IS_ENABLED(CONFIG_KVM) if (unlikely(atomic_read(&mm->context.protected_count))) return 1; #endif @@ -632,7 +632,7 @@ static inline pud_t set_pud_bit(pud_t pud, pgprot_t pro= t) #define mm_forbids_zeropage mm_forbids_zeropage static inline int mm_forbids_zeropage(struct mm_struct *mm) { -#ifdef CONFIG_PGSTE +#if IS_ENABLED(CONFIG_KVM) if (!mm->context.allow_cow_sharing) return 1; #endif diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index e2e13778c36a..a52aa7a99b6b 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -403,7 +403,7 @@ void do_dat_exception(struct pt_regs *regs) } NOKPROBE_SYMBOL(do_dat_exception); =20 -#if IS_ENABLED(CONFIG_PGSTE) +#if IS_ENABLED(CONFIG_KVM) =20 void do_secure_storage_access(struct pt_regs *regs) { @@ -470,4 +470,4 @@ void do_secure_storage_access(struct pt_regs *regs) } NOKPROBE_SYMBOL(do_secure_storage_access); =20 -#endif /* CONFIG_PGSTE */ +#endif /* CONFIG_KVM */ --=20 2.52.0
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 23/28] KVM: s390: Storage key functions refactoring
Date: Mon, 22 Dec 2025 17:50:28 +0100
Message-ID: <20251222165033.162329-24-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Refactor some storage key functions to improve readability, and introduce helper functions that will be used in the next patches.
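A note on the cmpxchg_user_key() calls reworked below: a return value of 0 only means the compare-and-swap executed, not that it succeeded; the caller must still compare the returned old value against the expected one. Sketched for the 4-byte case, with hva, new and access_key assumed to be set up as in the surrounding code:

	u32 expected = ..., old;
	int ret;

	ret = cmpxchg_user_key((u32 __user *)hva, &old, expected, new, access_key);
	/* ret != 0                 : fault or key protection, no exchange */
	/* ret == 0, old == expected: the exchange took place              */
	/* ret == 0, old != expected: the value changed concurrently       */
	success = !ret && old == expected;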
Signed-off-by: Claudio Imbrenda --- arch/s390/kvm/gaccess.c | 36 +++++++++--------- arch/s390/kvm/gaccess.h | 4 +- arch/s390/kvm/kvm-s390.c | 80 +++++++++++++++------------------------- arch/s390/kvm/kvm-s390.h | 8 ++++ 4 files changed, 58 insertions(+), 70 deletions(-) diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 9df868bddf9a..1d0725f3951a 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -974,9 +974,8 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned l= ong gra, * * -EAGAIN: transient failure (len 1 or 2) * * -EOPNOTSUPP: read-only memslot (should never occur) */ -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, - __uint128_t *old_addr, __uint128_t new, - u8 access_key, bool *success) +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, + union kvm_s390_quad new, u8 acc, bool *success) { gfn_t gfn =3D gpa_to_gfn(gpa); struct kvm_memory_slot *slot =3D gfn_to_memslot(kvm, gfn); @@ -1008,41 +1007,42 @@ int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa= _t gpa, int len, case 1: { u8 old; =20 - ret =3D cmpxchg_user_key((u8 __user *)hva, &old, *old_addr, new, access_= key); - *success =3D !ret && old =3D=3D *old_addr; - *old_addr =3D old; + ret =3D cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one,= acc); + *success =3D !ret && old =3D=3D old_addr->one; + old_addr->one =3D old; break; } case 2: { u16 old; =20 - ret =3D cmpxchg_user_key((u16 __user *)hva, &old, *old_addr, new, access= _key); - *success =3D !ret && old =3D=3D *old_addr; - *old_addr =3D old; + ret =3D cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two= , acc); + *success =3D !ret && old =3D=3D old_addr->two; + old_addr->two =3D old; break; } case 4: { u32 old; =20 - ret =3D cmpxchg_user_key((u32 __user *)hva, &old, *old_addr, new, access= _key); - *success =3D !ret && old =3D=3D *old_addr; - *old_addr =3D old; + ret =3D cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.fo= ur, acc); + *success =3D !ret && old =3D=3D old_addr->four; + old_addr->four =3D old; break; } case 8: { u64 old; =20 - ret =3D cmpxchg_user_key((u64 __user *)hva, &old,
*old_addr, new, access= _key); - *success =3D !ret && old =3D=3D *old_addr; - *old_addr =3D old; + ret =3D cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.e= ight, acc); + *success =3D !ret && old =3D=3D old_addr->eight; + old_addr->eight =3D old; break; } case 16: { __uint128_t old; =20 - ret =3D cmpxchg_user_key((__uint128_t __user *)hva, &old, *old_addr, new= , access_key); - *success =3D !ret && old =3D=3D *old_addr; - *old_addr =3D old; + ret =3D cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixt= een, + new.sixteen, acc); + *success =3D !ret && old =3D=3D old_addr->sixteen; + old_addr->sixteen =3D old; break; } default: diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 3fde45a151f2..774cdf19998f 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,8 +206,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsign= ed long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); =20 -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, __uint= 128_t *old, - __uint128_t new, u8 access_key, bool *success); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, + union kvm_s390_quad new, u8 access_key, bool *success); =20 /** * write_guest_with_key - copy data from kernel space to guest space diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 2b5ecdc3814e..f5411e093fb5 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2900,9 +2900,9 @@ static int mem_op_validate_common(struct kvm_s390_mem= _op *mop, u64 supported_fla static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op = *mop) { void __user *uaddr =3D (void __user *)mop->buf; + void *tmpbuf __free(kvfree) =3D NULL; enum gacc_mode acc_mode; - void *tmpbuf =3D NULL; - int r, srcu_idx; + int r; =20 r =3D mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION | KVM_S390_MEMOP_F_CHECK_ONLY); @@ -2915,52 +2915,36 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, = struct kvm_s390_mem_op *mop) return -ENOMEM; } =20 - srcu_idx =3D srcu_read_lock(&kvm->srcu); + acc_mode =3D mop->op =3D=3D KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : G= ACC_STORE; =20 - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) { - r =3D PGM_ADDRESSING; - goto out_unlock; - } + scoped_guard(srcu, &kvm->srcu) { + if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) + return PGM_ADDRESSING; =20 - acc_mode =3D mop->op =3D=3D KVM_S390_MEMOP_ABSOLUTE_READ ? 
GACC_FETCH : G= ACC_STORE; - if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { - r =3D check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key); - goto out_unlock; - } - if (acc_mode =3D=3D GACC_FETCH) { + if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) + return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key); + + if (acc_mode =3D=3D GACC_STORE && copy_from_user(tmpbuf, uaddr, mop->siz= e)) + return -EFAULT; r =3D access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf, - mop->size, GACC_FETCH, mop->key); + mop->size, acc_mode, mop->key); if (r) - goto out_unlock; - if (copy_to_user(uaddr, tmpbuf, mop->size)) - r =3D -EFAULT; - } else { - if (copy_from_user(tmpbuf, uaddr, mop->size)) { - r =3D -EFAULT; - goto out_unlock; - } - r =3D access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf, - mop->size, GACC_STORE, mop->key); + return r; + if (acc_mode !=3D GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size)) + return -EFAULT; } =20 -out_unlock: - srcu_read_unlock(&kvm->srcu, srcu_idx); - - vfree(tmpbuf); - return r; + return 0; } =20 static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem= _op *mop) { void __user *uaddr =3D (void __user *)mop->buf; void __user *old_addr =3D (void __user *)mop->old_addr; - union { - __uint128_t quad; - char raw[sizeof(__uint128_t)]; - } old =3D { .quad =3D 0}, new =3D { .quad =3D 0 }; - unsigned int off_in_quad =3D sizeof(new) - mop->size; - int r, srcu_idx; - bool success; + union kvm_s390_quad old =3D { .sixteen =3D 0 }; + union kvm_s390_quad new =3D { .sixteen =3D 0 }; + bool success =3D false; + int r; =20 r =3D mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION); if (r) @@ -2972,25 +2956,21 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *k= vm, struct kvm_s390_mem_op *m */ if (mop->size > sizeof(new)) return -EINVAL; - if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size)) + if (copy_from_user(&new, uaddr, mop->size)) return -EFAULT; - if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size)) + if (copy_from_user(&old, old_addr, mop->size)) return -EFAULT; =20 - srcu_idx =3D srcu_read_lock(&kvm->srcu); + scoped_guard(srcu, &kvm->srcu) { + if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) + return PGM_ADDRESSING; =20 - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) { - r =3D PGM_ADDRESSING; - goto out_unlock; - } - - r =3D cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old.quad, - new.quad, mop->key, &success); - if (!success && copy_to_user(old_addr, &old.raw[off_in_quad], mop->size)) - r =3D -EFAULT; + r =3D cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old, new, + mop->key, &success); =20 -out_unlock: - srcu_read_unlock(&kvm->srcu, srcu_idx); + if (!success && copy_to_user(old_addr, &old, mop->size)) + return -EFAULT; + } return r; } =20 diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 9ce71c8433a1..c44c52266e26 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -22,6 +22,14 @@ =20 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0) =20 +union kvm_s390_quad { + __uint128_t sixteen; + unsigned long eight; + unsigned int four; + unsigned short two; + unsigned char one; +}; + static inline void kvm_s390_fpu_store(struct kvm_run *run) { fpu_stfpc(&run->s.regs.fpc); --=20 2.52.0
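A note on the kvm_s390_quad union added above: s390 is big-endian and all union members start at offset 0, so copying mop->size bytes to the start of the union lands the data exactly in the member of that size. That is what makes the old off_in_quad offset computation unnecessary. For illustration:

	union kvm_s390_quad q = { .sixteen = 0 };
	u8 buf[2] = { 0x12, 0x34 };

	memcpy(&q, buf, sizeof(buf));	/* like copy_from_user(&q, uaddr, 2) */
	/* big-endian, members share offset 0: q.two == 0x1234 */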
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 24/28] KVM: s390: Switch to new gmap
Date: Mon, 22 Dec 2025 17:50:29 +0100
Message-ID: <20251222165033.162329-25-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
Switch KVM/s390 to use the new gmap code. Remove the includes of <asm/gmap.h> and include "gmap.h" instead; convert all existing users of the old gmap functions to the new ones. Fix the guest storage key access functions to work with the new gmap.
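One recurring pattern in the conversion below deserves a pointer: guest accesses are now expressed as a struct guest_fault with a callback that runs once the target page has been faulted in and pinned, replacing the open-coded gfn_to_hva() translations. A condensed sketch of a converted call site, using the field names from the patch, with my_context and its offset/data/len members as hypothetical examples:

	static void do_access(struct guest_fault *f)
	{
		struct my_context *ctx = f->priv;	/* hypothetical context */

		/* the page is faulted in and pinned; access it via its pfn */
		memcpy(__va(PFN_PHYS(f->pfn) | ctx->offset), ctx->data, ctx->len);
	}

	struct guest_fault fault = {
		.gfn = gpa_to_gfn(gpa),
		.priv = &ctx,
		.write_attempt = true,
		.callback = do_access,
	};
	rc = kvm_s390_faultin_gfn(NULL, kvm, &fault);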
Signed-off-by: Claudio Imbrenda --- arch/s390/Kconfig | 2 +- arch/s390/include/asm/kvm_host.h | 5 +- arch/s390/include/asm/mmu_context.h | 4 - arch/s390/include/asm/tlb.h | 3 - arch/s390/include/asm/uaccess.h | 70 +-- arch/s390/include/asm/uv.h | 1 - arch/s390/kernel/uv.c | 114 +--- arch/s390/kvm/Makefile | 2 +- arch/s390/kvm/diag.c | 2 +- arch/s390/kvm/gaccess.c | 869 +++++++++++++++++----------- arch/s390/kvm/gaccess.h | 18 +- arch/s390/kvm/gmap-vsie.c | 141 ----- arch/s390/kvm/intercept.c | 15 +- arch/s390/kvm/interrupt.c | 6 +- arch/s390/kvm/kvm-s390.c | 767 +++++++----------------- arch/s390/kvm/kvm-s390.h | 19 +- arch/s390/kvm/priv.c | 213 +++---- arch/s390/kvm/pv.c | 174 ++++-- arch/s390/kvm/vsie.c | 169 +++--- arch/s390/lib/uaccess.c | 184 +----- arch/s390/mm/gmap_helpers.c | 38 +- 21 files changed, 1106 insertions(+), 1710 deletions(-) delete mode 100644 arch/s390/kvm/gmap-vsie.c diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 0e5fad5f06ca..8270754985e9 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -33,7 +33,7 @@ config GENERIC_LOCKBREAK def_bool y if PREEMPTION =20 config PGSTE - def_bool y if KVM + def_bool n =20 config AUDIT_ARCH def_bool y diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_h= ost.h index 816776a8a8e3..64a50f0862aa 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -442,7 +442,7 @@ struct kvm_vcpu_arch { bool acrs_loaded; struct kvm_s390_pv_vcpu pv; union diag318_info diag318_info; - void *mc; /* Placeholder */ + struct kvm_s390_mmu_cache *mc; }; =20 struct kvm_vm_stat { @@ -636,6 +636,8 @@ struct kvm_s390_pv { struct mutex import_lock; }; =20 +struct kvm_s390_mmu_cache; + struct kvm_arch { struct esca_block *sca; debug_info_t *dbf; @@ -675,6 +677,7 @@ struct kvm_arch { struct kvm_s390_pv pv; struct list_head kzdev_list; spinlock_t kzdev_list_lock; + struct kvm_s390_mmu_cache *mc; }; =20 #define KVM_HVA_ERR_BAD (-1UL) diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mm= u_context.h index 48e548c01daa..bd1ef5e2d2eb 100644 --- a/arch/s390/include/asm/mmu_context.h +++ b/arch/s390/include/asm/mmu_context.h @@ -30,11 +30,7 @@ static inline int init_new_context(struct task_struct *t= sk, mm->context.gmap_asce =3D 0; mm->context.flush_mm =3D 0; #if IS_ENABLED(CONFIG_KVM) - mm->context.has_pgste =3D 0; - mm->context.uses_skeys =3D 0; - mm->context.uses_cmm =3D 0; mm->context.allow_cow_sharing =3D 1; - mm->context.allow_gmap_hpage_1m =3D 0; #endif switch (mm->context.asce_limit) { default: diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h index 1e50f6f1ad9d..7354b42ee994 100644 --- a/arch/s390/include/asm/tlb.h +++ b/arch/s390/include/asm/tlb.h @@ -36,7 +36,6 @@ static inline bool __tlb_remove_folio_pages(struct mmu_ga= ther *tlb, =20 #include #include -#include =20 /* * Release the page cache reference for a pte removed by @@ -85,8 +84,6 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, p= gtable_t pte, tlb->mm->context.flush_mm =3D 1; tlb->freed_tables =3D 1; tlb->cleared_pmds =3D 1; - if (mm_has_pgste(tlb->mm)) - gmap_unlink(tlb->mm, (unsigned long *)pte, address); tlb_remove_ptdesc(tlb, virt_to_ptdesc(pte)); } =20 diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uacces= s.h index c5e02addcd67..dff035372601 100644 --- a/arch/s390/include/asm/uaccess.h +++ b/arch/s390/include/asm/uaccess.h @@ -471,65 +471,15 @@ do { \ #define arch_get_kernel_nofault __mvc_kernel_nofault #define arch_put_kernel_nofault 
__mvc_kernel_nofault =20 -void __cmpxchg_user_key_called_with_bad_pointer(void); - -int __cmpxchg_user_key1(unsigned long address, unsigned char *uval, - unsigned char old, unsigned char new, unsigned long key); -int __cmpxchg_user_key2(unsigned long address, unsigned short *uval, - unsigned short old, unsigned short new, unsigned long key); -int __cmpxchg_user_key4(unsigned long address, unsigned int *uval, - unsigned int old, unsigned int new, unsigned long key); -int __cmpxchg_user_key8(unsigned long address, unsigned long *uval, - unsigned long old, unsigned long new, unsigned long key); -int __cmpxchg_user_key16(unsigned long address, __uint128_t *uval, - __uint128_t old, __uint128_t new, unsigned long key); - -static __always_inline int _cmpxchg_user_key(unsigned long address, void *= uval, - __uint128_t old, __uint128_t new, - unsigned long key, int size) -{ - switch (size) { - case 1: return __cmpxchg_user_key1(address, uval, old, new, key); - case 2: return __cmpxchg_user_key2(address, uval, old, new, key); - case 4: return __cmpxchg_user_key4(address, uval, old, new, key); - case 8: return __cmpxchg_user_key8(address, uval, old, new, key); - case 16: return __cmpxchg_user_key16(address, uval, old, new, key); - default: __cmpxchg_user_key_called_with_bad_pointer(); - } - return 0; -} - -/** - * cmpxchg_user_key() - cmpxchg with user space target, honoring storage k= eys - * @ptr: User space address of value to compare to @old and exchange with - * @new. Must be aligned to sizeof(*@ptr). - * @uval: Address where the old value of *@ptr is written to. - * @old: Old value. Compared to the content pointed to by @ptr in order to - * determine if the exchange occurs. The old value read from *@ptr is - * written to *@uval. - * @new: New value to place at *@ptr. - * @key: Access key to use for checking storage key protection. - * - * Perform a cmpxchg on a user space target, honoring storage key protecti= on. - * @key alone determines how key checking is performed, neither - * storage-protection-override nor fetch-protection-override apply. - * The caller must compare *@uval and @old to determine if values have been - * exchanged. In case of an exception *@uval is set to zero. 
- * - * Return: 0: cmpxchg executed - * -EFAULT: an exception happened when trying to access *@ptr - * -EAGAIN: maxed out number of retries (byte and short only) - */ -#define cmpxchg_user_key(ptr, uval, old, new, key) \ -({ \ - __typeof__(ptr) __ptr =3D (ptr); \ - __typeof__(uval) __uval =3D (uval); \ - \ - BUILD_BUG_ON(sizeof(*(__ptr)) !=3D sizeof(*(__uval))); \ - might_fault(); \ - __chk_user_ptr(__ptr); \ - _cmpxchg_user_key((unsigned long)(__ptr), (void *)(__uval), \ - (old), (new), (key), sizeof(*(__ptr))); \ -}) +int __cmpxchg_key1(void *address, unsigned char *uval, unsigned char old, + unsigned char new, unsigned long key); +int __cmpxchg_key2(void *address, unsigned short *uval, unsigned short old, + unsigned short new, unsigned long key); +int __cmpxchg_key4(void *address, unsigned int *uval, unsigned int old, + unsigned int new, unsigned long key); +int __cmpxchg_key8(void *address, unsigned long *uval, unsigned long old, + unsigned long new, unsigned long key); +int __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_t old, + __uint128_t new, unsigned long key); =20 #endif /* __S390_UACCESS_H */ diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h index 0744874ca6df..d919e69662f5 100644 --- a/arch/s390/include/asm/uv.h +++ b/arch/s390/include/asm/uv.h @@ -631,7 +631,6 @@ int uv_pin_shared(unsigned long paddr); int uv_destroy_folio(struct folio *folio); int uv_destroy_pte(pte_t pte); int uv_convert_from_secure_pte(pte_t pte); -int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_= header *uvcb); int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio); int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb); int uv_convert_from_secure(unsigned long paddr); diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index cb4e8089fbca..daf7481692ed 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -209,39 +209,6 @@ int uv_convert_from_secure_pte(pte_t pte) return uv_convert_from_secure_folio(pfn_folio(pte_pfn(pte))); } =20 -/** - * should_export_before_import - Determine whether an export is needed - * before an import-like operation - * @uvcb: the Ultravisor control block of the UVC to be performed - * @mm: the mm of the process - * - * Returns whether an export is needed before every import-like operation. - * This is needed for shared pages, which don't trigger a secure storage - * exception when accessed from a different guest. - * - * Although considered as one, the Unpin Page UVC is not an actual import, - * so it is not affected. - * - * No export is needed also when there is only one protected VM, because t= he - * page cannot belong to the wrong VM in that case (there is no "other VM" - * it can belong to). - * - * Return: true if an export is needed before every import, otherwise fals= e. - */ -static bool should_export_before_import(struct uv_cb_header *uvcb, struct = mm_struct *mm) -{ - /* - * The misc feature indicates, among other things, that importing a - * shared page from a different protected VM will automatically also - * transfer its ownership. - */ - if (uv_has_feature(BIT_UV_FEAT_MISC)) - return false; - if (uvcb->cmd =3D=3D UVC_CMD_UNPIN_PAGE_SHARED) - return false; - return atomic_read(&mm->context.protected_count) > 1; -} - /* * Calculate the expected ref_count for a folio that would otherwise have = no * further pins. 
This was cribbed from similar functions in other places in @@ -313,20 +280,6 @@ int __make_folio_secure(struct folio *folio, struct uv= _cb_header *uvcb) } EXPORT_SYMBOL(__make_folio_secure); =20 -static int make_folio_secure(struct mm_struct *mm, struct folio *folio, st= ruct uv_cb_header *uvcb) -{ - int rc; - - if (!folio_trylock(folio)) - return -EAGAIN; - if (should_export_before_import(uvcb, mm)) - uv_convert_from_secure(folio_to_phys(folio)); - rc =3D __make_folio_secure(folio, uvcb); - folio_unlock(folio); - - return rc; -} - /** * s390_wiggle_split_folio() - try to drain extra references to a folio and * split the folio if it is large. @@ -414,56 +367,6 @@ int s390_wiggle_split_folio(struct mm_struct *mm, stru= ct folio *folio) } EXPORT_SYMBOL_GPL(s390_wiggle_split_folio); =20 -int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_= header *uvcb) -{ - struct vm_area_struct *vma; - struct folio_walk fw; - struct folio *folio; - int rc; - - mmap_read_lock(mm); - vma =3D vma_lookup(mm, hva); - if (!vma) { - mmap_read_unlock(mm); - return -EFAULT; - } - folio =3D folio_walk_start(&fw, vma, hva, 0); - if (!folio) { - mmap_read_unlock(mm); - return -ENXIO; - } - - folio_get(folio); - /* - * Secure pages cannot be huge and userspace should not combine both. - * In case userspace does it anyway this will result in an -EFAULT for - * the unpack. The guest is thus never reaching secure mode. - * If userspace plays dirty tricks and decides to map huge pages at a - * later point in time, it will receive a segmentation fault or - * KVM_RUN will return -EFAULT. - */ - if (folio_test_hugetlb(folio)) - rc =3D -EFAULT; - else if (folio_test_large(folio)) - rc =3D -E2BIG; - else if (!pte_write(fw.pte) || (pte_val(fw.pte) & _PAGE_INVALID)) - rc =3D -ENXIO; - else - rc =3D make_folio_secure(mm, folio, uvcb); - folio_walk_end(&fw, vma); - mmap_read_unlock(mm); - - if (rc =3D=3D -E2BIG || rc =3D=3D -EBUSY) { - rc =3D s390_wiggle_split_folio(mm, folio); - if (!rc) - rc =3D -EAGAIN; - } - folio_put(folio); - - return rc; -} -EXPORT_SYMBOL_GPL(make_hva_secure); - /* * To be called with the folio locked or with an extra reference! This will * prevent kvm_s390_pv_make_secure() from touching the folio concurrently. @@ -474,21 +377,18 @@ int arch_make_folio_accessible(struct folio *folio) { int rc =3D 0; =20 - /* Large folios cannot be secure */ - if (unlikely(folio_test_large(folio))) - return 0; - /* - * PG_arch_1 is used in 2 places: - * 1. for storage keys of hugetlb folios and KVM - * 2. As an indication that this small folio might be secure. This can - * overindicate, e.g. we set the bit before calling - * convert_to_secure. - * As secure pages are never large folios, both variants can co-exists. + * PG_arch_1 is used as an indication that this small folio might be + * secure. This can overindicate, e.g. we set the bit before calling + * convert_to_secure. 
*/ if (!test_bit(PG_arch_1, &folio->flags.f)) return 0; =20 + /* Large folios cannot be secure */ + if (WARN_ON_ONCE(folio_test_large(folio))) + return -EFAULT; + rc =3D uv_pin_shared(folio_to_phys(folio)); if (!rc) { clear_bit(PG_arch_1, &folio->flags.f); diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile index 1e2dcd3e2436..dac9d53b23d8 100644 --- a/arch/s390/kvm/Makefile +++ b/arch/s390/kvm/Makefile @@ -8,7 +8,7 @@ include $(srctree)/virt/kvm/Makefile.kvm ccflags-y :=3D -Ivirt/kvm -Iarch/s390/kvm =20 kvm-y +=3D kvm-s390.o intercept.o interrupt.o priv.o sigp.o -kvm-y +=3D diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o +kvm-y +=3D diag.o gaccess.o guestdbg.o vsie.o pv.o kvm-y +=3D dat.o gmap.o faultin.o =20 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D pci.o diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index 53233dec8cad..d89d1c381522 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -10,13 +10,13 @@ =20 #include #include -#include #include #include #include "kvm-s390.h" #include "trace.h" #include "trace-s390.h" #include "gaccess.h" +#include "gmap.h" =20 static void do_discard_gfn_range(struct kvm_vcpu *vcpu, gfn_t gfn_start, g= fn_t gfn_end) { diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 1d0725f3951a..0fe3d91b8305 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -11,15 +11,43 @@ #include #include #include +#include +#include +#include #include #include -#include #include #include "kvm-s390.h" +#include "dat.h" +#include "gmap.h" #include "gaccess.h" +#include "faultin.h" =20 #define GMAP_SHADOW_FAKE_TABLE 1ULL =20 +union dat_table_entry { + unsigned long val; + union region1_table_entry pgd; + union region2_table_entry p4d; + union region3_table_entry pud; + union segment_table_entry pmd; + union page_table_entry pte; +}; + +#define WALK_N_ENTRIES 7 +#define LEVEL_MEM -2 +struct pgtwalk { + struct guest_fault raw_entries[WALK_N_ENTRIES]; + gpa_t last_addr; + int level; + bool p; +}; + +static inline struct guest_fault *get_entries(struct pgtwalk *w) +{ + return w->raw_entries - LEVEL_MEM; +} + /* * raddress union which will contain the result (real or absolute address) * after a page table walk. The rfaa, sfaa and pfra members are used to @@ -81,6 +109,28 @@ struct aste { /* .. 
more fields there */ }; =20 +union oac { + unsigned int val; + struct { + struct { + unsigned short key : 4; + unsigned short : 4; + unsigned short as : 2; + unsigned short : 4; + unsigned short k : 1; + unsigned short a : 1; + } oac1; + struct { + unsigned short key : 4; + unsigned short : 4; + unsigned short as : 2; + unsigned short : 4; + unsigned short k : 1; + unsigned short a : 1; + } oac2; + }; +}; + int ipte_lock_held(struct kvm *kvm) { if (sclp.has_siif) @@ -603,28 +653,16 @@ static int low_address_protection_enabled(struct kvm_= vcpu *vcpu, static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key, enum gacc_mode mode, gpa_t gpa) { - u8 storage_key, access_control; - bool fetch_protected; - unsigned long hva; + union skey storage_key; int r; =20 - if (access_key =3D=3D 0) - return 0; - - hva =3D gfn_to_hva(kvm, gpa_to_gfn(gpa)); - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - - mmap_read_lock(current->mm); - r =3D get_guest_storage_key(current->mm, hva, &storage_key); - mmap_read_unlock(current->mm); + scoped_guard(read_lock, &kvm->mmu_lock) + r =3D dat_get_storage_key(kvm->arch.gmap->asce, gpa_to_gfn(gpa), &storag= e_key); if (r) return r; - access_control =3D FIELD_GET(_PAGE_ACC_BITS, storage_key); - if (access_control =3D=3D access_key) + if (access_key =3D=3D 0 || storage_key.acc =3D=3D access_key) return 0; - fetch_protected =3D storage_key & _PAGE_FP_BIT; - if ((mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) && !fetch_protect= ed) + if ((mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) && !storage_key.f= p) return 0; return PGM_PROTECTION; } @@ -667,8 +705,7 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu *v= cpu, u8 access_key, enum gacc_mode mode, union asce asce, gpa_t gpa, unsigned long ga, unsigned int len) { - u8 storage_key, access_control; - unsigned long hva; + union skey storage_key; int r; =20 /* access key 0 matches any storage key -> allow */ @@ -678,26 +715,23 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu = *vcpu, u8 access_key, * caller needs to ensure that gfn is accessible, so we can * assume that this cannot fail */ - hva =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gpa)); - mmap_read_lock(current->mm); - r =3D get_guest_storage_key(current->mm, hva, &storage_key); - mmap_read_unlock(current->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + r =3D dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gpa), &stora= ge_key); if (r) return r; - access_control =3D FIELD_GET(_PAGE_ACC_BITS, storage_key); /* access key matches storage key -> allow */ - if (access_control =3D=3D access_key) + if (storage_key.acc =3D=3D access_key) return 0; if (mode =3D=3D GACC_FETCH || mode =3D=3D GACC_IFETCH) { /* it is a fetch and fetch protection is off -> allow */ - if (!(storage_key & _PAGE_FP_BIT)) + if (!storage_key.fp) return 0; if (fetch_prot_override_applicable(vcpu, mode, asce) && fetch_prot_override_applies(ga, len)) return 0; } if (storage_prot_override_applicable(vcpu) && - storage_prot_override_applies(access_control)) + storage_prot_override_applies(storage_key.acc)) return 0; return PGM_PROTECTION; } @@ -797,37 +831,79 @@ static int access_guest_page_gpa(struct kvm *kvm, enu= m gacc_mode mode, gpa_t gpa return rc; } =20 +static int mvcos_key(void *to, const void *from, unsigned long size, u8 ds= t_key, u8 src_key) +{ + union oac spec =3D { + .oac1.key =3D dst_key, + .oac1.k =3D !!dst_key, + .oac2.key =3D src_key, + .oac2.k =3D !!src_key, + }; + int exception =3D PGM_PROTECTION; + + asm_inline volatile( + " lr %%r0,%[spec]\n" 
+ "0: mvcos %[to],%[from],%[size]\n" + "1: lhi %[exc],0\n" + "2:\n" + EX_TABLE(0b, 2b) + EX_TABLE(1b, 2b) + : [size] "+d" (size), [to] "=3DQ" (*(char *)to), [exc] "+d" (exception) + : [spec] "d" (spec.val), [from] "Q" (*(const char *)from) + : "memory", "cc", "0"); + return exception; +} + +struct acc_page_key_context { + void *data; + int exception; + unsigned short offset; + unsigned short len; + bool store; + u8 access_key; +}; + +static void _access_guest_page_with_key_gpa(struct guest_fault *f) +{ + struct acc_page_key_context *context =3D f->priv; + void *ptr; + int r; + + ptr =3D __va(PFN_PHYS(f->pfn) | context->offset); + + if (context->store) + r =3D mvcos_key(ptr, context->data, context->len, context->access_key, 0= ); + else + r =3D mvcos_key(context->data, ptr, context->len, 0, context->access_key= ); + + context->exception =3D r; +} + static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode = mode, gpa_t gpa, - void *data, unsigned int len, u8 access_key) + void *data, unsigned int len, u8 acc) { - struct kvm_memory_slot *slot; - bool writable; - gfn_t gfn; - hva_t hva; + struct acc_page_key_context context =3D { + .offset =3D offset_in_page(gpa), + .len =3D len, + .data =3D data, + .access_key =3D acc, + .store =3D mode =3D=3D GACC_STORE, + }; + struct guest_fault fault =3D { + .gfn =3D gpa_to_gfn(gpa), + .priv =3D &context, + .write_attempt =3D mode =3D=3D GACC_STORE, + .callback =3D _access_guest_page_with_key_gpa, + }; int rc; =20 - gfn =3D gpa_to_gfn(gpa); - slot =3D gfn_to_memslot(kvm, gfn); - hva =3D gfn_to_hva_memslot_prot(slot, gfn, &writable); + if (KVM_BUG_ON((len + context.offset) > PAGE_SIZE, kvm)) + return -EINVAL; =20 - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - /* - * Check if it's a ro memslot, even tho that can't occur (they're unsuppo= rted). - * Don't try to actually handle that case. - */ - if (!writable && mode =3D=3D GACC_STORE) - return -EOPNOTSUPP; - hva +=3D offset_in_page(gpa); - if (mode =3D=3D GACC_STORE) - rc =3D copy_to_user_key((void __user *)hva, data, len, access_key); - else - rc =3D copy_from_user_key(data, (void __user *)hva, len, access_key); + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &fault); if (rc) - return PGM_PROTECTION; - if (mode =3D=3D GACC_STORE) - mark_page_dirty_in_slot(kvm, slot, gfn); - return 0; + return rc; + return context.exception; } =20 int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data, @@ -950,18 +1026,101 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsign= ed long gra, return rc; } =20 +/** + * __cmpxchg_with_key() - cmpxchg memory, honoring storage keys + * @ptr: Address of value to compare to *@old and exchange with + * @new. Must be aligned to sizeof(*@ptr). + * @uval: Address where the old value of *@ptr is written to. + * @old: Old value. Compared to the content pointed to by @ptr in order to + * determine if the exchange occurs. The old value read from *@ptr is + * written to *@uval. + * @new: New value to place at *@ptr. + * @access_key: Access key to use for checking storage key protection. + * + * Perform a cmpxchg on guest memory, honoring storage key protection. + * @access_key alone determines how key checking is performed, neither + * storage-protection-override nor fetch-protection-override apply. + * In case of an exception *@uval is set to zero. 
+ * + * Return: + * * 0: cmpxchg executed successfully + * * 1: cmpxchg executed unsuccessfully + * * PGM_PROTECTION: an exception happened when trying to access *@ptr + * * -EAGAIN: maxed out number of retries (byte and short only) + */ +static int __cmpxchg_with_key(union kvm_s390_quad *ptr, union kvm_s390_qua= d *old, + union kvm_s390_quad new, int size, u8 access_key) +{ + union kvm_s390_quad tmp =3D { .sixteen =3D 0 }; + int rc; + + /* + * The cmpxchg_key macro depends on the type of "old", so we need + * a case for each valid length and get some code duplication as long + * as we don't introduce a new macro. + */ + switch (size) { + case 1: + rc =3D __cmpxchg_key1(&ptr->one, &tmp.one, old->one, new.one, access_key= ); + break; + case 2: + rc =3D __cmpxchg_key2(&ptr->two, &tmp.two, old->two, new.two, access_key= ); + break; + case 4: + rc =3D __cmpxchg_key4(&ptr->four, &tmp.four, old->four, new.four, access= _key); + break; + case 8: + rc =3D __cmpxchg_key8(&ptr->eight, &tmp.eight, old->eight, new.eight, ac= cess_key); + break; + case 16: + rc =3D __cmpxchg_key16(&ptr->sixteen, &tmp.sixteen, old->sixteen, new.si= xteen, + access_key); + break; + default: + return -EINVAL; + } + if (!rc && memcmp(&tmp, old, size)) + rc =3D 1; + *old =3D tmp; + /* + * Assume that the fault is caused by protection, either key protection + * or user page write protection. + */ + if (rc =3D=3D -EFAULT) + rc =3D PGM_PROTECTION; + return rc; +} + +struct cmpxchg_key_context { + union kvm_s390_quad new; + union kvm_s390_quad *old; + int exception; + unsigned short offset; + u8 access_key; + u8 len; +}; + +static void _cmpxchg_guest_abs_with_key(struct guest_fault *f) +{ + struct cmpxchg_key_context *context =3D f->priv; + + context->exception =3D __cmpxchg_with_key(__va(PFN_PHYS(f->pfn) | context= ->offset), + context->old, context->new, context->len, + context->access_key); +} + /** * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute addres= s. * @kvm: Virtual machine instance. * @gpa: Absolute guest address of the location to be changed. * @len: Operand length of the cmpxchg, required: 1 <=3D len <=3D 16. Prov= iding a * non power of two will result in failure. - * @old_addr: Pointer to old value. If the location at @gpa contains this = value, - * the exchange will succeed. After calling cmpxchg_guest_abs_w= ith_key() - * *@old_addr contains the value at @gpa before the attempt to - * exchange the value. + * @old: Pointer to old value. If the location at @gpa contains this value, + * the exchange will succeed. After calling cmpxchg_guest_abs_with_k= ey() + * *@old contains the value at @gpa before the attempt to + * exchange the value. * @new: The value to place at @gpa. - * @access_key: The access key to use for the guest access. + * @acc: The access key to use for the guest access. * @success: output value indicating if an exchange occurred. * * Atomically exchange the value at @gpa by @new, if it contains *@old. 
@@ -974,89 +1133,36 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigne= d long gra, * * -EAGAIN: transient failure (len 1 or 2) * * -EOPNOTSUPP: read-only memslot (should never occur) */ -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old, union kvm_s390_quad new, u8 acc, bool *success) { - gfn_t gfn =3D gpa_to_gfn(gpa); - struct kvm_memory_slot *slot =3D gfn_to_memslot(kvm, gfn); - bool writable; - hva_t hva; - int ret; - - if (!IS_ALIGNED(gpa, len)) - return -EINVAL; - - hva =3D gfn_to_hva_memslot_prot(slot, gfn, &writable); - if (kvm_is_error_hva(hva)) - return PGM_ADDRESSING; - /* - * Check if it's a read-only memslot, even though that cannot occur - * since those are unsupported. - * Don't try to actually handle that case. - */ - if (!writable) - return -EOPNOTSUPP; - - hva +=3D offset_in_page(gpa); - /* - * The cmpxchg_user_key macro depends on the type of "old", so we need - * a case for each valid length and get some code duplication as long - * as we don't introduce a new macro. - */ - switch (len) { - case 1: { - u8 old; - - ret =3D cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one,= acc); - *success =3D !ret && old =3D=3D old_addr->one; - old_addr->one =3D old; - break; - } - case 2: { - u16 old; - - ret =3D cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two= , acc); - *success =3D !ret && old =3D=3D old_addr->two; - old_addr->two =3D old; - break; - } - case 4: { - u32 old; - - ret =3D cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.fo= ur, acc); - *success =3D !ret && old =3D=3D old_addr->four; - old_addr->four =3D old; - break; - } - case 8: { - u64 old; + struct cmpxchg_key_context context =3D { + .old =3D old, + .new =3D new, + .offset =3D offset_in_page(gpa), + .len =3D len, + .access_key =3D acc, + }; + struct guest_fault fault =3D { + .gfn =3D gpa_to_gfn(gpa), + .priv =3D &context, + .write_attempt =3D true, + .callback =3D _cmpxchg_guest_abs_with_key, + }; + int rc; =20 - ret =3D cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.e= ight, acc); - *success =3D !ret && old =3D=3D old_addr->eight; - old_addr->eight =3D old; - break; - } - case 16: { - __uint128_t old; + lockdep_assert_held(&kvm->srcu); =20 - ret =3D cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixt= een, - new.sixteen, acc); - *success =3D !ret && old =3D=3D old_addr->sixteen; - old_addr->sixteen =3D old; - break; - } - default: + if (len > 16 || !IS_ALIGNED(gpa, len)) return -EINVAL; - } - if (*success) - mark_page_dirty_in_slot(kvm, slot, gfn); - /* - * Assume that the fault is caused by protection, either key protection - * or user page write protection. 
- */ - if (ret =3D=3D -EFAULT) - ret =3D PGM_PROTECTION; - return ret; + + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &fault); + if (rc) + return rc; + *success =3D !context.exception; + if (context.exception =3D=3D 1) + return 0; + return context.exception; } =20 /** @@ -1158,304 +1264,365 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_= vcpu *vcpu, unsigned long gra) } =20 /** - * kvm_s390_shadow_tables - walk the guest page table and create shadow ta= bles + * walk_guest_tables() - walk the guest page table and pin the dat tables * @sg: pointer to the shadow guest address space structure * @saddr: faulting address in the shadow gmap - * @pgt: pointer to the beginning of the page table for the given address = if - * successful (return value 0), or to the first invalid DAT entry in - * case of exceptions (return value > 0) - * @dat_protection: referenced memory is write protected - * @fake: pgt references contiguous guest memory block, not a pgtable + * @w: will be filled with information on the pinned pages + * @wr: indicates a write access if true + * + * Return: + * * 0 in case of success, + * * a PIC code > 0 in case the address translation fails + * * an error code < 0 if other errors happen in the host */ -static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr, - unsigned long *pgt, int *dat_protection, - int *fake) +static int walk_guest_tables(struct gmap *sg, unsigned long saddr, struct = pgtwalk *w, bool wr) { - struct kvm *kvm; - struct gmap *parent; - union asce asce; + struct gmap *parent =3D sg->parent; + struct guest_fault *entries; + union dat_table_entry table; union vaddress vaddr; unsigned long ptr; + struct kvm *kvm; + union asce asce; int rc; =20 - *fake =3D 0; - *dat_protection =3D 0; - kvm =3D sg->private; - parent =3D sg->parent; + kvm =3D parent->kvm; + asce =3D sg->guest_asce; + entries =3D get_entries(w); + + w->level =3D LEVEL_MEM; + w->last_addr =3D saddr; + if (asce.r) + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, gpa_to_gfn(sadd= r), false); + vaddr.addr =3D saddr; - asce.val =3D sg->orig_asce; ptr =3D asce.rsto * PAGE_SIZE; - if (asce.r) { - *fake =3D 1; - ptr =3D 0; - asce.dt =3D ASCE_TYPE_REGION1; - } + + if (!asce_contains_gfn(asce, gpa_to_gfn(saddr))) + return PGM_ASCE_TYPE; switch (asce.dt) { case ASCE_TYPE_REGION1: - if (vaddr.rfx01 > asce.tl && !*fake) + if (vaddr.rfx01 > asce.tl) return PGM_REGION_FIRST_TRANS; break; case ASCE_TYPE_REGION2: - if (vaddr.rfx) - return PGM_ASCE_TYPE; if (vaddr.rsx01 > asce.tl) return PGM_REGION_SECOND_TRANS; break; case ASCE_TYPE_REGION3: - if (vaddr.rfx || vaddr.rsx) - return PGM_ASCE_TYPE; if (vaddr.rtx01 > asce.tl) return PGM_REGION_THIRD_TRANS; break; case ASCE_TYPE_SEGMENT: - if (vaddr.rfx || vaddr.rsx || vaddr.rtx) - return PGM_ASCE_TYPE; if (vaddr.sx01 > asce.tl) return PGM_SEGMENT_TRANSLATION; break; } =20 + w->level =3D asce.dt; switch (asce.dt) { - case ASCE_TYPE_REGION1: { - union region1_table_entry rfte; - - if (*fake) { - ptr +=3D vaddr.rfx * _REGION1_SIZE; - rfte.val =3D ptr; - goto shadow_r2t; - } - *pgt =3D ptr + vaddr.rfx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val); + case ASCE_TYPE_REGION1: + w->last_addr =3D ptr + vaddr.rfx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rfte.i) + if (table.pgd.i) return PGM_REGION_FIRST_TRANS; - if (rfte.tt !=3D TABLE_TYPE_REGION1) + if (table.pgd.tt !=3D TABLE_TYPE_REGION1) return PGM_TRANSLATION_SPEC; - if (vaddr.rsx01 < 
rfte.tf || vaddr.rsx01 > rfte.tl) + if (vaddr.rsx01 < table.pgd.tf || vaddr.rsx01 > table.pgd.tl) return PGM_REGION_SECOND_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rfte.p; - ptr =3D rfte.rto * PAGE_SIZE; -shadow_r2t: - rc =3D gmap_shadow_r2t(sg, saddr, rfte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r1_entry++; - } + w->p |=3D table.pgd.p; + ptr =3D table.pgd.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION2: { - union region2_table_entry rste; - - if (*fake) { - ptr +=3D vaddr.rsx * _REGION2_SIZE; - rste.val =3D ptr; - goto shadow_r3t; - } - *pgt =3D ptr + vaddr.rsx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val); + case ASCE_TYPE_REGION2: + w->last_addr =3D ptr + vaddr.rsx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rste.i) + if (table.p4d.i) return PGM_REGION_SECOND_TRANS; - if (rste.tt !=3D TABLE_TYPE_REGION2) + if (table.p4d.tt !=3D TABLE_TYPE_REGION2) return PGM_TRANSLATION_SPEC; - if (vaddr.rtx01 < rste.tf || vaddr.rtx01 > rste.tl) + if (vaddr.rtx01 < table.p4d.tf || vaddr.rtx01 > table.p4d.tl) return PGM_REGION_THIRD_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rste.p; - ptr =3D rste.rto * PAGE_SIZE; -shadow_r3t: - rste.p |=3D *dat_protection; - rc =3D gmap_shadow_r3t(sg, saddr, rste.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r2_entry++; - } + w->p |=3D table.p4d.p; + ptr =3D table.p4d.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION3: { - union region3_table_entry rtte; - - if (*fake) { - ptr +=3D vaddr.rtx * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; - } - *pgt =3D ptr + vaddr.rtx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val); + case ASCE_TYPE_REGION3: + w->last_addr =3D ptr + vaddr.rtx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rtte.i) + if (table.pud.i) return PGM_REGION_THIRD_TRANS; - if (rtte.tt !=3D TABLE_TYPE_REGION3) + if (table.pud.tt !=3D TABLE_TYPE_REGION3) return PGM_TRANSLATION_SPEC; - if (rtte.cr && asce.p && sg->edat_level >=3D 2) + if (table.pud.cr && asce.p && sg->edat_level >=3D 2) return PGM_TRANSLATION_SPEC; - if (rtte.fc && sg->edat_level >=3D 2) { - *dat_protection |=3D rtte.fc0.p; - *fake =3D 1; - ptr =3D rtte.fc1.rfaa * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; + if (sg->edat_level >=3D 1) + w->p |=3D table.pud.p; + if (table.pud.fc && sg->edat_level >=3D 2) { + table.val =3D u64_replace_bits(table.val, saddr, ~_REGION3_MASK); + goto edat_applies; } - if (vaddr.sx01 < rtte.fc0.tf || vaddr.sx01 > rtte.fc0.tl) + if (vaddr.sx01 < table.pud.fc0.tf || vaddr.sx01 > table.pud.fc0.tl) return PGM_SEGMENT_TRANSLATION; - if (sg->edat_level >=3D 1) - *dat_protection |=3D rtte.fc0.p; - ptr =3D rtte.fc0.sto * PAGE_SIZE; -shadow_sgt: - rtte.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_sgt(sg, saddr, rtte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r3_entry++; - } + ptr =3D table.pud.fc0.sto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_SEGMENT: { - union segment_table_entry ste; - - if (*fake) { - ptr +=3D vaddr.sx * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; - } - *pgt =3D ptr + vaddr.sx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val); + case ASCE_TYPE_SEGMENT: + w->last_addr =3D ptr + vaddr.sx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + 
w->last_addr, &table.val); if (rc) return rc; - if (ste.i) + if (table.pmd.i) return PGM_SEGMENT_TRANSLATION; - if (ste.tt !=3D TABLE_TYPE_SEGMENT) + if (table.pmd.tt !=3D TABLE_TYPE_SEGMENT) return PGM_TRANSLATION_SPEC; - if (ste.cs && asce.p) + if (table.pmd.cs && asce.p) return PGM_TRANSLATION_SPEC; - *dat_protection |=3D ste.fc0.p; - if (ste.fc && sg->edat_level >=3D 1) { - *fake =3D 1; - ptr =3D ste.fc1.sfaa * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; + w->p |=3D table.pmd.p; + if (table.pmd.fc && sg->edat_level >=3D 1) { + table.val =3D u64_replace_bits(table.val, saddr, ~_SEGMENT_MASK); + goto edat_applies; } - ptr =3D ste.fc0.pto * (PAGE_SIZE / 2); -shadow_pgt: - ste.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_pgt(sg, saddr, ste.val, *fake); + ptr =3D table.pmd.fc0.pto * (PAGE_SIZE / 2); + w->level--; + } + w->last_addr =3D ptr + vaddr.px * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); + if (rc) + return rc; + if (table.pte.i) + return PGM_PAGE_TRANSLATION; + if (table.pte.z) + return PGM_TRANSLATION_SPEC; + w->p |=3D table.pte.p; +edat_applies: + if (wr && w->p) + return PGM_PROTECTION; + + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, table.pte.pfra, = wr); +} + +static int _do_shadow_pte(struct gmap *sg, gpa_t raddr, union pte *ptep_h,= union pte *ptep, + struct guest_fault *f, bool p) +{ + union pgste pgste; + union pte newpte; + int rc; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + scoped_guard(spinlock, &sg->host_to_rmap_lock) + rc =3D gmap_insert_rmap(sg, f->gfn, gpa_to_gfn(raddr), TABLE_TYPE_PAGE_T= ABLE); + if (rc) + return rc; + + pgste =3D pgste_get_lock(ptep_h); + newpte =3D _pte(f->pfn, f->writable, !p, 0); + newpte.s.d |=3D ptep->s.d; + newpte.s.sd |=3D ptep->s.sd; + newpte.h.p &=3D ptep->h.p; + pgste =3D _gmap_ptep_xchg(sg->parent, ptep_h, newpte, pgste, f->gfn, fals= e); + pgste.vsie_notif =3D 1; + pgste_set_unlock(ptep_h, pgste); + + newpte =3D _pte(f->pfn, 0, !p, 0); + pgste =3D pgste_get_lock(ptep); + pgste =3D __dat_ptep_xchg(ptep, pgste, newpte, gpa_to_gfn(raddr), sg->asc= e, uses_skeys(sg)); + pgste_set_unlock(ptep, pgste); + + return 0; +} + +static int _do_shadow_crste(struct gmap *sg, gpa_t raddr, union crste *hos= t, union crste *table, + struct guest_fault *f, bool p) +{ + union crste newcrste; + gfn_t gfn; + int rc; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + gfn =3D f->gfn & gpa_to_gfn(is_pmd(*table) ? 
_SEGMENT_MASK : _REGION3_MAS= K); + scoped_guard(spinlock, &sg->host_to_rmap_lock) + rc =3D gmap_insert_rmap(sg, gfn, gpa_to_gfn(raddr), host->h.tt); + if (rc) + return rc; + + newcrste =3D _crste_fc1(f->pfn, host->h.tt, f->writable, !p); + newcrste.s.fc1.d |=3D host->s.fc1.d; + newcrste.s.fc1.sd |=3D host->s.fc1.sd; + newcrste.h.p &=3D host->h.p; + newcrste.s.fc1.vsie_notif =3D 1; + newcrste.s.fc1.prefix_notif =3D host->s.fc1.prefix_notif; + _gmap_crstep_xchg(sg->parent, host, newcrste, f->gfn, false); + + newcrste =3D _crste_fc1(f->pfn, host->h.tt, 0, !p); + dat_crstep_xchg(table, newcrste, gpa_to_gfn(raddr), sg->asce); + return 0; +} + +static int _gaccess_do_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *= sg, + unsigned long saddr, struct pgtwalk *w) +{ + struct guest_fault *entries; + int flags, i, hl, gl, l, rc; + union crste *table, *host; + union pte *ptep, *ptep_h; + + lockdep_assert_held(&sg->kvm->mmu_lock); + lockdep_assert_held(&sg->parent->children_lock); + + entries =3D get_entries(w); + ptep_h =3D NULL; + ptep =3D NULL; + + rc =3D dat_entry_walk(NULL, gpa_to_gfn(saddr), sg->asce, DAT_WALK_ANY, TA= BLE_TYPE_PAGE_TABLE, + &table, &ptep); + if (rc) + return rc; + + /* A race occurred. The shadow mapping is already valid, nothing to do */ + if ((ptep && !ptep->h.i) || (!ptep && crste_leaf(*table))) + return 0; + + gl =3D get_level(table, ptep); + + /* + * Skip levels that are already protected. For each level, protect + * only the page containing the entry, not the whole table. + */ + for (i =3D gl; i >=3D w->level; i--) { + rc =3D gmap_protect_rmap(mc, sg, entries[i - 1].gfn, gpa_to_gfn(saddr), + entries[i - 1].pfn, i, entries[i - 1].writable); + if (rc) + return rc; + } + + rc =3D dat_entry_walk(NULL, entries[LEVEL_MEM].gfn, sg->parent->asce, DAT= _WALK_LEAF, + TABLE_TYPE_PAGE_TABLE, &host, &ptep_h); + if (rc) + return rc; + + hl =3D get_level(host, ptep_h); + /* Get the smallest granularity */ + l =3D min3(gl, hl, w->level); + + flags =3D DAT_WALK_SPLIT_ALLOC | (uses_skeys(sg->parent) ? DAT_WALK_USES_= SKEYS : 0); + /* If necessary, create the shadow mapping */ + if (l < gl) { + rc =3D dat_entry_walk(mc, gpa_to_gfn(saddr), sg->asce, flags, l, &table,= &ptep); if (rc) return rc; - kvm->stat.gmap_shadow_sg_entry++; } + if (l < hl) { + rc =3D dat_entry_walk(mc, entries[LEVEL_MEM].gfn, sg->parent->asce, + flags, l, &host, &ptep_h); + if (rc) + return rc; } - /* Return the parent address of the page table */ - *pgt =3D ptr; - return 0; + + if (KVM_BUG_ON(l > TABLE_TYPE_REGION3, sg->kvm)) + return -EFAULT; + if (l =3D=3D TABLE_TYPE_PAGE_TABLE) + return _do_shadow_pte(sg, saddr, ptep_h, ptep, entries + LEVEL_MEM, w->p= ); + return _do_shadow_crste(sg, saddr, host, table, entries + LEVEL_MEM, w->p= ); } =20 -/** - * shadow_pgt_lookup() - find a shadow page table - * @sg: pointer to the shadow guest address space structure - * @saddr: the address in the shadow aguest address space - * @pgt: parent gmap address of the page table to get shadowed - * @dat_protection: if the pgtable is marked as protected by dat - * @fake: pgt references contiguous guest memory block, not a pgtable - * - * Returns 0 if the shadow page table was found and -EAGAIN if the page - * table was not found. - * - * Called with sg->mm->mmap_lock in read.
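Aside, not part of the diff: _do_shadow_pte() above brackets every PTE/PGSTE update between pgste_get_lock() and pgste_set_unlock(). A rough userspace analogue of such a lock-bit-in-the-entry protocol (bit position and names invented; the real PGSTE layout differs):

#include <stdatomic.h>
#include <stdint.h>

#define ENTRY_LOCK_BIT (1ULL << 7)	/* position chosen for the demo only */

/* Spin until the lock bit is clear, then atomically set it. */
static uint64_t entry_get_lock(_Atomic uint64_t *entry)
{
	uint64_t old = atomic_load(entry);

	for (;;) {
		while (old & ENTRY_LOCK_BIT)	/* wait for the holder */
			old = atomic_load(entry);
		if (atomic_compare_exchange_weak(entry, &old,
						 old | ENTRY_LOCK_BIT))
			return old;	/* the unlocked value we won with */
	}
}

/* Publishing the new value and releasing the lock is a single store. */
static void entry_set_unlock(_Atomic uint64_t *entry, uint64_t new)
{
	atomic_store(entry, new & ~ENTRY_LOCK_BIT);
}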
- */ -static int shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, unsigne= d long *pgt, - int *dat_protection, int *fake) +static inline int _gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap= *sg, gpa_t saddr, + unsigned long seq, struct pgtwalk *walk) { - unsigned long pt_index; - unsigned long *table; - struct page *page; int rc; =20 - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 1); /* get segment pointer */ - if (table && !(*table & _SEGMENT_ENTRY_INVALID)) { - /* Shadow page tables are full pages (pte+pgste) */ - page =3D pfn_to_page(*table >> PAGE_SHIFT); - pt_index =3D gmap_pgste_get_pgt_addr(page_to_virt(page)); - *pgt =3D pt_index & ~GMAP_SHADOW_FAKE_TABLE; - *dat_protection =3D !!(*table & _SEGMENT_ENTRY_PROTECT); - *fake =3D !!(pt_index & GMAP_SHADOW_FAKE_TABLE); - rc =3D 0; - } else { - rc =3D -EAGAIN; + if (kvm_s390_array_needs_retry_unsafe(vcpu->kvm, seq, walk->raw_entries)) + return -EAGAIN; +again: + rc =3D kvm_s390_mmu_cache_topup(vcpu->arch.mc); + if (rc) + return rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + if (kvm_s390_array_needs_retry_safe(vcpu->kvm, seq, walk->raw_entries)) + return -EAGAIN; + scoped_guard(spinlock, &sg->parent->children_lock) + rc =3D _gaccess_do_shadow(vcpu->arch.mc, sg, saddr, walk); + if (rc =3D=3D -ENOMEM) + goto again; + if (!rc) + kvm_s390_release_faultin_array(vcpu->kvm, walk->raw_entries, false); } - spin_unlock(&sg->guest_table_lock); return rc; } =20 /** - * kvm_s390_shadow_fault - handle fault on a shadow page table - * @vcpu: virtual cpu - * @sg: pointer to the shadow guest address space structure + * __gaccess_shadow_fault() - handle fault on a shadow page table + * @vcpu: virtual cpu that triggered the action + * @sg: the shadow guest address space structure * @saddr: faulting address in the shadow gmap * @datptr: will contain the address of the faulting DAT table entry, or of * the valid leaf, plus some flags + * @wr: whether this is a write access * - * Returns: - 0 if the shadow fault was successfully resolved - * - > 0 (pgm exception code) on exceptions while faulting - * - -EAGAIN if the caller can retry immediately - * - -EFAULT when accessing invalid guest addresses - * - -ENOMEM if out of memory + * Return: + * * 0 if the shadow fault was successfully resolved + * * > 0 (pgm exception code) on exceptions while faulting + * * -EAGAIN if the caller can retry immediately + * * -EFAULT when accessing invalid guest addresses + * * -ENOMEM if out of memory */ -int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, - unsigned long saddr, unsigned long *datptr) +static int __gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, = gpa_t saddr, + union mvpg_pei *datptr, bool wr) { - union vaddress vaddr; - union page_table_entry pte; - unsigned long pgt =3D 0; - int dat_protection, fake; + struct pgtwalk walk =3D { .p =3D false, }; + unsigned long seq; int rc; =20 - if (KVM_BUG_ON(!gmap_is_shadow(sg), vcpu->kvm)) - return -EFAULT; + seq =3D vcpu->kvm->mmu_invalidate_seq; + /* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */ + smp_rmb(); =20 - mmap_read_lock(sg->mm); - /* - * We don't want any guest-2 tables to change - so the parent - * tables/pointers we read stay valid - unshadowing is however - * always possible - only guest_table_lock protects us.
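Aside, not part of the diff: _gaccess_shadow_fault() above follows a "top up outside the lock, retry on -ENOMEM" discipline: memory is only allocated while no lock is held, and the locked section consumes preallocated objects, retrying if the cache runs dry. A self-contained sketch of the same shape (types and names invented):

#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct obj_cache { int nobjs; void *objs[8]; };

static int cache_topup(struct obj_cache *mc)
{
	while (mc->nobjs < 8) {
		void *p = malloc(64);	/* may sleep or fail: no lock held */
		if (!p)
			return -ENOMEM;
		mc->objs[mc->nobjs++] = p;
	}
	return 0;
}

static int apply_update(struct obj_cache *mc)
{
	if (!mc->nobjs)
		return -ENOMEM;		/* ran dry: caller refills and retries */
	free(mc->objs[--mc->nobjs]);	/* "consume" one preallocated object */
	return 0;
}

static int fill_then_apply(struct obj_cache *mc, pthread_rwlock_t *lock)
{
	int rc;
again:
	rc = cache_topup(mc);		/* never under the lock */
	if (rc)
		return rc;
	pthread_rwlock_rdlock(lock);
	rc = apply_update(mc);		/* must not sleep */
	pthread_rwlock_unlock(lock);
	if (rc == -ENOMEM)
		goto again;
	return rc;
}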
- */ - ipte_lock(vcpu->kvm); - - rc =3D shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake); + rc =3D walk_guest_tables(sg, saddr, &walk, wr); + if (datptr) { + datptr->val =3D walk.last_addr; + datptr->dat_prot =3D wr && walk.p; + datptr->not_pte =3D walk.level > TABLE_TYPE_PAGE_TABLE; + datptr->real =3D sg->guest_asce.r; + } + if (!rc) + rc =3D _gaccess_shadow_fault(vcpu, sg, saddr, seq, &walk); if (rc) - rc =3D kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection, - &fake); + kvm_s390_release_faultin_array(vcpu->kvm, walk.raw_entries, true); + return rc; +} =20 - vaddr.addr =3D saddr; - if (fake) { - pte.val =3D pgt + vaddr.px * PAGE_SIZE; - goto shadow_page; - } +int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t sad= dr, + union mvpg_pei *datptr, bool wr) +{ + int rc; =20 - switch (rc) { - case PGM_SEGMENT_TRANSLATION: - case PGM_REGION_THIRD_TRANS: - case PGM_REGION_SECOND_TRANS: - case PGM_REGION_FIRST_TRANS: - pgt |=3D PEI_NOT_PTE; - break; - case 0: - pgt +=3D vaddr.px * 8; - rc =3D gmap_read_table(sg->parent, pgt, &pte.val); - } - if (datptr) - *datptr =3D pgt | dat_protection * PEI_DAT_PROT; - if (!rc && pte.i) - rc =3D PGM_PAGE_TRANSLATION; - if (!rc && pte.z) - rc =3D PGM_TRANSLATION_SPEC; -shadow_page: - pte.p |=3D dat_protection; - if (!rc) - rc =3D gmap_shadow_page(sg, saddr, __pte(pte.val)); - vcpu->kvm->stat.gmap_shadow_pg_entry++; + if (KVM_BUG_ON(!test_bit(GMAP_FLAG_SHADOW, &sg->flags), vcpu->kvm)) + return -EFAULT; + + rc =3D kvm_s390_mmu_cache_topup(vcpu->arch.mc); + if (rc) + return rc; + + ipte_lock(vcpu->kvm); + rc =3D __gaccess_shadow_fault(vcpu, sg, saddr, datptr, wr || sg->guest_as= ce.r); ipte_unlock(vcpu->kvm); - mmap_read_unlock(sg->mm); + return rc; } diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 774cdf19998f..b5385cec60f4 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,7 +206,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsign= ed long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); =20 -int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old_addr, +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union = kvm_s390_quad *old, union kvm_s390_quad new, u8 access_key, bool *success); =20 /** @@ -450,11 +450,17 @@ void ipte_unlock(struct kvm *kvm); int ipte_lock_held(struct kvm *kvm); int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long= gra); =20 -/* MVPG PEI indication bits */ -#define PEI_DAT_PROT 2 -#define PEI_NOT_PTE 4 +union mvpg_pei { + unsigned long val; + struct { + unsigned long addr : 61; + unsigned long not_pte : 1; + unsigned long dat_prot: 1; + unsigned long real : 1; + }; +}; =20 -int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow, - unsigned long saddr, unsigned long *datptr); +int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t sad= dr, + union mvpg_pei *datptr, bool wr); =20 #endif /* __KVM_S390_GACCESS_H */ diff --git a/arch/s390/kvm/gmap-vsie.c b/arch/s390/kvm/gmap-vsie.c deleted file mode 100644 index 56ef153eb8fe..000000000000 --- a/arch/s390/kvm/gmap-vsie.c +++ /dev/null @@ -1,141 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Guest memory management for KVM/s390 nested VMs. - * - * Copyright IBM Corp. 
2008, 2020, 2024 - * - * Author(s): Claudio Imbrenda - * Martin Schwidefsky - * David Hildenbrand - * Janosch Frank - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -#include "kvm-s390.h" - -/** - * gmap_find_shadow - find a specific asce in the list of shadow tables - * @parent: pointer to the parent gmap - * @asce: ASCE for which the shadow table is created - * @edat_level: edat level to be used for the shadow translation - * - * Returns the pointer to a gmap if a shadow table with the given asce is - * already available, ERR_PTR(-EAGAIN) if another one is just being create= d, - * otherwise NULL - * - * Context: Called with parent->shadow_lock held - */ -static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long as= ce, int edat_level) -{ - struct gmap *sg; - - lockdep_assert_held(&parent->shadow_lock); - list_for_each_entry(sg, &parent->children, list) { - if (!gmap_shadow_valid(sg, asce, edat_level)) - continue; - if (!sg->initialized) - return ERR_PTR(-EAGAIN); - refcount_inc(&sg->ref_count); - return sg; - } - return NULL; -} - -/** - * gmap_shadow - create/find a shadow guest address space - * @parent: pointer to the parent gmap - * @asce: ASCE for which the shadow table is created - * @edat_level: edat level to be used for the shadow translation - * - * The pages of the top level page table referred by the asce parameter - * will be set to read-only and marked in the PGSTEs of the kvm process. - * The shadow table will be removed automatically on any change to the - * PTE mapping for the source table. - * - * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of mem= ory, - * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the - * parent gmap table could not be protected. 
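Aside, not part of the diff: the gmap_shadow() implementation deleted here is a classic find-or-create under a lock: look up, drop the lock to allocate, then recheck before inserting, because another CPU may have created the same shadow in the meantime. A compact self-contained rendering of that pattern (illustrative types and names):

#include <pthread.h>
#include <stdlib.h>

struct shadow {
	struct shadow *next;
	unsigned long asce;
	int refs;
};

static struct shadow *shadows;
static pthread_mutex_t shadows_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with shadows_lock held; takes a reference on a hit. */
static struct shadow *find_locked(unsigned long asce)
{
	struct shadow *s;

	for (s = shadows; s; s = s->next) {
		if (s->asce == asce) {
			s->refs++;
			return s;
		}
	}
	return NULL;
}

static struct shadow *find_or_create(unsigned long asce)
{
	struct shadow *s, *new;

	pthread_mutex_lock(&shadows_lock);
	s = find_locked(asce);
	pthread_mutex_unlock(&shadows_lock);
	if (s)
		return s;

	new = calloc(1, sizeof(*new));	/* allocate without the lock held */
	if (!new)
		return NULL;
	new->asce = asce;
	new->refs = 2;			/* one for the list, one for the caller */

	pthread_mutex_lock(&shadows_lock);
	s = find_locked(asce);		/* recheck: another thread may have won */
	if (s) {
		pthread_mutex_unlock(&shadows_lock);
		free(new);
		return s;
	}
	new->next = shadows;
	shadows = new;
	pthread_mutex_unlock(&shadows_lock);
	return new;
}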
- */ -struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat= _level) -{ - struct gmap *sg, *new; - unsigned long limit; - int rc; - - if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)par= ent->private) || - KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private)) - return ERR_PTR(-EFAULT); - spin_lock(&parent->shadow_lock); - sg =3D gmap_find_shadow(parent, asce, edat_level); - spin_unlock(&parent->shadow_lock); - if (sg) - return sg; - /* Create a new shadow gmap */ - limit =3D -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11)); - if (asce & _ASCE_REAL_SPACE) - limit =3D -1UL; - new =3D gmap_alloc(limit); - if (!new) - return ERR_PTR(-ENOMEM); - new->mm =3D parent->mm; - new->parent =3D gmap_get(parent); - new->private =3D parent->private; - new->orig_asce =3D asce; - new->edat_level =3D edat_level; - new->initialized =3D false; - spin_lock(&parent->shadow_lock); - /* Recheck if another CPU created the same shadow */ - sg =3D gmap_find_shadow(parent, asce, edat_level); - if (sg) { - spin_unlock(&parent->shadow_lock); - gmap_free(new); - return sg; - } - if (asce & _ASCE_REAL_SPACE) { - /* only allow one real-space gmap shadow */ - list_for_each_entry(sg, &parent->children, list) { - if (sg->orig_asce & _ASCE_REAL_SPACE) { - spin_lock(&sg->guest_table_lock); - gmap_unshadow(sg); - spin_unlock(&sg->guest_table_lock); - list_del(&sg->list); - gmap_put(sg); - break; - } - } - } - refcount_set(&new->ref_count, 2); - list_add(&new->list, &parent->children); - if (asce & _ASCE_REAL_SPACE) { - /* nothing to protect, return right away */ - new->initialized =3D true; - spin_unlock(&parent->shadow_lock); - return new; - } - spin_unlock(&parent->shadow_lock); - /* protect after insertion, so it will get properly invalidated */ - mmap_read_lock(parent->mm); - rc =3D __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN, - ((asce & _ASCE_TABLE_LENGTH) + 1), - PROT_READ, GMAP_NOTIFY_SHADOW); - mmap_read_unlock(parent->mm); - spin_lock(&parent->shadow_lock); - new->initialized =3D true; - if (rc) { - list_del(&new->list); - gmap_free(new); - new =3D ERR_PTR(rc); - } - spin_unlock(&parent->shadow_lock); - return new; -} diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c index 420ae62977e2..39aff324203e 100644 --- a/arch/s390/kvm/intercept.c +++ b/arch/s390/kvm/intercept.c @@ -21,6 +21,7 @@ #include "gaccess.h" #include "trace.h" #include "trace-s390.h" +#include "faultin.h" =20 u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu) { @@ -367,8 +368,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu) reg2, &srcaddr, GACC_FETCH, 0); if (rc) return kvm_s390_inject_prog_cond(vcpu, rc); - rc =3D kvm_s390_handle_dat_fault(vcpu, srcaddr, 0); - if (rc !=3D 0) + + do { + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(srcaddr), fals= e); + } while (rc =3D=3D -EAGAIN); + if (rc) return rc; =20 /* Ensure that the source is paged-in, no actual access -> no key checkin= g */ @@ -376,8 +380,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu) reg1, &dstaddr, GACC_STORE, 0); if (rc) return kvm_s390_inject_prog_cond(vcpu, rc); - rc =3D kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE); - if (rc !=3D 0) + + do { + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(dstaddr), true= ); + } while (rc =3D=3D -EAGAIN); + if (rc) return rc; =20 kvm_s390_retry_instr(vcpu); diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 249cdc822ec5..f55eca9aa638 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ 
-26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -34,6 +33,7 @@ #include "gaccess.h" #include "trace-s390.h" #include "pci.h" +#include "gmap.h" =20 #define PFAULT_INIT 0x0600 #define PFAULT_DONE 0x0680 @@ -2632,12 +2632,12 @@ static int flic_set_attr(struct kvm_device *dev, st= ruct kvm_device_attr *attr) case KVM_DEV_FLIC_APF_ENABLE: if (kvm_is_ucontrol(dev->kvm)) return -EINVAL; - dev->kvm->arch.gmap->pfault_enabled =3D 1; + set_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags); break; case KVM_DEV_FLIC_APF_DISABLE_WAIT: if (kvm_is_ucontrol(dev->kvm)) return -EINVAL; - dev->kvm->arch.gmap->pfault_enabled =3D 0; + clear_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags); /* * Make sure no async faults are in transition when * clearing the queues. So we don't need to worry diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index f5411e093fb5..a714037cef31 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include @@ -53,6 +52,8 @@ #include #include "kvm-s390.h" #include "gaccess.h" +#include "gmap.h" +#include "faultin.h" #include "pci.h" =20 #define CREATE_TRACE_POINTS @@ -264,16 +265,11 @@ static DECLARE_BITMAP(kvm_s390_available_cpu_feat, KV= M_S390_VM_CPU_FEAT_NR_BITS) /* available subfunctions indicated via query / "test bit" */ static struct kvm_s390_vm_cpu_subfunc kvm_s390_available_subfunc; =20 -static struct gmap_notifier gmap_notifier; -static struct gmap_notifier vsie_gmap_notifier; debug_info_t *kvm_s390_dbf; debug_info_t *kvm_s390_dbf_uv; =20 /* Section: not file related */ /* forward declarations */ -static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end); - static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta) { u8 delta_idx =3D 0; @@ -529,10 +525,6 @@ static int __init __kvm_s390_init(void) if (rc) goto err_gib; =20 - gmap_notifier.notifier_call =3D kvm_gmap_notifier; - gmap_register_pte_notifier(&gmap_notifier); - vsie_gmap_notifier.notifier_call =3D kvm_s390_vsie_gmap_notifier; - gmap_register_pte_notifier(&vsie_gmap_notifier); atomic_notifier_chain_register(&s390_epoch_delta_notifier, &kvm_clock_notifier); =20 @@ -552,8 +544,6 @@ static int __init __kvm_s390_init(void) =20 static void __kvm_s390_exit(void) { - gmap_unregister_pte_notifier(&gmap_notifier); - gmap_unregister_pte_notifier(&vsie_gmap_notifier); atomic_notifier_chain_unregister(&s390_epoch_delta_notifier, &kvm_clock_notifier); =20 @@ -569,7 +559,7 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { if (ioctl =3D=3D KVM_S390_ENABLE_SIE) - return s390_enable_sie(); + return 0; return -EINVAL; } =20 @@ -698,32 +688,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) =20 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *mems= lot) { - int i; - gfn_t cur_gfn, last_gfn; - unsigned long gaddr, vmaddr; - struct gmap *gmap =3D kvm->arch.gmap; - DECLARE_BITMAP(bitmap, _PAGE_ENTRIES); - - /* Loop over all guest segments */ - cur_gfn =3D memslot->base_gfn; - last_gfn =3D memslot->base_gfn + memslot->npages; - for (; cur_gfn <=3D last_gfn; cur_gfn +=3D _PAGE_ENTRIES) { - gaddr =3D gfn_to_gpa(cur_gfn); - vmaddr =3D gfn_to_hva_memslot(memslot, cur_gfn); - if (kvm_is_error_hva(vmaddr)) - continue; - - bitmap_zero(bitmap, _PAGE_ENTRIES); - gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr); - for (i =3D 0; i < _PAGE_ENTRIES; i++) { - if 
(test_bit(i, bitmap)) - mark_page_dirty(kvm, cur_gfn + i); - } + gfn_t last_gfn =3D memslot->base_gfn + memslot->npages; =20 - if (fatal_signal_pending(current)) - return; - cond_resched(); - } + scoped_guard(read_lock, &kvm->mmu_lock) + gmap_sync_dirty_log(kvm->arch.gmap, memslot->base_gfn, last_gfn); } =20 /* Section: vm related */ @@ -883,9 +851,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm= _enable_cap *cap) r =3D -EINVAL; else { r =3D 0; - mmap_write_lock(kvm->mm); - kvm->mm->context.allow_gmap_hpage_1m =3D 1; - mmap_write_unlock(kvm->mm); /* * We might have to create fake 4k page * tables. To avoid that the hardware works on @@ -958,7 +923,7 @@ static int kvm_s390_get_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_att= r *attr) { int ret; - unsigned int idx; + switch (attr->attr) { case KVM_S390_VM_MEM_ENABLE_CMMA: ret =3D -ENXIO; @@ -969,8 +934,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att mutex_lock(&kvm->lock); if (kvm->created_vcpus) ret =3D -EBUSY; - else if (kvm->mm->context.allow_gmap_hpage_1m) - ret =3D -EINVAL; else { kvm->arch.use_cmma =3D 1; /* Not compatible with cmma. */ @@ -979,7 +942,9 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, st= ruct kvm_device_attr *att } mutex_unlock(&kvm->lock); break; - case KVM_S390_VM_MEM_CLR_CMMA: + case KVM_S390_VM_MEM_CLR_CMMA: { + gfn_t start_gfn =3D 0; + ret =3D -ENXIO; if (!sclp.has_cmma) break; @@ -988,13 +953,13 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, = struct kvm_device_attr *att break; =20 VM_EVENT(kvm, 3, "%s", "RESET: CMMA states"); - mutex_lock(&kvm->lock); - idx =3D srcu_read_lock(&kvm->srcu); - s390_reset_cmma(kvm->arch.gmap->mm); - srcu_read_unlock(&kvm->srcu, idx); - mutex_unlock(&kvm->lock); + do { + start_gfn =3D dat_reset_cmma(kvm->arch.gmap->asce, start_gfn); + cond_resched(); + } while (start_gfn); ret =3D 0; break; + } case KVM_S390_VM_MEM_LIMIT_SIZE: { unsigned long new_limit; =20 @@ -1011,29 +976,12 @@ static int kvm_s390_set_mem_control(struct kvm *kvm,= struct kvm_device_attr *att if (!new_limit) return -EINVAL; =20 - /* gmap_create takes last usable address */ - if (new_limit !=3D KVM_S390_NO_MEM_LIMIT) - new_limit -=3D 1; - ret =3D -EBUSY; - mutex_lock(&kvm->lock); - if (!kvm->created_vcpus) { - /* gmap_create will round the limit up */ - struct gmap *new =3D gmap_create(current->mm, new_limit); - - if (!new) { - ret =3D -ENOMEM; - } else { - gmap_remove(kvm->arch.gmap); - new->private =3D kvm; - kvm->arch.gmap =3D new; - ret =3D 0; - } - } - mutex_unlock(&kvm->lock); + if (!kvm->created_vcpus) + ret =3D gmap_set_limit(kvm->arch.gmap, gpa_to_gfn(new_limit)); VM_EVENT(kvm, 3, "SET: max guest address: %lu", new_limit); VM_EVENT(kvm, 3, "New guest asce: 0x%p", - (void *) kvm->arch.gmap->asce); + (void *)kvm->arch.gmap->asce.val); break; } default: @@ -1198,19 +1146,13 @@ static int kvm_s390_vm_start_migration(struct kvm *= kvm) kvm->arch.migration_mode =3D 1; return 0; } - /* mark all the pages in active slots as dirty */ kvm_for_each_memslot(ms, bkt, slots) { if (!ms->dirty_bitmap) return -EINVAL; - /* - * The second half of the bitmap is only used on x86, - * and would be wasted otherwise, so we put it to good - * use here to keep track of the state of the storage - * attributes. 
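Aside, not part of the diff: the CMMA reset above becomes a resumable chunked walk; dat_reset_cmma() returns the position to continue from (0 once the walk is complete), so the caller can cond_resched() between chunks instead of doing everything in one long operation. A runnable sketch of the same resume-token idea (chunk size and names invented):

#include <sched.h>

#define CHUNK 512

/* Reset one chunk; return the resume position, or 0 when done. */
static unsigned long reset_chunk(unsigned char *state, unsigned long start,
				 unsigned long total)
{
	unsigned long i, end = start + CHUNK < total ? start + CHUNK : total;

	for (i = start; i < end; i++)
		state[i] = 0;
	return end < total ? end : 0;
}

static void reset_all(unsigned char *state, unsigned long total)
{
	unsigned long pos = 0;

	do {
		pos = reset_chunk(state, pos, total);
		sched_yield();		/* userspace stand-in for cond_resched() */
	} while (pos);
}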
- */ - memset(kvm_second_dirty_bitmap(ms), 0xff, kvm_dirty_bitmap_bytes(ms)); ram_pages +=3D ms->npages; } + /* mark all the pages as dirty */ + gmap_set_cmma_all_dirty(kvm->arch.gmap); atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages); kvm->arch.migration_mode =3D 1; kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION); @@ -2116,40 +2058,32 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, st= ruct kvm_device_attr *attr) =20 static int kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) { - uint8_t *keys; - uint64_t hva; - int srcu_idx, i, r =3D 0; + union skey *keys; + int i, r =3D 0; =20 if (args->flags !=3D 0) return -EINVAL; =20 /* Is this guest using storage keys? */ - if (!mm_uses_skeys(current->mm)) + if (!uses_skeys(kvm->arch.gmap)) return KVM_S390_GET_SKEYS_NONE; =20 /* Enforce sane limit on memory allocation */ if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; =20 - keys =3D kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT); + keys =3D kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT); if (!keys) return -ENOMEM; =20 - mmap_read_lock(current->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - for (i =3D 0; i < args->count; i++) { - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; - break; + scoped_guard(read_lock, &kvm->mmu_lock) { + for (i =3D 0; i < args->count; i++) { + r =3D dat_get_storage_key(kvm->arch.gmap->asce, + args->start_gfn + i, keys + i); + if (r) + break; } - - r =3D get_guest_storage_key(current->mm, hva, &keys[i]); - if (r) - break; } - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(current->mm); =20 if (!r) { r =3D copy_to_user((uint8_t __user *)args->skeydata_addr, keys, @@ -2164,10 +2098,9 @@ static int kvm_s390_get_skeys(struct kvm *kvm, struc= t kvm_s390_skeys *args) =20 static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) { - uint8_t *keys; - uint64_t hva; - int srcu_idx, i, r =3D 0; - bool unlocked; + struct kvm_s390_mmu_cache *mc; + union skey *keys; + int i, r =3D 0; =20 if (args->flags !=3D 0) return -EINVAL; @@ -2176,7 +2109,7 @@ static int kvm_s390_set_skeys(struct kvm *kvm, struct= kvm_s390_skeys *args) if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; =20 - keys =3D kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT); + keys =3D kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT); if (!keys) return -ENOMEM; =20 @@ -2188,159 +2121,41 @@ static int kvm_s390_set_skeys(struct kvm *kvm, str= uct kvm_s390_skeys *args) } =20 /* Enable storage key handling for the guest */ - r =3D s390_enable_skey(); + r =3D gmap_enable_skeys(kvm->arch.gmap); if (r) goto out; =20 - i =3D 0; - mmap_read_lock(current->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - while (i < args->count) { - unlocked =3D false; - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; - break; - } - + r =3D -EINVAL; + for (i =3D 0; i < args->count; i++) { /* Lowest order bit is reserved */ - if (keys[i] & 0x01) { - r =3D -EINVAL; - break; - } - - r =3D set_guest_storage_key(current->mm, hva, keys[i], 0); - if (r) { - r =3D fixup_user_fault(current->mm, hva, - FAULT_FLAG_WRITE, &unlocked); - if (r) - break; - } - if (!r) - i++; - } - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(current->mm); -out: - kvfree(keys); - return r; -} - -/* - * Base address and length must be sent at the start of each block, theref= ore - * it's cheaper to send 
some clean data, as long as it's less than the siz= e of - * two longs. - */ -#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *)) -/* for consistency */ -#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX) - -static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *a= rgs, - u8 *res, unsigned long bufsize) -{ - unsigned long pgstev, hva, cur_gfn =3D args->start_gfn; - - args->count =3D 0; - while (args->count < bufsize) { - hva =3D gfn_to_hva(kvm, cur_gfn); - /* - * We return an error if the first value was invalid, but we - * return successfully if at least one value was copied. - */ - if (kvm_is_error_hva(hva)) - return args->count ? 0 : -EFAULT; - if (get_pgste(kvm->mm, hva, &pgstev) < 0) - pgstev =3D 0; - res[args->count++] =3D (pgstev >> 24) & 0x43; - cur_gfn++; + if (keys[i].zero) + goto out; } =20 - return 0; -} - -static struct kvm_memory_slot *gfn_to_memslot_approx(struct kvm_memslots *= slots, - gfn_t gfn) -{ - return ____gfn_to_memslot(slots, gfn, true); -} - -static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots, - unsigned long cur_gfn) -{ - struct kvm_memory_slot *ms =3D gfn_to_memslot_approx(slots, cur_gfn); - unsigned long ofs =3D cur_gfn - ms->base_gfn; - struct rb_node *mnode =3D &ms->gfn_node[slots->node_idx]; - - if (ms->base_gfn + ms->npages <=3D cur_gfn) { - mnode =3D rb_next(mnode); - /* If we are above the highest slot, wrap around */ - if (!mnode) - mnode =3D rb_first(&slots->gfn_tree); - - ms =3D container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_= idx]); - ofs =3D 0; - } - - if (cur_gfn < ms->base_gfn) - ofs =3D 0; - - ofs =3D find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs); - while (ofs >=3D ms->npages && (mnode =3D rb_next(mnode))) { - ms =3D container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_= idx]); - ofs =3D find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages); + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) { + r =3D -ENOMEM; + goto out; } - return ms->base_gfn + ofs; -} =20 -static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *ar= gs, - u8 *res, unsigned long bufsize) -{ - unsigned long mem_end, cur_gfn, next_gfn, hva, pgstev; - struct kvm_memslots *slots =3D kvm_memslots(kvm); - struct kvm_memory_slot *ms; - - if (unlikely(kvm_memslots_empty(slots))) - return 0; - - cur_gfn =3D kvm_s390_next_dirty_cmma(slots, args->start_gfn); - ms =3D gfn_to_memslot(kvm, cur_gfn); - args->count =3D 0; - args->start_gfn =3D cur_gfn; - if (!ms) - return 0; - next_gfn =3D kvm_s390_next_dirty_cmma(slots, cur_gfn + 1); - mem_end =3D kvm_s390_get_gfn_end(slots); - - while (args->count < bufsize) { - hva =3D gfn_to_hva(kvm, cur_gfn); - if (kvm_is_error_hva(hva)) - return 0; - /* Decrement only if we actually flipped the bit to 0 */ - if (test_and_clear_bit(cur_gfn - ms->base_gfn, kvm_second_dirty_bitmap(m= s))) - atomic64_dec(&kvm->arch.cmma_dirty_pages); - if (get_pgste(kvm->mm, hva, &pgstev) < 0) - pgstev =3D 0; - /* Save the value */ - res[args->count++] =3D (pgstev >> 24) & 0x43; - /* If the next bit is too far away, stop. */ - if (next_gfn > cur_gfn + KVM_S390_MAX_BIT_DISTANCE) - return 0; - /* If we reached the previous "next", find the next one */ - if (cur_gfn =3D=3D next_gfn) - next_gfn =3D kvm_s390_next_dirty_cmma(slots, cur_gfn + 1); - /* Reached the end of memory or of the buffer, stop */ - if ((next_gfn >=3D mem_end) || - (next_gfn - args->start_gfn >=3D bufsize)) - return 0; - cur_gfn++; - /* Reached the end of the current memslot, take the next one. 
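Aside, not part of the diff: both the peek path deleted above and its dat_peek_cmma()/dat_get_cmma() replacements report the per-page CMMA state as (pgstev >> 24) & 0x43, i.e. the state byte kept at bit offset 24 of the PGSTE value with only the architected usage/NODAT bits preserved. The extraction, as a trivial self-contained helper (the meaning of the individual bits is left to the architecture documents):

#include <stdint.h>
#include <stdio.h>

static uint8_t pgstev_to_cmma(uint64_t pgstev)
{
	return (pgstev >> 24) & 0x43;
}

int main(void)
{
	printf("%#x\n", pgstev_to_cmma(0x43000000ULL));	/* prints 0x43 */
	return 0;
}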
*/ - if (cur_gfn - ms->base_gfn >=3D ms->npages) { - ms =3D gfn_to_memslot(kvm, cur_gfn); - if (!ms) - return 0; + r =3D 0; + do { + r =3D kvm_s390_mmu_cache_topup(mc); + if (r =3D=3D -ENOMEM) + break; + scoped_guard(read_lock, &kvm->mmu_lock) { + for (i =3D 0 ; i < args->count; i++) { + r =3D dat_set_storage_key(mc, kvm->arch.gmap->asce, + args->start_gfn + i, keys[i], 0); + if (r) + break; + } } - } - return 0; + } while (r =3D=3D -ENOMEM); + kvm_s390_free_mmu_cache(mc); +out: + kvfree(keys); + return r; } =20 /* @@ -2354,8 +2169,7 @@ static int kvm_s390_get_cmma(struct kvm *kvm, struct = kvm_s390_cmma_log *args, static int kvm_s390_get_cmma_bits(struct kvm *kvm, struct kvm_s390_cmma_log *args) { - unsigned long bufsize; - int srcu_idx, peek, ret; + int peek, ret; u8 *values; =20 if (!kvm->arch.use_cmma) @@ -2368,8 +2182,8 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, if (!peek && !kvm->arch.migration_mode) return -EINVAL; /* CMMA is disabled or was not used, or the buffer has length zero */ - bufsize =3D min(args->count, KVM_S390_CMMA_SIZE_MAX); - if (!bufsize || !kvm->mm->context.uses_cmm) { + args->count =3D min(args->count, KVM_S390_CMMA_SIZE_MAX); + if (!args->count || !uses_cmm(kvm->arch.gmap)) { memset(args, 0, sizeof(*args)); return 0; } @@ -2379,18 +2193,18 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, return 0; } =20 - values =3D vmalloc(bufsize); + values =3D vmalloc(args->count); if (!values) return -ENOMEM; =20 - mmap_read_lock(kvm->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - if (peek) - ret =3D kvm_s390_peek_cmma(kvm, args, values, bufsize); - else - ret =3D kvm_s390_get_cmma(kvm, args, values, bufsize); - srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(kvm->mm); + scoped_guard(read_lock, &kvm->mmu_lock) { + if (peek) + ret =3D dat_peek_cmma(args->start_gfn, kvm->arch.gmap->asce, &args->cou= nt, + values); + else + ret =3D dat_get_cmma(kvm->arch.gmap->asce, &args->start_gfn, &args->cou= nt, + values, &kvm->arch.cmma_dirty_pages); + } =20 if (kvm->arch.migration_mode) args->remaining =3D atomic64_read(&kvm->arch.cmma_dirty_pages); @@ -2412,11 +2226,9 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm, static int kvm_s390_set_cmma_bits(struct kvm *kvm, const struct kvm_s390_cmma_log *args) { - unsigned long hva, mask, pgstev, i; - uint8_t *bits; - int srcu_idx, r =3D 0; - - mask =3D args->mask; + struct kvm_s390_mmu_cache *mc; + u8 *bits =3D NULL; + int r =3D 0; =20 if (!kvm->arch.use_cmma) return -ENXIO; @@ -2430,9 +2242,12 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm, if (args->count =3D=3D 0) return 0; =20 + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) + return -ENOMEM; bits =3D vmalloc(array_size(sizeof(*bits), args->count)); if (!bits) - return -ENOMEM; + goto out; =20 r =3D copy_from_user(bits, (void __user *)args->values, args->count); if (r) { @@ -2440,29 +2255,19 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm, goto out; } =20 - mmap_read_lock(kvm->mm); - srcu_idx =3D srcu_read_lock(&kvm->srcu); - for (i =3D 0; i < args->count; i++) { - hva =3D gfn_to_hva(kvm, args->start_gfn + i); - if (kvm_is_error_hva(hva)) { - r =3D -EFAULT; + do { + r =3D kvm_s390_mmu_cache_topup(mc); + if (r) break; + scoped_guard(read_lock, &kvm->mmu_lock) { + r =3D dat_set_cmma_bits(mc, kvm->arch.gmap->asce, args->start_gfn, + args->count, args->mask, bits); } + } while (r =3D=3D -ENOMEM); =20 - pgstev =3D bits[i]; - pgstev =3D pgstev << 24; - mask &=3D _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT; - set_pgste_bits(kvm->mm, hva, mask, pgstev); - } - 
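Aside, not part of the diff: the loop deleted above funnels userspace-provided CMMA values through set_pgste_bits(mm, hva, mask, pgstev), a masked read-modify-write in which only the bits selected by the mask may change; dat_set_cmma_bits() is expected to keep that contract. The core operation, reduced to a self-contained helper:

#include <stdint.h>

/* Update only the bits of @word selected by @mask; leave the rest. */
static inline uint64_t apply_masked(uint64_t word, uint64_t mask,
				    uint64_t val)
{
	return (word & ~mask) | (val & mask);
}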
srcu_read_unlock(&kvm->srcu, srcu_idx); - mmap_read_unlock(kvm->mm); - - if (!kvm->mm->context.uses_cmm) { - mmap_write_lock(kvm->mm); - kvm->mm->context.uses_cmm =3D 1; - mmap_write_unlock(kvm->mm); - } + set_bit(GMAP_FLAG_USES_CMM, &kvm->arch.gmap->flags); out: + kvm_s390_free_mmu_cache(mc); vfree(bits); return r; } @@ -2671,6 +2476,13 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struc= t kvm_pv_cmd *cmd) break; =20 mmap_write_lock(kvm->mm); + /* + * Disable creation of new THPs. Existing THPs can stay, they + * will be split when any part of them gets imported. + */ + mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, kvm->mm); + mm_flags_set(MMF_DISABLE_THP_COMPLETELY, kvm->mm); + set_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &kvm->arch.gmap->flags); r =3D gmap_helper_disable_cow_sharing(); mmap_write_unlock(kvm->mm); if (r) @@ -2918,9 +2730,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, st= ruct kvm_s390_mem_op *mop) acc_mode =3D mop->op =3D=3D KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : G= ACC_STORE; =20 scoped_guard(srcu, &kvm->srcu) { - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) - return PGM_ADDRESSING; - if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key); =20 @@ -2933,7 +2742,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, st= ruct kvm_s390_mem_op *mop) if (acc_mode !=3D GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size)) return -EFAULT; } - return 0; } =20 @@ -2962,9 +2770,6 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm= , struct kvm_s390_mem_op *m return -EFAULT; =20 scoped_guard(srcu, &kvm->srcu) { - if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) - return PGM_ADDRESSING; - r =3D cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old, new, mop->key, &success); =20 @@ -3322,11 +3127,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long = type) if (type) goto out_err; #endif - - rc =3D s390_enable_sie(); - if (rc) - goto out_err; - rc =3D -ENOMEM; =20 if (!sclp.has_64bscao) @@ -3400,6 +3200,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long = type) debug_register_view(kvm->arch.dbf, &debug_sprintf_view); VM_EVENT(kvm, 3, "vm created with type %lu", type); =20 + kvm->arch.mem_limit =3D type & KVM_VM_S390_UCONTROL ? 
KVM_S390_NO_MEM_LIM= IT : sclp.hamax + 1; + kvm->arch.gmap =3D gmap_new(kvm, gpa_to_gfn(kvm->arch.mem_limit)); + if (!kvm->arch.gmap) + goto out_err; + clear_bit(GMAP_FLAG_PFAULT_ENABLED, &kvm->arch.gmap->flags); + if (type & KVM_VM_S390_UCONTROL) { struct kvm_userspace_memory_region2 fake_memslot =3D { .slot =3D KVM_S390_UCONTROL_MEMSLOT, @@ -3409,23 +3215,15 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long= type) .flags =3D 0, }; =20 - kvm->arch.gmap =3D NULL; - kvm->arch.mem_limit =3D KVM_S390_NO_MEM_LIMIT; /* one flat fake memslot covering the whole address-space */ mutex_lock(&kvm->slots_lock); KVM_BUG_ON(kvm_set_internal_memslot(kvm, &fake_memslot), kvm); mutex_unlock(&kvm->slots_lock); + set_bit(GMAP_FLAG_IS_UCONTROL, &kvm->arch.gmap->flags); } else { - if (sclp.hamax =3D=3D U64_MAX) - kvm->arch.mem_limit =3D TASK_SIZE_MAX; - else - kvm->arch.mem_limit =3D min_t(unsigned long, TASK_SIZE_MAX, - sclp.hamax + 1); - kvm->arch.gmap =3D gmap_create(current->mm, kvm->arch.mem_limit - 1); - if (!kvm->arch.gmap) - goto out_err; - kvm->arch.gmap->private =3D kvm; - kvm->arch.gmap->pfault_enabled =3D 0; + struct crst_table *table =3D dereference_asce(kvm->arch.gmap->asce); + + crst_table_init((void *)table, _CRSTE_HOLE(table->crstes[0].h.tt).val); } =20 kvm->arch.use_pfmfi =3D sclp.has_pfmfi; @@ -3459,8 +3257,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) sca_del_vcpu(vcpu); kvm_s390_update_topology_change_report(vcpu->kvm, 1); =20 - if (kvm_is_ucontrol(vcpu->kvm)) - gmap_remove(vcpu->arch.gmap); + if (kvm_is_ucontrol(vcpu->kvm)) { + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) + gmap_remove_child(vcpu->arch.gmap); + vcpu->arch.gmap =3D gmap_put(vcpu->arch.gmap); + } =20 if (vcpu->kvm->arch.use_cmma) kvm_s390_vcpu_unsetup_cmma(vcpu); @@ -3468,6 +3269,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) if (kvm_s390_pv_cpu_get_handle(vcpu)) kvm_s390_pv_destroy_cpu(vcpu, &rc, &rrc); free_page((unsigned long)(vcpu->arch.sie_block)); + kvm_s390_free_mmu_cache(vcpu->arch.mc); } =20 void kvm_arch_destroy_vm(struct kvm *kvm) @@ -3494,25 +3296,14 @@ void kvm_arch_destroy_vm(struct kvm *kvm) =20 debug_unregister(kvm->arch.dbf); free_page((unsigned long)kvm->arch.sie_page2); - if (!kvm_is_ucontrol(kvm)) - gmap_remove(kvm->arch.gmap); kvm_s390_destroy_adapters(kvm); kvm_s390_clear_float_irqs(kvm); kvm_s390_vsie_destroy(kvm); + kvm->arch.gmap =3D gmap_put(kvm->arch.gmap); KVM_EVENT(3, "vm 0x%p destroyed", kvm); } =20 /* Section: vcpu related */ -static int __kvm_ucontrol_vcpu_init(struct kvm_vcpu *vcpu) -{ - vcpu->arch.gmap =3D gmap_create(current->mm, -1UL); - if (!vcpu->arch.gmap) - return -ENOMEM; - vcpu->arch.gmap->private =3D vcpu->kvm; - - return 0; -} - static void sca_del_vcpu(struct kvm_vcpu *vcpu) { struct esca_block *sca =3D vcpu->kvm->arch.sca; @@ -3853,9 +3644,15 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) int rc; =20 BUILD_BUG_ON(sizeof(struct sie_page) !=3D 4096); + vcpu->arch.mc =3D kvm_s390_new_mmu_cache(); + if (!vcpu->arch.mc) + return -ENOMEM; sie_page =3D (struct sie_page *) get_zeroed_page(GFP_KERNEL_ACCOUNT); - if (!sie_page) + if (!sie_page) { + kvm_s390_free_mmu_cache(vcpu->arch.mc); + vcpu->arch.mc =3D NULL; return -ENOMEM; + } =20 vcpu->arch.sie_block =3D &sie_page->sie_block; vcpu->arch.sie_block->itdba =3D virt_to_phys(&sie_page->itdb); @@ -3897,8 +3694,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->run->kvm_valid_regs |=3D KVM_SYNC_FPRS; =20 if (kvm_is_ucontrol(vcpu->kvm)) { - rc =3D __kvm_ucontrol_vcpu_init(vcpu); - if (rc) 
+ rc =3D -ENOMEM; + vcpu->arch.gmap =3D gmap_new_child(vcpu->kvm->arch.gmap, -1UL); + if (!vcpu->arch.gmap) goto out_free_sie_block; } =20 @@ -3914,8 +3712,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) return 0; =20 out_ucontrol_uninit: - if (kvm_is_ucontrol(vcpu->kvm)) - gmap_remove(vcpu->arch.gmap); + if (kvm_is_ucontrol(vcpu->kvm)) { + gmap_remove_child(vcpu->arch.gmap); + vcpu->arch.gmap =3D gmap_put(vcpu->arch.gmap); + } out_free_sie_block: free_page((unsigned long)(vcpu->arch.sie_block)); return rc; @@ -3979,32 +3779,6 @@ void kvm_s390_sync_request(int req, struct kvm_vcpu = *vcpu) kvm_s390_vcpu_request(vcpu); } =20 -static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) -{ - struct kvm *kvm =3D gmap->private; - struct kvm_vcpu *vcpu; - unsigned long prefix; - unsigned long i; - - trace_kvm_s390_gmap_notifier(start, end, gmap_is_shadow(gmap)); - - if (gmap_is_shadow(gmap)) - return; - if (start >=3D 1UL << 31) - /* We are only interested in prefix pages */ - return; - kvm_for_each_vcpu(i, vcpu, kvm) { - /* match against both prefix pages */ - prefix =3D kvm_s390_get_prefix(vcpu); - if (prefix <=3D end && start <=3D prefix + 2*PAGE_SIZE - 1) { - VCPU_EVENT(vcpu, 2, "gmap notifier for %lx-%lx", - start, end); - kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu); - } - } -} - bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) { /* do not poll with more than halt_poll_max_steal percent of steal time */ @@ -4386,72 +4160,53 @@ static bool ibs_enabled(struct kvm_vcpu *vcpu) return kvm_s390_test_cpuflags(vcpu, CPUSTAT_IBS); } =20 -static int __kvm_s390_fixup_fault_sync(struct gmap *gmap, gpa_t gaddr, uns= igned int flags) -{ - struct kvm *kvm =3D gmap->private; - gfn_t gfn =3D gpa_to_gfn(gaddr); - bool unlocked; - hva_t vmaddr; - gpa_t tmp; - int rc; - - if (kvm_is_ucontrol(kvm)) { - tmp =3D __gmap_translate(gmap, gaddr); - gfn =3D gpa_to_gfn(tmp); - } - - vmaddr =3D gfn_to_hva(kvm, gfn); - rc =3D fixup_user_fault(gmap->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked); - if (!rc) - rc =3D __gmap_link(gmap, gaddr, vmaddr); - return rc; -} - -/** - * __kvm_s390_mprotect_many() - Apply specified protection to guest pages - * @gmap: the gmap of the guest - * @gpa: the starting guest address - * @npages: how many pages to protect - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: pgste notification bits to set - * - * Returns: 0 in case of success, < 0 in case of error - see gmap_protect_= one() - * - * Context: kvm->srcu and gmap->mm need to be held in read mode - */ -int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsi= gned int prot, - unsigned long bits) +static int vcpu_ucontrol_translate(struct kvm_vcpu *vcpu, gpa_t *gaddr) { - unsigned int fault_flag =3D (prot & PROT_WRITE) ? FAULT_FLAG_WRITE : 0; - gpa_t end =3D gpa + npages * PAGE_SIZE; + union crste *crstep; + union pte *ptep; int rc; =20 - for (; gpa < end; gpa =3D ALIGN(gpa + 1, rc)) { - rc =3D gmap_protect_one(gmap, gpa, prot, bits); - if (rc =3D=3D -EAGAIN) { - __kvm_s390_fixup_fault_sync(gmap, gpa, fault_flag); - rc =3D gmap_protect_one(gmap, gpa, prot, bits); + if (kvm_is_ucontrol(vcpu->kvm)) { + /* + * This translates the per-vCPU guest address into a + * fake guest address, which can then be used with the + * fake memslots that are identity mapping userspace. + * This allows ucontrol VMs to use the normal fault + * resolution path, like normal VMs. 
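Aside, not part of the diff: vcpu_ucontrol_translate() above keeps the byte offset within the 1 MiB segment and substitutes the segment part with the value stashed in the page table (PTVAL_VMADDR). The address arithmetic, reduced to a self-contained helper (segment size per the s390 architecture; names invented):

#include <stdint.h>

#define SEG_SHIFT 20				/* 1 MiB segments on s390 */
#define SEG_MASK  (~((1ULL << SEG_SHIFT) - 1))

/* Keep the in-segment offset, swap in the looked-up segment base. */
static uint64_t recompose(uint64_t vcpu_gaddr, uint64_t fake_segment_base)
{
	return (vcpu_gaddr & ~SEG_MASK) | (fake_segment_base & SEG_MASK);
}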
+ */ + rc =3D dat_entry_walk(NULL, gpa_to_gfn(*gaddr), vcpu->arch.gmap->asce, + 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep); + if (rc) { + vcpu->run->exit_reason =3D KVM_EXIT_S390_UCONTROL; + vcpu->run->s390_ucontrol.trans_exc_code =3D *gaddr; + vcpu->run->s390_ucontrol.pgm_code =3D PGM_SEGMENT_TRANSLATION; + return -EREMOTE; } - if (rc < 0) - return rc; + *gaddr &=3D ~_SEGMENT_MASK; + *gaddr |=3D dat_get_ptval(pte_table_start(ptep), PTVAL_VMADDR) << _SEGME= NT_SHIFT; } - return 0; } =20 -static int kvm_s390_mprotect_notify_prefix(struct kvm_vcpu *vcpu) +static int kvm_s390_fixup_prefix(struct kvm_vcpu *vcpu) { gpa_t gaddr =3D kvm_s390_get_prefix(vcpu); - int idx, rc; - - idx =3D srcu_read_lock(&vcpu->kvm->srcu); - mmap_read_lock(vcpu->arch.gmap->mm); + gfn_t gfn; + int rc; =20 - rc =3D __kvm_s390_mprotect_many(vcpu->arch.gmap, gaddr, 2, PROT_WRITE, GM= AP_NOTIFY_MPROT); + if (vcpu_ucontrol_translate(vcpu, &gaddr)) + return -EREMOTE; + gfn =3D gpa_to_gfn(gaddr); =20 - mmap_read_unlock(vcpu->arch.gmap->mm); - srcu_read_unlock(&vcpu->kvm->srcu, idx); + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn, true); + if (rc) + return rc; + rc =3D kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn + 1, true); + if (rc) + return rc; =20 + scoped_guard(write_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_set_prefix_notif_bit(vcpu->kvm->arch.gmap->asce, gfn); return rc; } =20 @@ -4471,7 +4226,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *= vcpu) if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) { int rc; =20 - rc =3D kvm_s390_mprotect_notify_prefix(vcpu); + rc =3D kvm_s390_fixup_prefix(vcpu); if (rc) { kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu); return rc; @@ -4520,8 +4275,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *= vcpu) * Re-enable CMM virtualization if CMMA is available and * CMM has been used. */ - if ((vcpu->kvm->arch.use_cmma) && - (vcpu->kvm->mm->context.uses_cmm)) + if (vcpu->kvm->arch.use_cmma && uses_cmm(vcpu->arch.gmap)) vcpu->arch.sie_block->ecb2 |=3D ECB2_CMMA; goto retry; } @@ -4633,7 +4387,7 @@ bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu) return false; if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) return false; - if (!vcpu->arch.gmap->pfault_enabled) + if (!pfault_enabled(vcpu->arch.gmap)) return false; =20 hva =3D gfn_to_hva(vcpu->kvm, current->thread.gmap_teid.addr); @@ -4726,98 +4480,25 @@ static void kvm_s390_assert_primary_as(struct kvm_v= cpu *vcpu) current->thread.gmap_int_code, current->thread.gmap_teid.val); } =20 -/* - * __kvm_s390_handle_dat_fault() - handle a dat fault for the gmap of a vc= pu - * @vcpu: the vCPU whose gmap is to be fixed up - * @gfn: the guest frame number used for memslots (including fake memslots) - * @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps - * @foll: FOLL_* flags - * - * Return: 0 on success, < 0 in case of error. - * Context: The mm lock must not be held before calling. May sleep. 
- */ -int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t ga= ddr, unsigned int foll) -{ - struct kvm_memory_slot *slot; - unsigned int fault_flags; - bool writable, unlocked; - unsigned long vmaddr; - struct page *page; - kvm_pfn_t pfn; +static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, gpa_t gaddr, bool= wr) +{ + struct guest_fault f =3D { + .write_attempt =3D wr, + .attempt_pfault =3D pfault_enabled(vcpu->arch.gmap), + }; int rc; =20 - slot =3D kvm_vcpu_gfn_to_memslot(vcpu, gfn); - if (!slot || slot->flags & KVM_MEMSLOT_INVALID) - return vcpu_post_run_addressing_exception(vcpu); - - fault_flags =3D foll & FOLL_WRITE ? FAULT_FLAG_WRITE : 0; - if (vcpu->arch.gmap->pfault_enabled) - foll |=3D FOLL_NOWAIT; - vmaddr =3D __gfn_to_hva_memslot(slot, gfn); - -try_again: - pfn =3D __kvm_faultin_pfn(slot, gfn, foll, &writable, &page); + if (vcpu_ucontrol_translate(vcpu, &gaddr)) + return -EREMOTE; + f.gfn =3D gpa_to_gfn(gaddr); =20 - /* Access outside memory, inject addressing exception */ - if (is_noslot_pfn(pfn)) + rc =3D kvm_s390_faultin_gfn(vcpu, NULL, &f); + if (rc <=3D 0) + return rc; + if (rc =3D=3D PGM_ADDRESSING) return vcpu_post_run_addressing_exception(vcpu); - /* Signal pending: try again */ - if (pfn =3D=3D KVM_PFN_ERR_SIGPENDING) - return -EAGAIN; - - /* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) = */ - if (pfn =3D=3D KVM_PFN_ERR_NEEDS_IO) { - trace_kvm_s390_major_guest_pfault(vcpu); - if (kvm_arch_setup_async_pf(vcpu)) - return 0; - vcpu->stat.pfault_sync++; - /* Could not setup async pfault, try again synchronously */ - foll &=3D ~FOLL_NOWAIT; - goto try_again; - } - /* Any other error */ - if (is_error_pfn(pfn)) - return -EFAULT; - - /* Success */ - mmap_read_lock(vcpu->arch.gmap->mm); - /* Mark the userspace PTEs as young and/or dirty, to avoid page fault loo= ps */ - rc =3D fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlock= ed); - if (!rc) - rc =3D __gmap_link(vcpu->arch.gmap, gaddr, vmaddr); - scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { - kvm_release_faultin_page(vcpu->kvm, page, false, writable); - } - mmap_read_unlock(vcpu->arch.gmap->mm); - return rc; -} - -static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gad= dr, unsigned int foll) -{ - unsigned long gaddr_tmp; - gfn_t gfn; - - gfn =3D gpa_to_gfn(gaddr); - if (kvm_is_ucontrol(vcpu->kvm)) { - /* - * This translates the per-vCPU guest address into a - * fake guest address, which can then be used with the - * fake memslots that are identity mapping userspace. - * This allows ucontrol VMs to use the normal fault - * resolution path, like normal VMs. 
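Aside, not part of the diff: vcpu_dat_fault_handler() above replaces the FOLL_* flag plumbing of the deleted function with a declarative struct guest_fault that a single resolver interprets. A compile-clean sketch of the calling convention, with stand-in types since the real definitions live in the new faultin.h (the resolver body is a placeholder):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

struct guest_fault_demo {
	gfn_t gfn;
	bool write_attempt;
	bool attempt_pfault;	/* allow async pfault handling */
};

/* Placeholder: the real resolver pins the page, links it into the
 * DAT tables, and returns 0, a program interruption code > 0, or a
 * negative errno.
 */
static int resolve_fault(struct guest_fault_demo *f)
{
	(void)f;
	return 0;
}

static int touch_guest_page(uint64_t gaddr, bool write)
{
	struct guest_fault_demo f = {
		.gfn = gaddr >> 12,	/* gpa_to_gfn() */
		.write_attempt = write,
		.attempt_pfault = false,
	};

	return resolve_fault(&f);
}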
- */ - mmap_read_lock(vcpu->arch.gmap->mm); - gaddr_tmp =3D __gmap_translate(vcpu->arch.gmap, gaddr); - mmap_read_unlock(vcpu->arch.gmap->mm); - if (gaddr_tmp =3D=3D -EFAULT) { - vcpu->run->exit_reason =3D KVM_EXIT_S390_UCONTROL; - vcpu->run->s390_ucontrol.trans_exc_code =3D gaddr; - vcpu->run->s390_ucontrol.pgm_code =3D PGM_SEGMENT_TRANSLATION; - return -EREMOTE; - } - gfn =3D gpa_to_gfn(gaddr_tmp); - } - return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, foll); + KVM_BUG_ON(rc, vcpu->kvm); + return -EINVAL; } =20 static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu) @@ -4994,7 +4675,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) =20 exit_reason =3D kvm_s390_enter_exit_sie(vcpu->arch.sie_block, vcpu->run->s.regs.gprs, - vcpu->arch.gmap->asce); + vcpu->arch.gmap->asce.val); =20 __enable_cpu_timer_accounting(vcpu); guest_timing_exit_irqoff(); @@ -5529,8 +5210,8 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcp= u, struct kvm_s390_mem_op *mop) { void __user *uaddr =3D (void __user *)mop->buf; + void *tmpbuf __free(kvfree) =3D NULL; enum gacc_mode acc_mode; - void *tmpbuf =3D NULL; int r; =20 r =3D mem_op_validate_common(mop, KVM_S390_MEMOP_F_INJECT_EXCEPTION | @@ -5552,32 +5233,21 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *v= cpu, if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r =3D check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size, acc_mode, mop->key); - goto out_inject; - } - if (acc_mode =3D=3D GACC_FETCH) { + } else if (acc_mode =3D=3D GACC_FETCH) { r =3D read_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size, mop->key); - if (r) - goto out_inject; - if (copy_to_user(uaddr, tmpbuf, mop->size)) { - r =3D -EFAULT; - goto out_free; - } + if (!r && copy_to_user(uaddr, tmpbuf, mop->size)) + return -EFAULT; } else { - if (copy_from_user(tmpbuf, uaddr, mop->size)) { - r =3D -EFAULT; - goto out_free; - } + if (copy_from_user(tmpbuf, uaddr, mop->size)) + return -EFAULT; r =3D write_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size, mop->key); } =20 -out_inject: if (r > 0 && (mop->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION) !=3D 0) kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm); =20 -out_free: - vfree(tmpbuf); return r; } =20 @@ -5767,37 +5437,39 @@ long kvm_arch_vcpu_ioctl(struct file *filp, } #ifdef CONFIG_KVM_S390_UCONTROL case KVM_S390_UCAS_MAP: { - struct kvm_s390_ucas_mapping ucasmap; + struct kvm_s390_ucas_mapping ucas; =20 - if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) { - r =3D -EFAULT; + r =3D -EFAULT; + if (copy_from_user(&ucas, argp, sizeof(ucas))) break; - } =20 - if (!kvm_is_ucontrol(vcpu->kvm)) { - r =3D -EINVAL; + r =3D -EINVAL; + if (!kvm_is_ucontrol(vcpu->kvm)) + break; + if (!IS_ALIGNED(ucas.user_addr | ucas.vcpu_addr | ucas.length, _SEGMENT_= SIZE)) break; - } =20 - r =3D gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr, - ucasmap.vcpu_addr, ucasmap.length); + r =3D gmap_ucas_map(vcpu->arch.gmap, gpa_to_gfn(ucas.user_addr), + gpa_to_gfn(ucas.vcpu_addr), + ucas.length >> _SEGMENT_SHIFT); break; } case KVM_S390_UCAS_UNMAP: { - struct kvm_s390_ucas_mapping ucasmap; + struct kvm_s390_ucas_mapping ucas; =20 - if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) { - r =3D -EFAULT; + r =3D -EFAULT; + if (copy_from_user(&ucas, argp, sizeof(ucas))) break; - } =20 - if (!kvm_is_ucontrol(vcpu->kvm)) { - r =3D -EINVAL; + r =3D -EINVAL; + if (!kvm_is_ucontrol(vcpu->kvm)) + break; + if (!IS_ALIGNED(ucas.vcpu_addr | ucas.length, _SEGMENT_SIZE)) break; - } =20 - r =3D gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr, - 
ucasmap.length); + gmap_ucas_unmap(vcpu->arch.gmap, gpa_to_gfn(ucas.vcpu_addr), + ucas.length >> _SEGMENT_SHIFT); + r =3D 0; break; } #endif @@ -5970,34 +5642,39 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_memory_slot *new, enum kvm_mr_change change) { + struct kvm_s390_mmu_cache *mc =3D NULL; int rc =3D 0; =20 - if (kvm_is_ucontrol(kvm)) + if (change =3D=3D KVM_MR_FLAGS_ONLY) return; =20 + mc =3D kvm_s390_new_mmu_cache(); + if (!mc) { + rc =3D -ENOMEM; + goto out; + } + switch (change) { case KVM_MR_DELETE: - rc =3D gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE, - old->npages * PAGE_SIZE); + rc =3D dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->npa= ges); break; case KVM_MR_MOVE: - rc =3D gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE, - old->npages * PAGE_SIZE); + rc =3D dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->npa= ges); if (rc) break; fallthrough; case KVM_MR_CREATE: - rc =3D gmap_map_segment(kvm->arch.gmap, new->userspace_addr, - new->base_gfn * PAGE_SIZE, - new->npages * PAGE_SIZE); + rc =3D dat_create_slot(mc, kvm->arch.gmap->asce, new->base_gfn, new->npa= ges); break; case KVM_MR_FLAGS_ONLY: break; default: WARN(1, "Unknown KVM MR CHANGE: %d\n", change); } +out: if (rc) pr_warn("failed to commit memory region\n"); + kvm_s390_free_mmu_cache(mc); return; } =20 @@ -6011,7 +5688,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, */ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + scoped_guard(read_lock, &kvm->mmu_lock) + return dat_test_age_gfn(kvm->arch.gmap->asce, range->start, range->end); } =20 /** @@ -6024,7 +5702,8 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn= _range *range) */ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + scoped_guard(read_lock, &kvm->mmu_lock) + return gmap_age_gfn(kvm->arch.gmap, range->start, range->end); } =20 /** @@ -6041,7 +5720,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_rang= e *range) */ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { - return false; + return gmap_unmap_gfn_range(kvm->arch.gmap, range->slot, range->start, ra= nge->end); } =20 static inline unsigned long nonhyp_mask(int i) diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index c44c52266e26..bf1d7798c1af 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -19,6 +19,8 @@ #include #include #include +#include "dat.h" +#include "gmap.h" =20 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0) =20 @@ -114,9 +116,7 @@ static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) static inline int kvm_is_ucontrol(struct kvm *kvm) { #ifdef CONFIG_KVM_S390_UCONTROL - if (kvm->arch.gmap) - return 0; - return 1; + return test_bit(GMAP_FLAG_IS_UCONTROL, &kvm->arch.gmap->flags); #else return 0; #endif @@ -440,14 +440,9 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu); /* implemented in vsie.c */ int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu); void kvm_s390_vsie_kick(struct kvm_vcpu *vcpu); -void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end); +void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, gpa_t end= ); void kvm_s390_vsie_init(struct kvm *kvm); void kvm_s390_vsie_destroy(struct kvm *kvm); -int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level); - -/* implemented in gmap-vsie.c */ -struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat= 
_level); =20 /* implemented in sigp.c */ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu); @@ -469,15 +464,9 @@ void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu); void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm); __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu); int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rc, u16 *rrc); -int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t ga= ddr, unsigned int flags); int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsi= gned int prot, unsigned long bits); =20 -static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t g= addr, unsigned int flags) -{ - return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags); -} - bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu); =20 /* implemented in diag.c */ diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 0b14d894f38a..a3250ad83a8e 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -21,13 +21,14 @@ #include #include #include -#include #include #include #include +#include #include "gaccess.h" #include "kvm-s390.h" #include "trace.h" +#include "gmap.h" =20 static int handle_ri(struct kvm_vcpu *vcpu) { @@ -222,7 +223,7 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu) if (vcpu->arch.skey_enabled) return 0; =20 - rc =3D s390_enable_skey(); + rc =3D gmap_enable_skeys(vcpu->arch.gmap); VCPU_EVENT(vcpu, 3, "enabling storage keys for guest: %d", rc); if (rc) return rc; @@ -255,10 +256,9 @@ static int try_handle_skey(struct kvm_vcpu *vcpu) =20 static int handle_iske(struct kvm_vcpu *vcpu) { - unsigned long gaddr, vmaddr; - unsigned char key; + unsigned long gaddr; int reg1, reg2; - bool unlocked; + union skey key; int rc; =20 vcpu->stat.instruction_iske++; @@ -275,37 +275,21 @@ static int handle_iske(struct kvm_vcpu *vcpu) gaddr =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; gaddr =3D kvm_s390_logical_to_effective(vcpu, gaddr); gaddr =3D kvm_s390_real_to_abs(vcpu, gaddr); - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); -retry: - unlocked =3D false; - mmap_read_lock(current->mm); - rc =3D get_guest_storage_key(current->mm, vmaddr, &key); - - if (rc) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - if (!rc) { - mmap_read_unlock(current->mm); - goto retry; - } - } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr), &ke= y); + if (rc > 0) + return kvm_s390_inject_program_int(vcpu, rc); if (rc < 0) return rc; vcpu->run->s.regs.gprs[reg1] &=3D ~0xff; - vcpu->run->s.regs.gprs[reg1] |=3D key; + vcpu->run->s.regs.gprs[reg1] |=3D key.skey; return 0; } =20 static int handle_rrbe(struct kvm_vcpu *vcpu) { - unsigned long vmaddr, gaddr; + unsigned long gaddr; int reg1, reg2; - bool unlocked; int rc; =20 vcpu->stat.instruction_rrbe++; @@ -322,24 +306,10 @@ static int handle_rrbe(struct kvm_vcpu *vcpu) gaddr =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; gaddr =3D kvm_s390_logical_to_effective(vcpu, gaddr); gaddr =3D kvm_s390_real_to_abs(vcpu, gaddr); - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); -retry: - unlocked =3D false; - mmap_read_lock(current->mm); - rc =3D 
reset_guest_reference_bit(current->mm, vmaddr); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - if (!rc) { - mmap_read_unlock(current->mm); - goto retry; - } - } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_reset_reference_bit(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr)); + if (rc > 0) + return kvm_s390_inject_program_int(vcpu, rc); if (rc < 0) return rc; kvm_s390_set_psw_cc(vcpu, rc); @@ -354,9 +324,8 @@ static int handle_sske(struct kvm_vcpu *vcpu) { unsigned char m3 =3D vcpu->arch.sie_block->ipb >> 28; unsigned long start, end; - unsigned char key, oldkey; + union skey key, oldkey; int reg1, reg2; - bool unlocked; int rc; =20 vcpu->stat.instruction_sske++; @@ -377,7 +346,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) =20 kvm_s390_get_regs_rre(vcpu, ®1, ®2); =20 - key =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); if (m3 & SSKE_MB) { @@ -389,27 +358,17 @@ static int handle_sske(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - unlocked =3D false; - - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, key, &oldkey, - m3 & SSKE_NQ, m3 & SSKE_MR, - m3 & SSKE_MC); - - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? -EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, &oldkey, + m3 & SSKE_NQ, m3 & SSKE_MR, m3 & SSKE_MC); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) + if (rc > 1) return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; start +=3D PAGE_SIZE; @@ -422,7 +381,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) } else { kvm_s390_set_psw_cc(vcpu, rc); vcpu->run->s.regs.gprs[reg1] &=3D ~0xff00UL; - vcpu->run->s.regs.gprs[reg1] |=3D (u64) oldkey << 8; + vcpu->run->s.regs.gprs[reg1] |=3D (u64)oldkey.skey << 8; } } if (m3 & SSKE_MB) { @@ -1082,7 +1041,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) bool mr =3D false, mc =3D false, nq; int reg1, reg2; unsigned long start, end; - unsigned char key; + union skey key; =20 vcpu->stat.instruction_pfmf++; =20 @@ -1110,7 +1069,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 nq =3D vcpu->run->s.regs.gprs[reg1] & PFMF_NQ; - key =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); =20 @@ -1141,14 +1100,6 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr; - bool unlocked =3D false; - - /* Translate guest address to host address */ - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (vcpu->run->s.regs.gprs[reg1] & PFMF_CF) { if (kvm_clear_guest(vcpu->kvm, start, PAGE_SIZE)) return 
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); @@ -1159,19 +1110,17 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) =20 if (rc) return rc; - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, - key, NULL, nq, mr, mc); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? -EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, + NULL, nq, mr, mc); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc > 1) + return kvm_s390_inject_program_int(vcpu, rc); + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; } @@ -1195,8 +1144,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc) { int r1, r2, nappended, entries; - unsigned long gfn, hva, res, pgstev, ptev; + union essa_state state; unsigned long *cbrlo; + unsigned long gfn; + bool dirtied; =20 /* * We don't need to set SD.FPF.SK to 1 here, because if we have a @@ -1205,33 +1156,12 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) =20 kvm_s390_get_regs_rre(vcpu, &r1, &r2); gfn =3D vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT; - hva =3D gfn_to_hva(vcpu->kvm, gfn); entries =3D (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; =20 - if (kvm_is_error_hva(hva)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - nappended =3D pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev); - if (nappended < 0) { - res =3D orc ? 0x10 : 0; - vcpu->run->s.regs.gprs[r1] =3D res; /* Exception Indication */ + nappended =3D dat_perform_essa(vcpu->arch.gmap->asce, gfn, orc, &state, &= dirtied); + vcpu->run->s.regs.gprs[r1] =3D state.val; + if (nappended < 0) return 0; - } - res =3D (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22; - /* - * Set the block-content state part of the result. 0 means resident, so - * nothing to do if the page is valid. 2 is for preserved pages - * (non-present and non-zero), and 3 for zero pages (non-present and - * zero). - */ - if (ptev & _PAGE_INVALID) { - res |=3D 2; - if (pgstev & _PGSTE_GPS_ZERO) - res |=3D 1; - } - if (pgstev & _PGSTE_GPS_NODAT) - res |=3D 0x20; - vcpu->run->s.regs.gprs[r1] =3D res; /* * It is possible that all the normal 511 slots were full, in which case * we will now write in the 512th slot, which is reserved for host use. 
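For reference, the open-coded logic removed above computed the ESSA page-state result by hand; dat_perform_essa() is assumed to return an equivalent encoding in state.val. A minimal sketch of that encoding, reconstructed from the deleted lines (essa_state_from_pte() is an illustrative name, not a real function):

/*
 * Sketch only: derive the ESSA result from the PTE and PGSTE values,
 * the way the deleted code above did.
 */
static unsigned long essa_state_from_pte(unsigned long ptev, unsigned long pgstev)
{
	/* usage state part of the result, taken from the PGSTE usage bits */
	unsigned long res = (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22;

	/* block-content state: 0 resident, 2 preserved, 3 zero */
	if (ptev & _PAGE_INVALID) {
		res |= 2;
		if (pgstev & _PGSTE_GPS_ZERO)
			res |= 1;
	}
	/* pages marked no-DAT are flagged separately */
	if (pgstev & _PGSTE_GPS_NODAT)
		res |= 0x20;
	return res;
}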
@@ -1243,17 +1173,34 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) cbrlo[entries] =3D gfn << PAGE_SHIFT; } =20 - if (orc) { - struct kvm_memory_slot *ms =3D gfn_to_memslot(vcpu->kvm, gfn); - - /* Increment only if we are really flipping the bit */ - if (ms && !test_and_set_bit(gfn - ms->base_gfn, kvm_second_dirty_bitmap(= ms))) - atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); - } + if (dirtied) + atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); =20 return nappended; } =20 +static void _essa_clear_cbrl(struct kvm_vcpu *vcpu, unsigned long *cbrl, i= nt len) +{ + union crste *crstep; + union pgste pgste; + union pte *ptep; + int i; + + lockdep_assert_held(&vcpu->kvm->mmu_lock); + + for (i =3D 0; i < len; i++) { + if (dat_entry_walk(NULL, gpa_to_gfn(cbrl[i]), vcpu->arch.gmap->asce, + 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) + continue; + if (!ptep || ptep->s.pr) + continue; + pgste =3D pgste_get_lock(ptep); + if (pgste.usage =3D=3D PGSTE_GPS_USAGE_UNUSED || pgste.zero) + gmap_helper_zap_one_page(vcpu->kvm->mm, cbrl[i]); + pgste_set_unlock(ptep, pgste); + } +} + static int handle_essa(struct kvm_vcpu *vcpu) { lockdep_assert_held(&vcpu->kvm->srcu); @@ -1261,11 +1208,9 @@ static int handle_essa(struct kvm_vcpu *vcpu) /* entries expected to be 1FF */ int entries =3D (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; unsigned long *cbrlo; - struct gmap *gmap; int i, orc; =20 VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries); - gmap =3D vcpu->arch.gmap; vcpu->stat.instruction_essa++; if (!vcpu->kvm->arch.use_cmma) return kvm_s390_inject_program_int(vcpu, PGM_OPERATION); @@ -1289,11 +1234,7 @@ static int handle_essa(struct kvm_vcpu *vcpu) * value really needs to be written to; if the value is * already correct, we do nothing and avoid the lock. */ - if (vcpu->kvm->mm->context.uses_cmm =3D=3D 0) { - mmap_write_lock(vcpu->kvm->mm); - vcpu->kvm->mm->context.uses_cmm =3D 1; - mmap_write_unlock(vcpu->kvm->mm); - } + set_bit(GMAP_FLAG_USES_CMM, &vcpu->arch.gmap->flags); /* * If we are here, we are supposed to have CMMA enabled in * the SIE block. 
Enabling CMMA works on a per-CPU basis, @@ -1307,20 +1248,22 @@ static int handle_essa(struct kvm_vcpu *vcpu) /* Retry the ESSA instruction */ kvm_s390_retry_instr(vcpu); } else { - mmap_read_lock(vcpu->kvm->mm); - i =3D __do_essa(vcpu, orc); - mmap_read_unlock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + i =3D __do_essa(vcpu, orc); if (i < 0) return i; /* Account for the possible extra cbrl entry */ entries +=3D i; } - vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; /* reset nceo */ + /* reset nceo */ + vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; cbrlo =3D phys_to_virt(vcpu->arch.sie_block->cbrlo); - mmap_read_lock(gmap->mm); - for (i =3D 0; i < entries; ++i) - __gmap_zap(gmap, cbrlo[i]); - mmap_read_unlock(gmap->mm); + + mmap_read_lock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + _essa_clear_cbrl(vcpu, cbrlo, entries); + mmap_read_unlock(vcpu->kvm->mm); + return 0; } =20 diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index 6ba5a0305e25..b6809ee0bfa5 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -12,13 +12,16 @@ #include #include #include -#include #include #include #include #include #include #include "kvm-s390.h" +#include "dat.h" +#include "gaccess.h" +#include "gmap.h" +#include "faultin.h" =20 bool kvm_s390_pv_is_protected(struct kvm *kvm) { @@ -34,6 +37,85 @@ bool kvm_s390_pv_cpu_is_protected(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_s390_pv_cpu_is_protected); =20 +/** + * should_export_before_import - Determine whether an export is needed + * before an import-like operation + * @uvcb: the Ultravisor control block of the UVC to be performed + * @mm: the mm of the process + * + * Returns whether an export is needed before every import-like operation. + * This is needed for shared pages, which don't trigger a secure storage + * exception when accessed from a different guest. + * + * Although considered as one, the Unpin Page UVC is not an actual import, + * so it is not affected. + * + * No export is needed also when there is only one protected VM, because t= he + * page cannot belong to the wrong VM in that case (there is no "other VM" + * it can belong to). + * + * Return: true if an export is needed before every import, otherwise fals= e. + */ +static bool should_export_before_import(struct uv_cb_header *uvcb, struct = mm_struct *mm) +{ + /* + * The misc feature indicates, among other things, that importing a + * shared page from a different protected VM will automatically also + * transfer its ownership. 
+ */ + if (uv_has_feature(BIT_UV_FEAT_MISC)) + return false; + if (uvcb->cmd =3D=3D UVC_CMD_UNPIN_PAGE_SHARED) + return false; + return atomic_read(&mm->context.protected_count) > 1; +} + +struct pv_make_secure { + void *uvcb; + struct folio *folio; + int rc; + bool needs_export; +}; + +static int __kvm_s390_pv_make_secure(struct guest_fault *f, struct folio *= folio) +{ + struct pv_make_secure *priv =3D f->priv; + int rc; + + if (priv->needs_export) + uv_convert_from_secure(folio_to_phys(folio)); + + if (folio_test_hugetlb(folio)) + return -EFAULT; + if (folio_test_large(folio)) + return -E2BIG; + + if (!f->page) + folio_get(folio); + rc =3D __make_folio_secure(folio, priv->uvcb); + if (!f->page) + folio_put(folio); + + return rc; +} + +static void _kvm_s390_pv_make_secure(struct guest_fault *f) +{ + struct pv_make_secure *priv =3D f->priv; + struct folio *folio; + + folio =3D pfn_folio(f->pfn); + priv->rc =3D -EAGAIN; + if (folio_trylock(folio)) { + priv->rc =3D __kvm_s390_pv_make_secure(f, folio); + if (priv->rc =3D=3D -E2BIG || priv->rc =3D=3D -EBUSY) { + priv->folio =3D folio; + folio_get(folio); + } + folio_unlock(folio); + } +} + /** * kvm_s390_pv_make_secure() - make one guest page secure * @kvm: the guest @@ -45,14 +127,34 @@ EXPORT_SYMBOL_GPL(kvm_s390_pv_cpu_is_protected); */ int kvm_s390_pv_make_secure(struct kvm *kvm, unsigned long gaddr, void *uv= cb) { - unsigned long vmaddr; + struct pv_make_secure priv =3D { .uvcb =3D uvcb }; + struct guest_fault f =3D { + .write_attempt =3D true, + .gfn =3D gpa_to_gfn(gaddr), + .callback =3D _kvm_s390_pv_make_secure, + .priv =3D &priv, + }; + int rc; =20 lockdep_assert_held(&kvm->srcu); =20 - vmaddr =3D gfn_to_hva(kvm, gpa_to_gfn(gaddr)); - if (kvm_is_error_hva(vmaddr)) - return -EFAULT; - return make_hva_secure(kvm->mm, vmaddr, uvcb); + priv.needs_export =3D should_export_before_import(uvcb, kvm->mm); + + scoped_guard(mutex, &kvm->arch.pv.import_lock) { + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &f); + + if (!rc) { + rc =3D priv.rc; + if (priv.folio) { + rc =3D s390_wiggle_split_folio(kvm->mm, priv.folio); + if (!rc) + rc =3D -EAGAIN; + } + } + } + if (priv.folio) + folio_put(priv.folio); + return rc; } =20 int kvm_s390_pv_convert_to_secure(struct kvm *kvm, unsigned long gaddr) @@ -299,35 +401,6 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm= *kvm, return 0; } =20 -/** - * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest me= mory. - * @kvm: the VM whose memory is to be cleared. - * - * Destroy the first 2GB of guest memory, to avoid prefix issues after reb= oot. - * The CPUs of the protected VM need to be destroyed beforehand. 
- */ -static void kvm_s390_destroy_lower_2g(struct kvm *kvm) -{ - const unsigned long pages_2g =3D SZ_2G / PAGE_SIZE; - struct kvm_memory_slot *slot; - unsigned long len; - int srcu_idx; - - srcu_idx =3D srcu_read_lock(&kvm->srcu); - - /* Take the memslot containing guest absolute address 0 */ - slot =3D gfn_to_memslot(kvm, 0); - /* Clear all slots or parts thereof that are below 2GB */ - while (slot && slot->base_gfn < pages_2g) { - len =3D min_t(u64, slot->npages, pages_2g - slot->base_gfn) * PAGE_SIZE; - s390_uv_destroy_range(kvm->mm, slot->userspace_addr, slot->userspace_add= r + len); - /* Take the next memslot */ - slot =3D gfn_to_memslot(kvm, slot->base_gfn + slot->npages); - } - - srcu_read_unlock(&kvm->srcu, srcu_idx); -} - static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, u16 *rc, u16 *rrc) { struct uv_cb_destroy_fast uvcb =3D { @@ -342,7 +415,6 @@ static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, = u16 *rc, u16 *rrc) *rc =3D uvcb.header.rc; if (rrc) *rrc =3D uvcb.header.rrc; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM FAST: rc %x rrc %x", uvcb.header.rc, uvcb.header.rrc); WARN_ONCE(cc && uvcb.header.rc !=3D 0x104, @@ -391,7 +463,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return -EINVAL; =20 /* Guest with segment type ASCE, refuse to destroy asynchronously */ - if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) =3D=3D _ASCE_TYPE_SEGMENT) + if (kvm->arch.gmap->asce.dt =3D=3D TABLE_TYPE_SEGMENT) return -EINVAL; =20 priv =3D kzalloc(sizeof(*priv), GFP_KERNEL); @@ -404,8 +476,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) priv->stor_var =3D kvm->arch.pv.stor_var; priv->stor_base =3D kvm->arch.pv.stor_base; priv->handle =3D kvm_s390_pv_get_handle(kvm); - priv->old_gmap_table =3D (unsigned long)kvm->arch.gmap->table; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); + priv->old_gmap_table =3D (unsigned long)dereference_asce(kvm->arch.gmap-= >asce); if (s390_replace_asce(kvm->arch.gmap)) res =3D -ENOMEM; } @@ -415,7 +486,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return res; } =20 - kvm_s390_destroy_lower_2g(kvm); + gmap_pv_destroy_range(kvm->arch.gmap, 0, gpa_to_gfn(SZ_2G), false); kvm_s390_clear_pv_state(kvm); kvm->arch.pv.set_aside =3D priv; =20 @@ -449,7 +520,6 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16= *rrc) =20 cc =3D uv_cmd_nodata(kvm_s390_pv_get_handle(kvm), UVC_CMD_DESTROY_SEC_CONF, rc, rrc); - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); if (!cc) { atomic_dec(&kvm->mm->context.protected_count); kvm_s390_pv_dealloc_vm(kvm); @@ -532,7 +602,7 @@ int kvm_s390_pv_deinit_cleanup_all(struct kvm *kvm, u16= *rc, u16 *rrc) * cleanup has been performed. 
*/ if (need_zap && mmget_not_zero(kvm->mm)) { - s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE); + gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asce),= false); mmput(kvm->mm); } =20 @@ -570,7 +640,7 @@ int kvm_s390_pv_deinit_aside_vm(struct kvm *kvm, u16 *r= c, u16 *rrc) return -EINVAL; =20 /* When a fatal signal is received, stop immediately */ - if (s390_uv_destroy_range_interruptible(kvm->mm, 0, TASK_SIZE_MAX)) + if (gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asc= e), true)) goto done; if (kvm_s390_pv_dispose_one_leftover(kvm, p, rc, rrc)) ret =3D -EIO; @@ -609,6 +679,7 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu= _notifier *subscription, r =3D kvm_s390_cpus_from_pv(kvm, &dummy, &dummy); if (!r && is_destroy_fast_available() && kvm_s390_pv_get_handle(kvm)) kvm_s390_pv_deinit_vm_fast(kvm, &dummy, &dummy); + set_bit(GMAP_FLAG_EXPORT_ON_UNMAP, &kvm->arch.gmap->flags); } =20 static const struct mmu_notifier_ops kvm_s390_pv_mmu_notifier_ops =3D { @@ -642,7 +713,7 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) /* Inputs */ uvcb.guest_stor_origin =3D 0; /* MSO is 0 for KVM */ uvcb.guest_stor_len =3D kvm->arch.pv.guest_len; - uvcb.guest_asce =3D kvm->arch.gmap->asce; + uvcb.guest_asce =3D kvm->arch.gmap->asce.val; uvcb.guest_sca =3D virt_to_phys(kvm->arch.sca); uvcb.conf_base_stor_origin =3D virt_to_phys((void *)kvm->arch.pv.stor_base); @@ -669,7 +740,6 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) } return -EIO; } - kvm->arch.gmap->guest_handle =3D uvcb.guest_handle; return 0; } =20 @@ -704,26 +774,14 @@ static int unpack_one(struct kvm *kvm, unsigned long = addr, u64 tweak, .tweak[1] =3D offset, }; int ret =3D kvm_s390_pv_make_secure(kvm, addr, &uvcb); - unsigned long vmaddr; - bool unlocked; =20 *rc =3D uvcb.header.rc; *rrc =3D uvcb.header.rrc; =20 if (ret =3D=3D -ENXIO) { - mmap_read_lock(kvm->mm); - vmaddr =3D gfn_to_hva(kvm, gpa_to_gfn(addr)); - if (kvm_is_error_hva(vmaddr)) { - ret =3D -EFAULT; - } else { - ret =3D fixup_user_fault(kvm->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked); - if (!ret) - ret =3D __gmap_link(kvm->arch.gmap, addr, vmaddr); - } - mmap_read_unlock(kvm->mm); + ret =3D kvm_s390_faultin_gfn_simple(NULL, kvm, gpa_to_gfn(addr), true); if (!ret) return -EAGAIN; - return ret; } =20 if (ret && ret !=3D -EAGAIN) diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 1dd54ca3070a..840d1e9e3ae2 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -15,7 +15,6 @@ #include #include =20 -#include #include #include #include @@ -23,6 +22,7 @@ #include #include "kvm-s390.h" #include "gaccess.h" +#include "gmap.h" =20 enum vsie_page_flags { VSIE_PAGE_IN_USE =3D 0, @@ -41,8 +41,11 @@ struct vsie_page { * are reused conditionally, should be accessed via READ_ONCE. */ struct kvm_s390_sie_block *scb_o; /* 0x0218 */ - /* the shadow gmap in use by the vsie_page */ - struct gmap *gmap; /* 0x0220 */ + /* + * Flags: must be set/cleared atomically after the vsie page can be + * looked up by other CPUs. + */ + unsigned long flags; /* 0x0220 */ /* address of the last reported fault to guest2 */ unsigned long fault_addr; /* 0x0228 */ /* calculated guest addresses of satellite control blocks */ @@ -57,33 +60,14 @@ struct vsie_page { * radix tree. */ gpa_t scb_gpa; /* 0x0258 */ - /* - * Flags: must be set/cleared atomically after the vsie page can be - * looked up by other CPUs. 
- */ - unsigned long flags; /* 0x0260 */ - __u8 reserved[0x0700 - 0x0268]; /* 0x0268 */ + /* the shadow gmap in use by the vsie_page */ + struct gmap_cache gmap_cache; /* 0x0260 */ + __u8 reserved[0x0700 - 0x0278]; /* 0x0278 */ struct kvm_s390_crypto_cb crycb; /* 0x0700 */ __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */ }; =20 -/** - * gmap_shadow_valid() - check if a shadow guest address space matches the - * given properties and is still valid - * @sg: pointer to the shadow guest address space structure - * @asce: ASCE for which the shadow table is requested - * @edat_level: edat level to be used for the shadow translation - * - * Returns 1 if the gmap shadow is still valid and matches the given - * properties, the caller can continue using it. Returns 0 otherwise; the - * caller has to request a new shadow gmap in this case. - */ -int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level) -{ - if (sg->removed) - return 0; - return sg->orig_asce =3D=3D asce && sg->edat_level =3D=3D edat_level; -} +static_assert(sizeof(struct vsie_page) =3D=3D PAGE_SIZE); =20 /* trigger a validity icpt for the given scb */ static int set_validity_icpt(struct kvm_s390_sie_block *scb, @@ -612,26 +596,17 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page) return rc; } =20 -void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) +void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, gpa_t end) { - struct kvm *kvm =3D gmap->private; - struct vsie_page *cur; + struct vsie_page *cur, *next; unsigned long prefix; - int i; =20 - if (!gmap_is_shadow(gmap)) - return; + KVM_BUG_ON(!test_bit(GMAP_FLAG_SHADOW, &gmap->flags), gmap->kvm); /* * Only new shadow blocks are added to the list during runtime, * therefore we can safely reference them all the time. */ - for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { - cur =3D READ_ONCE(kvm->arch.vsie.pages[i]); - if (!cur) - continue; - if (READ_ONCE(cur->gmap) !=3D gmap) - continue; + list_for_each_entry_safe(cur, next, &gmap->scb_users, gmap_cache.list) { prefix =3D cur->scb_s.prefix << GUEST_PREFIX_SHIFT; /* with mso/msl, the prefix lies at an offset */ prefix +=3D cur->scb_s.mso; @@ -667,9 +642,9 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsi= e_page *vsie_page, struct /* with mso/msl, the prefix lies at offset *mso* */ prefix +=3D scb_s->mso; =20 - rc =3D kvm_s390_shadow_fault(vcpu, sg, prefix, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, prefix, NULL, true); if (!rc && (scb_s->ecb & ECB_TE)) - rc =3D kvm_s390_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, prefix + PAGE_SIZE, NULL, true); /* * We don't have to mprotect, we will be called for all unshadows. * SIE will detect if protection applies and trigger a validity. 
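A note on the locking idiom used throughout this series: open-coded lock/unlock pairs are replaced with the scope-based guards from linux/cleanup.h, so the lock is dropped on every exit path of the guarded block. A minimal sketch, assuming kvm->mmu_lock is an rwlock on s390 (as the conversions in this series imply); do_lookup() is a hypothetical stand-in for the dat_*/gmap_* callees:

#include <linux/kvm_host.h>
#include <linux/cleanup.h>

int do_lookup(struct kvm *kvm);	/* hypothetical callee */

static int mmu_lock_guard_example(struct kvm *kvm)
{
	int rc;

	/* read_lock(&kvm->mmu_lock) here, read_unlock() when the scope ends */
	scoped_guard(read_lock, &kvm->mmu_lock)
		rc = do_lookup(kvm);
	return rc;
}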
@@ -952,6 +927,7 @@ static int inject_fault(struct kvm_vcpu *vcpu, __u16 co= de, __u64 vaddr, */ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page= , struct gmap *sg) { + bool wr =3D kvm_s390_cur_gmap_fault_is_write(); int rc; =20 if ((current->thread.gmap_int_code & PGM_INT_CODE_MASK) =3D=3D PGM_PROTEC= TION) @@ -959,11 +935,10 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct= vsie_page *vsie_page, stru return inject_fault(vcpu, PGM_PROTECTION, current->thread.gmap_teid.addr * PAGE_SIZE, 1); =20 - rc =3D kvm_s390_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * P= AGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, sg, current->thread.gmap_teid.addr * PA= GE_SIZE, NULL, wr); if (rc > 0) { rc =3D inject_fault(vcpu, rc, - current->thread.gmap_teid.addr * PAGE_SIZE, - kvm_s390_cur_gmap_fault_is_write()); + current->thread.gmap_teid.addr * PAGE_SIZE, wr); if (rc >=3D 0) vsie_page->fault_addr =3D current->thread.gmap_teid.addr * PAGE_SIZE; } @@ -979,7 +954,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page, stru static void handle_last_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsi= e_page, struct gmap *sg) { if (vsie_page->fault_addr) - kvm_s390_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL); + gaccess_shadow_fault(vcpu, sg, vsie_page->fault_addr, NULL, true); vsie_page->fault_addr =3D 0; } =20 @@ -1064,8 +1039,9 @@ static u64 vsie_get_register(struct kvm_vcpu *vcpu, s= truct vsie_page *vsie_page, static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_= page, struct gmap *sg) { struct kvm_s390_sie_block *scb_s =3D &vsie_page->scb_s; - unsigned long pei_dest, pei_src, src, dest, mask, prefix; + unsigned long src, dest, mask, prefix; u64 *pei_block =3D &vsie_page->scb_o->mcic; + union mvpg_pei pei_dest, pei_src; int edat, rc_dest, rc_src; union ctlreg0 cr0; =20 @@ -1079,8 +1055,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page, src =3D vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask; src =3D _kvm_s390_real_to_abs(prefix, src) + scb_s->mso; =20 - rc_dest =3D kvm_s390_shadow_fault(vcpu, sg, dest, &pei_dest); - rc_src =3D kvm_s390_shadow_fault(vcpu, sg, src, &pei_src); + rc_dest =3D gaccess_shadow_fault(vcpu, sg, dest, &pei_dest, true); + rc_src =3D gaccess_shadow_fault(vcpu, sg, src, &pei_src, false); /* * Either everything went well, or something non-critical went wrong * e.g. because of a race. In either case, simply retry. @@ -1115,8 +1091,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page, rc_src =3D rc_src !=3D PGM_PAGE_TRANSLATION ? 
rc_src : 0; } if (!rc_dest && !rc_src) { - pei_block[0] =3D pei_dest; - pei_block[1] =3D pei_src; + pei_block[0] =3D pei_dest.val; + pei_block[1] =3D pei_src.val; return 1; } =20 @@ -1187,7 +1163,7 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct = vsie_page *vsie_page, struc goto xfer_to_guest_mode_check; } guest_timing_enter_irqoff(); - rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce); + rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, sg->asce.v= al); guest_timing_exit_irqoff(); local_irq_enable(); } @@ -1237,43 +1213,64 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struc= t vsie_page *vsie_page, struc =20 static void release_gmap_shadow(struct vsie_page *vsie_page) { - if (vsie_page->gmap) - gmap_put(vsie_page->gmap); - WRITE_ONCE(vsie_page->gmap, NULL); + struct gmap *gmap =3D vsie_page->gmap_cache.gmap; + + KVM_BUG_ON(!gmap->parent, gmap->kvm); + lockdep_assert_held(&gmap->parent->children_lock); + + list_del(&vsie_page->gmap_cache.list); + vsie_page->gmap_cache.gmap =3D NULL; prefix_unmapped(vsie_page); + + if (list_empty(&gmap->scb_users)) { + gmap_remove_child(gmap); + gmap_put(gmap); + } } =20 -static int acquire_gmap_shadow(struct kvm_vcpu *vcpu, - struct vsie_page *vsie_page) +static struct gmap *acquire_gmap_shadow(struct kvm_vcpu *vcpu, struct vsie= _page *vsie_page) { - unsigned long asce; union ctlreg0 cr0; struct gmap *gmap; + union asce asce; int edat; =20 - asce =3D vcpu->arch.sie_block->gcr[1]; + asce.val =3D vcpu->arch.sie_block->gcr[1]; cr0.val =3D vcpu->arch.sie_block->gcr[0]; edat =3D cr0.edat && test_kvm_facility(vcpu->kvm, 8); edat +=3D edat && test_kvm_facility(vcpu->kvm, 78); =20 - /* - * ASCE or EDAT could have changed since last icpt, or the gmap - * we're holding has been unshadowed. If the gmap is still valid, - * we can safely reuse it. - */ - if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat)) { - vcpu->kvm->stat.gmap_shadow_reuse++; - return 0; + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + gmap =3D vsie_page->gmap_cache.gmap; + if (gmap) { + /* + * ASCE or EDAT could have changed since last icpt, or the gmap + * we're holding has been unshadowed. If the gmap is still valid, + * we can safely reuse it. 
+ */ + if (gmap_is_shadow_valid(gmap, asce, edat)) { + vcpu->kvm->stat.gmap_shadow_reuse++; + gmap_get(gmap); + return gmap; + } + /* release the old shadow and mark the prefix as unmapped */ + release_gmap_shadow(vsie_page); + } } - - /* release the old shadow - if any, and mark the prefix as unmapped */ - release_gmap_shadow(vsie_page); - gmap =3D gmap_shadow(vcpu->arch.gmap, asce, edat); + gmap =3D gmap_create_shadow(vcpu->arch.mc, vcpu->kvm->arch.gmap, asce, ed= at); if (IS_ERR(gmap)) - return PTR_ERR(gmap); - vcpu->kvm->stat.gmap_shadow_create++; - WRITE_ONCE(vsie_page->gmap, gmap); - return 0; + return gmap; + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + /* unlikely race condition, remove the previous shadow */ + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); + vcpu->kvm->stat.gmap_shadow_create++; + list_add(&vsie_page->gmap_cache.list, &gmap->scb_users); + vsie_page->gmap_cache.gmap =3D gmap; + prefix_unmapped(vsie_page); + gmap_get(gmap); + } + return gmap; } =20 /* @@ -1330,8 +1327,11 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vs= ie_page *vsie_page) int rc =3D 0; =20 while (1) { - rc =3D acquire_gmap_shadow(vcpu, vsie_page); - sg =3D vsie_page->gmap; + sg =3D acquire_gmap_shadow(vcpu, vsie_page); + if (IS_ERR(sg)) { + rc =3D PTR_ERR(sg); + sg =3D NULL; + } if (!rc) rc =3D map_prefix(vcpu, vsie_page, sg); if (!rc) { @@ -1359,6 +1359,9 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsi= e_page *vsie_page) kvm_s390_rewind_psw(vcpu, 4); break; } + if (sg) + sg =3D gmap_put(sg); + cond_resched(); } =20 if (rc =3D=3D -EFAULT) { @@ -1455,8 +1458,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kv= m, unsigned long addr) vsie_page->scb_gpa =3D ULONG_MAX; =20 /* Double use of the same address or allocation failure. 
*/ - if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, - vsie_page)) { + if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, vsie_page)= ) { put_vsie_page(vsie_page); mutex_unlock(&kvm->arch.vsie.mutex); return NULL; @@ -1465,7 +1467,12 @@ static struct vsie_page *get_vsie_page(struct kvm *k= vm, unsigned long addr) mutex_unlock(&kvm->arch.vsie.mutex); =20 memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block)); - release_gmap_shadow(vsie_page); + if (vsie_page->gmap_cache.gmap) { + scoped_guard(spinlock, &kvm->arch.gmap->children_lock) + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); + } + prefix_unmapped(vsie_page); vsie_page->fault_addr =3D 0; vsie_page->scb_s.ihcpu =3D 0xffffU; return vsie_page; @@ -1541,8 +1548,10 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) mutex_lock(&kvm->arch.vsie.mutex); for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { vsie_page =3D kvm->arch.vsie.pages[i]; + scoped_guard(spinlock, &kvm->arch.gmap->children_lock) + if (vsie_page->gmap_cache.gmap) + release_gmap_shadow(vsie_page); kvm->arch.vsie.pages[i] =3D NULL; - release_gmap_shadow(vsie_page); /* free the radix tree entry */ if (vsie_page->scb_gpa !=3D ULONG_MAX) radix_tree_delete(&kvm->arch.vsie.addr_to_page, diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c index 1a6ba105e071..0ac2f3998b14 100644 --- a/arch/s390/lib/uaccess.c +++ b/arch/s390/lib/uaccess.c @@ -34,136 +34,19 @@ void debug_user_asce(int exit) } #endif /*CONFIG_DEBUG_ENTRY */ =20 -union oac { - unsigned int val; - struct { - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac1; - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac2; - }; -}; - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_from_user_key(void *to, const void __user *from, unsigned long si= ze, unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac2.key =3D key, - .oac2.as =3D PSW_BITS_AS_SECONDARY, - .oac2.k =3D 1, - .oac2.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr %%r0,%[spec]\n" - "0: mvcos %[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_FROM(0b, 0b) - EX_TABLE_UA_MVCOS_FROM(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char __user *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_from_user_key(void *to, const void __user *from, - unsigned long n, unsigned long key) -{ - unsigned long res =3D n; - - might_fault(); - if (!should_fail_usercopy()) { - instrument_copy_from_user_before(to, from, n); - res =3D raw_copy_from_user_key(to, from, n, key); - instrument_copy_from_user_after(to, from, n, res); - } - if (unlikely(res)) - memset(to + (n - res), 0, res); - return res; -} -EXPORT_SYMBOL(_copy_from_user_key); - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_to_user_key(void __user *to, const void *from, unsigned long size= , unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac1.key =3D key, - .oac1.as =3D PSW_BITS_AS_SECONDARY, - .oac1.k =3D 1, - .oac1.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr 
%%r0,%[spec]\n" - "0: mvcos %[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_TO(0b, 0b) - EX_TABLE_UA_MVCOS_TO(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char __user *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_to_user_key(void __user *to, const void *from, - unsigned long n, unsigned long key) -{ - might_fault(); - if (should_fail_usercopy()) - return n; - instrument_copy_to_user(to, from, n); - return raw_copy_to_user_key(to, from, n, key); -} -EXPORT_SYMBOL(_copy_to_user_key); - #define CMPXCHG_USER_KEY_MAX_LOOPS 128 =20 -static nokprobe_inline int __cmpxchg_user_key_small(unsigned long address,= unsigned int *uval, - unsigned int old, unsigned int new, - unsigned int mask, unsigned long key) +static nokprobe_inline int __cmpxchg_key_small(void *address, unsigned int= *uval, + unsigned int old, unsigned int new, + unsigned int mask, unsigned long key) { unsigned long count; unsigned int prev; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" " llill %[count],%[max_loops]\n" "0: l %[prev],%[address]\n" "1: nr %[prev],%[mask]\n" @@ -178,8 +61,7 @@ static nokprobe_inline int __cmpxchg_user_key_small(unsi= gned long address, unsig " nr %[tmp],%[mask]\n" " jnz 5f\n" " brct %[count],2b\n" - "5: sacf 768\n" - " spka %[default_key]\n" + "5: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 5b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 5b, %[rc], %[prev]) @@ -197,16 +79,16 @@ static nokprobe_inline int __cmpxchg_user_key_small(un= signed long address, unsig [default_key] "J" (PAGE_DEFAULT_KEY), [max_loops] "J" (CMPXCHG_USER_KEY_MAX_LOOPS) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; if (!count) rc =3D -EAGAIN; return rc; } =20 -int __kprobes __cmpxchg_user_key1(unsigned long address, unsigned char *uv= al, - unsigned char old, unsigned char new, unsigned long key) +int __kprobes __cmpxchg_key1(void *addr, unsigned char *uval, unsigned cha= r old, + unsigned char new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -215,15 +97,16 @@ int __kprobes __cmpxchg_user_key1(unsigned long addres= s, unsigned char *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D __cmpxchg_key_small((void *)address, &prev, _old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key1); +EXPORT_SYMBOL(__cmpxchg_key1); =20 -int __kprobes __cmpxchg_user_key2(unsigned long address, unsigned short *u= val, - unsigned short old, unsigned short new, unsigned long key) +int __kprobes __cmpxchg_key2(void *addr, unsigned short *uval, unsigned sh= ort old, + unsigned short new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -232,27 +115,23 @@ int __kprobes __cmpxchg_user_key2(unsigned long addre= ss, unsigned short *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xffff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D 
__cmpxchg_key_small((void *)address, &prev, _old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key2); +EXPORT_SYMBOL(__cmpxchg_key2); =20 -int __kprobes __cmpxchg_user_key4(unsigned long address, unsigned int *uva= l, - unsigned int old, unsigned int new, unsigned long key) +int __kprobes __cmpxchg_key4(void *address, unsigned int *uval, unsigned i= nt old, + unsigned int new, unsigned long key) { unsigned int prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cs %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -264,27 +143,22 @@ int __kprobes __cmpxchg_user_key4(unsigned long addre= ss, unsigned int *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key4); +EXPORT_SYMBOL(__cmpxchg_key4); =20 -int __kprobes __cmpxchg_user_key8(unsigned long address, unsigned long *uv= al, - unsigned long old, unsigned long new, unsigned long key) +int __kprobes __cmpxchg_key8(void *address, unsigned long *uval, unsigned = long old, + unsigned long new, unsigned long key) { unsigned long prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: csg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -296,27 +170,22 @@ int __kprobes __cmpxchg_user_key8(unsigned long addre= ss, unsigned long *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key8); +EXPORT_SYMBOL(__cmpxchg_key8); =20 -int __kprobes __cmpxchg_user_key16(unsigned long address, __uint128_t *uva= l, - __uint128_t old, __uint128_t new, unsigned long key) +int __kprobes __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_= t old, + __uint128_t new, unsigned long key) { __uint128_t prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cdsg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REGPAIR(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REGPAIR(1b, 1b, %[rc], %[prev]) @@ -328,8 +197,7 @@ int __kprobes __cmpxchg_user_key16(unsigned long addres= s, __uint128_t *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key16); +EXPORT_SYMBOL(__cmpxchg_key16); diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c index 4864cb35fc25..d653c64b869a 100644 --- a/arch/s390/mm/gmap_helpers.c +++ b/arch/s390/mm/gmap_helpers.c @@ -34,28 +34,6 @@ static void ptep_zap_softleaf_entry(struct mm_struct *mm= , softleaf_t entry) free_swap_and_cache(entry); } =20 -static inline pgste_t pgste_get_lock(pte_t *ptep) -{ - unsigned long value =3D 0; -#ifdef CONFIG_PGSTE - unsigned long *ptr 
= (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
 /**
  * gmap_helper_zap_one_page() - discard a page if it was swapped.
  * @mm: the mm
@@ -68,9 +46,7 @@ static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
 void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
 {
 	struct vm_area_struct *vma;
-	unsigned long pgstev;
 	spinlock_t *ptl;
-	pgste_t pgste;
 	pte_t *ptep;
 
@@ -85,18 +61,8 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr)
 	if (unlikely(!ptep))
 		return;
 	if (pte_swap(*ptep)) {
-		preempt_disable();
-		pgste = pgste_get_lock(ptep);
-		pgstev = pgste_val(pgste);
-
-		if ((pgstev & _PGSTE_GPS_USAGE_MASK) == _PGSTE_GPS_USAGE_UNUSED ||
-		    (pgstev & _PGSTE_GPS_ZERO)) {
-			ptep_zap_softleaf_entry(mm, softleaf_from_pte(*ptep));
-			pte_clear(mm, vmaddr, ptep);
-		}
-
-		pgste_set_unlock(ptep, pgste);
-		preempt_enable();
+		ptep_zap_softleaf_entry(mm, softleaf_from_pte(*ptep));
+		pte_clear(mm, vmaddr, ptep);
 	}
 	pte_unmap_unlock(ptep, ptl);
 }
-- 
2.52.0

From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 25/28] KVM: s390: Remove gmap from s390/mm
Date: Mon, 22 Dec 2025 17:50:30 +0100
Message-ID: <20251222165033.162329-26-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Remove the now unused include/asm/gmap.h and mm/gmap.c files.

Signed-off-by: Claudio Imbrenda
---
 MAINTAINERS                     |    2 -
 arch/s390/include/asm/gmap.h    |  174 ---
 arch/s390/include/asm/pgtable.h |    8 -
 arch/s390/mm/Makefile           |    1 -
 arch/s390/mm/gmap.c             | 2436 -------------------------------
 arch/s390/mm/pgtable.c          |    8 -
 6 files changed, 2629 deletions(-)
 delete mode 100644 arch/s390/include/asm/gmap.h
 delete mode 100644 arch/s390/mm/gmap.c

diff --git a/MAINTAINERS b/MAINTAINERS
index dc731d37c8fe..95448b485fd2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13899,14 +13899,12 @@ L:	kvm@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
 F:	Documentation/virt/kvm/s390*
-F:	arch/s390/include/asm/gmap.h
 F:	arch/s390/include/asm/gmap_helpers.h
 F:	arch/s390/include/asm/kvm*
 F:	arch/s390/include/uapi/asm/kvm*
 F:	arch/s390/include/uapi/asm/uvdevice.h
 F:	arch/s390/kernel/uv.c
 F:	arch/s390/kvm/
-F:	arch/s390/mm/gmap.c
 F:	arch/s390/mm/gmap_helpers.c
 F:	drivers/s390/char/uvdevice.c
 F:	tools/testing/selftests/drivers/s390x/uvdevice/
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
deleted file mode 100644
index 66c5808fd011..000000000000
--- a/arch/s390/include/asm/gmap.h
+++ /dev/null
@@ -1,174 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * KVM guest address space mapping code
- *
- * Copyright IBM Corp. 2007, 2016
- * Author(s): Martin Schwidefsky
- */
-
-#ifndef _ASM_S390_GMAP_H
-#define _ASM_S390_GMAP_H
-
-#include
-#include
-
-/* Generic bits for GMAP notification on DAT table entry changes. */
*/ -#define GMAP_NOTIFY_SHADOW 0x2 -#define GMAP_NOTIFY_MPROT 0x1 - -/* Status bits only for huge segment entries */ -#define _SEGMENT_ENTRY_GMAP_IN 0x0800 /* invalidation notify bit */ -#define _SEGMENT_ENTRY_GMAP_UC 0x0002 /* dirty (migration) */ - -/** - * struct gmap_struct - guest address space - * @list: list head for the mm->context gmap list - * @mm: pointer to the parent mm_struct - * @guest_to_host: radix tree with guest to host address translation - * @host_to_guest: radix tree with pointer to segment table entries - * @guest_table_lock: spinlock to protect all entries in the guest page ta= ble - * @ref_count: reference counter for the gmap structure - * @table: pointer to the page directory - * @asce: address space control element for gmap page table - * @pfault_enabled: defines if pfaults are applicable for the guest - * @guest_handle: protected virtual machine handle for the ultravisor - * @host_to_rmap: radix tree with gmap_rmap lists - * @children: list of shadow gmap structures - * @shadow_lock: spinlock to protect the shadow gmap list - * @parent: pointer to the parent gmap for shadow guest address spaces - * @orig_asce: ASCE for which the shadow page table has been created - * @edat_level: edat level to be used for the shadow translation - * @removed: flag to indicate if a shadow guest address space has been rem= oved - * @initialized: flag to indicate if a shadow guest address space can be u= sed - */ -struct gmap { - struct list_head list; - struct mm_struct *mm; - struct radix_tree_root guest_to_host; - struct radix_tree_root host_to_guest; - spinlock_t guest_table_lock; - refcount_t ref_count; - unsigned long *table; - unsigned long asce; - unsigned long asce_end; - void *private; - bool pfault_enabled; - /* only set for protected virtual machines */ - unsigned long guest_handle; - /* Additional data for shadow guest address spaces */ - struct radix_tree_root host_to_rmap; - struct list_head children; - spinlock_t shadow_lock; - struct gmap *parent; - unsigned long orig_asce; - int edat_level; - bool removed; - bool initialized; -}; - -/** - * struct gmap_rmap - reverse mapping for shadow page table entries - * @next: pointer to next rmap in the list - * @raddr: virtual rmap address in the shadow guest address space - */ -struct gmap_rmap { - struct gmap_rmap *next; - unsigned long raddr; -}; - -#define gmap_for_each_rmap(pos, head) \ - for (pos =3D (head); pos; pos =3D pos->next) - -#define gmap_for_each_rmap_safe(pos, n, head) \ - for (pos =3D (head); n =3D pos ? 
-	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
-
-/**
- * struct gmap_notifier - notify function block for page invalidation
- * @notifier_call: address of callback function
- */
-struct gmap_notifier {
-	struct list_head list;
-	struct rcu_head rcu;
-	void (*notifier_call)(struct gmap *gmap, unsigned long start,
-			      unsigned long end);
-};
-
-static inline int gmap_is_shadow(struct gmap *gmap)
-{
-	return !!gmap->parent;
-}
-
-struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit);
-void gmap_remove(struct gmap *gmap);
-struct gmap *gmap_get(struct gmap *gmap);
-void gmap_put(struct gmap *gmap);
-void gmap_free(struct gmap *gmap);
-struct gmap *gmap_alloc(unsigned long limit);
-
-int gmap_map_segment(struct gmap *gmap, unsigned long from,
-		     unsigned long to, unsigned long len);
-int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len);
-unsigned long __gmap_translate(struct gmap *, unsigned long gaddr);
-int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr);
-void __gmap_zap(struct gmap *, unsigned long gaddr);
-void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);
-
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
-
-void gmap_unshadow(struct gmap *sg);
-int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
-		    int fake);
-int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
-		    int fake);
-int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
-		    int fake);
-int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
-		    int fake);
-int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
-
-void gmap_register_pte_notifier(struct gmap_notifier *);
-void gmap_unregister_pte_notifier(struct gmap_notifier *);
-
-int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits);
-
-void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
-			     unsigned long gaddr, unsigned long vmaddr);
-int s390_replace_asce(struct gmap *gmap);
-void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
-int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool interruptible);
-unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);
-
-/**
- * s390_uv_destroy_range - Destroy a range of pages in the given mm.
- * @mm: the mm on which to operate on
- * @start: the start of the range
- * @end: the end of the range
- *
- * This function will call cond_sched, so it should not generate stalls, but
- * it will otherwise only return when it completed.
- */
-static inline void s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-					 unsigned long end)
-{
-	(void)__s390_uv_destroy_range(mm, start, end, false);
-}
-
-/**
- * s390_uv_destroy_range_interruptible - Destroy a range of pages in the
- * given mm, but stop when a fatal signal is received.
- * @mm: the mm on which to operate on
- * @start: the start of the range
- * @end: the end of the range
- *
- * This function will call cond_sched, so it should not generate stalls. If
- * a fatal signal is received, it will return with -EINTR immediately,
- * without finishing destroying the whole range. Upon successful
- * completion, 0 is returned.
- */
-static inline int s390_uv_destroy_range_interruptible(struct mm_struct *mm, unsigned long start,
-						      unsigned long end)
-{
-	return __s390_uv_destroy_range(mm, start, end, true);
-}
-#endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index cd4d135c4503..45f13697cf9e 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1369,8 +1369,6 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, pte_t entry);
 void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-void ptep_notify(struct mm_struct *mm, unsigned long addr,
-		 pte_t *ptep, unsigned long bits);
 int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr, pte_t *ptep,
 		    int prot, unsigned long bit);
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
@@ -1396,9 +1394,6 @@ int set_pgste_bits(struct mm_struct *mm, unsigned long addr,
 int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
 int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
 		       unsigned long *oldpte, unsigned long *oldpgste);
-void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr);
 
 #define pgprot_writecombine	pgprot_writecombine
 pgprot_t pgprot_writecombine(pgprot_t prot);
@@ -2023,9 +2018,6 @@ extern int __vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t p
 extern int vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t prot);
 extern void vmem_unmap_4k_page(unsigned long addr);
 extern pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc);
-extern int s390_enable_sie(void);
-extern int s390_enable_skey(void);
-extern void s390_reset_cmma(struct mm_struct *mm);
 
 /* s390 has a private copy of get unmapped area to deal with cache synonyms */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile
index bd0401cc7ca5..193899c39ca7 100644
--- a/arch/s390/mm/Makefile
+++ b/arch/s390/mm/Makefile
@@ -10,7 +10,6 @@ obj-$(CONFIG_CMM)	+= cmm.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_PTDUMP)		+= dump_pagetables.o
-obj-$(CONFIG_PGSTE)		+= gmap.o
 obj-$(CONFIG_PFAULT)		+= pfault.o
 
 obj-$(subst m,y,$(CONFIG_KVM))	+= gmap_helpers.o
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
deleted file mode 100644
index dd85bcca817d..000000000000
--- a/arch/s390/mm/gmap.c
+++ /dev/null
@@ -1,2436 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * KVM guest address space mapping code
- *
- * Copyright IBM Corp. 2007, 2020
- * Author(s): Martin Schwidefsky
- *		 David Hildenbrand
- *		 Janosch Frank
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-/*
- * The address is saved in a radix tree directly; NULL would be ambiguous,
- * since 0 is a valid address, and NULL is returned when nothing was found.
- * The lower bits are ignored by all users of the macro, so it can be used
- * to distinguish a valid address 0 from a NULL.
- */
-#define VALID_GADDR_FLAG 1
-#define IS_GADDR_VALID(gaddr) ((gaddr) & VALID_GADDR_FLAG)
-#define MAKE_VALID_GADDR(gaddr) (((gaddr) & HPAGE_MASK) | VALID_GADDR_FLAG)
-
-#define GMAP_SHADOW_FAKE_TABLE 1ULL
-
-static struct page *gmap_alloc_crst(void)
-{
-	struct page *page;
-
-	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
-	if (!page)
-		return NULL;
-	__arch_set_page_dat(page_to_virt(page), 1UL << CRST_ALLOC_ORDER);
-	return page;
-}
-
-/**
- * gmap_alloc - allocate and initialize a guest address space
- * @limit: maximum address of the gmap address space
- *
- * Returns a guest address space structure.
- */
-struct gmap *gmap_alloc(unsigned long limit)
-{
-	struct gmap *gmap;
-	struct page *page;
-	unsigned long *table;
-	unsigned long etype, atype;
-
-	if (limit < _REGION3_SIZE) {
-		limit = _REGION3_SIZE - 1;
-		atype = _ASCE_TYPE_SEGMENT;
-		etype = _SEGMENT_ENTRY_EMPTY;
-	} else if (limit < _REGION2_SIZE) {
-		limit = _REGION2_SIZE - 1;
-		atype = _ASCE_TYPE_REGION3;
-		etype = _REGION3_ENTRY_EMPTY;
-	} else if (limit < _REGION1_SIZE) {
-		limit = _REGION1_SIZE - 1;
-		atype = _ASCE_TYPE_REGION2;
-		etype = _REGION2_ENTRY_EMPTY;
-	} else {
-		limit = -1UL;
-		atype = _ASCE_TYPE_REGION1;
-		etype = _REGION1_ENTRY_EMPTY;
-	}
-	gmap = kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT);
-	if (!gmap)
-		goto out;
-	INIT_LIST_HEAD(&gmap->children);
-	INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT);
-	INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT);
-	INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT);
-	spin_lock_init(&gmap->guest_table_lock);
-	spin_lock_init(&gmap->shadow_lock);
-	refcount_set(&gmap->ref_count, 1);
-	page = gmap_alloc_crst();
-	if (!page)
-		goto out_free;
-	table = page_to_virt(page);
-	crst_table_init(table, etype);
-	gmap->table = table;
-	gmap->asce = atype | _ASCE_TABLE_LENGTH |
-		_ASCE_USER_BITS | __pa(table);
-	gmap->asce_end = limit;
-	return gmap;
-
-out_free:
-	kfree(gmap);
-out:
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(gmap_alloc);
-
-/**
- * gmap_create - create a guest address space
- * @mm: pointer to the parent mm_struct
- * @limit: maximum size of the gmap address space
- *
- * Returns a guest address space structure.
- */
-struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit)
-{
-	struct gmap *gmap;
-	unsigned long gmap_asce;
-
-	gmap = gmap_alloc(limit);
-	if (!gmap)
-		return NULL;
-	gmap->mm = mm;
-	spin_lock(&mm->context.lock);
-	list_add_rcu(&gmap->list, &mm->context.gmap_list);
-	if (list_is_singular(&mm->context.gmap_list))
-		gmap_asce = gmap->asce;
-	else
-		gmap_asce = -1UL;
-	WRITE_ONCE(mm->context.gmap_asce, gmap_asce);
-	spin_unlock(&mm->context.lock);
-	return gmap;
-}
-EXPORT_SYMBOL_GPL(gmap_create);
-
-static void gmap_flush_tlb(struct gmap *gmap)
-{
-	__tlb_flush_idte(gmap->asce);
-}
-
-static void gmap_radix_tree_free(struct radix_tree_root *root)
-{
-	struct radix_tree_iter iter;
-	unsigned long indices[16];
-	unsigned long index;
-	void __rcu **slot;
-	int i, nr;
-
-	/* A radix tree is freed by deleting all of its entries */
-	index = 0;
-	do {
-		nr = 0;
-		radix_tree_for_each_slot(slot, root, &iter, index) {
-			indices[nr] = iter.index;
-			if (++nr == 16)
-				break;
-		}
-		for (i = 0; i < nr; i++) {
-			index = indices[i];
-			radix_tree_delete(root, index);
-		}
-	} while (nr > 0);
-}
-
-static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
-{
-	struct gmap_rmap *rmap, *rnext, *head;
-	struct radix_tree_iter iter;
-	unsigned long indices[16];
-	unsigned long index;
-	void __rcu **slot;
-	int i, nr;
-
-	/* A radix tree is freed by deleting all of its entries */
-	index = 0;
-	do {
-		nr = 0;
-		radix_tree_for_each_slot(slot, root, &iter, index) {
-			indices[nr] = iter.index;
-			if (++nr == 16)
-				break;
-		}
-		for (i = 0; i < nr; i++) {
-			index = indices[i];
-			head = radix_tree_delete(root, index);
-			gmap_for_each_rmap_safe(rmap, rnext, head)
-				kfree(rmap);
-		}
-	} while (nr > 0);
-}
-
-static void gmap_free_crst(unsigned long *table, bool free_ptes)
-{
-	bool is_segment = (table[0] & _SEGMENT_ENTRY_TYPE_MASK) == 0;
-	int i;
-
-	if (is_segment) {
-		if (!free_ptes)
-			goto out;
-		for (i = 0; i < _CRST_ENTRIES; i++)
-			if (!(table[i] & _SEGMENT_ENTRY_INVALID))
-				page_table_free_pgste(page_ptdesc(phys_to_page(table[i])));
-	} else {
-		for (i = 0; i < _CRST_ENTRIES; i++)
-			if (!(table[i] & _REGION_ENTRY_INVALID))
-				gmap_free_crst(__va(table[i] & PAGE_MASK), free_ptes);
-	}
-
-out:
-	free_pages((unsigned long)table, CRST_ALLOC_ORDER);
-}
-
-/**
- * gmap_free - free a guest address space
- * @gmap: pointer to the guest address space structure
- *
- * No locks required. There are no references to this gmap anymore.
- */
-void gmap_free(struct gmap *gmap)
-{
-	/* Flush tlb of all gmaps (if not already done for shadows) */
-	if (!(gmap_is_shadow(gmap) && gmap->removed))
-		gmap_flush_tlb(gmap);
-	/* Free all segment & region tables. */
-	gmap_free_crst(gmap->table, gmap_is_shadow(gmap));
-
-	gmap_radix_tree_free(&gmap->guest_to_host);
-	gmap_radix_tree_free(&gmap->host_to_guest);
-
-	/* Free additional data for a shadow gmap */
-	if (gmap_is_shadow(gmap)) {
-		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
-		/* Release reference to the parent */
-		gmap_put(gmap->parent);
-	}
-
-	kfree(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_free);
-
-/**
- * gmap_get - increase reference counter for guest address space
- * @gmap: pointer to the guest address space structure
- *
- * Returns the gmap pointer
- */
-struct gmap *gmap_get(struct gmap *gmap)
-{
-	refcount_inc(&gmap->ref_count);
-	return gmap;
-}
-EXPORT_SYMBOL_GPL(gmap_get);
-
-/**
- * gmap_put - decrease reference counter for guest address space
- * @gmap: pointer to the guest address space structure
- *
- * If the reference counter reaches zero the guest address space is freed.
- */
-void gmap_put(struct gmap *gmap)
-{
-	if (refcount_dec_and_test(&gmap->ref_count))
-		gmap_free(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_put);
-
-/**
- * gmap_remove - remove a guest address space but do not free it yet
- * @gmap: pointer to the guest address space structure
- */
-void gmap_remove(struct gmap *gmap)
-{
-	struct gmap *sg, *next;
-	unsigned long gmap_asce;
-
-	/* Remove all shadow gmaps linked to this gmap */
-	if (!list_empty(&gmap->children)) {
-		spin_lock(&gmap->shadow_lock);
-		list_for_each_entry_safe(sg, next, &gmap->children, list) {
-			list_del(&sg->list);
-			gmap_put(sg);
-		}
-		spin_unlock(&gmap->shadow_lock);
-	}
-	/* Remove gmap from the pre-mm list */
-	spin_lock(&gmap->mm->context.lock);
-	list_del_rcu(&gmap->list);
-	if (list_empty(&gmap->mm->context.gmap_list))
-		gmap_asce = 0;
-	else if (list_is_singular(&gmap->mm->context.gmap_list))
-		gmap_asce = list_first_entry(&gmap->mm->context.gmap_list,
-					     struct gmap, list)->asce;
-	else
-		gmap_asce = -1UL;
-	WRITE_ONCE(gmap->mm->context.gmap_asce, gmap_asce);
-	spin_unlock(&gmap->mm->context.lock);
-	synchronize_rcu();
-	/* Put reference */
-	gmap_put(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_remove);
-
-/*
- * gmap_alloc_table is assumed to be called with mmap_lock held
- */
-static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
-			    unsigned long init, unsigned long gaddr)
-{
-	struct page *page;
-	unsigned long *new;
-
-	/* since we dont free the gmap table until gmap_free we can unlock */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	new = page_to_virt(page);
-	crst_table_init(new, init);
-	spin_lock(&gmap->guest_table_lock);
-	if (*table & _REGION_ENTRY_INVALID) {
-		*table = __pa(new) | _REGION_ENTRY_LENGTH |
-			(*table & _REGION_ENTRY_TYPE_MASK);
-		page = NULL;
-	}
-	spin_unlock(&gmap->guest_table_lock);
-	if (page)
-		__free_pages(page, CRST_ALLOC_ORDER);
-	return 0;
-}
-
-static unsigned long host_to_guest_lookup(struct gmap *gmap, unsigned long vmaddr)
-{
-	return (unsigned long)radix_tree_lookup(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);
-}
-
-static unsigned long host_to_guest_delete(struct gmap *gmap, unsigned long vmaddr)
-{
-	return (unsigned long)radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);
-}
-
-static pmd_t *host_to_guest_pmd_delete(struct gmap *gmap, unsigned long vmaddr,
-				       unsigned long *gaddr)
-{
-	*gaddr = host_to_guest_delete(gmap, vmaddr);
-	if (IS_GADDR_VALID(*gaddr))
-		return (pmd_t *)gmap_table_walk(gmap, *gaddr, 1);
-	return NULL;
-}
-
-/**
- * __gmap_unlink_by_vmaddr - unlink a single segment via a host address
- * @gmap: pointer to the guest address space structure
- * @vmaddr: address in the host process address space
- *
- * Returns 1 if a TLB flush is required
- */
-static int __gmap_unlink_by_vmaddr(struct gmap *gmap, unsigned long vmaddr)
-{
-	unsigned long gaddr;
-	int flush = 0;
-	pmd_t *pmdp;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	spin_lock(&gmap->guest_table_lock);
-
-	pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);
-	if (pmdp) {
-		flush = (pmd_val(*pmdp) != _SEGMENT_ENTRY_EMPTY);
-		*pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);
-	}
-
-	spin_unlock(&gmap->guest_table_lock);
-	return flush;
-}
-
-/**
- * __gmap_unmap_by_gaddr - unmap a single segment via a guest address
- * @gmap: pointer to the guest address space structure
- * @gaddr: address in the guest address space
- *
- * Returns 1 if a TLB flush is required
- */
-static int __gmap_unmap_by_gaddr(struct gmap *gmap, unsigned long gaddr)
-{
-	unsigned long vmaddr;
-
-	vmaddr = (unsigned long) radix_tree_delete(&gmap->guest_to_host,
-						   gaddr >> PMD_SHIFT);
-	return vmaddr ? __gmap_unlink_by_vmaddr(gmap, vmaddr) : 0;
-}
-
-/**
- * gmap_unmap_segment - unmap segment from the guest address space
- * @gmap: pointer to the guest address space structure
- * @to: address in the guest address space
- * @len: length of the memory area to unmap
- *
- * Returns 0 if the unmap succeeded, -EINVAL if not.
- */
-int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len)
-{
-	unsigned long off;
-	int flush;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	if ((to | len) & (PMD_SIZE - 1))
-		return -EINVAL;
-	if (len == 0 || to + len < to)
-		return -EINVAL;
-
-	flush = 0;
-	mmap_write_lock(gmap->mm);
-	for (off = 0; off < len; off += PMD_SIZE)
-		flush |= __gmap_unmap_by_gaddr(gmap, to + off);
-	mmap_write_unlock(gmap->mm);
-	if (flush)
-		gmap_flush_tlb(gmap);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(gmap_unmap_segment);
-
-/**
- * gmap_map_segment - map a segment to the guest address space
- * @gmap: pointer to the guest address space structure
- * @from: source address in the parent address space
- * @to: target address in the guest address space
- * @len: length of the memory area to map
- *
- * Returns 0 if the mmap succeeded, -EINVAL or -ENOMEM if not.
- */
-int gmap_map_segment(struct gmap *gmap, unsigned long from,
-		     unsigned long to, unsigned long len)
-{
-	unsigned long off;
-	int flush;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	if ((from | to | len) & (PMD_SIZE - 1))
-		return -EINVAL;
-	if (len == 0 || from + len < from || to + len < to ||
-	    from + len - 1 > TASK_SIZE_MAX || to + len - 1 > gmap->asce_end)
-		return -EINVAL;
-
-	flush = 0;
-	mmap_write_lock(gmap->mm);
-	for (off = 0; off < len; off += PMD_SIZE) {
-		/* Remove old translation */
-		flush |= __gmap_unmap_by_gaddr(gmap, to + off);
-		/* Store new translation */
-		if (radix_tree_insert(&gmap->guest_to_host,
-				      (to + off) >> PMD_SHIFT,
-				      (void *) from + off))
-			break;
-	}
-	mmap_write_unlock(gmap->mm);
-	if (flush)
-		gmap_flush_tlb(gmap);
-	if (off >= len)
-		return 0;
-	gmap_unmap_segment(gmap, to, len);
-	return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(gmap_map_segment);
-
-/**
- * __gmap_translate - translate a guest address to a user space address
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: guest address
- *
- * Returns user space address which corresponds to the guest address or
- * -EFAULT if no such mapping exists.
- * This function does not establish potentially missing page table entries.
- * The mmap_lock of the mm that belongs to the address space must be held
- * when this function gets called.
- *
- * Note: Can also be called for shadow gmaps.
- */
-unsigned long __gmap_translate(struct gmap *gmap, unsigned long gaddr)
-{
-	unsigned long vmaddr;
-
-	vmaddr = (unsigned long)
-		radix_tree_lookup(&gmap->guest_to_host, gaddr >> PMD_SHIFT);
-	/* Note: guest_to_host is empty for a shadow gmap */
-	return vmaddr ? (vmaddr | (gaddr & ~PMD_MASK)) : -EFAULT;
-}
-EXPORT_SYMBOL_GPL(__gmap_translate);
-
-/**
- * gmap_unlink - disconnect a page table from the gmap shadow tables
- * @mm: pointer to the parent mm_struct
- * @table: pointer to the host page table
- * @vmaddr: vm address associated with the host page table
- */
-void gmap_unlink(struct mm_struct *mm, unsigned long *table,
-		 unsigned long vmaddr)
-{
-	struct gmap *gmap;
-	int flush;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		flush = __gmap_unlink_by_vmaddr(gmap, vmaddr);
-		if (flush)
-			gmap_flush_tlb(gmap);
-	}
-	rcu_read_unlock();
-}
-
-static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new,
-			   unsigned long gaddr);
-
-/**
- * __gmap_link - set up shadow page tables to connect a host to a guest address
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: guest address
- * @vmaddr: vm address
- *
- * Returns 0 on success, -ENOMEM for out of memory conditions, and -EFAULT
- * if the vm address is already mapped to a different guest segment.
- * The mmap_lock of the mm that belongs to the address space must be held
- * when this function gets called.
- */
-int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
-{
-	struct mm_struct *mm;
-	unsigned long *table;
-	spinlock_t *ptl;
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	u64 unprot;
-	int rc;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	/* Create higher level tables in the gmap page table */
-	table = gmap->table;
-	if ((gmap->asce & _ASCE_TYPE_MASK) >= _ASCE_TYPE_REGION1) {
-		table += (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT;
-		if ((*table & _REGION_ENTRY_INVALID) &&
-		    gmap_alloc_table(gmap, table, _REGION2_ENTRY_EMPTY,
-				     gaddr & _REGION1_MASK))
-			return -ENOMEM;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-	}
-	if ((gmap->asce & _ASCE_TYPE_MASK) >= _ASCE_TYPE_REGION2) {
-		table += (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT;
-		if ((*table & _REGION_ENTRY_INVALID) &&
-		    gmap_alloc_table(gmap, table, _REGION3_ENTRY_EMPTY,
-				     gaddr & _REGION2_MASK))
-			return -ENOMEM;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-	}
-	if ((gmap->asce & _ASCE_TYPE_MASK) >= _ASCE_TYPE_REGION3) {
-		table += (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT;
-		if ((*table & _REGION_ENTRY_INVALID) &&
-		    gmap_alloc_table(gmap, table, _SEGMENT_ENTRY_EMPTY,
-				     gaddr & _REGION3_MASK))
-			return -ENOMEM;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-	}
-	table += (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT;
-	/* Walk the parent mm page table */
-	mm = gmap->mm;
-	pgd = pgd_offset(mm, vmaddr);
-	VM_BUG_ON(pgd_none(*pgd));
-	p4d = p4d_offset(pgd, vmaddr);
-	VM_BUG_ON(p4d_none(*p4d));
-	pud = pud_offset(p4d, vmaddr);
-	VM_BUG_ON(pud_none(*pud));
-	/* large puds cannot yet be handled */
-	if (pud_leaf(*pud))
-		return -EFAULT;
-	pmd = pmd_offset(pud, vmaddr);
-	VM_BUG_ON(pmd_none(*pmd));
-	/* Are we allowed to use huge pages? */
-	if (pmd_leaf(*pmd) && !gmap->mm->context.allow_gmap_hpage_1m)
-		return -EFAULT;
-	/* Link gmap segment table entry location to page table. */
-	rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
-	if (rc)
-		return rc;
-	ptl = pmd_lock(mm, pmd);
-	spin_lock(&gmap->guest_table_lock);
-	if (*table == _SEGMENT_ENTRY_EMPTY) {
-		rc = radix_tree_insert(&gmap->host_to_guest,
-				       vmaddr >> PMD_SHIFT,
-				       (void *)MAKE_VALID_GADDR(gaddr));
-		if (!rc) {
-			if (pmd_leaf(*pmd)) {
-				*table = (pmd_val(*pmd) &
-					  _SEGMENT_ENTRY_HARDWARE_BITS_LARGE)
-					 | _SEGMENT_ENTRY_GMAP_UC
-					 | _SEGMENT_ENTRY;
-			} else
-				*table = (pmd_val(*pmd) &
-					  _SEGMENT_ENTRY_HARDWARE_BITS)
-					 | _SEGMENT_ENTRY;
-		}
-	} else if (*table & _SEGMENT_ENTRY_PROTECT &&
-		   !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
-		unprot = (u64)*table;
-		unprot &= ~_SEGMENT_ENTRY_PROTECT;
-		unprot |= _SEGMENT_ENTRY_GMAP_UC;
-		gmap_pmdp_xchg(gmap, (pmd_t *)table, __pmd(unprot), gaddr);
-	}
-	spin_unlock(&gmap->guest_table_lock);
-	spin_unlock(ptl);
-	radix_tree_preload_end();
-	return rc;
-}
-EXPORT_SYMBOL(__gmap_link);
-
-/*
- * this function is assumed to be called with mmap_lock held
- */
-void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
-{
-	unsigned long vmaddr;
-
-	mmap_assert_locked(gmap->mm);
-
-	/* Find the vm address for the guest address */
-	vmaddr = (unsigned long) radix_tree_lookup(&gmap->guest_to_host,
-						   gaddr >> PMD_SHIFT);
-	if (vmaddr) {
-		vmaddr |= gaddr & ~PMD_MASK;
-		gmap_helper_zap_one_page(gmap->mm, vmaddr);
-	}
-}
-EXPORT_SYMBOL_GPL(__gmap_zap);
-
-static LIST_HEAD(gmap_notifier_list);
-static DEFINE_SPINLOCK(gmap_notifier_lock);
-
-/**
- * gmap_register_pte_notifier - register a pte invalidation callback
- * @nb: pointer to the gmap notifier block
- */
-void gmap_register_pte_notifier(struct gmap_notifier *nb)
-{
-	spin_lock(&gmap_notifier_lock);
-	list_add_rcu(&nb->list, &gmap_notifier_list);
-	spin_unlock(&gmap_notifier_lock);
-}
-EXPORT_SYMBOL_GPL(gmap_register_pte_notifier);
-
-/**
- * gmap_unregister_pte_notifier - remove a pte invalidation callback
- * @nb: pointer to the gmap notifier block
- */
-void gmap_unregister_pte_notifier(struct gmap_notifier *nb)
-{
-	spin_lock(&gmap_notifier_lock);
-	list_del_rcu(&nb->list);
-	spin_unlock(&gmap_notifier_lock);
-	synchronize_rcu();
-}
-EXPORT_SYMBOL_GPL(gmap_unregister_pte_notifier);
-
-/**
- * gmap_call_notifier - call all registered invalidation callbacks
- * @gmap: pointer to guest mapping meta data structure
- * @start: start virtual address in the guest address space
- * @end: end virtual address in the guest address space
- */
-static void gmap_call_notifier(struct gmap *gmap, unsigned long start,
-			       unsigned long end)
-{
-	struct gmap_notifier *nb;
-
-	list_for_each_entry(nb, &gmap_notifier_list, list)
-		nb->notifier_call(gmap, start, end);
-}
-
-/**
- * gmap_table_walk - walk the gmap page tables
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @level: page table level to stop at
- *
- * Returns a table entry pointer for the given guest address and @level
- * @level=0 : returns a pointer to a page table table entry (or NULL)
- * @level=1 : returns a pointer to a segment table entry (or NULL)
- * @level=2 : returns a pointer to a region-3 table entry (or NULL)
- * @level=3 : returns a pointer to a region-2 table entry (or NULL)
- * @level=4 : returns a pointer to a region-1 table entry (or NULL)
- *
- * Returns NULL if the gmap page tables could not be walked to the
- * requested level.
- *
- * Note: Can also be called for shadow gmaps.
- */
-unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level)
-{
-	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
-	unsigned long *table = gmap->table;
-
-	if (gmap_is_shadow(gmap) && gmap->removed)
-		return NULL;
-
-	if (WARN_ON_ONCE(level > (asce_type >> 2) + 1))
-		return NULL;
-
-	if (asce_type != _ASCE_TYPE_REGION1 &&
-	    gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
-		return NULL;
-
-	switch (asce_type) {
-	case _ASCE_TYPE_REGION1:
-		table += (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT;
-		if (level == 4)
-			break;
-		if (*table & _REGION_ENTRY_INVALID)
-			return NULL;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-		fallthrough;
-	case _ASCE_TYPE_REGION2:
-		table += (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT;
-		if (level == 3)
-			break;
-		if (*table & _REGION_ENTRY_INVALID)
-			return NULL;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-		fallthrough;
-	case _ASCE_TYPE_REGION3:
-		table += (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT;
-		if (level == 2)
-			break;
-		if (*table & _REGION_ENTRY_INVALID)
-			return NULL;
-		table = __va(*table & _REGION_ENTRY_ORIGIN);
-		fallthrough;
-	case _ASCE_TYPE_SEGMENT:
-		table += (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT;
-		if (level == 1)
-			break;
-		if (*table & _REGION_ENTRY_INVALID)
-			return NULL;
-		table = __va(*table & _SEGMENT_ENTRY_ORIGIN);
-		table += (gaddr & _PAGE_INDEX) >> PAGE_SHIFT;
-	}
-	return table;
-}
-EXPORT_SYMBOL(gmap_table_walk);
-
-/**
- * gmap_pte_op_walk - walk the gmap page table, get the page table lock
- *		      and return the pte pointer
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @ptl: pointer to the spinlock pointer
- *
- * Returns a pointer to the locked pte for a guest address, or NULL
- */
-static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr,
-			       spinlock_t **ptl)
-{
-	unsigned long *table;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	/* Walk the gmap page table, lock and get pte pointer */
-	table = gmap_table_walk(gmap, gaddr, 1); /* get segment pointer */
-	if (!table || *table & _SEGMENT_ENTRY_INVALID)
-		return NULL;
-	return pte_alloc_map_lock(gmap->mm, (pmd_t *) table, gaddr, ptl);
-}
-
-/**
- * gmap_pte_op_fixup - force a page in and connect the gmap page table
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @vmaddr: address in the host process address space
- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- *
- * Returns 0 if the caller can retry __gmap_translate (might fail again),
- * -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing
- * up or connecting the gmap page table.
- */
-static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
-			     unsigned long vmaddr, int prot)
-{
-	struct mm_struct *mm = gmap->mm;
-	unsigned int fault_flags;
-	bool unlocked = false;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	fault_flags = (prot == PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
-	if (fixup_user_fault(mm, vmaddr, fault_flags, &unlocked))
-		return -EFAULT;
-	if (unlocked)
-		/* lost mmap_lock, caller has to retry __gmap_translate */
-		return 0;
-	/* Connect the page tables */
-	return __gmap_link(gmap, gaddr, vmaddr);
-}
-
-/**
- * gmap_pte_op_end - release the page table lock
- * @ptep: pointer to the locked pte
- * @ptl: pointer to the page table spinlock
- */
-static void gmap_pte_op_end(pte_t *ptep, spinlock_t *ptl)
-{
-	pte_unmap_unlock(ptep, ptl);
-}
-
-/**
- * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock
- *		      and return the pmd pointer
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- *
- * Returns a pointer to the pmd for a guest address, or NULL
- */
-static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
-{
-	pmd_t *pmdp;
-
-	BUG_ON(gmap_is_shadow(gmap));
-	pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
-	if (!pmdp)
-		return NULL;
-
-	/* without huge pages, there is no need to take the table lock */
-	if (!gmap->mm->context.allow_gmap_hpage_1m)
-		return pmd_none(*pmdp) ? NULL : pmdp;
-
-	spin_lock(&gmap->guest_table_lock);
-	if (pmd_none(*pmdp)) {
-		spin_unlock(&gmap->guest_table_lock);
-		return NULL;
-	}
-
-	/* 4k page table entries are locked via the pte (pte_alloc_map_lock). */
-	if (!pmd_leaf(*pmdp))
-		spin_unlock(&gmap->guest_table_lock);
-	return pmdp;
-}
-
-/**
- * gmap_pmd_op_end - release the guest_table_lock if needed
- * @gmap: pointer to the guest mapping meta data structure
- * @pmdp: pointer to the pmd
- */
-static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
-{
-	if (pmd_leaf(*pmdp))
-		spin_unlock(&gmap->guest_table_lock);
-}
-
-/*
- * gmap_protect_pmd - remove access rights to memory and set pmd notification bits
- * @pmdp: pointer to the pmd to be protected
- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: notification bits to set
- *
- * Returns:
- * 0 if successfully protected
- * -EAGAIN if a fixup is needed
- * -EINVAL if unsupported notifier bits have been specified
- *
- * Expected to be called with sg->mm->mmap_lock in read and
- * guest_table_lock held.
- */
-static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
-			    pmd_t *pmdp, int prot, unsigned long bits)
-{
-	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
-	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
-	pmd_t new = *pmdp;
-
-	/* Fixup needed */
-	if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot == PROT_WRITE)))
-		return -EAGAIN;
-
-	if (prot == PROT_NONE && !pmd_i) {
-		new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID));
-		gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
-	}
-
-	if (prot == PROT_READ && !pmd_p) {
-		new = clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID));
-		new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_PROTECT));
-		gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
-	}
-
-	if (bits & GMAP_NOTIFY_MPROT)
-		set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN)));
-
-	/* Shadow GMAP protection needs split PMDs */
-	if (bits & GMAP_NOTIFY_SHADOW)
-		return -EINVAL;
-
-	return 0;
-}
-
-/*
- * gmap_protect_pte - remove access rights to memory and set pgste bits
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @pmdp: pointer to the pmd associated with the pte
- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: notification bits to set
- *
- * Returns 0 if successfully protected, -ENOMEM if out of memory and
- * -EAGAIN if a fixup is needed.
- *
- * Expected to be called with sg->mm->mmap_lock in read
- */
-static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
-			    pmd_t *pmdp, int prot, unsigned long bits)
-{
-	int rc;
-	pte_t *ptep;
-	spinlock_t *ptl;
-	unsigned long pbits = 0;
-
-	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
-		return -EAGAIN;
-
-	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
-	if (!ptep)
-		return -ENOMEM;
-
-	pbits |= (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0;
-	pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
-	/* Protect and unlock. */
-	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
-	gmap_pte_op_end(ptep, ptl);
-	return rc;
-}
-
-/*
- * gmap_protect_range - remove access rights to memory and set pgste bits
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @len: size of area
- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: pgste notification bits to set
- *
- * Returns:
- *   PAGE_SIZE if a small page was successfully protected;
- *   HPAGE_SIZE if a large page was successfully protected;
- *   -ENOMEM if out of memory;
- *   -EFAULT if gaddr is invalid (or mapping for shadows is missing);
- *   -EAGAIN if the guest mapping is missing and should be fixed by the caller.
- *
- * Context: Called with sg->mm->mmap_lock in read.
- */
-int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits)
-{
-	pmd_t *pmdp;
-	int rc = 0;
-
-	BUG_ON(gmap_is_shadow(gmap));
-
-	pmdp = gmap_pmd_op_walk(gmap, gaddr);
-	if (!pmdp)
-		return -EAGAIN;
-
-	if (!pmd_leaf(*pmdp)) {
-		rc = gmap_protect_pte(gmap, gaddr, pmdp, prot, bits);
-		if (!rc)
-			rc = PAGE_SIZE;
-	} else {
-		rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot, bits);
-		if (!rc)
-			rc = HPAGE_SIZE;
-	}
-	gmap_pmd_op_end(gmap, pmdp);
-
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_protect_one);
-
-/**
- * gmap_read_table - get an unsigned long value from a guest page table using
- *		     absolute addressing, without marking the page referenced.
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @val: pointer to the unsigned long value to return
- *
- * Returns 0 if the value was read, -ENOMEM if out of memory and -EFAULT
- * if reading using the virtual address failed. -EINVAL if called on a gmap
- * shadow.
- *
- * Called with gmap->mm->mmap_lock in read.
- */
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
-{
-	unsigned long address, vmaddr;
-	spinlock_t *ptl;
-	pte_t *ptep, pte;
-	int rc;
-
-	if (gmap_is_shadow(gmap))
-		return -EINVAL;
-
-	while (1) {
-		rc = -EAGAIN;
-		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
-		if (ptep) {
-			pte = *ptep;
-			if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) {
-				address = pte_val(pte) & PAGE_MASK;
-				address += gaddr & ~PAGE_MASK;
-				*val = *(unsigned long *)__va(address);
-				set_pte(ptep, set_pte_bit(*ptep, __pgprot(_PAGE_YOUNG)));
-				/* Do *NOT* clear the _PAGE_INVALID bit! */
-				rc = 0;
-			}
-			gmap_pte_op_end(ptep, ptl);
-		}
-		if (!rc)
-			break;
-		vmaddr = __gmap_translate(gmap, gaddr);
-		if (IS_ERR_VALUE(vmaddr)) {
-			rc = vmaddr;
-			break;
-		}
-		rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ);
-		if (rc)
-			break;
-	}
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_read_table);
-
-/**
- * gmap_insert_rmap - add a rmap to the host_to_rmap radix tree
- * @sg: pointer to the shadow guest address space structure
- * @vmaddr: vm address associated with the rmap
- * @rmap: pointer to the rmap structure
- *
- * Called with the sg->guest_table_lock
- */
-static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
-				    struct gmap_rmap *rmap)
-{
-	struct gmap_rmap *temp;
-	void __rcu **slot;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	slot = radix_tree_lookup_slot(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT);
-	if (slot) {
-		rmap->next = radix_tree_deref_slot_protected(slot,
-							     &sg->guest_table_lock);
-		for (temp = rmap->next; temp; temp = temp->next) {
-			if (temp->raddr == rmap->raddr) {
-				kfree(rmap);
-				return;
-			}
-		}
-		radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap);
-	} else {
-		rmap->next = NULL;
-		radix_tree_insert(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT,
-				  rmap);
-	}
-}
-
-/**
- * gmap_protect_rmap - restrict access rights to memory (RO) and create an rmap
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow gmap
- * @paddr: address in the parent guest address space
- * @len: length of the memory area to protect
- *
- * Returns 0 if successfully protected and the rmap was created, -ENOMEM
- * if out of memory and -EFAULT if paddr is invalid.
- */
-static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
-			     unsigned long paddr, unsigned long len)
-{
-	struct gmap *parent;
-	struct gmap_rmap *rmap;
-	unsigned long vmaddr;
-	spinlock_t *ptl;
-	pte_t *ptep;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	parent = sg->parent;
-	while (len) {
-		vmaddr = __gmap_translate(parent, paddr);
-		if (IS_ERR_VALUE(vmaddr))
-			return vmaddr;
-		rmap = kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT);
-		if (!rmap)
-			return -ENOMEM;
-		rmap->raddr = raddr;
-		rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
-		if (rc) {
-			kfree(rmap);
-			return rc;
-		}
-		rc = -EAGAIN;
-		ptep = gmap_pte_op_walk(parent, paddr, &ptl);
-		if (ptep) {
-			spin_lock(&sg->guest_table_lock);
-			rc = ptep_force_prot(parent->mm, paddr, ptep, PROT_READ,
-					     PGSTE_VSIE_BIT);
-			if (!rc)
-				gmap_insert_rmap(sg, vmaddr, rmap);
-			spin_unlock(&sg->guest_table_lock);
-			gmap_pte_op_end(ptep, ptl);
-		}
-		radix_tree_preload_end();
-		if (rc) {
-			kfree(rmap);
-			rc = gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ);
-			if (rc)
-				return rc;
-			continue;
-		}
-		paddr += PAGE_SIZE;
-		len -= PAGE_SIZE;
-	}
-	return 0;
-}
-
-#define _SHADOW_RMAP_MASK	0x7
-#define _SHADOW_RMAP_REGION1	0x5
-#define _SHADOW_RMAP_REGION2	0x4
-#define _SHADOW_RMAP_REGION3	0x3
-#define _SHADOW_RMAP_SEGMENT	0x2
-#define _SHADOW_RMAP_PGTABLE	0x1
-
-/**
- * gmap_idte_one - invalidate a single region or segment table entry
- * @asce: region or segment table *origin* + table-type bits
- * @vaddr: virtual address to identify the table entry to flush
- *
- * The invalid bit of a single region or segment table entry is set
- * and the associated TLB entries depending on the entry are flushed.
- * The table-type of the @asce identifies the portion of the @vaddr
- * that is used as the invalidation index.
- */
-static inline void gmap_idte_one(unsigned long asce, unsigned long vaddr)
-{
-	asm volatile(
-		"	idte	%0,0,%1"
-		: : "a" (asce), "a" (vaddr) : "cc", "memory");
-}
-
-/**
- * gmap_unshadow_page - remove a page from a shadow page table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- *
- * Called with the sg->guest_table_lock
- */
-static void gmap_unshadow_page(struct gmap *sg, unsigned long raddr)
-{
-	unsigned long *table;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	table = gmap_table_walk(sg, raddr, 0); /* get page table pointer */
-	if (!table || *table & _PAGE_INVALID)
-		return;
-	gmap_call_notifier(sg, raddr, raddr + PAGE_SIZE - 1);
-	ptep_unshadow_pte(sg->mm, raddr, (pte_t *) table);
-}
-
-/**
- * __gmap_unshadow_pgt - remove all entries from a shadow page table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- * @pgt: pointer to the start of a shadow page table
- *
- * Called with the sg->guest_table_lock
- */
-static void __gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr,
-				unsigned long *pgt)
-{
-	int i;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	for (i = 0; i < _PAGE_ENTRIES; i++, raddr += PAGE_SIZE)
-		pgt[i] = _PAGE_INVALID;
-}
-
-/**
- * gmap_unshadow_pgt - remove a shadow page table from a segment entry
- * @sg: pointer to the shadow guest address space structure
- * @raddr: address in the shadow guest address space
- *
- * Called with the sg->guest_table_lock
- */
-static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
-{
-	unsigned long *ste;
-	phys_addr_t sto, pgt;
-	struct ptdesc *ptdesc;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	ste = gmap_table_walk(sg, raddr, 1); /* get segment pointer */
-	if (!ste || !(*ste & _SEGMENT_ENTRY_ORIGIN))
-		return;
-	gmap_call_notifier(sg, raddr, raddr + _SEGMENT_SIZE - 1);
-	sto = __pa(ste - ((raddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT));
-	gmap_idte_one(sto | _ASCE_TYPE_SEGMENT, raddr);
-	pgt = *ste & _SEGMENT_ENTRY_ORIGIN;
-	*ste = _SEGMENT_ENTRY_EMPTY;
-	__gmap_unshadow_pgt(sg, raddr, __va(pgt));
-	/* Free page table */
-	ptdesc = page_ptdesc(phys_to_page(pgt));
-	page_table_free_pgste(ptdesc);
-}
-
-/**
- * __gmap_unshadow_sgt - remove all entries from a shadow segment table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- * @sgt: pointer to the start of a shadow segment table
- *
- * Called with the sg->guest_table_lock
- */
-static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
-				unsigned long *sgt)
-{
-	struct ptdesc *ptdesc;
-	phys_addr_t pgt;
-	int i;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _SEGMENT_SIZE) {
-		if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN))
-			continue;
-		pgt = sgt[i] & _REGION_ENTRY_ORIGIN;
-		sgt[i] = _SEGMENT_ENTRY_EMPTY;
-		__gmap_unshadow_pgt(sg, raddr, __va(pgt));
-		/* Free page table */
-		ptdesc = page_ptdesc(phys_to_page(pgt));
-		page_table_free_pgste(ptdesc);
-	}
-}
-
-/**
- * gmap_unshadow_sgt - remove a shadow segment table from a region-3 entry
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- *
- * Called with the shadow->guest_table_lock
- */
-static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr)
-{
-	unsigned long r3o, *r3e;
-	phys_addr_t sgt;
-	struct page *page;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	r3e = gmap_table_walk(sg, raddr, 2); /* get region-3 pointer */
-	if (!r3e || !(*r3e & _REGION_ENTRY_ORIGIN))
-		return;
-	gmap_call_notifier(sg, raddr, raddr + _REGION3_SIZE - 1);
-	r3o = (unsigned long) (r3e - ((raddr & _REGION3_INDEX) >> _REGION3_SHIFT));
-	gmap_idte_one(__pa(r3o) | _ASCE_TYPE_REGION3, raddr);
-	sgt = *r3e & _REGION_ENTRY_ORIGIN;
-	*r3e = _REGION3_ENTRY_EMPTY;
-	__gmap_unshadow_sgt(sg, raddr, __va(sgt));
-	/* Free segment table */
-	page = phys_to_page(sgt);
-	__free_pages(page, CRST_ALLOC_ORDER);
-}
-
-/**
- * __gmap_unshadow_r3t - remove all entries from a shadow region-3 table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: address in the shadow guest address space
- * @r3t: pointer to the start of a shadow region-3 table
- *
- * Called with the sg->guest_table_lock
- */
-static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr,
-				unsigned long *r3t)
-{
-	struct page *page;
-	phys_addr_t sgt;
-	int i;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _REGION3_SIZE) {
-		if (!(r3t[i] & _REGION_ENTRY_ORIGIN))
-			continue;
-		sgt = r3t[i] & _REGION_ENTRY_ORIGIN;
-		r3t[i] = _REGION3_ENTRY_EMPTY;
-		__gmap_unshadow_sgt(sg, raddr, __va(sgt));
-		/* Free segment table */
-		page = phys_to_page(sgt);
-		__free_pages(page, CRST_ALLOC_ORDER);
-	}
-}
-
-/**
- * gmap_unshadow_r3t - remove a shadow region-3 table from a region-2 entry
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- *
- * Called with the sg->guest_table_lock
- */
-static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr)
-{
-	unsigned long r2o, *r2e;
-	phys_addr_t r3t;
-	struct page *page;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	r2e = gmap_table_walk(sg, raddr, 3); /* get region-2 pointer */
-	if (!r2e || !(*r2e & _REGION_ENTRY_ORIGIN))
-		return;
-	gmap_call_notifier(sg, raddr, raddr + _REGION2_SIZE - 1);
-	r2o = (unsigned long) (r2e - ((raddr & _REGION2_INDEX) >> _REGION2_SHIFT));
-	gmap_idte_one(__pa(r2o) | _ASCE_TYPE_REGION2, raddr);
-	r3t = *r2e & _REGION_ENTRY_ORIGIN;
-	*r2e = _REGION2_ENTRY_EMPTY;
-	__gmap_unshadow_r3t(sg, raddr, __va(r3t));
-	/* Free region 3 table */
-	page = phys_to_page(r3t);
-	__free_pages(page, CRST_ALLOC_ORDER);
-}
-
-/**
- * __gmap_unshadow_r2t - remove all entries from a shadow region-2 table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- * @r2t: pointer to the start of a shadow region-2 table
- *
- * Called with the sg->guest_table_lock
- */
-static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr,
-				unsigned long *r2t)
-{
-	phys_addr_t r3t;
-	struct page *page;
-	int i;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _REGION2_SIZE) {
-		if (!(r2t[i] & _REGION_ENTRY_ORIGIN))
-			continue;
-		r3t = r2t[i] & _REGION_ENTRY_ORIGIN;
-		r2t[i] = _REGION2_ENTRY_EMPTY;
-		__gmap_unshadow_r3t(sg, raddr, __va(r3t));
-		/* Free region 3 table */
-		page = phys_to_page(r3t);
-		__free_pages(page, CRST_ALLOC_ORDER);
-	}
-}
-
-/**
- * gmap_unshadow_r2t - remove a shadow region-2 table from a region-1 entry
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- *
- * Called with the sg->guest_table_lock
- */
-static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr)
-{
-	unsigned long r1o, *r1e;
-	struct page *page;
-	phys_addr_t r2t;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	r1e = gmap_table_walk(sg, raddr, 4); /* get region-1 pointer */
-	if (!r1e || !(*r1e & _REGION_ENTRY_ORIGIN))
-		return;
-	gmap_call_notifier(sg, raddr, raddr + _REGION1_SIZE - 1);
-	r1o = (unsigned long) (r1e - ((raddr & _REGION1_INDEX) >> _REGION1_SHIFT));
-	gmap_idte_one(__pa(r1o) | _ASCE_TYPE_REGION1, raddr);
-	r2t = *r1e & _REGION_ENTRY_ORIGIN;
-	*r1e = _REGION1_ENTRY_EMPTY;
-	__gmap_unshadow_r2t(sg, raddr, __va(r2t));
-	/* Free region 2 table */
-	page = phys_to_page(r2t);
-	__free_pages(page, CRST_ALLOC_ORDER);
-}
-
-/**
- * __gmap_unshadow_r1t - remove all entries from a shadow region-1 table
- * @sg: pointer to the shadow guest address space structure
- * @raddr: rmap address in the shadow guest address space
- * @r1t: pointer to the start of a shadow region-1 table
- *
- * Called with the shadow->guest_table_lock
- */
-static void __gmap_unshadow_r1t(struct gmap *sg, unsigned long raddr,
-				unsigned long *r1t)
-{
-	unsigned long asce;
-	struct page *page;
-	phys_addr_t r2t;
-	int i;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	asce = __pa(r1t) | _ASCE_TYPE_REGION1;
-	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _REGION1_SIZE) {
-		if (!(r1t[i] & _REGION_ENTRY_ORIGIN))
-			continue;
-		r2t = r1t[i] & _REGION_ENTRY_ORIGIN;
-		__gmap_unshadow_r2t(sg, raddr, __va(r2t));
-		/* Clear entry and flush translation r1t -> r2t */
-		gmap_idte_one(asce, raddr);
-		r1t[i] = _REGION1_ENTRY_EMPTY;
-		/* Free region 2 table */
-		page = phys_to_page(r2t);
-		__free_pages(page, CRST_ALLOC_ORDER);
-	}
-}
-
-/**
- * gmap_unshadow - remove a shadow page table completely
- * @sg: pointer to the shadow guest address space structure
- *
- * Called with sg->guest_table_lock
- */
-void gmap_unshadow(struct gmap *sg)
-{
-	unsigned long *table;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	if (sg->removed)
-		return;
-	sg->removed = 1;
-	gmap_call_notifier(sg, 0, -1UL);
-	gmap_flush_tlb(sg);
-	table = __va(sg->asce & _ASCE_ORIGIN);
-	switch (sg->asce & _ASCE_TYPE_MASK) {
-	case _ASCE_TYPE_REGION1:
-		__gmap_unshadow_r1t(sg, 0, table);
-		break;
-	case _ASCE_TYPE_REGION2:
-		__gmap_unshadow_r2t(sg, 0, table);
-		break;
-	case _ASCE_TYPE_REGION3:
-		__gmap_unshadow_r3t(sg, 0, table);
-		break;
-	case _ASCE_TYPE_SEGMENT:
-		__gmap_unshadow_sgt(sg, 0, table);
-		break;
-	}
-}
-EXPORT_SYMBOL(gmap_unshadow);
-
-/**
- * gmap_shadow_r2t - create an empty shadow region 2 table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @r2t: parent gmap address of the region 2 table to get shadowed
- * @fake: r2t references contiguous guest memory block, not a r2t
- *
- * The r2t parameter specifies the address of the source table. The
- * four pages of the source table are made read-only in the parent gmap
- * address space. A write to the source table area @r2t will automatically
- * remove the shadow r2 table and all of its descendants.
- *
- * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory and
- * -EFAULT if an address in the parent gmap could not be resolved.
- *
- * Called with sg->mm->mmap_lock in read.
- */
-int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
-		    int fake)
-{
-	unsigned long raddr, origin, offset, len;
-	unsigned long *table;
-	phys_addr_t s_r2t;
-	struct page *page;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	/* Allocate a shadow region second table */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	s_r2t = page_to_phys(page);
-	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 4); /* get region-1 pointer */
-	if (!table) {
-		rc = -EAGAIN;		/* Race with unshadow */
-		goto out_free;
-	}
-	if (!(*table & _REGION_ENTRY_INVALID)) {
-		rc = 0;			/* Already established */
-		goto out_free;
-	} else if (*table & _REGION_ENTRY_ORIGIN) {
-		rc = -EAGAIN;		/* Race with shadow */
-		goto out_free;
-	}
-	crst_table_init(__va(s_r2t), _REGION2_ENTRY_EMPTY);
-	/* mark as invalid as long as the parent table is not protected */
-	*table = s_r2t | _REGION_ENTRY_LENGTH |
-		 _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID;
-	if (sg->edat_level >= 1)
-		*table |= (r2t & _REGION_ENTRY_PROTECT);
-	if (fake) {
-		/* nothing to protect for fake tables */
-		*table &= ~_REGION_ENTRY_INVALID;
-		spin_unlock(&sg->guest_table_lock);
-		return 0;
-	}
-	spin_unlock(&sg->guest_table_lock);
-	/* Make r2t read-only in parent gmap page table */
-	raddr = (saddr & _REGION1_MASK) | _SHADOW_RMAP_REGION1;
-	origin = r2t & _REGION_ENTRY_ORIGIN;
-	offset = ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
-	len = ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
-	rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
-	spin_lock(&sg->guest_table_lock);
-	if (!rc) {
-		table = gmap_table_walk(sg, saddr, 4);
-		if (!table || (*table & _REGION_ENTRY_ORIGIN) != s_r2t)
-			rc = -EAGAIN;		/* Race with unshadow */
-		else
-			*table &= ~_REGION_ENTRY_INVALID;
-	} else {
-		gmap_unshadow_r2t(sg, raddr);
-	}
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-out_free:
-	spin_unlock(&sg->guest_table_lock);
-	__free_pages(page, CRST_ALLOC_ORDER);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_r2t);
-
-/**
- * gmap_shadow_r3t - create a shadow region 3 table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @r3t: parent gmap address of the region 3 table to get shadowed
- * @fake: r3t references contiguous guest memory block, not a r3t
- *
- * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory and
- * -EFAULT if an address in the parent gmap could not be resolved.
- *
- * Called with sg->mm->mmap_lock in read.
- */
-int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
-		    int fake)
-{
-	unsigned long raddr, origin, offset, len;
-	unsigned long *table;
-	phys_addr_t s_r3t;
-	struct page *page;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg));
-	/* Allocate a shadow region second table */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	s_r3t = page_to_phys(page);
-	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 3); /* get region-2 pointer */
-	if (!table) {
-		rc = -EAGAIN;		/* Race with unshadow */
-		goto out_free;
-	}
-	if (!(*table & _REGION_ENTRY_INVALID)) {
-		rc = 0;			/* Already established */
-		goto out_free;
-	} else if (*table & _REGION_ENTRY_ORIGIN) {
-		rc = -EAGAIN;		/* Race with shadow */
-		goto out_free;
-	}
-	crst_table_init(__va(s_r3t), _REGION3_ENTRY_EMPTY);
-	/* mark as invalid as long as the parent table is not protected */
-	*table = s_r3t | _REGION_ENTRY_LENGTH |
-		 _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID;
-	if (sg->edat_level >= 1)
-		*table |= (r3t & _REGION_ENTRY_PROTECT);
-	if (fake) {
-		/* nothing to protect for fake tables */
-		*table &= ~_REGION_ENTRY_INVALID;
-		spin_unlock(&sg->guest_table_lock);
-		return 0;
-	}
-	spin_unlock(&sg->guest_table_lock);
-	/* Make r3t read-only in parent gmap page table */
-	raddr = (saddr & _REGION2_MASK) | _SHADOW_RMAP_REGION2;
-	origin = r3t & _REGION_ENTRY_ORIGIN;
-	offset = ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
-	len = ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
-	rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
-	spin_lock(&sg->guest_table_lock);
-	if (!rc) {
-		table = gmap_table_walk(sg, saddr, 3);
-		if (!table || (*table & _REGION_ENTRY_ORIGIN) != s_r3t)
-			rc = -EAGAIN;		/* Race with unshadow */
-		else
-			*table &= ~_REGION_ENTRY_INVALID;
-	} else {
-		gmap_unshadow_r3t(sg, raddr);
-	}
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-out_free:
-	spin_unlock(&sg->guest_table_lock);
-	__free_pages(page, CRST_ALLOC_ORDER);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_r3t);
-
-/**
- * gmap_shadow_sgt - create a shadow segment table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @sgt: parent gmap address of the segment table to get shadowed
- * @fake: sgt references contiguous guest memory block, not a sgt
- *
- * Returns: 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory and
- * -EFAULT if an address in the parent gmap could not be resolved.
- *
- * Called with sg->mm->mmap_lock in read.
- */
-int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
-		    int fake)
-{
-	unsigned long raddr, origin, offset, len;
-	unsigned long *table;
-	phys_addr_t s_sgt;
-	struct page *page;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg) || (sgt & _REGION3_ENTRY_LARGE));
-	/* Allocate a shadow segment table */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	s_sgt = page_to_phys(page);
-	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 2); /* get region-3 pointer */
-	if (!table) {
-		rc = -EAGAIN;		/* Race with unshadow */
-		goto out_free;
-	}
-	if (!(*table & _REGION_ENTRY_INVALID)) {
-		rc = 0;			/* Already established */
-		goto out_free;
-	} else if (*table & _REGION_ENTRY_ORIGIN) {
-		rc = -EAGAIN;		/* Race with shadow */
-		goto out_free;
-	}
-	crst_table_init(__va(s_sgt), _SEGMENT_ENTRY_EMPTY);
-	/* mark as invalid as long as the parent table is not protected */
-	*table = s_sgt | _REGION_ENTRY_LENGTH |
-		 _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID;
-	if (sg->edat_level >= 1)
-		*table |= sgt & _REGION_ENTRY_PROTECT;
-	if (fake) {
-		/* nothing to protect for fake tables */
-		*table &= ~_REGION_ENTRY_INVALID;
-		spin_unlock(&sg->guest_table_lock);
-		return 0;
-	}
-	spin_unlock(&sg->guest_table_lock);
-	/* Make sgt read-only in parent gmap page table */
-	raddr = (saddr & _REGION3_MASK) | _SHADOW_RMAP_REGION3;
-	origin = sgt & _REGION_ENTRY_ORIGIN;
-	offset = ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
-	len = ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
-	rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
-	spin_lock(&sg->guest_table_lock);
-	if (!rc) {
-		table = gmap_table_walk(sg, saddr, 2);
-		if (!table || (*table & _REGION_ENTRY_ORIGIN) != s_sgt)
-			rc = -EAGAIN;		/* Race with unshadow */
-		else
-			*table &= ~_REGION_ENTRY_INVALID;
-	} else {
-		gmap_unshadow_sgt(sg, raddr);
-	}
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-out_free:
-	spin_unlock(&sg->guest_table_lock);
-	__free_pages(page, CRST_ALLOC_ORDER);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_shadow_sgt);
-
-static void gmap_pgste_set_pgt_addr(struct ptdesc *ptdesc, unsigned long pgt_addr)
-{
-	unsigned long *pgstes = page_to_virt(ptdesc_page(ptdesc));
-
-	pgstes += _PAGE_ENTRIES;
-
-	pgstes[0] &= ~PGSTE_ST2_MASK;
-	pgstes[1] &= ~PGSTE_ST2_MASK;
-	pgstes[2] &= ~PGSTE_ST2_MASK;
-	pgstes[3] &= ~PGSTE_ST2_MASK;
-
-	pgstes[0] |= (pgt_addr >> 16) & PGSTE_ST2_MASK;
-	pgstes[1] |= pgt_addr & PGSTE_ST2_MASK;
-	pgstes[2] |= (pgt_addr << 16) & PGSTE_ST2_MASK;
-	pgstes[3] |= (pgt_addr << 32) & PGSTE_ST2_MASK;
-}
-
-/**
- * gmap_shadow_pgt - instantiate a shadow page table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: faulting address in the shadow gmap
- * @pgt: parent gmap address of the page table to get shadowed
- * @fake: pgt references contiguous guest memory block, not a pgtable
- *
- * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
- * shadow table structure is incomplete, -ENOMEM if out of memory,
- * -EFAULT if an address in the parent gmap could not be resolved and
- *
- * Called with gmap->mm->mmap_lock in read
- */
-int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
-		    int fake)
-{
-	unsigned long raddr, origin;
-	unsigned long *table;
-	struct ptdesc *ptdesc;
-	phys_addr_t s_pgt;
-	int rc;
-
-	BUG_ON(!gmap_is_shadow(sg) || (pgt & _SEGMENT_ENTRY_LARGE));
-	/* Allocate a shadow page table */
table */ - ptdesc =3D page_table_alloc_pgste(sg->mm); - if (!ptdesc) - return -ENOMEM; - origin =3D pgt & _SEGMENT_ENTRY_ORIGIN; - if (fake) - origin |=3D GMAP_SHADOW_FAKE_TABLE; - gmap_pgste_set_pgt_addr(ptdesc, origin); - s_pgt =3D page_to_phys(ptdesc_page(ptdesc)); - /* Install shadow page table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 1); /* get segment pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _SEGMENT_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _SEGMENT_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - /* mark as invalid as long as the parent table is not protected */ - *table =3D (unsigned long) s_pgt | _SEGMENT_ENTRY | - (pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID; - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_SEGMENT_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make pgt read-only in parent gmap page table (not the pgste) */ - raddr =3D (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT; - origin =3D pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK; - rc =3D gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 1); - if (!table || (*table & _SEGMENT_ENTRY_ORIGIN) !=3D s_pgt) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_SEGMENT_ENTRY_INVALID; - } else { - gmap_unshadow_pgt(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - page_table_free_pgste(ptdesc); - return rc; - -} -EXPORT_SYMBOL_GPL(gmap_shadow_pgt); - -/** - * gmap_shadow_page - create a shadow page mapping - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @pte: pte in parent gmap address space to get shadowed - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. - */ -int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte) -{ - struct gmap *parent; - struct gmap_rmap *rmap; - unsigned long vmaddr, paddr; - spinlock_t *ptl; - pte_t *sptep, *tptep; - int prot; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - parent =3D sg->parent; - prot =3D (pte_val(pte) & _PAGE_PROTECT) ? 
PROT_READ : PROT_WRITE; - - rmap =3D kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT); - if (!rmap) - return -ENOMEM; - rmap->raddr =3D (saddr & PAGE_MASK) | _SHADOW_RMAP_PGTABLE; - - while (1) { - paddr =3D pte_val(pte) & PAGE_MASK; - vmaddr =3D __gmap_translate(parent, paddr); - if (IS_ERR_VALUE(vmaddr)) { - rc =3D vmaddr; - break; - } - rc =3D radix_tree_preload(GFP_KERNEL_ACCOUNT); - if (rc) - break; - rc =3D -EAGAIN; - sptep =3D gmap_pte_op_walk(parent, paddr, &ptl); - if (sptep) { - spin_lock(&sg->guest_table_lock); - /* Get page table pointer */ - tptep =3D (pte_t *) gmap_table_walk(sg, saddr, 0); - if (!tptep) { - spin_unlock(&sg->guest_table_lock); - gmap_pte_op_end(sptep, ptl); - radix_tree_preload_end(); - break; - } - rc =3D ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte); - if (rc > 0) { - /* Success and a new mapping */ - gmap_insert_rmap(sg, vmaddr, rmap); - rmap =3D NULL; - rc =3D 0; - } - gmap_pte_op_end(sptep, ptl); - spin_unlock(&sg->guest_table_lock); - } - radix_tree_preload_end(); - if (!rc) - break; - rc =3D gmap_pte_op_fixup(parent, paddr, vmaddr, prot); - if (rc) - break; - } - kfree(rmap); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_page); - -/* - * gmap_shadow_notify - handle notifications for shadow gmap - * - * Called with sg->parent->shadow_lock. - */ -static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr, - unsigned long gaddr) -{ - struct gmap_rmap *rmap, *rnext, *head; - unsigned long start, end, bits, raddr; - - BUG_ON(!gmap_is_shadow(sg)); - - spin_lock(&sg->guest_table_lock); - if (sg->removed) { - spin_unlock(&sg->guest_table_lock); - return; - } - /* Check for top level table */ - start =3D sg->orig_asce & _ASCE_ORIGIN; - end =3D start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE; - if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >=3D start && - gaddr < end) { - /* The complete shadow table has to go */ - gmap_unshadow(sg); - spin_unlock(&sg->guest_table_lock); - list_del(&sg->list); - gmap_put(sg); - return; - } - /* Remove the page table tree from on specific entry */ - head =3D radix_tree_delete(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT); - gmap_for_each_rmap_safe(rmap, rnext, head) { - bits =3D rmap->raddr & _SHADOW_RMAP_MASK; - raddr =3D rmap->raddr ^ bits; - switch (bits) { - case _SHADOW_RMAP_REGION1: - gmap_unshadow_r2t(sg, raddr); - break; - case _SHADOW_RMAP_REGION2: - gmap_unshadow_r3t(sg, raddr); - break; - case _SHADOW_RMAP_REGION3: - gmap_unshadow_sgt(sg, raddr); - break; - case _SHADOW_RMAP_SEGMENT: - gmap_unshadow_pgt(sg, raddr); - break; - case _SHADOW_RMAP_PGTABLE: - gmap_unshadow_page(sg, raddr); - break; - } - kfree(rmap); - } - spin_unlock(&sg->guest_table_lock); -} - -/** - * ptep_notify - call all invalidation callbacks for a specific pte. - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - * @pte: pointer to the page table entry - * @bits: bits from the pgste that caused the notify call - * - * This function is assumed to be called with the page table lock held - * for the pte to notify. 
- */ -void ptep_notify(struct mm_struct *mm, unsigned long vmaddr, - pte_t *pte, unsigned long bits) -{ - unsigned long offset, gaddr =3D 0; - struct gmap *gmap, *sg, *next; - - offset =3D ((unsigned long) pte) & (255 * sizeof(pte_t)); - offset =3D offset * (PAGE_SIZE / sizeof(pte_t)); - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - gaddr =3D host_to_guest_lookup(gmap, vmaddr) + offset; - spin_unlock(&gmap->guest_table_lock); - if (!IS_GADDR_VALID(gaddr)) - continue; - - if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) { - spin_lock(&gmap->shadow_lock); - list_for_each_entry_safe(sg, next, - &gmap->children, list) - gmap_shadow_notify(sg, vmaddr, gaddr); - spin_unlock(&gmap->shadow_lock); - } - if (bits & PGSTE_IN_BIT) - gmap_call_notifier(gmap, gaddr, gaddr + PAGE_SIZE - 1); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(ptep_notify); - -static void pmdp_notify_gmap(struct gmap *gmap, pmd_t *pmdp, - unsigned long gaddr) -{ - set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN))); - gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1); -} - -/** - * gmap_pmdp_xchg - exchange a gmap pmd with another - * @gmap: pointer to the guest address space structure - * @pmdp: pointer to the pmd entry - * @new: replacement entry - * @gaddr: the affected guest address - * - * This function is assumed to be called with the guest_table_lock - * held. - */ -static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new, - unsigned long gaddr) -{ - gaddr &=3D HPAGE_MASK; - pmdp_notify_gmap(gmap, pmdp, gaddr); - new =3D clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_GMAP_IN)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, (pmd_t *)pmdp, IDTE_GUEST_ASCE, gmap->asce, - IDTE_GLOBAL); - else - __pmdp_idte(gaddr, (pmd_t *)pmdp, 0, 0, IDTE_GLOBAL); - set_pmd(pmdp, new); -} - -static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr, - int purge) -{ - pmd_t *pmdp; - struct gmap *gmap; - unsigned long gaddr; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - _SEGMENT_ENTRY)); - if (purge) - __pmdp_cspg(pmdp); - set_pmd(pmdp, __pmd(_SEGMENT_ENTRY_EMPTY)); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} - -/** - * gmap_pmdp_invalidate - invalidate all affected guest pmd entries without - * flushing - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr) -{ - gmap_pmdp_clear(mm, vmaddr, 0); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_invalidate); - -/** - * gmap_pmdp_idte_local - invalidate and clear a guest pmd entry - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr) -{ - unsigned long gaddr; - struct gmap *gmap; - pmd_t *pmdp; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - 
_SEGMENT_ENTRY)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE, - gmap->asce, IDTE_LOCAL); - else - __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_idte_local); - -/** - * gmap_pmdp_idte_global - invalidate and clear a guest pmd entry - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr) -{ - unsigned long gaddr; - struct gmap *gmap; - pmd_t *pmdp; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - _SEGMENT_ENTRY)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE, - gmap->asce, IDTE_GLOBAL); - else - __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global); - -/** - * gmap_test_and_clear_dirty_pmd - test and reset segment dirty status - * @gmap: pointer to guest address space - * @pmdp: pointer to the pmd to be tested - * @gaddr: virtual address in the guest address space - * - * This function is assumed to be called with the guest_table_lock - * held. - */ -static bool gmap_test_and_clear_dirty_pmd(struct gmap *gmap, pmd_t *pmdp, - unsigned long gaddr) -{ - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID) - return false; - - /* Already protected memory, which did not change is clean */ - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT && - !(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_UC)) - return false; - - /* Clear UC indication and reset protection */ - set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_UC))); - gmap_protect_pmd(gmap, gaddr, pmdp, PROT_READ, 0); - return true; -} - -/** - * gmap_sync_dirty_log_pmd - set bitmap based on dirty status of segment - * @gmap: pointer to guest address space - * @bitmap: dirty bitmap for this pmd - * @gaddr: virtual address in the guest address space - * @vmaddr: virtual address in the host address space - * - * This function is assumed to be called with the guest_table_lock - * held. 
- */
-void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
-			     unsigned long gaddr, unsigned long vmaddr)
-{
-	int i;
-	pmd_t *pmdp;
-	pte_t *ptep;
-	spinlock_t *ptl;
-
-	pmdp = gmap_pmd_op_walk(gmap, gaddr);
-	if (!pmdp)
-		return;
-
-	if (pmd_leaf(*pmdp)) {
-		if (gmap_test_and_clear_dirty_pmd(gmap, pmdp, gaddr))
-			bitmap_fill(bitmap, _PAGE_ENTRIES);
-	} else {
-		for (i = 0; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
-			ptep = pte_alloc_map_lock(gmap->mm, pmdp, vmaddr, &ptl);
-			if (!ptep)
-				continue;
-			if (ptep_test_and_clear_uc(gmap->mm, vmaddr, ptep))
-				set_bit(i, bitmap);
-			pte_unmap_unlock(ptep, ptl);
-		}
-	}
-	gmap_pmd_op_end(gmap, pmdp);
-}
-EXPORT_SYMBOL_GPL(gmap_sync_dirty_log_pmd);
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
-				    unsigned long end, struct mm_walk *walk)
-{
-	struct vm_area_struct *vma = walk->vma;
-
-	split_huge_pmd(vma, pmd, addr);
-	return 0;
-}
-
-static const struct mm_walk_ops thp_split_walk_ops = {
-	.pmd_entry = thp_split_walk_pmd_entry,
-	.walk_lock = PGWALK_WRLOCK_VERIFY,
-};
-
-static inline void thp_split_mm(struct mm_struct *mm)
-{
-	struct vm_area_struct *vma;
-	VMA_ITERATOR(vmi, mm, 0);
-
-	for_each_vma(vmi, vma) {
-		vm_flags_mod(vma, VM_NOHUGEPAGE, VM_HUGEPAGE);
-		walk_page_vma(vma, &thp_split_walk_ops, NULL);
-	}
-	mm->def_flags |= VM_NOHUGEPAGE;
-}
-#else
-static inline void thp_split_mm(struct mm_struct *mm)
-{
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
-/*
- * switch on pgstes for its userspace process (for kvm)
- */
-int s390_enable_sie(void)
-{
-	struct mm_struct *mm = current->mm;
-
-	/* Do we have pgstes? if yes, we are done */
-	if (mm_has_pgste(mm))
-		return 0;
-	mmap_write_lock(mm);
-	mm->context.has_pgste = 1;
-	/* split thp mappings and disable thp for future mappings */
-	thp_split_mm(mm);
-	mmap_write_unlock(mm);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(s390_enable_sie);
-
-/*
- * Enable storage key handling from now on and initialize the storage
- * keys with the default key.
- */
-static int __s390_enable_skey_pte(pte_t *pte, unsigned long addr,
-				  unsigned long next, struct mm_walk *walk)
-{
-	/* Clear storage key */
-	ptep_zap_key(walk->mm, addr, pte);
-	return 0;
-}
-
-/*
- * Give a chance to schedule after setting a key to 256 pages.
- * We only hold the mm lock, which is a rwsem and the kvm srcu.
- * Both can sleep.
- */
-static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
-				  unsigned long next, struct mm_walk *walk)
-{
-	cond_resched();
-	return 0;
-}
-
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
-				      unsigned long hmask, unsigned long next,
-				      struct mm_walk *walk)
-{
-	pmd_t *pmd = (pmd_t *)pte;
-	unsigned long start, end;
-	struct folio *folio = page_folio(pmd_page(*pmd));
-
-	/*
-	 * The write check makes sure we do not set a key on shared
-	 * memory. This is needed as the walker does not differentiate
-	 * between actual guest memory and the process executable or
-	 * shared libraries.
- */
-	if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID ||
-	    !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
-		return 0;
-
-	start = pmd_val(*pmd) & HPAGE_MASK;
-	end = start + HPAGE_SIZE;
-	__storage_key_init_range(start, end);
-	set_bit(PG_arch_1, &folio->flags.f);
-	cond_resched();
-	return 0;
-}
-
-static const struct mm_walk_ops enable_skey_walk_ops = {
-	.hugetlb_entry = __s390_enable_skey_hugetlb,
-	.pte_entry = __s390_enable_skey_pte,
-	.pmd_entry = __s390_enable_skey_pmd,
-	.walk_lock = PGWALK_WRLOCK,
-};
-
-int s390_enable_skey(void)
-{
-	struct mm_struct *mm = current->mm;
-	int rc = 0;
-
-	mmap_write_lock(mm);
-	if (mm_uses_skeys(mm))
-		goto out_up;
-
-	mm->context.uses_skeys = 1;
-	rc = gmap_helper_disable_cow_sharing();
-	if (rc) {
-		mm->context.uses_skeys = 0;
-		goto out_up;
-	}
-	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
-
-out_up:
-	mmap_write_unlock(mm);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(s390_enable_skey);
-
-/*
- * Reset CMMA state, make all pages stable again.
- */
-static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	ptep_zap_unused(walk->mm, addr, pte, 1);
-	return 0;
-}
-
-static const struct mm_walk_ops reset_cmma_walk_ops = {
-	.pte_entry = __s390_reset_cmma,
-	.walk_lock = PGWALK_WRLOCK,
-};
-
-void s390_reset_cmma(struct mm_struct *mm)
-{
-	mmap_write_lock(mm);
-	walk_page_range(mm, 0, TASK_SIZE, &reset_cmma_walk_ops, NULL);
-	mmap_write_unlock(mm);
-}
-EXPORT_SYMBOL_GPL(s390_reset_cmma);
-
-#define GATHER_GET_PAGES 32
-
-struct reset_walk_state {
-	unsigned long next;
-	unsigned long count;
-	unsigned long pfns[GATHER_GET_PAGES];
-};
-
-static int s390_gather_pages(pte_t *ptep, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	struct reset_walk_state *p = walk->private;
-	pte_t pte = READ_ONCE(*ptep);
-
-	if (pte_present(pte)) {
-		/* we have a reference from the mapping, take an extra one */
-		get_page(phys_to_page(pte_val(pte)));
-		p->pfns[p->count] = phys_to_pfn(pte_val(pte));
-		p->next = next;
-		p->count++;
-	}
-	return p->count >= GATHER_GET_PAGES;
-}
-
-static const struct mm_walk_ops gather_pages_ops = {
-	.pte_entry = s390_gather_pages,
-	.walk_lock = PGWALK_RDLOCK,
-};
-
-/*
- * Call the Destroy secure page UVC on each page in the given array of PFNs.
- * Each page needs to have an extra reference, which will be released here.
- */
-void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns)
-{
-	struct folio *folio;
-	unsigned long i;
-
-	for (i = 0; i < count; i++) {
-		folio = pfn_folio(pfns[i]);
-		/* we always have an extra reference */
-		uv_destroy_folio(folio);
-		/* get rid of the extra reference */
-		folio_put(folio);
-		cond_resched();
-	}
-}
-EXPORT_SYMBOL_GPL(s390_uv_destroy_pfns);
-
-/**
- * __s390_uv_destroy_range - Call the destroy secure page UVC on each page
- *			in the given range of the given address space.
- * @mm: the mm to operate on
- * @start: the start of the range
- * @end: the end of the range
- * @interruptible: if not 0, stop when a fatal signal is received
- *
- * Walk the given range of the given address space and call the destroy
- * secure page UVC on each page. Optionally exit early if a fatal signal is
- * pending.
- *
- * Return: 0 on success, -EINTR if the function stopped before completing
- */
-int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool interruptible)
-{
-	struct reset_walk_state state = { .next = start };
-	int r = 1;
-
-	while (r > 0) {
-		state.count = 0;
-		mmap_read_lock(mm);
-		r = walk_page_range(mm, state.next, end, &gather_pages_ops, &state);
-		mmap_read_unlock(mm);
-		cond_resched();
-		s390_uv_destroy_pfns(state.count, state.pfns);
-		if (interruptible && fatal_signal_pending(current))
-			return -EINTR;
-	}
-	return 0;
-}
-EXPORT_SYMBOL_GPL(__s390_uv_destroy_range);
-
-/**
- * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
- * @gmap: the gmap whose ASCE needs to be replaced
- *
- * If the ASCE is a SEGMENT type then this function will return -EINVAL,
- * otherwise the pointers in the host_to_guest radix tree will keep pointing
- * to the wrong pages, causing use-after-free and memory corruption.
- * If the allocation of the new top level page table fails, the ASCE is not
- * replaced.
- * In any case, the old ASCE is always removed from the gmap CRST list.
- * Therefore the caller has to make sure to save a pointer to it
- * beforehand, unless a leak is actually intended.
- */
-int s390_replace_asce(struct gmap *gmap)
-{
-	unsigned long asce;
-	struct page *page;
-	void *table;
-
-	/* Replacing segment type ASCEs would cause serious issues */
-	if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
-		return -EINVAL;
-
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	table = page_to_virt(page);
-	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
-
-	/* Set new table origin while preserving existing ASCE control bits */
-	asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);
-	WRITE_ONCE(gmap->asce, asce);
-	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
-	WRITE_ONCE(gmap->table, table);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(s390_replace_asce);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 08743c1dac2f..eced1dc5214f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -369,8 +369,6 @@ static inline void pmdp_idte_local(struct mm_struct *mm,
 			    mm->context.asce, IDTE_LOCAL);
 	else
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
-	if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-		gmap_pmdp_idte_local(mm, addr);
 }
 
 static inline void pmdp_idte_global(struct mm_struct *mm,
@@ -379,12 +377,8 @@ static inline void pmdp_idte_global(struct mm_struct *mm,
 	if (machine_has_tlb_guest()) {
 		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
 			    mm->context.asce, IDTE_GLOBAL);
-		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-			gmap_pmdp_idte_global(mm, addr);
 	} else {
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL);
-		if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
-			gmap_pmdp_idte_global(mm, addr);
 	}
 }
 
@@ -419,8 +413,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
 			  cpumask_of(smp_processor_id()))) {
 		set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_INVALID)));
 		mm->context.flush_mm = 1;
-		if (mm_has_pgste(mm))
-			gmap_pmdp_invalidate(mm, addr);
 	} else {
 		pmdp_idte_global(mm, addr, pmdp);
 	}
-- 
2.52.0
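An aside on the ST2 packing used by the gmap helpers removed above: gmap_pgste_set_pgt_addr() scatters a 64-bit page table address, 16 bits at a time, into the software-defined ST2 field (PGSTE_ST2_MASK) of the first four PGSTEs, and its counterpart gmap_pgste_get_pgt_addr(), removed from pgtable.h later in this series, reassembles it. A standalone way to convince yourself the round trip is lossless is the userspace C sketch below; st2_set() and st2_get() are made-up names for illustration, not kernel API:

#include <stdint.h>
#include <stdio.h>

#define PGSTE_ST2_MASK 0x0000ffff00000000ULL /* software bits 32..47 of each PGSTE */

/* Scatter a 64-bit address, 16 bits at a time, into the ST2 field of
 * four consecutive pgstes (mirrors the removed gmap_pgste_set_pgt_addr()). */
static void st2_set(uint64_t pgstes[4], uint64_t addr)
{
	for (int i = 0; i < 4; i++)
		pgstes[i] &= ~PGSTE_ST2_MASK;
	pgstes[0] |= (addr >> 16) & PGSTE_ST2_MASK;
	pgstes[1] |=  addr        & PGSTE_ST2_MASK;
	pgstes[2] |= (addr << 16) & PGSTE_ST2_MASK;
	pgstes[3] |= (addr << 32) & PGSTE_ST2_MASK;
}

/* Gather the four 16-bit pieces back (mirrors gmap_pgste_get_pgt_addr()). */
static uint64_t st2_get(const uint64_t pgstes[4])
{
	uint64_t res;

	res  = (pgstes[0] & PGSTE_ST2_MASK) << 16;
	res |=  pgstes[1] & PGSTE_ST2_MASK;
	res |= (pgstes[2] & PGSTE_ST2_MASK) >> 16;
	res |= (pgstes[3] & PGSTE_ST2_MASK) >> 32;
	return res;
}

int main(void)
{
	uint64_t pgstes[4] = { 0 };
	uint64_t addr = 0x0123456789abcdefULL;

	st2_set(pgstes, addr);
	printf("round trip: %#llx -> %#llx\n",
	       (unsigned long long)addr, (unsigned long long)st2_get(pgstes));
	return 0;
}

Four entries are needed because PGSTE_ST2_MASK exposes only 16 software-usable bits per PGSTE.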
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 26/28] KVM: S390: Remove PGSTE code from linux/s390 mm
Date: Mon, 22 Dec 2025 17:50:31 +0100
Message-ID: <20251222165033.162329-27-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Remove the PGSTE config option. Remove all code from linux/s390 mm that
involves PGSTEs.
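For context on what is being deleted: with CONFIG_PGSTE enabled, each 2 KiB page table was followed by 2 KiB of page status table entries (PGSTEs), and the PCL bit of an entry's PGSTE served as a per-PTE spinlock, acquired with an atomic OR in pgste_get_lock() and released by a plain store in pgste_set_unlock() (both removed below). A rough userspace sketch of that lock protocol using C11 atomics, illustrative only and not the kernel implementation:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define PGSTE_PCL_BIT 0x0080000000000000ULL /* lock bit, as in the removed header */

/* Spin until we are the thread that set the PCL bit; returns the
 * pgste value with the lock bit set, like pgste_get_lock() did. */
static uint64_t pgste_lock(_Atomic uint64_t *pgste)
{
	uint64_t old;

	do {
		old = atomic_fetch_or(pgste, PGSTE_PCL_BIT);
	} while (old & PGSTE_PCL_BIT); /* someone else holds it: retry */
	return old | PGSTE_PCL_BIT;
}

/* Store the new value with the lock bit cleared, releasing the lock. */
static void pgste_unlock(_Atomic uint64_t *pgste, uint64_t val)
{
	atomic_store(pgste, val & ~PGSTE_PCL_BIT);
}

int main(void)
{
	_Atomic uint64_t pgste = 0;
	uint64_t v = pgste_lock(&pgste);

	/* ... modify guest-state bits under the lock ... */
	pgste_unlock(&pgste, v);
	printf("pgste=%#llx\n", (unsigned long long)atomic_load(&pgste));
	return 0;
}

The acquire loop works because atomic_fetch_or() returns the previous value: the thread that sees PGSTE_PCL_BIT clear in the returned value is the one that set it and therefore owns the lock; everyone else keeps spinning.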
Signed-off-by: Claudio Imbrenda --- arch/s390/Kconfig | 3 - arch/s390/include/asm/hugetlb.h | 6 - arch/s390/include/asm/mmu.h | 13 - arch/s390/include/asm/page.h | 4 - arch/s390/include/asm/pgalloc.h | 4 - arch/s390/include/asm/pgtable.h | 121 +---- arch/s390/kvm/dat.h | 1 + arch/s390/mm/hugetlbpage.c | 24 - arch/s390/mm/pgalloc.c | 24 - arch/s390/mm/pgtable.c | 827 +------------------------------- mm/khugepaged.c | 9 - 11 files changed, 15 insertions(+), 1021 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 8270754985e9..961cbf023c1b 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -32,9 +32,6 @@ config GENERIC_BUG_RELATIVE_POINTERS config GENERIC_LOCKBREAK def_bool y if PREEMPTION =20 -config PGSTE - def_bool n - config AUDIT_ARCH def_bool y =20 diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 69131736daaa..6983e52eaf81 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_st= ruct *mm, return __huge_ptep_get_and_clear(mm, addr, ptep); } =20 -static inline void arch_clear_hugetlb_flags(struct folio *folio) -{ - clear_bit(PG_arch_1, &folio->flags.f); -} -#define arch_clear_hugetlb_flags arch_clear_hugetlb_flags - #define __HAVE_ARCH_HUGE_PTE_CLEAR static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz) diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h index f07e49b419ab..d4fd7bf3692e 100644 --- a/arch/s390/include/asm/mmu.h +++ b/arch/s390/include/asm/mmu.h @@ -18,24 +18,11 @@ typedef struct { unsigned long vdso_base; /* The mmu context belongs to a secure guest. */ atomic_t protected_count; - /* - * The following bitfields need a down_write on the mm - * semaphore when they are written to. As they are only - * written once, they can be read without a lock. - */ - /* The mmu context uses extended page tables. */ - unsigned int has_pgste:1; - /* The mmu context uses storage keys. */ - unsigned int uses_skeys:1; - /* The mmu context uses CMM. */ - unsigned int uses_cmm:1; /* * The mmu context allows COW-sharing of memory pages (KSM, zeropage). * Note that COW-sharing during fork() is currently always allowed. */ unsigned int allow_cow_sharing:1; - /* The gmaps associated with this context are allowed to use huge pages. 
= */ - unsigned int allow_gmap_hpage_1m:1; } mm_context_t; =20 #define INIT_MM_CONTEXT(name) \ diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index c1d63b613bf9..6de2f4d25b63 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -78,7 +78,6 @@ static inline void copy_page(void *to, void *from) #ifdef STRICT_MM_TYPECHECKS =20 typedef struct { unsigned long pgprot; } pgprot_t; -typedef struct { unsigned long pgste; } pgste_t; typedef struct { unsigned long pte; } pte_t; typedef struct { unsigned long pmd; } pmd_t; typedef struct { unsigned long pud; } pud_t; @@ -94,7 +93,6 @@ static __always_inline unsigned long name ## _val(name ##= _t name) \ #else /* STRICT_MM_TYPECHECKS */ =20 typedef unsigned long pgprot_t; -typedef unsigned long pgste_t; typedef unsigned long pte_t; typedef unsigned long pmd_t; typedef unsigned long pud_t; @@ -110,7 +108,6 @@ static __always_inline unsigned long name ## _val(name = ## _t name) \ #endif /* STRICT_MM_TYPECHECKS */ =20 DEFINE_PGVAL_FUNC(pgprot) -DEFINE_PGVAL_FUNC(pgste) DEFINE_PGVAL_FUNC(pte) DEFINE_PGVAL_FUNC(pmd) DEFINE_PGVAL_FUNC(pud) @@ -120,7 +117,6 @@ DEFINE_PGVAL_FUNC(pgd) typedef pte_t *pgtable_t; =20 #define __pgprot(x) ((pgprot_t) { (x) } ) -#define __pgste(x) ((pgste_t) { (x) } ) #define __pte(x) ((pte_t) { (x) } ) #define __pmd(x) ((pmd_t) { (x) } ) #define __pud(x) ((pud_t) { (x) } ) diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgallo= c.h index a16e65072371..a5de9e61ea9e 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -27,10 +27,6 @@ unsigned long *page_table_alloc_noprof(struct mm_struct = *); #define page_table_alloc(...) alloc_hooks(page_table_alloc_noprof(__VA_ARG= S__)) void page_table_free(struct mm_struct *, unsigned long *); =20 -struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm); -#define page_table_alloc_pgste(...) 
alloc_hooks(page_table_alloc_pgste_nop= rof(__VA_ARGS__)) -void page_table_free_pgste(struct ptdesc *ptdesc); - static inline void crst_table_init(unsigned long *crst, unsigned long entr= y) { memset64((u64 *)crst, entry, _CRST_ENTRIES); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 45f13697cf9e..1c3c3be93be9 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -413,28 +413,6 @@ void setup_protection_map(void); * SW-bits: y young, d dirty, r read, w write */ =20 -/* Page status table bits for virtualization */ -#define PGSTE_ACC_BITS 0xf000000000000000UL -#define PGSTE_FP_BIT 0x0800000000000000UL -#define PGSTE_PCL_BIT 0x0080000000000000UL -#define PGSTE_HR_BIT 0x0040000000000000UL -#define PGSTE_HC_BIT 0x0020000000000000UL -#define PGSTE_GR_BIT 0x0004000000000000UL -#define PGSTE_GC_BIT 0x0002000000000000UL -#define PGSTE_ST2_MASK 0x0000ffff00000000UL -#define PGSTE_UC_BIT 0x0000000000008000UL /* user dirty (migration) */ -#define PGSTE_IN_BIT 0x0000000000004000UL /* IPTE notify bit */ -#define PGSTE_VSIE_BIT 0x0000000000002000UL /* ref'd in a shadow table */ - -/* Guest Page State used for virtualization */ -#define _PGSTE_GPS_ZERO 0x0000000080000000UL -#define _PGSTE_GPS_NODAT 0x0000000040000000UL -#define _PGSTE_GPS_USAGE_MASK 0x0000000003000000UL -#define _PGSTE_GPS_USAGE_STABLE 0x0000000000000000UL -#define _PGSTE_GPS_USAGE_UNUSED 0x0000000001000000UL -#define _PGSTE_GPS_USAGE_POT_VOLATILE 0x0000000002000000UL -#define _PGSTE_GPS_USAGE_VOLATILE _PGSTE_GPS_USAGE_MASK - /* * A user page table pointer has the space-switch-event bit, the * private-space-control bit and the storage-alteration-event-control @@ -566,15 +544,6 @@ static inline bool mm_pmd_folded(struct mm_struct *mm) } #define mm_pmd_folded(mm) mm_pmd_folded(mm) =20 -static inline int mm_has_pgste(struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - if (unlikely(mm->context.has_pgste)) - return 1; -#endif - return 0; -} - static inline int mm_is_protected(struct mm_struct *mm) { #if IS_ENABLED(CONFIG_KVM) @@ -584,16 +553,6 @@ static inline int mm_is_protected(struct mm_struct *mm) return 0; } =20 -static inline pgste_t clear_pgste_bit(pgste_t pgste, unsigned long mask) -{ - return __pgste(pgste_val(pgste) & ~mask); -} - -static inline pgste_t set_pgste_bit(pgste_t pgste, unsigned long mask) -{ - return __pgste(pgste_val(pgste) | mask); -} - static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot) { return __pte(pte_val(pte) & ~pgprot_val(prot)); @@ -639,15 +598,6 @@ static inline int mm_forbids_zeropage(struct mm_struct= *mm) return 0; } =20 -static inline int mm_uses_skeys(struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - if (mm->context.uses_skeys) - return 1; -#endif - return 0; -} - /** * cspg() - Compare and Swap and Purge (CSPG) * @ptr: Pointer to the value to be exchanged @@ -1356,45 +1306,13 @@ static inline int ptep_set_access_flags(struct vm_a= rea_struct *vma, { if (pte_same(*ptep, entry)) return 0; - if (cpu_has_rdp() && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, en= try)) + if (cpu_has_rdp() && pte_allow_rdp(*ptep, entry)) ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry); else ptep_xchg_direct(vma->vm_mm, addr, ptep, entry); return 1; } =20 -/* - * Additional functions to handle KVM guest page tables - */ -void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t entry); -void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep= ); -int ptep_force_prot(struct mm_struct *mm, unsigned long 
gaddr, - pte_t *ptep, int prot, unsigned long bit); -void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, - pte_t *ptep , int reset); -void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep); -int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr, - pte_t *sptep, pte_t *tptep, pte_t pte); -void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *p= tep); - -bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long address, - pte_t *ptep); -int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, bool nq); -int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, unsigned char *oldkey, - bool nq, bool mr, bool mc); -int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr); -int get_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char *key); - -int set_pgste_bits(struct mm_struct *mm, unsigned long addr, - unsigned long bits, unsigned long value); -int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgst= ep); -int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, - unsigned long *oldpte, unsigned long *oldpgste); - #define pgprot_writecombine pgprot_writecombine pgprot_t pgprot_writecombine(pgprot_t prot); =20 @@ -1409,23 +1327,12 @@ static inline void set_ptes(struct mm_struct *mm, u= nsigned long addr, { if (pte_present(entry)) entry =3D clear_pte_bit(entry, __pgprot(_PAGE_UNUSED)); - if (mm_has_pgste(mm)) { - for (;;) { - ptep_set_pte_at(mm, addr, ptep, entry); - if (--nr =3D=3D 0) - break; - ptep++; - entry =3D __pte(pte_val(entry) + PAGE_SIZE); - addr +=3D PAGE_SIZE; - } - } else { - for (;;) { - set_pte(ptep, entry); - if (--nr =3D=3D 0) - break; - ptep++; - entry =3D __pte(pte_val(entry) + PAGE_SIZE); - } + for (;;) { + set_pte(ptep, entry); + if (--nr =3D=3D 0) + break; + ptep++; + entry =3D __pte(pte_val(entry) + PAGE_SIZE); } } #define set_ptes set_ptes @@ -2026,18 +1933,4 @@ extern pte_t *vmem_get_alloc_pte(unsigned long addr,= bool alloc); #define pmd_pgtable(pmd) \ ((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE)) =20 -static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt) -{ - unsigned long *pgstes, res; - - pgstes =3D pgt + _PAGE_ENTRIES; - - res =3D (pgstes[0] & PGSTE_ST2_MASK) << 16; - res |=3D pgstes[1] & PGSTE_ST2_MASK; - res |=3D (pgstes[2] & PGSTE_ST2_MASK) >> 16; - res |=3D (pgstes[3] & PGSTE_ST2_MASK) >> 32; - - return res; -} - #endif /* _S390_PAGE_H */ diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h index b3ac58d10d6c..71ecbc21a46f 100644 --- a/arch/s390/kvm/dat.h +++ b/arch/s390/kvm/dat.h @@ -108,6 +108,7 @@ union pte { #define _PAGE_SD 0x002 =20 /* Needed as macro to perform atomic operations */ +#define PGSTE_PCL_BIT 0x0080000000000000UL /* PCL lock, HW bit */ #define PGSTE_CMMA_D_BIT 0x0000000000008000UL /* CMMA dirty soft-bit */ =20 enum pgste_gps_usage { diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index d42e61c7594e..35a898e15b1c 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -135,29 +135,6 @@ static inline pte_t __rste_to_pte(unsigned long rste) return __pte(pteval); } =20 -static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste) -{ - struct folio *folio; - unsigned long size, paddr; - - if (!mm_uses_skeys(mm) || - rste & _SEGMENT_ENTRY_INVALID) - return; - - if ((rste & _REGION_ENTRY_TYPE_MASK) =3D=3D _REGION_ENTRY_TYPE_R3) { - folio =3D 
page_folio(pud_page(__pud(rste))); - size =3D PUD_SIZE; - paddr =3D rste & PUD_MASK; - } else { - folio =3D page_folio(pmd_page(__pmd(rste))); - size =3D PMD_SIZE; - paddr =3D rste & PMD_MASK; - } - - if (!test_and_set_bit(PG_arch_1, &folio->flags.f)) - __storage_key_init_range(paddr, paddr + size); -} - void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { @@ -173,7 +150,6 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned l= ong addr, } else if (likely(pte_present(pte))) rste |=3D _SEGMENT_ENTRY_LARGE; =20 - clear_huge_pte_skeys(mm, rste); set_pte(ptep, __pte(rste)); } =20 diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c index 7df23528c01b..7ac44543e051 100644 --- a/arch/s390/mm/pgalloc.c +++ b/arch/s390/mm/pgalloc.c @@ -114,30 +114,6 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned = long end) return -ENOMEM; } =20 -#ifdef CONFIG_PGSTE - -struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm) -{ - struct ptdesc *ptdesc; - u64 *table; - - ptdesc =3D pagetable_alloc_noprof(GFP_KERNEL_ACCOUNT, 0); - if (ptdesc) { - table =3D (u64 *)ptdesc_address(ptdesc); - __arch_set_page_dat(table, 1); - memset64(table, _PAGE_INVALID, PTRS_PER_PTE); - memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE); - } - return ptdesc; -} - -void page_table_free_pgste(struct ptdesc *ptdesc) -{ - pagetable_free(ptdesc); -} - -#endif /* CONFIG_PGSTE */ - unsigned long *page_table_alloc_noprof(struct mm_struct *mm) { gfp_t gfp =3D GFP_KERNEL_ACCOUNT; diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index eced1dc5214f..4acd8b140c4b 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -115,171 +115,14 @@ static inline pte_t ptep_flush_lazy(struct mm_struct= *mm, return old; } =20 -static inline pgste_t pgste_get_lock(pte_t *ptep) -{ - unsigned long value =3D 0; -#ifdef CONFIG_PGSTE - unsigned long *ptr =3D (unsigned long *)(ptep + PTRS_PER_PTE); - - do { - value =3D __atomic64_or_barrier(PGSTE_PCL_BIT, ptr); - } while (value & PGSTE_PCL_BIT); - value |=3D PGSTE_PCL_BIT; -#endif - return __pgste(value); -} - -static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - barrier(); - WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~P= GSTE_PCL_BIT); -#endif -} - -static inline pgste_t pgste_get(pte_t *ptep) -{ - unsigned long pgste =3D 0; -#ifdef CONFIG_PGSTE - pgste =3D *(unsigned long *)(ptep + PTRS_PER_PTE); -#endif - return __pgste(pgste); -} - -static inline void pgste_set(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - *(pgste_t *)(ptep + PTRS_PER_PTE) =3D pgste; -#endif -} - -static inline pgste_t pgste_update_all(pte_t pte, pgste_t pgste, - struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - unsigned long address, bits, skey; - - if (!mm_uses_skeys(mm) || pte_val(pte) & _PAGE_INVALID) - return pgste; - address =3D pte_val(pte) & PAGE_MASK; - skey =3D (unsigned long) page_get_storage_key(address); - bits =3D skey & (_PAGE_CHANGED | _PAGE_REFERENCED); - /* Transfer page changed & referenced bit to guest bits in pgste */ - pgste =3D set_pgste_bit(pgste, bits << 48); /* GR bit & GC bit */ - /* Copy page access key and fetch protection bit to pgste */ - pgste =3D clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT); - pgste =3D set_pgste_bit(pgste, (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) <= < 56); -#endif - return pgste; - -} - -static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry, - struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - unsigned long 
address; - unsigned long nkey; - - if (!mm_uses_skeys(mm) || pte_val(entry) & _PAGE_INVALID) - return; - VM_BUG_ON(!(pte_val(*ptep) & _PAGE_INVALID)); - address =3D pte_val(entry) & PAGE_MASK; - /* - * Set page access key and fetch protection bit from pgste. - * The guest C/R information is still in the PGSTE, set real - * key C/R to 0. - */ - nkey =3D (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56; - nkey |=3D (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48; - page_set_storage_key(address, nkey, 0); -#endif -} - -static inline pgste_t pgste_set_pte(pte_t *ptep, pgste_t pgste, pte_t entr= y) -{ -#ifdef CONFIG_PGSTE - if ((pte_val(entry) & _PAGE_PRESENT) && - (pte_val(entry) & _PAGE_WRITE) && - !(pte_val(entry) & _PAGE_INVALID)) { - if (!machine_has_esop()) { - /* - * Without enhanced suppression-on-protection force - * the dirty bit on for all writable ptes. - */ - entry =3D set_pte_bit(entry, __pgprot(_PAGE_DIRTY)); - entry =3D clear_pte_bit(entry, __pgprot(_PAGE_PROTECT)); - } - if (!(pte_val(entry) & _PAGE_PROTECT)) - /* This pte allows write access, set user-dirty */ - pgste =3D set_pgste_bit(pgste, PGSTE_UC_BIT); - } -#endif - set_pte(ptep, entry); - return pgste; -} - -static inline pgste_t pgste_pte_notify(struct mm_struct *mm, - unsigned long addr, - pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - unsigned long bits; - - bits =3D pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT); - if (bits) { - pgste =3D __pgste(pgste_val(pgste) ^ bits); - ptep_notify(mm, addr, ptep, bits); - } -#endif - return pgste; -} - -static inline pgste_t ptep_xchg_start(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) -{ - pgste_t pgste =3D __pgste(0); - - if (mm_has_pgste(mm)) { - pgste =3D pgste_get_lock(ptep); - pgste =3D pgste_pte_notify(mm, addr, ptep, pgste); - } - return pgste; -} - -static inline pte_t ptep_xchg_commit(struct mm_struct *mm, - unsigned long addr, pte_t *ptep, - pgste_t pgste, pte_t old, pte_t new) -{ - if (mm_has_pgste(mm)) { - if (pte_val(old) & _PAGE_INVALID) - pgste_set_key(ptep, pgste, new, mm); - if (pte_val(new) & _PAGE_INVALID) { - pgste =3D pgste_update_all(old, pgste, mm); - if ((pgste_val(pgste) & _PGSTE_GPS_USAGE_MASK) =3D=3D - _PGSTE_GPS_USAGE_UNUSED) - old =3D set_pte_bit(old, __pgprot(_PAGE_UNUSED)); - } - pgste =3D pgste_set_pte(ptep, pgste, new); - pgste_set_unlock(ptep, pgste); - } else { - set_pte(ptep, new); - } - return old; -} - pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t new) { - pgste_t pgste; pte_t old; - int nodat; =20 preempt_disable(); - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_direct(mm, addr, ptep, nodat); - old =3D ptep_xchg_commit(mm, addr, ptep, pgste, old, new); + old =3D ptep_flush_direct(mm, addr, ptep, 1); + set_pte(ptep, new); preempt_enable(); return old; } @@ -313,15 +156,11 @@ EXPORT_SYMBOL(ptep_reset_dat_prot); pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t new) { - pgste_t pgste; pte_t old; - int nodat; =20 preempt_disable(); - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_lazy(mm, addr, ptep, nodat); - old =3D ptep_xchg_commit(mm, addr, ptep, pgste, old, new); + old =3D ptep_flush_lazy(mm, addr, ptep, 1); + set_pte(ptep, new); preempt_enable(); return old; } @@ -330,43 +169,20 @@ EXPORT_SYMBOL(ptep_xchg_lazy); pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long add= r, 
pte_t *ptep) { - pgste_t pgste; - pte_t old; - int nodat; - struct mm_struct *mm =3D vma->vm_mm; - - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_lazy(mm, addr, ptep, nodat); - if (mm_has_pgste(mm)) { - pgste =3D pgste_update_all(old, pgste, mm); - pgste_set(ptep, pgste); - } - return old; + return ptep_flush_lazy(vma->vm_mm, addr, ptep, 1); } =20 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long add= r, pte_t *ptep, pte_t old_pte, pte_t pte) { - pgste_t pgste; - struct mm_struct *mm =3D vma->vm_mm; - - if (mm_has_pgste(mm)) { - pgste =3D pgste_get(ptep); - pgste_set_key(ptep, pgste, pte, mm); - pgste =3D pgste_set_pte(ptep, pgste, pte); - pgste_set_unlock(ptep, pgste); - } else { - set_pte(ptep, pte); - } + set_pte(ptep, pte); } =20 static inline void pmdp_idte_local(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { if (machine_has_tlb_guest()) - __pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, - mm->context.asce, IDTE_LOCAL); + __pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, mm->context.asce, = IDTE_LOCAL); else __pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL); } @@ -420,40 +236,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *= mm, return old; } =20 -#ifdef CONFIG_PGSTE -static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pm= dp) -{ - struct vm_area_struct *vma; - pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - - /* We need a valid VMA, otherwise this is clearly a fault. */ - vma =3D vma_lookup(mm, addr); - if (!vma) - return -EFAULT; - - pgd =3D pgd_offset(mm, addr); - if (!pgd_present(*pgd)) - return -ENOENT; - - p4d =3D p4d_offset(pgd, addr); - if (!p4d_present(*p4d)) - return -ENOENT; - - pud =3D pud_offset(p4d, addr); - if (!pud_present(*pud)) - return -ENOENT; - - /* Large PUDs are not supported yet. */ - if (pud_leaf(*pud)) - return -EFAULT; - - *pmdp =3D pmd_offset(pud, addr); - return 0; -} -#endif - pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t new) { @@ -571,598 +353,3 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struc= t *mm, pmd_t *pmdp) return pgtable; } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - -#ifdef CONFIG_PGSTE -void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t entry) -{ - pgste_t pgste; - - /* the mm_has_pgste() check is done in set_pte_at() */ - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgste =3D clear_pgste_bit(pgste, _PGSTE_GPS_ZERO); - pgste_set_key(ptep, pgste, entry, mm); - pgste =3D pgste_set_pte(ptep, pgste, entry); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -{ - pgste_t pgste; - - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgste =3D set_pgste_bit(pgste, PGSTE_IN_BIT); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -/** - * ptep_force_prot - change access rights of a locked pte - * @mm: pointer to the process mm_struct - * @addr: virtual address in the guest address space - * @ptep: pointer to the page table entry - * @prot: indicates guest access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bit: pgste bit to set (e.g. for notification) - * - * Returns 0 if the access rights were changed and -EAGAIN if the current - * and requested access rights are incompatible. 
- */ -int ptep_force_prot(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, int prot, unsigned long bit) -{ - pte_t entry; - pgste_t pgste; - int pte_i, pte_p, nodat; - - pgste =3D pgste_get_lock(ptep); - entry =3D *ptep; - /* Check pte entry after all locks have been acquired */ - pte_i =3D pte_val(entry) & _PAGE_INVALID; - pte_p =3D pte_val(entry) & _PAGE_PROTECT; - if ((pte_i && (prot !=3D PROT_NONE)) || - (pte_p && (prot & PROT_WRITE))) { - pgste_set_unlock(ptep, pgste); - return -EAGAIN; - } - /* Change access rights and set pgste bit */ - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - if (prot =3D=3D PROT_NONE && !pte_i) { - ptep_flush_direct(mm, addr, ptep, nodat); - pgste =3D pgste_update_all(entry, pgste, mm); - entry =3D set_pte_bit(entry, __pgprot(_PAGE_INVALID)); - } - if (prot =3D=3D PROT_READ && !pte_p) { - ptep_flush_direct(mm, addr, ptep, nodat); - entry =3D clear_pte_bit(entry, __pgprot(_PAGE_INVALID)); - entry =3D set_pte_bit(entry, __pgprot(_PAGE_PROTECT)); - } - pgste =3D set_pgste_bit(pgste, bit); - pgste =3D pgste_set_pte(ptep, pgste, entry); - pgste_set_unlock(ptep, pgste); - return 0; -} - -int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr, - pte_t *sptep, pte_t *tptep, pte_t pte) -{ - pgste_t spgste, tpgste; - pte_t spte, tpte; - int rc =3D -EAGAIN; - - if (!(pte_val(*tptep) & _PAGE_INVALID)) - return 0; /* already shadowed */ - spgste =3D pgste_get_lock(sptep); - spte =3D *sptep; - if (!(pte_val(spte) & _PAGE_INVALID) && - !((pte_val(spte) & _PAGE_PROTECT) && - !(pte_val(pte) & _PAGE_PROTECT))) { - spgste =3D set_pgste_bit(spgste, PGSTE_VSIE_BIT); - tpgste =3D pgste_get_lock(tptep); - tpte =3D __pte((pte_val(spte) & PAGE_MASK) | - (pte_val(pte) & _PAGE_PROTECT)); - /* don't touch the storage key - it belongs to parent pgste */ - tpgste =3D pgste_set_pte(tptep, tpgste, tpte); - pgste_set_unlock(tptep, tpgste); - rc =3D 1; - } - pgste_set_unlock(sptep, spgste); - return rc; -} - -void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *p= tep) -{ - pgste_t pgste; - int nodat; - - pgste =3D pgste_get_lock(ptep); - /* notifier is called by the caller */ - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - ptep_flush_direct(mm, saddr, ptep, nodat); - /* don't touch the storage key - it belongs to parent pgste */ - pgste =3D pgste_set_pte(ptep, pgste, __pte(_PAGE_INVALID)); - pgste_set_unlock(ptep, pgste); -} - -static void ptep_zap_softleaf_entry(struct mm_struct *mm, softleaf_t entry) -{ - if (softleaf_is_swap(entry)) - dec_mm_counter(mm, MM_SWAPENTS); - else if (softleaf_is_migration(entry)) { - struct folio *folio =3D softleaf_to_folio(entry); - - dec_mm_counter(mm, mm_counter(folio)); - } - free_swap_and_cache(entry); -} - -void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, int reset) -{ - unsigned long pgstev; - pgste_t pgste; - pte_t pte; - - /* Zap unused and logically-zero pages */ - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgstev =3D pgste_val(pgste); - pte =3D *ptep; - if (!reset && pte_swap(pte) && - ((pgstev & _PGSTE_GPS_USAGE_MASK) =3D=3D _PGSTE_GPS_USAGE_UNUSED || - (pgstev & _PGSTE_GPS_ZERO))) { - ptep_zap_softleaf_entry(mm, softleaf_from_pte(pte)); - pte_clear(mm, addr, ptep); - } - if (reset) - pgste =3D clear_pgste_bit(pgste, _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODA= T); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -{ - unsigned long ptev; - pgste_t pgste; - - /* Clear storage key ACC 
and F, but set R/C */
-	preempt_disable();
-	pgste = pgste_get_lock(ptep);
-	pgste = clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT);
-	pgste = set_pgste_bit(pgste, PGSTE_GR_BIT | PGSTE_GC_BIT);
-	ptev = pte_val(*ptep);
-	if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE))
-		page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 0);
-	pgste_set_unlock(ptep, pgste);
-	preempt_enable();
-}
-
-/*
- * Test and reset if a guest page is dirty
- */
-bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long addr,
-			    pte_t *ptep)
-{
-	pgste_t pgste;
-	pte_t pte;
-	bool dirty;
-	int nodat;
-
-	pgste = pgste_get_lock(ptep);
-	dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
-	pgste = clear_pgste_bit(pgste, PGSTE_UC_BIT);
-	pte = *ptep;
-	if (dirty && (pte_val(pte) & _PAGE_PRESENT)) {
-		pgste = pgste_pte_notify(mm, addr, ptep, pgste);
-		nodat = !!(pgste_val(pgste) & _PGSTE_GPS_NODAT);
-		ptep_ipte_global(mm, addr, ptep, nodat);
-		if (machine_has_esop() || !(pte_val(pte) & _PAGE_WRITE))
-			pte = set_pte_bit(pte, __pgprot(_PAGE_PROTECT));
-		else
-			pte = set_pte_bit(pte, __pgprot(_PAGE_INVALID));
-		set_pte(ptep, pte);
-	}
-	pgste_set_unlock(ptep, pgste);
-	return dirty;
-}
-EXPORT_SYMBOL_GPL(ptep_test_and_clear_uc);
-
-int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			  unsigned char key, bool nq)
-{
-	unsigned long keyul, paddr;
-	spinlock_t *ptl;
-	pgste_t old, new;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	/*
-	 * If we don't have a PTE table and if there is no huge page mapped,
-	 * we can ignore attempts to set the key to 0, because it already is 0.
-	 */
-	switch (pmd_lookup(mm, addr, &pmdp)) {
-	case -ENOENT:
-		return key ? -EFAULT : 0;
-	case 0:
-		break;
-	default:
-		return -EFAULT;
-	}
-again:
-	ptl = pmd_lock(mm, pmdp);
-	if (!pmd_present(*pmdp)) {
-		spin_unlock(ptl);
-		return key ? -EFAULT : 0;
-	}
-
-	if (pmd_leaf(*pmdp)) {
-		paddr = pmd_val(*pmdp) & HPAGE_MASK;
-		paddr |= addr & ~HPAGE_MASK;
-		/*
-		 * Huge pmds need quiescing operations, they are
-		 * always mapped.
-		 */
-		page_set_storage_key(paddr, key, 1);
-		spin_unlock(ptl);
-		return 0;
-	}
-	spin_unlock(ptl);
-
-	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-	if (!ptep)
-		goto again;
-	new = old = pgste_get_lock(ptep);
-	new = clear_pgste_bit(new, PGSTE_GR_BIT | PGSTE_GC_BIT |
-			      PGSTE_ACC_BITS | PGSTE_FP_BIT);
-	keyul = (unsigned long) key;
-	new = set_pgste_bit(new, (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) << 48);
-	new = set_pgste_bit(new, (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56);
-	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-		unsigned long bits, skey;
-
-		paddr = pte_val(*ptep) & PAGE_MASK;
-		skey = (unsigned long) page_get_storage_key(paddr);
-		bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
-		skey = key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
-		/* Set storage key ACC and FP */
-		page_set_storage_key(paddr, skey, !nq);
-		/* Merge host changed & referenced into pgste */
-		new = set_pgste_bit(new, bits << 52);
-	}
-	/* changing the guest storage key is considered a change of the page */
-	if ((pgste_val(new) ^ pgste_val(old)) &
-	    (PGSTE_ACC_BITS | PGSTE_FP_BIT | PGSTE_GR_BIT | PGSTE_GC_BIT))
-		new = set_pgste_bit(new, PGSTE_UC_BIT);
-
-	pgste_set_unlock(ptep, new);
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(set_guest_storage_key);
-
-/*
- * Conditionally set a guest storage key (handling csske).
- * oldkey will be updated when either mr or mc is set and a pointer is given.
- *
- * Returns 0 if a guests storage key update wasn't necessary, 1 if the guest
- * storage key was updated and -EFAULT on access errors.
- */
-int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			       unsigned char key, unsigned char *oldkey,
-			       bool nq, bool mr, bool mc)
-{
-	unsigned char tmp, mask = _PAGE_ACC_BITS | _PAGE_FP_BIT;
-	int rc;
-
-	/* we can drop the pgste lock between getting and setting the key */
-	if (mr | mc) {
-		rc = get_guest_storage_key(current->mm, addr, &tmp);
-		if (rc)
-			return rc;
-		if (oldkey)
-			*oldkey = tmp;
-		if (!mr)
-			mask |= _PAGE_REFERENCED;
-		if (!mc)
-			mask |= _PAGE_CHANGED;
-		if (!((tmp ^ key) & mask))
-			return 0;
-	}
-	rc = set_guest_storage_key(current->mm, addr, key, nq);
-	return rc < 0 ? rc : 1;
-}
-EXPORT_SYMBOL(cond_set_guest_storage_key);
-
-/*
- * Reset a guest reference bit (rrbe), returning the reference and changed bit.
- *
- * Returns < 0 in case of error, otherwise the cc to be reported to the guest.
- */
-int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
-{
-	spinlock_t *ptl;
-	unsigned long paddr;
-	pgste_t old, new;
-	pmd_t *pmdp;
-	pte_t *ptep;
-	int cc = 0;
-
-	/*
-	 * If we don't have a PTE table and if there is no huge page mapped,
-	 * the storage key is 0 and there is nothing for us to do.
-	 */
-	switch (pmd_lookup(mm, addr, &pmdp)) {
-	case -ENOENT:
-		return 0;
-	case 0:
-		break;
-	default:
-		return -EFAULT;
-	}
-again:
-	ptl = pmd_lock(mm, pmdp);
-	if (!pmd_present(*pmdp)) {
-		spin_unlock(ptl);
-		return 0;
-	}
-
-	if (pmd_leaf(*pmdp)) {
-		paddr = pmd_val(*pmdp) & HPAGE_MASK;
-		paddr |= addr & ~HPAGE_MASK;
-		cc = page_reset_referenced(paddr);
-		spin_unlock(ptl);
-		return cc;
-	}
-	spin_unlock(ptl);
-
-	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-	if (!ptep)
-		goto again;
-	new = old = pgste_get_lock(ptep);
-	/* Reset guest reference bit only */
-	new = clear_pgste_bit(new, PGSTE_GR_BIT);
-
-	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-		paddr = pte_val(*ptep) & PAGE_MASK;
-		cc = page_reset_referenced(paddr);
-		/* Merge real referenced bit into host-set */
-		new = set_pgste_bit(new, ((unsigned long)cc << 53) & PGSTE_HR_BIT);
-	}
-	/* Reflect guest's logical view, not physical */
-	cc |= (pgste_val(old) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 49;
-	/* Changing the guest storage key is considered a change of the page */
-	if ((pgste_val(new) ^ pgste_val(old)) & PGSTE_GR_BIT)
-		new = set_pgste_bit(new, PGSTE_UC_BIT);
-
-	pgste_set_unlock(ptep, new);
-	pte_unmap_unlock(ptep, ptl);
-	return cc;
-}
-EXPORT_SYMBOL(reset_guest_reference_bit);
-
-int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-			  unsigned char *key)
-{
-	unsigned long paddr;
-	spinlock_t *ptl;
-	pgste_t pgste;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	/*
-	 * If we don't have a PTE table and if there is no huge page mapped,
-	 * the storage key is 0.
-	 */
-	*key = 0;
-
-	switch (pmd_lookup(mm, addr, &pmdp)) {
-	case -ENOENT:
-		return 0;
-	case 0:
-		break;
-	default:
-		return -EFAULT;
-	}
-again:
-	ptl = pmd_lock(mm, pmdp);
-	if (!pmd_present(*pmdp)) {
-		spin_unlock(ptl);
-		return 0;
-	}
-
-	if (pmd_leaf(*pmdp)) {
-		paddr = pmd_val(*pmdp) & HPAGE_MASK;
-		paddr |= addr & ~HPAGE_MASK;
-		*key = page_get_storage_key(paddr);
-		spin_unlock(ptl);
-		return 0;
-	}
-	spin_unlock(ptl);
-
-	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
-	if (!ptep)
-		goto again;
-	pgste = pgste_get_lock(ptep);
-	*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
-	paddr = pte_val(*ptep) & PAGE_MASK;
-	if (!(pte_val(*ptep) & _PAGE_INVALID))
-		*key = page_get_storage_key(paddr);
-	/* Reflect guest's logical view, not physical */
-	*key |= (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
-	pgste_set_unlock(ptep, pgste);
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(get_guest_storage_key);
-
-/**
- * pgste_perform_essa - perform ESSA actions on the PGSTE.
- * @mm: the memory context. It must have PGSTEs, no check is performed here!
- * @hva: the host virtual address of the page whose PGSTE is to be processed
- * @orc: the specific action to perform, see the ESSA_SET_* macros.
- * @oldpte: the PTE will be saved there if the pointer is not NULL.
- * @oldpgste: the old PGSTE will be saved there if the pointer is not NULL.
- *
- * Return: 1 if the page is to be added to the CBRL, otherwise 0,
- *	   or < 0 in case of error. -EINVAL is returned for invalid values
- *	   of orc, -EFAULT for invalid addresses.
- */
-int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
-		       unsigned long *oldpte, unsigned long *oldpgste)
-{
-	struct vm_area_struct *vma;
-	unsigned long pgstev;
-	spinlock_t *ptl;
-	pgste_t pgste;
-	pte_t *ptep;
-	int res = 0;
-
-	WARN_ON_ONCE(orc > ESSA_MAX);
-	if (unlikely(orc > ESSA_MAX))
-		return -EINVAL;
-
-	vma = vma_lookup(mm, hva);
-	if (!vma || is_vm_hugetlb_page(vma))
-		return -EFAULT;
-	ptep = get_locked_pte(mm, hva, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-	pgste = pgste_get_lock(ptep);
-	pgstev = pgste_val(pgste);
-	if (oldpte)
-		*oldpte = pte_val(*ptep);
-	if (oldpgste)
-		*oldpgste = pgstev;
-
-	switch (orc) {
-	case ESSA_GET_STATE:
-		break;
-	case ESSA_SET_STABLE:
-		pgstev &= ~(_PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT);
-		pgstev |= _PGSTE_GPS_USAGE_STABLE;
-		break;
-	case ESSA_SET_UNUSED:
-		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-		pgstev |= _PGSTE_GPS_USAGE_UNUSED;
-		if (pte_val(*ptep) & _PAGE_INVALID)
-			res = 1;
-		break;
-	case ESSA_SET_VOLATILE:
-		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-		pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
-		if (pte_val(*ptep) & _PAGE_INVALID)
-			res = 1;
-		break;
-	case ESSA_SET_POT_VOLATILE:
-		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-		if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-			pgstev |= _PGSTE_GPS_USAGE_POT_VOLATILE;
-			break;
-		}
-		if (pgstev & _PGSTE_GPS_ZERO) {
-			pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
-			break;
-		}
-		if (!(pgstev & PGSTE_GC_BIT)) {
-			pgstev |= _PGSTE_GPS_USAGE_VOLATILE;
-			res = 1;
-			break;
-		}
-		break;
-	case ESSA_SET_STABLE_RESIDENT:
-		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-		pgstev |= _PGSTE_GPS_USAGE_STABLE;
-		/*
-		 * Since the resident state can go away any time after this
-		 * call, we will not make this page resident. We can revisit
-		 * this decision if a guest will ever start using this.
-		 */
-		break;
-	case ESSA_SET_STABLE_IF_RESIDENT:
-		if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-			pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-			pgstev |= _PGSTE_GPS_USAGE_STABLE;
-		}
-		break;
-	case ESSA_SET_STABLE_NODAT:
-		pgstev &= ~_PGSTE_GPS_USAGE_MASK;
-		pgstev |= _PGSTE_GPS_USAGE_STABLE | _PGSTE_GPS_NODAT;
-		break;
-	default:
-		/* we should never get here! */
-		break;
-	}
-	/* If we are discarding a page, set it to logical zero */
-	if (res)
-		pgstev |= _PGSTE_GPS_ZERO;
-
-	pgste = __pgste(pgstev);
-	pgste_set_unlock(ptep, pgste);
-	pte_unmap_unlock(ptep, ptl);
-	return res;
-}
-EXPORT_SYMBOL(pgste_perform_essa);
-
-/**
- * set_pgste_bits - set specific PGSTE bits.
- * @mm: the memory context. It must have PGSTEs, no check is performed here!
- * @hva: the host virtual address of the page whose PGSTE is to be processed
- * @bits: a bitmask representing the bits that will be touched
- * @value: the values of the bits to be written. Only the bits in the mask
- *	   will be written.
- *
- * Return: 0 on success, < 0 in case of error.
- */
-int set_pgste_bits(struct mm_struct *mm, unsigned long hva,
-		   unsigned long bits, unsigned long value)
-{
-	struct vm_area_struct *vma;
-	spinlock_t *ptl;
-	pgste_t new;
-	pte_t *ptep;
-
-	vma = vma_lookup(mm, hva);
-	if (!vma || is_vm_hugetlb_page(vma))
-		return -EFAULT;
-	ptep = get_locked_pte(mm, hva, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-	new = pgste_get_lock(ptep);
-
-	new = clear_pgste_bit(new, bits);
-	new = set_pgste_bit(new, value & bits);
-
-	pgste_set_unlock(ptep, new);
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(set_pgste_bits);
-
-/**
- * get_pgste - get the current PGSTE for the given address.
- * @mm: the memory context. It must have PGSTEs, no check is performed here!
- * @hva: the host virtual address of the page whose PGSTE is to be processed
- * @pgstep: will be written with the current PGSTE for the given address.
- *
- * Return: 0 on success, < 0 in case of error.
- */
-int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep)
-{
-	struct vm_area_struct *vma;
-	spinlock_t *ptl;
-	pte_t *ptep;
-
-	vma = vma_lookup(mm, hva);
-	if (!vma || is_vm_hugetlb_page(vma))
-		return -EFAULT;
-	ptep = get_locked_pte(mm, hva, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-	*pgstep = pgste_val(pgste_get(ptep));
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(get_pgste);
-#endif
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386..be3a2a603fb1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -343,15 +343,6 @@ int hugepage_madvise(struct vm_area_struct *vma,
 {
 	switch (advice) {
 	case MADV_HUGEPAGE:
-#ifdef CONFIG_S390
-		/*
-		 * qemu blindly sets MADV_HUGEPAGE on all allocations, but s390
-		 * can't handle this properly after s390_enable_sie, so we simply
-		 * ignore the madvise to prevent qemu from causing a SIGSEGV.
-		 */
-		if (mm_has_pgste(vma->vm_mm))
-			return 0;
-#endif
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
 		/*
-- 
2.52.0
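
For reference when reading the storage-key shifts in the removed set_guest_storage_key() above: the s390 storage key is a single byte, and the _PAGE_* masks used there line up with its fields (ACC in the high nibble, then fetch protection, reference, and change; in the PGSTE, ACC/FP are shifted by 56, guest R/C by 48, host R/C by 52). A minimal sketch of the key byte follows; the helper names are illustrative and not part of the kernel:

#include <stdint.h>

/*
 * s390 storage key byte, as used by SSKE/ISKE and mirrored into the
 * PGSTE by the code removed above:
 *
 *   0xf0  ACC - access-control bits   (_PAGE_ACC_BITS)
 *   0x08  F   - fetch-protection bit  (_PAGE_FP_BIT)
 *   0x04  R   - reference bit         (_PAGE_REFERENCED)
 *   0x02  C   - change bit            (_PAGE_CHANGED)
 */
static inline uint8_t skey_acc(uint8_t key)	   { return key >> 4; }
static inline int skey_fetch_prot(uint8_t key)	   { return !!(key & 0x08); }
static inline int skey_referenced(uint8_t key)	   { return !!(key & 0x04); }
static inline int skey_changed(uint8_t key)	   { return !!(key & 0x02); }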
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 27/28] KVM: s390: Enable 1M pages for gmap
Date: Mon, 22 Dec 2025 17:50:32 +0100
Message-ID: <20251222165033.162329-28-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

While userspace is allowed to have pages of any size, the new gmap would
always use 4k pages to back the guest. Enable 1M pages for gmap. This
allows 1M pages to be used to back a guest when userspace is using 1M
pages for the corresponding addresses (e.g. THP or hugetlbfs).

Remove the limitation that disallowed having nested guests and hugepages
at the same time.
Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/gmap.c     | 2 +-
 arch/s390/kvm/kvm-s390.c | 6 +-----
 arch/s390/kvm/pv.c       | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 0abed178dde0..19392f3b398b 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -617,7 +617,7 @@ static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
 
 static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
 {
-	return false;
+	return test_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &gmap->flags);
 }
 
 int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a714037cef31..47f2794af2fb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -851,6 +851,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 			r = -EINVAL;
 		else {
 			r = 0;
+			set_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &kvm->arch.gmap->flags);
 			/*
 			 * We might have to create fake 4k page
 			 * tables. To avoid that the hardware works on
@@ -5739,11 +5740,6 @@ static int __init kvm_s390_init(void)
 		return -ENODEV;
 	}
 
-	if (nested && hpage) {
-		pr_info("A KVM host that supports nesting cannot back its KVM guests with huge pages\n");
-		return -EINVAL;
-	}
-
 	for (i = 0; i < 16; i++)
 		kvm_s390_fac_base[i] |= stfle_fac_list[i] & nonhyp_mask(i);
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index b6809ee0bfa5..456c96a1c44a 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -721,6 +721,9 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	uvcb.flags.ap_allow_instr = kvm->arch.model.uv_feat_guest.ap;
 	uvcb.flags.ap_instr_intr = kvm->arch.model.uv_feat_guest.ap_intr;
 
+	clear_bit(GMAP_FLAG_ALLOW_HPAGE_1M, &kvm->arch.gmap->flags);
+	gmap_split_huge_pages(kvm->arch.gmap);
+
 	cc = uv_call_sched(0, (u64)&uvcb);
 	*rc = uvcb.header.rc;
 	*rrc = uvcb.header.rrc;
-- 
2.52.0
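
The one-line change to gmap_1m_allowed() above is the only gate the fault path needs to consult when picking a backing size. As a rough sketch, not code from this series, a fault handler could combine that gate with the size of the userspace mapping; host_folio_order(), gmap_link_1m(), gmap_link_4k() and the f->gfn field are assumed names for illustration:

/*
 * Illustrative only: choosing between a 1M segment mapping and 4k page
 * tables once GMAP_FLAG_ALLOW_HPAGE_1M can be set. The helpers below
 * are hypothetical stand-ins, not functions from this series.
 */
static int gmap_link_best_effort(struct kvm_s390_mmu_cache *mc,
				 struct gmap *gmap, struct guest_fault *f)
{
	/*
	 * A 1M mapping needs both a 1M host page (THP or hugetlbfs) and
	 * the gmap flag, which is set when userspace enables the
	 * KVM_CAP_S390_HPAGE_1M capability and cleared again by
	 * kvm_s390_pv_init_vm() for protected guests.
	 */
	if (gmap_1m_allowed(gmap, f->gfn) &&
	    host_folio_order(f) >= HPAGE_SHIFT - PAGE_SHIFT)
		return gmap_link_1m(mc, gmap, f);

	/* Otherwise fall back to ordinary 4k page table entries. */
	return gmap_link_4k(mc, gmap, f);
}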
From nobody Sat Feb 7 06:13:32 2026
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v6 28/28] KVM: s390: Storage key manipulation IOCTL
Date: Mon, 22 Dec 2025 17:50:33 +0100
Message-ID: <20251222165033.162329-29-imbrenda@linux.ibm.com>
In-Reply-To: <20251222165033.162329-1-imbrenda@linux.ibm.com>
References: <20251222165033.162329-1-imbrenda@linux.ibm.com>

Add a new IOCTL to allow userspace to manipulate storage keys directly.
This will make it easier to write selftests related to storage keys.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/kvm-s390.c | 57 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h | 10 +++++++
 2 files changed, 67 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 47f2794af2fb..7f28b556b460 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -554,6 +554,37 @@ static void __kvm_s390_exit(void)
 	debug_unregister(kvm_s390_dbf_uv);
 }
 
+static int kvm_s390_keyop(struct kvm_s390_mmu_cache *mc, struct kvm *kvm, int op,
+			  unsigned long addr, union skey skey)
+{
+	union asce asce = kvm->arch.gmap->asce;
+	gfn_t gfn = gpa_to_gfn(addr);
+	int r;
+
+	guard(read_lock)(&kvm->mmu_lock);
+
+	switch (op) {
+	case KVM_S390_KEYOP_SSKE:
+		r = dat_cond_set_storage_key(mc, asce, gfn, skey, &skey, 0, 0, 0);
+		if (r >= 0)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_ISKE:
+		r = dat_get_storage_key(asce, gfn, &skey);
+		if (!r)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_RRBE:
+		r = dat_reset_reference_bit(asce, gfn);
+		if (r > 0)
+			return r << 1;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return r;
+}
+
 /* Section: device related */
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
@@ -2931,6 +2962,32 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			r = -EFAULT;
 		break;
 	}
+	case KVM_S390_KEYOP: {
+		struct kvm_s390_mmu_cache *mc;
+		struct kvm_s390_keyop kop;
+		union skey skey;
+
+		if (copy_from_user(&kop, argp, sizeof(kop))) {
+			r = -EFAULT;
+			break;
+		}
+		skey.skey = kop.key;
+
+		mc = kvm_s390_new_mmu_cache();
+		if (!mc)
+			return -ENOMEM;
+
+		r = kvm_s390_keyop(mc, kvm, kop.operation, kop.user_addr, skey);
+		kvm_s390_free_mmu_cache(mc);
+		if (r < 0)
+			break;
+
+		kop.key = r;
+		r = 0;
+		if (copy_to_user(argp, &kop, sizeof(kop)))
+			r = -EFAULT;
+		break;
+	}
 	case KVM_S390_ZPCI_OP: {
 		struct kvm_s390_zpci_op args;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..845417e56778 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1219,6 +1219,15 @@ struct kvm_vfio_spapr_tce {
 	__s32 tablefd;
 };
 
+#define KVM_S390_KEYOP_SSKE	0x01
+#define KVM_S390_KEYOP_ISKE	0x02
+#define KVM_S390_KEYOP_RRBE	0x03
+struct kvm_s390_keyop {
+	__u64 user_addr;
+	__u8 key;
+	__u8 operation;
+};
+
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
@@ -1238,6 +1247,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_S390_UCAS_MAP        _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP      _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
 #define KVM_S390_VCPU_FAULT	 _IOW(KVMIO, 0x52, unsigned long)
+#define KVM_S390_KEYOP		 _IOWR(KVMIO, 0x53, struct kvm_s390_keyop)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP	  _IO(KVMIO, 0x60)
-- 
2.52.0
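
To illustrate the intended selftest usage, here is a minimal userspace sketch that reads a guest page's storage key through the new ioctl. It assumes a kernel with this patch applied; the uapi definitions are repeated locally in case the installed headers predate it, and read_guest_storage_key() and vm_fd are illustrative names. Note that the handler treats user_addr as a guest address (it is converted with gpa_to_gfn()) and writes the resulting key byte back into the key field:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* From this patch; repeated here in case <linux/kvm.h> lacks them. */
#ifndef KVM_S390_KEYOP
struct kvm_s390_keyop {
	__u64 user_addr;
	__u8 key;
	__u8 operation;
};
#define KVM_S390_KEYOP_SSKE	0x01
#define KVM_S390_KEYOP_ISKE	0x02
#define KVM_S390_KEYOP_RRBE	0x03
#define KVM_S390_KEYOP		_IOWR(KVMIO, 0x53, struct kvm_s390_keyop)
#endif

/*
 * Read (ISKE) the storage key of the guest page at gaddr.
 * Returns the key byte (ACC/F/R/C), or -1 with errno set on failure.
 * vm_fd must be an open KVM VM file descriptor.
 */
static int read_guest_storage_key(int vm_fd, uint64_t gaddr)
{
	struct kvm_s390_keyop kop = {
		.user_addr = gaddr,
		.operation = KVM_S390_KEYOP_ISKE,
	};

	if (ioctl(vm_fd, KVM_S390_KEYOP, &kop) < 0)
		return -1;
	return kop.key;
}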