From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 01/23] KVM: s390: Refactor pgste lock and unlock functions
Date: Thu, 20 Nov 2025 18:15:22 +0100
Message-ID: <20251120171544.96841-2-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Move the pgste lock and unlock functions back into mm/pgtable.c and
duplicate them in mm/gmap_helpers.c, to avoid function name collisions
later on.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 22 ----------------------
 arch/s390/mm/gmap_helpers.c     | 23 ++++++++++++++++++++++-
 arch/s390/mm/pgtable.c          | 23 ++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 6663f1619abb..528ce3611e53 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -2053,26 +2053,4 @@ static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
 	return res;
 }
 
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-	unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-
-	do {
-		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-	} while (value & PGSTE_PCL_BIT);
-	value |= PGSTE_PCL_BIT;
-#endif
-	return __pgste(value);
-}
-
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-	barrier();
-	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
-
 #endif /* _S390_PAGE_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index d4c3c36855e2..e14a63119e30 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -15,7 +15,6 @@
 #include
 #include
 #include
-#include
 
 /**
  * ptep_zap_swap_entry() - discard a swap entry.
@@ -35,6 +34,28 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 		free_swap_and_cache(entry);
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 /**
  * gmap_helper_zap_one_page() - discard a page if it was swapped.
  * @mm: the mm
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 05974304d622..d0e8579d2669 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
@@ -116,6 +115,28 @@ static inline pte_t ptep_flush_lazy(struct mm_struct *mm,
 	return old;
 }
 
+static inline pgste_t pgste_get_lock(pte_t *ptep)
+{
+	unsigned long value = 0;
+#ifdef CONFIG_PGSTE
+	unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+	do {
+		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
+	} while (value & PGSTE_PCL_BIT);
+	value |= PGSTE_PCL_BIT;
+#endif
+	return __pgste(value);
+}
+
+static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
+{
+#ifdef CONFIG_PGSTE
+	barrier();
+	WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
+#endif
+}
+
 static inline pgste_t pgste_get(pte_t *ptep)
 {
 	unsigned long pgste = 0;
-- 
2.51.1
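
For context, the two helpers duplicated above implement a small per-PTE
spinlock: pgste_get_lock() spins until it can atomically set PGSTE_PCL_BIT
in the page status table entry (PGSTE) that sits right behind the PTE, and
pgste_set_unlock() stores the updated PGSTE back with the bit cleared. A
minimal sketch of the usual calling pattern (illustrative only, not part of
the patch; assumes CONFIG_PGSTE and a valid mapped ptep, and uses
PGSTE_IN_BIT merely as an example of a bit updated under the lock):

	pgste_t pgste;

	pgste = pgste_get_lock(ptep);		/* spin until the PCL bit is ours */
	pgste_val(pgste) |= PGSTE_IN_BIT;	/* modify the PGSTE while locked */
	pgste_set_unlock(ptep, pgste);		/* write back, clearing the PCL bit */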
From nobody Tue Dec 2 01:51:36 2025

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 02/23] KVM: s390: add P bit in table entry bitfields, move union vaddress
Date: Thu, 20 Nov 2025 18:15:23 +0100
Message-ID: <20251120171544.96841-3-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Add the P bit to the hardware definition of region 3 and segment table
entries. Move union vaddress from kvm/gaccess.c to asm/dat-bits.h.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
---
 arch/s390/include/asm/dat-bits.h | 32 ++++++++++++++++++++++++++++++--
 arch/s390/kvm/gaccess.c          | 26 --------------------------
 2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/s390/include/asm/dat-bits.h b/arch/s390/include/asm/dat-bits.h
index 8d65eec2f124..c40874e0e426 100644
--- a/arch/s390/include/asm/dat-bits.h
+++ b/arch/s390/include/asm/dat-bits.h
@@ -9,6 +9,32 @@
 #ifndef _S390_DAT_BITS_H
 #define _S390_DAT_BITS_H
 
+/*
+ * vaddress union in order to easily decode a virtual address into its
+ * region first index, region second index etc. parts.
+ */
+union vaddress {
+	unsigned long addr;
+	struct {
+		unsigned long rfx : 11;
+		unsigned long rsx : 11;
+		unsigned long rtx : 11;
+		unsigned long sx  : 11;
+		unsigned long px  : 8;
+		unsigned long bx  : 12;
+	};
+	struct {
+		unsigned long rfx01 : 2;
+		unsigned long	    : 9;
+		unsigned long rsx01 : 2;
+		unsigned long	    : 9;
+		unsigned long rtx01 : 2;
+		unsigned long	    : 9;
+		unsigned long sx01  : 2;
+		unsigned long	    : 29;
+	};
+};
+
 union asce {
 	unsigned long val;
 	struct {
@@ -98,7 +124,8 @@ union region3_table_entry {
 	struct {
 		unsigned long	: 53;
 		unsigned long fc: 1;	/* Format-Control */
-		unsigned long	: 4;
+		unsigned long p	: 1;	/* DAT-Protection Bit */
+		unsigned long	: 3;
 		unsigned long i	: 1;	/* Region-Invalid Bit */
 		unsigned long cr: 1;	/* Common-Region Bit */
 		unsigned long tt: 2;	/* Table-Type Bits */
@@ -140,7 +167,8 @@ union segment_table_entry {
 	struct {
 		unsigned long	: 53;
 		unsigned long fc: 1;	/* Format-Control */
-		unsigned long	: 4;
+		unsigned long p	: 1;	/* DAT-Protection Bit */
+		unsigned long	: 3;
 		unsigned long i	: 1;	/* Segment-Invalid Bit */
 		unsigned long cs: 1;	/* Common-Segment Bit */
 		unsigned long tt: 2;	/* Table-Type Bits */
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 21c2e61fece4..d691fac1cc12 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -20,32 +20,6 @@
 
 #define GMAP_SHADOW_FAKE_TABLE 1ULL
 
-/*
- * vaddress union in order to easily decode a virtual address into its
- * region first index, region second index etc. parts.
- */
-union vaddress {
-	unsigned long addr;
-	struct {
-		unsigned long rfx : 11;
-		unsigned long rsx : 11;
-		unsigned long rtx : 11;
-		unsigned long sx  : 11;
-		unsigned long px  : 8;
-		unsigned long bx  : 12;
-	};
-	struct {
-		unsigned long rfx01 : 2;
-		unsigned long	    : 9;
-		unsigned long rsx01 : 2;
-		unsigned long	    : 9;
-		unsigned long rtx01 : 2;
-		unsigned long	    : 9;
-		unsigned long sx01  : 2;
-		unsigned long	    : 29;
-	};
-};
-
 /*
  * raddress union which will contain the result (real or absolute address)
  * after a page table walk. The rfaa, sfaa and pfra members are used to
-- 
2.51.1
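
For illustration (not part of the patch): the first anonymous struct in
union vaddress slices a 64-bit virtual address into 11+11+11+11+8+12 = 64
bits, one index per DAT table level plus the byte index. On s390
(big-endian) the first bitfield covers the most significant bits, so each
level's index can be read directly. A sketch, where gva is an assumed
input:

	union vaddress vaddr = { .addr = gva };

	pr_debug("rfx=%lx rsx=%lx rtx=%lx sx=%lx px=%lx bx=%lx\n",
		 (unsigned long)vaddr.rfx, (unsigned long)vaddr.rsx,
		 (unsigned long)vaddr.rtx, (unsigned long)vaddr.sx,
		 (unsigned long)vaddr.px, (unsigned long)vaddr.bx);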
From nobody Tue Dec 2 01:51:36 2025

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 03/23] s390: Move sske_frame() to a header
Date: Thu, 20 Nov 2025 18:15:24 +0100
Message-ID: <20251120171544.96841-4-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Move the sske_frame() function to asm/pgtable.h, so it can be used in
other modules too. Opportunistically convert the .insn opcode
specification to the appropriate mnemonic.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 7 +++++++
 arch/s390/mm/pageattr.c         | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 528ce3611e53..3ddc62fcf6dd 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1148,6 +1148,13 @@ static inline pte_t pte_mkhuge(pte_t pte)
 }
 #endif
 
+static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
+{
+	asm volatile("sske %[skey],%[addr],1"
+		     : [addr] "+a" (addr) : [skey] "d" (skey));
+	return addr;
+}
+
 #define IPTE_GLOBAL	0
 #define IPTE_LOCAL	1
 
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 348e759840e7..ceeb04136cec 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -16,13 +16,6 @@
 #include
 #include
 
-static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
-{
-	asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],1,0"
-		     : [addr] "+a" (addr) : [skey] "d" (skey));
-	return addr;
-}
-
 void __storage_key_init_range(unsigned long start, unsigned long end)
 {
 	unsigned long boundary, size;
-- 
2.51.1
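
For context (illustrative, not part of the patch): the trailing ",1"
operand sets the multiple-block control of SSKE, which allows the CPU to
set the storage key of more than one 4K frame per execution; the
instruction then returns the address of the first frame it did not
process, which is what lets the existing caller loop. A sketch modeled on
__storage_key_init_range(), with start and end as assumed inputs:

	unsigned long addr = start;

	while (addr < end)
		addr = sske_frame(addr, PAGE_DEFAULT_KEY);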
From nobody Tue Dec 2 01:51:36 2025

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 04/23] KVM: s390: Add gmap_helper_set_unused()
Date: Thu, 20 Nov 2025 18:15:25 +0100
Message-ID: <20251120171544.96841-5-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Add gmap_helper_try_set_pte_unused() to mark userspace ptes as unused.
Core mm code will use that information to discard unused pages instead
of attempting to swap them.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Tested-by: Nico Boehr <nrb@linux.ibm.com>
Acked-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
---
 arch/s390/include/asm/gmap_helpers.h |  1 +
 arch/s390/mm/gmap_helpers.c          | 79 ++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/s390/include/asm/gmap_helpers.h b/arch/s390/include/asm/gmap_helpers.h
index 5356446a61c4..2d3ae421077e 100644
--- a/arch/s390/include/asm/gmap_helpers.h
+++ b/arch/s390/include/asm/gmap_helpers.h
@@ -11,5 +11,6 @@
 void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr);
 void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned long end);
 int gmap_helper_disable_cow_sharing(void);
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr);
 
 #endif /* _ASM_S390_GMAP_HELPERS_H */
diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index e14a63119e30..dca783859a73 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -124,6 +124,85 @@ void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned lo
 }
 EXPORT_SYMBOL_GPL(gmap_helper_discard);
 
+/**
+ * gmap_helper_try_set_pte_unused() - mark a pte entry as unused
+ * @mm: the mm
+ * @vmaddr: the userspace address whose pte is to be marked
+ *
+ * Mark the pte corresponding to the given address as unused. This will
+ * cause core mm code to just drop this page instead of swapping it.
+ *
+ * This function needs to be called with interrupts disabled (for example
+ * while holding a spinlock), or while holding the mmap lock. Normally this
+ * function is called as a result of an unmap operation, and thus KVM common
+ * code will already hold kvm->mmu_lock in write mode.
+ *
+ * Context: Needs to be called while holding the mmap lock or with interrupts
+ *	    disabled.
+ */
+void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr)
+{
+	pmd_t *pmdp, pmd, pmdval;
+	pud_t *pudp, pud;
+	p4d_t *p4dp, p4d;
+	pgd_t *pgdp, pgd;
+	spinlock_t *ptl;	/* Lock for the host (userspace) page table */
+	pte_t *ptep;
+
+	pgdp = pgd_offset(mm, vmaddr);
+	pgd = pgdp_get(pgdp);
+	if (pgd_none(pgd) || !pgd_present(pgd))
+		return;
+
+	p4dp = p4d_offset(pgdp, vmaddr);
+	p4d = p4dp_get(p4dp);
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		return;
+
+	pudp = pud_offset(p4dp, vmaddr);
+	pud = pudp_get(pudp);
+	if (pud_none(pud) || pud_leaf(pud) || !pud_present(pud))
+		return;
+
+	pmdp = pmd_offset(pudp, vmaddr);
+	pmd = pmdp_get_lockless(pmdp);
+	if (pmd_none(pmd) || pmd_leaf(pmd) || !pmd_present(pmd))
+		return;
+
+	ptep = pte_offset_map_rw_nolock(mm, pmdp, vmaddr, &pmdval, &ptl);
+	if (!ptep)
+		return;
+
+	/*
+	 * Several paths exist that take the ptl lock and then call the
+	 * mmu_notifier, which takes the mmu_lock. The unmap path, instead,
+	 * takes the mmu_lock in write mode first, and then potentially
+	 * calls this function, which takes the ptl lock. This can lead to a
+	 * deadlock.
+	 * The unused page mechanism is only an optimization; if the
+	 * _PAGE_UNUSED bit is not set, the unused page is swapped as normal
+	 * instead of being discarded.
+	 * If the lock is contended, the bit is not set and the deadlock is
+	 * avoided.
+	 */
+	if (spin_trylock(ptl)) {
+		/*
+		 * Make sure the pte we are touching is still the correct
+		 * one. In theory this check should not be needed, but
+		 * better safe than sorry.
+		 * Disabling interrupts or holding the mmap lock is enough to
+		 * guarantee that no concurrent updates to the page tables
+		 * are possible.
+		 */
+		if (likely(pmd_same(pmdval, pmdp_get_lockless(pmdp))))
+			__atomic64_or(_PAGE_UNUSED, (long *)ptep);
+		spin_unlock(ptl);
+	}
+
+	pte_unmap(ptep);
+}
+EXPORT_SYMBOL_GPL(gmap_helper_try_set_pte_unused);
+
 static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
 				   unsigned long end, struct mm_walk *walk)
 {
-- 
2.51.1
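
For orientation, a sketch of the intended call-site shape (not a caller
added by this patch; kvm and vmaddr are assumed inputs): on the unmap path,
KVM common code already holds kvm->mmu_lock in write mode, which satisfies
the locking contract documented above:

	write_lock(&kvm->mmu_lock);
	gmap_helper_try_set_pte_unused(kvm->mm, vmaddr);
	write_unlock(&kvm->mmu_lock);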
From nobody Tue Dec 2 01:51:36 2025

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 05/23] KVM: s390: Enable KVM_GENERIC_MMU_NOTIFIER
Date: Thu, 20 Nov 2025 18:15:26 +0100
Message-ID: <20251120171544.96841-6-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Enable KVM_GENERIC_MMU_NOTIFIER, for now with empty placeholder
callbacks. Also enable KVM_MMU_LOCKLESS_AGING and define
KVM_HAVE_MMU_RWLOCK.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/Kconfig            |  3 ++-
 arch/s390/kvm/kvm-s390.c         | 45 +++++++++++++++++++++++++++++++-
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index c2ba3d4398c5..f5f87dae0dd9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include
 #include
 
+#define KVM_HAVE_MMU_RWLOCK
 #define KVM_MAX_VCPUS 255
 
 #define KVM_INTERNAL_MEM_SLOTS 1
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index cae908d64550..e86332b26511 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -29,7 +29,8 @@ config KVM
 	select HAVE_KVM_INVALID_WAKEUPS
 	select HAVE_KVM_NO_POLL
 	select KVM_VFIO
-	select MMU_NOTIFIER
+	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_LOCKLESS_AGING
 	help
 	  Support hosting paravirtualized guest machines using the SIE
 	  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 16ba04062854..2e34f993e3c5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4918,7 +4918,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
 	rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);
 	if (!rc)
 		rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);
-	scoped_guard(spinlock, &vcpu->kvm->mmu_lock) {
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
 		kvm_release_faultin_page(vcpu->kvm, page, false, writable);
 	}
 	mmap_read_unlock(vcpu->arch.gmap->mm);
@@ -6125,6 +6125,49 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	return;
 }
 
+/**
+ * kvm_test_age_gfn() - test young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status is to be tested
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range is young, otherwise false.
+ */
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_age_gfn() - clear young
+ * @kvm: the kvm instance
+ * @range: the range of guest addresses whose young status needs to be cleared
+ *
+ * Context: called by KVM common code without holding the kvm mmu lock
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
+/**
+ * kvm_unmap_gfn_range() - Unmap a range of guest addresses
+ * @kvm: the kvm instance
+ * @range: the range of guest page frames to invalidate
+ *
+ * This function always returns false because every DAT table modification
+ * has to use the appropriate DAT table manipulation instructions, which will
+ * keep the TLB coherent, hence no additional TLB flush is ever required.
+ *
+ * Context: called by KVM common code with the kvm mmu write lock held
+ * Return: false
+ */
+bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
 static inline unsigned long nonhyp_mask(int i)
 {
 	unsigned int nonhyp_fai = (sclp.hmfai << i * 2) >> 30;
-- 
2.51.1
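
For orientation (a simplified sketch, not code from this patch): with
KVM_HAVE_MMU_RWLOCK defined, kvm->mmu_lock becomes an rwlock_t instead of a
spinlock, so fault handling on several vCPUs can proceed concurrently while
notifier-driven invalidation still gets exclusive access:

	/* fault path (shared), as in the scoped_guard(read_lock, ...) above */
	read_lock(&kvm->mmu_lock);
	/* ... resolve the guest fault ... */
	read_unlock(&kvm->mmu_lock);

	/* invalidation path (exclusive), which ends up in kvm_unmap_gfn_range() */
	write_lock(&kvm->mmu_lock);
	/* ... drop the affected mappings ... */
	write_unlock(&kvm->mmu_lock);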
From nobody Tue Dec 2 01:51:36 2025

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 06/23] KVM: s390: Rename some functions in gaccess.c
Date: Thu, 20 Nov 2025 18:15:27 +0100
Message-ID: <20251120171544.96841-7-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Rename some functions in gaccess.c to add a _gva or _gpa suffix to
indicate whether the function accepts a virtual or a guest-absolute
address. This makes it easier to understand the code.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
---
 arch/s390/kvm/gaccess.c | 51 +++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d691fac1cc12..05fd3ee4b20d 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -412,7 +412,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 }
 
 /**
- * guest_translate - translate a guest virtual into a guest absolute address
+ * guest_translate_gva() - translate a guest virtual into a guest absolute address
  * @vcpu: virtual cpu
  * @gva: guest virtual address
  * @gpa: points to where guest physical (absolute) address should be stored
@@ -432,9 +432,9 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  * the returned value is the program interruption code as defined
  * by the architecture
  */
-static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
-				     unsigned long *gpa, const union asce asce,
-				     enum gacc_mode mode, enum prot_type *prot)
+static unsigned long guest_translate_gva(struct kvm_vcpu *vcpu, unsigned long gva,
+					 unsigned long *gpa, const union asce asce,
+					 enum gacc_mode mode, enum prot_type *prot)
 {
 	union vaddress vaddr = {.addr = gva};
 	union raddress raddr = {.addr = gva};
@@ -615,8 +615,8 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
-static int vm_check_access_key(struct kvm *kvm, u8 access_key,
-			       enum gacc_mode mode, gpa_t gpa)
+static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key,
+				   enum gacc_mode mode, gpa_t gpa)
 {
 	u8 storage_key, access_control;
 	bool fetch_protected;
@@ -678,9 +678,9 @@ static bool storage_prot_override_applies(u8 access_control)
 	return access_control == PAGE_SPO_ACC;
 }
 
-static int vcpu_check_access_key(struct kvm_vcpu *vcpu, u8 access_key,
-				 enum gacc_mode mode, union asce asce, gpa_t gpa,
-				 unsigned long ga, unsigned int len)
+static int vcpu_check_access_key_gpa(struct kvm_vcpu *vcpu, u8 access_key,
+				     enum gacc_mode mode, union asce asce, gpa_t gpa,
+				     unsigned long ga, unsigned int len)
 {
 	u8 storage_key, access_control;
 	unsigned long hva;
@@ -772,7 +772,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 			return trans_exc(vcpu, PGM_PROTECTION, ga, ar, mode, PROT_TYPE_LA);
 		if (psw_bits(*psw).dat) {
-			rc = guest_translate(vcpu, ga, &gpa, asce, mode, &prot);
+			rc = guest_translate_gva(vcpu, ga, &gpa, asce, mode, &prot);
 			if (rc < 0)
 				return rc;
 		} else {
@@ -784,8 +784,7 @@
 		}
 		if (rc)
 			return trans_exc(vcpu, rc, ga, ar, mode, prot);
-		rc = vcpu_check_access_key(vcpu, access_key, mode, asce, gpa, ga,
-					   fragment_len);
+		rc = vcpu_check_access_key_gpa(vcpu, access_key, mode, asce, gpa, ga, fragment_len);
 		if (rc)
 			return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_KEYC);
 		if (gpas)
@@ -797,8 +796,8 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 	return 0;
 }
 
-static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			     void *data, unsigned int len)
+static int access_guest_page_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+				 void *data, unsigned int len)
 {
 	const unsigned int offset = offset_in_page(gpa);
 	const gfn_t gfn = gpa_to_gfn(gpa);
@@ -813,9 +812,8 @@ static int access_guest_page(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
 	return rc;
 }
 
-static int
-access_guest_page_with_key(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-			   void *data, unsigned int len, u8 access_key)
+static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
+					  void *data, unsigned int len, u8 access_key)
 {
 	struct kvm_memory_slot *slot;
 	bool writable;
 	gfn_t gfn;
 	hva_t hva;
 	int rc;
 
-	gfn = gpa >> PAGE_SHIFT;
+	gfn = gpa_to_gfn(gpa);
 	slot = gfn_to_memslot(kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
 
@@ -856,7 +854,7 @@ int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
 
 	while (min(PAGE_SIZE - offset, len) > 0) {
 		fragment_len = min(PAGE_SIZE - offset, len);
-		rc = access_guest_page_with_key(kvm, mode, gpa, data, fragment_len, access_key);
+		rc = access_guest_page_with_key_gpa(kvm, mode, gpa, data, fragment_len, access_key);
 		if (rc)
 			return rc;
 		offset = 0;
@@ -916,15 +914,14 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 	for (idx = 0; idx < nr_pages; idx++) {
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpas[idx]), len);
 		if (try_fetch_prot_override && fetch_prot_override_applies(ga, fragment_len)) {
-			rc = access_guest_page(vcpu->kvm, mode, gpas[idx],
-					       data, fragment_len);
+			rc = access_guest_page_gpa(vcpu->kvm, mode, gpas[idx], data, fragment_len);
 		} else {
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, access_key);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, access_key);
 		}
 		if (rc == PGM_PROTECTION && try_storage_prot_override)
-			rc = access_guest_page_with_key(vcpu->kvm, mode, gpas[idx],
-							data, fragment_len, PAGE_SPO_ACC);
+			rc = access_guest_page_with_key_gpa(vcpu->kvm, mode, gpas[idx],
+							    data, fragment_len, PAGE_SPO_ACC);
 		if (rc)
 			break;
 		len -= fragment_len;
@@ -958,7 +955,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 	while (len && !rc) {
 		gpa = kvm_s390_real_to_abs(vcpu, gra);
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), len);
-		rc = access_guest_page(vcpu->kvm, mode, gpa, data, fragment_len);
+		rc = access_guest_page_gpa(vcpu->kvm, mode, gpa, data, fragment_len);
 		len -= fragment_len;
 		gra += fragment_len;
 		data += fragment_len;
@@ -1149,7 +1146,7 @@ int check_gpa_range(struct kvm *kvm, unsigned long gpa, unsigned long length,
 
 	while (length && !rc) {
 		fragment_len = min(PAGE_SIZE - offset_in_page(gpa), length);
-		rc = vm_check_access_key(kvm, access_key, mode, gpa);
+		rc = vm_check_access_key_gpa(kvm, access_key, mode, gpa);
 		length -= fragment_len;
 		gpa += fragment_len;
 	}
-- 
2.51.1
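
To spell out the convention the renames establish (an illustrative call
shape only, using the functions as renamed above): a _gva helper takes a
guest virtual address and performs the DAT translation, while a _gpa
helper operates directly on a guest physical (absolute) address:

	/* translate first, then access: gva -> gpa -> memory */
	rc = guest_translate_gva(vcpu, gva, &gpa, asce, mode, &prot);
	if (!rc)
		rc = access_guest_page_gpa(vcpu->kvm, mode, gpa, data, fragment_len);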
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 07/23] KVM: s390: KVM-specific bitfields and helper functions
Date: Thu, 20 Nov 2025 18:15:28 +0100
Message-ID: <20251120171544.96841-8-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add KVM-s390 specific bitfields and helper functions to manipulate
DAT tables.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.h | 720 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 720 insertions(+)
 create mode 100644 arch/s390/kvm/dat.h

diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
new file mode 100644
index 000000000000..4d2b7a7bf898
--- /dev/null
+++ b/arch/s390/kvm/dat.h
@@ -0,0 +1,720 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2024, 2025
+ * Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
+ */
+
+#ifndef __KVM_S390_DAT_H
+#define __KVM_S390_DAT_H
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define _ASCE(x) ((union asce) { .val = (x), })
+#define NULL_ASCE _ASCE(0)
+
+enum {
+	_DAT_TOKEN_NONE = 0,
+	_DAT_TOKEN_PIC,
+};
+
+#define _CRSTE_TOK(l, t, p) ((union crste) {	\
+		.tok.i = 1,			\
+		.tok.tt = (l),			\
+		.tok.type = (t),		\
+		.tok.par = (p)			\
+	})
+#define _CRSTE_PIC(l, p) _CRSTE_TOK(l, _DAT_TOKEN_PIC, p)
+
+#define _CRSTE_HOLE(l) _CRSTE_PIC(l, PGM_ADDRESSING)
+#define _CRSTE_EMPTY(l) _CRSTE_TOK(l, _DAT_TOKEN_NONE, 0)
+
+#define _PMD_EMPTY _CRSTE_EMPTY(TABLE_TYPE_SEGMENT)
+
+#define _PTE_TOK(t, p) ((union pte) { .tok.i = 1, .tok.type = (t), .tok.par = (p) })
+#define _PTE_EMPTY _PTE_TOK(_DAT_TOKEN_NONE, 0)
+
+/* This fake table type is used for page table walks (both for normal page tables and vSIE) */
+#define TABLE_TYPE_PAGE_TABLE -1
+
+enum dat_walk_flags {
+	DAT_WALK_CONTINUE = 0x20,
+	DAT_WALK_IGN_HOLES = 0x10,
+	DAT_WALK_SPLIT = 0x08,
+	DAT_WALK_ALLOC = 0x04,
+	DAT_WALK_ANY = 0x02,
+	DAT_WALK_LEAF = 0x01,
+	DAT_WALK_DEFAULT = 0
+};
+
+#define DAT_WALK_SPLIT_ALLOC (DAT_WALK_SPLIT | DAT_WALK_ALLOC)
+#define DAT_WALK_ALLOC_CONTINUE (DAT_WALK_CONTINUE | DAT_WALK_ALLOC)
+#define DAT_WALK_LEAF_ALLOC (DAT_WALK_LEAF | DAT_WALK_ALLOC)
+
+union pte {
+	unsigned long val;
+	union page_table_entry h;
+	struct {
+		unsigned long    :56;	/* Hardware bits */
+		unsigned long u  : 1;	/* Page unused */
+		unsigned long s  : 1;	/* Special */
+		unsigned long w  : 1;	/* Writable */
+		unsigned long r  : 1;	/* Readable */
+		unsigned long d  : 1;	/* Dirty */
+		unsigned long y  : 1;	/* Young */
+		unsigned long sd : 1;	/* Soft dirty */
+		unsigned long pr : 1;	/* Present */
+	} s;
+	struct {
+		unsigned char hwbytes[7];
+		unsigned char swbyte;
+	};
+	union {
+		struct {
+			unsigned long type :16;	/* Token type */
+			unsigned long par  :16;	/* Token parameter */
+			unsigned long      :20;
+			unsigned long      : 1;	/* Must be 0 */
+			unsigned long i    : 1;	/* Must be 1 */
+			unsigned long      : 2;
+			unsigned long      : 7;
+			unsigned long pr   : 1;	/* Must be 0 */
+		};
+		struct {
+			unsigned long token:32;	/* Token and parameter */
+			unsigned long      :32;
+		};
+	} tok;
+};
+
+/* Soft dirty, needed as macro for atomic operations on ptes */
+#define _PAGE_SD 0x002
+
+/* Needed as macro to perform atomic operations */
+#define PGSTE_CMMA_D_BIT 0x0000000000008000UL	/* CMMA dirty soft-bit */
+
+enum pgste_gps_usage {
+	PGSTE_GPS_USAGE_STABLE = 0,
+	PGSTE_GPS_USAGE_UNUSED,
+	PGSTE_GPS_USAGE_POT_VOLATILE,
+	PGSTE_GPS_USAGE_VOLATILE,
+};
+
+union pgste {
+	unsigned long val;
+	struct {
+		unsigned long acc : 4;
+		unsigned long fp : 1;
+		unsigned long : 3;
+		unsigned long pcl : 1;
+		unsigned long hr : 1;
+		unsigned long hc : 1;
+		unsigned long : 2;
+		unsigned long gr : 1;
+		unsigned long gc : 1;
+		unsigned long : 1;
+		unsigned long :16;	/* val16 */
+		unsigned long zero : 1;
+		unsigned long nodat : 1;
+		unsigned long : 4;
+		unsigned long usage : 2;
+		unsigned long : 8;
+		unsigned long cmma_d : 1;	/* Dirty flag for CMMA bits */
+		unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+		unsigned long vsie_notif : 1;	/* Referenced in a shadow table */
+		unsigned long : 5;
+		unsigned long : 8;
+	};
+	struct {
+		unsigned short hwbytes0;
+		unsigned short val16;	/* used to store chunked values, see dat_{s,g}et_ptval() */
+		unsigned short hwbytes4;
+		unsigned char flags;	/* maps to the software bits */
+		unsigned char hwbyte7;
+	} __packed;
+};
+
+union pmd {
+	unsigned long val;
+	union segment_table_entry h;
+	struct {
+		struct {
+			unsigned long :44;	/* HW */
+			unsigned long : 3;	/* Unused */
+			unsigned long : 1;	/* HW */
+			unsigned long w : 1;	/* Writable soft-bit */
+			unsigned long r : 1;	/* Readable soft-bit */
+			unsigned long d : 1;	/* Dirty */
+			unsigned long y : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long : 3;	/* HW */
+			unsigned long vsie_notif : 1;	/* Referenced in a shadow table */
+			unsigned long : 1;	/* Unused */
+			unsigned long : 4;	/* HW */
+			unsigned long sd : 1;	/* Soft-Dirty */
+			unsigned long pr : 1;	/* Present */
+		} fc1;
+	} s;
+};
+
+union pud {
+	unsigned long val;
+	union region3_table_entry h;
+	struct {
+		struct {
+			unsigned long :33;	/* HW */
+			unsigned long :14;	/* Unused */
+			unsigned long : 1;	/* HW */
+			unsigned long w : 1;	/* Writable soft-bit */
+			unsigned long r : 1;	/* Readable soft-bit */
+			unsigned long d : 1;	/* Dirty */
+			unsigned long y : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long : 3;	/* HW */
+			unsigned long vsie_notif : 1;	/* Referenced in a shadow table */
+			unsigned long : 1;	/* Unused */
+			unsigned long : 4;	/* HW */
+			unsigned long sd : 1;	/* Soft-Dirty */
+			unsigned long pr : 1;	/* Present */
+		} fc1;
+	} s;
+};
+
+union p4d {
+	unsigned long val;
+	union region2_table_entry h;
+};
+
+union pgd {
+	unsigned long val;
+	union region1_table_entry h;
+};
+
+union crste {
+	unsigned long val;
+	union {
+		struct {
+			unsigned long :52;
+			unsigned long : 1;
+			unsigned long fc: 1;
+			unsigned long p : 1;
+			unsigned long : 1;
+			unsigned long : 2;
+			unsigned long i : 1;
+			unsigned long : 1;
+			unsigned long tt: 2;
+			unsigned long : 2;
+		};
+		struct {
+			unsigned long to:52;
+			unsigned long : 1;
+			unsigned long fc: 1;
+			unsigned long p : 1;
+			unsigned long : 1;
+			unsigned long tf: 2;
+			unsigned long i : 1;
+			unsigned long : 1;
+			unsigned long tt: 2;
+			unsigned long tl: 2;
+		} fc0;
+		struct {
+			unsigned long :47;
+			unsigned long av : 1;	/* ACCF-Validity Control */
+			unsigned long acc: 4;	/* Access-Control Bits */
+			unsigned long f  : 1;	/* Fetch-Protection Bit */
+			unsigned long fc : 1;	/* Format-Control */
+			unsigned long p  : 1;	/* DAT-Protection Bit */
+			unsigned long iep: 1;	/* Instruction-Execution-Protection */
+			unsigned long : 2;
+			unsigned long i  : 1;	/* Segment-Invalid Bit */
+			unsigned long cs : 1;	/* Common-Segment Bit */
+			unsigned long tt : 2;	/* Table-Type Bits */
+			unsigned long : 2;
+		} fc1;
+	} h;
+	struct {
+		struct {
+			unsigned long :47;
+			unsigned long : 1;	/* HW (should be 0) */
+			unsigned long w : 1;	/* Writable */
+			unsigned long r : 1;	/* Readable */
+			unsigned long d : 1;	/* Dirty */
+			unsigned long y : 1;	/* Young */
+			unsigned long prefix_notif : 1;	/* Guest prefix invalidation notification */
+			unsigned long : 3;	/* HW */
+			unsigned long vsie_notif : 1;	/* Referenced in a shadow table */
+			unsigned long : 1;
+			unsigned long : 4;	/* HW */
+			unsigned long sd : 1;	/* Soft-Dirty */
+			unsigned long pr : 1;	/* Present */
+		} fc1;
+	} s;
+	union {
+		struct {
+			unsigned long type :16;	/* Token type */
+			unsigned long par  :16;	/* Token parameter */
+			unsigned long :26;
+			unsigned long i : 1;	/* Must be 1 */
+			unsigned long : 1;
+			unsigned long tt : 2;
+			unsigned long : 1;
+			unsigned long pr : 1;	/* Must be 0 */
+		};
+		struct {
+			unsigned long token:32;	/* Token and parameter */
+			unsigned long :32;
+		};
+	} tok;
+	union pmd pmd;
+	union pud pud;
+	union p4d p4d;
+	union pgd pgd;
+};
+
+union skey {
+	unsigned char skey;
+	struct {
+		unsigned char acc :4;
+		unsigned char fp  :1;
+		unsigned char r   :1;
+		unsigned char c   :1;
+		unsigned char zero:1;
+	};
+};
+
+static_assert(sizeof(union pgste) == sizeof(unsigned long));
+static_assert(sizeof(union pte) == sizeof(unsigned long));
+static_assert(sizeof(union pmd) == sizeof(unsigned long));
+static_assert(sizeof(union pud) == sizeof(unsigned long));
+static_assert(sizeof(union p4d) == sizeof(unsigned long));
+static_assert(sizeof(union pgd) == sizeof(unsigned long));
+static_assert(sizeof(union crste) == sizeof(unsigned long));
+static_assert(sizeof(union skey) == sizeof(char));
+
+struct segment_table {
+	union pmd pmds[_CRST_ENTRIES];
+};
+
+struct region3_table {
+	union pud puds[_CRST_ENTRIES];
+};
+
+struct region2_table {
+	union p4d p4ds[_CRST_ENTRIES];
+};
+
+struct region1_table {
+	union pgd pgds[_CRST_ENTRIES];
+};
+
+struct crst_table {
+	union {
+		union crste crstes[_CRST_ENTRIES];
+		struct segment_table segment;
+		struct region3_table region3;
+		struct region2_table region2;
+		struct region1_table region1;
+	};
+};
+
+struct page_table {
+	union pte ptes[_PAGE_ENTRIES];
+	union pgste pgstes[_PAGE_ENTRIES];
+};
+
+static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
+static_assert(sizeof(struct page_table) == PAGE_SIZE);
+
+/**
+ * _pte() - Useful constructor for union pte
+ * @pfn: the pfn this pte should point to.
+ * @writable: whether the pte should be writable.
+ * @dirty: whether the pte should be dirty.
+ * @special: whether the pte should be marked as special
+ *
+ * The pte is also marked as young and present. If the pte is marked as dirty,
+ * it gets marked as soft-dirty too. If the pte is not dirty, the hardware
+ * protect bit is set (independently of the write softbit); this way proper
+ * dirty tracking can be performed.
+ *
+ * Return: a union pte value.
+ */
+static inline union pte _pte(kvm_pfn_t pfn, bool writable, bool dirty, bool special)
+{
+	union pte res = { .val = PFN_PHYS(pfn) };
+
+	res.h.p = !dirty;
+	res.s.y = 1;
+	res.s.pr = 1;
+	res.s.w = writable;
+	res.s.d = dirty;
+	res.s.sd = dirty;
+	res.s.s = special;
+	return res;
+}
+
+static inline union crste _crste_fc0(kvm_pfn_t pfn, int tt)
+{
+	union crste res = { .val = PFN_PHYS(pfn) };
+
+	res.h.tt = tt;
+	res.h.fc0.tl = _REGION_ENTRY_LENGTH;
+	res.h.fc0.tf = 0;
+	return res;
+}
+
+/**
+ * _crste_fc1() - Useful constructor for union crste with FC=1
+ * @pfn: the pfn this crste should point to.
+ * @tt: the table type
+ * @writable: whether the crste should be writable.
+ * @dirty: whether the crste should be dirty.
+ *
+ * The crste is also marked as young and present. If the crste is marked as
+ * dirty, it gets marked as soft-dirty too. If the crste is not dirty, the
+ * hardware protect bit is set (independently of the write softbit); this way
+ * proper dirty tracking can be performed.
+ *
+ * Return: a union crste value.
+ */
+static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool dirty)
+{
+	union crste res = { .val = PFN_PHYS(pfn) & _SEGMENT_MASK };
+
+	res.h.tt = tt;
+	res.h.p = !dirty;
+	res.h.fc = 1;
+	res.s.fc1.y = 1;
+	res.s.fc1.pr = 1;
+	res.s.fc1.w = writable;
+	res.s.fc1.d = dirty;
+	res.s.fc1.sd = dirty;
+	return res;
+}
+
+/**
+ * struct vsie_rmap - reverse mapping for shadow page table entries
+ * @next: pointer to next rmap in the list
+ * @r_gfn: virtual rmap address in the shadow guest address space
+ */
+struct vsie_rmap {
+	struct vsie_rmap *next;
+	union {
+		unsigned long val;
+		struct {
+			long level: 8;
+			unsigned long : 4;
+			unsigned long r_gfn:52;
+		};
+	};
+};
+
+static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
+
+static inline struct crst_table *crste_table_start(union crste *crstep)
+{
+	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
+}
+
+static inline struct page_table *pte_table_start(union pte *ptep)
+{
+	return (struct page_table *)ALIGN_DOWN((unsigned long)ptep, _PAGE_TABLE_SIZE);
+}
+
+static inline bool crdte_crste(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			       union asce asce)
+{
+	unsigned long dtt = 0x10 | new.h.tt << 2;
+	void *table = crste_table_start(crstep);
+
+	return crdte(old.val, new.val, table, dtt, gfn_to_gpa(gfn), asce.val);
+}
+
+/**
+ * idte_crste() - invalidate a crste entry using idte
+ * @crstep: pointer to the crste to be invalidated
+ * @gfn: a gfn mapped by the crste
+ * @opt: options for the idte instruction
+ * @asce: the asce
+ * @local: whether the operation is cpu-local
+ */
+static __always_inline void idte_crste(union crste *crstep, gfn_t gfn, unsigned long opt,
+				       union asce asce, int local)
+{
+	unsigned long table_origin = __pa(crste_table_start(crstep));
+	unsigned long gaddr = gfn_to_gpa(gfn) & HPAGE_MASK;
+
+	if (__builtin_constant_p(opt) && opt == 0) {
+		/* flush without guest asce */
+		asm volatile("idte %[table_origin],0,%[gaddr],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr] "a" (gaddr),
+			       [local] "i" (local)
+			     : "cc");
+	} else {
+		/* flush with guest asce */
+		asm volatile("idte %[table_origin],%[asce],%[gaddr_opt],%[local]"
+			     : "+m" (*crstep)
+			     : [table_origin] "a" (table_origin), [gaddr_opt] "a" (gaddr | opt),
+			       [asce] "a" (asce.val), [local] "i" (local)
+			     : "cc");
+	}
+}
+
+static inline void dat_init_pgstes(struct page_table *pt, unsigned long val)
+{
+	memset64((void *)pt->pgstes, val, PTRS_PER_PTE);
+}
+
+static inline void dat_init_page_table(struct page_table *pt, unsigned long ptes,
+				       unsigned long pgstes)
+{
+	memset64((void *)pt->ptes, ptes, PTRS_PER_PTE);
+	dat_init_pgstes(pt, pgstes);
+}
+
+static inline gfn_t asce_end(union asce asce)
+{
+	return 1ULL << ((asce.dt + 1) * 11 + _SEGMENT_SHIFT - PAGE_SHIFT);
+}
+
+#define _CRSTE(x) ((union crste) { .val = _Generic((x),	\
+	union pgd : (x).val,					\
+	union p4d : (x).val,					\
+	union pud : (x).val,					\
+	union pmd : (x).val,					\
+	union crste : (x).val)})
+
+#define _CRSTEP(x) ((union crste *)_Generic((*(x)),		\
+	union pgd : (x),					\
+	union p4d : (x),					\
+	union pud : (x),					\
+	union pmd : (x),					\
+	union crste : (x)))
+
+#define _CRSTP(x) ((struct crst_table *)_Generic((*(x)),	\
+	struct crst_table : (x),				\
+	struct segment_table : (x),				\
+	struct region3_table : (x),				\
+	struct region2_table : (x),				\
+	struct region1_table : (x)))
+
+static inline bool asce_contains_gfn(union asce asce, gfn_t gfn)
+{
+	return gfn < asce_end(asce);
+}
+
+static inline bool is_pmd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_SEGMENT;
+}
+
+static inline bool is_pud(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION3;
+}
+
+static inline bool is_p4d(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION2;
+}
+
+static inline bool is_pgd(union crste crste)
+{
+	return crste.h.tt == TABLE_TYPE_REGION1;
+}
+
+static inline phys_addr_t pmd_origin_large(union pmd pmd)
+{
+	return pmd.val & _SEGMENT_ENTRY_ORIGIN_LARGE;
+}
+
+static inline phys_addr_t pud_origin_large(union pud pud)
+{
+	return pud.val & _REGION3_ENTRY_ORIGIN_LARGE;
+}
+
+/**
+ * crste_origin_large() - Return the large frame origin of a large crste
+ * @crste: The crste whose origin is to be returned. Should be either a
+ *         region-3 table entry or a segment table entry, in both cases with
+ *         FC set to 1 (large pages).
+ *
+ * Return: The origin of the large frame pointed to by @crste, or -1 if the
+ *         crste was not large (wrong table type, or FC == 0)
+ */
+static inline phys_addr_t crste_origin_large(union crste crste)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return pmd_origin_large(crste.pmd);
+	return pud_origin_large(crste.pud);
+}
+
+#define crste_origin(x) (_Generic((x),				\
+	union pmd : (x).val & _SEGMENT_ENTRY_ORIGIN,		\
+	union pud : (x).val & _REGION_ENTRY_ORIGIN,		\
+	union p4d : (x).val & _REGION_ENTRY_ORIGIN,		\
+	union pgd : (x).val & _REGION_ENTRY_ORIGIN))
+
+static inline unsigned long pte_origin(union pte pte)
+{
+	return pte.val & PAGE_MASK;
+}
+
+static inline bool pmd_prefix(union pmd pmd)
+{
+	return pmd.h.fc && pmd.s.fc1.prefix_notif;
+}
+
+static inline bool pud_prefix(union pud pud)
+{
+	return pud.h.fc && pud.s.fc1.prefix_notif;
+}
+
+static inline bool crste_leaf(union crste crste)
+{
+	return (crste.h.tt <= TABLE_TYPE_REGION3) && crste.h.fc;
+}
+
+static inline bool crste_prefix(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.prefix_notif;
+}
+
+static inline bool crste_dirty(union crste crste)
+{
+	return crste_leaf(crste) && crste.s.fc1.d;
+}
+
+static inline union pgste *pgste_of(union pte *pte)
+{
+	return (union pgste *)(pte + _PAGE_ENTRIES);
+}
+
+static inline bool pte_hole(union pte pte)
+{
+	return pte.h.i && !pte.tok.pr && pte.tok.type != _DAT_TOKEN_NONE;
+}
+
+static inline bool _crste_hole(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type != _DAT_TOKEN_NONE;
+}
+
+#define crste_hole(x) _crste_hole(_CRSTE(x))
+
+static inline bool _crste_none(union crste crste)
+{
+	return crste.h.i && !crste.tok.pr && crste.tok.type == _DAT_TOKEN_NONE;
+}
+
+#define crste_none(x) _crste_none(_CRSTE(x))
+
+static inline phys_addr_t large_pud_to_phys(union pud pud, gfn_t gfn)
+{
+	return pud_origin_large(pud) | (gfn_to_gpa(gfn) & ~_REGION3_MASK);
+}
+
+static inline phys_addr_t large_pmd_to_phys(union pmd pmd, gfn_t gfn)
+{
+	return pmd_origin_large(pmd) | (gfn_to_gpa(gfn) & ~_SEGMENT_MASK);
+}
+
+static inline phys_addr_t large_crste_to_phys(union crste crste, gfn_t gfn)
+{
+	if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
+		return -1;
+	if (is_pmd(crste))
+		return large_pmd_to_phys(crste.pmd, gfn);
+	return large_pud_to_phys(crste.pud, gfn);
+}
+
+static inline bool cspg_crste(union crste *crstep, union crste old, union crste new)
+{
+	return cspg(&crstep->val, old.val, new.val);
+}
+
+static inline struct page_table *dereference_pmd(union pmd pmd)
+{
+	return phys_to_virt(crste_origin(pmd));
+}
+
+static inline struct segment_table *dereference_pud(union pud pud)
+{
+	return phys_to_virt(crste_origin(pud));
+}
+
+static inline struct region3_table *dereference_p4d(union p4d p4d)
+{
+	return phys_to_virt(crste_origin(p4d));
+}
+
+static inline struct region2_table *dereference_pgd(union pgd pgd)
+{
+	return phys_to_virt(crste_origin(pgd));
+}
+
+static inline struct crst_table *_dereference_crste(union crste crste)
+{
+	if (unlikely(is_pmd(crste)))
+		return NULL;
+	return phys_to_virt(crste_origin(crste.pud));
+}
+
+#define dereference_crste(x) (_Generic((x),			\
+	union pud : _dereference_crste(_CRSTE(x)),		\
+	union p4d : _dereference_crste(_CRSTE(x)),		\
+	union pgd : _dereference_crste(_CRSTE(x)),		\
+	union crste : _dereference_crste(_CRSTE(x))))
+
+static inline struct crst_table *dereference_asce(union asce asce)
+{
+	return phys_to_virt(asce.val & _ASCE_ORIGIN);
+}
+
+static inline void asce_flush_tlb(union asce asce)
+{
+	__tlb_flush_idte(asce.val);
+}
+
+static inline bool pgste_get_trylock(union pte *ptep, union pgste *res)
+{
+	union pgste *pgstep = pgste_of(ptep);
+	union pgste old_pgste;
+
+	if (READ_ONCE(pgstep->val) & PGSTE_PCL_BIT)
+		return false;
+	old_pgste.val = __atomic64_or_barrier(PGSTE_PCL_BIT, &pgstep->val);
+	if (old_pgste.pcl)
+		return false;
+	old_pgste.pcl = 1;
+	*res = old_pgste;
+	return true;
+}
+
+static inline union pgste pgste_get_lock(union pte *ptep)
+{
+	union pgste res;
+
+	while (!pgste_get_trylock(ptep, &res))
+		cpu_relax();
+	return res;
+}
+
+static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
+{
+	pgste.pcl = 0;
+	barrier();
+	WRITE_ONCE(*pgste_of(ptep), pgste);
+}
+
+#endif /* __KVM_S390_DAT_H */
-- 
2.51.1
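Taken together, the constructors and the pgste lock above are meant to
be used roughly like this (an illustrative sketch only, not part of the
patch; the helper name is made up):

	/*
	 * Sketch: install a writable-but-clean pte. _pte() sets the
	 * hardware protect bit when dirty == false, so the first store
	 * faults and dirty tracking works; the PCL bit in the pgste
	 * serializes the pte/pgste update against concurrent walkers.
	 */
	static inline void example_set_pte(union pte *ptep, kvm_pfn_t pfn)
	{
		union pte new = _pte(pfn, true, false, false);	/* w=1, d=0 -> h.p=1 */
		union pgste pgste = pgste_get_lock(ptep);	/* spins on PCL */

		WRITE_ONCE(*ptep, new);
		pgste_set_unlock(ptep, pgste);	/* clears PCL, publishes the pgste */
	}

From nobody Tue Dec 2 01:51:36 2025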
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 08/23] KVM: s390: KVM page table management functions: allocation
Date: Thu, 20 Nov 2025 18:15:29 +0100
Message-ID: <20251120171544.96841-9-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add page table management functions to be used for KVM guest (gmap)
page tables. This patch adds the boilerplate and functions for the
allocation and deallocation of DAT tables.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/Makefile     |   1 +
 arch/s390/kvm/dat.c        | 103 +++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h        |  77 +++++++++++++++++++++++++++
 arch/s390/mm/page-states.c |   1 +
 4 files changed, 182 insertions(+)
 create mode 100644 arch/s390/kvm/dat.c

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 9a723c48b05a..84315d2f75fb 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
+kvm-y += dat.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
new file mode 100644
index 000000000000..c324a27f379f
--- /dev/null
+++ b/arch/s390/kvm/dat.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2020, 2024
+ * Author(s): Claudio Imbrenda
+ *	      Martin Schwidefsky
+ *	      David Hildenbrand
+ *	      Janosch Frank
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include "dat.h"
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc)
+{
+	void *o;
+
+	for ( ; mc->n_crsts < KVM_S390_MMU_CACHE_N_CRSTS; mc->n_crsts++) {
+		o = (void *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+		if (!o)
+			return -ENOMEM;
+		mc->crsts[mc->n_crsts] = o;
+	}
+	for ( ; mc->n_pts < KVM_S390_MMU_CACHE_N_PTS; mc->n_pts++) {
+		o = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->pts[mc->n_pts] = o;
+	}
+	for ( ; mc->n_rmaps < KVM_S390_MMU_CACHE_N_RMAPS; mc->n_rmaps++) {
+		o = kzalloc(sizeof(*mc->rmaps[0]), GFP_KERNEL_ACCOUNT);
+		if (!o)
+			return -ENOMEM;
+		mc->rmaps[mc->n_rmaps] = o;
+	}
+	return 0;
+}
+
+static inline struct page_table *dat_alloc_pt_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct page_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_pt(mc);
+	if (res)
+		__arch_set_page_dat(res, 1);
+	return res;
+}
+
+static inline struct crst_table *dat_alloc_crst_noinit(struct kvm_s390_mmu_cache *mc)
+{
+	struct crst_table *res;
+
+	res = kvm_s390_mmu_cache_alloc_crst(mc);
+	if (res)
+		__arch_set_page_dat(res, 1UL << CRST_ALLOC_ORDER);
+	return res;
+}
+
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init)
+{
+	struct page *page;
+	void *virt;
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_COMP, CRST_ALLOC_ORDER);
+	if (!page)
+		return NULL;
+	virt = page_to_virt(page);
+	__arch_set_page_dat(virt, 1UL << CRST_ALLOC_ORDER);
+	crst_table_init(virt, init);
+	return virt;
+}
+
+void dat_free_level(struct crst_table *table, bool owns_ptes)
+{
+	unsigned int i;
+
+	for (i = 0; i < _CRST_ENTRIES; i++) {
+		if (table->crstes[i].h.fc || table->crstes[i].h.i)
+			continue;
+		if (!is_pmd(table->crstes[i]))
+			dat_free_level(dereference_crste(table->crstes[i]), owns_ptes);
+		else if (owns_ptes)
+			dat_free_pt(dereference_pmd(table->crstes[i].pmd));
+	}
+	dat_free_crst(table);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 4d2b7a7bf898..486b7dfc5df2 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -418,6 +418,46 @@ struct vsie_rmap {
 
 static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
 
+#define KVM_S390_MMU_CACHE_N_CRSTS 6
+#define KVM_S390_MMU_CACHE_N_PTS 2
+#define KVM_S390_MMU_CACHE_N_RMAPS 16
+struct kvm_s390_mmu_cache {
+	void *crsts[KVM_S390_MMU_CACHE_N_CRSTS];
+	void *pts[KVM_S390_MMU_CACHE_N_PTS];
+	void *rmaps[KVM_S390_MMU_CACHE_N_RMAPS];
+	short int n_crsts;
+	short int n_pts;
+	short int n_rmaps;
+};
+
+void dat_free_level(struct crst_table *table, bool owns_ptes);
+struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+
+int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
+
+#define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
+
+static inline struct page_table *kvm_s390_mmu_cache_alloc_pt(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_pts)
+		return mc->pts[--mc->n_pts];
+	return (void *)__get_free_page(GFP_KVM_S390_MMU_CACHE);
+}
+
+static inline struct crst_table *kvm_s390_mmu_cache_alloc_crst(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_crsts)
+		return mc->crsts[--mc->n_crsts];
+	return (void *)__get_free_pages(GFP_KVM_S390_MMU_CACHE | __GFP_COMP, CRST_ALLOC_ORDER);
+}
+
+static inline struct vsie_rmap *kvm_s390_mmu_cache_alloc_rmap(struct kvm_s390_mmu_cache *mc)
+{
+	if (mc->n_rmaps)
+		return mc->rmaps[--mc->n_rmaps];
+	return kzalloc(sizeof(struct vsie_rmap), GFP_KVM_S390_MMU_CACHE);
+}
+
 static inline struct crst_table *crste_table_start(union crste *crstep)
 {
 	return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
@@ -717,4 +757,41 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_free_pt(struct page_table *pt)
+{
+	free_page((unsigned long)pt);
+}
+
+static inline void _dat_free_crst(struct crst_table *table)
+{
+	free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+}
+
+#define dat_free_crst(x) _dat_free_crst(_CRSTP(x))
+
+static inline void kvm_s390_free_mmu_cache(struct kvm_s390_mmu_cache *mc)
+{
+	if (!mc)
+		return;
+	while (mc->n_pts)
+		dat_free_pt(mc->pts[--mc->n_pts]);
+	while (mc->n_crsts)
+		_dat_free_crst(mc->crsts[--mc->n_crsts]);
+	while (mc->n_rmaps)
+		kfree(mc->rmaps[--mc->n_rmaps]);
+	kfree(mc);
+}
+
+DEFINE_FREE(kvm_s390_mmu_cache, struct kvm_s390_mmu_cache *, if (_T) kvm_s390_free_mmu_cache(_T))
+
+static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
+{
+	struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache);
+
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL_ACCOUNT);
+	if (mc && !kvm_s390_mmu_cache_topup(mc))
+		return_ptr(mc);
+	return NULL;
+}
+
 #endif /* __KVM_S390_DAT_H */
diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
index 01f9b39e65f5..5bee173db72e 100644
--- a/arch/s390/mm/page-states.c
+++ b/arch/s390/mm/page-states.c
@@ -13,6 +13,7 @@
 #include
 
 int __bootdata_preserved(cmma_flag);
+EXPORT_SYMBOL(cmma_flag);
 
 void arch_free_page(struct page *page, int order)
 {
-- 
2.51.1
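The cache is designed to be filled in a sleepable context and drained
under kvm->mmu_lock; a sketch of the intended calling pattern
(illustrative only; the real fault-handling callers arrive later in
the series, and example_topup_then_alloc is made up):

	static int example_topup_then_alloc(struct kvm *kvm)
	{
		struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache) = NULL;
		struct crst_table *table;

		mc = kvm_s390_new_mmu_cache();	/* sleepable, pre-fills the cache */
		if (!mc)
			return -ENOMEM;

		write_lock(&kvm->mmu_lock);
		/* cache hit, or GFP_ATOMIC fallback as last resort */
		table = kvm_s390_mmu_cache_alloc_crst(mc);
		/* ... install the table into the guest address space here ... */
		write_unlock(&kvm->mmu_lock);

		/* whatever is still in the cache is freed by the cleanup */
		return table ? 0 : -ENOMEM;
	}

From nobody Tue Dec 2 01:51:36 2025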
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 09/23] KVM: s390: KVM page table management functions: clear and replace
Date: Thu, 20 Nov 2025 18:15:30 +0100
Message-ID: <20251120171544.96841-10-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add page table management functions to be used for KVM guest (gmap)
page tables. This patch adds functions to clear, replace or exchange
DAT table entries.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 118 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  40 +++++++++++++++
 2 files changed, 158 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index c324a27f379f..a9d5b49ac411 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -101,3 +101,121 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	}
 	dat_free_crst(table);
 }
+
+/**
+ * dat_crstep_xchg - exchange a gmap CRSTE with another
+ * @crstep: pointer to the CRST entry
+ * @new: replacement entry
+ * @gfn: the affected guest address
+ * @asce: the ASCE of the address space
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ */
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce)
+{
+	if (crstep->h.i) {
+		WRITE_ONCE(*crstep, new);
+		return;
+	} else if (cpu_has_edat2()) {
+		crdte_crste(crstep, *crstep, new, gfn, asce);
+		return;
+	}
+
+	if (machine_has_tlb_guest())
+		idte_crste(crstep, gfn, IDTE_GUEST_ASCE, asce, IDTE_GLOBAL);
+	else
+		idte_crste(crstep, gfn, 0, NULL_ASCE, IDTE_GLOBAL);
+	WRITE_ONCE(*crstep, new);
+}
+
+/**
+ * dat_crstep_xchg_atomic - atomically exchange a gmap CRSTE with another
+ * @crstep: pointer to the CRST entry
+ * @old: expected old value
+ * @new: replacement entry
+ * @gfn: the affected guest address
+ * @asce: the asce of the address space
+ *
+ * This function should only be called on invalid crstes, or on crstes with
+ * FC = 1, as that guarantees the presence of CSPG.
+ *
+ * This function is needed to atomically exchange a CRSTE that potentially
+ * maps a prefix area, without having to invalidate it in between.
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: true if the exchange was successful.
+ */
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce)
+{
+	if (old.h.i)
+		return arch_try_cmpxchg((long *)crstep, &old.val, new.val);
+	if (cpu_has_edat2())
+		return crdte_crste(crstep, old, new, gfn, asce);
+	return cspg_crste(crstep, old, new);
+}
+
+static void dat_set_storage_key_from_pgste(union pte pte, union pgste pgste)
+{
+	union skey nkey = { .acc = pgste.acc, .fp = pgste.fp };
+
+	page_set_storage_key(pte_origin(pte), nkey.skey, 0);
+}
+
+static void dat_move_storage_key(union pte old, union pte new)
+{
+	page_set_storage_key(pte_origin(new), page_get_storage_key(pte_origin(old)), 1);
+}
+
+static union pgste dat_save_storage_key_into_pgste(union pte pte, union pgste pgste)
+{
+	union skey skey;
+
+	skey.skey = page_get_storage_key(pte_origin(pte));
+
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gr |= skey.r;
+	pgste.gc |= skey.c;
+
+	return pgste;
+}
+
+union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, gfn_t gfn,
+			    union asce asce, bool uses_skeys)
+{
+	union pte old = READ_ONCE(*ptep);
+
+	/* Updating only the software bits while holding the pgste lock */
+	if (!((ptep->val ^ new.val) & ~_PAGE_SW_BITS)) {
+		WRITE_ONCE(ptep->swbyte, new.swbyte);
+		return pgste;
+	}
+
+	if (!old.h.i) {
+		unsigned long opts = IPTE_GUEST_ASCE | (pgste.nodat ? IPTE_NODAT : 0);
+
+		if (machine_has_tlb_guest())
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, opts, asce.val, IPTE_GLOBAL);
+		else
+			__ptep_ipte(gfn_to_gpa(gfn), (void *)ptep, 0, 0, IPTE_GLOBAL);
+	}
+
+	if (uses_skeys) {
+		if (old.h.i && !new.h.i)
+			/* Invalid to valid: restore storage keys from PGSTE */
+			dat_set_storage_key_from_pgste(new, pgste);
+		else if (!old.h.i && new.h.i)
+			/* Valid to invalid: save storage keys to PGSTE */
+			pgste = dat_save_storage_key_into_pgste(old, pgste);
+		else if (!old.h.i && !new.h.i)
+			/* Valid to valid: move storage keys */
+			if (old.h.pfra != new.h.pfra)
+				dat_move_storage_key(old, new);
+		/* Invalid to invalid: nothing to do */
+	}
+
+	WRITE_ONCE(*ptep, new);
+	return pgste;
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 486b7dfc5df2..6a336c3c6f62 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -430,6 +430,12 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
+					 gfn_t gfn, union asce asce, bool uses_skeys);
+bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
+			    union asce asce);
+void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
+
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
 
@@ -757,6 +763,21 @@ static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
 	WRITE_ONCE(*pgste_of(ptep), pgste);
 }
 
+static inline void dat_ptep_xchg(union pte *ptep, union pte new, gfn_t gfn, union asce asce,
+				 bool has_skeys)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, asce, has_skeys);
+	pgste_set_unlock(ptep, pgste);
+}
+
+static inline void dat_ptep_clear(union pte *ptep, gfn_t gfn, union asce asce, bool has_skeys)
+{
+	dat_ptep_xchg(ptep, _PTE_EMPTY, gfn, asce, has_skeys);
+}
+
 static inline void dat_free_pt(struct page_table *pt)
 {
 	free_page((unsigned long)pt);
@@ -794,4 +815,23 @@ static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
 	return NULL;
 }
 
+static inline bool dat_pmdp_xchg_atomic(union pmd *pmdp, union pmd old, union pmd new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pmdp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline bool dat_pudp_xchg_atomic(union pud *pudp, union pud old, union pud new,
+					gfn_t gfn, union asce asce)
+{
+	return dat_crstep_xchg_atomic(_CRSTEP(pudp), _CRSTE(old), _CRSTE(new), gfn, asce);
+}
+
+static inline void dat_crstep_clear(union crste *crstep, gfn_t gfn, union asce asce)
+{
+	union crste newcrste = _CRSTE_EMPTY(crstep->h.tt);
+
+	dat_crstep_xchg(crstep, newcrste, gfn, asce);
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.51.1
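With these in place, invalidating an entry is a one-liner for callers.
For instance (a sketch only, assuming kvm->mmu_lock is held as the
comments above require; the function name is made up):

	static void example_zap_entries(union crste *crstep, union pte *ptep,
					gfn_t gfn, union asce asce, bool has_skeys)
	{
		/* crste: idte/crdte under the hood, then the empty token is written */
		dat_crstep_clear(crstep, gfn, asce);
		/* pte: takes the pgste lock, ipte, saves storage keys if needed */
		dat_ptep_clear(ptep, gfn, asce, has_skeys);
	}

From nobody Tue Dec 2 01:51:36 2025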
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 10/23] KVM: s390: KVM page table management functions: walks
Date: Thu, 20 Nov 2025 18:15:31 +0100
Message-ID: <20251120171544.96841-11-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add page table management functions to be used for KVM guest (gmap)
page tables. This patch adds functions to walk to specific table
entries, or to perform actions on a range of entries.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 383 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  39 +++++
 2 files changed, 422 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index a9d5b49ac411..3b74bf5463f4 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -219,3 +219,386 @@ union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, g
 	WRITE_ONCE(*ptep, new);
 	return pgste;
 }
+
+/*
+ * dat_split_ste - Split a segment table entry into page table entries
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: 0 in case of success, -ENOMEM if running out of memory.
+ */
+static int dat_split_ste(struct kvm_s390_mmu_cache *mc, union pmd *pmdp, gfn_t gfn,
+			 union asce asce, bool uses_skeys)
+{
+	union pgste pgste_init;
+	struct page_table *pt;
+	union pmd new, old;
+	union pte init;
+	int i;
+
+	BUG_ON(!mc);
+	old = READ_ONCE(*pmdp);
+
+	/* Already split, nothing to do */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	pt = dat_alloc_pt_noinit(mc);
+	if (!pt)
+		return -ENOMEM;
+	new.val = virt_to_phys(pt);
+
+	while (old.h.i || old.h.fc) {
+		init.val = pmd_origin_large(old);
+		init.h.p = old.h.p;
+		init.h.i = old.h.i;
+		init.s.d = old.s.fc1.d;
+		init.s.w = old.s.fc1.w;
+		init.s.y = old.s.fc1.y;
+		init.s.sd = old.s.fc1.sd;
+		init.s.pr = old.s.fc1.pr;
+		pgste_init.val = 0;
+		if (old.h.fc) {
+			for (i = 0; i < _PAGE_ENTRIES; i++)
+				pt->ptes[i].val = init.val | i * PAGE_SIZE;
+			/* no need to take locks as the page table is not installed yet */
+			pgste_init.prefix_notif = old.s.fc1.prefix_notif;
+			pgste_init.pcl = uses_skeys && init.h.i;
+			dat_init_pgstes(pt, pgste_init.val);
+		} else {
+			dat_init_page_table(pt, init.val, 0);
+		}
+
+		if (dat_pmdp_xchg_atomic(pmdp, old, new, gfn, asce)) {
+			if (!pgste_init.pcl)
+				return 0;
+			for (i = 0; i < _PAGE_ENTRIES; i++) {
+				union pgste pgste = pt->pgstes[i];
+
+				pgste = dat_save_storage_key_into_pgste(pt->ptes[i], pgste);
+				pgste_set_unlock(pt->ptes + i, pgste);
+			}
+			return 0;
+		}
+		old = READ_ONCE(*pmdp);
+	}
+
+	dat_free_pt(pt);
+	return 0;
+}
+
+/*
+ * dat_split_crste - Split a crste into smaller crstes
+ *
+ * Context: This function is assumed to be called with kvm->mmu_lock held.
+ *
+ * Return: 0 in case of success, -ENOMEM if running out of memory.
+ */
+static int dat_split_crste(struct kvm_s390_mmu_cache *mc, union crste *crstep,
+			   gfn_t gfn, union asce asce, bool uses_skeys)
+{
+	struct crst_table *table;
+	union crste old, new, init;
+	int i;
+
+	old = READ_ONCE(*crstep);
+	if (is_pmd(old))
+		return dat_split_ste(mc, &crstep->pmd, gfn, asce, uses_skeys);
+
+	BUG_ON(!mc);
+
+	/* Already split, nothing to do */
+	if (!old.h.i && !old.h.fc)
+		return 0;
+
+	table = dat_alloc_crst_noinit(mc);
+	if (!table)
+		return -ENOMEM;
+
+	new.val = virt_to_phys(table);
+	new.h.tt = old.h.tt;
+	new.h.fc0.tl = _REGION_ENTRY_LENGTH;
+
+	while (old.h.i || old.h.fc) {
+		init = old;
+		init.h.tt--;
+		if (old.h.fc) {
+			for (i = 0; i < _CRST_ENTRIES; i++)
+				table->crstes[i].val = init.val | i * HPAGE_SIZE;
+		} else {
+			crst_table_init((void *)table, init.val);
+		}
+		if (dat_crstep_xchg_atomic(crstep, old, new, gfn, asce))
+			return 0;
+		old = READ_ONCE(*crstep);
+	}
+
+	dat_free_crst(table);
+	return 0;
+}
+
+/**
+ * dat_entry_walk() - walk the gmap page tables
+ * @gfn: guest frame
+ * @asce: the ASCE of the address space
+ * @flags: flags from the DAT_WALK_* macros
+ * @walk_level: level to walk to, from LEVEL_* macros
+ * @last: will be filled with the last visited non-pte DAT entry
+ * @ptepp: will be filled with the last visited pte entry, if any, otherwise NULL
+ *
+ * Returns a table entry pointer for the given guest address and @walk_level.
+ *
+ * The @flags have the following meanings:
+ * * @DAT_WALK_IGN_HOLES: consider holes as normal table entries
+ * * @DAT_WALK_ALLOC: allocate new tables to reach the requested level, if needed
+ * * @DAT_WALK_SPLIT: split existing large pages to reach the requested level, if needed
+ * * @DAT_WALK_LEAF: return successfully whenever a large page is encountered
+ * * @DAT_WALK_ANY: return successfully even if the requested level could not be reached
+ * * @DAT_WALK_CONTINUE: walk to the requested level with the specified flags, and then try to
+ *   continue walking to ptes with only DAT_WALK_ANY
+ *
+ * Context: called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * PGM_ADDRESSING if the requested address lies outside memory
+ * * a PIC number if the requested address lies in a memory hole of type _DAT_TOKEN_PIC
+ * * -EFAULT if the requested address lies inside a memory hole of a different type
+ * * -EINVAL if the given ASCE is not compatible with the requested level
+ * * -EFBIG if the requested level could not be reached because a larger frame was found
+ * * -ENOENT if the requested level could not be reached for other reasons
+ * * -ENOMEM if running out of memory while allocating or splitting a table
+ */
+int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
+		   int walk_level, union crste **last, union pte **ptepp)
+{
+	union vaddress vaddr = { .addr = gfn_to_gpa(gfn) };
+	bool continue_anyway = flags & DAT_WALK_CONTINUE;
+	bool uses_skeys = flags & DAT_WALK_USES_SKEYS;
+	bool ign_holes = flags & DAT_WALK_IGN_HOLES;
+	bool allocate = flags & DAT_WALK_ALLOC;
+	bool split = flags & DAT_WALK_SPLIT;
+	bool leaf = flags & DAT_WALK_LEAF;
+	bool any = flags & DAT_WALK_ANY;
+	struct page_table *pgtable;
+	struct crst_table *table;
+	union crste entry;
+	int rc;
+
+	*last = NULL;
+	*ptepp = NULL;
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (WARN_ON_ONCE(unlikely(walk_level > asce.dt)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, gfn))
+		return PGM_ADDRESSING;
+
+	table = dereference_asce(asce);
+	if (asce.dt >= ASCE_TYPE_REGION1) {
+		*last = table->crstes + vaddr.rfx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION1))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION1)
+			return 0;
+		if (entry.pgd.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pgd);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION2) {
+		*last = table->crstes + vaddr.rsx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION2))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION2)
+			return 0;
+		if (entry.p4d.h.i) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.p4d);
+	}
+
+	if (asce.dt >= ASCE_TYPE_REGION3) {
+		*last = table->crstes + vaddr.rtx;
+		entry = READ_ONCE(**last);
+		if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_REGION3))
+			return -EINVAL;
+		if (crste_hole(entry) && !ign_holes)
+			return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+		if (walk_level == TABLE_TYPE_REGION3 &&
+		    continue_anyway && !entry.pud.h.fc && !entry.h.i) {
+			walk_level = TABLE_TYPE_PAGE_TABLE;
+			allocate = false;
+		}
+		if (walk_level == TABLE_TYPE_REGION3 || ((leaf || any) && entry.pud.h.fc))
+			return 0;
+		if (entry.pud.h.i && !entry.pud.h.fc) {
+			if (!allocate)
+				return any ? 0 : -ENOENT;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		if (walk_level <= TABLE_TYPE_SEGMENT && entry.pud.h.fc) {
+			if (!split)
+				return -EFBIG;
+			rc = dat_split_crste(mc, *last, gfn, asce, uses_skeys);
+			if (rc)
+				return rc;
+			entry = READ_ONCE(**last);
+		}
+		table = dereference_crste(entry.pud);
+	}
+
+	*last = table->crstes + vaddr.sx;
+	entry = READ_ONCE(**last);
+	if (WARN_ON_ONCE(entry.h.tt != TABLE_TYPE_SEGMENT))
+		return -EINVAL;
+	if (crste_hole(entry) && !ign_holes)
+		return entry.tok.type == _DAT_TOKEN_PIC ? entry.tok.par : -EFAULT;
+	if (continue_anyway && !entry.pmd.h.fc && !entry.h.i) {
+		walk_level = TABLE_TYPE_PAGE_TABLE;
+		allocate = false;
+	}
+	if (walk_level == TABLE_TYPE_SEGMENT || ((leaf || any) && entry.pmd.h.fc))
+		return 0;
+
+	if (entry.pmd.h.i && !entry.pmd.h.fc) {
+		if (!allocate)
+			return any ? 0 : -ENOENT;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	if (walk_level <= TABLE_TYPE_PAGE_TABLE && entry.pmd.h.fc) {
+		if (!split)
+			return -EFBIG;
+		rc = dat_split_ste(mc, &(*last)->pmd, gfn, asce, uses_skeys);
+		if (rc)
+			return rc;
+		entry = READ_ONCE(**last);
+	}
+	pgtable = dereference_pmd(entry.pmd);
+	*ptepp = pgtable->ptes + vaddr.px;
+	if (pte_hole(**ptepp) && !ign_holes)
+		return (*ptepp)->tok.type == _DAT_TOKEN_PIC ? (*ptepp)->tok.par : -EFAULT;
+	return 0;
+}
+
+static long dat_pte_walk_range(gfn_t gfn, gfn_t end, struct page_table *table, struct dat_walk *w)
+{
+	unsigned int idx = gfn & (_PAGE_ENTRIES - 1);
+	long rc = 0;
+
+	for ( ; gfn < end; idx++, gfn++) {
+		if (pte_hole(READ_ONCE(table->ptes[idx]))) {
+			if (!(w->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(w->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		rc = w->ops->pte_entry(table->ptes + idx, gfn, gfn + 1, w);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+static long dat_crste_walk_range(gfn_t start, gfn_t end, struct crst_table *table,
+				 struct dat_walk *walk)
+{
+	unsigned long idx, cur_shift, cur_size;
+	dat_walk_op the_op;
+	union crste crste;
+	gfn_t cur, next;
+	long rc = 0;
+
+	cur_shift = 8 + table->crstes[0].h.tt * 11;
+	idx = (start >> cur_shift) & (_CRST_ENTRIES - 1);
+	cur_size = 1UL << cur_shift;
+
+	for (cur = ALIGN_DOWN(start, cur_size); cur < end; idx++, cur = next) {
+		next = cur + cur_size;
+		walk->last = table->crstes + idx;
+		crste = READ_ONCE(*walk->last);
+
+		if (crste_hole(crste)) {
+			if (!(walk->flags & DAT_WALK_IGN_HOLES))
+				return -EFAULT;
+			if (!(walk->flags & DAT_WALK_ANY))
+				continue;
+		}
+
+		the_op = walk->ops->crste_ops[crste.h.tt];
+		if (the_op) {
+			rc = the_op(walk->last, cur, next, walk);
+			crste = READ_ONCE(*walk->last);
+		}
+		if (rc)
+			break;
+		if (!crste.h.i && !crste.h.fc) {
+			if (!is_pmd(crste))
+				rc = dat_crste_walk_range(max(start, cur), min(end, next),
+							  _dereference_crste(crste), walk);
+			else if (walk->ops->pte_entry)
+				rc = dat_pte_walk_range(max(start, cur), min(end, next),
+							dereference_pmd(crste.pmd), walk);
+		}
+	}
+	return rc;
+}
+
+/**
+ * _dat_walk_gfn_range() - walk DAT tables
+ * @start: the first guest page frame to walk
+ * @end: the guest page frame immediately after the last one to walk
+ * @asce: the ASCE of the guest mapping
+ * @ops: the dat_walk_ops that will be used to perform the walk
+ * @flags: flags from DAT_WALK_* (currently only DAT_WALK_IGN_HOLES is supported)
+ * @priv: will be passed as-is to the callbacks
+ *
+ * Any callback returning non-zero causes the walk to stop immediately.
+ *
+ * Return: -EINVAL in case of error, -EFAULT if @start is too high for the given
+ *         asce unless the DAT_WALK_IGN_HOLES flag is specified, otherwise it
+ *         returns whatever the callbacks return.
+ */
+long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
+			 const struct dat_walk_ops *ops, int flags, void *priv)
+{
+	struct crst_table *table = dereference_asce(asce);
+	struct dat_walk walk = {
+		.ops = ops,
+		.asce = asce,
+		.priv = priv,
+		.flags = flags,
+		.start = start,
+		.end = end,
+	};
+
+	if (WARN_ON_ONCE(unlikely(!asce.val)))
+		return -EINVAL;
+	if (!asce_contains_gfn(asce, start))
+		return (flags & DAT_WALK_IGN_HOLES) ? 0 : -EFAULT;
+
+	return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 6a336c3c6f62..5488bdc1a79b 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -45,6 +45,7 @@ enum {
 #define TABLE_TYPE_PAGE_TABLE	-1
 
 enum dat_walk_flags {
+	DAT_WALK_USES_SKEYS	= 0x40,
 	DAT_WALK_CONTINUE	= 0x20,
 	DAT_WALK_IGN_HOLES	= 0x10,
 	DAT_WALK_SPLIT		= 0x08,
@@ -332,6 +333,34 @@ struct page_table {
 static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
 static_assert(sizeof(struct page_table) == PAGE_SIZE);
 
+struct dat_walk;
+
+typedef long (*dat_walk_op)(union crste *crste, gfn_t gfn, gfn_t next, struct dat_walk *w);
+
+struct dat_walk_ops {
+	union {
+		dat_walk_op crste_ops[4];
+		struct {
+			dat_walk_op pmd_entry;
+			dat_walk_op pud_entry;
+			dat_walk_op p4d_entry;
+			dat_walk_op pgd_entry;
+		};
+	};
+	long (*pte_entry)(union pte *pte, gfn_t gfn, gfn_t next, struct dat_walk *w);
+};
+
+struct dat_walk {
+	const struct dat_walk_ops *ops;
+	union crste *last;
+	union pte *last_pte;
+	union asce asce;
+	gfn_t start;
+	gfn_t end;
+	int flags;
+	void *priv;
+};
+
 /**
  * _pte() - Useful constructor for union pte
  * @pfn: the pfn this pte should point to.
@@ -436,6 +465,11 @@ bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste ne
 			    union asce asce);
 void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
 
+long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
+			 const struct dat_walk_ops *ops, int flags, void *priv);
+
+int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
+		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
 
@@ -834,4 +868,9 @@ static inline void dat_crstep_clear(union crste *crstep, gfn_t gfn, union asce a
 	dat_crstep_xchg(crstep, newcrste, gfn, asce);
 }
 
+static inline int get_level(union crste *crstep, union pte *ptep)
+{
+	return ptep ? TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.51.1
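For orientation between the two patches: dat_entry_walk() is the
single-entry counterpart to the range walk above. A minimal sketch of how a
caller might drive it (illustrative only, not part of the series; mc, gfn
and asce are assumed to be in scope at the call site):

	union crste *crstep;
	union pte *ptep;
	int rc;

	/* Walk down to the page-table level, allocating missing tables. */
	rc = dat_entry_walk(mc, gfn, asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
			    &crstep, &ptep);
	if (rc)
		return rc;	/* PGM_ADDRESSING, a PIC number, or -errno */
	if (ptep) {
		/* the walk reached a pte; *ptep maps gfn */
	} else {
		/* the walk stopped at *crstep, e.g. on a large page */
	}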
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 11/23] KVM: s390: KVM page table management functions: storage keys
Date: Thu, 20 Nov 2025 18:15:32 +0100
Message-ID: <20251120171544.96841-12-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions related to storage key handling.
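For a sense of how the entry points below are meant to be used, here is a
minimal sketch (illustrative only, not part of the patch; mc, asce, gfn and
the chosen access-control value are assumed from the caller's context):

	union skey skey;
	int rc;

	/* Read the effective storage key of one guest page. */
	rc = dat_get_storage_key(asce, gfn, &skey);
	if (rc)
		return rc;

	/*
	 * Change the access-control bits and write the key back;
	 * nq == false requests the quiescing form of the key update.
	 */
	skey.acc = 3;
	rc = dat_set_storage_key(mc, asce, gfn, skey, false);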
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 215 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |   7 ++
 2 files changed, 222 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index 3b74bf5463f4..121f99335ae9 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -602,3 +602,218 @@ long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
 
 	return dat_crste_walk_range(start, min(end, asce_end(asce)), table, &walk);
 }
+
+int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	int rc;
+
+	skey->skey = 0;
+	rc = dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		union crste crste;
+
+		crste = READ_ONCE(*crstep);
+		if (!crste.h.fc || !crste.s.fc1.pr)
+			return 0;
+		skey->skey = page_get_storage_key(large_crste_to_phys(crste, gfn));
+		return 0;
+	}
+	pgste = pgste_get_lock(ptep);
+	if (ptep->h.i) {
+		skey->acc = pgste.acc;
+		skey->fp = pgste.fp;
+	} else {
+		skey->skey = page_get_storage_key(pte_origin(*ptep));
+	}
+	skey->r |= pgste.gr;
+	skey->c |= pgste.gc;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static void dat_update_ptep_sd(union pgste old, union pgste pgste, union pte *ptep)
+{
+	if (pgste.acc != old.acc || pgste.fp != old.fp || pgste.gr != old.gr || pgste.gc != old.gc)
+		__atomic64_or(_PAGE_SD, &ptep->val);
+}
+
+int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+			union skey skey, bool nq)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(mc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		page_set_storage_key(large_crste_to_phys(*crstep, gfn), skey.skey, !nq);
+		return 0;
+	}
+
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gc = skey.c;
+	pgste.gr = skey.r;
+
+	if (!ptep->h.i) {
+		union skey old_skey;
+
+		old_skey.skey = page_get_storage_key(pte_origin(*ptep));
+		pgste.hc |= old_skey.c;
+		pgste.hr |= old_skey.r;
+		skey.r = 0;
+		skey.c = 0;
+		page_set_storage_key(pte_origin(*ptep), skey.skey, !nq);
+	}
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static bool page_cond_set_storage_key(phys_addr_t paddr, union skey skey, union skey *oldkey,
+				      bool nq, bool mr, bool mc)
+{
+	oldkey->skey = page_get_storage_key(paddr);
+	if (oldkey->acc == skey.acc && oldkey->fp == skey.fp &&
+	    (oldkey->r == skey.r || mr) && (oldkey->c == skey.c || mc))
+		return false;
+	page_set_storage_key(paddr, skey.skey, !nq);
+	return true;
+}
+
+int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
+			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(mmc, gfn, asce, DAT_WALK_LEAF_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep)
+		return page_cond_set_storage_key(large_crste_to_phys(*crstep, gfn), skey, oldkey,
+						 nq, mr, mc);
+
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	rc = 1;
+	pgste.acc = skey.acc;
+	pgste.fp = skey.fp;
+	pgste.gc = skey.c;
+	pgste.gr = skey.r;
+
+	if (!ptep->h.i) {
+		union skey prev;
+
+		rc = page_cond_set_storage_key(pte_origin(*ptep), skey, &prev, nq, mr, mc);
+		pgste.hc |= prev.c;
+		pgste.hr |= prev.r;
+		if (oldkey)
+			*oldkey = prev;
+	}
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+int dat_reset_reference_bit(union asce asce, gfn_t gfn)
+{
+	union pgste pgste, old;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, gfn, asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+
+	if (!ptep) {
+		union crste crste = READ_ONCE(*crstep);
+
+		if (!crste.h.fc || !crste.s.fc1.pr)
+			return 0;
+		return page_reset_referenced(large_crste_to_phys(*crstep, gfn));
+	}
+	old = pgste_get_lock(ptep);
+	pgste = old;
+
+	if (!ptep->h.i) {
+		rc = page_reset_referenced(pte_origin(*ptep));
+		pgste.hr = rc >> 1;
+	}
+	rc |= (pgste.gr << 1) | pgste.gc;
+	pgste.gr = 0;
+
+	dat_update_ptep_sd(old, pgste, ptep);
+	pgste_set_unlock(ptep, pgste);
+	return rc;
+}
+
+static long dat_reset_skeys_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.acc = 0;
+	pgste.fp = 0;
+	pgste.gr = 0;
+	pgste.gc = 0;
+	if (ptep->s.pr)
+		page_set_storage_key(pte_origin(*ptep), PAGE_DEFAULT_KEY, 1);
+	pgste_set_unlock(ptep, pgste);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long dat_reset_skeys_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t addr, end, origin = crste_origin_large(*crstep);
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	addr = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	while (ALIGN(addr + 1, _SEGMENT_SIZE) <= end)
+		addr = sske_frame(addr, PAGE_DEFAULT_KEY);
+	for ( ; addr < end; addr += PAGE_SIZE)
+		page_set_storage_key(addr, PAGE_DEFAULT_KEY, 1);
+
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_skeys(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = dat_reset_skeys_pte,
+		.pmd_entry = dat_reset_skeys_crste,
+		.pud_entry = dat_reset_skeys_crste,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 5488bdc1a79b..6a328a4d1cca 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -472,6 +472,13 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, in
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
+int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+			union skey skey, bool nq);
+int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
+			     union skey skey, union skey *oldkey, bool nq, bool mr, bool mc);
+int dat_reset_reference_bit(union asce asce, gfn_t gfn);
+long dat_reset_skeys(union asce asce, gfn_t start);
 
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
-- 
2.51.1
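A short usage note on dat_cond_set_storage_key(), as a hedged illustration
only (the local variable names are hypothetical): it provides the
conditional SSKE semantics in which the MR/MC bits allow an unchanged
reference/change bit to be left alone.

	union skey newkey = { .skey = guest_key }, oldkey;
	int rc;

	/*
	 * Roughly: rc > 0 means the key was (or may have been) written,
	 * rc == 0 means the existing key already matched modulo MR/MC,
	 * negative values are walk errors; oldkey receives the previous
	 * key when one was present.
	 */
	rc = dat_cond_set_storage_key(mc, asce, gfn, newkey, &oldkey,
				      nq, mr, mc_bit);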
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 12/23] KVM: s390: KVM page table management functions: lifecycle management
Date: Thu, 20 Nov 2025 18:15:33 +0100
Message-ID: <20251120171544.96841-13-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle memslot creation and destruction,
additional per-pagetable data stored in the PGSTEs, mapping physical
addresses into the gmap, and marking address ranges as prefix.
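As an illustration of the memslot lifecycle entry points (a sketch only,
not part of the patch; slot stands in for a struct kvm_memory_slot owned by
the caller), the two inline wrappers defined at the end of this patch
reduce slot creation and deletion to dat_set_slot() token writes:

	/* KVM_MR_CREATE: seed the range with "empty" tokens. */
	rc = dat_create_slot(mc, asce, slot->base_gfn, slot->npages);

	/* KVM_MR_DELETE: turn the range back into an addressing hole. */
	rc = dat_delete_slot(mc, asce, slot->base_gfn, slot->npages);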
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  54 +++++++++
 2 files changed, 337 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index 121f99335ae9..d6b03ba58c93 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -102,6 +102,38 @@ void dat_free_level(struct crst_table *table, bool owns_ptes)
 	dat_free_crst(table);
 }
 
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype)
+{
+	struct crst_table *table;
+	union crste crste;
+
+	while (asce->dt > newtype) {
+		table = dereference_asce(*asce);
+		crste = table->crstes[0];
+		if (crste.h.fc)
+			return 0;
+		if (!crste.h.i) {
+			asce->rsto = crste.h.fc0.to;
+			dat_free_crst(table);
+		} else {
+			crste.h.tt--;
+			crst_table_init((void *)table, crste.val);
+		}
+		asce->dt--;
+	}
+	while (asce->dt < newtype) {
+		crste = _crste_fc0(asce->rsto, asce->dt + 1);
+		table = dat_alloc_crst_noinit(mc);
+		if (!table)
+			return -ENOMEM;
+		crst_table_init((void *)table, _CRSTE_HOLE(crste.h.tt).val);
+		table->crstes[0] = crste;
+		asce->rsto = __pa(table) >> PAGE_SHIFT;
+		asce->dt++;
+	}
+	return 0;
+}
+
 /**
  * dat_crstep_xchg - exchange a gmap CRSTE with another
  * @crstep: pointer to the CRST entry
@@ -817,3 +849,254 @@ long dat_reset_skeys(union asce asce, gfn_t start)
 
 	return _dat_walk_gfn_range(start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, NULL);
 }
+
+struct slot_priv {
+	unsigned long token;
+	struct kvm_s390_mmu_cache *mc;
+};
+
+static long _dat_slot_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct slot_priv *p = walk->priv;
+	union crste dummy = { .val = p->token };
+	union pte new_pte, pte = READ_ONCE(*ptep);
+
+	new_pte = _PTE_TOK(dummy.tok.type, dummy.tok.par);
+
+	/* Table entry already in the desired state */
+	if (pte.val == new_pte.val)
+		return 0;
+
+	dat_ptep_xchg(ptep, new_pte, gfn, walk->asce, false);
+	return 0;
+}
+
+static long _dat_slot_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union crste new_crste, crste = READ_ONCE(*crstep);
+	struct slot_priv *p = walk->priv;
+
+	new_crste.val = p->token;
+	new_crste.h.tt = crste.h.tt;
+
+	/* Table entry already in the desired state */
+	if (crste.val == new_crste.val)
+		return 0;
+
+	/* This table entry needs to be updated */
+	if (walk->start <= gfn && walk->end >= next) {
+		dat_crstep_xchg_atomic(crstep, crste, new_crste, gfn, walk->asce);
+		/* A lower level table was present, needs to be freed */
+		if (!crste.h.fc && !crste.h.i)
+			dat_free_level(dereference_crste(crste), true);
+		return 0;
+	}
+
+	/* A lower level table is present, things will be handled there */
+	if (!crste.h.fc && !crste.h.i)
+		return 0;
+	/* Split (install a lower level table), and handle things there */
+	return dat_split_crste(p->mc, crstep, gfn, walk->asce, false);
+}
+
+static const struct dat_walk_ops dat_slot_ops = {
+	.pte_entry = _dat_slot_pte,
+	.crste_ops = { _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, _dat_slot_crste, },
+};
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param)
+{
+	struct slot_priv priv = {
+		.token = _CRSTE_TOK(0, type, param).val,
+		.mc = mc,
+	};
+
+	return _dat_walk_gfn_range(start, end, asce, &dat_slot_ops,
+				   DAT_WALK_IGN_HOLES | DAT_WALK_ANY, &priv);
+}
+
+static void pgste_set_unlock_multiple(union pte *first, int n, union pgste *pgstes)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!pgstes[i].pcl)
+			break;
+		pgste_set_unlock(first + i, pgstes[i]);
+	}
+}
+
+static bool pgste_get_trylock_multiple(union pte *first, int n, union pgste *pgstes)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!pgste_get_trylock(first + i, pgstes + i))
+			break;
+	}
+	if (i == n)
+		return true;
+	pgste_set_unlock_multiple(first, n, pgstes);
+	return false;
+}
+
+unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param)
+{
+	union pgste pgstes[4] = {};
+	unsigned long res = 0;
+	int i, n;
+
+	n = param.len + 1;
+
+	while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes))
+		cpu_relax();
+
+	for (i = 0; i < n; i++)
+		res = res << 16 | pgstes[i].val16;
+
+	pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes);
+	return res;
+}
+
+void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val)
+{
+	union pgste pgstes[4] = {};
+	int i, n;
+
+	n = param.len + 1;
+
+	while (!pgste_get_trylock_multiple(table->ptes + param.offset, n, pgstes))
+		cpu_relax();
+
+	for (i = param.len; i >= 0; i--) {
+		pgstes[i].val16 = val;
+		val = val >> 16;
+	}
+
+	pgste_set_unlock_multiple(table->ptes + param.offset, n, pgstes);
+}
+
+static long _dat_test_young_pte(union pte *ptep, gfn_t start, gfn_t end, struct dat_walk *walk)
+{
+	return ptep->s.y;
+}
+
+static long _dat_test_young_crste(union crste *crstep, gfn_t start, gfn_t end,
+				  struct dat_walk *walk)
+{
+	return crstep->h.fc && crstep->s.fc1.y;
+}
+
+static const struct dat_walk_ops test_age_ops = {
+	.pte_entry = _dat_test_young_pte,
+	.pmd_entry = _dat_test_young_crste,
+	.pud_entry = _dat_test_young_crste,
+};
+
+/**
+ * dat_test_age_gfn() - test whether a range of guest addresses is young
+ * @asce: the ASCE of the address space
+ * @start: the first guest page frame to test
+ * @end: the guest page frame immediately after the last one to test
+ *
+ * Context: called by KVM common code with the kvm mmu write lock held
+ * Return: true if any page in the given range is young, otherwise false.
+ */
+bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end)
+{
+	return _dat_walk_gfn_range(start, end, asce, &test_age_ops, 0, NULL) > 0;
+}
+
+int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
+	     bool uses_skeys, struct guest_fault *f)
+{
+	union crste oldval, newval;
+	union pte newpte, oldpte;
+	union pgste pgste;
+	int rc = 0;
+
+	rc = dat_entry_walk(mc, f->gfn, asce, DAT_WALK_ALLOC_CONTINUE, level, &f->crstep, &f->ptep);
+	if (rc == -EINVAL || rc == -ENOMEM)
+		return rc;
+	if (rc)
+		return -EAGAIN;
+
+	if (WARN_ON_ONCE(unlikely(get_level(f->crstep, f->ptep) > level)))
+		return -EINVAL;
+
+	if (f->ptep) {
+		pgste = pgste_get_lock(f->ptep);
+		oldpte = *f->ptep;
+		newpte = _pte(f->pfn, f->writable, f->write_attempt | oldpte.s.d, !f->page);
+		newpte.s.sd = oldpte.s.sd;
+		oldpte.s.sd = 0;
+		if (oldpte.val == _PTE_EMPTY.val || oldpte.h.pfra == f->pfn) {
+			pgste = __dat_ptep_xchg(f->ptep, pgste, newpte, f->gfn, asce, uses_skeys);
+			if (f->callback)
+				f->callback(f);
+		} else {
+			rc = -EAGAIN;
+		}
+		pgste_set_unlock(f->ptep, pgste);
+	} else {
+		oldval = READ_ONCE(*f->crstep);
+		newval = _crste_fc1(f->pfn, oldval.h.tt, f->writable,
+				    f->write_attempt | oldval.s.fc1.d);
+		newval.s.fc1.sd = oldval.s.fc1.sd;
+		if (oldval.val != _CRSTE_EMPTY(oldval.h.tt).val &&
+		    crste_origin_large(oldval) != crste_origin_large(newval))
+			return -EAGAIN;
+		if (!dat_crstep_xchg_atomic(f->crstep, oldval, newval, f->gfn, asce))
+			return -EAGAIN;
+		if (f->callback)
+			f->callback(f);
+	}
+
+	return rc;
+}
+
+static long dat_set_pn_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union crste crste = READ_ONCE(*crstep);
+	int *n = walk->priv;
+
+	if (!crste.h.fc || crste.h.i || crste.h.p)
+		return 0;
+
+	*n = 2;
+	if (crste.s.fc1.prefix_notif)
+		return 0;
+	crste.s.fc1.prefix_notif = 1;
+	dat_crstep_xchg(crstep, crste, gfn, walk->asce);
+	return 0;
+}
+
+static long dat_set_pn_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	int *n = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (!ptep->h.i && !ptep->h.p) {
+		pgste.prefix_notif = 1;
+		*n += 1;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn)
+{
+	static const struct dat_walk_ops ops = {
+		.pte_entry = dat_set_pn_pte,
+		.pmd_entry = dat_set_pn_crste,
+		.pud_entry = dat_set_pn_crste,
+	};
+
+	int n = 0;
+
+	_dat_walk_gfn_range(gfn, gfn + 2, asce, &ops, DAT_WALK_IGN_HOLES, &n);
+	if (n != 2)
+		return -EAGAIN;
+	return 0;
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index 6a328a4d1cca..c8df33f95160 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -361,6 +361,11 @@ struct dat_walk {
 	void *priv;
 };
 
+struct ptval_param {
+	unsigned char offset : 6;
+	unsigned char len : 2;
+};
+
 /**
  * _pte() - Useful constructor for union pte
  * @pfn: the pfn this pte should point to.
@@ -459,6 +464,32 @@ struct kvm_s390_mmu_cache {
 	short int n_rmaps;
 };
 
+struct guest_fault {
+	gfn_t gfn;		/* Guest frame */
+	kvm_pfn_t pfn;		/* Host PFN */
+	struct page *page;	/* Host page */
+	union pte *ptep;	/* Used to resolve the fault, or NULL */
+	union crste *crstep;	/* Used to resolve the fault, or NULL */
+	bool writable;		/* Mapping is writable */
+	bool write_attempt;	/* Write access attempted */
+	bool attempt_pfault;	/* Attempt a pfault first */
+	bool valid;		/* This entry contains valid data */
+	void (*callback)(struct guest_fault *f);
+	void *priv;
+};
+
+/*
+ *      0       1       2       3       4       5       6       7
+ *    +-------+-------+-------+-------+-------+-------+-------+-------+
+ *  0 |                               |            PGT_ADDR           |
+ *  8 |                   VMADDR                      |               |
+ * 16 |                                                               |
+ * 24 |                                                               |
+ */
+#define MKPTVAL(o, l) ((struct ptval_param) { .offset = (o), .len = ((l) + 1) / 2 - 1})
+#define PTVAL_PGT_ADDR	MKPTVAL(4, 8)
+#define PTVAL_VMADDR	MKPTVAL(8, 6)
+
 union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
 					 gfn_t gfn, union asce asce, bool uses_skeys);
 bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
@@ -472,6 +503,7 @@ int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, in
 		   int walk_level, union crste **last, union pte **ptepp);
 void dat_free_level(struct crst_table *table, bool owns_ptes);
 struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
+int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype);
 int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
 int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
 			union skey skey, bool nq);
@@ -480,6 +512,16 @@ int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gf
 int dat_reset_reference_bit(union asce asce, gfn_t gfn);
 long dat_reset_skeys(union asce asce, gfn_t start);
 
+unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param);
+void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val);
+
+int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
+		 u16 type, u16 param);
+int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn);
+bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
+int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
+	     bool uses_skeys, struct guest_fault *f);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
@@ -880,4 +922,16 @@ static inline int get_level(union crste *crstep, union pte *ptep)
 	return ptep ? TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
 }
 
+static inline int dat_delete_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_PIC, PGM_ADDRESSING);
+}
+
+static inline int dat_create_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
+				  unsigned long npages)
+{
+	return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_NONE, 0);
+}
+
 #endif /* __KVM_S390_DAT_H */
-- 
2.51.1
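To show how the pieces of this patch combine during fault resolution, here
is a hedged sketch (not part of the series; the local variables mirror what
a fault handler would already have computed from the host side):

	struct guest_fault f = {
		.gfn = gfn,			/* faulting guest frame */
		.pfn = pfn,			/* host pfn backing it */
		.page = page,
		.writable = writable,
		.write_attempt = wr,
	};
	int rc;

	/* Install the mapping; -EAGAIN asks the caller to retry the fault. */
	rc = dat_link(mc, asce, TABLE_TYPE_PAGE_TABLE, uses_skeys, &f);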
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 13/23] KVM: s390: KVM page table management functions: CMMA
Date: Thu, 20 Nov 2025 18:15:34 +0100
Message-ID: <20251120171544.96841-14-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Add page table management functions to be used for KVM guest (gmap)
page tables.

This patch adds functions to handle CMMA and the ESSA instruction.
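A minimal sketch of how an ESSA intercept handler might use the entry point
below (illustrative only, not part of the patch; queue_in_cbrl() and
mark_cmma_dirty() are hypothetical helper names standing in for the
caller's bookkeeping):

	union essa_state state;
	bool dirty = false;
	int rc;

	/* orc comes from the guest's ESSA instruction text. */
	rc = dat_perform_essa(asce, gfn, orc, &state, &dirty);
	if (rc > 0)
		queue_in_cbrl(gfn);	/* page goes on the CBRL for the guest */
	if (dirty)
		mark_cmma_dirty();	/* a previously clean entry was dirtied */
	return state.val;		/* storage attributes handed back to the guest */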
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/dat.c | 262 ++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/dat.h |  27 +++++
 2 files changed, 289 insertions(+)

diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
index d6b03ba58c93..d31d059e9996 100644
--- a/arch/s390/kvm/dat.c
+++ b/arch/s390/kvm/dat.c
@@ -1100,3 +1100,265 @@ int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn)
 		return -EAGAIN;
 	return 0;
 }
+
+/**
+ * dat_perform_essa() - perform ESSA actions on the PGSTE.
+ * @asce: the asce to operate on.
+ * @gfn: the guest page frame to operate on.
+ * @orc: the specific action to perform, see the ESSA_SET_* macros.
+ * @state: the storage attributes to be returned to the guest.
+ * @dirty: returns whether the function dirtied a previously clean entry.
+ *
+ * Context: Called with kvm->mmu_lock held.
+ *
+ * Return:
+ * * 1 if the page state has been altered and the page is to be added to the CBRL
+ * * 0 if the page state has been altered, but the page is not to be added to the CBRL
+ * * -1 if the page state has not been altered and the page is not to be added to the CBRL
+ */
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	int res = 0;
+
+	if (dat_entry_walk(NULL, gfn, asce, 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) {
+		*state = (union essa_state) { .exception = 1 };
+		return -1;
+	}
+
+	pgste = pgste_get_lock(ptep);
+
+	*state = (union essa_state) {
+		.content = (ptep->h.i << 1) + (ptep->h.i && pgste.zero),
+		.nodat = pgste.nodat,
+		.usage = pgste.usage,
+	};
+
+	switch (orc) {
+	case ESSA_GET_STATE:
+		res = -1;
+		break;
+	case ESSA_SET_STABLE:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 0;
+		break;
+	case ESSA_SET_UNUSED:
+		pgste.usage = PGSTE_GPS_USAGE_UNUSED;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_VOLATILE:
+		pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		if (ptep->h.i)
+			res = 1;
+		break;
+	case ESSA_SET_POT_VOLATILE:
+		if (!ptep->h.i) {
+			pgste.usage = PGSTE_GPS_USAGE_POT_VOLATILE;
+		} else if (pgste.zero) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+		} else if (!pgste.gc) {
+			pgste.usage = PGSTE_GPS_USAGE_VOLATILE;
+			res = 1;
+		}
+		break;
+	case ESSA_SET_STABLE_RESIDENT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		/*
+		 * Since the resident state can go away any time after this
+		 * call, we will not make this page resident. We can revisit
+		 * this decision if a guest will ever start using this.
+		 */
+		break;
+	case ESSA_SET_STABLE_IF_RESIDENT:
+		if (!ptep->h.i)
+			pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		break;
+	case ESSA_SET_STABLE_NODAT:
+		pgste.usage = PGSTE_GPS_USAGE_STABLE;
+		pgste.nodat = 1;
+		break;
+	default:
+		WARN_ONCE(1, "Invalid ORC!");
+		res = -1;
+		break;
+	}
+	/* If we are discarding a page, set it to logical zero */
+	pgste.zero = res == 1;
+	if (orc > 0) {
+		*dirty = !pgste.cmma_d;
+		pgste.cmma_d = 1;
+	}
+
+	pgste_set_unlock(ptep, pgste);
+
+	return res;
+}
+
+static long dat_reset_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = 0;
+	pgste.nodat = 0;
+	pgste.cmma_d = 0;
+	pgste_set_unlock(ptep, pgste);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+long dat_reset_cmma(union asce asce, gfn_t start)
+{
+	const struct dat_walk_ops dat_reset_cmma_ops = {
+		.pte_entry = dat_reset_cmma_pte,
+	};
+
+	return _dat_walk_gfn_range(start, asce_end(asce), asce, &dat_reset_cmma_ops,
+				   DAT_WALK_IGN_HOLES, NULL);
+}
+
+struct dat_get_cmma_state {
+	gfn_t start;
+	gfn_t end;
+	unsigned int count;
+	u8 *values;
+	atomic64_t *remaining;
+};
+
+static long __dat_peek_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	state->values[gfn - walk->start] = pgste.usage | (pgste.nodat << 6);
+	pgste_set_unlock(ptep, pgste);
+	state->end = next;
+
+	return 0;
+}
+
+static long __dat_peek_cmma_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+
+	if (crstep->h.i)
+		state->end = min(walk->end, next);
+	return 0;
+}
+
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = __dat_peek_cmma_pte,
+		.pmd_entry = __dat_peek_cmma_crste,
+		.pud_entry = __dat_peek_cmma_crste,
+		.p4d_entry = __dat_peek_cmma_crste,
+		.pgd_entry = __dat_peek_cmma_crste,
+	};
+	struct dat_get_cmma_state state = { .values = values, };
+	int rc;
+
+	rc = _dat_walk_gfn_range(start, start + *count, asce, &ops, DAT_WALK_DEFAULT, &state);
+	*count = state.end - start;
+	/* Return success if at least one value was saved, otherwise an error. */
+	return (rc == -EFAULT && *count > 0) ? 0 : rc;
+}
+
+static long __dat_get_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_get_cmma_state *state = walk->priv;
+	union pgste pgste;
+
+	if (state->start != -1) {
+		if ((gfn - state->end) > KVM_S390_MAX_BIT_DISTANCE)
+			return 1;
+		if (gfn - state->start >= state->count)
+			return 1;
+	}
+
+	if (!READ_ONCE(*pgste_of(ptep)).cmma_d)
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (pgste.cmma_d) {
+		if (state->start == -1)
+			state->start = gfn;
+		pgste.cmma_d = 0;
+		atomic64_dec(state->remaining);
+		state->values[gfn - state->start] = pgste.usage | pgste.nodat << 6;
+		state->end = next;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_get_cmma_pte, };
+	struct dat_get_cmma_state state = {
+		.remaining = rem,
+		.values = values,
+		.count = *count,
+		.start = -1,
+	};
+
+	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
+
+	if (state.start == -1) {
+		*count = 0;
+	} else {
+		*count = state.end - state.start;
+		*start = state.start;
+	}
+
+	return 0;
+}
+
+struct dat_set_cmma_state {
+	unsigned long mask;
+	const u8 *bits;
+};
+
+static long __dat_set_cmma_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct dat_set_cmma_state *state = walk->priv;
+	union pgste pgste, tmp;
+
+	tmp.val = (state->bits[gfn - walk->start] << 24) & state->mask;
+
+	pgste = pgste_get_lock(ptep);
+	pgste.usage = tmp.usage;
+	pgste.nodat = tmp.nodat;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+/*
+ * This function sets the CMMA attributes for the given pages. If the input
+ * buffer has zero length, no action is taken, otherwise the attributes are
+ * set and the mm->context.uses_cmm flag is set.
+ */
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __dat_set_cmma_pte, };
+	struct dat_set_cmma_state state = { .mask = mask, .bits = bits, };
+	union crste *crstep;
+	union pte *ptep;
+	gfn_t cur;
+	int rc;
+
+	for (cur = ALIGN_DOWN(gfn, _PAGE_ENTRIES); cur < gfn + count; cur += _PAGE_ENTRIES) {
+		rc = dat_entry_walk(mc, cur, asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+				    &crstep, &ptep);
+		if (rc)
+			return rc;
+	}
+	return _dat_walk_gfn_range(gfn, gfn + count, asce, &ops, DAT_WALK_IGN_HOLES, &state);
+}
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index c8df33f95160..4190a54224c0 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -17,6 +17,15 @@
 #include
 #include
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h
index c8df33f95160..4190a54224c0 100644
--- a/arch/s390/kvm/dat.h
+++ b/arch/s390/kvm/dat.h
@@ -17,6 +17,15 @@
 #include 
 #include 
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
+ */
+#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
+/* For consistency */
+#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
+
 #define _ASCE(x)	((union asce) { .val = (x), })
 #define NULL_ASCE	_ASCE(0)
 
@@ -433,6 +442,17 @@ static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool
 	return res;
 }
 
+union essa_state {
+	unsigned char val;
+	struct {
+		unsigned char		: 2;
+		unsigned char nodat	: 1;
+		unsigned char exception	: 1;
+		unsigned char usage	: 2;
+		unsigned char content	: 2;
+	};
+};
+
 /**
  * struct vsie_rmap - reverse mapping for shadow page table entries
  * @next: pointer to next rmap in the list
@@ -522,6 +542,13 @@ bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
 int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
 	     bool uses_skeys, struct guest_fault *f);
 
+int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty);
+long dat_reset_cmma(union asce asce, gfn_t start_gfn);
+int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values);
+int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem);
+int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
+		      unsigned long count, unsigned long mask, const uint8_t *bits);
+
 int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
 
 #define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
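Note: union essa_state gives ESSA handlers a typed view of the state byte that
dat_perform_essa() fills in. A hypothetical caller, sketched only from the
declarations in this hunk (the function itself is not part of the patch):

	/* Perform one ESSA operation; returns the packed state byte or < 0 on error. */
	static int essa_one_page(union asce asce, gfn_t gfn, int orc)
	{
		union essa_state state = {};
		bool dirty = false;
		int rc;

		rc = dat_perform_essa(asce, gfn, orc, &state, &dirty);
		if (rc < 0)
			return rc;
		/* dirty reports whether the page's CMMA dirty bit was newly set */
		return state.val;
	}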
-- 
2.51.1

From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 14/23] KVM: s390: New gmap code
Date: Thu, 20 Nov 2025 18:15:35 +0100
Message-ID: <20251120171544.96841-15-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

New gmap (guest map) code. This new gmap code will only be used by KVM,
and will replace the existing gmap.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/Makefile |    2 +-
 arch/s390/kvm/gmap.c   | 1062 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/gmap.h   |  165 +++++++
 3 files changed, 1228 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/kvm/gmap.c
 create mode 100644 arch/s390/kvm/gmap.h

diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 84315d2f75fb..21088265402c 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o
+kvm-y += dat.o gmap.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
new file mode 100644
index 000000000000..29ce8df697dd
--- /dev/null
+++ b/arch/s390/kvm/gmap.c
@@ -0,0 +1,1062 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Guest memory management for KVM/s390
+ *
+ * Copyright IBM Corp. 2008, 2020, 2024
+ *
+ * Author(s): Claudio Imbrenda
+ *	      Martin Schwidefsky
+ *	      David Hildenbrand
+ *	      Janosch Frank
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dat.h"
+#include "gmap.h"
+#include "kvm-s390.h"
+
+static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.sie_block->prog0c & PROG_IN_SIE;
+}
+
+static int gmap_limit_to_type(gfn_t limit)
+{
+	if (!limit)
+		return TABLE_TYPE_REGION1;
+	if (limit <= _REGION3_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_SEGMENT;
+	if (limit <= _REGION2_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION3;
+	if (limit <= _REGION1_SIZE >> PAGE_SHIFT)
+		return TABLE_TYPE_REGION2;
+	return TABLE_TYPE_REGION1;
+}
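Note: the limit is in guest frame numbers, so each cutoff is the number of
pages one table type can map: a segment-table ASCE covers _REGION3_SIZE
(2 GiB), a region-3 ASCE covers _REGION2_SIZE (4 TiB), and so on. A worked
example, assuming the usual s390 table geometry:

	/* A 16 GiB guest: 16 GiB > 2 GiB but <= 4 TiB, so a region-3 table suffices. */
	gfn_t limit = (16UL << 30) >> PAGE_SHIFT;
	int type = gmap_limit_to_type(limit);	/* TABLE_TYPE_REGION3 */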
+/**
+ * gmap_new - allocate and initialize a guest address space
+ * @kvm: the vm the new gmap will belong to
+ * @limit: maximum gfn of the gmap address space
+ *
+ * Returns a guest address space structure.
+ */
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit)
+{
+	struct crst_table *table;
+	struct gmap *gmap;
+	int type;
+
+	type = gmap_limit_to_type(limit);
+
+	gmap = kzalloc(sizeof(*gmap), GFP_KERNEL_ACCOUNT);
+	if (!gmap)
+		return NULL;
+	INIT_LIST_HEAD(&gmap->children);
+	INIT_LIST_HEAD(&gmap->list);
+	INIT_LIST_HEAD(&gmap->scb_users);
+	INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_KVM_S390_MMU_CACHE);
+	spin_lock_init(&gmap->children_lock);
+	spin_lock_init(&gmap->host_to_rmap_lock);
+
+	table = dat_alloc_crst_sleepable(_CRSTE_EMPTY(type).val);
+	if (!table) {
+		kfree(gmap);
+		return NULL;
+	}
+
+	gmap->asce.val = __pa(table);
+	gmap->asce.dt = type;
+	gmap->asce.tl = _ASCE_TABLE_LENGTH;
+	gmap->asce.x = 1;
+	gmap->asce.p = 1;
+	gmap->asce.s = 1;
+	gmap->kvm = kvm;
+	gmap->owns_page_tables = 1;
+
+	return gmap;
+}
+
+static void gmap_add_child(struct gmap *parent, struct gmap *child)
+{
+	KVM_BUG_ON(parent && parent->is_ucontrol && parent->parent, parent->kvm);
+	KVM_BUG_ON(parent && parent->is_ucontrol && !parent->owns_page_tables, parent->kvm);
+	lockdep_assert_held(&parent->children_lock);
+
+	child->parent = parent;
+	child->is_ucontrol = parent->is_ucontrol;
+	child->allow_hpage_1m = parent->allow_hpage_1m;
+	if (kvm_is_ucontrol(parent->kvm))
+		child->owns_page_tables = 0;
+	list_add(&child->list, &parent->children);
+}
+
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit)
+{
+	struct gmap *res;
+
+	lockdep_assert_not_held(&parent->children_lock);
+	res = gmap_new(parent->kvm, limit);
+	if (res) {
+		scoped_guard(spinlock, &parent->children_lock)
+			gmap_add_child(parent, res);
+	}
+	return res;
+}
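Note: a hypothetical lifetime sketch for a child gmap, using only functions
from this file; the teardown order (remove from the parent first, then
dispose) follows the assertions in gmap_dispose() below. Error handling
trimmed; the function itself is an illustration, not code from this series.

	static void child_gmap_example(struct gmap *parent)
	{
		struct gmap *child = gmap_new_child(parent, 0);

		if (!child)
			return;
		/* ... install child->asce and let the vcpu run on it ... */
		scoped_guard(spinlock, &parent->children_lock)
			gmap_remove_child(child);
		gmap_dispose(child);
	}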
+int gmap_set_limit(struct gmap *gmap, gfn_t limit)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc, type;
+
+	type = gmap_limit_to_type(limit);
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			break;
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			rc = dat_set_asce_limit(mc, &gmap->asce, type);
+	} while (rc == -ENOMEM);
+
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_rmap_radix_tree_free(struct radix_tree_root *root)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct radix_tree_iter iter;
+	unsigned long indices[16];
+	unsigned long index;
+	void __rcu **slot;
+	int i, nr;
+
+	/* A radix tree is freed by deleting all of its entries */
+	index = 0;
+	do {
+		nr = 0;
+		radix_tree_for_each_slot(slot, root, &iter, index) {
+			indices[nr] = iter.index;
+			if (++nr == 16)
+				break;
+		}
+		for (i = 0; i < nr; i++) {
+			index = indices[i];
+			head = radix_tree_delete(root, index);
+			gmap_for_each_rmap_safe(rmap, rnext, head)
+				kfree(rmap);
+		}
+	} while (nr > 0);
+}
+
+void gmap_remove_child(struct gmap *child)
+{
+	if (KVM_BUG_ON(!child->parent, child->kvm))
+		return;
+	lockdep_assert_held(&child->parent->children_lock);
+
+	list_del(&child->list);
+	child->parent = NULL;
+}
+
+/**
+ * gmap_dispose - remove and free a guest address space and its children
+ * @gmap: pointer to the guest address space structure
+ */
+void gmap_dispose(struct gmap *gmap)
+{
+	/* The gmap must have been removed from the parent beforehand */
+	KVM_BUG_ON(gmap->parent, gmap->kvm);
+	/* All children of this gmap must have been removed beforehand */
+	KVM_BUG_ON(!list_empty(&gmap->children), gmap->kvm);
+	/* No VSIE shadow block is allowed to use this gmap */
+	KVM_BUG_ON(!list_empty(&gmap->scb_users), gmap->kvm);
+	KVM_BUG_ON(!gmap->asce.val, gmap->kvm);
+
+	/* Flush tlb of all gmaps */
+	asce_flush_tlb(gmap->asce);
+
+	/* Free all DAT tables. */
+	dat_free_level(dereference_asce(gmap->asce), gmap->owns_page_tables);
+
+	/* Free additional data for a shadow gmap */
+	if (gmap->is_shadow)
+		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
+
+	kfree(gmap);
+}
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the ASCE is a SEGMENT type then this function will return -EINVAL,
+ * otherwise the pointers in the host_to_guest radix tree will keep pointing
+ * to the wrong pages, causing use-after-free and memory corruption.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	struct crst_table *table;
+	union asce asce;
+
+	/* Replacing segment type ASCEs would cause serious issues */
+	if (gmap->asce.dt == ASCE_TYPE_SEGMENT)
+		return -EINVAL;
+
+	table = dat_alloc_crst_sleepable(0);
+	if (!table)
+		return -ENOMEM;
+	memcpy(table, dereference_asce(gmap->asce), sizeof(*table));
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = gmap->asce;
+	asce.rsto = virt_to_pfn(table);
+	WRITE_ONCE(gmap->asce, asce);
+
+	return 0;
+}
+
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint)
+{
+	struct kvm *kvm = gmap->kvm;
+	struct kvm_vcpu *vcpu;
+	gfn_t prefix_gfn;
+	unsigned long i;
+
+	if (gmap->is_shadow)
+		return false;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/* match against both prefix pages */
+		prefix_gfn = gpa_to_gfn(kvm_s390_get_prefix(vcpu));
+		if (prefix_gfn < end && gfn <= prefix_gfn + 1) {
+			if (hint && kvm_s390_is_in_sie(vcpu))
+				return false;
+			VCPU_EVENT(vcpu, 2, "gmap notifier for %llx-%llx",
+				   gfn_to_gpa(gfn), gfn_to_gpa(end));
+			kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
+		}
+	}
+	return true;
+}
+struct clear_young_pte_priv {
+	struct gmap *gmap;
+	bool young;
+};
+
+static long gmap_clear_young_pte(union pte *ptep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *p = walk->priv;
+	union pgste pgste;
+	union pte pte, new;
+
+	pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (!pte.s.y && pte.h.i))
+		return 0;
+
+	pgste = pgste_get_lock(ptep);
+	if (!pgste.prefix_notif || gmap_mkold_prefix(p->gmap, gfn, end)) {
+		new = pte;
+		new.h.i = 1;
+		new.s.y = 0;
+		if ((new.s.d || !new.h.p) && !new.s.s)
+			folio_set_dirty(pfn_folio(pte.h.pfra));
+		new.s.d = 0;
+		new.h.p = 1;
+
+		pgste.prefix_notif = 0;
+		pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, walk->asce, p->gmap->uses_skeys);
+	}
+	p->young = 1;
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long gmap_clear_young_crste(union crste *crstep, gfn_t gfn, gfn_t end, struct dat_walk *walk)
+{
+	struct clear_young_pte_priv *priv = walk->priv;
+	union crste crste, new;
+
+	crste = READ_ONCE(*crstep);
+
+	if (!crste.h.fc)
+		return 0;
+	if (!crste.s.fc1.y && crste.h.i)
+		return 0;
+	if (!crste_prefix(crste) || gmap_mkold_prefix(priv->gmap, gfn, end)) {
+		new = crste;
+		new.h.i = 1;
+		new.s.fc1.y = 0;
+		new.s.fc1.prefix_notif = 0;
+		if (new.s.fc1.d || !new.h.p)
+			folio_set_dirty(phys_to_folio(crste_origin_large(crste)));
+		new.s.fc1.d = 0;
+		new.h.p = 1;
+		dat_crstep_xchg(crstep, new, gfn, walk->asce);
+	}
+	priv->young = 1;
+	return 0;
+}
+
+/**
+ * gmap_age_gfn() - clear the young attribute for a range of guest pages
+ * @gmap: the guest gmap
+ * @start: the first gfn to test
+ * @end: the gfn after the last one to test
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: true if any page in the given range was young, otherwise false.
+ */
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = gmap_clear_young_pte,
+		.pmd_entry = gmap_clear_young_crste,
+		.pud_entry = gmap_clear_young_crste,
+	};
+	struct clear_young_pte_priv priv = {
+		.gmap = gmap,
+		.young = false,
+	};
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+
+	return priv.young;
+}
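Note: gmap_age_gfn() is shaped to back KVM's page-aging hook; a plausible
wiring, which is an assumption of this note rather than part of the patch:

	bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
	{
		return gmap_age_gfn(kvm->arch.gmap, range->start, range->end);
	}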
+struct gmap_unmap_priv {
+	struct gmap *gmap;
+	struct kvm_memory_slot *slot;
+};
+
+static long _gmap_unmap_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *w)
+{
+	struct gmap_unmap_priv *priv = w->priv;
+	unsigned long vmaddr;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	if (ptep->s.pr && pgste.usage == PGSTE_GPS_USAGE_UNUSED) {
+		vmaddr = __gfn_to_hva_memslot(priv->slot, gfn);
+		gmap_helper_try_set_pte_unused(priv->gmap->kvm->mm, vmaddr);
+	}
+	pgste = gmap_ptep_xchg(priv->gmap, ptep, _PTE_EMPTY, pgste, gfn);
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long _gmap_unmap_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap_unmap_priv *priv = walk->priv;
+
+	if (crstep->h.fc)
+		gmap_crstep_xchg(priv->gmap, crstep, _CRSTE_EMPTY(crstep->h.tt), gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_unmap_gfn_range() - Unmap a range of guest addresses
+ * @gmap: the gmap to act on
+ * @slot: the memslot covering the range
+ * @start: the first gfn to unmap
+ * @end: the gfn after the last one to unmap
+ *
+ * Context: called with the kvm mmu write lock held
+ * Return: false
+ */
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _gmap_unmap_pte,
+		.pmd_entry = _gmap_unmap_crste,
+		.pud_entry = _gmap_unmap_crste,
+	};
+	struct gmap_unmap_priv priv = {
+		.gmap = gmap,
+		.slot = slot,
+	};
+
+	lockdep_assert_held_write(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &ops, 0, &priv);
+	return false;
+}
+
+static union pgste __pte_test_and_clear_softdirty(union pte *ptep, union pgste pgste, gfn_t gfn,
+						  struct gmap *gmap)
+{
+	union pte pte = READ_ONCE(*ptep);
+
+	if (!pte.s.pr || (pte.h.p && !pte.s.sd))
+		return pgste;
+
+	/*
+	 * If this page contains one or more prefixes of vCPUs that are currently
+	 * running, do not reset the protection, leave it marked as dirty.
+	 */
+	if (!pgste.prefix_notif || gmap_mkold_prefix(gmap, gfn, gfn + 1)) {
+		pte.h.p = 1;
+		pte.s.sd = 0;
+		pgste = gmap_ptep_xchg(gmap, ptep, pte, pgste, gfn);
+	}
+
+	mark_page_dirty(gmap->kvm, gfn);
+
+	return pgste;
+}
+
+static long _pte_test_and_clear_softdirty(union pte *ptep, gfn_t gfn, gfn_t end,
+					  struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union pgste pgste;
+
+	pgste = pgste_get_lock(ptep);
+	pgste = __pte_test_and_clear_softdirty(ptep, pgste, gfn, gmap);
+	pgste_set_unlock(ptep, pgste);
+	return 0;
+}
+
+static long _crste_test_and_clear_softdirty(union crste *table, gfn_t gfn, gfn_t end,
+					    struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, new;
+
+	if (fatal_signal_pending(current))
+		return 1;
+	crste = READ_ONCE(*table);
+	if (!crste.h.fc)
+		return 0;
+	if (crste.h.p && !crste.s.fc1.sd)
+		return 0;
+
+	/*
+	 * If this large page contains one or more prefixes of vCPUs that are
+	 * currently running, do not reset the protection, leave it marked as
+	 * dirty.
+	 */
+	if (!crste.s.fc1.prefix_notif || gmap_mkold_prefix(gmap, gfn, end)) {
+		new = crste;
+		new.h.p = 1;
+		new.s.fc1.sd = 0;
+		gmap_crstep_xchg(gmap, table, new, gfn);
+	}
+
+	for ( ; gfn < end; gfn++)
+		mark_page_dirty(gmap->kvm, gfn);
+
+	return 0;
+}
+
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end)
+{
+	const struct dat_walk_ops walk_ops = {
+		.pte_entry = _pte_test_and_clear_softdirty,
+		.pmd_entry = _crste_test_and_clear_softdirty,
+		.pud_entry = _crste_test_and_clear_softdirty,
+	};
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	_dat_walk_gfn_range(start, end, gmap->asce, &walk_ops, 0, gmap);
+}
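Note: gmap_sync_dirty_log() transfers soft-dirty state into the KVM dirty
bitmap via mark_page_dirty(). A hedged sketch of per-memslot glue; this
wrapper is assumed for illustration and is not part of this patch:

	static void sync_slot_dirty_log(struct kvm *kvm, struct kvm_memory_slot *slot)
	{
		scoped_guard(write_lock, &kvm->mmu_lock)
			gmap_sync_dirty_log(kvm->arch.gmap, slot->base_gfn,
					    slot->base_gfn + slot->npages);
	}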
+static int gmap_handle_minor_crste_fault(union asce asce, struct guest_fault *f)
+{
+	union crste newcrste, oldcrste = READ_ONCE(*f->crstep);
+
+	/* Somehow the crste is not large anymore, let the slow path deal with it */
+	if (!oldcrste.h.fc)
+		return 1;
+
+	f->pfn = PHYS_PFN(large_crste_to_phys(oldcrste, f->gfn));
+	f->writable = oldcrste.s.fc1.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldcrste.h.i && !(f->write_attempt && oldcrste.h.p))
+		return 0;
+
+	if (!f->write_attempt || oldcrste.s.fc1.w) {
+		f->write_attempt |= oldcrste.s.fc1.w && oldcrste.s.fc1.d;
+		newcrste = oldcrste;
+		newcrste.h.i = 0;
+		newcrste.s.fc1.y = 1;
+		if (f->write_attempt) {
+			newcrste.h.p = 0;
+			newcrste.s.fc1.d = 1;
+			newcrste.s.fc1.sd = 1;
+		}
+		if (!oldcrste.s.fc1.d && newcrste.s.fc1.d)
+			SetPageDirty(phys_to_page(crste_origin_large(newcrste)));
+		/* In case of races, let the slow path deal with it */
+		return !dat_crstep_xchg_atomic(f->crstep, oldcrste, newcrste, f->gfn, asce);
+	}
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	return 1;
+}
+
+static int _gmap_handle_minor_pte_fault(struct gmap *gmap, union pgste *pgste,
+					struct guest_fault *f)
+{
+	union pte newpte, oldpte = READ_ONCE(*f->ptep);
+
+	f->pfn = oldpte.h.pfra;
+	f->writable = oldpte.s.w;
+
+	/* Appropriate permissions already (race with another handler), nothing to do */
+	if (!oldpte.h.i && !(f->write_attempt && oldpte.h.p))
+		return 0;
+	/* Trying to write on a read-only page, let the slow path deal with it */
+	if (!oldpte.s.pr || (f->write_attempt && !oldpte.s.w))
+		return 1;
+
+	newpte = oldpte;
+	newpte.h.i = 0;
+	newpte.s.y = 1;
+	if (f->write_attempt) {
+		newpte.h.p = 0;
+		newpte.s.d = 1;
+		newpte.s.sd = 1;
+	}
+	if (!oldpte.s.d && newpte.s.d)
+		SetPageDirty(pfn_to_page(newpte.h.pfra));
+	*pgste = gmap_ptep_xchg(gmap, f->ptep, newpte, *pgste, f->gfn);
+
+	return 0;
+}
+
+/**
+ * gmap_try_fixup_minor() - Try to fixup a minor gmap fault.
+ * @gmap: the gmap whose fault needs to be resolved.
+ * @fault: the guest fault that needs to be resolved.
+ *
+ * A minor fault is a fault that can be resolved quickly within gmap.
+ * The page is already mapped, the fault is only due to dirty/young tracking.
+ *
+ * Return: 0 in case of success, < 0 in case of error, > 0 if the fault could
+ *         not be resolved and needs to go through the slow path.
+ */
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault)
+{
+	union pgste pgste;
+	int rc;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(NULL, fault->gfn, gmap->asce, DAT_WALK_LEAF, TABLE_TYPE_PAGE_TABLE,
+			    &fault->crstep, &fault->ptep);
+	/* If a PTE or a leaf CRSTE could not be reached, slow path */
+	if (rc)
+		return 1;
+
+	if (fault->ptep) {
+		pgste = pgste_get_lock(fault->ptep);
+		rc = _gmap_handle_minor_pte_fault(gmap, &pgste, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+		pgste_set_unlock(fault->ptep, pgste);
+	} else {
+		rc = gmap_handle_minor_crste_fault(gmap->asce, fault);
+		if (!rc && fault->callback)
+			fault->callback(fault);
+	}
+	return rc;
+}
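Note: the intended call pattern for the fast path, mirroring how a later patch
in this series ends up using it: try the minor fixup under the shared
mmu_lock, and send any non-zero result to the full fault-in path.

	scoped_guard(read_lock, &kvm->mmu_lock) {
		if (gmap_try_fixup_minor(kvm->arch.gmap, fault) == 0)
			return 0;	/* resolved on the fast path */
	}
	/* otherwise continue to the slow path (see kvm_s390_faultin_gfn()) */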
+static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
+{
+	return false;
+}
+
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
+{
+	unsigned int order;
+	int rc, level;
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	level = TABLE_TYPE_PAGE_TABLE;
+	if (f->page) {
+		order = folio_order(page_folio(f->page));
+		if (order >= get_order(_REGION3_SIZE) && gmap_2g_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_REGION3;
+		else if (order >= get_order(_SEGMENT_SIZE) && gmap_1m_allowed(gmap, f->gfn))
+			level = TABLE_TYPE_SEGMENT;
+	}
+	rc = dat_link(mc, gmap->asce, level, gmap->uses_skeys, f);
+	KVM_BUG_ON(rc == -EINVAL, gmap->kvm);
+	return rc;
+}
+
+static int gmap_ucas_map_one(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+			     gfn_t p_gfn, gfn_t c_gfn)
+{
+	struct page_table *pt;
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	rc = dat_entry_walk(mc, p_gfn, gmap->parent->asce, DAT_WALK_ALLOC, TABLE_TYPE_PAGE_TABLE,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	pt = pte_table_start(ptep);
+	dat_set_ptval(pt, PTVAL_VMADDR, p_gfn >> (_SEGMENT_SHIFT - PAGE_SHIFT));
+
+	rc = dat_entry_walk(mc, c_gfn, gmap->asce, DAT_WALK_ALLOC, TABLE_TYPE_SEGMENT,
+			    &crstep, &ptep);
+	if (rc)
+		return rc;
+	dat_crstep_xchg(crstep, _crste_fc0(virt_to_pfn(pt), TABLE_TYPE_SEGMENT), c_gfn, gmap->asce);
+	return 0;
+}
+
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count)
+{
+	struct kvm_s390_mmu_cache *mc;
+	int rc = 0;
+
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
+
+	while (count) {
+		rc = gmap_ucas_map_one(mc, gmap, p_gfn, c_gfn);
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				break;
+			continue;
+		}
+		if (rc)
+			break;
+
+		count--;
+		c_gfn += _PAGE_ENTRIES;
+		p_gfn += _PAGE_ENTRIES;
+	}
+	kvm_s390_free_mmu_cache(mc);
+	return rc;
+}
+
+static void gmap_ucas_unmap_one(struct gmap *gmap, gfn_t c_gfn)
+{
+	union crste *crstep;
+	union pte *ptep;
+	int rc;
+
+	rc = dat_entry_walk(NULL, c_gfn, gmap->asce, 0, TABLE_TYPE_SEGMENT, &crstep, &ptep);
+	if (!rc)
+		dat_crstep_xchg(crstep, _PMD_EMPTY, c_gfn, gmap->asce);
+}
+
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count)
+{
+	guard(read_lock)(&gmap->kvm->mmu_lock);
+
+	for ( ; count; count--, c_gfn += _PAGE_ENTRIES)
+		gmap_ucas_unmap_one(gmap, c_gfn);
+}
+
+static long _gmap_split_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	struct gmap *gmap = walk->priv;
+	union crste crste, newcrste;
+
+	crste = READ_ONCE(*crstep);
+	newcrste = _CRSTE_EMPTY(crste.h.tt);
+
+	while (crste_leaf(crste)) {
+		if (crste_prefix(crste))
+			gmap_unmap_prefix(gmap, gfn, next);
+		if (crste.s.fc1.vsie_notif)
+			gmap_handle_vsie_unshadow_event(gmap, gfn);
+		if (dat_crstep_xchg_atomic(crstep, crste, newcrste, gfn, walk->asce))
+			break;
+		crste = READ_ONCE(*crstep);
+	}
+
+	if (need_resched())
+		return next;
+
+	return 0;
+}
+
+void gmap_split_huge_pages(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = {
+		.pmd_entry = _gmap_split_crste,
+		.pud_entry = _gmap_split_crste,
+	};
+	gfn_t start = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, asce_end(gmap->asce), gmap->asce,
+						    &ops, DAT_WALK_IGN_HOLES, gmap);
+		cond_resched();
+	} while (start);
+}
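Note: gmap_split_huge_pages() shows the restartable-walk convention used
throughout this file: a callback returns the next gfn when it wants to yield,
and the caller resumes the walk from that position. Condensed to its skeleton
(illustrative; assumes ops whose callbacks follow this convention):

	gfn_t start = 0;

	do {
		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
			start = _dat_walk_gfn_range(start, asce_end(gmap->asce),
						    gmap->asce, &ops,
						    DAT_WALK_IGN_HOLES, gmap);
		cond_resched();	/* the next iteration resumes where the walk stopped */
	} while (start);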
+static int _gmap_enable_skeys(struct gmap *gmap)
+{
+	gfn_t start = 0;
+	int rc;
+
+	if (mm_uses_skeys(gmap->kvm->mm))
+		return 0;
+
+	gmap->kvm->mm->context.uses_skeys = 1;
+	rc = gmap_helper_disable_cow_sharing();
+	if (rc) {
+		gmap->kvm->mm->context.uses_skeys = 0;
+		return rc;
+	}
+
+	do {
+		scoped_guard(write_lock, &gmap->kvm->mmu_lock)
+			start = dat_reset_skeys(gmap->asce, start);
+		cond_resched();
+	} while (start);
+	return 0;
+}
+
+int gmap_enable_skeys(struct gmap *gmap)
+{
+	int rc;
+
+	mmap_write_lock(gmap->kvm->mm);
+	rc = _gmap_enable_skeys(gmap);
+	mmap_write_unlock(gmap->kvm->mm);
+	return rc;
+}
+
+static long _destroy_pages_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	if (!ptep->s.pr)
+		return 0;
+	__kvm_s390_pv_destroy_page(phys_to_page(pte_origin(*ptep)));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+static long _destroy_pages_crste(union crste *crstep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	phys_addr_t origin, cur, end;
+
+	if (!crstep->h.fc || !crstep->s.fc1.pr)
+		return 0;
+
+	origin = crste_origin_large(*crstep);
+	cur = ((max(gfn, walk->start) - gfn) << PAGE_SHIFT) + origin;
+	end = ((min(next, walk->end) - gfn) << PAGE_SHIFT) + origin;
+	for ( ; cur < end; cur += PAGE_SIZE)
+		__kvm_s390_pv_destroy_page(phys_to_page(cur));
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible)
+{
+	const struct dat_walk_ops ops = {
+		.pte_entry = _destroy_pages_pte,
+		.pmd_entry = _destroy_pages_crste,
+		.pud_entry = _destroy_pages_crste,
+	};
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			start = _dat_walk_gfn_range(start, end, gmap->asce, &ops,
+						    DAT_WALK_IGN_HOLES, NULL);
+		if (interruptible && fatal_signal_pending(current))
+			return -EINTR;
+		cond_resched();
+	} while (start && start < end);
+	return 0;
+}
+
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level)
+{
+	struct vsie_rmap *temp, *rmap;
+	void __rcu **slot;
+	int rc;
+
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	lockdep_assert_held(&sg->host_to_rmap_lock);
+
+	rc = -ENOMEM;
+	rmap = kzalloc(sizeof(*rmap), GFP_ATOMIC);
+	if (!rmap)
+		goto out;
+
+	rc = 0;
+	rmap->r_gfn = r_gfn;
+	rmap->level = level;
+	slot = radix_tree_lookup_slot(&sg->host_to_rmap, p_gfn);
+	if (slot) {
+		rmap->next = radix_tree_deref_slot_protected(slot, &sg->host_to_rmap_lock);
+		for (temp = rmap->next; temp; temp = temp->next) {
+			if (temp->val == rmap->val)
+				goto out;
+		}
+		radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap);
+	} else {
+		rmap->next = NULL;
+		rc = radix_tree_insert(&sg->host_to_rmap, p_gfn, rmap);
+		if (rc)
+			goto out;
+	}
+	rmap = NULL;
+out:
+	kfree(rmap);
+	return rc;
+}
+
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr)
+{
+	union crste *crstep;
+	union pgste pgste;
+	union pte *ptep;
+	union pte pte;
+	int flags, rc;
+
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	flags = DAT_WALK_SPLIT_ALLOC | (sg->parent->uses_skeys ? DAT_WALK_USES_SKEYS : 0);
+	rc = dat_entry_walk(mc, p_gfn, sg->parent->asce, flags,
+			    TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+	if (rc)
+		return rc;
+	if (level <= TABLE_TYPE_REGION1) {
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			rc = gmap_insert_rmap(sg, p_gfn, r_gfn, level);
+	}
+	if (rc)
+		return rc;
+
+	pgste = pgste_get_lock(ptep);
+	pte = ptep->s.pr ? *ptep : _pte(pfn, wr, false, false);
+	pte.h.p = 1;
+	if (pgste.vsie_notif) {
+		_gmap_handle_vsie_unshadow_event(sg->parent, p_gfn);
+		pgste.vsie_notif = 0;
+	}
+	pgste = gmap_ptep_xchg(sg->parent, ptep, pte, pgste, p_gfn);
+	pgste.vsie_notif = 1;
+	pgste_set_unlock(ptep, pgste);
+
+	return 0;
+}
+
+static long __set_cmma_dirty_pte(union pte *ptep, gfn_t gfn, gfn_t next, struct dat_walk *walk)
+{
+	__atomic64_or(PGSTE_CMMA_D_BIT, &pgste_of(ptep)->val);
+	if (need_resched())
+		return next;
+	return 0;
+}
+
+void gmap_set_cmma_all_dirty(struct gmap *gmap)
+{
+	const struct dat_walk_ops ops = { .pte_entry = __set_cmma_dirty_pte, };
+	gfn_t gfn = 0;
+
+	do {
+		scoped_guard(read_lock, &gmap->kvm->mmu_lock)
+			gfn = _dat_walk_gfn_range(gfn, asce_end(gmap->asce), gmap->asce, &ops,
+						  DAT_WALK_IGN_HOLES, NULL);
+		cond_resched();
+	} while (gfn);
+}
+static void gmap_unshadow_level(struct gmap *sg, gfn_t r_gfn, int level)
+{
+	unsigned long align = PAGE_SIZE;
+	gpa_t gaddr = gfn_to_gpa(r_gfn);
+	union crste *crstep;
+	union crste crste;
+	union pte *ptep;
+
+	if (level > TABLE_TYPE_PAGE_TABLE)
+		align = 1UL << (11 * level + _SEGMENT_SHIFT);
+	kvm_s390_vsie_gmap_notifier(sg, ALIGN_DOWN(gaddr, align), ALIGN(gaddr + 1, align));
+	if (dat_entry_walk(NULL, r_gfn, sg->asce, 0, level, &crstep, &ptep))
+		return;
+	if (ptep) {
+		dat_ptep_xchg(ptep, _PTE_EMPTY, r_gfn, sg->asce, sg->uses_skeys);
+		return;
+	}
+	crste = READ_ONCE(*crstep);
+	dat_crstep_clear(crstep, r_gfn, sg->asce);
+	if (is_pmd(crste))
+		dat_free_pt(dereference_pmd(crste.pmd));
+	else
+		dat_free_level(dereference_crste(crste), true);
+}
+
+static void gmap_unshadow(struct gmap *sg)
+{
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+	KVM_BUG_ON(!sg->parent, sg->kvm);
+	KVM_BUG_ON(sg->removed, sg->kvm);
+
+	lockdep_assert_held(&sg->parent->children_lock);
+
+	sg->removed = 1;
+	kvm_s390_vsie_gmap_notifier(sg, 0, -1UL);
+
+	if (list_empty(&sg->scb_users)) {
+		gmap_remove_child(sg);
+		gmap_dispose(sg);
+	}
+}
+
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	struct vsie_rmap *rmap, *rnext, *head;
+	struct gmap *sg, *next;
+	gfn_t start, end;
+
+	list_for_each_entry_safe(sg, next, &parent->children, list) {
+		start = sg->guest_asce.rsto;
+		end = start + sg->guest_asce.tl + 1;
+		if (!sg->guest_asce.r && gfn >= start && gfn < end) {
+			gmap_unshadow(sg);
+			continue;
+		}
+		scoped_guard(spinlock, &sg->host_to_rmap_lock)
+			head = radix_tree_delete(&sg->host_to_rmap, gfn);
+		gmap_for_each_rmap_safe(rmap, rnext, head)
+			gmap_unshadow_level(sg, rmap->r_gfn, rmap->level);
+	}
+}
+
+/**
+ * gmap_find_shadow - find a specific asce in the list of shadow tables
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns the pointer to a gmap if a shadow table with the given asce is
+ * already available, ERR_PTR(-EAGAIN) if another one is just being created,
+ * otherwise NULL
+ *
+ * Context: Called with parent->children_lock held
+ */
+static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int edat_level)
+{
+	struct gmap *sg;
+
+	lockdep_assert_held(&parent->children_lock);
+	list_for_each_entry(sg, &parent->children, list) {
+		if (!gmap_is_shadow_valid(sg, asce, edat_level))
+			continue;
+		if (!sg->initialized)
+			return ERR_PTR(-EAGAIN);
+		return sg;
+	}
+	return NULL;
+}
+
+static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
+{
+	KVM_BUG_ON(1, sg->kvm);
+	return -EINVAL;
+}
+
+/**
+ * gmap_create_shadow() - create/find a shadow guest address space
+ * @mc: the mmu cache to allocate new DAT tables from
+ * @parent: pointer to the parent gmap
+ * @asce: ASCE for which the shadow table is created
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * The pages of the top level page table referred by the asce parameter
+ * will be set to read-only and marked in the PGSTEs of the kvm process.
+ * The shadow table will be removed automatically on any change to the
+ * PTE mapping for the source table.
+ *
+ * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
+ * parent gmap table could not be protected.
+ */
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *parent,
+				union asce asce, int edat_level)
+{
+	struct gmap *sg, *new;
+	int rc;
+
+	scoped_guard(spinlock, &parent->children_lock)
+		sg = gmap_find_shadow(parent, asce, edat_level);
+	if (sg)
+		return sg;
+	/* Create a new shadow gmap */
+	new = gmap_new(parent->kvm, asce.r ? 1UL << (64 - PAGE_SHIFT) : asce_end(asce));
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+	new->guest_asce = asce;
+	new->edat_level = edat_level;
+	new->initialized = false;
+	new->is_shadow = true;
+	new->parent = parent;
+
+	scoped_guard(spinlock, &parent->children_lock) {
+		/* Recheck if another CPU created the same shadow */
+		sg = gmap_find_shadow(parent, asce, edat_level);
+		if (sg) {
+			gmap_dispose(new);
+			return sg;
+		}
+		if (asce.r) {
+			/* only allow one real-space gmap shadow */
+			list_for_each_entry(sg, &parent->children, list) {
+				if (sg->guest_asce.r) {
+					scoped_guard(write_lock, &parent->kvm->mmu_lock)
+						gmap_unshadow(sg);
+					break;
+				}
+			}
+			new->initialized = true;
+			gmap_add_child(parent, new);
+			/* nothing to protect, return right away */
+			return new;
+		}
+	}
+
+	/* protect after insertion, so it will get properly invalidated */
+	rc = gmap_protect_asce_top_level(mc, new);
+	if (rc) {
+		gmap_dispose(new);
+		return ERR_PTR(rc);
+	}
+	return new;
+}
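Note: per the kernel-doc above, ERR_PTR(-EAGAIN) means another CPU is still
initializing the same shadow, so callers are expected to retry. A hedged
sketch; the retry loop is an assumption of this note, the real consumer is
the VSIE code:

	struct gmap *sg;

	do {
		sg = gmap_create_shadow(mc, parent, asce, edat_level);
		if (sg == ERR_PTR(-EAGAIN))
			cond_resched();
	} while (sg == ERR_PTR(-EAGAIN));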
diff --git a/arch/s390/kvm/gmap.h b/arch/s390/kvm/gmap.h
new file mode 100644
index 000000000000..dcfd8d213321
--- /dev/null
+++ b/arch/s390/kvm/gmap.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest address space mapping code
+ *
+ * Copyright IBM Corp. 2007, 2016, 2025
+ * Author(s): Martin Schwidefsky
+ *	      Claudio Imbrenda
+ */
+
+#ifndef ARCH_KVM_S390_GMAP_H
+#define ARCH_KVM_S390_GMAP_H
+
+#include "dat.h"
+
+/**
+ * struct gmap - guest address space
+ * @is_shadow: whether this gmap is a vsie shadow gmap
+ * @owns_page_tables: whether this gmap owns all dat levels; normally 1, is 0
+ *                    only for ucontrol per-cpu gmaps, since they share the page
+ *                    tables with the main gmap.
+ * @is_ucontrol: whether this gmap is ucontrol (main gmap or per-cpu gmap)
+ * @allow_hpage_1m: whether 1M hugepages are allowed for this gmap,
+ *                  independently of whatever page size is used by userspace
+ * @allow_hpage_2g: whether 2G hugepages are allowed for this gmap,
+ *                  independently of whatever page size is used by userspace
+ * @pfault_enabled: whether pfault is enabled for this gmap
+ * @removed: whether this shadow gmap is about to be disposed of
+ * @initialized: flag to indicate if a shadow guest address space can be used
+ * @uses_skeys: indicates if the guest uses storage keys
+ * @uses_cmm: indicates if the guest uses cmm
+ * @edat_level: the edat level of this shadow gmap
+ * @kvm: the vm
+ * @asce: the ASCE used by this gmap
+ * @list: list head used to link this gmap into its parent's children list
+ * @children_lock: protects children and scb_users
+ * @children: list of child gmaps of this gmap
+ * @scb_users: list of vsie_scb that use this shadow gmap
+ * @parent: parent gmap of a child gmap
+ * @guest_asce: original ASCE of this shadow gmap
+ * @host_to_rmap_lock: protects host_to_rmap
+ * @host_to_rmap: radix tree mapping host addresses to guest addresses
+ */
+struct gmap {
+	unsigned char is_shadow:1;
+	unsigned char owns_page_tables:1;
+	unsigned char is_ucontrol:1;
+	bool allow_hpage_1m;
+	bool allow_hpage_2g;
+	bool pfault_enabled;
+	bool removed;
+	bool initialized;
+	bool uses_skeys;
+	bool uses_cmm;
+	unsigned char edat_level;
+	struct kvm *kvm;
+	union asce asce;
+	struct list_head list;
+	spinlock_t children_lock;	/* protects: children, scb_users */
+	struct list_head children;
+	struct list_head scb_users;
+	struct gmap *parent;
+	union asce guest_asce;
+	spinlock_t host_to_rmap_lock;	/* protects host_to_rmap */
+	struct radix_tree_root host_to_rmap;
+};
+#define gmap_for_each_rmap_safe(pos, n, head) \
+	for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
+
+int s390_replace_asce(struct gmap *gmap);
+bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint);
+bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end);
+bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
+int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault);
+struct gmap *gmap_new(struct kvm *kvm, gfn_t limit);
+struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit);
+void gmap_remove_child(struct gmap *child);
+void gmap_dispose(struct gmap *gmap);
+int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *fault);
+void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end);
+int gmap_set_limit(struct gmap *gmap, gfn_t limit);
+int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count);
+void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count);
+int gmap_enable_skeys(struct gmap *gmap);
+int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible);
+int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level);
+int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
+		      kvm_pfn_t pfn, int level, bool wr);
+void gmap_set_cmma_all_dirty(struct gmap *gmap);
+void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn);
+struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
+				union asce asce, int edat_level);
+void gmap_split_huge_pages(struct gmap *gmap);
+
+static inline void gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
+{
+	scoped_guard(spinlock, &parent->children_lock)
+		_gmap_handle_vsie_unshadow_event(parent, gfn);
+}
+
+static inline bool gmap_mkold_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, true);
+}
+
+static inline bool gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
+{
+	return _gmap_unmap_prefix(gmap, gfn, end, false);
+}
+
+static inline union pgste gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
+					 union pgste pgste, gfn_t gfn)
+{
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	if (pgste.prefix_notif && (newpte.h.p || newpte.h.i)) {
+		pgste.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + 1);
+	}
+	if (pgste.vsie_notif && (ptep->h.p != newpte.h.p || newpte.h.i)) {
+		pgste.vsie_notif = 0;
+		gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	return __dat_ptep_xchg(ptep, pgste, newpte, gfn, gmap->asce, gmap->uses_skeys);
+}
+static inline void gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
+				    gfn_t gfn)
+{
+	unsigned long align = 1UL << (8 + (is_pmd(*crstep) ? 0 : 11));
+
+	lockdep_assert_held(&gmap->kvm->mmu_lock);
+
+	gfn = ALIGN_DOWN(gfn, align);
+	if (crste_prefix(*crstep) && (ne.h.p || ne.h.i || !crste_prefix(ne))) {
+		ne.s.fc1.prefix_notif = 0;
+		gmap_unmap_prefix(gmap, gfn, gfn + align);
+	}
+	if (crste_leaf(*crstep) && crstep->s.fc1.vsie_notif &&
+	    (ne.h.p || ne.h.i || !ne.s.fc1.vsie_notif)) {
+		ne.s.fc1.vsie_notif = 0;
+		gmap_handle_vsie_unshadow_event(gmap, gfn);
+	}
+	dat_crstep_xchg(crstep, ne, gfn, gmap->asce);
+}
+
+/**
+ * gmap_is_shadow_valid() - check if a shadow guest address space matches the
+ *			    given properties and is still valid
+ * @sg: pointer to the shadow guest address space structure
+ * @asce: ASCE for which the shadow table is requested
+ * @edat_level: edat level to be used for the shadow translation
+ *
+ * Returns true if the gmap shadow is still valid and matches the given
+ * properties, the caller can continue using it. Returns false otherwise; the
+ * caller has to request a new shadow gmap in this case.
+ */
+static inline bool gmap_is_shadow_valid(struct gmap *sg, union asce asce, int edat_level)
+{
+	if (sg->removed)
+		return false;
+	return sg->guest_asce.val == asce.val && sg->edat_level == edat_level;
+}
+
+#endif /* ARCH_KVM_S390_GMAP_H */
-- 
2.51.1
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
    nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
    schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
    agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
    gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 15/23] KVM: s390: Add helper functions for fault handling
Date: Thu, 20 Nov 2025 18:15:36 +0100
Message-ID: <20251120171544.96841-16-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Add some helper functions for handling multiple guest faults at the same
time. This will be needed for VSIE, where resolving a nested guest access
also requires access to all the page tables that map it.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/include/asm/kvm_host.h |   1 +
 arch/s390/kvm/Makefile           |   2 +-
 arch/s390/kvm/faultin.c          | 148 +++++++++++++++++++++++++++++++
 arch/s390/kvm/faultin.h          |  92 +++++++++++++++++++
 arch/s390/kvm/kvm-s390.c         |   2 +-
 arch/s390/kvm/kvm-s390.h         |   2 +
 6 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 arch/s390/kvm/faultin.c
 create mode 100644 arch/s390/kvm/faultin.h

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index f5f87dae0dd9..958a3b8c32d1 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -441,6 +441,7 @@ struct kvm_vcpu_arch {
 	bool acrs_loaded;
 	struct kvm_s390_pv_vcpu pv;
 	union diag318_info diag318_info;
+	void *mc; /* Placeholder */
 };
 
 struct kvm_vm_stat {
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 21088265402c..1e2dcd3e2436 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,7 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
-kvm-y += dat.o gmap.o
+kvm-y += dat.o gmap.o faultin.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/faultin.c b/arch/s390/kvm/faultin.c
new file mode 100644
index 000000000000..9795ed429097
--- /dev/null
+++ b/arch/s390/kvm/faultin.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+#include 
+#include 
+
+#include "gmap.h"
+#include "trace.h"
+#include "faultin.h"
+
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
+/*
+ * kvm_s390_faultin_gfn() - handle a dat fault.
+ * @vcpu: the vCPU whose gmap is to be fixed up, or NULL if operating on the VM.
+ * @kvm: the VM whose gmap is to be fixed up, or NULL if operating on a vCPU.
+ * @f: the guest fault that needs to be resolved.
+ *
+ * Return:
+ * * 0 on success
+ * * < 0 in case of error
+ * * > 0 in case of guest exceptions
+ *
+ * Context:
+ * * The mm lock must not be held when calling
+ * * kvm->srcu must be held
+ * * may sleep
+ */
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f)
+{
+	struct kvm_s390_mmu_cache *local_mc __free(kvm_s390_mmu_cache) = NULL;
+	struct kvm_s390_mmu_cache *mc = NULL;
+	struct kvm_memory_slot *slot;
+	unsigned long inv_seq;
+	int foll, rc = 0;
+
+	foll = f->write_attempt ? FOLL_WRITE : 0;
+	foll |= f->attempt_pfault ? FOLL_NOWAIT : 0;
+
+	if (vcpu) {
+		kvm = vcpu->kvm;
+		mc = vcpu->arch.mc;
+	}
+
+	lockdep_assert_held(&kvm->srcu);
+
+	scoped_guard(read_lock, &kvm->mmu_lock) {
+		if (gmap_try_fixup_minor(kvm->arch.gmap, f) == 0)
+			return 0;
+	}
+
+	while (1) {
+		f->valid = false;
+		inv_seq = kvm->mmu_invalidate_seq;
+		/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+		smp_rmb();
+
+		if (vcpu)
+			slot = kvm_vcpu_gfn_to_memslot(vcpu, f->gfn);
+		else
+			slot = gfn_to_memslot(kvm, f->gfn);
+		f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+
+		/* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) */
+		if (f->pfn == KVM_PFN_ERR_NEEDS_IO) {
+			if (unlikely(!f->attempt_pfault))
+				return -EAGAIN;
+			if (unlikely(!vcpu))
+				return -EINVAL;
+			trace_kvm_s390_major_guest_pfault(vcpu);
+			if (kvm_arch_setup_async_pf(vcpu))
+				return 0;
+			vcpu->stat.pfault_sync++;
+			/* Could not setup async pfault, try again synchronously */
+			foll &= ~FOLL_NOWAIT;
+			f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
+		}
+
+		/* Access outside memory, addressing exception */
+		if (is_noslot_pfn(f->pfn))
+			return PGM_ADDRESSING;
+		/* Signal pending: try again */
+		if (f->pfn == KVM_PFN_ERR_SIGPENDING)
+			return -EAGAIN;
+		/* Check if it's read-only memory; don't try to actually handle that case. */
+		if (f->pfn == KVM_PFN_ERR_RO_FAULT)
+			return -EOPNOTSUPP;
+		/* Any other error */
+		if (is_error_pfn(f->pfn))
+			return -EFAULT;
+
+		if (!mc) {
+			local_mc = kvm_s390_new_mmu_cache();
+			if (!local_mc)
+				return -ENOMEM;
+			mc = local_mc;
+		}
+
+		/* Loop, will automatically release the faulted page */
+		if (mmu_invalidate_retry_gfn_unsafe(kvm, inv_seq, f->gfn)) {
+			kvm_release_faultin_page(kvm, f->page, true, false);
+			continue;
+		}
+
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			if (!mmu_invalidate_retry_gfn(kvm, inv_seq, f->gfn)) {
+				f->valid = true;
+				rc = gmap_link(mc, kvm->arch.gmap, f);
+				kvm_release_faultin_page(kvm, f->page, !!rc, f->write_attempt);
+				f->page = NULL;
+			}
+		}
+		kvm_release_faultin_page(kvm, f->page, true, false);
+
+		if (rc == -ENOMEM) {
+			rc = kvm_s390_mmu_cache_topup(mc);
+			if (rc)
+				return rc;
+		} else if (rc != -EAGAIN) {
+			return rc;
+		}
+	}
+}
+
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+	int foll = w ? FOLL_WRITE : 0;
+
+	f->write_attempt = w;
+	f->gfn = gfn;
+	f->pfn = __kvm_faultin_pfn(slot, gfn, foll, &f->writable, &f->page);
+	if (is_noslot_pfn(f->pfn))
+		return PGM_ADDRESSING;
+	if (is_sigpending_pfn(f->pfn))
+		return -EINTR;
+	if (f->pfn == KVM_PFN_ERR_NEEDS_IO)
+		return -EAGAIN;
+	if (is_error_pfn(f->pfn))
+		return -EFAULT;
+
+	f->valid = true;
+	return 0;
+}
diff --git a/arch/s390/kvm/faultin.h b/arch/s390/kvm/faultin.h
new file mode 100644
index 000000000000..f86176d2769c
--- /dev/null
+++ b/arch/s390/kvm/faultin.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM guest fault handling.
+ *
+ * Copyright IBM Corp. 2025
+ * Author(s): Claudio Imbrenda
+ */
+
+#ifndef __KVM_S390_FAULTIN_H
+#define __KVM_S390_FAULTIN_H
+
+#include 
+
+#include "dat.h"
+
+int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f);
+int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w);
+
+static inline int kvm_s390_faultin_gfn_simple(struct kvm_vcpu *vcpu, struct kvm *kvm,
+					      gfn_t gfn, bool wr)
+{
+	struct guest_fault f = { .gfn = gfn, .write_attempt = wr, };
+
+	return kvm_s390_faultin_gfn(vcpu, kvm, &f);
+}
+
+static inline int kvm_s390_get_guest_page_and_read_gpa(struct kvm *kvm, struct guest_fault *f,
+							gpa_t gaddr, unsigned long *val)
+{
+	int rc;
+
+	rc = kvm_s390_get_guest_page(kvm, f, gpa_to_gfn(gaddr), false);
+	if (rc)
+		return rc;
+
+	*val = *(unsigned long *)phys_to_virt(pfn_to_phys(f->pfn) | offset_in_page(gaddr));
+
+	return 0;
+}
+
+static inline void kvm_s390_release_multiple(struct kvm *kvm, struct guest_fault *guest_faults,
+					     int n, bool ignore)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		kvm_release_faultin_page(kvm, guest_faults[i].page, ignore,
+					 guest_faults[i].write_attempt);
+		guest_faults[i].page = NULL;
+	}
+}
+
+static inline bool kvm_s390_multiple_faults_need_retry(struct kvm *kvm, unsigned long seq,
+						       struct guest_fault *guest_faults, int n,
+						       bool unsafe)
+{
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (!guest_faults[i].valid)
+			continue;
+		if (unsafe && mmu_invalidate_retry_gfn_unsafe(kvm, seq, guest_faults[i].gfn))
+			return true;
+		if (!unsafe && mmu_invalidate_retry_gfn(kvm, seq, guest_faults[i].gfn))
+			return true;
+	}
+	return false;
+}
+
+static inline int kvm_s390_get_guest_pages(struct kvm *kvm, struct guest_fault *guest_faults,
+					   gfn_t start, int n_pages, bool write_attempt)
+{
+	int i, rc = 0;
+
+	for (i = 0; i < n_pages; i++) {
+		rc = kvm_s390_get_guest_page(kvm, guest_faults + i, start + i, write_attempt);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+#define kvm_s390_release_faultin_array(kvm, array, ignore) \
+	kvm_s390_release_multiple(kvm, array, ARRAY_SIZE(array), ignore)
+
+#define kvm_s390_array_needs_retry_unsafe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), true)
+
+#define kvm_s390_array_needs_retry_safe(kvm, seq, array) \
+	kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), false)
+
+#endif /* __KVM_S390_FAULTIN_H */
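Note: taken together, these helpers suggest the following multi-page pattern,
under the same invalidate-sequence discipline as kvm_s390_faultin_gfn(). The
function and its use of the pfns are assumptions for illustration, not code
from this series:

	static int with_two_guest_pages(struct kvm *kvm, gfn_t start)
	{
		struct guest_fault faults[2] = {};
		unsigned long seq;
		int rc;

		seq = kvm->mmu_invalidate_seq;
		smp_rmb();	/* pairs with kvm_mmu_invalidate_end() */
		rc = kvm_s390_get_guest_pages(kvm, faults, start, ARRAY_SIZE(faults), true);
		if (rc) {
			kvm_s390_release_faultin_array(kvm, faults, true);
			return rc;
		}
		scoped_guard(read_lock, &kvm->mmu_lock) {
			if (kvm_s390_array_needs_retry_safe(kvm, seq, faults)) {
				kvm_s390_release_faultin_array(kvm, faults, true);
				return -EAGAIN;
			}
			/* ... access the pages via faults[i].pfn ... */
			kvm_s390_release_faultin_array(kvm, faults, false);
		}
		return 0;
	}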
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2e34f993e3c5..d7eff75a53d0 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4747,7 +4747,7 @@ bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu)
 {
 	hva_t hva;
 	struct kvm_arch_async_pf arch;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index c44fe0c3a097..f89f9f698df5 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -470,6 +470,8 @@ static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t gaddr,
 	return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags);
 }
 
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
+
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
 
-- 
2.51.1
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 16/23] KVM: s390: Add some helper functions needed for vSIE
Date: Thu, 20 Nov 2025 18:15:37 +0100
Message-ID: <20251120171544.96841-17-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Implement gmap_protect_asce_top_level(), which until now had been a
stub because of cross-dependencies with other patches.
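[The implementation below makes heavy use of the kernel's scope-based
cleanup guards, guard() and scoped_guard() from <linux/cleanup.h>, which
release a lock automatically on every exit path. For readers unfamiliar
with them, a toy user-space equivalent of the underlying mechanism, the
compiler's cleanup attribute; GUARD_MUTEX is a made-up name, not a
kernel macro:

  #include <pthread.h>

  static void mutex_unlock_cleanup(pthread_mutex_t **m)
  {
  	pthread_mutex_unlock(*m);
  }

  /* guard-style: lock now, auto-unlock when the variable leaves scope */
  #define GUARD_MUTEX(name, lock) \
  	pthread_mutex_t *name __attribute__((cleanup(mutex_unlock_cleanup))) = \
  		(pthread_mutex_lock(lock), (lock))

  static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
  static int counter;

  static int increment(void)
  {
  	GUARD_MUTEX(g, &big_lock);	/* unlocked on every return path */
  	if (counter < 0)
  		return -1;		/* early return still unlocks */
  	return ++counter;
  }

The real kernel macros additionally dispatch on a guard class, so
guard(write_lock)(...) and scoped_guard(spinlock, ...) pick the right
lock/unlock pair at compile time.]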
Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/gmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 29ce8df697dd..cbb777e940d1 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -22,6 +22,7 @@
 #include "dat.h"
 #include "gmap.h"
 #include "kvm-s390.h"
+#include "faultin.h"
 
 static inline bool kvm_s390_is_in_sie(struct kvm_vcpu *vcpu)
 {
@@ -988,10 +989,78 @@ static struct gmap *gmap_find_shadow(struct gmap *parent, union asce asce, int e
 	return NULL;
 }
 
+#define CRST_TABLE_PAGES (_CRST_TABLE_SIZE / PAGE_SIZE)
+struct gmap_protect_asce_top_level {
+	unsigned long seq;
+	struct guest_fault f[CRST_TABLE_PAGES];
+};
+
+static inline int __gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+						struct gmap_protect_asce_top_level *context)
+{
+	int rc, i;
+
+	guard(write_lock)(&sg->kvm->mmu_lock);
+
+	if (kvm_s390_array_needs_retry_safe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+
+	scoped_guard(spinlock, &sg->parent->children_lock) {
+		for (i = 0; i < CRST_TABLE_PAGES; i++) {
+			rc = gmap_protect_rmap(mc, sg, context->f[i].gfn, 0, context->f[i].pfn,
+					       TABLE_TYPE_REGION1 + 1, context->f[i].writable);
+			if (rc)
+				return rc;
+		}
+		sg->initialized = true;
+		gmap_add_child(sg->parent, sg);
+	}
+
+	kvm_s390_release_faultin_array(sg->kvm, context->f, false);
+	return 0;
+}
+
+static inline int _gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+					       struct gmap_protect_asce_top_level *context)
+{
+	int rc;
+
+	if (kvm_s390_array_needs_retry_unsafe(sg->kvm, context->seq, context->f))
+		return -EAGAIN;
+	do {
+		rc = kvm_s390_mmu_cache_topup(mc);
+		if (rc)
+			return rc;
+		rc = radix_tree_preload(GFP_KERNEL);
+		if (rc)
+			return rc;
+		rc = __gmap_protect_asce_top_level(mc, sg, context);
+		radix_tree_preload_end();
+	} while (rc == -ENOMEM);
+
+	return rc;
+}
+
 static int gmap_protect_asce_top_level(struct kvm_s390_mmu_cache *mc, struct gmap *sg)
 {
-	KVM_BUG_ON(1, sg->kvm);
-	return -EINVAL;
+	struct gmap_protect_asce_top_level context = {};
+	union asce asce = sg->guest_asce;
+	int rc;
+
+	KVM_BUG_ON(!sg->is_shadow, sg->kvm);
+
+	context.seq = sg->kvm->mmu_invalidate_seq;
+	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
+
+	rc = kvm_s390_get_guest_pages(sg->kvm, context.f, asce.rsto, asce.dt + 1, false);
+	if (rc > 0)
+		rc = -EFAULT;
+	if (!rc)
+		rc = _gmap_protect_asce_top_level(mc, sg, &context);
+	if (rc)
+		kvm_s390_release_faultin_array(sg->kvm, context.f, true);
+	return rc;
 }
 
 /**
-- 
2.51.1
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 17/23] KVM: s390: Stop using CONFIG_PGSTE
Date: Thu, 20 Nov 2025 18:15:38 +0100
Message-ID: <20251120171544.96841-18-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Switch to using IS_ENABLED(CONFIG_KVM) instead of CONFIG_PGSTE, since
the latter will be removed soon. Many uses of CONFIG_PGSTE are left in
place, because they will be removed completely by upcoming patches; the
ones replaced here are mostly the ones that will stay.
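[Unlike a plain #ifdef, IS_ENABLED() is true for both built-in (=y) and
modular (=m) options, which matters here because KVM is a tristate: with
KVM=m only CONFIG_KVM_MODULE is defined, so #ifdef CONFIG_KVM would
silently compile the code out. As a reminder of how the macro works, a
minimal self-contained re-derivation of the trick from
include/linux/kconfig.h (CONFIG_KVM defined locally just for the demo;
the kernel composes the two halves with __or() rather than ||):

  #define __ARG_PLACEHOLDER_1 0,
  #define __take_second_arg(__ignored, val, ...) val
  #define __is_defined(x)       ___is_defined(x)
  #define ___is_defined(val)    ____is_defined(__ARG_PLACEHOLDER_##val)
  #define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)

  #define IS_BUILTIN(option) __is_defined(option)
  #define IS_MODULE(option)  __is_defined(option##_MODULE)
  #define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

  #define CONFIG_KVM 1			/* pretend KVM=y for this demo */

  #if IS_ENABLED(CONFIG_KVM)		/* also usable in C expressions */
  int kvm_enabled = 1;
  #endif

Config symbols are defined to 1 when set; the placeholder macro turns
"defined to 1" into a 0/1 expression even for undefined symbols.]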
Signed-off-by: Claudio Imbrenda
Reviewed-by: Steffen Eiden
---
 arch/s390/include/asm/mmu_context.h | 2 +-
 arch/s390/include/asm/pgtable.h     | 4 ++--
 arch/s390/mm/fault.c                | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d9b8501bc93d..48e548c01daa 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -29,7 +29,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	atomic_set(&mm->context.protected_count, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	mm->context.has_pgste = 0;
 	mm->context.uses_skeys = 0;
 	mm->context.uses_cmm = 0;
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3ddc62fcf6dd..7ccad785e4fe 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -577,7 +577,7 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 
 static inline int mm_is_protected(struct mm_struct *mm)
 {
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	if (unlikely(atomic_read(&mm->context.protected_count)))
 		return 1;
 #endif
@@ -632,7 +632,7 @@ static inline pud_t set_pud_bit(pud_t pud, pgprot_t prot)
 #define mm_forbids_zeropage mm_forbids_zeropage
 static inline int mm_forbids_zeropage(struct mm_struct *mm)
 {
-#ifdef CONFIG_PGSTE
+#if IS_ENABLED(CONFIG_KVM)
 	if (!mm->context.allow_cow_sharing)
 		return 1;
 #endif
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e1ad05bfd28a..683afd496190 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -398,7 +398,7 @@ void do_dat_exception(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_dat_exception);
 
-#if IS_ENABLED(CONFIG_PGSTE)
+#if IS_ENABLED(CONFIG_KVM)
 
 void do_secure_storage_access(struct pt_regs *regs)
 {
@@ -465,4 +465,4 @@ void do_secure_storage_access(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_secure_storage_access);
 
-#endif /* CONFIG_PGSTE */
+#endif /* CONFIG_KVM */
-- 
2.51.1
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 18/23] KVM: s390: Storage key functions refactoring
Date: Thu, 20 Nov 2025 18:15:39 +0100
Message-ID: <20251120171544.96841-19-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Refactor some storage key functions to improve readability. Introduce
helper functions that will be used in the next patches.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/gaccess.c  | 36 +++++++++---------
 arch/s390/kvm/gaccess.h  |  4 +-
 arch/s390/kvm/kvm-s390.c | 80 +++++++++++++++-------------------
 arch/s390/kvm/kvm-s390.h |  8 ++++
 4 files changed, 58 insertions(+), 70 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 05fd3ee4b20d..a054de80a5cc 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -989,9 +989,8 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  *	   * -EAGAIN: transient failure (len 1 or 2)
  *	   * -EOPNOTSUPP: read-only memslot (should never occur)
  */
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
-			       __uint128_t *old_addr, __uint128_t new,
-			       u8 access_key, bool *success)
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+			       union kvm_s390_quad new, u8 acc, bool *success)
 {
 	gfn_t gfn = gpa_to_gfn(gpa);
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
@@ -1023,41 +1022,42 @@ int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
 	case 1: {
 		u8 old;
 
-		ret = cmpxchg_user_key((u8 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one, acc);
+		*success = !ret && old == old_addr->one;
+		old_addr->one = old;
 		break;
 	}
 	case 2: {
 		u16 old;
 
-		ret = cmpxchg_user_key((u16 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two, acc);
+		*success = !ret && old == old_addr->two;
+		old_addr->two = old;
 		break;
 	}
 	case 4: {
 		u32 old;
 
-		ret = cmpxchg_user_key((u32 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.four, acc);
+		*success = !ret && old == old_addr->four;
+		old_addr->four = old;
 		break;
 	}
 	case 8: {
 		u64 old;
 
-		ret = cmpxchg_user_key((u64 __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.eight, acc);
+		*success = !ret && old == old_addr->eight;
+		old_addr->eight = old;
 		break;
 	}
 	case 16: {
 		__uint128_t old;
 
-		ret = cmpxchg_user_key((__uint128_t __user *)hva, &old, *old_addr, new, access_key);
-		*success = !ret && old == *old_addr;
-		*old_addr = old;
+		ret = cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixteen,
+				       new.sixteen, acc);
+		*success = !ret && old == old_addr->sixteen;
+		old_addr->sixteen = old;
 		break;
 	}
 	default:
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 3fde45a151f2..774cdf19998f 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -206,8 +206,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 		      void *data, unsigned long len, enum gacc_mode mode);
 
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, __uint128_t *old,
-			       __uint128_t new, u8 access_key, bool *success);
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+			       union kvm_s390_quad new, u8 access_key, bool *success);
 
 /**
  * write_guest_with_key - copy data from kernel space to guest space
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d7eff75a53d0..ab69c9fd7926 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2905,9 +2905,9 @@ static int mem_op_validate_common(struct kvm_s390_mem_op *mop, u64 supported_fla
 static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 {
 	void __user *uaddr = (void __user *)mop->buf;
+	void *tmpbuf __free(kvfree) = NULL;
 	enum gacc_mode acc_mode;
-	void *tmpbuf = NULL;
-	int r, srcu_idx;
+	int r;
 
 	r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION |
 					KVM_S390_MEMOP_F_CHECK_ONLY);
@@ -2920,52 +2920,36 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 		return -ENOMEM;
 	}
 
-	srcu_idx = srcu_read_lock(&kvm->srcu);
+	acc_mode = mop->op == KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : GACC_STORE;
 
-	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
-		r = PGM_ADDRESSING;
-		goto out_unlock;
-	}
+	scoped_guard(srcu, &kvm->srcu) {
+		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
+			return PGM_ADDRESSING;
 
-	acc_mode = mop->op == KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : GACC_STORE;
-	if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
-		r = check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key);
-		goto out_unlock;
-	}
-	if (acc_mode == GACC_FETCH) {
+		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)
+			return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key);
+
+		if (acc_mode == GACC_STORE && copy_from_user(tmpbuf, uaddr, mop->size))
+			return -EFAULT;
 		r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
-					      mop->size, GACC_FETCH, mop->key);
+					      mop->size, acc_mode, mop->key);
 		if (r)
-			goto out_unlock;
-		if (copy_to_user(uaddr, tmpbuf, mop->size))
-			r = -EFAULT;
-	} else {
-		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
-			r = -EFAULT;
-			goto out_unlock;
-		}
-		r = access_guest_abs_with_key(kvm, mop->gaddr, tmpbuf,
-					      mop->size, GACC_STORE, mop->key);
+			return r;
+		if (acc_mode != GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size))
+			return -EFAULT;
 	}
 
-out_unlock:
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-
-	vfree(tmpbuf);
-	return r;
+	return 0;
 }
 
 static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 {
 	void __user *uaddr = (void __user *)mop->buf;
 	void __user *old_addr = (void __user *)mop->old_addr;
-	union {
-		__uint128_t quad;
-		char raw[sizeof(__uint128_t)];
-	} old = { .quad = 0}, new = { .quad = 0 };
-	unsigned int off_in_quad = sizeof(new) - mop->size;
-	int r, srcu_idx;
-	bool success;
+	union kvm_s390_quad old = { .sixteen = 0 };
+	union kvm_s390_quad new = { .sixteen = 0 };
+	bool success = false;
+	int r;
 
 	r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION);
 	if (r)
@@ -2977,25 +2961,21 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *m
 	 */
 	if (mop->size > sizeof(new))
 		return -EINVAL;
-	if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
+	if (copy_from_user(&new, uaddr, mop->size))
 		return -EFAULT;
-	if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
+	if (copy_from_user(&old, old_addr, mop->size))
 		return -EFAULT;
 
-	srcu_idx = srcu_read_lock(&kvm->srcu);
+	scoped_guard(srcu, &kvm->srcu) {
+		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
+			return PGM_ADDRESSING;
 
-	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
-		r = PGM_ADDRESSING;
-		goto out_unlock;
-	}
-
-	r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old.quad,
-				       new.quad, mop->key, &success);
-	if (!success && copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
-		r = -EFAULT;
+		r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old, new,
+					       mop->key, &success);
 
-out_unlock:
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
+		if (!success && copy_to_user(old_addr, &old, mop->size))
+			return -EFAULT;
+	}
 	return r;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index f89f9f698df5..495ee9caaa30 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -22,6 +22,14 @@
 
 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0)
 
+union kvm_s390_quad {
+	__uint128_t sixteen;
+	unsigned long eight;
+	unsigned int four;
+	unsigned short two;
+	unsigned char one;
+};
+
 static inline void kvm_s390_fpu_store(struct kvm_run *run)
 {
 	fpu_stfpc(&run->s.regs.fpc);
-- 
2.51.1
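[The kvm_s390_quad union introduced above exists so that one dispatcher
can run a compare-and-swap at any supported operand width without casts
at every call site. A stand-alone model of that dispatch, using
GCC/Clang atomic builtins instead of the key-checked s390 primitives;
the 16-byte case is omitted here since it would need libatomic:

  #include <stdint.h>

  /* Shape of the union used above: one payload, several operand widths. */
  union quad {
  	unsigned __int128 sixteen;
  	uint64_t eight;
  	uint32_t four;
  	uint16_t two;
  	uint8_t one;
  };

  /* Mirrors the switch in cmpxchg_guest_abs_with_key(): returns 0 on
   * success, 1 on compare mismatch, -1 on invalid size. On mismatch the
   * builtin writes the observed value back into *old, matching the
   * "old value is returned to the caller" semantics above. */
  static int cmpxchg_by_len(union quad *ptr, union quad *old, union quad new, int len)
  {
  	switch (len) {
  	case 1:
  		return !__atomic_compare_exchange_n(&ptr->one, &old->one, new.one,
  						    0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  	case 2:
  		return !__atomic_compare_exchange_n(&ptr->two, &old->two, new.two,
  						    0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  	case 4:
  		return !__atomic_compare_exchange_n(&ptr->four, &old->four, new.four,
  						    0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  	case 8:
  		return !__atomic_compare_exchange_n(&ptr->eight, &old->eight, new.eight,
  						    0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  	default:
  		return -1;
  	}
  }
]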
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 19/23] KVM: s390: Switch to new gmap
Date: Thu, 20 Nov 2025 18:15:40 +0100
Message-ID: <20251120171544.96841-20-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>

Switch KVM/s390 to use the new gmap code. Remove the includes of
<asm/gmap.h> and include "gmap.h" instead; fix all the existing users
of the old gmap functions to use the new ones instead.

Fix the guest storage key access functions to work with the new gmap.
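[As background for the storage key handling below: each 4 KiB frame has
one storage key byte containing the access-control bits (ACC), the
fetch-protection bit (F), and the reference and change bits (R/C). A
stand-alone illustration of the key check performed by the guest access
code in this patch; union skey_demo is hypothetical (the real union
skey comes from dat.h in this series), and the bitfield order shown
assumes a big-endian target like s390:

  #include <stdint.h>

  union skey_demo {
  	uint8_t skey;
  	struct {
  		uint8_t acc : 4;	/* access-control, compared to the access key */
  		uint8_t fp  : 1;	/* fetch-protection bit */
  		uint8_t r   : 1;	/* reference bit */
  		uint8_t c   : 1;	/* change bit */
  		uint8_t     : 1;
  	};
  };

  /* Key 0 always matches; a matching ACC always matches; and a fetch is
   * additionally allowed whenever fetch protection is off. */
  static int key_allows_fetch(union skey_demo k, uint8_t access_key)
  {
  	return access_key == 0 || k.acc == access_key || !k.fp;
  }
]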
Signed-off-by: Claudio Imbrenda
---
 arch/s390/Kconfig                   |   2 +-
 arch/s390/include/asm/kvm_host.h    |   5 +-
 arch/s390/include/asm/mmu_context.h |   4 -
 arch/s390/include/asm/tlb.h         |   3 -
 arch/s390/include/asm/uaccess.h     |  70 +--
 arch/s390/kvm/Makefile              |   2 +-
 arch/s390/kvm/diag.c                |   2 +-
 arch/s390/kvm/gaccess.c             | 866 +++++++++++++++++-----------
 arch/s390/kvm/gaccess.h             |  18 +-
 arch/s390/kvm/gmap-vsie.c           | 141 -----
 arch/s390/kvm/gmap.c                |   6 +-
 arch/s390/kvm/intercept.c           |  15 +-
 arch/s390/kvm/interrupt.c           |   2 +-
 arch/s390/kvm/kvm-s390.c            | 757 +++++++-----------------
 arch/s390/kvm/kvm-s390.h            |  20 +-
 arch/s390/kvm/priv.c                | 211 +++----
 arch/s390/kvm/pv.c                  |  64 +-
 arch/s390/kvm/vsie.c                | 153 +++--
 arch/s390/lib/uaccess.c             | 184 +-----
 arch/s390/mm/gmap_helpers.c         |  29 -
 20 files changed, 991 insertions(+), 1563 deletions(-)
 delete mode 100644 arch/s390/kvm/gmap-vsie.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index df22b10d9141..3b4ba19a3611 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -33,7 +33,7 @@ config GENERIC_LOCKBREAK
 	def_bool y if PREEMPTION
 
 config PGSTE
-	def_bool y if KVM
+	def_bool n
 
 config AUDIT_ARCH
 	def_bool y
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 958a3b8c32d1..9abaa23bbb76 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -441,7 +441,7 @@ struct kvm_vcpu_arch {
 	bool acrs_loaded;
 	struct kvm_s390_pv_vcpu pv;
 	union diag318_info diag318_info;
-	void *mc; /* Placeholder */
+	struct kvm_s390_mmu_cache *mc;
 };
 
 struct kvm_vm_stat {
@@ -633,6 +633,8 @@ struct kvm_s390_pv {
 	struct mmu_notifier mmu_notifier;
 };
 
+struct kvm_s390_mmu_cache;
+
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -673,6 +675,7 @@ struct kvm_arch{
 	struct kvm_s390_pv pv;
 	struct list_head kzdev_list;
 	spinlock_t kzdev_list_lock;
+	struct kvm_s390_mmu_cache *mc;
 };
 
 #define KVM_HVA_ERR_BAD (-1UL)
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 48e548c01daa..bd1ef5e2d2eb 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -30,11 +30,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
 #if IS_ENABLED(CONFIG_KVM)
-	mm->context.has_pgste = 0;
-	mm->context.uses_skeys = 0;
-	mm->context.uses_cmm = 0;
 	mm->context.allow_cow_sharing = 1;
-	mm->context.allow_gmap_hpage_1m = 0;
 #endif
 	switch (mm->context.asce_limit) {
 	default:
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 1e50f6f1ad9d..7354b42ee994 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -36,7 +36,6 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
 
 #include
 #include
-#include <asm/gmap.h>
 
 /*
  * Release the page cache reference for a pte removed by
@@ -85,8 +84,6 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 		tlb->mm->context.flush_mm = 1;
 	tlb->freed_tables = 1;
 	tlb->cleared_pmds = 1;
-	if (mm_has_pgste(tlb->mm))
-		gmap_unlink(tlb->mm, (unsigned long *)pte, address);
 	tlb_remove_ptdesc(tlb, virt_to_ptdesc(pte));
 }
 
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index 3e5b8b677057..6380e03cfb62 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -471,65 +471,15 @@ do {									\
 #define __get_kernel_nofault __mvc_kernel_nofault
 #define __put_kernel_nofault __mvc_kernel_nofault
 
-void __cmpxchg_user_key_called_with_bad_pointer(void);
-
-int __cmpxchg_user_key1(unsigned long address, unsigned char *uval,
-			unsigned char old, unsigned char new, unsigned long key);
-int __cmpxchg_user_key2(unsigned long address, unsigned short *uval,
-			unsigned short old, unsigned short new, unsigned long key);
-int __cmpxchg_user_key4(unsigned long address, unsigned int *uval,
-			unsigned int old, unsigned int new, unsigned long key);
-int __cmpxchg_user_key8(unsigned long address, unsigned long *uval,
-			unsigned long old, unsigned long new, unsigned long key);
-int __cmpxchg_user_key16(unsigned long address, __uint128_t *uval,
-			 __uint128_t old, __uint128_t new, unsigned long key);
-
-static __always_inline int _cmpxchg_user_key(unsigned long address, void *uval,
-					     __uint128_t old, __uint128_t new,
-					     unsigned long key, int size)
-{
-	switch (size) {
-	case 1:	 return __cmpxchg_user_key1(address, uval, old, new, key);
-	case 2:	 return __cmpxchg_user_key2(address, uval, old, new, key);
-	case 4:	 return __cmpxchg_user_key4(address, uval, old, new, key);
-	case 8:	 return __cmpxchg_user_key8(address, uval, old, new, key);
-	case 16: return __cmpxchg_user_key16(address, uval, old, new, key);
-	default: __cmpxchg_user_key_called_with_bad_pointer();
-	}
-	return 0;
-}
-
-/**
- * cmpxchg_user_key() - cmpxchg with user space target, honoring storage keys
- * @ptr: User space address of value to compare to @old and exchange with
- *	 @new. Must be aligned to sizeof(*@ptr).
- * @uval: Address where the old value of *@ptr is written to.
- * @old: Old value. Compared to the content pointed to by @ptr in order to
- *	 determine if the exchange occurs. The old value read from *@ptr is
- *	 written to *@uval.
- * @new: New value to place at *@ptr.
- * @key: Access key to use for checking storage key protection.
- *
- * Perform a cmpxchg on a user space target, honoring storage key protection.
- * @key alone determines how key checking is performed, neither
- * storage-protection-override nor fetch-protection-override apply.
- * The caller must compare *@uval and @old to determine if values have been
- * exchanged. In case of an exception *@uval is set to zero.
- *
- * Return:  0: cmpxchg executed
- *	    -EFAULT: an exception happened when trying to access *@ptr
- *	    -EAGAIN: maxed out number of retries (byte and short only)
- */
-#define cmpxchg_user_key(ptr, uval, old, new, key)			\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(uval) __uval = (uval);				\
-									\
-	BUILD_BUG_ON(sizeof(*(__ptr)) != sizeof(*(__uval)));		\
-	might_fault();							\
-	__chk_user_ptr(__ptr);						\
-	_cmpxchg_user_key((unsigned long)(__ptr), (void *)(__uval),	\
-			  (old), (new), (key), sizeof(*(__ptr)));	\
-})
+int __cmpxchg_key1(void *address, unsigned char *uval, unsigned char old,
+		   unsigned char new, unsigned long key);
+int __cmpxchg_key2(void *address, unsigned short *uval, unsigned short old,
+		   unsigned short new, unsigned long key);
+int __cmpxchg_key4(void *address, unsigned int *uval, unsigned int old,
+		   unsigned int new, unsigned long key);
+int __cmpxchg_key8(void *address, unsigned long *uval, unsigned long old,
+		   unsigned long new, unsigned long key);
+int __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_t old,
+		    __uint128_t new, unsigned long key);
 
 #endif /* __S390_UACCESS_H */
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 1e2dcd3e2436..dac9d53b23d8 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -8,7 +8,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
-kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
+kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o
 kvm-y += dat.o gmap.o faultin.o
 
 kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 53233dec8cad..d89d1c381522 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -10,13 +10,13 @@
 
 #include
 #include
-#include <asm/gmap.h>
 #include
 #include
 #include "kvm-s390.h"
 #include "trace.h"
 #include "trace-s390.h"
 #include "gaccess.h"
+#include "gmap.h"
 
 static void do_discard_gfn_range(struct kvm_vcpu *vcpu, gfn_t gfn_start, gfn_t gfn_end)
 {
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index a054de80a5cc..0c70f46ae323 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -11,15 +11,43 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
-#include <asm/gmap.h>
 #include
 #include "kvm-s390.h"
+#include "dat.h"
+#include "gmap.h"
 #include "gaccess.h"
+#include "faultin.h"
 
 #define GMAP_SHADOW_FAKE_TABLE 1ULL
 
+union dat_table_entry {
+	unsigned long val;
+	union region1_table_entry pgd;
+	union region2_table_entry p4d;
+	union region3_table_entry pud;
+	union segment_table_entry pmd;
+	union page_table_entry pte;
+};
+
+#define WALK_N_ENTRIES 7
+#define LEVEL_MEM -2
+struct pgtwalk {
+	struct guest_fault raw_entries[WALK_N_ENTRIES];
+	gpa_t last_addr;
+	int level;
+	bool p;
+};
+
+static inline struct guest_fault *get_entries(struct pgtwalk *w)
+{
+	return w->raw_entries - LEVEL_MEM;
+}
+
 /*
  * raddress union which will contain the result (real or absolute address)
  * after a page table walk. The rfaa, sfaa and pfra members are used to
@@ -81,6 +109,28 @@ struct aste {
 	/* .. more fields there */
 };
 
+union oac {
+	unsigned int val;
+	struct {
+		struct {
+			unsigned short key : 4;
+			unsigned short	   : 4;
+			unsigned short as  : 2;
+			unsigned short	   : 4;
+			unsigned short k   : 1;
+			unsigned short a   : 1;
+		} oac1;
+		struct {
+			unsigned short key : 4;
+			unsigned short	   : 4;
+			unsigned short as  : 2;
+			unsigned short	   : 4;
+			unsigned short k   : 1;
+			unsigned short a   : 1;
+		} oac2;
+	};
+};
+
 int ipte_lock_held(struct kvm *kvm)
 {
 	if (sclp.has_siif) {
@@ -618,28 +668,16 @@ static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
 static int vm_check_access_key_gpa(struct kvm *kvm, u8 access_key,
 				   enum gacc_mode mode, gpa_t gpa)
 {
-	u8 storage_key, access_control;
-	bool fetch_protected;
-	unsigned long hva;
+	union skey storage_key;
 	int r;
 
-	if (access_key == 0)
-		return 0;
-
-	hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
-	if (kvm_is_error_hva(hva))
-		return PGM_ADDRESSING;
-
-	mmap_read_lock(current->mm);
-	r = get_guest_storage_key(current->mm, hva, &storage_key);
-	mmap_read_unlock(current->mm);
+	scoped_guard(read_lock, &kvm->mmu_lock)
+		r = dat_get_storage_key(kvm->arch.gmap->asce, gpa_to_gfn(gpa), &storage_key);
 	if (r)
 		return r;
-	access_control = FIELD_GET(_PAGE_ACC_BITS, storage_key);
-	if (access_control == access_key)
+	if (access_key == 0 || storage_key.acc == access_key)
 		return 0;
-	fetch_protected = storage_key & _PAGE_FP_BIT;
-	if ((mode == GACC_FETCH || mode == GACC_IFETCH) && !fetch_protected)
+	if ((mode == GACC_FETCH || mode == GACC_IFETCH) && !storage_key.fp)
 		return 0;
 	return PGM_PROTECTION;
 }
@@ -682,8 +720,7 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu *vcpu, u8 access_key,
 				     enum gacc_mode mode, union asce asce, gpa_t gpa,
 				     unsigned long ga, unsigned int len)
 {
-	u8 storage_key, access_control;
-	unsigned long hva;
+	union skey storage_key;
 	int r;
 
 	/* access key 0 matches any storage key -> allow */
@@ -693,26 +730,23 @@ static int vcpu_check_access_key_gpa(struct kvm_vcpu *vcpu, u8 access_key,
 	 * caller needs to ensure that gfn is accessible, so we can
 	 * assume that this cannot fail
 	 */
-	hva = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gpa));
-	mmap_read_lock(current->mm);
-	r = get_guest_storage_key(current->mm, hva, &storage_key);
-	mmap_read_unlock(current->mm);
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock)
+		r = dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gpa), &storage_key);
 	if (r)
 		return r;
-	access_control = FIELD_GET(_PAGE_ACC_BITS, storage_key);
 	/* access key matches storage key -> allow */
-	if (access_control == access_key)
+	if (storage_key.acc == access_key)
 		return 0;
 	if (mode == GACC_FETCH || mode == GACC_IFETCH) {
 		/* it is a fetch and fetch protection is off -> allow */
-		if (!(storage_key & _PAGE_FP_BIT))
+		if (!storage_key.fp)
 			return 0;
 		if (fetch_prot_override_applicable(vcpu, mode, asce) &&
 		    fetch_prot_override_applies(ga, len))
 			return 0;
 	}
 	if (storage_prot_override_applicable(vcpu) &&
-	    storage_prot_override_applies(access_control))
+	    storage_prot_override_applies(storage_key.acc))
 		return 0;
 	return PGM_PROTECTION;
 }
@@ -812,37 +846,79 @@ static int access_guest_page_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa
 	return rc;
 }
 
+static int mvcos_key(void *to, const void *from, unsigned long size, u8 dst_key, u8 src_key)
+{
+	union oac spec = {
+		.oac1.key = dst_key,
+		.oac1.k = !!dst_key,
+		.oac2.key = src_key,
+		.oac2.k = !!src_key,
+	};
+	int exception = PGM_PROTECTION;
+
+	asm_inline volatile(
+		"	lr	%%r0,%[spec]\n"
+		"0:	mvcos	%[to],%[from],%[size]\n"
+		"1:	lhi	%[exc],0\n"
+		"2:\n"
+		EX_TABLE(0b, 2b)
+		EX_TABLE(1b, 2b)
+		: [size] "+d" (size), [to] "=Q" (*(char *)to), [exc] "+d" (exception)
+		: [spec] "d" (spec.val), [from] "Q" (*(const char *)from)
+		: "memory", "cc", "0");
+	return exception;
+}
+
+struct acc_page_key_context {
+	void *data;
+	int exception;
+	unsigned short offset;
+	unsigned short len;
+	bool store;
+	u8 access_key;
+};
+
+static void _access_guest_page_with_key_gpa(struct guest_fault *f)
+{
+	struct acc_page_key_context *context = f->priv;
+	void *ptr;
+	int r;
+
+	ptr = __va(PFN_PHYS(f->pfn) | context->offset);
+
+	if (context->store)
+		r = mvcos_key(ptr, context->data, context->len, context->access_key, 0);
+	else
+		r = mvcos_key(context->data, ptr, context->len, 0, context->access_key);
+
+	context->exception = r;
+}
+
 static int access_guest_page_with_key_gpa(struct kvm *kvm, enum gacc_mode mode, gpa_t gpa,
-					  void *data, unsigned int len, u8 access_key)
+					  void *data, unsigned int len, u8 acc)
 {
-	struct kvm_memory_slot *slot;
-	bool writable;
-	gfn_t gfn;
-	hva_t hva;
+	struct acc_page_key_context context = {
+		.offset = offset_in_page(gpa),
+		.len = len,
+		.data = data,
+		.access_key = acc,
+		.store = mode == GACC_STORE,
+	};
+	struct guest_fault fault = {
+		.gfn = gpa_to_gfn(gpa),
+		.priv = &context,
+		.write_attempt = mode == GACC_STORE,
+		.callback = _access_guest_page_with_key_gpa,
+	};
 	int rc;
 
-	gfn = gpa_to_gfn(gpa);
-	slot = gfn_to_memslot(kvm, gfn);
-	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
+	if (KVM_BUG_ON((len + context.offset) > PAGE_SIZE, kvm))
+		return -EINVAL;
 
-	if (kvm_is_error_hva(hva))
-		return PGM_ADDRESSING;
-	/*
-	 * Check if it's a ro memslot, even tho that can't occur (they're unsupported).
-	 * Don't try to actually handle that case.
-	 */
-	if (!writable && mode == GACC_STORE)
-		return -EOPNOTSUPP;
-	hva += offset_in_page(gpa);
-	if (mode == GACC_STORE)
-		rc = copy_to_user_key((void __user *)hva, data, len, access_key);
-	else
-		rc = copy_from_user_key(data, (void __user *)hva, len, access_key);
+	rc = kvm_s390_faultin_gfn(NULL, kvm, &fault);
 	if (rc)
-		return PGM_PROTECTION;
-	if (mode == GACC_STORE)
-		mark_page_dirty_in_slot(kvm, slot, gfn);
-	return 0;
+		return rc;
+	return context.exception;
 }
 
 int access_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, void *data,
@@ -965,18 +1041,101 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 	return rc;
 }
 
+/**
+ * __cmpxchg_with_key() - cmpxchg memory, honoring storage keys
+ * @ptr: Address of value to compare to *@old and exchange with
+ *	 @new. Must be aligned to sizeof(*@ptr).
+ * @uval: Address where the old value of *@ptr is written to.
+ * @old: Old value. Compared to the content pointed to by @ptr in order to
+ *	 determine if the exchange occurs. The old value read from *@ptr is
+ *	 written to *@uval.
+ * @new: New value to place at *@ptr.
+ * @access_key: Access key to use for checking storage key protection.
+ *
+ * Perform a cmpxchg on guest memory, honoring storage key protection.
+ * @access_key alone determines how key checking is performed, neither
+ * storage-protection-override nor fetch-protection-override apply.
+ * In case of an exception *@uval is set to zero.
+ *
+ * Return:
+ *  * 0: cmpxchg executed successfully
+ *  * 1: cmpxchg executed unsuccessfully
+ *  * PGM_PROTECTION: an exception happened when trying to access *@ptr
+ *  * -EAGAIN: maxed out number of retries (byte and short only)
+ */
+static int __cmpxchg_with_key(union kvm_s390_quad *ptr, union kvm_s390_quad *old,
+			      union kvm_s390_quad new, int size, u8 access_key)
+{
+	union kvm_s390_quad tmp = { .sixteen = 0 };
+	int rc;
+
+	/*
+	 * The cmpxchg_key macro depends on the type of "old", so we need
+	 * a case for each valid length and get some code duplication as long
+	 * as we don't introduce a new macro.
+	 */
+	switch (size) {
+	case 1:
+		rc = __cmpxchg_key1(&ptr->one, &tmp.one, old->one, new.one, access_key);
+		break;
+	case 2:
+		rc = __cmpxchg_key2(&ptr->two, &tmp.two, old->two, new.two, access_key);
+		break;
+	case 4:
+		rc = __cmpxchg_key4(&ptr->four, &tmp.four, old->four, new.four, access_key);
+		break;
+	case 8:
+		rc = __cmpxchg_key8(&ptr->eight, &tmp.eight, old->eight, new.eight, access_key);
+		break;
+	case 16:
+		rc = __cmpxchg_key16(&ptr->sixteen, &tmp.sixteen, old->sixteen, new.sixteen,
+				     access_key);
+		break;
+	default:
+		return -EINVAL;
+	}
+	if (!rc && memcmp(&tmp, old, size))
+		rc = 1;
+	*old = tmp;
+	/*
+	 * Assume that the fault is caused by protection, either key protection
+	 * or user page write protection.
+	 */
+	if (rc == -EFAULT)
+		rc = PGM_PROTECTION;
+	return rc;
+}
+
+struct cmpxchg_key_context {
+	union kvm_s390_quad new;
+	union kvm_s390_quad *old;
+	int exception;
+	unsigned short offset;
+	u8 access_key;
+	u8 len;
+};
+
+static void _cmpxchg_guest_abs_with_key(struct guest_fault *f)
+{
+	struct cmpxchg_key_context *context = f->priv;
+
+	context->exception = __cmpxchg_with_key(__va(PFN_PHYS(f->pfn) | context->offset),
+						context->old, context->new, context->len,
+						context->access_key);
+}
+
 /**
  * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
  * @kvm: Virtual machine instance.
  * @gpa: Absolute guest address of the location to be changed.
  * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
  *	 non power of two will result in failure.
- * @old_addr: Pointer to old value. If the location at @gpa contains this value,
- *	      the exchange will succeed. After calling cmpxchg_guest_abs_with_key()
- *	      *@old_addr contains the value at @gpa before the attempt to
- *	      exchange the value.
+ * @old: Pointer to old value. If the location at @gpa contains this value,
+ *	 the exchange will succeed. After calling cmpxchg_guest_abs_with_key()
+ *	 *@old contains the value at @gpa before the attempt to
+ *	 exchange the value.
  * @new: The value to place at @gpa.
- * @access_key: The access key to use for the guest access.
+ * @acc: The access key to use for the guest access.
 * @success: output value indicating if an exchange occurred.
 *
 * Atomically exchange the value at @gpa by @new, if it contains *@old.
@@ -989,89 +1148,36 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  *	   * -EAGAIN: transient failure (len 1 or 2)
  *	   * -EOPNOTSUPP: read-only memslot (should never occur)
  */
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old,
 			       union kvm_s390_quad new, u8 acc, bool *success)
 {
-	gfn_t gfn = gpa_to_gfn(gpa);
-	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-	bool writable;
-	hva_t hva;
-	int ret;
-
-	if (!IS_ALIGNED(gpa, len))
-		return -EINVAL;
-
-	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
-	if (kvm_is_error_hva(hva))
-		return PGM_ADDRESSING;
-	/*
-	 * Check if it's a read-only memslot, even though that cannot occur
-	 * since those are unsupported.
-	 * Don't try to actually handle that case.
-	 */
-	if (!writable)
-		return -EOPNOTSUPP;
-
-	hva += offset_in_page(gpa);
-	/*
-	 * The cmpxchg_user_key macro depends on the type of "old", so we need
-	 * a case for each valid length and get some code duplication as long
-	 * as we don't introduce a new macro.
-	 */
-	switch (len) {
-	case 1: {
-		u8 old;
-
-		ret = cmpxchg_user_key((u8 __user *)hva, &old, old_addr->one, new.one, acc);
-		*success = !ret && old == old_addr->one;
-		old_addr->one = old;
-		break;
-	}
-	case 2: {
-		u16 old;
-
-		ret = cmpxchg_user_key((u16 __user *)hva, &old, old_addr->two, new.two, acc);
-		*success = !ret && old == old_addr->two;
-		old_addr->two = old;
-		break;
-	}
-	case 4: {
-		u32 old;
+	struct cmpxchg_key_context context = {
+		.old = old,
+		.new = new,
+		.offset = offset_in_page(gpa),
+		.len = len,
+		.access_key = acc,
+	};
+	struct guest_fault fault = {
+		.gfn = gpa_to_gfn(gpa),
+		.priv = &context,
+		.write_attempt = true,
+		.callback = _cmpxchg_guest_abs_with_key,
+	};
+	int rc;
 
-		ret = cmpxchg_user_key((u32 __user *)hva, &old, old_addr->four, new.four, acc);
-		*success = !ret && old == old_addr->four;
-		old_addr->four = old;
-		break;
-	}
-	case 8: {
-		u64 old;
+	lockdep_assert_held(&kvm->srcu);
 
-		ret = cmpxchg_user_key((u64 __user *)hva, &old, old_addr->eight, new.eight, acc);
-		*success = !ret && old == old_addr->eight;
-		old_addr->eight = old;
-		break;
-	}
-	case 16: {
-		__uint128_t old;
-
-		ret = cmpxchg_user_key((__uint128_t __user *)hva, &old, old_addr->sixteen,
-				       new.sixteen, acc);
-		*success = !ret && old == old_addr->sixteen;
-		old_addr->sixteen = old;
-		break;
-	}
-	default:
+	if (len > 16 || !IS_ALIGNED(gpa, len))
 		return -EINVAL;
-	}
-	if (*success)
-		mark_page_dirty_in_slot(kvm, slot, gfn);
-	/*
-	 * Assume that the fault is caused by protection, either key protection
-	 * or user page write protection.
- */ - if (ret =3D=3D -EFAULT) - ret =3D PGM_PROTECTION; - return ret; + + rc =3D kvm_s390_faultin_gfn(NULL, kvm, &fault); + if (rc) + return rc; + *success =3D !context.exception; + if (context.exception =3D=3D 1) + return 0; + return context.exception; } =20 /** @@ -1173,304 +1279,362 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_= vcpu *vcpu, unsigned long gra) } =20 /** - * kvm_s390_shadow_tables - walk the guest page table and create shadow ta= bles + * walk_guest_tables() - walk the guest page table and pin the dat tables * @sg: pointer to the shadow guest address space structure * @saddr: faulting address in the shadow gmap - * @pgt: pointer to the beginning of the page table for the given address = if - * successful (return value 0), or to the first invalid DAT entry in - * case of exceptions (return value > 0) - * @dat_protection: referenced memory is write protected - * @fake: pgt references contiguous guest memory block, not a pgtable + * @w: will be filled with information on the pinned pages + * @wr: indicates a write access if true + * + * Return: + * * 0 in case of success, + * * a PIC code > 0 in case the address translation fails + * * an error code < 0 if other errors happen in the host */ -static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr, - unsigned long *pgt, int *dat_protection, - int *fake) +static int walk_guest_tables(struct gmap *sg, unsigned long saddr, struct = pgtwalk *w, bool wr) { - struct kvm *kvm; - struct gmap *parent; - union asce asce; + struct gmap *parent =3D sg->parent; + struct guest_fault *entries; + union dat_table_entry table; union vaddress vaddr; unsigned long ptr; + struct kvm *kvm; + union asce asce; int rc; =20 - *fake =3D 0; - *dat_protection =3D 0; - kvm =3D sg->private; - parent =3D sg->parent; + kvm =3D parent->kvm; + asce =3D sg->guest_asce; + entries =3D get_entries(w); + + w->level =3D LEVEL_MEM; + w->last_addr =3D saddr; + if (asce.r) + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, gpa_to_gfn(sadd= r), false); + vaddr.addr =3D saddr; - asce.val =3D sg->orig_asce; ptr =3D asce.rsto * PAGE_SIZE; - if (asce.r) { - *fake =3D 1; - ptr =3D 0; - asce.dt =3D ASCE_TYPE_REGION1; - } + + if (!asce_contains_gfn(asce, gpa_to_gfn(saddr))) + return PGM_ASCE_TYPE; switch (asce.dt) { case ASCE_TYPE_REGION1: - if (vaddr.rfx01 > asce.tl && !*fake) + if (vaddr.rfx01 > asce.tl) return PGM_REGION_FIRST_TRANS; break; case ASCE_TYPE_REGION2: - if (vaddr.rfx) - return PGM_ASCE_TYPE; if (vaddr.rsx01 > asce.tl) return PGM_REGION_SECOND_TRANS; break; case ASCE_TYPE_REGION3: - if (vaddr.rfx || vaddr.rsx) - return PGM_ASCE_TYPE; if (vaddr.rtx01 > asce.tl) return PGM_REGION_THIRD_TRANS; break; case ASCE_TYPE_SEGMENT: - if (vaddr.rfx || vaddr.rsx || vaddr.rtx) - return PGM_ASCE_TYPE; if (vaddr.sx01 > asce.tl) return PGM_SEGMENT_TRANSLATION; break; } =20 + w->level =3D asce.dt; switch (asce.dt) { - case ASCE_TYPE_REGION1: { - union region1_table_entry rfte; - - if (*fake) { - ptr +=3D vaddr.rfx * _REGION1_SIZE; - rfte.val =3D ptr; - goto shadow_r2t; - } - *pgt =3D ptr + vaddr.rfx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val); + case ASCE_TYPE_REGION1: + w->last_addr =3D ptr + vaddr.rfx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rfte.i) + if (table.pgd.i) return PGM_REGION_FIRST_TRANS; - if (rfte.tt !=3D TABLE_TYPE_REGION1) + if (table.pgd.tt !=3D TABLE_TYPE_REGION1) return PGM_TRANSLATION_SPEC; - if (vaddr.rsx01 < 
rfte.tf || vaddr.rsx01 > rfte.tl) + if (vaddr.rsx01 < table.pgd.tf || vaddr.rsx01 > table.pgd.tl) return PGM_REGION_SECOND_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rfte.p; - ptr =3D rfte.rto * PAGE_SIZE; -shadow_r2t: - rc =3D gmap_shadow_r2t(sg, saddr, rfte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r1_entry++; - } + w->p |=3D table.pgd.p; + ptr =3D table.pgd.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION2: { - union region2_table_entry rste; - - if (*fake) { - ptr +=3D vaddr.rsx * _REGION2_SIZE; - rste.val =3D ptr; - goto shadow_r3t; - } - *pgt =3D ptr + vaddr.rsx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val); + case ASCE_TYPE_REGION2: + w->last_addr =3D ptr + vaddr.rsx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rste.i) + if (table.p4d.i) return PGM_REGION_SECOND_TRANS; - if (rste.tt !=3D TABLE_TYPE_REGION2) + if (table.p4d.tt !=3D TABLE_TYPE_REGION2) return PGM_TRANSLATION_SPEC; - if (vaddr.rtx01 < rste.tf || vaddr.rtx01 > rste.tl) + if (vaddr.rtx01 < table.p4d.tf || vaddr.rtx01 > table.p4d.tl) return PGM_REGION_THIRD_TRANS; if (sg->edat_level >=3D 1) - *dat_protection |=3D rste.p; - ptr =3D rste.rto * PAGE_SIZE; -shadow_r3t: - rste.p |=3D *dat_protection; - rc =3D gmap_shadow_r3t(sg, saddr, rste.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r2_entry++; - } + w->p |=3D table.p4d.p; + ptr =3D table.p4d.rto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_REGION3: { - union region3_table_entry rtte; - - if (*fake) { - ptr +=3D vaddr.rtx * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; - } - *pgt =3D ptr + vaddr.rtx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val); + case ASCE_TYPE_REGION3: + w->last_addr =3D ptr + vaddr.rtx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); if (rc) return rc; - if (rtte.i) + if (table.pud.i) return PGM_REGION_THIRD_TRANS; - if (rtte.tt !=3D TABLE_TYPE_REGION3) + if (table.pud.tt !=3D TABLE_TYPE_REGION3) return PGM_TRANSLATION_SPEC; - if (rtte.cr && asce.p && sg->edat_level >=3D 2) + if (table.pud.cr && asce.p && sg->edat_level >=3D 2) return PGM_TRANSLATION_SPEC; - if (rtte.fc && sg->edat_level >=3D 2) { - *dat_protection |=3D rtte.fc0.p; - *fake =3D 1; - ptr =3D rtte.fc1.rfaa * _REGION3_SIZE; - rtte.val =3D ptr; - goto shadow_sgt; + if (sg->edat_level >=3D 1) + w->p |=3D table.pud.p; + if (table.pud.fc && sg->edat_level >=3D 2) { + table.val =3D u64_replace_bits(table.val, saddr, ~_REGION3_MASK); + goto edat_applies; } - if (vaddr.sx01 < rtte.fc0.tf || vaddr.sx01 > rtte.fc0.tl) + if (vaddr.sx01 < table.pud.fc0.tf || vaddr.sx01 > table.pud.fc0.tl) return PGM_SEGMENT_TRANSLATION; - if (sg->edat_level >=3D 1) - *dat_protection |=3D rtte.fc0.p; - ptr =3D rtte.fc0.sto * PAGE_SIZE; -shadow_sgt: - rtte.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_sgt(sg, saddr, rtte.val, *fake); - if (rc) - return rc; - kvm->stat.gmap_shadow_r3_entry++; - } + ptr =3D table.pud.fc0.sto * PAGE_SIZE; + w->level--; fallthrough; - case ASCE_TYPE_SEGMENT: { - union segment_table_entry ste; - - if (*fake) { - ptr +=3D vaddr.sx * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; - } - *pgt =3D ptr + vaddr.sx * 8; - rc =3D gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val); + case ASCE_TYPE_SEGMENT: + w->last_addr =3D ptr + vaddr.sx * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + 
w->last_addr, &table.val); if (rc) return rc; - if (ste.i) + if (table.pmd.i) return PGM_SEGMENT_TRANSLATION; - if (ste.tt !=3D TABLE_TYPE_SEGMENT) + if (table.pmd.tt !=3D TABLE_TYPE_SEGMENT) return PGM_TRANSLATION_SPEC; - if (ste.cs && asce.p) + if (table.pmd.cs && asce.p) return PGM_TRANSLATION_SPEC; - *dat_protection |=3D ste.fc0.p; - if (ste.fc && sg->edat_level >=3D 1) { - *fake =3D 1; - ptr =3D ste.fc1.sfaa * _SEGMENT_SIZE; - ste.val =3D ptr; - goto shadow_pgt; + w->p |=3D table.pmd.p; + if (table.pmd.fc && sg->edat_level >=3D 1) { + table.val =3D u64_replace_bits(table.val, saddr, ~_SEGMENT_MASK); + goto edat_applies; } - ptr =3D ste.fc0.pto * (PAGE_SIZE / 2); -shadow_pgt: - ste.fc0.p |=3D *dat_protection; - rc =3D gmap_shadow_pgt(sg, saddr, ste.val, *fake); + ptr =3D table.pmd.fc0.pto * (PAGE_SIZE / 2); + w->level--; + } + w->last_addr =3D ptr + vaddr.px * 8; + rc =3D kvm_s390_get_guest_page_and_read_gpa(kvm, entries + w->level, + w->last_addr, &table.val); + if (rc) + return rc; + if (table.pte.i) + return PGM_PAGE_TRANSLATION; + if (table.pte.z) + return PGM_TRANSLATION_SPEC; + w->p |=3D table.pte.p; +edat_applies: + if (wr && w->p) + return PGM_PROTECTION; + + return kvm_s390_get_guest_page(kvm, entries + LEVEL_MEM, table.pte.pfra, = wr); +} + +static int _do_shadow_pte(struct gmap *sg, gpa_t raddr, union pte *ptep_h,= union pte *ptep, + struct guest_fault *f, bool p) +{ + union pgste pgste; + union pte newpte; + int rc; + + scoped_guard(spinlock, &sg->host_to_rmap_lock) + rc =3D gmap_insert_rmap(sg, f->gfn, gpa_to_gfn(raddr), TABLE_TYPE_PAGE_T= ABLE); + if (rc) + return rc; + + pgste =3D pgste_get_lock(ptep_h); + newpte =3D _pte(f->pfn, f->writable, !p, 0); + newpte.s.d |=3D ptep->s.d; + newpte.s.sd |=3D ptep->s.sd; + newpte.h.p &=3D ptep->h.p; + pgste =3D gmap_ptep_xchg(sg->parent, ptep_h, newpte, pgste, f->gfn); + pgste.vsie_notif =3D 1; + pgste_set_unlock(ptep_h, pgste); + + newpte =3D _pte(f->pfn, 0, !p, 0); + pgste =3D pgste_get_lock(ptep); + pgste =3D __dat_ptep_xchg(ptep, pgste, newpte, gpa_to_gfn(raddr), sg->asc= e, sg->uses_skeys); + pgste_set_unlock(ptep, pgste); + + return 0; +} + +static int _do_shadow_crste(struct gmap *sg, gpa_t raddr, union crste *hos= t, union crste *table, + struct guest_fault *f, bool p) +{ + union crste newcrste; + gfn_t gfn; + int rc; + + lockdep_assert_held_write(&sg->kvm->mmu_lock); + + gfn =3D f->gfn & gpa_to_gfn(is_pmd(*table) ? 
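_do_shadow_pte() above relies on the pgste lock/unlock helpers: the lock is a single bit inside the entry's companion word, so no separate lock object exists per page table entry. A rough userspace sketch of that bit-spinlock protocol with C11 atomics follows; the bit position and the names are invented here, and the real helpers keep the lock bit inside the PGSTE layout.

#include <stdatomic.h>
#include <stdint.h>

#define ENTRY_LOCK_BIT (1ULL << 7)	/* illustrative bit position */

/* Spin until the lock bit was previously clear; returns the locked value. */
static uint64_t entry_get_lock(_Atomic uint64_t *entry)
{
	uint64_t old;

	do {
		old = atomic_fetch_or_explicit(entry, ENTRY_LOCK_BIT,
					       memory_order_acquire);
	} while (old & ENTRY_LOCK_BIT);	/* somebody else holds it: retry */
	return old;
}

/* Publish the new value and drop the lock in one release store. */
static void entry_set_unlock(_Atomic uint64_t *entry, uint64_t new)
{
	atomic_store_explicit(entry, new & ~ENTRY_LOCK_BIT,
			      memory_order_release);
}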
+static int _do_shadow_crste(struct gmap *sg, gpa_t raddr, union crste *host, union crste *table,
+			    struct guest_fault *f, bool p)
+{
+	union crste newcrste;
+	gfn_t gfn;
+	int rc;
+
+	lockdep_assert_held_write(&sg->kvm->mmu_lock);
+
+	gfn = f->gfn & gpa_to_gfn(is_pmd(*table) ? _SEGMENT_MASK : _REGION3_MASK);
+	scoped_guard(spinlock, &sg->host_to_rmap_lock)
+		rc = gmap_insert_rmap(sg, gfn, gpa_to_gfn(raddr), host->h.tt);
+	if (rc)
+		return rc;
+
+	newcrste = _crste_fc1(f->pfn, host->h.tt, f->writable, !p);
+	newcrste.s.fc1.d |= host->s.fc1.d;
+	newcrste.s.fc1.sd |= host->s.fc1.sd;
+	newcrste.h.p &= host->h.p;
+	newcrste.s.fc1.vsie_notif = 1;
+	newcrste.s.fc1.prefix_notif = host->s.fc1.prefix_notif;
+	gmap_crstep_xchg(sg->parent, host, newcrste, f->gfn);
+
+	newcrste = _crste_fc1(f->pfn, host->h.tt, 0, !p);
+	dat_crstep_xchg(table, newcrste, gpa_to_gfn(raddr), sg->asce);
+	return 0;
+}
+
+static int _gaccess_do_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *sg,
+			      unsigned long saddr, struct pgtwalk *w)
+{
+	struct guest_fault *entries;
+	int flags, i, hl, gl, l, rc;
+	union crste *table, *host;
+	union pte *ptep, *ptep_h;
+
+	lockdep_assert_held(&sg->kvm->mmu_lock);
+	entries = get_entries(w);
+	ptep_h = NULL;
+	ptep = NULL;
+
+	rc = dat_entry_walk(NULL, gpa_to_gfn(saddr), sg->asce, DAT_WALK_ANY, TABLE_TYPE_PAGE_TABLE,
+			    &table, &ptep);
+	if (rc)
+		return rc;
+
+	/* A race occurred. The shadow mapping is already valid, nothing to do */
+	if ((ptep && !ptep->h.i) || (!ptep && crste_leaf(*table)))
+		return 0;
+
+	gl = get_level(table, ptep);
+
+	/*
+	 * Skip levels that are already protected. For each level, protect
+	 * only the page containing the entry, not the whole table.
+	 */
+	for (i = gl ; i > w->level; i--) {
+		rc = gmap_protect_rmap(mc, sg, entries[i - 1].gfn, gpa_to_gfn(saddr),
+				       entries[i - 1].pfn, i, entries[i - 1].writable);
 		if (rc)
 			return rc;
-		kvm->stat.gmap_shadow_sg_entry++;
 	}
+
+	rc = dat_entry_walk(NULL, entries[LEVEL_MEM].gfn, sg->parent->asce, DAT_WALK_LEAF,
+			    TABLE_TYPE_PAGE_TABLE, &host, &ptep_h);
+	if (rc)
+		return rc;
+
+	hl = get_level(host, ptep_h);
+	/* Get the smallest granularity */
+	l = min3(gl, hl, w->level);
+
+	flags = DAT_WALK_SPLIT_ALLOC | (sg->parent->uses_skeys ? DAT_WALK_USES_SKEYS : 0);
+	/* If necessary, create the shadow mapping */
+	if (l < gl) {
+		rc = dat_entry_walk(mc, gpa_to_gfn(saddr), sg->asce, flags, l, &table, &ptep);
+		if (rc)
+			return rc;
 	}
-	/* Return the parent address of the page table */
-	*pgt = ptr;
-	return 0;
+	if (l < hl) {
+		rc = dat_entry_walk(mc, entries[LEVEL_MEM].gfn, sg->parent->asce,
+				    flags, l, &host, &ptep_h);
+		if (rc)
+			return rc;
+	}
+
+	if (KVM_BUG_ON(l > TABLE_TYPE_REGION3, sg->kvm))
+		return -EFAULT;
+	if (l == TABLE_TYPE_PAGE_TABLE)
+		return _do_shadow_pte(sg, saddr, ptep_h, ptep, entries + LEVEL_MEM, w->p);
+	return _do_shadow_crste(sg, saddr, host, table, entries + LEVEL_MEM, w->p);
 }
 
-/**
- * shadow_pgt_lookup() - find a shadow page table
- * @sg: pointer to the shadow guest address space structure
- * @saddr: the address in the shadow aguest address space
- * @pgt: parent gmap address of the page table to get shadowed
- * @dat_protection: if the pgtable is marked as protected by dat
- * @fake: pgt references contiguous guest memory block, not a pgtable
- *
- * Returns 0 if the shadow page table was found and -EAGAIN if the page
- * table was not found.
- *
- * Called with sg->mm->mmap_lock in read.
- */
-static int shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, unsigned long *pgt,
-			     int *dat_protection, int *fake)
+static inline int _gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t saddr,
+					unsigned long seq, struct pgtwalk *walk)
 {
-	unsigned long pt_index;
-	unsigned long *table;
-	struct page *page;
 	int rc;
 
-	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
-	if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {
-		/* Shadow page tables are full pages (pte+pgste) */
-		page = pfn_to_page(*table >> PAGE_SHIFT);
-		pt_index = gmap_pgste_get_pgt_addr(page_to_virt(page));
-		*pgt = pt_index & ~GMAP_SHADOW_FAKE_TABLE;
-		*dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);
-		*fake = !!(pt_index & GMAP_SHADOW_FAKE_TABLE);
-		rc = 0;
-	} else {
-		rc = -EAGAIN;
+	if (kvm_s390_array_needs_retry_unsafe(vcpu->kvm, seq, walk->raw_entries))
+		return -EAGAIN;
+again:
+	rc = kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+	if (rc)
+		return rc;
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
+		if (kvm_s390_array_needs_retry_safe(vcpu->kvm, seq, walk->raw_entries))
+			return -EAGAIN;
+		scoped_guard(spinlock, &sg->parent->children_lock) {
+			if (sg->removed)
+				return -EAGAIN;
+			rc = _gaccess_do_shadow(vcpu->arch.mc, sg, saddr, walk);
+		}
+		if (rc == -ENOMEM)
+			goto again;
+		if (!rc)
+			kvm_s390_release_faultin_array(vcpu->kvm, walk->raw_entries, false);
 	}
-	spin_unlock(&sg->guest_table_lock);
 	return rc;
 }
 
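_gaccess_shadow_fault() above is an instance of the invalidation-sequence retry idiom: snapshot the sequence counter, do the sleepable work without the lock, then re-validate the counter under the lock and retry if an invalidation ran in between. Reduced to a skeleton — illustrative C11 code with invented helper names, not the kernel implementation:

#include <stdatomic.h>

extern _Atomic unsigned long invalidate_seq;	/* bumped by invalidations */

static int faultin_with_retry(void (*lock)(void), void (*unlock)(void),
			      int (*do_lookup)(void), int (*commit)(void))
{
	unsigned long seq;
	int rc;

	for (;;) {
		seq = atomic_load_explicit(&invalidate_seq, memory_order_acquire);
		rc = do_lookup();		/* may sleep, no lock held */
		if (rc)
			return rc;
		lock();
		if (seq == atomic_load_explicit(&invalidate_seq,
						memory_order_relaxed)) {
			rc = commit();		/* nothing changed in between */
			unlock();
			return rc;
		}
		unlock();			/* raced with an invalidation */
	}
}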
 /**
- * kvm_s390_shadow_fault - handle fault on a shadow page table
- * @vcpu: virtual cpu
- * @sg: pointer to the shadow guest address space structure
+ * __kvm_s390_shadow_fault() - handle fault on a shadow page table
+ * @vcpu: virtual cpu that triggered the action
+ * @sg: the shadow guest address space structure
  * @saddr: faulting address in the shadow gmap
  * @datptr: will contain the address of the faulting DAT table entry, or of
  *          the valid leaf, plus some flags
+ * @wr: whether this is a write access
  *
- * Returns: - 0 if the shadow fault was successfully resolved
- *	    - > 0 (pgm exception code) on exceptions while faulting
- *	    - -EAGAIN if the caller can retry immediately
- *	    - -EFAULT when accessing invalid guest addresses
- *	    - -ENOMEM if out of memory
+ * Return:
+ * * 0 if the shadow fault was successfully resolved
+ * * > 0 (pgm exception code) on exceptions while faulting
+ * * -EAGAIN if the caller can retry immediately
+ * * -EFAULT when accessing invalid guest addresses
+ * * -ENOMEM if out of memory
  */
-int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
-			  unsigned long saddr, unsigned long *datptr)
+static int __gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t saddr,
+				  union mvpg_pei *datptr, bool wr)
 {
-	union vaddress vaddr;
-	union page_table_entry pte;
-	unsigned long pgt = 0;
-	int dat_protection, fake;
+	struct pgtwalk walk = { .p = false, };
+	unsigned long seq;
 	int rc;
 
-	if (KVM_BUG_ON(!gmap_is_shadow(sg), vcpu->kvm))
-		return -EFAULT;
+	seq = vcpu->kvm->mmu_invalidate_seq;
+	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
 
-	mmap_read_lock(sg->mm);
-	/*
-	 * We don't want any guest-2 tables to change - so the parent
-	 * tables/pointers we read stay valid - unshadowing is however
-	 * always possible - only guest_table_lock protects us.
-	 */
-	ipte_lock(vcpu->kvm);
-
-	rc = shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
+	rc = walk_guest_tables(sg, saddr, &walk, wr);
+	if (datptr) {
+		datptr->val = walk.last_addr;
+		datptr->dat_prot = wr && walk.p;
+		datptr->not_pte = walk.level > TABLE_TYPE_PAGE_TABLE;
+		datptr->real = sg->guest_asce.r;
+	}
+	if (!rc)
+		rc = _gaccess_shadow_fault(vcpu, sg, saddr, seq, &walk);
 	if (rc)
-		rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
-					    &fake);
+		kvm_s390_release_faultin_array(vcpu->kvm, walk.raw_entries, true);
+	return rc;
+}
 
-	vaddr.addr = saddr;
-	if (fake) {
-		pte.val = pgt + vaddr.px * PAGE_SIZE;
-		goto shadow_page;
-	}
+int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t saddr,
+			 union mvpg_pei *datptr, bool wr)
+{
+	int rc;
 
-	switch (rc) {
-	case PGM_SEGMENT_TRANSLATION:
-	case PGM_REGION_THIRD_TRANS:
-	case PGM_REGION_SECOND_TRANS:
-	case PGM_REGION_FIRST_TRANS:
-		pgt |= PEI_NOT_PTE;
-		break;
-	case 0:
-		pgt += vaddr.px * 8;
-		rc = gmap_read_table(sg->parent, pgt, &pte.val);
-	}
-	if (datptr)
-		*datptr = pgt | dat_protection * PEI_DAT_PROT;
-	if (!rc && pte.i)
-		rc = PGM_PAGE_TRANSLATION;
-	if (!rc && pte.z)
-		rc = PGM_TRANSLATION_SPEC;
-shadow_page:
-	pte.p |= dat_protection;
-	if (!rc)
-		rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
-	vcpu->kvm->stat.gmap_shadow_pg_entry++;
+	if (KVM_BUG_ON(!sg->is_shadow, vcpu->kvm))
+		return -EFAULT;
+
+	rc = kvm_s390_mmu_cache_topup(vcpu->arch.mc);
+	if (rc)
+		return rc;
+
+	ipte_lock(vcpu->kvm);
+	rc = __gaccess_shadow_fault(vcpu, sg, saddr, datptr, wr || sg->guest_asce.r);
 	ipte_unlock(vcpu->kvm);
-	mmap_read_unlock(sg->mm);
+
 	return rc;
 }
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 774cdf19998f..b5385cec60f4 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -206,7 +206,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 		      void *data, unsigned long len, enum gacc_mode mode);
 
-int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old_addr,
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old,
 			       union kvm_s390_quad new, u8 access_key, bool *success);
 
 /**
@@ -450,11 +450,17 @@ void ipte_unlock(struct kvm *kvm);
 int ipte_lock_held(struct kvm *kvm);
 int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
 
-/* MVPG PEI indication bits */
-#define PEI_DAT_PROT 2
-#define PEI_NOT_PTE 4
+union mvpg_pei {
+	unsigned long val;
+	struct {
+		unsigned long addr    : 61;
+		unsigned long not_pte : 1;
+		unsigned long dat_prot: 1;
+		unsigned long real    : 1;
+	};
+};
 
-int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow,
-			  unsigned long saddr, unsigned long *datptr);
+int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t saddr,
+			 union mvpg_pei *datptr, bool wr);
 
 #endif /* __KVM_S390_GACCESS_H */
diff --git a/arch/s390/kvm/gmap-vsie.c b/arch/s390/kvm/gmap-vsie.c
deleted file mode 100644
index 56ef153eb8fe..000000000000
--- a/arch/s390/kvm/gmap-vsie.c
+++ /dev/null
@@ -1,141 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Guest memory management for KVM/s390 nested VMs.
- *
- * Copyright IBM Corp. 2008, 2020, 2024
- *
- * Author(s): Claudio Imbrenda
- *		Martin Schwidefsky
- *		David Hildenbrand
- *		Janosch Frank
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include
-#include
-#include
-
-#include "kvm-s390.h"
-
-/**
- * gmap_find_shadow - find a specific asce in the list of shadow tables
- * @parent: pointer to the parent gmap
- * @asce: ASCE for which the shadow table is created
- * @edat_level: edat level to be used for the shadow translation
- *
- * Returns the pointer to a gmap if a shadow table with the given asce is
- * already available, ERR_PTR(-EAGAIN) if another one is just being created,
- * otherwise NULL
- *
- * Context: Called with parent->shadow_lock held
- */
-static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce, int edat_level)
-{
-	struct gmap *sg;
-
-	lockdep_assert_held(&parent->shadow_lock);
-	list_for_each_entry(sg, &parent->children, list) {
-		if (!gmap_shadow_valid(sg, asce, edat_level))
-			continue;
-		if (!sg->initialized)
-			return ERR_PTR(-EAGAIN);
-		refcount_inc(&sg->ref_count);
-		return sg;
-	}
-	return NULL;
-}
-
-/**
- * gmap_shadow - create/find a shadow guest address space
- * @parent: pointer to the parent gmap
- * @asce: ASCE for which the shadow table is created
- * @edat_level: edat level to be used for the shadow translation
- *
- * The pages of the top level page table referred by the asce parameter
- * will be set to read-only and marked in the PGSTEs of the kvm process.
- * The shadow table will be removed automatically on any change to the
- * PTE mapping for the source table.
- *
- * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
- * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
- * parent gmap table could not be protected.
- */
-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level)
-{
-	struct gmap *sg, *new;
-	unsigned long limit;
-	int rc;
-
-	if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)parent->private) ||
-	    KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private))
-		return ERR_PTR(-EFAULT);
-	spin_lock(&parent->shadow_lock);
-	sg = gmap_find_shadow(parent, asce, edat_level);
-	spin_unlock(&parent->shadow_lock);
-	if (sg)
-		return sg;
-	/* Create a new shadow gmap */
-	limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));
-	if (asce & _ASCE_REAL_SPACE)
-		limit = -1UL;
-	new = gmap_alloc(limit);
-	if (!new)
-		return ERR_PTR(-ENOMEM);
-	new->mm = parent->mm;
-	new->parent = gmap_get(parent);
-	new->private = parent->private;
-	new->orig_asce = asce;
-	new->edat_level = edat_level;
-	new->initialized = false;
-	spin_lock(&parent->shadow_lock);
-	/* Recheck if another CPU created the same shadow */
-	sg = gmap_find_shadow(parent, asce, edat_level);
-	if (sg) {
-		spin_unlock(&parent->shadow_lock);
-		gmap_free(new);
-		return sg;
-	}
-	if (asce & _ASCE_REAL_SPACE) {
-		/* only allow one real-space gmap shadow */
-		list_for_each_entry(sg, &parent->children, list) {
-			if (sg->orig_asce & _ASCE_REAL_SPACE) {
-				spin_lock(&sg->guest_table_lock);
-				gmap_unshadow(sg);
-				spin_unlock(&sg->guest_table_lock);
-				list_del(&sg->list);
-				gmap_put(sg);
-				break;
-			}
-		}
-	}
-	refcount_set(&new->ref_count, 2);
-	list_add(&new->list, &parent->children);
-	if (asce & _ASCE_REAL_SPACE) {
-		/* nothing to protect, return right away */
-		new->initialized = true;
-		spin_unlock(&parent->shadow_lock);
-		return new;
-	}
-	spin_unlock(&parent->shadow_lock);
-	/* protect after insertion, so it will get properly invalidated */
-	mmap_read_lock(parent->mm);
-	rc = __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN,
-				      ((asce & _ASCE_TABLE_LENGTH) + 1),
-				      PROT_READ, GMAP_NOTIFY_SHADOW);
-	mmap_read_unlock(parent->mm);
-	spin_lock(&parent->shadow_lock);
-	new->initialized = true;
-	if (rc) {
-		list_del(&new->list);
-		gmap_free(new);
-		new = ERR_PTR(rc);
-	}
-	spin_unlock(&parent->shadow_lock);
-	return new;
-}
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index cbb777e940d1..502012c0dfad 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -730,13 +730,13 @@ static int _gmap_enable_skeys(struct gmap *gmap)
 	gfn_t start = 0;
 	int rc;
 
-	if (mm_uses_skeys(gmap->kvm->mm))
+	if (gmap->uses_skeys)
 		return 0;
 
-	gmap->kvm->mm->context.uses_skeys = 1;
+	WRITE_ONCE(gmap->uses_skeys, 1);
 	rc = gmap_helper_disable_cow_sharing();
 	if (rc) {
-		gmap->kvm->mm->context.uses_skeys = 0;
+		WRITE_ONCE(gmap->uses_skeys, 0);
 		return rc;
 	}
 
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index c7908950c1f4..ecc41587efeb 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -21,6 +21,7 @@
 #include "gaccess.h"
 #include "trace.h"
 #include "trace-s390.h"
+#include "faultin.h"
 
 u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu)
 {
@@ -367,8 +368,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
 				     reg2, &srcaddr, GACC_FETCH, 0);
 	if (rc)
 		return kvm_s390_inject_prog_cond(vcpu, rc);
-	rc = kvm_s390_handle_dat_fault(vcpu, srcaddr, 0);
-	if (rc != 0)
+
+	do {
+		rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(srcaddr), false);
+	} while (rc == -EAGAIN);
+	if (rc)
 		return rc;
 
 	/* Ensure that the source is paged-in, no actual access -> no key checking */
@@ -376,8 +380,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
 				     reg1, &dstaddr, GACC_STORE, 0);
 	if (rc)
 		return kvm_s390_inject_prog_cond(vcpu, rc);
-	rc = kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE);
-	if (rc != 0)
+
+	do {
+		rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(dstaddr), true);
+	} while (rc == -EAGAIN);
+	if (rc)
 		return rc;
 
 	kvm_s390_retry_instr(vcpu);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index c62a868cf2b6..aae0bc8bf038 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -27,7 +27,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -35,6 +34,7 @@
 #include "gaccess.h"
 #include "trace-s390.h"
 #include "pci.h"
+#include "gmap.h"
 
 #define PFAULT_INIT 0x0600
 #define PFAULT_DONE 0x0680
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ab69c9fd7926..c8662177c63c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -40,7 +40,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -53,6 +52,8 @@
 #include
 #include "kvm-s390.h"
 #include "gaccess.h"
+#include "gmap.h"
+#include "faultin.h"
 #include "pci.h"
 
 #define CREATE_TRACE_POINTS
@@ -263,15 +264,11 @@ static DECLARE_BITMAP(kvm_s390_available_cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS)
 /* available subfunctions indicated via query / "test bit" */
 static struct kvm_s390_vm_cpu_subfunc kvm_s390_available_subfunc;
 
-static struct gmap_notifier gmap_notifier;
-static struct gmap_notifier vsie_gmap_notifier;
 debug_info_t *kvm_s390_dbf;
 debug_info_t *kvm_s390_dbf_uv;
 
 /* Section: not file related */
 /* forward declarations */
-static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
-			      unsigned long end);
 static int sca_switch_to_extended(struct kvm *kvm);
 
 static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta)
@@ -529,10 +526,6 @@ static int __init __kvm_s390_init(void)
 	if (rc)
 		goto err_gib;
 
-	gmap_notifier.notifier_call = kvm_gmap_notifier;
-	gmap_register_pte_notifier(&gmap_notifier);
-	vsie_gmap_notifier.notifier_call = kvm_s390_vsie_gmap_notifier;
-	gmap_register_pte_notifier(&vsie_gmap_notifier);
 	atomic_notifier_chain_register(&s390_epoch_delta_notifier,
 				       &kvm_clock_notifier);
 
@@ -552,8 +545,6 @@ static int __init __kvm_s390_init(void)
 
 static void __kvm_s390_exit(void)
 {
-	gmap_unregister_pte_notifier(&gmap_notifier);
-	gmap_unregister_pte_notifier(&vsie_gmap_notifier);
 	atomic_notifier_chain_unregister(&s390_epoch_delta_notifier,
 					 &kvm_clock_notifier);
 
@@ -569,7 +560,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
 {
 	if (ioctl == KVM_S390_ENABLE_SIE)
-		return s390_enable_sie();
+		return 0;
 	return -EINVAL;
 }
 
@@ -695,32 +686,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-	int i;
-	gfn_t cur_gfn, last_gfn;
-	unsigned long gaddr, vmaddr;
-	struct gmap *gmap = kvm->arch.gmap;
-	DECLARE_BITMAP(bitmap, _PAGE_ENTRIES);
-
-	/* Loop over all guest segments */
-	cur_gfn = memslot->base_gfn;
-	last_gfn = memslot->base_gfn + memslot->npages;
-	for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES) {
-		gaddr = gfn_to_gpa(cur_gfn);
-		vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
-		if (kvm_is_error_hva(vmaddr))
-			continue;
-
-		bitmap_zero(bitmap, _PAGE_ENTRIES);
-		gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr);
-		for (i = 0; i < _PAGE_ENTRIES; i++) {
-			if (test_bit(i, bitmap))
-				mark_page_dirty(kvm, cur_gfn + i);
-		}
+	gfn_t last_gfn = memslot->base_gfn + memslot->npages;
 
-		if (fatal_signal_pending(current))
-			return;
-		cond_resched();
-	}
+	scoped_guard(read_lock, &kvm->mmu_lock)
+		gmap_sync_dirty_log(kvm->arch.gmap, memslot->base_gfn, last_gfn);
 }
 
 /* Section: vm related */
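The scoped_guard() blocks used throughout this patch come from <linux/cleanup.h>: the unlock half runs automatically on every path out of the guarded statement. A rough userspace analogue of the underlying mechanism, built on the GCC/Clang cleanup attribute and pthreads (all names here are invented):

#include <pthread.h>
#include <stdio.h>

static void unlockp(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);	/* runs whenever the scope is left */
}

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

static int counter_inc_checked(void)
{
	pthread_mutex_lock(&lock);
	pthread_mutex_t *guard __attribute__((cleanup(unlockp))) = &lock;

	if (counter < 0)
		return -1;	/* early return: still unlocks */
	counter++;
	return 0;		/* normal return: still unlocks */
}

int main(void)
{
	printf("rc=%d counter=%d\n", counter_inc_checked(), counter);
	return 0;
}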
@@ -880,9 +849,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		r = -EINVAL;
 	else {
 		r = 0;
-		mmap_write_lock(kvm->mm);
-		kvm->mm->context.allow_gmap_hpage_1m = 1;
-		mmap_write_unlock(kvm->mm);
 		/*
 		 * We might have to create fake 4k page
 		 * tables. To avoid that the hardware works on
@@ -949,7 +915,7 @@ static int kvm_s390_get_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	int ret;
-	unsigned int idx;
+
 	switch (attr->attr) {
 	case KVM_S390_VM_MEM_ENABLE_CMMA:
 		ret = -ENXIO;
@@ -960,8 +926,6 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 		mutex_lock(&kvm->lock);
 		if (kvm->created_vcpus)
 			ret = -EBUSY;
-		else if (kvm->mm->context.allow_gmap_hpage_1m)
-			ret = -EINVAL;
 		else {
 			kvm->arch.use_cmma = 1;
 			/* Not compatible with cmma. */
@@ -970,7 +934,9 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 		}
 		mutex_unlock(&kvm->lock);
 		break;
-	case KVM_S390_VM_MEM_CLR_CMMA:
+	case KVM_S390_VM_MEM_CLR_CMMA: {
+		gfn_t start_gfn = 0;
+
 		ret = -ENXIO;
 		if (!sclp.has_cmma)
 			break;
@@ -979,13 +945,13 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 			break;
 
 		VM_EVENT(kvm, 3, "%s", "RESET: CMMA states");
-		mutex_lock(&kvm->lock);
-		idx = srcu_read_lock(&kvm->srcu);
-		s390_reset_cmma(kvm->arch.gmap->mm);
-		srcu_read_unlock(&kvm->srcu, idx);
-		mutex_unlock(&kvm->lock);
+		do {
+			start_gfn = dat_reset_cmma(kvm->arch.gmap->asce, start_gfn);
+			cond_resched();
+		} while (start_gfn);
 		ret = 0;
 		break;
+	}
 	case KVM_S390_VM_MEM_LIMIT_SIZE: {
 		unsigned long new_limit;
 
@@ -1002,29 +968,12 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 		if (!new_limit)
 			return -EINVAL;
 
-		/* gmap_create takes last usable address */
-		if (new_limit != KVM_S390_NO_MEM_LIMIT)
-			new_limit -= 1;
-
 		ret = -EBUSY;
-		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
-			/* gmap_create will round the limit up */
-			struct gmap *new = gmap_create(current->mm, new_limit);
-
-			if (!new) {
-				ret = -ENOMEM;
-			} else {
-				gmap_remove(kvm->arch.gmap);
-				new->private = kvm;
-				kvm->arch.gmap = new;
-				ret = 0;
-			}
-		}
-		mutex_unlock(&kvm->lock);
+		if (!kvm->created_vcpus)
+			ret = gmap_set_limit(kvm->arch.gmap, gpa_to_gfn(new_limit));
 		VM_EVENT(kvm, 3, "SET: max guest address: %lu", new_limit);
 		VM_EVENT(kvm, 3, "New guest asce: 0x%p",
-			 (void *) kvm->arch.gmap->asce);
+			 (void *)kvm->arch.gmap->asce.val);
 		break;
 	}
 	default:
@@ -1189,19 +1138,13 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
 		kvm->arch.migration_mode = 1;
 		return 0;
 	}
-	/* mark all the pages in active slots as dirty */
 	kvm_for_each_memslot(ms, bkt, slots) {
 		if (!ms->dirty_bitmap)
 			return -EINVAL;
-		/*
-		 * The second half of the bitmap is only used on x86,
-		 * and would be wasted otherwise, so we put it to good
-		 * use here to keep track of the state of the storage
-		 * attributes.
-		 */
-		memset(kvm_second_dirty_bitmap(ms), 0xff, kvm_dirty_bitmap_bytes(ms));
 		ram_pages += ms->npages;
 	}
+	/* mark all the pages as dirty */
+	gmap_set_cmma_all_dirty(kvm->arch.gmap);
 	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
 	kvm->arch.migration_mode = 1;
 	kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
@@ -2113,40 +2056,32 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 
 static int kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 {
-	uint8_t *keys;
-	uint64_t hva;
-	int srcu_idx, i, r = 0;
+	union skey *keys;
+	int i, r = 0;
 
 	if (args->flags != 0)
 		return -EINVAL;
 
 	/* Is this guest using storage keys? */
-	if (!mm_uses_skeys(current->mm))
+	if (!kvm->arch.gmap->uses_skeys)
 		return KVM_S390_GET_SKEYS_NONE;
 
 	/* Enforce sane limit on memory allocation */
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT);
+	keys = kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT);
 	if (!keys)
 		return -ENOMEM;
 
-	mmap_read_lock(current->mm);
-	srcu_idx = srcu_read_lock(&kvm->srcu);
-	for (i = 0; i < args->count; i++) {
-		hva = gfn_to_hva(kvm, args->start_gfn + i);
-		if (kvm_is_error_hva(hva)) {
-			r = -EFAULT;
-			break;
+	scoped_guard(read_lock, &kvm->mmu_lock) {
+		for (i = 0; i < args->count; i++) {
+			r = dat_get_storage_key(kvm->arch.gmap->asce,
						args->start_gfn + i, keys + i);
+			if (r)
+				break;
 		}
-
-		r = get_guest_storage_key(current->mm, hva, &keys[i]);
-		if (r)
-			break;
 	}
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	mmap_read_unlock(current->mm);
 
 	if (!r) {
 		r = copy_to_user((uint8_t __user *)args->skeydata_addr, keys,
@@ -2161,10 +2096,9 @@ static int kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 
 static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 {
-	uint8_t *keys;
-	uint64_t hva;
-	int srcu_idx, i, r = 0;
-	bool unlocked;
+	struct kvm_s390_mmu_cache *mc;
+	union skey *keys;
+	int i, r = 0;
 
 	if (args->flags != 0)
 		return -EINVAL;
@@ -2173,7 +2107,7 @@ static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT);
+	keys = kvmalloc_array(args->count, sizeof(*keys), GFP_KERNEL_ACCOUNT);
 	if (!keys)
 		return -ENOMEM;
 
@@ -2185,159 +2119,41 @@ static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	}
 
 	/* Enable storage key handling for the guest */
-	r = s390_enable_skey();
+	r = gmap_enable_skeys(kvm->arch.gmap);
 	if (r)
 		goto out;
 
-	i = 0;
-	mmap_read_lock(current->mm);
-	srcu_idx = srcu_read_lock(&kvm->srcu);
-	while (i < args->count) {
-		unlocked = false;
-		hva = gfn_to_hva(kvm, args->start_gfn + i);
-		if (kvm_is_error_hva(hva)) {
-			r = -EFAULT;
-			break;
-		}
-
+	r = -EINVAL;
+	for (i = 0; i < args->count; i++) {
 		/* Lowest order bit is reserved */
-		if (keys[i] & 0x01) {
-			r = -EINVAL;
-			break;
-		}
-
-		r = set_guest_storage_key(current->mm, hva, keys[i], 0);
-		if (r) {
-			r = fixup_user_fault(current->mm, hva,
-					     FAULT_FLAG_WRITE, &unlocked);
-			if (r)
-				break;
-		}
-		if (!r)
-			i++;
-	}
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	mmap_read_unlock(current->mm);
-out:
-	kvfree(keys);
-	return r;
-}
-
-/*
- * Base address and length must be sent at the start of each block, therefore
- * it's cheaper to send some clean data, as long as it's less than the size of
- * two longs.
- */
-#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
-/* for consistency */
-#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
-
-static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
-			      u8 *res, unsigned long bufsize)
-{
-	unsigned long pgstev, hva, cur_gfn = args->start_gfn;
-
-	args->count = 0;
-	while (args->count < bufsize) {
-		hva = gfn_to_hva(kvm, cur_gfn);
-		/*
-		 * We return an error if the first value was invalid, but we
-		 * return successfully if at least one value was copied.
-		 */
-		if (kvm_is_error_hva(hva))
-			return args->count ? 0 : -EFAULT;
-		if (get_pgste(kvm->mm, hva, &pgstev) < 0)
-			pgstev = 0;
-		res[args->count++] = (pgstev >> 24) & 0x43;
-		cur_gfn++;
+		if (keys[i].zero)
+			goto out;
 	}
 
-	return 0;
-}
-
-static struct kvm_memory_slot *gfn_to_memslot_approx(struct kvm_memslots *slots,
-						     gfn_t gfn)
-{
-	return ____gfn_to_memslot(slots, gfn, true);
-}
-
-static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots,
-					      unsigned long cur_gfn)
-{
-	struct kvm_memory_slot *ms = gfn_to_memslot_approx(slots, cur_gfn);
-	unsigned long ofs = cur_gfn - ms->base_gfn;
-	struct rb_node *mnode = &ms->gfn_node[slots->node_idx];
-
-	if (ms->base_gfn + ms->npages <= cur_gfn) {
-		mnode = rb_next(mnode);
-		/* If we are above the highest slot, wrap around */
-		if (!mnode)
-			mnode = rb_first(&slots->gfn_tree);
-
-		ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
-		ofs = 0;
-	}
-
-	if (cur_gfn < ms->base_gfn)
-		ofs = 0;
-
-	ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs);
-	while (ofs >= ms->npages && (mnode = rb_next(mnode))) {
-		ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
-		ofs = find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages);
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc) {
+		r = -ENOMEM;
+		goto out;
 	}
-	return ms->base_gfn + ofs;
-}
 
-static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
-			     u8 *res, unsigned long bufsize)
-{
-	unsigned long mem_end, cur_gfn, next_gfn, hva, pgstev;
-	struct kvm_memslots *slots = kvm_memslots(kvm);
-	struct kvm_memory_slot *ms;
-
-	if (unlikely(kvm_memslots_empty(slots)))
-		return 0;
-
-	cur_gfn = kvm_s390_next_dirty_cmma(slots, args->start_gfn);
-	ms = gfn_to_memslot(kvm, cur_gfn);
-	args->count = 0;
-	args->start_gfn = cur_gfn;
-	if (!ms)
-		return 0;
-	next_gfn = kvm_s390_next_dirty_cmma(slots, cur_gfn + 1);
-	mem_end = kvm_s390_get_gfn_end(slots);
-
-	while (args->count < bufsize) {
-		hva = gfn_to_hva(kvm, cur_gfn);
-		if (kvm_is_error_hva(hva))
-			return 0;
-		/* Decrement only if we actually flipped the bit to 0 */
-		if (test_and_clear_bit(cur_gfn - ms->base_gfn, kvm_second_dirty_bitmap(ms)))
-			atomic64_dec(&kvm->arch.cmma_dirty_pages);
-		if (get_pgste(kvm->mm, hva, &pgstev) < 0)
-			pgstev = 0;
-		/* Save the value */
-		res[args->count++] = (pgstev >> 24) & 0x43;
-		/* If the next bit is too far away, stop. */
-		if (next_gfn > cur_gfn + KVM_S390_MAX_BIT_DISTANCE)
-			return 0;
-		/* If we reached the previous "next", find the next one */
-		if (cur_gfn == next_gfn)
-			next_gfn = kvm_s390_next_dirty_cmma(slots, cur_gfn + 1);
-		/* Reached the end of memory or of the buffer, stop */
-		if ((next_gfn >= mem_end) ||
-		    (next_gfn - args->start_gfn >= bufsize))
-			return 0;
-		cur_gfn++;
-		/* Reached the end of the current memslot, take the next one. */
-		if (cur_gfn - ms->base_gfn >= ms->npages) {
-			ms = gfn_to_memslot(kvm, cur_gfn);
-			if (!ms)
-				return 0;
+	r = 0;
+	do {
+		r = kvm_s390_mmu_cache_topup(mc);
+		if (r == -ENOMEM)
+			break;
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			for (i = 0 ; i < args->count; i++) {
+				r = dat_set_storage_key(mc, kvm->arch.gmap->asce,
							args->start_gfn + i, keys[i], 0);
				if (r)
					break;
			}
 		}
-	}
-	return 0;
+	} while (r == -ENOMEM);
+
+	kvm_s390_free_mmu_cache(mc);
+out:
+	kvfree(keys);
+	return r;
 }
 
 /*
@@ -2351,8 +2167,7 @@ static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
 static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 				  struct kvm_s390_cmma_log *args)
 {
-	unsigned long bufsize;
-	int srcu_idx, peek, ret;
+	int peek, ret;
 	u8 *values;
 
 	if (!kvm->arch.use_cmma)
@@ -2365,8 +2180,8 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 	if (!peek && !kvm->arch.migration_mode)
 		return -EINVAL;
 	/* CMMA is disabled or was not used, or the buffer has length zero */
-	bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
-	if (!bufsize || !kvm->mm->context.uses_cmm) {
+	args->count = min(args->count, KVM_S390_CMMA_SIZE_MAX);
+	if (!args->count || !kvm->arch.gmap->uses_cmm) {
 		memset(args, 0, sizeof(*args));
 		return 0;
 	}
@@ -2376,18 +2191,18 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 		return 0;
 	}
 
-	values = vmalloc(bufsize);
+	values = vmalloc(args->count);
 	if (!values)
 		return -ENOMEM;
 
-	mmap_read_lock(kvm->mm);
-	srcu_idx = srcu_read_lock(&kvm->srcu);
-	if (peek)
-		ret = kvm_s390_peek_cmma(kvm, args, values, bufsize);
-	else
-		ret = kvm_s390_get_cmma(kvm, args, values, bufsize);
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	mmap_read_unlock(kvm->mm);
+	scoped_guard(read_lock, &kvm->mmu_lock) {
+		if (peek)
+			ret = dat_peek_cmma(args->start_gfn, kvm->arch.gmap->asce, &args->count,
+					    values);
+		else
+			ret = dat_get_cmma(kvm->arch.gmap->asce, &args->start_gfn, &args->count,
+					   values, &kvm->arch.cmma_dirty_pages);
+	}
 
 	if (kvm->arch.migration_mode)
 		args->remaining = atomic64_read(&kvm->arch.cmma_dirty_pages);
@@ -2409,11 +2224,9 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 				  const struct kvm_s390_cmma_log *args)
 {
-	unsigned long hva, mask, pgstev, i;
-	uint8_t *bits;
-	int srcu_idx, r = 0;
-
-	mask = args->mask;
+	struct kvm_s390_mmu_cache *mc;
+	u8 *bits = NULL;
+	int r = 0;
 
 	if (!kvm->arch.use_cmma)
 		return -ENXIO;
@@ -2427,9 +2240,12 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 	if (args->count == 0)
 		return 0;
 
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc)
+		return -ENOMEM;
 	bits = vmalloc(array_size(sizeof(*bits), args->count));
 	if (!bits)
-		return -ENOMEM;
+		goto out;
 
 	r = copy_from_user(bits, (void __user *)args->values, args->count);
 	if (r) {
@@ -2437,29 +2253,19 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 		goto out;
 	}
 
-	mmap_read_lock(kvm->mm);
-	srcu_idx = srcu_read_lock(&kvm->srcu);
-	for (i = 0; i < args->count; i++) {
-		hva = gfn_to_hva(kvm, args->start_gfn + i);
-		if (kvm_is_error_hva(hva)) {
-			r = -EFAULT;
+	do {
+		r = kvm_s390_mmu_cache_topup(mc);
+		if (r)
 			break;
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			r = dat_set_cmma_bits(mc, kvm->arch.gmap->asce, args->start_gfn,
+					      args->count, args->mask, bits);
 		}
+	} while (r == -ENOMEM);
 
-		pgstev = bits[i];
-		pgstev = pgstev << 24;
-		mask &= _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT;
-		set_pgste_bits(kvm->mm, hva, mask, pgstev);
-	}
-	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	mmap_read_unlock(kvm->mm);
-
-	if (!kvm->mm->context.uses_cmm) {
-		mmap_write_lock(kvm->mm);
-		kvm->mm->context.uses_cmm = 1;
-		mmap_write_unlock(kvm->mm);
-	}
+	WRITE_ONCE(kvm->arch.gmap->uses_cmm, 1);
 out:
+	kvm_s390_free_mmu_cache(mc);
 	vfree(bits);
 	return r;
 }
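The do { ... } while (r == -ENOMEM) loops above implement a common pattern: nothing may be allocated while the mmu lock is held, so a cache of preallocated objects is topped up outside the lock, and the locked section simply restarts whenever the cache runs dry. A minimal sketch of that shape, under the stated assumption (all names invented, helpers left extern):

#include <errno.h>

struct obj_cache { int nobjs; void *objs[64]; };

/* may sleep, fills mc with at least min preallocated objects */
extern int cache_topup(struct obj_cache *mc, int min);
extern void work_lock(void), work_unlock(void);
/* consumes objects from mc, returns -ENOMEM if mc ran dry */
extern int do_work_locked(struct obj_cache *mc);

static int work_with_cache(struct obj_cache *mc)
{
	int rc;

	do {
		rc = cache_topup(mc, 8);	/* sleepable allocation */
		if (rc)
			break;
		work_lock();
		rc = do_work_locked(mc);	/* never allocates */
		work_unlock();
	} while (rc == -ENOMEM);	/* cache ran dry: refill and retry */
	return rc;
}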
@@ -2923,9 +2729,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 	acc_mode = mop->op == KVM_S390_MEMOP_ABSOLUTE_READ ? GACC_FETCH : GACC_STORE;
 
 	scoped_guard(srcu, &kvm->srcu) {
-		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
-			return PGM_ADDRESSING;
-
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)
 			return check_gpa_range(kvm, mop->gaddr, mop->size, acc_mode, mop->key);
 
@@ -2938,7 +2741,6 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 		if (acc_mode != GACC_STORE && copy_to_user(uaddr, tmpbuf, mop->size))
 			return -EFAULT;
 	}
-
 	return 0;
 }
 
@@ -2967,9 +2769,6 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 		return -EFAULT;
 
 	scoped_guard(srcu, &kvm->srcu) {
-		if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr))
-			return PGM_ADDRESSING;
-
 		r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old,
 					       new, mop->key, &success);
 
@@ -3329,11 +3128,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (type)
 		goto out_err;
 #endif
-
-	rc = s390_enable_sie();
-	if (rc)
-		goto out_err;
-
 	rc = -ENOMEM;
 
 	if (!sclp.has_64bscao)
@@ -3413,6 +3207,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
 	VM_EVENT(kvm, 3, "vm created with type %lu", type);
 
+	kvm->arch.mem_limit = type & KVM_VM_S390_UCONTROL ? KVM_S390_NO_MEM_LIMIT : sclp.hamax + 1;
+	kvm->arch.gmap = gmap_new(kvm, gpa_to_gfn(kvm->arch.mem_limit));
+	if (!kvm->arch.gmap)
+		goto out_err;
+	kvm->arch.gmap->pfault_enabled = 0;
+
 	if (type & KVM_VM_S390_UCONTROL) {
 		struct kvm_userspace_memory_region2 fake_memslot = {
 			.slot = KVM_S390_UCONTROL_MEMSLOT,
@@ -3422,23 +3222,15 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 			.flags = 0,
 		};
 
-		kvm->arch.gmap = NULL;
-		kvm->arch.mem_limit = KVM_S390_NO_MEM_LIMIT;
 		/* one flat fake memslot covering the whole address-space */
 		mutex_lock(&kvm->slots_lock);
 		KVM_BUG_ON(kvm_set_internal_memslot(kvm, &fake_memslot), kvm);
 		mutex_unlock(&kvm->slots_lock);
+		kvm->arch.gmap->is_ucontrol = 1;
 	} else {
-		if (sclp.hamax == U64_MAX)
-			kvm->arch.mem_limit = TASK_SIZE_MAX;
-		else
-			kvm->arch.mem_limit = min_t(unsigned long, TASK_SIZE_MAX,
-						    sclp.hamax + 1);
-		kvm->arch.gmap = gmap_create(current->mm, kvm->arch.mem_limit - 1);
-		if (!kvm->arch.gmap)
-			goto out_err;
-		kvm->arch.gmap->private = kvm;
-		kvm->arch.gmap->pfault_enabled = 0;
+		struct crst_table *table = dereference_asce(kvm->arch.gmap->asce);
+
+		crst_table_init((void *)table, _CRSTE_HOLE(table->crstes[0].h.tt).val);
 	}
 
 	kvm->arch.use_pfmfi = sclp.has_pfmfi;
@@ -3472,8 +3264,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	sca_del_vcpu(vcpu);
 	kvm_s390_update_topology_change_report(vcpu->kvm, 1);
 
-	if (kvm_is_ucontrol(vcpu->kvm))
-		gmap_remove(vcpu->arch.gmap);
+	if (kvm_is_ucontrol(vcpu->kvm)) {
+		scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock)
+			gmap_remove_child(vcpu->arch.gmap);
+		gmap_dispose(vcpu->arch.gmap);
+	}
 
 	if (vcpu->kvm->arch.use_cmma)
 		kvm_s390_vcpu_unsetup_cmma(vcpu);
@@ -3481,6 +3276,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (kvm_s390_pv_cpu_get_handle(vcpu))
 		kvm_s390_pv_destroy_cpu(vcpu, &rc, &rrc);
 	free_page((unsigned long)(vcpu->arch.sie_block));
+	kvm_s390_free_mmu_cache(vcpu->arch.mc);
 }
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
@@ -3507,25 +3303,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	debug_unregister(kvm->arch.dbf);
 	free_page((unsigned long)kvm->arch.sie_page2);
-	if (!kvm_is_ucontrol(kvm))
-		gmap_remove(kvm->arch.gmap);
 	kvm_s390_destroy_adapters(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
+	gmap_dispose(kvm->arch.gmap);
 	KVM_EVENT(3, "vm 0x%p destroyed", kvm);
 }
 
-/* Section: vcpu related */
-static int __kvm_ucontrol_vcpu_init(struct kvm_vcpu *vcpu)
-{
-	vcpu->arch.gmap = gmap_create(current->mm, -1UL);
-	if (!vcpu->arch.gmap)
-		return -ENOMEM;
-	vcpu->arch.gmap->private = vcpu->kvm;
-
-	return 0;
-}
-
 static void sca_del_vcpu(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_s390_use_sca_entries())
@@ -3961,9 +3745,15 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	int rc;
 
 	BUILD_BUG_ON(sizeof(struct sie_page) != 4096);
+	vcpu->arch.mc = kvm_s390_new_mmu_cache();
+	if (!vcpu->arch.mc)
+		return -ENOMEM;
 	sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!sie_page)
+	if (!sie_page) {
+		kvm_s390_free_mmu_cache(vcpu->arch.mc);
+		vcpu->arch.mc = NULL;
 		return -ENOMEM;
+	}
 
 	vcpu->arch.sie_block = &sie_page->sie_block;
 	vcpu->arch.sie_block->itdba = virt_to_phys(&sie_page->itdb);
@@ -4005,8 +3795,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->run->kvm_valid_regs |= KVM_SYNC_FPRS;
 
 	if (kvm_is_ucontrol(vcpu->kvm)) {
-		rc = __kvm_ucontrol_vcpu_init(vcpu);
-		if (rc)
+		rc = -ENOMEM;
+		vcpu->arch.gmap = gmap_new_child(vcpu->kvm->arch.gmap, -1UL);
+		if (!vcpu->arch.gmap)
 			goto out_free_sie_block;
 	}
 
@@ -4022,8 +3813,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 
 out_ucontrol_uninit:
-	if (kvm_is_ucontrol(vcpu->kvm))
-		gmap_remove(vcpu->arch.gmap);
+	if (kvm_is_ucontrol(vcpu->kvm)) {
+		gmap_remove_child(vcpu->arch.gmap);
+		gmap_dispose(vcpu->arch.gmap);
+	}
 out_free_sie_block:
 	free_page((unsigned long)(vcpu->arch.sie_block));
 	return rc;
@@ -4087,32 +3880,6 @@ void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
 	kvm_s390_vcpu_request(vcpu);
 }
 
-static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
-			      unsigned long end)
-{
-	struct kvm *kvm = gmap->private;
-	struct kvm_vcpu *vcpu;
-	unsigned long prefix;
-	unsigned long i;
-
-	trace_kvm_s390_gmap_notifier(start, end, gmap_is_shadow(gmap));
-
-	if (gmap_is_shadow(gmap))
-		return;
-	if (start >= 1UL << 31)
-		/* We are only interested in prefix pages */
-		return;
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		/* match against both prefix pages */
-		prefix = kvm_s390_get_prefix(vcpu);
-		if (prefix <= end && start <= prefix + 2*PAGE_SIZE - 1) {
-			VCPU_EVENT(vcpu, 2, "gmap notifier for %lx-%lx",
-				   start, end);
-			kvm_s390_sync_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
-		}
-	}
-}
-
 bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 {
 	/* do not poll with more than halt_poll_max_steal percent of steal time */
@@ -4496,72 +4263,53 @@ static bool ibs_enabled(struct kvm_vcpu *vcpu)
 	return kvm_s390_test_cpuflags(vcpu, CPUSTAT_IBS);
 }
 
-static int __kvm_s390_fixup_fault_sync(struct gmap *gmap, gpa_t gaddr, unsigned int flags)
+static int vcpu_ucontrol_translate(struct kvm_vcpu *vcpu, gpa_t *gaddr)
 {
-	struct kvm *kvm = gmap->private;
-	gfn_t gfn = gpa_to_gfn(gaddr);
-	bool unlocked;
-	hva_t vmaddr;
-	gpa_t tmp;
+	union crste *crstep;
+	union pte *ptep;
 	int rc;
 
-	if (kvm_is_ucontrol(kvm)) {
-		tmp = __gmap_translate(gmap, gaddr);
-		gfn = gpa_to_gfn(tmp);
-	}
-
-	vmaddr = gfn_to_hva(kvm, gfn);
-	rc = fixup_user_fault(gmap->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked);
-	if (!rc)
-		rc = __gmap_link(gmap, gaddr, vmaddr);
-	return rc;
-}
-
-/**
- * __kvm_s390_mprotect_many() - Apply specified protection to guest pages
- * @gmap: the gmap of the guest
- * @gpa: the starting guest address
- * @npages: how many pages to protect
- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: pgste notification bits to set
- *
- * Returns: 0 in case of success, < 0 in case of error - see gmap_protect_one()
- *
- * Context: kvm->srcu and gmap->mm need to be held in read mode
- */
-int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsigned int prot,
-			     unsigned long bits)
-{
-	unsigned int fault_flag = (prot & PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
-	gpa_t end = gpa + npages * PAGE_SIZE;
-	int rc;
-
-	for (; gpa < end; gpa = ALIGN(gpa + 1, rc)) {
-		rc = gmap_protect_one(gmap, gpa, prot, bits);
-		if (rc == -EAGAIN) {
-			__kvm_s390_fixup_fault_sync(gmap, gpa, fault_flag);
-			rc = gmap_protect_one(gmap, gpa, prot, bits);
+	if (kvm_is_ucontrol(vcpu->kvm)) {
+		/*
+		 * This translates the per-vCPU guest address into a
+		 * fake guest address, which can then be used with the
+		 * fake memslots that are identity mapping userspace.
+		 * This allows ucontrol VMs to use the normal fault
+		 * resolution path, like normal VMs.
+		 */
+		rc = dat_entry_walk(NULL, gpa_to_gfn(*gaddr), vcpu->arch.gmap->asce,
+				    0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep);
+		if (rc) {
+			vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;
+			vcpu->run->s390_ucontrol.trans_exc_code = *gaddr;
+			vcpu->run->s390_ucontrol.pgm_code = PGM_SEGMENT_TRANSLATION;
+			return -EREMOTE;
		}
-		if (rc < 0)
-			return rc;
+		*gaddr &= ~_SEGMENT_MASK;
+		*gaddr |= dat_get_ptval(pte_table_start(ptep), PTVAL_VMADDR) << _SEGMENT_SHIFT;
 	}
-	return 0;
+
+	return 0;
 }
 
-static int kvm_s390_mprotect_notify_prefix(struct kvm_vcpu *vcpu)
+static int kvm_s390_fixup_prefix(struct kvm_vcpu *vcpu)
 {
 	gpa_t gaddr = kvm_s390_get_prefix(vcpu);
-	int idx, rc;
-
-	idx = srcu_read_lock(&vcpu->kvm->srcu);
-	mmap_read_lock(vcpu->arch.gmap->mm);
+	gfn_t gfn;
+	int rc;
 
-	rc = __kvm_s390_mprotect_many(vcpu->arch.gmap, gaddr, 2, PROT_WRITE, GMAP_NOTIFY_MPROT);
+	if (vcpu_ucontrol_translate(vcpu, &gaddr))
+		return -EREMOTE;
+	gfn = gpa_to_gfn(gaddr);
 
-	mmap_read_unlock(vcpu->arch.gmap->mm);
-	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn, true);
+	if (rc)
+		return rc;
+	rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gfn + 1, true);
+	if (rc)
+		return rc;
 
+	scoped_guard(write_lock, &vcpu->kvm->mmu_lock)
+		rc = dat_set_prefix_notif_bit(vcpu->kvm->arch.gmap->asce, gfn);
 	return rc;
 }
 
@@ -4581,7 +4329,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 	if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) {
 		int rc;
 
-		rc = kvm_s390_mprotect_notify_prefix(vcpu);
+		rc = kvm_s390_fixup_prefix(vcpu);
 		if (rc) {
 			kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);
 			return rc;
@@ -4631,7 +4379,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		 * CMM has been used.
 		 */
 		if ((vcpu->kvm->arch.use_cmma) &&
-		    (vcpu->kvm->mm->context.uses_cmm))
+		    (vcpu->arch.gmap->uses_cmm))
 			vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
 		goto retry;
 	}
@@ -4839,98 +4587,25 @@ static void kvm_s390_assert_primary_as(struct kvm_vcpu *vcpu)
 		current->thread.gmap_int_code, current->thread.gmap_teid.val);
 }
 
-/*
- * __kvm_s390_handle_dat_fault() - handle a dat fault for the gmap of a vcpu
- * @vcpu: the vCPU whose gmap is to be fixed up
- * @gfn: the guest frame number used for memslots (including fake memslots)
- * @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps
- * @foll: FOLL_* flags
- *
- * Return: 0 on success, < 0 in case of error.
- * Context: The mm lock must not be held before calling. May sleep.
- */
-int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int foll)
-{
-	struct kvm_memory_slot *slot;
-	unsigned int fault_flags;
-	bool writable, unlocked;
-	unsigned long vmaddr;
-	struct page *page;
-	kvm_pfn_t pfn;
+static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, gpa_t gaddr, bool wr)
+{
+	struct guest_fault f = {
+		.write_attempt = wr,
+		.attempt_pfault = vcpu->arch.gmap->pfault_enabled,
+	};
 	int rc;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
-		return vcpu_post_run_addressing_exception(vcpu);
-
-	fault_flags = foll & FOLL_WRITE ? FAULT_FLAG_WRITE : 0;
-	if (vcpu->arch.gmap->pfault_enabled)
-		foll |= FOLL_NOWAIT;
-	vmaddr = __gfn_to_hva_memslot(slot, gfn);
-
-try_again:
-	pfn = __kvm_faultin_pfn(slot, gfn, foll, &writable, &page);
+	if (vcpu_ucontrol_translate(vcpu, &gaddr))
+		return -EREMOTE;
+	f.gfn = gpa_to_gfn(gaddr);
 
-	/* Access outside memory, inject addressing exception */
-	if (is_noslot_pfn(pfn))
+	rc = kvm_s390_faultin_gfn(vcpu, NULL, &f);
+	if (rc <= 0)
+		return rc;
+	if (rc == PGM_ADDRESSING)
 		return vcpu_post_run_addressing_exception(vcpu);
-	/* Signal pending: try again */
-	if (pfn == KVM_PFN_ERR_SIGPENDING)
-		return -EAGAIN;
-
-	/* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) */
-	if (pfn == KVM_PFN_ERR_NEEDS_IO) {
-		trace_kvm_s390_major_guest_pfault(vcpu);
-		if (kvm_arch_setup_async_pf(vcpu))
-			return 0;
-		vcpu->stat.pfault_sync++;
-		/* Could not setup async pfault, try again synchronously */
-		foll &= ~FOLL_NOWAIT;
-		goto try_again;
-	}
-	/* Any other error */
-	if (is_error_pfn(pfn))
-		return -EFAULT;
-
-	/* Success */
-	mmap_read_lock(vcpu->arch.gmap->mm);
-	/* Mark the userspace PTEs as young and/or dirty, to avoid page fault loops */
-	rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);
-	if (!rc)
-		rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);
-	scoped_guard(read_lock, &vcpu->kvm->mmu_lock) {
-		kvm_release_faultin_page(vcpu->kvm, page, false, writable);
-	}
-	mmap_read_unlock(vcpu->arch.gmap->mm);
-	return rc;
-}
-
-static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, unsigned int foll)
-{
-	unsigned long gaddr_tmp;
-	gfn_t gfn;
-
-	gfn = gpa_to_gfn(gaddr);
-	if (kvm_is_ucontrol(vcpu->kvm)) {
-		/*
-		 * This translates the per-vCPU guest address into a
-		 * fake guest address, which can then be used with the
-		 * fake memslots that are identity mapping userspace.
-		 * This allows ucontrol VMs to use the normal fault
-		 * resolution path, like normal VMs.
-		 */
-		mmap_read_lock(vcpu->arch.gmap->mm);
-		gaddr_tmp = __gmap_translate(vcpu->arch.gmap, gaddr);
-		mmap_read_unlock(vcpu->arch.gmap->mm);
-		if (gaddr_tmp == -EFAULT) {
-			vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;
-			vcpu->run->s390_ucontrol.trans_exc_code = gaddr;
-			vcpu->run->s390_ucontrol.pgm_code = PGM_SEGMENT_TRANSLATION;
-			return -EREMOTE;
-		}
-		gfn = gpa_to_gfn(gaddr_tmp);
-	}
-	return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, foll);
+	KVM_BUG_ON(rc, vcpu->kvm);
+	return -EINVAL;
 }
 
 static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)
@@ -5102,7 +4777,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 
 		exit_reason = kvm_s390_enter_exit_sie(vcpu->arch.sie_block,
 						      vcpu->run->s.regs.gprs,
-						      vcpu->arch.gmap->asce);
+						      vcpu->arch.gmap->asce.val);
 
 		__enable_cpu_timer_accounting(vcpu);
 		guest_timing_exit_irqoff();
@@ -5633,8 +5308,8 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, struct kvm_s390_mem_op *mop)
 {
 	void __user *uaddr = (void __user *)mop->buf;
+	void *tmpbuf __free(kvfree) = NULL;
 	enum gacc_mode acc_mode;
-	void *tmpbuf = NULL;
 	int r;
 
 	r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_INJECT_EXCEPTION |
@@ -5656,32 +5331,21 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, struct kvm_s390_mem_op *mop)
 	if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
 		r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size,
 				    acc_mode, mop->key);
-		goto out_inject;
-	}
-	if (acc_mode == GACC_FETCH) {
+	} else if (acc_mode == GACC_FETCH) {
 		r = read_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf,
 					mop->size, mop->key);
-		if (r)
-			goto out_inject;
-		if (copy_to_user(uaddr, tmpbuf, mop->size)) {
-			r = -EFAULT;
-			goto out_free;
-		}
+		if (!r && copy_to_user(uaddr, tmpbuf, mop->size))
+			return -EFAULT;
 	} else {
-		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
-			r = -EFAULT;
-			goto out_free;
-		}
+		if (copy_from_user(tmpbuf, uaddr, mop->size))
+			return -EFAULT;
 		r = write_guest_with_key(vcpu, mop->gaddr, mop->ar, tmpbuf,
 					 mop->size, mop->key);
 	}
 
-out_inject:
 	if (r > 0 && (mop->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION) != 0)
 		kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
 
-out_free:
-	vfree(tmpbuf);
 	return r;
 }
 
@@ -5871,37 +5535,39 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 #ifdef CONFIG_KVM_S390_UCONTROL
 	case KVM_S390_UCAS_MAP: {
-		struct kvm_s390_ucas_mapping ucasmap;
+		struct kvm_s390_ucas_mapping ucas;
 
-		if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
-			r = -EFAULT;
+		r = -EFAULT;
+		if (copy_from_user(&ucas, argp, sizeof(ucas)))
 			break;
-		}
 
-		if (!kvm_is_ucontrol(vcpu->kvm)) {
-			r = -EINVAL;
+		r = -EINVAL;
+		if (!kvm_is_ucontrol(vcpu->kvm))
+			break;
+		if (!IS_ALIGNED(ucas.user_addr | ucas.vcpu_addr | ucas.length, _SEGMENT_SIZE))
 			break;
-		}
 
-		r = gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr,
-				     ucasmap.vcpu_addr, ucasmap.length);
+		r = gmap_ucas_map(vcpu->arch.gmap, gpa_to_gfn(ucas.user_addr),
+				  gpa_to_gfn(ucas.vcpu_addr),
+				  ucas.length >> _SEGMENT_SHIFT);
 		break;
 	}
 	case KVM_S390_UCAS_UNMAP: {
-		struct kvm_s390_ucas_mapping ucasmap;
+		struct kvm_s390_ucas_mapping ucas;
 
-		if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
-			r = -EFAULT;
+		r = -EFAULT;
+		if (copy_from_user(&ucas, argp, sizeof(ucas)))
 			break;
-		}
 
-		if (!kvm_is_ucontrol(vcpu->kvm)) {
-			r = -EINVAL;
+		r = -EINVAL;
+		if (!kvm_is_ucontrol(vcpu->kvm))
+			break;
+		if (!IS_ALIGNED(ucas.vcpu_addr | ucas.length, _SEGMENT_SIZE))
 			break;
-		}
 
-		r = gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr,
-				       ucasmap.length);
+		gmap_ucas_unmap(vcpu->arch.gmap, gpa_to_gfn(ucas.vcpu_addr),
+				ucas.length >> _SEGMENT_SHIFT);
+		r = 0;
 		break;
 	}
 #endif
@@ -6074,34 +5740,39 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	struct kvm_s390_mmu_cache *mc = NULL;
 	int rc = 0;
 
-	if (kvm_is_ucontrol(kvm))
+	if (change == KVM_MR_FLAGS_ONLY)
 		return;
 
+	mc = kvm_s390_new_mmu_cache();
+	if (!mc) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
 	switch (change) {
 	case KVM_MR_DELETE:
-		rc = gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE,
-					old->npages * PAGE_SIZE);
+		rc = dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->npages);
 		break;
 	case KVM_MR_MOVE:
-		rc = gmap_unmap_segment(kvm->arch.gmap, old->base_gfn * PAGE_SIZE,
-					old->npages * PAGE_SIZE);
+		rc = dat_delete_slot(mc, kvm->arch.gmap->asce, old->base_gfn, old->npages);
 		if (rc)
 			break;
 		fallthrough;
 	case KVM_MR_CREATE:
-		rc = gmap_map_segment(kvm->arch.gmap, new->userspace_addr,
-				      new->base_gfn * PAGE_SIZE,
-				      new->npages * PAGE_SIZE);
+		rc = dat_create_slot(mc, kvm->arch.gmap->asce, new->base_gfn, new->npages);
 		break;
 	case KVM_MR_FLAGS_ONLY:
 		break;
 	default:
		WARN(1, "Unknown KVM MR CHANGE: %d\n", change);
 	}
+out:
 	if (rc)
 		pr_warn("failed to commit memory region\n");
+	kvm_s390_free_mmu_cache(mc);
 	return;
 }
 
@@ -6115,7 +5786,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
  */
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return false;
+	scoped_guard(read_lock, &kvm->mmu_lock)
+		return dat_test_age_gfn(kvm->arch.gmap->asce, range->start, range->end);
 }
 
 /**
@@ -6128,7 +5800,8 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
  */
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return false;
+	scoped_guard(read_lock, &kvm->mmu_lock)
+		return gmap_age_gfn(kvm->arch.gmap, range->start, range->end);
 }
 
 /**
@@ -6145,7 +5818,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
  */
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return false;
+	return gmap_unmap_gfn_range(kvm->arch.gmap, range->slot, range->start, range->end);
 }
 
 static inline unsigned long nonhyp_mask(int i)
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 495ee9caaa30..8a979b1f1a7b 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -19,6 +19,8 @@
 #include
 #include
 #include
+#include "dat.h"
+#include "gmap.h"
 
 #define KVM_S390_UCONTROL_MEMSLOT (KVM_USER_MEM_SLOTS + 0)
 
@@ -114,9 +116,7 @@ static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
 static inline int kvm_is_ucontrol(struct kvm *kvm)
 {
 #ifdef CONFIG_KVM_S390_UCONTROL
-	if (kvm->arch.gmap)
-		return 0;
-	return 1;
+	return kvm->arch.gmap->is_ucontrol;
 #else
 	return 0;
 #endif
@@ -440,14 +440,10 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu);
 /* implemented in vsie.c */
 int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu);
 void kvm_s390_vsie_kick(struct kvm_vcpu *vcpu);
-void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
-				 unsigned long end);
+void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, gpa_t end);
 void kvm_s390_vsie_init(struct kvm *kvm);
 void kvm_s390_vsie_destroy(struct kvm *kvm);
-int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level);
-
-/* implemented in gmap-vsie.c */
-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level);
+int gmap_shadow_valid(struct gmap *sg, union asce asce, int edat_level);
 
 /* implemented in sigp.c */
 int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
@@ -469,15 +465,9 @@ void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu);
 void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm);
 __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu);
 int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rc, u16 *rrc);
-int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int flags);
 int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsigned int prot,
 			     unsigned long bits);
 
-static inline int kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gpa_t gaddr, unsigned int flags)
-{
-	return __kvm_s390_handle_dat_fault(vcpu, gpa_to_gfn(gaddr), gaddr, flags);
-}
-
 bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
 
 /* implemented in diag.c */
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9a71b6e00948..4ecc20688db6 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -21,13 +21,14 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
+#include
 #include "gaccess.h"
 #include "kvm-s390.h"
 #include "trace.h"
+#include "gmap.h"
 
 static int handle_ri(struct kvm_vcpu *vcpu)
 {
@@ -222,7 +223,7 @@ int kvm_s390_skey_check_enable(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.skey_enabled)
 		return 0;
 
-	rc = s390_enable_skey();
+	rc = gmap_enable_skeys(vcpu->arch.gmap);
 	VCPU_EVENT(vcpu, 3, "enabling storage keys for guest: %d", rc);
 	if (rc)
 		return rc;
@@ -255,10 +256,9 @@ static int try_handle_skey(struct kvm_vcpu *vcpu)
 
 static int handle_iske(struct kvm_vcpu *vcpu)
 {
-	unsigned long gaddr, vmaddr;
-	unsigned char key;
+	unsigned long gaddr;
 	int reg1, reg2;
-	bool unlocked;
+	union skey key;
 	int rc;
 
 	vcpu->stat.instruction_iske++;
@@ -275,37 +275,21 @@ static int handle_iske(struct kvm_vcpu *vcpu)
 	gaddr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
 	gaddr = kvm_s390_logical_to_effective(vcpu, gaddr);
 	gaddr = kvm_s390_real_to_abs(vcpu, gaddr);
-	vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr));
-	if (kvm_is_error_hva(vmaddr))
-		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
-retry:
-	unlocked = false;
-	mmap_read_lock(current->mm);
-	rc = get_guest_storage_key(current->mm, vmaddr, &key);
-
-	if (rc) {
-		rc = fixup_user_fault(current->mm, vmaddr,
-				      FAULT_FLAG_WRITE, &unlocked);
-		if (!rc) {
-			mmap_read_unlock(current->mm);
-			goto retry;
-		}
-	}
-	mmap_read_unlock(current->mm);
-	if (rc == -EFAULT)
-		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	scoped_guard(read_lock, &vcpu->kvm->mmu_lock)
+		rc = dat_get_storage_key(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr), &key);
+	if (rc > 0)
+		return kvm_s390_inject_program_int(vcpu, rc);
 	if (rc < 0)
 		return rc;
 	vcpu->run->s.regs.gprs[reg1] &= ~0xff;
-	vcpu->run->s.regs.gprs[reg1] |= key;
+	vcpu->run->s.regs.gprs[reg1] |= key.skey;
 	return 0;
 }
 
 static int handle_rrbe(struct kvm_vcpu *vcpu)
 {
-	unsigned long vmaddr, gaddr;
+	unsigned long gaddr;
 	int reg1, reg2;
-	bool unlocked;
 	int rc;
 
 	vcpu->stat.instruction_rrbe++;
@@ -322,24 +306,10 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
 	gaddr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
 	gaddr = kvm_s390_logical_to_effective(vcpu, gaddr);
 	gaddr = kvm_s390_real_to_abs(vcpu, gaddr);
-	vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr));
-	if (kvm_is_error_hva(vmaddr))
-		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
-retry:
-	unlocked = false;
-
mmap_read_lock(current->mm); - rc =3D reset_guest_reference_bit(current->mm, vmaddr); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - if (!rc) { - mmap_read_unlock(current->mm); - goto retry; - } - } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + rc =3D dat_reset_reference_bit(vcpu->arch.gmap->asce, gpa_to_gfn(gaddr)); + if (rc > 0) + return kvm_s390_inject_program_int(vcpu, rc); if (rc < 0) return rc; kvm_s390_set_psw_cc(vcpu, rc); @@ -354,9 +324,8 @@ static int handle_sske(struct kvm_vcpu *vcpu) { unsigned char m3 =3D vcpu->arch.sie_block->ipb >> 28; unsigned long start, end; - unsigned char key, oldkey; + union skey key, oldkey; int reg1, reg2; - bool unlocked; int rc; =20 vcpu->stat.instruction_sske++; @@ -377,7 +346,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) =20 kvm_s390_get_regs_rre(vcpu, ®1, ®2); =20 - key =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & 0xfe; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); if (m3 & SSKE_MB) { @@ -389,27 +358,17 @@ static int handle_sske(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - unlocked =3D false; - - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, key, &oldkey, - m3 & SSKE_NQ, m3 & SSKE_MR, - m3 & SSKE_MC); - - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? -EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, &oldkey, + m3 & SSKE_NQ, m3 & SSKE_MR, m3 & SSKE_MC); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) + if (rc > 1) return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; start +=3D PAGE_SIZE; @@ -422,7 +381,7 @@ static int handle_sske(struct kvm_vcpu *vcpu) } else { kvm_s390_set_psw_cc(vcpu, rc); vcpu->run->s.regs.gprs[reg1] &=3D ~0xff00UL; - vcpu->run->s.regs.gprs[reg1] |=3D (u64) oldkey << 8; + vcpu->run->s.regs.gprs[reg1] |=3D (u64)oldkey.skey << 8; } } if (m3 & SSKE_MB) { @@ -1082,7 +1041,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) bool mr =3D false, mc =3D false, nq; int reg1, reg2; unsigned long start, end; - unsigned char key; + union skey key; =20 vcpu->stat.instruction_pfmf++; =20 @@ -1110,7 +1069,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 nq =3D vcpu->run->s.regs.gprs[reg1] & PFMF_NQ; - key =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; + key.skey =3D vcpu->run->s.regs.gprs[reg1] & PFMF_KEY; start =3D vcpu->run->s.regs.gprs[reg2] & PAGE_MASK; start =3D kvm_s390_logical_to_effective(vcpu, start); =20 @@ -1141,14 +1100,6 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) } =20 while (start !=3D end) { - unsigned long vmaddr; - bool unlocked =3D false; - - /* Translate guest address to host address */ - vmaddr =3D gfn_to_hva(vcpu->kvm, gpa_to_gfn(start)); - if (kvm_is_error_hva(vmaddr)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (vcpu->run->s.regs.gprs[reg1] & PFMF_CF) { if (kvm_clear_guest(vcpu->kvm, 
start, PAGE_SIZE)) return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); @@ -1159,19 +1110,17 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) =20 if (rc) return rc; - mmap_read_lock(current->mm); - rc =3D cond_set_guest_storage_key(current->mm, vmaddr, - key, NULL, nq, mr, mc); - if (rc < 0) { - rc =3D fixup_user_fault(current->mm, vmaddr, - FAULT_FLAG_WRITE, &unlocked); - rc =3D !rc ? -EAGAIN : rc; + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) { + rc =3D dat_cond_set_storage_key(vcpu->arch.mc, vcpu->arch.gmap->asce, + gpa_to_gfn(start), key, + NULL, nq, mr, mc); } - mmap_read_unlock(current->mm); - if (rc =3D=3D -EFAULT) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - if (rc =3D=3D -EAGAIN) + if (rc > 1) + return kvm_s390_inject_program_int(vcpu, rc); + if (rc =3D=3D -ENOMEM) { + kvm_s390_mmu_cache_topup(vcpu->arch.mc); continue; + } if (rc < 0) return rc; } @@ -1195,8 +1144,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu) static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc) { int r1, r2, nappended, entries; - unsigned long gfn, hva, res, pgstev, ptev; + union essa_state state; unsigned long *cbrlo; + unsigned long gfn; + bool dirtied; =20 /* * We don't need to set SD.FPF.SK to 1 here, because if we have a @@ -1205,33 +1156,12 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) =20 kvm_s390_get_regs_rre(vcpu, &r1, &r2); gfn =3D vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT; - hva =3D gfn_to_hva(vcpu->kvm, gfn); entries =3D (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3; =20 - if (kvm_is_error_hva(hva)) - return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); - - nappended =3D pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev); - if (nappended < 0) { - res =3D orc ? 0x10 : 0; - vcpu->run->s.regs.gprs[r1] =3D res; /* Exception Indication */ + nappended =3D dat_perform_essa(vcpu->arch.gmap->asce, gfn, orc, &state, &= dirtied); + vcpu->run->s.regs.gprs[r1] =3D state.val; + if (nappended < 0) return 0; - } - res =3D (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22; - /* - * Set the block-content state part of the result. 0 means resident, so - * nothing to do if the page is valid. 2 is for preserved pages - * (non-present and non-zero), and 3 for zero pages (non-present and - * zero). - */ - if (ptev & _PAGE_INVALID) { - res |=3D 2; - if (pgstev & _PGSTE_GPS_ZERO) - res |=3D 1; - } - if (pgstev & _PGSTE_GPS_NODAT) - res |=3D 0x20; - vcpu->run->s.regs.gprs[r1] =3D res; /* * It is possible that all the normal 511 slots were full, in which case * we will now write in the 512th slot, which is reserved for host use. 
@@ -1243,17 +1173,34 @@ static inline int __do_essa(struct kvm_vcpu *vcpu, = const int orc) cbrlo[entries] =3D gfn << PAGE_SHIFT; } =20 - if (orc) { - struct kvm_memory_slot *ms =3D gfn_to_memslot(vcpu->kvm, gfn); - - /* Increment only if we are really flipping the bit */ - if (ms && !test_and_set_bit(gfn - ms->base_gfn, kvm_second_dirty_bitmap(= ms))) - atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); - } + if (dirtied) + atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages); =20 return nappended; } =20 +static void _essa_clear_cbrl(struct kvm_vcpu *vcpu, unsigned long *cbrl, i= nt len) +{ + union crste *crstep; + union pgste pgste; + union pte *ptep; + int i; + + lockdep_assert_held(&vcpu->kvm->mmu_lock); + + for (i =3D 0; i < len; i++) { + if (dat_entry_walk(NULL, gpa_to_gfn(cbrl[i]), vcpu->arch.gmap->asce, + 0, TABLE_TYPE_PAGE_TABLE, &crstep, &ptep)) + continue; + if (!ptep || ptep->s.pr) + continue; + pgste =3D pgste_get_lock(ptep); + if (pgste.usage =3D=3D PGSTE_GPS_USAGE_UNUSED || pgste.zero) + gmap_helper_zap_one_page(vcpu->kvm->mm, cbrl[i]); + pgste_set_unlock(ptep, pgste); + } +} + static int handle_essa(struct kvm_vcpu *vcpu) { lockdep_assert_held(&vcpu->kvm->srcu); @@ -1289,11 +1236,7 @@ static int handle_essa(struct kvm_vcpu *vcpu) * value really needs to be written to; if the value is * already correct, we do nothing and avoid the lock. */ - if (vcpu->kvm->mm->context.uses_cmm =3D=3D 0) { - mmap_write_lock(vcpu->kvm->mm); - vcpu->kvm->mm->context.uses_cmm =3D 1; - mmap_write_unlock(vcpu->kvm->mm); - } + WRITE_ONCE(vcpu->arch.gmap->uses_cmm, 1); /* * If we are here, we are supposed to have CMMA enabled in * the SIE block. Enabling CMMA works on a per-CPU basis, @@ -1307,20 +1250,22 @@ static int handle_essa(struct kvm_vcpu *vcpu) /* Retry the ESSA instruction */ kvm_s390_retry_instr(vcpu); } else { - mmap_read_lock(vcpu->kvm->mm); - i =3D __do_essa(vcpu, orc); - mmap_read_unlock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + i =3D __do_essa(vcpu, orc); if (i < 0) return i; /* Account for the possible extra cbrl entry */ entries +=3D i; } - vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; /* reset nceo */ + /* reset nceo */ + vcpu->arch.sie_block->cbrlo &=3D PAGE_MASK; cbrlo =3D phys_to_virt(vcpu->arch.sie_block->cbrlo); - mmap_read_lock(gmap->mm); - for (i =3D 0; i < entries; ++i) - __gmap_zap(gmap, cbrlo[i]); - mmap_read_unlock(gmap->mm); + + mmap_read_lock(vcpu->kvm->mm); + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) + _essa_clear_cbrl(vcpu, cbrlo, entries); + mmap_read_unlock(vcpu->kvm->mm); + return 0; } =20 diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index 6ba5a0305e25..d8a5c7b91148 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -12,13 +12,16 @@ #include #include #include -#include #include #include #include #include #include #include "kvm-s390.h" +#include "dat.h" +#include "gaccess.h" +#include "gmap.h" +#include "faultin.h" =20 bool kvm_s390_pv_is_protected(struct kvm *kvm) { @@ -299,35 +302,6 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm= *kvm, return 0; } =20 -/** - * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest me= mory. - * @kvm: the VM whose memory is to be cleared. - * - * Destroy the first 2GB of guest memory, to avoid prefix issues after reb= oot. - * The CPUs of the protected VM need to be destroyed beforehand. 
- */ -static void kvm_s390_destroy_lower_2g(struct kvm *kvm) -{ - const unsigned long pages_2g =3D SZ_2G / PAGE_SIZE; - struct kvm_memory_slot *slot; - unsigned long len; - int srcu_idx; - - srcu_idx =3D srcu_read_lock(&kvm->srcu); - - /* Take the memslot containing guest absolute address 0 */ - slot =3D gfn_to_memslot(kvm, 0); - /* Clear all slots or parts thereof that are below 2GB */ - while (slot && slot->base_gfn < pages_2g) { - len =3D min_t(u64, slot->npages, pages_2g - slot->base_gfn) * PAGE_SIZE; - s390_uv_destroy_range(kvm->mm, slot->userspace_addr, slot->userspace_add= r + len); - /* Take the next memslot */ - slot =3D gfn_to_memslot(kvm, slot->base_gfn + slot->npages); - } - - srcu_read_unlock(&kvm->srcu, srcu_idx); -} - static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, u16 *rc, u16 *rrc) { struct uv_cb_destroy_fast uvcb =3D { @@ -342,7 +316,6 @@ static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, = u16 *rc, u16 *rrc) *rc =3D uvcb.header.rc; if (rrc) *rrc =3D uvcb.header.rrc; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM FAST: rc %x rrc %x", uvcb.header.rc, uvcb.header.rrc); WARN_ONCE(cc && uvcb.header.rc !=3D 0x104, @@ -391,7 +364,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return -EINVAL; =20 /* Guest with segment type ASCE, refuse to destroy asynchronously */ - if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) =3D=3D _ASCE_TYPE_SEGMENT) + if (kvm->arch.gmap->asce.dt =3D=3D TABLE_TYPE_SEGMENT) return -EINVAL; =20 priv =3D kzalloc(sizeof(*priv), GFP_KERNEL); @@ -404,8 +377,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) priv->stor_var =3D kvm->arch.pv.stor_var; priv->stor_base =3D kvm->arch.pv.stor_base; priv->handle =3D kvm_s390_pv_get_handle(kvm); - priv->old_gmap_table =3D (unsigned long)kvm->arch.gmap->table; - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); + priv->old_gmap_table =3D (unsigned long)dereference_asce(kvm->arch.gmap-= >asce); if (s390_replace_asce(kvm->arch.gmap)) res =3D -ENOMEM; } @@ -415,7 +387,7 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16= *rrc) return res; } =20 - kvm_s390_destroy_lower_2g(kvm); + gmap_pv_destroy_range(kvm->arch.gmap, 0, gpa_to_gfn(SZ_2G), false); kvm_s390_clear_pv_state(kvm); kvm->arch.pv.set_aside =3D priv; =20 @@ -449,7 +421,6 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16= *rrc) =20 cc =3D uv_cmd_nodata(kvm_s390_pv_get_handle(kvm), UVC_CMD_DESTROY_SEC_CONF, rc, rrc); - WRITE_ONCE(kvm->arch.gmap->guest_handle, 0); if (!cc) { atomic_dec(&kvm->mm->context.protected_count); kvm_s390_pv_dealloc_vm(kvm); @@ -532,7 +503,7 @@ int kvm_s390_pv_deinit_cleanup_all(struct kvm *kvm, u16= *rc, u16 *rrc) * cleanup has been performed. 
*/ if (need_zap && mmget_not_zero(kvm->mm)) { - s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE); + gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asce),= false); mmput(kvm->mm); } =20 @@ -570,7 +541,7 @@ int kvm_s390_pv_deinit_aside_vm(struct kvm *kvm, u16 *r= c, u16 *rrc) return -EINVAL; =20 /* When a fatal signal is received, stop immediately */ - if (s390_uv_destroy_range_interruptible(kvm->mm, 0, TASK_SIZE_MAX)) + if (gmap_pv_destroy_range(kvm->arch.gmap, 0, asce_end(kvm->arch.gmap->asc= e), true)) goto done; if (kvm_s390_pv_dispose_one_leftover(kvm, p, rc, rrc)) ret =3D -EIO; @@ -642,7 +613,7 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) /* Inputs */ uvcb.guest_stor_origin =3D 0; /* MSO is 0 for KVM */ uvcb.guest_stor_len =3D kvm->arch.pv.guest_len; - uvcb.guest_asce =3D kvm->arch.gmap->asce; + uvcb.guest_asce =3D kvm->arch.gmap->asce.val; uvcb.guest_sca =3D virt_to_phys(kvm->arch.sca); uvcb.conf_base_stor_origin =3D virt_to_phys((void *)kvm->arch.pv.stor_base); @@ -669,7 +640,6 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *= rrc) } return -EIO; } - kvm->arch.gmap->guest_handle =3D uvcb.guest_handle; return 0; } =20 @@ -704,26 +674,14 @@ static int unpack_one(struct kvm *kvm, unsigned long = addr, u64 tweak, .tweak[1] =3D offset, }; int ret =3D kvm_s390_pv_make_secure(kvm, addr, &uvcb); - unsigned long vmaddr; - bool unlocked; =20 *rc =3D uvcb.header.rc; *rrc =3D uvcb.header.rrc; =20 if (ret =3D=3D -ENXIO) { - mmap_read_lock(kvm->mm); - vmaddr =3D gfn_to_hva(kvm, gpa_to_gfn(addr)); - if (kvm_is_error_hva(vmaddr)) { - ret =3D -EFAULT; - } else { - ret =3D fixup_user_fault(kvm->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked); - if (!ret) - ret =3D __gmap_link(kvm->arch.gmap, addr, vmaddr); - } - mmap_read_unlock(kvm->mm); + ret =3D kvm_s390_faultin_gfn_simple(NULL, kvm, gpa_to_gfn(addr), true); if (!ret) return -EAGAIN; - return ret; } =20 if (ret && ret !=3D -EAGAIN) diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 347268f89f2f..775c6d3b33d7 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -15,7 +15,6 @@ #include #include =20 -#include #include #include #include @@ -23,9 +22,11 @@ #include #include "kvm-s390.h" #include "gaccess.h" +#include "gmap.h" =20 enum vsie_page_flags { VSIE_PAGE_IN_USE =3D 0, + VSIE_PAGE_RUNNING, }; =20 struct vsie_page { @@ -62,11 +63,20 @@ struct vsie_page { * looked up by other CPUs. */ unsigned long flags; /* 0x0260 */ - __u8 reserved[0x0700 - 0x0268]; /* 0x0268 */ + /* Per-gmap list of vsie_pages that use that gmap */ + struct list_head list; /* 0x0268 */ + __u8 reserved[0x0700 - 0x0278]; /* 0x0278 */ struct kvm_s390_crypto_cb crycb; /* 0x0700 */ __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */ }; =20 +static_assert(sizeof(struct vsie_page) =3D=3D PAGE_SIZE); + +static inline bool is_vsie_page_running(struct vsie_page *vsie_page) +{ + return test_bit(VSIE_PAGE_RUNNING, &vsie_page->flags); +} + /** * gmap_shadow_valid() - check if a shadow guest address space matches the * given properties and is still valid @@ -78,11 +88,11 @@ struct vsie_page { * properties, the caller can continue using it. Returns 0 otherwise; the * caller has to request a new shadow gmap in this case. 
*/ -int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level) +int gmap_shadow_valid(struct gmap *sg, union asce asce, int edat_level) { if (sg->removed) return 0; - return sg->orig_asce =3D=3D asce && sg->edat_level =3D=3D edat_level; + return sg->guest_asce.val =3D=3D asce.val && sg->edat_level =3D=3D edat_l= evel; } =20 /* trigger a validity icpt for the given scb */ @@ -612,31 +622,29 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page) return rc; } =20 -void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, - unsigned long end) +void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, gpa_t start, gpa_t end) { - struct kvm *kvm =3D gmap->private; - struct vsie_page *cur; + struct vsie_page *cur, *next; unsigned long prefix; - int i; =20 - if (!gmap_is_shadow(gmap)) - return; + KVM_BUG_ON(!gmap->is_shadow, gmap->kvm); + KVM_BUG_ON(!gmap->parent, gmap->kvm); + lockdep_assert_held(&gmap->parent->children_lock); /* * Only new shadow blocks are added to the list during runtime, * therefore we can safely reference them all the time. */ - for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { - cur =3D READ_ONCE(kvm->arch.vsie.pages[i]); - if (!cur) - continue; - if (READ_ONCE(cur->gmap) !=3D gmap) - continue; + list_for_each_entry_safe(cur, next, &gmap->scb_users, list) { prefix =3D cur->scb_s.prefix << GUEST_PREFIX_SHIFT; /* with mso/msl, the prefix lies at an offset */ prefix +=3D cur->scb_s.mso; - if (prefix <=3D end && start <=3D prefix + 2 * PAGE_SIZE - 1) + if (prefix <=3D end && start <=3D prefix + 2 * PAGE_SIZE - 1) { prefix_unmapped_sync(cur); + if (gmap->removed && !is_vsie_page_running(cur)) { + list_del(&cur->list); + cur->gmap =3D NULL; + } + } } } =20 @@ -667,10 +675,10 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct v= sie_page *vsie_page) /* with mso/msl, the prefix lies at offset *mso* */ prefix +=3D scb_s->mso; =20 - rc =3D kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL); + rc =3D gaccess_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL, true); if (!rc && (scb_s->ecb & ECB_TE)) - rc =3D kvm_s390_shadow_fault(vcpu, vsie_page->gmap, - prefix + PAGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, vsie_page->gmap, + prefix + PAGE_SIZE, NULL, true); /* * We don't have to mprotect, we will be called for all unshadows. * SIE will detect if protection applies and trigger a validity. 
@@ -953,6 +961,7 @@ static int inject_fault(struct kvm_vcpu *vcpu, __u16 co= de, __u64 vaddr, */ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) { + bool wr =3D kvm_s390_cur_gmap_fault_is_write(); int rc; =20 if ((current->thread.gmap_int_code & PGM_INT_CODE_MASK) =3D=3D PGM_PROTEC= TION) @@ -960,12 +969,11 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct= vsie_page *vsie_page) return inject_fault(vcpu, PGM_PROTECTION, current->thread.gmap_teid.addr * PAGE_SIZE, 1); =20 - rc =3D kvm_s390_shadow_fault(vcpu, vsie_page->gmap, - current->thread.gmap_teid.addr * PAGE_SIZE, NULL); + rc =3D gaccess_shadow_fault(vcpu, vsie_page->gmap, + current->thread.gmap_teid.addr * PAGE_SIZE, NULL, wr); if (rc > 0) { rc =3D inject_fault(vcpu, rc, - current->thread.gmap_teid.addr * PAGE_SIZE, - kvm_s390_cur_gmap_fault_is_write()); + current->thread.gmap_teid.addr * PAGE_SIZE, wr); if (rc >=3D 0) vsie_page->fault_addr =3D current->thread.gmap_teid.addr * PAGE_SIZE; } @@ -982,8 +990,8 @@ static void handle_last_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) { if (vsie_page->fault_addr) - kvm_s390_shadow_fault(vcpu, vsie_page->gmap, - vsie_page->fault_addr, NULL); + gaccess_shadow_fault(vcpu, vsie_page->gmap, + vsie_page->fault_addr, NULL, true); vsie_page->fault_addr =3D 0; } =20 @@ -1068,8 +1076,9 @@ static u64 vsie_get_register(struct kvm_vcpu *vcpu, s= truct vsie_page *vsie_page, static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_= page) { struct kvm_s390_sie_block *scb_s =3D &vsie_page->scb_s; - unsigned long pei_dest, pei_src, src, dest, mask, prefix; + unsigned long src, dest, mask, prefix; u64 *pei_block =3D &vsie_page->scb_o->mcic; + union mvpg_pei pei_dest, pei_src; int edat, rc_dest, rc_src; union ctlreg0 cr0; =20 @@ -1083,8 +1092,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page) src =3D vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask; src =3D _kvm_s390_real_to_abs(prefix, src) + scb_s->mso; =20 - rc_dest =3D kvm_s390_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest); - rc_src =3D kvm_s390_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src); + rc_dest =3D gaccess_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest, = true); + rc_src =3D gaccess_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src, fal= se); /* * Either everything went well, or something non-critical went wrong * e.g. because of a race. In either case, simply retry. @@ -1119,8 +1128,8 @@ static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, st= ruct vsie_page *vsie_page) rc_src =3D rc_src !=3D PGM_PAGE_TRANSLATION ? 
rc_src : 0; } if (!rc_dest && !rc_src) { - pei_block[0] =3D pei_dest; - pei_block[1] =3D pei_src; + pei_block[0] =3D pei_dest.val; + pei_block[1] =3D pei_src.val; return 1; } =20 @@ -1182,7 +1191,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct = vsie_page *vsie_page) if (!kvm_s390_vcpu_sie_inhibited(vcpu)) { local_irq_disable(); guest_timing_enter_irqoff(); - rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, vsie_page-= >gmap->asce); + rc =3D kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs, + vsie_page->gmap->asce.val); guest_timing_exit_irqoff(); local_irq_enable(); } @@ -1230,42 +1240,62 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struc= t vsie_page *vsie_page) =20 static void release_gmap_shadow(struct vsie_page *vsie_page) { - if (vsie_page->gmap) - gmap_put(vsie_page->gmap); - WRITE_ONCE(vsie_page->gmap, NULL); + struct gmap *gmap =3D vsie_page->gmap; + + KVM_BUG_ON(!gmap->parent, gmap->kvm); + lockdep_assert_held(&gmap->parent->children_lock); + + vsie_page->gmap =3D NULL; + list_del(&vsie_page->list); + + if (list_empty(&gmap->scb_users)) { + gmap_remove_child(gmap); + gmap_dispose(gmap); + } prefix_unmapped(vsie_page); } =20 static int acquire_gmap_shadow(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) { - unsigned long asce; union ctlreg0 cr0; struct gmap *gmap; + union asce asce; int edat; =20 - asce =3D vcpu->arch.sie_block->gcr[1]; + asce.val =3D vcpu->arch.sie_block->gcr[1]; cr0.val =3D vcpu->arch.sie_block->gcr[0]; edat =3D cr0.edat && test_kvm_facility(vcpu->kvm, 8); edat +=3D edat && test_kvm_facility(vcpu->kvm, 78); =20 - /* - * ASCE or EDAT could have changed since last icpt, or the gmap - * we're holding has been unshadowed. If the gmap is still valid, - * we can safely reuse it. - */ - if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat)) { - vcpu->kvm->stat.gmap_shadow_reuse++; - return 0; + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + if (vsie_page->gmap) { + /* + * ASCE or EDAT could have changed since last icpt, or the gmap + * we're holding has been unshadowed. If the gmap is still valid, + * we can safely reuse it. 
+ */ + if (gmap_shadow_valid(vsie_page->gmap, asce, edat)) { + vcpu->kvm->stat.gmap_shadow_reuse++; + return 0; + } + /* release the old shadow - if any, and mark the prefix as unmapped */ + if (vsie_page->gmap) + release_gmap_shadow(vsie_page); + } } - - /* release the old shadow - if any, and mark the prefix as unmapped */ - release_gmap_shadow(vsie_page); - gmap =3D gmap_shadow(vcpu->arch.gmap, asce, edat); + gmap =3D gmap_create_shadow(vcpu->arch.mc, vcpu->kvm->arch.gmap, asce, ed= at); if (IS_ERR(gmap)) return PTR_ERR(gmap); - vcpu->kvm->stat.gmap_shadow_create++; - WRITE_ONCE(vsie_page->gmap, gmap); + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + /* unlikely race condition, remove the previous shadow */ + if (vsie_page->gmap) + release_gmap_shadow(vsie_page); + vcpu->kvm->stat.gmap_shadow_create++; + list_add(&vsie_page->list, &gmap->scb_users); + vsie_page->gmap =3D gmap; + prefix_unmapped(vsie_page); + } return 0; } =20 @@ -1321,6 +1351,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsi= e_page *vsie_page) struct kvm_s390_sie_block *scb_s =3D &vsie_page->scb_s; int rc =3D 0; =20 + set_bit(VSIE_PAGE_RUNNING, &vsie_page->flags); while (1) { rc =3D acquire_gmap_shadow(vcpu, vsie_page); if (!rc) @@ -1353,6 +1384,11 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vs= ie_page *vsie_page) } cond_resched(); } + scoped_guard(spinlock, &vcpu->kvm->arch.gmap->children_lock) { + if (vsie_page->gmap && vsie_page->gmap->removed) + release_gmap_shadow(vsie_page); + clear_bit(VSIE_PAGE_RUNNING, &vsie_page->flags); + } =20 if (rc =3D=3D -EFAULT) { /* @@ -1448,8 +1484,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kv= m, unsigned long addr) vsie_page->scb_gpa =3D ULONG_MAX; =20 /* Double use of the same address or allocation failure. 
*/ - if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, - vsie_page)) { + if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, vsie_page)= ) { put_vsie_page(vsie_page); mutex_unlock(&kvm->arch.vsie.mutex); return NULL; @@ -1458,7 +1493,11 @@ static struct vsie_page *get_vsie_page(struct kvm *k= vm, unsigned long addr) mutex_unlock(&kvm->arch.vsie.mutex); =20 memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block)); - release_gmap_shadow(vsie_page); + if (vsie_page->gmap) { + scoped_guard(spinlock, &vsie_page->gmap->parent->children_lock) + release_gmap_shadow(vsie_page); + } + prefix_unmapped(vsie_page); vsie_page->fault_addr =3D 0; vsie_page->scb_s.ihcpu =3D 0xffffU; return vsie_page; @@ -1535,8 +1574,10 @@ void kvm_s390_vsie_destroy(struct kvm *kvm) mutex_lock(&kvm->arch.vsie.mutex); for (i =3D 0; i < kvm->arch.vsie.page_count; i++) { vsie_page =3D kvm->arch.vsie.pages[i]; + scoped_guard(spinlock, &kvm->arch.gmap->children_lock) + if (vsie_page->gmap) + release_gmap_shadow(vsie_page); kvm->arch.vsie.pages[i] =3D NULL; - release_gmap_shadow(vsie_page); /* free the radix tree entry */ if (vsie_page->scb_gpa !=3D ULONG_MAX) radix_tree_delete(&kvm->arch.vsie.addr_to_page, diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c index 1a6ba105e071..0ac2f3998b14 100644 --- a/arch/s390/lib/uaccess.c +++ b/arch/s390/lib/uaccess.c @@ -34,136 +34,19 @@ void debug_user_asce(int exit) } #endif /*CONFIG_DEBUG_ENTRY */ =20 -union oac { - unsigned int val; - struct { - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac1; - struct { - unsigned short key : 4; - unsigned short : 4; - unsigned short as : 2; - unsigned short : 4; - unsigned short k : 1; - unsigned short a : 1; - } oac2; - }; -}; - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_from_user_key(void *to, const void __user *from, unsigned long si= ze, unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac2.key =3D key, - .oac2.as =3D PSW_BITS_AS_SECONDARY, - .oac2.k =3D 1, - .oac2.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr %%r0,%[spec]\n" - "0: mvcos %[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_FROM(0b, 0b) - EX_TABLE_UA_MVCOS_FROM(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char __user *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_from_user_key(void *to, const void __user *from, - unsigned long n, unsigned long key) -{ - unsigned long res =3D n; - - might_fault(); - if (!should_fail_usercopy()) { - instrument_copy_from_user_before(to, from, n); - res =3D raw_copy_from_user_key(to, from, n, key); - instrument_copy_from_user_after(to, from, n, res); - } - if (unlikely(res)) - memset(to + (n - res), 0, res); - return res; -} -EXPORT_SYMBOL(_copy_from_user_key); - -static uaccess_kmsan_or_inline __must_check unsigned long -raw_copy_to_user_key(void __user *to, const void *from, unsigned long size= , unsigned long key) -{ - unsigned long osize; - union oac spec =3D { - .oac1.key =3D key, - .oac1.as =3D PSW_BITS_AS_SECONDARY, - .oac1.k =3D 1, - .oac1.a =3D 1, - }; - int cc; - - while (1) { - osize =3D size; - asm_inline volatile( - " lr %%r0,%[spec]\n" - "0: mvcos 
%[to],%[from],%[size]\n" - "1: nopr %%r7\n" - CC_IPM(cc) - EX_TABLE_UA_MVCOS_TO(0b, 0b) - EX_TABLE_UA_MVCOS_TO(1b, 0b) - : CC_OUT(cc, cc), [size] "+d" (size), [to] "=3DQ" (*(char __user *)to) - : [spec] "d" (spec.val), [from] "Q" (*(const char *)from) - : CC_CLOBBER_LIST("memory", "0")); - if (CC_TRANSFORM(cc) =3D=3D 0) - return osize - size; - size -=3D 4096; - to +=3D 4096; - from +=3D 4096; - } -} - -unsigned long _copy_to_user_key(void __user *to, const void *from, - unsigned long n, unsigned long key) -{ - might_fault(); - if (should_fail_usercopy()) - return n; - instrument_copy_to_user(to, from, n); - return raw_copy_to_user_key(to, from, n, key); -} -EXPORT_SYMBOL(_copy_to_user_key); - #define CMPXCHG_USER_KEY_MAX_LOOPS 128 =20 -static nokprobe_inline int __cmpxchg_user_key_small(unsigned long address,= unsigned int *uval, - unsigned int old, unsigned int new, - unsigned int mask, unsigned long key) +static nokprobe_inline int __cmpxchg_key_small(void *address, unsigned int= *uval, + unsigned int old, unsigned int new, + unsigned int mask, unsigned long key) { unsigned long count; unsigned int prev; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" " llill %[count],%[max_loops]\n" "0: l %[prev],%[address]\n" "1: nr %[prev],%[mask]\n" @@ -178,8 +61,7 @@ static nokprobe_inline int __cmpxchg_user_key_small(unsi= gned long address, unsig " nr %[tmp],%[mask]\n" " jnz 5f\n" " brct %[count],2b\n" - "5: sacf 768\n" - " spka %[default_key]\n" + "5: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 5b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 5b, %[rc], %[prev]) @@ -197,16 +79,16 @@ static nokprobe_inline int __cmpxchg_user_key_small(un= signed long address, unsig [default_key] "J" (PAGE_DEFAULT_KEY), [max_loops] "J" (CMPXCHG_USER_KEY_MAX_LOOPS) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; if (!count) rc =3D -EAGAIN; return rc; } =20 -int __kprobes __cmpxchg_user_key1(unsigned long address, unsigned char *uv= al, - unsigned char old, unsigned char new, unsigned long key) +int __kprobes __cmpxchg_key1(void *addr, unsigned char *uval, unsigned cha= r old, + unsigned char new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -215,15 +97,16 @@ int __kprobes __cmpxchg_user_key1(unsigned long addres= s, unsigned char *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D __cmpxchg_key_small((void *)address, &prev, _old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key1); +EXPORT_SYMBOL(__cmpxchg_key1); =20 -int __kprobes __cmpxchg_user_key2(unsigned long address, unsigned short *u= val, - unsigned short old, unsigned short new, unsigned long key) +int __kprobes __cmpxchg_key2(void *addr, unsigned short *uval, unsigned sh= ort old, + unsigned short new, unsigned long key) { + unsigned long address =3D (unsigned long)addr; unsigned int prev, shift, mask, _old, _new; int rc; =20 @@ -232,27 +115,23 @@ int __kprobes __cmpxchg_user_key2(unsigned long addre= ss, unsigned short *uval, _old =3D (unsigned int)old << shift; _new =3D (unsigned int)new << shift; mask =3D ~(0xffff << shift); - rc =3D __cmpxchg_user_key_small(address, &prev, _old, _new, mask, key); + rc =3D __cmpxchg_key_small((void *)address, &prev, 
_old, _new, mask, key); *uval =3D prev >> shift; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key2); +EXPORT_SYMBOL(__cmpxchg_key2); =20 -int __kprobes __cmpxchg_user_key4(unsigned long address, unsigned int *uva= l, - unsigned int old, unsigned int new, unsigned long key) +int __kprobes __cmpxchg_key4(void *address, unsigned int *uval, unsigned i= nt old, + unsigned int new, unsigned long key) { unsigned int prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cs %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -264,27 +143,22 @@ int __kprobes __cmpxchg_user_key4(unsigned long addre= ss, unsigned int *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key4); +EXPORT_SYMBOL(__cmpxchg_key4); =20 -int __kprobes __cmpxchg_user_key8(unsigned long address, unsigned long *uv= al, - unsigned long old, unsigned long new, unsigned long key) +int __kprobes __cmpxchg_key8(void *address, unsigned long *uval, unsigned = long old, + unsigned long new, unsigned long key) { unsigned long prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: csg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REG(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REG(1b, 1b, %[rc], %[prev]) @@ -296,27 +170,22 @@ int __kprobes __cmpxchg_user_key8(unsigned long addre= ss, unsigned long *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key8); +EXPORT_SYMBOL(__cmpxchg_key8); =20 -int __kprobes __cmpxchg_user_key16(unsigned long address, __uint128_t *uva= l, - __uint128_t old, __uint128_t new, unsigned long key) +int __kprobes __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_= t old, + __uint128_t new, unsigned long key) { __uint128_t prev =3D old; - bool sacf_flag; int rc =3D 0; =20 skey_regions_initialize(); - sacf_flag =3D enable_sacf_uaccess(); asm_inline volatile( "20: spka 0(%[key])\n" - " sacf 256\n" "0: cdsg %[prev],%[new],%[address]\n" - "1: sacf 768\n" - " spka %[default_key]\n" + "1: spka %[default_key]\n" "21:\n" EX_TABLE_UA_LOAD_REGPAIR(0b, 1b, %[rc], %[prev]) EX_TABLE_UA_LOAD_REGPAIR(1b, 1b, %[rc], %[prev]) @@ -328,8 +197,7 @@ int __kprobes __cmpxchg_user_key16(unsigned long addres= s, __uint128_t *uval, [key] "a" (key << 4), [default_key] "J" (PAGE_DEFAULT_KEY) : "memory", "cc"); - disable_sacf_uaccess(sacf_flag); *uval =3D prev; return rc; } -EXPORT_SYMBOL(__cmpxchg_user_key16); +EXPORT_SYMBOL(__cmpxchg_key16); diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c index dca783859a73..da81519db55a 100644 --- a/arch/s390/mm/gmap_helpers.c +++ b/arch/s390/mm/gmap_helpers.c @@ -34,28 +34,6 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, sw= p_entry_t entry) free_swap_and_cache(entry); } =20 -static inline pgste_t pgste_get_lock(pte_t *ptep) -{ - unsigned long value =3D 0; -#ifdef CONFIG_PGSTE - unsigned long *ptr =3D (unsigned long *)(ptep + PTRS_PER_PTE); - - 
do { - value =3D __atomic64_or_barrier(PGSTE_PCL_BIT, ptr); - } while (value & PGSTE_PCL_BIT); - value |=3D PGSTE_PCL_BIT; -#endif - return __pgste(value); -} - -static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - barrier(); - WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~P= GSTE_PCL_BIT); -#endif -} - /** * gmap_helper_zap_one_page() - discard a page if it was swapped. * @mm: the mm @@ -69,7 +47,6 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsig= ned long vmaddr) { struct vm_area_struct *vma; spinlock_t *ptl; - pgste_t pgste; pte_t *ptep; =20 mmap_assert_locked(mm); @@ -84,14 +61,8 @@ void gmap_helper_zap_one_page(struct mm_struct *mm, unsi= gned long vmaddr) if (unlikely(!ptep)) return; if (pte_swap(*ptep)) { - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - ptep_zap_swap_entry(mm, pte_to_swp_entry(*ptep)); pte_clear(mm, vmaddr, ptep); - - pgste_set_unlock(ptep, pgste); - preempt_enable(); } pte_unmap_unlock(ptep, ptl); } --=20 2.51.1 From nobody Tue Dec 2 01:51:36 2025 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A5D7374143; Thu, 20 Nov 2025 17:16:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763659016; cv=none; b=qlziUvmCtE8mVrNtTO0zfmn84KQUm3ZHVq7fev2HhAGsouZ5Qc91jg5FAXrHiSK2+mpctSeT6Uu/KgAP67RrAaYecEoSLHYARvOUZ/41tiEfW9LkAsAaPreiDNyFAGaPhZRE7KNeNpW+fgAYmFkDDIYh3L6jr5jmIKVUOGAXIOc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763659016; c=relaxed/simple; bh=cqqz85PtO30wZ4OcxtZhPCVGIhAeaugkcBTYPRnq6Vw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BFNqY1ekt5kFBFIbPtL5F0JlXZOvliutrU0PXYOpVKlTUckYuC9eXlo3H5ETatSyAGcm1T/FAIrR90AfXyB6ZY+2A6dHKry7+SP9+kpkLE3xD5TSiL25GlzsLkCi10zR5LtxGpi4AGQWyIr8DSIX6+YX4U6dfP9sEUxkBGHBZFc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=sR5mKdfa; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="sR5mKdfa" Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 5AKB7lHW003872; Thu, 20 Nov 2025 17:16:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=L0PVBimrfzXdbzaLh gtHLxpArTZj9wjAW2/RAbEB9Es=; b=sR5mKdfarCyL0+IsebgaTYbH6r34H8of1 8UeppnF5QhexvFhzthc5JlGBKjUAh/vCOncrrk9/8nv63EWItb68kRnoCUNNpnK6 h5F49CvGQiVpVUwLLV5l2wkw9Lgb7t0A23c9D18z5klLpP2LE+bPodAGKhsuXv4G I2qcCqHzLpNoAUh7ekCDuERTLPXfhfa9T5Tp+sOCQiRL7DLVe00g9l8P/BDU9VlH IQAsEQx1rfObs8COem47tC5wm8R3XIl0S5RJALcBT5gSnrK6KIJSfvRsJAQPe6Cr 4TRd7MD6u2HcQQrHw4Pwx7w8jX2CCLx49v/4jVzuG1d7zZ9qkDwsQ== Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com 
[169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4aejju6885-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 20 Nov 2025 17:16:47 +0000 (GMT) Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 5AKEPso1022340; Thu, 20 Nov 2025 17:16:30 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4af4un7mgs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 20 Nov 2025 17:16:29 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 5AKHGPoX26542626 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 20 Nov 2025 17:16:25 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 858762004B; Thu, 20 Nov 2025 17:16:25 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9EFA720040; Thu, 20 Nov 2025 17:16:23 +0000 (GMT) Received: from p-imbrenda.ibmuc.com (unknown [9.111.12.33]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 20 Nov 2025 17:16:23 +0000 (GMT) From: Claudio Imbrenda To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com Subject: [PATCH v4 20/23] KVM: s390: Remove gmap from s390/mm Date: Thu, 20 Nov 2025 18:15:41 +0100 Message-ID: <20251120171544.96841-21-imbrenda@linux.ibm.com> X-Mailer: git-send-email 2.51.1 In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com> References: <20251120171544.96841-1-imbrenda@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 3gpo-OM631CnNJqsm9Ds2NlMaVe6n_g3 X-Proofpoint-ORIG-GUID: 3gpo-OM631CnNJqsm9Ds2NlMaVe6n_g3 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjUxMTE1MDAzMiBTYWx0ZWRfX0XIxyGDJ43yO fwDDlyxuzPYqXsehSaMWAhHVXSVjJ/aOM9kpaTXTE3zmAEp+DhWNG0d7+UORijmTb1R+Sxb6+fq ko+lj9HIQIPOwjZfn/W2BF4MO7ecNTcDn8kw/1yP28FxPhkH0Nvwq9RpYevRaH0W89KdGmDuM/F ZaWZ0mFwnoWqtVtq03PJxql1EtfPZMWvilNbKWNuWPypKSR4SEsUz0R0xvj4c+rjFvWIA+bj2in x0RgSVU14nlHQklBsRJavv+ucXNyQRASFIImReSVGKtTBuajx03JvdQTr0YATlX0BiTarWqgP5P z1rNUZW2czXUbJvchG+6XTR1/6NhPTr/p6ztWs3gwMMbFTCmnGGZvc4enmOUj6I/IM0Vi6o4Dw2 Jle4KcvfQnjzDElDgnQRLQznqN0R7g== X-Authority-Analysis: v=2.4 cv=SvOdKfO0 c=1 sm=1 tr=0 ts=691f4cff cx=c_pps a=GFwsV6G8L6GxiO2Y/PsHdQ==:117 a=GFwsV6G8L6GxiO2Y/PsHdQ==:17 a=6UeiqGixMTsA:10 a=VkNPw1HP01LnGYTKEx00:22 a=VnNF1IyMAAAA:8 a=VwQbUJbxAAAA:8 a=20KFwNOVAAAA:8 a=cibzcKNe6wOC1K94NK0A:9 a=USuQuVp_JFL2agNk:21 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.9,FMLib:17.12.100.49 definitions=2025-11-20_06,2025-11-20_01,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 lowpriorityscore=0 spamscore=0 clxscore=1015 suspectscore=0 phishscore=0 adultscore=0 bulkscore=0 impostorscore=0 malwarescore=0 classifier=typeunknown authscore=0 authtc= authcc= 
route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2510240000 definitions=main-2511150032 Content-Type: text/plain; charset="utf-8" Remove the now unused include/asm/gmap.h and mm/gmap.c files. Signed-off-by: Claudio Imbrenda --- MAINTAINERS | 2 - arch/s390/include/asm/gmap.h | 174 --- arch/s390/include/asm/pgtable.h | 9 - arch/s390/mm/Makefile | 1 - arch/s390/mm/gmap.c | 2453 ------------------------------- arch/s390/mm/pgtable.c | 10 - 6 files changed, 2649 deletions(-) delete mode 100644 arch/s390/include/asm/gmap.h delete mode 100644 arch/s390/mm/gmap.c diff --git a/MAINTAINERS b/MAINTAINERS index e64b94e6b5a9..8df80ec8a667 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13740,14 +13740,12 @@ L: kvm@vger.kernel.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git F: Documentation/virt/kvm/s390* -F: arch/s390/include/asm/gmap.h F: arch/s390/include/asm/gmap_helpers.h F: arch/s390/include/asm/kvm* F: arch/s390/include/uapi/asm/kvm* F: arch/s390/include/uapi/asm/uvdevice.h F: arch/s390/kernel/uv.c F: arch/s390/kvm/ -F: arch/s390/mm/gmap.c F: arch/s390/mm/gmap_helpers.c F: drivers/s390/char/uvdevice.c F: tools/testing/selftests/drivers/s390x/uvdevice/ diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h deleted file mode 100644 index 66c5808fd011..000000000000 --- a/arch/s390/include/asm/gmap.h +++ /dev/null @@ -1,174 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * KVM guest address space mapping code - * - * Copyright IBM Corp. 2007, 2016 - * Author(s): Martin Schwidefsky - */ - -#ifndef _ASM_S390_GMAP_H -#define _ASM_S390_GMAP_H - -#include -#include - -/* Generic bits for GMAP notification on DAT table entry changes. */ -#define GMAP_NOTIFY_SHADOW 0x2 -#define GMAP_NOTIFY_MPROT 0x1 - -/* Status bits only for huge segment entries */ -#define _SEGMENT_ENTRY_GMAP_IN 0x0800 /* invalidation notify bit */ -#define _SEGMENT_ENTRY_GMAP_UC 0x0002 /* dirty (migration) */ - -/** - * struct gmap_struct - guest address space - * @list: list head for the mm->context gmap list - * @mm: pointer to the parent mm_struct - * @guest_to_host: radix tree with guest to host address translation - * @host_to_guest: radix tree with pointer to segment table entries - * @guest_table_lock: spinlock to protect all entries in the guest page ta= ble - * @ref_count: reference counter for the gmap structure - * @table: pointer to the page directory - * @asce: address space control element for gmap page table - * @pfault_enabled: defines if pfaults are applicable for the guest - * @guest_handle: protected virtual machine handle for the ultravisor - * @host_to_rmap: radix tree with gmap_rmap lists - * @children: list of shadow gmap structures - * @shadow_lock: spinlock to protect the shadow gmap list - * @parent: pointer to the parent gmap for shadow guest address spaces - * @orig_asce: ASCE for which the shadow page table has been created - * @edat_level: edat level to be used for the shadow translation - * @removed: flag to indicate if a shadow guest address space has been rem= oved - * @initialized: flag to indicate if a shadow guest address space can be u= sed - */ -struct gmap { - struct list_head list; - struct mm_struct *mm; - struct radix_tree_root guest_to_host; - struct radix_tree_root host_to_guest; - spinlock_t guest_table_lock; - refcount_t ref_count; - unsigned long *table; - unsigned long asce; - unsigned long asce_end; - void *private; - bool pfault_enabled; - /* only set for protected virtual machines */ - unsigned long guest_handle; 
- /* Additional data for shadow guest address spaces */ - struct radix_tree_root host_to_rmap; - struct list_head children; - spinlock_t shadow_lock; - struct gmap *parent; - unsigned long orig_asce; - int edat_level; - bool removed; - bool initialized; -}; - -/** - * struct gmap_rmap - reverse mapping for shadow page table entries - * @next: pointer to next rmap in the list - * @raddr: virtual rmap address in the shadow guest address space - */ -struct gmap_rmap { - struct gmap_rmap *next; - unsigned long raddr; -}; - -#define gmap_for_each_rmap(pos, head) \ - for (pos =3D (head); pos; pos =3D pos->next) - -#define gmap_for_each_rmap_safe(pos, n, head) \ - for (pos =3D (head); n =3D pos ? pos->next : NULL, pos; pos =3D n) - -/** - * struct gmap_notifier - notify function block for page invalidation - * @notifier_call: address of callback function - */ -struct gmap_notifier { - struct list_head list; - struct rcu_head rcu; - void (*notifier_call)(struct gmap *gmap, unsigned long start, - unsigned long end); -}; - -static inline int gmap_is_shadow(struct gmap *gmap) -{ - return !!gmap->parent; -} - -struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit); -void gmap_remove(struct gmap *gmap); -struct gmap *gmap_get(struct gmap *gmap); -void gmap_put(struct gmap *gmap); -void gmap_free(struct gmap *gmap); -struct gmap *gmap_alloc(unsigned long limit); - -int gmap_map_segment(struct gmap *gmap, unsigned long from, - unsigned long to, unsigned long len); -int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long = len); -unsigned long __gmap_translate(struct gmap *, unsigned long gaddr); -int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmad= dr); -void __gmap_zap(struct gmap *, unsigned long gaddr); -void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long v= maddr); - -int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long = *val); - -void gmap_unshadow(struct gmap *sg); -int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2= t, - int fake); -int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3= t, - int fake); -int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sg= t, - int fake); -int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pg= t, - int fake); -int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte); - -void gmap_register_pte_notifier(struct gmap_notifier *); -void gmap_unregister_pte_notifier(struct gmap_notifier *); - -int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, uns= igned long bits); - -void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap= [4], - unsigned long gaddr, unsigned long vmaddr); -int s390_replace_asce(struct gmap *gmap); -void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns); -int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start, - unsigned long end, bool interruptible); -unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int= level); - -/** - * s390_uv_destroy_range - Destroy a range of pages in the given mm. - * @mm: the mm on which to operate on - * @start: the start of the range - * @end: the end of the range - * - * This function will call cond_sched, so it should not generate stalls, b= ut - * it will otherwise only return when it completed. 
- */ -static inline void s390_uv_destroy_range(struct mm_struct *mm, unsigned lo= ng start, - unsigned long end) -{ - (void)__s390_uv_destroy_range(mm, start, end, false); -} - -/** - * s390_uv_destroy_range_interruptible - Destroy a range of pages in the - * given mm, but stop when a fatal signal is received. - * @mm: the mm on which to operate on - * @start: the start of the range - * @end: the end of the range - * - * This function will call cond_sched, so it should not generate stalls. If - * a fatal signal is received, it will return with -EINTR immediately, - * without finishing destroying the whole range. Upon successful - * completion, 0 is returned. - */ -static inline int s390_uv_destroy_range_interruptible(struct mm_struct *mm= , unsigned long start, - unsigned long end) -{ - return __s390_uv_destroy_range(mm, start, end, true); -} -#endif /* _ASM_S390_GMAP_H */ diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 7ccad785e4fe..b6ec18999c62 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1380,8 +1380,6 @@ static inline int ptep_set_access_flags(struct vm_are= a_struct *vma, void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t entry); void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep= ); -void ptep_notify(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, unsigned long bits); int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr, pte_t *ptep, int prot, unsigned long bit); void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, @@ -1407,10 +1405,6 @@ int set_pgste_bits(struct mm_struct *mm, unsigned lo= ng addr, int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgst= ep); int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, unsigned long *oldpte, unsigned long *oldpgste); -void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr); -void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr); -void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr); -void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr); =20 #define pgprot_writecombine pgprot_writecombine pgprot_t pgprot_writecombine(pgprot_t prot); @@ -2035,9 +2029,6 @@ extern int __vmem_map_4k_page(unsigned long addr, uns= igned long phys, pgprot_t p extern int vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot= _t prot); extern void vmem_unmap_4k_page(unsigned long addr); extern pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc); -extern int s390_enable_sie(void); -extern int s390_enable_skey(void); -extern void s390_reset_cmma(struct mm_struct *mm); =20 /* s390 has a private copy of get unmapped area to deal with cache synonym= s */ #define HAVE_ARCH_UNMAPPED_AREA diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile index bd0401cc7ca5..193899c39ca7 100644 --- a/arch/s390/mm/Makefile +++ b/arch/s390/mm/Makefile @@ -10,7 +10,6 @@ obj-$(CONFIG_CMM) +=3D cmm.o obj-$(CONFIG_DEBUG_VIRTUAL) +=3D physaddr.o obj-$(CONFIG_HUGETLB_PAGE) +=3D hugetlbpage.o obj-$(CONFIG_PTDUMP) +=3D dump_pagetables.o -obj-$(CONFIG_PGSTE) +=3D gmap.o obj-$(CONFIG_PFAULT) +=3D pfault.o =20 obj-$(subst m,y,$(CONFIG_KVM)) +=3D gmap_helpers.o diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c deleted file mode 100644 index 8ff6bba107e8..000000000000 --- a/arch/s390/mm/gmap.c +++ /dev/null @@ -1,2453 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * KVM guest address space mapping code - * - * 
Copyright IBM Corp. 2007, 2020 - * Author(s): Martin Schwidefsky - * David Hildenbrand - * Janosch Frank - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * The address is saved in a radix tree directly; NULL would be ambiguous, - * since 0 is a valid address, and NULL is returned when nothing was found. - * The lower bits are ignored by all users of the macro, so it can be used - * to distinguish a valid address 0 from a NULL. - */ -#define VALID_GADDR_FLAG 1 -#define IS_GADDR_VALID(gaddr) ((gaddr) & VALID_GADDR_FLAG) -#define MAKE_VALID_GADDR(gaddr) (((gaddr) & HPAGE_MASK) | VALID_GADDR_FLAG) - -#define GMAP_SHADOW_FAKE_TABLE 1ULL - -static struct page *gmap_alloc_crst(void) -{ - struct page *page; - - page =3D alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); - if (!page) - return NULL; - __arch_set_page_dat(page_to_virt(page), 1UL << CRST_ALLOC_ORDER); - return page; -} - -/** - * gmap_alloc - allocate and initialize a guest address space - * @limit: maximum address of the gmap address space - * - * Returns a guest address space structure. - */ -struct gmap *gmap_alloc(unsigned long limit) -{ - struct gmap *gmap; - struct page *page; - unsigned long *table; - unsigned long etype, atype; - - if (limit < _REGION3_SIZE) { - limit =3D _REGION3_SIZE - 1; - atype =3D _ASCE_TYPE_SEGMENT; - etype =3D _SEGMENT_ENTRY_EMPTY; - } else if (limit < _REGION2_SIZE) { - limit =3D _REGION2_SIZE - 1; - atype =3D _ASCE_TYPE_REGION3; - etype =3D _REGION3_ENTRY_EMPTY; - } else if (limit < _REGION1_SIZE) { - limit =3D _REGION1_SIZE - 1; - atype =3D _ASCE_TYPE_REGION2; - etype =3D _REGION2_ENTRY_EMPTY; - } else { - limit =3D -1UL; - atype =3D _ASCE_TYPE_REGION1; - etype =3D _REGION1_ENTRY_EMPTY; - } - gmap =3D kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT); - if (!gmap) - goto out; - INIT_LIST_HEAD(&gmap->children); - INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT); - INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT); - INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT); - spin_lock_init(&gmap->guest_table_lock); - spin_lock_init(&gmap->shadow_lock); - refcount_set(&gmap->ref_count, 1); - page =3D gmap_alloc_crst(); - if (!page) - goto out_free; - table =3D page_to_virt(page); - crst_table_init(table, etype); - gmap->table =3D table; - gmap->asce =3D atype | _ASCE_TABLE_LENGTH | - _ASCE_USER_BITS | __pa(table); - gmap->asce_end =3D limit; - return gmap; - -out_free: - kfree(gmap); -out: - return NULL; -} -EXPORT_SYMBOL_GPL(gmap_alloc); - -/** - * gmap_create - create a guest address space - * @mm: pointer to the parent mm_struct - * @limit: maximum size of the gmap address space - * - * Returns a guest address space structure. 
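 *
 * For reference: the limit checks in gmap_alloc() above pick the smallest
 * table type that covers the requested address space. The spans are fixed
 * by the architecture (shift values from arch/s390/include/asm/pgtable.h):
 *
 *	limit < 2^31 (2 GB)	-> segment table,  _ASCE_TYPE_SEGMENT
 *	limit < 2^42 (4 TB)	-> region-3 table, _ASCE_TYPE_REGION3
 *	limit < 2^53 (8 PB)	-> region-2 table, _ASCE_TYPE_REGION2
 *	otherwise		-> region-1 table, _ASCE_TYPE_REGION1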
- */ -struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit) -{ - struct gmap *gmap; - unsigned long gmap_asce; - - gmap =3D gmap_alloc(limit); - if (!gmap) - return NULL; - gmap->mm =3D mm; - spin_lock(&mm->context.lock); - list_add_rcu(&gmap->list, &mm->context.gmap_list); - if (list_is_singular(&mm->context.gmap_list)) - gmap_asce =3D gmap->asce; - else - gmap_asce =3D -1UL; - WRITE_ONCE(mm->context.gmap_asce, gmap_asce); - spin_unlock(&mm->context.lock); - return gmap; -} -EXPORT_SYMBOL_GPL(gmap_create); - -static void gmap_flush_tlb(struct gmap *gmap) -{ - if (cpu_has_idte()) - __tlb_flush_idte(gmap->asce); - else - __tlb_flush_global(); -} - -static void gmap_radix_tree_free(struct radix_tree_root *root) -{ - struct radix_tree_iter iter; - unsigned long indices[16]; - unsigned long index; - void __rcu **slot; - int i, nr; - - /* A radix tree is freed by deleting all of its entries */ - index =3D 0; - do { - nr =3D 0; - radix_tree_for_each_slot(slot, root, &iter, index) { - indices[nr] =3D iter.index; - if (++nr =3D=3D 16) - break; - } - for (i =3D 0; i < nr; i++) { - index =3D indices[i]; - radix_tree_delete(root, index); - } - } while (nr > 0); -} - -static void gmap_rmap_radix_tree_free(struct radix_tree_root *root) -{ - struct gmap_rmap *rmap, *rnext, *head; - struct radix_tree_iter iter; - unsigned long indices[16]; - unsigned long index; - void __rcu **slot; - int i, nr; - - /* A radix tree is freed by deleting all of its entries */ - index =3D 0; - do { - nr =3D 0; - radix_tree_for_each_slot(slot, root, &iter, index) { - indices[nr] =3D iter.index; - if (++nr =3D=3D 16) - break; - } - for (i =3D 0; i < nr; i++) { - index =3D indices[i]; - head =3D radix_tree_delete(root, index); - gmap_for_each_rmap_safe(rmap, rnext, head) - kfree(rmap); - } - } while (nr > 0); -} - -static void gmap_free_crst(unsigned long *table, bool free_ptes) -{ - bool is_segment =3D (table[0] & _SEGMENT_ENTRY_TYPE_MASK) =3D=3D 0; - int i; - - if (is_segment) { - if (!free_ptes) - goto out; - for (i =3D 0; i < _CRST_ENTRIES; i++) - if (!(table[i] & _SEGMENT_ENTRY_INVALID)) - page_table_free_pgste(page_ptdesc(phys_to_page(table[i]))); - } else { - for (i =3D 0; i < _CRST_ENTRIES; i++) - if (!(table[i] & _REGION_ENTRY_INVALID)) - gmap_free_crst(__va(table[i] & PAGE_MASK), free_ptes); - } - -out: - free_pages((unsigned long)table, CRST_ALLOC_ORDER); -} - -/** - * gmap_free - free a guest address space - * @gmap: pointer to the guest address space structure - * - * No locks required. There are no references to this gmap anymore. - */ -void gmap_free(struct gmap *gmap) -{ - /* Flush tlb of all gmaps (if not already done for shadows) */ - if (!(gmap_is_shadow(gmap) && gmap->removed)) - gmap_flush_tlb(gmap); - /* Free all segment & region tables. 
*/
-	gmap_free_crst(gmap->table, gmap_is_shadow(gmap));
-
-	gmap_radix_tree_free(&gmap->guest_to_host);
-	gmap_radix_tree_free(&gmap->host_to_guest);
-
-	/* Free additional data for a shadow gmap */
-	if (gmap_is_shadow(gmap)) {
-		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
-		/* Release reference to the parent */
-		gmap_put(gmap->parent);
-	}
-
-	kfree(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_free);
-
-/**
- * gmap_get - increase reference counter for guest address space
- * @gmap: pointer to the guest address space structure
- *
- * Returns the gmap pointer
- */
-struct gmap *gmap_get(struct gmap *gmap)
-{
-	refcount_inc(&gmap->ref_count);
-	return gmap;
-}
-EXPORT_SYMBOL_GPL(gmap_get);
-
-/**
- * gmap_put - decrease reference counter for guest address space
- * @gmap: pointer to the guest address space structure
- *
- * If the reference counter reaches zero the guest address space is freed.
- */
-void gmap_put(struct gmap *gmap)
-{
-	if (refcount_dec_and_test(&gmap->ref_count))
-		gmap_free(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_put);
-
-/**
- * gmap_remove - remove a guest address space but do not free it yet
- * @gmap: pointer to the guest address space structure
- */
-void gmap_remove(struct gmap *gmap)
-{
-	struct gmap *sg, *next;
-	unsigned long gmap_asce;
-
-	/* Remove all shadow gmaps linked to this gmap */
-	if (!list_empty(&gmap->children)) {
-		spin_lock(&gmap->shadow_lock);
-		list_for_each_entry_safe(sg, next, &gmap->children, list) {
-			list_del(&sg->list);
-			gmap_put(sg);
-		}
-		spin_unlock(&gmap->shadow_lock);
-	}
-	/* Remove gmap from the per-mm list */
-	spin_lock(&gmap->mm->context.lock);
-	list_del_rcu(&gmap->list);
-	if (list_empty(&gmap->mm->context.gmap_list))
-		gmap_asce = 0;
-	else if (list_is_singular(&gmap->mm->context.gmap_list))
-		gmap_asce = list_first_entry(&gmap->mm->context.gmap_list,
-					     struct gmap, list)->asce;
-	else
-		gmap_asce = -1UL;
-	WRITE_ONCE(gmap->mm->context.gmap_asce, gmap_asce);
-	spin_unlock(&gmap->mm->context.lock);
-	synchronize_rcu();
-	/* Put reference */
-	gmap_put(gmap);
-}
-EXPORT_SYMBOL_GPL(gmap_remove);
-
-/*
- * gmap_alloc_table is assumed to be called with mmap_lock held
- */
-static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
-			    unsigned long init, unsigned long gaddr)
-{
-	struct page *page;
-	unsigned long *new;
-
-	/* since we don't free the gmap table until gmap_free we can unlock */
-	page = gmap_alloc_crst();
-	if (!page)
-		return -ENOMEM;
-	new = page_to_virt(page);
-	crst_table_init(new, init);
-	spin_lock(&gmap->guest_table_lock);
-	if (*table & _REGION_ENTRY_INVALID) {
-		*table = __pa(new) | _REGION_ENTRY_LENGTH |
-			 (*table & _REGION_ENTRY_TYPE_MASK);
-		page = NULL;
-	}
-	spin_unlock(&gmap->guest_table_lock);
-	if (page)
-		__free_pages(page, CRST_ALLOC_ORDER);
-	return 0;
-}
-
-static unsigned long host_to_guest_lookup(struct gmap *gmap, unsigned long vmaddr)
-{
-	return (unsigned long)radix_tree_lookup(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);
-}
-
-static unsigned long host_to_guest_delete(struct gmap *gmap, unsigned long vmaddr)
-{
-	return (unsigned long)radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);
-}
-
-static pmd_t *host_to_guest_pmd_delete(struct gmap *gmap, unsigned long vmaddr,
-				       unsigned long *gaddr)
-{
-	*gaddr = host_to_guest_delete(gmap, vmaddr);
-	if (IS_GADDR_VALID(*gaddr))
-		return (pmd_t *)gmap_table_walk(gmap, *gaddr, 1);
-	return NULL;
-}
-
-/**
- * __gmap_unlink_by_vmaddr - unlink a single segment via a host address
- * @gmap: pointer to the guest address
space structure - * @vmaddr: address in the host process address space - * - * Returns 1 if a TLB flush is required - */ -static int __gmap_unlink_by_vmaddr(struct gmap *gmap, unsigned long vmaddr) -{ - unsigned long gaddr; - int flush =3D 0; - pmd_t *pmdp; - - BUG_ON(gmap_is_shadow(gmap)); - spin_lock(&gmap->guest_table_lock); - - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - flush =3D (pmd_val(*pmdp) !=3D _SEGMENT_ENTRY_EMPTY); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - - spin_unlock(&gmap->guest_table_lock); - return flush; -} - -/** - * __gmap_unmap_by_gaddr - unmap a single segment via a guest address - * @gmap: pointer to the guest address space structure - * @gaddr: address in the guest address space - * - * Returns 1 if a TLB flush is required - */ -static int __gmap_unmap_by_gaddr(struct gmap *gmap, unsigned long gaddr) -{ - unsigned long vmaddr; - - vmaddr =3D (unsigned long) radix_tree_delete(&gmap->guest_to_host, - gaddr >> PMD_SHIFT); - return vmaddr ? __gmap_unlink_by_vmaddr(gmap, vmaddr) : 0; -} - -/** - * gmap_unmap_segment - unmap segment from the guest address space - * @gmap: pointer to the guest address space structure - * @to: address in the guest address space - * @len: length of the memory area to unmap - * - * Returns 0 if the unmap succeeded, -EINVAL if not. - */ -int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long = len) -{ - unsigned long off; - int flush; - - BUG_ON(gmap_is_shadow(gmap)); - if ((to | len) & (PMD_SIZE - 1)) - return -EINVAL; - if (len =3D=3D 0 || to + len < to) - return -EINVAL; - - flush =3D 0; - mmap_write_lock(gmap->mm); - for (off =3D 0; off < len; off +=3D PMD_SIZE) - flush |=3D __gmap_unmap_by_gaddr(gmap, to + off); - mmap_write_unlock(gmap->mm); - if (flush) - gmap_flush_tlb(gmap); - return 0; -} -EXPORT_SYMBOL_GPL(gmap_unmap_segment); - -/** - * gmap_map_segment - map a segment to the guest address space - * @gmap: pointer to the guest address space structure - * @from: source address in the parent address space - * @to: target address in the guest address space - * @len: length of the memory area to map - * - * Returns 0 if the mmap succeeded, -EINVAL or -ENOMEM if not. - */ -int gmap_map_segment(struct gmap *gmap, unsigned long from, - unsigned long to, unsigned long len) -{ - unsigned long off; - int flush; - - BUG_ON(gmap_is_shadow(gmap)); - if ((from | to | len) & (PMD_SIZE - 1)) - return -EINVAL; - if (len =3D=3D 0 || from + len < from || to + len < to || - from + len - 1 > TASK_SIZE_MAX || to + len - 1 > gmap->asce_end) - return -EINVAL; - - flush =3D 0; - mmap_write_lock(gmap->mm); - for (off =3D 0; off < len; off +=3D PMD_SIZE) { - /* Remove old translation */ - flush |=3D __gmap_unmap_by_gaddr(gmap, to + off); - /* Store new translation */ - if (radix_tree_insert(&gmap->guest_to_host, - (to + off) >> PMD_SHIFT, - (void *) from + off)) - break; - } - mmap_write_unlock(gmap->mm); - if (flush) - gmap_flush_tlb(gmap); - if (off >=3D len) - return 0; - gmap_unmap_segment(gmap, to, len); - return -ENOMEM; -} -EXPORT_SYMBOL_GPL(gmap_map_segment); - -/** - * __gmap_translate - translate a guest address to a user space address - * @gmap: pointer to guest mapping meta data structure - * @gaddr: guest address - * - * Returns user space address which corresponds to the guest address or - * -EFAULT if no such mapping exists. - * This function does not establish potentially missing page table entries. 
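 *
 * A worked example (addresses made up): if the radix tree maps the guest
 * segment index (gaddr >> PMD_SHIFT) to vmaddr 0x3ff80000000, then for
 * gaddr 0x123456 the result is
 *
 *	vmaddr | (gaddr & ~PMD_MASK) == 0x3ff80000000 | 0x23456
 *				     == 0x3ff80023456
 *
 * i.e. the host segment base plus the offset within the 1 MB segment.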
- * The mmap_lock of the mm that belongs to the address space must be held - * when this function gets called. - * - * Note: Can also be called for shadow gmaps. - */ -unsigned long __gmap_translate(struct gmap *gmap, unsigned long gaddr) -{ - unsigned long vmaddr; - - vmaddr =3D (unsigned long) - radix_tree_lookup(&gmap->guest_to_host, gaddr >> PMD_SHIFT); - /* Note: guest_to_host is empty for a shadow gmap */ - return vmaddr ? (vmaddr | (gaddr & ~PMD_MASK)) : -EFAULT; -} -EXPORT_SYMBOL_GPL(__gmap_translate); - -/** - * gmap_unlink - disconnect a page table from the gmap shadow tables - * @mm: pointer to the parent mm_struct - * @table: pointer to the host page table - * @vmaddr: vm address associated with the host page table - */ -void gmap_unlink(struct mm_struct *mm, unsigned long *table, - unsigned long vmaddr) -{ - struct gmap *gmap; - int flush; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - flush =3D __gmap_unlink_by_vmaddr(gmap, vmaddr); - if (flush) - gmap_flush_tlb(gmap); - } - rcu_read_unlock(); -} - -static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new, - unsigned long gaddr); - -/** - * __gmap_link - set up shadow page tables to connect a host to a guest ad= dress - * @gmap: pointer to guest mapping meta data structure - * @gaddr: guest address - * @vmaddr: vm address - * - * Returns 0 on success, -ENOMEM for out of memory conditions, and -EFAULT - * if the vm address is already mapped to a different guest segment. - * The mmap_lock of the mm that belongs to the address space must be held - * when this function gets called. - */ -int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmad= dr) -{ - struct mm_struct *mm; - unsigned long *table; - spinlock_t *ptl; - pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - u64 unprot; - int rc; - - BUG_ON(gmap_is_shadow(gmap)); - /* Create higher level tables in the gmap page table */ - table =3D gmap->table; - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION1) { - table +=3D (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _REGION2_ENTRY_EMPTY, - gaddr & _REGION1_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION2) { - table +=3D (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _REGION3_ENTRY_EMPTY, - gaddr & _REGION2_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - if ((gmap->asce & _ASCE_TYPE_MASK) >=3D _ASCE_TYPE_REGION3) { - table +=3D (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT; - if ((*table & _REGION_ENTRY_INVALID) && - gmap_alloc_table(gmap, table, _SEGMENT_ENTRY_EMPTY, - gaddr & _REGION3_MASK)) - return -ENOMEM; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - } - table +=3D (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT; - /* Walk the parent mm page table */ - mm =3D gmap->mm; - pgd =3D pgd_offset(mm, vmaddr); - VM_BUG_ON(pgd_none(*pgd)); - p4d =3D p4d_offset(pgd, vmaddr); - VM_BUG_ON(p4d_none(*p4d)); - pud =3D pud_offset(p4d, vmaddr); - VM_BUG_ON(pud_none(*pud)); - /* large puds cannot yet be handled */ - if (pud_leaf(*pud)) - return -EFAULT; - pmd =3D pmd_offset(pud, vmaddr); - VM_BUG_ON(pmd_none(*pmd)); - /* Are we allowed to use huge pages? */ - if (pmd_leaf(*pmd) && !gmap->mm->context.allow_gmap_hpage_1m) - return -EFAULT; - /* Link gmap segment table entry location to page table. 
*/
-	rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
-	if (rc)
-		return rc;
-	ptl = pmd_lock(mm, pmd);
-	spin_lock(&gmap->guest_table_lock);
-	if (*table == _SEGMENT_ENTRY_EMPTY) {
-		rc = radix_tree_insert(&gmap->host_to_guest,
-				       vmaddr >> PMD_SHIFT,
-				       (void *)MAKE_VALID_GADDR(gaddr));
-		if (!rc) {
-			if (pmd_leaf(*pmd)) {
-				*table = (pmd_val(*pmd) &
-					  _SEGMENT_ENTRY_HARDWARE_BITS_LARGE)
-					 | _SEGMENT_ENTRY_GMAP_UC
-					 | _SEGMENT_ENTRY;
-			} else
-				*table = pmd_val(*pmd) &
-					 _SEGMENT_ENTRY_HARDWARE_BITS;
-		}
-	} else if (*table & _SEGMENT_ENTRY_PROTECT &&
-		   !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
-		unprot = (u64)*table;
-		unprot &= ~_SEGMENT_ENTRY_PROTECT;
-		unprot |= _SEGMENT_ENTRY_GMAP_UC;
-		gmap_pmdp_xchg(gmap, (pmd_t *)table, __pmd(unprot), gaddr);
-	}
-	spin_unlock(&gmap->guest_table_lock);
-	spin_unlock(ptl);
-	radix_tree_preload_end();
-	return rc;
-}
-EXPORT_SYMBOL(__gmap_link);
-
-/*
- * this function is assumed to be called with mmap_lock held
- */
-void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
-{
-	unsigned long vmaddr;
-
-	mmap_assert_locked(gmap->mm);
-
-	/* Find the vm address for the guest address */
-	vmaddr = (unsigned long) radix_tree_lookup(&gmap->guest_to_host,
-						   gaddr >> PMD_SHIFT);
-	if (vmaddr) {
-		vmaddr |= gaddr & ~PMD_MASK;
-		gmap_helper_zap_one_page(gmap->mm, vmaddr);
-	}
-}
-EXPORT_SYMBOL_GPL(__gmap_zap);
-
-static LIST_HEAD(gmap_notifier_list);
-static DEFINE_SPINLOCK(gmap_notifier_lock);
-
-/**
- * gmap_register_pte_notifier - register a pte invalidation callback
- * @nb: pointer to the gmap notifier block
- */
-void gmap_register_pte_notifier(struct gmap_notifier *nb)
-{
-	spin_lock(&gmap_notifier_lock);
-	list_add_rcu(&nb->list, &gmap_notifier_list);
-	spin_unlock(&gmap_notifier_lock);
-}
-EXPORT_SYMBOL_GPL(gmap_register_pte_notifier);
-
-/**
- * gmap_unregister_pte_notifier - remove a pte invalidation callback
- * @nb: pointer to the gmap notifier block
- */
-void gmap_unregister_pte_notifier(struct gmap_notifier *nb)
-{
-	spin_lock(&gmap_notifier_lock);
-	list_del_rcu(&nb->list);
-	spin_unlock(&gmap_notifier_lock);
-	synchronize_rcu();
-}
-EXPORT_SYMBOL_GPL(gmap_unregister_pte_notifier);
-
-/**
- * gmap_call_notifier - call all registered invalidation callbacks
- * @gmap: pointer to guest mapping meta data structure
- * @start: start virtual address in the guest address space
- * @end: end virtual address in the guest address space
- */
-static void gmap_call_notifier(struct gmap *gmap, unsigned long start,
-			       unsigned long end)
-{
-	struct gmap_notifier *nb;
-
-	list_for_each_entry(nb, &gmap_notifier_list, list)
-		nb->notifier_call(gmap, start, end);
-}
-
-/**
- * gmap_table_walk - walk the gmap page tables
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @level: page table level to stop at
- *
- * Returns a table entry pointer for the given guest address and @level
- * @level=0 : returns a pointer to a page table entry (or NULL)
- * @level=1 : returns a pointer to a segment table entry (or NULL)
- * @level=2 : returns a pointer to a region-3 table entry (or NULL)
- * @level=3 : returns a pointer to a region-2 table entry (or NULL)
- * @level=4 : returns a pointer to a region-1 table entry (or NULL)
- *
- * Returns NULL if the gmap page tables could not be walked to the
- * requested level.
- *
- * Note: Can also be called for shadow gmaps.
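 *
 * A typical caller stops at the segment level and then inspects the entry
 * itself; a minimal sketch (the NULL return is illustrative):
 *
 *	unsigned long *sege;
 *
 *	sege = gmap_table_walk(gmap, gaddr, 1);
 *	if (!sege || (*sege & _SEGMENT_ENTRY_INVALID))
 *		return NULL;
 *
 * which is exactly what gmap_pte_op_walk() below does before mapping and
 * locking the pte.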
- */ -unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int= level) -{ - const int asce_type =3D gmap->asce & _ASCE_TYPE_MASK; - unsigned long *table =3D gmap->table; - - if (gmap_is_shadow(gmap) && gmap->removed) - return NULL; - - if (WARN_ON_ONCE(level > (asce_type >> 2) + 1)) - return NULL; - - if (asce_type !=3D _ASCE_TYPE_REGION1 && - gaddr & (-1UL << (31 + (asce_type >> 2) * 11))) - return NULL; - - switch (asce_type) { - case _ASCE_TYPE_REGION1: - table +=3D (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT; - if (level =3D=3D 4) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_REGION2: - table +=3D (gaddr & _REGION2_INDEX) >> _REGION2_SHIFT; - if (level =3D=3D 3) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_REGION3: - table +=3D (gaddr & _REGION3_INDEX) >> _REGION3_SHIFT; - if (level =3D=3D 2) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _REGION_ENTRY_ORIGIN); - fallthrough; - case _ASCE_TYPE_SEGMENT: - table +=3D (gaddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT; - if (level =3D=3D 1) - break; - if (*table & _REGION_ENTRY_INVALID) - return NULL; - table =3D __va(*table & _SEGMENT_ENTRY_ORIGIN); - table +=3D (gaddr & _PAGE_INDEX) >> PAGE_SHIFT; - } - return table; -} -EXPORT_SYMBOL(gmap_table_walk); - -/** - * gmap_pte_op_walk - walk the gmap page table, get the page table lock - * and return the pte pointer - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @ptl: pointer to the spinlock pointer - * - * Returns a pointer to the locked pte for a guest address, or NULL - */ -static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr, - spinlock_t **ptl) -{ - unsigned long *table; - - BUG_ON(gmap_is_shadow(gmap)); - /* Walk the gmap page table, lock and get pte pointer */ - table =3D gmap_table_walk(gmap, gaddr, 1); /* get segment pointer */ - if (!table || *table & _SEGMENT_ENTRY_INVALID) - return NULL; - return pte_alloc_map_lock(gmap->mm, (pmd_t *) table, gaddr, ptl); -} - -/** - * gmap_pte_op_fixup - force a page in and connect the gmap page table - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @vmaddr: address in the host process address space - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * - * Returns 0 if the caller can retry __gmap_translate (might fail again), - * -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing - * up or connecting the gmap page table. - */ -static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr, - unsigned long vmaddr, int prot) -{ - struct mm_struct *mm =3D gmap->mm; - unsigned int fault_flags; - bool unlocked =3D false; - - BUG_ON(gmap_is_shadow(gmap)); - fault_flags =3D (prot =3D=3D PROT_WRITE) ? 
FAULT_FLAG_WRITE : 0; - if (fixup_user_fault(mm, vmaddr, fault_flags, &unlocked)) - return -EFAULT; - if (unlocked) - /* lost mmap_lock, caller has to retry __gmap_translate */ - return 0; - /* Connect the page tables */ - return __gmap_link(gmap, gaddr, vmaddr); -} - -/** - * gmap_pte_op_end - release the page table lock - * @ptep: pointer to the locked pte - * @ptl: pointer to the page table spinlock - */ -static void gmap_pte_op_end(pte_t *ptep, spinlock_t *ptl) -{ - pte_unmap_unlock(ptep, ptl); -} - -/** - * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock - * and return the pmd pointer - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * - * Returns a pointer to the pmd for a guest address, or NULL - */ -static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gad= dr) -{ - pmd_t *pmdp; - - BUG_ON(gmap_is_shadow(gmap)); - pmdp =3D (pmd_t *) gmap_table_walk(gmap, gaddr, 1); - if (!pmdp) - return NULL; - - /* without huge pages, there is no need to take the table lock */ - if (!gmap->mm->context.allow_gmap_hpage_1m) - return pmd_none(*pmdp) ? NULL : pmdp; - - spin_lock(&gmap->guest_table_lock); - if (pmd_none(*pmdp)) { - spin_unlock(&gmap->guest_table_lock); - return NULL; - } - - /* 4k page table entries are locked via the pte (pte_alloc_map_lock). */ - if (!pmd_leaf(*pmdp)) - spin_unlock(&gmap->guest_table_lock); - return pmdp; -} - -/** - * gmap_pmd_op_end - release the guest_table_lock if needed - * @gmap: pointer to the guest mapping meta data structure - * @pmdp: pointer to the pmd - */ -static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp) -{ - if (pmd_leaf(*pmdp)) - spin_unlock(&gmap->guest_table_lock); -} - -/* - * gmap_protect_pmd - remove access rights to memory and set pmd notificat= ion bits - * @pmdp: pointer to the pmd to be protected - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: notification bits to set - * - * Returns: - * 0 if successfully protected - * -EAGAIN if a fixup is needed - * -EINVAL if unsupported notifier bits have been specified - * - * Expected to be called with sg->mm->mmap_lock in read and - * guest_table_lock held. 
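 *
 * Callers combine the *_op_walk/*_op_end helpers above with
 * gmap_pte_op_fixup() in a retry loop; schematically (a sketch, not a
 * verbatim copy of any one caller, do_something() stands in for the
 * actual pte operation):
 *
 *	while (1) {
 *		rc = -EAGAIN;
 *		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
 *		if (ptep) {
 *			rc = do_something(ptep);
 *			gmap_pte_op_end(ptep, ptl);
 *		}
 *		if (rc != -EAGAIN)
 *			break;
 *		vmaddr = __gmap_translate(gmap, gaddr);
 *		if (IS_ERR_VALUE(vmaddr))
 *			return vmaddr;
 *		rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
 *		if (rc)
 *			return rc;
 *	}
 *
 * gmap_read_table() below follows this pattern.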
- */ -static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr, - pmd_t *pmdp, int prot, unsigned long bits) -{ - int pmd_i =3D pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID; - int pmd_p =3D pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT; - pmd_t new =3D *pmdp; - - /* Fixup needed */ - if ((pmd_i && (prot !=3D PROT_NONE)) || (pmd_p && (prot =3D=3D PROT_WRITE= ))) - return -EAGAIN; - - if (prot =3D=3D PROT_NONE && !pmd_i) { - new =3D set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID)); - gmap_pmdp_xchg(gmap, pmdp, new, gaddr); - } - - if (prot =3D=3D PROT_READ && !pmd_p) { - new =3D clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_INVALID)); - new =3D set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_PROTECT)); - gmap_pmdp_xchg(gmap, pmdp, new, gaddr); - } - - if (bits & GMAP_NOTIFY_MPROT) - set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN))); - - /* Shadow GMAP protection needs split PMDs */ - if (bits & GMAP_NOTIFY_SHADOW) - return -EINVAL; - - return 0; -} - -/* - * gmap_protect_pte - remove access rights to memory and set pgste bits - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @pmdp: pointer to the pmd associated with the pte - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: notification bits to set - * - * Returns 0 if successfully protected, -ENOMEM if out of memory and - * -EAGAIN if a fixup is needed. - * - * Expected to be called with sg->mm->mmap_lock in read - */ -static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr, - pmd_t *pmdp, int prot, unsigned long bits) -{ - int rc; - pte_t *ptep; - spinlock_t *ptl; - unsigned long pbits =3D 0; - - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID) - return -EAGAIN; - - ptep =3D pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl); - if (!ptep) - return -ENOMEM; - - pbits |=3D (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0; - pbits |=3D (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0; - /* Protect and unlock. */ - rc =3D ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits); - gmap_pte_op_end(ptep, ptl); - return rc; -} - -/* - * gmap_protect_range - remove access rights to memory and set pgste bits - * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @len: size of area - * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bits: pgste notification bits to set - * - * Returns: - * PAGE_SIZE if a small page was successfully protected; - * HPAGE_SIZE if a large page was successfully protected; - * -ENOMEM if out of memory; - * -EFAULT if gaddr is invalid (or mapping for shadows is missing); - * -EAGAIN if the guest mapping is missing and should be fixed by the ca= ller. - * - * Context: Called with sg->mm->mmap_lock in read. - */ -int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, uns= igned long bits) -{ - pmd_t *pmdp; - int rc =3D 0; - - BUG_ON(gmap_is_shadow(gmap)); - - pmdp =3D gmap_pmd_op_walk(gmap, gaddr); - if (!pmdp) - return -EAGAIN; - - if (!pmd_leaf(*pmdp)) { - rc =3D gmap_protect_pte(gmap, gaddr, pmdp, prot, bits); - if (!rc) - rc =3D PAGE_SIZE; - } else { - rc =3D gmap_protect_pmd(gmap, gaddr, pmdp, prot, bits); - if (!rc) - rc =3D HPAGE_SIZE; - } - gmap_pmd_op_end(gmap, pmdp); - - return rc; -} -EXPORT_SYMBOL_GPL(gmap_protect_one); - -/** - * gmap_read_table - get an unsigned long value from a guest page table us= ing - * absolute addressing, without marking the page referen= ced. 
- * @gmap: pointer to guest mapping meta data structure - * @gaddr: virtual address in the guest address space - * @val: pointer to the unsigned long value to return - * - * Returns 0 if the value was read, -ENOMEM if out of memory and -EFAULT - * if reading using the virtual address failed. -EINVAL if called on a gmap - * shadow. - * - * Called with gmap->mm->mmap_lock in read. - */ -int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long = *val) -{ - unsigned long address, vmaddr; - spinlock_t *ptl; - pte_t *ptep, pte; - int rc; - - if (gmap_is_shadow(gmap)) - return -EINVAL; - - while (1) { - rc =3D -EAGAIN; - ptep =3D gmap_pte_op_walk(gmap, gaddr, &ptl); - if (ptep) { - pte =3D *ptep; - if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) { - address =3D pte_val(pte) & PAGE_MASK; - address +=3D gaddr & ~PAGE_MASK; - *val =3D *(unsigned long *)__va(address); - set_pte(ptep, set_pte_bit(*ptep, __pgprot(_PAGE_YOUNG))); - /* Do *NOT* clear the _PAGE_INVALID bit! */ - rc =3D 0; - } - gmap_pte_op_end(ptep, ptl); - } - if (!rc) - break; - vmaddr =3D __gmap_translate(gmap, gaddr); - if (IS_ERR_VALUE(vmaddr)) { - rc =3D vmaddr; - break; - } - rc =3D gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ); - if (rc) - break; - } - return rc; -} -EXPORT_SYMBOL_GPL(gmap_read_table); - -/** - * gmap_insert_rmap - add a rmap to the host_to_rmap radix tree - * @sg: pointer to the shadow guest address space structure - * @vmaddr: vm address associated with the rmap - * @rmap: pointer to the rmap structure - * - * Called with the sg->guest_table_lock - */ -static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr, - struct gmap_rmap *rmap) -{ - struct gmap_rmap *temp; - void __rcu **slot; - - BUG_ON(!gmap_is_shadow(sg)); - slot =3D radix_tree_lookup_slot(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT); - if (slot) { - rmap->next =3D radix_tree_deref_slot_protected(slot, - &sg->guest_table_lock); - for (temp =3D rmap->next; temp; temp =3D temp->next) { - if (temp->raddr =3D=3D rmap->raddr) { - kfree(rmap); - return; - } - } - radix_tree_replace_slot(&sg->host_to_rmap, slot, rmap); - } else { - rmap->next =3D NULL; - radix_tree_insert(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT, - rmap); - } -} - -/** - * gmap_protect_rmap - restrict access rights to memory (RO) and create an= rmap - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow gmap - * @paddr: address in the parent guest address space - * @len: length of the memory area to protect - * - * Returns 0 if successfully protected and the rmap was created, -ENOMEM - * if out of memory and -EFAULT if paddr is invalid. 
- */ -static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr, - unsigned long paddr, unsigned long len) -{ - struct gmap *parent; - struct gmap_rmap *rmap; - unsigned long vmaddr; - spinlock_t *ptl; - pte_t *ptep; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - parent =3D sg->parent; - while (len) { - vmaddr =3D __gmap_translate(parent, paddr); - if (IS_ERR_VALUE(vmaddr)) - return vmaddr; - rmap =3D kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT); - if (!rmap) - return -ENOMEM; - rmap->raddr =3D raddr; - rc =3D radix_tree_preload(GFP_KERNEL_ACCOUNT); - if (rc) { - kfree(rmap); - return rc; - } - rc =3D -EAGAIN; - ptep =3D gmap_pte_op_walk(parent, paddr, &ptl); - if (ptep) { - spin_lock(&sg->guest_table_lock); - rc =3D ptep_force_prot(parent->mm, paddr, ptep, PROT_READ, - PGSTE_VSIE_BIT); - if (!rc) - gmap_insert_rmap(sg, vmaddr, rmap); - spin_unlock(&sg->guest_table_lock); - gmap_pte_op_end(ptep, ptl); - } - radix_tree_preload_end(); - if (rc) { - kfree(rmap); - rc =3D gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ); - if (rc) - return rc; - continue; - } - paddr +=3D PAGE_SIZE; - len -=3D PAGE_SIZE; - } - return 0; -} - -#define _SHADOW_RMAP_MASK 0x7 -#define _SHADOW_RMAP_REGION1 0x5 -#define _SHADOW_RMAP_REGION2 0x4 -#define _SHADOW_RMAP_REGION3 0x3 -#define _SHADOW_RMAP_SEGMENT 0x2 -#define _SHADOW_RMAP_PGTABLE 0x1 - -/** - * gmap_idte_one - invalidate a single region or segment table entry - * @asce: region or segment table *origin* + table-type bits - * @vaddr: virtual address to identify the table entry to flush - * - * The invalid bit of a single region or segment table entry is set - * and the associated TLB entries depending on the entry are flushed. - * The table-type of the @asce identifies the portion of the @vaddr - * that is used as the invalidation index. 
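 *
 * A related detail for the rmaps created above: the _SHADOW_RMAP_* level
 * tag rides in the low bits of rmap->raddr, which works because the rmap
 * addresses are at least 8-byte aligned. gmap_shadow_notify() further
 * down decodes it as
 *
 *	bits  = rmap->raddr & _SHADOW_RMAP_MASK;
 *	raddr = rmap->raddr ^ bits;
 *
 * and dispatches to the matching gmap_unshadow_*() helper.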
- */ -static inline void gmap_idte_one(unsigned long asce, unsigned long vaddr) -{ - asm volatile( - " idte %0,0,%1" - : : "a" (asce), "a" (vaddr) : "cc", "memory"); -} - -/** - * gmap_unshadow_page - remove a page from a shadow page table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_page(struct gmap *sg, unsigned long raddr) -{ - unsigned long *table; - - BUG_ON(!gmap_is_shadow(sg)); - table =3D gmap_table_walk(sg, raddr, 0); /* get page table pointer */ - if (!table || *table & _PAGE_INVALID) - return; - gmap_call_notifier(sg, raddr, raddr + PAGE_SIZE - 1); - ptep_unshadow_pte(sg->mm, raddr, (pte_t *) table); -} - -/** - * __gmap_unshadow_pgt - remove all entries from a shadow page table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @pgt: pointer to the start of a shadow page table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr, - unsigned long *pgt) -{ - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _PAGE_ENTRIES; i++, raddr +=3D PAGE_SIZE) - pgt[i] =3D _PAGE_INVALID; -} - -/** - * gmap_unshadow_pgt - remove a shadow page table from a segment entry - * @sg: pointer to the shadow guest address space structure - * @raddr: address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr) -{ - unsigned long *ste; - phys_addr_t sto, pgt; - struct ptdesc *ptdesc; - - BUG_ON(!gmap_is_shadow(sg)); - ste =3D gmap_table_walk(sg, raddr, 1); /* get segment pointer */ - if (!ste || !(*ste & _SEGMENT_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _SEGMENT_SIZE - 1); - sto =3D __pa(ste - ((raddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT)); - gmap_idte_one(sto | _ASCE_TYPE_SEGMENT, raddr); - pgt =3D *ste & _SEGMENT_ENTRY_ORIGIN; - *ste =3D _SEGMENT_ENTRY_EMPTY; - __gmap_unshadow_pgt(sg, raddr, __va(pgt)); - /* Free page table */ - ptdesc =3D page_ptdesc(phys_to_page(pgt)); - page_table_free_pgste(ptdesc); -} - -/** - * __gmap_unshadow_sgt - remove all entries from a shadow segment table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @sgt: pointer to the start of a shadow segment table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr, - unsigned long *sgt) -{ - struct ptdesc *ptdesc; - phys_addr_t pgt; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _SEGMENT_SIZE) { - if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN)) - continue; - pgt =3D sgt[i] & _REGION_ENTRY_ORIGIN; - sgt[i] =3D _SEGMENT_ENTRY_EMPTY; - __gmap_unshadow_pgt(sg, raddr, __va(pgt)); - /* Free page table */ - ptdesc =3D page_ptdesc(phys_to_page(pgt)); - page_table_free_pgste(ptdesc); - } -} - -/** - * gmap_unshadow_sgt - remove a shadow segment table from a region-3 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the shadow->guest_table_lock - */ -static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr) -{ - unsigned long r3o, *r3e; - phys_addr_t sgt; - struct page *page; - - BUG_ON(!gmap_is_shadow(sg)); - r3e =3D gmap_table_walk(sg, 
raddr, 2); /* get region-3 pointer */ - if (!r3e || !(*r3e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION3_SIZE - 1); - r3o =3D (unsigned long) (r3e - ((raddr & _REGION3_INDEX) >> _REGION3_SHIF= T)); - gmap_idte_one(__pa(r3o) | _ASCE_TYPE_REGION3, raddr); - sgt =3D *r3e & _REGION_ENTRY_ORIGIN; - *r3e =3D _REGION3_ENTRY_EMPTY; - __gmap_unshadow_sgt(sg, raddr, __va(sgt)); - /* Free segment table */ - page =3D phys_to_page(sgt); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r3t - remove all entries from a shadow region-3 table - * @sg: pointer to the shadow guest address space structure - * @raddr: address in the shadow guest address space - * @r3t: pointer to the start of a shadow region-3 table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr, - unsigned long *r3t) -{ - struct page *page; - phys_addr_t sgt; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION3_SIZE) { - if (!(r3t[i] & _REGION_ENTRY_ORIGIN)) - continue; - sgt =3D r3t[i] & _REGION_ENTRY_ORIGIN; - r3t[i] =3D _REGION3_ENTRY_EMPTY; - __gmap_unshadow_sgt(sg, raddr, __va(sgt)); - /* Free segment table */ - page =3D phys_to_page(sgt); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow_r3t - remove a shadow region-3 table from a region-2 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr) -{ - unsigned long r2o, *r2e; - phys_addr_t r3t; - struct page *page; - - BUG_ON(!gmap_is_shadow(sg)); - r2e =3D gmap_table_walk(sg, raddr, 3); /* get region-2 pointer */ - if (!r2e || !(*r2e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION2_SIZE - 1); - r2o =3D (unsigned long) (r2e - ((raddr & _REGION2_INDEX) >> _REGION2_SHIF= T)); - gmap_idte_one(__pa(r2o) | _ASCE_TYPE_REGION2, raddr); - r3t =3D *r2e & _REGION_ENTRY_ORIGIN; - *r2e =3D _REGION2_ENTRY_EMPTY; - __gmap_unshadow_r3t(sg, raddr, __va(r3t)); - /* Free region 3 table */ - page =3D phys_to_page(r3t); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r2t - remove all entries from a shadow region-2 table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @r2t: pointer to the start of a shadow region-2 table - * - * Called with the sg->guest_table_lock - */ -static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr, - unsigned long *r2t) -{ - phys_addr_t r3t; - struct page *page; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION2_SIZE) { - if (!(r2t[i] & _REGION_ENTRY_ORIGIN)) - continue; - r3t =3D r2t[i] & _REGION_ENTRY_ORIGIN; - r2t[i] =3D _REGION2_ENTRY_EMPTY; - __gmap_unshadow_r3t(sg, raddr, __va(r3t)); - /* Free region 3 table */ - page =3D phys_to_page(r3t); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow_r2t - remove a shadow region-2 table from a region-1 entry - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * - * Called with the sg->guest_table_lock - */ -static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr) -{ - unsigned long r1o, *r1e; - struct page *page; - phys_addr_t r2t; - - 
BUG_ON(!gmap_is_shadow(sg)); - r1e =3D gmap_table_walk(sg, raddr, 4); /* get region-1 pointer */ - if (!r1e || !(*r1e & _REGION_ENTRY_ORIGIN)) - return; - gmap_call_notifier(sg, raddr, raddr + _REGION1_SIZE - 1); - r1o =3D (unsigned long) (r1e - ((raddr & _REGION1_INDEX) >> _REGION1_SHIF= T)); - gmap_idte_one(__pa(r1o) | _ASCE_TYPE_REGION1, raddr); - r2t =3D *r1e & _REGION_ENTRY_ORIGIN; - *r1e =3D _REGION1_ENTRY_EMPTY; - __gmap_unshadow_r2t(sg, raddr, __va(r2t)); - /* Free region 2 table */ - page =3D phys_to_page(r2t); - __free_pages(page, CRST_ALLOC_ORDER); -} - -/** - * __gmap_unshadow_r1t - remove all entries from a shadow region-1 table - * @sg: pointer to the shadow guest address space structure - * @raddr: rmap address in the shadow guest address space - * @r1t: pointer to the start of a shadow region-1 table - * - * Called with the shadow->guest_table_lock - */ -static void __gmap_unshadow_r1t(struct gmap *sg, unsigned long raddr, - unsigned long *r1t) -{ - unsigned long asce; - struct page *page; - phys_addr_t r2t; - int i; - - BUG_ON(!gmap_is_shadow(sg)); - asce =3D __pa(r1t) | _ASCE_TYPE_REGION1; - for (i =3D 0; i < _CRST_ENTRIES; i++, raddr +=3D _REGION1_SIZE) { - if (!(r1t[i] & _REGION_ENTRY_ORIGIN)) - continue; - r2t =3D r1t[i] & _REGION_ENTRY_ORIGIN; - __gmap_unshadow_r2t(sg, raddr, __va(r2t)); - /* Clear entry and flush translation r1t -> r2t */ - gmap_idte_one(asce, raddr); - r1t[i] =3D _REGION1_ENTRY_EMPTY; - /* Free region 2 table */ - page =3D phys_to_page(r2t); - __free_pages(page, CRST_ALLOC_ORDER); - } -} - -/** - * gmap_unshadow - remove a shadow page table completely - * @sg: pointer to the shadow guest address space structure - * - * Called with sg->guest_table_lock - */ -void gmap_unshadow(struct gmap *sg) -{ - unsigned long *table; - - BUG_ON(!gmap_is_shadow(sg)); - if (sg->removed) - return; - sg->removed =3D 1; - gmap_call_notifier(sg, 0, -1UL); - gmap_flush_tlb(sg); - table =3D __va(sg->asce & _ASCE_ORIGIN); - switch (sg->asce & _ASCE_TYPE_MASK) { - case _ASCE_TYPE_REGION1: - __gmap_unshadow_r1t(sg, 0, table); - break; - case _ASCE_TYPE_REGION2: - __gmap_unshadow_r2t(sg, 0, table); - break; - case _ASCE_TYPE_REGION3: - __gmap_unshadow_r3t(sg, 0, table); - break; - case _ASCE_TYPE_SEGMENT: - __gmap_unshadow_sgt(sg, 0, table); - break; - } -} -EXPORT_SYMBOL(gmap_unshadow); - -/** - * gmap_shadow_r2t - create an empty shadow region 2 table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @r2t: parent gmap address of the region 2 table to get shadowed - * @fake: r2t references contiguous guest memory block, not a r2t - * - * The r2t parameter specifies the address of the source table. The - * four pages of the source table are made read-only in the parent gmap - * address space. A write to the source table area @r2t will automatically - * remove the shadow r2 table and all of its descendants. - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
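 *
 * The protected range is derived from the table-offset and table-length
 * fields of @r2t; a worked example: with table offset 0 and table
 * length 3 (a complete table),
 *
 *	offset = ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE = 0
 *	len    = ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset
 *	       = 4 * PAGE_SIZE
 *
 * i.e. all four pages of the source table are made read-only.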
- */ -int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2= t, - int fake) -{ - unsigned long raddr, origin, offset, len; - unsigned long *table; - phys_addr_t s_r2t; - struct page *page; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - /* Allocate a shadow region second table */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - s_r2t =3D page_to_phys(page); - /* Install shadow region second table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 4); /* get region-1 pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _REGION_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _REGION_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - crst_table_init(__va(s_r2t), _REGION2_ENTRY_EMPTY); - /* mark as invalid as long as the parent table is not protected */ - *table =3D s_r2t | _REGION_ENTRY_LENGTH | - _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID; - if (sg->edat_level >=3D 1) - *table |=3D (r2t & _REGION_ENTRY_PROTECT); - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_REGION_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make r2t read-only in parent gmap page table */ - raddr =3D (saddr & _REGION1_MASK) | _SHADOW_RMAP_REGION1; - origin =3D r2t & _REGION_ENTRY_ORIGIN; - offset =3D ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE; - len =3D ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset; - rc =3D gmap_protect_rmap(sg, raddr, origin + offset, len); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 4); - if (!table || (*table & _REGION_ENTRY_ORIGIN) !=3D s_r2t) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_REGION_ENTRY_INVALID; - } else { - gmap_unshadow_r2t(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_r2t); - -/** - * gmap_shadow_r3t - create a shadow region 3 table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @r3t: parent gmap address of the region 3 table to get shadowed - * @fake: r3t references contiguous guest memory block, not a r3t - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
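 *
 * The flow is the same as in gmap_shadow_r2t() above (and in the sgt and
 * pgt variants below): install the new shadow table marked invalid under
 * guest_table_lock, drop the lock to write-protect the parent table via
 * gmap_protect_rmap() (which may sleep), then retake the lock, re-walk,
 * and clear the invalid bit only if the entry still points to our table;
 * otherwise return -EAGAIN to the caller.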
- */ -int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3= t, - int fake) -{ - unsigned long raddr, origin, offset, len; - unsigned long *table; - phys_addr_t s_r3t; - struct page *page; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - /* Allocate a shadow region second table */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - s_r3t =3D page_to_phys(page); - /* Install shadow region second table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 3); /* get region-2 pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _REGION_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _REGION_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - crst_table_init(__va(s_r3t), _REGION3_ENTRY_EMPTY); - /* mark as invalid as long as the parent table is not protected */ - *table =3D s_r3t | _REGION_ENTRY_LENGTH | - _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID; - if (sg->edat_level >=3D 1) - *table |=3D (r3t & _REGION_ENTRY_PROTECT); - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_REGION_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make r3t read-only in parent gmap page table */ - raddr =3D (saddr & _REGION2_MASK) | _SHADOW_RMAP_REGION2; - origin =3D r3t & _REGION_ENTRY_ORIGIN; - offset =3D ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE; - len =3D ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset; - rc =3D gmap_protect_rmap(sg, raddr, origin + offset, len); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 3); - if (!table || (*table & _REGION_ENTRY_ORIGIN) !=3D s_r3t) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_REGION_ENTRY_INVALID; - } else { - gmap_unshadow_r3t(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_r3t); - -/** - * gmap_shadow_sgt - create a shadow segment table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @sgt: parent gmap address of the segment table to get shadowed - * @fake: sgt references contiguous guest memory block, not a sgt - * - * Returns: 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. 
- */ -int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sg= t, - int fake) -{ - unsigned long raddr, origin, offset, len; - unsigned long *table; - phys_addr_t s_sgt; - struct page *page; - int rc; - - BUG_ON(!gmap_is_shadow(sg) || (sgt & _REGION3_ENTRY_LARGE)); - /* Allocate a shadow segment table */ - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - s_sgt =3D page_to_phys(page); - /* Install shadow region second table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 2); /* get region-3 pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _REGION_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _REGION_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - crst_table_init(__va(s_sgt), _SEGMENT_ENTRY_EMPTY); - /* mark as invalid as long as the parent table is not protected */ - *table =3D s_sgt | _REGION_ENTRY_LENGTH | - _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID; - if (sg->edat_level >=3D 1) - *table |=3D sgt & _REGION_ENTRY_PROTECT; - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_REGION_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make sgt read-only in parent gmap page table */ - raddr =3D (saddr & _REGION3_MASK) | _SHADOW_RMAP_REGION3; - origin =3D sgt & _REGION_ENTRY_ORIGIN; - offset =3D ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE; - len =3D ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset; - rc =3D gmap_protect_rmap(sg, raddr, origin + offset, len); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 2); - if (!table || (*table & _REGION_ENTRY_ORIGIN) !=3D s_sgt) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_REGION_ENTRY_INVALID; - } else { - gmap_unshadow_sgt(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - __free_pages(page, CRST_ALLOC_ORDER); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_sgt); - -static void gmap_pgste_set_pgt_addr(struct ptdesc *ptdesc, unsigned long p= gt_addr) -{ - unsigned long *pgstes =3D page_to_virt(ptdesc_page(ptdesc)); - - pgstes +=3D _PAGE_ENTRIES; - - pgstes[0] &=3D ~PGSTE_ST2_MASK; - pgstes[1] &=3D ~PGSTE_ST2_MASK; - pgstes[2] &=3D ~PGSTE_ST2_MASK; - pgstes[3] &=3D ~PGSTE_ST2_MASK; - - pgstes[0] |=3D (pgt_addr >> 16) & PGSTE_ST2_MASK; - pgstes[1] |=3D pgt_addr & PGSTE_ST2_MASK; - pgstes[2] |=3D (pgt_addr << 16) & PGSTE_ST2_MASK; - pgstes[3] |=3D (pgt_addr << 32) & PGSTE_ST2_MASK; -} - -/** - * gmap_shadow_pgt - instantiate a shadow page table - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @pgt: parent gmap address of the page table to get shadowed - * @fake: pgt references contiguous guest memory block, not a pgtable - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory, - * -EFAULT if an address in the parent gmap could not be resolved and - * - * Called with gmap->mm->mmap_lock in read - */ -int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pg= t, - int fake) -{ - unsigned long raddr, origin; - unsigned long *table; - struct ptdesc *ptdesc; - phys_addr_t s_pgt; - int rc; - - BUG_ON(!gmap_is_shadow(sg) || (pgt & _SEGMENT_ENTRY_LARGE)); - /* Allocate a shadow page 
table */ - ptdesc =3D page_table_alloc_pgste(sg->mm); - if (!ptdesc) - return -ENOMEM; - origin =3D pgt & _SEGMENT_ENTRY_ORIGIN; - if (fake) - origin |=3D GMAP_SHADOW_FAKE_TABLE; - gmap_pgste_set_pgt_addr(ptdesc, origin); - s_pgt =3D page_to_phys(ptdesc_page(ptdesc)); - /* Install shadow page table */ - spin_lock(&sg->guest_table_lock); - table =3D gmap_table_walk(sg, saddr, 1); /* get segment pointer */ - if (!table) { - rc =3D -EAGAIN; /* Race with unshadow */ - goto out_free; - } - if (!(*table & _SEGMENT_ENTRY_INVALID)) { - rc =3D 0; /* Already established */ - goto out_free; - } else if (*table & _SEGMENT_ENTRY_ORIGIN) { - rc =3D -EAGAIN; /* Race with shadow */ - goto out_free; - } - /* mark as invalid as long as the parent table is not protected */ - *table =3D (unsigned long) s_pgt | _SEGMENT_ENTRY | - (pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID; - if (fake) { - /* nothing to protect for fake tables */ - *table &=3D ~_SEGMENT_ENTRY_INVALID; - spin_unlock(&sg->guest_table_lock); - return 0; - } - spin_unlock(&sg->guest_table_lock); - /* Make pgt read-only in parent gmap page table (not the pgste) */ - raddr =3D (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT; - origin =3D pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK; - rc =3D gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE); - spin_lock(&sg->guest_table_lock); - if (!rc) { - table =3D gmap_table_walk(sg, saddr, 1); - if (!table || (*table & _SEGMENT_ENTRY_ORIGIN) !=3D s_pgt) - rc =3D -EAGAIN; /* Race with unshadow */ - else - *table &=3D ~_SEGMENT_ENTRY_INVALID; - } else { - gmap_unshadow_pgt(sg, raddr); - } - spin_unlock(&sg->guest_table_lock); - return rc; -out_free: - spin_unlock(&sg->guest_table_lock); - page_table_free_pgste(ptdesc); - return rc; - -} -EXPORT_SYMBOL_GPL(gmap_shadow_pgt); - -/** - * gmap_shadow_page - create a shadow page mapping - * @sg: pointer to the shadow guest address space structure - * @saddr: faulting address in the shadow gmap - * @pte: pte in parent gmap address space to get shadowed - * - * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the - * shadow table structure is incomplete, -ENOMEM if out of memory and - * -EFAULT if an address in the parent gmap could not be resolved. - * - * Called with sg->mm->mmap_lock in read. - */ -int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte) -{ - struct gmap *parent; - struct gmap_rmap *rmap; - unsigned long vmaddr, paddr; - spinlock_t *ptl; - pte_t *sptep, *tptep; - int prot; - int rc; - - BUG_ON(!gmap_is_shadow(sg)); - parent =3D sg->parent; - prot =3D (pte_val(pte) & _PAGE_PROTECT) ? 
PROT_READ : PROT_WRITE; - - rmap =3D kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT); - if (!rmap) - return -ENOMEM; - rmap->raddr =3D (saddr & PAGE_MASK) | _SHADOW_RMAP_PGTABLE; - - while (1) { - paddr =3D pte_val(pte) & PAGE_MASK; - vmaddr =3D __gmap_translate(parent, paddr); - if (IS_ERR_VALUE(vmaddr)) { - rc =3D vmaddr; - break; - } - rc =3D radix_tree_preload(GFP_KERNEL_ACCOUNT); - if (rc) - break; - rc =3D -EAGAIN; - sptep =3D gmap_pte_op_walk(parent, paddr, &ptl); - if (sptep) { - spin_lock(&sg->guest_table_lock); - /* Get page table pointer */ - tptep =3D (pte_t *) gmap_table_walk(sg, saddr, 0); - if (!tptep) { - spin_unlock(&sg->guest_table_lock); - gmap_pte_op_end(sptep, ptl); - radix_tree_preload_end(); - break; - } - rc =3D ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte); - if (rc > 0) { - /* Success and a new mapping */ - gmap_insert_rmap(sg, vmaddr, rmap); - rmap =3D NULL; - rc =3D 0; - } - gmap_pte_op_end(sptep, ptl); - spin_unlock(&sg->guest_table_lock); - } - radix_tree_preload_end(); - if (!rc) - break; - rc =3D gmap_pte_op_fixup(parent, paddr, vmaddr, prot); - if (rc) - break; - } - kfree(rmap); - return rc; -} -EXPORT_SYMBOL_GPL(gmap_shadow_page); - -/* - * gmap_shadow_notify - handle notifications for shadow gmap - * - * Called with sg->parent->shadow_lock. - */ -static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr, - unsigned long gaddr) -{ - struct gmap_rmap *rmap, *rnext, *head; - unsigned long start, end, bits, raddr; - - BUG_ON(!gmap_is_shadow(sg)); - - spin_lock(&sg->guest_table_lock); - if (sg->removed) { - spin_unlock(&sg->guest_table_lock); - return; - } - /* Check for top level table */ - start =3D sg->orig_asce & _ASCE_ORIGIN; - end =3D start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE; - if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >=3D start && - gaddr < end) { - /* The complete shadow table has to go */ - gmap_unshadow(sg); - spin_unlock(&sg->guest_table_lock); - list_del(&sg->list); - gmap_put(sg); - return; - } - /* Remove the page table tree from on specific entry */ - head =3D radix_tree_delete(&sg->host_to_rmap, vmaddr >> PAGE_SHIFT); - gmap_for_each_rmap_safe(rmap, rnext, head) { - bits =3D rmap->raddr & _SHADOW_RMAP_MASK; - raddr =3D rmap->raddr ^ bits; - switch (bits) { - case _SHADOW_RMAP_REGION1: - gmap_unshadow_r2t(sg, raddr); - break; - case _SHADOW_RMAP_REGION2: - gmap_unshadow_r3t(sg, raddr); - break; - case _SHADOW_RMAP_REGION3: - gmap_unshadow_sgt(sg, raddr); - break; - case _SHADOW_RMAP_SEGMENT: - gmap_unshadow_pgt(sg, raddr); - break; - case _SHADOW_RMAP_PGTABLE: - gmap_unshadow_page(sg, raddr); - break; - } - kfree(rmap); - } - spin_unlock(&sg->guest_table_lock); -} - -/** - * ptep_notify - call all invalidation callbacks for a specific pte. - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - * @pte: pointer to the page table entry - * @bits: bits from the pgste that caused the notify call - * - * This function is assumed to be called with the page table lock held - * for the pte to notify. 
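 *
 * The offset arithmetic at the start of the function recovers the guest
 * offset of this pte within its 1 MB segment; a worked example, assuming
 * sizeof(pte_t) == 8 and a pte pointer whose low bits are 0x7f8 (the
 * last of the 256 slots):
 *
 *	offset = 0x7f8 & (255 * 8)		(= 0x7f8)
 *	offset = 0x7f8 * (PAGE_SIZE / 8)	(= 0x7f8 * 512 = 0xff000)
 *
 * which is page index 255 times 4 KB, added to the segment-granular
 * guest address from host_to_guest_lookup().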
- */ -void ptep_notify(struct mm_struct *mm, unsigned long vmaddr, - pte_t *pte, unsigned long bits) -{ - unsigned long offset, gaddr =3D 0; - struct gmap *gmap, *sg, *next; - - offset =3D ((unsigned long) pte) & (255 * sizeof(pte_t)); - offset =3D offset * (PAGE_SIZE / sizeof(pte_t)); - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - gaddr =3D host_to_guest_lookup(gmap, vmaddr) + offset; - spin_unlock(&gmap->guest_table_lock); - if (!IS_GADDR_VALID(gaddr)) - continue; - - if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) { - spin_lock(&gmap->shadow_lock); - list_for_each_entry_safe(sg, next, - &gmap->children, list) - gmap_shadow_notify(sg, vmaddr, gaddr); - spin_unlock(&gmap->shadow_lock); - } - if (bits & PGSTE_IN_BIT) - gmap_call_notifier(gmap, gaddr, gaddr + PAGE_SIZE - 1); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(ptep_notify); - -static void pmdp_notify_gmap(struct gmap *gmap, pmd_t *pmdp, - unsigned long gaddr) -{ - set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_IN))); - gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1); -} - -/** - * gmap_pmdp_xchg - exchange a gmap pmd with another - * @gmap: pointer to the guest address space structure - * @pmdp: pointer to the pmd entry - * @new: replacement entry - * @gaddr: the affected guest address - * - * This function is assumed to be called with the guest_table_lock - * held. - */ -static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new, - unsigned long gaddr) -{ - gaddr &=3D HPAGE_MASK; - pmdp_notify_gmap(gmap, pmdp, gaddr); - new =3D clear_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_GMAP_IN)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, (pmd_t *)pmdp, IDTE_GUEST_ASCE, gmap->asce, - IDTE_GLOBAL); - else if (cpu_has_idte()) - __pmdp_idte(gaddr, (pmd_t *)pmdp, 0, 0, IDTE_GLOBAL); - else - __pmdp_csp(pmdp); - set_pmd(pmdp, new); -} - -static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr, - int purge) -{ - pmd_t *pmdp; - struct gmap *gmap; - unsigned long gaddr; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - _SEGMENT_ENTRY)); - if (purge) - __pmdp_csp(pmdp); - set_pmd(pmdp, __pmd(_SEGMENT_ENTRY_EMPTY)); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} - -/** - * gmap_pmdp_invalidate - invalidate all affected guest pmd entries without - * flushing - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr) -{ - gmap_pmdp_clear(mm, vmaddr, 0); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_invalidate); - -/** - * gmap_pmdp_csp - csp all affected guest pmd entries - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr) -{ - gmap_pmdp_clear(mm, vmaddr, 1); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_csp); - -/** - * gmap_pmdp_idte_local - invalidate and clear a guest pmd entry - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr) -{ - unsigned long gaddr; - struct gmap *gmap; - pmd_t 
*pmdp; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - _SEGMENT_ENTRY)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE, - gmap->asce, IDTE_LOCAL); - else if (cpu_has_idte()) - __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_idte_local); - -/** - * gmap_pmdp_idte_global - invalidate and clear a guest pmd entry - * @mm: pointer to the process mm_struct - * @vmaddr: virtual address in the process address space - */ -void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr) -{ - unsigned long gaddr; - struct gmap *gmap; - pmd_t *pmdp; - - rcu_read_lock(); - list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) { - spin_lock(&gmap->guest_table_lock); - pmdp =3D host_to_guest_pmd_delete(gmap, vmaddr, &gaddr); - if (pmdp) { - pmdp_notify_gmap(gmap, pmdp, gaddr); - WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE | - _SEGMENT_ENTRY_GMAP_UC | - _SEGMENT_ENTRY)); - if (machine_has_tlb_guest()) - __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE, - gmap->asce, IDTE_GLOBAL); - else if (cpu_has_idte()) - __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL); - else - __pmdp_csp(pmdp); - *pmdp =3D __pmd(_SEGMENT_ENTRY_EMPTY); - } - spin_unlock(&gmap->guest_table_lock); - } - rcu_read_unlock(); -} -EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global); - -/** - * gmap_test_and_clear_dirty_pmd - test and reset segment dirty status - * @gmap: pointer to guest address space - * @pmdp: pointer to the pmd to be tested - * @gaddr: virtual address in the guest address space - * - * This function is assumed to be called with the guest_table_lock - * held. - */ -static bool gmap_test_and_clear_dirty_pmd(struct gmap *gmap, pmd_t *pmdp, - unsigned long gaddr) -{ - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID) - return false; - - /* Already protected memory, which did not change is clean */ - if (pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT && - !(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_UC)) - return false; - - /* Clear UC indication and reset protection */ - set_pmd(pmdp, clear_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_GMAP_UC))); - gmap_protect_pmd(gmap, gaddr, pmdp, PROT_READ, 0); - return true; -} - -/** - * gmap_sync_dirty_log_pmd - set bitmap based on dirty status of segment - * @gmap: pointer to guest address space - * @bitmap: dirty bitmap for this pmd - * @gaddr: virtual address in the guest address space - * @vmaddr: virtual address in the host address space - * - * This function is assumed to be called with the guest_table_lock - * held. 
- */ -void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4], - unsigned long gaddr, unsigned long vmaddr) -{ - int i; - pmd_t *pmdp; - pte_t *ptep; - spinlock_t *ptl; - - pmdp =3D gmap_pmd_op_walk(gmap, gaddr); - if (!pmdp) - return; - - if (pmd_leaf(*pmdp)) { - if (gmap_test_and_clear_dirty_pmd(gmap, pmdp, gaddr)) - bitmap_fill(bitmap, _PAGE_ENTRIES); - } else { - for (i =3D 0; i < _PAGE_ENTRIES; i++, vmaddr +=3D PAGE_SIZE) { - ptep =3D pte_alloc_map_lock(gmap->mm, pmdp, vmaddr, &ptl); - if (!ptep) - continue; - if (ptep_test_and_clear_uc(gmap->mm, vmaddr, ptep)) - set_bit(i, bitmap); - pte_unmap_unlock(ptep, ptl); - } - } - gmap_pmd_op_end(gmap, pmdp); -} -EXPORT_SYMBOL_GPL(gmap_sync_dirty_log_pmd); - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr, - unsigned long end, struct mm_walk *walk) -{ - struct vm_area_struct *vma =3D walk->vma; - - split_huge_pmd(vma, pmd, addr); - return 0; -} - -static const struct mm_walk_ops thp_split_walk_ops =3D { - .pmd_entry =3D thp_split_walk_pmd_entry, - .walk_lock =3D PGWALK_WRLOCK_VERIFY, -}; - -static inline void thp_split_mm(struct mm_struct *mm) -{ - struct vm_area_struct *vma; - VMA_ITERATOR(vmi, mm, 0); - - for_each_vma(vmi, vma) { - vm_flags_mod(vma, VM_NOHUGEPAGE, VM_HUGEPAGE); - walk_page_vma(vma, &thp_split_walk_ops, NULL); - } - mm->def_flags |=3D VM_NOHUGEPAGE; -} -#else -static inline void thp_split_mm(struct mm_struct *mm) -{ -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - -/* - * switch on pgstes for its userspace process (for kvm) - */ -int s390_enable_sie(void) -{ - struct mm_struct *mm =3D current->mm; - - /* Do we have pgstes? if yes, we are done */ - if (mm_has_pgste(mm)) - return 0; - mmap_write_lock(mm); - mm->context.has_pgste =3D 1; - /* split thp mappings and disable thp for future mappings */ - thp_split_mm(mm); - mmap_write_unlock(mm); - return 0; -} -EXPORT_SYMBOL_GPL(s390_enable_sie); - -/* - * Enable storage key handling from now on and initialize the storage - * keys with the default key. - */ -static int __s390_enable_skey_pte(pte_t *pte, unsigned long addr, - unsigned long next, struct mm_walk *walk) -{ - /* Clear storage key */ - ptep_zap_key(walk->mm, addr, pte); - return 0; -} - -/* - * Give a chance to schedule after setting a key to 256 pages. - * We only hold the mm lock, which is a rwsem and the kvm srcu. - * Both can sleep. - */ -static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr, - unsigned long next, struct mm_walk *walk) -{ - cond_resched(); - return 0; -} - -static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, - unsigned long hmask, unsigned long next, - struct mm_walk *walk) -{ - pmd_t *pmd =3D (pmd_t *)pte; - unsigned long start, end; - struct folio *folio =3D page_folio(pmd_page(*pmd)); - - /* - * The write check makes sure we do not set a key on shared - * memory. This is needed as the walker does not differentiate - * between actual guest memory and the process executable or - * shared libraries. 
- */ - if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID || - !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE)) - return 0; - - start =3D pmd_val(*pmd) & HPAGE_MASK; - end =3D start + HPAGE_SIZE; - __storage_key_init_range(start, end); - set_bit(PG_arch_1, &folio->flags.f); - cond_resched(); - return 0; -} - -static const struct mm_walk_ops enable_skey_walk_ops =3D { - .hugetlb_entry =3D __s390_enable_skey_hugetlb, - .pte_entry =3D __s390_enable_skey_pte, - .pmd_entry =3D __s390_enable_skey_pmd, - .walk_lock =3D PGWALK_WRLOCK, -}; - -int s390_enable_skey(void) -{ - struct mm_struct *mm =3D current->mm; - int rc =3D 0; - - mmap_write_lock(mm); - if (mm_uses_skeys(mm)) - goto out_up; - - mm->context.uses_skeys =3D 1; - rc =3D gmap_helper_disable_cow_sharing(); - if (rc) { - mm->context.uses_skeys =3D 0; - goto out_up; - } - walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL); - -out_up: - mmap_write_unlock(mm); - return rc; -} -EXPORT_SYMBOL_GPL(s390_enable_skey); - -/* - * Reset CMMA state, make all pages stable again. - */ -static int __s390_reset_cmma(pte_t *pte, unsigned long addr, - unsigned long next, struct mm_walk *walk) -{ - ptep_zap_unused(walk->mm, addr, pte, 1); - return 0; -} - -static const struct mm_walk_ops reset_cmma_walk_ops =3D { - .pte_entry =3D __s390_reset_cmma, - .walk_lock =3D PGWALK_WRLOCK, -}; - -void s390_reset_cmma(struct mm_struct *mm) -{ - mmap_write_lock(mm); - walk_page_range(mm, 0, TASK_SIZE, &reset_cmma_walk_ops, NULL); - mmap_write_unlock(mm); -} -EXPORT_SYMBOL_GPL(s390_reset_cmma); - -#define GATHER_GET_PAGES 32 - -struct reset_walk_state { - unsigned long next; - unsigned long count; - unsigned long pfns[GATHER_GET_PAGES]; -}; - -static int s390_gather_pages(pte_t *ptep, unsigned long addr, - unsigned long next, struct mm_walk *walk) -{ - struct reset_walk_state *p =3D walk->private; - pte_t pte =3D READ_ONCE(*ptep); - - if (pte_present(pte)) { - /* we have a reference from the mapping, take an extra one */ - get_page(phys_to_page(pte_val(pte))); - p->pfns[p->count] =3D phys_to_pfn(pte_val(pte)); - p->next =3D next; - p->count++; - } - return p->count >=3D GATHER_GET_PAGES; -} - -static const struct mm_walk_ops gather_pages_ops =3D { - .pte_entry =3D s390_gather_pages, - .walk_lock =3D PGWALK_RDLOCK, -}; - -/* - * Call the Destroy secure page UVC on each page in the given array of PFN= s. - * Each page needs to have an extra reference, which will be released here. - */ -void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns) -{ - struct folio *folio; - unsigned long i; - - for (i =3D 0; i < count; i++) { - folio =3D pfn_folio(pfns[i]); - /* we always have an extra reference */ - uv_destroy_folio(folio); - /* get rid of the extra reference */ - folio_put(folio); - cond_resched(); - } -} -EXPORT_SYMBOL_GPL(s390_uv_destroy_pfns); - -/** - * __s390_uv_destroy_range - Call the destroy secure page UVC on each page - * in the given range of the given address space. - * @mm: the mm to operate on - * @start: the start of the range - * @end: the end of the range - * @interruptible: if not 0, stop when a fatal signal is received - * - * Walk the given range of the given address space and call the destroy - * secure page UVC on each page. Optionally exit early if a fatal signal is - * pending. 
- * - * Return: 0 on success, -EINTR if the function stopped before completing - */ -int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start, - unsigned long end, bool interruptible) -{ - struct reset_walk_state state =3D { .next =3D start }; - int r =3D 1; - - while (r > 0) { - state.count =3D 0; - mmap_read_lock(mm); - r =3D walk_page_range(mm, state.next, end, &gather_pages_ops, &state); - mmap_read_unlock(mm); - cond_resched(); - s390_uv_destroy_pfns(state.count, state.pfns); - if (interruptible && fatal_signal_pending(current)) - return -EINTR; - } - return 0; -} -EXPORT_SYMBOL_GPL(__s390_uv_destroy_range); - -/** - * s390_replace_asce - Try to replace the current ASCE of a gmap with a co= py - * @gmap: the gmap whose ASCE needs to be replaced - * - * If the ASCE is a SEGMENT type then this function will return -EINVAL, - * otherwise the pointers in the host_to_guest radix tree will keep pointi= ng - * to the wrong pages, causing use-after-free and memory corruption. - * If the allocation of the new top level page table fails, the ASCE is not - * replaced. - * In any case, the old ASCE is always removed from the gmap CRST list. - * Therefore the caller has to make sure to save a pointer to it - * beforehand, unless a leak is actually intended. - */ -int s390_replace_asce(struct gmap *gmap) -{ - unsigned long asce; - struct page *page; - void *table; - - /* Replacing segment type ASCEs would cause serious issues */ - if ((gmap->asce & _ASCE_TYPE_MASK) =3D=3D _ASCE_TYPE_SEGMENT) - return -EINVAL; - - page =3D gmap_alloc_crst(); - if (!page) - return -ENOMEM; - table =3D page_to_virt(page); - memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT)); - - /* Set new table origin while preserving existing ASCE control bits */ - asce =3D (gmap->asce & ~_ASCE_ORIGIN) | __pa(table); - WRITE_ONCE(gmap->asce, asce); - WRITE_ONCE(gmap->mm->context.gmap_asce, asce); - WRITE_ONCE(gmap->table, table); - - return 0; -} -EXPORT_SYMBOL_GPL(s390_replace_asce); diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index d0e8579d2669..9cbd1552f918 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -369,8 +369,6 @@ static inline void pmdp_idte_local(struct mm_struct *mm, mm->context.asce, IDTE_LOCAL); else __pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL); - if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m) - gmap_pmdp_idte_local(mm, addr); } =20 static inline void pmdp_idte_global(struct mm_struct *mm, @@ -379,16 +377,10 @@ static inline void pmdp_idte_global(struct mm_struct = *mm, if (machine_has_tlb_guest()) { __pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, mm->context.asce, IDTE_GLOBAL); - if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m) - gmap_pmdp_idte_global(mm, addr); } else if (cpu_has_idte()) { __pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL); - if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m) - gmap_pmdp_idte_global(mm, addr); } else { __pmdp_csp(pmdp); - if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m) - gmap_pmdp_csp(mm, addr); } } =20 @@ -423,8 +415,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *m= m, cpumask_of(smp_processor_id()))) { set_pmd(pmdp, set_pmd_bit(*pmdp, __pgprot(_SEGMENT_ENTRY_INVALID))); mm->context.flush_mm =3D 1; - if (mm_has_pgste(mm)) - gmap_pmdp_invalidate(mm, addr); } else { pmdp_idte_global(mm, addr, pmdp); } --=20 2.51.1 From nobody Tue Dec 2 01:51:36 2025 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBFF6371A1F; Thu, 20 Nov 2025 17:16:48 +0000 (UTC)
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com, nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com, schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com, agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com, gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 21/23] KVM: s390: Remove PGSTE code from linux/s390 mm
Date: Thu, 20 Nov 2025 18:15:42 +0100
Message-ID: <20251120171544.96841-22-imbrenda@linux.ibm.com>
X-Mailer: git-send-email 2.51.1
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Remove the PGSTE config option. Remove all code from linux/s390 mm that involves PGSTEs.
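For context while reading the large removal below: each s390 page table page stores the page status table entries (PGSTEs) directly after the PTRS_PER_PTE ptes, and the PGSTE_PCL_BIT in a PGSTE doubles as a per-entry spinlock. The sketch below condenses the deleted pgste_get_lock()/pgste_set_unlock() helpers into one place; it is illustrative only (the pgste_of() helper is a made-up name for the sketch) and is not part of the applied diff. Note that PGSTE_PCL_BIT itself survives, moving into arch/s390/kvm/dat.h in one of the hunks below.

/*
 * Illustrative sketch, condensed from the pgste helpers deleted below;
 * pgste_of() is invented for the sketch, the lock protocol matches the
 * removed code.
 */
static inline unsigned long *pgste_of(pte_t *ptep)
{
	/* the PGSTE of a pte lives PTRS_PER_PTE entries above the pte */
	return (unsigned long *)(ptep + PTRS_PER_PTE);
}

static inline unsigned long pgste_lock(pte_t *ptep)
{
	unsigned long *ptr = pgste_of(ptep);
	unsigned long value;

	/* set PCL atomically; spin until it was previously clear */
	do {
		value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
	} while (value & PGSTE_PCL_BIT);
	return value | PGSTE_PCL_BIT;
}

static inline void pgste_unlock(pte_t *ptep, unsigned long pgste)
{
	/* order the pgste updates before releasing the PCL bit */
	barrier();
	WRITE_ONCE(*pgste_of(ptep), pgste & ~PGSTE_PCL_BIT);
}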
Signed-off-by: Claudio Imbrenda --- arch/s390/Kconfig | 3 - arch/s390/include/asm/mmu.h | 13 - arch/s390/include/asm/page.h | 4 - arch/s390/include/asm/pgalloc.h | 4 - arch/s390/include/asm/pgtable.h | 121 +---- arch/s390/kvm/dat.h | 1 + arch/s390/mm/hugetlbpage.c | 24 - arch/s390/mm/pgalloc.c | 24 - arch/s390/mm/pgtable.c | 829 +------------------------------- mm/khugepaged.c | 9 - 10 files changed, 17 insertions(+), 1015 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 3b4ba19a3611..162dc930092e 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -32,9 +32,6 @@ config GENERIC_BUG_RELATIVE_POINTERS config GENERIC_LOCKBREAK def_bool y if PREEMPTION =20 -config PGSTE - def_bool n - config AUDIT_ARCH def_bool y =20 diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h index f07e49b419ab..d4fd7bf3692e 100644 --- a/arch/s390/include/asm/mmu.h +++ b/arch/s390/include/asm/mmu.h @@ -18,24 +18,11 @@ typedef struct { unsigned long vdso_base; /* The mmu context belongs to a secure guest. */ atomic_t protected_count; - /* - * The following bitfields need a down_write on the mm - * semaphore when they are written to. As they are only - * written once, they can be read without a lock. - */ - /* The mmu context uses extended page tables. */ - unsigned int has_pgste:1; - /* The mmu context uses storage keys. */ - unsigned int uses_skeys:1; - /* The mmu context uses CMM. */ - unsigned int uses_cmm:1; /* * The mmu context allows COW-sharing of memory pages (KSM, zeropage). * Note that COW-sharing during fork() is currently always allowed. */ unsigned int allow_cow_sharing:1; - /* The gmaps associated with this context are allowed to use huge pages. = */ - unsigned int allow_gmap_hpage_1m:1; } mm_context_t; =20 #define INIT_MM_CONTEXT(name) \ diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index 9240a363c893..5047a0a9450c 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -78,7 +78,6 @@ static inline void copy_page(void *to, void *from) #ifdef STRICT_MM_TYPECHECKS =20 typedef struct { unsigned long pgprot; } pgprot_t; -typedef struct { unsigned long pgste; } pgste_t; typedef struct { unsigned long pte; } pte_t; typedef struct { unsigned long pmd; } pmd_t; typedef struct { unsigned long pud; } pud_t; @@ -94,7 +93,6 @@ static __always_inline unsigned long name ## _val(name ##= _t name) \ #else /* STRICT_MM_TYPECHECKS */ =20 typedef unsigned long pgprot_t; -typedef unsigned long pgste_t; typedef unsigned long pte_t; typedef unsigned long pmd_t; typedef unsigned long pud_t; @@ -110,7 +108,6 @@ static __always_inline unsigned long name ## _val(name = ## _t name) \ #endif /* STRICT_MM_TYPECHECKS */ =20 DEFINE_PGVAL_FUNC(pgprot) -DEFINE_PGVAL_FUNC(pgste) DEFINE_PGVAL_FUNC(pte) DEFINE_PGVAL_FUNC(pmd) DEFINE_PGVAL_FUNC(pud) @@ -120,7 +117,6 @@ DEFINE_PGVAL_FUNC(pgd) typedef pte_t *pgtable_t; =20 #define __pgprot(x) ((pgprot_t) { (x) } ) -#define __pgste(x) ((pgste_t) { (x) } ) #define __pte(x) ((pte_t) { (x) } ) #define __pmd(x) ((pmd_t) { (x) } ) #define __pud(x) ((pud_t) { (x) } ) diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgallo= c.h index a16e65072371..a5de9e61ea9e 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -27,10 +27,6 @@ unsigned long *page_table_alloc_noprof(struct mm_struct = *); #define page_table_alloc(...) 
alloc_hooks(page_table_alloc_noprof(__VA_ARG= S__)) void page_table_free(struct mm_struct *, unsigned long *); =20 -struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm); -#define page_table_alloc_pgste(...) alloc_hooks(page_table_alloc_pgste_nop= rof(__VA_ARGS__)) -void page_table_free_pgste(struct ptdesc *ptdesc); - static inline void crst_table_init(unsigned long *crst, unsigned long entr= y) { memset64((u64 *)crst, entry, _CRST_ENTRIES); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index b6ec18999c62..fc1915b8e379 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -413,28 +413,6 @@ void setup_protection_map(void); * SW-bits: y young, d dirty, r read, w write */ =20 -/* Page status table bits for virtualization */ -#define PGSTE_ACC_BITS 0xf000000000000000UL -#define PGSTE_FP_BIT 0x0800000000000000UL -#define PGSTE_PCL_BIT 0x0080000000000000UL -#define PGSTE_HR_BIT 0x0040000000000000UL -#define PGSTE_HC_BIT 0x0020000000000000UL -#define PGSTE_GR_BIT 0x0004000000000000UL -#define PGSTE_GC_BIT 0x0002000000000000UL -#define PGSTE_ST2_MASK 0x0000ffff00000000UL -#define PGSTE_UC_BIT 0x0000000000008000UL /* user dirty (migration) */ -#define PGSTE_IN_BIT 0x0000000000004000UL /* IPTE notify bit */ -#define PGSTE_VSIE_BIT 0x0000000000002000UL /* ref'd in a shadow table */ - -/* Guest Page State used for virtualization */ -#define _PGSTE_GPS_ZERO 0x0000000080000000UL -#define _PGSTE_GPS_NODAT 0x0000000040000000UL -#define _PGSTE_GPS_USAGE_MASK 0x0000000003000000UL -#define _PGSTE_GPS_USAGE_STABLE 0x0000000000000000UL -#define _PGSTE_GPS_USAGE_UNUSED 0x0000000001000000UL -#define _PGSTE_GPS_USAGE_POT_VOLATILE 0x0000000002000000UL -#define _PGSTE_GPS_USAGE_VOLATILE _PGSTE_GPS_USAGE_MASK - /* * A user page table pointer has the space-switch-event bit, the * private-space-control bit and the storage-alteration-event-control @@ -566,15 +544,6 @@ static inline bool mm_pmd_folded(struct mm_struct *mm) } #define mm_pmd_folded(mm) mm_pmd_folded(mm) =20 -static inline int mm_has_pgste(struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - if (unlikely(mm->context.has_pgste)) - return 1; -#endif - return 0; -} - static inline int mm_is_protected(struct mm_struct *mm) { #if IS_ENABLED(CONFIG_KVM) @@ -584,16 +553,6 @@ static inline int mm_is_protected(struct mm_struct *mm) return 0; } =20 -static inline pgste_t clear_pgste_bit(pgste_t pgste, unsigned long mask) -{ - return __pgste(pgste_val(pgste) & ~mask); -} - -static inline pgste_t set_pgste_bit(pgste_t pgste, unsigned long mask) -{ - return __pgste(pgste_val(pgste) | mask); -} - static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot) { return __pte(pte_val(pte) & ~pgprot_val(prot)); @@ -639,15 +598,6 @@ static inline int mm_forbids_zeropage(struct mm_struct= *mm) return 0; } =20 -static inline int mm_uses_skeys(struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - if (mm->context.uses_skeys) - return 1; -#endif - return 0; -} - static inline void csp(unsigned int *ptr, unsigned int old, unsigned int n= ew) { union register_pair r1 =3D { .even =3D old, .odd =3D new, }; @@ -1367,45 +1317,13 @@ static inline int ptep_set_access_flags(struct vm_a= rea_struct *vma, { if (pte_same(*ptep, entry)) return 0; - if (cpu_has_rdp() && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, en= try)) + if (cpu_has_rdp() && pte_allow_rdp(*ptep, entry)) ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry); else ptep_xchg_direct(vma->vm_mm, addr, ptep, entry); return 1; } =20 -/* - * Additional functions 
to handle KVM guest page tables - */ -void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t entry); -void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep= ); -int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr, - pte_t *ptep, int prot, unsigned long bit); -void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, - pte_t *ptep , int reset); -void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep); -int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr, - pte_t *sptep, pte_t *tptep, pte_t pte); -void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *p= tep); - -bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long address, - pte_t *ptep); -int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, bool nq); -int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, unsigned char *oldkey, - bool nq, bool mr, bool mc); -int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr); -int get_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char *key); - -int set_pgste_bits(struct mm_struct *mm, unsigned long addr, - unsigned long bits, unsigned long value); -int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgst= ep); -int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, - unsigned long *oldpte, unsigned long *oldpgste); - #define pgprot_writecombine pgprot_writecombine pgprot_t pgprot_writecombine(pgprot_t prot); =20 @@ -1420,23 +1338,12 @@ static inline void set_ptes(struct mm_struct *mm, u= nsigned long addr, { if (pte_present(entry)) entry =3D clear_pte_bit(entry, __pgprot(_PAGE_UNUSED)); - if (mm_has_pgste(mm)) { - for (;;) { - ptep_set_pte_at(mm, addr, ptep, entry); - if (--nr =3D=3D 0) - break; - ptep++; - entry =3D __pte(pte_val(entry) + PAGE_SIZE); - addr +=3D PAGE_SIZE; - } - } else { - for (;;) { - set_pte(ptep, entry); - if (--nr =3D=3D 0) - break; - ptep++; - entry =3D __pte(pte_val(entry) + PAGE_SIZE); - } + for (;;) { + set_pte(ptep, entry); + if (--nr =3D=3D 0) + break; + ptep++; + entry =3D __pte(pte_val(entry) + PAGE_SIZE); } } #define set_ptes set_ptes @@ -2037,18 +1944,4 @@ extern pte_t *vmem_get_alloc_pte(unsigned long addr,= bool alloc); #define pmd_pgtable(pmd) \ ((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE)) =20 -static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt) -{ - unsigned long *pgstes, res; - - pgstes =3D pgt + _PAGE_ENTRIES; - - res =3D (pgstes[0] & PGSTE_ST2_MASK) << 16; - res |=3D pgstes[1] & PGSTE_ST2_MASK; - res |=3D (pgstes[2] & PGSTE_ST2_MASK) >> 16; - res |=3D (pgstes[3] & PGSTE_ST2_MASK) >> 32; - - return res; -} - #endif /* _S390_PAGE_H */ diff --git a/arch/s390/kvm/dat.h b/arch/s390/kvm/dat.h index 4190a54224c0..21a096dec9d7 100644 --- a/arch/s390/kvm/dat.h +++ b/arch/s390/kvm/dat.h @@ -108,6 +108,7 @@ union pte { #define _PAGE_SD 0x002 =20 /* Needed as macro to perform atomic operations */ +#define PGSTE_PCL_BIT 0x0080000000000000UL /* PCL lock, HW bit */ #define PGSTE_CMMA_D_BIT 0x0000000000008000UL /* CMMA dirty soft-bit */ =20 enum pgste_gps_usage { diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index 72e8fa136af5..2597c1766a62 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -136,29 +136,6 @@ static inline pte_t __rste_to_pte(unsigned long rste) return __pte(pteval); } =20 -static void 
clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste) -{ - struct folio *folio; - unsigned long size, paddr; - - if (!mm_uses_skeys(mm) || - rste & _SEGMENT_ENTRY_INVALID) - return; - - if ((rste & _REGION_ENTRY_TYPE_MASK) =3D=3D _REGION_ENTRY_TYPE_R3) { - folio =3D page_folio(pud_page(__pud(rste))); - size =3D PUD_SIZE; - paddr =3D rste & PUD_MASK; - } else { - folio =3D page_folio(pmd_page(__pmd(rste))); - size =3D PMD_SIZE; - paddr =3D rste & PMD_MASK; - } - - if (!test_and_set_bit(PG_arch_1, &folio->flags.f)) - __storage_key_init_range(paddr, paddr + size); -} - void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { @@ -174,7 +151,6 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned l= ong addr, } else if (likely(pte_present(pte))) rste |=3D _SEGMENT_ENTRY_LARGE; =20 - clear_huge_pte_skeys(mm, rste); set_pte(ptep, __pte(rste)); } =20 diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c index 626fca116cd7..3dc2aebf8e6e 100644 --- a/arch/s390/mm/pgalloc.c +++ b/arch/s390/mm/pgalloc.c @@ -114,30 +114,6 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned = long end) return -ENOMEM; } =20 -#ifdef CONFIG_PGSTE - -struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm) -{ - struct ptdesc *ptdesc; - u64 *table; - - ptdesc =3D pagetable_alloc_noprof(GFP_KERNEL_ACCOUNT, 0); - if (ptdesc) { - table =3D (u64 *)ptdesc_address(ptdesc); - __arch_set_page_dat(table, 1); - memset64(table, _PAGE_INVALID, PTRS_PER_PTE); - memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE); - } - return ptdesc; -} - -void page_table_free_pgste(struct ptdesc *ptdesc) -{ - pagetable_free(ptdesc); -} - -#endif /* CONFIG_PGSTE */ - unsigned long *page_table_alloc_noprof(struct mm_struct *mm) { gfp_t gfp =3D GFP_KERNEL_ACCOUNT; diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 9cbd1552f918..ed39b1c6bed3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -115,171 +115,14 @@ static inline pte_t ptep_flush_lazy(struct mm_struct= *mm, return old; } =20 -static inline pgste_t pgste_get_lock(pte_t *ptep) -{ - unsigned long value =3D 0; -#ifdef CONFIG_PGSTE - unsigned long *ptr =3D (unsigned long *)(ptep + PTRS_PER_PTE); - - do { - value =3D __atomic64_or_barrier(PGSTE_PCL_BIT, ptr); - } while (value & PGSTE_PCL_BIT); - value |=3D PGSTE_PCL_BIT; -#endif - return __pgste(value); -} - -static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - barrier(); - WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~P= GSTE_PCL_BIT); -#endif -} - -static inline pgste_t pgste_get(pte_t *ptep) -{ - unsigned long pgste =3D 0; -#ifdef CONFIG_PGSTE - pgste =3D *(unsigned long *)(ptep + PTRS_PER_PTE); -#endif - return __pgste(pgste); -} - -static inline void pgste_set(pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - *(pgste_t *)(ptep + PTRS_PER_PTE) =3D pgste; -#endif -} - -static inline pgste_t pgste_update_all(pte_t pte, pgste_t pgste, - struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - unsigned long address, bits, skey; - - if (!mm_uses_skeys(mm) || pte_val(pte) & _PAGE_INVALID) - return pgste; - address =3D pte_val(pte) & PAGE_MASK; - skey =3D (unsigned long) page_get_storage_key(address); - bits =3D skey & (_PAGE_CHANGED | _PAGE_REFERENCED); - /* Transfer page changed & referenced bit to guest bits in pgste */ - pgste =3D set_pgste_bit(pgste, bits << 48); /* GR bit & GC bit */ - /* Copy page access key and fetch protection bit to pgste */ - pgste =3D clear_pgste_bit(pgste, PGSTE_ACC_BITS 
| PGSTE_FP_BIT); - pgste =3D set_pgste_bit(pgste, (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) <= < 56); -#endif - return pgste; - -} - -static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry, - struct mm_struct *mm) -{ -#ifdef CONFIG_PGSTE - unsigned long address; - unsigned long nkey; - - if (!mm_uses_skeys(mm) || pte_val(entry) & _PAGE_INVALID) - return; - VM_BUG_ON(!(pte_val(*ptep) & _PAGE_INVALID)); - address =3D pte_val(entry) & PAGE_MASK; - /* - * Set page access key and fetch protection bit from pgste. - * The guest C/R information is still in the PGSTE, set real - * key C/R to 0. - */ - nkey =3D (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56; - nkey |=3D (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48; - page_set_storage_key(address, nkey, 0); -#endif -} - -static inline pgste_t pgste_set_pte(pte_t *ptep, pgste_t pgste, pte_t entr= y) -{ -#ifdef CONFIG_PGSTE - if ((pte_val(entry) & _PAGE_PRESENT) && - (pte_val(entry) & _PAGE_WRITE) && - !(pte_val(entry) & _PAGE_INVALID)) { - if (!machine_has_esop()) { - /* - * Without enhanced suppression-on-protection force - * the dirty bit on for all writable ptes. - */ - entry =3D set_pte_bit(entry, __pgprot(_PAGE_DIRTY)); - entry =3D clear_pte_bit(entry, __pgprot(_PAGE_PROTECT)); - } - if (!(pte_val(entry) & _PAGE_PROTECT)) - /* This pte allows write access, set user-dirty */ - pgste =3D set_pgste_bit(pgste, PGSTE_UC_BIT); - } -#endif - set_pte(ptep, entry); - return pgste; -} - -static inline pgste_t pgste_pte_notify(struct mm_struct *mm, - unsigned long addr, - pte_t *ptep, pgste_t pgste) -{ -#ifdef CONFIG_PGSTE - unsigned long bits; - - bits =3D pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT); - if (bits) { - pgste =3D __pgste(pgste_val(pgste) ^ bits); - ptep_notify(mm, addr, ptep, bits); - } -#endif - return pgste; -} - -static inline pgste_t ptep_xchg_start(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) -{ - pgste_t pgste =3D __pgste(0); - - if (mm_has_pgste(mm)) { - pgste =3D pgste_get_lock(ptep); - pgste =3D pgste_pte_notify(mm, addr, ptep, pgste); - } - return pgste; -} - -static inline pte_t ptep_xchg_commit(struct mm_struct *mm, - unsigned long addr, pte_t *ptep, - pgste_t pgste, pte_t old, pte_t new) -{ - if (mm_has_pgste(mm)) { - if (pte_val(old) & _PAGE_INVALID) - pgste_set_key(ptep, pgste, new, mm); - if (pte_val(new) & _PAGE_INVALID) { - pgste =3D pgste_update_all(old, pgste, mm); - if ((pgste_val(pgste) & _PGSTE_GPS_USAGE_MASK) =3D=3D - _PGSTE_GPS_USAGE_UNUSED) - old =3D set_pte_bit(old, __pgprot(_PAGE_UNUSED)); - } - pgste =3D pgste_set_pte(ptep, pgste, new); - pgste_set_unlock(ptep, pgste); - } else { - set_pte(ptep, new); - } - return old; -} - pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t new) { - pgste_t pgste; pte_t old; - int nodat; =20 preempt_disable(); - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_direct(mm, addr, ptep, nodat); - old =3D ptep_xchg_commit(mm, addr, ptep, pgste, old, new); + old =3D ptep_flush_direct(mm, addr, ptep, 1); + set_pte(ptep, new); preempt_enable(); return old; } @@ -313,15 +156,11 @@ EXPORT_SYMBOL(ptep_reset_dat_prot); pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t new) { - pgste_t pgste; pte_t old; - int nodat; =20 preempt_disable(); - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_lazy(mm, addr, ptep, nodat); - old =3D 
ptep_xchg_commit(mm, addr, ptep, pgste, old, new); + old =3D ptep_flush_lazy(mm, addr, ptep, 1); + set_pte(ptep, new); preempt_enable(); return old; } @@ -330,43 +169,22 @@ EXPORT_SYMBOL(ptep_xchg_lazy); pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long add= r, pte_t *ptep) { - pgste_t pgste; - pte_t old; - int nodat; - struct mm_struct *mm =3D vma->vm_mm; - - pgste =3D ptep_xchg_start(mm, addr, ptep); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - old =3D ptep_flush_lazy(mm, addr, ptep, nodat); - if (mm_has_pgste(mm)) { - pgste =3D pgste_update_all(old, pgste, mm); - pgste_set(ptep, pgste); - } - return old; + preempt_disable(); + return ptep_flush_lazy(vma->vm_mm, addr, ptep, 1); } =20 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long add= r, pte_t *ptep, pte_t old_pte, pte_t pte) { - pgste_t pgste; - struct mm_struct *mm =3D vma->vm_mm; - - if (mm_has_pgste(mm)) { - pgste =3D pgste_get(ptep); - pgste_set_key(ptep, pgste, pte, mm); - pgste =3D pgste_set_pte(ptep, pgste, pte); - pgste_set_unlock(ptep, pgste); - } else { - set_pte(ptep, pte); - } + set_pte(ptep, pte); + preempt_enable(); } =20 static inline void pmdp_idte_local(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { if (machine_has_tlb_guest()) - __pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, - mm->context.asce, IDTE_LOCAL); + __pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE, mm->context.asce, = IDTE_LOCAL); else __pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL); } @@ -422,40 +240,6 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *= mm, return old; } =20 -#ifdef CONFIG_PGSTE -static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pm= dp) -{ - struct vm_area_struct *vma; - pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - - /* We need a valid VMA, otherwise this is clearly a fault. */ - vma =3D vma_lookup(mm, addr); - if (!vma) - return -EFAULT; - - pgd =3D pgd_offset(mm, addr); - if (!pgd_present(*pgd)) - return -ENOENT; - - p4d =3D p4d_offset(pgd, addr); - if (!p4d_present(*p4d)) - return -ENOENT; - - pud =3D pud_offset(p4d, addr); - if (!pud_present(*pud)) - return -ENOENT; - - /* Large PUDs are not supported yet. 
*/ - if (pud_leaf(*pud)) - return -EFAULT; - - *pmdp =3D pmd_offset(pud, addr); - return 0; -} -#endif - pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t new) { @@ -579,598 +363,3 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struc= t *mm, pmd_t *pmdp) return pgtable; } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - -#ifdef CONFIG_PGSTE -void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t entry) -{ - pgste_t pgste; - - /* the mm_has_pgste() check is done in set_pte_at() */ - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgste =3D clear_pgste_bit(pgste, _PGSTE_GPS_ZERO); - pgste_set_key(ptep, pgste, entry, mm); - pgste =3D pgste_set_pte(ptep, pgste, entry); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -{ - pgste_t pgste; - - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgste =3D set_pgste_bit(pgste, PGSTE_IN_BIT); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -/** - * ptep_force_prot - change access rights of a locked pte - * @mm: pointer to the process mm_struct - * @addr: virtual address in the guest address space - * @ptep: pointer to the page table entry - * @prot: indicates guest access rights: PROT_NONE, PROT_READ or PROT_WRITE - * @bit: pgste bit to set (e.g. for notification) - * - * Returns 0 if the access rights were changed and -EAGAIN if the current - * and requested access rights are incompatible. - */ -int ptep_force_prot(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, int prot, unsigned long bit) -{ - pte_t entry; - pgste_t pgste; - int pte_i, pte_p, nodat; - - pgste =3D pgste_get_lock(ptep); - entry =3D *ptep; - /* Check pte entry after all locks have been acquired */ - pte_i =3D pte_val(entry) & _PAGE_INVALID; - pte_p =3D pte_val(entry) & _PAGE_PROTECT; - if ((pte_i && (prot !=3D PROT_NONE)) || - (pte_p && (prot & PROT_WRITE))) { - pgste_set_unlock(ptep, pgste); - return -EAGAIN; - } - /* Change access rights and set pgste bit */ - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - if (prot =3D=3D PROT_NONE && !pte_i) { - ptep_flush_direct(mm, addr, ptep, nodat); - pgste =3D pgste_update_all(entry, pgste, mm); - entry =3D set_pte_bit(entry, __pgprot(_PAGE_INVALID)); - } - if (prot =3D=3D PROT_READ && !pte_p) { - ptep_flush_direct(mm, addr, ptep, nodat); - entry =3D clear_pte_bit(entry, __pgprot(_PAGE_INVALID)); - entry =3D set_pte_bit(entry, __pgprot(_PAGE_PROTECT)); - } - pgste =3D set_pgste_bit(pgste, bit); - pgste =3D pgste_set_pte(ptep, pgste, entry); - pgste_set_unlock(ptep, pgste); - return 0; -} - -int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr, - pte_t *sptep, pte_t *tptep, pte_t pte) -{ - pgste_t spgste, tpgste; - pte_t spte, tpte; - int rc =3D -EAGAIN; - - if (!(pte_val(*tptep) & _PAGE_INVALID)) - return 0; /* already shadowed */ - spgste =3D pgste_get_lock(sptep); - spte =3D *sptep; - if (!(pte_val(spte) & _PAGE_INVALID) && - !((pte_val(spte) & _PAGE_PROTECT) && - !(pte_val(pte) & _PAGE_PROTECT))) { - spgste =3D set_pgste_bit(spgste, PGSTE_VSIE_BIT); - tpgste =3D pgste_get_lock(tptep); - tpte =3D __pte((pte_val(spte) & PAGE_MASK) | - (pte_val(pte) & _PAGE_PROTECT)); - /* don't touch the storage key - it belongs to parent pgste */ - tpgste =3D pgste_set_pte(tptep, tpgste, tpte); - pgste_set_unlock(tptep, tpgste); - rc =3D 1; - } - pgste_set_unlock(sptep, spgste); - return rc; -} - -void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t 
*p= tep) -{ - pgste_t pgste; - int nodat; - - pgste =3D pgste_get_lock(ptep); - /* notifier is called by the caller */ - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - ptep_flush_direct(mm, saddr, ptep, nodat); - /* don't touch the storage key - it belongs to parent pgste */ - pgste =3D pgste_set_pte(ptep, pgste, __pte(_PAGE_INVALID)); - pgste_set_unlock(ptep, pgste); -} - -static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) -{ - if (!non_swap_entry(entry)) - dec_mm_counter(mm, MM_SWAPENTS); - else if (is_migration_entry(entry)) { - struct folio *folio =3D pfn_swap_entry_folio(entry); - - dec_mm_counter(mm, mm_counter(folio)); - } - free_swap_and_cache(entry); -} - -void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, int reset) -{ - unsigned long pgstev; - pgste_t pgste; - pte_t pte; - - /* Zap unused and logically-zero pages */ - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgstev =3D pgste_val(pgste); - pte =3D *ptep; - if (!reset && pte_swap(pte) && - ((pgstev & _PGSTE_GPS_USAGE_MASK) =3D=3D _PGSTE_GPS_USAGE_UNUSED || - (pgstev & _PGSTE_GPS_ZERO))) { - ptep_zap_swap_entry(mm, pte_to_swp_entry(pte)); - pte_clear(mm, addr, ptep); - } - if (reset) - pgste =3D clear_pgste_bit(pgste, _PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODA= T); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -{ - unsigned long ptev; - pgste_t pgste; - - /* Clear storage key ACC and F, but set R/C */ - preempt_disable(); - pgste =3D pgste_get_lock(ptep); - pgste =3D clear_pgste_bit(pgste, PGSTE_ACC_BITS | PGSTE_FP_BIT); - pgste =3D set_pgste_bit(pgste, PGSTE_GR_BIT | PGSTE_GC_BIT); - ptev =3D pte_val(*ptep); - if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE)) - page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 0); - pgste_set_unlock(ptep, pgste); - preempt_enable(); -} - -/* - * Test and reset if a guest page is dirty - */ -bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) -{ - pgste_t pgste; - pte_t pte; - bool dirty; - int nodat; - - pgste =3D pgste_get_lock(ptep); - dirty =3D !!(pgste_val(pgste) & PGSTE_UC_BIT); - pgste =3D clear_pgste_bit(pgste, PGSTE_UC_BIT); - pte =3D *ptep; - if (dirty && (pte_val(pte) & _PAGE_PRESENT)) { - pgste =3D pgste_pte_notify(mm, addr, ptep, pgste); - nodat =3D !!(pgste_val(pgste) & _PGSTE_GPS_NODAT); - ptep_ipte_global(mm, addr, ptep, nodat); - if (machine_has_esop() || !(pte_val(pte) & _PAGE_WRITE)) - pte =3D set_pte_bit(pte, __pgprot(_PAGE_PROTECT)); - else - pte =3D set_pte_bit(pte, __pgprot(_PAGE_INVALID)); - set_pte(ptep, pte); - } - pgste_set_unlock(ptep, pgste); - return dirty; -} -EXPORT_SYMBOL_GPL(ptep_test_and_clear_uc); - -int set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, bool nq) -{ - unsigned long keyul, paddr; - spinlock_t *ptl; - pgste_t old, new; - pmd_t *pmdp; - pte_t *ptep; - - /* - * If we don't have a PTE table and if there is no huge page mapped, - * we can ignore attempts to set the key to 0, because it already is 0. - */ - switch (pmd_lookup(mm, addr, &pmdp)) { - case -ENOENT: - return key ? -EFAULT : 0; - case 0: - break; - default: - return -EFAULT; - } -again: - ptl =3D pmd_lock(mm, pmdp); - if (!pmd_present(*pmdp)) { - spin_unlock(ptl); - return key ? -EFAULT : 0; - } - - if (pmd_leaf(*pmdp)) { - paddr =3D pmd_val(*pmdp) & HPAGE_MASK; - paddr |=3D addr & ~HPAGE_MASK; - /* - * Huge pmds need quiescing operations, they are - * always mapped. 
- */ - page_set_storage_key(paddr, key, 1); - spin_unlock(ptl); - return 0; - } - spin_unlock(ptl); - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - new =3D old =3D pgste_get_lock(ptep); - new =3D clear_pgste_bit(new, PGSTE_GR_BIT | PGSTE_GC_BIT | - PGSTE_ACC_BITS | PGSTE_FP_BIT); - keyul =3D (unsigned long) key; - new =3D set_pgste_bit(new, (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) <= < 48); - new =3D set_pgste_bit(new, (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 5= 6); - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - unsigned long bits, skey; - - paddr =3D pte_val(*ptep) & PAGE_MASK; - skey =3D (unsigned long) page_get_storage_key(paddr); - bits =3D skey & (_PAGE_CHANGED | _PAGE_REFERENCED); - skey =3D key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); - /* Set storage key ACC and FP */ - page_set_storage_key(paddr, skey, !nq); - /* Merge host changed & referenced into pgste */ - new =3D set_pgste_bit(new, bits << 52); - } - /* changing the guest storage key is considered a change of the page */ - if ((pgste_val(new) ^ pgste_val(old)) & - (PGSTE_ACC_BITS | PGSTE_FP_BIT | PGSTE_GR_BIT | PGSTE_GC_BIT)) - new =3D set_pgste_bit(new, PGSTE_UC_BIT); - - pgste_set_unlock(ptep, new); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(set_guest_storage_key); - -/* - * Conditionally set a guest storage key (handling csske). - * oldkey will be updated when either mr or mc is set and a pointer is giv= en. - * - * Returns 0 if a guests storage key update wasn't necessary, 1 if the gue= st - * storage key was updated and -EFAULT on access errors. - */ -int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char key, unsigned char *oldkey, - bool nq, bool mr, bool mc) -{ - unsigned char tmp, mask =3D _PAGE_ACC_BITS | _PAGE_FP_BIT; - int rc; - - /* we can drop the pgste lock between getting and setting the key */ - if (mr | mc) { - rc =3D get_guest_storage_key(current->mm, addr, &tmp); - if (rc) - return rc; - if (oldkey) - *oldkey =3D tmp; - if (!mr) - mask |=3D _PAGE_REFERENCED; - if (!mc) - mask |=3D _PAGE_CHANGED; - if (!((tmp ^ key) & mask)) - return 0; - } - rc =3D set_guest_storage_key(current->mm, addr, key, nq); - return rc < 0 ? rc : 1; -} -EXPORT_SYMBOL(cond_set_guest_storage_key); - -/* - * Reset a guest reference bit (rrbe), returning the reference and changed= bit. - * - * Returns < 0 in case of error, otherwise the cc to be reported to the gu= est. - */ -int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr) -{ - spinlock_t *ptl; - unsigned long paddr; - pgste_t old, new; - pmd_t *pmdp; - pte_t *ptep; - int cc =3D 0; - - /* - * If we don't have a PTE table and if there is no huge page mapped, - * the storage key is 0 and there is nothing for us to do. 
- */ - switch (pmd_lookup(mm, addr, &pmdp)) { - case -ENOENT: - return 0; - case 0: - break; - default: - return -EFAULT; - } -again: - ptl =3D pmd_lock(mm, pmdp); - if (!pmd_present(*pmdp)) { - spin_unlock(ptl); - return 0; - } - - if (pmd_leaf(*pmdp)) { - paddr =3D pmd_val(*pmdp) & HPAGE_MASK; - paddr |=3D addr & ~HPAGE_MASK; - cc =3D page_reset_referenced(paddr); - spin_unlock(ptl); - return cc; - } - spin_unlock(ptl); - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - new =3D old =3D pgste_get_lock(ptep); - /* Reset guest reference bit only */ - new =3D clear_pgste_bit(new, PGSTE_GR_BIT); - - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - paddr =3D pte_val(*ptep) & PAGE_MASK; - cc =3D page_reset_referenced(paddr); - /* Merge real referenced bit into host-set */ - new =3D set_pgste_bit(new, ((unsigned long)cc << 53) & PGSTE_HR_BIT); - } - /* Reflect guest's logical view, not physical */ - cc |=3D (pgste_val(old) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 49; - /* Changing the guest storage key is considered a change of the page */ - if ((pgste_val(new) ^ pgste_val(old)) & PGSTE_GR_BIT) - new =3D set_pgste_bit(new, PGSTE_UC_BIT); - - pgste_set_unlock(ptep, new); - pte_unmap_unlock(ptep, ptl); - return cc; -} -EXPORT_SYMBOL(reset_guest_reference_bit); - -int get_guest_storage_key(struct mm_struct *mm, unsigned long addr, - unsigned char *key) -{ - unsigned long paddr; - spinlock_t *ptl; - pgste_t pgste; - pmd_t *pmdp; - pte_t *ptep; - - /* - * If we don't have a PTE table and if there is no huge page mapped, - * the storage key is 0. - */ - *key =3D 0; - - switch (pmd_lookup(mm, addr, &pmdp)) { - case -ENOENT: - return 0; - case 0: - break; - default: - return -EFAULT; - } -again: - ptl =3D pmd_lock(mm, pmdp); - if (!pmd_present(*pmdp)) { - spin_unlock(ptl); - return 0; - } - - if (pmd_leaf(*pmdp)) { - paddr =3D pmd_val(*pmdp) & HPAGE_MASK; - paddr |=3D addr & ~HPAGE_MASK; - *key =3D page_get_storage_key(paddr); - spin_unlock(ptl); - return 0; - } - spin_unlock(ptl); - - ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); - if (!ptep) - goto again; - pgste =3D pgste_get_lock(ptep); - *key =3D (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56; - paddr =3D pte_val(*ptep) & PAGE_MASK; - if (!(pte_val(*ptep) & _PAGE_INVALID)) - *key =3D page_get_storage_key(paddr); - /* Reflect guest's logical view, not physical */ - *key |=3D (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48; - pgste_set_unlock(ptep, pgste); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(get_guest_storage_key); - -/** - * pgste_perform_essa - perform ESSA actions on the PGSTE. - * @mm: the memory context. It must have PGSTEs, no check is performed her= e! - * @hva: the host virtual address of the page whose PGSTE is to be process= ed - * @orc: the specific action to perform, see the ESSA_SET_* macros. - * @oldpte: the PTE will be saved there if the pointer is not NULL. - * @oldpgste: the old PGSTE will be saved there if the pointer is not NULL. - * - * Return: 1 if the page is to be added to the CBRL, otherwise 0, - * or < 0 in case of error. -EINVAL is returned for invalid values - * of orc, -EFAULT for invalid addresses. 
- */ -int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, - unsigned long *oldpte, unsigned long *oldpgste) -{ - struct vm_area_struct *vma; - unsigned long pgstev; - spinlock_t *ptl; - pgste_t pgste; - pte_t *ptep; - int res =3D 0; - - WARN_ON_ONCE(orc > ESSA_MAX); - if (unlikely(orc > ESSA_MAX)) - return -EINVAL; - - vma =3D vma_lookup(mm, hva); - if (!vma || is_vm_hugetlb_page(vma)) - return -EFAULT; - ptep =3D get_locked_pte(mm, hva, &ptl); - if (unlikely(!ptep)) - return -EFAULT; - pgste =3D pgste_get_lock(ptep); - pgstev =3D pgste_val(pgste); - if (oldpte) - *oldpte =3D pte_val(*ptep); - if (oldpgste) - *oldpgste =3D pgstev; - - switch (orc) { - case ESSA_GET_STATE: - break; - case ESSA_SET_STABLE: - pgstev &=3D ~(_PGSTE_GPS_USAGE_MASK | _PGSTE_GPS_NODAT); - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - break; - case ESSA_SET_UNUSED: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_UNUSED; - if (pte_val(*ptep) & _PAGE_INVALID) - res =3D 1; - break; - case ESSA_SET_VOLATILE: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - if (pte_val(*ptep) & _PAGE_INVALID) - res =3D 1; - break; - case ESSA_SET_POT_VOLATILE: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - pgstev |=3D _PGSTE_GPS_USAGE_POT_VOLATILE; - break; - } - if (pgstev & _PGSTE_GPS_ZERO) { - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - break; - } - if (!(pgstev & PGSTE_GC_BIT)) { - pgstev |=3D _PGSTE_GPS_USAGE_VOLATILE; - res =3D 1; - break; - } - break; - case ESSA_SET_STABLE_RESIDENT: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - /* - * Since the resident state can go away any time after this - * call, we will not make this page resident. We can revisit - * this decision if a guest will ever start using this. - */ - break; - case ESSA_SET_STABLE_IF_RESIDENT: - if (!(pte_val(*ptep) & _PAGE_INVALID)) { - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE; - } - break; - case ESSA_SET_STABLE_NODAT: - pgstev &=3D ~_PGSTE_GPS_USAGE_MASK; - pgstev |=3D _PGSTE_GPS_USAGE_STABLE | _PGSTE_GPS_NODAT; - break; - default: - /* we should never get here! */ - break; - } - /* If we are discarding a page, set it to logical zero */ - if (res) - pgstev |=3D _PGSTE_GPS_ZERO; - - pgste =3D __pgste(pgstev); - pgste_set_unlock(ptep, pgste); - pte_unmap_unlock(ptep, ptl); - return res; -} -EXPORT_SYMBOL(pgste_perform_essa); - -/** - * set_pgste_bits - set specific PGSTE bits. - * @mm: the memory context. It must have PGSTEs, no check is performed her= e! - * @hva: the host virtual address of the page whose PGSTE is to be process= ed - * @bits: a bitmask representing the bits that will be touched - * @value: the values of the bits to be written. Only the bits in the mask - * will be written. - * - * Return: 0 on success, < 0 in case of error. - */ -int set_pgste_bits(struct mm_struct *mm, unsigned long hva, - unsigned long bits, unsigned long value) -{ - struct vm_area_struct *vma; - spinlock_t *ptl; - pgste_t new; - pte_t *ptep; - - vma =3D vma_lookup(mm, hva); - if (!vma || is_vm_hugetlb_page(vma)) - return -EFAULT; - ptep =3D get_locked_pte(mm, hva, &ptl); - if (unlikely(!ptep)) - return -EFAULT; - new =3D pgste_get_lock(ptep); - - new =3D clear_pgste_bit(new, bits); - new =3D set_pgste_bit(new, value & bits); - - pgste_set_unlock(ptep, new); - pte_unmap_unlock(ptep, ptl); - return 0; -} -EXPORT_SYMBOL(set_pgste_bits); - -/** - * get_pgste - get the current PGSTE for the given address. 
- * @mm: the memory context. It must have PGSTEs, no check is performed here!
- * @hva: the host virtual address of the page whose PGSTE is to be processed
- * @pgstep: will be written with the current PGSTE for the given address.
- *
- * Return: 0 on success, < 0 in case of error.
- */
-int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep)
-{
-	struct vm_area_struct *vma;
-	spinlock_t *ptl;
-	pte_t *ptep;
-
-	vma = vma_lookup(mm, hva);
-	if (!vma || is_vm_hugetlb_page(vma))
-		return -EFAULT;
-	ptep = get_locked_pte(mm, hva, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-	*pgstep = pgste_val(pgste_get(ptep));
-	pte_unmap_unlock(ptep, ptl);
-	return 0;
-}
-EXPORT_SYMBOL(get_pgste);
-#endif
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0a17225ffe41 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -342,15 +342,6 @@ int hugepage_madvise(struct vm_area_struct *vma,
 {
 	switch (advice) {
 	case MADV_HUGEPAGE:
-#ifdef CONFIG_S390
-		/*
-		 * qemu blindly sets MADV_HUGEPAGE on all allocations, but s390
-		 * can't handle this properly after s390_enable_sie, so we simply
-		 * ignore the madvise to prevent qemu from causing a SIGSEGV.
-		 */
-		if (mm_has_pgste(vma->vm_mm))
-			return 0;
-#endif
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
 		/*
-- 
2.51.1
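
For readers following the key handling being moved out of mm/pgtable.c:
the guest-visible storage key returned by get_guest_storage_key() above
is a merge of the PGSTE soft bits and the real hardware key. The
following is a minimal, self-contained user-space sketch of that merge
rule only; merge_guest_key() is a hypothetical helper, not kernel code,
and the PGSTE_* constants are copied here purely for illustration:

#include <stdio.h>

#define PGSTE_ACC_BITS 0xf000000000000000UL	/* access-control bits */
#define PGSTE_FP_BIT   0x0800000000000000UL	/* fetch-protection bit */
#define PGSTE_GR_BIT   0x0004000000000000UL	/* guest referenced */
#define PGSTE_GC_BIT   0x0002000000000000UL	/* guest changed */

/*
 * Mirror of the merge in get_guest_storage_key(): start from the
 * software ACC+F copy kept in the PGSTE, prefer the real hardware key
 * when the page is mapped, then OR in the guest's logical R and C bits.
 */
static unsigned char merge_guest_key(unsigned long pgste, int pte_valid,
				     unsigned char hw_key)
{
	unsigned char key;

	key = (pgste & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
	if (pte_valid)		/* i.e. !(pte_val(*ptep) & _PAGE_INVALID) */
		key = hw_key;	/* what page_get_storage_key() would return */
	key |= (pgste & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
	return key;
}

int main(void)
{
	/* ACC=3 in the PGSTE, page resident with hardware key 0x34, and a
	 * guest changed bit remembered in the PGSTE: yields 0x36. */
	unsigned long pgste = (0x3UL << 60) | PGSTE_GC_BIT;

	printf("key=0x%02x\n", merge_guest_key(pgste, 1, 0x34));
	return 0;
}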
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 22/23] KVM: s390: Enable 1M pages for gmap
Date: Thu, 20 Nov 2025 18:15:43 +0100
Message-ID: <20251120171544.96841-23-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

While userspace is allowed to have pages of any size, the new gmap would
always use 4k pages to back the guest. Enable 1M pages for gmap.

This allows 1M pages to be used to back a guest when userspace is using
1M pages for the corresponding addresses (e.g. THP or hugetlbfs).

Remove the limitation that disallowed having nested guests and hugepages
at the same time.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/gmap.c     | 2 +-
 arch/s390/kvm/kvm-s390.c | 6 +-----
 arch/s390/kvm/pv.c       | 3 +++
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 502012c0dfad..ef4190f56ae9 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -591,7 +591,7 @@ static inline bool gmap_2g_allowed(struct gmap *gmap, gfn_t gfn)
 
 static inline bool gmap_1m_allowed(struct gmap *gmap, gfn_t gfn)
 {
-	return false;
+	return gmap->allow_hpage_1m;
 }
 
 int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *f)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c8662177c63c..b7dc1d601fb8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -849,6 +849,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 			r = -EINVAL;
 		else {
 			r = 0;
+			WRITE_ONCE(kvm->arch.gmap->allow_hpage_1m, 1);
 			/*
 			 * We might have to create fake 4k page
 			 * tables. To avoid that the hardware works on
@@ -5837,11 +5838,6 @@ static int __init kvm_s390_init(void)
 		return -ENODEV;
 	}
 
-	if (nested && hpage) {
-		pr_info("A KVM host that supports nesting cannot back its KVM guests with huge pages\n");
-		return -EINVAL;
-	}
-
 	for (i = 0; i < 16; i++)
 		kvm_s390_fac_base[i] |= stfle_fac_list[i] & nonhyp_mask(i);
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index d8a5c7b91148..8ea5f8d7e714 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -621,6 +621,9 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	uvcb.flags.ap_allow_instr = kvm->arch.model.uv_feat_guest.ap;
 	uvcb.flags.ap_instr_intr = kvm->arch.model.uv_feat_guest.ap_intr;
 
+	WRITE_ONCE(kvm->arch.gmap->allow_hpage_1m, 0);
+	gmap_split_huge_pages(kvm->arch.gmap);
+
 	cc = uv_call_sched(0, (u64)&uvcb);
 	*rc = uvcb.header.rc;
 	*rrc = uvcb.header.rrc;
-- 
2.51.1
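
The WRITE_ONCE() above lands in kvm_vm_ioctl_enable_cap(), so 1M backing
stays strictly opt-in. Assuming it extends the existing
KVM_CAP_S390_HPAGE_1M handler (the capability that has historically
gated 1M host backing), a VMM would opt in roughly as in this sketch
(error handling reduced to the bone):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_enable_cap cap;
	int kvm_fd, vm_fd;

	kvm_fd = open("/dev/kvm", O_RDWR);
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	if (kvm_fd < 0 || vm_fd < 0)
		return 1;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_S390_HPAGE_1M;
	/* Must happen before KVM_CREATE_VCPU; the handler rejects the
	 * capability once vCPUs exist. */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
		perror("KVM_ENABLE_CAP(KVM_CAP_S390_HPAGE_1M)");
	return 0;
}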
From nobody Tue Dec 2 01:51:36 2025
From: Claudio Imbrenda
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, frankja@linux.ibm.com, nsg@linux.ibm.com,
	nrb@linux.ibm.com, seiden@linux.ibm.com, gra@linux.ibm.com,
	schlameuss@linux.ibm.com, hca@linux.ibm.com, svens@linux.ibm.com,
	agordeev@linux.ibm.com, gor@linux.ibm.com, david@redhat.com,
	gerald.schaefer@linux.ibm.com
Subject: [PATCH v4 23/23] KVM: s390: Storage key manipulation IOCTL
Date: Thu, 20 Nov 2025 18:15:44 +0100
Message-ID: <20251120171544.96841-24-imbrenda@linux.ibm.com>
In-Reply-To: <20251120171544.96841-1-imbrenda@linux.ibm.com>
References: <20251120171544.96841-1-imbrenda@linux.ibm.com>
Content-Type: text/plain; charset="utf-8"

Add a new IOCTL to allow userspace to manipulate storage keys directly.

This will make it easier to write selftests related to storage keys.

Signed-off-by: Claudio Imbrenda
---
 arch/s390/kvm/kvm-s390.c | 57 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h | 10 +++++++
 2 files changed, 67 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b7dc1d601fb8..e8e3c8cf75fa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -555,6 +555,37 @@ static void __kvm_s390_exit(void)
 	debug_unregister(kvm_s390_dbf_uv);
 }
 
+static int kvm_s390_keyop(struct kvm_s390_mmu_cache *mc, struct kvm *kvm, int op,
+			  unsigned long addr, union skey skey)
+{
+	union asce asce = kvm->arch.gmap->asce;
+	gfn_t gfn = gpa_to_gfn(addr);
+	int r;
+
+	guard(read_lock)(&kvm->mmu_lock);
+
+	switch (op) {
+	case KVM_S390_KEYOP_SSKE:
+		r = dat_cond_set_storage_key(mc, asce, gfn, skey, &skey, 0, 0, 0);
+		if (r >= 0)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_ISKE:
+		r = dat_get_storage_key(asce, gfn, &skey);
+		if (!r)
+			return skey.skey;
+		break;
+	case KVM_S390_KEYOP_RRBE:
+		r = dat_reset_reference_bit(asce, gfn);
+		if (r > 0)
+			return r << 1;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return r;
+}
+
 /* Section: device related */
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
@@ -2930,6 +2961,32 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			r = -EFAULT;
 		break;
 	}
+	case KVM_S390_KEYOP: {
+		struct kvm_s390_mmu_cache *mc;
+		struct kvm_s390_keyop kop;
+		union skey skey;
+
+		if (copy_from_user(&kop, argp, sizeof(kop))) {
+			r = -EFAULT;
+			break;
+		}
+		skey.skey = kop.key;
+
+		mc = kvm_s390_new_mmu_cache();
+		if (!mc)
+			return -ENOMEM;
+
+		r = kvm_s390_keyop(mc, kvm, kop.operation, kop.user_addr, skey);
+		kvm_s390_free_mmu_cache(mc);
+		if (r < 0)
+			break;
+
+		kop.key = r;
+		r = 0;
+		if (copy_to_user(argp, &kop, sizeof(kop)))
+			r = -EFAULT;
+		break;
+	}
 	case KVM_S390_ZPCI_OP: {
 		struct kvm_s390_zpci_op args;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52f6000ab020..402098d20134 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1208,6 +1208,15 @@ struct kvm_vfio_spapr_tce {
 	__s32	tablefd;
 };
 
+#define KVM_S390_KEYOP_SSKE	0x01
+#define KVM_S390_KEYOP_ISKE	0x02
+#define KVM_S390_KEYOP_RRBE	0x03
+struct kvm_s390_keyop {
+	__u64 user_addr;
+	__u8 key;
+	__u8 operation;
+};
+
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
@@ -1227,6 +1236,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_S390_UCAS_MAP        _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP      _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
 #define KVM_S390_VCPU_FAULT	 _IOW(KVMIO, 0x52, unsigned long)
+#define KVM_S390_KEYOP		 _IOWR(KVMIO, 0x53, struct kvm_s390_keyop)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO, 0x60)
-- 
2.51.1
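
For completeness, a sketch of how a selftest might drive the new ioctl,
based only on the uapi added above. It assumes headers from a kernel
with this series applied and a fully set up VM fd whose memory covers
gaddr; note that the handler feeds kop.user_addr through gpa_to_gfn(),
so the value is interpreted as a guest address:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int keyop(int vm_fd, __u8 op, __u64 gaddr, __u8 *key)
{
	struct kvm_s390_keyop kop;

	memset(&kop, 0, sizeof(kop));
	kop.user_addr = gaddr;
	kop.operation = op;
	kop.key = key ? *key : 0;

	if (ioctl(vm_fd, KVM_S390_KEYOP, &kop))
		return -1;
	if (key)	/* SSKE/ISKE report a key here, RRBE reports cc << 1 */
		*key = kop.key;
	return 0;
}

/* Example: set ACC=2 plus the fetch-protection bit, then read it back. */
void demo(int vm_fd, __u64 gaddr)
{
	__u8 key = (2 << 4) | 0x08;

	if (!keyop(vm_fd, KVM_S390_KEYOP_SSKE, gaddr, &key) &&
	    !keyop(vm_fd, KVM_S390_KEYOP_ISKE, gaddr, &key))
		printf("storage key at 0x%llx: 0x%02x\n",
		       (unsigned long long)gaddr, key);
}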