From nobody Sat Jun 20 11:48:37 2026
From: Alexander Gordeev
To: Gerald Schaefer, Heiko Carstens, Christian Borntraeger, Vasily Gorbik, Claudio Imbrenda
Cc: linux-s390@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kevin Brodsky, David Hildenbrand
Subject: [PATCH -next v4 1/4] mm: Make lazy MMU mode context-aware
Date: Thu, 18 Jun 2026 16:47:25 +0200
Message-ID: <74ca0412fd60e2de2186c9712081565e3bcc07e6.1781789772.git.agordeev@linux.ibm.com>
X-Mailer: git-send-email 2.53.0

Lazy MMU mode is assumed to be context-independent, in the sense that
it does not need any additional information while operating. However,
the s390 architecture benefits from knowing the exact page table
entries being modified.

Introduce lazy_mmu_mode_enable_with_ptes(), which is provided with the
process address space and the page table being operated on. This
information is required to enable s390-specific optimizations.

The function takes parameters that are typically passed to page-table
level walkers, which implies that the span of PTEs never crosses a page
table boundary.

Architectures that do not require such information simply do not need
to define the arch_enter_lazy_mmu_mode_with_ptes() callback.
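As a rough illustration of the opt-in mechanism described above, the following user-space sketch mimics the `#ifndef` fallback and the nesting counter. The `demo_` names, the stub types and the counters are assumptions made for this example; it is not the kernel code, and the pause and interrupt checks are left out.

```c
#include <assert.h>

/* Stand-ins for kernel types; hypothetical, for illustration only. */
struct mm_struct { int unused; };
typedef unsigned long pte_t;

static int demo_plain_enters;	/* calls that reached the context-free hook */
static int demo_enable_count;	/* nesting depth, modelled on enable_count */

static void arch_enter_lazy_mmu_mode(void)
{
	demo_plain_enters++;
}

/*
 * An architecture that wants the hint would provide its own function and
 * "#define arch_enter_lazy_mmu_mode_with_ptes arch_enter_lazy_mmu_mode_with_ptes".
 * Without that define, the generic fallback below simply drops the hint.
 */
#ifndef arch_enter_lazy_mmu_mode_with_ptes
static inline void arch_enter_lazy_mmu_mode_with_ptes(struct mm_struct *mm,
		unsigned long addr, unsigned long end, pte_t *ptep)
{
	(void)mm; (void)addr; (void)end; (void)ptep;
	arch_enter_lazy_mmu_mode();
}
#endif

/* Simplified: only the outermost section reaches the arch hook. */
static void demo_enable_with_ptes(struct mm_struct *mm, unsigned long addr,
				  unsigned long end, pte_t *ptep)
{
	if (demo_enable_count++ == 0)
		arch_enter_lazy_mmu_mode_with_ptes(mm, addr, end, ptep);
}

/* Leaves one nesting level; must pair with demo_enable_with_ptes(). */
static void demo_disable(void)
{
	assert(demo_enable_count > 0);
	demo_enable_count--;
}
```

In this sketch a nested demo_enable_with_ptes() call only bumps the counter, so the hint of the outermost section is the one an architecture would see.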
Reviewed-by: Kevin Brodsky
Acked-by: David Hildenbrand (Arm)
Signed-off-by: Alexander Gordeev
---
 fs/proc/task_mmu.c      |  2 +-
 include/linux/pgtable.h | 46 +++++++++++++++++++++++++++++++++++++++++
 mm/madvise.c            |  8 +++----
 mm/memory.c             |  8 +++----
 mm/mprotect.c           |  2 +-
 mm/mremap.c             |  2 +-
 mm/vmalloc.c            |  6 +++---
 7 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d32408f7cd5e..750f6095147f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2842,7 +2842,7 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 		return 0;
 	}
 
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(vma->vm_mm, start, end, start_pte);
 
 	if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
 		/* Fast path for performing exclusive WP */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2981e386da7b..cc85daf30739 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -271,6 +271,50 @@ static inline void lazy_mmu_mode_enable(void)
 		arch_enter_lazy_mmu_mode();
 }
 
+#ifndef arch_enter_lazy_mmu_mode_with_ptes
+static inline void arch_enter_lazy_mmu_mode_with_ptes(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, pte_t *ptep)
+{
+	arch_enter_lazy_mmu_mode();
+}
+#endif
+
+/**
+ * lazy_mmu_mode_enable_with_ptes() - Enable the lazy MMU mode with a speedup hint.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Start address of the range.
+ * @end: End address of the range.
+ * @ptep: Page table pointer for the first entry.
+ *
+ * Enters a new lazy MMU mode section; if the mode was not already enabled,
+ * enables it and calls arch_enter_lazy_mmu_mode_with_ptes().
+ *
+ * PTEs that fall within the specified range might observe update speedups.
+ * The PTEs must belong to the specified address space and be in the same PMD.
+ *
+ * There are no requirements on the order or range completeness of PTE
+ * updates for the specified range.
+ *
+ * Must be paired with a call to lazy_mmu_mode_disable().
+ *
+ * Has no effect if called:
+ * - While paused - see lazy_mmu_mode_pause()
+ * - In interrupt context
+ */
+static inline void lazy_mmu_mode_enable_with_ptes(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, pte_t *ptep)
+{
+	struct lazy_mmu_state *state = &current->lazy_mmu_state;
+
+	if (in_interrupt() || state->pause_count > 0)
+		return;
+
+	VM_WARN_ON_ONCE(state->enable_count == U8_MAX);
+
+	if (state->enable_count++ == 0)
+		arch_enter_lazy_mmu_mode_with_ptes(mm, addr, end, ptep);
+}
+
 /**
  * lazy_mmu_mode_disable() - Disable the lazy MMU mode.
  *
@@ -387,6 +431,8 @@ static inline void lazy_mmu_mode_resume(void)
 }
 #else
 static inline void lazy_mmu_mode_enable(void) {}
+static inline void lazy_mmu_mode_enable_with_ptes(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, pte_t *ptep) {}
 static inline void lazy_mmu_mode_disable(void) {}
 static inline void lazy_mmu_mode_pause(void) {}
 static inline void lazy_mmu_mode_resume(void) {}
diff --git a/mm/madvise.c b/mm/madvise.c
index cd9bb077072c..c14bd5d1828e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	if (!start_pte)
 		return 0;
 	flush_tlb_batched_pending(mm);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, addr, end, start_pte);
 	for (; addr < end; pte += nr, addr += nr * PAGE_SIZE) {
 		nr = 1;
 		ptent = ptep_get(pte);
@@ -508,7 +508,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (!start_pte)
 				break;
 			flush_tlb_batched_pending(mm);
-			lazy_mmu_mode_enable();
+			lazy_mmu_mode_enable_with_ptes(mm, addr, end, start_pte);
 			if (!err)
 				nr = 0;
 			continue;
@@ -675,7 +675,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (!start_pte)
 		return 0;
 	flush_tlb_batched_pending(mm);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, addr, end, start_pte);
 	for (; addr != end; pte += nr, addr += PAGE_SIZE * nr) {
 		nr = 1;
 		ptent = ptep_get(pte);
@@ -735,7 +735,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				if (!start_pte)
 					break;
 				flush_tlb_batched_pending(mm);
-				lazy_mmu_mode_enable();
+				lazy_mmu_mode_enable_with_ptes(mm, addr, end, pte);
 				if (!err)
 					nr = 0;
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index ff338c2abe92..ee1770ff4a64 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1272,7 +1272,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 	orig_src_pte = src_pte;
 	orig_dst_pte = dst_pte;
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(src_mm, addr, end, src_pte);
 
 	do {
 		nr = 1;
@@ -1922,7 +1922,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		return addr;
 
 	flush_tlb_batched_pending(mm);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, addr, end, start_pte);
 	do {
 		bool any_skipped = false;
 
@@ -2919,7 +2919,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	mapped_pte = pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
 		return -ENOMEM;
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, addr, end, mapped_pte);
 	do {
 		BUG_ON(!pte_none(ptep_get(pte)));
 		if (!pfn_modify_allowed(pfn, prot)) {
@@ -3330,7 +3330,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			return -EINVAL;
 	}
 
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, addr, end, mapped_pte);
 
 	if (fn) {
 		do {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 9cbf932b028c..3fc26418e837 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -337,7 +337,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 		is_private_single_threaded = vma_is_single_threaded_private(vma);
 
 	flush_tlb_batched_pending(vma->vm_mm);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(vma->vm_mm, addr, end, pte);
 	do {
 		nr_ptes = 1;
 		oldpte = ptep_get(pte);
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..0dfe3de39ccc 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -260,7 +260,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 	flush_tlb_batched_pending(vma->vm_mm);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(mm, old_addr, old_end, old_ptep);
 
 	for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
 		new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1afca3568b9b..b5ed2b05771f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -108,7 +108,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (!pte)
 		return -ENOMEM;
 
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(&init_mm, addr, end, pte);
 
 	do {
 		if (unlikely(!pte_none(ptep_get(pte)))) {
@@ -371,7 +371,7 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	unsigned long size = PAGE_SIZE;
 
 	pte = pte_offset_kernel(pmd, addr);
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(&init_mm, addr, end, pte);
 
 	do {
 #ifdef CONFIG_HUGETLB_PAGE
@@ -538,7 +538,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	if (!pte)
 		return -ENOMEM;
 
-	lazy_mmu_mode_enable();
+	lazy_mmu_mode_enable_with_ptes(&init_mm, addr, end, pte);
 
 	do {
 		struct page *page = pages[*nr];
-- 
2.53.0

From nobody Sat Jun 20 11:48:37 2026
From: Alexander Gordeev
To: Gerald Schaefer, Heiko Carstens, Christian Borntraeger, Vasily Gorbik, Claudio Imbrenda
Cc: linux-s390@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kevin Brodsky, David Hildenbrand
Subject: [PATCH -next v4 2/4] s390/mm: Batch PTE updates in lazy MMU mode
Date: Thu, 18 Jun 2026 16:47:26 +0200
X-Mailer: git-send-email 2.53.0

Make use of the IPTE instruction's "Additional Entries" field to
invalidate multiple PTEs in one go while in lazy MMU mode. This is the
mode in which many memory-management system calls (like mremap(),
mprotect(), etc.) update memory attributes.

To achieve that, set_pte() and ptep_get() store and retrieve PTE values
in a per-CPU cache, and the cached values are applied to the real page
table once lazy MMU mode is left.

The same is done for memory-management platform callbacks that would
otherwise cause intense per-PTE IPTE traffic, reducing the number of
IPTE instructions from up to PTRS_PER_PTE to a single instruction in
the best case. The average reduction is of course smaller.

Since all existing page table iterators called in lazy MMU mode handle
one table at a time, the per-CPU cache does not need to be larger than
PTRS_PER_PTE entries. That also naturally aligns with the IPTE
instruction, which must not cross a page table boundary.

Before this change, the system calls did:

	lazy_mmu_mode_enable_with_ptes()
	...
	// up to PTRS_PER_PTE single-IPTEs
	...
	lazy_mmu_mode_disable()

With this change, the system calls do:

	lazy_mmu_mode_enable_with_ptes()
	...
	...
	lazy_mmu_mode_disable()	// apply cache with one multi-IPTE

When applied to large memory ranges, some system calls show significant
speedups:

	mprotect()	~15x
	munmap()	~3x
	mremap()	~28x

At the same time, fork() shows a measurable slowdown of ~1.5x.

The overall results depend on memory size and access patterns, but the
change generally does not degrade performance.

In addition to a process-wide impact, the rework affects the whole
Central Electronics Complex (CEC). Each (global) IPTE instruction
initiates a quiesce state in a CEC, so reducing the number of IPTE
instructions relieves CEC-wide quiesce traffic.

In an extreme case of mprotect() continuously triggering the quiesce
state on four LPARs in parallel, measurements show ~25x fewer quiesce
events.
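To make the best-case and average-case remarks above more concrete, here is a toy user-space model that counts invalidation instructions issued per PTE versus per contiguous run of valid entries, in the spirit of count_contiguous() from this patch. The `demo_` names and the boolean array are assumptions for illustration only. DEMO_PTRS_PER_PTE is set to 256 as an assumed table size, and the model ignores the per-CPU cache and the retry loop in __ptep_ipte_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PTRS_PER_PTE 256	/* assumed table size for this sketch */

/* Length of the run starting at 'start' whose entries share one validity. */
static size_t demo_count_contiguous(const bool *valid, size_t start,
				    size_t end, bool *run_valid)
{
	size_t i;

	*run_valid = valid[start];
	for (i = start + 1; i < end; i++) {
		if (valid[i] != *run_valid)
			break;
	}
	return i - start;
}

/* Unbatched model: one invalidation per valid entry. */
static size_t demo_per_pte_invalidations(const bool *valid, size_t n)
{
	size_t i, issued = 0;

	for (i = 0; i < n; i++)
		issued += valid[i] ? 1 : 0;
	return issued;
}

/* Batched model: one ranged invalidation per run of valid entries. */
static size_t demo_batched_invalidations(const bool *valid, size_t n)
{
	size_t pos = 0, issued = 0;

	while (pos < n) {
		bool run_valid;
		size_t len = demo_count_contiguous(valid, pos, n, &run_valid);

		if (run_valid)
			issued++;	/* invalid runs need no invalidation */
		pos += len;
	}
	return issued;
}
```

With a fully populated table the batched model issues a single instruction, while a table that alternates valid and invalid entries gains nothing, which is one way to read why the average reduction is smaller than the best case.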
Signed-off-by: Alexander Gordeev
---
 arch/s390/Kconfig                |   1 +
 arch/s390/include/asm/lazy_mmu.h |   9 +
 arch/s390/include/asm/lowcore.h  |   2 +-
 arch/s390/include/asm/pgtable.h  | 157 +++++++++++--
 arch/s390/kernel/setup.c         |   2 +
 arch/s390/kernel/smp.c           |   7 +
 arch/s390/mm/Makefile            |   2 +-
 arch/s390/mm/lazy_mmu.c          | 382 +++++++++++++++++++++++++++++++
 arch/s390/mm/pgtable.c           |   8 +-
 9 files changed, 546 insertions(+), 24 deletions(-)
 create mode 100644 arch/s390/include/asm/lazy_mmu.h
 create mode 100644 arch/s390/mm/lazy_mmu.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 84404e6778d5..7846332dcd0a 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -97,6 +97,7 @@ config S390
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_LAZY_MMU_MODE
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/s390/include/asm/lazy_mmu.h b/arch/s390/include/asm/lazy_mmu.h
new file mode 100644
index 000000000000..98366e9de9bc
--- /dev/null
+++ b/arch/s390/include/asm/lazy_mmu.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LAZY_MMU_H
+#define __LAZY_MMU_H
+
+void lazy_mmu_online_boot_cpu(void);
+int lazy_mmu_online_cpu(gfp_t gfp, unsigned int cpu);
+void lazy_mmu_offline_cpu(unsigned int cpu);
+
+#endif /* __LAZY_MMU_H */
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 3b3ecc647993..dba236664da9 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -163,7 +163,7 @@ struct lowcore {
 	__s32	preempt_count;			/* 0x03a8 */
 	__u32	spinlock_lockval;		/* 0x03ac */
 	__u32	spinlock_index;			/* 0x03b0 */
-	__u8	pad_0x03b4[0x03b8-0x03b4];	/* 0x03b4 */
+	__s32	lazy_mmu_count;			/* 0x03b4 */
 	__u64	percpu_offset;			/* 0x03b8 */
 	__u8	percpu_register;		/* 0x03c0 */
 	__u8	pad_0x03c1[0x0400-0x03c1];	/* 0x03c1 */
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index f9a8a92fa160..2b6659d61fa5 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -39,6 +39,64 @@ enum {
 
 extern atomic_long_t direct_pages_count[PG_DIRECT_MAP_MAX];
 
+bool __lazy_mmu_ptep_test_and_clear_young(unsigned long addr, pte_t *ptep, int *res);
+bool __lazy_mmu_ptep_get_and_clear(unsigned long addr, pte_t *ptep, pte_t *res);
+bool __lazy_mmu_ptep_modify_prot_start(unsigned long addr, pte_t *ptep, pte_t *res);
+bool __lazy_mmu_ptep_modify_prot_commit(unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t pte);
+bool __lazy_mmu_ptep_set_wrprotect(unsigned long addr, pte_t *ptep);
+bool __lazy_mmu_set_pte(pte_t *ptep, pte_t pte);
+bool __lazy_mmu_ptep_get(pte_t *ptep, pte_t *res);
+
+static __always_inline bool is_lazy_mmu_active(void)
+{
+	if (__is_defined(__DECOMPRESSOR))
+		return false;
+	if (!get_lowcore()->lazy_mmu_count)
+		return false;
+	return true;
+}
+
+static inline
+bool lazy_mmu_ptep_test_and_clear_young(unsigned long addr, pte_t *ptep, int *res)
+{
+	if (!is_lazy_mmu_active())
+		return false;
+	return __lazy_mmu_ptep_test_and_clear_young(addr, ptep, res);
+}
+
+static inline
+bool lazy_mmu_ptep_get_and_clear(unsigned long addr, pte_t *ptep, pte_t *res)
+{
+	if (!is_lazy_mmu_active())
+		return false;
+	return __lazy_mmu_ptep_get_and_clear(addr, ptep, res);
+}
+
+static inline
+bool lazy_mmu_ptep_modify_prot_start(unsigned long addr, pte_t *ptep, pte_t *res)
+{
+	if (!is_lazy_mmu_active())
+		return false;
+	return __lazy_mmu_ptep_modify_prot_start(addr, ptep, res);
+}
+
+static inline
+bool lazy_mmu_ptep_modify_prot_commit(unsigned long addr, pte_t *ptep,
+				      pte_t old_pte, pte_t pte)
+{
+	if (!is_lazy_mmu_active())
+		return false;
+	return __lazy_mmu_ptep_modify_prot_commit(addr, ptep, old_pte, pte);
+}
+
+static inline
+bool lazy_mmu_ptep_set_wrprotect(unsigned long addr, pte_t *ptep)
+{
+	if (!is_lazy_mmu_active())
+		return false;
+	return __lazy_mmu_ptep_set_wrprotect(addr, ptep);
+}
+
 static inline void update_page_count(int level, long count)
 {
 	if (IS_ENABLED(CONFIG_PROC_FS))
@@ -978,15 +1036,30 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 	WRITE_ONCE(*pmdp, pmd);
 }
 
-static inline void set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
 }
 
+static inline void set_pte(pte_t *ptep, pte_t pte)
+{
+	if (!is_lazy_mmu_active() || !__lazy_mmu_set_pte(ptep, pte))
+		__set_pte(ptep, pte);
+}
+
+static inline pte_t __ptep_get(pte_t *ptep)
+{
+	return READ_ONCE(*ptep);
+}
+
 #define ptep_get ptep_get
 static inline pte_t ptep_get(pte_t *ptep)
 {
-	return READ_ONCE(*ptep);
+	pte_t res;
+
+	if (!is_lazy_mmu_active() || !__lazy_mmu_ptep_get(ptep, &res))
+		res = __ptep_get(ptep);
+	return res;
 }
 
 #define pmdp_get pmdp_get
@@ -1179,6 +1252,15 @@ static __always_inline void __ptep_ipte_range(unsigned long address, int nr,
 	} while (nr != 255);
 }
 
+void arch_enter_lazy_mmu_mode_with_ptes(struct mm_struct *mm,
+					unsigned long addr, unsigned long end,
+					pte_t *pte);
+#define arch_enter_lazy_mmu_mode_with_ptes arch_enter_lazy_mmu_mode_with_ptes
+
+void arch_enter_lazy_mmu_mode(void);
+void arch_leave_lazy_mmu_mode(void);
+void arch_flush_lazy_mmu_mode(void);
+
 /*
  * This is hard to understand. ptep_get_and_clear and ptep_clear_flush
  * both clear the TLB for the unmapped pte. The reason is that
@@ -1199,10 +1281,16 @@ pte_t ptep_xchg_lazy(struct mm_struct *, unsigned long, pte_t *, pte_t);
 static inline bool ptep_test_and_clear_young(struct vm_area_struct *vma,
 					     unsigned long addr, pte_t *ptep)
 {
-	pte_t pte = ptep_get(ptep);
+	pte_t pte;
+	int res;
 
-	pte = ptep_xchg_direct(vma->vm_mm, addr, ptep, pte_mkold(pte));
-	return pte_young(pte);
+	if (!lazy_mmu_ptep_test_and_clear_young(addr, ptep, &res)) {
+		pte = __ptep_get(ptep);
+		pte = pte_mkold(pte);
+		pte = ptep_xchg_direct(vma->vm_mm, addr, ptep, pte);
+		res = pte_young(pte);
+	}
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
@@ -1218,7 +1306,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t res;
 
-	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (!lazy_mmu_ptep_get_and_clear(addr, ptep, &res))
+		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	page_table_check_pte_clear(mm, addr, res);
 	/* At this point the reference through the mapping is still present */
 	if (mm_is_protected(mm) && pte_present(res))
@@ -1227,9 +1316,34 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
-pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
-void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
-			     pte_t *, pte_t, pte_t);
+pte_t ___ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
+void ___ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
+				pte_t *, pte_t, pte_t);
+
+static inline
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
+			     unsigned long addr, pte_t *ptep)
+{
+	pte_t res;
+
+	if (!lazy_mmu_ptep_modify_prot_start(addr, ptep, &res))
+		res = ___ptep_modify_prot_start(vma, addr, ptep);
+	return res;
+}
+
+static inline
+void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t *ptep, pte_t old_pte, pte_t pte)
+{
+	if (!lazy_mmu_ptep_modify_prot_commit(addr, ptep, old_pte, pte))
+		___ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
+}
+
+bool ipte_range_ptep_modify_prot_start(struct vm_area_struct *vma,
+				       unsigned long addr, pte_t *ptep, pte_t *res);
+bool ipte_range_ptep_modify_prot_commit(struct vm_area_struct *vma,
+					unsigned long addr, pte_t *ptep,
+					pte_t old_pte, pte_t pte);
 
 #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
@@ -1259,11 +1373,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 {
 	pte_t res;
 
-	if (full) {
-		res = ptep_get(ptep);
-		set_pte(ptep, __pte(_PAGE_INVALID));
-	} else {
-		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (!lazy_mmu_ptep_get_and_clear(addr, ptep, &res)) {
+		if (full) {
+			res = __ptep_get(ptep);
+			__set_pte(ptep, __pte(_PAGE_INVALID));
+		} else {
+			res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+		}
 	}
 	page_table_check_pte_clear(mm, addr, res);
 	/* At this point the reference through the mapping is still present */
@@ -1289,10 +1405,15 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
-	pte_t pte = ptep_get(ptep);
+	pte_t pte;
 
-	if (pte_write(pte))
-		ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
+	if (!lazy_mmu_ptep_set_wrprotect(addr, ptep)) {
+		pte = __ptep_get(ptep);
+		if (pte_write(pte)) {
+			pte = pte_wrprotect(pte);
+			ptep_xchg_lazy(mm, addr, ptep, pte);
+		}
+	}
 }
 
 /*
@@ -1325,7 +1446,7 @@ static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
 	 * PTE does not have _PAGE_PROTECT set, to avoid unnecessary overhead.
 	 * A local RDP can be used to do the flush.
*/ - if (cpu_has_rdp() && !(pte_val(ptep_get(ptep)) & _PAGE_PROTECT)) + if (cpu_has_rdp() && !(pte_val(__ptep_get(ptep)) & _PAGE_PROTECT)) __ptep_rdp(address, ptep, 1); } #define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index b60284328fe3..f5a3c9e1b6b8 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -77,6 +77,7 @@ #include #include #include +#include #include "entry.h" =20 /* @@ -1012,5 +1013,6 @@ void __init setup_arch(char **cmdline_p) =20 void __init arch_cpu_finalize_init(void) { + lazy_mmu_online_boot_cpu(); sclp_init(); } diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 0ba7f89b8161..0a826bbaf1dd 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -59,6 +59,7 @@ #include #include #include +#include #include "entry.h" =20 enum { @@ -866,6 +867,11 @@ int __cpu_up(unsigned int cpu, struct task_struct *tid= le) rc =3D pcpu_alloc_lowcore(pcpu, cpu); if (rc) return rc; + rc =3D lazy_mmu_online_cpu(GFP_KERNEL, cpu); + if (rc) { + pcpu_free_lowcore(pcpu, cpu); + return rc; + } /* * Make sure global control register contents do not change * until new CPU has initialized control registers. @@ -921,6 +927,7 @@ void __cpu_die(unsigned int cpu) pcpu =3D per_cpu_ptr(&pcpu_devices, cpu); while (!pcpu_stopped(pcpu)) cpu_relax(); + lazy_mmu_offline_cpu(cpu); pcpu_free_lowcore(pcpu, cpu); cpumask_clear_cpu(cpu, mm_cpumask(&init_mm)); cpumask_clear_cpu(cpu, &init_mm.context.cpu_attach_mask); diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile index 193899c39ca7..26e9fc11543a 100644 --- a/arch/s390/mm/Makefile +++ b/arch/s390/mm/Makefile @@ -3,7 +3,7 @@ # Makefile for the linux s390-specific parts of the memory manager. 
# =20 -obj-y :=3D init.o fault.o extmem.o mmap.o vmem.o maccess.o +obj-y :=3D init.o fault.o extmem.o mmap.o vmem.o maccess.o lazy_mmu.o obj-y +=3D page-states.o pageattr.o pgtable.o pgalloc.o extable.o =20 obj-$(CONFIG_CMM) +=3D cmm.o diff --git a/arch/s390/mm/lazy_mmu.c b/arch/s390/mm/lazy_mmu.c new file mode 100644 index 000000000000..d75b93d9b0de --- /dev/null +++ b/arch/s390/mm/lazy_mmu.c @@ -0,0 +1,382 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +#define PTE_POISON _PAGE_LARGE + +struct ipte_range { + struct mm_struct *mm; + unsigned long base_addr; + unsigned long base_end; + pte_t *base_pte; + pte_t *start_pte; + pte_t *end_pte; + pte_t cache[PTRS_PER_PTE]; +}; + +static DEFINE_PER_CPU(struct ipte_range *, ipte_range); + +static int count_contiguous(pte_t *start, pte_t *end, bool *valid) +{ + unsigned long page_invalid_bit; + pte_t *ptep, pte; + + pte =3D __ptep_get(start); + page_invalid_bit =3D pte_val(pte) & _PAGE_INVALID; + + for (ptep =3D start + 1; ptep < end; ptep++) { + pte =3D __ptep_get(ptep); + if ((pte_val(pte) & _PAGE_INVALID) !=3D page_invalid_bit) + break; + } + + *valid =3D !(page_invalid_bit); + return ptep - start; +} + +static void __invalidate_pte_range(struct mm_struct *mm, unsigned long add= r, + int nr_ptes, pte_t *ptep) +{ + atomic_inc(&mm->context.flush_count); + if (cpu_has_tlb_lc() && cpumask_equal(mm_cpumask(mm), cpumask_of(smp_proc= essor_id()))) + __ptep_ipte_range(addr, nr_ptes - 1, ptep, IPTE_LOCAL); + else + __ptep_ipte_range(addr, nr_ptes - 1, ptep, IPTE_GLOBAL); + atomic_dec(&mm->context.flush_count); +} + +static int invalidate_pte_range(struct mm_struct *mm, unsigned long addr, + pte_t *start, pte_t *end) +{ + int nr_ptes; + bool valid; + + nr_ptes =3D count_contiguous(start, end, &valid); + if (valid) + __invalidate_pte_range(mm, addr, nr_ptes, start); + + return nr_ptes; +} + +static void set_pte_range(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, 
pte_t *end, pte_t *cache) +{ + int i, nr_ptes; + + while (ptep < end) { + nr_ptes =3D invalidate_pte_range(mm, addr, ptep, end); + + for (i =3D 0; i < nr_ptes; i++, ptep++, cache++) { + __set_pte(ptep, *cache); + *cache =3D __pte(PTE_POISON); + } + + addr +=3D nr_ptes * PAGE_SIZE; + } +} + +static void enter_ipte_norange(void) +{ + struct ipte_range __maybe_unused *range; + + if (!test_facility(13)) + return; + + range =3D get_cpu_var(ipte_range); + get_lowcore()->lazy_mmu_count++; +} + +static void enter_ipte_range(struct mm_struct *mm, + unsigned long addr, unsigned long end, pte_t *pte) +{ + struct ipte_range *range; + + if (!test_facility(13)) + return; + + range =3D get_cpu_var(ipte_range); + get_lowcore()->lazy_mmu_count++; + + range->mm =3D mm; + range->base_addr =3D addr; + range->base_end =3D end; + range->base_pte =3D pte; +} + +static void leave_ipte_range(void) +{ + pte_t *ptep, *start, *start_cache, *cache; + unsigned long start_addr, addr; + struct ipte_range *range; + int start_idx; + + if (!test_facility(13)) + return; + + lockdep_assert_preemption_disabled(); + range =3D this_cpu_read(ipte_range); + if (!range->mm) + goto norange; + if (!range->start_pte) + goto done; + + start =3D range->start_pte; + start_idx =3D range->start_pte - range->base_pte; + start_addr =3D range->base_addr + start_idx * PAGE_SIZE; + addr =3D start_addr; + start_cache =3D &range->cache[start_idx]; + cache =3D start_cache; + for (ptep =3D start; ptep < range->end_pte; ptep++, cache++, addr +=3D PA= GE_SIZE) { + if (pte_val(*cache) =3D=3D PTE_POISON) { + if (start) { + set_pte_range(range->mm, start_addr, start, ptep, start_cache); + start =3D NULL; + } + } else if (!start) { + start =3D ptep; + start_addr =3D addr; + start_cache =3D cache; + } + } + set_pte_range(range->mm, start_addr, start, ptep, start_cache); + + range->start_pte =3D NULL; + range->end_pte =3D NULL; + +done: + range->mm =3D NULL; + range->base_addr =3D 0; + range->base_end =3D 0; + range->base_pte =3D 
NULL; + +norange: + get_lowcore()->lazy_mmu_count--; + put_cpu_var(ipte_range); +} + +static void flush_lazy_mmu_mode(void) +{ + unsigned long addr, end; + struct ipte_range *range; + struct mm_struct *mm; + pte_t *pte; + + if (!test_facility(13)) + return; + + range =3D get_cpu_var(ipte_range); + if (range->mm) { + mm =3D range->mm; + addr =3D range->base_addr; + end =3D range->base_end; + pte =3D range->base_pte; + + leave_ipte_range(); + enter_ipte_range(mm, addr, end, pte); + } + put_cpu_var(ipte_range); +} + +void arch_enter_lazy_mmu_mode(void) +{ + enter_ipte_norange(); +} +EXPORT_SYMBOL_IF_KUNIT(arch_enter_lazy_mmu_mode); + +void arch_enter_lazy_mmu_mode_with_ptes(struct mm_struct *mm, + unsigned long addr, unsigned long end, + pte_t *pte) +{ + enter_ipte_range(mm, addr, end, pte); +} +EXPORT_SYMBOL_IF_KUNIT(arch_enter_lazy_mmu_mode_with_ptes); + +void arch_leave_lazy_mmu_mode(void) +{ + leave_ipte_range(); +} +EXPORT_SYMBOL_IF_KUNIT(arch_leave_lazy_mmu_mode); + +void arch_flush_lazy_mmu_mode(void) +{ + flush_lazy_mmu_mode(); +} +EXPORT_SYMBOL_IF_KUNIT(arch_flush_lazy_mmu_mode); + +static void __ipte_range_set_pte(struct ipte_range *range, pte_t *ptep, pt= e_t pte) +{ + unsigned int idx =3D ptep - range->base_pte; + + lockdep_assert_preemption_disabled(); + range->cache[idx] =3D pte; + + if (!range->start_pte) { + range->start_pte =3D ptep; + range->end_pte =3D ptep + 1; + } else if (ptep < range->start_pte) { + range->start_pte =3D ptep; + } else if (ptep + 1 > range->end_pte) { + range->end_pte =3D ptep + 1; + } +} + +static pte_t __ipte_range_ptep_get(struct ipte_range *range, pte_t *ptep) +{ + unsigned int idx =3D ptep - range->base_pte; + + lockdep_assert_preemption_disabled(); + if (pte_val(range->cache[idx]) =3D=3D PTE_POISON) + return __ptep_get(ptep); + return range->cache[idx]; +} + +static struct ipte_range *this_ipte_range(pte_t *ptep) +{ + struct ipte_range *range; + unsigned int nr_ptes; + + range =3D this_cpu_read(ipte_range); + if (ptep < 
range->base_pte) + return NULL; + nr_ptes =3D (range->base_end - range->base_addr) / PAGE_SIZE; + if (ptep >=3D range->base_pte + nr_ptes) + return NULL; + + return range; +} + +bool __lazy_mmu_set_pte(pte_t *ptep, pte_t pte) +{ + struct ipte_range *range; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + __ipte_range_set_pte(range, ptep, pte); + + return true; +} + +bool __lazy_mmu_ptep_get(pte_t *ptep, pte_t *res) +{ + struct ipte_range *range; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + *res =3D __ipte_range_ptep_get(range, ptep); + + return true; +} + +bool __lazy_mmu_ptep_test_and_clear_young(unsigned long addr, pte_t *ptep,= int *res) +{ + struct ipte_range *range; + pte_t pte, old; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + old =3D __ipte_range_ptep_get(range, ptep); + pte =3D pte_mkold(old); + __ipte_range_set_pte(range, ptep, pte); + *res =3D pte_young(old); + + return true; +} + +bool __lazy_mmu_ptep_get_and_clear(unsigned long addr, pte_t *ptep, pte_t = *res) +{ + struct ipte_range *range; + pte_t pte, old; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + old =3D __ipte_range_ptep_get(range, ptep); + pte =3D __pte(_PAGE_INVALID); + __ipte_range_set_pte(range, ptep, pte); + *res =3D old; + + return true; +} + +bool __lazy_mmu_ptep_modify_prot_start(unsigned long addr, pte_t *ptep, pt= e_t *res) +{ + return __lazy_mmu_ptep_get_and_clear(addr, ptep, res); +} + +bool __lazy_mmu_ptep_modify_prot_commit(unsigned long addr, pte_t *ptep, + pte_t old_pte, pte_t pte) +{ + struct ipte_range *range; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + __ipte_range_set_pte(range, ptep, pte); + + return true; +} + +bool __lazy_mmu_ptep_set_wrprotect(unsigned long addr, pte_t *ptep) +{ + struct ipte_range *range; + pte_t pte; + + range =3D this_ipte_range(ptep); + if (!range) + return false; + + pte =3D __ipte_range_ptep_get(range, ptep); + if 
(pte_write(pte)) { + pte =3D pte_wrprotect(pte); + __ipte_range_set_pte(range, ptep, pte); + } + + return true; +} + +int lazy_mmu_online_cpu(gfp_t gfp, unsigned int cpu) +{ + struct ipte_range *range; + int i; + + if (!test_facility(13)) + return 0; + + range =3D kzalloc_obj(*range, gfp); + if (!range) + return -ENOMEM; + for (i =3D 0; i < ARRAY_SIZE(range->cache); i++) + range->cache[i] =3D __pte(PTE_POISON); + per_cpu(ipte_range, cpu) =3D range; + + return 0; +} + +void lazy_mmu_offline_cpu(unsigned int cpu) +{ + struct ipte_range *range; + + if (!test_facility(13)) + return; + + range =3D per_cpu(ipte_range, cpu); + per_cpu(ipte_range, cpu) =3D NULL; + kfree(range); +} + +void __init lazy_mmu_online_boot_cpu(void) +{ + lazy_mmu_online_cpu(GFP_ATOMIC, 0); +} diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 4acd8b140c4b..df36523bcbbb 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -166,14 +166,14 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned l= ong addr, } EXPORT_SYMBOL(ptep_xchg_lazy); =20 -pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long add= r, - pte_t *ptep) +pte_t ___ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long = addr, + pte_t *ptep) { return ptep_flush_lazy(vma->vm_mm, addr, ptep, 1); } =20 -void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long add= r, - pte_t *ptep, pte_t old_pte, pte_t pte) +void ___ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long = addr, + pte_t *ptep, pte_t old_pte, pte_t pte) { set_pte(ptep, pte); } --=20 2.53.0 From nobody Sat Jun 20 11:48:37 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ADC24311954; Thu, 18 Jun 2026 14:47:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=148.163.156.1
From: Alexander Gordeev
To: Gerald Schaefer, Heiko Carstens, Christian Borntraeger, Vasily Gorbik, Claudio Imbrenda
Cc: linux-s390@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kevin Brodsky, David Hildenbrand
Subject: [PATCH -next v4 3/4] mm/kasan: Introduce helpers for lazy MMU mode sanitizer
Date: Thu, 18 Jun 2026 16:47:27 +0200
Message-ID: <0d76139923a280617a21839b7e3f3e7735b58fdf.1781789772.git.agordeev@linux.ibm.com>
X-Mailer: git-send-email 2.53.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type:
text/plain; charset="utf-8"

Provide helpers that allow architectures to implement detection of
illegitimate direct PTE accesses while lazy MMU mode is enabled, such as:

	pte_t pte =3D *ptep;
	*ptep =3D pte;

The legitimate way to perform these accesses is through the accessors:

	pte_t pte =3D ptep_get(ptep);
	set_pte(ptep, pte);

Such direct PTE accesses pose a real issue on s390.

Suggested-by: Ilya Leoshkevich
Signed-off-by: Alexander Gordeev
--- include/linux/kasan.h | 16 ++++++++++++++++ mm/kasan/common.c | 10 ++++++++++ mm/kasan/kasan.h | 2 ++ 3 files changed, 28 insertions(+) diff --git a/include/linux/kasan.h b/include/linux/kasan.h index bf233bde68c7..deadf566b84a 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -134,6 +134,20 @@ static __always_inline void kasan_poison_slab(struct s= lab *slab) __kasan_poison_slab(slab); } =20 +void __kasan_poison_pte(pte_t *pte, int nr); +static __always_inline void kasan_poison_pte(pte_t *pte, int nr) +{ + if (kasan_enabled()) + __kasan_poison_pte(pte, nr); +} + +void __kasan_unpoison_pte(pte_t *pte, int nr); +static __always_inline void kasan_unpoison_pte(pte_t *pte, int nr) +{ + if (kasan_enabled()) + __kasan_unpoison_pte(pte, nr); +} + void __kasan_unpoison_new_object(struct kmem_cache *cache, void *object); /** * kasan_unpoison_new_object - Temporarily unpoison a new slab object.
@@ -414,6 +428,8 @@ static inline bool kasan_unpoison_pages(struct page *pa= ge, unsigned int order, return false; } static inline void kasan_poison_slab(struct slab *slab) {} +static inline void kasan_poison_pte(pte_t *pte, int nr) {} +static inline void kasan_unpoison_pte(pte_t *pte, int nr) {} static inline void kasan_unpoison_new_object(struct kmem_cache *cache, void *object) {} static inline void kasan_poison_new_object(struct kmem_cache *cache, diff --git a/mm/kasan/common.c b/mm/kasan/common.c index b7d05c2a6d93..cbf68680614e 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -163,6 +163,16 @@ void __kasan_poison_slab(struct slab *slab) KASAN_SLAB_REDZONE, false); } =20 +void __kasan_poison_pte(pte_t *pte, int nr) +{ + kasan_poison(pte, sizeof(*pte) * nr, KASAN_LAZY_MMU_PTE, false); +} + +void __kasan_unpoison_pte(pte_t *pte, int nr) +{ + kasan_unpoison(pte, sizeof(*pte) * nr, false); +} + void __kasan_unpoison_new_object(struct kmem_cache *cache, void *object) { kasan_unpoison(object, cache->object_size, false); diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index fc9169a54766..8ba0fbabd75b 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -144,12 +144,14 @@ static inline bool kasan_requires_meta(void) #define KASAN_PAGE_REDZONE 0xFE /* redzone for kmalloc_large allocation */ #define KASAN_SLAB_REDZONE 0xFC /* redzone for slab object */ #define KASAN_SLAB_FREE 0xFB /* freed slab object */ +#define KASAN_LAZY_MMU_PTE 0xFD #define KASAN_VMALLOC_INVALID 0xF8 /* inaccessible space in vmap area */ #else #define KASAN_PAGE_FREE KASAN_TAG_INVALID #define KASAN_PAGE_REDZONE KASAN_TAG_INVALID #define KASAN_SLAB_REDZONE KASAN_TAG_INVALID #define KASAN_SLAB_FREE KASAN_TAG_INVALID +#define KASAN_LAZY_MMU_PTE KASAN_TAG_INVALID #define KASAN_VMALLOC_INVALID KASAN_TAG_INVALID /* only used for SW_TAGS */ #endif =20 --=20 2.53.0 From nobody Sat Jun 20 11:48:37 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using 
TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0AB1E2D594F; Thu, 18 Jun 2026 14:47:41 +0000 (UTC)
From: Alexander Gordeev
To: Gerald Schaefer, Heiko Carstens, Christian Borntraeger, Vasily Gorbik, Claudio Imbrenda
Cc: linux-s390@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kevin Brodsky, David Hildenbrand
Subject: [PATCH -next v4 4/4] s390/mm: Lazy MMU mode sanitizer
Date: Thu, 18 Jun 2026 16:47:28 +0200
Message-ID: <638d47ee200c586a119ac725864f381fe75eefa9.1781789772.git.agordeev@linux.ibm.com>
X-Mailer: git-send-email 2.53.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 impostorscore=0 malwarescore=0 adultscore=0 spamscore=0 bulkscore=0
priorityscore=1501 phishscore=0 clxscore=1015 lowpriorityscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2606150000 definitions=main-2606180133 Content-Type: text/plain; charset="utf-8"

Detect accesses to PTEs in lazy MMU mode that are made by means other
than the set_pte() and ptep_get() primitives. Such an access would be
a read hazard.

When ptep_get_lockless() is instrumented, its KASAN shadow memory check
falsely reports an invalid access if a lazy MMU section is concurrently
operating on the same PTE. To avoid that, disable instrumentation for
ptep_get_lockless() altogether.

Suggested-by: Ilya Leoshkevich
Signed-off-by: Alexander Gordeev
--- arch/s390/include/asm/pgtable.h | 6 ++++++ arch/s390/mm/lazy_mmu.c | 27 +++++++++++++++++++++++---- 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 2b6659d61fa5..a93e7e786457 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1047,6 +1047,12 @@ static inline void set_pte(pte_t *ptep, pte_t pte) __set_pte(ptep, pte); } =20 +#define ptep_get_lockless ptep_get_lockless +static inline __no_sanitize_address pte_t ptep_get_lockless(pte_t *ptep) +{ + return READ_ONCE(*ptep); +} + static inline pte_t __ptep_get(pte_t *ptep) { return READ_ONCE(*ptep); diff --git a/arch/s390/mm/lazy_mmu.c b/arch/s390/mm/lazy_mmu.c index d75b93d9b0de..ee2385897bc7 100644 --- a/arch/s390/mm/lazy_mmu.c +++ b/arch/s390/mm/lazy_mmu.c @@ -63,10 +63,13 @@ static int invalidate_pte_range(struct mm_struct *mm, u= nsigned long addr, } =20 static void set_pte_range(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t *end, pte_t *cache) + pte_t *start, pte_t *end, pte_t *cache) { - int i, nr_ptes; + int nr_ptes, nr_total =3D end - start; + pte_t *ptep =3D start; + int i; =20 + kasan_unpoison_pte(start, nr_total); while (ptep < end) { nr_ptes =3D invalidate_pte_range(mm, addr, ptep, end); =20 @@
-77,6 +80,7 @@ static void set_pte_range(struct mm_struct *mm, unsigned = long addr, =20 addr +=3D nr_ptes * PAGE_SIZE; } + kasan_poison_pte(start, nr_total); } =20 static void enter_ipte_norange(void) @@ -94,6 +98,7 @@ static void enter_ipte_range(struct mm_struct *mm, unsigned long addr, unsigned long end, pte_t *pte) { struct ipte_range *range; + unsigned int nr_ptes; =20 if (!test_facility(13)) return; @@ -105,6 +110,9 @@ static void enter_ipte_range(struct mm_struct *mm, range->base_addr =3D addr; range->base_end =3D end; range->base_pte =3D pte; + + nr_ptes =3D (range->base_end - range->base_addr) / PAGE_SIZE; + kasan_poison_pte(range->base_pte, nr_ptes); } =20 static void leave_ipte_range(void) @@ -112,6 +120,7 @@ static void leave_ipte_range(void) pte_t *ptep, *start, *start_cache, *cache; unsigned long start_addr, addr; struct ipte_range *range; + unsigned int nr_ptes; int start_idx; =20 if (!test_facility(13)) @@ -148,6 +157,9 @@ static void leave_ipte_range(void) range->end_pte =3D NULL; =20 done: + nr_ptes =3D (range->base_end - range->base_addr) / PAGE_SIZE; + kasan_unpoison_pte(range->base_pte, nr_ptes); + range->mm =3D NULL; range->base_addr =3D 0; range->base_end =3D 0; @@ -227,10 +239,17 @@ static void __ipte_range_set_pte(struct ipte_range *r= ange, pte_t *ptep, pte_t pt static pte_t __ipte_range_ptep_get(struct ipte_range *range, pte_t *ptep) { unsigned int idx =3D ptep - range->base_pte; + pte_t pte; =20 lockdep_assert_preemption_disabled(); - if (pte_val(range->cache[idx]) =3D=3D PTE_POISON) - return __ptep_get(ptep); + if (pte_val(range->cache[idx]) =3D=3D PTE_POISON) { + kasan_unpoison_pte(ptep, 1); + pte =3D __ptep_get(ptep); + kasan_poison_pte(ptep, 1); + + return pte; + } + return range->cache[idx]; } =20 --=20 2.53.0
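
Illustration only, not part of the series: the caching scheme used by
arch/s390/mm/lazy_mmu.c can be modelled in user space roughly as below.
This is a minimal sketch under stated assumptions. The names range_cache,
cache_enter(), cache_set(), cache_get() and cache_leave(), the toy table
size NR_PTES and the sentinel value are all hypothetical and do not appear
in the patches. TLB invalidation, per-CPU storage and KASAN poisoning are
deliberately left out. The sketch only shows the three ideas the kernel
code relies on: a sentinel marking "no pending update", a dirty window
[start, end), and write-back in contiguous runs on leave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PTES    8                /* toy size; the patch uses PTRS_PER_PTE */
#define PTE_POISON UINT64_C(0x800)  /* hypothetical "no pending update" sentinel */

typedef uint64_t pte_t;             /* stand-in for the kernel's pte_t */

struct range_cache {
	pte_t *base;                /* first PTE covered by the lazy section */
	pte_t *start, *end;         /* dirty window [start, end), NULL if clean */
	pte_t cache[NR_PTES];       /* pending values, PTE_POISON if untouched */
	int flushes;                /* contiguous runs written back on leave */
};

/* Begin a lazy section over the NR_PTES entries starting at base. */
static void cache_enter(struct range_cache *rc, pte_t *base)
{
	rc->base = base;
	rc->start = rc->end = NULL;
	rc->flushes = 0;
	for (size_t i = 0; i < NR_PTES; i++)
		rc->cache[i] = PTE_POISON;
}

/* Record a pending update and widen the dirty window to include it. */
static void cache_set(struct range_cache *rc, pte_t *ptep, pte_t val)
{
	size_t idx = (size_t)(ptep - rc->base);

	assert(idx < NR_PTES);
	rc->cache[idx] = val;
	if (!rc->start) {
		rc->start = ptep;
		rc->end = ptep + 1;
	} else if (ptep < rc->start) {
		rc->start = ptep;
	} else if (ptep + 1 > rc->end) {
		rc->end = ptep + 1;
	}
}

/* Read through the cache: a pending value wins over the table entry. */
static pte_t cache_get(const struct range_cache *rc, const pte_t *ptep)
{
	size_t idx = (size_t)(ptep - rc->base);

	assert(idx < NR_PTES);
	return rc->cache[idx] == PTE_POISON ? *ptep : rc->cache[idx];
}

/* End the lazy section: write pending values back, one run at a time. */
static void cache_leave(struct range_cache *rc)
{
	pte_t *ptep = rc->start;

	while (ptep && ptep < rc->end) {
		if (rc->cache[ptep - rc->base] == PTE_POISON) {
			ptep++;     /* untouched entry inside the window */
			continue;
		}
		rc->flushes++;      /* the kernel would invalidate this run once */
		while (ptep < rc->end &&
		       rc->cache[ptep - rc->base] != PTE_POISON) {
			size_t idx = (size_t)(ptep - rc->base);

			*ptep++ = rc->cache[idx];
			rc->cache[idx] = PTE_POISON;
		}
	}
	rc->start = rc->end = NULL;
}
```

In this model a direct read of the table between cache_set() and
cache_leave() returns the old value, while cache_get() returns the pending
one. That is the kind of mismatch the sanitizer in patch 4/4 is meant to
flag.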