From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org
Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif
Subject: [v3 01/24] mm: thp: make split_huge_pmd functions return int for error propagation
Date: Thu, 26 Mar 2026 19:08:43 -0700
Message-ID: <20260327021403.214713-2-usama.arif@linux.dev>
In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev>
References: <20260327021403.214713-1-usama.arif@linux.dev>

Currently, splitting a huge PMD cannot fail. Future patches will add lazy
PTE page table allocation: at THP split time, __split_huge_pmd() will call
pte_alloc_one(), which can fail if an order-0 allocation cannot be
satisfied.
Split functions currently return void, so callers have no way to detect
this failure. The PMD would remain huge, but callers would assume the
split succeeded and proceed on that basis; interpreting a huge PMD entry
as a page table pointer could then result in a kernel bug.

Change __split_huge_pmd(), split_huge_pmd(), split_huge_pmd_if_needed()
and split_huge_pmd_address() to return 0 on success (and -ENOMEM on
allocation failure in a later patch). Convert the split_huge_pmd macro to
a static inline function that propagates the return value. The return
values will be handled by the callers in future commits.

The CONFIG_TRANSPARENT_HUGEPAGE=n stubs are changed to return 0.

No behaviour change is expected with this patch.

Signed-off-by: Usama Arif
---
 include/linux/huge_mm.h | 34 ++++++++++++++++++----------------
 mm/huge_memory.c        | 16 ++++++++++------
 2 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1258fa37e85b5..b081ce044c735 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -418,7 +418,7 @@ static inline int split_huge_page(struct page *page)
 extern struct list_lru deferred_split_lru;
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
 
-void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
        unsigned long address, bool freeze);
 
 /**
@@ -447,15 +447,15 @@ static inline bool pmd_is_huge(pmd_t pmd)
    return false;
 }
 
-#define split_huge_pmd(__vma, __pmd, __address) \
-   do { \
-       pmd_t *____pmd = (__pmd); \
-       if (pmd_is_huge(*____pmd)) \
-           __split_huge_pmd(__vma, __pmd, __address, \
-                   false); \
-   } while (0)
+static inline int split_huge_pmd(struct vm_area_struct *vma,
+       pmd_t *pmd, unsigned long address)
+{
+   if (pmd_is_huge(*pmd))
+       return __split_huge_pmd(vma, pmd, address, false);
+   return 0;
+}
 
-void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
+int split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
        bool freeze);
 
 void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
@@ -649,13 +649,15 @@ static inline int try_folio_split_to_order(struct folio *folio,
 }
 
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
-#define split_huge_pmd(__vma, __pmd, __address) \
-   do { } while (0)
-
-static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-       unsigned long address, bool freeze) {}
-static inline void split_huge_pmd_address(struct vm_area_struct *vma,
-       unsigned long address, bool freeze) {}
+static inline int split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+       unsigned long address)
+{
+   return 0;
+}
+static inline int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+       unsigned long address, bool freeze) { return 0; }
+static inline int split_huge_pmd_address(struct vm_area_struct *vma,
+       unsigned long address, bool freeze) { return 0; }
 static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
        unsigned long address, pmd_t *pmd, bool freeze) {}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b2a6060b3c202..976a1c74c0870 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3283,7 +3283,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
    __split_huge_pmd_locked(vma, pmd, address, freeze);
 }
 
-void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
        unsigned long address, bool freeze)
 {
    spinlock_t *ptl;
@@ -3297,20 +3297,22 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
    split_huge_pmd_locked(vma, range.start, pmd, freeze);
    spin_unlock(ptl);
    mmu_notifier_invalidate_range_end(&range);
+
+   return 0;
 }
 
-void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
+int split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
        bool freeze)
 {
    pmd_t *pmd = mm_find_pmd(vma->vm_mm, address);
 
    if (!pmd)
-       return;
+       return 0;
 
-   __split_huge_pmd(vma, pmd, address, freeze);
+   return __split_huge_pmd(vma, pmd, address, freeze);
 }
 
-static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address)
+static inline int split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address)
 {
    /*
     * If the new address isn't hpage aligned and it could previously
@@ -3319,7 +3321,9 @@ static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned
    if (!IS_ALIGNED(address, HPAGE_PMD_SIZE) &&
        range_in_vma(vma, ALIGN_DOWN(address, HPAGE_PMD_SIZE),
             ALIGN(address, HPAGE_PMD_SIZE)))
-       split_huge_pmd_address(vma, address, false);
+       return split_huge_pmd_address(vma, address, false);
+
+   return 0;
 }
 
 void vma_adjust_trans_huge(struct vm_area_struct *vma,
-- 
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org
Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif
Subject: [v3 02/24] mm: thp: propagate split failure from vma_adjust_trans_huge()
Date: Thu, 26 Mar 2026 19:08:44 -0700
Message-ID: <20260327021403.214713-3-usama.arif@linux.dev>
In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev>
References: <20260327021403.214713-1-usama.arif@linux.dev>

With lazy PTE page table allocation, split_huge_pmd_if_needed() and thus
vma_adjust_trans_huge() can now fail if the order-0 allocation for the
page table fails when trying to split. This failure must be detected to
prevent a huge PMD being left straddling a VMA boundary.

The vma_adjust_trans_huge() call is moved before vma_prepare() in all
three callers (__split_vma, vma_shrink, commit_merge). Previously it sat
between vma_prepare() and vma_complete(), where there is no mechanism to
abort: once vma_prepare() has been called, we must reach vma_complete().
By moving the call earlier, a split failure can return -ENOMEM cleanly
without needing to undo VMA preparation.

This move is safe because vma_adjust_trans_huge() acquires its own
pmd_lock() internally and does not depend on any locks or state changes
from vma_prepare(). The VMA boundaries are also unchanged at the new call
site, satisfying __split_huge_pmd_locked()'s requirement that the VMA
cover the full PMD range.

All three callers (__split_vma, vma_shrink, commit_merge) already return
-ENOMEM on allocation failures for other reasons (failure in
vma_iter_prealloc(), for example); this follows the same pattern.
Signed-off-by: Usama Arif
---
 include/linux/huge_mm.h           | 13 ++++++-----
 mm/huge_memory.c                  | 21 +++++++++++++-----
 mm/vma.c                          | 37 +++++++++++++++++++++----------
 tools/testing/vma/include/stubs.h |  9 ++++----
 4 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b081ce044c735..224965fce4e66 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -483,8 +483,8 @@ int hugepage_madvise(struct vm_area_struct *vma, vm_flags_t *vm_flags,
        int advice);
 int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
        unsigned long end, bool *lock_dropped);
-void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
-       unsigned long end, struct vm_area_struct *next);
+int vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
+       unsigned long end, struct vm_area_struct *next);
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
 spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma);
 
@@ -685,11 +685,12 @@ static inline int madvise_collapse(struct vm_area_struct *vma,
    return -EINVAL;
 }
 
-static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
-                    unsigned long start,
-                    unsigned long end,
-                    struct vm_area_struct *next)
+static inline int vma_adjust_trans_huge(struct vm_area_struct *vma,
+                   unsigned long start,
+                   unsigned long end,
+                   struct vm_area_struct *next)
 {
+   return 0;
 }
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
        struct vm_area_struct *vma)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 976a1c74c0870..99f3b8b24c682 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3326,20 +3326,31 @@ static inline int split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned
    return 0;
 }
 
-void vma_adjust_trans_huge(struct vm_area_struct *vma,
+int vma_adjust_trans_huge(struct vm_area_struct *vma,
              unsigned long start,
              unsigned long end,
              struct vm_area_struct *next)
 {
+   int err;
+
    /* Check if we need to split start first. */
-   split_huge_pmd_if_needed(vma, start);
+   err = split_huge_pmd_if_needed(vma, start);
+   if (err)
+       return err;
 
    /* Check if we need to split end next. */
-   split_huge_pmd_if_needed(vma, end);
+   err = split_huge_pmd_if_needed(vma, end);
+   if (err)
+       return err;
 
    /* If we're incrementing next->vm_start, we might need to split it. */
-   if (next)
-       split_huge_pmd_if_needed(next, end);
+   if (next) {
+       err = split_huge_pmd_if_needed(next, end);
+       if (err)
+           return err;
+   }
+
+   return 0;
 }
 
 static void unmap_folio(struct folio *folio)
diff --git a/mm/vma.c b/mm/vma.c
index a43f3c5d4b3dd..b4a3839a8036e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -513,6 +513,15 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
            return err;
    }
 
+   /*
+    * Split any THP straddling the split boundary before splitting
+    * the VMA itself. Do this before vma_prepare() so we can
+    * cleanly fail without undoing VMA preparation.
+    */
+   err = vma_adjust_trans_huge(vma, vma->vm_start, addr, NULL);
+   if (err)
+       return err;
+
    new = vm_area_dup(vma);
    if (!new)
        return -ENOMEM;
@@ -550,11 +559,6 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
    vp.insert = new;
    vma_prepare(&vp);
 
-   /*
-    * Get rid of huge pages and shared page tables straddling the split
-    * boundary.
-    */
-   vma_adjust_trans_huge(vma, vma->vm_start, addr, NULL);
    if (is_vm_hugetlb_page(vma))
        hugetlb_split(vma, addr);
 
@@ -732,6 +736,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
 {
    struct vm_area_struct *vma;
    struct vma_prepare vp;
+   int err;
 
    if (vmg->__adjust_next_start) {
        /* We manipulate middle and adjust next, which is the target. */
@@ -743,6 +748,16 @@ static int commit_merge(struct vma_merge_struct *vmg)
        vma_iter_config(vmg->vmi, vmg->start, vmg->end);
    }
 
+   /*
+    * THP pages may need to do additional splits if we increase
+    * middle->vm_start. Do this before vma_prepare() so we can
+    * cleanly fail without undoing VMA preparation.
+    */
+   err = vma_adjust_trans_huge(vma, vmg->start, vmg->end,
+           vmg->__adjust_middle_start ? vmg->middle : NULL);
+   if (err)
+       return err;
+
    init_multi_vma_prep(&vp, vma, vmg);
 
    /*
@@ -755,12 +770,6 @@ static int commit_merge(struct vma_merge_struct *vmg)
        return -ENOMEM;
 
    vma_prepare(&vp);
-   /*
-    * THP pages may need to do additional splits if we increase
-    * middle->vm_start.
-    */
-   vma_adjust_trans_huge(vma, vmg->start, vmg->end,
-           vmg->__adjust_middle_start ? vmg->middle : NULL);
    vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
    vmg_adjust_set_range(vmg);
    vma_iter_store_overwrite(vmg->vmi, vmg->target);
@@ -1248,9 +1257,14 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
           unsigned long start, unsigned long end, pgoff_t pgoff)
 {
    struct vma_prepare vp;
+   int err;
 
    WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
 
+   err = vma_adjust_trans_huge(vma, start, end, NULL);
+   if (err)
+       return err;
+
    if (vma->vm_start < start)
        vma_iter_config(vmi, vma->vm_start, start);
    else
@@ -1263,7 +1277,6 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
    init_vma_prep(&vp, vma);
    vma_prepare(&vp);
-   vma_adjust_trans_huge(vma, start, end, NULL);
 
    vma_iter_clear(vmi);
    vma_set_range(vma, start, end, pgoff);
diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
index a30b8bc849557..952e3cc88ef10 100644
--- a/tools/testing/vma/include/stubs.h
+++ b/tools/testing/vma/include/stubs.h
@@ -419,11 +419,12 @@ static inline int vma_dup_policy(struct vm_area_struct *src, struct vm_area_stru
    return 0;
 }
 
-static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
-                    unsigned long start,
-                    unsigned long end,
-                    struct vm_area_struct *next)
+static inline int vma_adjust_trans_huge(struct vm_area_struct *vma,
+                   unsigned long start,
+                   unsigned long end,
+                   struct vm_area_struct *next)
 {
+   return 0;
 }
 
 static inline void hugetlb_split(struct vm_area_struct *, unsigned long) {}
-- 
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org
Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif
Subject: [v3 03/24] mm: thp: handle split failure in copy_huge_pmd()
Date: Thu, 26 Mar 2026 19:08:45 -0700
Message-ID: <20260327021403.214713-4-usama.arif@linux.dev>
In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev>
References: <20260327021403.214713-1-usama.arif@linux.dev>

copy_huge_pmd() splits the source PMD when a folio is pinned and can't be
COW-shared at PMD granularity. It then returns -EAGAIN so that
copy_pmd_range() falls through to copy_pte_range().

If the split fails, the PMD is still huge. Returning -EAGAIN would cause
copy_pmd_range() to call copy_pte_range(), which would dereference the
huge PMD entry as if it were a pointer to a PTE page table.

Return -ENOMEM on split failure instead (as copy_huge_pmd() already does
if pte_alloc_one() fails), which causes copy_page_range() to abort the
fork with -ENOMEM, similar to how copy_pmd_range() is aborted if
pmd_alloc() or copy_pte_range() fails.

Signed-off-by: Usama Arif
---
 mm/huge_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 99f3b8b24c682..8ad43897bdf80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1913,7 +1913,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
        pte_free(dst_mm, pgtable);
        spin_unlock(src_ptl);
        spin_unlock(dst_ptl);
-       __split_huge_pmd(src_vma, src_pmd, addr, false);
+       /*
+        * If split fails, the PMD is still huge so copy_pte_range
+        * (via -EAGAIN) would misinterpret it as a page table
+        * pointer. Return -ENOMEM directly to copy_pmd_range.
+        */
+       if (__split_huge_pmd(src_vma, src_pmd, addr, false))
+           return -ENOMEM;
        return -EAGAIN;
    }
    add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-- 
2.52.0

From nobody Thu Apr 2 18:53:29 2026
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1774577674; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AsidUyaFSU2OL+6pIevfNbim9vED85JJGBowf2ctQP8=; b=KJYNK18nvflazGbUsGQzniopI3KPVpz1LShSWjjSbT+hxFnh/0tpkw+tfNe3j+MVojye1j wdQPu4/bxNWWVoy1HF/DFPdbf1u5jrpcQiHNSPWDNk1zHFhU0vPcRAGI2daPSXBje5u6Yr R2eLlqWUdaQ4fQFe4gaiAaU+oQXNXZY= From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 04/24] mm: thp: handle split failure in do_huge_pmd_wp_page() Date: Thu, 26 Mar 2026 19:08:46 -0700 Message-ID: <20260327021403.214713-5-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT do_huge_pmd_wp_page() splits the PMD when a COW of the entire huge page fails (e.g., can't allocate a new folio or the folio is pinned). It then returns VM_FAULT_FALLBACK so the fault can be retried at PTE granularity. If the split fails, the PMD is still huge. 
Returning VM_FAULT_FALLBACK would re-enter the PTE fault path, which
expects a PTE page table at the PMD entry, not a huge PMD. Return
VM_FAULT_OOM on split failure, which signals the fault handler to invoke
the OOM killer or return -ENOMEM to userspace.

Signed-off-by: Usama Arif
---
 mm/huge_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8ad43897bdf80..9f4be707c8cb0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2137,7 +2137,13 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 		folio_unlock(folio);
 	spin_unlock(vmf->ptl);
 fallback:
-	__split_huge_pmd(vma, vmf->pmd, vmf->address, false);
+	/*
+	 * Split failure means the PMD is still huge; returning
+	 * VM_FAULT_FALLBACK would re-enter the PTE path with a
+	 * huge PMD, causing incorrect behavior.
+	 */
+	if (__split_huge_pmd(vma, vmf->pmd, vmf->address, false))
+		return VM_FAULT_OOM;
 	return VM_FAULT_FALLBACK;
 }
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 05/24] mm: thp: handle split failure in zap_pmd_range()
Date: Thu, 26 Mar 2026 19:08:47 -0700
Message-ID: <20260327021403.214713-6-usama.arif@linux.dev>

zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
full PMD (partial unmap). If the split fails, the PMD stays huge.
Falling through to zap_pte_range() would dereference the huge PMD entry
as a PTE page table pointer. Skip the range covered by the PMD on split
failure instead.

The skip is safe across all call paths into zap_pmd_range():

- exit_mmap() and OOM reaper: the zap range covers entire VMAs, so every
  PMD is fully covered (next - addr == HPAGE_PMD_SIZE). The
  zap_huge_pmd() branch handles these without splitting; the split
  failure path is unreachable.

- munmap / mmap overlay: vma_adjust_trans_huge() (called from
  __split_vma()) splits any PMD straddling the VMA boundary before the
  VMA is split. If that PMD split fails, __split_vma() returns -ENOMEM
  and the munmap is aborted before reaching zap_pmd_range(); the split
  failure path is unreachable.

- MADV_DONTNEED: an advisory hint the kernel is allowed to ignore. The
  pages remain valid and accessible; a subsequent access returns
  existing data without faulting.
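The skip-on-failure control flow can be illustrated with a minimal
userspace sketch. Everything here (split_stub(), the toy zap loop, the
skipped counter) is invented for illustration and is not kernel code; it
only mirrors the addr = next; continue; semantics described above.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_SIZE 0x200000UL

/* Stand-in for __split_huge_pmd(): a nonzero return models
 * pte_alloc_one() failing under memory pressure. */
static int split_stub(bool fail)
{
	return fail ? -1 : 0;
}

/* Toy zap loop: a partially covered PMD must be split first; if the
 * split fails, skip the whole PMD (addr = next) instead of falling
 * through to PTE-level zapping of a still-huge entry. */
static unsigned long zap_range(unsigned long addr, unsigned long end,
			       bool split_fails, unsigned long *skipped)
{
	unsigned long zapped = 0;

	while (addr < end) {
		/* pmd_addr_end() analogue: end of this PMD or of the range */
		unsigned long next = (addr & ~(HPAGE_PMD_SIZE - 1)) + HPAGE_PMD_SIZE;

		if (next > end)
			next = end;
		if (next - addr != HPAGE_PMD_SIZE && split_stub(split_fails)) {
			*skipped += next - addr;	/* PMD stays huge */
			addr = next;
			continue;
		}
		zapped += next - addr;	/* fully covered, or split succeeded */
		addr = next;
	}
	return zapped;
}
```

A fully covered PMD never takes the split path, matching the
exit_mmap()/OOM-reaper argument above; only a partial range with a
failing split is skipped.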
Signed-off-by: Usama Arif
---
 mm/memory.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e44469f9cf659..caf97c48cb166 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1985,9 +1985,18 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_is_huge(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE)
-				__split_huge_pmd(vma, pmd, addr, false);
-			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
+			if (next - addr != HPAGE_PMD_SIZE) {
+				/*
+				 * If split fails, the PMD stays huge.
+				 * Skip the range to avoid falling through
+				 * to zap_pte_range, which would treat the
+				 * huge PMD entry as a page table pointer.
+				 */
+				if (__split_huge_pmd(vma, pmd, addr, false)) {
+					addr = next;
+					continue;
+				}
+			} else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
 				addr = next;
 				continue;
 			}
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 06/24] mm: thp: handle split failure in wp_huge_pmd()
Date: Thu, 26 Mar 2026 19:08:48 -0700
Message-ID: <20260327021403.214713-7-usama.arif@linux.dev>

wp_huge_pmd() splits the PMD when COW or write-notify must be handled at
PTE level (e.g., shared/file VMAs, userfaultfd). It then returns
VM_FAULT_FALLBACK so the fault handler retries at PTE granularity.

If the split fails, the PMD is still huge and the PTE fault path cannot
handle a huge PMD entry. Return VM_FAULT_OOM on split failure, which
signals the fault handler to invoke the OOM killer or return -ENOMEM to
userspace. This is similar to what __handle_mm_fault() does if
p4d_alloc() or pud_alloc() fails.

Signed-off-by: Usama Arif
---
 mm/memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index caf97c48cb166..b99ec3ffc18d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6328,8 +6328,13 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
 	}

 split:
-	/* COW or write-notify handled on pte level: split pmd. */
-	__split_huge_pmd(vma, vmf->pmd, vmf->address, false);
+	/*
+	 * COW or write-notify handled on pte level: split pmd.
+	 * If split fails, the PMD is still huge so falling back
+	 * to PTE handling would be incorrect.
+	 */
+	if (__split_huge_pmd(vma, vmf->pmd, vmf->address, false))
+		return VM_FAULT_OOM;

 	return VM_FAULT_FALLBACK;
 }
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 07/24] mm: thp: retry on split failure in change_pmd_range()
Date: Thu, 26 Mar 2026 19:08:49 -0700
Message-ID: <20260327021403.214713-8-usama.arif@linux.dev>

change_pmd_range() splits a huge PMD when mprotect() targets a sub-PMD
range or when VMA flags require per-PTE protection bits that can't be
represented at PMD granularity. If pte_alloc_one() fails inside
__split_huge_pmd(), the huge PMD remains intact.
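The retry-with-yield shape can be sketched in userspace. The stubs below
(split_stub(), cond_resched_stub(), fail_budget) are invented for this
example; they only model the "split fails a few times under memory
pressure, then succeeds after yielding" behavior, not the kernel
implementation.

```c
#include <assert.h>

/* Models transient -ENOMEM: the split fails `fail_budget` times and
 * then succeeds, as if other tasks freed memory in the meantime. */
static int fail_budget;
static int split_stub(void)
{
	return fail_budget-- > 0 ? -1 : 0;
}

static int resched_calls;
static void cond_resched_stub(void)
{
	resched_calls++;	/* where the real loop would yield the CPU */
}

/* Toy change_pmd_range() step: on split failure, yield and retry via
 * the `again` label rather than returning an error after part of the
 * range has already been reprotected. */
static int change_one_pmd(void)
{
again:
	if (split_stub()) {
		cond_resched_stub();
		goto again;
	}
	return 0;	/* proceed to PTE-level protection changes */
}
```

The point of the pattern is that every failed attempt yields exactly
once before retrying, so the loop cannot hog the CPU while waiting for
an order-0 allocation to become satisfiable.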
Without this change, change_pte_range() would return -EAGAIN because
pte_offset_map_lock() returns NULL for a huge PMD, sending the code back
to the 'again' label to retry the split, without ever calling
cond_resched(). Now that __split_huge_pmd() returns an error code,
handle it explicitly: yield the CPU with cond_resched() and retry via
goto again, giving other tasks a chance to free memory.

Returning an error all the way to change_protection_range() would not
work: it would leave part of the range with new protections and the rest
unchanged, with no easy way to roll back the already modified entries
(and previous splits). __split_huge_pmd() only requires an order-0
allocation and is extremely unlikely to fail.

Signed-off-by: Usama Arif
---
 mm/mprotect.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 110d47a36d4bb..e39e96963da8b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -477,7 +477,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_is_huge(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
-				__split_huge_pmd(vma, pmd, addr, false);
+				ret = __split_huge_pmd(vma, pmd, addr, false);
+				if (ret) {
+					/*
+					 * Yield and retry. Other tasks
+					 * may free memory while we
+					 * reschedule.
+					 */
+					cond_resched();
+					goto again;
+				}
 				/*
 				 * For file-backed, the pmd could have been
 				 * cleared; make sure pmd populated if
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 08/24] mm: thp: handle split failure in follow_pmd_mask()
Date: Thu, 26 Mar 2026 19:08:50 -0700
Message-ID: <20260327021403.214713-9-usama.arif@linux.dev>

follow_pmd_mask() splits a huge PMD when FOLL_SPLIT_PMD is set, so GUP
can pin individual pages at PTE granularity. If the split fails, the PMD
is still huge and follow_page_pte() cannot process it. Return
ERR_PTR(-ENOMEM) on split failure, so the GUP caller gets -ENOMEM.
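The ERR_PTR(ret) propagation relies on the kernel's error-pointer
encoding. A minimal userspace re-creation of that encoding, with an
invented follow_pmd_stub() standing in for the GUP path (the real
helpers live in include/linux/err.h):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Kernel-style error pointers: small negative errno values are mapped
 * to the very top of the address space, where no valid object can be. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Toy follow_pmd_mask(): a failed split is propagated as an error
 * pointer instead of falling through to PTE lookup on a huge PMD. */
static void *follow_pmd_stub(int split_ret)
{
	static int page;	/* stand-in for a struct page */

	if (split_ret)
		return ERR_PTR(split_ret);
	return &page;
}
```

This is why the commit can return ERR_PTR(ret) directly: the caller
distinguishes a page pointer from -ENOMEM with IS_ERR()/PTR_ERR(), the
same way the existing pte_alloc() failure in follow_pmd_mask() already
reports ERR_PTR(-ENOMEM).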
-ENOMEM is already returned in follow_pmd_mask() if pte_alloc_one()
fails (which is the reason why split_huge_pmd() could fail), hence this
is a safe change.

Signed-off-by: Usama Arif
---
 mm/gup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609cb..07c6b0483c322 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -928,8 +928,16 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return follow_page_pte(vma, address, pmd, flags);
 	}
 	if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
+		int ret;
+
 		spin_unlock(ptl);
-		split_huge_pmd(vma, pmd, address);
+		/*
+		 * If split fails, the PMD is still huge and
+		 * we cannot proceed to follow_page_pte.
+		 */
+		ret = split_huge_pmd(vma, pmd, address);
+		if (ret)
+			return ERR_PTR(ret);
 		/* If pmd was left empty, stuff a page table in there quickly */
 		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
 			follow_page_pte(vma, address, pmd, flags);
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 09/24] mm: handle walk_page_range() failure from THP split
Date: Thu, 26 Mar 2026 19:08:51 -0700
Message-ID: <20260327021403.214713-10-usama.arif@linux.dev>

walk_pmd_range() splits a huge PMD when a page table walker with
pte_entry or install_pte callbacks needs PTE-level granularity. If the
split fails due to a memory allocation failure in pte_alloc_one(),
walk_pte_range() would encounter a huge PMD instead of a PTE page
table.

Break out of the loop on split failure and return -ENOMEM to the
walker's caller. Callers that reach this path (those with pte_entry or
install_pte set), such as mincore, hmm_range_fault and
queue_pages_range, already handle negative return values from
walk_page_range(). A similar approach is taken when __pte_alloc() fails
in walk_pmd_range().

Signed-off-by: Usama Arif
---
 mm/pagewalk.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b4..c5850de71b8cb 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -165,9 +165,11 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			continue;
 		}

-		if (walk->vma)
-			split_huge_pmd(walk->vma, pmd, addr);
-		else if (pmd_leaf(*pmd) || !pmd_present(*pmd))
+		if (walk->vma) {
+			err = split_huge_pmd(walk->vma, pmd, addr);
+			if (err)
+				break;
+		} else if (pmd_leaf(*pmd) || !pmd_present(*pmd))
 			continue; /* Nothing to do. */

 		err = walk_pte_range(pmd, addr, next, walk);
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 10/24] mm: thp: handle split failure in mremap move_page_tables()
Date: Thu, 26 Mar 2026 19:08:52 -0700
Message-ID: <20260327021403.214713-11-usama.arif@linux.dev>

move_page_tables() splits a huge PMD when the extent is smaller than
HPAGE_PMD_SIZE and the PMD can't be moved at PMD granularity. If the
split fails, the PMD stays huge and move_ptes() can't operate on
individual PTEs.
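The partial-move semantics can be sketched in userspace. The names
below (split_stub(), move_tables(), PMD_EXTENT, fail_from) are invented
for illustration; the sketch only models "break on split failure,
return bytes moved so far", not the real move_page_tables().

```c
#include <assert.h>

#define PMD_EXTENT 0x200000UL	/* HPAGE_PMD_SIZE stand-in */

/* The split succeeds for PMDs below `fail_from` and fails afterwards,
 * modeling an -ENOMEM that appears partway through the move. */
static int split_stub(int idx, int fail_from)
{
	return idx >= fail_from ? -1 : 0;
}

/* Toy move_page_tables(): each iteration moves one PMD-sized extent;
 * a failed split breaks out of the loop and the bytes moved so far
 * are returned, i.e. the caller sees a partial move. */
static unsigned long move_tables(int npmd, int fail_from)
{
	unsigned long moved = 0;

	for (int i = 0; i < npmd; i++) {
		if (split_stub(i, fail_from))
			break;		/* PMD still huge: stop here */
		moved += PMD_EXTENT;	/* move_ptes() analogue */
	}
	return moved;
}
```

Returning the partial length rather than an error matches how the real
loop already reacts to alloc_new_pmd() or pte_alloc() failure.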
Break out of the loop on split failure, which causes mremap() to return
however much was moved so far (a partial move). This is consistent with
other allocation failures in the same loop (e.g., alloc_new_pmd(),
pte_alloc()).

Signed-off-by: Usama Arif
---
 mm/mremap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832b..2f70cb48f6061 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -855,7 +855,13 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(pmc, HPAGE_PMD, old_pmd, new_pmd))
 				continue;
-			split_huge_pmd(pmc->old, old_pmd, pmc->old_addr);
+			/*
+			 * If split fails, the PMD stays huge and move_ptes
+			 * can't operate on it. Break out so the caller
+			 * can handle the partial move.
+			 */
+			if (split_huge_pmd(pmc->old, old_pmd, pmc->old_addr))
+				break;
 		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) &&
 			   extent == PMD_SIZE) {
 			/*
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 11/24] mm: thp: handle split failure in userfaultfd move_pages()
Date: Thu, 26 Mar 2026 19:08:53 -0700
Message-ID: <20260327021403.214713-12-usama.arif@linux.dev>

The UFFDIO_MOVE ioctl's move_pages() loop splits a huge PMD when the
folio is pinned and can't be moved at PMD granularity. If the split
fails, the PMD stays huge and move_pages_pte() can't process individual
pages. Break out of the loop on split failure and return -ENOMEM to the
caller. This is similar to how other allocation failures (__pte_alloc(),
mm_alloc_pmd()) are handled in move_pages().

Signed-off-by: Usama Arif
---
 mm/userfaultfd.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb44420..a04d62dd1e065 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1946,7 +1946,13 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
 			}

 			spin_unlock(ptl);
-			split_huge_pmd(src_vma, src_pmd, src_addr);
+			/*
+			 * If split fails, the PMD stays huge and
+			 * move_pages_pte can't process it.
+			 */
+			err = split_huge_pmd(src_vma, src_pmd, src_addr);
+			if (err)
+				break;
 			/* The folio will be split by move_pages_pte() */
 			continue;
--
2.52.0

From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 12/24] mm: thp: handle split failure in device migration
Date: Thu, 26 Mar 2026 19:08:54 -0700
Message-ID: <20260327021403.214713-13-usama.arif@linux.dev>

Device memory migration has two call sites that split huge PMDs:

migrate_vma_split_unmapped_folio(): called from migrate_vma_pages()
when migrating a PMD-mapped THP to a destination that doesn't support
compound pages. It splits the PMD, then splits the folio via
folio_split_unmapped(). If the PMD split fails, folio_split_unmapped()
would operate on an unsplit folio with inconsistent page table state.
Propagate -ENOMEM to skip this page's migration. This is safe, as a
folio_split_unmapped() failure would be propagated the same way.

migrate_vma_insert_page(): called from migrate_vma_pages() when
inserting a page into a VMA during migration back from device memory.
If a huge zero PMD exists at the target address, it must be split before
PTE insertion. If the split fails, the subsequent pte_alloc() and
set_pte_at() would operate on a PMD slot still occupied by the huge zero
entry. Use goto abort, consistent with other allocation failures in this
function.

Signed-off-by: Usama Arif
---
 mm/migrate_device.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 2912eba575d5e..00003fbe803df 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -919,7 +919,19 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
 	 * drops a reference at the end.
 	 */
 	folio_get(folio);
-	split_huge_pmd_address(migrate->vma, addr, true);
+	/*
+	 * If PMD split fails, folio_split_unmapped would operate on an
+	 * unsplit folio with inconsistent page table state.
+	 */
+	ret = split_huge_pmd_address(migrate->vma, addr, true);
+	if (ret) {
+		/*
+		 * The folio_get above was not consumed by
+		 * split_huge_pmd_address; put back that reference.
+		 */
+		folio_put(folio);
+		return ret;
+	}
 	ret = folio_split_unmapped(folio, 0);
 	if (ret)
 		return ret;
@@ -1015,7 +1027,13 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	if (pmd_trans_huge(*pmdp)) {
 		if (!is_huge_zero_pmd(*pmdp))
 			goto abort;
-		split_huge_pmd(vma, pmdp, addr);
+		/*
+		 * If split fails, the huge zero PMD remains and
+		 * pte_alloc/PTE insertion that follows would be
+		 * incorrect.
+		 */
+		if (split_huge_pmd(vma, pmdp, addr))
+			goto abort;
+ */ + if (split_huge_pmd(vma, pmdp, addr)) + goto abort; } else if (pmd_leaf(*pmdp)) goto abort; } --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-174.mta0.migadu.com (out-174.mta0.migadu.com [91.218.175.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0586F3314B9 for ; Fri, 27 Mar 2026 02:15:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577736; cv=none; b=mvQBBlYQ1E75w+a51v8TmqABxmczUPQKNc8ceBXoR9pmZaLfuWSvk7WzpIwWX1Poi69O5pNX7d8r1wWVs02GSw5sd4PwspYKGWm83cPc47eDn8ABT0Oc3bsbLsS0z8iM3134Ro6S9M7g2ZnM+cJuT1wBS6qDq8q+u7R7gZSNb2c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577736; c=relaxed/simple; bh=4BdZVzg7p5U+3is7QnIcqItm+JxYIoZNwRJVRSBVrTQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ShqK8+0H3RNYqrTsBbzr5waTWeFu01rE8dkaPlOjP+wYrLq7hNlKOovxNsQAiYYUBfh+hxCeBEG0ELFxD0KdoUXryuFMHeM18693MqCghapsN9kYWiKg9qxVG3JnwSFsda4rVCn8tTAqHdwM2ZLQMWl09UnJih7dVwwXOo43J1E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=ffpxHw+P; arc=none smtp.client-ip=91.218.175.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="ffpxHw+P" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1774577733; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XO0R5fX+FiXnuLY0HoKuyc3BzHok+k+2YVFrwxlOH1U=; b=ffpxHw+PDquRIY7PsBB+gYIVU0HGhnrg/lszSQI4Ss/ylyOhYh/GY54SqX/Gjmz298GChe CK1RQEiqAUEGH9vhPBlSkxZYbNei6tVRaaPyX3jC8Yd6BbKN5MzHnqPEqfYrvEF6cG/SMn RA3uLKxe+69mdcNlUOoR18d3XTq3a4c= From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 13/24] mm: proc: handle split_huge_pmd failure in pagemap_scan Date: Thu, 26 Mar 2026 19:08:55 -0700 Message-ID: <20260327021403.214713-14-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT pagemap_scan_thp_entry() splits a huge PMD when the PAGEMAP_SCAN ioctl needs to write-protect only a portion of a THP. It then returns -ENOENT so pagemap_scan_pmd_entry() falls through to PTE-level handling. 
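[Editorial note: the -ENOENT sentinel convention described above can be sketched outside the kernel as follows. The function names are hypothetical stand-ins, not the real fs/proc/task_mmu.c code; the point is only that -ENOENT means "fall through to PTE handling" while any other error stops the walk.]

```c
#include <assert.h>
#include <errno.h>

/* Sketch of pagemap_scan's control flow: the THP-entry handler returns
 * -ENOENT as a sentinel ("no THP here, scan PTEs instead"), and any other
 * error must abort the walk and reach the ioctl caller. */
static int thp_entry_sketch(int split_err)
{
	if (split_err)
		return split_err;	/* e.g. -ENOMEM: PMD is still huge */
	return -ENOENT;			/* split done: fall through to PTEs */
}

static int pmd_entry_sketch(int split_err)
{
	int ret = thp_entry_sketch(split_err);

	if (ret == -ENOENT)
		return 0;	/* sentinel: continue at PTE level */
	return ret;		/* real error: propagate to the caller */
}
```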
Check the split_huge_pmd() return value and propagate the error on failure. Returning -ENOMEM instead of -ENOENT prevents the fallthrough to PTE handling, and the error propagates through walk_page_range() to do_pagemap_scan() where it becomes the ioctl return value. pagemap_scan_backout_range() already undoes the buffered output, and walk_end is written back to userspace so the caller knows where the scan stopped. If the split fails, the PMD remains huge. An alternative to the approach in this patch is to return -ENOENT, causing the caller to proceed to pte_offset_map_lock(). ___pte_offset_map() detects the trans_huge PMD and returns NULL, which sets ACTION_AGAIN, restarting the walker on the same PMD; by the time it runs again, the system might have enough memory for the split to succeed. Signed-off-by: Usama Arif --- fs/proc/task_mmu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e091931d7ca19..f5f459140b5c0 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2714,9 +2714,13 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsign= ed long start, * needs to be performed on a portion of the huge page.
*/ if (end !=3D start + HPAGE_SIZE) { + int err; + spin_unlock(ptl); - split_huge_pmd(vma, pmd, start); + err =3D split_huge_pmd(vma, pmd, start); pagemap_scan_backout_range(p, start, end); + if (err) + return err; /* Report as if there was no THP */ return -ENOENT; } --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-183.mta1.migadu.com (out-183.mta1.migadu.com [95.215.58.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D355630648A for ; Fri, 27 Mar 2026 02:15:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.183 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577742; cv=none; b=csvFfOkxH1ryyRVgEzYLonFypylGSbwIv2tRAqTlPv/BPYOyNfG5nHf3XrdGa6TrM4IipX6cjEq3g9GsSRa6DY+nayvAjJ6SFTJSoL+A0429RnsyDmO2/8k/C0raKsvmalv2MBjVVTo0v/176xtKtJnGaQLaG16JU8MjgFjYcYc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577742; c=relaxed/simple; bh=jpuQVVXokoXedW4E3Rk2kXk+4m7MDI3R8N0U0vopra8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gOdZhyV/SOXdn5jfYeTwVHvVPl9X6O3yXSy7JjGQ894g2QHuOIXGUhHXUMLmVM4oJAGc+EFY+gJxYzfg6KI5vf2EQ6AIYfB0HLSf2vUTWcgp6Ug5poVwt1HWtXqJB33BMtwAi9DoCDtrK1ZnNlmGlZ8RLYIVsO5Rp10PKCX18WE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=w7b0PFKI; arc=none smtp.client-ip=95.215.58.183 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="w7b0PFKI" X-Report-Abuse: Please report any 
abuse attempt to abuse@migadu.com and include these headers. From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 14/24] powerpc/mm: handle split_huge_pmd failure in subpage_prot Date: Thu, 26 Mar 2026 19:08:56 -0700 Message-ID: <20260327021403.214713-15-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" subpage_walk_pmd_entry() splits huge PMDs when the subpage_prot syscall controls the access permissions on individual 4 kB pages.
In practice this cannot fail today: sys_subpage_prot() returns -ENOENT early when radix is enabled, and on hash powerpc arch_needs_pgtable_deposit() is true so split uses the pre-deposited page table and always succeeds. The change is for __must_check compliance introduced in a later patch and correctness if the call chain ever becomes reachable on architectures with lazy PTE allocation. Propagate the error through the full call chain up to the syscall. The syscall already returns -ENOMEM in other places when it runs out of memory. Signed-off-by: Usama Arif --- arch/powerpc/mm/book3s64/subpage_prot.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book= 3s64/subpage_prot.c index 37d47282c3686..b3635a11ff433 100644 --- a/arch/powerpc/mm/book3s64/subpage_prot.c +++ b/arch/powerpc/mm/book3s64/subpage_prot.c @@ -139,8 +139,8 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned = long addr, unsigned long end, struct mm_walk *walk) { struct vm_area_struct *vma =3D walk->vma; - split_huge_pmd(vma, pmd, addr); - return 0; + + return split_huge_pmd(vma, pmd, addr); } =20 static const struct mm_walk_ops subpage_walk_ops =3D { @@ -148,11 +148,12 @@ static const struct mm_walk_ops subpage_walk_ops =3D { .walk_lock =3D PGWALK_WRLOCK_VERIFY, }; =20 -static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long ad= dr, - unsigned long len) +static int subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long add= r, + unsigned long len) { struct vm_area_struct *vma; VMA_ITERATOR(vmi, mm, addr); + int err; =20 /* * We don't try too hard, we just mark all the vma in that range @@ -160,14 +161,17 @@ static void subpage_mark_vma_nohuge(struct mm_struct = *mm, unsigned long addr, */ for_each_vma_range(vmi, vma, addr + len) { vm_flags_set(vma, VM_NOHUGEPAGE); - walk_page_vma(vma, &subpage_walk_ops, NULL); + err =3D walk_page_vma(vma, &subpage_walk_ops, NULL); + if 
(err) + return err; } + return 0; } #else -static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long ad= dr, - unsigned long len) +static int subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long add= r, + unsigned long len) { - return; + return 0; } #endif =20 @@ -229,7 +233,9 @@ SYSCALL_DEFINE3(subpage_prot, unsigned long, addr, mm->context.hash_context->spt =3D spt; } =20 - subpage_mark_vma_nohuge(mm, addr, len); + err =3D subpage_mark_vma_nohuge(mm, addr, len); + if (err) + goto out; for (limit =3D addr + len; addr < limit; addr =3D next) { next =3D pmd_addr_end(addr, limit); err =3D -ENOMEM; --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5A69332ED0 for ; Fri, 27 Mar 2026 02:15:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577747; cv=none; b=FOfzY475HBkSOMRfeBp8Ud1LThJr+QgZmr6QRuuzsEbbHm0M6GBtcZGn+N7Fy5g2T9LfIgSYs0fwyylf8Zri6zePodVMn8nbOBnzsqQ6y2YKBkit/I/7FfhVTSQdj3/DRr63x3VoNASonLxtstWNPW+MM4NkQOX9MNH3wF0087Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577747; c=relaxed/simple; bh=MSBXQPCmQM3p81GXRNNhHe5yDRAt26b3JVPgOD91aaw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KfCge1ZR6E23IRDeWlAtnGXf5yCNTa7miBkmel/vFSMhFekD7cCWfCIIa3MObCOLxPjJ1XuCx/nJ+e31yoIlj+uLnhwNwEU/vgF1aruoMp1WyqorQZkDwU5NJXprAaJ6/5xbHVQMDm3ntFlByUVLTBw3dE9jZTUCAQKkyOuGyZQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Hnot0p5K; arc=none 
smtp.client-ip=91.218.175.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Hnot0p5K" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1774577743; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8tx/Qjve1iVx/sjsNymGDpKKmMZv1TUyDs/EfcYY5GE=; b=Hnot0p5K89CAv1AQSFPCVNypjXRfAocNMIZyWJnznY4gpq+GWy2CVy+0V7yyIEu0F+HuLG VCDbIV/SqybnVZSHtV9JQkcdTWMzMp9IQ8Hs+qWemVt0yevh5Dk4OIgO/2Wqfo3fp+W2Sb aJvabR1V9LfLrBzcx+XgaoHKYXMXuJs= From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 15/24] fs/dax: handle split_huge_pmd failure in dax_iomap_pmd_fault Date: Thu, 26 Mar 2026 19:08:57 -0700 Message-ID: <20260327021403.214713-16-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" dax_iomap_pmd_fault() splits a huge PMD when the PMD fault falls back to PTE-level handling. The split is necessary so that the subsequent PTE fault does not misinterpret the huge PMD entry as a page table pointer. In practice this cannot fail today: DAX VMAs are always file-backed, and __split_huge_pmd() only allocates a PTE page table (the operation that can return -ENOMEM) for anonymous VMAs. For file-backed VMAs the split path simply zaps the PMD and returns 0. Use WARN_ON_ONCE to document this invariant and check the return value for __must_check compliance introduced in the next patch. Signed-off-by: Usama Arif --- fs/dax.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/dax.c b/fs/dax.c index a5237169b4679..ed1859e8a916f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -2039,7 +2039,14 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_faul= t *vmf, unsigned long *pfnp, dax_unlock_entry(&xas, entry); fallback: if (ret =3D=3D VM_FAULT_FALLBACK) { - split_huge_pmd(vmf->vma, vmf->pmd, vmf->address); + /* + * split_huge_pmd() cannot fail for file-backed (DAX) VMAs + * since splitting only zaps the PMD without allocating a + * PTE page table. 
+ */ + if (WARN_ON_ONCE(split_huge_pmd(vmf->vma, vmf->pmd, + vmf->address))) + ret =3D VM_FAULT_OOM; count_vm_event(THP_FAULT_FALLBACK); } out: --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-183.mta0.migadu.com (out-183.mta0.migadu.com [91.218.175.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7233B3368B3 for ; Fri, 27 Mar 2026 02:15:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.183 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577751; cv=none; b=mbgn6n/UyfcqL8Z/HMFZ9dvkFw1oHNoWBhxlKoxSpwBOQ7ItptTqj8m4MRdSPGOXhjFu1qG1ekw8lxJMBVNZb2oFndjJAvLnHZyXT5oDm78l0OdvMyCGJ+y/94dU3x4TbOM1UQivRbTjdwFBn/PrJUWgcFlIQwssSufxiRTyVRs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577751; c=relaxed/simple; bh=j9LGkm9AT1SVnMNzLezcvHWm8GVx3wLShAsc0rRNvYo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dt8zz9OX/cO9lh1gTBrUYtN5AZ8DTjFXCG/2DZhCaZ3sIkC1KxRWIkiJefxlouSAmgoySk2TcwtSUo0qO6pV1Qum+ayKb2GHJ2NrQ/KDtYEKEJe3wFR0Oo5BLvhgAdDfc5AcIy6PWNXKgZPoYe5IhO2VGD8j+yI5msdKQGyMugg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=PBqsuNks; arc=none smtp.client-ip=91.218.175.183 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="PBqsuNks" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 16/24] mm: huge_mm: Make sure all split_huge_pmd calls are checked Date: Thu, 26 Mar 2026 19:08:58 -0700 Message-ID: <20260327021403.214713-17-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Mark __split_huge_pmd(), split_huge_pmd() and split_huge_pmd_address() with __must_check so the compiler warns if any caller ignores the return value. Ignoring the return value and operating on the assumption that the PMD has been split could result in a kernel bug.
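[Editorial note: the __must_check pattern applied by this patch can be sketched in plain C as below. The names are hypothetical stand-ins, not the kernel's split_huge_pmd(); with the attribute, a caller that discards the return value gets a compiler warning (-Wunused-result).]

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the kernel's __must_check annotation. */
#define sketch_must_check __attribute__((warn_unused_result))

/* Like split_huge_pmd(): 0 on success, -ENOMEM if the PTE page table
 * allocation fails and the PMD is left huge. */
static sketch_must_check int split_pmd_sketch(int alloc_ok)
{
	return alloc_ok ? 0 : -ENOMEM;
}

/* The call-site pattern the series converts every caller to: check the
 * result and propagate it, never assume the PMD was actually split. */
static int caller_sketch(int alloc_ok)
{
	int err = split_pmd_sketch(alloc_ok);

	if (err)
		return err;	/* PMD is still huge; do not touch PTEs */
	return 0;		/* safe to operate at PTE level */
}
```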
The possibility of an order-0 page table allocation failing is very low, but it should be handled correctly. Signed-off-by: Usama Arif --- include/linux/huge_mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 224965fce4e66..c4d0badc4ce27 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -418,7 +418,7 @@ static inline int split_huge_page(struct page *page) extern struct list_lru deferred_split_lru; void deferred_split_folio(struct folio *folio, bool partially_mapped); =20 -int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, +int __must_check __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze); =20 /** @@ -447,7 +447,7 @@ static inline bool pmd_is_huge(pmd_t pmd) return false; } =20 -static inline int split_huge_pmd(struct vm_area_struct *vma, +static inline int __must_check split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address) { if (pmd_is_huge(*pmd)) @@ -455,7 +455,7 @@ static inline int split_huge_pmd(struct vm_area_struct = *vma, return 0; } =20 -int split_huge_pmd_address(struct vm_area_struct *vma, unsigned long addre= ss, +int __must_check split_huge_pmd_address(struct vm_area_struct *vma, unsign= ed long address, bool freeze); =20 void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-174.mta1.migadu.com (out-174.mta1.migadu.com [95.215.58.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72E33329C6D for ; Fri, 27 Mar 2026 02:15:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577757; cv=none;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1774577753; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=39bZVeDLCZMGYPUr+EAc/EnS1HXbPGaYLlzIdJWQkDw=; b=w30DGB4UyGXLzwfa4i4fkjpzxE6aN3Qb/p4ZKaRsPTAiSqe7v4evtfAT7/BH1HS7bkR3ee TPIhwTi3UpYkcx8TXWvhJiBwvNzLsYzefVS4KGsbhlTKHaRYF6+xMyROiJO8FjJH7HcP1c VHlLz8MpZzAFWrBdFHiUuP22FNKzK2o= From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 17/24] mm: thp: allocate PTE page tables lazily at split time Date: Thu, 26 Mar 2026 19:08:59 -0700 Message-ID: <20260327021403.214713-18-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT When the kernel creates a PMD-level THP mapping for anonymous pages, it pre-allocates a PTE page table and deposits it via pgtable_trans_huge_deposit(). This deposited table is withdrawn during PMD split or zap. 
The rationale was that split must not fail: if the kernel decides to split a THP, it needs a PTE table to populate. However, every anon THP wastes 4KB (one page table page) that sits unused in the deposit list for the lifetime of the mapping. On systems with many THPs, this adds up to significant memory waste. The original rationale also no longer holds: it is ok for split to fail, and if the kernel cannot satisfy an order-0 allocation for the split, there are much bigger problems. On large servers where you can easily have 100s of GBs of THPs, the memory usage for these tables is 200M per 100G. This memory could be used for any other usecase, including allocating the page tables required during split. This patch removes the pre-deposit for anonymous pages on architectures where arch_needs_pgtable_deposit() returns false (every arch apart from powerpc; on powerpc it is false only when the radix MMU is enabled) and allocates the PTE table lazily, only when a split actually occurs. The split path is modified to accept a caller-provided page table. PowerPC exception: it would have been great to remove the page table deposit code completely, in which case this commit would mostly have been a code cleanup patch. Unfortunately the PowerPC hash MMU stores hash slot information in the deposited page table, so the pre-deposit is necessary there. All deposit/withdraw paths are guarded by arch_needs_pgtable_deposit(), so PowerPC behavior is unchanged with this patch. On a better note, arch_needs_pgtable_deposit() always evaluates to false at compile time on non-PowerPC architectures, so the pre-deposit code will not be compiled in.
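[Editorial note: the overhead figure quoted above can be sanity-checked with a one-line calculation: one deposited 4 KiB PTE table per 2 MiB PMD-sized THP. The sketch below assumes typical x86-64 sizes; other configurations differ.]

```c
#include <assert.h>

/* Bytes of deposited page tables needed to back thp_bytes of anonymous
 * THP memory: one 4 KiB PTE table page per 2 MiB huge page. */
static unsigned long long deposit_overhead_bytes(unsigned long long thp_bytes)
{
	const unsigned long long pmd_size = 2ULL << 20;		/* 2 MiB THP */
	const unsigned long long pte_table = 4ULL << 10;	/* 4 KiB table */

	return thp_bytes / pmd_size * pte_table;
}
```

For 100 GiB of THP-backed memory this gives 51200 tables, i.e. 200 MiB, matching the 200M-per-100G figure in the commit message.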
Suggested-by: David Hildenbrand Signed-off-by: Usama Arif --- include/linux/huge_mm.h | 4 +- mm/huge_memory.c | 158 ++++++++++++++++++++++++++-------------- mm/khugepaged.c | 7 +- mm/migrate_device.c | 15 ++-- mm/rmap.c | 39 +++++++++- 5 files changed, 158 insertions(+), 65 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index c4d0badc4ce27..c02ba9c4b8d5b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -561,7 +561,7 @@ static inline bool thp_migration_supported(void) } =20 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addre= ss, - pmd_t *pmd, bool freeze); + pmd_t *pmd, bool freeze, pgtable_t pgtable); bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct folio *folio); void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd, @@ -660,7 +660,7 @@ static inline int split_huge_pmd_address(struct vm_area= _struct *vma, unsigned long address, bool freeze) { return 0; } static inline void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, - bool freeze) {} + bool freeze, pgtable_t pgtable) {} =20 static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9f4be707c8cb0..2acedb1de7404 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1309,17 +1309,19 @@ static vm_fault_t __do_huge_pmd_anonymous_page(stru= ct vm_fault *vmf) unsigned long haddr =3D vmf->address & HPAGE_PMD_MASK; struct vm_area_struct *vma =3D vmf->vma; struct folio *folio; - pgtable_t pgtable; + pgtable_t pgtable =3D NULL; vm_fault_t ret =3D 0; =20 folio =3D vma_alloc_anon_folio_pmd(vma, vmf->address); if (unlikely(!folio)) return VM_FAULT_FALLBACK; =20 - pgtable =3D pte_alloc_one(vma->vm_mm); - if (unlikely(!pgtable)) { - ret =3D VM_FAULT_OOM; - goto release; + if (arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(vma->vm_mm); + if 
(unlikely(!pgtable)) { + ret =3D VM_FAULT_OOM; + goto release; + } } =20 vmf->ptl =3D pmd_lock(vma->vm_mm, vmf->pmd); @@ -1334,14 +1336,18 @@ static vm_fault_t __do_huge_pmd_anonymous_page(stru= ct vm_fault *vmf) if (userfaultfd_missing(vma)) { spin_unlock(vmf->ptl); folio_put(folio); - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); ret =3D handle_userfault(vmf, VM_UFFD_MISSING); VM_BUG_ON(ret & VM_FAULT_FALLBACK); return ret; } - pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); + if (pgtable) { + pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, + pgtable); + mm_inc_nr_ptes(vma->vm_mm); + } map_anon_folio_pmd_pf(folio, vmf->pmd, vma, haddr); - mm_inc_nr_ptes(vma->vm_mm); spin_unlock(vmf->ptl); } =20 @@ -1437,9 +1443,11 @@ static void set_huge_zero_folio(pgtable_t pgtable, s= truct mm_struct *mm, pmd_t entry; entry =3D folio_mk_pmd(zero_folio, vma->vm_page_prot); entry =3D pmd_mkspecial(entry); - pgtable_trans_huge_deposit(mm, pmd, pgtable); + if (pgtable) { + pgtable_trans_huge_deposit(mm, pmd, pgtable); + mm_inc_nr_ptes(mm); + } set_pmd_at(mm, haddr, pmd, entry); - mm_inc_nr_ptes(mm); } =20 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf) @@ -1458,16 +1466,19 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fau= lt *vmf) if (!(vmf->flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm) && transparent_hugepage_use_zero_page()) { - pgtable_t pgtable; + pgtable_t pgtable =3D NULL; struct folio *zero_folio; vm_fault_t ret; =20 - pgtable =3D pte_alloc_one(vma->vm_mm); - if (unlikely(!pgtable)) - return VM_FAULT_OOM; + if (arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(vma->vm_mm); + if (unlikely(!pgtable)) + return VM_FAULT_OOM; + } zero_folio =3D mm_get_huge_zero_folio(vma->vm_mm); if (unlikely(!zero_folio)) { - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); count_vm_event(THP_FAULT_FALLBACK); return VM_FAULT_FALLBACK; } @@ -1477,10 +1488,12 @@ vm_fault_t 
do_huge_pmd_anonymous_page(struct vm_fau= lt *vmf) ret =3D check_stable_address_space(vma->vm_mm); if (ret) { spin_unlock(vmf->ptl); - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); } else if (userfaultfd_missing(vma)) { spin_unlock(vmf->ptl); - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); ret =3D handle_userfault(vmf, VM_UFFD_MISSING); VM_BUG_ON(ret & VM_FAULT_FALLBACK); } else { @@ -1491,7 +1504,8 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault= *vmf) } } else { spin_unlock(vmf->ptl); - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); } return ret; } @@ -1823,8 +1837,10 @@ static void copy_huge_non_present_pmd( } =20 add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + if (pgtable) { + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + } if (!userfaultfd_wp(dst_vma)) pmd =3D pmd_swp_clear_uffd_wp(pmd); set_pmd_at(dst_mm, addr, dst_pmd, pmd); @@ -1864,9 +1880,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct m= m_struct *src_mm, if (!vma_is_anonymous(dst_vma)) return 0; =20 - pgtable =3D pte_alloc_one(dst_mm); - if (unlikely(!pgtable)) - goto out; + if (arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(dst_mm); + if (unlikely(!pgtable)) + goto out; + } =20 dst_ptl =3D pmd_lock(dst_mm, dst_pmd); src_ptl =3D pmd_lockptr(src_mm, src_pmd); @@ -1884,7 +1902,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, } =20 if (unlikely(!pmd_trans_huge(pmd))) { - pte_free(dst_mm, pgtable); + if (pgtable) + pte_free(dst_mm, pgtable); goto out_unlock; } /* @@ -1910,7 +1929,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, dst_vma, sr= c_vma))) { /* Page maybe pinned: split and retry the fault on PTEs. 
*/ folio_put(src_folio); - pte_free(dst_mm, pgtable); + if (pgtable) + pte_free(dst_mm, pgtable); spin_unlock(src_ptl); spin_unlock(dst_ptl); /* @@ -1924,8 +1944,10 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct m= m_struct *src_mm, } add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); out_zero_page: - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + if (pgtable) { + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + } pmdp_set_wrprotect(src_mm, addr, src_pmd); if (!userfaultfd_wp(dst_vma)) pmd =3D pmd_clear_uffd_wp(pmd); @@ -2376,22 +2398,13 @@ static struct folio *normal_or_softleaf_folio_pmd(s= truct vm_area_struct *vma, static bool has_deposited_pgtable(struct vm_area_struct *vma, pmd_t pmdval, struct folio *folio) { - /* Some architectures require unconditional depositing. */ - if (arch_needs_pgtable_deposit()) - return true; - - /* - * Huge zero always deposited except for DAX which handles itself, see - * set_huge_zero_folio(). - */ - if (is_huge_zero_pmd(pmdval)) - return !vma_is_dax(vma); - /* - * Otherwise, only anonymous folios are deposited, see - * __do_huge_pmd_anonymous_page(). + * With lazy PTE page table allocation, only architectures that + * require unconditional depositing (powerpc hash MMU) will have + * deposited page tables. All other architectures allocate PTE + * page tables lazily at split time. 
*/ - return folio && folio_test_anon(folio); + return arch_needs_pgtable_deposit(); } =20 /** @@ -2514,7 +2527,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsign= ed long old_addr, force_flush =3D true; VM_BUG_ON(!pmd_none(*new_pmd)); =20 - if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { + if (pmd_move_must_withdraw(new_ptl, old_ptl, vma) && + arch_needs_pgtable_deposit()) { pgtable_t pgtable; pgtable =3D pgtable_trans_huge_withdraw(mm, old_pmd); pgtable_trans_huge_deposit(mm, new_pmd, pgtable); @@ -2823,8 +2837,10 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t = *dst_pmd, pmd_t *src_pmd, pm } set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd); =20 - src_pgtable =3D pgtable_trans_huge_withdraw(mm, src_pmd); - pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable); + if (arch_needs_pgtable_deposit()) { + src_pgtable =3D pgtable_trans_huge_withdraw(mm, src_pmd); + pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable); + } unlock_ptls: double_pt_unlock(src_ptl, dst_ptl); /* unblock rmap walks */ @@ -2966,10 +2982,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pu= d_t *pud, #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ =20 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, - unsigned long haddr, pmd_t *pmd) + unsigned long haddr, pmd_t *pmd, pgtable_t pgtable) { struct mm_struct *mm =3D vma->vm_mm; - pgtable_t pgtable; pmd_t _pmd, old_pmd; unsigned long addr; pte_t *pte; @@ -2985,7 +3000,16 @@ static void __split_huge_zero_page_pmd(struct vm_are= a_struct *vma, */ old_pmd =3D pmdp_huge_clear_flush(vma, haddr, pmd); =20 - pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); + if (arch_needs_pgtable_deposit()) { + pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); + } else { + VM_BUG_ON(!pgtable); + /* + * Account for the freshly allocated (in __split_huge_pmd) pgtable + * being used in mm. 
+ */ + mm_inc_nr_ptes(mm); + } pmd_populate(mm, &_pmd, pgtable); =20 pte =3D pte_offset_map(&_pmd, haddr); @@ -3007,12 +3031,11 @@ static void __split_huge_zero_page_pmd(struct vm_ar= ea_struct *vma, } =20 static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long haddr, bool freeze) + unsigned long haddr, bool freeze, pgtable_t pgtable) { struct mm_struct *mm =3D vma->vm_mm; struct folio *folio; struct page *page; - pgtable_t pgtable; pmd_t old_pmd, _pmd; bool soft_dirty, uffd_wp =3D false, young =3D false, write =3D false; bool anon_exclusive =3D false, dirty =3D false; @@ -3036,6 +3059,8 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, */ if (arch_needs_pgtable_deposit()) zap_deposited_table(mm, pmd); + if (pgtable) + pte_free(mm, pgtable); if (vma_is_special_huge(vma)) return; if (unlikely(pmd_is_migration_entry(old_pmd))) { @@ -3068,7 +3093,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, * small page also write protected so it does not seems useful * to invalidate secondary mmu at this time. */ - return __split_huge_zero_page_pmd(vma, haddr, pmd); + return __split_huge_zero_page_pmd(vma, haddr, pmd, pgtable); } =20 if (pmd_is_migration_entry(*pmd)) { @@ -3192,7 +3217,16 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, * Withdraw the table only after we mark the pmd entry invalid. * This's critical for some architectures (Power). */ - pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); + if (arch_needs_pgtable_deposit()) { + pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); + } else { + VM_BUG_ON(!pgtable); + /* + * Account for the freshly allocated (in __split_huge_pmd) pgtable + * being used in mm. 
+ */ + mm_inc_nr_ptes(mm); + } pmd_populate(mm, &_pmd, pgtable); =20 pte =3D pte_offset_map(&_pmd, haddr); @@ -3288,11 +3322,13 @@ static void __split_huge_pmd_locked(struct vm_area_= struct *vma, pmd_t *pmd, } =20 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addre= ss, - pmd_t *pmd, bool freeze) + pmd_t *pmd, bool freeze, pgtable_t pgtable) { VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); if (pmd_trans_huge(*pmd) || pmd_is_valid_softleaf(*pmd)) - __split_huge_pmd_locked(vma, pmd, address, freeze); + __split_huge_pmd_locked(vma, pmd, address, freeze, pgtable); + else if (pgtable) + pte_free(vma->vm_mm, pgtable); } =20 int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, @@ -3300,13 +3336,24 @@ int __split_huge_pmd(struct vm_area_struct *vma, pm= d_t *pmd, { spinlock_t *ptl; struct mmu_notifier_range range; + pgtable_t pgtable =3D NULL; =20 mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, address & HPAGE_PMD_MASK, (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); mmu_notifier_invalidate_range_start(&range); + + /* allocate pagetable before acquiring pmd lock */ + if (vma_is_anonymous(vma) && !arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(vma->vm_mm); + if (!pgtable) { + mmu_notifier_invalidate_range_end(&range); + return -ENOMEM; + } + } + ptl =3D pmd_lock(vma->vm_mm, pmd); - split_huge_pmd_locked(vma, range.start, pmd, freeze); + split_huge_pmd_locked(vma, range.start, pmd, freeze, pgtable); spin_unlock(ptl); mmu_notifier_invalidate_range_end(&range); =20 @@ -3442,7 +3489,8 @@ static bool __discard_anon_folio_pmd_locked(struct vm= _area_struct *vma, } =20 folio_remove_rmap_pmd(folio, pmd_page(orig_pmd), vma); - zap_deposited_table(mm, pmdp); + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmdp); add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d06d84219e1b8..40b33263f6135 100644 --- 
a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1239,7 +1239,12 @@ static enum scan_result collapse_huge_page(struct mm= _struct *mm, unsigned long a =20 spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); - pgtable_trans_huge_deposit(mm, pmd, pgtable); + if (arch_needs_pgtable_deposit()) { + pgtable_trans_huge_deposit(mm, pmd, pgtable); + } else { + mm_dec_nr_ptes(mm); + pte_free(mm, pgtable); + } map_anon_folio_pmd_nopf(folio, pmd, vma, address); spin_unlock(pmd_ptl); =20 diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 00003fbe803df..b9242217a81b6 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -829,9 +829,13 @@ static int migrate_vma_insert_huge_pmd_page(struct mig= rate_vma *migrate, =20 __folio_mark_uptodate(folio); =20 - pgtable =3D pte_alloc_one(vma->vm_mm); - if (unlikely(!pgtable)) - goto abort; + if (arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(vma->vm_mm); + if (unlikely(!pgtable)) + goto abort; + } else { + pgtable =3D NULL; + } =20 if (folio_is_device_private(folio)) { swp_entry_t swp_entry; @@ -879,10 +883,11 @@ static int migrate_vma_insert_huge_pmd_page(struct mi= grate_vma *migrate, folio_get(folio); =20 if (flush) { - pte_free(vma->vm_mm, pgtable); + if (pgtable) + pte_free(vma->vm_mm, pgtable); flush_cache_page(vma, addr, addr + HPAGE_PMD_SIZE); pmdp_invalidate(vma, addr, pmdp); - } else { + } else if (pgtable) { pgtable_trans_huge_deposit(vma->vm_mm, pmdp, pgtable); mm_inc_nr_ptes(vma->vm_mm); } diff --git a/mm/rmap.c b/mm/rmap.c index 78b7fb5f367ce..efbcdd3b32632 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -76,6 +76,7 @@ #include #include =20 +#include #include =20 #define CREATE_TRACE_POINTS @@ -1995,6 +1996,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, unsigned long pfn; unsigned long hsz =3D 0; int ptes =3D 0; + pgtable_t prealloc_pte =3D NULL; =20 /* * When racing against e.g. 
zap_pte_range() on another cpu, @@ -2029,6 +2031,10 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, } mmu_notifier_invalidate_range_start(&range); =20 + if ((flags & TTU_SPLIT_HUGE_PMD) && vma_is_anonymous(vma) && + !arch_needs_pgtable_deposit()) + prealloc_pte =3D pte_alloc_one(mm); + while (page_vma_mapped_walk(&pvmw)) { /* * If the folio is in an mlock()d vma, we must not swap it out. @@ -2078,12 +2084,21 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, } =20 if (flags & TTU_SPLIT_HUGE_PMD) { + pgtable_t pgtable =3D prealloc_pte; + + prealloc_pte =3D NULL; + if (!arch_needs_pgtable_deposit() && !pgtable && + vma_is_anonymous(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret =3D false; + break; + } /* * We temporarily have to drop the PTL and * restart so we can process the PTE-mapped THP. */ split_huge_pmd_locked(vma, pvmw.address, - pvmw.pmd, false); + pvmw.pmd, false, pgtable); flags &=3D ~TTU_SPLIT_HUGE_PMD; page_vma_mapped_walk_restart(&pvmw); continue; @@ -2363,6 +2378,9 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, break; } =20 + if (prealloc_pte) + pte_free(mm, prealloc_pte); + mmu_notifier_invalidate_range_end(&range); =20 return ret; @@ -2422,6 +2440,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, enum ttu_flags flags =3D (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz =3D 0; + pgtable_t prealloc_pte =3D NULL; =20 /* * When racing against e.g. 
zap_pte_range() on another cpu, @@ -2456,6 +2475,10 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, } mmu_notifier_invalidate_range_start(&range); =20 + if ((flags & TTU_SPLIT_HUGE_PMD) && vma_is_anonymous(vma) && + !arch_needs_pgtable_deposit()) + prealloc_pte =3D pte_alloc_one(mm); + while (page_vma_mapped_walk(&pvmw)) { /* PMD-mapped THP migration entry */ if (!pvmw.pte) { @@ -2463,6 +2486,15 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, __maybe_unused pmd_t pmdval; =20 if (flags & TTU_SPLIT_HUGE_PMD) { + pgtable_t pgtable =3D prealloc_pte; + + prealloc_pte =3D NULL; + if (!arch_needs_pgtable_deposit() && !pgtable && + vma_is_anonymous(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret =3D false; + break; + } /* * split_huge_pmd_locked() might leave the * folio mapped through PTEs. Retry the walk @@ -2470,7 +2502,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, * abort the walk. 
*/ split_huge_pmd_locked(vma, pvmw.address, - pvmw.pmd, true); + pvmw.pmd, true, pgtable); flags &=3D ~TTU_SPLIT_HUGE_PMD; page_vma_mapped_walk_restart(&pvmw); continue; @@ -2721,6 +2753,9 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, folio_put(folio); } =20 + if (prealloc_pte) + pte_free(mm, prealloc_pte); + mmu_notifier_invalidate_range_end(&range); =20 return ret; --=20 2.52.0 From nobody Thu Apr 2 18:53:29 2026 Received: from out-170.mta0.migadu.com (out-170.mta0.migadu.com [91.218.175.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6800832E13B for ; Fri, 27 Mar 2026 02:16:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577763; cv=none; b=oWWyNhFmGXniqTtaPKwJSFhVozEY9LjV3zG+ej7YaWdTACLU/bXnHueFj2yK9ckuOITbOHxdKEp1DqtBFetdLBOj42t6iY211hoLjkcAExh0GrqBsmfcY7cHhwi/D2VbmVL+lt+dfi0vdr9Hz55ANbEXIiczcInxWvEYxOkta3o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774577763; c=relaxed/simple; bh=FQEc69Q/+FrUrqq14aJmPyl26Cwkvf2VvjDzLT5nK0Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ndRlvXxnXAzpFBdZqT+6C/HmQB26MKKQ/JgCF2TuoJ3LG4BRDLFCBbNumbTo/bdiepadYu0AL2TW9YuB0javtVFrjj6lnaK7fbn3IHXy1+GjOlnJFkMDjR7zU1JvGNwT0bYZgSKrCcWO1baG9/jklqVjnP9DJ45eRWzapMmvCbw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=txcrfU3Z; arc=none smtp.client-ip=91.218.175.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev 
From: Usama Arif To: Andrew Morton , david@kernel.org, Lorenzo Stoakes , willy@infradead.org, linux-mm@kvack.org Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka , lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif Subject: [v3 18/24] mm: thp: remove pgtable_trans_huge_{deposit/withdraw} when not needed Date: Thu, 26 Mar 2026 19:09:00 -0700 Message-ID: <20260327021403.214713-19-usama.arif@linux.dev> In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev> References: <20260327021403.214713-1-usama.arif@linux.dev> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Since the previous
commit made deposit/withdraw only needed for architectures where arch_needs_pgtable_deposit() returns true (currently only powerpc hash MMU), the generic implementation in pgtable-generic.c and the s390/sparc overrides are now dead code =E2=80=94 all call sites are guarded by arch_needs_pgtable_deposit() which is compile-time false on those architectures. Remove them entirely and replace the extern declarations with static inline no-op stubs for the default case. pgtable_trans_huge_{deposit,withdraw}() are renamed to arch_pgtable_trans_huge_{deposit,withdraw}(). Signed-off-by: Usama Arif --- arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +++--- arch/s390/include/asm/pgtable.h | 6 --- arch/s390/mm/pgtable.c | 41 -------------------- arch/sparc/include/asm/pgtable_64.h | 6 --- arch/sparc/mm/tlb.c | 36 ----------------- include/linux/pgtable.h | 16 +++++--- mm/debug_vm_pgtable.c | 4 +- mm/huge_memory.c | 26 ++++++------- mm/khugepaged.c | 2 +- mm/memory.c | 2 +- mm/migrate_device.c | 2 +- mm/pgtable-generic.c | 32 --------------- 12 files changed, 35 insertions(+), 150 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/in= clude/asm/book3s/64/pgtable.h index 60e283cf22be1..f1f36a4ed2bc8 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1360,18 +1360,18 @@ pud_t pudp_huge_get_and_clear_full(struct vm_area_s= truct *vma, unsigned long addr, pud_t *pudp, int full); =20 -#define __HAVE_ARCH_PGTABLE_DEPOSIT -static inline void pgtable_trans_huge_deposit(struct mm_struct *mm, - pmd_t *pmdp, pgtable_t pgtable) +#define arch_pgtable_trans_huge_deposit arch_pgtable_trans_huge_deposit +static inline void arch_pgtable_trans_huge_deposit(struct mm_struct *mm, + pmd_t *pmdp, pgtable_t pgtable) { if (radix_enabled()) return radix__pgtable_trans_huge_deposit(mm, pmdp, pgtable); return hash__pgtable_trans_huge_deposit(mm, pmdp, pgtable); } =20 -#define __HAVE_ARCH_PGTABLE_WITHDRAW 
-static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, - pmd_t *pmdp) +#define arch_pgtable_trans_huge_withdraw arch_pgtable_trans_huge_withdraw +static inline pgtable_t arch_pgtable_trans_huge_withdraw(struct mm_struct = *mm, + pmd_t *pmdp) { if (radix_enabled()) return radix__pgtable_trans_huge_withdraw(mm, pmdp); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtabl= e.h index 40a6fb19dd1dc..9394aabe0442b 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1659,12 +1659,6 @@ pud_t pudp_xchg_direct(struct mm_struct *, unsigned = long, pud_t *, pud_t); =20 #ifdef CONFIG_TRANSPARENT_HUGEPAGE =20 -#define __HAVE_ARCH_PGTABLE_DEPOSIT -void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pgtable); - -#define __HAVE_ARCH_PGTABLE_WITHDRAW -pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); =20 #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS static inline int pmdp_set_access_flags(struct vm_area_struct *vma, diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index 4acd8b140c4bd..c9a9ab2c7d937 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -312,44 +312,3 @@ pud_t pudp_xchg_direct(struct mm_struct *mm, unsigned = long addr, return old; } EXPORT_SYMBOL(pudp_xchg_direct); - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pgtable) -{ - struct list_head *lh =3D (struct list_head *) pgtable; - - assert_spin_locked(pmd_lockptr(mm, pmdp)); - - /* FIFO */ - if (!pmd_huge_pte(mm, pmdp)) - INIT_LIST_HEAD(lh); - else - list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp)); - pmd_huge_pte(mm, pmdp) =3D pgtable; -} - -pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) -{ - struct list_head *lh; - pgtable_t pgtable; - pte_t *ptep; - - assert_spin_locked(pmd_lockptr(mm, pmdp)); - - /* FIFO */ - pgtable =3D pmd_huge_pte(mm, pmdp); - lh =3D (struct 
list_head *) pgtable; - if (list_empty(lh)) - pmd_huge_pte(mm, pmdp) =3D NULL; - else { - pmd_huge_pte(mm, pmdp) =3D (pgtable_t) lh->next; - list_del(lh); - } - ptep =3D (pte_t *) pgtable; - set_pte(ptep, __pte(_PAGE_INVALID)); - ptep++; - set_pte(ptep, __pte(_PAGE_INVALID)); - return pgtable; -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/p= gtable_64.h index 74ede706fb325..60861560f8c40 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -987,12 +987,6 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, = unsigned long addr, extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long add= ress, pmd_t *pmdp); =20 -#define __HAVE_ARCH_PGTABLE_DEPOSIT -void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pgtable); - -#define __HAVE_ARCH_PGTABLE_WITHDRAW -pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); #endif =20 /* diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index 6d9dd5eb13287..9049d54e6e2cb 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -275,40 +275,4 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsi= gned long address, return old; } =20 -void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pgtable) -{ - struct list_head *lh =3D (struct list_head *) pgtable; - - assert_spin_locked(&mm->page_table_lock); - - /* FIFO */ - if (!pmd_huge_pte(mm, pmdp)) - INIT_LIST_HEAD(lh); - else - list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp)); - pmd_huge_pte(mm, pmdp) =3D pgtable; -} - -pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) -{ - struct list_head *lh; - pgtable_t pgtable; - - assert_spin_locked(&mm->page_table_lock); - - /* FIFO */ - pgtable =3D pmd_huge_pte(mm, pmdp); - lh =3D (struct list_head *) pgtable; - if (list_empty(lh)) - pmd_huge_pte(mm, pmdp) =3D NULL; - else { - pmd_huge_pte(mm, pmdp) =3D 
(pgtable_t) lh->next; - list_del(lh); - } - pte_val(pgtable[0]) =3D 0; - pte_val(pgtable[1]) =3D 0; - - return pgtable; -} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index cdd68ed3ae1a9..f646414e801b7 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1207,13 +1207,19 @@ static inline pmd_t pmdp_collapse_flush(struct vm_a= rea_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif =20 -#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT -extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, - pgtable_t pgtable); +#ifndef arch_pgtable_trans_huge_deposit +static inline void arch_pgtable_trans_huge_deposit(struct mm_struct *mm, + pmd_t *pmdp, pgtable_t pgtable) +{ +} #endif =20 -#ifndef __HAVE_ARCH_PGTABLE_WITHDRAW -extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *= pmdp); +#ifndef arch_pgtable_trans_huge_withdraw +static inline pgtable_t arch_pgtable_trans_huge_withdraw(struct mm_struct = *mm, + pmd_t *pmdp) +{ + return NULL; +} #endif =20 #ifndef arch_needs_pgtable_deposit diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 23dc3ee095619..db58c5a1f4f48 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -240,7 +240,7 @@ static void __init pmd_advanced_tests(struct pgtable_de= bug_args *args) /* Align the address wrt HPAGE_PMD_SIZE */ vaddr &=3D HPAGE_PMD_MASK; =20 - pgtable_trans_huge_deposit(args->mm, args->pmdp, args->start_ptep); + arch_pgtable_trans_huge_deposit(args->mm, args->pmdp, args->start_ptep); =20 pmd =3D pfn_pmd(args->pmd_pfn, args->page_prot); set_pmd_at(args->mm, vaddr, args->pmdp, pmd); @@ -276,7 +276,7 @@ static void __init pmd_advanced_tests(struct pgtable_de= bug_args *args) =20 /* Clear the pte entries */ pmdp_huge_get_and_clear(args->mm, vaddr, args->pmdp); - pgtable_trans_huge_withdraw(args->mm, args->pmdp); + arch_pgtable_trans_huge_withdraw(args->mm, args->pmdp); } =20 static void __init 
pmd_leaf_tests(struct pgtable_debug_args *args) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2acedb1de7404..48c4884a6f386 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1343,7 +1343,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct= vm_fault *vmf) return ret; } if (pgtable) { - pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, + arch_pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); mm_inc_nr_ptes(vma->vm_mm); } @@ -1444,7 +1444,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, st= ruct mm_struct *mm, entry =3D folio_mk_pmd(zero_folio, vma->vm_page_prot); entry =3D pmd_mkspecial(entry); if (pgtable) { - pgtable_trans_huge_deposit(mm, pmd, pgtable); + arch_pgtable_trans_huge_deposit(mm, pmd, pgtable); mm_inc_nr_ptes(mm); } set_pmd_at(mm, haddr, pmd, entry); @@ -1577,7 +1577,7 @@ static vm_fault_t insert_pmd(struct vm_area_struct *v= ma, unsigned long addr, } =20 if (pgtable) { - pgtable_trans_huge_deposit(mm, pmd, pgtable); + arch_pgtable_trans_huge_deposit(mm, pmd, pgtable); mm_inc_nr_ptes(mm); pgtable =3D NULL; } @@ -1839,7 +1839,7 @@ static void copy_huge_non_present_pmd( add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); if (pgtable) { mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + arch_pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); } if (!userfaultfd_wp(dst_vma)) pmd =3D pmd_swp_clear_uffd_wp(pmd); @@ -1946,7 +1946,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, out_zero_page: if (pgtable) { mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + arch_pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); } pmdp_set_wrprotect(src_mm, addr, src_pmd); if (!userfaultfd_wp(dst_vma)) @@ -2354,7 +2354,7 @@ static inline void zap_deposited_table(struct mm_stru= ct *mm, pmd_t *pmd) { pgtable_t pgtable; =20 - pgtable =3D pgtable_trans_huge_withdraw(mm, pmd); + pgtable =3D arch_pgtable_trans_huge_withdraw(mm, pmd); pte_free(mm, 
 						   pgtable);
 		mm_dec_nr_ptes(mm);
 	}
@@ -2434,7 +2434,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		/*
 		 * For architectures like ppc64 we look at deposited pgtable
 		 * when calling pmdp_huge_get_and_clear. So do the
-		 * pgtable_trans_huge_withdraw after finishing pmdp related
+		 * arch_pgtable_trans_huge_withdraw after finishing pmdp related
 		 * operations.
 		 */
 		orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd,
@@ -2530,8 +2530,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		if (pmd_move_must_withdraw(new_ptl, old_ptl, vma) &&
 		    arch_needs_pgtable_deposit()) {
 			pgtable_t pgtable;
-			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
-			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
+			pgtable = arch_pgtable_trans_huge_withdraw(mm, old_pmd);
+			arch_pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
 		}
 		pmd = move_soft_dirty_pmd(pmd);
 		if (vma_has_uffd_without_event_remap(vma))
@@ -2838,8 +2838,8 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	if (arch_needs_pgtable_deposit()) {
-		src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
-		pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
+		src_pgtable = arch_pgtable_trans_huge_withdraw(mm, src_pmd);
+		arch_pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
 	}
 unlock_ptls:
 	double_pt_unlock(src_ptl, dst_ptl);
@@ -3001,7 +3001,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
 
 	if (arch_needs_pgtable_deposit()) {
-		pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+		pgtable = arch_pgtable_trans_huge_withdraw(mm, pmd);
 	} else {
 		VM_BUG_ON(!pgtable);
 		/*
@@ -3218,7 +3218,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * This's critical for some architectures (Power).
 	 */
 	if (arch_needs_pgtable_deposit()) {
-		pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+		pgtable = arch_pgtable_trans_huge_withdraw(mm, pmd);
 	} else {
 		VM_BUG_ON(!pgtable);
 		/*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 40b33263f6135..b6d5da4567fe0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1240,7 +1240,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
 	if (arch_needs_pgtable_deposit()) {
-		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		arch_pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	} else {
 		mm_dec_nr_ptes(mm);
 		pte_free(mm, pgtable);
diff --git a/mm/memory.c b/mm/memory.c
index b99ec3ffc18d1..583ca340cef43 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5551,7 +5551,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
-	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
+	arch_pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
 	/*
 	 * We are going to consume the prealloc table,
 	 * count that as nr_ptes.
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index b9242217a81b6..fb0f29a7c73bc 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -888,7 +888,7 @@ static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
 		flush_cache_page(vma, addr, addr + HPAGE_PMD_SIZE);
 		pmdp_invalidate(vma, addr, pmdp);
 	} else if (pgtable) {
-		pgtable_trans_huge_deposit(vma->vm_mm, pmdp, pgtable);
+		arch_pgtable_trans_huge_deposit(vma->vm_mm, pmdp, pgtable);
 		mm_inc_nr_ptes(vma->vm_mm);
 	}
 	set_pmd_at(vma->vm_mm, addr, pmdp, entry);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b91b1a98029c7..5dfdbe6488062 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -164,38 +164,6 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif
 
-#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable)
-{
-	assert_spin_locked(pmd_lockptr(mm, pmdp));
-
-	/* FIFO */
-	if (!pmd_huge_pte(mm, pmdp))
-		INIT_LIST_HEAD(&pgtable->lru);
-	else
-		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
-	pmd_huge_pte(mm, pmdp) = pgtable;
-}
-#endif
-
-#ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
-/* no "address" argument so destroys page coloring of some arch */
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
-{
-	pgtable_t pgtable;
-
-	assert_spin_locked(pmd_lockptr(mm, pmdp));
-
-	/* FIFO */
-	pgtable = pmd_huge_pte(mm, pmdp);
-	pmd_huge_pte(mm, pmdp) = list_first_entry_or_null(&pgtable->lru,
-							  struct page, lru);
-	if (pmd_huge_pte(mm, pmdp))
-		list_del(&pgtable->lru);
-	return pgtable;
-}
-#endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
 pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 19/24] mm: thp: add THP_SPLIT_PMD_FAILED counter
Date: Thu, 26 Mar 2026 19:09:01 -0700
Message-ID: <20260327021403.214713-20-usama.arif@linux.dev>

Add a vmstat counter to track PTE allocation failures during PMD split.
This enables monitoring of split failures due to memory pressure after
the lazy PTE page table allocation change.
The counter is incremented in three places:
- __split_huge_pmd(): main entry point for splitting a PMD
- try_to_unmap_one(): when reclaim needs to split a PMD-mapped THP
- try_to_migrate_one(): when migration needs to split a PMD-mapped THP

Signed-off-by: Usama Arif
---
 include/linux/vm_event_item.h | 1 +
 mm/huge_memory.c              | 1 +
 mm/rmap.c                     | 7 +++++++
 mm/vmstat.c                   | 1 +
 4 files changed, 10 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 03fe95f5a0201..ce696cf7d6321 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -98,6 +98,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_DEFERRED_SPLIT_PAGE,
 		THP_UNDERUSED_SPLIT_PAGE,
 		THP_SPLIT_PMD,
+		THP_SPLIT_PMD_FAILED,
 		THP_SCAN_EXCEED_NONE_PTE,
 		THP_SCAN_EXCEED_SWAP_PTE,
 		THP_SCAN_EXCEED_SHARED_PTE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 48c4884a6f386..b93718931e729 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3347,6 +3347,7 @@ int __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (vma_is_anonymous(vma) && !arch_needs_pgtable_deposit()) {
 		pgtable = pte_alloc_one(vma->vm_mm);
 		if (!pgtable) {
+			count_vm_event(THP_SPLIT_PMD_FAILED);
 			mmu_notifier_invalidate_range_end(&range);
 			return -ENOMEM;
 		}
diff --git a/mm/rmap.c b/mm/rmap.c
index efbcdd3b32632..a0180f62d9f69 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2087,8 +2087,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			pgtable_t pgtable = prealloc_pte;
 
 			prealloc_pte = NULL;
+			if (!arch_needs_pgtable_deposit() && !pgtable && vma_is_anonymous(vma)) {
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
+				count_vm_event(THP_SPLIT_PMD_FAILED);
+#endif
 				page_vma_mapped_walk_done(&pvmw);
 				ret = false;
 				break;
@@ -2491,6 +2495,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			prealloc_pte = NULL;
 			if (!arch_needs_pgtable_deposit() && !pgtable && vma_is_anonymous(vma)) {
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
+				count_vm_event(THP_SPLIT_PMD_FAILED);
+#endif
 				page_vma_mapped_walk_done(&pvmw);
 				ret = false;
 				break;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2370c6fb1fcd6..b8df9c7296d8a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1409,6 +1409,7 @@ const char * const vmstat_text[] = {
 	[I(THP_DEFERRED_SPLIT_PAGE)]	= "thp_deferred_split_page",
 	[I(THP_UNDERUSED_SPLIT_PAGE)]	= "thp_underused_split_page",
 	[I(THP_SPLIT_PMD)]		= "thp_split_pmd",
+	[I(THP_SPLIT_PMD_FAILED)]	= "thp_split_pmd_failed",
 	[I(THP_SCAN_EXCEED_NONE_PTE)]	= "thp_scan_exceed_none_pte",
 	[I(THP_SCAN_EXCEED_SWAP_PTE)]	= "thp_scan_exceed_swap_pte",
 	[I(THP_SCAN_EXCEED_SHARED_PTE)]	= "thp_scan_exceed_share_pte",
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 20/24] selftests/mm: add THP PMD split test infrastructure
Date: Thu, 26 Mar 2026 19:09:02 -0700
Message-ID: <20260327021403.214713-21-usama.arif@linux.dev>
Add test infrastructure for verifying THP PMD split behavior with lazy
PTE allocation. This includes:
- a test fixture with PMD-aligned memory allocation
- helper functions for reading vmstat counters
- a log_and_check_pmd_split() helper for logging counters and checking
  that thp_split_pmd has incremented and thp_split_pmd_failed hasn't
- a THP allocation helper with verification

Also add a test to check that partial unmap of a THP splits the PMD.
This exercises the zap_pmd_range() part of split.

Signed-off-by: Usama Arif
---
 tools/testing/selftests/mm/Makefile           |   1 +
 .../testing/selftests/mm/thp_pmd_split_test.c | 149 ++++++++++++++++++
 2 files changed, 150 insertions(+)
 create mode 100644 tools/testing/selftests/mm/thp_pmd_split_test.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cd24596cdd27e..4b4610c9b693d 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -95,6 +95,7 @@ TEST_GEN_FILES += uffd-stress
 TEST_GEN_FILES += uffd-unit-tests
 TEST_GEN_FILES += uffd-wp-mremap
 TEST_GEN_FILES += split_huge_page_test
+TEST_GEN_FILES += thp_pmd_split_test
 TEST_GEN_FILES += ksm_tests
 TEST_GEN_FILES += ksm_functional_tests
 TEST_GEN_FILES += mdwe_test
diff --git a/tools/testing/selftests/mm/thp_pmd_split_test.c b/tools/testing/selftests/mm/thp_pmd_split_test.c
new file mode 100644
index 0000000000000..0f54ac04760d5
--- /dev/null
+++ b/tools/testing/selftests/mm/thp_pmd_split_test.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests various kernel code paths that handle THP PMD splitting.
+ *
+ * Prerequisites:
+ * - THP enabled (always or madvise mode):
+ *   echo always > /sys/kernel/mm/transparent_hugepage/enabled
+ *   or
+ *   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "kselftest_harness.h"
+#include "thp_settings.h"
+#include "vm_util.h"
+
+/* Read vmstat counter */
+static unsigned long read_vmstat(const char *name)
+{
+	FILE *fp;
+	char line[256];
+	unsigned long value = 0;
+
+	fp = fopen("/proc/vmstat", "r");
+	if (!fp)
+		return 0;
+
+	while (fgets(line, sizeof(line), fp)) {
+		if (strncmp(line, name, strlen(name)) == 0 &&
+		    line[strlen(name)] == ' ') {
+			sscanf(line + strlen(name), " %lu", &value);
+			break;
+		}
+	}
+	fclose(fp);
+	return value;
+}
+
+/*
+ * Log vmstat counters for split_pmd_after/split_pmd_failed_after,
+ * check that split_pmd_after is greater than before and
+ * split_pmd_failed_after hasn't incremented.
+ */
+static void log_and_check_pmd_split(struct __test_metadata *const _metadata,
+	unsigned long split_pmd_before, unsigned long split_pmd_failed_before)
+{
+	unsigned long split_pmd_after = read_vmstat("thp_split_pmd");
+	unsigned long split_pmd_failed_after = read_vmstat("thp_split_pmd_failed");
+
+	TH_LOG("thp_split_pmd: %lu -> %lu",
+	       split_pmd_before, split_pmd_after);
+	TH_LOG("thp_split_pmd_failed: %lu -> %lu",
+	       split_pmd_failed_before, split_pmd_failed_after);
+	ASSERT_GT(split_pmd_after, split_pmd_before);
+	ASSERT_EQ(split_pmd_failed_after, split_pmd_failed_before);
+}
+
+/* Allocate a THP at the given aligned address */
+static int allocate_thp(void *aligned, size_t pmdsize)
+{
+	int ret;
+
+	ret = madvise(aligned, pmdsize, MADV_HUGEPAGE);
+	if (ret)
+		return -1;
+
+	/* Touch all pages to allocate the THP */
+	memset(aligned, 0xAA, pmdsize);
+
+	/* Verify we got a THP */
+	if (!check_huge_anon(aligned, 1, pmdsize))
+		return -1;
+
+	return 0;
+}
+
+FIXTURE(thp_pmd_split)
+{
+	void *mem;		/* Base mmap allocation */
+	void *aligned;		/* PMD-aligned pointer within mem */
+	size_t pmdsize;		/* PMD size from sysfs */
+	size_t pagesize;	/* Base page size */
+	size_t mmap_size;	/* Total mmap size for alignment */
+	unsigned long split_pmd_before;
+	unsigned long split_pmd_failed_before;
+};
+
+FIXTURE_SETUP(thp_pmd_split)
+{
+	if (!thp_available())
+		SKIP(return, "THP not available");
+
+	self->pmdsize = read_pmd_pagesize();
+	if (!self->pmdsize)
+		SKIP(return, "Unable to read PMD size");
+
+	self->pagesize = getpagesize();
+	self->mmap_size = 4 * self->pmdsize;
+
+	self->split_pmd_before = read_vmstat("thp_split_pmd");
+	self->split_pmd_failed_before = read_vmstat("thp_split_pmd_failed");
+
+	self->mem = mmap(NULL, self->mmap_size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	ASSERT_NE(self->mem, MAP_FAILED);
+
+	/* Align to PMD boundary */
+	self->aligned = (void *)(((unsigned long)self->mem + self->pmdsize - 1) &
+				 ~(self->pmdsize - 1));
+}
+
+FIXTURE_TEARDOWN(thp_pmd_split)
+{
+	if (self->mem && self->mem != MAP_FAILED)
+		munmap(self->mem, self->mmap_size);
+}
+
+/*
+ * Partial munmap on THP (zap_pmd_range)
+ *
+ * Tests that partial munmap of a THP correctly splits the PMD.
+ * This exercises the zap_pmd_range() part of split.
+ */
+TEST_F(thp_pmd_split, partial_munmap)
+{
+	int ret;
+
+	ret = allocate_thp(self->aligned, self->pmdsize);
+	if (ret)
+		SKIP(return, "Failed to allocate THP");
+
+	ret = munmap((char *)self->aligned + self->pagesize, self->pagesize);
+	ASSERT_EQ(ret, 0);
+
+	log_and_check_pmd_split(_metadata, self->split_pmd_before,
+				self->split_pmd_failed_before);
+}
+
+TEST_HARNESS_MAIN
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 21/24] selftests/mm: add partial_mprotect test for change_pmd_range
Date: Thu, 26 Mar 2026 19:09:03 -0700
Message-ID: <20260327021403.214713-22-usama.arif@linux.dev>
Add a test for partial mprotect on THP which exercises change_pmd_range().
This verifies that partial mprotect correctly splits the PMD, applies
protection only to the requested portion, and leaves the rest of the
mapping writable.

Signed-off-by: Usama Arif
---
 .../testing/selftests/mm/thp_pmd_split_test.c | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/mm/thp_pmd_split_test.c b/tools/testing/selftests/mm/thp_pmd_split_test.c
index 0f54ac04760d5..4944a5a516da9 100644
--- a/tools/testing/selftests/mm/thp_pmd_split_test.c
+++ b/tools/testing/selftests/mm/thp_pmd_split_test.c
@@ -146,4 +146,35 @@ TEST_F(thp_pmd_split, partial_munmap)
 				self->split_pmd_failed_before);
 }
 
+/*
+ * Partial mprotect on THP (change_pmd_range)
+ *
+ * Tests that partial mprotect of a THP correctly splits the PMD and
+ * applies protection only to the requested portion. This exercises
+ * the mprotect path which now handles split failures.
+ */
+TEST_F(thp_pmd_split, partial_mprotect)
+{
+	volatile unsigned char *ptr = (volatile unsigned char *)self->aligned;
+	int ret;
+
+	ret = allocate_thp(self->aligned, self->pmdsize);
+	if (ret)
+		SKIP(return, "Failed to allocate THP");
+
+	/* Partial mprotect - make middle page read-only */
+	ret = mprotect((char *)self->aligned + self->pagesize, self->pagesize, PROT_READ);
+	ASSERT_EQ(ret, 0);
+
+	/* Verify we can still write to non-protected pages */
+	ptr[0] = 0xDD;
+	ptr[self->pmdsize - 1] = 0xEE;
+
+	ASSERT_EQ(ptr[0], (unsigned char)0xDD);
+	ASSERT_EQ(ptr[self->pmdsize - 1], (unsigned char)0xEE);
+
+	log_and_check_pmd_split(_metadata, self->split_pmd_before,
+				self->split_pmd_failed_before);
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 22/24] selftests/mm: add partial_mlock test
Date: Thu, 26 Mar 2026 19:09:04 -0700
Message-ID: <20260327021403.214713-23-usama.arif@linux.dev>
Add a test for partial mlock on THP which exercises walk_page_range()
with a subset of the THP. This should trigger a PMD split since mlock
operates at page granularity.

Signed-off-by: Usama Arif
---
 .../testing/selftests/mm/thp_pmd_split_test.c | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/mm/thp_pmd_split_test.c b/tools/testing/selftests/mm/thp_pmd_split_test.c
index 4944a5a516da9..3c9f05457efec 100644
--- a/tools/testing/selftests/mm/thp_pmd_split_test.c
+++ b/tools/testing/selftests/mm/thp_pmd_split_test.c
@@ -177,4 +177,30 @@ TEST_F(thp_pmd_split, partial_mprotect)
 				self->split_pmd_failed_before);
 }
 
+/*
+ * Partial mlock triggering split (walk_page_range)
+ *
+ * Tests mlock on a partial THP region which should trigger a PMD split.
+ */
+TEST_F(thp_pmd_split, partial_mlock)
+{
+	int ret;
+
+	ret = allocate_thp(self->aligned, self->pmdsize);
+	if (ret)
+		SKIP(return, "Failed to allocate THP");
+
+	/* Partial mlock - should trigger PMD split */
+	ret = mlock((char *)self->aligned + self->pagesize, self->pagesize);
+	if (ret && errno == ENOMEM)
+		SKIP(return, "mlock failed with ENOMEM (resource limit)");
+	ASSERT_EQ(ret, 0);
+
+	/* Cleanup */
+	munlock((char *)self->aligned + self->pagesize, self->pagesize);
+
+	log_and_check_pmd_split(_metadata, self->split_pmd_before,
+				self->split_pmd_failed_before);
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
Subject: [v3 23/24] selftests/mm: add partial_mremap test for move_page_tables
Date: Thu, 26 Mar 2026 19:09:05 -0700
Message-ID: <20260327021403.214713-24-usama.arif@linux.dev>
Add a test for partial mremap on THP which exercises move_page_tables().
This verifies that partial mremap correctly splits the PMD, moves only
the requested page, and preserves data integrity in both the moved
region and the original mapping.

Signed-off-by: Usama Arif
---
 .../testing/selftests/mm/thp_pmd_split_test.c | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tools/testing/selftests/mm/thp_pmd_split_test.c b/tools/testing/selftests/mm/thp_pmd_split_test.c
index 3c9f05457efec..1f29296759a5b 100644
--- a/tools/testing/selftests/mm/thp_pmd_split_test.c
+++ b/tools/testing/selftests/mm/thp_pmd_split_test.c
@@ -203,4 +203,54 @@ TEST_F(thp_pmd_split, partial_mlock)
 				self->split_pmd_failed_before);
 }
 
+/*
+ * Partial mremap (move_page_tables)
+ *
+ * Tests that partial mremap of a THP correctly splits the PMD and
+ * moves only the requested portion. This exercises move_page_tables()
+ * which now handles split failures.
+ */
+TEST_F(thp_pmd_split, partial_mremap)
+{
+	void *new_addr;
+	unsigned long *ptr = (unsigned long *)self->aligned;
+	unsigned long *new_ptr;
+	unsigned long pattern = 0xABCDUL;
+	int ret;
+
+	ret = allocate_thp(self->aligned, self->pmdsize);
+	if (ret)
+		SKIP(return, "Failed to allocate THP");
+
+	/* Write pattern to the page we'll move */
+	ptr[self->pagesize / sizeof(unsigned long)] = pattern;
+
+	/* Also write to first and last page to verify they stay intact */
+	ptr[0] = 0x1234UL;
+	ptr[(self->pmdsize - self->pagesize) / sizeof(unsigned long)] = 0x4567UL;
+
+	/* Partial mremap - move one base page from the THP */
+	new_addr = mremap((char *)self->aligned + self->pagesize, self->pagesize,
+			  self->pagesize, MREMAP_MAYMOVE);
+	if (new_addr == MAP_FAILED) {
+		if (errno == ENOMEM)
+			SKIP(return, "mremap failed with ENOMEM");
+		ASSERT_NE(new_addr, MAP_FAILED);
+	}
+
+	/* Verify data was moved correctly */
+	new_ptr = (unsigned long *)new_addr;
+	ASSERT_EQ(new_ptr[0], pattern);
+
+	/* Verify surrounding data is intact */
+	ASSERT_EQ(ptr[0], 0x1234UL);
+	ASSERT_EQ(ptr[(self->pmdsize - self->pagesize) / sizeof(unsigned long)], 0x4567UL);
+
+	/* Cleanup the moved page */
+	munmap(new_addr, self->pagesize);
+
+	log_and_check_pmd_split(_metadata, self->split_pmd_before,
+				self->split_pmd_failed_before);
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0
From nobody Thu Apr 2 18:53:29 2026
From: Usama Arif
To: Andrew Morton, david@kernel.org, Lorenzo Stoakes, willy@infradead.org, linux-mm@kvack.org
Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, Vlastimil Babka, lance.yang@linux.dev, linux-kernel@vger.kernel.org, kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com, linux-s390@vger.kernel.org, Usama Arif
Subject: [v3 24/24] selftests/mm: add madv_dontneed_partial test
Date: Thu, 26 Mar 2026 19:09:06 -0700
Message-ID: <20260327021403.214713-25-usama.arif@linux.dev>
In-Reply-To: <20260327021403.214713-1-usama.arif@linux.dev>
References: <20260327021403.214713-1-usama.arif@linux.dev>

Add test for partial MADV_DONTNEED on THP.

This verifies that MADV_DONTNEED correctly triggers a PMD split,
discards only the requested page (which becomes zero-filled), and
preserves data in the surrounding pages.
Signed-off-by: Usama Arif
---
 .../testing/selftests/mm/thp_pmd_split_test.c | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/tools/testing/selftests/mm/thp_pmd_split_test.c b/tools/testing/selftests/mm/thp_pmd_split_test.c
index 1f29296759a5b..060ca1e341b75 100644
--- a/tools/testing/selftests/mm/thp_pmd_split_test.c
+++ b/tools/testing/selftests/mm/thp_pmd_split_test.c
@@ -253,4 +253,38 @@ TEST_F(thp_pmd_split, partial_mremap)
 		self->split_pmd_failed_before);
 }
 
+/*
+ * MADV_DONTNEED on THP
+ *
+ * Tests that MADV_DONTNEED on a partial THP correctly handles
+ * the PMD split and discards only the requested pages.
+ */
+TEST_F(thp_pmd_split, partial_madv_dontneed)
+{
+	volatile unsigned char *ptr = (volatile unsigned char *)self->aligned;
+	int ret;
+
+	ret = allocate_thp(self->aligned, self->pmdsize);
+	if (ret)
+		SKIP(return, "Failed to allocate THP");
+
+	/* Write pattern */
+	memset(self->aligned, 0xDD, self->pmdsize);
+
+	/* Partial MADV_DONTNEED - discard middle page */
+	ret = madvise((char *)self->aligned + self->pagesize, self->pagesize, MADV_DONTNEED);
+	ASSERT_EQ(ret, 0);
+
+	/* Verify non-discarded pages still have data */
+	ASSERT_EQ(ptr[0], (unsigned char)0xDD);
+	ASSERT_EQ(ptr[2 * self->pagesize], (unsigned char)0xDD);
+	ASSERT_EQ(ptr[self->pmdsize - 1], (unsigned char)0xDD);
+
+	/* Discarded page should be zero */
+	ASSERT_EQ(ptr[self->pagesize], (unsigned char)0x00);
+
+	log_and_check_pmd_split(_metadata, self->split_pmd_before,
+				self->split_pmd_failed_before);
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0