From: Usama Arif
To: ziy@nvidia.com, Andrew Morton, David Hildenbrand, lorenzo.stoakes@oracle.com,
	linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org,
	baohua@kernel.org, dev.jain@arm.com, baolin.wang@linux.alibaba.com,
	npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
	vbabka@suse.cz, lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, Usama Arif
Subject: [RFC 01/12] mm: add PUD THP ptdesc and rmap support
Date: Sun, 1 Feb 2026 16:50:18 -0800
Message-ID: <20260202005451.774496-2-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260202005451.774496-1-usamaarif642@gmail.com>
References: <20260202005451.774496-1-usamaarif642@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

For page table management, PUD THPs need to pre-deposit page tables that
will be used when the huge page is later split. When a PUD THP is
allocated, we cannot know in advance when or why it might need to be
split (COW, partial unmap, reclaim), but we need page tables ready for
that eventuality. Similar to how PMD THPs deposit a single PTE table,
PUD THPs deposit a PMD table which itself contains deposited PTE tables
- a two-level deposit. This commit adds the deposit/withdraw
infrastructure and a new pud_huge_pmd field in ptdesc to store the
deposited PMD.

The deposited PMD tables are stored as a singly-linked stack using only
page->lru.next as the link pointer. A doubly-linked list using the
standard list_head mechanism would cause memory corruption: list_del()
poisons both lru.next (offset 8) and lru.prev (offset 16), but lru.prev
overlaps with ptdesc->pmd_huge_pte at offset 16. Since deposited PMD
tables have their own deposited PTE tables stored in pmd_huge_pte,
poisoning lru.prev would corrupt the PTE table list and cause crashes
when withdrawing PTE tables during split. PMD THPs don't have this
problem because their deposited PTE tables don't have sub-deposits.
Using only lru.next avoids the overlap entirely.

For reverse mapping, PUD THPs need the same rmap support that PMD THPs
have. The page_vma_mapped_walk() function is extended to recognize and
handle PUD-mapped folios during rmap traversal. A new TTU_SPLIT_HUGE_PUD
flag tells the unmap path to split PUD THPs before proceeding, since
there is no PUD-level migration entry format - the split converts the
single PUD mapping into individual PTE mappings that can be migrated or
swapped normally.
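As a rough illustration of the two-level deposit described above, the
sketch below models the linking discipline in plain C: PMD tables are
pushed onto a stack through a single lru.next-style pointer, while each
PMD table carries its own deposited PTE tables in a separate
pmd_huge_pte-style pointer, so popping a PMD never disturbs its PTE
deposit. The toy_page struct, helper names and main() driver are
illustrative assumptions for this sketch only; they are not the kernel
structures or the functions added by this patch (the real
pud_deposit_pte() keeps the PTE tables on a list_head-based list).

/*
 * Toy model of the two-level deposit; "toy_page" stands in for struct
 * page/ptdesc and only the linking scheme mirrors the patch.
 */
#include <stdio.h>

struct toy_page {
	struct toy_page *lru_next;	/* models page->lru.next: PMD stack link */
	struct toy_page *pmd_huge_pte;	/* models ptdesc->pmd_huge_pte: PTE deposit */
};

/* Push a PMD table onto the PUD's deposit stack (cf. pgtable_trans_huge_pud_deposit). */
static void toy_pud_deposit_pmd(struct toy_page **pud_huge_pmd, struct toy_page *pmd)
{
	pmd->lru_next = *pud_huge_pmd;	/* only the "next" link is touched */
	*pud_huge_pmd = pmd;
}

/* Pop a PMD table from the stack (cf. pgtable_trans_huge_pud_withdraw). */
static struct toy_page *toy_pud_withdraw_pmd(struct toy_page **pud_huge_pmd)
{
	struct toy_page *pmd = *pud_huge_pmd;

	if (pmd)
		*pud_huge_pmd = pmd->lru_next;
	return pmd;
}

/* Deposit a PTE table into a standalone PMD table (cf. pud_deposit_pte). */
static void toy_pmd_deposit_pte(struct toy_page *pmd, struct toy_page *pte)
{
	pte->lru_next = pmd->pmd_huge_pte;
	pmd->pmd_huge_pte = pte;
}

int main(void)
{
	struct toy_page *pud_huge_pmd = NULL;	/* models ptdesc->pud_huge_pmd */
	struct toy_page pmd = {0}, pte0 = {0}, pte1 = {0};
	struct toy_page *withdrawn;

	/* Pre-deposit: PTE tables go into the PMD, the PMD goes under the PUD. */
	toy_pmd_deposit_pte(&pmd, &pte0);
	toy_pmd_deposit_pte(&pmd, &pte1);
	toy_pud_deposit_pmd(&pud_huge_pmd, &pmd);

	/* At split time the PMD comes back with its PTE deposit intact. */
	withdrawn = toy_pud_withdraw_pmd(&pud_huge_pmd);
	printf("withdrawn PMD still has its PTE deposit: %s\n",
	       withdrawn && withdrawn->pmd_huge_pte ? "yes" : "no");
	return 0;
}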
Signed-off-by: Usama Arif
---
 include/linux/huge_mm.h  |  5 +++
 include/linux/mm.h       | 19 ++++++++
 include/linux/mm_types.h |  5 ++-
 include/linux/pgtable.h  |  8 ++++
 include/linux/rmap.h     |  7 ++-
 mm/huge_memory.c         |  8 ++++
 mm/internal.h            |  3 ++
 mm/page_vma_mapped.c     | 35 +++++++++++++++
 mm/pgtable-generic.c     | 83 ++++++++++++++++++++++++++++++++++
 mm/rmap.c                | 96 +++++++++++++++++++++++++++++++++++++---
 10 files changed, 260 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfdea..e672e45bb9cc7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -463,10 +463,15 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 		unsigned long address);
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+void split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
+		unsigned long address);
 int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		    pud_t *pudp, unsigned long addr, pgprot_t newprot,
 		    unsigned long cp_flags);
 #else
+static inline void
+split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
+		unsigned long address) {}
 static inline int
 change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		pud_t *pudp, unsigned long addr, pgprot_t newprot,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ab2e7e30aef96..a15e18df0f771 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3455,6 +3455,22 @@ static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
  * considered ready to switch to split PUD locks yet; there may be places
  * which need to be converted from page_table_lock.
  */
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static inline struct page *pud_pgtable_page(pud_t *pud)
+{
+	unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
+
+	return virt_to_page((void *)((unsigned long)pud & mask));
+}
+
+static inline struct ptdesc *pud_ptdesc(pud_t *pud)
+{
+	return page_ptdesc(pud_pgtable_page(pud));
+}
+
+#define pud_huge_pmd(pud) (pud_ptdesc(pud)->pud_huge_pmd)
+#endif
+
 static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
 {
 	return &mm->page_table_lock;
@@ -3471,6 +3487,9 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
 static inline void pagetable_pud_ctor(struct ptdesc *ptdesc)
 {
 	__pagetable_ctor(ptdesc);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	ptdesc->pud_huge_pmd = NULL;
+#endif
 }
 
 static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 78950eb8926dc..26a38490ae2e1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -577,7 +577,10 @@ struct ptdesc {
 		struct list_head pt_list;
 		struct {
 			unsigned long _pt_pad_1;
-			pgtable_t pmd_huge_pte;
+			union {
+				pgtable_t pmd_huge_pte; /* For PMD tables: deposited PTE */
+				pgtable_t pud_huge_pmd; /* For PUD tables: deposited PMD list */
+			};
 		};
 	};
 	unsigned long __page_mapping;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2f0dd3a4ace1a..3ce733c1d71a2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1168,6 +1168,14 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #define arch_needs_pgtable_deposit() (false)
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+extern void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
+					pmd_t *pmd_table);
+extern pmd_t *pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp);
+extern void pud_deposit_pte(pmd_t *pmd_table, pgtable_t pgtable);
+extern pgtable_t pud_withdraw_pte(pmd_t *pmd_table);
+#endif
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * This is an implementation of pmdp_establish() that is only suitable for an
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index daa92a58585d9..08cd0a0eb8763 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -101,6 +101,7 @@ enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
+	TTU_SPLIT_HUGE_PUD	= 0x100, /* split huge PUD if any */
 };
 
 #ifdef CONFIG_MMU
@@ -473,6 +474,8 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
 	folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
 void folio_add_anon_rmap_pmd(struct folio *, struct page *,
 		struct vm_area_struct *, unsigned long address, rmap_t flags);
+void folio_add_anon_rmap_pud(struct folio *, struct page *,
+		struct vm_area_struct *, unsigned long address, rmap_t flags);
 void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address, rmap_t flags);
 void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
@@ -933,6 +936,7 @@ struct page_vma_mapped_walk {
 	pgoff_t pgoff;
 	struct vm_area_struct *vma;
 	unsigned long address;
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 	spinlock_t *ptl;
@@ -970,7 +974,7 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 static inline void
 page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
 {
-	WARN_ON_ONCE(!pvmw->pmd && !pvmw->pte);
+	WARN_ON_ONCE(!pvmw->pud && !pvmw->pmd && !pvmw->pte);
 
 	if (likely(pvmw->ptl))
 		spin_unlock(pvmw->ptl);
@@ -978,6 +982,7 @@ page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
 		WARN_ON_ONCE(1);
 
 	pvmw->ptl = NULL;
+	pvmw->pud = NULL;
 	pvmw->pmd = NULL;
 	pvmw->pte = NULL;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21a..3128b3beedb0a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2933,6 +2933,14 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 }
+
+void split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
+		unsigned long address)
+{
+	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PUD_SIZE));
+	if (pud_trans_huge(*pud))
+		__split_huge_pud_locked(vma, pud, address);
+}
 #else
 void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 		unsigned long address)
diff --git a/mm/internal.h b/mm/internal.h
index 9ee336aa03656..21d5c00f638dc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -545,6 +545,9 @@ int user_proactive_reclaim(char *buf,
  * in mm/rmap.c:
  */
 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+pud_t *mm_find_pud(struct mm_struct *mm, unsigned long address);
+#endif
 
 /*
  * in mm/page_alloc.c
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index b38a1d00c971b..d31eafba38041 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -146,6 +146,18 @@ static bool check_pmd(unsigned long pfn, struct page_vma_mapped_walk *pvmw)
 	return true;
 }
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+/* Returns true if the two ranges overlap.  Careful to not overflow. */
+static bool check_pud(unsigned long pfn, struct page_vma_mapped_walk *pvmw)
+{
+	if ((pfn + HPAGE_PUD_NR - 1) < pvmw->pfn)
+		return false;
+	if (pfn > pvmw->pfn + pvmw->nr_pages - 1)
+		return false;
+	return true;
+}
+#endif
+
 static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
 {
 	pvmw->address = (pvmw->address + size) & ~(size - 1);
@@ -188,6 +200,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	pud_t *pud;
 	pmd_t pmde;
 
+	/* The only possible pud mapping has been handled on last iteration */
+	if (pvmw->pud && !pvmw->pmd)
+		return not_found(pvmw);
+
 	/* The only possible pmd mapping has been handled on last iteration */
 	if (pvmw->pmd && !pvmw->pte)
 		return not_found(pvmw);
@@ -234,6 +250,25 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			continue;
 		}
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+		/* Check for PUD-mapped THP */
+		if (pud_trans_huge(*pud)) {
+			pvmw->pud = pud;
+			pvmw->ptl = pud_lock(mm, pud);
+			if (likely(pud_trans_huge(*pud))) {
+				if (pvmw->flags & PVMW_MIGRATION)
+					return not_found(pvmw);
+				if (!check_pud(pud_pfn(*pud), pvmw))
+					return not_found(pvmw);
+				return true;
+			}
+			/* PUD was split under us, retry at PMD level */
+			spin_unlock(pvmw->ptl);
+			pvmw->ptl = NULL;
+			pvmw->pud = NULL;
+		}
+#endif
+
 		pvmw->pmd = pmd_offset(pud, pvmw->address);
 		/*
 		 * Make sure the pmd value isn't cached in a register by the
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d3aec7a9926ad..2047558ddcd79 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -195,6 +195,89 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+/*
+ * Deposit page tables for PUD THP.
+ * Called with PUD lock held. Stores PMD tables in a singly-linked stack
+ * via pud_huge_pmd, using only pmd_page->lru.next as the link pointer.
+ *
+ * IMPORTANT: We use only lru.next (offset 8) for linking, NOT the full
+ * list_head. This is because lru.prev (offset 16) overlaps with
+ * ptdesc->pmd_huge_pte, which stores the PMD table's deposited PTE tables.
+ * Using list_del() would corrupt pmd_huge_pte with LIST_POISON2.
+ *
+ * PTE tables should be deposited into the PMD using pud_deposit_pte().
+ */
+void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
+					pmd_t *pmd_table)
+{
+	pgtable_t pmd_page = virt_to_page(pmd_table);
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+
+	/* Push onto stack using only lru.next as the link */
+	pmd_page->lru.next = (struct list_head *)pud_huge_pmd(pudp);
+	pud_huge_pmd(pudp) = pmd_page;
+}
+
+/*
+ * Withdraw the deposited PMD table for PUD THP split or zap.
+ * Called with PUD lock held.
+ * Returns NULL if no more PMD tables are deposited.
+ */
+pmd_t *pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp)
+{
+	pgtable_t pmd_page;
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+
+	pmd_page = pud_huge_pmd(pudp);
+	if (!pmd_page)
+		return NULL;
+
+	/* Pop from stack - lru.next points to next PMD page (or NULL) */
+	pud_huge_pmd(pudp) = (pgtable_t)pmd_page->lru.next;
+
+	return page_address(pmd_page);
+}
+
+/*
+ * Deposit a PTE table into a standalone PMD table (not yet in page table hierarchy).
+ * Used for PUD THP pre-deposit. The PMD table's pmd_huge_pte stores a linked list.
+ * No lock assertion since the PMD isn't visible yet.
+ */
+void pud_deposit_pte(pmd_t *pmd_table, pgtable_t pgtable)
+{
+	struct ptdesc *ptdesc = virt_to_ptdesc(pmd_table);
+
+	/* FIFO - add to front of list */
+	if (!ptdesc->pmd_huge_pte)
+		INIT_LIST_HEAD(&pgtable->lru);
+	else
+		list_add(&pgtable->lru, &ptdesc->pmd_huge_pte->lru);
+	ptdesc->pmd_huge_pte = pgtable;
+}
+
+/*
+ * Withdraw a PTE table from a standalone PMD table.
+ * Returns NULL if no more PTE tables are deposited.
+ */
+pgtable_t pud_withdraw_pte(pmd_t *pmd_table)
+{
+	struct ptdesc *ptdesc = virt_to_ptdesc(pmd_table);
+	pgtable_t pgtable;
+
+	pgtable = ptdesc->pmd_huge_pte;
+	if (!pgtable)
+		return NULL;
+	ptdesc->pmd_huge_pte = list_first_entry_or_null(&pgtable->lru,
+							struct page, lru);
+	if (ptdesc->pmd_huge_pte)
+		list_del(&pgtable->lru);
+	return pgtable;
+}
+#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
 pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
diff --git a/mm/rmap.c b/mm/rmap.c
index 7b9879ef442d9..69acabd763da4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -811,6 +811,32 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 	return pmd;
 }
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+/*
+ * Returns the actual pud_t* where we expect 'address' to be mapped from, or
+ * NULL if it doesn't exist. No guarantees / checks on what the pud_t*
+ * represents.
+ */
+pud_t *mm_find_pud(struct mm_struct *mm, unsigned long address)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud = NULL;
+
+	pgd = pgd_offset(mm, address);
+	if (!pgd_present(*pgd))
+		goto out;
+
+	p4d = p4d_offset(pgd, address);
+	if (!p4d_present(*p4d))
+		goto out;
+
+	pud = pud_offset(p4d, address);
+out:
+	return pud;
+}
+#endif
+
 struct folio_referenced_arg {
 	int mapcount;
 	int referenced;
@@ -1415,11 +1441,7 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
 			SetPageAnonExclusive(page);
 			break;
 		case PGTABLE_LEVEL_PUD:
-			/*
-			 * Keep the compiler happy, we don't support anonymous
-			 * PUD mappings.
-			 */
-			WARN_ON_ONCE(1);
+			SetPageAnonExclusive(page);
 			break;
 		default:
 			BUILD_BUG();
@@ -1503,6 +1525,31 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/**
+ * folio_add_anon_rmap_pud - add a PUD mapping to a page range of an anon folio
+ * @folio:	The folio to add the mapping to
+ * @page:	The first page to add
+ * @vma:	The vm area in which the mapping is added
+ * @address:	The user virtual address of the first page to map
+ * @flags:	The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + HPAGE_PUD_NR)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting.
+ */
+void folio_add_anon_rmap_pud(struct folio *folio, struct page *page,
+		struct vm_area_struct *vma, unsigned long address, rmap_t flags)
+{
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
+	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+	__folio_add_anon_rmap(folio, page, HPAGE_PUD_NR, vma, address, flags,
+			PGTABLE_LEVEL_PUD);
+#else
+	WARN_ON_ONCE(true);
+#endif
+}
+
 /**
  * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
  * @folio: The folio to add the mapping to.
@@ -1934,6 +1981,20 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		}
 
 		if (!pvmw.pte) {
+			/*
+			 * Check for PUD-mapped THP first.
+			 * If we have a PUD mapping and TTU_SPLIT_HUGE_PUD is set,
+			 * split the PUD to PMD level and restart the walk.
+			 */
+			if (pvmw.pud && pud_trans_huge(*pvmw.pud)) {
+				if (flags & TTU_SPLIT_HUGE_PUD) {
+					split_huge_pud_locked(vma, pvmw.pud, pvmw.address);
+					flags &= ~TTU_SPLIT_HUGE_PUD;
+					page_vma_mapped_walk_restart(&pvmw);
+					continue;
+				}
+			}
+
 			if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
 				if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
 					goto walk_done;
@@ -2325,6 +2386,27 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
+		/* Handle PUD-mapped THP first */
+		if (!pvmw.pte && !pvmw.pmd) {
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+			/*
+			 * PUD-mapped THP: skip migration to preserve the huge
+			 * page. Splitting would defeat the purpose of PUD THPs.
+			 * Return false to indicate migration failure, which
+			 * will cause alloc_contig_range() to try a different
+			 * memory region.
+			 */
+			if (pvmw.pud && pud_trans_huge(*pvmw.pud)) {
+				page_vma_mapped_walk_done(&pvmw);
+				ret = false;
+				break;
+			}
+#endif
+			/* Unexpected state: !pte && !pmd but not a PUD THP */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		/* PMD-mapped THP migration entry */
 		if (!pvmw.pte) {
 			__maybe_unused unsigned long pfn;
@@ -2607,10 +2689,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 
 	/*
 	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
-	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
+	 * TTU_SPLIT_HUGE_PMD, TTU_SPLIT_HUGE_PUD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
	 */
 	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
-					TTU_SYNC | TTU_BATCH_FLUSH)))
+					TTU_SPLIT_HUGE_PUD | TTU_SYNC | TTU_BATCH_FLUSH)))
 		return;
 
 	if (folio_is_zone_device(folio) &&
-- 
2.47.3