From: Usama Arif
To: ziy@nvidia.com, Andrew Morton, David Hildenbrand,
	lorenzo.stoakes@oracle.com, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, Usama Arif
Subject: [RFC 05/12] mm: thp: add reclaim and migration support for PUD THP
Date: Sun, 1 Feb 2026 16:50:22 -0800
Message-ID: <20260202005451.774496-6-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260202005451.774496-1-usamaarif642@gmail.com>
References: <20260202005451.774496-1-usamaarif642@gmail.com>

Enable the memory reclaim and migration paths to handle PUD THPs
correctly by splitting them before proceeding.

Memory reclaim needs to unmap pages before they can be reclaimed. For
PUD THPs, the unmap path now passes TTU_SPLIT_HUGE_PUD when unmapping
PUD-sized folios. This triggers the PUD split during the unmap phase,
converting the single PUD mapping into 262144 PTE mappings. Reclaim
then proceeds normally with the individual pages. This follows the
same pattern used for PMD THPs with TTU_SPLIT_HUGE_PMD.

When migration encounters a PUD-sized folio, it now splits the folio
first using the standard folio split mechanism. The resulting smaller
folios (or individual pages) can then be migrated normally. This
matches how PMD THPs are handled when PMD migration is not supported
on a given architecture.

The split-before-migrate approach means PUD THPs will be broken up
during NUMA balancing or memory compaction. While this loses the TLB
benefit of the large mapping, it allows these memory management
operations to proceed. Future work could add PUD-level migration
entries to preserve the mapping through migration.

Signed-off-by: Usama Arif
---
A minimal userspace sketch for exercising the new reclaim path by hand
is appended after the diff.

 include/linux/huge_mm.h | 11 ++++++
 mm/huge_memory.c        | 83 +++++++++++++++++++++++++++++++++++++----
 mm/migrate.c            | 17 +++++++++
 mm/vmscan.c             |  2 +
 4 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a292035c0270f..8b2bffda4b4f3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -559,6 +559,17 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
 	return folio_order(folio) >= HPAGE_PMD_ORDER;
 }
 
+/**
+ * folio_test_pud_mappable - Can we map this folio with a PUD?
+ * @folio: The folio to test
+ *
+ * Return: true - @folio can be PUD-mapped, false - @folio cannot be PUD-mapped.
+ */
+static inline bool folio_test_pud_mappable(struct folio *folio)
+{
+	return folio_order(folio) >= HPAGE_PUD_ORDER;
+}
+
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
 vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 39b8212b5abd4..87b2c21df4a49 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2228,9 +2228,17 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		goto out_unlock;
 
 	/*
-	 * TODO: once we support anonymous pages, use
-	 * folio_try_dup_anon_rmap_*() and split if duplicating fails.
+	 * For anonymous pages, split to PTE level.
+	 * This simplifies fork handling - we don't need to duplicate
+	 * the complex anon rmap at PUD level.
 	 */
+	if (vma_is_anonymous(vma)) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	if (is_cow_mapping(vma->vm_flags) && pud_write(pud)) {
 		pudp_set_wrprotect(src_mm, addr, src_pud);
 		pud = pud_wrprotect(pud);
@@ -3099,11 +3107,29 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 {
 	spinlock_t *ptl;
 	pud_t orig_pud;
+	pmd_t *pmd_table;
+	pgtable_t pte_table;
+	int nr_pte_tables = 0;
 
 	ptl = __pud_trans_huge_lock(pud, vma);
 	if (!ptl)
 		return 0;
 
+	/*
+	 * Withdraw any deposited page tables before clearing the PUD.
+	 * These need to be freed and their counters decremented.
+	 */
+	pmd_table = pgtable_trans_huge_pud_withdraw(tlb->mm, pud);
+	if (pmd_table) {
+		while ((pte_table = pud_withdraw_pte(pmd_table)) != NULL) {
+			pte_free(tlb->mm, pte_table);
+			mm_dec_nr_ptes(tlb->mm);
+			nr_pte_tables++;
+		}
+		pmd_free(tlb->mm, pmd_table);
+		mm_dec_nr_pmds(tlb->mm);
+	}
+
 	orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
 	arch_check_zapped_pud(vma, orig_pud);
 	tlb_remove_pud_tlb_entry(tlb, pud, addr);
@@ -3114,14 +3140,15 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct page *page = NULL;
 		struct folio *folio;
 
-		/* No support for anonymous PUD pages or migration yet */
-		VM_WARN_ON_ONCE(vma_is_anonymous(vma) ||
-				!pud_present(orig_pud));
+		VM_WARN_ON_ONCE(!pud_present(orig_pud));
 
 		page = pud_page(orig_pud);
 		folio = page_folio(page);
 		folio_remove_rmap_pud(folio, page, vma);
-		add_mm_counter(tlb->mm, mm_counter_file(folio), -HPAGE_PUD_NR);
+		if (vma_is_anonymous(vma))
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PUD_NR);
+		else
+			add_mm_counter(tlb->mm, mm_counter_file(folio), -HPAGE_PUD_NR);
 
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, page, HPAGE_PUD_SIZE);
@@ -3729,15 +3756,53 @@ static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned
 		split_huge_pmd_address(vma, address, false);
 }
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address)
+{
+	pud_t *pud = mm_find_pud(vma->vm_mm, address);
+
+	if (!pud)
+		return;
+
+	__split_huge_pud(vma, pud, address);
+}
+
+static inline void split_huge_pud_if_needed(struct vm_area_struct *vma, unsigned long address)
+{
+	/*
+	 * If the new address isn't PUD-aligned and it could previously
+	 * contain a PUD huge page: check if we need to split it.
+	 */
+	if (!IS_ALIGNED(address, HPAGE_PUD_SIZE) &&
+	    range_in_vma(vma, ALIGN_DOWN(address, HPAGE_PUD_SIZE),
+			 ALIGN(address, HPAGE_PUD_SIZE)))
+		split_huge_pud_address(vma, address);
+}
+#else
+static inline void split_huge_pud_if_needed(struct vm_area_struct *vma, unsigned long address)
+{
+}
+#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, struct vm_area_struct *next)
 {
-	/* Check if we need to split start first. */
+	/* Check if we need to split PUD THP at start first. */
+	split_huge_pud_if_needed(vma, start);
+
+	/* Check if we need to split PUD THP at end. */
+	split_huge_pud_if_needed(vma, end);
+
+	/* If we're incrementing next->vm_start, we might need to split it. */
+	if (next)
+		split_huge_pud_if_needed(next, end);
+
+	/* Check if we need to split PMD THP at start. */
 	split_huge_pmd_if_needed(vma, start);
 
-	/* Check if we need to split end next. */
+	/* Check if we need to split PMD THP at end. */
 	split_huge_pmd_if_needed(vma, end);
 
 	/* If we're incrementing next->vm_start, we might need to split it. */
@@ -3752,6 +3817,8 @@ static void unmap_folio(struct folio *folio)
 
 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
 
+	if (folio_test_pud_mappable(folio))
+		ttu_flags |= TTU_SPLIT_HUGE_PUD;
 	if (folio_test_pmd_mappable(folio))
 		ttu_flags |= TTU_SPLIT_HUGE_PMD;
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 4688b9e38cd2f..2d3d2f5585d14 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1859,6 +1859,23 @@ static int migrate_pages_batch(struct list_head *from,
			 * we will migrate them after the rest of the
			 * list is processed.
			 */
+			/*
+			 * PUD-sized folios cannot be migrated directly,
+			 * but can be split. Try to split them first and
+			 * migrate the resulting smaller folios.
+			 */
+			if (folio_test_pud_mappable(folio)) {
+				nr_failed++;
+				stats->nr_thp_failed++;
+				if (!try_split_folio(folio, split_folios, mode)) {
+					stats->nr_thp_split++;
+					stats->nr_split++;
+					continue;
+				}
+				stats->nr_failed_pages += nr_pages;
+				list_move_tail(&folio->lru, ret_folios);
+				continue;
+			}
			 if (!thp_migration_supported() && is_thp) {
				nr_failed++;
				stats->nr_thp_failed++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 619691aa43938..868514a770bf2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1348,6 +1348,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
			enum ttu_flags flags = TTU_BATCH_FLUSH;
			bool was_swapbacked = folio_test_swapbacked(folio);
 
+			if (folio_test_pud_mappable(folio))
+				flags |= TTU_SPLIT_HUGE_PUD;
			if (folio_test_pmd_mappable(folio))
				flags |= TTU_SPLIT_HUGE_PMD;
			/*
-- 
2.47.3
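Not part of the patch itself: below is a minimal userspace sketch of one
way the new reclaim path could be exercised by hand. It assumes x86-64
with 4K base pages (so a PUD maps 1G), a kernel with this series applied,
and that the anonymous PUD THP allocation from earlier patches is enabled;
the manual alignment and MADV_PAGEOUT are simply a convenient way to push
a PUD-mapped anonymous range into shrink_folio_list(), where
TTU_SPLIT_HUGE_PUD now splits it before the pages are unmapped.

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PUD_SZ	(1UL << 30)	/* 1G on x86-64 with 4K base pages */

int main(void)
{
	/* Over-allocate so a PUD-aligned start address can be carved out. */
	size_t len = 2 * PUD_SZ;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	char *buf = (char *)(((uintptr_t)raw + PUD_SZ - 1) & ~(PUD_SZ - 1));

	/* Ask for THP and fault the whole PUD-sized range in. */
	madvise(buf, PUD_SZ, MADV_HUGEPAGE);
	memset(buf, 0xab, PUD_SZ);

	/*
	 * MADV_PAGEOUT sends the range through reclaim; with this patch
	 * a PUD mapping is split via TTU_SPLIT_HUGE_PUD on the way out.
	 */
	if (madvise(buf, PUD_SZ, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	return 0;
}

Whether the range actually ends up PUD-mapped depends on the
allocation-side patches earlier in the series, so treat this as a smoke
test rather than proof that the PUD split path was hit.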