From nobody Thu Apr  2 15:36:23 2026
From: Yin Tirui
Subject: [PATCH RFC v3 4/4] mm: add PMD-level huge page support for remap_pfn_range()
Date: Sat, 28 Feb 2026 15:09:06 +0800
Message-ID: <20260228070906.1418911-5-yintirui@huawei.com>
In-Reply-To: <20260228070906.1418911-1-yintirui@huawei.com>
References: <20260228070906.1418911-1-yintirui@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add PMD-level huge page support to remap_pfn_range(): a huge mapping is
created automatically when the prerequisites are satisfied (size,
alignment, architecture support, etc.), with a fallback to normal page
mappings otherwise.

Implement splitting of special huge PMDs by using the pgtable
deposit/withdraw mechanism: when a split is needed, the deposited page
table is withdrawn and populated with individual PTEs derived from the
original huge mapping.
Signed-off-by: Yin Tirui
---
 mm/huge_memory.c | 36 ++++++++++++++++++++++++++++++++++--
 mm/memory.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4ca8cfd7f9d..e463d51005ee 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1857,6 +1857,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pmd = pmdp_get_lockless(src_pmd);
 	if (unlikely(pmd_present(pmd) && pmd_special(pmd) &&
 		     !is_huge_zero_pmd(pmd))) {
+		pgtable = pte_alloc_one(dst_mm);
+		if (unlikely(!pgtable))
+			goto out;
 		dst_ptl = pmd_lock(dst_mm, dst_pmd);
 		src_ptl = pmd_lockptr(src_mm, src_pmd);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -1870,6 +1873,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		 * able to wrongly write to the backend MMIO.
 		 */
 		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
+
+		/* dax won't reach here, it will be intercepted at vma_needs_copy() */
+		VM_WARN_ON_ONCE(vma_is_dax(src_vma));
+
+		mm_inc_nr_ptes(dst_mm);
+		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 		goto set_pmd;
 	}
 
@@ -2360,6 +2369,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+		if (pmd_special(orig_pmd))
+			zap_deposited_table(tlb->mm, pmd);
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
@@ -3005,14 +3016,35 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (!vma_is_anonymous(vma)) {
 		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+
+		if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+			pte_t entry;
+
+			if (!pmd_special(old_pmd)) {
+				zap_deposited_table(mm, pmd);
+				return;
+			}
+			pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+			if (unlikely(!pgtable))
+				return;
+
+			pmd_populate(mm, &_pmd, pgtable);
+			pte = pte_offset_map(&_pmd, haddr);
+			entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
+			set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
+			pte_unmap(pte);
+
+			smp_wmb(); /* make pte visible before pmd */
+			pmd_populate(mm, pmd, pgtable);
+			return;
+		}
+
 		/*
 		 * We are going to unmap this huge page. So
 		 * just go ahead and zap it
 		 */
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(mm, pmd);
-		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
-			return;
+
 		if (unlikely(pmd_is_migration_entry(old_pmd))) {
 			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
 
diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..affccf38cbcf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2890,6 +2890,40 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	return err;
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+static int remap_try_huge_pmd(struct mm_struct *mm, pmd_t *pmd,
+			      unsigned long addr, unsigned long end,
+			      unsigned long pfn, pgprot_t prot)
+{
+	pgtable_t pgtable;
+	spinlock_t *ptl;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(addr, PMD_SIZE))
+		return 0;
+
+	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	pgtable = pte_alloc_one(mm);
+	if (unlikely(!pgtable))
+		return 0;
+
+	mm_inc_nr_ptes(mm);
+	ptl = pmd_lock(mm, pmd);
+	set_pmd_at(mm, addr, pmd, pmd_mkspecial(pmd_mkhuge(pfn_pmd(pfn, prot))));
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	spin_unlock(ptl);
+
+	return 1;
+}
+#endif
+
 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
@@ -2905,6 +2939,12 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+		if (remap_try_huge_pmd(mm, pmd, addr, next,
+				       pfn + (addr >> PAGE_SHIFT), prot)) {
+			continue;
+		}
+#endif
 		err = remap_pte_range(mm, pmd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
-- 
2.22.0