From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout05.his.huawei.com (canpmsgout05.his.huawei.com [113.46.200.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A448401A37; Tue, 26 May 2026 14:56:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807384; cv=none; b=kcL8O7ZeYbUIpaJNYFKDN4Rc8vom5AstpXp8+MStgKY07NebXMZ/NuMAMm/UmUFTEWg0Ih+IzXTmUxRFoz4lv6RrCTMyiSxPER83d23w896z9c28Xz7UlfTu8gRa1N7IeASr6i1VhivTpYPe/uAFZBEQaQfq1sdxhzU4o9QyNl0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807384; c=relaxed/simple; bh=WnGUrY0oQogOhF2g3q9bQmT/7T4F2aeG7ceF9QJ1hfA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WTwzaOvUG1gQgcIqPm7YDSkAGnLAnw0MQBIIHkON2AZD0I9vW1fMCb126+moCGuByih/ldZhQT9aXDVHuut6CHtxRPypsWmbZhDAsNJeW9NEG5/oZlVdjMISNgNnCqO0nLgMwLGZRqsrIMHMXUD/sCS9u/jqGWDAzBJ+zmowtGg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=kiqwDBz3; arc=none smtp.client-ip=113.46.200.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="kiqwDBz3" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=nT3Zd6oOPYEtjshdg1Bk0FOHwde3zTVY8ceQsnwsZdM=; b=kiqwDBz3UikJUKIHkaYsAJ3cG32hv5EY3xJjeU+J40+ELgW9Gb+SQrETtISNH2oxuiGlNE9zz 
zUXZE7m8QkjMeJKNROiCtDM8044dorM/6NK6IOzokAXdK6TMSsdxhi4dc1/jP9v0T4Rd/2wRjw7 5+jNMqmxHKWukUuVyrcGCkQ= Received: from mail.maildlp.com (unknown [172.19.162.223]) by canpmsgout05.his.huawei.com (SkyGuard) with ESMTPS id 4gPwbb0n0Hz12LJv; Tue, 26 May 2026 22:48:11 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 3F95740571; Tue, 26 May 2026 22:56:11 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:09 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . 
Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel
Subject: [PATCH mm-unstable RFC v4 1/7] x86/mm: use PTE-level pgprot for huge PFN helpers
Date: Tue, 26 May 2026 22:49:57 +0800
Message-ID: <20260526145003.88445-2-yintirui@huawei.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com>
References: <20260526145003.88445-1-yintirui@huawei.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229)
Content-Type: text/plain; charset="utf-8"

Make the x86 PMD/PUD PFN helpers use PTE-level pgprot_t as the basic
format.

pfn_pmd() and pfn_pud() now translate PTE-level attributes into
large-page entries, including the x86 PAT/PSE encoding. pmd_pgprot() and
pud_pgprot() translate large-page attributes back to PTE-level pgprot_t,
hiding _PAGE_PSE and converting large-page PAT encoding back to the PTE
PAT position.

Rework pmd_mkinvalid() and pud_mkinvalid() to use the same helpers:
extract a PTE-level pgprot_t with pmd_pgprot()/pud_pgprot(), clear
PRESENT/PROTNONE, and rebuild the PMD/PUD entry with pfn_pmd()/pfn_pud().

The old explicit huge pgprot conversion helpers are no longer needed.
Remove pte_clrhuge(), pgprot_large_2_4k(), pgprot_4k_2_large(),
PAGE_KERNEL_LARGE and PAGE_KERNEL_LARGE_EXEC, and update x86 callers to
construct PMD/PUD entries through the normal PFN helpers.
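
The attribute translation described above can be illustrated with a small
userspace sketch. It is not part of the patch. It assumes the bit positions
from arch/x86/include/asm/pgtable_types.h (_PAGE_BIT_PAT and _PAGE_BIT_PSE
both at bit 7, _PAGE_BIT_PAT_LARGE at bit 12), copies the bodies of
protval_4k_2_large()/protval_large_2_4k() from the diff context, and leaves
out check_pgprot() and the protnone PFN inversion. The sketch_* names are
hypothetical stand-ins for the attribute half of pfn_pmd()/pfn_pud() and
for pmd_pgprot()/pud_pgprot().

```c
#include <stdint.h>

typedef uint64_t pgprotval_t;

/* Bit positions assumed from arch/x86/include/asm/pgtable_types.h. */
#define _PAGE_BIT_PRESENT	0
#define _PAGE_BIT_RW		1
#define _PAGE_BIT_PSE		7	/* large-page bit in PMD/PUD entries */
#define _PAGE_BIT_PAT		7	/* PAT bit in 4k PTEs, same bit as PSE */
#define _PAGE_BIT_PAT_LARGE	12	/* PAT bit in 2M/1G entries */

#define _PAGE_PRESENT	((pgprotval_t)1 << _PAGE_BIT_PRESENT)
#define _PAGE_RW	((pgprotval_t)1 << _PAGE_BIT_RW)
#define _PAGE_PSE	((pgprotval_t)1 << _PAGE_BIT_PSE)
#define _PAGE_PAT	((pgprotval_t)1 << _PAGE_BIT_PAT)
#define _PAGE_PAT_LARGE	((pgprotval_t)1 << _PAGE_BIT_PAT_LARGE)

/* Move the PAT bit from its 4k position (7) to its large position (12). */
static inline pgprotval_t protval_4k_2_large(pgprotval_t val)
{
	return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
		((val & _PAGE_PAT) << (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
}

/*
 * Move the PAT bit back from bit 12 to bit 7. Clearing _PAGE_PAT first
 * also clears _PAGE_PSE, because both live at bit 7.
 */
static inline pgprotval_t protval_large_2_4k(pgprotval_t val)
{
	return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
		((val & _PAGE_PAT_LARGE) >> (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
}

/* Hypothetical: PTE-level attributes -> large-page attributes. */
static inline pgprotval_t sketch_huge_protval(pgprotval_t pte_prot)
{
	pgprotval_t protval = protval_4k_2_large(pte_prot);

	/* Only a non-empty protection value gets the large-page bit. */
	if (protval)
		protval |= _PAGE_PSE;
	return protval;
}

/* Hypothetical: large-page attributes -> PTE-level attributes. */
static inline pgprotval_t sketch_pte_protval(pgprotval_t huge_flags)
{
	return protval_large_2_4k(huge_flags);
}
```

In this sketch a PTE-level value with _PAGE_PAT set comes back unchanged
after a round trip through sketch_huge_protval() and sketch_pte_protval(),
which is the property pmd_mkinvalid()/pud_mkinvalid() rely on when they
extract, modify and rebuild an entry.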
Signed-off-by: Yin Tirui --- arch/x86/include/asm/pgtable.h | 68 +++++++++++++++++++--------- arch/x86/include/asm/pgtable_types.h | 12 +---- arch/x86/mm/init_32.c | 8 ++-- arch/x86/mm/init_64.c | 30 ++++-------- arch/x86/mm/pat/set_memory.c | 51 ++++++--------------- arch/x86/mm/pgtable.c | 8 +--- arch/x86/power/hibernate_32.c | 6 +-- 7 files changed, 77 insertions(+), 106 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 2edd6c9d789c..fe63a2f6d183 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -475,11 +475,6 @@ static inline pte_t pte_mkhuge(pte_t pte) return pte_set_flags(pte, _PAGE_PSE); } =20 -static inline pte_t pte_clrhuge(pte_t pte) -{ - return pte_clear_flags(pte, _PAGE_PSE); -} - static inline pte_t pte_mkglobal(pte_t pte) { return pte_set_flags(pte, _PAGE_GLOBAL); @@ -741,29 +736,31 @@ static inline pte_t pfn_pte(unsigned long page_nr, pg= prot_t pgprot) static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot) { phys_addr_t pfn =3D (phys_addr_t)page_nr << PAGE_SHIFT; - pfn ^=3D protnone_mask(pgprot_val(pgprot)); + pgprotval_t protval =3D protval_4k_2_large(pgprot_val(pgprot)); + + protval =3D check_pgprot(__pgprot(protval)); + if (protval) + protval |=3D _PAGE_PSE; + + pfn ^=3D protnone_mask(protval); pfn &=3D PHYSICAL_PMD_PAGE_MASK; - return __pmd(pfn | check_pgprot(pgprot)); + + return __pmd(pfn | protval); } =20 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot) { phys_addr_t pfn =3D (phys_addr_t)page_nr << PAGE_SHIFT; - pfn ^=3D protnone_mask(pgprot_val(pgprot)); - pfn &=3D PHYSICAL_PUD_PAGE_MASK; - return __pud(pfn | check_pgprot(pgprot)); -} + pgprotval_t protval =3D protval_4k_2_large(pgprot_val(pgprot)); =20 -static inline pmd_t pmd_mkinvalid(pmd_t pmd) -{ - return pfn_pmd(pmd_pfn(pmd), - __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); -} + protval =3D check_pgprot(__pgprot(protval)); + if (protval) + protval |=3D _PAGE_PSE; 
=20 -static inline pud_t pud_mkinvalid(pud_t pud) -{ - return pfn_pud(pud_pfn(pud), - __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); + pfn ^=3D protnone_mask(protval); + pfn &=3D PHYSICAL_PUD_PAGE_MASK; + + return __pud(pfn | protval); } =20 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); @@ -860,10 +857,37 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot= , pgprot_t newprot) } =20 #define pte_pgprot(x) __pgprot(pte_flags(x)) -#define pmd_pgprot(x) __pgprot(pmd_flags(x)) -#define pud_pgprot(x) __pgprot(pud_flags(x)) +static inline pgprot_t pmd_pgprot(pmd_t pmd) +{ + return __pgprot(protval_large_2_4k(pmd_flags(pmd))); +} + +#define pmd_pgprot pmd_pgprot + +static inline pgprot_t pud_pgprot(pud_t pud) +{ + return __pgprot(protval_large_2_4k(pud_flags(pud))); +} + +#define pud_pgprot pud_pgprot #define p4d_pgprot(x) __pgprot(p4d_flags(x)) =20 +static inline pmd_t pmd_mkinvalid(pmd_t pmd) +{ + pgprot_t prot =3D pmd_pgprot(pmd); + + pgprot_val(prot) &=3D ~(_PAGE_PRESENT | _PAGE_PROTNONE); + return pfn_pmd(pmd_pfn(pmd), prot); +} + +static inline pud_t pud_mkinvalid(pud_t pud) +{ + pgprot_t prot =3D pud_pgprot(pud); + + pgprot_val(prot) &=3D ~(_PAGE_PRESENT | _PAGE_PROTNONE); + return pfn_pud(pud_pfn(pud), prot); +} + #define canon_pgprot(p) __pgprot(massage_pgprot(p)) =20 static inline int is_new_memtype_allowed(u64 paddr, unsigned long size, diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pg= table_types.h index 2ec250ba467e..135f6f1f826c 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -251,8 +251,6 @@ enum page_cache_mode { #define PAGE_KERNEL_EXEC_NOENC __pgprot_mask(__PAGE_KERNEL_EXEC | = 0) #define PAGE_KERNEL_ROX __pgprot_mask(__PAGE_KERNEL_ROX | _ENC) #define PAGE_KERNEL_NOCACHE __pgprot_mask(__PAGE_KERNEL_NOCACHE | _ENC) -#define PAGE_KERNEL_LARGE __pgprot_mask(__PAGE_KERNEL_LARGE | _ENC) -#define PAGE_KERNEL_LARGE_EXEC 
__pgprot_mask(__PAGE_KERNEL_LARGE_EXEC | _E= NC) #define PAGE_KERNEL_VVAR __pgprot_mask(__PAGE_KERNEL_VVAR | _ENC) =20 #define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO) @@ -497,21 +495,13 @@ static inline pgprotval_t protval_4k_2_large(pgprotva= l_t val) return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) | ((val & _PAGE_PAT) << (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT)); } -static inline pgprot_t pgprot_4k_2_large(pgprot_t pgprot) -{ - return __pgprot(protval_4k_2_large(pgprot_val(pgprot))); -} + static inline pgprotval_t protval_large_2_4k(pgprotval_t val) { return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) | ((val & _PAGE_PAT_LARGE) >> (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT)); } -static inline pgprot_t pgprot_large_2_4k(pgprot_t pgprot) -{ - return __pgprot(protval_large_2_4k(pgprot_val(pgprot))); -} - =20 typedef struct page *pgtable_t; =20 diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 0908c44d51e6..3c2c0af5a2d2 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -311,14 +311,12 @@ kernel_physical_mapping_init(unsigned long start, */ if (use_pse) { unsigned int addr2; - pgprot_t prot =3D PAGE_KERNEL_LARGE; + pgprot_t prot =3D PAGE_KERNEL; /* * first pass will use the same initial * identity mapping attribute + _PAGE_PSE. 
*/ - pgprot_t init_prot =3D - __pgprot(PTE_IDENT_ATTR | - _PAGE_PSE); + pgprot_t init_prot =3D __pgprot(PTE_IDENT_ATTR); =20 pfn &=3D PMD_MASK >> PAGE_SHIFT; addr2 =3D (pfn + PTRS_PER_PTE-1) * PAGE_SIZE + @@ -326,7 +324,7 @@ kernel_physical_mapping_init(unsigned long start, =20 if (is_x86_32_kernel_text(addr) || is_x86_32_kernel_text(addr2)) - prot =3D PAGE_KERNEL_LARGE_EXEC; + prot =3D PAGE_KERNEL_EXEC; =20 pages_2m++; if (mapping_iter =3D=3D 1) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 77b889b71cf3..9e83fac8df4e 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -90,13 +90,6 @@ DEFINE_ENTRY(pud, pud, init) DEFINE_ENTRY(pmd, pmd, init) DEFINE_ENTRY(pte, pte, init) =20 -static inline pgprot_t prot_sethuge(pgprot_t prot) -{ - WARN_ON_ONCE(pgprot_val(prot) & _PAGE_PAT); - - return __pgprot(pgprot_val(prot) | _PAGE_PSE); -} - /* * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the * physical space so we can cache the place of the first one and move @@ -390,8 +383,7 @@ static void __init __init_extra_mapping(unsigned long p= hys, unsigned long size, pmd_t *pmd; pgprot_t prot; =20 - pgprot_val(prot) =3D pgprot_val(PAGE_KERNEL_LARGE) | - protval_4k_2_large(cachemode2protval(cache)); + pgprot_val(prot) =3D pgprot_val(PAGE_KERNEL) | cachemode2protval(cache); BUG_ON((phys & ~PMD_MASK) || (size & ~PMD_MASK)); for (; size; phys +=3D PMD_SIZE, size -=3D PMD_SIZE) { pgd =3D pgd_offset_k((unsigned long)__va(phys)); @@ -414,7 +406,7 @@ static void __init __init_extra_mapping(unsigned long p= hys, unsigned long size, } pmd =3D pmd_offset(pud, phys); BUG_ON(!pmd_none(*pmd)); - set_pmd(pmd, __pmd(phys | pgprot_val(prot))); + set_pmd(pmd, pfn_pmd(phys >> PAGE_SHIFT, prot)); } } =20 @@ -572,15 +564,13 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, u= nsigned long paddr_end, paddr_last =3D paddr_next; continue; } - new_prot =3D pte_pgprot(pte_clrhuge(*(pte_t *)pmd)); + new_prot =3D pmd_pgprot(*pmd); } =20 if 
(page_size_mask & (1<> PAGE_SHIFT, prot_sethuge(prot)), - init); + set_pmd_init(pmd, pfn_pmd(paddr >> PAGE_SHIFT, prot), init); spin_unlock(&init_mm.page_table_lock); paddr_last =3D paddr_next; continue; @@ -658,15 +648,13 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, u= nsigned long paddr_end, paddr_last =3D paddr_next; continue; } - prot =3D pte_pgprot(pte_clrhuge(*(pte_t *)pud)); + prot =3D pud_pgprot(*pud); } =20 if (page_size_mask & (1<> PAGE_SHIFT, prot_sethuge(prot)), - init); + set_pud_init(pud, pfn_pud(paddr >> PAGE_SHIFT, prot), init); spin_unlock(&init_mm.page_table_lock); paddr_last =3D paddr_next; continue; @@ -1519,11 +1507,9 @@ static int __meminitdata node_start; void __meminit vmemmap_set_pmd(pmd_t *pmd, void *p, int node, unsigned long addr, unsigned long next) { - pte_t entry; + pmd_t entry =3D pfn_pmd(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL); =20 - entry =3D pfn_pte(__pa(p) >> PAGE_SHIFT, - PAGE_KERNEL_LARGE); - set_pmd(pmd, __pmd(pte_val(entry))); + set_pmd(pmd, entry); =20 /* check to see if we have contiguous blocks */ if (p_end !=3D p || node_start !=3D node) { diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index d023a40a1e03..a26b2397c4cf 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -971,25 +971,16 @@ static int __should_split_large_page(pte_t *kpte, uns= igned long address, =20 /* * We are safe now. Check whether the new pgprot is the same: - * Convert protection attributes to 4k-format, as cpa->mask* are set - * up accordingly. + * Note that old_prot is already in the ideal 4k-format, so we can + * directly apply cpa->mask* to it. */ =20 - /* Clear PSE (aka _PAGE_PAT) and move PAT bit to correct position */ - req_prot =3D pgprot_large_2_4k(old_prot); + req_prot =3D old_prot; =20 pgprot_val(req_prot) &=3D ~pgprot_val(cpa->mask_clr); pgprot_val(req_prot) |=3D pgprot_val(cpa->mask_set); =20 - /* - * req_prot is in format of 4k pages. 
It must be converted to large - * page format: the caching mode includes the PAT bit located at - * different bit positions in the two formats. - */ - req_prot =3D pgprot_4k_2_large(req_prot); req_prot =3D pgprot_clear_protnone_bits(req_prot); - if (pgprot_val(req_prot) & _PAGE_PRESENT) - pgprot_val(req_prot) |=3D _PAGE_PSE; =20 /* * old_pfn points to the large page base pfn. So we need to add the @@ -1065,7 +1056,10 @@ static int __should_split_large_page(pte_t *kpte, un= signed long address, return 1; =20 /* All checks passed. Update the large page mapping. */ - new_pte =3D pfn_pte(old_pfn, new_prot); + if (level =3D=3D PG_LEVEL_2M) + new_pte =3D __pte(pmd_val(pfn_pmd(old_pfn, new_prot))); + else + new_pte =3D __pte(pud_val(pfn_pud(old_pfn, new_prot))); __set_pmd_pte(kpte, address, new_pte); cpa->flags |=3D CPA_FLUSHTLB; cpa_inc_lp_preserved(level); @@ -1120,7 +1114,10 @@ static void split_set_pte(struct cpa_data *cpa, pte_= t *pte, unsigned long pfn, else pr_warn_once("CPA: Cannot fixup static protections for PUD split\n"); set: - set_pte(pte, pfn_pte(pfn, ref_prot)); + if (size =3D=3D PMD_SIZE) + set_pmd((pmd_t *)pte, pfn_pmd(pfn, ref_prot)); + else + set_pte(pte, pfn_pte(pfn, ref_prot)); } =20 static int @@ -1151,11 +1148,6 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte= , unsigned long address, switch (level) { case PG_LEVEL_2M: ref_prot =3D pmd_pgprot(*(pmd_t *)kpte); - /* - * Clear PSE (aka _PAGE_PAT) and move - * PAT bit to correct position. - */ - ref_prot =3D pgprot_large_2_4k(ref_prot); ref_pfn =3D pmd_pfn(*(pmd_t *)kpte); lpaddr =3D address & PMD_MASK; lpinc =3D PAGE_SIZE; @@ -1167,13 +1159,6 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte= , unsigned long address, pfninc =3D PMD_SIZE >> PAGE_SHIFT; lpaddr =3D address & PUD_MASK; lpinc =3D PMD_SIZE; - /* - * Clear the PSE flags if the PRESENT flag is not set - * otherwise pmd_present() will return true even on a non - * present pmd. 
- */ - if (!(pgprot_val(ref_prot) & _PAGE_PRESENT)) - pgprot_val(ref_prot) &=3D ~_PAGE_PSE; break; =20 default: @@ -1289,8 +1274,7 @@ static int collapse_pmd_page(pmd_t *pmd, unsigned lon= g addr, old_pmd =3D *pmd; =20 /* Success: set up a large page */ - pgprot =3D pgprot_4k_2_large(pte_pgprot(first)); - pgprot_val(pgprot) |=3D _PAGE_PSE; + pgprot =3D pte_pgprot(first); _pmd =3D pfn_pmd(pfn, pgprot); set_pmd(pmd, _pmd); =20 @@ -1593,7 +1577,6 @@ static long populate_pmd(struct cpa_data *cpa, { long cur_pages =3D 0; pmd_t *pmd; - pgprot_t pmd_pgprot; =20 /* * Not on a 2M boundary? @@ -1625,8 +1608,6 @@ static long populate_pmd(struct cpa_data *cpa, if (num_pages =3D=3D cur_pages) return cur_pages; =20 - pmd_pgprot =3D pgprot_4k_2_large(pgprot); - while (end - start >=3D PMD_SIZE) { =20 /* @@ -1638,8 +1619,7 @@ static long populate_pmd(struct cpa_data *cpa, =20 pmd =3D pmd_offset(pud, start); =20 - set_pmd(pmd, pmd_mkhuge(pfn_pmd(cpa->pfn, - canon_pgprot(pmd_pgprot)))); + set_pmd(pmd, pfn_pmd(cpa->pfn, canon_pgprot(pgprot))); =20 start +=3D PMD_SIZE; cpa->pfn +=3D PMD_SIZE >> PAGE_SHIFT; @@ -1667,7 +1647,6 @@ static int populate_pud(struct cpa_data *cpa, unsigne= d long start, p4d_t *p4d, pud_t *pud; unsigned long end; long cur_pages =3D 0; - pgprot_t pud_pgprot; =20 end =3D start + (cpa->numpages << PAGE_SHIFT); =20 @@ -1705,14 +1684,12 @@ static int populate_pud(struct cpa_data *cpa, unsig= ned long start, p4d_t *p4d, return cur_pages; =20 pud =3D pud_offset(p4d, start); - pud_pgprot =3D pgprot_4k_2_large(pgprot); =20 /* * Map everything starting from the Gb boundary, possibly with 1G pages */ while (boot_cpu_has(X86_FEATURE_GBPAGES) && end - start >=3D PUD_SIZE) { - set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn, - canon_pgprot(pud_pgprot)))); + set_pud(pud, pfn_pud(cpa->pfn, canon_pgprot(pgprot))); =20 start +=3D PUD_SIZE; cpa->pfn +=3D PUD_SIZE >> PAGE_SHIFT; diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index da7f0a03cf90..cd9a62f4d437 100644 --- 
a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -644,9 +644,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t= prot) if (pud_present(*pud) && !pud_leaf(*pud)) return 0; =20 - set_pte((pte_t *)pud, pfn_pte( - (u64)addr >> PAGE_SHIFT, - __pgprot(protval_4k_2_large(pgprot_val(prot)) | _PAGE_PSE))); + set_pud(pud, pfn_pud((u64)addr >> PAGE_SHIFT, prot)); =20 return 1; } @@ -676,9 +674,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t= prot) if (pmd_present(*pmd) && !pmd_leaf(*pmd)) return 0; =20 - set_pte((pte_t *)pmd, pfn_pte( - (u64)addr >> PAGE_SHIFT, - __pgprot(protval_4k_2_large(pgprot_val(prot)) | _PAGE_PSE))); + set_pmd(pmd, pfn_pmd((u64)addr >> PAGE_SHIFT, prot)); =20 return 1; } diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c index 223d5bca29b8..2f18f8223376 100644 --- a/arch/x86/power/hibernate_32.c +++ b/arch/x86/power/hibernate_32.c @@ -107,7 +107,7 @@ static int resume_physical_mapping_init(pgd_t *pgd_base) * NOTE: We can mark everything as executable here */ if (boot_cpu_has(X86_FEATURE_PSE)) { - set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE_EXEC)); + set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_EXEC)); pfn +=3D PTRS_PER_PTE; } else { pte_t *max_pte; @@ -156,13 +156,13 @@ static int set_up_temporary_text_mapping(pgd_t *pgd_b= ase) =20 if (boot_cpu_has(X86_FEATURE_PSE)) { set_pmd(pmd + pmd_index(restore_jump_address), - __pmd((jump_address_phys & PMD_MASK) | pgprot_val(PAGE_KERNEL_LARGE_EXEC= ))); + pfn_pmd(jump_address_phys >> PAGE_SHIFT, PAGE_KERNEL_EXEC)); } else { pte =3D resume_one_page_table_init(pmd); if (!pte) return -ENOMEM; set_pte(pte + pte_index(restore_jump_address), - __pte((jump_address_phys & PAGE_MASK) | pgprot_val(PAGE_KERNEL_EXEC))); + pfn_pte(jump_address_phys >> PAGE_SHIFT, PAGE_KERNEL_EXEC)); } =20 return 0; --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout12.his.huawei.com (canpmsgout12.his.huawei.com [113.46.200.227]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E9063FF1BB; Tue, 26 May 2026 14:56:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.227 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807380; cv=none; b=rGaydaCogTB/fuQNsU+7qyMx09AatrWK/1ITy8iEhrDnILSOvas+tQzUwOuO05ITK4m/b9+G5AsmCbBn+hP+reYIpDrLRdBT4Xof8cvQXkLu8KLeiQiqr7wHvrv+NuSGsUQzzCLV+1WDCyS4MpyXwX5CdqtLo8UjlC8/jzB03Ko= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807380; c=relaxed/simple; bh=VY2r86ynEzqqILXhMV+rXpgCQUiCYmwcVdujKDIV3bM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=gNC/drUtgfD+v6pbJT6ojHkx9Ogx6Bzv/wqdgWQmAYFWR3q4D7bDZN7LcTfdyhWdoXGn+I/T+46e+VJZdErhBSgZ6ykUCIiqTXjlvKkzHYN1CFXoe/0GsvWsfz6ZWCktde7G1UGzLvMdmns8dg3C9LVSdTJHGBInpvzz2GJatAo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=0iMEW0ts; arc=none smtp.client-ip=113.46.200.227 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="0iMEW0ts" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=l+aFKJrjVwtzzo3eLUyF5+IW/uSU3+87cCQnsOJ4oeg=; b=0iMEW0tsF/BBgk4ze5nZcHMmdemvVB1xjaYbneUnB2p3zX6RYNWiFoVmHYi0xEF5s2CCLn5Oa A/qAQxOhoQKr4PXdSbSP16hYOVmOosHS57dE7hnJY/IV4AfhzMAuDqs4A7vCLbtKI+1pZ9V6gfL dPaibFsUNvw+EGtOP7+W/Jg= Received: from mail.maildlp.com (unknown [172.19.163.214]) by canpmsgout12.his.huawei.com (SkyGuard) with 
ESMTPS id 4gPwbg0PqJznTVR; Tue, 26 May 2026 22:48:15 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 985234056C; Tue, 26 May 2026 22:56:12 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:11 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel , , , , , , , , , Subject: [PATCH mm-unstable RFC v4 2/7] arm64/mm: use PTE-level pgprot for huge PFN helpers Date: Tue, 26 May 2026 22:49:58 +0800 Message-ID: <20260526145003.88445-3-yintirui@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com> References: <20260526145003.88445-1-yintirui@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229) Content-Type: text/plain; charset="utf-8" Make the arm64 PMD/PUD PFN helpers use PTE-level pgprot_t as the basic format. 
pfn_pmd() and pfn_pud() now translate PTE-level attributes into block entries. pmd_pgprot() and pud_pgprot() translate block descriptor attributes back into PTE-level attributes. Remove mk_pmd_sect_prot() and mk_pud_sect_prot(). Signed-off-by: Yin Tirui --- arch/arm64/include/asm/pgtable.h | 48 ++++++++++++++++++++++---------- arch/arm64/mm/mmu.c | 4 +-- 2 files changed, 36 insertions(+), 16 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 4dfa42b7d053..c3ee12e14f86 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -511,16 +511,6 @@ static inline pmd_t pte_pmd(pte_t pte) return __pmd(pte_val(pte)); } =20 -static inline pgprot_t mk_pud_sect_prot(pgprot_t prot) -{ - return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT); -} - -static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot) -{ - return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT); -} - static inline pte_t pte_swp_mkexclusive(pte_t pte) { return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE)); @@ -628,7 +618,13 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd) #define __pmd_to_phys(pmd) __pte_to_phys(pmd_pte(pmd)) #define __phys_to_pmd_val(phys) __phys_to_pte_val(phys) #define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT) -#define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PA= GE_SHIFT) | pgprot_val(prot)) +static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd =3D __pmd(__phys_to_pmd_val((phys_addr_t)pfn << PAGE_SHIFT) | + pgprot_val(prot)); + + return pmd_mkhuge(pmd); +} =20 #define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) @@ -652,22 +648,46 @@ static inline pud_t pud_mkhuge(pud_t pud) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) #define __phys_to_pud_val(phys) __phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) -#define 
pfn_pud(pfn,prot) __pud(__phys_to_pud_val((phys_addr_t)(pfn) << PA= GE_SHIFT) | pgprot_val(prot)) +static inline pud_t pfn_pud(unsigned long pfn, pgprot_t prot) +{ + pud_t pud =3D __pud(__phys_to_pud_val((phys_addr_t)pfn << PAGE_SHIFT) | + pgprot_val(prot)); + + return pud_mkhuge(pud); +} =20 #define pmd_pgprot pmd_pgprot static inline pgprot_t pmd_pgprot(pmd_t pmd) { unsigned long pfn =3D pmd_pfn(pmd); + pmdval_t protval =3D pmd_val(pmd) ^ + __phys_to_pmd_val((phys_addr_t)pfn << PAGE_SHIFT); + + /* + * pgprot_t represents PTE-level attributes. Convert the PMD + * block descriptor type into a PTE page descriptor type. + */ + pmdval_t mask =3D PMD_TYPE_MASK & ~PTE_VALID; + pmdval_t val =3D PTE_TYPE_PAGE & ~PTE_VALID; =20 - return __pgprot(pmd_val(pfn_pmd(pfn, __pgprot(0))) ^ pmd_val(pmd)); + return __pgprot((protval & ~mask) | val); } =20 #define pud_pgprot pud_pgprot static inline pgprot_t pud_pgprot(pud_t pud) { unsigned long pfn =3D pud_pfn(pud); + pudval_t protval =3D pud_val(pud) ^ + __phys_to_pud_val((phys_addr_t)pfn << PAGE_SHIFT); + + /* + * pgprot_t represents PTE-level attributes. Convert the PUD + * block descriptor type into a PTE page descriptor type. 
+ */ + pudval_t mask =3D PUD_TYPE_MASK & ~PTE_VALID; + pudval_t val =3D PTE_TYPE_PAGE & ~PTE_VALID; =20 - return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud)); + return __pgprot((protval & ~mask) | val); } =20 static inline void __set_ptes_anysz(struct mm_struct *mm, unsigned long ad= dr, diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e5a42b7a0160..2dd99d595f19 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1816,7 +1816,7 @@ void vmemmap_free(unsigned long start, unsigned long = end, =20 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot) { - pud_t new_pud =3D pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot)); + pud_t new_pud =3D pfn_pud(__phys_to_pfn(phys), prot); =20 /* Only allow permission changes for now */ if (!pgattr_change_is_safe(READ_ONCE(pud_val(*pudp)), @@ -1830,7 +1830,7 @@ int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgpro= t_t prot) =20 int pmd_set_huge(pmd_t *pmdp, phys_addr_t phys, pgprot_t prot) { - pmd_t new_pmd =3D pfn_pmd(__phys_to_pfn(phys), mk_pmd_sect_prot(prot)); + pmd_t new_pmd =3D pfn_pmd(__phys_to_pfn(phys), prot); =20 /* Only allow permission changes for now */ if (!pgattr_change_is_safe(READ_ONCE(pmd_val(*pmdp)), --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout04.his.huawei.com (canpmsgout04.his.huawei.com [113.46.200.219]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D323F3D091D; Tue, 26 May 2026 14:56:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.219 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807384; cv=none; b=gFqwpZKNUqCROsLTa+WGuUfFVds7wDg2wz8Puuy6qu3dKuNt9qaCewTL51+PuuFmJtl0RdXSOtT1aQXWB7gM1E8sweNMiPZyn+ab23VwRydLddPc6GwR8YBRm9xpWlhf//dMqucjAO6WHh/0H/YNaPX/64ivYCLLbSfXX3I/pR8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1779807384; c=relaxed/simple; bh=RN/6kATJoszWLSbFjzzLMkpAe24D9rshN9oE4jPee44=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=GxecpG72p+3jdvE0t135R1Px5jVs1iDwSsbVLIuaALnv5gPmtRM9X/J2al5jSrUedNtp54+cYlE8hnRenOdur8vm84FvOc8PAIuUA+7TOj7e0pRsfDgGPp4gQ7ZHyzMiJL/IYlmzjWJ0BpY/P4Z0TxOeKY6Re6MEAFs0LIUNwUk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=i0YNxHW3; arc=none smtp.client-ip=113.46.200.219 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="i0YNxHW3" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=PEJKj8MrGm98KFDNsLR6hqBdS/9XHG5DFtB7L96xDQw=; b=i0YNxHW3wq3lFr0pxm1PaJNgAVJTO1vRZUeYUiJLUthZy1rqBkAIdd47b8R+6cAyaooBDq9ga Gi+EVanp6M6ViNTwBHbOygCaz+ZLtugXoLZmkLmAljpjW4ksf3jW9FcLRD3EtMD7xBeNQvKhMW1 e2Prn6nGp6met5Qh5OkxH2g= Received: from mail.maildlp.com (unknown [172.19.162.197]) by canpmsgout04.his.huawei.com (SkyGuard) with ESMTPS id 4gPwbt2RTYz1prmJ; Tue, 26 May 2026 22:48:26 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 0107540576; Tue, 26 May 2026 22:56:14 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:12 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: 
Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel
Subject: [PATCH mm-unstable RFC v4 3/7] powerpc/mm: use PTE-level pgprot for huge PFN helpers
Date: Tue, 26 May 2026 22:49:59 +0800
Message-ID: <20260526145003.88445-4-yintirui@huawei.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com>
References: <20260526145003.88445-1-yintirui@huawei.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229)
Content-Type: text/plain; charset="utf-8"

Make the powerpc PMD PFN helper use PTE-level pgprot_t as the basic
format.

pmd_pgprot() currently derives pgprot_t from the PMD entry through
pte_pgprot(). Some PMD leaf entries can carry H_PAGE_THP_HUGE, which is
specific to huge PMDs and should not be propagated into PTE-level
pgprot_t.

Mask H_PAGE_THP_HUGE out in pmd_pgprot().
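
The masking step described above amounts to clearing one huge-only flag
before the value is treated as PTE-level attributes. The following
userspace sketch is not part of the patch. SKETCH_HUGE_ONLY_BIT is a
hypothetical placeholder, not the real value of H_PAGE_THP_HUGE, and
sketch_pmd_prot_to_pte_prot() is a hypothetical name.

```c
#include <stdint.h>

typedef uint64_t pgprotval_t;

/*
 * HYPOTHETICAL bit chosen for illustration only. The real
 * H_PAGE_THP_HUGE is defined by the powerpc headers and exists only on
 * some configurations, hence the #ifdef in the patch.
 */
#define SKETCH_HUGE_ONLY_BIT	((pgprotval_t)1 << 20)

/* Drop the huge-only flag so the result carries PTE-level attributes only. */
static inline pgprotval_t sketch_pmd_prot_to_pte_prot(pgprotval_t pmd_prot)
{
	return pmd_prot & ~SKETCH_HUGE_ONLY_BIT;
}
```

All other attribute bits pass through unchanged, so a value that never
had the huge-only flag set is returned as-is.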
Signed-off-by: Yin Tirui --- arch/powerpc/include/asm/pgtable.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/= pgtable.h index d20ff2ae02f5..0f368ea64b1f 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -67,7 +67,16 @@ static inline pgprot_t pte_pgprot(pte_t pte) #define pmd_pgprot pmd_pgprot static inline pgprot_t pmd_pgprot(pmd_t pmd) { - return pte_pgprot(pmd_pte(pmd)); + pgprot_t prot =3D pte_pgprot(pmd_pte(pmd)); + + /* + * pmd_pgprot() returns PTE-level pgprot_t. H_PAGE_THP_HUGE is specific + * to huge PMDs. + */ +#ifdef H_PAGE_THP_HUGE + prot =3D __pgprot(pgprot_val(prot) & ~H_PAGE_THP_HUGE); +#endif + return prot; } =20 #define pud_pgprot pud_pgprot --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout07.his.huawei.com (canpmsgout07.his.huawei.com [113.46.200.222]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 612483FFAC5; Tue, 26 May 2026 14:56:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.222 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807382; cv=none; b=Fga7p+tY6Tjym3HKflSwlhb+2yw7uFHYTOTUcFwH+W2uIx9LyRtix5W0HIB0tH6r54uROKtPc7voUzeOutZMNZKq3H2A1p/pZAnyz/wTt+0rypFlMSALLyPUD39SDUSEDw1SCVrcb508FzDZjS+b7j5Dt7oltsTGQl8NgosDapA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807382; c=relaxed/simple; bh=iWjJTsW6BuEeSJC1LGqlI912lSrBEV+g23DvGRdK2hk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WoqFvPRtmzpDjd4+N6WlZaEvqS9cS9ZwdXlHN7x91lgOr0K4uW27N5RVGfV1OFNhOrYUTYkPSE7ZyUpOrZbWkRSp33bTtExx/Nj7fsf5KI0BRPAdEebgYgQ1p5pJDJ2l3L44FZRUnPtiXsnbC86O+juxpRJ2gyssHYrqTMm3ZGU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=6IZrptZt; arc=none smtp.client-ip=113.46.200.222 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="6IZrptZt" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=SRAvpL5kg7zfVB87jHmBF+c3eWPsqtwvfUMg/m3KWqc=; b=6IZrptZt5pccedulAnBT8WZ9BpyHxnDtRYuwsBIYpsE+R49/hVIqXtnwoIHkcra0wdM3302cd TArJlb9VMRw0M09I5X1Ptta9isxW+2x2NHLRQMjUNumTS8/2z7/BgrG5HQgOGHeoowJftfugxM4 f1Mp3586pp6HqfZ8jmmJtRQ= Received: from mail.maildlp.com (unknown [172.19.163.200]) by canpmsgout07.his.huawei.com (SkyGuard) with ESMTPS id 4gPwby4VlGzLlSS; Tue, 26 May 2026 22:48:30 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 613854055B; Tue, 26 May 2026 22:56:15 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:13 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . 
Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel , , , , , , , , , Subject: [PATCH mm-unstable RFC v4 4/7] mm/huge_memory: refactor copy_huge_pmd() Date: Tue, 26 May 2026 22:50:00 +0800 Message-ID: <20260526145003.88445-5-yintirui@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com> References: <20260526145003.88445-1-yintirui@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229) Content-Type: text/plain; charset="utf-8" Classify the source PMD via pmd_present() and vm_normal_folio_pmd(), matching the way the PTE path uses pte_present() and vm_normal_page(). This moves the present-PMD decision from VMA identity checks to the actual PMD/folio state. Drop the defensive "if (!pmd_trans_huge(pmd)) goto out_unlock" branch: with mmap_write_lock held during fork, it should not occur. Extract the present-PMD side of copy_huge_pmd() into copy_present_huge_pmd(). The helper owns the child pgtable passed by the caller: it either deposits the pgtable when installing a copied PMD, or frees it on paths that do not install one. The child pgtable is now allocated once up front and freed on every skip path. This makes file/shmem and PFNMAP/special skip paths take the PMD locks and free the preallocated pgtable before returning. These are not expected to be hot paths, and the PFNMAP case is reused by the follow-up PMD PFNMAP copy support. 
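The ownership rule for the preallocated pgtable can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: every demo_* name is hypothetical, malloc() stands in for pte_alloc_one(), and the pinned-folio (-EAGAIN) path is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of a present source PMD. */
enum demo_pmd_kind { DEMO_ANON, DEMO_ZERO, DEMO_FILE, DEMO_SPECIAL };

/* Stands in for the deposit slot attached to the destination PMD. */
struct demo_slot { void *deposited; };

static int demo_live_pgtables; /* allocated and not yet freed */

static void *demo_pgtable_alloc(void)
{
	void *p = malloc(64);

	if (p)
		demo_live_pgtables++;
	return p;
}

static void demo_pgtable_free(void *p)
{
	free(p);
	demo_live_pgtables--;
}

/*
 * The helper consumes @pgtable on every path: it is either deposited
 * (anon and huge zero PMDs) or freed (file and special PMDs). The
 * caller must not touch it afterwards.
 */
static int demo_copy_present(enum demo_pmd_kind kind, struct demo_slot *dst,
			     void *pgtable)
{
	switch (kind) {
	case DEMO_ANON:
	case DEMO_ZERO:
		dst->deposited = pgtable;
		return 0;
	case DEMO_FILE:    /* nothing installed; refaulted later */
	case DEMO_SPECIAL: /* installed in the patch, but without a deposit */
		demo_pgtable_free(pgtable);
		return 0;
	}
	demo_pgtable_free(pgtable);
	return -1;
}
```

Because the helper takes ownership unconditionally, the caller needs no cleanup label of its own, which matches how copy_huge_pmd() in the patch simply unlocks and returns.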
Signed-off-by: Yin Tirui --- mm/huge_memory.c | 175 +++++++++++++++++++++++++---------------------- 1 file changed, 95 insertions(+), 80 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9832ee910d5e..3964258ff91d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1879,6 +1879,82 @@ bool touch_pmd(struct vm_area_struct *vma, unsigned = long addr, return false; } =20 +static int copy_present_huge_pmd( + struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, + pmd_t pmd, pgtable_t pgtable, bool *need_split) +{ + struct folio *src_folio; + bool wrprotect =3D true; + + src_folio =3D vm_normal_folio_pmd(src_vma, addr, pmd); + if (!src_folio) { + /* + * When page table lock is held, the huge zero pmd should not be + * under splitting since we don't split the page itself, only pmd to + * a page table. + */ + if (is_huge_zero_pmd(pmd)) { + /* + * mm_get_huge_zero_folio() will never allocate a new + * folio here, since we already have a zero page to + * copy. It just takes a reference. + */ + mm_get_huge_zero_folio(dst_mm); + goto set_pmd; + } + + /* + * Making sure it's not a CoW VMA with writable + * mapping, otherwise it means either the anon page wrongly + * applied special bit, or we made the PRIVATE mapping be + * able to wrongly write to the backend MMIO. + */ + VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd)); + pte_free(dst_mm, pgtable); + pgtable =3D NULL; + wrprotect =3D false; + goto set_pmd; + } + + /* File THPs are copied lazily by refaulting. */ + if (!folio_test_anon(src_folio)) { + pte_free(dst_mm, pgtable); + return 0; + } + + folio_get(src_folio); + if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, + &src_folio->page, + dst_vma, src_vma))) { + /* Page maybe pinned: split and retry the fault on PTEs. 
*/ + folio_put(src_folio); + pte_free(dst_mm, pgtable); + *need_split =3D true; + return -EAGAIN; + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + +set_pmd: + if (pgtable) { + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + } + + if (wrprotect) { + pmdp_set_wrprotect(src_mm, addr, src_pmd); + if (!userfaultfd_wp(dst_vma)) + pmd =3D pmd_clear_uffd_wp(pmd); + pmd =3D pmd_wrprotect(pmd); + } + + pmd =3D pmd_mkold(pmd); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + + return 0; +} + static void copy_huge_non_present_pmd( struct mm_struct *dst_mm, struct mm_struct *src_mm, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, @@ -1940,104 +2016,43 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct= mm_struct *src_mm, struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) { spinlock_t *dst_ptl, *src_ptl; - struct page *src_page; - struct folio *src_folio; - pmd_t pmd; pgtable_t pgtable =3D NULL; - int ret =3D -ENOMEM; - - pmd =3D pmdp_get_lockless(src_pmd); - if (unlikely(pmd_present(pmd) && pmd_special(pmd) && - !is_huge_zero_pmd(pmd))) { - dst_ptl =3D pmd_lock(dst_mm, dst_pmd); - src_ptl =3D pmd_lockptr(src_mm, src_pmd); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - /* - * No need to recheck the pmd, it can't change with write - * mmap lock held here. - * - * Meanwhile, making sure it's not a CoW VMA with writable - * mapping, otherwise it means either the anon page wrongly - * applied special bit, or we made the PRIVATE mapping be - * able to wrongly write to the backend MMIO. 
- */ - VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd)); - goto set_pmd; - } - - /* Skip if can be re-fill on fault */ - if (!vma_is_anonymous(dst_vma)) - return 0; + bool need_split =3D false; + int ret =3D 0; + pmd_t pmd; =20 pgtable =3D pte_alloc_one(dst_mm); if (unlikely(!pgtable)) - goto out; + return -ENOMEM; =20 dst_ptl =3D pmd_lock(dst_mm, dst_pmd); src_ptl =3D pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); =20 - ret =3D -EAGAIN; pmd =3D *src_pmd; =20 - if (unlikely(thp_migration_supported() && - pmd_is_valid_softleaf(pmd))) { - copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr, + if (likely(pmd_present(pmd))) { + ret =3D copy_present_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr, + dst_vma, src_vma, pmd, pgtable, &need_split); + } else if (unlikely(thp_migration_supported() && pmd_is_valid_softleaf(pm= d))) { + if (unlikely(!vma_is_anonymous(dst_vma))) + pte_free(dst_mm, pgtable); + else + copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr, dst_vma, src_vma, pmd, pgtable); - ret =3D 0; - goto out_unlock; - } - - if (unlikely(!pmd_trans_huge(pmd))) { + } else { + VM_WARN_ONCE(1, "unexpected non-present PMD %llx\n", + (unsigned long long)pmd_val(pmd)); pte_free(dst_mm, pgtable); - goto out_unlock; - } - /* - * When page table lock is held, the huge zero pmd should not be - * under splitting since we don't split the page itself, only pmd to - * a page table. - */ - if (is_huge_zero_pmd(pmd)) { - /* - * mm_get_huge_zero_folio() will never allocate a new - * folio here, since we already have a zero page to - * copy. It just takes a reference. 
- */ - mm_get_huge_zero_folio(dst_mm); - goto out_zero_page; + ret =3D -EAGAIN; } =20 - src_page =3D pmd_page(pmd); - VM_BUG_ON_PAGE(!PageHead(src_page), src_page); - src_folio =3D page_folio(src_page); + spin_unlock(src_ptl); + spin_unlock(dst_ptl); =20 - folio_get(src_folio); - if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, dst_vma, sr= c_vma))) { - /* Page maybe pinned: split and retry the fault on PTEs. */ - folio_put(src_folio); - pte_free(dst_mm, pgtable); - spin_unlock(src_ptl); - spin_unlock(dst_ptl); + if (unlikely(need_split)) __split_huge_pmd(src_vma, src_pmd, addr, false); - return -EAGAIN; - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); -out_zero_page: - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) - pmd =3D pmd_clear_uffd_wp(pmd); - pmd =3D pmd_wrprotect(pmd); -set_pmd: - pmd =3D pmd_mkold(pmd); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); =20 - ret =3D 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); -out: return ret; } =20 --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout03.his.huawei.com (canpmsgout03.his.huawei.com [113.46.200.218]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F0BB40758A; Tue, 26 May 2026 14:56:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.218 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807387; cv=none; b=e1mnjiLV5E00PfBzrACjtfJgVs0VNpiADyW/KSVHUbB2ymMggqAtAknBz9z1NWPhCMcVbvJpPiXlDloVa9qRes9uml9rwKoZz9GmNFLeBHzUTe2BP3XRHopk6uHgkp5CFvuCUwnfqjptjMr5GESvpwAy7ikXztCJnajnfiJp/5k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807387; c=relaxed/simple; bh=rWNQO1kIF+yECDVQJf//mBCXKjTbTEapDM2ZF5+W9bY=; 
h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Zxoca49zETUCs1cBMRmOumkfcOIvVkFQsIlyYcICYXVFUzxgxtb61LOoCCVYy2fpLqcpczunlsIuS1r4abY9UAVwyqI0fv+BUiZhqbuv8WuqlvfM2YNQHerxW/YwrNu0CE8tjN/ZrzufDnsrKjKy1XhY1iegSKDlTA58kYtTHAY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=cWP2C5F2; arc=none smtp.client-ip=113.46.200.218 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="cWP2C5F2" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=wMIWaZV32Daasf+za8/Pnxre+NVOPFzoBSPfz4En8HQ=; b=cWP2C5F2TBvhzg8T7RVE9LoZ2ByIKu0sClV+EAFEDCH0sYrngJWuk2R/h0BScXfHjlZ39EcF0 sm2W80NqlJG1l2DcbQ+f7BpC0XXWfXbT14ddcMTDZ61mK12koV6hAqFEZ0BpvXmYoJ1dR3dQgM8 I5JteHexY65OZ5Vz5woxwvk= Received: from mail.maildlp.com (unknown [172.19.163.0]) by canpmsgout03.his.huawei.com (SkyGuard) with ESMTPS id 4gPwcN06v8zpT17; Tue, 26 May 2026 22:48:52 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id B5BD740561; Tue, 26 May 2026 22:56:16 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:15 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave 
Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel , , , , , , , , , Subject: [PATCH mm-unstable RFC v4 5/7] mm/huge_memory: refactor __split_huge_pmd_locked() Date: Tue, 26 May 2026 22:50:01 +0800 Message-ID: <20260526145003.88445-6-yintirui@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com> References: <20260526145003.88445-1-yintirui@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229) Content-Type: text/plain; charset="utf-8" Rework __split_huge_pmd_locked() to classify huge PMDs by the PMD entry itself instead of starting from vma_is_anonymous(). Present PMDs are classified with vm_normal_folio_pmd(): file/shmem THPs are dropped and refaulted later, anonymous THPs are split into PTEs, and PMDs without a normal folio are handled as huge zero or special PMDs. Non-present PMDs are classified with pmd_to_softleaf_folio(): file/shmem migration entries are dropped, while anonymous migration/device-private entries are split into PTEs. This also makes the anonymous decision folio-based. A private file mapping that has CoW'ed to an anonymous THP now follows the anonymous path even though the VMA is file-backed. No intended behavioural change. 
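The classification described above can be summarised as a small decision function. This is a sketch only: struct demo_pmd and the demo_* names are hypothetical, and the booleans stand in for pmd_present(), the folio lookup (vm_normal_folio_pmd() or pmd_to_softleaf_folio()), folio_test_anon() and is_huge_zero_pmd().

```c
#include <assert.h>
#include <stdbool.h>

enum demo_action {
	DEMO_SPLIT_ZERO,    /* huge zero PMD: split to zero PTEs */
	DEMO_DROP,          /* clear the PMD, refault later */
	DEMO_SPLIT_TO_PTES, /* anonymous: remap with PTEs */
	DEMO_IGNORE,        /* unexpected encoding: warn and return */
};

/* Hypothetical summary of what the PMD entry itself tells us. */
struct demo_pmd {
	bool present;
	bool has_folio;  /* folio lookup returned non-NULL */
	bool folio_anon;
	bool huge_zero;
};

static enum demo_action demo_classify(struct demo_pmd pmd)
{
	if (pmd.present) {
		if (!pmd.has_folio)
			return pmd.huge_zero ? DEMO_SPLIT_ZERO : DEMO_DROP;
		return pmd.folio_anon ? DEMO_SPLIT_TO_PTES : DEMO_DROP;
	}

	/* Non-present: migration or device-private softleaf entry. */
	if (!pmd.has_folio)
		return DEMO_IGNORE;
	return pmd.folio_anon ? DEMO_SPLIT_TO_PTES : DEMO_DROP;
}
```

Note that the VMA type never appears in the decision, so a private file mapping whose PMD has been CoW'ed to an anonymous THP lands in DEMO_SPLIT_TO_PTES.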
Signed-off-by: Yin Tirui --- mm/huge_memory.c | 197 +++++++++++++++++++++++++++-------------------- 1 file changed, 114 insertions(+), 83 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3964258ff91d..8cd77389d52f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3136,25 +3136,38 @@ static void __split_huge_pmd_locked(struct vm_area_= struct *vma, pmd_t *pmd, =20 count_vm_event(THP_SPLIT_PMD); =20 - if (!vma_is_anonymous(vma)) { - old_pmd =3D pmdp_huge_clear_flush(vma, haddr, pmd); - /* - * We are going to unmap this huge page. So - * just go ahead and zap it - */ - if (arch_needs_pgtable_deposit()) - zap_deposited_table(mm, pmd); - if (vma_is_special_huge(vma)) - return; - if (unlikely(pmd_is_migration_entry(old_pmd))) { - const softleaf_t old_entry =3D softleaf_from_pmd(old_pmd); + if (pmd_present(*pmd)) { + folio =3D vm_normal_folio_pmd(vma, haddr, *pmd); + + if (unlikely(!folio)) { + if (is_huge_zero_pmd(*pmd)) { + /* + * FIXME: Do we want to invalidate secondary mmu by calling + * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below + * inside __split_huge_pmd() ? + * + * We are going from a zero huge page write protected to zero + * small page also write protected so it does not seems useful + * to invalidate secondary mmu at this time. + */ + return __split_huge_zero_page_pmd(vma, haddr, pmd); + } =20 - folio =3D softleaf_to_folio(old_entry); - } else if (is_huge_zero_pmd(old_pmd)) { + /* Present but not a normal folio: drop the PMD. 
*/ + old_pmd =3D pmdp_huge_clear_flush(vma, haddr, pmd); + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); return; - } else { + } + + if (unlikely(!folio_test_anon(folio))) { + old_pmd =3D pmdp_huge_clear_flush(vma, haddr, pmd); + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); + if (vma_is_special_huge(vma)) + return; + page =3D pmd_page(old_pmd); - folio =3D page_folio(page); if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) folio_mark_dirty(folio); if (!folio_test_referenced(folio) && pmd_young(old_pmd)) @@ -3164,72 +3177,7 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, folio_put(folio); return; } - add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); - return; - } - - if (is_huge_zero_pmd(*pmd)) { - /* - * FIXME: Do we want to invalidate secondary mmu by calling - * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below - * inside __split_huge_pmd() ? - * - * We are going from a zero huge page write protected to zero - * small page also write protected so it does not seems useful - * to invalidate secondary mmu at this time. 
- */ - return __split_huge_zero_page_pmd(vma, haddr, pmd); - } - - if (pmd_is_migration_entry(*pmd)) { - softleaf_t entry; - - old_pmd =3D *pmd; - entry =3D softleaf_from_pmd(old_pmd); - page =3D softleaf_to_page(entry); - folio =3D page_folio(page); - - soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); - - write =3D softleaf_is_migration_write(entry); - if (PageAnon(page)) - anon_exclusive =3D softleaf_is_migration_read_exclusive(entry); - young =3D softleaf_is_migration_young(entry); - dirty =3D softleaf_is_migration_dirty(entry); - } else if (pmd_is_device_private_entry(*pmd)) { - softleaf_t entry; - - old_pmd =3D *pmd; - entry =3D softleaf_from_pmd(old_pmd); - page =3D softleaf_to_page(entry); - folio =3D page_folio(page); - - soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); - - write =3D softleaf_is_device_private_write(entry); - anon_exclusive =3D PageAnonExclusive(page); - - /* - * Device private THP should be treated the same as regular - * folios w.r.t anon exclusive handling. See the comments for - * folio handling and anon_exclusive below. 
- */ - if (freeze && anon_exclusive && - folio_try_share_anon_rmap_pmd(folio, page)) - freeze =3D false; - if (!freeze) { - rmap_t rmap_flags =3D RMAP_NONE; - - folio_ref_add(folio, HPAGE_PMD_NR - 1); - if (anon_exclusive) - rmap_flags |=3D RMAP_EXCLUSIVE; =20 - folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, - vma, haddr, rmap_flags); - } - } else { /* * Up to this point the pmd is present and huge and userland has * the whole access to the hugepage during the split (which @@ -3255,7 +3203,6 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, */ old_pmd =3D pmdp_invalidate(vma, haddr, pmd); page =3D pmd_page(old_pmd); - folio =3D page_folio(page); if (pmd_dirty(old_pmd)) { dirty =3D true; folio_set_dirty(folio); @@ -3266,7 +3213,6 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, uffd_wp =3D pmd_uffd_wp(old_pmd); =20 VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); - VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); =20 /* * Without "freeze", we'll simply split the PMD, propagating the @@ -3296,6 +3242,85 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr, rmap_flags); } + } else { + /* + * Non-present PMD: a softleaf-encoded migration or + * device-private entry. pmd_to_softleaf_folio() warns and + * returns NULL for any other encoding. + */ + folio =3D pmd_to_softleaf_folio(*pmd); + if (unlikely(!folio)) + return; + + if (unlikely(!folio_test_anon(folio))) { + /* + * File/shmem migration entry: drop the PMD without + * splitting. Unlike the present case the entry holds + * neither a folio reference nor an rmap to release, + * so just adjust the RSS counter. 
+ */ + pmdp_huge_clear_flush(vma, haddr, pmd); + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); + if (unlikely(vma_is_special_huge(vma))) { + VM_WARN_ONCE(1, + "unexpected special huge PMD migration entry\n"); + return; + } + add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); + return; + } + + if (pmd_is_migration_entry(*pmd)) { + softleaf_t entry; + + old_pmd =3D *pmd; + entry =3D softleaf_from_pmd(old_pmd); + page =3D softleaf_to_page(entry); + + soft_dirty =3D pmd_swp_soft_dirty(old_pmd); + uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + + write =3D softleaf_is_migration_write(entry); + if (PageAnon(page)) + anon_exclusive =3D softleaf_is_migration_read_exclusive(entry); + young =3D softleaf_is_migration_young(entry); + dirty =3D softleaf_is_migration_dirty(entry); + } else if (pmd_is_device_private_entry(*pmd)) { + softleaf_t entry; + + old_pmd =3D *pmd; + entry =3D softleaf_from_pmd(old_pmd); + page =3D softleaf_to_page(entry); + + soft_dirty =3D pmd_swp_soft_dirty(old_pmd); + uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + + write =3D softleaf_is_device_private_write(entry); + anon_exclusive =3D PageAnonExclusive(page); + + /* + * Device-private THP should be treated the same as + * regular folios w.r.t. anon-exclusive handling. See + * the matching code for present anon folios above. 
+ */ + if (freeze && anon_exclusive && + folio_try_share_anon_rmap_pmd(folio, page)) + freeze =3D false; + if (!freeze) { + rmap_t rmap_flags =3D RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags |=3D RMAP_EXCLUSIVE; + + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } + } else { + VM_WARN_ON_ONCE(1); + return; + } } =20 /* --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout05.his.huawei.com (canpmsgout05.his.huawei.com [113.46.200.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 456EB3FF88A; Tue, 26 May 2026 14:56:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807386; cv=none; b=mkqkqHYctpQPveaHkQNJOKAo7dqmw6uAk8rW+odBWIacck6n7kqPYurvMgrfX+jKsNknfzE6QY6ta1nSyuwqNcAXWjPefRUJcOBkH8p2+XFc0wYFFp0tYraJBidq1SnsYikyuZ65f8xKwTeL8TXY9M5rBM85HPYQc9XqoDxCx14= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807386; c=relaxed/simple; bh=tIs6LDXW3XYysdHKFfsyuDh05S2fSpj5hhM/ht4Gqls=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Yt9xm1tQHf9Z6uGMNs9gGrAHNH9gZejSyQFfUMYfJIT0OE/c34ZsfAhLVZ1YKobB6bRMrn/5qGUoXHh9zJzaHVaURuDFuaEdPw32SpU61JuVlK707InY3VmU4xDNrzvOQ+SGNi06aqkJTQ2mYaslXXhXXbXHJWeyVaD9hwkYbE8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=aaxZKJZg; arc=none smtp.client-ip=113.46.200.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="aaxZKJZg" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=YAntgOTYzemzPb0EO7sTFQO6BvGc9iJeZEmRTSuIcHs=; b=aaxZKJZgaMiO97CRNKgdXbF9a+WdupCXAZJH6aCLKDE/QcIFvdDpw7trUN2xXzftoS/ljUZIs WbPF/+qBEysJMYcTplm5KjDlnSJ7bmDUsBo/PycdedUouph+stsmJVQaALSHyW+Vmh24Jj8Wp+M kjD6iiq6tNQulyrkLP5yR98= Received: from mail.maildlp.com (unknown [172.19.163.0]) by canpmsgout05.his.huawei.com (SkyGuard) with ESMTPS id 4gPwbk08Bnz12LdK; Tue, 26 May 2026 22:48:18 +0800 (CST) Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 2CE5140561; Tue, 26 May 2026 22:56:18 +0800 (CST) Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:16 +0800 From: Yin Tirui To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon CC: Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . 
Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel , , , , , , , , , Subject: [PATCH mm-unstable RFC v4 6/7] mm/huge_memory: make move_huge_pmd() use has_deposited_pgtable() Date: Tue, 26 May 2026 22:50:02 +0800 Message-ID: <20260526145003.88445-7-yintirui@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com> References: <20260526145003.88445-1-yintirui@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229) Content-Type: text/plain; charset="utf-8" Use has_deposited_pgtable() in move_huge_pmd() to decide whether pmd_move_must_withdraw() should move a deposited pgtable instead of using the VMA type. PowerPC radix follows the generic rule. PowerPC hash keeps returning true. 
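The two rules can be compared side by side in a user-space sketch. The demo_* names are hypothetical, struct demo_lock stands in for the PMD spinlock, and the radix flag is passed as a parameter here instead of calling radix_enabled().

```c
#include <assert.h>
#include <stdbool.h>

struct demo_lock { int unused; }; /* stand-in for spinlock_t */

/* Generic rule: withdraw only if the PMD page changes and a pgtable is deposited. */
static bool demo_generic_must_withdraw(const struct demo_lock *new_ptl,
				       const struct demo_lock *old_ptl,
				       bool has_deposit)
{
	return (new_ptl != old_ptl) && has_deposit;
}

/* PowerPC model: radix follows the generic rule, hash always withdraws. */
static bool demo_powerpc_must_withdraw(const struct demo_lock *new_ptl,
				       const struct demo_lock *old_ptl,
				       bool has_deposit, bool radix)
{
	if (radix)
		return demo_generic_must_withdraw(new_ptl, old_ptl, has_deposit);

	return true;
}
```

Passing has_deposit instead of the VMA keeps the predicate independent of vma_is_anonymous(), which is what the patch changes in both the generic and the PowerPC version.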
Signed-off-by: Yin Tirui --- arch/powerpc/include/asm/book3s/64/pgtable.h | 5 ++--- arch/powerpc/mm/book3s64/pgtable.c | 11 +++++------ mm/huge_memory.c | 20 ++++++++++++-------- 3 files changed, 19 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/in= clude/asm/book3s/64/pgtable.h index b6629c041e75..a0042cacac8d 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1424,9 +1424,8 @@ extern pud_t pudp_invalidate(struct vm_area_struct *v= ma, unsigned long address, =20 #define pmd_move_must_withdraw pmd_move_must_withdraw struct spinlock; -extern int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, - struct spinlock *old_pmd_ptl, - struct vm_area_struct *vma); +extern bool pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, + struct spinlock *old_pmd_ptl, bool has_deposit); /* * Hash translation mode use the deposited table to store hash pte * slot information. diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/= pgtable.c index 85ab6723c8f2..4c45b5762d57 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -548,15 +548,14 @@ void ptep_modify_prot_commit(struct vm_area_struct *v= ma, unsigned long addr, * pmd page. Hence if we have different pmd page we need to withdraw durin= g pmd * move. * - * With hash we use deposited table always irrespective of anon or not. - * With radix we use deposited table only for anonymous mapping. + * With hash we use deposited table always irrespective of has_deposit or = not. + * With radix we use the same rule as the generic implementation. 
*/ -int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, - struct spinlock *old_pmd_ptl, - struct vm_area_struct *vma) +bool pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, + struct spinlock *old_pmd_ptl, bool has_deposit) { if (radix_enabled()) - return (new_pmd_ptl !=3D old_pmd_ptl) && vma_is_anonymous(vma); + return (new_pmd_ptl !=3D old_pmd_ptl) && has_deposit; =20 return true; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8cd77389d52f..be9b637c813b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2552,17 +2552,14 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, } =20 #ifndef pmd_move_must_withdraw -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, - spinlock_t *old_pmd_ptl, - struct vm_area_struct *vma) +static inline bool pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, + spinlock_t *old_pmd_ptl, bool has_deposit) { /* * With split pmd lock we also need to move preallocated * PTE page table if new_pmd is on different PMD page table. - * - * We also don't deposit and withdraw tables for file pages. 
*/ - return (new_pmd_ptl !=3D old_pmd_ptl) && vma_is_anonymous(vma); + return (new_pmd_ptl !=3D old_pmd_ptl) && has_deposit; } #endif =20 @@ -2595,8 +2592,11 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsig= ned long old_addr, { spinlock_t *old_ptl, *new_ptl; pmd_t pmd; + struct folio *folio =3D NULL; struct mm_struct *mm =3D vma->vm_mm; bool force_flush =3D false; + bool has_deposit; + bool is_present; =20 /* * The destination pmd shouldn't be established, free_pgtables() @@ -2618,11 +2618,15 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsi= gned long old_addr, if (new_ptl !=3D old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); pmd =3D pmdp_huge_get_and_clear(mm, old_addr, old_pmd); - if (pmd_present(pmd)) + is_present =3D pmd_present(pmd); + if (is_present) force_flush =3D true; VM_BUG_ON(!pmd_none(*new_pmd)); =20 - if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { + folio =3D normal_or_softleaf_folio_pmd(vma, old_addr, pmd, is_present); + has_deposit =3D has_deposited_pgtable(vma, pmd, folio); + + if (pmd_move_must_withdraw(new_ptl, old_ptl, has_deposit)) { pgtable_t pgtable; pgtable =3D pgtable_trans_huge_withdraw(mm, old_pmd); pgtable_trans_huge_deposit(mm, new_pmd, pgtable); --=20 2.43.0 From nobody Mon Jun 8 20:41:47 2026 Received: from canpmsgout09.his.huawei.com (canpmsgout09.his.huawei.com [113.46.200.224]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32E1B40627B; Tue, 26 May 2026 14:56:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.224 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779807386; cv=none; b=rP5J1rvLS6lu1g2P4Lx9nfrYDJpZ/9t9n21hNzO+tBVOCMKBo8mxZVpKVMYycE76qFao4VvYD0PdDSwKcXrU7qRCSDzkg47HIjXsY1tl3rLktenXZ2X0khOYUshYBOZE9XlQ4kSPZXJCCdPMaxYkrt4ed3hWkir9ICYIzhOcW7w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
 s=arc-20240116; t=1779807386; c=relaxed/simple; bh=fIamRjwG5CRrUwTm/ULdMuBv2N+tAmaE8RZ7rRP1iTI=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BlC98KI+y//eQtgCLOm6gZEHY0QTBtxF2LrW92KB1baDO/qVHcW2yWYCzFNy3Itomc4QC+xJlu/66HTEdFuMwEzHniDvKw1XCQWT3/7Ks7DIV8lt6avIFyBE1Z+f9xfiOVqpm8a4jMR1HDlYk7azcg0vUgr78MlqqwyyDNfCF68=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=CR/mU79d; arc=none smtp.client-ip=113.46.200.224
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="CR/mU79d"
dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=k9KVwVo2Cm4f4rYBWd0AF4RdxTPXu9UJm3DZXHV1hSk=; b=CR/mU79d0PRk0JiMm+3Bz8ydmIH42L8XRQ5l/1mx34szLa8TeL9FDoHeIzGpUof1CeyHqV1uo MiMAzDnrdroxqRe8tboa8teu1VRpNr2pbt00sMv5yTUngQDbQorn/c8WgUYfzFhtNbbmPTL1rES BykeJAQlxO0hm/dE9NVpZho=
Received: from mail.maildlp.com (unknown [172.19.163.15]) by canpmsgout09.his.huawei.com (SkyGuard) with ESMTPS id 4gPwc33yv6z1cyNp; Tue, 26 May 2026 22:48:35 +0800 (CST)
Received: from kwepemr500001.china.huawei.com (unknown [7.202.194.229]) by mail.maildlp.com (Postfix) with ESMTPS id 7C67340539; Tue, 26 May 2026 22:56:19 +0800 (CST)
Received: from huawei.com (10.50.87.63) by kwepemr500001.china.huawei.com (7.202.194.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 26 May 2026 22:56:17 +0800
From: Yin Tirui
To: Andrew Morton , Matthew Wilcox , David Hildenbrand , Lorenzo Stoakes , Juergen Gross , Jonathan Cameron , Will Deacon
CC:
Catalin Marinas , Peter Xu , Luiz Capitulino , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , "Liam R . Howlett" , Zi Yan , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Barry Song , Lance Yang , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Anshuman Khandual , Rohan McLure , Kevin Brodsky , Alistair Popple , Andrew Donnellan , Pasha Tatashin , Baoquan He , Thomas Huth , Coiby Xu , Dan Williams , Yu-cheng Yu , Lu Baolu , Conor Dooley , Rik van Riel , , , , , , , , ,
Subject: [PATCH mm-unstable RFC v4 7/7] mm: add PMD-level PFNMAP support for remap_pfn_range()
Date: Tue, 26 May 2026 22:50:03 +0800
Message-ID: <20260526145003.88445-8-yintirui@huawei.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260526145003.88445-1-yintirui@huawei.com>
References: <20260526145003.88445-1-yintirui@huawei.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To kwepemr500001.china.huawei.com (7.202.194.229)
Content-Type: text/plain; charset="utf-8"

Teach remap_pfn_range() to install PMD-sized PFNMAP entries when the
virtual range and PFN are PMD-aligned, the architecture exposes PMD PFNMAP
support, and PMD leaves are available at runtime. The path only runs on
VMAs without ->fault or ->huge_fault, so the resulting PMDs are known to
be non-refaultable.

Non-refaultable PFNMAP PMDs cannot be rebuilt on demand and are therefore
installed with a deposited pgtable. vma_pfnmap_has_deposited_pgtable()
becomes the common predicate driving the deposit logic in copy_huge_pmd(),
zap_huge_pmd() through has_deposited_pgtable(), and the new
__split_huge_pfnmap_pmd().
The split path withdraws the pgtable and populates it with special PTEs
derived from the original PMD using pmd_pfn() and pmd_pgprot(). With
pmd_pgprot() returning PTE-level pgprot_t, this preserves protection and
cache attributes without reintroducing pte_clrhuge().

Signed-off-by: Yin Tirui
---
 mm/huge_memory.c | 60 ++++++++++++++++++++++++++++-----
 mm/internal.h    | 21 ++++++++++++
 mm/memory.c      | 87 +++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 148 insertions(+), 20 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index be9b637c813b..19e6d856e8bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1879,6 +1879,8 @@ bool touch_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return false;
 }
 
+static bool has_deposited_pgtable(struct vm_area_struct *vma, pmd_t pmdval,
+				  struct folio *folio);
 static int copy_present_huge_pmd(
 		struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
@@ -1912,8 +1914,12 @@ static int copy_present_huge_pmd(
 		 * able to wrongly write to the backend MMIO.
 		 */
 		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
-		pte_free(dst_mm, pgtable);
-		pgtable = NULL;
+
+		if (!has_deposited_pgtable(dst_vma, pmd, NULL)) {
+			pte_free(dst_mm, pgtable);
+			pgtable = NULL;
+		}
+
 		wrprotect = false;
 		goto set_pmd;
 	}
@@ -2495,11 +2501,19 @@ static bool has_deposited_pgtable(struct vm_area_struct *vma, pmd_t pmdval,
 	if (is_huge_zero_pmd(pmdval))
 		return !vma_is_dax(vma);
 
+	/*
+	 * PMD-sized PFNMAP mappings installed without fault handlers cannot be
+	 * refaulted after the PMD is cleared, so they carry a deposited page
+	 * table for later partial unmap/mprotect.
+	 */
+	if (!folio)
+		return pmd_present(pmdval) && vma_pfnmap_has_deposited_pgtable(vma);
+
 	/*
 	 * Otherwise, only anonymous folios are deposited, see
 	 * __do_huge_pmd_anonymous_page().
 	 */
-	return folio && folio_test_anon(folio);
+	return folio_test_anon(folio);
 }
 
 /**
@@ -3118,6 +3132,32 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	pmd_populate(mm, pmd, pgtable);
 }
 
+static void __split_huge_pfnmap_pmd(struct vm_area_struct *vma,
+		unsigned long haddr, pmd_t *pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgtable_t pgtable;
+	pmd_t old_pmd, _pmd;
+	pte_t *pte, entry;
+
+	old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+	if (!has_deposited_pgtable(vma, old_pmd, NULL))
+		return;
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	pte = pte_offset_map(&_pmd, haddr);
+	VM_BUG_ON(!pte);
+
+	entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
+	set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
+	pte_unmap(pte);
+
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+}
+
 static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long haddr, bool freeze)
 {
@@ -3157,11 +3200,12 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			return __split_huge_zero_page_pmd(vma, haddr, pmd);
 		}
 
-		/* Present but not a normal folio: drop the PMD. */
-		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(mm, pmd);
-		return;
+		/*
+		 * Present PMDs without a normal folio are special mappings. Huge zero PMDs
+		 * are handled above; the remaining PMD-level special mappings are PFNMAP
+		 * mappings.
+		 */
+		return __split_huge_pfnmap_pmd(vma, haddr, pmd);
 	}
 
 	if (unlikely(!folio_test_anon(folio))) {
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b..f82bd987131d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -198,6 +198,27 @@ static inline void vma_close(struct vm_area_struct *vma)
 	}
 }
 
+static inline bool vma_has_fault_handler(const struct vm_area_struct *vma)
+{
+	const struct vm_operations_struct *vm_ops = vma->vm_ops;
+
+	return vm_ops && (vm_ops->fault || vm_ops->huge_fault);
+}
+
+/*
+ * PMD-sized PFNMAP mappings installed without fault handlers cannot be
+ * recreated after the PMD is cleared. Such mappings need a deposited page
+ * table so they can be split into PTEs for partial unmap/mprotect.
+ *
+ * Faultable PFNMAP VMAs can drop the PMD and refault it later, so they do
+ * not need a deposited page table.
+ */
+static inline bool
+vma_pfnmap_has_deposited_pgtable(const struct vm_area_struct *vma)
+{
+	return vma_test(vma, VMA_PFNMAP_BIT) && !vma_has_fault_handler(vma);
+}
+
 /* unmap_vmas is in mm/memory.c */
 void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
 
diff --git a/mm/memory.c b/mm/memory.c
index 56886d1ddaf3..226e3a53a48e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2943,9 +2943,66 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	return err;
 }
 
-static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
+static int remap_try_install_pmd_leaf(struct mm_struct *mm,
+		pmd_t *pmd, struct vm_area_struct *vma, unsigned long addr,
+		unsigned long end, unsigned long pfn, pgprot_t prot)
+{
+	pgtable_t pgtable;
+	spinlock_t *ptl;
+	unsigned long i;
+	pmd_t entry;
+
+	if (!pgtable_level_has_pxx_special(PGTABLE_LEVEL_PMD))
+		return 0;
+
+	if (!pgtable_has_pmd_leaves())
+		return 0;
+
+	/*
+	 * Do not install PMD leaves through remap_pfn_range() for VMAs that have
+	 * a fault handler. With this restriction, a PFNMAP PMD in a VMA without
+	 * a fault handler is known to have been installed by remap_pfn_range()
+	 * and to have a deposited page table for later split; see
+	 * vma_pfnmap_has_deposited_pgtable().
+	 */
+	if (vma_has_fault_handler(vma))
+		return 0;
+
+	if (!IS_ALIGNED(addr | end, PMD_SIZE))
+		return 0;
+
+	if (!IS_ALIGNED(PFN_PHYS(pfn), PMD_SIZE))
+		return 0;
+
+	for (i = 0; i < PFN_DOWN(PMD_SIZE); i++) {
+		if (!pfn_modify_allowed(pfn + i, prot))
+			return -EACCES;
+	}
+
+	pgtable = pte_alloc_one(mm);
+	if (unlikely(!pgtable))
+		return 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (!pmd_none(*pmd)) {
+		spin_unlock(ptl);
+		pte_free(mm, pgtable);
+		return 0;
+	}
+
+	entry = pfn_pmd(pfn, prot);
+	entry = pmd_mkspecial(entry);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	mm_inc_nr_ptes(mm);
+	set_pmd_at(mm, addr, pmd, entry);
+	spin_unlock(ptl);
+
+	return 1;
+}
+
+static inline int remap_pmd_range(struct mm_struct *mm,
+		struct vm_area_struct *vma, pud_t *pud, unsigned long addr,
+		unsigned long end, unsigned long pfn, pgprot_t prot)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -2958,6 +3015,12 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
+		err = remap_try_install_pmd_leaf(mm, pmd, vma, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			continue;
 		err = remap_pte_range(mm, pmd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
@@ -2966,9 +3029,9 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	return 0;
 }
 
-static inline int remap_pud_range(struct mm_struct *mm, p4d_t *p4d,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
+static inline int remap_pud_range(struct mm_struct *mm,
+		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
+		unsigned long end, unsigned long pfn, pgprot_t prot)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -2980,7 +3043,7 @@ static inline int remap_pud_range(struct mm_struct *mm, p4d_t *p4d,
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
-		err = remap_pmd_range(mm, pud, addr, next,
+		err = remap_pmd_range(mm, vma, pud, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
@@ -2988,9 +3051,9 @@ static inline int remap_pud_range(struct mm_struct *mm, p4d_t *p4d,
 	return 0;
 }
 
-static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
+static inline int remap_p4d_range(struct mm_struct *mm,
+		struct vm_area_struct *vma, pgd_t *pgd, unsigned long addr,
+		unsigned long end, unsigned long pfn, pgprot_t prot)
 {
 	p4d_t *p4d;
 	unsigned long next;
@@ -3002,7 +3065,7 @@ static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
 		return -ENOMEM;
 	do {
 		next = p4d_addr_end(addr, end);
-		err = remap_pud_range(mm, p4d, addr, next,
+		err = remap_pud_range(mm, vma, p4d, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
@@ -3049,7 +3112,7 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
 	flush_cache_range(vma, addr, end);
 	do {
 		next = pgd_addr_end(addr, end);
-		err = remap_p4d_range(mm, pgd, addr, next,
+		err = remap_p4d_range(mm, vma, pgd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
 			return err;
-- 
2.43.0
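
Note for readers (not part of the patch): the two alignment checks in
remap_try_install_pmd_leaf() can be tried out in userspace. The sketch
below is illustrative only. It assumes 4 KiB base pages and 2 MiB PMD
leaves, which is an assumption about the configuration and not something
the patch fixes. The SK_* macros, sk_pmd_leaf_aligned() and
sk_selfcheck() are hypothetical names that mirror
IS_ALIGNED(addr | end, PMD_SIZE) and IS_ALIGNED(PFN_PHYS(pfn), PMD_SIZE).
The fault-handler, pfn_modify_allowed() and pmd_none() checks are not
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD leaves. */
#define SK_PAGE_SHIFT 12
#define SK_PMD_SHIFT  21
#define SK_PMD_SIZE   (UINT64_C(1) << SK_PMD_SHIFT)

/* Same idea as the kernel's IS_ALIGNED() for power-of-two alignments. */
#define SK_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Hypothetical helper: true when both the virtual range [addr, end) and
 * the physical address of pfn are PMD-aligned. In the patch the range has
 * already been clamped to a single PMD by pmd_addr_end(), so "addr and
 * end both aligned" means the range covers exactly one PMD.
 */
static inline bool sk_pmd_leaf_aligned(uint64_t addr, uint64_t end,
				       uint64_t pfn)
{
	if (!SK_IS_ALIGNED(addr | end, SK_PMD_SIZE))
		return false;

	/* Equivalent of PFN_PHYS(pfn) under the assumed page size. */
	return SK_IS_ALIGNED(pfn << SK_PAGE_SHIFT, SK_PMD_SIZE);
}

/* A few hand-checked cases under the assumed geometry. */
static inline void sk_selfcheck(void)
{
	/* 2 MiB-aligned range, pfn 0x200 is physical 0x200000. */
	assert(sk_pmd_leaf_aligned(0x200000, 0x400000, 0x200));
	/* Start is off by one page: falls back to the PTE path. */
	assert(!sk_pmd_leaf_aligned(0x201000, 0x400000, 0x200));
	/* Physical address is off by one page. */
	assert(!sk_pmd_leaf_aligned(0x200000, 0x400000, 0x201));
}
```

A range that fails either check corresponds to the "return 0" cases in
the patch, where remap_pmd_range() falls through to remap_pte_range().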