From nobody Thu Apr  2 14:08:22 2026
From: Yin Tirui
Subject: [PATCH RFC v3 1/4] x86/mm: Use proper page table helpers for huge page generation
Date: Sat, 28 Feb 2026 15:09:03 +0800
Message-ID: <20260228070906.1418911-2-yintirui@huawei.com>
In-Reply-To: <20260228070906.1418911-1-yintirui@huawei.com>

Historically, several core x86 mm subsystems (vmemmap, vmalloc, and CPA)
have abused `pfn_pte()` to generate PMD and PUD entries: they pass pgprot
values containing the _PAGE_PSE flag and then cast the resulting pte_t to
a pmd_t or pud_t. This violates type safety and prevents enforcing the
rule that `pfn_pte()` must only generate PTEs without huge page
attributes.
Fix these abuses by explicitly using the correct level-specific helpers
(`pfn_pmd()` and `pfn_pud()`) and their corresponding setters (`set_pmd()`,
`set_pud()`). For the CPA (Change Page Attribute) code, which uses `pte_t`
as a generic container for page table entries of all levels in
__should_split_large_page(), pack the correctly generated PMD/PUD values
into the pte_t container.

This cleanup prepares the ground for making `pfn_pte()` strictly filter
out huge page attributes.

Signed-off-by: Yin Tirui
---
 arch/x86/mm/init_64.c        | 6 +++---
 arch/x86/mm/pat/set_memory.c | 6 +++++-
 arch/x86/mm/pgtable.c        | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..d65f3d05c66f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1518,11 +1518,11 @@ static int __meminitdata node_start;
 void __meminit vmemmap_set_pmd(pmd_t *pmd, void *p, int node,
 			       unsigned long addr, unsigned long next)
 {
-	pte_t entry;
+	pmd_t entry;
 
-	entry = pfn_pte(__pa(p) >> PAGE_SHIFT,
+	entry = pfn_pmd(__pa(p) >> PAGE_SHIFT,
 			PAGE_KERNEL_LARGE);
-	set_pmd(pmd, __pmd(pte_val(entry)));
+	set_pmd(pmd, entry);
 
 	/* check to see if we have contiguous blocks */
 	if (p_end != p || node_start != node) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 40581a720fe8..87aa0e9a8f82 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1059,7 +1059,11 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 		return 1;
 
 	/* All checks passed. Update the large page mapping.
	 */
-	new_pte = pfn_pte(old_pfn, new_prot);
+	if (level == PG_LEVEL_2M)
+		new_pte = __pte(pmd_val(pfn_pmd(old_pfn, new_prot)));
+	else
+		new_pte = __pte(pud_val(pfn_pud(old_pfn, new_prot)));
+
 	__set_pmd_pte(kpte, address, new_pte);
 	cpa->flags |= CPA_FLUSHTLB;
 	cpa_inc_lp_preserved(level);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 2e5ecfdce73c..61320fd44e16 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -644,7 +644,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	if (pud_present(*pud) && !pud_leaf(*pud))
 		return 0;
 
-	set_pte((pte_t *)pud, pfn_pte(
+	set_pud(pud, pfn_pud(
 		(u64)addr >> PAGE_SHIFT,
 		__pgprot(protval_4k_2_large(pgprot_val(prot)) | _PAGE_PSE)));
 
@@ -676,7 +676,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	if (pmd_present(*pmd) && !pmd_leaf(*pmd))
 		return 0;
 
-	set_pte((pte_t *)pmd, pfn_pte(
+	set_pmd(pmd, pfn_pmd(
 		(u64)addr >> PAGE_SHIFT,
 		__pgprot(protval_4k_2_large(pgprot_val(prot)) | _PAGE_PSE)));
 
-- 
2.22.0

From nobody Thu Apr  2 14:08:22 2026
From: Yin Tirui
Subject: [PATCH RFC v3 2/4] mm/pgtable: Make pfn_pte() filter out huge page attributes
Date: Sat, 28 Feb 2026 15:09:04 +0800
Message-ID: <20260228070906.1418911-3-yintirui@huawei.com>
In-Reply-To: <20260228070906.1418911-1-yintirui@huawei.com>
A fundamental principle of page table type safety is that `pte_t`
represents the lowest-level page table entry and must never carry huge
page attributes. Currently, passing a pgprot with huge page bits (e.g.
one extracted via pmd_pgprot()) into pfn_pte() creates a malformed PTE
that retains the huge attribute, which forces callers into the ugly
`pte_clrhuge()` anti-pattern.

Enforce type safety by making `pfn_pte()` inherently filter out huge page
attributes:

- On x86: strip the `_PAGE_PSE` bit.
- On arm64: mask out the block descriptor bits in `PTE_TYPE_MASK` and
  enforce the `PTE_TYPE_PAGE` format.
- On RISC-V: no changes required, as RISC-V leaf PMDs and PTEs share the
  exact same hardware format and do not use a distinct huge bit.
Signed-off-by: Yin Tirui
---
 arch/arm64/include/asm/pgtable.h | 4 +++-
 arch/x86/include/asm/pgtable.h   | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49b..f2a7a40106d2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -141,7 +141,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 
 #define pte_pfn(pte)		(__pte_to_phys(pte) >> PAGE_SHIFT)
 #define pfn_pte(pfn,prot)	\
-	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
+	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) |	\
+	      ((pgprot_val(prot) & ~(PTE_TYPE_MASK & ~PTE_VALID)) |	\
+	       (PTE_TYPE_PAGE & ~PTE_VALID)))
 
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1662c5a8f445..a4dbd81d42bf 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -738,6 +738,10 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
+
+	/* Filter out _PAGE_PSE to ensure PTEs never carry the huge page bit */
+	pgprot = __pgprot(pgprot_val(pgprot) & ~_PAGE_PSE);
+
 	/* This bit combination is used to mark shadow stacks */
 	WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) ==
 		     _PAGE_DIRTY);
-- 
2.22.0

From nobody Thu Apr  2 14:08:22 2026
From: Yin Tirui
Subject: [PATCH RFC v3 3/4] x86/mm: Remove pte_clrhuge() and clean up init_64.c
Date: Sat, 28 Feb 2026 15:09:05 +0800
Message-ID: <20260228070906.1418911-4-yintirui@huawei.com>
In-Reply-To: <20260228070906.1418911-1-yintirui@huawei.com>

With `pfn_pte()` now guaranteed to filter out huge page attributes such
as `_PAGE_PSE`, the `pte_clrhuge()` helper has become obsolete.

Remove `pte_clrhuge()` entirely. At the same time, clean up the
type-casting anti-pattern in `arch/x86/mm/init_64.c`, where a `pmd_t *`
was forcibly cast to `pte_t *` just to call `pte_clrhuge()`. Now we can
simply extract the pgprot directly via `pmd_pgprot()` and safely pass it
downstream, knowing that `pfn_pte()` will strip the huge bit
automatically.
Signed-off-by: Yin Tirui
---
 arch/x86/include/asm/pgtable.h | 5 -----
 arch/x86/mm/init_64.c          | 4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a4dbd81d42bf..e8564d4ce318 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -483,11 +483,6 @@ static inline pte_t pte_mkhuge(pte_t pte)
 	return pte_set_flags(pte, _PAGE_PSE);
 }
 
-static inline pte_t pte_clrhuge(pte_t pte)
-{
-	return pte_clear_flags(pte, _PAGE_PSE);
-}
-
 static inline pte_t pte_mkglobal(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_GLOBAL);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d65f3d05c66f..a1ddcf793a8a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -572,7 +572,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
 			paddr_last = paddr_next;
 			continue;
 		}
-		new_prot = pte_pgprot(pte_clrhuge(*(pte_t *)pmd));
+		new_prot = pmd_pgprot(*pmd);
 	}
 
 	if (page_size_mask & (1<

From nobody Thu Apr  2 14:08:22 2026
From: Yin Tirui
Subject: [PATCH RFC v3 4/4] mm: add PMD-level huge page support for remap_pfn_range()
Date: Sat, 28 Feb 2026 15:09:06 +0800
Message-ID: <20260228070906.1418911-5-yintirui@huawei.com>
In-Reply-To: <20260228070906.1418911-1-yintirui@huawei.com>
Add PMD-level huge page support to remap_pfn_range(), automatically
creating huge mappings when the prerequisites are satisfied (size,
alignment, architecture support, etc.) and falling back to normal page
mappings otherwise.

Implement splitting of these special huge PMDs by using the pgtable
deposit/withdraw mechanism: when a split is needed, the deposited page
table is withdrawn and populated with individual PTEs derived from the
original huge mapping.

Signed-off-by: Yin Tirui
---
 mm/huge_memory.c | 36 ++++++++++++++++++++++++++++++++++--
 mm/memory.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4ca8cfd7f9d..e463d51005ee 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1857,6 +1857,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pmd = pmdp_get_lockless(src_pmd);
 	if (unlikely(pmd_present(pmd) && pmd_special(pmd) &&
 		     !is_huge_zero_pmd(pmd))) {
+		pgtable = pte_alloc_one(dst_mm);
+		if (unlikely(!pgtable))
+			goto out;
 		dst_ptl = pmd_lock(dst_mm, dst_pmd);
 		src_ptl = pmd_lockptr(src_mm, src_pmd);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -1870,6 +1873,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		 * able to wrongly write to the backend MMIO.
		 */
		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
+
+		/* dax won't reach here, it will be intercepted at vma_needs_copy() */
+		VM_WARN_ON_ONCE(vma_is_dax(src_vma));
+
+		mm_inc_nr_ptes(dst_mm);
+		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 		goto set_pmd;
 	}
 
@@ -2360,6 +2369,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+		if (pmd_special(orig_pmd))
+			zap_deposited_table(tlb->mm, pmd);
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
@@ -3005,14 +3016,35 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (!vma_is_anonymous(vma)) {
 		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+
+		if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+			pte_t entry;
+
+			if (!pmd_special(old_pmd)) {
+				zap_deposited_table(mm, pmd);
+				return;
+			}
+			pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+			if (unlikely(!pgtable))
+				return;
+			pmd_populate(mm, &_pmd, pgtable);
+			pte = pte_offset_map(&_pmd, haddr);
+			entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
+			set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
+			pte_unmap(pte);
+
+			smp_wmb(); /* make pte visible before pmd */
+			pmd_populate(mm, pmd, pgtable);
+			return;
+		}
+
 		/*
 		 * We are going to unmap this huge page.
		 * So just go ahead and zap it
		 */
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(mm, pmd);
-		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
-			return;
+
 		if (unlikely(pmd_is_migration_entry(old_pmd))) {
 			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
 
diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..affccf38cbcf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2890,6 +2890,40 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	return err;
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+static int remap_try_huge_pmd(struct mm_struct *mm, pmd_t *pmd,
+			      unsigned long addr, unsigned long end,
+			      unsigned long pfn, pgprot_t prot)
+{
+	pgtable_t pgtable;
+	spinlock_t *ptl;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(addr, PMD_SIZE))
+		return 0;
+
+	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	pgtable = pte_alloc_one(mm);
+	if (unlikely(!pgtable))
+		return 0;
+
+	mm_inc_nr_ptes(mm);
+	ptl = pmd_lock(mm, pmd);
+	set_pmd_at(mm, addr, pmd, pmd_mkspecial(pmd_mkhuge(pfn_pmd(pfn, prot))));
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	spin_unlock(ptl);
+
+	return 1;
+}
+#endif
+
 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
@@ -2905,6 +2939,12 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+		if (remap_try_huge_pmd(mm, pmd, addr, next,
+				       pfn + (addr >> PAGE_SHIFT), prot)) {
+			continue;
+		}
+#endif
 		err = remap_pte_range(mm, pmd, addr, next,
 				      pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
-- 
2.22.0