From nobody Mon Feb 9 03:14:28 2026 Received: from canpmsgout02.his.huawei.com (canpmsgout02.his.huawei.com [113.46.200.217]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7B0128C864; Fri, 26 Dec 2025 10:07:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.217 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766743633; cv=none; b=IGvuo/zekzSh71cBghf44TwzpCgfGvaWerCzio9/QDhrl/ID20toFVwWhN/pQ6DK7fH34snnDbRJlMGIODoTJnb83cgZnFwS25KKgCBRpsp1hwIzPvEvBt0q6bsfcK0wrbyy6kgfyWVO5ZEjqLIvToJuxxBByd8Tn8XHp87MGmg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766743633; c=relaxed/simple; bh=+MolMJ3fQLMMKkPduWXE857VjXaksVKyleV4rEKCjlY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=CujVi1DFcULwjgR+wHZT2L+Barme4wey8R/V7w7ikdRFXp431hvuyDBeD86MLRTD14LYiq5PoMq0Xjgy5a1pAbRGEGONYkUDD+4wpmfDXJUJvvgl922Ddxxxid5STqZuFAVzxKZq29bqYGmrQp5NhFpIIKa5h3/uiC+CCOa/Zr8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=bfevaC0V; arc=none smtp.client-ip=113.46.200.217 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="bfevaC0V" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=FelxkyErKwb1fCWEkQdWu7laREVOj//W6+ZNk547/w4=; b=bfevaC0VlaS9jhmmNkT/OUKQ5e1dnyVIixiQc3MalsmrXfjRi/FJDBe7GWMsJutkH/BeUJTT2 V2z+v3UldI1KC9EQ95SQm1wpwW7xisylD/ixNMQDyOOxxQCIdqdamMSKvRG1+KO7Pq9PXbk15kO PTPDm2uQjzYepYME2C2qzVE= Received: from mail.maildlp.com (unknown [172.19.163.0]) by canpmsgout02.his.huawei.com (SkyGuard) with ESMTPS id 4dd1RD2q5Kzcb1b; Fri, 26 Dec 2025 18:03:52 +0800 (CST) Received: from dggpemf500012.china.huawei.com (unknown [7.185.36.8]) by mail.maildlp.com (Postfix) with ESMTPS id 5596F4036C; Fri, 26 Dec 2025 18:07:02 +0800 (CST) Received: from huawei.com (10.50.85.135) by dggpemf500012.china.huawei.com (7.185.36.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Fri, 26 Dec 2025 18:07:01 +0800 From: Zhang Qilong To: , , , CC: , , , , , , , , , , , , , , , , , , , Subject: [PATCH next v2 1/2] mm/huge_memory: Implementation of THP COW for executable file mmap Date: Fri, 26 Dec 2025 18:03:36 +0800 Message-ID: <20251226100337.4171191-2-zhangqilong3@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251226100337.4171191-1-zhangqilong3@huawei.com> References: <20251226100337.4171191-1-zhangqilong3@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To dggpemf500012.china.huawei.com (7.185.36.8) Content-Type: text/plain; charset="utf-8" During the user-space hot patching, the involved executable file segments of private mapping will be modified. If the modification meets THP mapping, the PMD entry will be cleared at first and do page COW fault handle. Currently, khugepaged may attempt to merge scattered file pages into THP. However, due to the single page COW, the modified executable segments can not be mapped in THP once again for hot patched process. Hence it can not benefit form khugepaged efforts. The executable segment mapped in page granularity may reduce the performance due to lower iTLB cache hit rate compared with the original THP mapping. For user-space hot patching, we introduce THP COW support for the executable mapping. If the exec COW meets THP mapping, it will allocate a anonymous THP and map it to remain PMD mapping. Signed-off-by: Zhang Qilong Tested-by: wang lian --- v2: - Fix linux-next build error (call to undeclared function vma_is_special_huge()), move it to do_huge_pmd_exec_cow() - Add a variable 'vm_flags' in wp_huge_pmd() --- include/linux/huge_mm.h | 1 + mm/huge_memory.c | 91 +++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 8 ++++ 3 files changed, 100 insertions(+) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a4d9f964dfde..8b710751d1e2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -23,10 +23,11 @@ static inline void huge_pud_set_accessed(struct vm_faul= t *vmf, pud_t orig_pud) { } #endif =20 vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_exec_cow(struct vm_fault *vmf); bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *= vma, pmd_t *pmd, unsigned long addr, unsigned long next); int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t= *pmd, unsigned long addr); int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t= *pud, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 40cf59301c21..ae599431989d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2146,10 +2146,101 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vm= f) fallback: __split_huge_pmd(vma, vmf->pmd, vmf->address, false); return VM_FAULT_FALLBACK; } =20 +vm_fault_t do_huge_pmd_exec_cow(struct vm_fault *vmf) +{ + vm_fault_t ret; + struct vm_area_struct *vma =3D vmf->vma; + struct folio *folio, *src_folio; + pmd_t orig_pmd =3D vmf->orig_pmd; + unsigned long haddr =3D vmf->address & PMD_MASK; + struct mmu_notifier_range range; + pgtable_t pgtable =3D NULL; + + /* Skip special and shmem */ + if (vma_is_special_huge(vma) || vma_is_shmem(vma)) + return VM_FAULT_FALLBACK; + + ret =3D vmf_anon_prepare(vmf); + if (ret) + return ret; + + folio =3D vma_alloc_anon_folio_pmd(vma, haddr); + if (!folio) + return VM_FAULT_FALLBACK; + + if (!arch_needs_pgtable_deposit()) { + pgtable =3D pte_alloc_one(vma->vm_mm); + if (!pgtable) { + ret =3D VM_FAULT_OOM; + goto release; + } + } + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + haddr, haddr + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + vmf->ptl =3D pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(pmdp_get(vmf->pmd), orig_pmd))) + goto unlock_ptl; + + ret =3D check_stable_address_space(vma->vm_mm); + if (ret) + goto unlock_ptl; + + src_folio =3D pmd_folio(orig_pmd); + if (!folio_trylock(src_folio)) { + ret =3D VM_FAULT_FALLBACK; + goto unlock_ptl; + } + + /* + * If uptodate bit is not set, it means this source folio is + * stale or invalid now, this memory data in it is not + * untrustworthy. So we just avoid copying it and fallback. + */ + if (!folio_test_uptodate(src_folio)) { + ret =3D VM_FAULT_FALLBACK; + goto unlock_folio; + } + + if (copy_user_large_folio(folio, src_folio, haddr, vma)) { + ret =3D VM_FAULT_HWPOISON; + goto unlock_folio; + } + folio_mark_uptodate(folio); + + folio_unlock(src_folio); + pmdp_huge_clear_flush(vma, haddr, vmf->pmd); + folio_remove_rmap_pmd(src_folio, folio_page(src_folio, 0), vma); + add_mm_counter(vma->vm_mm, mm_counter_file(src_folio), -HPAGE_PMD_NR); + folio_put(src_folio); + + map_anon_folio_pmd_pf(folio, vmf->pmd, vma, haddr); + if (pgtable) + pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); + mm_inc_nr_ptes(vma->vm_mm); + spin_unlock(vmf->ptl); + mmu_notifier_invalidate_range_end(&range); + + return ret; + +unlock_folio: + folio_unlock(src_folio); +unlock_ptl: + spin_unlock(vmf->ptl); + mmu_notifier_invalidate_range_end(&range); +release: + if (pgtable) + pte_free(vma->vm_mm, pgtable); + folio_put(folio); + + return ret; +} + static inline bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, pmd_t pmd) { struct page *page; =20 diff --git a/mm/memory.c b/mm/memory.c index ee15303c4041..691e3ca38cc6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6104,10 +6104,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_= fault *vmf) =20 /* `inline' is required to avoid gcc 4.1.2 build error */ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; + const vm_flags_t vm_flags =3D vma->vm_flags; const bool unshare =3D vmf->flags & FAULT_FLAG_UNSHARE; vm_fault_t ret; =20 if (vma_is_anonymous(vma)) { if (likely(!unshare) && @@ -6125,10 +6126,17 @@ static inline vm_fault_t wp_huge_pmd(struct vm_faul= t *vmf) if (!(ret & VM_FAULT_FALLBACK)) return ret; } } =20 + if (is_exec_mapping(vm_flags) && + is_cow_mapping(vm_flags)) { + ret =3D do_huge_pmd_exec_cow(vmf); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } + split: /* COW or write-notify handled on pte level: split pmd. */ __split_huge_pmd(vma, vmf->pmd, vmf->address, false); =20 return VM_FAULT_FALLBACK; --=20 2.43.0 From nobody Mon Feb 9 03:14:28 2026 Received: from canpmsgout08.his.huawei.com (canpmsgout08.his.huawei.com [113.46.200.223]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0DD22E2663; Fri, 26 Dec 2025 10:07:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=113.46.200.223 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766743631; cv=none; b=EMThQIhxynRSDn9x6QjH2YnMALMNi7FJ4vikquS+QxFcYKYUWP5DRFjRs6vwdOtzNu8aXRMREYOGZjVB7V78Qs4wgSNOS9txPZ7zeutMe/X07L4bNb+JzoM5psFsiyiI4gxHpD2mqHSs0MGzibtSRZ3PFT6XuaiDVmXWj2wsKOY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766743631; c=relaxed/simple; bh=4xvEdFpZvR3SKFkUK1wXHoUUb4W2Z7H1w64jltYA1nw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=pqCgwtC89A207o+fjfgnIZROXsnCtFw6pAZjcCz+QC0DsLTCBu19mni003IBlL6m9VhCtyLdxwxO0PhKAPy8EB9CDIWW1BxUeiUl+vqVbqgW7sxSNuBcBCxsanDExoDqeNtWs/6zzeEiqlxouGPFhUZcXZXNExf42zBzTZlxqO8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b=ImJTDvuJ; arc=none smtp.client-ip=113.46.200.223 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=huawei.com header.i=@huawei.com header.b="ImJTDvuJ" dkim-signature: v=1; a=rsa-sha256; d=huawei.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=From; bh=uHCEbvNncUuRInD7vc2/gwTA6d8ybu6jRUaqdDQkZqk=; b=ImJTDvuJi1+ZtX5SRSl+cYjB+gjMZvdy7CfO6TcarAyj6li0iYNm59timP5kR2LiErVcKYSKU 3rixA21pODZizTNSIRyouLGsgHeqXYwUeHFK+PfzDq4orVgC0fuIHvKFa5vt0oUBJ2xsD6MgTXf RFEXlONDUVzIeOzu/ZIaknQ= Received: from mail.maildlp.com (unknown [172.19.163.200]) by canpmsgout08.his.huawei.com (SkyGuard) with ESMTPS id 4dd1RH4shKzmVXB; Fri, 26 Dec 2025 18:03:55 +0800 (CST) Received: from dggpemf500012.china.huawei.com (unknown [7.185.36.8]) by mail.maildlp.com (Postfix) with ESMTPS id 793D840563; Fri, 26 Dec 2025 18:07:04 +0800 (CST) Received: from huawei.com (10.50.85.135) by dggpemf500012.china.huawei.com (7.185.36.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Fri, 26 Dec 2025 18:07:03 +0800 From: Zhang Qilong To: , , , CC: , , , , , , , , , , , , , , , , , , , Subject: [PATCH next v2 2/2] mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW Date: Fri, 26 Dec 2025 18:03:37 +0800 Message-ID: <20251226100337.4171191-3-zhangqilong3@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251226100337.4171191-1-zhangqilong3@huawei.com> References: <20251226100337.4171191-1-zhangqilong3@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: kwepems100001.china.huawei.com (7.221.188.238) To dggpemf500012.china.huawei.com (7.185.36.8) Content-Type: text/plain; charset="utf-8" Using the per-VMA hugepage flag to avoid system wide default behavior. If 'vma->vm_flags' indicates a preference for huge pages, then the exec THP COW can be attempted. Signed-off-by: Zhang Qilong --- v2: - Use 'vma->vm_flags' as hint for exec THP COW suggested by David --- mm/memory.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 691e3ca38cc6..eb2bb36e284c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6128,10 +6128,17 @@ static inline vm_fault_t wp_huge_pmd(struct vm_faul= t *vmf) } } =20 if (is_exec_mapping(vm_flags) && is_cow_mapping(vm_flags)) { + /* + * Reuse the per-VMA flag, only if VM_HUGEPAGE is + * set, do exec THP COW. + */ + if (!(vm_flags & VM_HUGEPAGE)) + goto split; + ret =3D do_huge_pmd_exec_cow(vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; } =20 --=20 2.43.0