From: Chih-En Lin
To: Andrew Morton, Qi Zheng, David Hildenbrand, Matthew Wilcox,
    Christophe Leroy, John Hubbard, Nadav Amit
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steven Rostedt,
    Masami Hiramatsu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
    Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Yang Shi,
    Peter Xu, Zach O'Keefe, Liam R. Howlett, Alex Sierra, Xianting Tian,
    Colin Cross, Suren Baghdasaryan, Barry Song, Pasha Tatashin,
    Suleiman Souhlal, Brian Geffon, Yu Zhao, Tong Tiangen, Liu Shixin,
    Li kunyu, Anshuman Khandual, Vlastimil Babka, Hugh Dickins, Minchan Kim,
    Miaohe Lin, Gautam Menghani, Catalin Marinas, Mark Brown, Will Deacon,
    Eric W. Biederman, Thomas Gleixner, Sebastian Andrzej Siewior,
    Andy Lutomirski, Fenghua Yu, Barret Rhoden, Davidlohr Bueso,
    Jason A. Donenfeld, Dinglan Peng, Pedro Fonseca, Jim Huang,
    Huichun Feng, Chih-En Lin
Subject: [PATCH v3 01/14] mm: Allow user to control COW PTE via prctl
Date: Tue, 20 Dec 2022 15:27:30 +0800
Message-Id: <20221220072743.3039060-2-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Add a new prctl, PR_SET_COW_PTE, to allow the user to enable COW PTE.
Since there is a time gap between using the prctl to enable COW PTE and
doing the fork, we use two states (MMF_COW_PTE_READY and MMF_COW_PTE)
to distinguish a task that wants to do COW PTE from one that is already
doing it.

The MMF_COW_PTE_READY flag marks the task to do COW PTE at the next
fork(). During fork(), if MMF_COW_PTE_READY is set, fork() will clear
it and set the MMF_COW_PTE flag. After that, fork() may share PTEs
instead of duplicating them.

Signed-off-by: Chih-En Lin
---
 include/linux/sched/coredump.h | 12 +++++++++++-
 include/uapi/linux/prctl.h     |  6 ++++++
 kernel/sys.c                   | 11 +++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 8270ad7ae14c2..570d599ebc851 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -83,7 +83,17 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_HAS_PINNED		27	/* FOLL_PIN has run, never cleared */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
+/*
+ * MMF_COW_PTE_READY: Marks the task to do COW PTE at the next fork().
+ * During fork(), if MMF_COW_PTE_READY is set, fork() will clear the
+ * flag and set the MMF_COW_PTE flag. After that, fork() may share PTEs
+ * rather than duplicating them.
+ */
+#define MMF_COW_PTE_READY	29 /* Share PTE tables at next fork() */
+#define MMF_COW_PTE		30 /* PTE tables are shared between processes */
+#define MMF_COW_PTE_MASK	(1 << MMF_COW_PTE)
+
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-				 MMF_DISABLE_THP_MASK)
+				 MMF_DISABLE_THP_MASK | MMF_COW_PTE_MASK)
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba136..664a3c0230192 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -284,4 +284,10 @@ struct prctl_mm_map {
 #define PR_SET_VMA		0x53564d41
 # define PR_SET_VMA_ANON_NAME		0
 
+/*
+ * Set the prepare flag, MMF_COW_PTE_READY, to share the page table
+ * (copy-on-write) at the next fork.
+ */
+#define PR_SET_COW_PTE		65
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e8867..d1062ea33981e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2348,6 +2348,14 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 
+static int prctl_set_cow_pte(struct mm_struct *mm)
+{
+	if (test_bit(MMF_COW_PTE, &mm->flags))
+		return -EINVAL;
+	set_bit(MMF_COW_PTE_READY, &mm->flags);
+	return 0;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2626,6 +2634,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
+	case PR_SET_COW_PTE:
+		error = prctl_set_cow_pte(me->mm);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 02/14] mm: Add Copy-On-Write PTE to fork()
Date: Tue, 20 Dec 2022 15:27:31 +0800
Message-Id: <20221220072743.3039060-3-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Add copy_cow_pte_range() and recover_pte_range() for copy-on-write
(COW) PTE in the fork system call.

During a COW PTE fork, when processing the shared PTE, we traverse all
the entries to determine whether each mapped page can be shared between
processes. If the PTE can be shared, we account for the mapped pages
and then share the PTE. However, once we find a mapped page that is
unavailable, e.g., a pinned page, we have to copy it via
copy_present_page(), which means falling back to the default path,
page table copying. And, since we may have already processed some
COW-ed PTE entries, we have to recover those entries before starting
the default path.

All the COW PTE behaviors are protected by the pte lock.

The logic for handling nonpresent/present pte entries and errors in
copy_cow_pte_range() is the same as in copy_pte_range(). But to keep
the code clean (e.g., to avoid conditional locking), we introduce new
functions instead of modifying copy_pte_range().

To track the lifetime of a COW-ed PTE, introduce a refcount for the
PTE table. We reuse _refcount in struct page for the page table to
maintain the number of process references to the COW-ed PTE table.
Forking with COW PTE will increase the refcount. And, when someone
writes to the COW-ed PTE, it will trigger a write fault to break COW
PTE. If the refcount of the COW-ed PTE is one, the process that
triggers the fault will reuse the COW-ed PTE.
Otherwise, the process will decrease the refcount and duplicate it.

Since we share the PTE between the parent and child, the state of the
parent's pte entries differs between COW PTE and a normal fork. COW PTE
handles all the pte entries on the child side, which means it will
clear the dirty and accessed bits of the parent's pte entries.

Signed-off-by: Chih-En Lin
---
 include/linux/mm.h |  16 +++
 mm/memory.c        | 263 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 279 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc55654..8c6ec1da2336f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2352,6 +2352,21 @@ static inline bool ptlock_init(struct page *page) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
+static inline int pmd_get_pte(pmd_t *pmd)
+{
+	return page_ref_inc_return(pmd_page(*pmd));
+}
+
+static inline bool pmd_put_pte(pmd_t *pmd)
+{
+	return page_ref_add_unless(pmd_page(*pmd), -1, 1);
+}
+
+static inline int cow_pte_count(pmd_t *pmd)
+{
+	return page_count(pmd_page(*pmd));
+}
+
 static inline void pgtable_init(void)
 {
 	ptlock_cache_init();
@@ -2364,6 +2379,7 @@ static inline bool pgtable_pte_page_ctor(struct page *page)
 		return false;
 	__SetPageTable(page);
 	inc_lruvec_page_state(page, NR_PAGETABLE);
+	set_page_count(page, 1);
 	return true;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 8a6d5c823f91b..5b474d14a5411 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -786,11 +786,17 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
 		struct vm_area_struct *src_vma, unsigned long addr, int *rss)
 {
+	/* With COW PTE, dst_vma is src_vma. */
 	unsigned long vm_flags = dst_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(pte);
 
+	/*
+	 * If it's COW PTE, parent shares PTE with child.
+	 * Which means the following modifications of the child will also
+	 * affect the parent.
+	 */
+
 	if (likely(!non_swap_entry(entry))) {
 		if (swap_duplicate(entry) < 0)
 			return -EIO;
@@ -937,6 +943,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 /*
  * Copy one pte.  Returns 0 if succeeded, or -EAGAIN if one preallocated page
  * is required to copy this pte.
+ * However, if prealloc is NULL, it is COW PTE.
  */
 static inline int
 copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
@@ -960,6 +967,14 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
 		/* Page maybe pinned, we have to copy. */
 		put_page(page);
+		/*
+		 * If prealloc is NULL, we are processing a shared page
+		 * table (COW PTE, in copy_cow_pte_range()). We cannot
+		 * call copy_present_page() right now; instead, we
+		 * should fall back to copy_pte_range().
+		 */
+		if (!prealloc)
+			return -EAGAIN;
 		return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
 					 addr, rss, prealloc, page);
 	}
@@ -980,6 +995,11 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	}
 	VM_BUG_ON(page && PageAnon(page) && PageAnonExclusive(page));
 
+	/*
+	 * If it's COW PTE, parent shares PTE with child.
+	 * Which means the following will also affect the parent.
+	 */
+
 	/*
 	 * If it's a shared mapping, mark it clean in
 	 * the child
@@ -988,6 +1008,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
+	/* For COW PTE, dst_vma is still src_vma. */
 	if (!userfaultfd_wp(dst_vma))
 		pte = pte_clear_uffd_wp(pte);
 
@@ -1014,6 +1035,8 @@ page_copy_prealloc(struct mm_struct *src_mm, struct vm_area_struct *vma,
 	return new_page;
 }
 
+
+/*
+ * copy_pte_range() will immediately allocate a new page table.
+ */
 static int
 copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	       pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
@@ -1138,6 +1161,199 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	return ret;
 }
 
+/*
+ * copy_cow_pte_range() will try to share the page table with the child.
+ * The logic for non-present, present, and error handling is the same as
+ * copy_pte_range(), but dst_vma and dst_pte are src_vma and src_pte.
+ *
+ * We cannot preserve soft-dirty information, because the PTE will be
+ * shared between multiple processes.
+ */
+static int
+copy_cow_pte_range(struct vm_area_struct *dst_vma,
+		   struct vm_area_struct *src_vma,
+		   pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
+		   unsigned long end, unsigned long *recover_end)
+{
+	struct mm_struct *dst_mm = dst_vma->vm_mm;
+	struct mm_struct *src_mm = src_vma->vm_mm;
+	struct vma_iterator vmi;
+	struct vm_area_struct *curr = src_vma;
+	pte_t *src_pte, *orig_src_pte;
+	spinlock_t *src_ptl;
+	int ret = 0;
+	int rss[NR_MM_COUNTERS];
+	swp_entry_t entry = (swp_entry_t){0};
+	unsigned long vm_end, orig_addr = addr;
+	pgtable_t pte_table = pmd_page(*src_pmd);
+
+	end = (addr + PMD_SIZE) & PMD_MASK;
+	addr = addr & PMD_MASK;
+
+	/*
+	 * Increase the refcount to prevent the parent's PTE from being
+	 * dropped/reused. Only increase the refcount the first time it
+	 * is attached.
+	 */
+	src_ptl = pte_lockptr(src_mm, src_pmd);
+	spin_lock(src_ptl);
+	pmd_get_pte(src_pmd);
+	pmd_install(dst_mm, dst_pmd, &pte_table);
+	spin_unlock(src_ptl);
+
+	/*
+	 * We should handle all of the entries in this PTE at this traversal,
+	 * since we cannot promise that the next vma will not do the lazy fork.
+	 */
+	vma_iter_init(&vmi, src_mm, addr);
+	for_each_vma_range(vmi, curr, end) {
+		vm_end = min(end, curr->vm_end);
+		addr = max(addr, curr->vm_start);
+again:
+		init_rss_vec(rss);
+		src_pte = pte_offset_map(src_pmd, addr);
+		src_ptl = pte_lockptr(src_mm, src_pmd);
+		orig_src_pte = src_pte;
+		spin_lock(src_ptl);
+
+		arch_enter_lazy_mmu_mode();
+
+		do {
+			if (pte_none(*src_pte))
+				continue;
+			if (unlikely(!pte_present(*src_pte))) {
+				/*
+				 * Although the parent's PTE is COW-ed, we
+				 * still need to handle all the swap cases.
+				 */
+				ret = copy_nonpresent_pte(dst_mm, src_mm,
+							  src_pte, src_pte,
+							  curr, curr,
+							  addr, rss);
+				if (ret == -EIO) {
+					entry = pte_to_swp_entry(*src_pte);
+					break;
+				} else if (ret == -EBUSY) {
+					break;
+				} else if (!ret)
+					continue;
+				/*
+				 * Device exclusive entry restored, continue by
+				 * copying the now present pte.
+				 */
+				WARN_ON_ONCE(ret != -ENOENT);
+			}
+			/*
+			 * copy_present_pte() will determine whether the
+			 * mapped page should be COW-ed or not.
+			 */
+			ret = copy_present_pte(curr, curr, src_pte, src_pte,
+					       addr, rss, NULL);
+			/*
+			 * If we need a pre-allocated page for this pte,
+			 * drop the lock, recover all the entries, fall
+			 * back to copy_pte_range(), and try again.
+			 */
+			if (unlikely(ret == -EAGAIN))
+				break;
+		} while (src_pte++, addr += PAGE_SIZE, addr != vm_end);
+
+		arch_leave_lazy_mmu_mode();
+		add_mm_rss_vec(dst_mm, rss);
+		spin_unlock(src_ptl);
+		pte_unmap(orig_src_pte);
+		cond_resched();
+
+		if (ret == -EIO) {
+			VM_WARN_ON_ONCE(!entry.val);
+			if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			entry.val = 0;
+		} else if (ret == -EBUSY) {
+			goto out;
+		} else if (ret == -EAGAIN) {
+			/*
+			 * We have to allocate the page immediately, but first
+			 * we should recover the processed entries and fall
+			 * back to copy_pte_range().
+			 */
+			*recover_end = addr;
+			return -EAGAIN;
+		} else if (ret) {
+			VM_WARN_ON_ONCE(1);
+		}
+
+		/* We've captured and resolved the error. Reset, try again. */
+		ret = 0;
+		if (addr != vm_end)
+			goto again;
+	}
+
+out:
+	/*
+	 * All the pte entries are available to COW.
+	 * Now, we can share with the child.
+	 */
+	pmdp_set_wrprotect(src_mm, orig_addr, src_pmd);
+	set_pmd_at(dst_mm, orig_addr, dst_pmd, pmd_wrprotect(*src_pmd));
+
+	return ret;
+}
+
+/* When doing the recovery, we should hold the locks the whole time. */
+static int
+recover_pte_range(struct vm_area_struct *dst_vma,
+		  struct vm_area_struct *src_vma,
+		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long end)
+{
+	struct mm_struct *dst_mm = dst_vma->vm_mm;
+	struct mm_struct *src_mm = src_vma->vm_mm;
+	pte_t *orig_src_pte, *orig_dst_pte;
+	pte_t *src_pte, *dst_pte;
+	spinlock_t *src_ptl, *dst_ptl;
+	unsigned long addr = end & PMD_MASK;
+	int ret = 0;
+
+	/* Before we allocate the new PTE, clear the entry. */
+	pmd_clear(dst_pmd);
+	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
+	if (!dst_pte) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	src_pte = pte_offset_map(src_pmd, addr);
+	src_ptl = pte_lockptr(src_mm, src_pmd);
+	spin_lock(src_ptl);
+
+	orig_src_pte = src_pte;
+	orig_dst_pte = dst_pte;
+	arch_enter_lazy_mmu_mode();
+
+	do {
+		if (pte_none(*src_pte))
+			continue;
+		/* A COW mapping should also be handled by COW PTE. */
+		set_pte_at(dst_mm, addr, dst_pte, *src_pte);
+	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+
+	arch_leave_lazy_mmu_mode();
+	/*
+	 * Before unlocking src_ptl, release the child's hold. The parent
+	 * may still share with others, so don't make it writable.
+	 */
+	pmd_put_pte(src_pmd);
+	spin_unlock(src_ptl);
+	pte_unmap(orig_src_pte);
+	pte_unmap_unlock(orig_dst_pte, dst_ptl);
+	cond_resched();
+out:
+
+	return ret;
+}
+
 static inline int
 copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	       pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
@@ -1166,6 +1382,53 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			continue;
 			/* fall through */
 		}
+
+		/*
+		 * If MMF_COW_PTE set, copy_pte_range() will try to share
+		 * the PTE page table first. In other words, it attempts to
+		 * do COW on PTE (and mapped pages). However, if there has
+		 * any unshareable page (e.g., pinned page, device private
+		 * page), it will fall back to the default path, which will
+		 * copy the page table immediately.
+		 * In such a case, it stores the address of first unshareable
+		 * page to recover_end then goes back to the beginning of PTE
+		 * and recovers the COW-ed PTE entries until it meets the same
+		 * unshareable page again. During the recovering, because of
+		 * COW-ed PTE entries are logical same as COW mapping, so it
+		 * only needs to allocate the new PTE and sets COW-ed PTE
+		 * entries to new PTE (which will be same as COW mapping).
+		 */
+		if (test_bit(MMF_COW_PTE, &src_mm->flags)) {
+			unsigned long recover_end = 0;
+			int ret;
+
+			/*
+			 * Setting wrprotect with normal PTE to pmd entry
+			 * will trigger pmd_bad(). Skip bad checking here.
+			 */
+			if (pmd_none(*src_pmd))
+				continue;
+			/* Skip if the PTE already did COW PTE this time. */
+			if (!pmd_none(*dst_pmd) && !pmd_write(*dst_pmd))
+				continue;
+
+			ret = copy_cow_pte_range(dst_vma, src_vma,
+						 dst_pmd, src_pmd,
+						 addr, next, &recover_end);
+			if (!ret) {
+				/* COW PTE succeeded. */
+				continue;
+			} else if (ret == -EAGAIN) {
+				/*
+				 * Fall back to the normal copy method.
+				 */
+				if (recover_pte_range(dst_vma, src_vma,
+						      dst_pmd, src_pmd,
+						      recover_end))
+					return -ENOMEM;
+				addr = recover_end;
+				/* fall through */
+			} else if (ret)
+				return -ENOMEM;
+		}
 		if (pmd_none_or_clear_bad(src_pmd))
 			continue;
 		if (copy_pte_range(dst_vma, src_vma, dst_pmd, src_pmd,
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 03/14] mm: Add break COW PTE fault and helper functions
Date: Tue, 20 Dec 2022 15:27:32 +0800
Message-Id: <20221220072743.3039060-4-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Add the function break_cow_pte_fault() to break (unshare) a COW-ed PTE
on a page fault that will modify the PTE table or a mapped page residing
in the COW-ed PTE (i.e., write, unshare, file read).

When breaking COW PTE, it first checks the COW-ed PTE's refcount to try
to reuse it. If the COW-ed PTE cannot be reused, it allocates a new PTE
and duplicates all the pte entries in the COW-ed PTE. Moreover, flush
the TLB when we change the write protection of the PTE.

In addition, provide the helper functions break_cow_pte{,_range}() for
other features (remap, THP, migration, swapfile, etc.) to use.
Signed-off-by: Chih-En Lin
---
 include/linux/mm.h      |   4 +
 include/linux/pgtable.h |   6 +
 mm/memory.c             | 319 +++++++++++++++++++++++++++++++++++++++-
 mm/mmap.c               |   4 +
 mm/mremap.c             |   2 +
 mm/swapfile.c           |   2 +
 6 files changed, 331 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8c6ec1da2336f..6a0eb01ee6f7e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1894,6 +1894,10 @@ void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
 
+int break_cow_pte(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr);
+int break_cow_pte_range(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end);
+
 #ifdef CONFIG_MMU
 extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 				  unsigned long address, unsigned int flags,
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a108b60a6962b..895fa18e3b011 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1395,6 +1395,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
 		(IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
 		return 1;
+	/*
+	 * A COW-ed PTE has write protection, which can trigger pmd_bad().
+	 * To avoid this, return here if the entry is write-protected.
+	 */
+	if (!pmd_write(pmdval))
+		return 0;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
 		return 1;
diff --git a/mm/memory.c b/mm/memory.c
index 5b474d14a5411..8ebff4cac2191 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -239,6 +239,35 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
+		/*
+		 * For a COW-ed PTE, the pte entries still map to pages.
+		 * However, we still need to do the de-accounting for all
+		 * of it.
+		 * So, even if the refcount is not the same as in zapping,
+		 * we can still fall back to a normal PTE and handle it
+		 * without traversing the entries to do the de-accounting.
+		 */
+		if (test_bit(MMF_COW_PTE, &tlb->mm->flags)) {
+			if (!pmd_none(*pmd) && !pmd_write(*pmd)) {
+				spinlock_t *ptl = pte_lockptr(tlb->mm, pmd);
+
+				spin_lock(ptl);
+				if (!pmd_put_pte(pmd)) {
+					pmd_t new = pmd_mkwrite(*pmd);
+
+					set_pmd_at(tlb->mm, addr, pmd, new);
+					spin_unlock(ptl);
+					free_pte_range(tlb, pmd, addr);
+					continue;
+				}
+				spin_unlock(ptl);
+
+				pmd_clear(pmd);
+				mm_dec_nr_ptes(tlb->mm);
+				flush_tlb_mm_range(tlb->mm, addr, next,
+						   PAGE_SHIFT, false);
+			} else
+				VM_WARN_ON(cow_pte_count(pmd) != 1);
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		free_pte_range(tlb, pmd, addr);
@@ -1676,12 +1705,34 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte_t *start_pte;
 	pte_t *pte;
 	swp_entry_t entry;
+	bool pte_is_shared = false;
+
+	if (test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd)) {
+		if (!range_in_vma(vma, addr & PMD_MASK,
+				  (addr + PMD_SIZE) & PMD_MASK)) {
+			/*
+			 * We cannot promise that this COW-ed PTE will also
+			 * be zapped with the rest of the VMAs. So, break
+			 * COW PTE here.
+ */ + break_cow_pte(vma, pmd, addr); + } else { + start_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); + if (cow_pte_count(pmd) =3D=3D 1) { + /* Reuse COW-ed PTE */ + pmd_t new =3D pmd_mkwrite(*pmd); + set_pmd_at(tlb->mm, addr, pmd, new); + } else + pte_is_shared =3D true; + pte_unmap_unlock(start_pte, ptl); + } + } =20 tlb_change_page_size(tlb, PAGE_SIZE); again: init_rss_vec(rss); start_pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); pte =3D start_pte; + flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { @@ -1698,11 +1749,15 @@ static unsigned long zap_pte_range(struct mmu_gathe= r *tlb, page =3D vm_normal_page(vma, addr, ptent); if (unlikely(!should_zap_page(details, page))) continue; - ptent =3D ptep_get_and_clear_full(mm, addr, pte, - tlb->fullmm); + if (pte_is_shared) + ptent =3D *pte; + else + ptent =3D ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); tlb_remove_tlb_entry(tlb, pte, addr); - zap_install_uffd_wp_if_needed(vma, addr, pte, details, - ptent); + if (!pte_is_shared) + zap_install_uffd_wp_if_needed(vma, addr, pte, + details, ptent); if (unlikely(!page)) continue; =20 @@ -1768,8 +1823,12 @@ static unsigned long zap_pte_range(struct mmu_gather= *tlb, /* We should have covered all the swap entry types */ WARN_ON_ONCE(1); } - pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); - zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); + + if (!pte_is_shared) { + pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + zap_install_uffd_wp_if_needed(vma, addr, pte, + details, ptent); + } } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 add_mm_rss_vec(mm, rss); @@ -2147,6 +2206,8 @@ static int insert_page(struct vm_area_struct *vma, un= signed long addr, if (retval) goto out; retval =3D -ENOMEM; + if (break_cow_pte(vma, NULL, addr) < 0) + goto out; pte =3D get_locked_pte(vma->vm_mm, addr, &ptl); if (!pte) goto out; @@ -2406,6 +2467,9 @@ static vm_fault_t insert_pfn(struct vm_area_struct *v= ma, unsigned long 
addr, pte_t *pte, entry; spinlock_t *ptl; =20 + if (break_cow_pte(vma, NULL, addr) < 0) + return VM_FAULT_OOM; + pte =3D get_locked_pte(mm, addr, &ptl); if (!pte) return VM_FAULT_OOM; @@ -2783,6 +2847,10 @@ int remap_pfn_range_notrack(struct vm_area_struct *v= ma, unsigned long addr, BUG_ON(addr >=3D end); pfn -=3D addr >> PAGE_SHIFT; pgd =3D pgd_offset(mm, addr); + + if (!break_cow_pte_range(vma, addr, end)) + return -ENOMEM; + flush_cache_range(vma, addr, end); do { next =3D pgd_addr_end(addr, end); @@ -5143,6 +5211,226 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf,= pud_t orig_pud) return VM_FAULT_FALLBACK; } =20 +/* Break (unshare) COW PTE */ +static vm_fault_t handle_cow_pte_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma =3D vmf->vma; + struct mm_struct *mm =3D vma->vm_mm; + pmd_t *pmd =3D vmf->pmd; + unsigned long start, end, addr =3D vmf->address; + struct mmu_notifier_range range; + pmd_t cowed_entry; + pte_t *orig_dst_pte, *orig_src_pte; + pte_t *dst_pte, *src_pte; + spinlock_t *dst_ptl, *src_ptl; + int ret =3D 0; + + /* + * Do nothing with the fault that doesn't have PTE yet + * (from lazy fork). + */ + if (pmd_none(*pmd) || pmd_write(*pmd)) + return 0; + /* COW PTE doesn't handle huge page. */ + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + return 0; + + mmap_assert_write_locked(mm); + + start =3D addr & PMD_MASK; + end =3D (addr + PMD_SIZE) & PMD_MASK; + addr =3D start; + + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE, + 0, vma, mm, start, end); + /* + * Because of the address range is PTE not only for the faulted + * vma, it might have some unmatch situations since mmu notifier + * will only reigster the faulted vma. + * Do we really need to care about this kind of unmatch? + */ + mmu_notifier_invalidate_range_start(&range); + raw_write_seqcount_begin(&mm->write_protect_seq); + + /* + * Fast path, check if we are the only one faulted task + * references to this COW-ed PTE, reuse it. 
+ */ + src_pte =3D pte_offset_map_lock(mm, pmd, addr, &src_ptl); + if (cow_pte_count(pmd) =3D=3D 1) { + pmd_t new =3D pmd_mkwrite(*pmd); + set_pmd_at(mm, addr, pmd, new); + pte_unmap_unlock(src_pte, src_ptl); + goto flush_tlb; + } + pte_unmap_unlock(src_pte, src_ptl); + + /* + * Slow path. Since we already did the accounting and still + * sharing the mapped pages, we can just clone PTE. + */ + + cowed_entry =3D READ_ONCE(*pmd); + /* Decrease the pgtable_bytes of COW-ed PTE. */ + mm_dec_nr_ptes(mm); + pmd_clear(pmd); + orig_dst_pte =3D dst_pte =3D pte_alloc_map_lock(mm, pmd, addr, &dst_ptl); + if (unlikely(!dst_pte)) { + /* If allocation failed, restore COW-ed PTE. */ + set_pmd_at(mm, addr, pmd, cowed_entry); + ret =3D -ENOMEM; + goto out; + } + + /* + * We should hold the lock of COW-ed PTE until all the operations + * have been done, including duplicating, TLB flush, and decrease + * refcount. + */ + src_pte =3D pte_offset_map_lock(mm, &cowed_entry, addr, &src_ptl); + orig_src_pte =3D src_pte; + arch_enter_lazy_mmu_mode(); + + do { + if (pte_none(*src_pte)) + continue; + /* + * We should handled the most of cases in copy_cow_pte_range(), + * But, we cannot distinguish the vma is belong to parent or + * child, so we need to take care about it. + */ + set_pte_at(mm, addr, dst_pte, *src_pte); + } while (dst_pte++, src_pte++, addr +=3D PAGE_SIZE, addr !=3D end); + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_dst_pte, dst_ptl); + + /* Decrease the refcount of COW-ed PTE. */ + if (!pmd_put_pte(&cowed_entry)) { + /* COW-ed (old) PTE's refcount is 1, reuse it. */ + pgtable_t token =3D pmd_pgtable(*pmd); + /* Reuse COW-ed PTE. */ + pmd_t new =3D pmd_mkwrite(cowed_entry); + + /* Clear all the entries of new PTE. 
*/ + addr =3D start; + dst_pte =3D pte_offset_map_lock(mm, pmd, addr, &dst_ptl); + orig_dst_pte =3D dst_pte; + do { + if (pte_none(*dst_pte)) + continue; + if (pte_present(*dst_pte)) + page_table_check_pte_clear(mm, addr, *dst_pte); + pte_clear(mm, addr, dst_pte); + } while (dst_pte++, addr +=3D PAGE_SIZE, addr !=3D end); + pte_unmap_unlock(orig_dst_pte, dst_ptl); + /* Now, we can safely free new PTE. */ + pmd_clear(pmd); + pte_free(mm, token); + /* Reuse COW-ed PTE */ + set_pmd_at(mm, start, pmd, new); + } + + pte_unmap_unlock(orig_src_pte, src_ptl); + +flush_tlb: + /* + * If we change the protection, flush TLB. + * flush_tlb_range() will only use vma to get mm, we don't need + * to consider the unmatch address range with vma problem here. + */ + flush_tlb_range(vma, start, end); +out: + raw_write_seqcount_end(&mm->write_protect_seq); + mmu_notifier_invalidate_range_end(&range); + + return ret; +} + +static inline int __break_cow_pte(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr) +{ + struct vm_fault vmf =3D { + .vma =3D vma, + .address =3D addr & PAGE_MASK, + .pmd =3D pmd, + }; + + return handle_cow_pte_fault(&vmf); +} + +/** + * break_cow_pte - duplicate/reuse shared, wprotected (COW-ed) PTE + * @vma: target vma want to break COW + * @pmd: pmd index that maps to the shared PTE + * @addr: the address trigger break COW PTE + * + * The address needs to be in the range of shared and write portected + * PTE that the pmd index mapped. If pmd is NULL, it will get the pmd + * from vma. Duplicate COW-ed PTE when some still mapping to it. + * Otherwise, reuse COW-ed PTE. 
+ */ +int break_cow_pte(struct vm_area_struct *vma, pmd_t *pmd, unsigned long ad= dr) +{ + struct mm_struct *mm; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + + if (!vma) + return -EINVAL; + mm =3D vma->vm_mm; + + if (!test_bit(MMF_COW_PTE, &mm->flags)) + return 0; + + if (!pmd) { + pgd =3D pgd_offset(mm, addr); + if (pgd_none_or_clear_bad(pgd)) + return 0; + p4d =3D p4d_offset(pgd, addr); + if (p4d_none_or_clear_bad(p4d)) + return 0; + pud =3D pud_offset(p4d, addr); + if (pud_none_or_clear_bad(pud)) + return 0; + pmd =3D pmd_offset(pud, addr); + } + + /* We will check the type of pmd entry later. */ + + return __break_cow_pte(vma, pmd, addr); +} + +/** + * break_cow_pte_range - duplicate/reuse COW-ed PTE in a given range + * @vma: target vma want to break COW + * @start: the address of start breaking + * @end: the address of end breaking + * + * Return: zero on success, the number of failed otherwise. + */ +int break_cow_pte_range(struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + unsigned long addr, next; + int nr_failed =3D 0; + + if (!vma) + return -EINVAL; + if (range_in_vma(vma, start, end)) + return -EINVAL; + + addr =3D start; + do { + next =3D pmd_addr_end(addr, end); + if (break_cow_pte(vma, NULL, addr) < 0) + nr_failed++; + } while (addr =3D next, addr !=3D end); + + return nr_failed; +} + /* * These routines also need to handle stuff like marking pages dirty * and/or accessed for architectures that don't do it in hardware (most @@ -5355,8 +5643,27 @@ static vm_fault_t __handle_mm_fault(struct vm_area_s= truct *vma, return 0; } } + /* + * Duplicate COW-ed PTE when page fault will change the + * mapped pages (write or unshared fault) or COW-ed PTE + * (file mapped read fault, see do_read_fault()). 
+ */ + if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE) || + vma->vm_ops) && test_bit(MMF_COW_PTE, &mm->flags)) { + ret =3D handle_cow_pte_fault(&vmf); + if (unlikely(ret =3D=3D -ENOMEM)) + return VM_FAULT_OOM; + } } =20 + /* + * It's definitely will break the kernel when refcount of PTE + * is higher than 1 and it is writeable in PMD entry. But we + * want to see more information so just warning here. + */ + if (likely(!pmd_none(*vmf.pmd))) + VM_WARN_ON(cow_pte_count(vmf.pmd) > 1 && pmd_write(*vmf.pmd)); + return handle_pte_fault(&vmf); } =20 diff --git a/mm/mmap.c b/mm/mmap.c index 74a84eb33b904..3eb9b852adc3b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2208,6 +2208,10 @@ int __split_vma(struct mm_struct *mm, struct vm_area= _struct *vma, return err; } =20 + err =3D break_cow_pte(vma, NULL, addr); + if (err) + return err; + new =3D vm_area_dup(vma); if (!new) return -ENOMEM; diff --git a/mm/mremap.c b/mm/mremap.c index e465ffe279bb0..b4136b12f24b6 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -534,6 +534,8 @@ unsigned long move_page_tables(struct vm_area_struct *v= ma, old_pmd =3D get_old_pmd(vma->vm_mm, old_addr); if (!old_pmd) continue; + /* TLB flush twice time here? 
*/ + break_cow_pte(vma, old_pmd, old_addr); new_pmd =3D alloc_new_pmd(vma->vm_mm, vma, new_addr); if (!new_pmd) break; diff --git a/mm/swapfile.c b/mm/swapfile.c index 72e481aacd5df..10af3e0a2eb5d 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1911,6 +1911,8 @@ static inline int unuse_pmd_range(struct vm_area_stru= ct *vma, pud_t *pud, next =3D pmd_addr_end(addr, end); if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; + if (break_cow_pte(vma, pmd, addr) < 0) + return -ENOMEM; ret =3D unuse_pte_range(vma, pmd, addr, next, type); if (ret) return ret; --=20 2.37.3 From nobody Wed Sep 17 10:04:56 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0854AC10F1E for ; Tue, 20 Dec 2022 07:26:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233557AbiLTH0c (ORCPT ); Tue, 20 Dec 2022 02:26:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233564AbiLTHZm (ORCPT ); Tue, 20 Dec 2022 02:25:42 -0500 Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AF10CD for ; Mon, 19 Dec 2022 23:25:39 -0800 (PST) Received: by mail-pf1-x42f.google.com with SMTP id x66so7935205pfx.3 for ; Mon, 19 Dec 2022 23:25:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Nw94Wcad60kQlYesoGk9k3xA8YOcC6tDHrdqJDeUZPo=; b=lOCNCFWA/meHRThX8tIrSKBWtWjAtAFAyTPKlIuo8gru7rOJ1Pvq2cOSUhJWnYwZAG NA4typtr3ombOUIegnjNrJMPy96dI39GU/kqXDwBXKU6O6AZrdab21FuHKwTV50WR0AG 
From: Chih-En Lin
Subject: [PATCH v3 04/14] mm/rmap: Break COW PTE in rmap walking
Date: Tue, 20 Dec 2022 15:27:33 +0800
Message-Id: <20221220072743.3039060-5-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Some of the features (unmap, migrate, device exclusive, mkclean, etc.)
might modify the pte entry via rmap. Add a new page vma mapped walk
flag, PVMW_BREAK_COW_PTE, to tell the rmap walk to break COW PTE.
Signed-off-by: Chih-En Lin
---
 include/linux/rmap.h |  2 ++
 mm/migrate.c         |  3 ++-
 mm/page_vma_mapped.c |  2 ++
 mm/rmap.c            | 12 +++++++-----
 mm/vmscan.c          |  7 ++++++-
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd3504d11b155..d0f07e5519736 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -368,6 +368,8 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
 #define PVMW_SYNC		(1 << 0)
 /* Look for migration entries rather than present PTEs */
 #define PVMW_MIGRATION		(1 << 1)
+/* Break COW-ed PTE during walking */
+#define PVMW_BREAK_COW_PTE	(1 << 2)
 
 struct page_vma_mapped_walk {
 	unsigned long pfn;
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8ae..a4be7e04c9b09 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -174,7 +174,8 @@ void putback_movable_pages(struct list_head *l)
 static bool remove_migration_pte(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long addr, void *old)
 {
-	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
+	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr,
+			      PVMW_SYNC | PVMW_MIGRATION | PVMW_BREAK_COW_PTE);
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		rmap_t rmap_flags = RMAP_NONE;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 93e13fc17d3cb..5dfc9236dc505 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -251,6 +251,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			step_forward(pvmw, PMD_SIZE);
 			continue;
 		}
+		if (pvmw->flags & PVMW_BREAK_COW_PTE)
+			break_cow_pte(vma, pvmw->pmd, pvmw->address);
 		if (!map_pte(pvmw))
 			goto next_pte;
 this_pte:
diff --git a/mm/rmap.c b/mm/rmap.c
index 2ec925e5fa6a9..b1b7dcbd498be 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -807,7 +807,8 @@ static bool folio_referenced_one(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long address, void *arg)
 {
 	struct folio_referenced_arg *pra = arg;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	/* The walk will clear the entry, so break COW PTE. */
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
 	int referenced = 0;
 
 	while (page_vma_mapped_walk(&pvmw)) {
@@ -1012,7 +1013,8 @@ static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
 static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
 			     unsigned long address, void *arg)
 {
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC);
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address,
+			      PVMW_SYNC | PVMW_BREAK_COW_PTE);
 	int *cleaned = arg;
 
 	*cleaned += page_vma_mkclean_one(&pvmw);
@@ -1471,7 +1473,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
 	pte_t pteval;
 	struct page *subpage;
 	bool anon_exclusive, ret = true;
@@ -1842,7 +1844,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
 	pte_t pteval;
 	struct page *subpage;
 	bool anon_exclusive, ret = true;
@@ -2195,7 +2197,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long address, void *priv)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
 	struct make_exclusive_args *args = priv;
 	pte_t pteval;
 	struct page *subpage;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 026199c047e0e..980d2056adfd1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1781,6 +1781,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			}
 		}
 
+		/*
+		 * Break COW PTE, since checking the references of the
+		 * folio might modify the PTE.
+		 */
 		if (!ignore_references)
 			references = folio_check_references(folio, sc);
 
@@ -1864,7 +1868,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 		/*
 		 * The folio is mapped into the page tables of one or more
-		 * processes. Try to unmap it here.
+		 * processes. Try to unmap it here. Also, since it will
+		 * write to the page tables, break any COW-ed PTE.
 		 */
 		if (folio_mapped(folio)) {
 			enum ttu_flags flags = TTU_BATCH_FLUSH;
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 05/14] mm/khugepaged: Break COW PTE before scanning pte
Date: Tue, 20 Dec 2022 15:27:34 +0800
Message-Id: <20221220072743.3039060-6-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

We should not allow THP to collapse a COW-ed PTE table, so break COW
PTE before collapse_pte_mapped_thp() collapses it into a THP. Also,
break COW PTE before khugepaged_scan_pmd() scans the PTEs.
Signed-off-by: Chih-En Lin
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 760455dfa8600..881553aa0f2f2 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
 	EM( SCAN_PMD_NONE,		"pmd_none")			\
 	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
+	EM( SCAN_COW_PTE,		"cowed_pte")			\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a8d5ef2a77d24..106e1ce3931f7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -31,6 +31,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_PMD_NONE,
 	SCAN_PMD_MAPPED,
+	SCAN_COW_PTE,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -1030,6 +1031,9 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
+	/* The COW-ed PTE should already be handled at this point. */
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	anon_vma_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
@@ -1139,6 +1143,16 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	/*
+	 * Before we scan the pte entries, the PTE table must be
+	 * modifiable, so break COW if the PTE table is COW-ed.
+	 */
+	if (break_cow_pte(vma, pmd, address) < 0) {
+		result = SCAN_COW_PTE;
+		goto out;
+	}
+
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1197,6 +1211,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		goto out_unmap;
 	}
 
+	/*
+	 * If we only triggered the COW PTE break, the page is usually
+	 * still in a COW mapping and thus still shared.
+	 */
 	if (page_mapcount(page) > 1) {
 		++shared;
 		if (cc->is_khugepaged &&
@@ -1472,6 +1490,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	/* We should not let a COW-ed PTE table collapse. */
+	if (break_cow_pte(vma, pmd, haddr) < 0)
+		goto drop_hpage;
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 	result = SCAN_FAIL;
 
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 06/14] mm/ksm: Break COW PTE before modify shared PTE
Date: Tue, 20 Dec 2022 15:27:35 +0800
Message-Id: <20221220072743.3039060-7-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Break COW PTE before merging a page that resides in a COW-ed PTE table.
Signed-off-by: Chih-En Lin
---
 mm/ksm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index c19fcca9bc03d..896a14c44a858 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1017,7 +1017,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 			      pte_t *orig_pte)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_PAGE_VMA_WALK(pvmw, page, vma, 0, 0);
+	DEFINE_PAGE_VMA_WALK(pvmw, page, vma, 0, PVMW_BREAK_COW_PTE);
 	int swapped;
 	int err = -EFAULT;
 	struct mmu_notifier_range range;
@@ -1136,6 +1136,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	barrier();
 	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
 		goto out;
+	if (break_cow_pte(vma, pmd, addr) < 0)
+		goto out;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 07/14] mm/madvise: Handle COW-ed PTE with madvise()
Date: Tue, 20 Dec 2022 15:27:36 +0800
Message-Id: <20221220072743.3039060-8-shiyn.lin@gmail.com>

Break COW PTE when madvise() modifies a PTE entry of a COW-ed PTE table.
The following advice values need to break COW PTE; others, such as
MADV_HUGEPAGE and MADV_MERGEABLE, are handled separately.

- MADV_DONTNEED: calls zap_page_range(), which is already handled.
- MADV_FREE: uses walk_page_range() with madvise_free_pte_range() to
  free pages itself, so add break_cow_pte().
- MADV_REMOVE: likewise removes pages itself, so add
  break_cow_pte_range().
- MADV_COLD: similar to MADV_FREE; break COW PTE before pageout.
- MADV_POPULATE: let GUP deal with it.
Signed-off-by: Chih-En Lin
---
 mm/madvise.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index c7105ec6d08c0..58bccec7caa88 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -408,6 +408,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	if (pmd_trans_unstable(pmd))
 		return 0;
 #endif
+	if (break_cow_pte(vma, pmd, addr) < 0)
+		return 0;
+
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -614,6 +617,10 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
+	/* We should only allocate PTE. */
+	if (break_cow_pte(vma, pmd, addr) < 0)
+		goto next;
+
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -974,6 +981,12 @@ static long madvise_remove(struct vm_area_struct *vma,
 	if ((vma->vm_flags & (VM_SHARED|VM_WRITE)) != (VM_SHARED|VM_WRITE))
 		return -EACCES;
 
+	error = break_cow_pte_range(vma, start, end);
+	if (error < 0)
+		return error;
+	else if (error > 0)
+		return -ENOMEM;
+
 	offset = (loff_t)(start - vma->vm_start)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 08/14] mm/gup: Break COW PTE in follow_pfn_pte()
Date: Tue, 20 Dec 2022 15:27:37 +0800
Message-Id: <20221220072743.3039060-9-shiyn.lin@gmail.com>

In most cases, GUP does not modify the page table; follow_pfn_pte() is
the exception. To deal with COW PTE, break COW PTE in follow_pfn_pte().
Signed-off-by: Chih-En Lin
---
 mm/gup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index fe195d47de74a..cd72010ba0e6d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -476,6 +476,8 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 		entry = pte_mkyoung(entry);
 
 	if (!pte_same(*pte, entry)) {
+		if (break_cow_pte(vma, NULL, address) < 0)
+			return -ENOMEM;
 		set_pte_at(vma->vm_mm, address, pte, entry);
 		update_mmu_cache(vma, address, pte);
 	}
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 09/14] mm/mprotect: Break COW PTE before changing protection
Date: Tue, 20 Dec 2022 15:27:38 +0800
Message-Id: <20221220072743.3039060-10-shiyn.lin@gmail.com>

If the PTE table is COW-ed, break it before changing the protection.

Signed-off-by: Chih-En Lin
---
 mm/mprotect.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 668bfaa6ed2ae..119116ec8f5e5 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -95,6 +95,9 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
+	if (break_cow_pte(vma, pmd, addr) < 0)
+		return 0;
+
 	/*
 	 * The pmd points to a regular pte so the pmd can't change
 	 * from under us even if the mmap_lock is only hold for
@@ -305,6 +308,12 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 		return 1;
 	if (pmd_trans_huge(pmdval))
 		return 0;
+	/*
+	 * If the entry points to a COW-ed PTE table, its write-protection
+	 * bit will cause pmd_bad().
+	 */
+	if (!pmd_write(pmdval))
+		return 0;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
 		return 1;
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 10/14] mm/userfaultfd: Support COW PTE
Date: Tue, 20 Dec 2022 15:27:39 +0800
Message-Id: <20221220072743.3039060-11-shiyn.lin@gmail.com>

If uffd fills the zero page into, or installs a PTE into, a COW-ed PTE
table, break COW PTE first.

Signed-off-by: Chih-En Lin
---
 mm/userfaultfd.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 650ab6cfd5f49..4ee21c0d42d90 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -69,6 +69,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	struct inode *inode;
 	pgoff_t offset, max_off;
 
+	if (break_cow_pte(dst_vma, dst_pmd, dst_addr) < 0)
+		return -ENOMEM;
+
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
 	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
@@ -227,6 +230,9 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	pgoff_t offset, max_off;
 	struct inode *inode;
 
+	if (break_cow_pte(dst_vma, dst_pmd, dst_addr) < 0)
+		return -ENOMEM;
+
 	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 11/14] mm/migrate_device: Support COW PTE
Date: Tue, 20 Dec 2022 15:27:40 +0800
Message-Id: <20221220072743.3039060-12-shiyn.lin@gmail.com>

Break COW PTE before collecting the pages that reside in a COW-ed PTE
table.

Signed-off-by: Chih-En Lin
---
 mm/migrate_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 721b2365dbca9..f6d67bd9629f5 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -106,6 +106,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		}
 	}
 
+	if (!break_cow_pte_range(vma, pmdp, start, end))
+		return migrate_vma_collect_skip(start, end, walk);
 	if (unlikely(pmd_bad(*pmdp)))
 		return migrate_vma_collect_skip(start, end, walk);
 
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 12/14] fs/proc: Support COW PTE with clear_refs_write
Date: Tue, 20 Dec 2022 15:27:41 +0800
Message-Id: <20221220072743.3039060-13-shiyn.lin@gmail.com>

Before clearing an entry in a COW-ed PTE table, break COW PTE first.
Signed-off-by: Chih-En Lin
---
 fs/proc/task_mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8a74cdcc9af00..7d34b036c1b96 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1190,6 +1190,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
+	if (break_cow_pte(vma, pmd, addr) < 0)
+		return 0;
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 13/14] events/uprobes: Break COW PTE before replacing page
Date: Tue, 20 Dec 2022 15:27:42 +0800
Message-Id: <20221220072743.3039060-14-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

Break COW PTE before replacing a page that resides in a COW-ed PTE
table.
Signed-off-by: Chih-En Lin
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d9e357b7e17c9..2956a53da01a1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -157,7 +157,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	struct folio *old_folio = page_folio(old_page);
 	struct folio *new_folio;
 	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0);
+	DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, PVMW_BREAK_COW_PTE);
 	int err;
 	struct mmu_notifier_range range;
 
-- 
2.37.3
From: Chih-En Lin
Subject: [PATCH v3 14/14] mm: fork: Enable COW PTE to fork system call
Date: Tue, 20 Dec 2022 15:27:43 +0800
Message-Id: <20221220072743.3039060-15-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>

This patch enables the Copy-On-Write (COW) mechanism for PTE tables in
the fork system call. To opt a process into COW PTE fork, call
prctl(PR_SET_COW_PTE); this sets the MMF_COW_PTE_READY flag on the
process so that COW PTE is used at the next fork. The MMF_COW_PTE flag
distinguishes a COW page table from a normal one. Since it is difficult
to determine when every page table has left the COW state, the
MMF_COW_PTE flag is never cleared once set.
Signed-off-by: Chih-En Lin
---
 kernel/fork.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d5..ef3d27577aa43 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2668,6 +2668,11 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 		trace = 0;
 	}
 
+	if (current->mm && test_bit(MMF_COW_PTE_READY, &current->mm->flags)) {
+		clear_bit(MMF_COW_PTE_READY, &current->mm->flags);
+		set_bit(MMF_COW_PTE, &current->mm->flags);
+	}
+
 	p = copy_process(NULL, trace, NUMA_NO_NODE, args);
 	add_latent_entropy();
 
-- 
2.37.3