From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84F40C433EF for ; Tue, 5 Apr 2022 02:40:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229977AbiDECl7 (ORCPT ); Mon, 4 Apr 2022 22:41:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230009AbiDEClp (ORCPT ); Mon, 4 Apr 2022 22:41:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A36F521D05E for ; Mon, 4 Apr 2022 18:46:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123213; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pI9cWciCmYBnmPhneotwA4BwtHRdfFIYJm9g8S6gDqg=; b=W3AXKKht0wSd9/Tv9Q0EJtVtRX6u7CG7jeB5BnN2d384jQVVXl15WgI8ykrzdM5DBkM7gk Q0k/iW8kGiLfISKpAD8FpFicYwV1S+ONHuLxgE0hwoSl/NiRU2RHSUaePpTIlDcsGH7cB8 RbDcY1J3vv6jaF9w5tsUbtbYzLwFQc8= Received: from mail-io1-f71.google.com (mail-io1-f71.google.com [209.85.166.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-226-Xvua7ITMMFCqh6jC0Xtevg-1; Mon, 04 Apr 2022 21:46:52 -0400 X-MC-Unique: Xvua7ITMMFCqh6jC0Xtevg-1 Received: by mail-io1-f71.google.com with SMTP id g16-20020a05660226d000b00638d8e1828bso7429991ioo.13 for ; Mon, 04 Apr 2022 18:46:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=pI9cWciCmYBnmPhneotwA4BwtHRdfFIYJm9g8S6gDqg=; b=Cgk5JmWeJVjAG9VllYyOldh83LGBa+38IIvRIW8gKwAGXkX8lkQvLxxNc0WcO1SHMv aOgHshHJMg8StYEADjYIkvx9n6VDf1TTCZDoy9ePFe5U8VTJOIFxnPg/+Fa1Ou6HnPdd L1Sa/zipxSMJvTMAszXqe7LEtKgCMfUqZTeL/0IkHrmZi30H/BsRfl1gnEcz8M8MugqB aTPwj9Lu85glDkAITqT8VHckVhUvFBnWUW6sobDisJBvWXZyJ9+DyLG7jkdWiJdxmlHy HSq5PkDmRjBhgWgsXeEe2VFOo+FKmpWkqGm0NV065YZCdqQXT0CSIWTMVJzMAPWs6en4 yGfQ== X-Gm-Message-State: AOAM532hl88KpqwUJZ7Xkhs6OWnZEnC2X/57dlVZf9xq/qLfnvbjGE85 1QNNduVtPL/3vp6rSWjMGjk3y1IdH8dgJvObMt/J5CEv+4YteV9r7fzc2CfzyPeDPtWQ0E6zcWZ 8QvyLWa8dbBfDDuEweiprjibWFVbN69iKdzMdFKmbHmVBRAuhukqnZGKVaFh9savUDG9DvZrJjA == X-Received: by 2002:a92:cdaf:0:b0:2ca:1fe0:333f with SMTP id g15-20020a92cdaf000000b002ca1fe0333fmr562675ild.173.1649123211357; Mon, 04 Apr 2022 18:46:51 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwwDD4n8m40Y7RHhjN5HSEe+GsrzZ5EezHVdV8pG6lXvRYBwaehy8cbB+ODTD7LlpiU1TD9xA== X-Received: by 2002:a92:cdaf:0:b0:2ca:1fe0:333f with SMTP id g15-20020a92cdaf000000b002ca1fe0333fmr562650ild.173.1649123211093; Mon, 04 Apr 2022 18:46:51 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id d14-20020a056602184e00b00649673c175asm7556676ioi.25.2022.04.04.18.46.49 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:46:50 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Andrew Morton , David Hildenbrand , Matthew Wilcox , peterx@redhat.com, Alistair Popple , Nadav Amit , Axel Rasmussen , Andrea Arcangeli , "Kirill A . 
Shutemov" , Hugh Dickins , Jerome Glisse , Mike Rapoport
Subject: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry
Date: Mon, 4 Apr 2022 21:46:24 -0400
Message-Id: <20220405014646.13522-2-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This patch introduces a new swap entry type called PTE_MARKER. It can be
installed for any pte that maps file-backed memory when the pte is
temporarily zapped, so as to maintain per-pte information.

The information that is kept in the pte is called a "marker". Here we define
the marker as "unsigned long" just to match pgoff_t; however, it will only
work if it still fits in swp_offset(), which is e.g. currently 58 bits on
x86_64.

A new config CONFIG_PTE_MARKER is introduced too; it is off by default. A
bunch of helpers are defined altogether to service the rest of the pte marker
code.

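As a reading aid only (this is not part of the patch), the idea of keeping a
marker in the offset field of a swap entry can be sketched in userspace C.
The sketch assumes a hypothetical 64-bit layout with a 5-bit type above a
58-bit offset and a made-up swap type number; every model_* name is
illustrative and is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only.  The layout below (5-bit type above a 58-bit
 * offset) and the type number are assumptions for illustration; the
 * kernel derives the real layout from per-arch headers.
 */
#define MODEL_OFFSET_BITS	58
#define MODEL_OFFSET_MASK	((UINT64_C(1) << MODEL_OFFSET_BITS) - 1)
#define MODEL_SWP_PTE_MARKER	31u	/* hypothetical swap type */

typedef struct { uint64_t val; } model_swp_entry_t;
typedef uint64_t model_pte_marker;

/* Pack a type and an offset into one entry; excess offset bits are dropped. */
static inline model_swp_entry_t model_swp_entry(unsigned int type, uint64_t offset)
{
	model_swp_entry_t e = {
		((uint64_t)type << MODEL_OFFSET_BITS) | (offset & MODEL_OFFSET_MASK)
	};
	return e;
}

static inline unsigned int model_swp_type(model_swp_entry_t e)
{
	return (unsigned int)(e.val >> MODEL_OFFSET_BITS);
}

static inline uint64_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}

/* A marker is only usable if it fits in the offset field. */
static inline bool model_marker_fits(model_pte_marker marker)
{
	return (marker & ~MODEL_OFFSET_MASK) == 0;
}

/* Build a marker entry: the marker value travels in the offset field. */
static inline model_swp_entry_t model_make_pte_marker_entry(model_pte_marker marker)
{
	return model_swp_entry(MODEL_SWP_PTE_MARKER, marker);
}

static inline bool model_is_pte_marker_entry(model_swp_entry_t e)
{
	return model_swp_type(e) == MODEL_SWP_PTE_MARKER;
}
```

In this model, model_marker_fits() plays the role of the "fits in
swp_offset()" condition mentioned above: a marker with any bit set at or
above bit 58 would be silently truncated by model_swp_entry().
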
Signed-off-by: Peter Xu
---
 include/asm-generic/hugetlb.h |  9 ++++
 include/linux/swap.h          | 15 ++++++-
 include/linux/swapops.h       | 78 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  6 +++
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..f39cad20ffc6 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -2,6 +2,9 @@
 #ifndef _ASM_GENERIC_HUGETLB_H
 #define _ASM_GENERIC_HUGETLB_H
 
+#include
+#include
+
 static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
 {
 	return mk_pte(page, pgprot);
@@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
 }
 #endif
 
+/* Please refer to comments above pte_none_mostly() for the usage */
+static inline int huge_pte_none_mostly(pte_t pte)
+{
+	return huge_pte_none(pte) || is_pte_marker(pte);
+}
+
 #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7daae5a4b3e1..5553189d0215 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
  * actions on faults.
  */
 
+/*
+ * PTE markers are used to persist information onto PTEs that are mapped with
+ * file-backed memories. As its name "PTE" hints, it should only be applied to
+ * the leaves of pgtables.
+ */
+#ifdef CONFIG_PTE_MARKER
+#define SWP_PTE_MARKER_NUM 1
+#define SWP_PTE_MARKER (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+			SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
+#else
+#define SWP_PTE_MARKER_NUM 0
+#endif
+
 /*
  * Unaddressable device memory support. See include/linux/hmm.h and
  * Documentation/vm/hmm.rst. Short description is we need struct pages for
@@ -107,7 +120,7 @@ static inline int current_is_kswapd(void)
 
 #define MAX_SWAPFILES \
 	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
-	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
+	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 32d517a28969..7a00627845f0 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -274,6 +274,84 @@ static inline int is_readable_migration_entry(swp_entry_t entry)
 
 #endif
 
+typedef unsigned long pte_marker;
+
+#define PTE_MARKER_MASK (0)
+
+#ifdef CONFIG_PTE_MARKER
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	return swp_entry(SWP_PTE_MARKER, marker);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_PTE_MARKER;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return swp_offset(entry) & PTE_MARKER_MASK;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
+}
+
+#else /* CONFIG_PTE_MARKER */
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	/* This should never be called if !CONFIG_PTE_MARKER */
+	WARN_ON_ONCE(1);
+	return swp_entry(0, 0);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return false;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return 0;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return false;
+}
+
+#endif /* CONFIG_PTE_MARKER */
+
+static inline pte_t make_pte_marker(pte_marker marker)
+{
+	return swp_entry_to_pte(make_pte_marker_entry(marker));
+}
+
+/*
+ * This is a special version to check pte_none() just to cover the case when
+ * the pte is a pte marker. It existed because in many cases the pte marker
+ * should be seen as a none pte; it's just that we have stored some information
+ * onto the none pte so it becomes not-none any more.
+ *
+ * It should be used when the pte is file-backed, ram-based and backing
+ * userspace pages, like shmem. It is not needed upon pgtables that do not
+ * support pte markers at all. For example, it's not needed on anonymous
+ * memory, kernel-only memory (including when the system is during-boot),
+ * non-ram based generic file-system. It's fine to be used even there, but the
+ * extra pte marker check will be pure overhead.
+ *
+ * For systems configured with !CONFIG_PTE_MARKER this will be automatically
+ * optimized to pte_none().
+ */
+static inline int pte_none_mostly(pte_t pte)
+{
+	return pte_none(pte) || is_pte_marker(pte);
+}
+
 static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
diff --git a/mm/Kconfig b/mm/Kconfig
index 034d87953600..a1688b9314b2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -909,6 +909,12 @@ config ANON_VMA_NAME
 	  area from being merged with adjacent virtual memory areas due to the
 	  difference in their name.
 
+config PTE_MARKER
+	bool "Marker PTEs support"
+
+	help
+	  Allows to create marker PTEs for file-backed memory.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 262AFC433F5 for ; Tue, 5 Apr 2022 02:41:26 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229802AbiDECnU (ORCPT ); Mon, 4 Apr 2022 22:43:20 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49454 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229838AbiDECnG (ORCPT ); Mon, 4 Apr 2022 22:43:06 -0400
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D0D9B1404EE for ; Mon, 4 Apr 2022 18:48:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123318; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FzKAaIaqf6EF4behF3lujXRCF51brNcWMl75G/VXVIo=; b=YWTUUfvACPodtRvkVwOHtDQSYu4hTjadU1VRltTcpohNTxHfdjAl8jlZOe2zx0MxWYhw6a IJaAL7YcSvPguY0ujsxjmHvbcKT6nH5IrBS4rgfS7Y3KIgD8+/ZkHgNqe2ATGTksKTqfLO 6bnC+arj6XWi8R6HfqAknIhqDsqMEiQ=
Received: from mail-il1-f199.google.com (mail-il1-f199.google.com [209.85.166.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-404-diFJKrpBMKG_5MLIi6qMMw-1; Mon, 04 Apr 2022 21:48:37 -0400
X-MC-Unique: diFJKrpBMKG_5MLIi6qMMw-1
Received: by mail-il1-f199.google.com with SMTP id r16-20020a056e02109000b002ca35f87493so3272705ilj.22 for ; Mon, 04 Apr 2022 18:48:36 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FzKAaIaqf6EF4behF3lujXRCF51brNcWMl75G/VXVIo=; b=EEW7C3dP51N6Og1tbfNoQTsKlHVNg8939yTyI/8R+aWkXrG5XK8Q0tgn1AJAyGkELh XCGZpKw6cWGCDE5DPhQC+fX5Ne05WMKbXTW2xW02uka3Qf/c76ws/TP6pAfNYRNKGMZz EoqerlOQQvCquwX3dd/U36kbt8/HwmRkTVjhj8KV/huo6ciomq3ENXy0MGWHJsM2JgJP aSW6ToYJ9iyjagzos3UN0SxI+rhKDlgBLaGGT1bn9FjYvFqEw2wWbfpIfUdVeX4yuoSV ifMac8czgwjU/cC6yEbVMTpOjp3u09SJ0A+vmpuccCvvwoawBdTZHvgy9K6p19ezRf82 lV/g== X-Gm-Message-State: AOAM533NLtYj9r6i0a9QrIf2MXESc+h+rCDMp/4ahf7ftgKnWrKCSFc/ +3eYuThbTXeql0j7lU+P3iwoEkg+Rz6bhMTQ5M7xg+JaPG8Qp7d42ObZOUVWQtzI1zLWiR6ctp9 ztTys4I6ackp/Pl5M0v0i8sfpVzCKNxVrP8E74RusWyqBjpw5dqVubqUr1qiK8sgnoicdIWe1FA == X-Received: by 2002:a05:6e02:1c0a:b0:2c7:75de:d84 with SMTP id l10-20020a056e021c0a00b002c775de0d84mr565302ilh.186.1649123316147; Mon, 04 Apr 2022 18:48:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy8Bx++Tjrd/IXRjPJSzct03qjBhI2A5uyYDMnoG1Ghox9iQaiS+UpZyVnd00v9azlJVSYwiQ== X-Received: by 2002:a05:6e02:1c0a:b0:2c7:75de:d84 with SMTP id l10-20020a056e021c0a00b002c775de0d84mr565275ilh.186.1649123315794; Mon, 04 Apr 2022 18:48:35 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id c15-20020a5d8b4f000000b00648f75d0289sm7369921iot.6.2022.04.04.18.48.34 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:35 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 02/23] mm: Teach core mm about pte markers
Date: Mon, 4 Apr 2022 21:48:33 -0400
Message-Id: <20220405014833.14015-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This patch still does not use pte markers in any way; however, it teaches
the core mm about the pte marker idea.

For example, handle_pte_marker() is introduced to parse and handle all the
pte marker faults.

Many of the changes are mostly about adding comments, so that we know there
is the possibility of a pte marker showing up, and why we don't need special
code for those cases.

Signed-off-by: Peter Xu
---
 fs/userfaultfd.c | 10 ++++++----
 mm/filemap.c     |  5 +++++
 mm/hmm.c         |  2 +-
 mm/memcontrol.c  |  8 ++++++--
 mm/memory.c      | 23 +++++++++++++++++++++++
 mm/mincore.c     |  3 ++-
 mm/mprotect.c    |  3 +++
 7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..8b4a94f5a238 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -249,9 +249,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us. PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none_mostly(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
@@ -330,9 +331,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte = pte_offset_map(pmd, address);
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us. PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (pte_none(*pte))
+	if (pte_none_mostly(*pte))
 		ret = true;
 	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a5ffb5587cd..ef77dae8c28d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3382,6 +3382,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
 
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
 		if (!pte_none(*vmf->pte))
 			goto unlock;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index af71aac3140e..3fd3242c5e50 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	pte_t pte = *ptep;
 	uint64_t pfn_req_flags = *hmm_pfn;
 
-	if (pte_none(pte)) {
+	if (pte_none_mostly(pte)) {
 		required_fault =
 			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
 		if (required_fault)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a08737bac4b..08af97c73f0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5644,10 +5644,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
+	else if (pte_none_mostly(ptent))
+		/*
+		 * PTE markers should be treated as a none pte here, separated
+		 * from other swap handling below.
+		 */
+		page = mc_handle_file_pte(vma, addr, ptent);
 	else if (is_swap_pte(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
-	else if (pte_none(ptent))
-		page = mc_handle_file_pte(vma, addr, ptent);
 
 	if (!page && !ent.val)
 		return ret;
diff --git a/mm/memory.c b/mm/memory.c
index 2c5d1bb4694f..3f396241a7db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -100,6 +100,8 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+static vm_fault_t do_fault(struct vm_fault *vmf);
+
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
  * that high_memory defines the upper bound on direct map memory, then end
@@ -1415,6 +1417,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (!should_zap_page(details, page))
 				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_pte_marker_entry(entry)) {
+			/* By default, simply drop all pte markers when zap */
 		} else if (is_hwpoison_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
@@ -3555,6 +3559,23 @@ static inline bool should_try_to_free_swap(struct page *page,
 		page_count(page) == 2;
 }
 
+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+	unsigned long marker = pte_marker_get(entry);
+
+	/*
+	 * PTE markers should always be with file-backed memories, and the
+	 * marker should never be empty. If anything weird happened, the best
+	 * thing to do is to kill the process along with its mm.
+	 */
+	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+		return VM_FAULT_SIGBUS;
+
+	/* TODO: handle pte markers */
+	return 0;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3592,6 +3613,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_pte_marker_entry(entry)) {
+			ret = handle_pte_marker(vmf);
 		} else {
 			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
diff --git a/mm/mincore.c b/mm/mincore.c
index f4f627325e12..fa200c14185f 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -122,7 +122,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		/* We need to do cache lookup too for pte markers */
+		if (pte_none_mostly(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 56060acdabd3..709a6f73b764 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -188,6 +188,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
+			} else if (is_pte_marker_entry(entry)) {
+				/* Skip it, the same as none pte */
+				continue;
 			} else {
 				newpte = oldpte;
 			}
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11C89C433F5 for ; Tue, 5 Apr 2022 02:41:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229789AbiDECnc (ORCPT ); Mon, 4 Apr 2022 22:43:32 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229930AbiDECnG (ORCPT ); Mon, 4 Apr 2022 22:43:06 -0400
Received: from
us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DEB4618B335 for ; Mon, 4 Apr 2022 18:48:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123320; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sVIJ8cJ4ZYWLy9njZqjP9dBoADUUbduMAUK9CqZAoa8=; b=UdMJcHOnPK/KgaaaqIJ/WufnDJY3G4CHw2hv7tAy4qlSitJKRSL+b+cnD0nyINvozJFite X2N22Qrbubc9JnNoLUZoN6eouJOuKdxZqWt6/TLJocAGomrlt5sTpprV0m9WMU480Q18wC LQ7yz+Rz/JZfdRbS2CyHb6DWaf7SXzE= Received: from mail-io1-f70.google.com (mail-io1-f70.google.com [209.85.166.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-221-6MwUwmLLPNeMVGSvJmw1Wg-1; Mon, 04 Apr 2022 21:48:39 -0400 X-MC-Unique: 6MwUwmLLPNeMVGSvJmw1Wg-1 Received: by mail-io1-f70.google.com with SMTP id z23-20020a6b0a17000000b00649f13ea3a7so7400482ioi.23 for ; Mon, 04 Apr 2022 18:48:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sVIJ8cJ4ZYWLy9njZqjP9dBoADUUbduMAUK9CqZAoa8=; b=rqu0Oj6dh+/4pIQO8GASp88dNkCCSpdGff1oqtkUq9UOPFx9H4OWC7IyCQ+Cgtmraw wr8SrnzLGXCMBQynCxCfzYXcxAWdk4KD9teyRCtJ9A5ZtPyNysQ+vRGZF2TknGBk27Xo IzgHrRlsS0oodhUfeVHezKKR+40yzjrFvUuTmkYY5bho5yvX8m/A8/1ZcUIgKWvOCZ7B WknnxvBkSNh6yCJcxanh4R/kq31uJnCkpUkgj6dWwOxATGUYzM0PXNildqvfEIaFzCIR pWihoDDO9CB7rPNG6MYzZZ+8HrIQDhpI5EtMZ101AJMyWjP4i7NSUn18yxlcbXqBLGUv 2NJA== X-Gm-Message-State: AOAM530Vf+wuyjcgqCmPVJStweT0+O3QTO+/Hx1ZRzdKvjYwwsQyp2eL WXpbB1ggaF2JehXa26gDdSatEd+t9FnFAs5nG4KS6+pOYAIpsyrPh8sQpRrEjtAB9b9sGHd01yl 
H1qVjtnKSh6kc+jco/eIOdknI67W65nAH9+esdzGsWbofQgrO0wYsIWMZ7Jev8NUCYGvyS0iNHw ==
X-Received: by 2002:a92:cc41:0:b0:2ca:317d:1545 with SMTP id t1-20020a92cc41000000b002ca317d1545mr610371ilq.97.1649123319014; Mon, 04 Apr 2022 18:48:39 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJw4r+oclsvcEsUiRUe9DMRABSCkkiRtx/NYkAy387pbnwUItvkV/gFFJ6K8jTgQhR+UEusSug==
X-Received: by 2002:a92:cc41:0:b0:2ca:317d:1545 with SMTP id t1-20020a92cc41000000b002ca317d1545mr610343ilq.97.1649123318645; Mon, 04 Apr 2022 18:48:38 -0700 (PDT)
Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id s5-20020a056602168500b0064c82210ce4sm7650542iow.13.2022.04.04.18.48.37 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:38 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()
Date: Mon, 4 Apr 2022 21:48:36 -0400
Message-Id: <20220405014836.14077-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

We used to check against a none pte in finish_fault(), with the assumption
that orig_pte is always a none pte.

This change prepares us to be able to call do_fault() on !none ptes. For
example, we should allow that to happen for pte markers so that we can
restore information out of the pte markers.

Let's change the "pte_none" check into detecting changes since we fetched
orig_pte.

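To make the difference between the two re-checks concrete, here is a small
userspace sketch (not kernel code). Ptes are modelled as plain integers with
0 standing in for a none pte, which is itself a simplifying assumption since
not every architecture encodes a none pte as all-zeros; the model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model: a pte slot is a 64-bit value, 0 plays the none pte. */
#define MODEL_PTE_NONE	UINT64_C(0)

/*
 * Old-style re-check: install newpte only if the slot is none.
 * Returns the slot content after the attempt.
 */
static inline uint64_t model_recheck_none(uint64_t slot, uint64_t newpte)
{
	return slot == MODEL_PTE_NONE ? newpte : slot;
}

/*
 * New-style re-check: install newpte only if the slot still holds the
 * value snapshotted at fault time (orig), whatever that value was.
 * Returns the slot content after the attempt.
 */
static inline uint64_t model_recheck_same(uint64_t slot, uint64_t orig,
					  uint64_t newpte)
{
	return slot == orig ? newpte : slot;
}
```

With a non-none snapshot such as a marker (say the arbitrary value 0x42),
model_recheck_none() refuses to install the new pte, while
model_recheck_same() installs it unless the slot changed in the meantime.
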
One trivial thing to take care of here is that, when pmd==NULL for the
pgtable, we may not initialize orig_pte at all in handle_pte_fault(). By
default orig_pte will be all zeros; however, the problem is that not all
architectures use all-zeros for a none pte. pte_clear() will be the right
thing to use here so that we'll always have a valid orig_pte value for the
whole handle_pte_fault() call.

Signed-off-by: Peter Xu
Reported-by: Marek Szyprowski
Reviewed-by: Alistair Popple
Tested-by: Marek Szyprowski
---
 mm/memory.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3f396241a7db..b1af996b09ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,7 +4241,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 				      vmf->address, &vmf->ptl);
 	ret = 0;
 	/* Re-check under ptl */
-	if (likely(pte_none(*vmf->pte)))
+	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
 		do_set_pte(vmf, page, vmf->address);
 	else
 		ret = VM_FAULT_NOPAGE;
@@ -4709,6 +4709,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 * concurrent faults and from rmap lookups.
 		 */
 		vmf->pte = NULL;
+		/*
+		 * Always initialize orig_pte. This matches with below
+		 * code to have orig_pte to be the none pte if pte==NULL.
+		 * This makes the rest code to be always safe to reference
+		 * it, e.g. in finish_fault() we'll detect pte changes.
+		 */
+		pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
 	} else {
 		/*
 		 * If a huge pmd materialized under us just retry later. Use
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEFD8C433F5 for ; Tue, 5 Apr 2022 02:41:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230000AbiDECng (ORCPT ); Mon, 4 Apr 2022 22:43:36 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229970AbiDECnI (ORCPT ); Mon, 4 Apr 2022 22:43:08 -0400
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A986D71EC5 for ; Mon, 4 Apr 2022 18:48:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123323; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U8n+4P5JjRaDUHKZYY8oH8Iy0i6vYl4G2VE7hAdGm3Q=; b=XYqWP2IOZP/UU5FPWFCBJQ+q+zF2wpogoTeForttZeLMTkUrGMwdtJnR7genFKEeb05hwc lt5UInW0d31rmawsfOkUEBThCp3Z6dijJ+LzZ9tlfjS085b8UyAFOfxq1sjP7aNUsaFIZ1 KAFDK9UfESZsZt9k6Uqfa+Fke85QaHw=
Received: from mail-il1-f197.google.com (mail-il1-f197.google.com [209.85.166.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-447-lQqqXMKxPqeKenshi0aNSQ-1; Mon, 04 Apr 2022 21:48:42 -0400
X-MC-Unique: lQqqXMKxPqeKenshi0aNSQ-1
Received: by mail-il1-f197.google.com with SMTP id f18-20020a926a12000000b002be48b02bc6so7180847ilc.17 for ; Mon, 04 Apr 2022 18:48:42 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112;
h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=U8n+4P5JjRaDUHKZYY8oH8Iy0i6vYl4G2VE7hAdGm3Q=; b=HEMA8i7TF0pjLLZqzLr5xJWLbup3atpL/gXuGYUHqk9Ed236hX0Hu3MvrioldgKHXY uiU9bnuOCz6ImkCVmFRTwTKsPHz9CVXbHNl6BLnn6KLbIdFwDIIitDSlxPseKPDwUVHU RNu+d6YI0xh6ihdBJ8sTD9/eWsMSUo+CjoRbOMfrAGjY54TdT0YTQI+lzZu0UwRzcbPJ WBaPiIZUdIPMaV/URgyc3yqv9QVo52y4Iux2hHFfMNQ6iF/BfB6h7/BLJNwQSVetTxv5 UGYvBcfRmkbdZaPclbW0aLr105GUQO19UxZJ2fWvrDbWbh1oXGnuBZyG+9JZSxeMXVa8 ik4A== X-Gm-Message-State: AOAM531YOvelJf6gXc4y+1C78YHevJrKiyBJ1tGOuvVjWr4U/BwyBHW6 JtuMFp3S+FLhkPyArEFIyRBlYBjYe86XpweUa9Wyv01ugHnOdr50sSiV3Lsr0hYAJZypnvI9lB0 OfuFnzzrha0nDmJdeAqMGraAmrnP6Cwqd/tV9NCrrg6j0ssrXlib5kkG/Og5NGOEbQvf5rM2HdQ == X-Received: by 2002:a05:6602:490:b0:638:c8ed:1e38 with SMTP id y16-20020a056602049000b00638c8ed1e38mr589245iov.202.1649123321824; Mon, 04 Apr 2022 18:48:41 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzBgvwBVvGE1Vqojm5+pnVWt3lmiOe9Zr6Q3Q9kFVvDL2PkJIvmVaEpRNWmsAlmIdguH9c/+A== X-Received: by 2002:a05:6602:490:b0:638:c8ed:1e38 with SMTP id y16-20020a056602049000b00638c8ed1e38mr589217iov.202.1649123321422; Mon, 04 Apr 2022 18:48:41 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id s5-20020a056602168500b0064c82210ce4sm7650607iow.13.2022.04.04.18.48.40 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:41 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 04/23] mm/uffd: PTE_MARKER_UFFD_WP
Date: Mon, 4 Apr 2022 21:48:38 -0400
Message-Id: <20220405014838.14131-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This patch introduces the first user of pte markers: the uffd-wp marker.
When the pte marker is installed with the uffd-wp bit set, it means this pte
was wr-protected by uffd.

We will use this special pte to arm the ptes that got either unmapped or
swapped out for a file-backed region that was previously wr-protected. This
special pte could trigger a page fault just like swap entries.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

Some helpers are introduced to detect whether a swap pte is uffd
wr-protected. After the pte marker is introduced, a swap pte can be
wr-protected in two forms: either it is a normal swap pte and it has
_PAGE_SWP_UFFD_WP set, or it's a pte marker that has PTE_MARKER_UFFD_WP set.

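The "two forms" check described above can be illustrated with a userspace
sketch (not kernel code). The pte is modelled as a small struct instead of a
packed word, the swap type number is made up, and the swp_uffd_wp field is
only a stand-in for _PAGE_SWP_UFFD_WP; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PTE_MARKER_UFFD_WP	(UINT64_C(1) << 0)	/* bit 0 of the marker */
#define MODEL_SWP_PTE_MARKER		31u			/* hypothetical swap type */

enum model_pte_kind { MODEL_PTE_NONE, MODEL_PTE_PRESENT, MODEL_PTE_SWAP };

typedef struct {
	enum model_pte_kind kind;
	unsigned int swp_type;	/* valid only for MODEL_PTE_SWAP */
	uint64_t swp_offset;	/* valid only for MODEL_PTE_SWAP */
	bool swp_uffd_wp;	/* stand-in for _PAGE_SWP_UFFD_WP */
} model_pte_t;

static inline model_pte_t model_none(void)
{
	model_pte_t p = { MODEL_PTE_NONE, 0, 0, false };
	return p;
}

static inline model_pte_t model_present(void)
{
	model_pte_t p = { MODEL_PTE_PRESENT, 0, 0, false };
	return p;
}

static inline model_pte_t model_swap(unsigned int type, uint64_t offset,
				     bool uffd_wp)
{
	model_pte_t p = { MODEL_PTE_SWAP, type, offset, uffd_wp };
	return p;
}

/* A marker is a swap pte of the marker type; the marker is its offset. */
static inline model_pte_t model_marker(uint64_t marker)
{
	return model_swap(MODEL_SWP_PTE_MARKER, marker, false);
}

/* Form 2: a pte marker carrying the uffd-wp marker bit. */
static inline bool model_pte_marker_uffd_wp(model_pte_t p)
{
	return p.kind == MODEL_PTE_SWAP &&
	       p.swp_type == MODEL_SWP_PTE_MARKER &&
	       (p.swp_offset & MODEL_PTE_MARKER_UFFD_WP);
}

/* True if the swap pte is wr-protected in either form. */
static inline bool model_pte_swp_uffd_wp_any(model_pte_t p)
{
	if (p.kind != MODEL_PTE_SWAP)
		return false;
	if (p.swp_uffd_wp)	/* form 1: normal swap pte with the wp bit */
		return true;
	return model_pte_marker_uffd_wp(p);
}
```

None and present ptes are never reported as wr-protected here, and a marker
whose uffd-wp bit is clear is not reported either.
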
Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/ Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/ Suggested-by: Andrea Arcangeli Suggested-by: Hugh Dickins Signed-off-by: Peter Xu Reported-by: kernel test robot --- include/linux/swapops.h | 3 ++- include/linux/userfaultfd_k.h | 43 +++++++++++++++++++++++++++++++++++ mm/Kconfig | 9 ++++++++ 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 7a00627845f0..fffbba0036f6 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -276,7 +276,8 @@ static inline int is_readable_migration_entry(swp_entry= _t entry) =20 typedef unsigned long pte_marker; =20 -#define PTE_MARKER_MASK (0) +#define PTE_MARKER_UFFD_WP BIT(0) +#define PTE_MARKER_MASK (PTE_MARKER_UFFD_WP) =20 #ifdef CONFIG_PTE_MARKER =20 diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 33cea484d1ad..bd09c3c89b59 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -15,6 +15,8 @@ =20 #include #include +#include +#include #include =20 /* The set of all possible UFFD-related VM flags. */ @@ -236,4 +238,45 @@ static inline void userfaultfd_unmap_complete(struct m= m_struct *mm, =20 #endif /* CONFIG_USERFAULTFD */ =20 +static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry) +{ + return is_pte_marker_entry(entry) && + (pte_marker_get(entry) & PTE_MARKER_UFFD_WP); +} + +static inline bool pte_marker_uffd_wp(pte_t pte) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + swp_entry_t entry; + + if (!is_swap_pte(pte)) + return false; + + entry =3D pte_to_swp_entry(pte); + + return pte_marker_entry_uffd_wp(entry); +#else + return false; +#endif +} + +/* + * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er + * forms (pte marker or a normal swap pte), false otherwise. 
+ */
+static inline bool pte_swp_uffd_wp_any(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	if (!is_swap_pte(pte))
+		return false;
+
+	if (pte_swp_uffd_wp(pte))
+		return true;
+
+	if (pte_marker_uffd_wp(pte))
+		return true;
+#endif
+	return false;
+}
+
 #endif /* _LINUX_USERFAULTFD_K_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index a1688b9314b2..6e7c2d59fa96 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -915,6 +915,15 @@ config PTE_MARKER
 	help
 	  Allows to create marker PTEs for file-backed memory.
 
+config PTE_MARKER_UFFD_WP
+	bool "Marker PTEs support for userfaultfd write protection"
+	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
+
+	help
+	  Allows to create marker PTEs for userfaultfd write protection
+	  purposes.  It is required to enable userfaultfd write protection on
+	  file-backed memory types like shmem and hugetlbfs.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 05/23] mm/shmem: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 4 Apr 2022 21:48:41 -0400
Message-Id: <20220405014841.14185-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>

Pass wp_copy into shmem_mfill_atomic_pte() through the stack, then apply the
UFFD_WP bit properly when the UFFDIO_COPY on shmem is issued with
UFFDIO_COPY_MODE_WP.  wp_copy finally lands in mfill_atomic_install_pte().

Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(),
as mk_pte() could return a writable pte (e.g., when VM_SHARED on a shmem
file).

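The write-bit handling described in the note above can be modelled outside
the kernel.  The sketch below is a userspace toy, not kernel code: the
TOY_PTE_* bits and toy_install_flags() are hypothetical names made up for
illustration, and only the control flow is meant to mirror what this patch
does in mfill_atomic_install_pte().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy pte; real pte layouts are arch-specific. */
#define TOY_PTE_WRITE   (UINT64_C(1) << 0)
#define TOY_PTE_UFFD_WP (UINT64_C(1) << 1)

/*
 * Toy model of the flag selection.  "pte" stands for whatever mk_pte()
 * returned, which may already carry the write bit (e.g. for a VM_SHARED
 * shmem mapping).
 */
static uint64_t toy_install_flags(uint64_t pte, bool writable, bool wp_copy)
{
	if (wp_copy) {
		pte |= TOY_PTE_UFFD_WP;
		writable = false;	/* a wr-protected pte must not be writable */
	}

	if (writable)
		pte |= TOY_PTE_WRITE;
	else
		pte &= ~TOY_PTE_WRITE;	/* drop a write bit inherited from mk_pte() */

	return pte;
}
```

Without the final else branch, a pte that arrived writable from mk_pte()
would stay writable even when wp_copy is set, which is the case the note
warns about.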
Signed-off-by: Peter Xu
---
 include/linux/shmem_fs.h |  4 ++--
 mm/shmem.c               |  4 ++--
 mm/userfaultfd.c         | 23 ++++++++++++++++++-----
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 3e915cc550bc..a68f982f22d1 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -145,11 +145,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				  struct vm_area_struct *dst_vma,
 				  unsigned long dst_addr,
 				  unsigned long src_addr,
-				  bool zeropage,
+				  bool zeropage, bool wp_copy,
 				  struct page **pagep);
 #else /* !CONFIG_SHMEM */
 #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
-			       src_addr, zeropage, pagep) ({ BUG(); 0; })
+			       src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
 #endif /* CONFIG_SHMEM */
 #endif /* CONFIG_USERFAULTFD */
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 7004c7f55716..9efb8a96d75e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2319,7 +2319,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
-			   bool zeropage,
+			   bool zeropage, bool wp_copy,
 			   struct page **pagep)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2392,7 +2392,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release;
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, true, false);
+				       page, true, wp_copy);
 	if (ret)
 		goto out_delete_from_cache;
 
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index dae25d985d15..b1c875b77fbb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -77,10 +77,19 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	 * Always mark a PTE as write-protected when needed, regardless of
 	 * VM_WRITE, which the user might change.
 	 */
-	if (wp_copy)
+	if (wp_copy) {
 		_dst_pte = pte_mkuffd_wp(_dst_pte);
-	else if (writable)
+		writable = false;
+	}
+
+	if (writable)
 		_dst_pte = pte_mkwrite(_dst_pte);
+	else
+		/*
+		 * We need this to make sure write bit removed; as mk_pte()
+		 * could return a pte with write bit set.
+		 */
+		_dst_pte = pte_wrprotect(_dst_pte);
 
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
@@ -95,7 +104,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	}
 
 	ret = -EEXIST;
-	if (!pte_none(*dst_pte))
+	/*
+	 * We allow to overwrite a pte marker: consider when both MISSING|WP
+	 * registered, we firstly wr-protect a none pte which has no page cache
+	 * page backing it, then access the page.
+	 */
+	if (!pte_none_mostly(*dst_pte))
 		goto out_unlock;
 
 	if (page_in_cache) {
@@ -479,11 +493,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr);
 	} else {
-		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
 					     mode != MCOPY_ATOMIC_NORMAL,
-					     page);
+					     wp_copy, page);
 	}
 
 	return err;
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler
Date: Mon, 4 Apr 2022 21:48:44 -0400
Message-Id: <20220405014844.14239-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>

File-backed memory is prone to unmap/swap, so the ptes are always unstable,
because they can easily be faulted back later using the page cache.  This
could lead to uffd-wp getting lost when unmapping or swapping out such
memory.  One example is shmem.  PTE markers are needed to store that
information.

This patch prepares for it by handling uffd-wp pte markers before they are
applied elsewhere, so that the page fault handler can recognize uffd-wp pte
markers.

The handling of uffd-wp pte markers is similar to a missing fault; it's just
that we'll handle this "missing fault" when we see the pte markers, and
meanwhile we need to make sure the marker information is kept while
processing the fault.

This is a slow path of uffd-wp handling, because zapping of wr-protected
shmem ptes should be rare.  So far it should only trigger in two conditions:

  (1) When trying to punch holes in shmem_fallocate(), there is an
      optimization to zap the pgtables before evicting the page.

  (2) When swapping out shmem pages.

Because of this, the page fault handling is simplified too by not sending
the wr-protect message in the 1st page fault; instead the page will be
installed read-only, so the uffd-wp message will be generated in the next
fault, which will trigger the do_wp_page() path of general uffd-wp handling.

Disable fault-around for all uffd-wp registered ranges for extra safety just
like uffd-minor fault, and clean the code up.

Signed-off-by: Peter Xu
---
 include/linux/userfaultfd_k.h | 17 +++++++++
 mm/memory.c                   | 67 ++++++++++++++++++++++++++++++-----
 2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index bd09c3c89b59..827e38b7be65 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -96,6 +96,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
 	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
 }
 
+/*
+ * Don't do fault around for either WP or MINOR registered uffd range.  For
+ * MINOR registered range, fault around will be a total disaster and ptes can
+ * be installed without notifications; for WP it should mostly be fine as long
+ * as the fault around checks for pte_none() before the installation, however
+ * to be super safe we just forbid it.
+ */
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
@@ -236,6 +248,11 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 {
 }
 
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 #endif /* CONFIG_USERFAULTFD */
 
 static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
diff --git a/mm/memory.c b/mm/memory.c
index b1af996b09ca..21abb8a30553 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3559,6 +3559,39 @@ static inline bool should_try_to_free_swap(struct page *page,
 		page_count(page) == 2;
 }
 
+static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
+{
+	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+				       vmf->address, &vmf->ptl);
+	/*
+	 * Be careful so that we will only recover a special uffd-wp pte into a
+	 * none pte.  Otherwise it means the pte could have changed, so retry.
+	 */
+	if (is_pte_marker(*vmf->pte))
+		pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
+}
+
+/*
+ * This is actually a page-missing access, but with uffd-wp special pte
+ * installed.  It means this pte was wr-protected before being unmapped.
+ */
+static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
+{
+	/*
+	 * Just in case there're leftover special ptes even after the region
+	 * got unregistered - we can simply clear them.  We can also do that
+	 * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp
+	 * ranges, but it should be more efficient to be done lazily here.
+	 */
+	if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
+		return pte_marker_clear(vmf);
+
+	/* do_fault() can handle pte markers too like none pte */
+	return do_fault(vmf);
+}
+
 static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 {
 	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
@@ -3572,8 +3605,11 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
 		return VM_FAULT_SIGBUS;
 
-	/* TODO: handle pte markers */
-	return 0;
+	if (pte_marker_entry_uffd_wp(entry))
+		return pte_marker_handle_uffd_wp(vmf);
+
+	/* This is an unknown pte marker */
+	return VM_FAULT_SIGBUS;
 }
 
 /*
@@ -4157,6 +4193,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool prefault = vmf->address != addr;
 	pte_t entry;
@@ -4171,6 +4208,8 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (unlikely(uffd_wp))
+		entry = pte_mkuffd_wp(pte_wrprotect(entry));
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4344,9 +4383,21 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 }
 
+/* Return true if we should do read fault-around, false otherwise */
+static inline bool should_fault_around(struct vm_fault *vmf)
+{
+	/* No ->map_pages?  No way to fault around... */
+	if (!vmf->vma->vm_ops->map_pages)
+		return false;
+
+	if (uffd_disable_fault_around(vmf->vma))
+		return false;
+
+	return fault_around_bytes >> PAGE_SHIFT > 1;
+}
+
 static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;
 
 	/*
@@ -4354,12 +4405,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		if (likely(!userfaultfd_minor(vmf->vma))) {
-			ret = do_fault_around(vmf);
-			if (ret)
-				return ret;
-		}
+	if (should_fault_around(vmf)) {
+		ret = do_fault_around(vmf);
+		if (ret)
+			return ret;
 	}
 
 	ret = __do_fault(vmf);
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 07/23] mm/shmem: Persist uffd-wp bit across zapping for file-backed
Date: Mon, 4 Apr 2022 21:48:47 -0400
Message-Id: <20220405014847.14295-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>

File-backed memory is prone to being unmapped at any time.  It means all
information in the pte will be dropped, including the uffd-wp flag.

To persist the uffd-wp flag, we'll use the pte markers.  This patch teaches
the zap code to understand uffd-wp and know when to keep or drop the uffd-wp
bit.

Add a new flag ZAP_FLAG_DROP_MARKER and set it in zap_details when we don't
want to persist such information, for example, when destroying the whole
vma, or punching a hole in a shmem file.  In all other cases we should never
drop the uffd-wp bit, or the wr-protect information will get lost.

The new ZAP_FLAG_DROP_MARKER needs to be put into mm.h rather than memory.c
because it'll be further referenced in hugetlb files later.

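The keep-or-drop rule described above can be summarised as a small decision
function.  This is a userspace sketch under stated assumptions, not the
kernel implementation: the toy_* types and TOY_ZAP_FLAG_DROP_MARKER are
hypothetical stand-ins that carry only the fields the decision needs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ZAP_FLAG_DROP_MARKER. */
#define TOY_ZAP_FLAG_DROP_MARKER (1u << 0)

/* Hypothetical stand-in for struct zap_details; only the flags matter here. */
struct toy_zap_details {
	unsigned int zap_flags;
};

/* Hypothetical stand-in for the vma properties the decision looks at. */
struct toy_vma {
	bool anonymous;		/* models vma_is_anonymous() */
	bool uffd_wp;		/* models userfaultfd_wp(), i.e. VM_UFFD_WP set */
};

/*
 * Returns true if a uffd-wp marker should replace the pte that was just
 * cleared.  "was_uffd_wp" means the old pte was wr-protected in any form
 * (present pte, swap pte, or an already installed marker).
 */
static bool toy_keep_uffd_wp_marker(const struct toy_zap_details *details,
				    const struct toy_vma *vma,
				    bool was_uffd_wp)
{
	/* The caller asked to drop markers, e.g. when tearing down the vma. */
	if (details && (details->zap_flags & TOY_ZAP_FLAG_DROP_MARKER))
		return false;

	/* Markers only matter for file-backed, uffd-wp registered ranges. */
	if (vma->anonymous || !vma->uffd_wp)
		return false;

	return was_uffd_wp;
}
```

Note that a NULL details pointer is treated as "do not drop", so the default
for any zap path that passes no details is to preserve the marker.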
Signed-off-by: Peter Xu
---
 include/linux/mm.h        | 10 ++++++++
 include/linux/mm_inline.h | 43 ++++++++++++++++++++++++++++++++++
 mm/memory.c               | 49 ++++++++++++++++++++++++++++++++++++---
 mm/rmap.c                 |  8 +++++++
 4 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 26428ff262fc..857bc8f7af45 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3422,4 +3422,14 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 }
 #endif
 
+typedef unsigned int __bitwise zap_flags_t;
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory.  This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma.  By
+ * default, the flag is not set.
+ */
+#define  ZAP_FLAG_DROP_MARKER        ((__force zap_flags_t) BIT(0))
+
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ac32125745ab..7b25b53c474a 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -6,6 +6,8 @@
 #include
 #include
 #include
+#include
+#include
 
 /**
  * folio_is_file_lru - Should the folio be on a file LRU or anon LRU?
@@ -316,5 +318,46 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 	return atomic_read(&mm->tlb_flush_pending) > 1;
 }
 
+/*
+ * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
+ * replace a none pte.  NOTE!  This should only be called when *pte is already
+ * cleared so we will never accidentally replace something valuable.  Meanwhile
+ * none pte also means we are not demoting the pte so tlb flushed is not needed.
+ * E.g., when pte cleared the caller should have taken care of the tlb flush.
+ *
+ * Must be called with pgtable lock held so that no thread will see the none
+ * pte, and if they see it, they'll fault and serialize at the pgtable lock.
+ *
+ * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled.
+ */
+static inline void
+pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t *pte, pte_t pteval)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	bool arm_uffd_pte = false;
+
+	/* The current status of the pte should be "cleared" before calling */
+	WARN_ON_ONCE(!pte_none(*pte));
+
+	if (vma_is_anonymous(vma) || !userfaultfd_wp(vma))
+		return;
+
+	/* A uffd-wp wr-protected normal pte */
+	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	/*
+	 * A uffd-wp wr-protected swap pte.  Note: this should even cover an
+	 * existing pte marker with uffd-wp bit set.
+	 */
+	if (unlikely(pte_swp_uffd_wp_any(pteval)))
+		arm_uffd_pte = true;
+
+	if (unlikely(arm_uffd_pte))
+		set_pte_at(vma->vm_mm, addr, pte,
+			   make_pte_marker(PTE_MARKER_UFFD_WP));
+#endif
+}
 
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 21abb8a30553..1144845ff734 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -74,6 +74,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1306,6 +1307,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 	bool even_cows;			/* Zap COWed private pages too? */
+	zap_flags_t zap_flags;		/* Extra flags for zapping */
 };
 
 /* Whether we should zap all COWed (private) pages too */
@@ -1334,6 +1336,29 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
 	return !PageAnon(page);
 }
 
+static inline bool zap_drop_file_uffd_wp(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_DROP_MARKER;
+}
+
+/*
+ * This function makes sure that we'll replace the none pte with an uffd-wp
+ * swap special pte marker when necessary. Must be with the pgtable lock held.
+ */
+static inline void
+zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
+			      unsigned long addr, pte_t *pte,
+			      struct zap_details *details, pte_t pteval)
+{
+	if (zap_drop_file_uffd_wp(details))
+		return;
+
+	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1371,6 +1396,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
+			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+						      ptent);
 			if (unlikely(!page))
 				continue;
 
@@ -1401,6 +1428,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			page = pfn_swap_entry_to_page(entry);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
+			/*
+			 * Both device private/exclusive mappings should only
+			 * work with anonymous page so far, so we don't need to
+			 * consider uffd-wp bit when zap. For more information,
+			 * see zap_install_uffd_wp_if_needed().
+			 */
+			WARN_ON_ONCE(!vma_is_anonymous(vma));
 			rss[mm_counter(page)]--;
 			if (is_device_private_entry(entry))
 				page_remove_rmap(page, vma, false);
@@ -1417,8 +1451,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (!should_zap_page(details, page))
 				continue;
 			rss[mm_counter(page)]--;
-		} else if (is_pte_marker_entry(entry)) {
-			/* By default, simply drop all pte markers when zap */
+		} else if (pte_marker_entry_uffd_wp(entry)) {
+			/* Only drop the uffd-wp marker if explicitly requested */
+			if (!zap_drop_file_uffd_wp(details))
+				continue;
 		} else if (is_hwpoison_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
@@ -1427,6 +1463,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			WARN_ON_ONCE(1);
 		}
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
@@ -1637,12 +1674,17 @@ void unmap_vmas(struct mmu_gather *tlb,
 		unsigned long end_addr)
 {
 	struct mmu_notifier_range range;
+	struct zap_details details = {
+		.zap_flags = ZAP_FLAG_DROP_MARKER,
+		/* Careful - we need to zap private pages too! */
+		.even_cows = true,
+	};
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
-		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
+		unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
 	mmu_notifier_invalidate_range_end(&range);
 }
 
@@ -3438,6 +3480,7 @@ void unmap_mapping_folio(struct folio *folio)
 
 	details.even_cows = false;
 	details.single_folio = folio;
+	details.zap_flags = ZAP_FLAG_DROP_MARKER;
 
 	i_mmap_lock_read(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 208b2c683cec..69416072b1a6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -73,6 +73,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1538,6 +1539,13 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
+		/*
+		 * Now the pte is cleared. If this pte was uffd-wp armed,
+		 * we may want to replace a none pte with a marker pte if
+		 * it's file-backed, so we don't lose the tracking info.
+		 */
+		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
 		/* Set the dirty flag on the folio now the pte is gone. */
 		if (pte_dirty(pteval))
 			folio_mark_dirty(folio);
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=l2+qvTtHyC51V4BqYHYp23OrN7TjCUKrDVYDSF5MCVo=; b=hDzpImuEbXQehT0BXGidS8eXFKxniUbLYfFKy9CuPTvWxKKAqAQWWfxnst/eWoj/T+ uNJYNtzYscuVsB+RQKx63IIEcboydjKfBRSB8kNyiuhBcTDTPFwAYjm8iRcI3cEk3xvp FwBe5gK+uFFu2Tfi4wQcMtau96xhWpZ25D56uJFnL5/Exrh8f0DMmwQKmaHu0YIv3m9F 294fyKW7e2+3/cp+JCXBh3+7KUB2qx00pQHFg7Z2pXr+sL7NS+IofcmW/ZH6MwG00gg8 I267q5UBU7MSBFwzuqbuRf+TDrB3BQgxiFSs+B/J76YD+Ng+2wm0huugE2VshmNM96YK 3OVg== X-Gm-Message-State: AOAM532/vbF5sdCQES/EMtA9lTjaecQsCeCAiYgTtd8gnb0vyFwA/QV9 MR7PfnZSBUOVSJz3k/Gqa8IOPRFYqByevp+OVxyQpHGHegN/+bccj5+curXIz5LriwQz+jx6Q0M ic+Q515Wsx5oWk/bbS3LRtHEwBWyYUa7M6nRU2A9W5KbTVKBksTZbjQsRiC4JtqsVB/1/fYd8lw == X-Received: by 2002:a05:6602:14cb:b0:646:3b7d:6aee with SMTP id b11-20020a05660214cb00b006463b7d6aeemr609375iow.178.1649123332981; Mon, 04 Apr 2022 18:48:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwC5SfA3HnFu2jqAo60amfsdIfOUDVqgatfBUTtv0HUIBqFTM3g2TfSh/uOX6Ux3OnPnNRg4g== X-Received: by 2002:a05:6602:14cb:b0:646:3b7d:6aee with SMTP id b11-20020a05660214cb00b006463b7d6aeemr609348iow.178.1649123332673; Mon, 04 Apr 2022 18:48:52 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id g9-20020a056e020d0900b002ca5573dfe8sm514842ilj.22.2022.04.04.18.48.51 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:52 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 08/23] mm/shmem: Allow uffd wr-protect none pte for file-backed mem Date: Mon, 4 Apr 2022 21:48:50 -0400 Message-Id: <20220405014850.14352-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" File-backed memory differs from anonymous memory in that even if the pte is missing, the data could still resides either in the file or in page/swap ca= che. So when wr-protect a pte, we need to consider none ptes too. We do that by installing the uffd-wp pte markers when necessary. So when there's a future write to the pte, the fault handler will go the special pa= th to first fault-in the page as read-only, then report to userfaultfd server = with the wr-protect message. On the other hand, when unprotecting a page, it's also possible that the pte got unmapped but replaced by the special uffd-wp marker. Then we'll need t= o be able to recover from a uffd-wp pte marker into a none pte, so that the next access to the page will fault in correctly as usual when accessed the next time. Special care needs to be taken throughout the change_protection_range() process. Since now we allow user to wr-protect a none pte, we need to be a= ble to pre-populate the page table entries if we see (!anonymous && MM_CP_UFFD_= WP) requests, otherwise change_protection_range() will always skip when the pgt= able entry does not exist. For example, the pgtable can be missing for a whole chunk of 2M pmd, but the page cache can exist for the 2M range. 
When we want to wr-protect one 4K p= age within the 2M pmd range, we need to pre-populate the pgtable and install the pte marker showing that we want to get a message and block the thread when = the page cache of that 4K page is written. Without pre-populating the pmd, change_protection() will simply skip that whole pmd. Note that this patch only covers the small pages (pte level) but not coveri= ng any of the transparent huge pages yet. That will be done later, and this p= atch will be a preparation for it too. Signed-off-by: Peter Xu --- mm/mprotect.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index 709a6f73b764..bd62d5938c6c 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -188,8 +189,16 @@ static unsigned long change_pte_range(struct vm_area_s= truct *vma, pmd_t *pmd, newpte =3D pte_swp_mksoft_dirty(newpte); if (pte_swp_uffd_wp(oldpte)) newpte =3D pte_swp_mkuffd_wp(newpte); - } else if (is_pte_marker_entry(entry)) { - /* Skip it, the same as none pte */ + } else if (pte_marker_entry_uffd_wp(entry)) { + /* + * If this is uffd-wp pte marker and we'd like + * to unprotect it, drop it; the next page + * fault will trigger without uffd trapping. + */ + if (uffd_wp_resolve) { + pte_clear(vma->vm_mm, addr, pte); + pages++; + } continue; } else { newpte =3D oldpte; @@ -204,6 +213,20 @@ static unsigned long change_pte_range(struct vm_area_s= truct *vma, pmd_t *pmd, set_pte_at(vma->vm_mm, addr, pte, newpte); pages++; } + } else { + /* It must be an none page, or what else?.. */ + WARN_ON_ONCE(!pte_none(oldpte)); + if (unlikely(uffd_wp && !vma_is_anonymous(vma))) { + /* + * For file-backed mem, we need to be able to + * wr-protect a none pte, because even if the + * pte is none, the page/swap cache could + * exist. Doing that by install a marker. 
+ */ + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); + pages++; + } } } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); arch_leave_lazy_mmu_mode(); @@ -237,6 +260,39 @@ static inline int pmd_none_or_clear_bad_unless_trans_h= uge(pmd_t *pmd) return 0; } =20 +/* Return true if we're uffd wr-protecting file-backed memory, or false */ +static inline bool +uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags) +{ + return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma); +} + +/* + * If wr-protecting the range for file-backed, populate pgtable for the ca= se + * when pgtable is empty but page cache exists. When {pte|pmd|...}_alloc() + * failed it means no memory, we don't have a better option but stop. + */ +#define change_pmd_prepare(vma, pmd, cp_flags) \ + do { \ + if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \ + if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd))) \ + break; \ + } \ + } while (0) +/* + * This is the general pud/p4d/pgd version of change_pmd_prepare(). We nee= d to + * have separate change_pmd_prepare() because pte_alloc() returns 0 on suc= cess, + * while {pmd|pud|p4d}_alloc() returns the valid pointer on success. + */ +#define change_prepare(vma, high, low, addr, cp_flags) \ + do { \ + if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \ + low##_t *p =3D low##_alloc(vma->vm_mm, high, addr); \ + if (WARN_ON_ONCE(p =3D=3D NULL)) \ + break; \ + } \ + } while (0) + static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) @@ -255,6 +311,7 @@ static inline unsigned long change_pmd_range(struct vm_= area_struct *vma, =20 next =3D pmd_addr_end(addr, end); =20 + change_pmd_prepare(vma, pmd, cp_flags); /* * Automatic NUMA balancing walks the tables with mmap_lock * held for read. 
It's possible a parallel update to occur @@ -320,6 +377,7 @@ static inline unsigned long change_pud_range(struct vm_= area_struct *vma, pud =3D pud_offset(p4d, addr); do { next =3D pud_addr_end(addr, end); + change_prepare(vma, pud, pmd, addr, cp_flags); if (pud_none_or_clear_bad(pud)) continue; pages +=3D change_pmd_range(vma, pud, addr, next, newprot, @@ -340,6 +398,7 @@ static inline unsigned long change_p4d_range(struct vm_= area_struct *vma, p4d =3D p4d_offset(pgd, addr); do { next =3D p4d_addr_end(addr, end); + change_prepare(vma, p4d, pud, addr, cp_flags); if (p4d_none_or_clear_bad(p4d)) continue; pages +=3D change_pud_range(vma, p4d, addr, next, newprot, @@ -365,6 +424,7 @@ static unsigned long change_protection_range(struct vm_= area_struct *vma, inc_tlb_flush_pending(mm); do { next =3D pgd_addr_end(addr, end); + change_prepare(vma, pgd, p4d, addr, cp_flags); if (pgd_none_or_clear_bad(pgd)) continue; pages +=3D change_p4d_range(vma, pgd, addr, next, newprot, --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0502BC433F5 for ; Tue, 5 Apr 2022 02:42:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230071AbiDECoA (ORCPT ); Mon, 4 Apr 2022 22:44:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49470 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229914AbiDECnR (ORCPT ); Mon, 4 Apr 2022 22:43:17 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 66FCC39A6BF for ; Mon, 4 Apr 2022 18:48:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123337; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9Tho0jTi/mP2SxzzY9PiukQljtVmr59NjLeFBDKY3mg=; b=VvYOROd87jq+Jg2/XfDAYGNmhXghpTkG00CR2PUzRMNtHtM8tMJOnheAERPmyU9s4n/G/i 5a5Gp4XQO2UvH0fjCGQQ+HosVBnJAtD3z5fStbt6/jk3l4iSbSAMZz9fpmyrBfOgCb4z6a thBjITEovDD8fcCcg4/ahIprgOJz/QA= Received: from mail-il1-f199.google.com (mail-il1-f199.google.com [209.85.166.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-116-F3aZemvIOrizMzRkvM_hLw-1; Mon, 04 Apr 2022 21:48:56 -0400 X-MC-Unique: F3aZemvIOrizMzRkvM_hLw-1 Received: by mail-il1-f199.google.com with SMTP id m3-20020a056e02158300b002b6e3d1f97cso7181907ilu.19 for ; Mon, 04 Apr 2022 18:48:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9Tho0jTi/mP2SxzzY9PiukQljtVmr59NjLeFBDKY3mg=; b=3ygzHfw73x/LbAoxCzoKI3d7UiwMg/yDhzIoQ2spn3i/c0pOsCuwklI8u/UOoIM/mh 2Wu440nsFtpNyqJWZQDhn8u3SKCnYa88oI3YJa6W99CYXVXb3qdWxSRuVJLc9NlAsuM0 qfnB5FDyi1EOG/GyUK4rOK8CvT2qX/3NjxVmQ0FOBV/vLaF0qHBqoBu+jQKE0p6XQ4M8 G7vXKDJHvQ5bwpTjnNb2JIZCM2jW/Fp3M40dE1siO6BeiCXyLq2AQThng+1moHKZRz45 M6yDZ1FEfbDm0w/40CUG58w+f8qZVKISD3PBvPV56Ly3Z+LjugCiOSKTAInIaIKziczv v+Hg== X-Gm-Message-State: AOAM532csGB5kwIJxbQ/tEemkxCNU0AflItFAVv4/SeqkyzhUXQ399dT +BuP18qCn4QCZbshrF0D11NtYGumjhnY3Q5BtuWrnl6c+ZX0ldu0y7jtzC1UEL4Pl5cCWfAxOt0 YEI2z9t7+gG1R/M7KUMQd9aGAzHEJKZs7wGzt35H4mEJj0lU2VS6q/sebSiA+kEoIg89e0UJKjA == X-Received: by 2002:a6b:e60a:0:b0:646:3e9e:172f with SMTP id g10-20020a6be60a000000b006463e9e172fmr604274ioh.1.1649123335833; Mon, 04 Apr 2022 18:48:55 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwEUfWYkT5v8wLM/eubINNRKFmNMGPSIHFsXbNqMBzFCqBGDzRGSMNc6UHIgjk9FfldR5FQwA== 
X-Received: by 2002:a6b:e60a:0:b0:646:3e9e:172f with SMTP id g10-20020a6be60a000000b006463e9e172fmr604247ioh.1.1649123335505; Mon, 04 Apr 2022 18:48:55 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id k6-20020a6b4006000000b00649d7111ebasm7563860ioa.0.2022.04.04.18.48.54 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:55 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 09/23] mm/shmem: Allows file-back mem to be uffd wr-protected on thps Date: Mon, 4 Apr 2022 21:48:52 -0400 Message-Id: <20220405014852.14413-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" We don't have "huge" version of pte markers, instead when necessary we split the thp. However split the thp is not enough, because file-backed thp is handled tot= ally differently comparing to anonymous thps: rather than doing a real split, the thp pmd will simply got cleared in __split_huge_pmd_locked(). That is not enough if e.g. when there is a thp covers range [0, 2M) but we = want to wr-protect small page resides in [4K, 8K) range, because after __split_huge_pmd() returns, there will be a none pmd, and change_pmd_range() will just skip it right after the split. 
Here we leverage the previously introduced change_pmd_prepare() macro so th= at we'll populate the pmd with a pgtable page after the pmd split (in which process the pmd will be cleared for cases like shmem). Then change_pte_ran= ge() will do all the rest for us by installing the uffd-wp pte marker at any none pte that we'd like to wr-protect. Signed-off-by: Peter Xu --- mm/mprotect.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index bd62d5938c6c..e0a567b66d07 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -333,8 +333,15 @@ static inline unsigned long change_pmd_range(struct vm= _area_struct *vma, } =20 if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { - if (next - addr !=3D HPAGE_PMD_SIZE) { + if ((next - addr !=3D HPAGE_PMD_SIZE) || + uffd_wp_protect_file(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); + /* + * For file-backed, the pmd could have been + * cleared; make sure pmd populated if + * necessary, then fall-through to pte level. 
+ */ + change_pmd_prepare(vma, pmd, cp_flags); } else { int nr_ptes =3D change_huge_pmd(vma, pmd, addr, newprot, cp_flags); --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22792C433EF for ; Tue, 5 Apr 2022 02:41:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229822AbiDECnR (ORCPT ); Mon, 4 Apr 2022 22:43:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229912AbiDECnG (ORCPT ); Mon, 4 Apr 2022 22:43:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A2EB23178DC for ; Mon, 4 Apr 2022 18:49:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123340; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xu2h9LPZD95UgWg3OX67cvf1aND/vrgeeBhE/APRTdQ=; b=KDjDbiebLj463dZtcD2oziRaTvUBtp2CZ4N9glLWvN1cfKgXiBYs0bBd3nBDWTtb2OV3BT 5uRklzsc5uOqX7UQEzZNaMi+uMiDO3ad8V8ys8ZcAX8amLJNS/IFjcRnCe4eNvaZU/R1Cl D4j9PCWAGn+uJ2k7VOB6BdGGswcMkro= Received: from mail-il1-f199.google.com (mail-il1-f199.google.com [209.85.166.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-164-SX5xPaY2NFCZ0xzL6qfUfg-1; Mon, 04 Apr 2022 21:48:59 -0400 X-MC-Unique: SX5xPaY2NFCZ0xzL6qfUfg-1 Received: by mail-il1-f199.google.com with SMTP id h13-20020a056e021d8d00b002c7fb1ec601so7204222ila.6 for ; Mon, 04 Apr 2022 18:48:59 -0700 (PDT) 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Xu2h9LPZD95UgWg3OX67cvf1aND/vrgeeBhE/APRTdQ=; b=uMlauL5KV9l3yICX3creE+o992jTDKd2SpHcL5Y8IvU3bR4BUeeanIeNAm0XKRmAmi EyeY3HEpYrUmdj03SMvRoGYrW0RuiuQF3szOIwcLwZQNCLiKHxAws+xG4DmNxV3as2W/ da1+He6mLUessB59Y1MIINsy2Ynij58tjKPaPu90bsBDadQKXAuY49o3J3VSOWqKdUhC pchHf3Waxu1YFPFJ71Ng7F56rCC0NsXlMYE643jk4XRyx7b2SigiF91poLN9O9an3eiD ngKwtPUWgKkggArnyZmXA6krH0tt644zE5n1VOLnbFhJ6wzclE5em8/659Qyz2RDgGjp yHug== X-Gm-Message-State: AOAM531e5IjS2IbSZ/DxRya6OtEeHd/hPhfVRW3sF5DN1G9slLUciZY8 ScHSG1HgKJ6LDodbsxsikH38c1u752LZbRtCFXwqnYakvWOb6hhWhdmPNbbl+16VJIORNtcwImn sG8AezBkTwrHZALEe2YvxN/pQHzHqBEnPN2gnGFXdGmqELnICS7dwtgoejq3g9M0jBr6wGFOimQ == X-Received: by 2002:a05:6638:d87:b0:323:c006:3650 with SMTP id l7-20020a0566380d8700b00323c0063650mr709891jaj.64.1649123338509; Mon, 04 Apr 2022 18:48:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwMFmhYcxtKesF8e8jA4ez0VAvz2QlCxEYSpJ9E7Y0EzGtvgbMSWc0UDKb86qoEwbB54Mdr3g== X-Received: by 2002:a05:6638:d87:b0:323:c006:3650 with SMTP id l7-20020a0566380d8700b00323c0063650mr709862jaj.64.1649123338140; Mon, 04 Apr 2022 18:48:58 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id u15-20020a92d1cf000000b002ca56804ec4sm473668ilg.23.2022.04.04.18.48.56 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:48:57 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 10/23] mm/shmem: Handle uffd-wp during fork() Date: Mon, 4 Apr 2022 21:48:55 -0400 Message-Id: <20220405014855.14468-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Normally we skip copy page when fork() for VM_SHARED shmem, but we can't sk= ip it anymore if uffd-wp is enabled on dst vma. This should only happen when = the src uffd has UFFD_FEATURE_EVENT_FORK enabled on uffd-wp shmem vma, so that VM_UFFD_WP will be propagated onto dst vma too, then we should copy the pgtables with uffd-wp bit and pte markers, because these information will be lost otherwise. Since the condition checks will become even more complicated for deciding "whether a vma needs to copy the pgtable during fork()", introduce a helper vma_needs_copy() for it, so everything will be clearer. Signed-off-by: Peter Xu Reported-by: kernel test robot --- mm/memory.c | 49 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 41 insertions(+), 8 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 1144845ff734..8ba1bb196095 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -867,6 +867,14 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct m= m_struct *src_mm, if (try_restore_exclusive_pte(src_pte, src_vma, addr)) return -EBUSY; return -ENOENT; + } else if (is_pte_marker_entry(entry)) { + /* + * We're copying the pgtable should only because dst_vma has + * uffd-wp enabled, do sanity check. 
+ */ + WARN_ON_ONCE(!userfaultfd_wp(dst_vma)); + set_pte_at(dst_mm, addr, dst_pte, pte); + return 0; } if (!userfaultfd_wp(dst_vma)) pte =3D pte_swp_clear_uffd_wp(pte); @@ -1221,6 +1229,38 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma, return 0; } =20 +/* + * Return true if the vma needs to copy the pgtable during this fork(). R= eturn + * false when we can speed up fork() by allowing lazy page faults later un= til + * when the child accesses the memory range. + */ +bool +vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_= vma) +{ + /* + * Always copy pgtables when dst_vma has uffd-wp enabled even if it's + * file-backed (e.g. shmem). Because when uffd-wp is enabled, pgtable + * contains uffd-wp protection information, that's something we can't + * retrieve from page cache, and skip copying will lose those info. + */ + if (userfaultfd_wp(dst_vma)) + return true; + + if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) + return true; + + if (src_vma->anon_vma) + return true; + + /* + * Don't copy ptes where a page fault will fill them correctly. Fork + * becomes much lighter when there are big shared or private readonly + * mappings. The tradeoff is that copy_page_range is more efficient + * than faulting. + */ + return false; +} + int copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src= _vma) { @@ -1234,14 +1274,7 @@ copy_page_range(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma) bool is_cow; int ret; =20 - /* - * Don't copy ptes where a page fault will fill them correctly. - * Fork becomes much lighter when there are big shared or private - * readonly mappings. The tradeoff is that copy_page_range is more - * efficient than faulting. 
- */ - if (!(src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) && - !src_vma->anon_vma) + if (!vma_needs_copy(dst_vma, src_vma)) return 0; =20 if (is_vm_hugetlb_page(src_vma)) --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93293C433EF for ; Tue, 5 Apr 2022 02:41:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229810AbiDECn1 (ORCPT ); Mon, 4 Apr 2022 22:43:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229890AbiDECnG (ORCPT ); Mon, 4 Apr 2022 22:43:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B265315AE24 for ; Mon, 4 Apr 2022 18:49:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123342; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1OYbScRjTNz5QBi4xGatPlRbPmUiT0LOinB85jUkB/M=; b=L/BBwTYeBPsyYH4JiLRV2thymvAaY+vrSAIi67NrTTwBKM8boXGihDQShxCNODnL2UOEAY 0ELSrhYIa3phD0eKYKVtVB75NbeydG4JlbKuLGnimj1I0WyiZ2eiNX+yuNCzEpKon47PWv aL8E9Usqd7fRKR3M6iK7lz1H42yDeOA= Received: from mail-io1-f69.google.com (mail-io1-f69.google.com [209.85.166.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-401-PnNWMM54ME6FFusKLR1g5w-1; Mon, 04 Apr 2022 21:49:02 -0400 X-MC-Unique: PnNWMM54ME6FFusKLR1g5w-1 Received: by mail-io1-f69.google.com with SMTP id 
x16-20020a6bfe10000000b006409f03e39eso7462484ioh.7 for ; Mon, 04 Apr 2022 18:49:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1OYbScRjTNz5QBi4xGatPlRbPmUiT0LOinB85jUkB/M=; b=njevMjeQHqnssYvm05IxuFjGh0Ow5j3csBZ3jVXQlnaSkoBM5AH1dpkmWTnPWl79dV PCPKKwsTtgAEdbbBy+stYOn7kSEuE6Y7XNP0yr8jAoAehIzQUeDVT4qPr/LzuR1DEEFJ zouX1G5BotDBx+M18a9RTFD+urc0s3RDmbiZbU/0TWu/FLL0jYQ3QhLf6PzeywCpH8NK eRSxNII67jBLD6nd6/nbK1vp+UcwDemDA17ancaaU/bEQ8DwkXystq1JtPx664YRddv0 MY78IRtPS53XyLyU/k/K+m01Q2qovQq0jzpsLs/ayaDbOuEdTxEdNJPvWhBhv7U0QtbD QBLA== X-Gm-Message-State: AOAM5303R+iWCB0l9u3EEkbKj6APeUVYflyPpPo1xou7T9dyqh7ikwgE YZ6RswZq/PboHHUvL0T2j9k1JC4rbPHrWmA3+FIuJmHpL7hMimNViuGeEfMLsIvPftOmXu+Rrqd 663UzrpBruFn6UbsYq9G/q9dGCoFYtkXelH68hBIec5kme8MRSHoKVL0mwsWonEymawBdQTDrqw == X-Received: by 2002:a05:6638:a3a:b0:323:5c6d:ae20 with SMTP id 26-20020a0566380a3a00b003235c6dae20mr735143jao.80.1649123341205; Mon, 04 Apr 2022 18:49:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxMAeWuW/3+txDsVV62vki1LADOgkDFuyCvrbdc9Pj11bhRHmjppURbT8nshi8PLqZuUQv4iQ== X-Received: by 2002:a05:6638:a3a:b0:323:5c6d:ae20 with SMTP id 26-20020a0566380a3a00b003235c6dae20mr735120jao.80.1649123340991; Mon, 04 Apr 2022 18:49:00 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id l14-20020a05660227ce00b00645ebb013c1sm8287007ios.45.2022.04.04.18.48.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:00 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 11/23] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Date: Mon, 4 Apr 2022 21:48:58 -0400 Message-Id: <20220405014858.14531-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" They will be used in the follow up patches to either check/set/clear uffd-wp bit of a huge pte. So far it reuses all the small pte helpers. Archs can overwrite these vers= ions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the future. Signed-off-by: Peter Xu --- arch/s390/include/asm/hugetlb.h | 15 +++++++++++++++ include/asm-generic/hugetlb.h | 15 +++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index bea47e7cc6a0..be99eda87f4d 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -115,6 +115,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_= t newprot) return pte_modify(pte, newprot); } =20 +static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +{ + return pte; +} + +static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +{ + return pte; +} + +static inline int huge_pte_uffd_wp(pte_t pte) +{ + return 0; +} + static inline bool gigantic_page_runtime_supported(void) { return true; diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index f39cad20ffc6..896f341f614d 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -35,6 +35,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t = newprot) return pte_modify(pte, newprot); } =20 +static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +{ + return 
pte_mkuffd_wp(pte); +} + +static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +{ + return pte_clear_uffd_wp(pte); +} + +static inline int huge_pte_uffd_wp(pte_t pte) +{ + return pte_uffd_wp(pte); +} + #ifndef __HAVE_ARCH_HUGE_PTE_CLEAR static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz) --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34D40C433F5 for ; Tue, 5 Apr 2022 02:42:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229828AbiDECoE (ORCPT ); Mon, 4 Apr 2022 22:44:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229968AbiDECnS (ORCPT ); Mon, 4 Apr 2022 22:43:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BBA461265B0 for ; Mon, 4 Apr 2022 18:49:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123345; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cMJ3LqPsnbXMwjHxk9MsuTKHubQJUVkMzfk9UWY8MQ4=; b=X/tVQ8WuOQcs0Wry4k6Knl/HQdFenm3wqauPxGciWndda0ShSWTg5YNdxEAd2h1B81Ek4Q 2g5S7GOjRQ6hYyinImlpnCw3tWIvZktoFMgwppixwBVtaXdU5cz1Ypks3d/eFJEV56rcG1 TolWm05GZ+RFYX/AfpoMQxKW3B4nkIM= Received: from mail-il1-f198.google.com (mail-il1-f198.google.com [209.85.166.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-490--fqfR7a5PF-QmGErn5FBAQ-1; Mon, 04 Apr 2022 21:49:04 -0400 X-MC-Unique: -fqfR7a5PF-QmGErn5FBAQ-1 Received: by mail-il1-f198.google.com with SMTP id o17-20020a92c691000000b002c2c04aebe7so7201424ilg.8 for ; Mon, 04 Apr 2022 18:49:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cMJ3LqPsnbXMwjHxk9MsuTKHubQJUVkMzfk9UWY8MQ4=; b=V37JTsiuXmt6HVhI/dSGTM12e9LjvHe2059rBZqWgWAEavYvMHgtR3tPPSJ/JkzW3M AaXr/M0BZQkuiruS2vTYfEo0SxVvUgsaIGZMGtZTaQdFwkCstth0CWsBQHWKRCdPxJUl lyylOxNPYda1buAP/J+WzOPEJZXcwgRbmNJpg7MQddMdU48+mLQ43mt75wDD5iXPnsCg 46cjiihyPw7cMMTWaHoXKH52lDQ9nXuMgiKXqha5htvibCdgz3ovHNrD5XoXmY/piyUb 9/epjN5rJs3e1uXJySVGbYT3y+2AcjjR3pH/32oV6Ys5l8dWLtvSRW9nqcFeoeWDTATG XlAg== X-Gm-Message-State: AOAM530uobI3IjEu+nodRcxCLxEwQy1H0DdCP/Qxi1lCV1IJUMnSbY+z C9ppKJx0Q8VzpmSg1IAZtUQlCNm4KajeydCB+/3hQJLfNNIx0XPEv76wZH3DpMWw6nN7YRwXQ7H /chxDxKdetlXBCnJLrD8tGzfQ+7vtZct8op0FukAao87diUAXrRFfDVS3eaxI0AJF2z6E6P9ytQ == X-Received: by 2002:a05:6602:2e10:b0:649:e2d4:3334 with SMTP id o16-20020a0566022e1000b00649e2d43334mr586232iow.210.1649123344085; Mon, 04 Apr 2022 18:49:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzbt1ibJTi0gvyMJwC5hy/Js4ILQPnWU9UGGKs5bKIIkDs9NeLq7C5AVpY51XCUy10wipducA== X-Received: by 2002:a05:6602:2e10:b0:649:e2d4:3334 with SMTP id o16-20020a0566022e1000b00649e2d43334mr586205iow.210.1649123343719; Mon, 04 Apr 2022 18:49:03 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. 
[99.241.198.116]) by smtp.gmail.com with ESMTPSA id y8-20020a920908000000b002ca38acaa60sm2917919ilg.81.2022.04.04.18.49.02 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:03 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 12/23] mm/hugetlb: Hook page faults for uffd write protection Date: Mon, 4 Apr 2022 21:49:01 -0400 Message-Id: <20220405014901.14590-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faul= ts. We do this slightly earlier than hugetlb_cow() so that we can avoid taking = some extra locks that we definitely don't need. 
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd642cfc538b..82df0fcfedf9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5711,6 +5711,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
+	/* Handle userfault-wp first, before trying to lock more pages */
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = haddr,
+			.real_address = address,
+			.flags = flags,
+		};
+
+		spin_unlock(ptl);
+		if (pagecache_page) {
+			unlock_page(pagecache_page);
+			put_page(pagecache_page);
+		}
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		i_mmap_unlock_read(mapping);
+		return handle_userfault(&vmf, VM_UFFD_WP);
+	}
+
 	/*
 	 * hugetlb_wp() requires page locks of pte_page(entry) and
 	 * pagecache_page, so here we need take the former one
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DF2C433EF for ; Tue, 5 Apr 2022 02:42:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230087AbiDECoI (ORCPT ); Mon, 4 Apr 2022 22:44:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230011AbiDECnU (ORCPT ); Mon, 4 Apr 2022 22:43:20 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D8DFCE388B for ; Mon, 4 Apr 2022 18:49:09 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123349; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=95EZeCyII9xxx3gyIQ0ulWF2wAsEmWfCQ2/ndLgG/cw=; b=Fp7AI/alRZXby+WfMaRUNqSLOHb3ML3AYU/Jnp8q17Zna2zLg1FOZ+WMtyKl7Lu8XBFTOQ 1x869yngkeDN1TOTfASkMD1U5wzvI4nB/kF9cabejVzzDeqG7ykuWfwSnTqvvzbXmRGcFf aJBAVSIrK0QSswC1TXxnvwGJ8GxxqT4= Received: from mail-io1-f72.google.com (mail-io1-f72.google.com [209.85.166.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-639-X98yPUgQOqeQUx_o0CL_0w-1; Mon, 04 Apr 2022 21:49:08 -0400 X-MC-Unique: X98yPUgQOqeQUx_o0CL_0w-1 Received: by mail-io1-f72.google.com with SMTP id w28-20020a05660205dc00b00645d3cdb0f7so7424115iox.10 for ; Mon, 04 Apr 2022 18:49:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=95EZeCyII9xxx3gyIQ0ulWF2wAsEmWfCQ2/ndLgG/cw=; b=G9CE+cOOo7WBdcj+c1x6PzgAdIYo6+S0+phw9Lj2vhwaByPADCLsDECUE8vmTRTShR jEOf4a8uB16i35HEwfMdhSjk1mxgzGBEZHC8krilvz1g1NpkKwfLu/oKPawoaZH/V+60 lFxCB0PclIabzQeVP9bOy/SbKKVfygBcgVYER7auqSOactQOZrUZXgLxE3wPh8oqaeGz QrlTzL09zZnhr3NFcrPqpGxdYgIo3KoDtdnBZJn+iLy5qCub9iF1ztc3cXYYT8lDW3pb QLPaYau5+xLOPuGFjyiqn1pXWYrovAH1DUAzrwi16tF8Tg4vMsY2EuRejW5Ce7y8es6j MNdQ== X-Gm-Message-State: AOAM530i9GuSyJb8H7N4plPGXl510xG+poHrpgn+7qVVRXGzj6VU4MMI qOT6okACJwXQfZDbLG9s9iW2T3oMCckI/wVdylmY5BUc1BzfLWolUwmefb0GnG6mHp2jxvyTQYS NSBV3sDVkfDnGWt1b44aNc/JHDkyvWcElFOOurZ4XJ4UfxH8B6xNbw8BRnCyOftgHCFRVB8wa9w == X-Received: by 2002:a05:6602:13d5:b0:64c:9ef0:65e1 with SMTP id o21-20020a05660213d500b0064c9ef065e1mr588048iov.157.1649123346655; Mon, 04 Apr 2022 18:49:06 -0700 (PDT) X-Google-Smtp-Source: 
ABdhPJxvBZ1x9YNEIDP6fPb8GcdToNu45lYsI3Q8dcSYoOOQ7uq3K76BU1AVUi+jy6YnevEjPpvOKw==
X-Received: by 2002:a05:6602:13d5:b0:64c:9ef0:65e1 with SMTP id o21-20020a05660213d500b0064c9ef065e1mr588020iov.157.1649123346392; Mon, 04 Apr 2022 18:49:06 -0700 (PDT)
Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id b15-20020a05660214cf00b0064cb75d7e97sm7836568iow.53.2022.04.04.18.49.05 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:06 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 4 Apr 2022 21:49:04 -0400
Message-Id: <20220405014904.14643-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout the
stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set with UFFDIO_COPY.

Hugetlb pages are only managed by hugetlbfs, so we're safe even without
setting the dirty bit in the huge pte if the page is installed as read-only.
However, we'd better still keep the dirty bit set for a read-only UFFDIO_COPY
pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do
with shmem, but also because the page does contain dirty data that the kernel
just copied from userspace.
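The resulting pte bits can be sketched in plain userspace C. This is only an
illustrative model of the decision made in hugetlb_mcopy_atomic_pte(): the
struct pte_bits type and the mcopy_pte_bits() helper are hypothetical and do
not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bits installed in the new huge pte. */
struct pte_bits {
	bool writable;
	bool dirty;
	bool uffd_wp;
};

struct pte_bits mcopy_pte_bits(bool wp_copy, bool is_continue,
			       bool vm_shared, bool vm_write)
{
	struct pte_bits b;

	/* A wp copy, or CONTINUE on a non-shared VMA, never gets the write bit. */
	if (wp_copy || (is_continue && !vm_shared))
		b.writable = false;
	else
		b.writable = vm_write;

	b.dirty = true;		/* set unconditionally after this patch */
	b.uffd_wp = wp_copy;	/* models huge_pte_mkuffd_wp() */
	return b;
}

/* Small self-check: a wp copy is read-only, dirty and uffd-wp marked. */
void example_checks_mcopy(void)
{
	struct pte_bits b = mcopy_pte_bits(true, false, true, true);

	assert(!b.writable && b.dirty && b.uffd_wp);
}
```

The case the commit message argues for is the first one: a read-only pte that
is still dirty, because its contents were just copied in.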
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 29 +++++++++++++++++++++++------
 mm/userfaultfd.c        | 14 +++++++++-----
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 53c1b6082a4c..6347298778b6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
-				struct page **pagep);
+				struct page **pagep,
+				bool wp_copy);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
-						struct page **pagep)
+						struct page **pagep,
+						bool wp_copy)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 82df0fcfedf9..c94deead22b2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5795,7 +5795,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
-			    struct page **pagep)
+			    struct page **pagep,
+			    bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
@@ -5925,7 +5926,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 
 	ret = -EEXIST;
-	if (!huge_pte_none(huge_ptep_get(dst_pte)))
+	/*
+	 * We allow to overwrite a pte marker: consider when both MISSING|WP
+	 * registered, we firstly wr-protect a none pte which has no page cache
+	 * page backing it, then access the page.
+	 */
+	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;
 
 	if (vm_shared) {
@@ -5935,17 +5941,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
-	if (is_continue && !vm_shared)
+	/*
+	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+	 * with wp flag set, don't set pte write bit.
+	 */
+	if (wp_copy || (is_continue && !vm_shared))
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
 	_dst_pte = make_huge_pte(dst_vma, page, writable);
-	if (writable)
-		_dst_pte = huge_pte_mkdirty(_dst_pte);
+	/*
+	 * Always mark UFFDIO_COPY page dirty; note that this may not be
+	 * extremely important for hugetlbfs for now since swapping is not
+	 * supported, but we should still be clear in that this page cannot be
+	 * thrown away at will, even if write bit not set.
+	 */
+	_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
+	if (wp_copy)
+		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b1c875b77fbb..da0b3ed2a6b5 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -304,7 +304,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      enum mcopy_atomic_mode mode)
+					      enum mcopy_atomic_mode mode,
+					      bool wp_copy)
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
@@ -392,7 +393,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
@@ -400,7 +401,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, mode, &page);
+					       dst_addr, src_addr, mode, &page,
+					       wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -455,7 +457,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      enum mcopy_atomic_mode mode);
+				      enum mcopy_atomic_mode mode,
+				      bool wp_copy);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -575,7 +578,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, mcopy_mode);
+					       src_start, len, mcopy_mode,
+					       wp_copy);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9320AC433F5 for ; Tue, 5 Apr 2022 02:42:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229891AbiDECoL (ORCPT ); Mon, 4 Apr 2022 22:44:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229881AbiDECn1 (ORCPT ); Mon, 4 Apr 2022 22:43:27 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7A30839E01A for ; Mon, 4 Apr 2022 18:49:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1649123352; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cYO5DrzO5I7Lh9p2TuEQvYaoFZvRSzMSA+DH3SoMHTo=; b=hfI+zQZ2Fpc5+9J7Z9vZA74QpYaddqdkrpG0LV/0PEgrHhP6VATbS1Xg5vZYGv75y9mJfG 1h82ZTfy8LA9DWJj4FaqryB35ta28lemBLbtTQHYVhdSr/799A+gOlBAyIENs5VLW0YHh3 QmPEAZODewV2ZH/JpJlu0X6k4ZJQ02U= Received: from mail-io1-f69.google.com (mail-io1-f69.google.com [209.85.166.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-252-Ox7s0Mu4OWaiIjGHp6_6HQ-1; Mon, 04 Apr 2022 21:49:11 -0400 X-MC-Unique: Ox7s0Mu4OWaiIjGHp6_6HQ-1 Received: by mail-io1-f69.google.com with SMTP id h10-20020a05660224ca00b0064c77aa4477so7426698ioe.17 for ; Mon, 04 Apr 2022 18:49:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cYO5DrzO5I7Lh9p2TuEQvYaoFZvRSzMSA+DH3SoMHTo=; b=7SQvukmGTID62w810LkKX048UXSLayUiioMDDEENuzI8f+K1rYkBO/zMKIKJwHpde1 OMoZoCzJd5L4aQ4oAuG0aFaUoggqH9BsOgwvfyiY/5m/zX15Qn9HUHQkRxkYrIhep/Ri xbLHGeCqVBvOED5N8kzZuPAH9dd6j1ceSQsl063dJMwhFYIBoxnlk8xQIKT4pUs8EOZn l9/uqcHO6kVSlWpHbklmI/GO1rGIsgpqtkwJdFyfA/5qK4QCfYDeVcprdjjuDbqdrv+h PV8NAEItWrrht+lKzNqdfvsCBXmBAdHWIz4CV+kJHjIS+w+p903A4DNhJn9FApmoWRuv Ak0Q== X-Gm-Message-State: AOAM530ir97TpkFR/pdbMJvUnUB9yxK+fiZGKVeBeFmIdsas3pC7sZ+v 2eq+sn17B7yymfWt5YOSH2qEZoHLVf5fKYmPBw1IGMWLRY2EyrF1azzhDMvW7bBc/b7f/F0VJam 9lHcvj26D1vhZBMYOS+s5XIGYC2XyafHtecKaoZF7xNflaAtiBodgWgKFWpM5H8aXImRNT5Wn1A == X-Received: by 2002:a05:6e02:1aa8:b0:2c9:b67e:170a with SMTP id l8-20020a056e021aa800b002c9b67e170amr627234ilv.254.1649123349557; Mon, 04 Apr 2022 18:49:09 -0700 (PDT) X-Google-Smtp-Source: 
ABdhPJzIRnqjIx5iwXEGGjGc+n3hX5DRyLK+ATAlgFSZlW8F/klF7wxFnaQrfq2feLlPULuSJF4kVA==
X-Received: by 2002:a05:6e02:1aa8:b0:2c9:b67e:170a with SMTP id l8-20020a056e021aa800b002c9b67e170amr627210ilv.254.1649123349195; Mon, 04 Apr 2022 18:49:09 -0700 (PDT)
Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id u15-20020a92d1cf000000b002ca56804ec4sm473939ilg.23.2022.04.04.18.49.07 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:08 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 14/23] mm/hugetlb: Handle UFFDIO_WRITEPROTECT
Date: Mon, 4 Apr 2022 21:49:06 -0400
Message-Id: <20220405014906.14708-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This starts by passing cp_flags into hugetlb_change_protection() so that
hugetlb is able to handle MM_CP_UFFD_WP[_RESOLVE] requests.

huge_pte_clear_uffd_wp() is introduced to handle the case where
UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
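The patch also rejects UFFDIO_WRITEPROTECT ranges that are not aligned to the
huge page size. A minimal userspace sketch of that check follows, assuming a
2 MiB huge page size for the example. hugetlb_wp_range_ok() is a hypothetical
name; in the kernel the mask comes from vma_kernel_pagesize(dst_vma) - 1 and a
failure returns -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

/* page_size must be a non-zero power of two, as huge page sizes are. */
bool hugetlb_wp_range_ok(unsigned long start, unsigned long len,
			 unsigned long page_size)
{
	unsigned long page_mask = page_size - 1;

	assert(page_size != 0 && (page_size & page_mask) == 0);
	/* Same test as in mwriteprotect_range(): both must be aligned. */
	return !((start & page_mask) || (len & page_mask));
}

/* Small self-check using an assumed 2 MiB huge page size. */
void example_checks_range(void)
{
	const unsigned long sz = 2UL << 20;

	assert(hugetlb_wp_range_ok(4 * sz, sz, sz));
	assert(!hugetlb_wp_range_ok(4 * sz + 4096, sz, sz)); /* misaligned start */
	assert(!hugetlb_wp_range_ok(4 * sz, 4096, sz));      /* partial length */
}
```

A base-page-aligned range that would be accepted for anonymous memory is
therefore refused on a hugetlb VMA unless it covers whole huge pages.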
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 13 ++++++++++++-
 mm/mprotect.c           |  3 ++-
 mm/userfaultfd.c        |  8 ++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6347298778b6..38c5ac28b787 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -210,7 +210,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot);
+		unsigned long address, unsigned long end, pgprot_t newprot,
+		unsigned long cp_flags);
 
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -391,7 +392,8 @@ static inline void move_hugetlb_state(struct page *oldpage,
 
 static inline unsigned long hugetlb_change_protection(
 			struct vm_area_struct *vma, unsigned long address,
-			unsigned long end, pgprot_t newprot)
+			unsigned long end, pgprot_t newprot,
+			unsigned long cp_flags)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c94deead22b2..2401dd5997b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6207,7 +6207,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot)
+		unsigned long address, unsigned long end,
+		pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
@@ -6217,6 +6218,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	unsigned long pages = 0;
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
+	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6263,6 +6266,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				entry = make_readable_migration_entry(
 							swp_offset(entry));
 				newpte = swp_entry_to_pte(entry);
+				if (uffd_wp)
+					newpte = pte_swp_mkuffd_wp(newpte);
+				else if (uffd_wp_resolve)
+					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
 						     newpte, huge_page_size(h));
 				pages++;
@@ -6277,6 +6284,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
 			pte = huge_pte_modify(old_pte, newprot);
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			if (uffd_wp)
+				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+			else if (uffd_wp_resolve)
+				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 		}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e0a567b66d07..6b0e8c213508 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -455,7 +455,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL);
 
 	if (is_vm_hugetlb_page(vma))
-		pages = hugetlb_change_protection(vma, start, end, newprot);
+		pages = hugetlb_change_protection(vma, start, end, newprot,
+						  cp_flags);
 	else
 		pages = change_protection_range(vma, start, end, newprot,
 						cp_flags);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index da0b3ed2a6b5..58d67f2bf980 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -704,6 +704,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			atomic_t *mmap_changing)
 {
 	struct vm_area_struct *dst_vma;
+	unsigned long page_mask;
 	pgprot_t newprot;
 	int err;
 
@@ -740,6 +741,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (!vma_is_anonymous(dst_vma))
 		goto out_unlock;
 
+	if (is_vm_hugetlb_page(dst_vma)) {
+		err = -EINVAL;
+		page_mask = vma_kernel_pagesize(dst_vma) - 1;
+		if ((start & page_mask) || (len & page_mask))
+			goto out_unlock;
+	}
+
 	if (enable_wp)
 		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
 	else
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28A58C433F5 for ; Tue, 5 Apr 2022 02:42:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229894AbiDECoO (ORCPT ); Mon, 4 Apr 2022 22:44:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229882AbiDECn1 (ORCPT ); Mon, 4 Apr 2022 22:43:27 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 35D1239E01C for ; Mon, 4 Apr 2022 18:49:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123354; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lKVs/J2+ks2bdCQxZbqUvWZ98wqpvNaMBO4b+MQPP2E=; b=PNOkHJtDRhlumm2ZQ85wWVmopfW5xvoAB54M0ZqSpgsgmhb67hrDfl+waBcUUQMqmoq2NO P+C9iUWXZ29Yyl5BKleGGNnvKfSR7C7krin68PbCYxe9Lw0ULa7qp3BMQ2q8WHMcTDJtKs LrPfcvfR+g4Qe1qjO4Duji1inXpk9Z8= Received: from mail-il1-f197.google.com (mail-il1-f197.google.com [209.85.166.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-302-PBEt8VDXMrKaTFFz6JI89g-1; Mon, 04 Apr 2022 21:49:13 -0400 X-MC-Unique: PBEt8VDXMrKaTFFz6JI89g-1 Received: by mail-il1-f197.google.com with SMTP id 
a6-20020a056e02120600b002ca412e65a7so2681440ilq.0 for ; Mon, 04 Apr 2022 18:49:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lKVs/J2+ks2bdCQxZbqUvWZ98wqpvNaMBO4b+MQPP2E=; b=OgI2lc3ErIfuMJQ7ld6SrJbL6KeZECW+LZrhPFiQ1AXFiiC/y8wG2+YaUv/XXMs+Zt b/9sAyvphREQ1bF+zFjVQAT8XOpETI2Rr/Gwq7sEc66zokYvczjo/LEEvr8P3P1p0o4+ dPfR09BI5Y4Jm610nYxCAk7LW3Pt0Odx38JmVieFqoCaLXQtpTuspijZnkZAkbwytHgp so+lhoVdUK5nuuLrQUKo9kA7zsXFx8dnl+G0pdnd2bN6iVhq4zLWMpgTrI6aLiEyyHUE Mf2XNmhygKAB2BrawDgq5T/MGmns8Em4dXnRfq3oQh9LI1Z2/HjEIARY2aSKH/u1pae3 KN5w== X-Gm-Message-State: AOAM533XwFwIrQ0Q5le5g9KJLh07Ka565TuNvmE7l7DAArocR3Zxt3TM T9bn94Pc5Cf7LMeTz9oNS4O1R0hNY02Y4nOoNLwh72KDtiswEpOFHl5ZxaxWmAgbhGHLtofelfd YTdaAwWRkr9bXQiZxpSEuEDW9GeEhelGuxpUtZ3P0NvV+E0evzMKRO/bj8zA26NiQpVVgg1Po8w == X-Received: by 2002:a02:ce91:0:b0:323:6d4a:484a with SMTP id y17-20020a02ce91000000b003236d4a484amr707146jaq.311.1649123352484; Mon, 04 Apr 2022 18:49:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyupx1oRwn841EYWh23vc7XbLA4GVVEmXYgiyGK3QUl9XTVB2ugR3JABDvh7vEjtnRLX6P2Ww== X-Received: by 2002:a02:ce91:0:b0:323:6d4a:484a with SMTP id y17-20020a02ce91000000b003236d4a484amr707114jaq.311.1649123352138; Mon, 04 Apr 2022 18:49:12 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id r9-20020a6b6009000000b006412abddbbbsm7344446iog.24.2022.04.04.18.49.10 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:11 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 15/23] mm/hugetlb: Handle pte markers in page faults
Date: Mon, 4 Apr 2022 21:49:09 -0400
Message-Id: <20220405014909.14761-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Allow hugetlb code to handle pte markers just like none ptes. It's mostly
there; we just need to make sure we don't assume hugetlb_no_page() only
handles none ptes, so when detecting a pte change we should use pte_same()
rather than pte_none(). We need to pass in the old_pte to do the comparison.

Check the original pte to see whether it's a pte marker. If it is, we should
recover the uffd-wp bit on the new pte to be installed, so that the next write
will be trapped by uffd.
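A small userspace model may help show why pte_same() is needed here. With a
marker in the slot, the entry is non-none from the start, so a pte_none()
re-check would always back out. Everything below (struct model_pte,
model_pte_same(), model_install_page()) is hypothetical and only mimics the
shape of the logic at the tail of hugetlb_no_page().

```c
#include <assert.h>
#include <stdbool.h>

enum model_pte_kind {
	MODEL_PTE_NONE,
	MODEL_PTE_MARKER_UFFD_WP,
	MODEL_PTE_PRESENT,
};

struct model_pte {
	enum model_pte_kind kind;
	bool writable;
	bool uffd_wp;
};

/* Models pte_same(): is the entry unchanged since it was sampled? */
bool model_pte_same(struct model_pte a, struct model_pte b)
{
	return a.kind == b.kind && a.writable == b.writable &&
	       a.uffd_wp == b.uffd_wp;
}

/* Returns false when the caller has to back out and retry. */
bool model_install_page(struct model_pte *slot, struct model_pte old_pte,
			bool want_write)
{
	struct model_pte new_pte = { MODEL_PTE_PRESENT, want_write, false };

	if (!model_pte_same(*slot, old_pte))
		return false;
	/* A previously wr-protected slot stays wr-protected once populated. */
	if (old_pte.kind == MODEL_PTE_MARKER_UFFD_WP) {
		new_pte.writable = false;
		new_pte.uffd_wp = true;
	}
	*slot = new_pte;
	return true;
}

void example_checks_marker(void)
{
	struct model_pte marker = { MODEL_PTE_MARKER_UFFD_WP, false, false };
	struct model_pte slot = marker;

	assert(model_install_page(&slot, marker, true));
	assert(slot.kind == MODEL_PTE_PRESENT && slot.uffd_wp && !slot.writable);
	/* The slot changed since 'marker' was sampled: a racing fault backs out. */
	assert(!model_install_page(&slot, marker, true));
}
```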
Signed-off-by: Peter Xu
Reported-by: kernel test robot
---
 mm/hugetlb.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2401dd5997b7..9317b790161d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5412,7 +5412,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long address, pte_t *ptep,
+			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -5539,7 +5540,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	ret = 0;
-	if (!huge_pte_none(huge_ptep_get(ptep)))
+	/* If pte changed from under us, retry */
+	if (!pte_same(huge_ptep_get(ptep), old_pte))
 		goto backout;
 
 	if (anon_rmap) {
@@ -5549,6 +5551,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		page_dup_file_rmap(page, true);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
+	/*
+	 * If this pte was previously wr-protected, keep it wr-protected even
+	 * if populated.
+	 */
+	if (unlikely(pte_marker_uffd_wp(old_pte)))
+		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
 	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -5666,8 +5674,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	entry = huge_ptep_get(ptep);
-	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+	/* PTE markers should be handled the same way as none pte */
+	if (huge_pte_none_mostly(entry)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+				      entry, flags);
 		goto out_mutex;
 	}
 
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01D19C433F5 for ; Tue, 5 Apr 2022 02:42:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229949AbiDECoU (ORCPT ); Mon, 4 Apr 2022 22:44:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229927AbiDECnb (ORCPT ); Mon, 4 Apr 2022 22:43:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id F365CE41FB for ; Mon, 4 Apr 2022 18:49:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123357; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tYxpsQItjDK7cACelYDmbYGqmACKurZeMNuIXszIjeQ=; 
b=RzS1Edb6hqCIQnZRWCCG/TWCQUSKtE4qrHbtwXyEH9l6Z01tVFfa6Q4aeSs+nRspbZ6wF9 itNxSG/LGld1pleaF1io6zx7aN7hkNjGjluWhFx8yN+nFEyv8vu+ev0PxnQPGxCwthGumx x4bvUtqMAN+Utpqz4CwEvoZJ+5IRQec= Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-590-lcQlRgaiP2aCb_o9OH_MnA-1; Mon, 04 Apr 2022 21:49:16 -0400 X-MC-Unique: lcQlRgaiP2aCb_o9OH_MnA-1 Received: by mail-il1-f200.google.com with SMTP id m3-20020a056e02158300b002b6e3d1f97cso7182240ilu.19 for ; Mon, 04 Apr 2022 18:49:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tYxpsQItjDK7cACelYDmbYGqmACKurZeMNuIXszIjeQ=; b=0xy8rQb4mru1yGfTD7/vfuP0Uq1qARkOd12c5Kv+X2w74shoMs8S5RByv8w+mjDJ4o c0f95QjZycSp4XfXFhS0c8oCl2qhPsx57OztuofK7OwtVAaM1M86MiZ8V8VpwR3MMbbg C7U9kjPzM/LYJIzgB4g6u+KJ6z0ZoLvyUv3EC3AhfJGAYkOKt/RDbqEj+9KaMYV9ZW98 7+EdmfRK0Sp1jGtCbM19XbiUDjuts0wEUKZEKEb6W0gp+cScPtbD2WzVPVyWrykToWkG PQfL0bPTNXpqdMgUWXmGc6oXH5nn/CoVF0OkOVBpj6E/3ODGvNfRk5JgZdzZLTEltcI/ mjNA== X-Gm-Message-State: AOAM533p59PZdkoZp8rGxKPwRThrjrmZeuqWact0i6FfkcRdGfrbCV09 2ZVwBfsGO+3hL71v2ib7nYLa3D5bFaDqJa4uQdSXdiIYPejk09uBucmGk9OcYnK0B5uLFTdSF67 y3JwFB6JG/n1/Eeg9KktVoSxXgnEKrxzTBze/MchErgtWsTqyWCB5Og7ZjX+mwfIwk1dDT2HF7A == X-Received: by 2002:a92:c545:0:b0:2ca:1066:3d6 with SMTP id a5-20020a92c545000000b002ca106603d6mr616140ilj.229.1649123355310; Mon, 04 Apr 2022 18:49:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzDsITZBgZ19aGlVu4ZTn7lfD33BDzVrxkkBzyP8LrQwVfSiH2pEhd9gNDZ7UGnRWBIoTtlXw== X-Received: by 2002:a92:c545:0:b0:2ca:1066:3d6 with SMTP id a5-20020a92c545000000b002ca106603d6mr616119ilj.229.1649123355058; Mon, 04 Apr 2022 18:49:15 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. 
[99.241.198.116]) by smtp.gmail.com with ESMTPSA id m5-20020a92c525000000b002ca19cc6e43sm5578396ili.20.2022.04.04.18.49.13 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:14 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 16/23] mm/hugetlb: Allow uffd wr-protect none ptes
Date: Mon, 4 Apr 2022 21:49:12 -0400
Message-Id: <20220405014912.14815-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Teach hugetlbfs code to wr-protect none ptes, in case a page cache page
exists for that pte. Meanwhile we also need to be able to recognize a uffd-wp
marker pte and remove it for uffd_wp_resolve.

While at it, introduce a variable "psize" to replace all references to the
huge page size fetcher.
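The intent described above, for entries that have no page mapped, can be
sketched as a two-state transition. This is a hypothetical userspace model
(enum model_slot and model_change_protection() are made-up names), covering
only the none and marker cases and not the present or migration entries that
hugetlb_change_protection() also handles.

```c
#include <assert.h>
#include <stdbool.h>

enum model_slot { MODEL_SLOT_NONE, MODEL_SLOT_MARKER_UFFD_WP };

enum model_slot model_change_protection(enum model_slot cur, bool uffd_wp,
					bool uffd_wp_resolve)
{
	if (cur == MODEL_SLOT_MARKER_UFFD_WP)
		/* Resolving removes the marker; otherwise it stays. */
		return uffd_wp_resolve ? MODEL_SLOT_NONE
				       : MODEL_SLOT_MARKER_UFFD_WP;

	/* None pte: wr-protecting installs a PTE_MARKER_UFFD_WP marker. */
	return uffd_wp ? MODEL_SLOT_MARKER_UFFD_WP : MODEL_SLOT_NONE;
}

void example_checks_none_pte(void)
{
	enum model_slot s = MODEL_SLOT_NONE;

	s = model_change_protection(s, true, false);	/* wr-protect */
	assert(s == MODEL_SLOT_MARKER_UFFD_WP);
	s = model_change_protection(s, false, true);	/* resolve */
	assert(s == MODEL_SLOT_NONE);
}
```

The marker is what lets a later fault on the (still unpopulated) address
come back write-protected, as handled by the previous patch in the series.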
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9317b790161d..578c48ef931a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6225,7 +6225,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0;
+	unsigned long pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -6245,13 +6245,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += huge_page_size(h)) {
+	for (; address < end; address += psize) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, huge_page_size(h));
+		ptep = huge_pte_offset(mm, address, psize);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+			/*
+			 * When uffd-wp is enabled on the vma, unshare
+			 * shouldn't happen at all.  Warn about it if it
+			 * happened due to some reason.
+			 */
+			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
 			pages++;
 			spin_unlock(ptl);
 			shared_pmd = true;
@@ -6281,12 +6287,20 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, huge_page_size(h));
+						     newpte, psize);
 				pages++;
 			}
 			spin_unlock(ptl);
 			continue;
 		}
+		if (unlikely(pte_marker_uffd_wp(pte))) {
+			/*
+			 * This is changing a non-present pte into a none pte,
+			 * no need for huge_ptep_modify_prot_start/commit().
+			 */
+			if (uffd_wp_resolve)
+				huge_pte_clear(mm, address, ptep, psize);
+		}
 		if (!huge_pte_none(pte)) {
 			pte_t old_pte;
 			unsigned int shift = huge_page_shift(hstate_vma(vma));
@@ -6300,6 +6314,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
+		} else {
+			/* None pte */
+			if (unlikely(uffd_wp))
+				/* Safe to modify directly (none->non-present). */
+				set_huge_pte_at(mm, address, ptep,
+						make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
 	}
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D50D3C433EF for ; Tue, 5 Apr 2022 02:42:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229981AbiDECoc (ORCPT ); Mon, 4 Apr 2022 22:44:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229962AbiDECnf (ORCPT ); Mon, 4 Apr 2022 22:43:35 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4030F3A37A4 for ; Mon, 4 Apr 2022 18:49:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123360; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2hUkP0cgtQwlbYC5UnwB6jjYdUXBTnNxAup2wyXqQZs=; b=H7fBJWYWCzHwWyTHtrz97oEcgfcSwc8y4dzLJq21ixoObioUmX2WF9+u70HaUPKNeOU2Fu 2YhmVkbrKDj0VblLK0xMLgyQfcwjWabp9EX2uaaeq5x4mvyY73s5dF+XHvrZdrVUn177FO 
puCi2pj/qb8CJbpCHNbgEldFQ6+9q64= Received: from mail-il1-f197.google.com (mail-il1-f197.google.com [209.85.166.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-627-HJvA1abNNLaSql0eQ_sK4A-1; Mon, 04 Apr 2022 21:49:19 -0400 X-MC-Unique: HJvA1abNNLaSql0eQ_sK4A-1 Received: by mail-il1-f197.google.com with SMTP id d13-20020a056e02214d00b002ca4d440f73so1848533ilv.15 for ; Mon, 04 Apr 2022 18:49:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2hUkP0cgtQwlbYC5UnwB6jjYdUXBTnNxAup2wyXqQZs=; b=IlY8ngIBkKUGZyp9gDqVasHXVM27XD1JLb/KRhUsZYQ4Ij+kzC+uuSmZkSqQ1QECqf HkaGAImKXvx9qy3gcFvFeczFfM+nvYYG6yR8zAPZLXZyaLDaM5o1edo/5a79BktNkrVR mswgrKZR4SWmspBaoa2P+qYuUbBOFeEd3Dp6vJMIFh/F1I87xdv4HpbkW/zli7cmwK9D 892So5QW9uVgMjkD65DShNCTfiK72NdcnONdpOEclH7clk97feSlCeX7UsrEFrYDGggI 9SUr+9Zk6P0uinPHtNxQ74lQpeCm+/pcBiAeSO8TiOk0sCofaJ/PxOvXpkXFPfFwkv3G rxvA== X-Gm-Message-State: AOAM533bfI2otFo+ucgCCBMhqh/6ah+ylZYTMH7+8LQxSXzuC9vNDHcT hvwncAz1i17+CqtGL4Z0fXOkn8eZ0HU2UJbSE9KVeESFKpxMCKlXfl6RbiX1hkVxEvmoknwkpLE prY7+3OKxjHABV+LM3MZ4tawdVJCpAzu/U0zBcngtpv6FXVALMP/X980dpWB+aOX1Yk3iwMZv+g == X-Received: by 2002:a02:85ac:0:b0:323:4099:dee0 with SMTP id d41-20020a0285ac000000b003234099dee0mr636222jai.189.1649123358273; Mon, 04 Apr 2022 18:49:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx10kJiNJ151ApjRhb7SjuIvmsEDPjVdezJB8ZAXTGrM1Yb6bc+vTyyR4bDx+mEDmR1g9tcNQ== X-Received: by 2002:a02:85ac:0:b0:323:4099:dee0 with SMTP id d41-20020a0285ac000000b003234099dee0mr636201jai.189.1649123357903; Mon, 04 Apr 2022 18:49:17 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. 
[99.241.198.116]) by smtp.gmail.com with ESMTPSA id a3-20020a5ec303000000b006496b4dd21csm7250821iok.5.2022.04.04.18.49.16 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:17 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 17/23] mm/hugetlb: Only drop uffd-wp special pte if required Date: Mon, 4 Apr 2022 21:49:15 -0400 Message-Id: <20220405014915.14873-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if unmapping an entire vma, or when synchronized so that faults cannot race with the unmap operation. This requires passing zap_flags all the way to the lowest level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the ZAP_FLAG_DROP_MARKER flag, as synchronization is in place to prevent faults. The exception is hole punch, which will first unmap without any synchronization. Later, when hole punch actually removes the page from the file, it will check to see if there was a subsequent fault and, if so, take the hugetlb fault mutex while unmapping again. This second unmap will pass in ZAP_FLAG_DROP_MARKER.

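The zap rule above can be illustrated with a minimal userspace sketch. It is a toy model under stated assumptions, not the kernel implementation: toy_slot, toy_zap() and TOY_ZAP_FLAG_DROP_MARKER are hypothetical stand-ins, and the flag value is made up.

```c
#include <assert.h>

/* Stand-in for ZAP_FLAG_DROP_MARKER; the value here is arbitrary. */
#define TOY_ZAP_FLAG_DROP_MARKER 0x1UL

/* Hypothetical, simplified contents of one hugetlb pte slot. */
enum toy_slot {
	TOY_SLOT_NONE,		/* empty */
	TOY_SLOT_MARKER,	/* uffd-wp pte marker */
	TOY_SLOT_PRESENT,	/* mapped, not uffd write-protected */
	TOY_SLOT_PRESENT_WP,	/* mapped, uffd write-protected */
};

/* Return what the slot holds after it has been zapped with zap_flags. */
enum toy_slot toy_zap(enum toy_slot slot, unsigned long zap_flags)
{
	int was_wp = (slot == TOY_SLOT_MARKER || slot == TOY_SLOT_PRESENT_WP);

	/* The toy model only knows about one flag. */
	assert((zap_flags & ~TOY_ZAP_FLAG_DROP_MARKER) == 0);

	/* Keep the wr-protect information unless the caller asked to drop it. */
	if (was_wp && !(zap_flags & TOY_ZAP_FLAG_DROP_MARKER))
		return TOY_SLOT_MARKER;
	return TOY_SLOT_NONE;
}
```

In this model the two unmaps done by hole punch correspond to toy_zap(slot, 0) followed by toy_zap(slot, TOY_ZAP_FLAG_DROP_MARKER): the first leaves a marker behind for a wr-protected slot, the second clears it.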
The justification for whether to apply ZAP_FLAG_DROP_MARKER when unmapping a hugetlb range is (IMHO): we should never reach a state where a page fault could erroneously fault in a wr-protected page-cache page as writable, even for an extremely short period. That could happen if e.g. we passed ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls hugetlb_vmdelete_list(): if a page fault occurs after that call and before remove_inode_hugepages() is executed, the page cache can be mapped writable again in the small racy window, which can cause data to be overwritten unexpectedly.

Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- fs/hugetlbfs/inode.c | 15 +++++++++------ include/linux/hugetlb.h | 8 +++++--- mm/hugetlb.c | 33 +++++++++++++++++++++++++-------- mm/memory.c | 5 ++++- 4 files changed, 43 insertions(+), 18 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 99c7477cee5c..8b5b9df2be7d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page) } =20 static void -hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t = end) +hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t = end, + unsigned long zap_flags) { struct vm_area_struct *vma; =20 @@ -438,7 +439,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgof= f_t start, pgoff_t end) } =20 unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, - NULL); + NULL, zap_flags); } } =20 @@ -516,7 +517,8 @@ static void remove_inode_hugepages(struct inode *inode,= loff_t lstart, mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h)); + (index + 1) * pages_per_huge_page(h), + ZAP_FLAG_DROP_MARKER); i_mmap_unlock_write(mapping); } =20 @@ -582,7 +584,8 @@ static void hugetlb_vmtruncate(struct inode *inode, lof= f_t offset) i_mmap_lock_write(mapping); 
i_size_write(inode, offset); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) - hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0); + hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, + ZAP_FLAG_DROP_MARKER); i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); } @@ -615,8 +618,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, l= off_t offset, loff_t len) i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, - hole_start >> PAGE_SHIFT, - hole_end >> PAGE_SHIFT); + hole_start >> PAGE_SHIFT, + hole_end >> PAGE_SHIFT, 0); i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, hole_start, hole_end); inode_unlock(inode); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 38c5ac28b787..ab48b3bbb0e6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct v= m_area_struct *, unsigned long *, unsigned long *, long, unsigned int, int *); void unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); + unsigned long, unsigned long, struct page *, + unsigned long); void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, unsigned long end, - struct page *ref_page); + struct page *ref_page, unsigned long zap_flags); void hugetlb_report_meminfo(struct seq_file *); int hugetlb_report_node_meminfo(char *buf, int len, int nid); void hugetlb_show_meminfo(void); @@ -400,7 +401,8 @@ static inline unsigned long hugetlb_change_protection( =20 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) + unsigned long end, struct page *ref_page, + unsigned long zap_flags) { BUG(); } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 578c48ef931a..e4af8b357b90 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ 
-4947,7 +4947,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *v= ma, =20 static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_= struct *vma, unsigned long start, unsigned long end, - struct page *ref_page) + struct page *ref_page, unsigned long zap_flags) { struct mm_struct *mm =3D vma->vm_mm; unsigned long address; @@ -5003,7 +5003,18 @@ static void __unmap_hugepage_range(struct mmu_gather= *tlb, struct vm_area_struct * unmapped and its refcount is dropped, so just clear pte here. */ if (unlikely(!pte_present(pte))) { - huge_pte_clear(mm, address, ptep, sz); + /* + * If the pte was wr-protected by uffd-wp in any of the + * swap forms, meanwhile the caller does not want to + * drop the uffd-wp bit in this zap, then replace the + * pte with a marker. + */ + if (pte_swp_uffd_wp_any(pte) && + !(zap_flags & ZAP_FLAG_DROP_MARKER)) + set_huge_pte_at(mm, address, ptep, + make_pte_marker(PTE_MARKER_UFFD_WP)); + else + huge_pte_clear(mm, address, ptep, sz); spin_unlock(ptl); continue; } @@ -5031,7 +5042,11 @@ static void __unmap_hugepage_range(struct mmu_gather= *tlb, struct vm_area_struct tlb_remove_huge_tlb_entry(h, tlb, ptep, address); if (huge_pte_dirty(pte)) set_page_dirty(page); - + /* Leave a uffd-wp pte marker if needed */ + if (huge_pte_uffd_wp(pte) && + !(zap_flags & ZAP_FLAG_DROP_MARKER)) + set_huge_pte_at(mm, address, ptep, + make_pte_marker(PTE_MARKER_UFFD_WP)); hugetlb_count_sub(pages_per_huge_page(h), mm); page_remove_rmap(page, vma, true); =20 @@ -5065,9 +5080,10 @@ static void __unmap_hugepage_range(struct mmu_gather= *tlb, struct vm_area_struct =20 void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) + unsigned long end, struct page *ref_page, + unsigned long zap_flags) { - __unmap_hugepage_range(tlb, vma, start, end, ref_page); + __unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags); =20 /* * Clear this flag so that 
x86's huge_pmd_share page_table_shareable @@ -5083,12 +5099,13 @@ void __unmap_hugepage_range_final(struct mmu_gather= *tlb, } =20 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) + unsigned long end, struct page *ref_page, + unsigned long zap_flags) { struct mmu_gather tlb; =20 tlb_gather_mmu(&tlb, vma->vm_mm); - __unmap_hugepage_range(&tlb, vma, start, end, ref_page); + __unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags); tlb_finish_mmu(&tlb); } =20 @@ -5143,7 +5160,7 @@ static void unmap_ref_private(struct mm_struct *mm, s= truct vm_area_struct *vma, */ if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER)) unmap_hugepage_range(iter_vma, address, - address + huge_page_size(h), page); + address + huge_page_size(h), page, 0); } i_mmap_unlock_write(mapping); } diff --git a/mm/memory.c b/mm/memory.c index 8ba1bb196095..9808edfe18d4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1675,8 +1675,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, * safe to do nothing in this case. */ if (vma->vm_file) { + unsigned long zap_flags =3D details ? 
+ details->zap_flags : 0; i_mmap_lock_write(vma->vm_file->f_mapping); - __unmap_hugepage_range_final(tlb, vma, start, end, NULL); + __unmap_hugepage_range_final(tlb, vma, start, end, + NULL, zap_flags); i_mmap_unlock_write(vma->vm_file->f_mapping); } } else --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 336F4C433F5 for ; Tue, 5 Apr 2022 02:42:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230004AbiDECop (ORCPT ); Mon, 4 Apr 2022 22:44:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230032AbiDECnl (ORCPT ); Mon, 4 Apr 2022 22:43:41 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 946163A37B8 for ; Mon, 4 Apr 2022 18:49:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123362; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Zpk9Li6fgqH75i55vcn3N+1Xng1FM6T4OSs35V/WwSY=; b=AuyFPVawAUTMJ7DgjsnuWVKTP92a4zATGCq9czhbUkniyPNlNBx8F7G2mJl+u9W8WICXeS PnreGcHGxhEvIyhXlRc6vGzjJu8bqKvesjgoUDFNjcl0DD6f7il9R7XTtbx+uijcYy9L01 rN5PbCOzv85ng9HbNSZpMlsIM5wiEMQ= Received: from mail-io1-f70.google.com (mail-io1-f70.google.com [209.85.166.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-629-xScWbi0WPTSzR6cwXTV8kw-1; Mon, 04 Apr 2022 21:49:21 -0400 X-MC-Unique: xScWbi0WPTSzR6cwXTV8kw-1 Received: by 
mail-io1-f70.google.com with SMTP id x16-20020a6bfe10000000b006409f03e39eso7462851ioh.7 for ; Mon, 04 Apr 2022 18:49:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Zpk9Li6fgqH75i55vcn3N+1Xng1FM6T4OSs35V/WwSY=; b=VKpz4gxTpIoimHG+r1BGZLVN49XDMGXj/hlGnwqg+ANI7JGX+Nsb/3TyXXi4dAQlvi iVcbShGir11ibLLbFb+Zypr1s7K/MMuQYPwUJvRDwyCrCZ6Bm8kv+c9LYlqMKpvz+41n S0SNThP9FS2Edv2zslwtc43X5vD35VHousWgQe2IfrZFkaJxqn2jb3rJ50SO6vwb+T2O yv/1EwfstFcZN6nwh6ebGKAtlyXV2eFOW/oxaV3atqTpToBCpjnnQHDuAswX7QZZToo7 cR37LSBn/3fFXRTtgvUzrHVjovlcN0VyTD2pI8a6xLCcrUr/np36PG9PeQNJryhPSWpM OvEw== X-Gm-Message-State: AOAM531xjWnj6liRYlgIo26L75Jcu2/wQzR8D9DN2getyA9Ac7HRiZvl pSJmMhptdB6JEY706jqmyqZhPev/uWaEEU20PawPGWZjN4oIdVT1qbFkJHNBAFP5DRx9y2kpS+P UJ9GCw58MAfFuVDH+L1EyrBQWkWHxtvDALzzInIWTgTxbOCZqEaTQ3UpM8bY7EIda3Q8oTduWfw == X-Received: by 2002:a05:6638:164b:b0:323:ac42:8d4b with SMTP id a11-20020a056638164b00b00323ac428d4bmr693516jat.75.1649123361107; Mon, 04 Apr 2022 18:49:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzDhrnP+UnjED6dnxAUMKpLm5xwk/WcwG2E8d8U2B2Tf/WxM+ICv/y59UNBKP9S5t8sm4pR1A== X-Received: by 2002:a05:6638:164b:b0:323:ac42:8d4b with SMTP id a11-20020a056638164b00b00323ac428d4bmr693489jat.75.1649123360806; Mon, 04 Apr 2022 18:49:20 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id t1-20020a056e02060100b002ca41adce5dsm2355369ils.8.2022.04.04.18.49.19 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:20 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 18/23] mm/hugetlb: Handle uffd-wp during fork() Date: Mon, 4 Apr 2022 21:49:18 -0400 Message-Id: <20220405014918.14932-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

First, we need to pass dst_vma into copy_hugetlb_page_range(), because for uffd-wp it is the dst vma that decides how uffd-wp protected ptes should be treated.

We should also recognize pte markers during fork and copy the pte if needed.

Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 7 +++++-- mm/hugetlb.c | 42 +++++++++++++++++++++++++++-------------- mm/memory.c | 2 +- 3 files changed, 34 insertions(+), 17 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ab48b3bbb0e6..6df51d23b7ee 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, struct vm_area_struct *new_vma, unsigned long old_addr, unsigned long new_addr, unsigned long len); -int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct= vm_area_struct *); +int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, + struct vm_area_struct *, struct vm_area_struct *); long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, unsigned long *, long, unsigned int, @@ -268,7 +269,9 @@ static inline struct page *follow_huge_addr(struct mm_s= truct *mm, } =20 static inline int copy_hugetlb_page_range(struct mm_struct *dst, - struct mm_struct *src, struct vm_area_struct *vma) + struct mm_struct *src, + struct 
vm_area_struct *dst_vma, + struct vm_area_struct *src_vma) { BUG(); return 0; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e4af8b357b90..e1571179698a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4706,23 +4706,24 @@ hugetlb_install_page(struct vm_area_struct *vma, pt= e_t *ptep, unsigned long addr } =20 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) + struct vm_area_struct *dst_vma, + struct vm_area_struct *src_vma) { pte_t *src_pte, *dst_pte, entry, dst_entry; struct page *ptepage; unsigned long addr; - bool cow =3D is_cow_mapping(vma->vm_flags); - struct hstate *h =3D hstate_vma(vma); + bool cow =3D is_cow_mapping(src_vma->vm_flags); + struct hstate *h =3D hstate_vma(src_vma); unsigned long sz =3D huge_page_size(h); unsigned long npages =3D pages_per_huge_page(h); - struct address_space *mapping =3D vma->vm_file->f_mapping; + struct address_space *mapping =3D src_vma->vm_file->f_mapping; struct mmu_notifier_range range; int ret =3D 0; =20 if (cow) { - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src, - vma->vm_start, - vma->vm_end); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src, + src_vma->vm_start, + src_vma->vm_end); mmu_notifier_invalidate_range_start(&range); mmap_assert_write_locked(src); raw_write_seqcount_begin(&src->write_protect_seq); @@ -4736,12 +4737,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, i_mmap_lock_read(mapping); } =20 - for (addr =3D vma->vm_start; addr < vma->vm_end; addr +=3D sz) { + for (addr =3D src_vma->vm_start; addr < src_vma->vm_end; addr +=3D sz) { spinlock_t *src_ptl, *dst_ptl; src_pte =3D huge_pte_offset(src, addr, sz); if (!src_pte) continue; - dst_pte =3D huge_pte_alloc(dst, vma, addr, sz); + dst_pte =3D huge_pte_alloc(dst, dst_vma, addr, sz); if (!dst_pte) { ret =3D -ENOMEM; break; @@ -4776,6 +4777,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, } else if 
(unlikely(is_hugetlb_entry_migration(entry) || is_hugetlb_entry_hwpoisoned(entry))) { swp_entry_t swp_entry =3D pte_to_swp_entry(entry); + bool uffd_wp =3D huge_pte_uffd_wp(entry); =20 if (!is_readable_migration_entry(swp_entry) && cow) { /* @@ -4785,10 +4787,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, swp_entry =3D make_readable_migration_entry( swp_offset(swp_entry)); entry =3D swp_entry_to_pte(swp_entry); + if (userfaultfd_wp(src_vma) && uffd_wp) + entry =3D huge_pte_mkuffd_wp(entry); set_huge_swap_pte_at(src, addr, src_pte, entry, sz); } + if (!userfaultfd_wp(dst_vma) && uffd_wp) + entry =3D huge_pte_clear_uffd_wp(entry); set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz); + } else if (unlikely(is_pte_marker(entry))) { + /* + * We copy the pte marker only if the dst vma has + * uffd-wp enabled. + */ + if (userfaultfd_wp(dst_vma)) + set_huge_pte_at(dst, addr, dst_pte, entry); } else { entry =3D huge_ptep_get(src_pte); ptepage =3D pte_page(entry); @@ -4806,20 +4819,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, */ if (!PageAnon(ptepage)) { page_dup_file_rmap(ptepage, true); - } else if (page_try_dup_anon_rmap(ptepage, true, vma)) { + } else if (page_try_dup_anon_rmap(ptepage, true, + src_vma)) { pte_t src_pte_old =3D entry; struct page *new; =20 spin_unlock(src_ptl); spin_unlock(dst_ptl); /* Do not use reserve as it's private owned */ - new =3D alloc_huge_page(vma, addr, 1); + new =3D alloc_huge_page(dst_vma, addr, 1); if (IS_ERR(new)) { put_page(ptepage); ret =3D PTR_ERR(new); break; } - copy_user_huge_page(new, ptepage, addr, vma, + copy_user_huge_page(new, ptepage, addr, dst_vma, npages); put_page(ptepage); =20 @@ -4829,13 +4843,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry =3D huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { - restore_reserve_on_error(h, vma, addr, + 
restore_reserve_on_error(h, dst_vma, addr, new); put_page(new); /* dst_entry won't change as in child */ goto again; } - hugetlb_install_page(vma, dst_pte, addr, new); + hugetlb_install_page(dst_vma, dst_pte, addr, new); spin_unlock(src_ptl); spin_unlock(dst_ptl); continue; diff --git a/mm/memory.c b/mm/memory.c index 9808edfe18d4..d1e9c2517dfb 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1278,7 +1278,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma) return 0; =20 if (is_vm_hugetlb_page(src_vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, src_vma); + return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma); =20 if (unlikely(src_vma->vm_flags & VM_PFNMAP)) { /* --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29865C433F5 for ; Tue, 5 Apr 2022 02:43:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229768AbiDECoy (ORCPT ); Mon, 4 Apr 2022 22:44:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229992AbiDECnm (ORCPT ); Mon, 4 Apr 2022 22:43:42 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7D8793A52FF for ; Mon, 4 Apr 2022 18:49:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123365; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bWfhkJFq1BCdKqisyymaNspfijHKW5rO8LQgLYY4Yk0=; 
b=W2dP0PEOeXHbAG72TsAXZKmEm1YsqSNstkbmfUYGZ7Eskh8zuKKJWkiRue5+Ghk4VKNl+x ZqGmACvE/4vpKIrn8LusZatDMb7daItKWPumTolgX8xkk2z/KmSfClk3LSalAC9r6V6/Es mKLpJOn2F2Z2PzuXhoxy61DqIZlxE+c= Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-505--UYCS4FZPnale-CgzcWn_A-1; Mon, 04 Apr 2022 21:49:24 -0400 X-MC-Unique: -UYCS4FZPnale-CgzcWn_A-1 Received: by mail-il1-f200.google.com with SMTP id l8-20020a056e020dc800b002ca4c433357so1956705ilj.23 for ; Mon, 04 Apr 2022 18:49:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bWfhkJFq1BCdKqisyymaNspfijHKW5rO8LQgLYY4Yk0=; b=63J5yNcoIpDPaiB39F3XcHIydutVnAnp8wybnfZQFWomz0xuqqQwS2/APGvuSHG9wh 57VM1Wud3Gf492R/8CaY322otS5wVtBjOZnvYDZf+Ls0bIqedYN2QrB1/O1/gESGWXUw FGIRpfBMbAGP13RsGQWdb+YRjKKOlIKjrZVS65AuTeQMyD/fCkSU2FzWosvudGQwHJsh f9LJjamo9unvT+hRPXvJ0Nz+xSejXyIAcxRInRbYxaSeMCCqPLnBYGiX8qDGrPr1Nfpf UwiGsUl/WhGlZNP9+PEQdOTbCioqrJbBnRoi2o53kPxdDn8jcUS9u+koqMs/9g3Hj2LV x0bQ== X-Gm-Message-State: AOAM53220A+WUsCUY7mmW6Un4v3UPrVo83IDgz7PRgiJiE1JeCk6UjqQ u1RHi3e41xC91gLp8OQxoSAovHtzK/G61/z0l1QxPIeW6rVg3TJqOecGw60E99QznMABIYaPmhd 808UBUp2I0ArvjRgZcc228UiUUg/BN6YmSoko3OzzBpHh9Xu2GNpQ4BXMGu/K4bWqP/akHN06oA == X-Received: by 2002:a02:cd12:0:b0:321:29bd:b5ae with SMTP id g18-20020a02cd12000000b0032129bdb5aemr659611jaq.83.1649123363651; Mon, 04 Apr 2022 18:49:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwFo84QPKUst3NxwRFnpoGwYsq1cNnk6B7cOmPSPQoeut9h5XdJGOgIWmVZYMgQ4peNHpQKkw== X-Received: by 2002:a02:cd12:0:b0:321:29bd:b5ae with SMTP id g18-20020a02cd12000000b0032129bdb5aemr659588jaq.83.1649123363425; Mon, 04 Apr 2022 18:49:23 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. 
[99.241.198.116]) by smtp.gmail.com with ESMTPSA id z12-20020a92d18c000000b002ca3ac378e2sm2852863ilz.76.2022.04.04.18.49.22 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:23 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 19/23] mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered Date: Mon, 4 Apr 2022 21:49:21 -0400 Message-Id: <20220405014921.14994-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

When we're trying to collapse a 2M huge shmem page, don't retract the pgtable pmd page if the vma is registered with uffd-wp, because that pgtable could have pte markers installed. Recycling that pgtable means we'll lose the pte markers, which could cause data loss for a uffd-wp enabled application on shmem.

Instead of disabling khugepaged on these files, simply skip retracting these special VMAs. The page cache can then still be merged into a huge thp, and other mm/vma can still map the range of the file with a huge thp when appropriate.

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write, which avoids a race like:

         khugepaged                         user thread
         ==========                         ===========
     check VM_UFFD_WP, not set
                                       UFFDIO_REGISTER with uffd-wp on shmem
                                       wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!

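The interleaving in the diagram can be replayed with a small sequential sketch. This is a hypothetical userspace model only: toy_vma, toy_user_thread() and toy_retract() are invented names, the "lock" is just a point in the function after which nothing else runs, and the marker count of 3 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a vma and its pte page table. */
struct toy_vma {
	bool uffd_wp;		/* stand-in for VM_UFFD_WP */
	int pte_markers;	/* markers living in the pte page table */
	bool pgtable_freed;	/* pmd cleared and pte page table freed */
};

/* User thread: UFFDIO_REGISTER with uffd-wp, then wr-protect 3 pages. */
void toy_user_thread(struct toy_vma *vma)
{
	vma->uffd_wp = true;
	vma->pte_markers += 3;
}

/*
 * khugepaged side. The racer callback runs before the (modelled) mmap
 * write lock is taken. Returns the number of pte markers that were lost.
 */
int toy_retract(struct toy_vma *vma, bool check_under_lock,
		void (*racer)(struct toy_vma *))
{
	bool early, skip;
	int lost;

	assert(vma);
	early = vma->uffd_wp;		/* sampled without the lock */
	if (racer)
		racer(vma);		/* user thread sneaks in here */

	/* From here on the write lock is held: the vma cannot change. */
	skip = check_under_lock ? vma->uffd_wp : early;
	if (skip)
		return 0;

	lost = vma->pte_markers;
	vma->pte_markers = 0;
	vma->pgtable_freed = true;
	return lost;
}
```

With check_under_lock set to false the model loses all three markers, which is the race shown in the diagram; with it set to true the registration is observed and the page table is left alone.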
Signed-off-by: Peter Xu --- mm/khugepaged.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 04a972259136..d7c5bb9fd1fb 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1464,6 +1464,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, u= nsigned long addr) if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE)) return; =20 + /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ + if (userfaultfd_wp(vma)) + return; + hpage =3D find_lock_page(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); if (!hpage) @@ -1599,7 +1603,15 @@ static void retract_page_tables(struct address_space= *mapping, pgoff_t pgoff) * reverse order. Trylock is a way to avoid deadlock. */ if (mmap_write_trylock(mm)) { - if (!khugepaged_test_exit(mm)) + /* + * When a vma is registered with uffd-wp, we can't + * recycle the pmd pgtable because there can be pte + * markers installed. Skip it only, so the rest mm/vma + * can still have the same file mapped hugely, however + * it'll always mapped in small page size for uffd-wp + * registered ranges. 
+ */ + if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { --=20 2.32.0 From nobody Fri Jun 19 10:47:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48741C433EF for ; Tue, 5 Apr 2022 02:43:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229906AbiDECo6 (ORCPT ); Mon, 4 Apr 2022 22:44:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229885AbiDECoS (ORCPT ); Mon, 4 Apr 2022 22:44:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2C920348B21 for ; Mon, 4 Apr 2022 18:49:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123368; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kYLbDoazIMJSeMNhJ5zt7FEdCu3V+kUqTKXo5M6GV7A=; b=DsdZCpek7tReqMpcMcautA9qE+ABAufXDD7EE6NYxl/i1OzJsvKW0xd84rZanEbHSRMN36 uA1OTtOKUvauvvVrT4BG/xlWwDOCAB8HsiPblEsUta7m/DyjKg9Esdd2mafw0jVVme1KNk QjpyTy4v8m243zQr1luJfxZENUuWg/4= Received: from mail-io1-f71.google.com (mail-io1-f71.google.com [209.85.166.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-112-evMN9tRyPISt4wWNlEBifA-1; Mon, 04 Apr 2022 21:49:27 -0400 X-MC-Unique: evMN9tRyPISt4wWNlEBifA-1 Received: by mail-io1-f71.google.com with SMTP id z16-20020a05660217d000b006461c7cbee3so7400554iox.21 for ; Mon, 04 Apr 2022 18:49:27 
-0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kYLbDoazIMJSeMNhJ5zt7FEdCu3V+kUqTKXo5M6GV7A=; b=ZRTXb88HF4HgCMk56trmVtXr9mpqnH7jOZdR9MHssBepVxq31T+dXXwiWW6oCxKPBb xmeHB1e0+w/XADcH9knQthOYMo26cIsgb/QzL858qUq0ZG1uplRz9/un7LBCO/25S7Yx 5y2KR5NbjxOUReLf9idvxuLh3K6HskQ63XUL0WeNNbpGGOKRE2AmsRCWSqOjVfPwqzYC GSPKuRGYC0GhcQDRgQIoN9kQofgLUdTeLxoNRI6Rc+AkB+C+TRHZmXVna68XbctyC/Z9 68fVx9e1FDVUX36AksNAMdYzvhsTAo1c4uU4STuVk47JLkhDGreZO7uN4c0YXCT5OAIi y/oQ== X-Gm-Message-State: AOAM531yUmQCch5bFN2DUTtdWws7xwE9Al9M7tKN+mmENCBDEC7TLTLZ BAPkTQxBS7iBIjh/kA1mSr56ZYqGw1k+D2BEhj/63ZWG2Gg1WrShoEhTurGDWMpwMR1ggpmBNZ3 BEbLaL49Jpy9E1koxT+QrdMNDEInmRIYdkaq3CbQcRTxTWGuh8X0pl2CqqQFW+ZLKvndiOLgelA == X-Received: by 2002:a05:6602:2c0d:b0:60f:6ac8:ad05 with SMTP id w13-20020a0566022c0d00b0060f6ac8ad05mr539106iov.175.1649123366511; Mon, 04 Apr 2022 18:49:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyIt71qNBgRyuVjW7pMNtAgATJ9MNf0M9CvhTW16IXYsjPRDdAvOlDVVqXJGfZlYTfnSe8R1g== X-Received: by 2002:a05:6602:2c0d:b0:60f:6ac8:ad05 with SMTP id w13-20020a0566022c0d00b0060f6ac8ad05mr539087iov.175.1649123366302; Mon, 04 Apr 2022 18:49:26 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id m9-20020a0566022ac900b0064cf3d9f35fsm2767620iov.35.2022.04.04.18.49.25 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:26 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 20/23] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
Date: Mon, 4 Apr 2022 21:49:23 -0400
Message-Id: <20220405014923.15047-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This requires the pagemap code to recognize the newly introduced swap
special pte for uffd-wp, as well as the general hugetlb case that we
recently started to support. This should make pagemap uffd-wp support
complete.

Signed-off-by: Peter Xu
---
 fs/proc/task_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f46060eb91b5..194dfd7abf2b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1421,6 +1421,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		migration = is_migration_entry(entry);
 		if (is_pfn_swap_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
+		if (pte_marker_entry_uffd_wp(entry))
+			flags |= PM_UFFD_WP;
 	}
 
 	if (page && !PageAnon(page))
@@ -1556,10 +1558,15 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
 
+		if (huge_pte_uffd_wp(pte))
+			flags |= PM_UFFD_WP;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
 				((addr & ~hmask) >> PAGE_SHIFT);
+	} else if (pte_swp_uffd_wp_any(pte)) {
+		flags |= PM_UFFD_WP;
 	}
 
 	for (; addr != end; addr += PAGE_SIZE) {
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17A52C433EF for ; Tue, 5 Apr 2022 02:43:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230159AbiDECpD (ORCPT ); Mon, 4 Apr 2022 22:45:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230101AbiDECoW (ORCPT ); Mon, 4 Apr 2022 22:44:22 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 29FCC3B039D for ; Mon, 4 Apr 2022 18:49:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123371; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vOyz+2xZD3mxaBpxdZs8qkqfXmUFPooQ6m/cX5syVEc=; b=AtbNT0JRwMl3vr7k+lXM0TF8tUwuDXxca0nGsi0ODSEiK2XD6ekXJsBK5HDaaqjYG6GR2k z8keovfBAdRYJrhJLWe5HDpaH/6rVIlOxBHMbW+56XYf19VEWXnW9a+c+p7R77RYha9bwO eNcj0CjeedRSdk78uBWQoYLiRNlukXk= Received: from mail-il1-f198.google.com (mail-il1-f198.google.com [209.85.166.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-638-buIkkurMMPK7zb-o0uSR5g-1; Mon, 04 Apr 2022 21:49:30 -0400 X-MC-Unique: buIkkurMMPK7zb-o0uSR5g-1 Received: by mail-il1-f198.google.com with SMTP id y19-20020a056e02119300b002c2d3ef05bfso7173836ili.18 for ; Mon, 04 Apr 2022 18:49:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vOyz+2xZD3mxaBpxdZs8qkqfXmUFPooQ6m/cX5syVEc=; 
b=qba0npl3YQj/mtrxyT1YRO7lfTKpp5rS/dmROjrfRb1vNrzBjG/Q9Tue3F0SAGhKmU 2oanFv3fyUuCydXMcPmfnUQIBT5xd72s21/9iwIx/iOZQDa8zvUVFSFWyXlI/y6qrPA5 NEPv+Dd+R/M5Pzskyw8XE29F1aNx5OZ0b2pCD65c5FfDavCBnHhZ1V2jPtKt+/lRHYh5 0qgAsJbC2WQX3vNyGF6YQROYTJAX6t0t86EeLjAnGl0NTb/U5Uwv9yi0GUk8bBaMquGX knYUucAGNdC7oth7auJd4MMI7efAa+DNKhwp2VGFcBgUqtPdtGtcKMR1eYyfpnYMMa/x o/5w== X-Gm-Message-State: AOAM5339b1hTi3dzqvQT9SEdRyxmbuSAoBOyAJYTs6svcQTcQN/Al0+l 5g9xwQxqwHptxk8x36rahqKCu/74w/Yx0DYlF24IhHGJn7K45tSAXylDJCTr1YBi3sz+8AvphmP 7F7gi3Q+REMIoVVowDZJCHb/UUmJIYgpMGamBR+OaU0vfKqIbmRYk7vfBmwLTpLeTVeFm9I14zA == X-Received: by 2002:a05:6e02:156b:b0:2c7:bea3:4e3f with SMTP id k11-20020a056e02156b00b002c7bea34e3fmr552959ilu.297.1649123369556; Mon, 04 Apr 2022 18:49:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzTG3iAZemJX0eJmfE0/JA/EKpfb5j6PGqOBIGRnudHQh5V4DM77svm07A1N03QJ/sTzCyG+Q== X-Received: by 2002:a05:6e02:156b:b0:2c7:bea3:4e3f with SMTP id k11-20020a056e02156b00b002c7bea34e3fmr552932ilu.297.1649123369224; Mon, 04 Apr 2022 18:49:29 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id s12-20020a92cbcc000000b002bd04428740sm6652376ilq.80.2022.04.04.18.49.27 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:29 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 21/23] mm/uffd: Enable write protection for shmem & hugetlbfs
Date: Mon, 4 Apr 2022 21:49:26 -0400
Message-Id: <20220405014926.15101-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

We now have all the necessary changes ready for both shmem and hugetlbfs.
Turn on all the shmem/hugetlbfs switches for userfaultfd-wp.

We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too
because all existing types now support write protection mode.

Since vma_can_userfault() will be used elsewhere, move it into
userfaultfd_k.h.

Signed-off-by: Peter Xu
---
 fs/userfaultfd.c                 | 21 +++------------------
 include/linux/userfaultfd_k.h    | 20 ++++++++++++++++++++
 include/uapi/linux/userfaultfd.h | 10 ++++++++--
 mm/userfaultfd.c                 |  9 +++------
 4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8b4a94f5a238..fb45522a2b44 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1257,24 +1257,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags)
-{
-	/* FIXME: add WP support to hugetlbfs and shmem */
-	if (vm_flags & VM_UFFD_WP) {
-		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
-			return false;
-	}
-
-	if (vm_flags & VM_UFFD_MINOR) {
-		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
-			return false;
-	}
-
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
-}
-
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				unsigned long arg)
 {
@@ -1955,6 +1937,9 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+#endif
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+	uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 827e38b7be65..ea11bed9bb7e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 /* The set of all possible UFFD-related VM flags. */
 #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -140,6 +141,25 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (vm_flags & VM_UFFD_MINOR)
+		return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
+
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+	/*
+	 * If user requested uffd-wp but not enabled pte markers for
+	 * uffd-wp, then shmem & hugetlbfs are not supported but only
+	 * anonymous.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+		return false;
+#endif
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	    vma_is_shmem(vma);
+}
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ef739054cb1c..7d32b1e797fb 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -33,7 +33,8 @@
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
-			   UFFD_FEATURE_EXACT_ADDRESS)
+			   UFFD_FEATURE_EXACT_ADDRESS |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -47,7 +48,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -194,6 +196,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
 	 * faults would be provided and the offset within the page would not be
 	 * masked.
+	 *
+	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -207,6 +212,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 58d67f2bf980..156e9bdf9f23 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -730,15 +730,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 
 	err = -ENOENT;
 	dst_vma = find_dst_vma(dst_mm, start, len);
-	/*
-	 * Make sure the vma is not shared, that the dst range is
-	 * both valid and fully within a single existing vma.
-	 */
-	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+	if (!dst_vma)
 		goto out_unlock;
 	if (!userfaultfd_wp(dst_vma))
 		goto out_unlock;
-	if (!vma_is_anonymous(dst_vma))
+	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
 		goto out_unlock;
 
 	if (is_vm_hugetlb_page(dst_vma)) {
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E438C433EF for ; Tue, 5 Apr 2022 02:43:13 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230032AbiDECpI (ORCPT ); Mon, 4 Apr 2022 22:45:08 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230106AbiDECoZ (ORCPT ); Mon, 4 Apr 2022 22:44:25 -0400
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B53CDE997D for ; Mon, 4 Apr 2022 18:49:34 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
s=mimecast20190719; t=1649123373; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lW39+rLF3iuwrcyWFAu8xWUbxamG/E8iQp/9Q8XMaVI=; b=HSNJ4zpHFUTSfB84BzQnpFbkRik8AKHp1Rv+joxbvHmigM1XyQQO592lbif9Wq1Bf2+7+v y7c3bnSjm/yJcu+GINPqU9GqPUbWKopAMy4kX3nxx1CaPdWHnv1wqc6PLpURrKqp72YeeT s4vcIAZF1leDWtXQkqsZNdA3sAP7evI= Received: from mail-io1-f70.google.com (mail-io1-f70.google.com [209.85.166.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-308-Y-IEES_gM6KYpTvQFdtZLQ-1; Mon, 04 Apr 2022 21:49:33 -0400 X-MC-Unique: Y-IEES_gM6KYpTvQFdtZLQ-1 Received: by mail-io1-f70.google.com with SMTP id w28-20020a05660205dc00b00645d3cdb0f7so7424553iox.10 for ; Mon, 04 Apr 2022 18:49:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lW39+rLF3iuwrcyWFAu8xWUbxamG/E8iQp/9Q8XMaVI=; b=a+mEN3d+Qfxg2vZBzLaSG0f11jxQp3Nm6qP3jgEgnd3KVWqRWkvqUUcV2q0WpkC5h1 aJ3M2uea9OCNEamVnCFJxT1Ir82UxX08mJBb8hD9odSHsCxHBSmz7FqjICBYwNqqfm+A woz7o5QJCOd+pP0Z82REc+TDojttEIIinQh841kkpA1qJAIO5ZqFJcWFv0g67euochvD 2sLEsCJFlGxWRbJHcsY5twjj6k/cqRG4wVod6vwJHB9w/CIY3mie+NRWphOVvuB4Xdqn GRC14hQUcf2PKF80yMRKeCh622TX5XHUbv/PX0WdiEx4/R9qSxBPhic7sxaWjtNk3ted Skog== X-Gm-Message-State: AOAM531vk4mjDOlLYQ75kCkPQ3mhPpAxO6AnX+12adwC0RReAy9sKBb6 fBC5MsPNFaRQPfE8q6FYuMrdlQTkPF1QGfDSWcaBRtDC9fhz1ZU86tSE4ajWIFGwZKtS4K8tQuk sOOoP4GI5xlU+abEs//2JgzF/HrdDg773AhSNr1pADjvc0IeBP56z4V3aLvZluPdDKPiK3JbHww == X-Received: by 2002:a6b:cd0c:0:b0:649:adb8:79eb with SMTP id d12-20020a6bcd0c000000b00649adb879ebmr582217iog.138.1649123372105; Mon, 04 Apr 2022 18:49:32 -0700 (PDT) X-Google-Smtp-Source: 
ABdhPJwn1jD8bZrilkR6TJdV2+HFVoMQ6KZBMcYF9DIorLbsiTZnX9G3AePuh+OKuYEiS2DRPtH+RA==
X-Received: by 2002:a6b:cd0c:0:b0:649:adb8:79eb with SMTP id d12-20020a6bcd0c000000b00649adb879ebmr582198iog.138.1649123371881; Mon, 04 Apr 2022 18:49:31 -0700 (PDT)
Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id b11-20020a92c56b000000b002c76a618f52sm6657231ilj.63.2022.04.04.18.49.30 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:31 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 22/23] mm: Enable PTE markers by default
Date: Mon, 4 Apr 2022 21:49:29 -0400
Message-Id: <20220405014929.15158-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Enable PTE markers by default. On x86_64 it means it'll auto-enable
PTE_MARKER_UFFD_WP as well.

Signed-off-by: Peter Xu
Reported-by: Johannes Weiner
---
 mm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 6e7c2d59fa96..3eca34c864c5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -911,12 +911,14 @@ config ANON_VMA_NAME
 
 config PTE_MARKER
 	bool "Marker PTEs support"
+	default y
 
 	help
 	  Allows to create marker PTEs for file-backed memory.
 
 config PTE_MARKER_UFFD_WP
 	bool "Marker PTEs support for userfaultfd write protection"
+	default y
 	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
 
 	help
-- 
2.32.0

From nobody Fri Jun 19 10:47:59 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F8C2C433F5 for ; Tue, 5 Apr 2022 02:43:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230051AbiDECpM (ORCPT ); Mon, 4 Apr 2022 22:45:12 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230005AbiDECou (ORCPT ); Mon, 4 Apr 2022 22:44:50 -0400
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0DC0CEA34A for ; Mon, 4 Apr 2022 18:49:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123384; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9fEneeyQr1XaNwKhMDQcOUUZbbgPBUxsqyNto4zx3Io=; b=OUAQt8iiqhobVHP3MutT95TFGpHYoDkfAsY8qnl8O19zSTOF2FDN7ZJ/8M/19N9ZzhtioN bgIUyf+wZd9tU6e8/v95kOHfuPASH/0VYSgDnClBt2AQwUsIsCwj1fbB23tgnr0rkAQuc2 0goKrvWIjhpfIOymQwLgWDXIzQsWz9w=
Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-451-Myq_7ErbMCuYDx5nwF2OcQ-1; Mon, 04 Apr 2022 21:49:35 -0400
X-MC-Unique: Myq_7ErbMCuYDx5nwF2OcQ-1
Received: by mail-il1-f200.google.com with SMTP id y8-20020a056e020f4800b002ca498c9655so2147608ilj.20 for ;
Mon, 04 Apr 2022 18:49:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9fEneeyQr1XaNwKhMDQcOUUZbbgPBUxsqyNto4zx3Io=; b=0isT15sTqSNARIl3w4jOtB7JD0qlyglNcpWucfOAJeKKKBiidrX/5t/Ocp+uJYmkN+ usmIVrWdaAtkvmUjPD2FonnYvur1uR+DSxYlKMoI6K+yae5RmO7l+mrRWN/eAU17Ifku XA4S0gh6uS0cBVrp26+MetuZHbLM3+Y4xNHJYMHxcDpTKXZN+iXYVSP6DnpkUip5fBTJ TfLhRrVlrCaArL4T22OHlZaN29HKqQyNGfVFeMIrJEIYFZ0qLKBbiNEuZg3SE0ActzPG YdH/BPK7VroFW9wS0MNrYGunv76D0HwxMELX+/2veFKowThKgIrRQ2sqsVHOpaKHZSqR 2brA== X-Gm-Message-State: AOAM530CTDpx8p57WFGjfVxBTK5uYkPXYJRkoHMP3wKMPOiKojGoORe4 pMgD2vEZ1zW2KRDH7xhJ62tZAh7safeZPCfJJFLZNF8WG24HqzmZ6dclOvXyj1MyZ3eJFanKK// 7GF60crhDiR5hOM18q9j6m56qgUkOeEsxiN0WaLSyH9y9mt64agIrdaYB77NsY3GMvwj6ww/WUA == X-Received: by 2002:a92:2e01:0:b0:2ca:1f0d:fe5d with SMTP id v1-20020a922e01000000b002ca1f0dfe5dmr531926ile.201.1649123374803; Mon, 04 Apr 2022 18:49:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz6gsEsxVDLt+Q8uilC+AQSvDlTB1lbfOZtS3HSLlBkhdkp9NXI433lqCZ74NP5YsBF3yosQw== X-Received: by 2002:a92:2e01:0:b0:2ca:1f0d:fe5d with SMTP id v1-20020a922e01000000b002ca1f0dfe5dmr531898ile.201.1649123374449; Mon, 04 Apr 2022 18:49:34 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id s10-20020a6b740a000000b006413d13477dsm7272806iog.33.2022.04.04.18.49.33 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:34 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . 
Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com
Subject: [PATCH v8 23/23] selftests/uffd: Enable uffd-wp for shmem/hugetlbfs
Date: Mon, 4 Apr 2022 21:49:32 -0400
Message-Id: <20220405014932.15212-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
References: <20220405014646.13522-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Now that support for shmem and hugetlbfs has been added, we can always
turn the uffd-wp test on.

Signed-off-by: Peter Xu
---
 tools/testing/selftests/vm/userfaultfd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92a4516f8f0d..bbc4a6d8cf7b 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -82,7 +82,7 @@ static int test_type;
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
 
@@ -1594,8 +1594,6 @@ static void set_test_type(const char *type)
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
-		/* Only enable write-protect test for anonymous test */
-		test_uffdio_wp = true;
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
-- 
2.32.0
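
A note for userspace readers of this series: the new UFFD_FEATURE_WP_HUGETLBFS_SHMEM bit is reported in the `features` mask that UFFDIO_API returns, so a program could decide per memory type whether write-protect mode is worth attempting. The following is only a minimal sketch of that decision, not code from the series. The `SKETCH_*` names, the enum and the helper are hypothetical; the two bit positions are copied from the uapi hunk in patch 21, and real code would include <linux/userfaultfd.h> instead of redefining them. Treating UFFD_FEATURE_PAGEFAULT_FLAG_WP as a precondition is an assumption based on the Kconfig dependency of PTE_MARKER_UFFD_WP on HAVE_ARCH_USERFAULTFD_WP.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of two feature bits from include/uapi/linux/userfaultfd.h
 * as shown in patch 21 (assumption: values unchanged in the running kernel).
 */
#define SKETCH_UFFD_FEATURE_PAGEFAULT_FLAG_WP	(1ULL << 0)
#define SKETCH_UFFD_FEATURE_WP_HUGETLBFS_SHMEM	(1ULL << 12)

/* Hypothetical classification of the memory backing a registered range. */
enum sketch_mem_type {
	SKETCH_MEM_ANON,
	SKETCH_MEM_SHMEM,
	SKETCH_MEM_HUGETLB,
};

/*
 * Given the `features` mask returned by UFFDIO_API, report whether
 * write-protect mode looks usable on the given memory type.
 */
bool sketch_uffd_wp_usable(uint64_t features, enum sketch_mem_type type)
{
	/* Without basic uffd-wp support nothing can be write-protected. */
	if (!(features & SKETCH_UFFD_FEATURE_PAGEFAULT_FLAG_WP))
		return false;

	/* Anonymous memory only needs the basic uffd-wp feature. */
	if (type == SKETCH_MEM_ANON)
		return true;

	/* shmem and hugetlbfs additionally need the bit added by this series. */
	return (features & SKETCH_UFFD_FEATURE_WP_HUGETLBFS_SHMEM) != 0;
}
```

A caller would pass `uffdio_api.features` after a successful UFFDIO_API ioctl, and fall back to another tracking mechanism when the helper returns false for shmem or hugetlbfs.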