From nobody Tue Jun 23 13:01:38 2026
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Nadav Amit, Hugh Dickins, David Hildenbrand,
 Axel Rasmussen, Matthew Wilcox, Alistair Popple, Mike Rapoport,
 Andrew Morton, Jerome Glisse, Mike Kravetz, "Kirill A . Shutemov",
 Andrea Arcangeli
Subject: [PATCH v7 01/23] mm: Introduce PTE_MARKER swap entry
Date: Fri, 4 Mar 2022 13:16:46 +0800
Message-Id: <20220304051708.86193-2-peterx@redhat.com>
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>

This patch introduces a new swap entry type called PTE_MARKER.  It can be
installed for any pte that maps file-backed memory when the pte is
temporarily zapped, so as to maintain per-pte information.

The information kept in the pte is called a "marker".  Here we define the
marker as "unsigned long" just to match pgoff_t; however, it will only work
if it still fits in swp_offset(), which is e.g. currently 58 bits on x86_64.

A new config CONFIG_PTE_MARKER is introduced too; it's by default off.  A
bunch of helpers are defined altogether to service the rest of the pte
marker code.
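
As a rough illustration of the swp_offset() constraint described above, the
userspace sketch below packs a marker into the low bits of a 64-bit value and
a type above them.  It is not kernel code: the model_* names, the fixed 58-bit
split and the type value 31 are assumptions picked to mirror the x86_64
example, and the real swap entry layout is architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model constants; the real layout is defined by the kernel. */
#define MODEL_OFFSET_BITS  58
#define MODEL_OFFSET_MASK  ((UINT64_C(1) << MODEL_OFFSET_BITS) - 1)
#define MODEL_TYPE_MARKER  UINT64_C(31)	/* stand-in for SWP_PTE_MARKER */

typedef uint64_t model_entry;

/* A marker is only usable if it fits entirely in the offset field. */
static inline bool model_marker_fits(uint64_t marker)
{
	return (marker & ~MODEL_OFFSET_MASK) == 0;
}

/* Pack the marker type above the offset bits and the marker below them. */
static inline model_entry model_make_marker_entry(uint64_t marker)
{
	assert(model_marker_fits(marker));
	return (MODEL_TYPE_MARKER << MODEL_OFFSET_BITS) | marker;
}

static inline uint64_t model_entry_type(model_entry e)
{
	return e >> MODEL_OFFSET_BITS;
}

static inline uint64_t model_entry_offset(model_entry e)
{
	return e & MODEL_OFFSET_MASK;
}

static inline bool model_is_marker_entry(model_entry e)
{
	return model_entry_type(e) == MODEL_TYPE_MARKER;
}
```

In this model a marker with any bit at position 58 or above would overwrite
the type field, which is why the "fits in swp_offset()" condition matters.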

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/asm-generic/hugetlb.h |  9 ++++
 include/linux/swap.h          | 15 ++++++-
 include/linux/swapops.h       | 78 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  7 ++++
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..f39cad20ffc6 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -2,6 +2,9 @@
 #ifndef _ASM_GENERIC_HUGETLB_H
 #define _ASM_GENERIC_HUGETLB_H
 
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
 static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
 {
 	return mk_pte(page, pgprot);
@@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
 }
 #endif
 
+/* Please refer to comments above pte_none_mostly() for the usage */
+static inline int huge_pte_none_mostly(pte_t pte)
+{
+	return huge_pte_none(pte) || is_pte_marker(pte);
+}
+
 #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 42ebe2d6078d..20b4aceed920 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
  * actions on faults.
  */
 
+/*
+ * PTE markers are used to persist information onto PTEs that are mapped with
+ * file-backed memories.  As its name "PTE" hints, it should only be applied to
+ * the leaves of pgtables.
+ */
+#ifdef CONFIG_PTE_MARKER
+#define SWP_PTE_MARKER_NUM 1
+#define SWP_PTE_MARKER (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+			SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
+#else
+#define SWP_PTE_MARKER_NUM 0
+#endif
+
 /*
  * Unaddressable device memory support. See include/linux/hmm.h and
  * Documentation/vm/hmm.rst. Short description is we need struct pages for
@@ -100,7 +113,7 @@ static inline int current_is_kswapd(void)
 
 #define MAX_SWAPFILES \
 	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
-	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
+	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..5103d2a4ae38 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -247,6 +247,84 @@ static inline int is_writable_migration_entry(swp_entry_t entry)
 
 #endif
 
+typedef unsigned long pte_marker;
+
+#define PTE_MARKER_MASK (0)
+
+#ifdef CONFIG_PTE_MARKER
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	return swp_entry(SWP_PTE_MARKER, marker);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_PTE_MARKER;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return swp_offset(entry) & PTE_MARKER_MASK;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
+}
+
+#else /* CONFIG_PTE_MARKER */
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	/* This should never be called if !CONFIG_PTE_MARKER */
+	WARN_ON_ONCE(1);
+	return swp_entry(0, 0);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return false;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return 0;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return false;
+}
+
+#endif /* CONFIG_PTE_MARKER */
+
+static inline pte_t make_pte_marker(pte_marker marker)
+{
+	return swp_entry_to_pte(make_pte_marker_entry(marker));
+}
+
+/*
+ * This is a special version to check pte_none() just to cover the case when
+ * the pte is a pte marker.  It existed because in many cases the pte marker
+ * should be seen as a none pte; it's just that we have stored some information
+ * onto the none pte so it becomes not-none any more.
+ *
+ * It should be used when the pte is file-backed, ram-based and backing
+ * userspace pages, like shmem.  It is not needed upon pgtables that do not
+ * support pte markers at all.  For example, it's not needed on anonymous
+ * memory, kernel-only memory (including when the system is during-boot),
+ * non-ram based generic file-system.  It's fine to be used even there, but the
+ * extra pte marker check will be pure overhead.
+ *
+ * For systems configured with !CONFIG_PTE_MARKER this will be automatically
+ * optimized to pte_none().
+ */
+static inline int pte_none_mostly(pte_t pte)
+{
+	return pte_none(pte) || is_pte_marker(pte);
+}
+
 static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
diff --git a/mm/Kconfig b/mm/Kconfig
index c313bad5167a..25bcbb89f8e5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -900,6 +900,13 @@ config ANON_VMA_NAME
 	  area from being merged with adjacent virtual memory areas due to the
 	  difference in their name.
 
+config PTE_MARKER
+	def_bool n
+	bool "Marker PTEs support"
+
+	help
+	  Allows to create marker PTEs for file-backed memory.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
From: Peter Xu <peterx@redhat.com>
Subject: [PATCH v7 02/23] mm: Teach core mm about pte markers
Date: Fri, 4 Mar 2022 13:16:47 +0800
Message-Id: <20220304051708.86193-3-peterx@redhat.com>
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>

This patch still does not use pte markers in any way; however, it teaches
the core mm about the pte marker idea.

For example, handle_pte_marker() is introduced to parse and handle all the
pte marker faults.

Many of the places are more about commenting it up - so that we know there's
the possibility of pte markers showing up, and why we don't need special
code for those cases.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 10 ++++++----
 mm/filemap.c     |  5 +++++
 mm/hmm.c         |  2 +-
 mm/memcontrol.c  |  8 ++++++--
 mm/memory.c      | 23 +++++++++++++++++++++++
 mm/mincore.c     |  3 ++-
 mm/mprotect.c    |  3 +++
 7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..8b4a94f5a238 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -249,9 +249,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none_mostly(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
@@ -330,9 +331,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte = pte_offset_map(pmd, address);
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (pte_none(*pte))
+	if (pte_none_mostly(*pte))
 		ret = true;
 	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/filemap.c b/mm/filemap.c
index 8f7e6088ee2a..464b8f0f111a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3379,6 +3379,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
 
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
 		if (!pte_none(*vmf->pte))
 			goto unlock;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index af71aac3140e..3fd3242c5e50 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	pte_t pte = *ptep;
 	uint64_t pfn_req_flags = *hmm_pfn;
 
-	if (pte_none(pte)) {
+	if (pte_none_mostly(pte)) {
 		required_fault =
 			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
 		if (required_fault)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f79bb3f25ce4..bba3b7e9f699 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5636,10 +5636,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
+	else if (pte_none_mostly(ptent))
+		/*
+		 * PTE markers should be treated as a none pte here, separated
+		 * from other swap handling below.
+		 */
+		page = mc_handle_file_pte(vma, addr, ptent);
 	else if (is_swap_pte(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
-	else if (pte_none(ptent))
-		page = mc_handle_file_pte(vma, addr, ptent);
 
 	if (!page && !ent.val)
 		return ret;
diff --git a/mm/memory.c b/mm/memory.c
index a0ca84756159..22d24ea7b87d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -99,6 +99,8 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+static vm_fault_t do_fault(struct vm_fault *vmf);
+
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
  * that high_memory defines the upper bound on direct map memory, then end
@@ -1419,6 +1421,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (!should_zap_page(details, page))
 				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_pte_marker_entry(entry)) {
+			/* By default, simply drop all pte markers when zap */
 		} else if (is_hwpoison_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
@@ -3508,6 +3512,23 @@ static inline bool should_try_to_free_swap(struct page *page,
 		page_count(page) == 2;
 }
 
+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+	unsigned long marker = pte_marker_get(entry);
+
+	/*
+	 * PTE markers should always be with file-backed memories, and the
+	 * marker should never be empty.  If anything weird happened, the best
+	 * thing to do is to kill the process along with its mm.
+	 */
+	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+		return VM_FAULT_SIGBUS;
+
+	/* TODO: handle pte markers */
+	return 0;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3544,6 +3565,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_pte_marker_entry(entry)) {
+			ret = handle_pte_marker(vmf);
 		} else {
 			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..736869f4b409 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -121,7 +121,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		/* We need to do cache lookup too for pte markers */
+		if (pte_none_mostly(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b69ce7a7b2b7..6d179c720089 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -184,6 +184,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
+			} else if (is_pte_marker_entry(entry)) {
+				/* Skip it, the same as none pte */
+				continue;
 			} else {
 				newpte = oldpte;
 			}
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
From: Peter Xu <peterx@redhat.com>
Subject: [PATCH v7 03/23] mm: Check against orig_pte for finish_fault()
Date: Fri, 4 Mar 2022 13:16:48 +0800
Message-Id: <20220304051708.86193-4-peterx@redhat.com>
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>

We used to check against none pte in finish_fault(), with the assumption
that the orig_pte is always none pte.

This change prepares us to be able to call do_fault() on !none ptes.  For
example, we should allow that to happen for pte markers so that we can
restore information out of the pte markers.

Let's change the "pte_none" check into detecting changes since we fetched
orig_pte.  One trivial thing to take care of here is, when pmd==NULL for
the pgtable we may not initialize orig_pte at all in handle_pte_fault().
By default orig_pte will be all zeros; however, the problem is that not all
architectures use all-zeros for a none pte.  pte_clear() will be the right
thing to use here so that we'll always have a valid orig_pte value for the
whole handle_pte_fault() call.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 22d24ea7b87d..cdd0d108d3ee 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4135,7 +4135,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 				      vmf->address, &vmf->ptl);
 	ret = 0;
 	/* Re-check under ptl */
-	if (likely(pte_none(*vmf->pte)))
+	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
 		do_set_pte(vmf, page, vmf->address);
 	else
 		ret = VM_FAULT_NOPAGE;
@@ -4600,6 +4600,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 * concurrent faults and from rmap lookups.
 		 */
 		vmf->pte = NULL;
+		/*
+		 * Always initialize orig_pte.  This matches with below
+		 * code to have orig_pte to be the none pte if pte==NULL.
+		 * This makes the rest code to be always safe to reference
+		 * it, e.g. in finish_fault() we'll detect pte changes.
+		 */
+		pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
 	} else {
 		/*
 		 * If a huge pmd materialized under us just retry later.  Use
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
From: Peter Xu <peterx@redhat.com>
Subject: [PATCH v7 04/23] mm/uffd: PTE_MARKER_UFFD_WP
Date: Fri, 4 Mar 2022 13:16:49 +0800
Message-Id: <20220304051708.86193-5-peterx@redhat.com>
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>

This patch introduces the 1st user of pte marker: the uffd-wp marker.

When the pte marker is installed with the uffd-wp bit set, it means this pte
was wr-protected by uffd.

We will use this special pte to arm the ptes that got either unmapped or
swapped out for a file-backed region that was previously wr-protected.  This
special pte could trigger a page fault just like swap entries.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

Some helpers are introduced to detect whether a swap pte is uffd
wr-protected.  After the pte marker is introduced, one swap pte can be
wr-protected in two forms: either it is a normal swap pte and it has
_PAGE_SWP_UFFD_WP set, or it's a pte marker that has PTE_MARKER_UFFD_WP set.
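
The "two forms" rule above can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct model_pte and the
model_* helpers are hypothetical stand-ins for pte_t and the helpers added by
this patch, and the swp_uffd_wp field stands in for the arch-specific
_PAGE_SWP_UFFD_WP bit.

```c
#include <stdbool.h>

/* Mirrors PTE_MARKER_UFFD_WP being bit 0 of the marker. */
#define MODEL_MARKER_UFFD_WP (1UL << 0)

/* Hypothetical classification of a pte; not a kernel type. */
enum model_pte_kind {
	MODEL_PTE_NONE,
	MODEL_PTE_PRESENT,
	MODEL_PTE_SWAP,		/* normal swap pte */
	MODEL_PTE_MARKER,	/* pte marker (a special swap pte) */
};

struct model_pte {
	enum model_pte_kind kind;
	bool swp_uffd_wp;	/* only meaningful for MODEL_PTE_SWAP */
	unsigned long marker;	/* only meaningful for MODEL_PTE_MARKER */
};

/* Form 2: a pte marker carrying the uffd-wp bit. */
static inline bool model_marker_uffd_wp(struct model_pte pte)
{
	return pte.kind == MODEL_PTE_MARKER &&
	       (pte.marker & MODEL_MARKER_UFFD_WP) != 0;
}

/* True if the pte is wr-protected in either form, false otherwise. */
static inline bool model_swp_uffd_wp_any(struct model_pte pte)
{
	/* Form 1: a normal swap pte with the swap uffd-wp bit set. */
	if (pte.kind == MODEL_PTE_SWAP)
		return pte.swp_uffd_wp;

	return model_marker_uffd_wp(pte);
}
```

None and present ptes always report false in this model, matching the early
"not a swap pte" return in the real helper.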

Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/
Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/
Suggested-by: Andrea Arcangeli
Suggested-by: Hugh Dickins
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/swapops.h       |  3 ++-
 include/linux/userfaultfd_k.h | 43 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  9 ++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 5103d2a4ae38..2cec3ef355a7 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -249,7 +249,8 @@ static inline int is_writable_migration_entry(swp_entry_t entry)
 
 typedef unsigned long pte_marker;
 
-#define PTE_MARKER_MASK (0)
+#define PTE_MARKER_UFFD_WP BIT(0)
+#define PTE_MARKER_MASK (PTE_MARKER_UFFD_WP)
 
 #ifdef CONFIG_PTE_MARKER
 
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 33cea484d1ad..bd09c3c89b59 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -15,6 +15,8 @@
 
 #include <linux/fcntl.h>
 #include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 #include <asm-generic/pgtable_uffd.h>
 
 /* The set of all possible UFFD-related VM flags. */
@@ -236,4 +238,45 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 
 #endif /* CONFIG_USERFAULTFD */
 
+static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
+{
+	return is_pte_marker_entry(entry) &&
+	    (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
+}
+
+static inline bool pte_marker_uffd_wp(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	swp_entry_t entry;
+
+	if (!is_swap_pte(pte))
+		return false;
+
+	entry = pte_to_swp_entry(pte);
+
+	return pte_marker_entry_uffd_wp(entry);
+#else
+	return false;
+#endif
+}
+
+/*
+ * Returns true if this is a swap pte and was uffd-wp wr-protected in either
+ * forms (pte marker or a normal swap pte), false otherwise.
+ */
+static inline bool pte_swp_uffd_wp_any(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	if (!is_swap_pte(pte))
+		return false;
+
+	if (pte_swp_uffd_wp(pte))
+		return true;
+
+	if (pte_marker_uffd_wp(pte))
+		return true;
+#endif
+	return false;
+}
+
 #endif /* _LINUX_USERFAULTFD_K_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 25bcbb89f8e5..a80ea8721885 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -907,6 +907,15 @@ config PTE_MARKER
 	help
 	  Allows to create marker PTEs for file-backed memory.
 
+config PTE_MARKER_UFFD_WP
+	bool "Marker PTEs support for userfaultfd write protection"
+	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
+
+	help
+	  Allows to create marker PTEs for userfaultfd write protection
+	  purposes.  It is required to enable userfaultfd write protection on
+	  file-backed memory types like shmem and hugetlbfs.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFE11C433EF for ; Fri, 4 Mar 2022 05:18:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235105AbiCDFTD (ORCPT ); Fri, 4 Mar 2022 00:19:03 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238065AbiCDFSz (ORCPT ); Fri, 4 Mar 2022 00:18:55 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1D70E654AD for ; Thu, 3 Mar 2022 21:18:06 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371085; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J4WHOp4As9jU9WPKckCAt0i8DCpDD3uwt3O/JTJCiOk=; b=bPnHkyGIhVRo2+aQJkEvazF8fFnVP7JC8wJYnvvvAAaEE8hySlp63p4E9KJzYlitqI0Vw1 7iaN15Vi66ZPtQGlQ626907sVushuvzeDBquz+gasHVeQRrZ1cIT4Wq4pQWnaHdcHPW/SP 57e6cPuagf8cMEQx0eaaCJtgY42DskE= Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-338-1EMfd2r-OD-W0TMSzfOYfg-1; Fri, 04 Mar 2022 00:18:04 -0500 X-MC-Unique: 1EMfd2r-OD-W0TMSzfOYfg-1 Received: by mail-pf1-f197.google.com with SMTP id k130-20020a628488000000b004f362b45f28so4472517pfd.9 for ; Thu, 03 Mar 2022 21:18:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=J4WHOp4As9jU9WPKckCAt0i8DCpDD3uwt3O/JTJCiOk=; b=hB+4ZRxhyfTZLxK0qFeLIh5IJXrIZ0tIa84a5gedsh5Ugy5W50EPcG2xFL5dCsMn/y cnwvW3vpjltS86x/fYAeTKGDBiX9fxsyGbVjeAq3PmCZnYtd4N3sdZC8eFgqN8zqyN2a NqiKiMuMIk0/JNP7yGap+M3hE/BXqyWaRJzMi5yOKcyqvpxfuI9B89A4wa8rv1DvFZas IRoKVvYcOONP8CuCJm7nIuvzXHL+vJKZjE7MCPO3/NHukubizqPIEWNtZOzYFvbiKITW eBqvjw+6EvCt+aoXawudWp6yGZA2s44AMnq1ZWdIHxf7m2s2mQP2YsNxVx1UYfv+0mqR hxFg== X-Gm-Message-State: AOAM532fXnCGT66qGONqx0nA7a1HonYAd5Z4aQyKSNHBHOFFg+1m/ML6 oroxV79tFbMJ9/W1TgBHT6gYODq4DxfZNo9ydVJLf7xZM7rBkpn36W3KjT5m04VGqUyvKeO6qPZ oqLSMamtYFUjuI8ua5CSoAR1U X-Received: by 2002:a63:8648:0:b0:37c:8fc8:ae4 with SMTP id x69-20020a638648000000b0037c8fc80ae4mr2421605pgd.482.1646371082769; Thu, 03 Mar 2022 21:18:02 -0800 (PST) X-Google-Smtp-Source: ABdhPJznlFXBVvf7x27XrfMAEWi1fvHMcNPQxFuHlxUyI/+59HbIcVzbyw8INKOOkmh2hJfkrZdmyg== X-Received: by 2002:a63:8648:0:b0:37c:8fc8:ae4 with SMTP id x69-20020a638648000000b0037c8fc80ae4mr2421585pgd.482.1646371082478; Thu, 03 Mar 2022 21:18:02 -0800 
(PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.17.54 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:02 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 05/23] mm/shmem: Take care of UFFDIO_COPY_MODE_WP Date: Fri, 4 Mar 2022 13:16:50 +0800 Message-Id: <20220304051708.86193-6-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Pass wp_copy into shmem_mfill_atomic_pte() through the stack, then apply the UFFD_WP bit properly when UFFDIO_COPY on shmem is issued with UFFDIO_COPY_MODE_WP. wp_copy finally lands in mfill_atomic_install_pte(). Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(), as mk_pte() could return a writable pte (e.g., when VM_SHARED is set on a shmem file).
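For illustration, the write-bit decision described above can be sketched as a small userspace model. This is an assumption-laden sketch rather than the kernel implementation: struct wp_model_pte and wp_model_install_bits() are hypothetical names, and the two booleans stand in for the hardware write bit and the uffd-wp bit of a pte.

```c
#include <stdbool.h>

/* Hypothetical model: only the two bits relevant to this patch. */
struct wp_model_pte {
	bool write;   /* stands in for the pte write bit */
	bool uffd_wp; /* stands in for the uffd-wp bit */
};

/*
 * Models the bit handling in mfill_atomic_install_pte() after this patch:
 * a wp_copy request always wins over "writable", and a non-writable result
 * is explicitly write-protected because the incoming pte (as returned by
 * mk_pte() in the kernel) may already have the write bit set.
 */
struct wp_model_pte wp_model_install_bits(struct wp_model_pte pte,
					  bool writable, bool wp_copy)
{
	if (wp_copy) {
		pte.uffd_wp = true; /* models pte_mkuffd_wp() */
		writable = false;
	}

	if (writable)
		pte.write = true;   /* models pte_mkwrite() */
	else
		pte.write = false;  /* models pte_wrprotect() */

	return pte;
}
```

The point of the explicit else branch is that leaving the write bit untouched would be wrong whenever the starting pte is already writable.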
Signed-off-by: Peter Xu --- include/linux/shmem_fs.h | 4 ++-- mm/shmem.c | 4 ++-- mm/userfaultfd.c | 23 ++++++++++++++++++----- 3 files changed, 22 insertions(+), 9 deletions(-) diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index ab51d3cd39bd..02d23ce5f979 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -145,11 +145,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *d= st_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - bool zeropage, + bool zeropage, bool wp_copy, struct page **pagep); #else /* !CONFIG_SHMEM */ #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \ - src_addr, zeropage, pagep) ({ BUG(); 0; }) + src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; }) #endif /* CONFIG_SHMEM */ #endif /* CONFIG_USERFAULTFD */ =20 diff --git a/mm/shmem.c b/mm/shmem.c index 81a69bd247b4..3f0332c1c1e9 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2317,7 +2317,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - bool zeropage, + bool zeropage, bool wp_copy, struct page **pagep) { struct inode *inode =3D file_inode(dst_vma->vm_file); @@ -2390,7 +2390,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, goto out_release; =20 ret =3D mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - page, true, false); + page, true, wp_copy); if (ret) goto out_delete_from_cache; =20 diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e9bb6db002aa..ef418a48b121 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -77,10 +77,19 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, = pmd_t *dst_pmd, * Always mark a PTE as write-protected when needed, regardless of * VM_WRITE, which the user might change. 
*/ - if (wp_copy) + if (wp_copy) { _dst_pte =3D pte_mkuffd_wp(_dst_pte); - else if (writable) + writable =3D false; + } + + if (writable) _dst_pte =3D pte_mkwrite(_dst_pte); + else + /* + * We need this to make sure write bit removed; as mk_pte() + * could return a pte with write bit set. + */ + _dst_pte =3D pte_wrprotect(_dst_pte); =20 dst_pte =3D pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); =20 @@ -95,7 +104,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, = pmd_t *dst_pmd, } =20 ret =3D -EEXIST; - if (!pte_none(*dst_pte)) + /* + * We allow to overwrite a pte marker: consider when both MISSING|WP + * registered, we firstly wr-protect a none pte which has no page cache + * page backing it, then access the page. + */ + if (!pte_none_mostly(*dst_pte)) goto out_unlock; =20 if (page_in_cache) { @@ -479,11 +493,10 @@ static __always_inline ssize_t mfill_atomic_pte(struc= t mm_struct *dst_mm, err =3D mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, dst_addr); } else { - VM_WARN_ON_ONCE(wp_copy); err =3D shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, src_addr, mode !=3D MCOPY_ATOMIC_NORMAL, - page); + wp_copy, page); } =20 return err; --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ABD69C433EF for ; Fri, 4 Mar 2022 05:18:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238025AbiCDFTS (ORCPT ); Fri, 4 Mar 2022 00:19:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238073AbiCDFTL (ORCPT ); Fri, 4 Mar 2022 00:19:11 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with 
ESMTP id 5B48D7B572 for ; Thu, 3 Mar 2022 21:18:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371093; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h4ZxUpUrUaV8nyMPDE8x8P31UP3HUv0lUXU89pH2T9E=; b=CPw1GPYEmsnwU9//m5WqsPkWze4ZJGUCm+AZg70kGDE6n20QoRR3iSTCizfat9LIr3qw/S qEUauH+UtL8SoJU98L01FDPZkN5JawG1vyJzbw5nQ1AtNC67PsXPsIVzsn+lsY4+l62lJ8 T34UbSXQnkL2FerA7m87DNhpCeiGnRE= Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-554-bPCDomFMOMm--0LmnngjDg-1; Fri, 04 Mar 2022 00:18:12 -0500 X-MC-Unique: bPCDomFMOMm--0LmnngjDg-1 Received: by mail-pg1-f198.google.com with SMTP id g31-20020a63521f000000b003783582a261so3921414pgb.5 for ; Thu, 03 Mar 2022 21:18:12 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=h4ZxUpUrUaV8nyMPDE8x8P31UP3HUv0lUXU89pH2T9E=; b=u+KkciP+2XQMX/fGvVn/jkknTBNQ8N2+lYEYy4Pl/HxBTUpKZCm5CUuy9bZno/gLuO ADCV38Hsqs3aYJuSDR/8VllfTCL238Vd5+4oZFxrNEkdyzEMYoniDBX1AJG0XVXoIs49 oEGN1hg663QrwI+vkl1Q2WEzXrbt1Fu42n/bqzqhjovElFT0WHyBrILbaaaRH/5IxoAB u5+WW1cPlM+SZhPDZMcwNh74xAXZdKBVojQ15cETsIRXQmfj78PXhwJxr0PDnSFwdjgl bE1pKICZV62dWN8PFqIK/7r6DttCQiMA3oI8M0b+PuL4jSSj2DKsTSA3XV/qn6gT/bTI B9nw== X-Gm-Message-State: AOAM533yRag318zGmwLsCSd09LNH/BoH4/3jNb+18E9+bzIK/S8ca4KM FKgx08C72/lJfgzADG7i6Oy1FINR5VPlgMsiIhjdpi+yBlLgwKGraLApPhVKqOr70YXVRrjTGr/ 8aTuK5pdA440RoawhG2GORanj X-Received: by 2002:a05:6a00:b96:b0:4f3:c0f6:5c47 with SMTP id g22-20020a056a000b9600b004f3c0f65c47mr517330pfj.69.1646371091189; Thu, 03 Mar 2022 21:18:11 -0800 
(PST) X-Google-Smtp-Source: ABdhPJyn5dg8SYjJevnXT+y2Ca46TGDW60Atz3RHVXa5U9+Jx4K7xiL9CysJtRs0lRcaiYEnHEu04Q== X-Received: by 2002:a05:6a00:b96:b0:4f3:c0f6:5c47 with SMTP id g22-20020a056a000b9600b004f3c0f65c47mr517282pfj.69.1646371090410; Thu, 03 Mar 2022 21:18:10 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.02 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:10 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler Date: Fri, 4 Mar 2022 13:16:51 +0800 Message-Id: <20220304051708.86193-7-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" File-backed memory is prone to unmap/swap, so the ptes are always unstable, because they can be easily faulted back later using the page cache. This could lead to uffd-wp getting lost when unmapping or swapping out such memory. One example is shmem. PTE markers are needed to store that information. This patch prepares for it by handling uffd-wp pte markers first, before they are applied elsewhere, so that the page fault handler can recognize uffd-wp pte markers.
The handling of uffd-wp pte markers is similar to a missing fault; it's just that we'll handle this "missing fault" when we see the pte markers, and meanwhile we need to make sure the marker information is kept while processing the fault. This is a slow path of uffd-wp handling, because zapping of wr-protected shmem ptes should be rare. So far it should only trigger in two conditions: (1) When trying to punch holes in shmem_fallocate(), there is an optimization to zap the pgtables before evicting the page. (2) When swapping out shmem pages. Because of this, the page fault handling is simplified too by not sending the wr-protect message in the 1st page fault; instead the page will be installed read-only, so the uffd-wp message will be generated in the next fault, which will trigger the do_wp_page() path of general uffd-wp handling. Disable fault-around for all uffd-wp registered ranges for extra safety, just like for uffd-minor faults, and clean the code up. Signed-off-by: Peter Xu --- include/linux/userfaultfd_k.h | 17 +++++++++ mm/memory.c | 67 ++++++++++++++++++++++++++++++----- 2 files changed, 75 insertions(+), 9 deletions(-) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index bd09c3c89b59..827e38b7be65 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -96,6 +96,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm= _area_struct *vma) return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); } =20 +/* + * Don't do fault around for either WP or MINOR registered uffd range. For + * MINOR registered range, fault around will be a total disaster and ptes = can + * be installed without notifications; for WP it should mostly be fine as = long + * as the fault around checks for pte_none() before the installation, howe= ver + * to be super safe we just forbid it.
+ */ +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); +} + static inline bool userfaultfd_missing(struct vm_area_struct *vma) { return vma->vm_flags & VM_UFFD_MISSING; @@ -236,6 +248,11 @@ static inline void userfaultfd_unmap_complete(struct m= m_struct *mm, { } =20 +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) +{ + return false; +} + #endif /* CONFIG_USERFAULTFD */ =20 static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry) diff --git a/mm/memory.c b/mm/memory.c index cdd0d108d3ee..f509ddf2ad39 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3512,6 +3512,39 @@ static inline bool should_try_to_free_swap(struct pa= ge *page, page_count(page) =3D=3D 2; } =20 +static vm_fault_t pte_marker_clear(struct vm_fault *vmf) +{ + vmf->pte =3D pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + /* + * Be careful so that we will only recover a special uffd-wp pte into a + * none pte. Otherwise it means the pte could have changed, so retry. + */ + if (is_pte_marker(*vmf->pte)) + pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; +} + +/* + * This is actually a page-missing access, but with uffd-wp special pte + * installed. It means this pte was wr-protected before being unmapped. + */ +static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf) +{ + /* + * Just in case there're leftover special ptes even after the region + * got unregistered - we can simply clear them. We can also do that + * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp + * ranges, but it should be more efficient to be done lazily here. 
+ */ + if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma))) + return pte_marker_clear(vmf); + + /* do_fault() can handle pte markers too like none pte */ + return do_fault(vmf); +} + static vm_fault_t handle_pte_marker(struct vm_fault *vmf) { swp_entry_t entry =3D pte_to_swp_entry(vmf->orig_pte); @@ -3525,8 +3558,11 @@ static vm_fault_t handle_pte_marker(struct vm_fault = *vmf) if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker)) return VM_FAULT_SIGBUS; =20 - /* TODO: handle pte markers */ - return 0; + if (pte_marker_entry_uffd_wp(entry)) + return pte_marker_handle_uffd_wp(vmf); + + /* This is an unknown pte marker */ + return VM_FAULT_SIGBUS; } =20 /* @@ -4051,6 +4087,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct pa= ge *page) void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long add= r) { struct vm_area_struct *vma =3D vmf->vma; + bool uffd_wp =3D pte_marker_uffd_wp(vmf->orig_pte); bool write =3D vmf->flags & FAULT_FLAG_WRITE; bool prefault =3D vmf->address !=3D addr; pte_t entry; @@ -4065,6 +4102,8 @@ void do_set_pte(struct vm_fault *vmf, struct page *pa= ge, unsigned long addr) =20 if (write) entry =3D maybe_mkwrite(pte_mkdirty(entry), vma); + if (unlikely(uffd_wp)) + entry =3D pte_mkuffd_wp(pte_wrprotect(entry)); /* copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); @@ -4238,9 +4277,21 @@ static vm_fault_t do_fault_around(struct vm_fault *v= mf) return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff); } =20 +/* Return true if we should do read fault-around, false otherwise */ +static inline bool should_fault_around(struct vm_fault *vmf) +{ + /* No ->map_pages? No way to fault around... 
*/ + if (!vmf->vma->vm_ops->map_pages) + return false; + + if (uffd_disable_fault_around(vmf->vma)) + return false; + + return fault_around_bytes >> PAGE_SHIFT > 1; +} + static vm_fault_t do_read_fault(struct vm_fault *vmf) { - struct vm_area_struct *vma =3D vmf->vma; vm_fault_t ret =3D 0; =20 /* @@ -4248,12 +4299,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vm= f) * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) { - if (likely(!userfaultfd_minor(vmf->vma))) { - ret =3D do_fault_around(vmf); - if (ret) - return ret; - } + if (should_fault_around(vmf)) { + ret =3D do_fault_around(vmf); + if (ret) + return ret; } =20 ret =3D __do_fault(vmf); --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B24A7C433F5 for ; Fri, 4 Mar 2022 05:18:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238058AbiCDFTU (ORCPT ); Fri, 4 Mar 2022 00:19:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238110AbiCDFTQ (ORCPT ); Fri, 4 Mar 2022 00:19:16 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0A2C7A94CE for ; Thu, 3 Mar 2022 21:18:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371102; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=1n551s1g6Vv5nncVOaasRljGy2eKMZ3v3Uj1KhN2rf0=; b=fbGMxW8+6krHsDzGXXd3WKlWlY33GitFA0YVt/tQKg+Jj7W0kSY6FHznPdzJM0vEnfwiJB gXqiMLCKoAI+aUb3Qihg6mSAGultsdfkhXHQqmpbQM5Z7A7Pl0FTpmC0FJn2khIR1/dNI7 vrXtkVIn6To7KAvrZrIujgdOtOSRtjU= Received: from mail-pj1-f72.google.com (mail-pj1-f72.google.com [209.85.216.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-613-WQneyNI5M4OPZyUzEPG8fw-1; Fri, 04 Mar 2022 00:18:21 -0500 X-MC-Unique: WQneyNI5M4OPZyUzEPG8fw-1 Received: by mail-pj1-f72.google.com with SMTP id m14-20020a17090a4d8e00b001bf2d4926c5so65704pjh.3 for ; Thu, 03 Mar 2022 21:18:21 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1n551s1g6Vv5nncVOaasRljGy2eKMZ3v3Uj1KhN2rf0=; b=FJ3QpLB5V4z7kiUkwhv+rE6Im2Dq1kZsq27QRWuOc/DG2Wq1l2pn5grJy42Bk+hdEX 3GrNBqweT2u6f69Paj/FIpTh9o8hKF4R4OKJqCR/7o+/eUVZUdn1XiTGwvalse70DEbg jbc/BZhrGHDqj4XVk99c2AMn84BTJkGjoCfRUT6yxDiGpCVjPPB+DTSXw6kgdR/LvY27 oQPsp7B1X4SmFqLS5g3No6xZyooYXh9qqHPRp1e8xrLnEjDwwkJI4UBjytkmgS6aLZRg waYncQSm64GWWv11D62UCoZkNusTRloNi/FjBIDVpAbNnvqOyOasivQ1ZNDAa4TYdu09 4AtQ== X-Gm-Message-State: AOAM5313I8VNSKPTSAoOnUmL9TTdgWq7AkkvO2vkUxikTraR+WNvpob/ 4BzW3SqH3sF7aJWpT09lv5eXOzMvn+zAPcSlUJ7h+eI+dPhbZTSDG47FOujVIasf5gRhvRg1/Iv yj/4nBXYXCloovaBkU1TXF4Lv X-Received: by 2002:a17:902:7c13:b0:151:a784:34c7 with SMTP id x19-20020a1709027c1300b00151a78434c7mr7469091pll.174.1646371099464; Thu, 03 Mar 2022 21:18:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJx0fLWHy2rctiM2ENjL5ljsJ4tYkRpfVBiuZmQMy/vB+djlUOenh/QkOWrinBqK7A5FOggeSw== X-Received: by 2002:a17:902:7c13:b0:151:a784:34c7 with SMTP id x19-20020a1709027c1300b00151a78434c7mr7469075pll.174.1646371099075; Thu, 03 Mar 2022 21:18:19 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id 
p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.10 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:18 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 07/23] mm/shmem: Persist uffd-wp bit across zapping for file-backed Date: Fri, 4 Mar 2022 13:16:52 +0800 Message-Id: <20220304051708.86193-8-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" File-backed memory is prone to being unmapped at any time. It means all information in the pte will be dropped, including the uffd-wp flag. To persist the uffd-wp flag, we'll use the pte markers. This patch teaches the zap code to understand uffd-wp and know when to keep or drop the uffd-wp bit. Add a new flag ZAP_FLAG_DROP_MARKER and set it in zap_details when we don't want to persist such information, for example, when destroying the whole vma, or punching a hole in a shmem file. For the remaining cases we should never drop the uffd-wp bit, or the wr-protect information will get lost. The new ZAP_FLAG_DROP_MARKER needs to be put into mm.h rather than memory.c because it'll be further referenced in hugetlb files later.
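As a rough illustration of the keep-or-drop rule, the decision can be modelled in userspace C as below. This is only a sketch under stated assumptions: struct model_zap_details, MODEL_ZAP_FLAG_DROP_MARKER and model_zap_keeps_uffd_wp() are hypothetical names that loosely mirror zap_details, ZAP_FLAG_DROP_MARKER and the combined effect of the zap helpers in this patch; the real code operates on ptes under the page table lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors ZAP_FLAG_DROP_MARKER == BIT(0); the name is hypothetical. */
#define MODEL_ZAP_FLAG_DROP_MARKER (1u << 0)

/* Hypothetical model of the part of zap_details used here. */
struct model_zap_details {
	unsigned int zap_flags;
};

/*
 * Returns true if, after zapping, a uffd-wp marker should be left behind
 * so that the wr-protect information survives.
 *
 * - details may be NULL, which in this model means "no special request",
 *   so the marker is not dropped.
 * - Anonymous mappings never get a marker.
 * - Otherwise the marker is kept only if the zapped pte was uffd-wp
 *   wr-protected in some form.
 */
bool model_zap_keeps_uffd_wp(const struct model_zap_details *details,
			     bool vma_is_anon, bool pte_was_uffd_wp)
{
	if (details && (details->zap_flags & MODEL_ZAP_FLAG_DROP_MARKER))
		return false;

	if (vma_is_anon)
		return false;

	return pte_was_uffd_wp;
}
```

In this model, whole-vma teardown and hole punching would pass the drop flag, while a zap with no details keeps the marker for file-backed memory.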
Signed-off-by: Peter Xu --- include/linux/mm.h | 10 ++++++++ include/linux/mm_inline.h | 43 ++++++++++++++++++++++++++++++++++ mm/memory.c | 49 ++++++++++++++++++++++++++++++++++++--- mm/rmap.c | 8 +++++++ 4 files changed, 107 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0b9a0334d0f8..cdefbb078a73 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3385,4 +3385,14 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned= long start, } #endif =20 +typedef unsigned int __bitwise zap_flags_t; + +/* + * Whether to drop the pte markers, for example, the uffd-wp information f= or + * file-backed memory. This should only be specified when we will complet= ely + * drop the page in the mm, either by truncation or unmapping of the vma. = By + * default, the flag is not set. + */ +#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0)) + #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index ac32125745ab..70e72ce85b25 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -6,6 +6,8 @@ #include #include #include +#include +#include =20 /** * folio_is_file_lru - Should the folio be on a file LRU or anon LRU? @@ -316,5 +318,46 @@ static inline bool mm_tlb_flush_nested(struct mm_struc= t *mm) return atomic_read(&mm->tlb_flush_pending) > 1; } =20 +/* + * If this pte is wr-protected by uffd-wp in any form, arm the special pte= to + * replace a none pte. NOTE! This should only be called when *pte is alr= eady + * cleared so we will never accidentally replace something valuable. Mean= while + * none pte also means we are not demoting the pte so tlb flushed is not n= eeded. + * E.g., when pte cleared the caller should have taken care of the tlb flu= sh. + * + * Must be called with pgtable lock held so that no thread will see the no= ne + * pte, and if they see it, they'll fault and serialize at the pgtable loc= k. 
+ * + * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled. + */ +static inline void +pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long ad= dr, + pte_t *pte, pte_t pteval) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + bool arm_uffd_pte =3D false; + + /* The current status of the pte should be "cleared" before calling */ + WARN_ON_ONCE(!pte_none(*pte)); + + if (vma_is_anonymous(vma)) + return; + + /* A uffd-wp wr-protected normal pte */ + if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval))) + arm_uffd_pte =3D true; + + /* + * A uffd-wp wr-protected swap pte. Note: this should even cover an + * existing pte marker with uffd-wp bit set. + */ + if (unlikely(pte_swp_uffd_wp_any(pteval))) + arm_uffd_pte =3D true; + + if (unlikely(arm_uffd_pte)) + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); +#endif +} =20 #endif diff --git a/mm/memory.c b/mm/memory.c index f509ddf2ad39..e3e67e32eb8a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -74,6 +74,7 @@ #include #include #include +#include =20 #include =20 @@ -1310,6 +1311,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma) struct zap_details { struct folio *single_folio; /* Locked folio to be unmapped */ bool even_cows; /* Zap COWed private pages too? */ + zap_flags_t zap_flags; /* Extra flags for zapping */ }; =20 /* Whether we should zap all COWed (private) pages too */ @@ -1338,6 +1340,29 @@ static inline bool should_zap_page(struct zap_detail= s *details, struct page *pag return !PageAnon(page); } =20 +static inline bool zap_drop_file_uffd_wp(struct zap_details *details) +{ + if (!details) + return false; + + return details->zap_flags & ZAP_FLAG_DROP_MARKER; +} + +/* + * This function makes sure that we'll replace the none pte with an uffd-wp + * swap special pte marker when necessary. Must be with the pgtable lock h= eld. 
+ */ +static inline void +zap_install_uffd_wp_if_needed(struct vm_area_struct *vma, + unsigned long addr, pte_t *pte, + struct zap_details *details, pte_t pteval) +{ + if (zap_drop_file_uffd_wp(details)) + return; + + pte_install_uffd_wp_if_needed(vma, addr, pte, pteval); +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1375,6 +1400,8 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, ptent =3D ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); tlb_remove_tlb_entry(tlb, pte, addr); + zap_install_uffd_wp_if_needed(vma, addr, pte, details, + ptent); if (unlikely(!page)) continue; =20 @@ -1405,6 +1432,13 @@ static unsigned long zap_pte_range(struct mmu_gather= *tlb, page =3D pfn_swap_entry_to_page(entry); if (unlikely(!should_zap_page(details, page))) continue; + /* + * Both device private/exclusive mappings should only + * work with anonymous page so far, so we don't need to + * consider uffd-wp bit when zap. For more information, + * see zap_install_uffd_wp_if_needed(). 
+ */ + WARN_ON_ONCE(!vma_is_anonymous(vma)); rss[mm_counter(page)]--; if (is_device_private_entry(entry)) page_remove_rmap(page, vma, false); @@ -1421,8 +1455,10 @@ static unsigned long zap_pte_range(struct mmu_gather= *tlb, if (!should_zap_page(details, page)) continue; rss[mm_counter(page)]--; - } else if (is_pte_marker_entry(entry)) { - /* By default, simply drop all pte markers when zap */ + } else if (pte_marker_entry_uffd_wp(entry)) { + /* Only drop the uffd-wp marker if explicitly requested */ + if (!zap_drop_file_uffd_wp(details)) + continue; } else if (is_hwpoison_entry(entry)) { if (!should_zap_cows(details)) continue; @@ -1431,6 +1467,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, WARN_ON_ONCE(1); } pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 add_mm_rss_vec(mm, rss); @@ -1641,12 +1678,17 @@ void unmap_vmas(struct mmu_gather *tlb, unsigned long end_addr) { struct mmu_notifier_range range; + struct zap_details details =3D { + .zap_flags =3D ZAP_FLAG_DROP_MARKER, + /* Careful - we need to zap private pages too! 
*/ + .even_cows =3D true, + }; =20 mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm, start_addr, end_addr); mmu_notifier_invalidate_range_start(&range); for ( ; vma && vma->vm_start < end_addr; vma =3D vma->vm_next) - unmap_single_vma(tlb, vma, start_addr, end_addr, NULL); + unmap_single_vma(tlb, vma, start_addr, end_addr, &details); mmu_notifier_invalidate_range_end(&range); } =20 @@ -3391,6 +3433,7 @@ void unmap_mapping_folio(struct folio *folio) =20 details.even_cows =3D false; details.single_folio =3D folio; + details.zap_flags =3D ZAP_FLAG_DROP_MARKER; =20 i_mmap_lock_write(mapping); if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))) diff --git a/mm/rmap.c b/mm/rmap.c index 3d288a7c8c32..f83d812d0a5e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -73,6 +73,7 @@ #include #include #include +#include =20 #include =20 @@ -1526,6 +1527,13 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, pteval =3D ptep_clear_flush(vma, address, pvmw.pte); } =20 + /* + * Now the pte is cleared. If this is uffd-wp armed pte, we + * may want to replace a none pte with a marker pte if it's + * file-backed, so we don't lose the tracking information. + */ + pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); + /* Set the dirty flag on the folio now the pte is gone. 
*/ if (pte_dirty(pteval)) folio_mark_dirty(folio); --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EB64C433EF for ; Fri, 4 Mar 2022 05:18:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233928AbiCDFTY (ORCPT ); Fri, 4 Mar 2022 00:19:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238082AbiCDFTR (ORCPT ); Fri, 4 Mar 2022 00:19:17 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8925E9A995 for ; Thu, 3 Mar 2022 21:18:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371109; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tQXe4rml5LknCAC9VWmfIhmJrE0oVG26j9HM2k7VJ/Y=; b=QEW/oiNnGDnL8Fb07kMj0s66sF+eblaRjPoVOM7U8v/FPiRmXWddMNSievE3IpxDSc/9kE ZUfNnFfvzUDdeKXoW2UO3DGUiqKWlSvOxYZVSDGrjjitmYYVhwObwisry6jqIjlGw+0dJi IEatfeg5/MARZCpCHp0tsY4Xxlr3C7k= Received: from mail-pl1-f197.google.com (mail-pl1-f197.google.com [209.85.214.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-498-FOjqB8vdNTeTycsD9nA67A-1; Fri, 04 Mar 2022 00:18:28 -0500 X-MC-Unique: FOjqB8vdNTeTycsD9nA67A-1 Received: by mail-pl1-f197.google.com with SMTP id j3-20020a170902da8300b0014fdd4e979cso4095854plx.17 for ; Thu, 03 Mar 2022 21:18:28 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tQXe4rml5LknCAC9VWmfIhmJrE0oVG26j9HM2k7VJ/Y=; b=Mr8eUexr9OaX69cj/z9dKG98bWAD3Sm+/TtrPt3a0ZVFEIfidv7TcVzjdX1qm29tbh U+940Diod1ls8jlLbds26zbn5dGtSOZLCCpx7GaCp85lsZhDrPPw1BAxIDq6v9dJnPAM v7FE2EX7F7GgFJU3PsmwDAruQ61Z3C8mNnh4bnK9cR/rB+gcReYeG/2RRkiQpSX5KxpK cPn5rQ88SDS2iQQQdl3hx4YCKlkMt1Ptg4F1DW0iS1o13LvC0oZnu5rltxHU78hel7hJ LxJnVpA0BRtUJwz1ZVmbc1TFBHpMf3wGC6fCCO1BeJJjL8fnyoVULNEFy7CwO2w6uXk6 +dGA== X-Gm-Message-State: AOAM531dJ1+OI2Y3Huh5845YI1BrSwZOudG4Il+E0Q7jz4ulHU7fr4sz Zbd01w3OaFZnT3ANkx0mTQ2mtb+sK2gA2LtGJFUTJFJ5KuXE/4157kp/9pozYaOrssIXGhS7yqY SdGu7aqPrwSB+oBisQjLgyG7M X-Received: by 2002:a17:903:2285:b0:151:4b38:298e with SMTP id b5-20020a170903228500b001514b38298emr29450205plh.36.1646371107399; Thu, 03 Mar 2022 21:18:27 -0800 (PST) X-Google-Smtp-Source: ABdhPJxWOXIBo4RzUqL6IP5zCBIf1bLH01fWQdczZaceP2UrGilCNmkw3HF88pBzciWEi4lBzLO6HQ== X-Received: by 2002:a17:903:2285:b0:151:4b38:298e with SMTP id b5-20020a170903228500b001514b38298emr29450185plh.36.1646371107064; Thu, 03 Mar 2022 21:18:27 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.19 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:26 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . 
Shutemov" , Andrea Arcangeli Subject: [PATCH v7 08/23] mm/shmem: Allow uffd wr-protect none pte for file-backed mem Date: Fri, 4 Mar 2022 13:16:53 +0800 Message-Id: <20220304051708.86193-9-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" File-backed memory differs from anonymous memory in that even if the pte is missing, the data could still reside either in the file or in the page/swap cache. So when wr-protecting a pte, we need to consider none ptes too. We do that by installing uffd-wp pte markers when necessary. When there is a later write to the pte, the fault handler takes the special path: it first faults in the page as read-only, then reports to the userfaultfd server with the wr-protect message. On the other hand, when unprotecting a page, it's also possible that the pte got unmapped but replaced by the special uffd-wp marker. Then we need to be able to turn the uffd-wp pte marker back into a none pte, so that the next access to the page faults in as usual. Special care needs to be taken throughout the change_protection_range() process. Since we now allow the user to wr-protect a none pte, we need to be able to pre-populate the page table entries if we see (!anonymous && MM_CP_UFFD_WP) requests, otherwise change_protection_range() will always skip when the pgtable entry does not exist. For example, the pgtable can be missing for a whole 2M pmd chunk, but the page cache can exist for the 2M range.
When we want to wr-protect one 4K page within the 2M pmd range, we need to pre-populate the pgtable and install the pte marker, showing that we want to get a message and block the thread when the page cache of that 4K page is written. Without pre-populating the pmd, change_protection() will simply skip that whole pmd. Note that this patch only covers small pages (pte level) and does not cover transparent huge pages yet. That will be done later, and this patch is a preparation for it too. Signed-off-by: Peter Xu --- mm/mprotect.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index 6d179c720089..4878b6b99df9 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -184,8 +185,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, newpte =3D pte_swp_mksoft_dirty(newpte); if (pte_swp_uffd_wp(oldpte)) newpte =3D pte_swp_mkuffd_wp(newpte); - } else if (is_pte_marker_entry(entry)) { - /* Skip it, the same as none pte */ + } else if (pte_marker_entry_uffd_wp(entry)) { + /* + * If this is uffd-wp pte marker and we'd like + * to unprotect it, drop it; the next page + * fault will trigger without uffd trapping. + */ + if (uffd_wp_resolve) { + pte_clear(vma->vm_mm, addr, pte); + pages++; + } continue; } else { newpte =3D oldpte; @@ -200,6 +209,20 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, set_pte_at(vma->vm_mm, addr, pte, newpte); pages++; } + } else { + /* It must be an none page, or what else?.. */ + WARN_ON_ONCE(!pte_none(oldpte)); + if (unlikely(uffd_wp && !vma_is_anonymous(vma))) { + /* + * For file-backed mem, we need to be able to + * wr-protect a none pte, because even if the + * pte is none, the page/swap cache could + * exist. Doing that by install a marker.
+ */ + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); + pages++; + } } } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); arch_leave_lazy_mmu_mode(); @@ -233,6 +256,39 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) return 0; } =20 +/* Return true if we're uffd wr-protecting file-backed memory, or false */ +static inline bool +uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags) +{ + return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma); +} + +/* + * If wr-protecting the range for file-backed, populate pgtable for the case + * when pgtable is empty but page cache exists. When {pte|pmd|...}_alloc() + * failed it means no memory, we don't have a better option but stop. + */ +#define change_pmd_prepare(vma, pmd, cp_flags) \ + do { \ + if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \ + if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd))) \ + break; \ + } \ + } while (0) +/* + * This is the general pud/p4d/pgd version of change_pmd_prepare(). We need to + * have separate change_pmd_prepare() because pte_alloc() returns 0 on success, + * while {pmd|pud|p4d}_alloc() returns the valid pointer on success. + */ +#define change_prepare(vma, high, low, addr, cp_flags) \ + do { \ + if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \ + low##_t *p =3D low##_alloc(vma->vm_mm, high, addr); \ + if (WARN_ON_ONCE(p =3D=3D NULL)) \ + break; \ + } \ + } while (0) + static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) @@ -251,6 +307,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, =20 next =3D pmd_addr_end(addr, end); =20 + change_pmd_prepare(vma, pmd, cp_flags); /* * Automatic NUMA balancing walks the tables with mmap_lock * held for read.
It's possible a parallel update to occur @@ -316,6 +373,7 @@ static inline unsigned long change_pud_range(struct vm_= area_struct *vma, pud =3D pud_offset(p4d, addr); do { next =3D pud_addr_end(addr, end); + change_prepare(vma, pud, pmd, addr, cp_flags); if (pud_none_or_clear_bad(pud)) continue; pages +=3D change_pmd_range(vma, pud, addr, next, newprot, @@ -336,6 +394,7 @@ static inline unsigned long change_p4d_range(struct vm_= area_struct *vma, p4d =3D p4d_offset(pgd, addr); do { next =3D p4d_addr_end(addr, end); + change_prepare(vma, p4d, pud, addr, cp_flags); if (p4d_none_or_clear_bad(p4d)) continue; pages +=3D change_pud_range(vma, p4d, addr, next, newprot, @@ -361,6 +420,7 @@ static unsigned long change_protection_range(struct vm_= area_struct *vma, inc_tlb_flush_pending(mm); do { next =3D pgd_addr_end(addr, end); + change_prepare(vma, pgd, p4d, addr, cp_flags); if (pgd_none_or_clear_bad(pgd)) continue; pages +=3D change_p4d_range(vma, pgd, addr, next, newprot, --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8BF0C433F5 for ; Fri, 4 Mar 2022 05:18:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238082AbiCDFTd (ORCPT ); Fri, 4 Mar 2022 00:19:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238079AbiCDFTZ (ORCPT ); Fri, 4 Mar 2022 00:19:25 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 98B48AA01A for ; Thu, 3 Mar 2022 21:18:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371117; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cWQZRYNGI2cSzJcmY2NeIfodfPZX4V25Mo1spySEF5s=; b=FJtY6RtzeJpPfCauenbbhYLU99scnUrXwd04P0gEZqI9qaEGWel++OnFeawcbroKNUGz3q VGWsn9ZVGuoAsAbUSW+lE6epUWmb1c8/v4ar9cD1RKV/iqLPdG4n6j27+DtZY43nqmeQxL b41qK1bZ/pQ61suzF3+pUBX48483aHs= Received: from mail-pj1-f72.google.com (mail-pj1-f72.google.com [209.85.216.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-344-q1iNiXs1N6Ck0eExqtroyw-1; Fri, 04 Mar 2022 00:18:36 -0500 X-MC-Unique: q1iNiXs1N6Ck0eExqtroyw-1 Received: by mail-pj1-f72.google.com with SMTP id e7-20020a17090a4a0700b001bc5a8c533eso4173379pjh.4 for ; Thu, 03 Mar 2022 21:18:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cWQZRYNGI2cSzJcmY2NeIfodfPZX4V25Mo1spySEF5s=; b=wmRjd84AdgTeZLdDvmPUYGp1Tpzmhic3+WFiHanbVCwWgHiMOqjHAJEXPxHmj51ece 2j82HIrXg46lmKbJOfkRqrZ236CmT1EeTHSNcsIlNsQnkfY1bzhWkBysrxbKg6f3IaAJ eG9fxZTuOS+roWRw3JfWffnchqZW5nqnE2YFeytPGPUM+Ems4wprUB2LZbleSy+MFTEu 8JYbi4ki2ySwwhOKeYo6gMwGcmLBKmGzAcVcKbYv7cpuC5zPzxxCE4XNPpsFRAb79DYd SN/YfJnTV7Owp+4yqr09A7k4fn9vLlr3N2y7PKhmKV2JX67VW5ZRxi2JuO5W6ZNS6RbC fYPA== X-Gm-Message-State: AOAM530wRoq/W11MLjZbnxWHARXUPE1rb+v9v9ebBQIMVPU/2qowvJ/I vpnGHLngly1h0/2aEOicyO0LLTSFJPLNv28rCJyZstzxy5C3e2U+dKyOt931IBoJMZBioDOCpLF UdmcW6B6/E6oilyWcftN/4oxv X-Received: by 2002:a63:af02:0:b0:375:57f0:8af1 with SMTP id w2-20020a63af02000000b0037557f08af1mr32843634pge.188.1646371115432; Thu, 03 Mar 2022 21:18:35 -0800 (PST) X-Google-Smtp-Source: ABdhPJx4jOImxHShOaqYtf+LhwSwkfO4VXJz5d+7rIaM+Vv6wcTRt41/dvgoYP9dxEas9oL3cZidzQ== X-Received: by 2002:a63:af02:0:b0:375:57f0:8af1 with SMTP id 
w2-20020a63af02000000b0037557f08af1mr32843618pge.188.1646371115135; Thu, 03 Mar 2022 21:18:35 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.27 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:34 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 09/23] mm/shmem: Allows file-back mem to be uffd wr-protected on thps Date: Fri, 4 Mar 2022 13:16:54 +0800 Message-Id: <20220304051708.86193-10-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" We don't have a "huge" version of pte markers; instead, when necessary we split the thp. However, splitting the thp is not enough, because file-backed thps are handled totally differently from anonymous thps: rather than doing a real split, the thp pmd will simply get cleared in __split_huge_pmd_locked(). That is not enough when, e.g., a thp covers the range [0, 2M) but we want to wr-protect a small page residing in the [4K, 8K) range, because after __split_huge_pmd() returns, there will be a none pmd, and change_pmd_range() will just skip it right after the split. Here we leverage the previously introduced change_pmd_prepare() macro so that we populate the pmd with a pgtable page after the pmd split (during which the pmd will be cleared for cases like shmem).
Then change_pte_range() will do all the rest for us by installing the uffd-wp pte marker at any none pte that we'd like to wr-protect. Signed-off-by: Peter Xu --- mm/mprotect.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index 4878b6b99df9..95b307d4766d 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -329,8 +329,15 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, } =20 if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { - if (next - addr !=3D HPAGE_PMD_SIZE) { + if ((next - addr !=3D HPAGE_PMD_SIZE) || + uffd_wp_protect_file(vma, cp_flags)) { __split_huge_pmd(vma, pmd, addr, false, NULL); + /* + * For file-backed, the pmd could have been + * cleared; make sure pmd populated if + * necessary, then fall-through to pte level. + */ + change_pmd_prepare(vma, pmd, cp_flags); } else { int nr_ptes =3D change_huge_pmd(vma, pmd, addr, newprot, cp_flags); --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A69EC433F5 for ; Fri, 4 Mar 2022 05:19:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232778AbiCDFTr (ORCPT ); Fri, 4 Mar 2022 00:19:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238067AbiCDFTm (ORCPT ); Fri, 4 Mar 2022 00:19:42 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C670AA9A47 for ; Thu, 3 Mar 2022 21:18:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371126;
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=87YY6m3X7TCIFcgpIMmGMt9yOOTi6qvz7lf8TeZlPS8=; b=IzH7HX8J4Ck9fSNRZYTPEXjls22A9iNDvGqCbnHRCT7LplEpLH1PCqwyMFziKvl9170icN 9w1slkYAZj5OBAgFDm4/ORB11vtATM1Ih2Eo0ErrbgKVL6q6el8rRXENvUGfs/V1z/rrej ekSy9mUvrJ+28LjJw9bxWNZcrmXoKAM= Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com [209.85.216.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-504-b-8cTzfQN8OiiWfpt5QLsw-1; Fri, 04 Mar 2022 00:18:44 -0500 X-MC-Unique: b-8cTzfQN8OiiWfpt5QLsw-1 Received: by mail-pj1-f70.google.com with SMTP id p5-20020a17090a748500b001bee6752974so4166744pjk.8 for ; Thu, 03 Mar 2022 21:18:44 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=87YY6m3X7TCIFcgpIMmGMt9yOOTi6qvz7lf8TeZlPS8=; b=T0oMHfwOIc/G7LxphcNNztQiCY+2YJHHfvuHB3RngZRVUpZA1dGWCXOOhRKHeYD72q TEDiRPalGY289LxTkoK0lT+e1B3dAXqCXxYfclR/oBKRWyOZbbMqxP/3k6xcxTPPIBYf PZnG36TZ8vxDD8ObLez5/kVj6n/52BYsXGjRD2FxRCEVcXJM7HMyRu0HNxVTUlFZd0GO bDModIYvRj43aj4h6OMGCFaM9Tx+5y4h7TFerUmboOoC1eGhjAUpOy2QPcmk1pFjdzHI k0CamFDjKRZ6tX/iMWrsnn9R8MLvF8rmDr9aRNegzkzLsQkeXjc98U0THk27cpADPELU DdEg== X-Gm-Message-State: AOAM532t329zZDNQ2KoQ8VNCqmOG3jUefbPE9R2lBINvOly49kachqkz A4YFBMIVkqWdNiUzcg3Yr3lOaJ9q69mEX1O3hU9ZxL8MWJvYOGbcZBXG2CoUyjGezV+6SK7ab3j B5LM1sBBve0Ukf80Tw9jliRD5 X-Received: by 2002:a63:5525:0:b0:372:c376:74f1 with SMTP id j37-20020a635525000000b00372c37674f1mr32598160pgb.433.1646371123403; Thu, 03 Mar 2022 21:18:43 -0800 (PST) X-Google-Smtp-Source: ABdhPJxvoeKLrN4o7hnThDJS0UIUxVik/k+WtOZ+b4Gnnk3c8Fk0c2piAJVffr6/tNq+9lzTQITaSA== X-Received: by 2002:a63:5525:0:b0:372:c376:74f1 with SMTP 
id j37-20020a635525000000b00372c37674f1mr32598143pgb.433.1646371123117; Thu, 03 Mar 2022 21:18:43 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.35 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:42 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 10/23] mm/shmem: Handle uffd-wp during fork() Date: Fri, 4 Mar 2022 13:16:55 +0800 Message-Id: <20220304051708.86193-11-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Normally we skip copying pages on fork() for VM_SHARED shmem, but we can't skip it anymore if uffd-wp is enabled on the dst vma. This should only happen when the src uffd has UFFD_FEATURE_EVENT_FORK enabled on a uffd-wp shmem vma, so that VM_UFFD_WP is propagated onto the dst vma too. In that case we should copy the pgtables with the uffd-wp bit and pte markers, because this information would be lost otherwise. Since the condition checks become even more complicated for deciding "whether a vma needs to copy the pgtable during fork()", introduce a helper vma_needs_copy() for it, so everything is clearer.
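For readers who want to poke at the fork-time decision outside the kernel, here is a minimal userspace sketch of the check order that vma_needs_copy() follows in the diff of this patch. Everything prefixed with model_ or MODEL_ is a hypothetical stand-in invented for illustration (the flag values are not the kernel's, and the uffd-wp state is modeled as a plain flag bit rather than via userfaultfd_wp()); it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this userspace model; not the kernel's values. */
#define MODEL_VM_HUGETLB  (1u << 0)
#define MODEL_VM_PFNMAP   (1u << 1)
#define MODEL_VM_MIXEDMAP (1u << 2)
#define MODEL_VM_UFFD_WP  (1u << 3)

/* Minimal stand-in for the vma fields the decision reads. */
struct model_vma {
	unsigned int flags;
	bool has_anon_vma;
};

/* Follows the same order of checks as the patch's vma_needs_copy(). */
static inline bool model_vma_needs_copy(const struct model_vma *dst,
					const struct model_vma *src)
{
	assert(dst && src);

	/* uffd-wp state lives only in the pgtable, so it must be copied. */
	if (dst->flags & MODEL_VM_UFFD_WP)
		return true;

	if (src->flags & (MODEL_VM_HUGETLB | MODEL_VM_PFNMAP | MODEL_VM_MIXEDMAP))
		return true;

	if (src->has_anon_vma)
		return true;

	/* Otherwise page faults in the child can refill the entries lazily. */
	return false;
}
```

In this model, a plain shared file mapping yields false (lazy faulting is enough), while the same mapping with the uffd-wp flag on the destination yields true.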
Signed-off-by: Peter Xu --- mm/memory.c | 49 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 41 insertions(+), 8 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index e3e67e32eb8a..e9e335ecb5dc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -857,6 +857,14 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (try_restore_exclusive_pte(src_pte, src_vma, addr)) return -EBUSY; return -ENOENT; + } else if (is_pte_marker_entry(entry)) { + /* + * We're copying the pgtable should only because dst_vma has + * uffd-wp enabled, do sanity check. + */ + WARN_ON_ONCE(!userfaultfd_wp(dst_vma)); + set_pte_at(dst_mm, addr, dst_pte, pte); + return 0; } if (!userfaultfd_wp(dst_vma)) pte =3D pte_swp_clear_uffd_wp(pte); @@ -1225,6 +1233,38 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, return 0; } =20 +/* + * Return true if the vma needs to copy the pgtable during this fork(). Return + * false when we can speed up fork() by allowing lazy page faults later until + * when the child accesses the memory range. + */ +bool +vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + /* + * Always copy pgtables when dst_vma has uffd-wp enabled even if it's + * file-backed (e.g. shmem). Because when uffd-wp is enabled, pgtable + * contains uffd-wp protection information, that's something we can't + * retrieve from page cache, and skip copying will lose those info. + */ + if (userfaultfd_wp(dst_vma)) + return true; + + if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) + return true; + + if (src_vma->anon_vma) + return true; + + /* + * Don't copy ptes where a page fault will fill them correctly. Fork + * becomes much lighter when there are big shared or private readonly + * mappings. The tradeoff is that copy_page_range is more efficient + * than faulting.
+ */ + return false; +} + int copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src= _vma) { @@ -1238,14 +1278,7 @@ copy_page_range(struct vm_area_struct *dst_vma, stru= ct vm_area_struct *src_vma) bool is_cow; int ret; =20 - /* - * Don't copy ptes where a page fault will fill them correctly. - * Fork becomes much lighter when there are big shared or private - * readonly mappings. The tradeoff is that copy_page_range is more - * efficient than faulting. - */ - if (!(src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) && - !src_vma->anon_vma) + if (!vma_needs_copy(dst_vma, src_vma)) return 0; =20 if (is_vm_hugetlb_page(src_vma)) --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00FF4C433EF for ; Fri, 4 Mar 2022 05:19:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238101AbiCDFTu (ORCPT ); Fri, 4 Mar 2022 00:19:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41518 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238121AbiCDFTo (ORCPT ); Fri, 4 Mar 2022 00:19:44 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5807EF1EA1 for ; Thu, 3 Mar 2022 21:18:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371133; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Jyv1lzZ4mkTvzdL9kLWgnj86vlKE/ygyLYqVdoUf6Sw=; b=VXl9iCoiO9qlufgXhHbdWOyK+RBcsRdQPGh7ddP0bZElMiSsCioVmCE49qDkEZgsDIg0MW 
SDV8mzBkOHzcw4lQ+oEMaHErKC8hdb81RGE2jXUsJRctXTIV7fWGnOsdaO85CLqBIyfgTG B3imoILp1uw7H20QngzhOyFBR007LDk= Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-481-qQiV0rURMXqdGsEzr2f61g-1; Fri, 04 Mar 2022 00:18:52 -0500 X-MC-Unique: qQiV0rURMXqdGsEzr2f61g-1 Received: by mail-pl1-f199.google.com with SMTP id n12-20020a170902e54c00b0015011534fa3so4107140plf.9 for ; Thu, 03 Mar 2022 21:18:52 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Jyv1lzZ4mkTvzdL9kLWgnj86vlKE/ygyLYqVdoUf6Sw=; b=ajn9wtqVyKKw8AjP5rOuGudoDZOr9Ncj+Mb6I2Xft4pRqn/kFUtVGn6yXbRNLAwNyB hHafJVw5bSqzoltIWDu4m+qwSiKBWOcfH7tgui4Nl03T+HwYY5eN2/Fpng2u0s00owCV 90lmjV/dGpNynEg9RRCBZJwLQ1JpRWYfxjmseeij7d6r7uNhVu2ZrDWcIMQAeSpvPS6n mtKiziNhCdc5eu3sSua5VRkQWCQWtArAssVwQ2rSErRuuVCDIIsAtXC7slSBZrIPghnm Q8slCKYBUnFj9uNAh9ttNYojGa+QuEY4U9btR3qu95tEzkKRgjm2lm/1hncMBjGSwtfZ CziA== X-Gm-Message-State: AOAM530eYZafTfjtgVBW27JsYVAcwxhQ1xy1XEfXlQU8v1FLGPLpWw4l 9gPrd8minknk/NPINZvUt4KRisjVqvaN+gCJ23Oc/P8o32UCly63111KKD5LRngRrMT7Xis5f0e +izxEkxd5A63VOJS67NfDU0NH X-Received: by 2002:a63:944:0:b0:374:5324:eea1 with SMTP id 65-20020a630944000000b003745324eea1mr33249203pgj.366.1646371131327; Thu, 03 Mar 2022 21:18:51 -0800 (PST) X-Google-Smtp-Source: ABdhPJwhisbQFqQkhrO4+7tBm5bGrC3HEMdBg8mxUGWFFpK8iGmzvB3YNBGzza29cCioxbX4vov0sQ== X-Received: by 2002:a63:944:0:b0:374:5324:eea1 with SMTP id 65-20020a630944000000b003745324eea1mr33249184pgj.366.1646371131059; Thu, 03 Mar 2022 21:18:51 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.43 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 
bits=256/256); Thu, 03 Mar 2022 21:18:50 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 11/23] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Date: Fri, 4 Mar 2022 13:16:56 +0800 Message-Id: <20220304051708.86193-12-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" These helpers will be used in follow-up patches to check, set, or clear the uffd-wp bit of a huge pte. For now they reuse the small pte helpers. Archs can override these versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the future.
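As a rough illustration of the "generic default that an arch may override" pattern mentioned above, the following userspace sketch wraps small-pte helpers in huge-pte helpers behind a guard macro. All model_ and MODEL_ names, the pte layout, and the bit position are assumptions made up for this sketch; the real kernel helpers and any future __HAVE_ARCH_HUGE_PTE_UFFD_WP* guards may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word pte and uffd-wp bit for this model only. */
typedef struct { uint64_t val; } model_pte_t;
#define MODEL_PTE_UFFD_WP (UINT64_C(1) << 10)

/* Small-pte helpers that the generic huge-pte versions reuse. */
static inline model_pte_t model_pte_mkuffd_wp(model_pte_t pte)
{
	pte.val |= MODEL_PTE_UFFD_WP;
	return pte;
}

static inline model_pte_t model_pte_clear_uffd_wp(model_pte_t pte)
{
	pte.val &= ~MODEL_PTE_UFFD_WP;
	return pte;
}

static inline int model_pte_uffd_wp(model_pte_t pte)
{
	return (pte.val & MODEL_PTE_UFFD_WP) != 0;
}

/*
 * Generic huge-pte versions. An arch-style header could define
 * MODEL_HAVE_ARCH_HUGE_PTE_UFFD_WP (a hypothetical guard) and supply
 * its own definitions instead.
 */
#ifndef MODEL_HAVE_ARCH_HUGE_PTE_UFFD_WP
static inline model_pte_t model_huge_pte_mkuffd_wp(model_pte_t pte)
{
	return model_pte_mkuffd_wp(pte);
}

static inline model_pte_t model_huge_pte_clear_uffd_wp(model_pte_t pte)
{
	return model_pte_clear_uffd_wp(pte);
}

static inline int model_huge_pte_uffd_wp(model_pte_t pte)
{
	return model_pte_uffd_wp(pte);
}
#endif

/* Usage: set, test, and clear the bit on an empty model pte. */
static inline void model_roundtrip_check(void)
{
	model_pte_t pte = { 0 };

	pte = model_huge_pte_mkuffd_wp(pte);
	assert(model_huge_pte_uffd_wp(pte));
	pte = model_huge_pte_clear_uffd_wp(pte);
	assert(!model_huge_pte_uffd_wp(pte));
}
```

The s390 versions in this patch correspond to an override that leaves the pte untouched and always reports the bit as clear.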
Signed-off-by: Peter Xu --- arch/s390/include/asm/hugetlb.h | 15 +++++++++++++++ include/asm-generic/hugetlb.h | 15 +++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 60f9241e5e4a..19c4b4431d27 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -115,6 +115,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_= t newprot) return pte_modify(pte, newprot); } =20 +static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +{ + return pte; +} + +static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +{ + return pte; +} + +static inline int huge_pte_uffd_wp(pte_t pte) +{ + return 0; +} + static inline bool gigantic_page_runtime_supported(void) { return true; diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index f39cad20ffc6..896f341f614d 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -35,6 +35,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t = newprot) return pte_modify(pte, newprot); } =20 +static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +{ + return pte_mkuffd_wp(pte); +} + +static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +{ + return pte_clear_uffd_wp(pte); +} + +static inline int huge_pte_uffd_wp(pte_t pte) +{ + return pte_uffd_wp(pte); +} + #ifndef __HAVE_ARCH_HUGE_PTE_CLEAR static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz) --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00A3BC433EF for ; Fri, 4 Mar 2022 05:19:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238110AbiCDFT4 (ORCPT ); Fri, 4 Mar 2022 00:19:56 -0500 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:41852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238095AbiCDFTt (ORCPT ); Fri, 4 Mar 2022 00:19:49 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BAF6BF8B84 for ; Thu, 3 Mar 2022 21:19:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371141; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=n7rwO48JtzdkJuoHFZCI8H9KL9Pu0D4gPtoWSB8Kejg=; b=ecAZ2Y+vYtgrk7sA2cJMta3taQpgBxozb0/JKhMInqcxd/a4jFyNecvvtJi1IF3DrwBz40 6cVwomrlUlB5c1R9r4kWhCCcYpVNGqMebd4Hk9Q4BIotDjeJjA4wA3t4eHUcIrXHYrrKUH bECSPROEp/EL0PkLGdkHH6tbO67Pduw= Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-226-NJAUmqREMhauC9GZO3UN0A-1; Fri, 04 Mar 2022 00:19:00 -0500 X-MC-Unique: NJAUmqREMhauC9GZO3UN0A-1 Received: by mail-pg1-f200.google.com with SMTP id h12-20020a63530c000000b0037c8f45bf1bso1026095pgb.7 for ; Thu, 03 Mar 2022 21:19:00 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=n7rwO48JtzdkJuoHFZCI8H9KL9Pu0D4gPtoWSB8Kejg=; b=Z6gT0HuqO7KkQDsBKYNvB3d6StQ0OOpRMyMyoNDDT8jFMdnz+7CKW+/6jH9t0ncOkg WGL91LqJNrMtx7ZjfqFXsiRCyEVI58Bt1bxXyqL55zY/R4M5ZNQ25w07+mx7wc/bisC7 DTmH2tPNOW/E4uJIUu2aWfRjKsn3UHHFcAq/LpKb3u/rve8mS6RSM3dO3Kgubdh/wmM8 xQxqyHq69GYURiQprnBx/wuIaoGSzGMsVZPW8C66AY1DN7sV9DFJqs6fD6Qge7/8xqaA Vzj63O5xGDAp+U2scx7b2LOBe3ObwT9zdDsXLBre2WcXvR/VhYCBy0X+5vKzsI/W2+Rc 
FKzg== X-Gm-Message-State: AOAM532pEcGk43dPJc3Qf8gxUacIn6YRkQLYf3RzHsnp5R0mINfEVahP 9ERsLi+ln+Pv5+MS9FGz2SRqs56gKjlfqB/wyzV4miROmLdyFB6fueg0jTvHJsOp9xlmL1G6Qs1 Tcy0zr3Pbh0RzN7BO6j7qx+a8 X-Received: by 2002:a17:902:bc83:b0:14f:d9b7:ab3 with SMTP id bb3-20020a170902bc8300b0014fd9b70ab3mr38914578plb.23.1646371139355; Thu, 03 Mar 2022 21:18:59 -0800 (PST) X-Google-Smtp-Source: ABdhPJyxQvXTM1bbdfv800pkAzqKOP1rVnQ9jp11rKIolBaZd7PJNXmzE+rE+wUwHWSuQvUScVEx9A== X-Received: by 2002:a17:902:bc83:b0:14f:d9b7:ab3 with SMTP id bb3-20020a170902bc8300b0014fd9b70ab3mr38914569plb.23.1646371139079; Thu, 03 Mar 2022 21:18:59 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.51 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:18:58 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 12/23] mm/hugetlb: Hook page faults for uffd write protection Date: Fri, 4 Mar 2022 13:16:57 +0800 Message-Id: <20220304051708.86193-13-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Hook up hugetlb_fault() with the capability to handle userfaultfd-wp faults. We do this slightly earlier than hugetlb_cow() so that we can avoid taking some extra locks that we definitely don't need.
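To make the trigger condition concrete, here is a small userspace sketch of the predicate that the hunk in this patch evaluates before handing the fault to userfaultfd. The struct, its field names, and MODEL_FAULT_FLAG_WRITE are hypothetical stand-ins for the kernel state (the vma registration, the huge pte bits, and the fault flags); the sketch only models the boolean logic, not the unlocking that follows it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault flag bit for this model; not the kernel's value. */
#define MODEL_FAULT_FLAG_WRITE (1u << 0)

/* Stand-in for the state the check reads at fault time. */
struct model_fault {
	bool vma_uffd_wp;	/* vma is registered for uffd write protection */
	bool pte_uffd_wp;	/* huge pte carries the uffd-wp bit */
	bool pte_writable;	/* huge pte is already writable */
	unsigned int flags;	/* fault flags */
};

/*
 * True only for a write fault on a uffd-wp protected, non-writable
 * huge pte in a uffd-wp registered vma.
 */
static inline bool model_is_uffd_wp_fault(const struct model_fault *f)
{
	assert(f);
	return f->vma_uffd_wp && f->pte_uffd_wp &&
	       (f->flags & MODEL_FAULT_FLAG_WRITE) && !f->pte_writable;
}
```

A read fault on the same pte, or a write fault in a vma that is not registered for uffd-wp, evaluates to false in this model and would continue down the regular path.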
Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- mm/hugetlb.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b34f50156f7e..d2539e2fe066 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5680,6 +5680,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) goto out_ptl; =20 + /* Handle userfault-wp first, before trying to lock more pages */ + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) && + (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { + struct vm_fault vmf =3D { + .vma =3D vma, + .address =3D haddr, + .real_address =3D address, + .flags =3D flags, + }; + + spin_unlock(ptl); + if (pagecache_page) { + unlock_page(pagecache_page); + put_page(pagecache_page); + } + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + i_mmap_unlock_read(mapping); + return handle_userfault(&vmf, VM_UFFD_WP); + } + /* * hugetlb_cow() requires page locks of pte_page(entry) and * pagecache_page, so here we need take the former one --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 673E2C433F5 for ; Fri, 4 Mar 2022 05:19:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238063AbiCDFUL (ORCPT ); Fri, 4 Mar 2022 00:20:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238126AbiCDFUG (ORCPT ); Fri, 4 Mar 2022 00:20:06 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1C8A9107D3B for ; Thu, 3 Mar 2022 21:19:11 -0800 (PST) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371150; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RL3s2QRFkPddYKwSOJ6UpdW5WS1d6dwfLlTkzvd/MfY=; b=OMR3/FhwmI2Aw17NnjajbIUW4LVRM9lZqaW+PpF3UMA+c4No35aZnd1vKTiqm9jr7Aip97 S9mmZ2WL73pM2nhdUCSyubpVkTdzmmmYTlEXfXCrUMJzsakCdGvevQ92qWWOIcATW20Woh oS6sOxgmfPWb4XAqrJNIpV4fWGaHFJE= Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-618-_Joiw8MbP6aI4RAdXKzltQ-1; Fri, 04 Mar 2022 00:19:09 -0500 X-MC-Unique: _Joiw8MbP6aI4RAdXKzltQ-1 Received: by mail-pl1-f199.google.com with SMTP id x6-20020a1709029a4600b0014efe26b04fso4075364plv.21 for ; Thu, 03 Mar 2022 21:19:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RL3s2QRFkPddYKwSOJ6UpdW5WS1d6dwfLlTkzvd/MfY=; b=bSBvrmsYxstf/Wq0T3zsX7cEbYc+R8/1SSXtmcFtIr0rfLXk4c1SXCOYjUZh1dGPGw 40vAUImgF4SwmDPaJDZfFj0gNXoDt2lHGUR87Fb+tO4yBTPNBcTm9RpAMx6PTDiKRaN8 NY2bXhfwKiAocjfwhrvKc+UpgdXeth5DFjqlyvsWd65B4p+602DMDG3QW6nOFe75kYyw F69o5Rb9DgRIVSbYWi5WT05uPI+rGLO4FhAVg7xTEfWMa+z+cAAfo6pW+vyrCxNHn4hl sQvpxPwTIJSGO1iCXcq6ltqNOTIvY0bIqZP18bmacX8re4RUby+MfAbv6lyWyYLTiJ9B bMNA== X-Gm-Message-State: AOAM5337c6YkwqJisXbq0avBZoi+2LPqZT9ZnLyiOttfQHtu07BY/1au KMTq9ya7KfMJQ1eCxPl8BY+RO5js9ulzyn0ANfuN4DLhYAEsigBiuNy+gePTQBVe4IGLDq9Rxh3 xefv3sGktky49R7LBWaxtkzG8 X-Received: by 2002:a17:90a:6542:b0:1bd:149f:1c29 with SMTP id f2-20020a17090a654200b001bd149f1c29mr8883037pjs.240.1646371147291; Thu, 03 Mar 2022 21:19:07 -0800 (PST) X-Google-Smtp-Source: 
ABdhPJw98NhiPLfQ+4A3W3/3rIHdWYk9obskF5GhOKrRBQEImBHdUerNxlQUCrFQ/jhk98NTLdWLrQ== X-Received: by 2002:a17:90a:6542:b0:1bd:149f:1c29 with SMTP id f2-20020a17090a654200b001bd149f1c29mr8883022pjs.240.1646371147002; Thu, 03 Mar 2022 21:19:07 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.18.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:06 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP Date: Fri, 4 Mar 2022 13:16:58 +0800 Message-Id: <20220304051708.86193-14-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout the stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set with UFFDIO_COPY. Hugetlb pages are only managed by hugetlbfs, so we're safe even without setting the dirty bit in the huge pte if the page is installed as read-only. However, we'd better still keep the dirty bit set for a read-only UFFDIO_COPY pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with shmem, but also because the page does contain dirty data that the kernel just copied from userspace.
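For illustration only (not part of the patch): a small userspace model of which bits the installed huge pte ends up with. It is a sketch of the logic in the mm/hugetlb.c hunk below, not kernel code; struct pte_bits and mcopy_pte_bits() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical summary of the pte bits relevant to this patch. */
struct pte_bits {
	bool write;
	bool dirty;
	bool young;
	bool uffd_wp;
};

static struct pte_bits mcopy_pte_bits(bool wp_copy, bool is_continue,
				      bool vm_shared, bool vma_writable)
{
	struct pte_bits b = { false, false, false, false };

	/* CONTINUE on a private VMA, or a wp copy, never sets the write bit. */
	if (wp_copy || (is_continue && !vm_shared))
		b.write = false;
	else
		b.write = vma_writable;

	/* The page holds data just copied from userspace: always dirty. */
	b.dirty = true;
	b.young = true;

	/* UFFDIO_COPY_MODE_WP additionally sets the uffd-wp bit. */
	b.uffd_wp = wp_copy;
	return b;
}
```

The point of the model is the decoupling: before this patch the dirty bit followed the write bit, whereas now a wr-protected copy is read-only yet still dirty.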
Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 29 +++++++++++++++++++++++------ mm/userfaultfd.c | 14 +++++++++----- 3 files changed, 36 insertions(+), 13 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 53c1b6082a4c..6347298778b6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, = pte_t *dst_pte, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep); + struct page **pagep, + bool wp_copy); #endif /* CONFIG_USERFAULTFD */ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, @@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_st= ruct *dst_mm, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep) + struct page **pagep, + bool wp_copy) { BUG(); return 0; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d2539e2fe066..b094359255f7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5763,7 +5763,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep) + struct page **pagep, + bool wp_copy) { bool is_continue =3D (mode =3D=3D MCOPY_ATOMIC_CONTINUE); struct hstate *h =3D hstate_vma(dst_vma); @@ -5893,7 +5894,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_m= m, goto out_release_unlock; =20 ret =3D -EEXIST; - if (!huge_pte_none(huge_ptep_get(dst_pte))) + /* + * We allow to overwrite a pte marker: consider when both MISSING|WP + * registered, we firstly wr-protect a none pte which has no page cache + * page backing it, then access the page. 
+ */ + if (!huge_pte_none_mostly(huge_ptep_get(dst_pte))) goto out_release_unlock; =20 if (vm_shared) { @@ -5903,17 +5909,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_= mm, hugepage_add_new_anon_rmap(page, dst_vma, dst_addr); } =20 - /* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */ - if (is_continue && !vm_shared) + /* + * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY + * with wp flag set, don't set pte write bit. + */ + if (wp_copy || (is_continue && !vm_shared)) writable =3D 0; else writable =3D dst_vma->vm_flags & VM_WRITE; =20 _dst_pte =3D make_huge_pte(dst_vma, page, writable); - if (writable) - _dst_pte =3D huge_pte_mkdirty(_dst_pte); + /* + * Always mark UFFDIO_COPY page dirty; note that this may not be + * extremely important for hugetlbfs for now since swapping is not + * supported, but we should still be clear in that this page cannot be + * thrown away at will, even if write bit not set. + */ + _dst_pte =3D huge_pte_mkdirty(_dst_pte); _dst_pte =3D pte_mkyoung(_dst_pte); =20 + if (wp_copy) + _dst_pte =3D huge_pte_mkuffd_wp(_dst_pte); + set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); =20 (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte, diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index ef418a48b121..54e58f0d93e4 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -304,7 +304,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(s= truct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - enum mcopy_atomic_mode mode) + enum mcopy_atomic_mode mode, + bool wp_copy) { int vm_shared =3D dst_vma->vm_flags & VM_SHARED; ssize_t err; @@ -392,7 +393,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(s= truct mm_struct *dst_mm, } =20 if (mode !=3D MCOPY_ATOMIC_CONTINUE && - !huge_pte_none(huge_ptep_get(dst_pte))) { + !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err =3D -EEXIST; mutex_unlock(&hugetlb_fault_mutex_table[hash]); 
i_mmap_unlock_read(mapping); @@ -400,7 +401,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(s= truct mm_struct *dst_mm, } =20 err =3D hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, - dst_addr, src_addr, mode, &page); + dst_addr, src_addr, mode, &page, + wp_copy); =20 mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); @@ -455,7 +457,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct = *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - enum mcopy_atomic_mode mode); + enum mcopy_atomic_mode mode, + bool wp_copy); #endif /* CONFIG_HUGETLB_PAGE */ =20 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, @@ -575,7 +578,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm= _struct *dst_mm, */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, mcopy_mode); + src_start, len, mcopy_mode, + wp_copy); =20 if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock; --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ABCC2C433EF for ; Fri, 4 Mar 2022 05:19:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237727AbiCDFUS (ORCPT ); Fri, 4 Mar 2022 00:20:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238142AbiCDFUH (ORCPT ); Fri, 4 Mar 2022 00:20:07 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id AEE04182BE9 for ; Thu, 3 Mar 2022 21:19:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=redhat.com; s=mimecast20190719; t=1646371158; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7O5uZll4sV1zFyOKkIfzUXmTvhYgtYwRuXLomdEwgYM=; b=c+01xZcWsWWnqKBYokd5JeJT5fc5GvN7XJAJ2C3iv2zKUVu9eHun2d2s4wod6QAxwcVQa4 Ta8E/9Bdj52Jp5cRkHDz5vpSvwCcRWCdNvWEmd3znxg3JQxl48Ucf9nhyXEQd+lMeVKQW5 CU3ww06HdvJEzdeH1YDmPsWEKdOe8CY= Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-351-3jyxDXRFPaK7qlAQhQ0Yig-1; Fri, 04 Mar 2022 00:19:17 -0500 X-MC-Unique: 3jyxDXRFPaK7qlAQhQ0Yig-1 Received: by mail-pf1-f200.google.com with SMTP id a23-20020aa794b7000000b004f6a3ac7a87so1396759pfl.23 for ; Thu, 03 Mar 2022 21:19:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7O5uZll4sV1zFyOKkIfzUXmTvhYgtYwRuXLomdEwgYM=; b=iQG2qBnaheRa030MqY4deFu2yDr1ufFkf/FOnyBjsbI1pPhP8wHVA8fyGHaHpLADTR +FShsHpG5Uj9uS9qv0Ym7ijHAVZpoAYwM7jXBMQ0zcyantdJkcR17KDe3tLD96W4Mgyi qYzJONaG+l6r9Y2arP+x5k/dZFO/NFn4T1NFw3E1aSaUoW35amSDoqLqmn1EcdmdwJ3s oG37TBfBrLXN4kUQl2KJJESQXOAqnYFBYIaHgjoS6kT0mNXzSrn1Tx7zfQlZfS7+PRr8 d2WFwwhooUaO7CYdBB85NATEa5dmRpBhhHxtmmzDCyAqCdafqh96ooL5CTXVWjAKShjV KZdw== X-Gm-Message-State: AOAM531Bwmv0wZBwRg90sTolBlVuyMxpjhQ53uftNZjx3xSZRtKeNU2J dcQxxiLhzi6qNiIhoxR72FcdstP0OpVIs90eW5C8P2zlnSgDFTtQk3glOMcA1zFg0ZOQMDrm7C1 NjH4G9OfP6S8CrMMrIOBZZNEI X-Received: by 2002:a05:6a00:1a92:b0:4f0:edf6:83f5 with SMTP id e18-20020a056a001a9200b004f0edf683f5mr41896945pfv.31.1646371155355; Thu, 03 Mar 2022 21:19:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJwjpsXb3Z+my2n7fkg0RxYfFGgIXvxEsSLrg+d/2jd0WAHT4xitgJREhUKjI7MHFmmcFfIeRA== 
X-Received: by 2002:a05:6a00:1a92:b0:4f0:edf6:83f5 with SMTP id e18-20020a056a001a9200b004f0edf683f5mr41896931pfv.31.1646371155050; Thu, 03 Mar 2022 21:19:15 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.07 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:14 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 14/23] mm/hugetlb: Handle UFFDIO_WRITEPROTECT Date: Fri, 4 Mar 2022 13:16:59 +0800 Message-Id: <20220304051708.86193-15-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This starts from passing cp_flags into hugetlb_change_protection() so hugetlb will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests. huge_pte_clear_uffd_wp() is introduced to handle the case where the UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 13 ++++++++++++- mm/mprotect.c | 3 ++- mm/userfaultfd.c | 8 ++++++++ 4 files changed, 26 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6347298778b6..38c5ac28b787 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -210,7 +210,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsi= gned long address, int pmd_huge(pmd_t pmd); int pud_huge(pud_t pud); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, - unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long address, unsigned long end, pgprot_t newprot, + unsigned long cp_flags); =20 bool is_hugetlb_entry_migration(pte_t pte); void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); @@ -391,7 +392,8 @@ static inline void move_hugetlb_state(struct page *oldp= age, =20 static inline unsigned long hugetlb_change_protection( struct vm_area_struct *vma, unsigned long address, - unsigned long end, pgprot_t newprot) + unsigned long end, pgprot_t newprot, + unsigned long cp_flags) { return 0; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b094359255f7..396d5a516d05 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6151,7 +6151,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct= vm_area_struct *vma, } =20 unsigned long hugetlb_change_protection(struct vm_area_struct *vma, - unsigned long address, unsigned long end, pgprot_t newprot) + unsigned long address, unsigned long end, + pgprot_t newprot, unsigned long cp_flags) { struct mm_struct *mm =3D vma->vm_mm; unsigned long start =3D address; @@ -6161,6 +6162,8 @@ unsigned long hugetlb_change_protection(struct vm_are= a_struct *vma, unsigned long pages =3D 0; bool shared_pmd =3D false; struct mmu_notifier_range range; + bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; + bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; =20 /* * In the case of shared 
PMDs, the area to flush could be beyond @@ -6202,6 +6205,10 @@ unsigned long hugetlb_change_protection(struct vm_ar= ea_struct *vma, entry =3D make_readable_migration_entry( swp_offset(entry)); newpte =3D swp_entry_to_pte(entry); + if (uffd_wp) + newpte =3D pte_swp_mkuffd_wp(newpte); + else if (uffd_wp_resolve) + newpte =3D pte_swp_clear_uffd_wp(newpte); set_huge_swap_pte_at(mm, address, ptep, newpte, huge_page_size(h)); pages++; @@ -6216,6 +6223,10 @@ unsigned long hugetlb_change_protection(struct vm_ar= ea_struct *vma, old_pte =3D huge_ptep_modify_prot_start(vma, address, ptep); pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); + if (uffd_wp) + pte =3D huge_pte_mkuffd_wp(huge_pte_wrprotect(pte)); + else if (uffd_wp_resolve) + pte =3D huge_pte_clear_uffd_wp(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; } diff --git a/mm/mprotect.c b/mm/mprotect.c index 95b307d4766d..1b98e29316b6 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -451,7 +451,8 @@ unsigned long change_protection(struct vm_area_struct *= vma, unsigned long start, BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL); =20 if (is_vm_hugetlb_page(vma)) - pages =3D hugetlb_change_protection(vma, start, end, newprot); + pages =3D hugetlb_change_protection(vma, start, end, newprot, + cp_flags); else pages =3D change_protection_range(vma, start, end, newprot, cp_flags); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 54e58f0d93e4..441728732033 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -704,6 +704,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsig= ned long start, atomic_t *mmap_changing) { struct vm_area_struct *dst_vma; + unsigned long page_mask; pgprot_t newprot; int err; =20 @@ -740,6 +741,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsi= gned long start, if (!vma_is_anonymous(dst_vma)) goto out_unlock; =20 + if (is_vm_hugetlb_page(dst_vma)) { + err =3D -EINVAL; + page_mask 
=3D vma_kernel_pagesize(dst_vma) - 1; + if ((start & page_mask) || (len & page_mask)) + goto out_unlock; + } + if (enable_wp) newprot =3D vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE)); else --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02699C433F5 for ; Fri, 4 Mar 2022 05:19:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230244AbiCDFUX (ORCPT ); Fri, 4 Mar 2022 00:20:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238079AbiCDFUO (ORCPT ); Fri, 4 Mar 2022 00:20:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id EBDF3182DBA for ; Thu, 3 Mar 2022 21:19:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371165; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u8Oi5M9aJF+Nhh3uBfIH6zUjZCwWQp8p9uYRbzwSERc=; b=LhZNmCUa1XWemfPZk44eMSeGCW7O+jDZTQdXJ5qRWS4A7BFLWt2WpaX9decsP1d4rp8HIG WQyabHGwdRj4S4rB3FQqjQi02fZ6GA1UPyXLV99NR/oeu3gfJYBCfGphKBWSeZccXtaV+7 RtPNISc8W3lar1gF6w+SlNjYg5YjWqY= Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com [209.85.216.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-223-fgRhX_qVMCOIYPTm4N3TNA-1; Fri, 04 Mar 2022 00:19:24 -0500 X-MC-Unique: fgRhX_qVMCOIYPTm4N3TNA-1 Received: by mail-pj1-f70.google.com with SMTP id 
ev5-20020a17090aeac500b001bc3cb23d4cso4314379pjb.1 for ; Thu, 03 Mar 2022 21:19:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=u8Oi5M9aJF+Nhh3uBfIH6zUjZCwWQp8p9uYRbzwSERc=; b=sQ9yYpTFToONYCzMPHJoAquA5mrPouz0kJCmDTqSaOS0ru+k7OFuj5KmZD9nI5dOV3 KFEBVABg4yfpz+UXr6eV7wWUqp8xA4stzEl2CsEWkevhxwVP+xeEny5jA6SddrN2cCPJ vQIOzbQ8PZRQ/HPr3E04LEYHTmfybKw+7/2y65nyzr5Zj27O3lpB4AXlXQnKLsIDGZ72 BGeg/HktADDdkSKFk82+pcZRew8BZ/Ap8wMWDUNouFoVjuB2K1FGCynjY/FR79boOuTj Amr48n4PPqnCVRfr6ghaJXx/+ysdVwlp2pYoTkXlO8hKaJ3kZkFH+KyhEDyWAuNMwmPQ 110A== X-Gm-Message-State: AOAM533GEWwU/ad1/I3Jp8pJePLlHYksiQ96po3VHtT6xuDXnOa6/Dy7 s2ICdwckSe2Iw1TKuvhNlkt5Lm9vATTU+MdnhWL4EySCaddB2DJI0ENKX0DRT+a83eije3lzTCg NVGShhSH49jDSDHloyKppb7GA X-Received: by 2002:a17:902:e80f:b0:151:bdd2:cabc with SMTP id u15-20020a170902e80f00b00151bdd2cabcmr970852plg.31.1646371163395; Thu, 03 Mar 2022 21:19:23 -0800 (PST) X-Google-Smtp-Source: ABdhPJx50hZU3JRaU/3TajwefuQOeadN49vMH2q5JxcC4kmufwLK+rjdl0sW8g3unVKCqbQjfQkoew== X-Received: by 2002:a17:902:e80f:b0:151:bdd2:cabc with SMTP id u15-20020a170902e80f00b00151bdd2cabcmr970827plg.31.1646371163102; Thu, 03 Mar 2022 21:19:23 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.15 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:22 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . 
Shutemov" , Andrea Arcangeli Subject: [PATCH v7 15/23] mm/hugetlb: Handle pte markers in page faults Date: Fri, 4 Mar 2022 13:17:00 +0800 Message-Id: <20220304051708.86193-16-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Allow hugetlb code to handle pte markers just like none ptes. It's mostly there; we just need to make sure we don't assume hugetlb_no_page() only handles none ptes, so when detecting a pte change we should use pte_same() rather than pte_none(). We need to pass in the old_pte to do the comparison. Check the original pte to see whether it's a pte marker; if it is, we should restore the uffd-wp bit on the new pte to be installed, so that the next write will be trapped by uffd. Signed-off-by: Peter Xu --- mm/hugetlb.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 396d5a516d05..afd3d93cfe9a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5383,7 +5383,8 @@ static inline vm_fault_t hugetlb_handle_userfault(str= uct vm_area_struct *vma, static vm_fault_t hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, - unsigned long address, pte_t *ptep, unsigned int flags) + unsigned long address, pte_t *ptep, + pte_t old_pte, unsigned int flags) { struct hstate *h =3D hstate_vma(vma); vm_fault_t ret =3D VM_FAULT_SIGBUS; @@ -5509,7 +5510,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *m= m, =20 ptl =3D huge_pte_lock(h, mm, ptep); ret =3D 0; - if (!huge_pte_none(huge_ptep_get(ptep))) + /* If pte changed from under us, retry */ + if (!pte_same(huge_ptep_get(ptep), old_pte)) goto backout; =20 if (anon_rmap) { @@ -5519,6 +5521,12 @@ static
vm_fault_t hugetlb_no_page(struct mm_struct *= mm, page_dup_rmap(page, true); new_pte =3D make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED))); + /* + * If this pte was previously wr-protected, keep it wr-protected even + * if populated. + */ + if (unlikely(pte_marker_uffd_wp(old_pte))) + new_pte =3D huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte)); set_huge_pte_at(mm, haddr, ptep, new_pte); =20 hugetlb_count_add(pages_per_huge_page(h), mm); @@ -5636,8 +5644,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, mutex_lock(&hugetlb_fault_mutex_table[hash]); =20 entry =3D huge_ptep_get(ptep); - if (huge_pte_none(entry)) { - ret =3D hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags); + /* PTE markers should be handled the same way as none pte */ + if (huge_pte_none_mostly(entry)) { + ret =3D hugetlb_no_page(mm, vma, mapping, idx, address, ptep, + entry, flags); goto out_mutex; } =20 --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2539C433EF for ; Fri, 4 Mar 2022 05:19:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238132AbiCDFUd (ORCPT ); Fri, 4 Mar 2022 00:20:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238119AbiCDFUW (ORCPT ); Fri, 4 Mar 2022 00:20:22 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D7343184615 for ; Thu, 3 Mar 2022 21:19:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371174; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8c/lNa0oM80rkM4kbMMwF9soU2NQfSaZ5E7MuZnV//g=; b=isNEHponEL21nwh7Ua9pT37LZoTRuMx/nxB0oG0Avq3rQFvV3elkehMafVN1iNcNy1/IrZ NZTHIdIUUkFR1mXBcPvBtDxLojoBreRCOvsV0z792eHN8ZfrffveWox6Sac/4X+QIhQvHP Ek21AZPYHoGLV4F8Mal3LmZfuVopio0= Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com [209.85.216.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-80-d18QJnMHO_iSKP7gAm8lxw-1; Fri, 04 Mar 2022 00:19:32 -0500 X-MC-Unique: d18QJnMHO_iSKP7gAm8lxw-1 Received: by mail-pj1-f70.google.com with SMTP id p5-20020a17090a748500b001bee6752974so4167502pjk.8 for ; Thu, 03 Mar 2022 21:19:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8c/lNa0oM80rkM4kbMMwF9soU2NQfSaZ5E7MuZnV//g=; b=poFGTYUIZIkMyJTGOSBgU5m1cNTGJOlAA+9yInXLTd6TyPCaZixLnZv9IGSGPHako2 +mHJlEquqP3skcw85tJgQ2VmcQKtkRdLbsJS7UO/TUWzdYSZfOUcw0OLPySsqVGFZiNY y4WE0egUaWYQo7QFU9vwepHChrxezrfTGBChlbHWxHnmE0YW7FOInluuoSwTZ+0ORvBv RbXz4zydO/5/uiTg7hJ6UG055N07ihNNc19Syrtj0DbuT35PUSMQxMc62YxtLTq1qSud o6Aqs1JogSylviykCqFHghox/p/kpX0/vrCqQGPv7X47KJAz4BQX59GypEnipY95ITWQ H3dQ== X-Gm-Message-State: AOAM530gA7uXbdkukbOXqXEFlqrYWPLgJS13PKTTFogEnkt9Ui+FxrlZ Gw+MRXNLQY1fb9BARe6x0zjpnugoZvxAZnO+r1nm3EzoaMKLVIs3NHQqFIeMbMllrEc0G1fTkLv IaQUBziBUUoYgR9aHogQbkQqo X-Received: by 2002:a17:902:f145:b0:151:a441:433a with SMTP id d5-20020a170902f14500b00151a441433amr8804581plb.44.1646371171329; Thu, 03 Mar 2022 21:19:31 -0800 (PST) X-Google-Smtp-Source: ABdhPJzFfQF8+il2cx3Q+YaVVz5krns+0z+Q5hZNGFwZcBniVedYFkNfGyU++c+42HLp3JPbzL24yQ== X-Received: by 2002:a17:902:f145:b0:151:a441:433a with SMTP 
id d5-20020a170902f14500b00151a441433amr8804562plb.44.1646371171051; Thu, 03 Mar 2022 21:19:31 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.23 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:30 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 16/23] mm/hugetlb: Allow uffd wr-protect none ptes Date: Fri, 4 Mar 2022 13:17:01 +0800 Message-Id: <20220304051708.86193-17-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Teach hugetlbfs code to wr-protect none ptes just in case the page cache existed for that pte. Meanwhile we also need to be able to recognize a uffd-wp marker pte and remove it for uffd_wp_resolve. While at it, introduce a variable "psize" to replace all references to the huge page size fetcher.
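For illustration only (not part of the patch): the intended transitions for a pte slot that has no page mapped, as described above, sketched as a tiny userspace state model. The enum and function names are hypothetical, and present ptes and migration entries are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical simplified states for a hugetlb pte slot with no page mapped. */
enum slot_state { SLOT_NONE, SLOT_UFFD_WP_MARKER };

static enum slot_state change_protection_model(enum slot_state s,
					       bool uffd_wp,
					       bool uffd_wp_resolve)
{
	/* Wr-protecting a none pte installs a uffd-wp marker. */
	if (s == SLOT_NONE && uffd_wp)
		return SLOT_UFFD_WP_MARKER;

	/* Resolving wr-protection on a marker turns it back into a none pte. */
	if (s == SLOT_UFFD_WP_MARKER && uffd_wp_resolve)
		return SLOT_NONE;

	/* Every other combination leaves the slot unchanged. */
	return s;
}
```

The marker is what lets a later fault on this slot remember that it was wr-protected even though no page was present at the time.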
Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- mm/hugetlb.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index afd3d93cfe9a..1a20be29ac3a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6169,7 +6169,7 @@ unsigned long hugetlb_change_protection(struct vm_are= a_struct *vma, pte_t *ptep; pte_t pte; struct hstate *h =3D hstate_vma(vma); - unsigned long pages =3D 0; + unsigned long pages =3D 0, psize =3D huge_page_size(h); bool shared_pmd =3D false; struct mmu_notifier_range range; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; @@ -6189,13 +6189,19 @@ unsigned long hugetlb_change_protection(struct vm_a= rea_struct *vma, =20 mmu_notifier_invalidate_range_start(&range); i_mmap_lock_write(vma->vm_file->f_mapping); - for (; address < end; address +=3D huge_page_size(h)) { + for (; address < end; address +=3D psize) { spinlock_t *ptl; - ptep =3D huge_pte_offset(mm, address, huge_page_size(h)); + ptep =3D huge_pte_offset(mm, address, psize); if (!ptep) continue; ptl =3D huge_pte_lock(h, mm, ptep); if (huge_pmd_unshare(mm, vma, &address, ptep)) { + /* + * When uffd-wp is enabled on the vma, unshare + * shouldn't happen at all. Warn about it if it + * happened due to some reason. + */ + WARN_ON_ONCE(uffd_wp || uffd_wp_resolve); pages++; spin_unlock(ptl); shared_pmd =3D true; @@ -6220,12 +6226,20 @@ unsigned long hugetlb_change_protection(struct vm_a= rea_struct *vma, else if (uffd_wp_resolve) newpte =3D pte_swp_clear_uffd_wp(newpte); set_huge_swap_pte_at(mm, address, ptep, - newpte, huge_page_size(h)); + newpte, psize); pages++; } spin_unlock(ptl); continue; } + if (unlikely(pte_marker_uffd_wp(pte))) { + /* + * This is changing a non-present pte into a none pte, + * no need for huge_ptep_modify_prot_start/commit(). 
+ */ + if (uffd_wp_resolve) + huge_pte_clear(mm, address, ptep, psize); + } if (!huge_pte_none(pte)) { pte_t old_pte; unsigned int shift =3D huge_page_shift(hstate_vma(vma)); @@ -6239,6 +6253,12 @@ unsigned long hugetlb_change_protection(struct vm_ar= ea_struct *vma, pte =3D huge_pte_clear_uffd_wp(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; + } else { + /* None pte */ + if (unlikely(uffd_wp)) + /* Safe to modify directly (none->non-present). */ + set_huge_pte_at(mm, address, ptep, + make_pte_marker(PTE_MARKER_UFFD_WP)); } spin_unlock(ptl); } --=20 2.32.0 From nobody Tue Jun 23 13:01:38 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9D2CC433EF for ; Fri, 4 Mar 2022 05:19:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238089AbiCDFUj (ORCPT ); Fri, 4 Mar 2022 00:20:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238114AbiCDFUa (ORCPT ); Fri, 4 Mar 2022 00:20:30 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 10995186225 for ; Thu, 3 Mar 2022 21:19:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371182; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=W3A1Sw4gSLvq3WpWIO0c1Ts85eaL9QjEv2hQ97BtHXI=; b=HmdgJAZ8zrF1KvAZn7ZTifd4/4NGqTA5dHAFWSt58JmrtZ2aNqEuzaFywBPwiSVVatFSEk hSEMtsSFQgRpQaTcHzPYsJkhvlNyarL96qa+mRZ3kaSkm5KGQlmD7vNiN4mmEAb5MYQa89 
9KDBJS9K/WSwlVmhowBuqrnmng4jGk4= Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-31-kg2NmLfVND2FP1Ik24wbMw-1; Fri, 04 Mar 2022 00:19:40 -0500 X-MC-Unique: kg2NmLfVND2FP1Ik24wbMw-1 Received: by mail-pf1-f200.google.com with SMTP id j204-20020a6280d5000000b004e107ad3488so4454269pfd.15 for ; Thu, 03 Mar 2022 21:19:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=W3A1Sw4gSLvq3WpWIO0c1Ts85eaL9QjEv2hQ97BtHXI=; b=RTsGWwC6V4PorcwCyYzXhZEQcOGg7VyWVFpU4/on1utaq3qneHe2qXgFJMZWi56FKx I2fM+LrDCTM0lrh50ss/bgyftRdrrhvhoz3O2gs6FRiOQUhSgK/ywrJKeZ4q+Be9gpVe c1eEoteruvOy5jO8yfZ1HDOMKmHqQ0QNdaT4RPHIjmGJWFkUZF3moZdT+NW0GLxk7ALg Khouo5t9GqODIAvFhObcLwSR0RtE4eo538rsRJbnpD/F58FlExDn4CQmfeqFQRWBy06W bVBEng8J0EYtRP18GDFyC0Id4VRa2J4hS4LpnN6fMXWlKDmR7LN9H+jOnY994Hqd+3G7 fv9A== X-Gm-Message-State: AOAM533ai/59ZNmsJZXNFJ94hxfZsEIJCH7OlzNS8MCbATl5ov+u701b 9MXU2IO39bjYPdyO3X+5QpFz5vXWLdBZtwuk7MxH0D6ETnCz/j72brknvEUnmUopYBVDVrC4Bn2 7rKl7uR9JSrRxXLs+Y3ZppAyD X-Received: by 2002:a17:902:b908:b0:151:b8ec:2038 with SMTP id bf8-20020a170902b90800b00151b8ec2038mr2680076plb.76.1646371179418; Thu, 03 Mar 2022 21:19:39 -0800 (PST) X-Google-Smtp-Source: ABdhPJx0oSKjLHhzbwnFEI3D/leuju9Hh++AWa1WGIDqEaacCZODw/P1z3OkOVq6mvSOA7JhcqdiMA== X-Received: by 2002:a17:902:b908:b0:151:b8ec:2038 with SMTP id bf8-20020a170902b90800b00151b8ec2038mr2680056plb.76.1646371179057; Thu, 03 Mar 2022 21:19:39 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.31 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:38 -0800 (PST) From: Peter Xu To: 
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 17/23] mm/hugetlb: Only drop uffd-wp special pte if required
Date: Fri, 4 Mar 2022 13:17:02 +0800
Message-Id: <20220304051708.86193-18-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if
unmapping an entire vma, or if synchronized such that faults cannot race with
the unmap operation. This requires passing zap_flags all the way to the lowest
level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the
ZAP_FLAG_DROP_MARKER flag, as synchronization is in place to prevent faults.
The exception is hole punch, which will first unmap without any
synchronization. Later, when hole punch actually removes the page from the
file, it will check to see if there was a subsequent fault and, if so, take
the hugetlb fault mutex while unmapping again. This second unmap will pass in
ZAP_FLAG_DROP_MARKER.

The justification of "whether to apply the ZAP_FLAG_DROP_MARKER flag when
unmapping a hugetlb range" is (IMHO): we should never reach a state in which a
page fault could erroneously fault in a wr-protected page-cache page as
writable, even for an extremely short period. That could happen if e.g. we
pass ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls
hugetlb_vmdelete_list(), because if a page faults after that call and before
remove_inode_hugepages() is executed, the page cache can be mapped writable
again in the small racy window, which can cause data to be overwritten
unexpectedly.

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 33 +++++++++++++++++++++++++--------
 mm/memory.c             |  5 ++++-
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 171212bdaae6..d017c674f1b8 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -438,7 +439,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -516,7 +517,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
+					(index + 1) * pages_per_huge_page(h),
+					ZAP_FLAG_DROP_MARKER);
 				i_mmap_unlock_write(mapping);
 			}
 
@@ -582,7 +584,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -615,8 +618,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-						hole_start >> PAGE_SHIFT,
-						hole_end >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 38c5ac28b787..ab48b3bbb0e6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -400,7 +401,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a20be29ac3a..994d7a3ee871 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4931,7 +4931,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
-				   struct page *ref_page)
+				   struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4987,7 +4987,18 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		 * unmapped and its refcount is dropped, so just clear pte here.
 		 */
 		if (unlikely(!pte_present(pte))) {
-			huge_pte_clear(mm, address, ptep, sz);
+			/*
+			 * If the pte was wr-protected by uffd-wp in any of the
+			 * swap forms, meanwhile the caller does not want to
+			 * drop the uffd-wp bit in this zap, then replace the
+			 * pte with a marker.
+			 */
+			if (pte_swp_uffd_wp_any(pte) &&
+			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+				set_huge_pte_at(mm, address, ptep,
+						make_pte_marker(PTE_MARKER_UFFD_WP));
+			else
+				huge_pte_clear(mm, address, ptep, sz);
 			spin_unlock(ptl);
 			continue;
 		}
@@ -5015,7 +5026,11 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
 		if (huge_pte_dirty(pte))
 			set_page_dirty(page);
-
+		/* Leave a uffd-wp pte marker if needed */
+		if (huge_pte_uffd_wp(pte) &&
+		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+			set_huge_pte_at(mm, address, ptep,
+					make_pte_marker(PTE_MARKER_UFFD_WP));
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		page_remove_rmap(page, vma, true);
 
@@ -5049,9 +5064,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -5067,12 +5083,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -5127,7 +5144,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index e9e335ecb5dc..43ab8d6c768e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1679,8 +1679,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34FCFC433F5 for ; Fri, 4 Mar 2022 05:20:06 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238136AbiCDFUv (ORCPT ); Fri, 4 Mar 2022 00:20:51 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238122AbiCDFUl (ORCPT ); Fri, 4 Mar 2022 00:20:41 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 13467184B75 for ; Thu, 3 Mar 2022 21:19:51 -0800 (PST)
DKIM-Signature: v=1;
a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371190; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Wn3ILT0Lxv5s6XkvrMXIyY+ihJQ80lsNhwBboggDj0M=; b=d6XhGMTO4JNx7TAyA9UtAGDe203phBfwTEZExJPKqIBLVE8vZPC+SAUGKMXChBRGVoXRuX w8LfMPmAlnDIBr1M+Ly4ilu+jhJHqZxvs9rFdMUd2LFIRBmrhlTRFehwtIo7vjlHa88bNp ZZMdHXWe4s1xiNEMiXNecIrVBdRD0ac= Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com [209.85.216.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-464-YeHpyN9vP2GHZyPuREOxpw-1; Fri, 04 Mar 2022 00:19:49 -0500 X-MC-Unique: YeHpyN9vP2GHZyPuREOxpw-1 Received: by mail-pj1-f70.google.com with SMTP id m14-20020a17090a4d8e00b001bf2d4926c5so67406pjh.3 for ; Thu, 03 Mar 2022 21:19:48 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Wn3ILT0Lxv5s6XkvrMXIyY+ihJQ80lsNhwBboggDj0M=; b=7vjBfIsrbmiEvMibOVuJ1sIyNfq01UNwsDiCjFzJk0M2+IPxo84zd4U3VGjZZVw4f3 i7MEKy2WCICcpz3oZqPreMjENHO3i7q+x3gT0YCwYq1ntJO+CmibOGNRpCqmNK9ZyYCA bJM9+hOy9BPrKA19/W85EkLYqr7wTj+EMNnIWGhxzFdCWfDOYpI3ZyxV/hWIZYAUO5JT fwGGu0eU8B4Y/MRSKMd4LZ1cD6Zmi7Nf+cvmRUbez3UU7vmSuzoHuVMB4XgFfo+C6nYk bu0RJD6ZwTt7inoIfcETSy4+xQBZeKPMf78mJNvQrsd4IqM5MzOShP+ezR6V87cNqWDc jrnw== X-Gm-Message-State: AOAM533NERY5aGIoZ9dfKRELp2fM/FAJDA+Xb75WQvUqIA3i8w3mPvnZ OKRVf5mzu0Fhig33ftJw8RumHrxcf8AAerUE+CttVhGO32+L0GvMn0Y9HNGAipE8AgdLLZaCE1Z qaWwmnH9kHlBdMOv3l+W0iOYZ X-Received: by 2002:a17:90a:f48f:b0:1bc:2521:fb0a with SMTP id bx15-20020a17090af48f00b001bc2521fb0amr9165163pjb.48.1646371187693; Thu, 03 Mar 2022 21:19:47 -0800 (PST) X-Google-Smtp-Source: 
ABdhPJycPvMgb2+JEX9mKyCqqvdoYelrEZ2x92Thb/SSCgHsoZmufdyixH7q1XZKq5P8YsP5zDuMwg==
X-Received: by 2002:a17:90a:f48f:b0:1bc:2521:fb0a with SMTP id bx15-20020a17090af48f00b001bc2521fb0amr9165135pjb.48.1646371187413; Thu, 03 Mar 2022 21:19:47 -0800 (PST)
Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.39 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:47 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 18/23] mm/hugetlb: Handle uffd-wp during fork()
Date: Fri, 4 Mar 2022 13:17:03 +0800
Message-Id: <20220304051708.86193-19-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Firstly, we'll need to pass dst_vma into copy_hugetlb_page_range(), because
for uffd-wp it's the dst vma that matters when deciding how we should treat
uffd-wp protected ptes.

We should recognize pte markers during fork and do the pte copy if needed.
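
As a rough illustration of the rule above, the following user-space sketch models what the child could end up with at fork(), depending on whether the destination vma has uffd-wp registered. It is not kernel code: `toy_entry`, `toy_kind` and `toy_fork_copy()` are hypothetical names made up for this example, and the real decisions are made in copy_hugetlb_page_range().

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the entry types seen at fork(). */
enum toy_kind {
	TOY_NONE,	/* nothing installed */
	TOY_SWAP,	/* migration/hwpoison-style swap entry */
	TOY_MARKER,	/* uffd-wp pte marker */
};

struct toy_entry {
	enum toy_kind kind;
	bool uffd_wp;	/* entry carries the uffd-wp bit */
};

/*
 * Decide what the child gets for one entry.  Only the destination vma's
 * uffd-wp registration decides whether wr-protect state is kept.
 */
static struct toy_entry toy_fork_copy(struct toy_entry src, bool dst_vma_uffd_wp)
{
	struct toy_entry dst = src;

	switch (src.kind) {
	case TOY_MARKER:
		/* Copy the marker only if the dst vma has uffd-wp enabled. */
		if (!dst_vma_uffd_wp) {
			dst.kind = TOY_NONE;
			dst.uffd_wp = false;
		}
		break;
	case TOY_SWAP:
		/* Keep the entry, but drop the bit if dst does not track it. */
		if (!dst_vma_uffd_wp)
			dst.uffd_wp = false;
		break;
	case TOY_NONE:
		break;
	}
	return dst;
}
```

For example, `toy_fork_copy((struct toy_entry){ TOY_MARKER, true }, false)` yields an empty entry, while the same call with `true` as the second argument keeps the marker.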
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  7 +++++--
 mm/hugetlb.c            | 41 +++++++++++++++++++++++++++--------------
 mm/memory.c             |  2 +-
 3 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ab48b3bbb0e6..6df51d23b7ee 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 			     struct vm_area_struct *new_vma,
 			     unsigned long old_addr, unsigned long new_addr,
 			     unsigned long len);
-int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
+			    struct vm_area_struct *, struct vm_area_struct *);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, struct vm_area_struct **,
 			 unsigned long *, unsigned long *, long, unsigned int,
@@ -268,7 +269,9 @@ static inline struct page *follow_huge_addr(struct mm_struct *mm,
 }
 
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
-			struct mm_struct *src, struct vm_area_struct *vma)
+					  struct mm_struct *src,
+					  struct vm_area_struct *dst_vma,
+					  struct vm_area_struct *src_vma)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 994d7a3ee871..f2508620f197 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4696,23 +4696,24 @@ hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr
 }
 
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
-			    struct vm_area_struct *vma)
+			    struct vm_area_struct *dst_vma,
+			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
 	struct page *ptepage;
 	unsigned long addr;
-	bool cow = is_cow_mapping(vma->vm_flags);
-	struct hstate *h = hstate_vma(vma);
+	bool cow = is_cow_mapping(src_vma->vm_flags);
+	struct hstate *h = hstate_vma(src_vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct address_space *mapping = src_vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
 	int ret = 0;
 
 	if (cow) {
-		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
-					vma->vm_start,
-					vma->vm_end);
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src,
+					src_vma->vm_start,
+					src_vma->vm_end);
 		mmu_notifier_invalidate_range_start(&range);
 	} else {
 		/*
@@ -4724,12 +4725,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}
 
-	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
+	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
+		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4764,6 +4765,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else if (unlikely(is_hugetlb_entry_migration(entry) ||
 				    is_hugetlb_entry_hwpoisoned(entry))) {
 			swp_entry_t swp_entry = pte_to_swp_entry(entry);
+			bool uffd_wp = huge_pte_uffd_wp(entry);
 
 			if (is_writable_migration_entry(swp_entry) && cow) {
 				/*
@@ -4773,10 +4775,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				swp_entry = make_readable_migration_entry(
 							swp_offset(swp_entry));
 				entry = swp_entry_to_pte(swp_entry);
+				if (userfaultfd_wp(src_vma) && uffd_wp)
+					entry = huge_pte_mkuffd_wp(entry);
 				set_huge_swap_pte_at(src, addr, src_pte,
 						     entry, sz);
 			}
+			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+		} else if (unlikely(is_pte_marker(entry))) {
+			/*
+			 * We copy the pte marker only if the dst vma has
+			 * uffd-wp enabled.
+			 */
+			if (userfaultfd_wp(dst_vma))
+				set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else {
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
@@ -4791,20 +4804,20 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * need to be without the pgtable locks since we could
 			 * sleep during the process.
 			 */
-			if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
+			if (unlikely(page_needs_cow_for_dma(src_vma, ptepage))) {
 				pte_t src_pte_old = entry;
 				struct page *new;
 
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new = alloc_huge_page(vma, addr, 1);
+				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
 					put_page(ptepage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, vma,
+				copy_user_huge_page(new, ptepage, addr, dst_vma,
 						    npages);
 				put_page(ptepage);
 
@@ -4814,13 +4827,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 				entry = huge_ptep_get(src_pte);
 				if (!pte_same(src_pte_old, entry)) {
-					restore_reserve_on_error(h, vma, addr,
+					restore_reserve_on_error(h, dst_vma, addr,
 								new);
 					put_page(new);
 					/* dst_entry won't change as in child */
 					goto again;
 				}
-				hugetlb_install_page(vma, dst_pte, addr, new);
+				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index 43ab8d6c768e..66c9890b7678 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1282,7 +1282,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		return 0;
 
 	if (is_vm_hugetlb_page(src_vma))
-		return copy_hugetlb_page_range(dst_mm, src_mm, src_vma);
+		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
 		/*
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEE36C433FE for ; Fri, 4 Mar 2022 05:20:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238119AbiCDFU7 (ORCPT ); Fri, 4 Mar 2022 00:20:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238143AbiCDFUt (ORCPT ); Fri, 4 Mar 2022 00:20:49 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 86C9118621A for ; Thu, 3 Mar 2022 21:19:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371197; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PRicOcRvWzoxpLs51SSJ+aFkDsMZLS/bjPquNOlSlQc=; b=Yca8+iVM3eE0dnGc4A+nYlYi/0dJC4srfLiwgqVURRSjZNFUyIGTq+TIpedDa5UOGf0TN0 UjQANFq1ATI+ZAsOAK7UrqG7IYW9DT30nw8O34aR0daTjOKA6h2vezX3cjhQWvtZBMy6ic gudt5oLeziktB5AFc80UWwxIRgkV88Q= Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-484-r29z7rh0N1uL0nfaiV1c2Q-1; Fri, 04 Mar 2022 00:19:56 -0500 X-MC-Unique: r29z7rh0N1uL0nfaiV1c2Q-1 Received: by mail-pg1-f200.google.com with SMTP id g31-20020a63521f000000b003783582a261so3923368pgb.5 for ; Thu, 03 Mar 2022 21:19:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=PRicOcRvWzoxpLs51SSJ+aFkDsMZLS/bjPquNOlSlQc=; b=swfysf/hzH6y2mgdTPasDfG1YlUJGbnvjIT/apnye9iCg3pvhS/xLw9sTFG3JERKCg RV6TFTxWSmVQKDS835PxvGQf7cF9omcsqLXlM1Apw6Vosnrch0B3xVvm+eChGJJS8KdM x7TPzcj7rkWTGMDo6BdtoylRR6VnToRHuIpyzHve4fiVYZaP45XRFmhueNQKap3soH4D SVzrtMrNkLjnBdxo8EY/OgbmJzUjMYnZMIVjYRm4MufJAzAA2sZTZXp7yuBN2kZ7+m3Q wVQZJk1tkV9ws5b7+jHFzDBb2l7qi6vp47P3zXVYEVxyEKHR1/gSJ0ZJuqHEYOODWxfc qrhw== X-Gm-Message-State: AOAM533EGRnsQUV+U++w2M6FmY+SUJfSkaG6rb4VGn616XAIwdm2QIiT gWHjYJN/PFrnoI0v6BTHg2pRbdyz2Zj8JmgIQI3+ddNTVVzcmKMk0QuddZuYKKzEr6zFHO8WakI bGWSZW1cLy1Hd5pchtzSUZO5T X-Received: by 2002:a63:8bca:0:b0:37c:9049:103 with SMTP id j193-20020a638bca000000b0037c90490103mr2179463pge.387.1646371195375; Thu, 03 Mar 2022 21:19:55 -0800 (PST) X-Google-Smtp-Source: ABdhPJzMQjw3TZyj+/ynwRaeUsP3tPV3EyAhD/DLSy/Gjrm4jimUydTxbOQNlZECErcoEoI+7O+Amw== X-Received: by 2002:a63:8bca:0:b0:37c:9049:103 with SMTP id j193-20020a638bca000000b0037c90490103mr2179446pge.387.1646371195056; Thu, 03 Mar 2022 21:19:55 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.47 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:19:54 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . 
Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 19/23] mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
Date: Fri, 4 Mar 2022 13:17:04 +0800
Message-Id: <20220304051708.86193-20-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

When we're trying to collapse a 2M huge shmem page, don't retract the pgtable
pmd page if the vma is registered with uffd-wp, because that pgtable could
have pte markers installed. Recycling that pgtable means we'll lose the pte
markers. That could cause data loss for a uffd-wp enabled application on
shmem.

Instead of disabling khugepaged on these files, simply skip retracting these
special VMAs; the page cache can then still be merged into a huge thp, and
other mm/vma can still map the range of the file with a huge thp when proper.

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write,
which avoids races like:

         khugepaged                         user thread
         ==========                         ===========
     check VM_UFFD_WP, not set
                                  UFFDIO_REGISTER with uffd-wp on shmem
                                  wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!
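
The interleaving in the diagram can be replayed deterministically with a small user-space sketch. This is only a toy model under loud assumptions: `toy_vma`, `toy_retract()` and `toy_replay()` are hypothetical names, there is no real locking, and "holding mmap_sem for write" is modelled purely by the order in which the flag is sampled.

```c
#include <stdbool.h>

/* Hypothetical toy state for one shmem vma; not a kernel structure. */
struct toy_vma {
	bool uffd_wp;		/* stands in for VM_UFFD_WP */
	int markers;		/* pte markers installed in the pgtable */
	bool pgtable_present;
};

/* User thread: UFFDIO_REGISTER with uffd-wp, then wr-protect two pages. */
static void toy_user_register_and_protect(struct toy_vma *vma)
{
	vma->uffd_wp = true;
	vma->markers = 2;
}

/* khugepaged: free the pgtable unless the sampled flag forbids it. */
static void toy_retract(struct toy_vma *vma, bool uffd_wp_seen)
{
	if (uffd_wp_seen)
		return;
	vma->pgtable_present = false;
	vma->markers = 0;	/* markers vanish together with the pgtable */
}

/*
 * Replay the interleaving and return how many markers survive.  With
 * check_under_lock == false the flag is sampled before the user thread
 * runs (the racy order).  With true it is sampled only once the write
 * lock is notionally held, i.e. after the registration has completed.
 */
static int toy_replay(bool check_under_lock)
{
	struct toy_vma vma = { .uffd_wp = false, .markers = 0,
			       .pgtable_present = true };
	bool seen = vma.uffd_wp;		/* early, unlocked check */

	toy_user_register_and_protect(&vma);	/* user thread runs */

	if (check_under_lock)
		seen = vma.uffd_wp;		/* check with the lock held */

	toy_retract(&vma, seen);
	return vma.markers;
}
```

In this model `toy_replay(false)` loses both markers, while `toy_replay(true)` keeps them, which is the reason for doing the check only after the write lock is taken.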
Signed-off-by: Peter Xu
---
 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a4e5eaf3eb01..87d88d6725af 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1456,6 +1456,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+	if (userfaultfd_wp(vma))
+		return;
+
 	hpage = find_lock_page(vma->vm_file->f_mapping,
 			       linear_page_index(vma, haddr));
 	if (!hpage)
@@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			 * reverse order. Trylock is a way to avoid deadlock.
 			 */
 			if (mmap_write_trylock(mm)) {
-				if (!khugepaged_test_exit(mm))
+				/*
+				 * When a vma is registered with uffd-wp, we can't
+				 * recycle the pmd pgtable because there can be pte
+				 * markers installed. Skip it only, so the rest mm/vma
+				 * can still have the same file mapped hugely, however
+				 * it'll always mapped in small page size for uffd-wp
+				 * registered ranges.
+				 */
+				if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
 					collapse_and_free_pmd(mm, vma, addr, pmd);
 				mmap_write_unlock(mm);
 			} else {
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA530C433EF for ; Fri, 4 Mar 2022 05:20:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238166AbiCDFVL (ORCPT ); Fri, 4 Mar 2022 00:21:11 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238114AbiCDFUx (ORCPT ); Fri, 4 Mar 2022 00:20:53 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3D477182DBB for ; Thu, 3 Mar 2022 21:20:06 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371205; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VZsj8wGvJYP0Z35VYck5GXCqHSNO5FmtKWCwc6ZUvEE=; b=hLYXAFE7nhtMixtTlijhkixA6p8xjmUAClTwIJcrK/mfQcZQY3cdS3NvYM8Mqi0jH7+to5 HuMYovYgsKWFnnoUfaVi71FRz6lXseJ3lHy8X0Faj2HmCOUTueUIMr+uT5Ei/esj5o9v3r 8yIYpFgPt+5Rqq6ofX9sMrxQWBl9W9Q=
Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-594-9uWn_ZIGMByvYnbyOl2QTA-1; Fri, 04 Mar 2022 00:20:04 -0500
X-MC-Unique: 9uWn_ZIGMByvYnbyOl2QTA-1
Received: by mail-pg1-f197.google.com with SMTP id v8-20020a654608000000b0037d5ef9cfa0so212301pgq.8 for ; Thu, 03 Mar 2022 21:20:04
-0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VZsj8wGvJYP0Z35VYck5GXCqHSNO5FmtKWCwc6ZUvEE=; b=dNGtFQZzoLw73uVBae0mydS4Ctl/Zd0xRyKa2ixg11XlDg0LONYizvWbD/6sb/3/vB QHS/wPtK43VxNbc0yhznVl5hlnYiFQ+w3VjDhNT2qBCg2HCTLIdq2vnRs/dQhK48D2Pr csn2kIJb2yCwH1rgaFO3jxI/oRaeO7tIXXvjlN35p6iCohqniiYUyFojEKWd2JfiZYPx hmn9sC+GJd+Ba7j8pfmRU3UXmFRUfJzin80QWzIlAH12X388QNDAt3k5NuoN9UkNsxfA jfIaSow6xpVYtDgMSljSDPD6O+xlrr4MXm7olkktT2OQxGtd5KxH/6EWeV0Vwsse2K92 9YtA== X-Gm-Message-State: AOAM531+2fm3rO45bBLQBBQNKq5ryr+BvJNGtvjNgUjchqqcHF/FaAT1 e3psGNp3Tl1HZIMW1huRedckP9maagwuRHKhdnkDy6A4mNkDbbhbHfDRlSI2HV+rFKtsJwNNClF yrAtJjKvpveYs4FGNpPhjLmGq X-Received: by 2002:a17:902:6b47:b0:150:80de:5d49 with SMTP id g7-20020a1709026b4700b0015080de5d49mr34640514plt.77.1646371203263; Thu, 03 Mar 2022 21:20:03 -0800 (PST) X-Google-Smtp-Source: ABdhPJw05sOycQa3nYtvDYuzzS5+ZO7xhuLtA37+jl2qgXReyHL11C3e1uJ1D5StzvLh4AArL9JTNA== X-Received: by 2002:a17:902:6b47:b0:150:80de:5d49 with SMTP id g7-20020a1709026b4700b0015080de5d49mr34640498plt.77.1646371203014; Thu, 03 Mar 2022 21:20:03 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.19.55 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:20:02 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . 
Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 20/23] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
Date: Fri, 4 Mar 2022 13:17:05 +0800
Message-Id: <20220304051708.86193-21-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

This requires the pagemap code to be able to recognize the newly introduced
swap special pte for uffd-wp, as well as the general case for hugetlb that we
recently started to support. It should make pagemap uffd-wp support complete.

Signed-off-by: Peter Xu
---
 fs/proc/task_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2c48b1eaaa9c..46e5896b82d2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1421,6 +1421,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		migration = is_migration_entry(entry);
 		if (is_pfn_swap_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
+		if (pte_marker_entry_uffd_wp(entry))
+			flags |= PM_UFFD_WP;
 	}
 
 	if (page && !PageAnon(page))
@@ -1556,10 +1558,15 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
 
+		if (huge_pte_uffd_wp(pte))
+			flags |= PM_UFFD_WP;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
 				((addr & ~hmask) >> PAGE_SHIFT);
+	} else if (pte_swp_uffd_wp_any(pte)) {
+		flags |= PM_UFFD_WP;
 	}
 
 	for (; addr != end; addr += PAGE_SIZE) {
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id
17A81C433EF for ; Fri, 4 Mar 2022 05:21:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238220AbiCDFVv (ORCPT ); Fri, 4 Mar 2022 00:21:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238182AbiCDFVS (ORCPT ); Fri, 4 Mar 2022 00:21:18 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ECCCB18644A for ; Thu, 3 Mar 2022 21:20:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371215; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ffibrZjnGVszpXN3ddH8kcpciqEGO3vV2nAFK0S1oq0=; b=aiGa7zW99IewUe0q8nbC309+7s0v/wIMQc6DXtrI6vb56w73dgJAsVfcXlUkt5cgnYshhu G2vQ7QIfnQoMinQsCUVykbnMkxxIFww7Xt0/t3c1EroGQL6nPunN48QnnRgdWZQW6QSmWc qQYixIBjEtbuH7mP1nrjWtIIACccE1I= Received: from mail-pj1-f69.google.com (mail-pj1-f69.google.com [209.85.216.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-607-xTnW3XduOSOfMN1rSPyI9g-1; Fri, 04 Mar 2022 00:20:13 -0500 X-MC-Unique: xTnW3XduOSOfMN1rSPyI9g-1 Received: by mail-pj1-f69.google.com with SMTP id ds4-20020a17090b08c400b001bc1dee733eso4470207pjb.0 for ; Thu, 03 Mar 2022 21:20:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ffibrZjnGVszpXN3ddH8kcpciqEGO3vV2nAFK0S1oq0=; b=XIGCqMF8zCL/gbEP9oFim3dvyi/v/dF/jA2b4WqufOfVU0OOSEbdyY4oz9qKY9dizX JAmld5g3zQwl7iDiGAQEBujZ5DumWkUCIIqs3XSuN6c944i6OZITYfhN5sWQuGJZUcq+ 
MoALgkSkrp7zF5glFb/7o2DRRyqh/QimmvDjaITIsU9wMgtAd9Ozv6eD9VXPH5VJ+aJ0 ReruRpoD3nNwGnwopqjMSYwGmsSBZguKpiieBPUcFijJPrl4DsDcB0efaHpIR9xTKIdX 1M0gaeKVEuygBDZawUgxTwR62SwADBEKRVejhwnERK7Hl6JzB/4YSufyctwf3QQrNI94 rSqg== X-Gm-Message-State: AOAM533za9BbI6w2KpA7Ha/qYVnLEBvL+lZ4dym71KvLqjZw3FR1MXlr RFCGOx1zgvy423++SsdpTcCcxSNxgfTFpSnVaDCVRK4X7qTlM6pQDJObhdK4Z1PIoUDM9GRW1ZG OX8hriDoRvFg0QKyCQID4YDEH X-Received: by 2002:a17:902:8f83:b0:151:5c71:a6e6 with SMTP id z3-20020a1709028f8300b001515c71a6e6mr27671444plo.126.1646371211271; Thu, 03 Mar 2022 21:20:11 -0800 (PST) X-Google-Smtp-Source: ABdhPJxJfL2XlMGfvdIOmaVhgHly9k8v9pLNrjUxE51LcGrUcbGkXTtSAjOI3Or1kjLz33VXgzk0CA== X-Received: by 2002:a17:902:8f83:b0:151:5c71:a6e6 with SMTP id z3-20020a1709028f8300b001515c71a6e6mr27671420plo.126.1646371210948; Thu, 03 Mar 2022 21:20:10 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.20.03 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:20:10 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli Subject: [PATCH v7 21/23] mm/uffd: Enable write protection for shmem & hugetlbfs Date: Fri, 4 Mar 2022 13:17:06 +0800 Message-Id: <20220304051708.86193-22-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220304051708.86193-1-peterx@redhat.com> References: <20220304051708.86193-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" We've had all the necessary changes ready for both shmem and hugetlbfs. 
Turn on all the shmem/hugetlbfs switches for userfaultfd-wp.

We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too,
because all existing types now support write-protection mode.

Since vma_can_userfault() will be used elsewhere, move it into
userfaultfd_k.h.

Signed-off-by: Peter Xu
---
 fs/userfaultfd.c                 | 21 ++-------------------
 include/linux/userfaultfd_k.h    | 11 +++++++++++
 include/uapi/linux/userfaultfd.h | 10 ++++++++--
 mm/userfaultfd.c                 |  9 +++------
 4 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8b4a94f5a238..cd19083123fe 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1257,24 +1257,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags)
-{
-	/* FIXME: add WP support to hugetlbfs and shmem */
-	if (vm_flags & VM_UFFD_WP) {
-		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
-			return false;
-	}
-
-	if (vm_flags & VM_UFFD_MINOR) {
-		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
-			return false;
-	}
-
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
-}
-
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				unsigned long arg)
 {
@@ -1954,7 +1936,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	uffdio_api.features &=
+		~(UFFD_FEATURE_PAGEFAULT_FLAG_WP | UFFD_FEATURE_WP_HUGETLBFS_SHMEM);
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 827e38b7be65..69b174807d8f 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 /* The set of all possible UFFD-related VM flags. */
 #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -140,6 +141,16 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (vm_flags & VM_UFFD_MINOR)
+		return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
+}
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ef739054cb1c..7d32b1e797fb 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -33,7 +33,8 @@
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
-			   UFFD_FEATURE_EXACT_ADDRESS)
+			   UFFD_FEATURE_EXACT_ADDRESS |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -47,7 +48,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -194,6 +196,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
 	 * faults would be provided and the offset within the page would not be
 	 * masked.
+	 *
+	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -207,6 +212,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 441728732033..b70167a563f8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -730,15 +730,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 
 	err = -ENOENT;
 	dst_vma = find_dst_vma(dst_mm, start, len);
-	/*
-	 * Make sure the vma is not shared, that the dst range is
-	 * both valid and fully within a single existing vma.
-	 */
-	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+	if (!dst_vma)
 		goto out_unlock;
 	if (!userfaultfd_wp(dst_vma))
 		goto out_unlock;
-	if (!vma_is_anonymous(dst_vma))
+	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
 		goto out_unlock;
 
 	if (is_vm_hugetlb_page(dst_vma)) {
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFD35C433EF for ; Fri, 4 Mar 2022 05:20:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238192AbiCDFVb (ORCPT ); Fri, 4 Mar 2022 00:21:31 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238205AbiCDFVY (ORCPT ); Fri, 4 Mar 2022 00:21:24 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 32BE3186B81 for ; Thu, 3 Mar 2022 21:20:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
s=mimecast20190719; t=1646371221; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TsbV9aZwys4B5ATDrVcah3NNWPIloXpy6hGN6YFlyJ0=; b=aPEArK2oXQA8kC9M9f7b4TPpmYhgATepuaN4sO7wqDGRNLeQxNx+LbMX61sTLNvV3mn+0Y WWgzOxCE54MhRNl+Mk/LbhxHWKFdliN1lg1ImDwtYrBIC6BO548Lp+q1c1XP2kClwfr7Z4 58YarW/K/rmUzqx7UuABDOVYnDf6eEQ= Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-580-ur6LRQ8wPhi5cqmKyWRF3g-1; Fri, 04 Mar 2022 00:20:20 -0500 X-MC-Unique: ur6LRQ8wPhi5cqmKyWRF3g-1 Received: by mail-pg1-f198.google.com with SMTP id v32-20020a634660000000b0037c3f654c50so3272570pgk.6 for ; Thu, 03 Mar 2022 21:20:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TsbV9aZwys4B5ATDrVcah3NNWPIloXpy6hGN6YFlyJ0=; b=1SjgoDbHSXy6RH/MqiRiQQxNttbqQx6oHn1obuiuFDfy+nqxrOFn4hYaDsBRRloA19 5ZRN7tjS8xLLe+86WHYFfBFnJfxok0TxrOYuqnnqFsau/w7BrL58JYasMsYBWXlgp8eW IecyvmpB1STkftJ+5GpLCnO3jZJQZ9BhGP3ByVkNPoEvgiIo1+O91LdaZcO7qu5Ofy1D 7x3UNT6AO1taB2VgH2goTg9zpgEzr4KhVGINJfIZtIa93UnU4D9j2wdb8DsTIM8llpSR JvH+gcbGOU5+V7NPtiFLo7Benrj7AowN0DxN62ti8d+4ugyPrFGilnrgMYBN5cmze9zJ 9SgA== X-Gm-Message-State: AOAM532HPEMeTShUH7eGIQfeiHBGuQlqRZKv8bgIvtMfZwEdi44YkuXQ EsK2AI7Vf5oDZOEio0UxCGKA5sRf3t5vq8KRCNtM8BOSWO9fq0PH/RFdVKBVGjGyK1BEvQAx5jV PjgFtEn0EaGnAVZCDS+yni5IT X-Received: by 2002:a63:8648:0:b0:37c:8fc8:ae4 with SMTP id x69-20020a638648000000b0037c8fc80ae4mr2426557pgd.482.1646371219106; Thu, 03 Mar 2022 21:20:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJxRWTNjzqF9FDUsgNw/iTSctk6JZiPnQ2cg8VqL5h80ojfUPsMdu0+kgc5kk3/QOyRztfI6Hg== X-Received: by 
2002:a63:8648:0:b0:37c:8fc8:ae4 with SMTP id x69-20020a638648000000b0037c8fc80ae4mr2426547pgd.482.1646371218872; Thu, 03 Mar 2022 21:20:18 -0800 (PST)
Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.20.11 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:20:18 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 22/23] mm: Enable PTE markers by default
Date: Fri, 4 Mar 2022 13:17:07 +0800
Message-Id: <20220304051708.86193-23-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Enable PTE markers by default. On x86_64 this also auto-enables
PTE_MARKER_UFFD_WP.

Signed-off-by: Peter Xu
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index a80ea8721885..93e90efc4ab7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -901,7 +901,7 @@ config ANON_VMA_NAME
 	  difference in their name.
 
 config PTE_MARKER
-	def_bool n
+	def_bool y
 	bool "Marker PTEs support"
 
 	help
-- 
2.32.0

From nobody Tue Jun 23 13:01:38 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E383EC433F5 for ; Fri, 4 Mar 2022 05:21:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234052AbiCDFVq (ORCPT ); Fri, 4 Mar 2022 00:21:46 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238214AbiCDFV0 (ORCPT ); Fri, 4 Mar 2022 00:21:26 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0F200187BB7 for ; Thu, 3 Mar 2022 21:20:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646371229; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/LhV/Ezx1y/irXCjQ5LcN5y8nixtLOeiBAqYVpW0dWs=; b=TlxyA7LUg59805Uegl8WcuUbJLCiH/Or1fM9D8NAJoKlQuo+kzGAfmq/DsNElmFFPLfXjS M1K519NGNd5uKFQDia0E8G+luDhCd/9ucXP6pvozLqL4VtH3jSj5MnQnsgoowXn8fr3lzJ 01kY4zfT3hFnyz/5mTBrvmZrzmtp6/A=
Received: from mail-pg1-f199.google.com (mail-pg1-f199.google.com [209.85.215.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-583-G0i2FlNdNj672f5lREGhaw-1; Fri, 04 Mar 2022 00:20:27 -0500
X-MC-Unique: G0i2FlNdNj672f5lREGhaw-1
Received: by mail-pg1-f199.google.com with SMTP id j29-20020a634a5d000000b00376a7b5602dso3922084pgl.4 for ; Thu, 03 Mar 2022 21:20:27 -0800 (PST)
X-Google-DKIM-Signature: v=1;
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/LhV/Ezx1y/irXCjQ5LcN5y8nixtLOeiBAqYVpW0dWs=; b=khjS2lmqvlccCvSK6VNPW4hk5Y8noPFO8sl+gbWLy6LkfpACq9O9HJNZTt8t9T796q BoB+LGwvxVq+Uk/zTKVVHqX0BO2lzNP9S7v7TLpqyobSFggofJma525Zu1QGY3I/kMM+ m2K13Vp7o+TG78KelehdfJ5jfCci2NVoFwMn1gTkRc9U3Gpx03eFhtJOO+s7YKKzvlxE gQvFXeXDKbYNGrKyVX+JTUC5UyLFl/p0Yko42dvMc5Pg8by3c7JAoOPV01CdsZwd1WjG vQCfEFZew5Yy/FG1uPRJZ6cyjMwIYBKwmq+rPzhF9AXgaJc7ghvbOdkl+jbphpNx9/gp Tp3A== X-Gm-Message-State: AOAM532B1rHI5x2sIIS9OTrTOxc0/JfjAUXeznQLAeR54Vm2FNIiUvUg dixvoMSh+PMHJqPdHO01gMkwQd1LjFwLr/0l3lzWTKFx2Ht5A26qYsuFgwNyUjkc4ISHhieiHBx PfKmnwblNiopbGcVzh9Zs04lE X-Received: by 2002:a17:90b:17cb:b0:1bf:138d:e0f8 with SMTP id me11-20020a17090b17cb00b001bf138de0f8mr5251910pjb.157.1646371226899; Thu, 03 Mar 2022 21:20:26 -0800 (PST) X-Google-Smtp-Source: ABdhPJw1mSRuLkPo2ioYs6r74omrthJGnPJt2ZsaElav8kmbyVq3KwqPqDcyzWlHKXdEjHb9j9zBKw== X-Received: by 2002:a17:90b:17cb:b0:1bf:138d:e0f8 with SMTP id me11-20020a17090b17cb00b001bf138de0f8mr5251896pjb.157.1646371226610; Thu, 03 Mar 2022 21:20:26 -0800 (PST) Received: from localhost.localdomain ([94.177.118.59]) by smtp.gmail.com with ESMTPSA id p16-20020a056a000b5000b004f669806cd9sm4323865pfo.87.2022.03.03.21.20.19 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Thu, 03 Mar 2022 21:20:26 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: peterx@redhat.com, Nadav Amit , Hugh Dickins , David Hildenbrand , Axel Rasmussen , Matthew Wilcox , Alistair Popple , Mike Rapoport , Andrew Morton , Jerome Glisse , Mike Kravetz , "Kirill A . 
Shutemov" , Andrea Arcangeli
Subject: [PATCH v7 23/23] selftests/uffd: Enable uffd-wp for shmem/hugetlbfs
Date: Fri, 4 Mar 2022 13:17:08 +0800
Message-Id: <20220304051708.86193-24-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220304051708.86193-1-peterx@redhat.com>
References: <20220304051708.86193-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Now that shmem and hugetlbfs are supported, the uffd-wp test can always be
turned on.

Signed-off-by: Peter Xu
---
 tools/testing/selftests/vm/userfaultfd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index fe404398c65a..d91668df8135 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -82,7 +82,7 @@ static int test_type;
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
 
@@ -1597,8 +1597,6 @@ static void set_test_type(const char *type)
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
-		/* Only enable write-protect test for anonymous test */
-		test_uffdio_wp = true;
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
-- 
2.32.0
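
For readers who want to observe the effect of patch 20/23 from user space,
here is a minimal sketch (not part of the series) of reading a pagemap entry
and testing the uffd-wp bit. It assumes the bit layout described in the
kernel's pagemap documentation (bit 57 for uffd-wp, bit 63 for present). The
helper names pagemap_entry_uffd_wp(), pagemap_entry_present() and
pagemap_read_entry() are made up for illustration; they are not kernel or
libc APIs.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed bit positions, per Documentation/admin-guide/mm/pagemap.rst. */
#define PM_BIT_UFFD_WP	57	/* pte is uffd-wp write-protected */
#define PM_BIT_PRESENT	63	/* page is present in RAM */

/* Hypothetical helper: true if the entry carries the uffd-wp bit. */
static inline bool pagemap_entry_uffd_wp(uint64_t entry)
{
	return (entry >> PM_BIT_UFFD_WP) & 1;
}

/* Hypothetical helper: true if the entry describes a present page. */
static inline bool pagemap_entry_present(uint64_t entry)
{
	return (entry >> PM_BIT_PRESENT) & 1;
}

/*
 * Hypothetical helper: read the 64-bit pagemap entry covering addr in the
 * calling process. Returns 0 on success, -1 on any error.
 */
static inline int pagemap_read_entry(const void *addr, uint64_t *out)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	assert(out != NULL);
	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 8-byte entry per virtual page, indexed by page number. */
	offset = (off_t)(((uintptr_t)addr / (uintptr_t)page_size) *
			 sizeof(uint64_t));
	n = pread(fd, out, sizeof(*out), offset);
	close(fd);

	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

A caller would typically write-protect a range with UFFDIO_WRITEPROTECT
first and then call pagemap_read_entry() on an address inside it. With this
series applied, pagemap_entry_uffd_wp() is expected to report the bit for
shmem and hugetlbfs mappings too, including entries that are not present.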