From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 01/15] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING
Date: Fri, 29 May 2026 18:26:30 +0100
Message-ID: <20260529172716.357179-2-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

pte_protnone() and pmd_protnone() detect present-but-inaccessible page
table entries. This capability is useful beyond NUMA balancing -- for
example, userfaultfd working set tracking uses protnone PTEs to track
page access without unmapping pages.

Introduce CONFIG_ARCH_HAS_PTE_PROTNONE to decouple the protnone PTE
infrastructure from CONFIG_NUMA_BALANCING. The six architectures that
support protnone PTEs (x86_64, arm64, powerpc, s390, riscv, loongarch)
now select this option, and CONFIG_NUMA_BALANCING depends on it.

No functional change -- the same set of architectures continues to have
working protnone support, but the infrastructure is now available
independently of NUMA balancing.
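
For illustration only (not part of this patch), a minimal userspace
sketch of the present-but-inaccessible check and of a caller that gates
on the new capability rather than on NUMA balancing. All MODEL_* names,
the bit positions, and model_needs_protnone_handling() are hypothetical
stand-ins; kernel code would use
IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE) and the arch helpers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: the kernel gets these from Kconfig and arch headers. */
#define MODEL_ARCH_HAS_PTE_PROTNONE	1	/* models CONFIG_ARCH_HAS_PTE_PROTNONE=y */
#define MODEL_PAGE_PRESENT		(UINT64_C(1) << 0)	/* illustrative bit */
#define MODEL_PAGE_PROTNONE		(UINT64_C(1) << 8)	/* illustrative bit */

typedef struct { uint64_t val; } model_pte_t;

#if MODEL_ARCH_HAS_PTE_PROTNONE
/*
 * PROTNONE set while the hardware-present bit is clear; this mirrors the
 * shape of the x86 helper touched by the diff.
 */
static inline int model_pte_protnone(model_pte_t pte)
{
	return (pte.val & (MODEL_PAGE_PROTNONE | MODEL_PAGE_PRESENT)) ==
		MODEL_PAGE_PROTNONE;
}
#else
/* Without the capability, the stub always answers "no". */
static inline int model_pte_protnone(model_pte_t pte)
{
	(void)pte;
	return 0;
}
#endif

/*
 * Assumed caller pattern: only in an accessible VMA does a protnone entry
 * mean the kernel has to handle the fault itself.
 */
static inline bool model_needs_protnone_handling(model_pte_t pte,
						 bool vma_accessible)
{
	if (!MODEL_ARCH_HAS_PTE_PROTNONE)
		return false;
	return vma_accessible && model_pte_protnone(pte);
}

/* Small self-check that doubles as a usage example. */
static inline void model_protnone_selftest(void)
{
	model_pte_t tracked = { MODEL_PAGE_PROTNONE };
	model_pte_t mapped = { MODEL_PAGE_PRESENT };

	assert(model_pte_protnone(tracked));
	assert(!model_pte_protnone(mapped));
	assert(model_needs_protnone_handling(tracked, true));
}
```

With MODEL_ARCH_HAS_PTE_PROTNONE set to 0 the sketch falls back to the
always-zero stub, which is the behaviour the generic header keeps for
architectures that do not select the option.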

Signed-off-by: Kiryl Shutsemau (Meta)
Assisted-by: Claude:claude-opus-4-6
Acked-by: SeongJae Park
Acked-by: Mike Rapoport (Microsoft)
---
 arch/arm64/Kconfig                           |  1 +
 arch/arm64/include/asm/pgtable.h             |  7 ++---
 arch/loongarch/Kconfig                       |  1 +
 arch/loongarch/include/asm/pgtable.h         |  4 +--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  8 ++---
 arch/powerpc/platforms/Kconfig.cputype       |  1 +
 arch/riscv/Kconfig                           |  1 +
 arch/riscv/include/asm/pgtable.h             |  7 ++---
 arch/s390/Kconfig                            |  1 +
 arch/s390/include/asm/pgtable.h              |  4 +--
 arch/x86/Kconfig                             |  1 +
 arch/x86/include/asm/pgtable.h               |  8 ++---
 include/linux/pgtable.h                      | 32 ++++++++++++++------
 init/Kconfig                                 |  8 +++++
 mm/debug_vm_pgtable.c                        |  4 +--
 15 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..319470b3b1bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_SUPPORTS_CFI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4dfa42b7d053..873f4ea2e288 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -553,10 +553,7 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	/*
@@ -575,7 +572,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_present(pmd) pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 606597da46b8..c085f5067b3b 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -67,6 +67,7 @@ config LOONGARCH
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE if 64BIT
 	select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_RT
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 2a0b63ae421f..d295447a2763 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -619,7 +619,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline long pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & _PAGE_PROTNONE);
@@ -629,7 +629,7 @@ static inline long pmd_protnone(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _PAGE_PROTNONE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_leaf(pmd) ((pmd_val(pmd) & _PAGE_HUGE) != 0)
 #define pud_leaf(pud) ((pud_val(pud) & _PAGE_HUGE) != 0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index e67e64ac6e8c..53a0c5892548 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -490,13 +490,13 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
 		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_hw_valid(pte_t pte)
 {
@@ -1067,12 +1067,12 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #endif
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_write(pmd) pte_write(pmd_pte(pmd))
 
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bac02c83bb3e..36b64a24cf30 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -87,6 +87,7 @@ config PPC_BOOK3S_64
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	select ARCH_SUPPORTS_HUGETLBFS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c5754942cf85..e2c5776d18cf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,6 +71,7 @@ config RISCV
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
 	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
+	select ARCH_HAS_PTE_PROTNONE if MMU
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
 	select ARCH_SUPPORTS_SCHED_MC if SMP
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a1a7c6520a09..48a127323b21 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -524,10 +524,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 					PAGE_SIZE)
 #endif
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/asm-generic/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
@@ -537,7 +534,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 /* Modify page protection bits */
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ecbcbb781e40..bc5bef08454b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -151,6 +151,7 @@ config S390
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && CC_IS_CLANG
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c6cee8241e0..97241dea5573 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -842,7 +842,7 @@ static inline int pte_same(pte_t a, pte_t b)
 	return pte_val(a) == pte_val(b);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return pte_present(pte) && !(pte_val(pte) & _PAGE_READ);
@@ -853,7 +853,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	/* pmd_leaf(pmd) implies pmd_present(pmd) */
 	return pmd_leaf(pmd) && !(pmd_val(pmd) & _SEGMENT_ENTRY_READ);
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_swp_exclusive(pte_t pte)
 {
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..9da1119e8ff6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64
+	select ARCH_HAS_PTE_PROTNONE if X86_64
 	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI if X86_64
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2187e9cfcefa..c7f014cbf0a9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -985,11 +985,7 @@ static inline int pmd_present(pmd_t pmd)
 	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * These work without NUMA balancing but the kernel does not care. See the
- * comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
@@ -1001,7 +997,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	return (pmd_flags(pmd) & (_PAGE_PROTNONE | _PAGE_PRESENT))
 		== _PAGE_PROTNONE;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline int pmd_none(pmd_t pmd)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index cdd68ed3ae1a..b6516a11adfa 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -2052,18 +2052,26 @@ static inline int pud_trans_unstable(pud_t *pud)
 	return 0;
 }
 
-#ifndef CONFIG_NUMA_BALANCING
+#ifndef CONFIG_ARCH_HAS_PTE_PROTNONE
 /*
- * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is
- * perfectly valid to indicate "no" in that case, which is why our default
- * implementation defaults to "always no".
+ * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It
+ * is perfectly valid to indicate "no" in that case, which is why our
+ * default implementation defaults to "always no".
  *
- * In an accessible VMA, however, pte_protnone() reliably indicates PROT_NONE
- * page protection due to NUMA hinting. NUMA hinting faults only apply in
- * accessible VMAs.
+ * In an accessible VMA, pte_protnone() reliably indicates a present
+ * PROT_NONE page protection. Today the kernel uses such PTEs for two
+ * purposes: NUMA hinting faults, and userfaultfd RWP tracking on
+ * VM_UFFD_RWP VMAs. The two are distinguished by the uffd PTE bit and
+ * the VMA flag; see include/linux/userfaultfd_k.h.
  *
- * So, to reliably identify PROT_NONE PTEs that require a NUMA hinting fault,
- * looking at the VMA accessibility is sufficient.
+ * So, to reliably identify PROT_NONE PTEs that require kernel handling,
+ * looking at the VMA accessibility (and the uffd bit on RWP VMAs) is
+ * sufficient.
+ *
+ * Architectures without CONFIG_ARCH_HAS_PTE_PROTNONE get the always-zero
+ * stubs below; PAGE_NONE references that survive to runtime fire the
+ * BUILD_BUG() fallback, since callers should have folded such paths to
+ * dead code via IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE).
  */
 static inline int pte_protnone(pte_t pte)
 {
@@ -2074,7 +2082,11 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return 0;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+
+#ifndef PAGE_NONE
+#define PAGE_NONE ({ BUILD_BUG(); (pgprot_t){0}; })
+#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #endif /* CONFIG_MMU */
 
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308ae..58abb7f19206 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -944,6 +944,13 @@ config SCHED_PROXY_EXEC
 
 endmenu
 
+#
+# For architectures that support present-but-inaccessible (PROT_NONE) page
+# table entries detectable via pte_protnone() / pmd_protnone():
+#
+config ARCH_HAS_PTE_PROTNONE
+	bool
+
 #
 # For architectures that want to enable the support for NUMA-affine scheduler
 # balancing logic:
@@ -1010,6 +1017,7 @@ config ARCH_WANT_NUMA_VARIABLE_LOCALITY
 config NUMA_BALANCING
 	bool "Memory placement aware NUMA scheduler"
 	depends on ARCH_SUPPORTS_NUMA_BALANCING
+	depends on ARCH_HAS_PTE_PROTNONE
 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
 	depends on SMP && NUMA_MIGRATION && !PREEMPT_RT
 	help
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 23dc3ee09561..5e9f3a35f924 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -672,7 +672,7 @@ static void __init pte_protnone_tests(struct pgtable_debug_args *args)
 {
 	pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot_none);
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	pr_debug("Validating PTE protnone\n");
@@ -685,7 +685,7 @@ static void __init pmd_protnone_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	if (!has_transparent_hugepage())
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
Subject: [PATCH v6 02/15] mm: rename uffd-wp PTE bit macros to uffd
Date: Fri, 29 May 2026 18:26:31 +0100
Message-ID: <20260529172716.357179-3-kas@kernel.org>
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>

The uffd-wp PTE bit is about to gain a second consumer: userfaultfd RWP
will use the same bit to mark access-tracking PTEs, distinct from
mprotect(PROT_NONE) or NUMA-hinting PTEs. WP vs RWP semantics come from
the VMA flag; the bit is just "uffd has claimed this entry."

Drop the "_wp" suffix from the arch-private bit macros so they reflect
that.

  x86:   _PAGE_BIT_UFFD_WP -> _PAGE_BIT_UFFD
         _PAGE_UFFD_WP     -> _PAGE_UFFD
         _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD
  arm64: PTE_UFFD_WP       -> PTE_UFFD
         PTE_SWP_UFFD_WP   -> PTE_SWP_UFFD
  riscv: _PAGE_UFFD_WP     -> _PAGE_UFFD
         _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD

Pure mechanical rename -- no behavior change.
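
For illustration only (not part of this patch), a userspace sketch of
the idea that the bit merely says "uffd has claimed this entry" while
the VMA flag selects the semantics. Every MODEL_* name and value below
is hypothetical, and the sketch assumes a VMA carries at most one of
the two tracking flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model values, not the kernel's real encodings. */
#define MODEL_PAGE_UFFD		(UINT64_C(1) << 10)	/* entry claimed by uffd */
#define MODEL_VM_UFFD_WP	(UINT64_C(1) << 0)	/* write-protect tracking */
#define MODEL_VM_UFFD_RWP	(UINT64_C(1) << 1)	/* access (RWP) tracking */

enum model_uffd_kind {
	MODEL_UFFD_NONE,
	MODEL_UFFD_WP,
	MODEL_UFFD_RWP,
};

/*
 * The PTE bit alone cannot tell WP from RWP; the VMA flag decides.
 * Assumption: a set bit with neither VMA flag is treated as unclaimed.
 */
static inline enum model_uffd_kind model_uffd_classify(uint64_t pte_flags,
							uint64_t vma_flags)
{
	if (!(pte_flags & MODEL_PAGE_UFFD))
		return MODEL_UFFD_NONE;
	if (vma_flags & MODEL_VM_UFFD_RWP)
		return MODEL_UFFD_RWP;
	if (vma_flags & MODEL_VM_UFFD_WP)
		return MODEL_UFFD_WP;
	return MODEL_UFFD_NONE;
}

/* Small self-check that doubles as a usage example. */
static inline void model_uffd_selftest(void)
{
	assert(model_uffd_classify(MODEL_PAGE_UFFD, MODEL_VM_UFFD_WP) ==
	       MODEL_UFFD_WP);
	assert(model_uffd_classify(MODEL_PAGE_UFFD, MODEL_VM_UFFD_RWP) ==
	       MODEL_UFFD_RWP);
}
```

Because the same bit value feeds both branches, a name without the
"_wp" suffix describes it more accurately, which is the motivation for
the rename.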

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
---
 arch/arm64/include/asm/pgtable-prot.h |  8 ++++----
 arch/arm64/include/asm/pgtable.h      | 12 ++++++------
 arch/riscv/include/asm/pgtable-bits.h | 12 ++++++------
 arch/riscv/include/asm/pgtable.h      | 14 +++++++-------
 arch/x86/include/asm/pgtable.h        | 24 ++++++++++++------------
 arch/x86/include/asm/pgtable_types.h  | 16 ++++++++--------
 6 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 212ce1b02e15..09d7c00cf405 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -28,11 +28,11 @@
 #define PTE_PRESENT_VALID_KERNEL (PTE_VALID | PTE_MAYBE_NG)
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */
-#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */
+#define PTE_UFFD (_AT(pteval_t, 1) << 58) /* userfaultfd tracking */
+#define PTE_SWP_UFFD (_AT(pteval_t, 1) << 3) /* only for swp ptes */
 #else
-#define PTE_UFFD_WP (_AT(pteval_t, 0))
-#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0))
+#define PTE_UFFD (_AT(pteval_t, 0))
+#define PTE_SWP_UFFD (_AT(pteval_t, 0))
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
 #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 873f4ea2e288..3eecb2c17711 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -343,17 +343,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
-	return !!(pte_val(pte) & PTE_UFFD_WP);
+	return !!(pte_val(pte) & PTE_UFFD);
 }
 
 static inline pte_t pte_mkuffd_wp(pte_t pte)
 {
-	return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP)));
+	return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD)));
 }
 
 static inline pte_t pte_clear_uffd_wp(pte_t pte)
 {
-	return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP));
+	return clear_pte_bit(pte, __pgprot(PTE_UFFD));
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
@@ -539,17 +539,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
 {
-	return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+	return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD));
 }
 
 static inline int pte_swp_uffd_wp(pte_t pte)
 {
-	return !!(pte_val(pte) & PTE_SWP_UFFD_WP);
+	return !!(pte_val(pte) & PTE_SWP_UFFD);
 }
 
 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
-	return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP));
+	return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD));
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index b422d9691e60..d5a86b4df3ce 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -40,20 +40,20 @@
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 
-/* ext_svrsw60t59b: Bit(60) for uffd-wp tracking */
-#define _PAGE_UFFD_WP \
+/* ext_svrsw60t59b: Bit(60) for userfaultfd tracking */
+#define _PAGE_UFFD \
 	((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
 	 (1UL << 60) : 0)
 /*
  * Bit 4 is not involved into swap entry computation, so we
- * can borrow it for swap page uffd-wp tracking.
+ * can borrow it for swap page userfaultfd tracking.
  */
-#define _PAGE_SWP_UFFD_WP \
+#define _PAGE_SWP_UFFD \
 	((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
 	 _PAGE_USER : 0)
 #else
-#define _PAGE_UFFD_WP 0
-#define _PAGE_SWP_UFFD_WP 0
+#define _PAGE_UFFD 0
+#define _PAGE_SWP_UFFD 0
 #endif
 
 #define _PAGE_TABLE _PAGE_PRESENT
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 48a127323b21..ca69948b3ed8 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -405,32 +405,32 @@ static inline pte_t pte_wrprotect(pte_t pte)
 
 static inline bool pte_uffd_wp(pte_t pte)
 {
-	return !!(pte_val(pte) & _PAGE_UFFD_WP);
+	return !!(pte_val(pte) & _PAGE_UFFD);
 }
 
 static inline pte_t pte_mkuffd_wp(pte_t pte)
 {
-	return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD_WP));
+	return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD));
 }
 
 static inline pte_t pte_clear_uffd_wp(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_UFFD_WP));
+	return __pte(pte_val(pte) & ~(_PAGE_UFFD));
 }
 
 static inline bool pte_swp_uffd_wp(pte_t pte)
 {
-	return !!(pte_val(pte) & _PAGE_SWP_UFFD_WP);
+	return !!(pte_val(pte) & _PAGE_SWP_UFFD);
 }
 
 static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_SWP_UFFD_WP);
+	return __pte(pte_val(pte) | _PAGE_SWP_UFFD);
 }
 
 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD_WP));
+	return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD));
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
@@ -1157,7 +1157,7 @@ static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
  * bit 0: _PAGE_PRESENT (zero)
  * bit 1 to 2: (zero)
  * bit 3: _PAGE_SWP_SOFT_DIRTY
- * bit 4: _PAGE_SWP_UFFD_WP
+ * bit 4: _PAGE_SWP_UFFD
  * bit 5: _PAGE_PROT_NONE (zero)
  * bit 6: exclusive marker
  * bits 7 to 11: swap type
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c7f014cbf0a9..038c806b50a2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -413,17 +413,17 @@ static inline pte_t pte_wrprotect(pte_t pte)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_UFFD_WP;
+	return pte_flags(pte) & _PAGE_UFFD;
 }
 
 static inline pte_t pte_mkuffd_wp(pte_t pte)
 {
-	return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD_WP));
+	return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD));
 }
 
 static inline pte_t pte_clear_uffd_wp(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_UFFD_WP);
+	return pte_clear_flags(pte, _PAGE_UFFD);
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
@@ -528,17 +528,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pmd_uffd_wp(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_UFFD_WP;
+	return pmd_flags(pmd) & _PAGE_UFFD;
 }
 
 static inline pmd_t pmd_mkuffd_wp(pmd_t pmd)
 {
-	return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD_WP));
+	return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD));
 }
 
 static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_UFFD_WP);
+	return pmd_clear_flags(pmd, _PAGE_UFFD);
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
@@ -1550,32 +1550,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
+	return pte_set_flags(pte, _PAGE_SWP_UFFD);
 }
 
 static inline int pte_swp_uffd_wp(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_SWP_UFFD_WP;
+	return pte_flags(pte) & _PAGE_SWP_UFFD;
 }
 
 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP);
+	return pte_clear_flags(pte, _PAGE_SWP_UFFD);
 }
 
 static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_SWP_UFFD_WP);
+	return pmd_set_flags(pmd, _PAGE_SWP_UFFD);
 }
 
 static inline int pmd_swp_uffd_wp(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_SWP_UFFD_WP;
+	return pmd_flags(pmd) & _PAGE_SWP_UFFD;
 }
 
 static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP);
+	return pmd_clear_flags(pmd, _PAGE_SWP_UFFD);
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 2ec250ba467e..af08d98be930 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -31,7 +31,7 @@
 
 #define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1
 #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1
-#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */
+#define _PAGE_BIT_UFFD _PAGE_BIT_SOFTW2 /* userfaultfd tracking */
 #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted to large */
 
@@ -39,7 +39,7 @@
 #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */
 #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD) */
 #else
-/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */
+/* Shared with _PAGE_BIT_UFFD which is not supported on 32 bit */
 #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */
 #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD) */
 #endif
@@ -111,11 +111,11 @@
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-#define _PAGE_UFFD_WP (_AT(pteval_t, 1) << _PAGE_BIT_UFFD_WP)
-#define _PAGE_SWP_UFFD_WP _PAGE_USER
+#define _PAGE_UFFD (_AT(pteval_t, 1) << _PAGE_BIT_UFFD)
+#define _PAGE_SWP_UFFD _PAGE_USER
 #else
-#define _PAGE_UFFD_WP (_AT(pteval_t, 0))
-#define _PAGE_SWP_UFFD_WP (_AT(pteval_t, 0))
+#define _PAGE_UFFD (_AT(pteval_t, 0))
+#define _PAGE_SWP_UFFD (_AT(pteval_t, 0))
 #endif
 
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
@@ -129,7 +129,7 @@
 /*
  * The hardware requires shadow stack to be Write=0,Dirty=1. However,
  * there are valid cases where the kernel might create read-only PTEs that
- * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). In
+ * are dirty (e.g., fork(), mprotect(), userfaultfd, soft-dirty tracking). In
  * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bit,
  * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have
  * (Write=0,SavedDirty=1,Dirty=0) set.
@@ -151,7 +151,7 @@
 #define _COMMON_PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
 	_PAGE_SPECIAL | _PAGE_ACCESSED | \
 	_PAGE_DIRTY_BITS | _PAGE_SOFT_DIRTY | \
-	_PAGE_CC | _PAGE_UFFD_WP)
+	_PAGE_CC | _PAGE_UFFD)
 #define _PAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PAT)
 #define _HPAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_PAT_LARGE)
 
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
Subject: [PATCH v6 03/15] mm: rename uffd-wp PTE accessors to uffd
Date: Fri, 29 May 2026 18:26:32 +0100
Message-ID: <20260529172716.357179-4-kas@kernel.org>
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>

Userfaultfd RWP will reuse the uffd-wp PTE bit to mark access-tracking
PTEs, alongside the write-protected ones it already marks. The bit's
meaning now depends on the VMA flag (WP or RWP), not on its name.

Rename the kernel-internal names that describe the bit:

  - pte/pmd/huge_pte accessors (and swap variants)
  - pgtable_supports_uffd() capability query
  - SCAN_PTE_UFFD khugepaged enum

The ftrace string emitted by mm_khugepaged_scan_pmd for this enum is
kept as "pte_uffd_wp" so existing trace-based tooling keeps matching.

Pure mechanical rename -- no behavior change.
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: SeongJae Park --- arch/arm64/include/asm/pgtable.h | 28 +++++++-------- arch/riscv/include/asm/pgtable.h | 38 ++++++++++---------- arch/s390/include/asm/hugetlb.h | 12 +++---- arch/x86/include/asm/pgtable.h | 24 ++++++------- fs/proc/task_mmu.c | 44 +++++++++++------------ include/asm-generic/hugetlb.h | 18 +++++----- include/asm-generic/pgtable_uffd.h | 32 ++++++++--------- include/linux/leafops.h | 4 +-- include/linux/mm_inline.h | 4 +-- include/linux/swapops.h | 4 +-- include/linux/userfaultfd_k.h | 14 ++++---- include/trace/events/huge_memory.h | 2 +- mm/huge_memory.c | 56 +++++++++++++++--------------- mm/hugetlb.c | 46 ++++++++++++------------ mm/internal.h | 4 +-- mm/khugepaged.c | 22 ++++++------ mm/memory.c | 34 +++++++++--------- mm/migrate.c | 12 +++---- mm/migrate_device.c | 8 ++--- mm/mprotect.c | 12 +++---- mm/mremap.c | 4 +-- mm/page_table_check.c | 8 ++--- mm/rmap.c | 16 ++++----- mm/swapfile.c | 4 +-- mm/userfaultfd.c | 6 ++-- 25 files changed, 228 insertions(+), 228 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 3eecb2c17711..c41e4d59dc9f 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -341,17 +341,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pte_uffd_wp(pte_t pte) +static inline int pte_uffd(pte_t pte) { return !!(pte_val(pte) & PTE_UFFD); } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD))); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_UFFD)); } @@ -537,17 +537,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP 
-static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & PTE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } @@ -590,13 +590,13 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_mkvalid_k(pmd) pte_pmd(pte_mkvalid_k(pmd_pte(pmd))) #define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd))) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd)) -#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))) -#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))) -#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd)) -#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))) -#define pmd_swp_clear_uffd_wp(pmd) \ - pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))) +#define pmd_uffd(pmd) pte_uffd(pmd_pte(pmd)) +#define pmd_mkuffd(pmd) pte_pmd(pte_mkuffd(pmd_pte(pmd))) +#define pmd_clear_uffd(pmd) pte_pmd(pte_clear_uffd(pmd_pte(pmd))) +#define pmd_swp_uffd(pmd) pte_swp_uffd(pmd_pte(pmd)) +#define pmd_swp_mkuffd(pmd) pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))) +#define pmd_swp_clear_uffd(pmd) \ + pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #define pmd_write(pmd) pte_write(pmd_pte(pmd)) @@ -1512,7 +1512,7 @@ static inline pmd_t pmdp_establish(struct vm_area_str= uct *vma, * Encode and decode a swap entry: * bits 0-1: present (must be zero) * bits 2: remember PG_anon_exclusive - * bit 3: remember uffd-wp state + * bit 3: remember uffd state * bits 6-10: swap type * bit 11: PTE_PRESENT_INVALID (must be zero) * bits 12-61: swap offset diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 
ca69948b3ed8..b111e134795e 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -400,35 +400,35 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pgtable_supports_uffd_wp() \ +#define pgtable_supports_uffd() \ riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B) =20 -static inline bool pte_uffd_wp(pte_t pte) +static inline bool pte_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_UFFD); } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_UFFD)); } =20 -static inline bool pte_swp_uffd_wp(pte_t pte) +static inline bool pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return __pte(pte_val(pte) | _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD)); } @@ -886,34 +886,34 @@ static inline pud_t pud_mkspecial(pud_t pud) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline bool pmd_uffd_wp(pmd_t pmd) +static inline bool pmd_uffd(pmd_t pmd) { - return pte_uffd_wp(pmd_pte(pmd)); + return pte_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { - return pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_clear_uffd(pmd_pte(pmd))); } =20 -static inline bool pmd_swp_uffd_wp(pmd_t pmd) +static inline bool pmd_swp_uffd(pmd_t pmd) { - 
return pte_swp_uffd_wp(pmd_pte(pmd)); + return pte_swp_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { - return pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 6983e52eaf81..cf8a176ff3d8 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -77,20 +77,20 @@ static inline void huge_ptep_set_wrprotect(struct mm_st= ruct *mm, __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte)); } =20 -#define __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { return 0; } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 038c806b50a2..d14c84b2a332 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -411,17 +411,17 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pte_uffd_wp(pte_t pte) +static inline int pte_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_UFFD; } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { 
return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_UFFD); } @@ -526,17 +526,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pmd_uffd_wp(pmd_t pmd) +static inline int pmd_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_UFFD; } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD)); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_UFFD); } @@ -1548,32 +1548,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t = pmd) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return pte_set_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_SWP_UFFD; } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_SWP_UFFD); } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_SWP_UFFD; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_SWP_UFFD); } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 06fb94a965ff..939657aa334a 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2035,14 +2035,14 @@ static pagemap_entry_t pte_to_pagemap_entry(struct = pagemapread *pm, page =3D 
vm_normal_page(vma, addr, pte); if (pte_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_uffd_wp(pte)) + if (pte_uffd(pte)) flags |=3D PM_UFFD_WP; } else { softleaf_t entry; =20 if (pte_swp_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_swp_uffd_wp(pte)) + if (pte_swp_uffd(pte)) flags |=3D PM_UFFD_WP; entry =3D softleaf_from_pte(pte); if (pm->show_pfn) { @@ -2108,7 +2108,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_PRESENT; if (pmd_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_uffd_wp(pmd)) + if (pmd_uffd(pmd)) flags |=3D PM_UFFD_WP; if (pm->show_pfn) frame =3D pmd_pfn(pmd) + idx; @@ -2127,7 +2127,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_swp_uffd_wp(pmd)) + if (pmd_swp_uffd(pmd)) flags |=3D PM_UFFD_WP; VM_WARN_ON_ONCE(!pmd_is_migration_entry(pmd)); page =3D softleaf_to_page(entry); @@ -2233,14 +2233,14 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsig= ned long hmask, !hugetlb_pmd_shared(ptep)) flags |=3D PM_MMAP_EXCLUSIVE; =20 - if (huge_pte_uffd_wp(pte)) + if (huge_pte_uffd(pte)) flags |=3D PM_UFFD_WP; =20 flags |=3D PM_PRESENT; if (pm->show_pfn) frame =3D pte_pfn(pte) + ((addr & ~hmask) >> PAGE_SHIFT); - } else if (pte_swp_uffd_wp_any(pte)) { + } else if (pte_swp_uffd_any(pte)) { flags |=3D PM_UFFD_WP; } =20 @@ -2441,7 +2441,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd_wp(pte)) + if (!pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2459,7 +2459,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D PAGE_IS_WRITTEN; =20 entry =3D softleaf_from_pte(pte); @@ -2484,13 +2484,13 @@ static void 
make_uffd_wp_pte(struct vm_area_struct = *vma, pte_t old_pte; =20 old_pte =3D ptep_modify_prot_start(vma, addr, pte); - ptent =3D pte_mkuffd_wp(old_pte); + ptent =3D pte_mkuffd(old_pte); ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); } else if (pte_none(ptent)) { set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); } else { - ptent =3D pte_swp_mkuffd_wp(ptent); + ptent =3D pte_swp_mkuffd(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); } } @@ -2509,7 +2509,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, struct page *page; =20 categories |=3D PAGE_IS_PRESENT; - if (!pmd_uffd_wp(pmd)) + if (!pmd_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2524,7 +2524,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, categories |=3D PAGE_IS_SOFT_DIRTY; } else { categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd_wp(pmd)) + if (!pmd_swp_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; if (pmd_swp_soft_dirty(pmd)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2548,10 +2548,10 @@ static void make_uffd_wp_pmd(struct vm_area_struct = *vma, =20 if (pmd_present(pmd)) { old =3D pmdp_invalidate_ad(vma, addr, pmdp); - pmd =3D pmd_mkuffd_wp(old); + pmd =3D pmd_mkuffd(old); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } else if (pmd_is_migration_entry(pmd)) { - pmd =3D pmd_swp_mkuffd_wp(pmd); + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } } @@ -2573,7 +2573,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) if (pte_present(pte)) { categories |=3D PAGE_IS_PRESENT; =20 - if (!huge_pte_uffd_wp(pte)) + if (!huge_pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; if (!PageAnon(pte_page(pte))) categories |=3D PAGE_IS_FILE; @@ -2584,7 +2584,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) } else { categories |=3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D 
PAGE_IS_WRITTEN; if (pte_swp_soft_dirty(pte)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2612,12 +2612,12 @@ static void make_uffd_wp_huge_pte(struct vm_area_st= ruct *vma, =20 if (softleaf_is_migration(entry)) { set_huge_pte_at(vma->vm_mm, addr, ptep, - pte_swp_mkuffd_wp(ptent), psize); + pte_swp_mkuffd(ptent), psize); } else { pte_t old_pte, new_pte; =20 old_pte =3D huge_ptep_modify_prot_start(vma, addr, ptep); - new_pte =3D huge_pte_mkuffd_wp(old_pte); + new_pte =3D huge_pte_mkuffd(old_pte); huge_ptep_modify_prot_commit(vma, addr, ptep, old_pte, new_pte); } } @@ -2850,8 +2850,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, for (addr =3D start; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; make_uffd_wp_pte(vma, addr, pte, ptent); if (!flush_end) @@ -2868,8 +2868,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, unsigned long next =3D addr + PAGE_SIZE; pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; ret =3D pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN, p, addr, &next); diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index e1a2e1b7c8e7..635c41cc3479 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -37,24 +37,24 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t= newprot) return pte_modify(pte, newprot); } =20 -#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { - return huge_pte_wrprotect(pte_mkuffd_wp(pte)); + return huge_pte_wrprotect(pte_mkuffd(pte)); } #endif =20 
-#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { - return pte_clear_uffd_wp(pte); + return pte_clear_uffd(pte); } #endif =20 -#ifndef __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { - return pte_uffd_wp(pte); + return pte_uffd(pte); } #endif =20 diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtab= le_uffd.h index 0d85791efdf7..30e88fc1de2f 100644 --- a/include/asm-generic/pgtable_uffd.h +++ b/include/asm-generic/pgtable_uffd.h @@ -2,79 +2,79 @@ #define _ASM_GENERIC_PGTABLE_UFFD_H =20 /* - * Some platforms can customize the uffd-wp bit, making it unavailable + * Some platforms can customize the uffd PTE bit, making it unavailable * even if the architecture provides the resource. * Adding this API allows architectures to add their own checks for the * devices on which the kernel is running. * Note: When overriding it, please make sure the * CONFIG_HAVE_ARCH_USERFAULTFD_WP is part of this macro. 
*/ -#ifndef pgtable_supports_uffd_wp -#define pgtable_supports_uffd_wp() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD= _WP) +#ifndef pgtable_supports_uffd +#define pgtable_supports_uffd() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) #endif =20 static inline bool uffd_supports_wp_marker(void) { - return pgtable_supports_uffd_wp() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP= ); + return pgtable_supports_uffd() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP); } =20 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static __always_inline int pte_uffd_wp(pte_t pte) +static __always_inline int pte_uffd(pte_t pte) { return 0; } =20 -static __always_inline int pmd_uffd_wp(pmd_t pmd) +static __always_inline int pmd_uffd(pmd_t pmd) { return 0; } =20 -static __always_inline pte_t pte_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_clear_uffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline int pte_swp_uffd_wp(pte_t pte) +static __always_inline int pte_swp_uffd(pte_t pte) { return 0; } =20 -static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte; } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd; } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return 0; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd; } 
diff --git a/include/linux/leafops.h b/include/linux/leafops.h index 992cd8bd8ed0..2ce2f37ac883 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -100,8 +100,8 @@ static inline softleaf_t softleaf_from_pmd(pmd_t pmd) =20 if (pmd_swp_soft_dirty(pmd)) pmd =3D pmd_swp_clear_soft_dirty(pmd); - if (pmd_swp_uffd_wp(pmd)) - pmd =3D pmd_swp_clear_uffd_wp(pmd); + if (pmd_swp_uffd(pmd)) + pmd =3D pmd_swp_clear_uffd(pmd); arch_entry =3D __pmd_to_swp_entry(pmd); =20 /* Temporary until swp_entry_t eliminated. */ diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f0..2811caf4188d 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -600,14 +600,14 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *= vma, unsigned long addr, return false; =20 /* A uffd-wp wr-protected normal pte */ - if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval))) + if (unlikely(pte_present(pteval) && pte_uffd(pteval))) arm_uffd_pte =3D true; =20 /* * A uffd-wp wr-protected swap pte. Note: this should even cover an * existing pte marker with uffd-wp bit set. 
*/ - if (unlikely(pte_swp_uffd_wp_any(pteval))) + if (unlikely(pte_swp_uffd_any(pteval))) arm_uffd_pte =3D true; =20 if (unlikely(arm_uffd_pte)) { diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 8cfc966eae48..15c6440e38dd 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -73,8 +73,8 @@ static inline pte_t pte_swp_clear_flags(pte_t pte) pte =3D pte_swp_clear_exclusive(pte); if (pte_swp_soft_dirty(pte)) pte =3D pte_swp_clear_soft_dirty(pte); - if (pte_swp_uffd_wp(pte)) - pte =3D pte_swp_clear_uffd_wp(pte); + if (pte_swp_uffd(pte)) + pte =3D pte_swp_clear_uffd(pte); return pte; } =20 diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 68edac4dcd78..658740df2978 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -211,13 +211,13 @@ static inline bool userfaultfd_minor(struct vm_area_s= truct *vma) static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { - return userfaultfd_wp(vma) && pte_uffd_wp(pte); + return userfaultfd_wp(vma) && pte_uffd(pte); } =20 static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma, pmd_t pmd) { - return userfaultfd_wp(vma) && pmd_uffd_wp(pmd); + return userfaultfd_wp(vma) && pmd_uffd(pmd); } =20 static inline bool userfaultfd_armed(struct vm_area_struct *vma) @@ -272,10 +272,10 @@ static inline bool userfaultfd_wp_use_markers(struct = vm_area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. 
*/ -static inline bool pte_swp_uffd_wp_any(pte_t pte) +static inline bool pte_swp_uffd_any(pte_t pte) { if (!uffd_supports_wp_marker()) return false; @@ -283,7 +283,7 @@ static inline bool pte_swp_uffd_wp_any(pte_t pte) if (pte_present(pte)) return false; =20 - if (pte_swp_uffd_wp(pte)) + if (pte_swp_uffd(pte)) return true; =20 if (pte_is_uffd_wp_marker(pte)) @@ -424,7 +424,7 @@ static inline bool userfaultfd_wp_use_markers(struct vm= _area_struct *vma) * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er * forms (pte marker or a normal swap pte), false otherwise. */ -static inline bool pte_swp_uffd_wp_any(pte_t pte) +static inline bool pte_swp_uffd_any(pte_t pte) { return false; } diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge= _memory.h index 291fae364c62..5a48c5406cce 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -16,7 +16,7 @@ EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \ EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \ - EM( SCAN_PTE_UFFD_WP, "pte_uffd_wp") \ + EM( SCAN_PTE_UFFD, "pte_uffd_wp") \ EM( SCAN_PTE_MAPPED_HUGEPAGE, "pte_mapped_hugepage") \ EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \ EM( SCAN_PAGE_NULL, "page_null") \ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b7c895b1d366..d43c2255f47d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1909,8 +1909,8 @@ static void copy_huge_non_present_pmd( pmd =3D swp_entry_to_pmd(entry); if (pmd_swp_soft_dirty(*src_pmd)) pmd =3D pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - pmd =3D pmd_swp_mkuffd_wp(pmd); + if (pmd_swp_uffd(*src_pmd)) + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(src_mm, addr, src_pmd, pmd); } else if (softleaf_is_device_private(entry)) { /* @@ -1923,8 +1923,8 @@ static void copy_huge_non_present_pmd( =20 if (pmd_swp_soft_dirty(*src_pmd)) pmd =3D pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - 
pmd =3D pmd_swp_mkuffd_wp(pmd); + if (pmd_swp_uffd(*src_pmd)) + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(src_mm, addr, src_pmd, pmd); } =20 @@ -1944,7 +1944,7 @@ static void copy_huge_non_present_pmd( mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); if (!userfaultfd_wp(dst_vma)) - pmd =3D pmd_swp_clear_uffd_wp(pmd); + pmd =3D pmd_swp_clear_uffd(pmd); set_pmd_at(dst_mm, addr, dst_pmd, pmd); } =20 @@ -2040,7 +2040,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); pmdp_set_wrprotect(src_mm, addr, src_pmd); if (!userfaultfd_wp(dst_vma)) - pmd =3D pmd_clear_uffd_wp(pmd); + pmd =3D pmd_clear_uffd(pmd); pmd =3D pmd_wrprotect(pmd); set_pmd: pmd =3D pmd_mkold(pmd); @@ -2581,9 +2581,9 @@ static pmd_t clear_uffd_wp_pmd(pmd_t pmd) if (pmd_none(pmd)) return pmd; if (pmd_present(pmd)) - pmd =3D pmd_clear_uffd_wp(pmd); + pmd =3D pmd_clear_uffd(pmd); else - pmd =3D pmd_swp_clear_uffd_wp(pmd); + pmd =3D pmd_swp_clear_uffd(pmd); =20 return pmd; } @@ -2663,16 +2663,16 @@ static void change_non_present_huge_pmd(struct mm_s= truct *mm, } else if (softleaf_is_device_private_write(entry)) { entry =3D make_readable_device_private_entry(swp_offset(entry)); newpmd =3D swp_entry_to_pmd(entry); - if (pmd_swp_uffd_wp(*pmd)) - newpmd =3D pmd_swp_mkuffd_wp(newpmd); + if (pmd_swp_uffd(*pmd)) + newpmd =3D pmd_swp_mkuffd(newpmd); } else { newpmd =3D *pmd; } =20 if (uffd_wp) - newpmd =3D pmd_swp_mkuffd_wp(newpmd); + newpmd =3D pmd_swp_mkuffd(newpmd); else if (uffd_wp_resolve) - newpmd =3D pmd_swp_clear_uffd_wp(newpmd); + newpmd =3D pmd_swp_clear_uffd(newpmd); if (!pmd_same(*pmd, newpmd)) set_pmd_at(mm, addr, pmd, newpmd); } @@ -2753,14 +2753,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct = vm_area_struct *vma, =20 entry =3D pmd_modify(oldpmd, newprot); if (uffd_wp) - entry =3D pmd_mkuffd_wp(entry); + entry =3D pmd_mkuffd(entry); else if (uffd_wp_resolve) /* * Leave the write bit to 
be handled by PF interrupt * handler, then things like COW could be properly * handled. */ - entry =3D pmd_clear_uffd_wp(entry); + entry =3D pmd_clear_uffd(entry); =20 /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && @@ -3103,8 +3103,8 @@ static void __split_huge_zero_page_pmd(struct vm_area= _struct *vma, =20 entry =3D pfn_pte(zero_pfn(addr), vma->vm_page_prot); entry =3D pte_mkspecial(entry); - if (pmd_uffd_wp(old_pmd)) - entry =3D pte_mkuffd_wp(entry); + if (pmd_uffd(old_pmd)) + entry =3D pte_mkuffd(entry); VM_BUG_ON(!pte_none(ptep_get(pte))); set_pte_at(mm, addr, pte, entry); pte++; @@ -3188,7 +3188,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, folio =3D page_folio(page); =20 soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + uffd_wp =3D pmd_swp_uffd(old_pmd); =20 write =3D softleaf_is_migration_write(entry); if (PageAnon(page)) @@ -3204,7 +3204,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, folio =3D page_folio(page); =20 soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + uffd_wp =3D pmd_swp_uffd(old_pmd); =20 write =3D softleaf_is_device_private_write(entry); anon_exclusive =3D PageAnonExclusive(page); @@ -3261,7 +3261,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, write =3D pmd_write(old_pmd); young =3D pmd_young(old_pmd); soft_dirty =3D pmd_soft_dirty(old_pmd); - uffd_wp =3D pmd_uffd_wp(old_pmd); + uffd_wp =3D pmd_uffd(old_pmd); =20 VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); @@ -3332,7 +3332,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_swp_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + entry =3D pte_swp_mkuffd(entry); VM_WARN_ON(!pte_none(ptep_get(pte + i))); set_pte_at(mm, addr, pte + 
i, entry); } @@ -3359,7 +3359,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_swp_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + entry =3D pte_swp_mkuffd(entry); VM_WARN_ON(!pte_none(ptep_get(pte + i))); set_pte_at(mm, addr, pte + i, entry); } @@ -3377,7 +3377,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); =20 for (i =3D 0; i < HPAGE_PMD_NR; i++) VM_WARN_ON(!pte_none(ptep_get(pte + i))); @@ -5018,8 +5018,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_wa= lk *pvmw, pmdswp =3D swp_entry_to_pmd(entry); if (pmd_soft_dirty(pmdval)) pmdswp =3D pmd_swp_mksoft_dirty(pmdswp); - if (pmd_uffd_wp(pmdval)) - pmdswp =3D pmd_swp_mkuffd_wp(pmdswp); + if (pmd_uffd(pmdval)) + pmdswp =3D pmd_swp_mkuffd(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); folio_remove_rmap_pmd(folio, page, vma); folio_put(folio); @@ -5049,8 +5049,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk= *pvmw, struct page *new) pmde =3D pmd_mksoft_dirty(pmde); if (softleaf_is_migration_write(entry)) pmde =3D pmd_mkwrite(pmde, vma); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde =3D pmd_mkuffd_wp(pmde); + if (pmd_swp_uffd(*pvmw->pmd)) + pmde =3D pmd_mkuffd(pmde); if (!softleaf_is_migration_young(entry)) pmde =3D pmd_mkold(pmde); /* NOTE: this may contain setting soft-dirty on some archs */ @@ -5070,8 +5070,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk= *pvmw, struct page *new) =20 if (pmd_swp_soft_dirty(*pvmw->pmd)) pmde =3D pmd_swp_mksoft_dirty(pmde); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde =3D pmd_swp_mkuffd_wp(pmde); + if (pmd_swp_uffd(*pvmw->pmd)) + pmde =3D pmd_swp_mkuffd(pmde); } =20 if (folio_test_anon(folio)) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 571212b80835..d0c81a056ae2 100644 --- a/mm/hugetlb.c +++ 
b/mm/hugetlb.c @@ -4843,8 +4843,8 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte= _t *ptep, unsigned long add =20 __folio_mark_uptodate(new_folio); hugetlb_add_new_anon_rmap(new_folio, vma, addr); - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old)) - newpte =3D huge_pte_mkuffd_wp(newpte); + if (userfaultfd_wp(vma) && huge_pte_uffd(old)) + newpte =3D huge_pte_mkuffd(newpte); set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz); hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); folio_set_hugetlb_migratable(new_folio); @@ -4918,10 +4918,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, softleaf =3D softleaf_from_pte(entry); if (unlikely(softleaf_is_hwpoison(softleaf))) { if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(softleaf_is_migration(softleaf))) { - bool uffd_wp =3D pte_swp_uffd_wp(entry); + bool uffd =3D pte_swp_uffd(entry); =20 if (!softleaf_is_migration_read(softleaf) && cow) { /* @@ -4931,12 +4931,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, softleaf =3D make_readable_migration_entry( swp_offset(softleaf)); entry =3D swp_entry_to_pte(softleaf); - if (userfaultfd_wp(src_vma) && uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + if (userfaultfd_wp(src_vma) && uffd) + entry =3D pte_swp_mkuffd(entry); set_huge_pte_at(src, addr, src_pte, entry, sz); } if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(pte_is_marker(entry))) { const pte_marker marker =3D copy_pte_marker(softleaf, dst_vma); @@ -5013,7 +5013,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, } =20 if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); =20 
set_huge_pte_at(dst, addr, dst_pte, entry, sz); hugetlb_count_add(npages, dst); @@ -5061,9 +5061,9 @@ static void move_huge_pte(struct vm_area_struct *vma,= unsigned long old_addr, } else { if (need_clear_uffd_wp) { if (pte_present(pte)) - pte =3D huge_pte_clear_uffd_wp(pte); + pte =3D huge_pte_clear_uffd(pte); else - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); } set_huge_pte_at(mm, new_addr, dst_pte, pte, sz); } @@ -5197,7 +5197,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, s= truct vm_area_struct *vma, * drop the uffd-wp bit in this zap, then replace the * pte with a marker. */ - if (pte_swp_uffd_wp_any(pte) && + if (pte_swp_uffd_any(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) set_huge_pte_at(mm, address, ptep, make_pte_marker(PTE_MARKER_UFFD_WP), @@ -5233,7 +5233,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, s= truct vm_area_struct *vma, if (huge_pte_dirty(pte)) folio_mark_dirty(folio); /* Leave a uffd-wp pte marker if needed */ - if (huge_pte_uffd_wp(pte) && + if (huge_pte_uffd(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) set_huge_pte_at(mm, address, ptep, make_pte_marker(PTE_MARKER_UFFD_WP), @@ -5437,7 +5437,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf) * can trigger this, because hugetlb_fault() will always resolve * uffd-wp bit first. */ - if (!unshare && huge_pte_uffd_wp(pte)) + if (!unshare && huge_pte_uffd(pte)) return 0; =20 /* Let's take out MAP_SHARED mappings first. 
*/ @@ -5581,8 +5581,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf) huge_ptep_clear_flush(vma, vmf->address, vmf->pte); hugetlb_remove_rmap(old_folio); hugetlb_add_new_anon_rmap(new_folio, vma, vmf->address); - if (huge_pte_uffd_wp(pte)) - newpte =3D huge_pte_mkuffd_wp(newpte); + if (huge_pte_uffd(pte)) + newpte =3D huge_pte_mkuffd(newpte); set_huge_pte_at(mm, vmf->address, vmf->pte, newpte, huge_page_size(h)); folio_set_hugetlb_migratable(new_folio); @@ -5860,7 +5860,7 @@ static vm_fault_t hugetlb_no_page(struct address_spac= e *mapping, * if populated. */ if (unlikely(pte_is_uffd_wp_marker(vmf->orig_pte))) - new_pte =3D huge_pte_mkuffd_wp(new_pte); + new_pte =3D huge_pte_mkuffd(new_pte); set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h)); =20 hugetlb_count_add(pages_per_huge_page(h), mm); @@ -6058,7 +6058,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, goto out_ptl; =20 /* Handle userfault-wp first, before trying to lock more pages */ - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(mm, vmf.address= , vmf.pte)) && + if (userfaultfd_wp(vma) && huge_pte_uffd(huge_ptep_get(mm, vmf.address, v= mf.pte)) && (flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) { if (!userfaultfd_wp_async(vma)) { spin_unlock(vmf.ptl); @@ -6067,7 +6067,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, return handle_userfault(&vmf, VM_UFFD_WP); } =20 - vmf.orig_pte =3D huge_pte_clear_uffd_wp(vmf.orig_pte); + vmf.orig_pte =3D huge_pte_clear_uffd(vmf.orig_pte); set_huge_pte_at(mm, vmf.address, vmf.pte, vmf.orig_pte, huge_page_size(hstate_vma(vma))); /* Fallthrough to CoW */ @@ -6352,7 +6352,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, _dst_pte =3D pte_mkyoung(_dst_pte); =20 if (wp_enabled) - _dst_pte =3D huge_pte_mkuffd_wp(_dst_pte); + _dst_pte =3D huge_pte_mkuffd(_dst_pte); =20 set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte, size); =20 @@ -6476,9 +6476,9 @@ long 
hugetlb_change_protection(struct vm_area_struct = *vma, } =20 if (uffd_wp) - newpte =3D pte_swp_mkuffd_wp(newpte); + newpte =3D pte_swp_mkuffd(newpte); else if (uffd_wp_resolve) - newpte =3D pte_swp_clear_uffd_wp(newpte); + newpte =3D pte_swp_clear_uffd(newpte); if (!pte_same(pte, newpte)) set_huge_pte_at(mm, address, ptep, newpte, psize); } else if (unlikely(pte_is_marker(pte))) { @@ -6499,9 +6499,9 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); if (uffd_wp) - pte =3D huge_pte_mkuffd_wp(pte); + pte =3D huge_pte_mkuffd(pte); else if (uffd_wp_resolve) - pte =3D huge_pte_clear_uffd_wp(pte); + pte =3D huge_pte_clear_uffd(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; tlb_remove_huge_tlb_entry(h, &tlb, ptep, address); diff --git a/mm/internal.h b/mm/internal.h index 5602393054f3..9325eefbea6a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -412,8 +412,8 @@ static inline pte_t pte_move_swp_offset(pte_t pte, long= delta) new =3D pte_swp_mksoft_dirty(new); if (pte_swp_exclusive(pte)) new =3D pte_swp_mkexclusive(new); - if (pte_swp_uffd_wp(pte)) - new =3D pte_swp_mkuffd_wp(new); + if (pte_swp_uffd(pte)) + new =3D pte_swp_mkuffd(new); =20 return new; } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4549a020bf73..afa218be15de 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -37,7 +37,7 @@ enum scan_result { SCAN_EXCEED_SWAP_PTE, SCAN_EXCEED_SHARED_PTE, SCAN_PTE_NON_PRESENT, - SCAN_PTE_UFFD_WP, + SCAN_PTE_UFFD, SCAN_PTE_MAPPED_HUGEPAGE, SCAN_LACK_REFERENCED_PAGE, SCAN_PAGE_NULL, @@ -712,8 +712,8 @@ static enum scan_result __collapse_huge_page_isolate(st= ruct vm_area_struct *vma, result =3D SCAN_PTE_NON_PRESENT; goto out; } - if (pte_uffd_wp(pteval)) { - result =3D SCAN_PTE_UFFD_WP; + if (pte_uffd(pteval)) { + result =3D SCAN_PTE_UFFD; goto out; } page =3D vm_normal_page(vma, addr, pteval); @@ -1566,7 +1566,7 @@ static 
int mthp_collapse(struct mm_struct *mm, struct= vm_area_struct *vma, case SCAN_PAGE_NULL: case SCAN_DEL_PAGE_LRU: case SCAN_PTE_NON_PRESENT: - case SCAN_PTE_UFFD_WP: + case SCAN_PTE_UFFD: case SCAN_ALLOC_HUGE_PAGE_FAIL: case SCAN_PAGE_LAZYFREE: goto next_order; @@ -1666,15 +1666,15 @@ static enum scan_result collapse_scan_pmd(struct mm= _struct *mm, /* * Always be strict with uffd-wp * enabled swap entries. Please see - * comment below for pte_uffd_wp(). + * comment below for pte_uffd(). */ - if (pte_swp_uffd_wp_any(pteval)) { - result =3D SCAN_PTE_UFFD_WP; + if (pte_swp_uffd_any(pteval)) { + result =3D SCAN_PTE_UFFD; goto out_unmap; } continue; } - if (pte_uffd_wp(pteval)) { + if (pte_uffd(pteval)) { /* * Don't collapse the page if any of the small * PTEs are armed with uffd write protection. @@ -1684,7 +1684,7 @@ static enum scan_result collapse_scan_pmd(struct mm_s= truct *mm, * userfault messages that falls outside of * the registered range. So, just be simple. */ - result =3D SCAN_PTE_UFFD_WP; + result =3D SCAN_PTE_UFFD; goto out_unmap; } =20 @@ -1897,7 +1897,7 @@ static enum scan_result try_collapse_pte_mapped_thp(s= truct mm_struct *mm, unsign =20 /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ if (userfaultfd_wp(vma)) - return SCAN_PTE_UFFD_WP; + return SCAN_PTE_UFFD; =20 folio =3D filemap_lock_folio(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); @@ -3244,7 +3244,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, /* Whitelisted set of results where continuing OK */ case SCAN_NO_PTE_TABLE: case SCAN_PTE_NON_PRESENT: - case SCAN_PTE_UFFD_WP: + case SCAN_PTE_UFFD: case SCAN_LACK_REFERENCED_PAGE: case SCAN_PAGE_NULL: case SCAN_PAGE_COUNT: diff --git a/mm/memory.c b/mm/memory.c index 7c020995eafc..c4fd5cb4a08f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -893,8 +893,8 @@ static void restore_exclusive_pte(struct vm_area_struct= *vma, if (pte_swp_soft_dirty(orig_pte)) pte =3D pte_mksoft_dirty(pte); =20 
- if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_mkuffd(pte); =20 if ((vma->vm_flags & VM_WRITE) && can_change_pte_writable(vma, address, pte)) { @@ -984,8 +984,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm= _struct *src_mm, pte =3D softleaf_to_pte(entry); if (pte_swp_soft_dirty(orig_pte)) pte =3D pte_swp_mksoft_dirty(pte); - if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_swp_mkuffd(pte); set_pte_at(src_mm, addr, src_pte, pte); } } else if (softleaf_is_device_private(entry)) { @@ -1018,8 +1018,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, entry =3D make_readable_device_private_entry( swp_offset(entry)); pte =3D swp_entry_to_pte(entry); - if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_swp_mkuffd(pte); set_pte_at(src_mm, addr, src_pte, pte); } } else if (softleaf_is_device_exclusive(entry)) { @@ -1042,7 +1042,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, return 0; } if (!userfaultfd_wp(dst_vma)) - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); set_pte_at(dst_mm, addr, dst_pte, pte); return 0; } @@ -1090,7 +1090,7 @@ copy_present_page(struct vm_area_struct *dst_vma, str= uct vm_area_struct *src_vma pte =3D maybe_mkwrite(pte_mkdirty(pte), dst_vma); if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte))) /* Uffd-wp needs to be delivered to dest pte as well */ - pte =3D pte_mkuffd_wp(pte); + pte =3D pte_mkuffd(pte); set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); return 0; } @@ -1113,7 +1113,7 @@ static __always_inline void __copy_present_ptes(struc= t vm_area_struct *dst_vma, pte =3D pte_mkold(pte); =20 if (!userfaultfd_wp(dst_vma)) - pte =3D pte_clear_uffd_wp(pte); + pte =3D pte_clear_uffd(pte); =20 set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr); } @@ -3925,8 +3925,8 @@ static vm_fault_t 
wp_page_copy(struct vm_fault *vmf) if (unlikely(unshare)) { if (pte_soft_dirty(vmf->orig_pte)) entry =3D pte_mksoft_dirty(entry); - if (pte_uffd_wp(vmf->orig_pte)) - entry =3D pte_mkuffd_wp(entry); + if (pte_uffd(vmf->orig_pte)) + entry =3D pte_mkuffd(entry); } else { entry =3D maybe_mkwrite(pte_mkdirty(entry), vma); } @@ -4261,7 +4261,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * etc.) because we're only removing the uffd-wp bit, * which is completely invisible to the user. */ - pte =3D pte_clear_uffd_wp(ptep_get(vmf->pte)); + pte =3D pte_clear_uffd(ptep_get(vmf->pte)); =20 set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); /* @@ -5038,8 +5038,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte =3D mk_pte(page, vma->vm_page_prot); if (pte_swp_soft_dirty(vmf->orig_pte)) pte =3D pte_mksoft_dirty(pte); - if (pte_swp_uffd_wp(vmf->orig_pte)) - pte =3D pte_mkuffd_wp(pte); + if (pte_swp_uffd(vmf->orig_pte)) + pte =3D pte_mkuffd(pte); =20 /* * Same logic as in do_wp_page(); however, optimize for pages that are @@ -5255,7 +5255,7 @@ void map_anon_folio_pte_nopf(struct folio *folio, pte= _t *pte, if (vma->vm_flags & VM_WRITE) entry =3D pte_mkwrite(pte_mkdirty(entry), vma); if (uffd_wp) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); =20 folio_ref_add(folio, nr_pages - 1); folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE); @@ -5322,7 +5322,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) return handle_userfault(vmf, VM_UFFD_MISSING); } if (vmf_orig_pte_uffd_wp(vmf)) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); set_pte_at(vma->vm_mm, addr, vmf->pte, entry); =20 /* No need to invalidate - it was non-present before */ @@ -5572,7 +5572,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio= *folio, else if (pte_write(entry) && folio_test_dirty(folio)) entry =3D pte_mkdirty(entry); if (unlikely(vmf_orig_pte_uffd_wp(vmf))) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); /* 
copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { VM_BUG_ON_FOLIO(nr !=3D 1, folio); diff --git a/mm/migrate.c b/mm/migrate.c index 0c6a0ab6ecce..4bdb5be7afbf 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -326,8 +326,8 @@ static bool try_to_map_unused_to_zeropage(struct page_v= ma_mapped_walk *pvmw, =20 if (pte_swp_soft_dirty(old_pte)) newpte =3D pte_mksoft_dirty(newpte); - if (pte_swp_uffd_wp(old_pte)) - newpte =3D pte_mkuffd_wp(newpte); + if (pte_swp_uffd(old_pte)) + newpte =3D pte_mkuffd(newpte); =20 set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); =20 @@ -391,8 +391,8 @@ static bool remove_migration_pte(struct folio *folio, =20 if (softleaf_is_migration_write(entry)) pte =3D pte_mkwrite(pte, vma); - else if (pte_swp_uffd_wp(old_pte)) - pte =3D pte_mkuffd_wp(pte); + else if (pte_swp_uffd(old_pte)) + pte =3D pte_mkuffd(pte); =20 if (folio_test_anon(folio) && !softleaf_is_migration_read(entry)) rmap_flags |=3D RMAP_EXCLUSIVE; @@ -407,8 +407,8 @@ static bool remove_migration_pte(struct folio *folio, pte =3D softleaf_to_pte(entry); if (pte_swp_soft_dirty(old_pte)) pte =3D pte_swp_mksoft_dirty(pte); - if (pte_swp_uffd_wp(old_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(old_pte)) + pte =3D pte_swp_mkuffd(pte); } =20 #ifdef CONFIG_HUGETLB_PAGE diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 554754eb26ff..17da1bab0248 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -445,13 +445,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (pte_present(pte)) { if (pte_soft_dirty(pte)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pte)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { if (pte_swp_soft_dirty(pte)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_swp_uffd(pte)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } set_pte_at(mm, addr, ptep, swp_pte); 
=20 diff --git a/mm/mprotect.c b/mm/mprotect.c index 9cbf932b028c..8340c8b228c6 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -240,8 +240,8 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, */ entry =3D make_readable_device_private_entry(swp_offset(entry)); newpte =3D swp_entry_to_pte(entry); - if (pte_swp_uffd_wp(oldpte)) - newpte =3D pte_swp_mkuffd_wp(newpte); + if (pte_swp_uffd(oldpte)) + newpte =3D pte_swp_mkuffd(newpte); } else if (softleaf_is_marker(entry)) { /* * Ignore error swap entries unconditionally, @@ -266,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, } =20 if (uffd_wp) - newpte =3D pte_swp_mkuffd_wp(newpte); + newpte =3D pte_swp_mkuffd(newpte); else if (uffd_wp_resolve) - newpte =3D pte_swp_clear_uffd_wp(newpte); + newpte =3D pte_swp_clear_uffd(newpte); =20 if (!pte_same(oldpte, newpte)) { set_pte_at(vma->vm_mm, addr, pte, newpte); @@ -290,9 +290,9 @@ static __always_inline void change_present_ptes(struct = mmu_gather *tlb, ptent =3D pte_modify(oldpte, newprot); =20 if (uffd_wp) - ptent =3D pte_mkuffd_wp(ptent); + ptent =3D pte_mkuffd(ptent); else if (uffd_wp_resolve) - ptent =3D pte_clear_uffd_wp(ptent); + ptent =3D pte_clear_uffd(ptent); =20 /* * In some writable, shared mappings, we might want diff --git a/mm/mremap.c b/mm/mremap.c index e9c8b1d05832..12732a5c547e 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -297,9 +297,9 @@ static int move_ptes(struct pagetable_move_control *pmc, else { if (need_clear_uffd_wp) { if (pte_present(pte)) - pte =3D pte_clear_uffd_wp(pte); + pte =3D pte_clear_uffd(pte); else - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); } set_ptes(mm, new_addr, new_ptep, pte, nr_ptes); } diff --git a/mm/page_table_check.c b/mm/page_table_check.c index 53a8997ec043..3fb995e5d40d 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -188,8 +188,8 @@ static inline bool softleaf_cached_writable(softleaf_t = entry) static void 
page_table_check_pte_flags(pte_t pte) { if (pte_present(pte)) { - WARN_ON_ONCE(pte_uffd_wp(pte) && pte_write(pte)); - } else if (pte_swp_uffd_wp(pte)) { + WARN_ON_ONCE(pte_uffd(pte) && pte_write(pte)); + } else if (pte_swp_uffd(pte)) { const softleaf_t entry =3D softleaf_from_pte(pte); =20 WARN_ON_ONCE(softleaf_cached_writable(entry)); @@ -216,9 +216,9 @@ EXPORT_SYMBOL(__page_table_check_ptes_set); static inline void page_table_check_pmd_flags(pmd_t pmd) { if (pmd_present(pmd)) { - if (pmd_uffd_wp(pmd)) + if (pmd_uffd(pmd)) WARN_ON_ONCE(pmd_write(pmd)); - } else if (pmd_swp_uffd_wp(pmd)) { + } else if (pmd_swp_uffd(pmd)) { const softleaf_t entry =3D softleaf_from_pmd(pmd); =20 WARN_ON_ONCE(softleaf_cached_writable(entry)); diff --git a/mm/rmap.c b/mm/rmap.c index 1c77d5dc06e9..546bc1cf9391 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2318,13 +2318,13 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, if (likely(pte_present(pteval))) { if (pte_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { if (pte_swp_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_swp_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { @@ -2692,14 +2692,14 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, swp_pte =3D swp_entry_to_pte(entry); if (pte_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { swp_pte =3D swp_entry_to_pte(entry); if (pte_swp_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); 
+ if (pte_swp_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, diff --git a/mm/swapfile.c b/mm/swapfile.c index e3d126602a1e..15fdca2da1f7 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2557,8 +2557,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, new_pte =3D pte_mkold(mk_pte(page, vma->vm_page_prot)); if (pte_swp_soft_dirty(old_pte)) new_pte =3D pte_mksoft_dirty(new_pte); - if (pte_swp_uffd_wp(old_pte)) - new_pte =3D pte_mkuffd_wp(new_pte); + if (pte_swp_uffd(old_pte)) + new_pte =3D pte_mkuffd(new_pte); setpte: set_pte_at(vma->vm_mm, addr, pte, new_pte); folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry))); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index f6d2a1c67019..9d74be69873a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -394,7 +394,7 @@ static int mfill_atomic_install_pte(pmd_t *dst_pmd, if (writable) _dst_pte =3D pte_mkwrite(_dst_pte, dst_vma); if (flags & MFILL_ATOMIC_WP) - _dst_pte =3D pte_mkuffd_wp(_dst_pte); + _dst_pte =3D pte_mkuffd(_dst_pte); =20 ret =3D -EAGAIN; dst_pte =3D pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); @@ -3591,7 +3591,7 @@ static int userfaultfd_register(struct userfaultfd_ct= x *ctx, if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) vm_flags |=3D VM_UFFD_MISSING; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) { - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) goto out; =20 vm_flags |=3D VM_UFFD_WP; @@ -4301,7 +4301,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, uffdio_api.features &=3D ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); #endif - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) uffdio_api.features &=3D ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; =20 if (!uffd_supports_wp_marker()) { --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org 
(aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC1E4388893; Fri, 29 May 2026 17:27:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075682; cv=none; b=hnaxfbwvDxb0PuRY65lABTAtfxSWTUNFcEqDLxkIwt7yEozXHl+aaX967Re050gMwg03rdtfxAx1q/AN2A9unZLDboC9GhaZ0Flg8Q8UKd2aNIOMkmtqhdAL7+zuhTX4EikNAI9/G1jVC8VkKmXaLA5VUm9PMEhcxOnfHA2EyJ4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075682; c=relaxed/simple; bh=pweU6E6V5UxZMVhZ2AP8UStKS+cHPSOB9FOU44WKrf0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eZXhpRXMugHDSo+kByWZI4AjCM86N20lbC54roo99mmhO8trdALu3tdBqBC5RBZ3gSp8n6zCMvupqWGfHYx/CiovlZ0SGlj51QLmI1fQDsRHod9ZRKUfn5aUJbO6kFUyrZsESsGlhbHf4lNFqNb5FKqs77u3Bt4PWksK7LrHgZY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nec8atXG; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nec8atXG" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA4AC1F0089B; Fri, 29 May 2026 17:27:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075673; bh=SQ1nDTsRLGDl2EAMFRNs5j0bNOh8Bj7CANx4ag7x/sY=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=nec8atXGkJujJy7W2x4RHeh46PyG43lwX17Al4yfDKD2PJi9hzW+FGqt0/jfwtaxv 8Iz8IXxQt2kesi3AVnRMvmM7gQtKPiuw/755WryrO0nCG5ESZ42N4YTj0TTgANSpH5 Dn8STpqqEFEj93dlVLHoCOacVswN4P8LSw6nwdZOvjM768/1s3adgceTMnkqn0f8v2 mkX0T0NjgYuWFQiHh1R7QLORYC/4HxgqrNOhcZuR23mgXBXK5aIB9Wqj7A2fHTiMuX MhxhXkrLCzfBEkjGKrj1NuGymRZkdd4wiErdmHAjdQfWxf7y8Z9Ya72AnUnfFRmPk0 
iLz9Sda5XClWg== Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfauth.phl.internal (Postfix) with ESMTP id 13B7DF40070; Fri, 29 May 2026 13:27:52 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Fri, 29 May 2026 13:27:52 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTEUyOwNbDmHa1KNnZhhp0sknwyB5iYBikO499ls5PXcb2OBGZ+LtCI75nkUeooCiF UzrXo04XJ8+Ua/uZDk13qfrbBwlZ5p9xJmobKJQlMbnj7l7Wa1HVN1RPiFo+pMeAxt22yi 61vTsKtOSGFbPiuQJwzf8kDMk7i9oYKVPojpqHuO0mt23+5gFs2X9nq4e5eFTuainOibKw /PjyCpBsVGYk6ilHJDmI324wGNqP1tQqUfnSbHSEKXH/CNKWAMLzoIWnryvyvz0mzyGoSH 2C2M5+XUQ3GxwG6CgDLP16c5lhscyjq12y5muGouPR8MZ3kTeYPrBARrvcwg6KATek/ZWq +dP0D1rsZmQ1VFiNA+xyuNFvJNCjaMNMPCLtlNpIj6q9b06VrZoOfLYh6a5mPwNib0hkRL c+ZWzy8ciJA0v6d/VtEpjxVHWwAipQCSTrc9sxdVPlecEW53mOgLz+tV1zJ7JLvKwfawMO qVXArPzUPOoTGowwQnDvJrO7egZwJGKJkwMQDaQbsj+yUcxyiVKnn/amGJ9qd5HJ9Sp4sO 7gte4/4huMO65wclA1C+VepBODWBRI0A5r+98PASSCy+d7rtf8X9OJnFiQibhGontdXFK3 qWaGg4A2UenLkAl7DzOZU688UcpGeTPHuTKZu/nwkn6RY9O7b2wuki171pbQ X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:27:51 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org Subject: [PATCH v6 04/15] userfaultfd: test uffd VMA flags through the vma_flags_t API Date: Fri, 29 May 2026 18:26:33 +0100 Message-ID: <20260529172716.357179-5-kas@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: 
<20260529172716.357179-1-kas@kernel.org> References: <20260529172716.357179-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The uffd VMA-flag helpers read vma->vm_flags directly. Now that config-gated per-mode masks exist, switch them to the vma_flags_t accessor vma_test_any_mask(), which is the going-forward API and keeps a single place (the VMA_UFFD_* masks) that knows which modes are available on the current build. No functional change: vma_flags_t is in union with vm_flags, so the same bits are read, and the masks fold to the same code the open-coded vm_flags tests produced -- verified identical on gcc and clang, 32- and 64-bit. Suggested-by: Lorenzo Stoakes Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-8 Acked-by: Mike Rapoport (Microsoft) Reviewed-by: Lorenzo Stoakes --- include/linux/userfaultfd_k.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 658740df2978..c4f2cc6dfcf0 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -178,7 +178,8 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(stru= ct vm_area_struct *vma, */ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { - return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); + return vma_test_any_mask(vma, + mk_vma_flags_from_masks(VMA_UFFD_WP, VMA_UFFD_MINOR)); } =20 /* @@ -190,22 +191,23 @@ static inline bool uffd_disable_huge_pmd_share(struct= vm_area_struct *vma) */ static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) { - return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); + return vma_test_any_mask(vma, + mk_vma_flags_from_masks(VMA_UFFD_WP, VMA_UFFD_MINOR)); } =20 static inline bool userfaultfd_missing(struct vm_area_struct *vma) { 
- return vma->vm_flags & VM_UFFD_MISSING; + return vma_test_any_mask(vma, VMA_UFFD_MISSING); } =20 static inline bool userfaultfd_wp(struct vm_area_struct *vma) { - return vma->vm_flags & VM_UFFD_WP; + return vma_test_any_mask(vma, VMA_UFFD_WP); } =20 static inline bool userfaultfd_minor(struct vm_area_struct *vma) { - return vma->vm_flags & VM_UFFD_MINOR; + return vma_test_any_mask(vma, VMA_UFFD_MINOR); } =20 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, @@ -222,7 +224,7 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_ar= ea_struct *vma, =20 static inline bool userfaultfd_armed(struct vm_area_struct *vma) { - return vma->vm_flags & __VM_UFFD_FLAGS; + return vma_test_any_mask(vma, __VMA_UFFD_FLAGS); } =20 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct = *vma) --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 82F763876A4; Fri, 29 May 2026 17:27:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075679; cv=none; b=HfzE8pA9qx1Pb3T3SYNYuPPAo7MvjVDAphN7NHBD5opITUrqPTLCNSUM0B8E4fWKA+DlfQzkk2BWc65wZ7gZwNUpKKrjlQmfRrAYnkJyHAaUk3wUHGErmFiYyeMnSJ56hXjgi7mp6gmJLk/nb49ffy6gKzo1EjDJVQCmQW5u/64= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075679; c=relaxed/simple; bh=wgQBjdRmhL1PmCnIwL5OxLFWTdzgn0xgdLKxbSOeQuQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fOa1Qo6g63gCMopUrX2oeVR1zrY/z4h2ll497pDIT65/hAQTovi3lVedx61EUAfhKvbaOeVd6DoW1cGUAOmQVJY+ipAJF0maH279VCxI4Dkc6nzceZIQ/ZdGJnK78ObwonNHbKCUllipLd3tfBH0b6ClcG689TNMsvMZNxf/onY= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iTsoKonQ; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iTsoKonQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4E5A61F0089D; Fri, 29 May 2026 17:27:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075675; bh=1kPL22rrzi+QauahU6y1XS1epMwqoNAKDudDgPyn9b8=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=iTsoKonQtoRNhZIcdFYNnxsjobIzbTuBsaMI8AKLqnDUFuRAEFCXAaDZzLQ4uluvf 2BoK36uLWtVdWMuAW7C+wFPPZXavkJ9jyq0cH/U+Q6lXKQ9+xj0KY7JgqPNDrCJdUD I7O8aH/AmTfAAimqbuAFCmBueKzIN+ShY10Qk5MvYv/65QDSKfwkH978L3WXIi9zed SAVssbyDIXfzgK3NLUWN7lGPsvypUD+ZFkQm/6pjmygyb4cSUSQJML/ThttwzkcXI6 +Makwar22WjuR+EFxNY6zf3IR21Vshut+rOIM7C9O11WnU4BcRuGBHCbtbC19Fr9G0 W80B9jkmbwbHw== Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfauth.phl.internal (Postfix) with ESMTP id AD56DF4006F; Fri, 29 May 2026 13:27:53 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-03.internal (MEProxy); Fri, 29 May 2026 13:27:53 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTGahW8s3xjZ3arjCZYsPeROx1tHGGIOxZ4ZTborx5JIY1xfx8o+3Ks6DBHpwKdcBc M0LV7UYzyw5BBW/MLyCvRpPXH4b3j/O36w1UIutGmjGoZPWXllz5NGLuWKcGXqDNqja7AN LjyibA8JUH8LlxNww+LolPBmmXeffckfCMyZ3ofUE4q+w091KUDN+vWzLN57aJjrFHnLmA DJhoBTsevLB+L7lFnUlwKPgFqlT5AUPgwhCfp483ir0podeZjRFOwVO5acGk8d/hTIZjyE iTtkTrHlenKgPfeJb7TkHTrSkVle7vVdqqUtJoKJ+Cofq6CG9ZD2U9nUSpGxcnBPWB+GWq gPC1wTp7fcejRHI3Q1FCzTHBKt9o4RKI/fN91edtIJi4rsYCqiSgf9/MN9PyG3oVtWwq2S OAwDPArj+xnGVYvnZdoZqf1pexBbR5njpxL6C3GTR+3jo1NY5djFNqneHNCwf2Hpsm15in 1XS1KK767JDo+yUy4ycjs5w9OSpF8yQaWZzTrerso3eInllVcoT5QdI3MWYQVZqkXjPL8b mUzD5UgIUZGuGJCF4hd21JivANc+J6m3AGnfZkPWlppdwuBGA0vWQZoNjowrfQO8m/0BEo x4qJqm4lDkVPvQiPUZnEnO5I6L9exiwUbCXRwfxeGYYJEZi7J5Q+cK1EffWA X-ME-Proxy: 
Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:27:53 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org Subject: [PATCH v6 05/15] mm: add VM_UFFD_RWP VMA flag Date: Fri, 29 May 2026 18:26:34 +0100 Message-ID: <20260529172716.357179-6-kas@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260529172716.357179-1-kas@kernel.org> References: <20260529172716.357179-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Preparatory patch for userfaultfd read-write protection (RWP). RWP extends userfaultfd protection from plain write-protection (WP) to full read-write protection: accesses to an RWP-protected range -- reads as well as writes -- trap through userfaultfd. Reserve VM_UFFD_RWP, add the userfaultfd_rwp() and userfaultfd_protected() helpers, and wire up the smaps "ur" entry and the trace-flag table the rest of the series will use. The flag is gated on CONFIG_USERFAULTFD_RWP, which is introduced together with the UAPI in a later patch; until then VM_UFFD_RWP aliases VM_NONE and every downstream check folds to dead code. Nothing sets or queries the flag yet. 
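As a rough illustration of the "aliases VM_NONE and folds to dead code" point above, here is a minimal userspace sketch. It is not the kernel code: the DEMO_* names, struct demo_vma and the DEMO_USERFAULTFD_RWP switch are hypothetical stand-ins for VM_UFFD_RWP, struct vm_area_struct and CONFIG_USERFAULTFD_RWP, and only the bit position (43) is taken from this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the kernel definitions. */
typedef unsigned long long demo_vm_flags_t;

#define DEMO_VM_NONE		0ULL
#define DEMO_VM_UFFD_RWP_BIT	(1ULL << 43)	/* raw bit, always defined */

/* Mirrors the gating pattern: without the config the flag is VM_NONE. */
#ifdef DEMO_USERFAULTFD_RWP
#define DEMO_VM_UFFD_RWP	DEMO_VM_UFFD_RWP_BIT
#else
#define DEMO_VM_UFFD_RWP	DEMO_VM_NONE
#endif

struct demo_vma {
	demo_vm_flags_t vm_flags;
};

/*
 * With DEMO_USERFAULTFD_RWP undefined the mask is 0, so the test is
 * constant false and a compiler can drop any code guarded by it.
 */
static inline bool demo_userfaultfd_rwp(const struct demo_vma *vma)
{
	return (vma->vm_flags & DEMO_VM_UFFD_RWP) != 0;
}
```

Built without -DDEMO_USERFAULTFD_RWP, the helper returns false even for a VMA that has the raw bit set, which is the behaviour the commit message relies on until the config option exists.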
Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
Reviewed-by: Lorenzo Stoakes
---
 Documentation/filesystems/proc.rst |  1 +
 fs/proc/task_mmu.c                 |  3 +++
 include/linux/mm.h                 | 41 ++++++++++++++++++++----------
 include/linux/userfaultfd_k.h      | 32 +++++++++++++++++++----
 include/trace/events/mmflags.h     |  7 +++++
 5 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index db6167befb7b..db28207c5290 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -607,6 +607,7 @@ encoded manner. The codes are the following:
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
     ui    userfaultfd minor fault
+    ur    userfaultfd read-write-protect tracking
     ss    shadow/guarded control stack page
     sl    sealed
     lf    lock on fault pages
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 939657aa334a..ca0f69b347e8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1237,6 +1237,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_USERFAULTFD_RWP
+		[ilog2(VM_UFFD_RWP)]	= "ur",
+#endif
 #ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
 		[ilog2(VM_SHADOW_STACK)] = "ss",
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..5ac31fbadeef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -353,6 +353,7 @@ enum {
 #endif
 	DECLARE_VMA_BIT(UFFD_MINOR, 41),
 	DECLARE_VMA_BIT(SEALED, 42),
+	DECLARE_VMA_BIT(UFFD_RWP, 43),
 	/* Flags that reuse flags above.
*/ DECLARE_VMA_BIT_ALIAS(PKEY_BIT0, HIGH_ARCH_0), DECLARE_VMA_BIT_ALIAS(PKEY_BIT1, HIGH_ARCH_1), @@ -496,12 +497,17 @@ enum { #else #define VM_UFFD_MINOR VM_NONE #endif +#ifdef CONFIG_USERFAULTFD_RWP +#define VM_UFFD_RWP INIT_VM_FLAG(UFFD_RWP) +#else +#define VM_UFFD_RWP VM_NONE +#endif =20 /* - * vma_flags_t masks for the userfaultfd VMA flags. VMA_UFFD_MINOR is gate= d on - * the same config as VM_UFFD_MINOR -- which implies 64BIT, where the bit = fits - * -- so an out-of-range bit is never fed to mk_vma_flags() on a build who= se - * bitmap cannot hold it. + * vma_flags_t masks for the userfaultfd VMA flags. The two high-bit modes= are + * gated on the same configs as their VM_* flags above -- both of which im= ply + * 64BIT -- so an out-of-range bit is never fed to mk_vma_flags() on a bui= ld + * whose bitmap cannot hold it. */ #define VMA_UFFD_MISSING mk_vma_flags(VMA_UFFD_MISSING_BIT) #define VMA_UFFD_WP mk_vma_flags(VMA_UFFD_WP_BIT) @@ -510,6 +516,11 @@ enum { #else #define VMA_UFFD_MINOR EMPTY_VMA_FLAGS #endif +#ifdef CONFIG_USERFAULTFD_RWP +#define VMA_UFFD_RWP mk_vma_flags(VMA_UFFD_RWP_BIT) +#else +#define VMA_UFFD_RWP EMPTY_VMA_FLAGS +#endif =20 #ifdef CONFIG_64BIT #define VM_ALLOW_ANY_UNCACHED INIT_VM_FLAG(ALLOW_ANY_UNCACHED) @@ -648,22 +659,24 @@ enum { * reconsistuted upon page fault, so necessitate page table copying upon f= ork. * * Note that these flags should be compared with the DESTINATION VMA not t= he - * source, as VM_UFFD_WP may not be propagated to destination, while all o= ther - * flags will be. + * source: VM_UFFD_WP and VM_UFFD_RWP may be cleared on the destination + * (dup_userfaultfd() -> userfaultfd_reset_ctx() when the parent context d= id + * not negotiate UFFD_FEATURE_EVENT_FORK), while all other flags propagate. * * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot= be * reasonably reconstructed on page fault. 
* * VM_UFFD_WP - Encodes metadata about an installed uffd - * write protect handler, which cannot be - * reconstructed on page fault. + * VM_UFFD_RWP write- or read-write-protect handler, which + * cannot be reconstructed on page fault. * - * We always copy pgtables when dst_vma has uffd= -wp - * enabled even if it's file-backed - * (e.g. shmem). Because when uffd-wp is enabled, - * pgtable contains uffd-wp protection informati= on, - * that's something we can't retrieve from page = cache, - * and skip copying will lose those info. + * We always copy pgtables when dst_vma has the + * uffd PTE bit in use even if it's file-backed + * (e.g. shmem). Because when the uffd bit is + * in use, the pgtable contains the protection + * information, that's something we can't + * retrieve from page cache, and skip copying + * will lose those info. * * VM_MAYBE_GUARD - Could contain page guard region markers which * by design are a property of the page tables diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index c4f2cc6dfcf0..f3b2db27989b 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -21,10 +21,11 @@ #include =20 /* The set of all possible UFFD-related VM flags. 
*/ -#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR) +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_MINOR | \ + VM_UFFD_WP | VM_UFFD_RWP) =20 #define __VMA_UFFD_FLAGS mk_vma_flags_from_masks(VMA_UFFD_MISSING, VMA_UFF= D_WP, \ - VMA_UFFD_MINOR) + VMA_UFFD_MINOR, VMA_UFFD_RWP) =20 /* * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining @@ -179,7 +180,8 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(stru= ct vm_area_struct *vma, static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { return vma_test_any_mask(vma, - mk_vma_flags_from_masks(VMA_UFFD_WP, VMA_UFFD_MINOR)); + mk_vma_flags_from_masks(VMA_UFFD_WP, VMA_UFFD_MINOR, + VMA_UFFD_RWP)); } =20 /* @@ -210,6 +212,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return vma_test_any_mask(vma, VMA_UFFD_MINOR); } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return vma_test_any_mask(vma, VMA_UFFD_RWP); +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return userfaultfd_wp(vma) || userfaultfd_rwp(vma); +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -330,6 +342,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return false; } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return false; +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return false; +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -423,8 +445,8 @@ static inline bool userfaultfd_wp_use_markers(struct vm= _area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. 
*/ static inline bool pte_swp_uffd_any(pte_t pte) { diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index a6e5a44c9b42..bfface3d0203 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -194,6 +194,12 @@ IF_HAVE_PG_ARCH_3(arch_3) # define IF_HAVE_UFFD_MINOR(flag, name) #endif =20 +#ifdef CONFIG_USERFAULTFD_RWP +# define IF_HAVE_UFFD_RWP(flag, name) {flag, name}, +#else +# define IF_HAVE_UFFD_RWP(flag, name) +#endif + #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32) # define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name}, #else @@ -215,6 +221,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR, "uffd_minor" ) \ {VM_PFNMAP, "pfnmap" }, \ {VM_MAYBE_GUARD, "maybe_guard" }, \ {VM_UFFD_WP, "uffd_wp" }, \ +IF_HAVE_UFFD_RWP(VM_UFFD_RWP, "uffd_rwp" ) \ {VM_LOCKED, "locked" }, \ {VM_IO, "io" }, \ {VM_SEQ_READ, "seqread" }, \ --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C85D3388860; Fri, 29 May 2026 17:27:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075685; cv=none; b=eBYdI1/fUCEHIovUwu2BVUyc2D2jIE1eMvOlV9soPLH3E8gLCWvB3kcZy2eIrbR9EQIN6QgPRpcxAreWbsBUBBkFuK1PBh3UiyGXn18L8ORMidj7s/LzzFHRdDTgQufUB24D/Hblqd8RY0ZWvYbDevEUt3ypUl1W/8rVhX1NYjc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075685; c=relaxed/simple; bh=mD9wYan0hLuZ9uGm6U51SZ80FiCq6+gIUCwkLsZRcVY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=K/4I4GygvnehPFNSHbXUjBossrjq+a3YDlas/nH7VnVnykxrP1P3sS8JdAASemuNEM5ozl85ENCIvI9UHZJ/icEtGkqxQjne3WWSvjQpVX6o72Vra/ZjlOB83Yp2Y2j8gplT4GfAtj2fMsYz85FraB9KmwqHUTkK992PD6fMR+s= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Dn4CaQqd; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Dn4CaQqd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E23CB1F008A0; Fri, 29 May 2026 17:27:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075676; bh=/8S90nK/A4Vz9oa36BZRAUi/Se+DtN4O4febKrT+1aM=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=Dn4CaQqdkZD775XxdG51F4sjVc940OusAs5NiTRX8xiBqWs/kGZfeyW8+5iPBGDz4 ACDnQVH7oeqmMKbPlEpTpMTdVriuui6xEyBnbsRKY/huJOf3bsgsuZ4MKrJ+YUvJqs GR8gWsHT87hn+Zis+MdNLFFUFezeCaDrhClCKFpHWnHc8Qs4MlWX7CT8feBI18JvLV Xd3RwD6kLgN+r93N3TsogycwWlqnRFzIkEM8TkD0qNvOlC/PJa9Lgj94bTSrMFFKsr jvDxHPyNCsFLAsH8pB6RPozXn+jZl3mQbsbkYYtrGVhFlfRo8JJHat+IW5S9dtH28f vftWnOfO19ivw== Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfauth.phl.internal (Postfix) with ESMTP id 4C033F4006F; Fri, 29 May 2026 13:27:55 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-03.internal (MEProxy); Fri, 29 May 2026 13:27:55 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTGahW8s3xjZ3arjCZYsPeROx1tHGGIOxZ4ZTborx5JIY1xfx8o+3Ks6DBHpwKdcBc M0LV7UYzyw5BBW/MLyCvRpPXH4b3j/O36w1UIutGmjGoZPWXllz5NGLuWKcGXqDNqja7AN LjyibA8JUH8LlxNww+LolPBmmXeffckfCMyZ3ofUE4q+w091KUDN+vWzLN57aJjrFHnLmA DJhoBTsevLB+L7lFnUlwKPgFqlT5AUPgwhCfp483ir0podeZjRFOwVO5acGk8d/hTIZjyE iTtkTrHlenKgPfeJb7TkHTrSkVle7vVdqqUtJoKJ+Cofq6CG9ZD2U9nUSpGxcnBPWB+GAj aAfyt/5KZZPh3PyKNocvyfWJANIOS1qhXy7PHxTWZP532uY7I4vrLb37xDzrgFPmETTkB1 Yj6QgVJm7pwed8rj3p1ZQfPpJKIDRz7yQpIaWUs5LdB5aIcQcbS+JbFalUsynopvq5VLLt D/Zxwxc8ZxqYv991RdEnMr/WS61TyGhBdUudUq+b5dOCTbA2AUb3iZuW7+eUpIjoBF/r6Z +Q93xl4ifihWXGuKRDuddGaBrWKT4WIDzJ7IFQ/w77b0pV8eF9LcQcjrwrVdAlkI8GjP0t 
b11T8rdfQLFOjB/xx8fR6UfWZlKIanEIBeSL4c9W9U0lOXpVKX94BD7qOOVA
X-ME-Proxy:
Feedback-ID: i10464835:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:27:54 -0400 (EDT)
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 06/15] mm: add MM_CP_UFFD_RWP change_protection() flag
Date: Fri, 29 May 2026 18:26:35 +0100
Message-ID: <20260529172716.357179-7-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Preparatory patch. Add the change_protection() primitive that
userfaultfd RWP will use.

An RWP-protected PTE is PAGE_NONE with the uffd PTE bit set. The
PROT_NONE half makes the CPU fault on any access; the uffd bit
distinguishes an RWP fault from a plain mprotect(PROT_NONE) or NUMA
hinting fault. MM_CP_UFFD_WP and MM_CP_UFFD_RWP share the same PTE bit,
so the two cannot be used together on the same range.

Two new change_protection() flags:

  MM_CP_UFFD_RWP          install PAGE_NONE and set the uffd bit
  MM_CP_UFFD_RWP_RESOLVE  restore vma->vm_page_prot, clear the uffd bit

Both are wired through change_pte_range(), change_huge_pmd(), and
hugetlb_change_protection() so anon, shmem, THP, and hugetlb all share
the same semantics.
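The flag rules above can be sketched in user space roughly as follows.
This is a simplified model, not the kernel implementation: struct
pte_model, cp_flags_valid(), apply_rwp() and is_rwp_fault() are
hypothetical names, a PTE is reduced to two booleans, and
vma->vm_page_prot is assumed to be an accessible protection. The values
of MM_CP_UFFD_WP_RESOLVE, MM_CP_UFFD_RWP and MM_CP_UFFD_RWP_RESOLVE
follow the diff; the value of MM_CP_UFFD_WP is not shown there and is
assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define MM_CP_UFFD_WP		(1UL << 2)	/* assumed; not shown in the diff */
#define MM_CP_UFFD_WP_RESOLVE	(1UL << 3)
#define MM_CP_UFFD_WP_ALL	(MM_CP_UFFD_WP | MM_CP_UFFD_WP_RESOLVE)
#define MM_CP_UFFD_RWP		(1UL << 4)
#define MM_CP_UFFD_RWP_RESOLVE	(1UL << 5)
#define MM_CP_UFFD_RWP_ALL	(MM_CP_UFFD_RWP | MM_CP_UFFD_RWP_RESOLVE)

static_assert((MM_CP_UFFD_WP_ALL & MM_CP_UFFD_RWP_ALL) == 0,
	      "WP and RWP request bits must not overlap");

/* Hypothetical two-bit model of a present PTE. */
struct pte_model {
	bool protnone;	/* PAGE_NONE: any access faults */
	bool uffd;	/* the uffd PTE bit shared by WP and RWP */
};

/*
 * Same condition as the sanity check in change_protection(): set and
 * resolve are mutually exclusive, and WP requests cannot mix with RWP.
 */
static bool cp_flags_valid(unsigned long cp_flags)
{
	unsigned long wp = cp_flags & MM_CP_UFFD_WP_ALL;
	unsigned long rwp = cp_flags & MM_CP_UFFD_RWP_ALL;

	return wp != MM_CP_UFFD_WP_ALL && rwp != MM_CP_UFFD_RWP_ALL &&
	       !(wp && rwp);
}

static struct pte_model apply_rwp(struct pte_model pte, unsigned long cp_flags)
{
	if (!cp_flags_valid(cp_flags))
		return pte;		/* miswired caller: no-op */

	if (cp_flags & MM_CP_UFFD_RWP) {
		pte.protnone = true;	/* install PAGE_NONE */
		pte.uffd = true;
	} else if (cp_flags & MM_CP_UFFD_RWP_RESOLVE) {
		pte.protnone = false;	/* back to vma->vm_page_prot */
		pte.uffd = false;
	}
	return pte;
}

/* PROT_NONE alone is mprotect() or NUMA hinting; the uffd bit marks RWP. */
static bool is_rwp_fault(struct pte_model pte)
{
	return pte.protnone && pte.uffd;
}
```

In this model a PTE that is only PROT_NONE, for example after
mprotect(PROT_NONE), is not reported as an RWP fault until
MM_CP_UFFD_RWP has also set the uffd bit.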
Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
---
 include/linux/mm.h            |  5 ++++
 include/linux/userfaultfd_k.h |  1 -
 mm/huge_memory.c              | 30 ++++++++++++----------
 mm/hugetlb.c                  | 25 +++++++++++++-----
 mm/mprotect.c                 | 48 +++++++++++++++++++++++++++--------
 5 files changed, 78 insertions(+), 31 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5ac31fbadeef..87b2fb1e3f23 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3330,6 +3330,11 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */
 #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
 			   MM_CP_UFFD_WP_RESOLVE)
+/* Whether this change is for uffd RWP */
+#define MM_CP_UFFD_RWP (1UL << 4) /* do rwp */
+#define MM_CP_UFFD_RWP_RESOLVE (1UL << 5) /* resolve rwp */
+#define MM_CP_UFFD_RWP_ALL (MM_CP_UFFD_RWP | \
+			    MM_CP_UFFD_RWP_RESOLVE)
 
 bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f3b2db27989b..5115827981a2 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -364,7 +364,6 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
 	return false;
 }
 
-
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d43c2255f47d..40c65bf2d6dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2640,8 +2640,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 }
 
 static void change_non_present_huge_pmd(struct mm_struct *mm,
-		unsigned long addr, pmd_t *pmd, bool uffd_wp,
-		bool uffd_wp_resolve)
+		unsigned long addr, pmd_t *pmd, bool uffd_prot,
+		bool uffd_prot_resolve)
 {
 	softleaf_t entry = softleaf_from_pmd(*pmd);
 	const struct folio *folio =
softleaf_to_folio(entry); @@ -2669,9 +2669,9 @@ static void change_non_present_huge_pmd(struct mm_str= uct *mm, newpmd =3D *pmd; } =20 - if (uffd_wp) + if (uffd_prot) newpmd =3D pmd_swp_mkuffd(newpmd); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpmd =3D pmd_swp_clear_uffd(newpmd); if (!pmd_same(*pmd, newpmd)) set_pmd_at(mm, addr, pmd, newpmd); @@ -2692,8 +2692,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, spinlock_t *ptl; pmd_t oldpmd, entry; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); int ret =3D 1; =20 tlb_change_page_size(tlb, HPAGE_PMD_SIZE); @@ -2706,11 +2707,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct = vm_area_struct *vma, return 0; =20 if (thp_migration_supported() && pmd_is_valid_softleaf(*pmd)) { - change_non_present_huge_pmd(mm, addr, pmd, uffd_wp, - uffd_wp_resolve); + change_non_present_huge_pmd(mm, addr, pmd, uffd_prot, + uffd_prot_resolve); goto unlock; } =20 + /* Already in the desired state */ + if (prot_numa && pmd_protnone(*pmd)) + goto unlock; + if ((cp_flags & MM_CP_UFFD_RWP) && pmd_protnone(*pmd) && pmd_uffd(*pmd)) + goto unlock; + if (prot_numa) { =20 /* @@ -2721,9 +2728,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, if (is_huge_zero_pmd(*pmd)) goto unlock; =20 - if (pmd_protnone(*pmd)) - goto unlock; - if (!folio_can_map_prot_numa(pmd_folio(*pmd), vma, vma_is_single_threaded_private(vma))) goto unlock; @@ -2752,9 +2756,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, oldpmd =3D pmdp_invalidate_ad(vma, addr, pmd); =20 entry =3D pmd_modify(oldpmd, newprot); - if (uffd_wp) + if (uffd_prot) entry =3D pmd_mkuffd(entry); - else if (uffd_wp_resolve) + else if 
(uffd_prot_resolve) /* * Leave the write bit to be handled by PF interrupt * handler, then things like COW could be properly diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d0c81a056ae2..4d75b69d4272 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6395,6 +6395,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, unsigned long last_addr_mask; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; + bool uffd_rwp_resolve =3D cp_flags & MM_CP_UFFD_RWP_RESOLVE; struct mmu_gather tlb; =20 /* @@ -6420,6 +6422,11 @@ long hugetlb_change_protection(struct vm_area_struct= *vma, =20 ptep =3D hugetlb_walk(vma, address, psize); if (!ptep) { + /* + * uffd_wp installs a pte marker on the unpopulated + * entry; uffd_rwp does not install markers so the + * allocation is unnecessary for it. + */ if (!uffd_wp) { address |=3D last_addr_mask; continue; @@ -6441,7 +6448,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, * shouldn't happen at all. Warn about it if it * happened due to some reason. */ - WARN_ON_ONCE(uffd_wp || uffd_wp_resolve); + WARN_ON_ONCE(uffd_wp || uffd_wp_resolve || + uffd_rwp || uffd_rwp_resolve); pages++; spin_unlock(ptl); address |=3D last_addr_mask; @@ -6475,9 +6483,9 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, pages++; } =20 - if (uffd_wp) + if (uffd_wp || uffd_rwp) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) newpte =3D pte_swp_clear_uffd(newpte); if (!pte_same(pte, newpte)) set_huge_pte_at(mm, address, ptep, newpte, psize); @@ -6488,19 +6496,24 @@ long hugetlb_change_protection(struct vm_area_struc= t *vma, * pte_marker_uffd_wp()=3D=3Dtrue implies !poison * because they're mutual exclusive. 
*/ - if (pte_is_uffd_wp_marker(pte) && uffd_wp_resolve) + if (pte_is_uffd_wp_marker(pte) && + (uffd_wp_resolve || uffd_rwp_resolve)) /* Safe to modify directly (non-present->none). */ huge_pte_clear(mm, address, ptep, psize); } else { pte_t old_pte; unsigned int shift =3D huge_page_shift(hstate_vma(vma)); =20 + /* Already protnone with uffd bit set? Nothing to do. */ + if (uffd_rwp && pte_protnone(pte) && huge_pte_uffd(pte)) + goto next; + old_pte =3D huge_ptep_modify_prot_start(vma, address, ptep); pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); - if (uffd_wp) + if (uffd_wp || uffd_rwp) pte =3D huge_pte_mkuffd(pte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) pte =3D huge_pte_clear_uffd(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; diff --git a/mm/mprotect.c b/mm/mprotect.c index 8340c8b228c6..7dcc94e7bfd6 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -214,8 +214,9 @@ static __always_inline void set_write_prot_commit_flush= _ptes(struct vm_area_stru static long change_softleaf_pte(struct vm_area_struct *vma, unsigned long addr, pte_t *pte, pte_t oldpte, unsigned long cp_flags) { - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); softleaf_t entry =3D softleaf_from_pte(oldpte); pte_t newpte; =20 @@ -256,7 +257,7 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, * to unprotect it, drop it; the next page * fault will trigger without uffd trapping. 
*/ - if (uffd_wp_resolve) { + if (uffd_prot_resolve) { pte_clear(vma->vm_mm, addr, pte); return 1; } @@ -265,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, newpte =3D oldpte; } =20 - if (uffd_wp) + if (uffd_prot) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpte =3D pte_swp_clear_uffd(newpte); =20 if (!pte_same(oldpte, newpte)) { @@ -282,16 +283,17 @@ static __always_inline void change_present_ptes(struc= t mmu_gather *tlb, int nr_ptes, unsigned long end, pgprot_t newprot, struct folio *folio, struct page *page, unsigned long cp_flags) { - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); pte_t ptent, oldpte; =20 oldpte =3D modify_prot_start_ptes(vma, addr, ptep, nr_ptes); ptent =3D pte_modify(oldpte, newprot); =20 - if (uffd_wp) + if (uffd_prot) ptent =3D pte_mkuffd(ptent); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) ptent =3D pte_clear_uffd(ptent); =20 /* @@ -325,6 +327,7 @@ static long change_pte_range(struct mmu_gather *tlb, long pages =3D 0; bool is_private_single_threaded; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; int nr_ptes; =20 @@ -350,6 +353,14 @@ static long change_pte_range(struct mmu_gather *tlb, /* Already in the desired state. */ if (prot_numa && pte_protnone(oldpte)) continue; + /* + * RWP-protected PTEs carry _PAGE_UFFD as a marker on + * top of PROT_NONE. Skip only entries already in that + * exact state; plain PROT_NONE from mprotect() still needs + * to be promoted so future faults can be distinguished. 
+ */ + if (uffd_rwp && pte_protnone(oldpte) && pte_uffd(oldpte)) + continue; =20 page =3D vm_normal_page(vma, addr, oldpte); if (page) @@ -358,6 +369,8 @@ static long change_pte_range(struct mmu_gather *tlb, /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. + * Skip this filter for uffd RWP which + * must set protnone regardless of NUMA placement. */ if (prot_numa && !folio_can_map_prot_numa(folio, vma, @@ -428,7 +441,7 @@ pgtable_split_needed(struct vm_area_struct *vma, unsign= ed long cp_flags) * (e.g. 2M shmem) because file thp is handled differently when * split by erasing the pmd so far. */ - return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma); + return (cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP)) && !vma_is_anonymous= (vma); } =20 /* @@ -667,7 +680,16 @@ long change_protection(struct mmu_gather *tlb, pgprot_t newprot =3D vma->vm_page_prot; long pages; =20 - BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL); + /* + * MM_CP_UFFD_{WP,RWP} and _RESOLVE are mutually exclusive within one + * change, and WP and RWP cannot mix. Miswired callers get a warn and + * a no-op; userspace cannot reach this state. 
+ */ + if (WARN_ON_ONCE((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL = || + (cp_flags & MM_CP_UFFD_RWP_ALL) =3D=3D MM_CP_UFFD_RWP_ALL || + ((cp_flags & MM_CP_UFFD_WP_ALL) && + (cp_flags & MM_CP_UFFD_RWP_ALL)))) + return 0; =20 #ifdef CONFIG_NUMA_BALANCING /* @@ -681,6 +703,10 @@ long change_protection(struct mmu_gather *tlb, WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA); #endif =20 + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE) && + (cp_flags & MM_CP_UFFD_RWP)) + newprot =3D PAGE_NONE; + if (is_vm_hugetlb_page(vma)) pages =3D hugetlb_change_protection(vma, start, end, newprot, cp_flags); --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E0AA388880 for ; Fri, 29 May 2026 17:27:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075685; cv=none; b=JlXj3oMWS66b6rxzBjU3kHPghbzvfnv4+m3xlRAVw7shk6xi6UWxGVDiBfILMZxQm/9kju28sWuzyqpgtnkaH31S7UdMWvBWMM1EsAiAiw6X8o3D5/lu5liqB6f+CsWQ4zHHzQVaNN92FyzsYN/ytalLPWcRdVakK4UPtdS50vE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075685; c=relaxed/simple; bh=CuPbOB5fozbLeaTHKSqaQDEs/YfR1XJ5KaLO1c1VXmQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pO2N1qH/jjYrC3TY3IUMhjSOJJeqBQMIsRpH0gyHL1s7+f3sY6Sl9YYE5OerEdtx95Kx13twLbMwGlhZDZsX4WS1E6JbFY8w1MQ+Urw254KtTA7XWv4fMVanPRlaa6wzuQIMHwqi6PGhysRTVxRWmTPKij+hlrUrKqWvVpe/XsU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=HqKZZa2d; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b="HqKZZa2d" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AB3F21F00898; Fri, 29 May 2026 17:27:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075678; bh=9Ppl2ZPX4UvVZnQvvVSRklO/lWgNtj1Bkp4Usu3NQaw=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=HqKZZa2d2OGofkBtiUTGxS3jZ1duKT9ti90cxKq0uuevIFbTgspAu4NI3IH5w9/oB ECbT/VKbyBQ3nYkBR9z+bwjpvDZCsvyzgvoel2+9YX2dY5CXVsOIgGCwzGsTabynLa ZGywN9wNLF02i8JgEAPq5bssOuCiJ/L1RVfS3rqcby9Q22ipqrIMu/X+m02dJsicYF PG+C6U33KjtAlIxdO/BN4yOmvLi7n5HXO0/4CU7U4rmOPslPv3W6D8ntIEl/L28W8i X2INdmFBXD29yLZjF+vacEdsmFEwkqgSn8I8vNZSjNRJ8X4mGk1AxtjPoPsHHAcI/r vIlYO7579/8Ew== Received: from phl-compute-01.internal (phl-compute-01.internal [10.202.2.41]) by mailfauth.phl.internal (Postfix) with ESMTP id 15FCCF4006F; Fri, 29 May 2026 13:27:57 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-01.internal (MEProxy); Fri, 29 May 2026 13:27:57 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTEUyOwNbDmHa1KNnZhhp0sknwyB5iYBikO499ls5PXcb2OBGZ+LtCI75nkUeooCiF UzrXo04XJ8+Ua/uZDk13qfrbBwlZ5p9xJmobKJQlMbnj7l7Wa1HVN1RPiFo+pMeAxt22yi 61vTsKtOSGFbPiuQJwzf8kDMk7i9oYKVPojpqHuO0mt23+5gFs2X9nq4e5eFTuainOibKw /PjyCpBsVGYk6ilHJDmI324wGNqP1tQqUfnSbHSEKXH/CNKWAMLzoIWnryvyvz0mzyGoSH 2C2M5+XUQ3GxwG6CgDLP16c5lhscyjq12y5muGouPR8MZ3kTeYPrBARrvcwg6KATek/ZXs cCvPInqY0Sp/i7lNL4RxfUXthtCDJrQIV9ngKt9EDyXdMUq3Jjz9XgitDq9mA/Tm6/uB+d 5x4eGn1CT54md7yppNJW2m6ov0aO2DxrjdXPOkRAKDOI50Wi6eAjyYv/FcPHV//UX5y2Q0 RMnVzZy2Hu9OQiyTRD2swS49NUSb9errqNQXzfoErJrdY6NtYyG0Z1pynuSXC0NpRnJ93N j7J/oz7vbpdgFSboAu2xkJfSa50XeMxvRwJCyKvE0AE7CRrekAGdbWKHXHURepMycdmay1 oXqtfWMO2Y8FSMsPOHFs7vFX+kRdfSH07nJ5kA/3PJyVKvFP5AwLCrtJzqNg X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:27:56 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, 
david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 07/15] mm: preserve RWP marker across PTE rewrites
Date: Fri, 29 May 2026 18:26:36 +0100
Message-ID: <20260529172716.357179-8-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The uffd PTE bit must survive any kernel path that rewrites a PTE on a
VM_UFFD_RWP VMA, otherwise the marker that carries PAGE_NONE semantics
is silently dropped and the next access leaks past RWP tracking. Wire
the preservation through every path that rewrites a VM_UFFD_RWP PTE.

Swap and device-exclusive: do_swap_page(), restore_exclusive_pte(), and
unuse_pte() (swapoff()) re-apply PAGE_NONE when the swap PTE carries the
uffd bit and the VMA has VM_UFFD_RWP.

Migration: remove_migration_pte() and remove_migration_pmd() do the same
after the migration entry is replaced with a real PTE/PMD.

Fork: __copy_present_ptes(), copy_present_page(), copy_nonpresent_pte(),
copy_huge_pmd(), copy_huge_non_present_pmd(), and
copy_hugetlb_page_range() keep the uffd bit on the child when the
destination VMA has VM_UFFD_RWP, matching the existing VM_UFFD_WP
handling. Add VM_UFFD_RWP to VM_COPY_ON_FORK so the flag itself
propagates.

mprotect(): change_pte_range() and change_huge_pmd() restore PAGE_NONE
after pte_modify()/pmd_modify() have recomputed the base protection from
a (possibly user-changed) vm_page_prot. pte_modify() preserves
_PAGE_UFFD, so the bit stays; we just have to force PAGE_NONE back on
top.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/mm.h |  3 ++-
 mm/huge_memory.c   | 47 +++++++++++++++++++++++++++++++++++++----
 mm/hugetlb.c       | 52 ++++++++++++++++++++++++++++++++++++++--------
 mm/memory.c        | 49 ++++++++++++++++++++++++++++++++++++-------
 mm/migrate.c       |  8 +++++++
 mm/mprotect.c      | 10 +++++++++
 mm/mremap.c        | 13 ++++++++++--
 mm/swapfile.c      |  5 +++++
 mm/userfaultfd.c   | 17 +++++++++++++++
 9 files changed, 181 insertions(+), 23 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 87b2fb1e3f23..3d4d5f9a6f1b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -683,7 +683,8 @@ enum {
  *                           only and thus cannot be reconstructed on page
  *                           fault.
*/ -#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_G= UARD) +#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_UFFD_RW= P | \ + VM_MAYBE_GUARD) =20 /* * mapping from the currently active vm_flags protection bits (the diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 40c65bf2d6dc..6417d883d2e4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1943,7 +1943,7 @@ static void copy_huge_non_present_pmd( add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - if (!userfaultfd_wp(dst_vma)) + if (!userfaultfd_protected(dst_vma)) pmd =3D pmd_swp_clear_uffd(pmd); set_pmd_at(dst_mm, addr, dst_pmd, pmd); } @@ -2038,9 +2038,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct m= m_struct *src_mm, out_zero_page: mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) + + /* See __copy_present_ptes(): restore accessible protection. */ + if (!userfaultfd_protected(dst_vma)) { + if (userfaultfd_rwp(src_vma) && pmd_uffd(pmd)) + pmd =3D pmd_modify(pmd, dst_vma->vm_page_prot); pmd =3D pmd_clear_uffd(pmd); + } + + pmdp_set_wrprotect(src_mm, addr, src_pmd); pmd =3D pmd_wrprotect(pmd); set_pmd: pmd =3D pmd_mkold(pmd); @@ -2626,8 +2632,16 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsig= ned long old_addr, pgtable_trans_huge_deposit(mm, new_pmd, pgtable); } pmd =3D move_soft_dirty_pmd(pmd); - if (vma_has_uffd_without_event_remap(vma)) + if (vma_has_uffd_without_event_remap(vma)) { + /* + * See __copy_present_ptes(): normalise RWP PMDs so + * the destination starts accessible instead of taking + * a numa-hinting fault on first access. 
+			 */
+			if (pmd_present(pmd) && userfaultfd_rwp(vma))
+				pmd = pmd_modify(pmd, vma->vm_page_prot);
 			pmd = clear_uffd_wp_pmd(pmd);
+		}
 		set_pmd_at(mm, new_addr, new_pmd, pmd);
 		if (force_flush)
 			flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
@@ -2766,6 +2780,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 */
 		entry = pmd_clear_uffd(entry);
 
+	/* See change_pte_range(): preserve RWP protection across mprotect() */
+	if (userfaultfd_rwp(vma) && pmd_uffd(entry))
+		entry = pmd_modify(entry, PAGE_NONE);
+
 	/* See change_pte_range(). */
 	if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
 	    can_change_pmd_writable(vma, addr, entry))
@@ -2933,6 +2951,13 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 		_dst_pmd = move_soft_dirty_pmd(src_pmdval);
 		_dst_pmd = clear_uffd_wp_pmd(_dst_pmd);
 	}
+
+	/* Re-arm RWP on the moved PMD if dst_vma is RWP-registered. */
+	if (userfaultfd_rwp(dst_vma)) {
+		_dst_pmd = pmd_modify(_dst_pmd, PAGE_NONE);
+		_dst_pmd = pmd_mkuffd(_dst_pmd);
+	}
+
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
@@ -3109,6 +3134,11 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 		entry = pte_mkspecial(entry);
 		if (pmd_uffd(old_pmd))
 			entry = pte_mkuffd(entry);
+
+		/* Restore PAGE_NONE so an RWP marker keeps trapping */
+		if (userfaultfd_rwp(vma) && pmd_uffd(old_pmd))
+			entry = pte_modify(entry, PAGE_NONE);
+
 		VM_BUG_ON(!pte_none(ptep_get(pte)));
 		set_pte_at(mm, addr, pte, entry);
 		pte++;
@@ -3383,6 +3413,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		if (uffd_wp)
 			entry = pte_mkuffd(entry);
 
+		/* Restore PAGE_NONE so an RWP marker keeps trapping */
+		if (userfaultfd_rwp(vma) && uffd_wp)
+			entry = pte_modify(entry, PAGE_NONE);
+
 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 
@@ -5055,6 +5089,11 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		pmde = pmd_mkwrite(pmde, vma);
 	if (pmd_swp_uffd(*pvmw->pmd))
 		pmde = pmd_mkuffd(pmde);
+
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pmd_swp_uffd(*pvmw->pmd) && userfaultfd_rwp(vma))
+		pmde = pmd_modify(pmde, PAGE_NONE);
+
 	if (!softleaf_is_migration_young(entry))
 		pmde = pmd_mkold(pmde);
 	/* NOTE: this may contain setting soft-dirty on some archs */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4d75b69d4272..0d8d39cd8888 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4843,8 +4843,16 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
 
 	__folio_mark_uptodate(new_folio);
 	hugetlb_add_new_anon_rmap(new_folio, vma, addr);
-	if (userfaultfd_wp(vma) && huge_pte_uffd(old))
+	if (userfaultfd_protected(vma) && huge_pte_uffd(old)) {
 		newpte = huge_pte_mkuffd(newpte);
+		/* Restore PAGE_NONE so the RWP marker keeps trapping. */
+		if (userfaultfd_rwp(vma)) {
+			unsigned int shift = huge_page_shift(hstate_vma(vma));
+
+			newpte = huge_pte_modify(newpte, PAGE_NONE);
+			newpte = arch_make_huge_pte(newpte, shift, vma->vm_flags);
+		}
+	}
 	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
@@ -4917,7 +4925,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 		softleaf = softleaf_from_pte(entry);
 		if (unlikely(softleaf_is_hwpoison(softleaf))) {
-			if (!userfaultfd_wp(dst_vma))
+			if (!userfaultfd_protected(dst_vma))
 				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(softleaf_is_migration(softleaf))) {
@@ -4931,11 +4939,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				softleaf = make_readable_migration_entry(
 							swp_offset(softleaf));
 				entry = swp_entry_to_pte(softleaf);
-				if (userfaultfd_wp(src_vma) && uffd)
+				if (userfaultfd_protected(src_vma) && uffd)
 					entry = pte_swp_mkuffd(entry);
 				set_huge_pte_at(src, addr, src_pte, entry, sz);
 			}
-			if (!userfaultfd_wp(dst_vma))
+			if (!userfaultfd_protected(dst_vma))
 				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(pte_is_marker(entry))) {
@@ -5000,6 +5008,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				goto next;
 			}
 
+			/* See __copy_present_ptes(): restore accessible protection. */
+			if (!userfaultfd_protected(dst_vma)) {
+				if (userfaultfd_rwp(src_vma) && huge_pte_uffd(entry)) {
+					entry = huge_pte_modify(entry, dst_vma->vm_page_prot);
+					entry = arch_make_huge_pte(entry, huge_page_shift(h),
+								   dst_vma->vm_flags);
+				}
+				entry = huge_pte_clear_uffd(entry);
+			}
+
 			if (cow) {
 				/*
 				 * No need to notify as we are downgrading page
@@ -5012,9 +5030,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				entry = huge_pte_wrprotect(entry);
 			}
 
-			if (!userfaultfd_wp(dst_vma))
-				entry = huge_pte_clear_uffd(entry);
-
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 			hugetlb_count_add(npages, dst);
 		}
@@ -5060,10 +5075,22 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 		huge_pte_clear(mm, new_addr, dst_pte, sz);
 	} else {
 		if (need_clear_uffd_wp) {
-			if (pte_present(pte))
+			if (pte_present(pte)) {
+				/*
+				 * See __copy_present_ptes(): normalise RWP
+				 * PTEs so the destination starts accessible
+				 * instead of taking a numa-hinting fault on
+				 * first access.
+				 */
+				if (userfaultfd_rwp(vma)) {
+					pte = huge_pte_modify(pte, vma->vm_page_prot);
+					pte = arch_make_huge_pte(pte, huge_page_shift(h),
+								 vma->vm_flags);
+				}
 				pte = huge_pte_clear_uffd(pte);
-			else
+			} else {
 				pte = pte_swp_clear_uffd(pte);
+			}
 		}
 		set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
 	}
@@ -6515,6 +6542,13 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_mkuffd(pte);
 			else if (uffd_wp_resolve || uffd_rwp_resolve)
 				pte = huge_pte_clear_uffd(pte);
+
+			/* Preserve RWP protection across mprotect() */
+			if (userfaultfd_rwp(vma) && huge_pte_uffd(pte)) {
+				pte = huge_pte_modify(pte, PAGE_NONE);
+				pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			}
+
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 			tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
diff --git a/mm/memory.c b/mm/memory.c
index c4fd5cb4a08f..06473285c0dc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -896,6 +896,10 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	if (pte_swp_uffd(orig_pte))
 		pte = pte_mkuffd(pte);
 
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(orig_pte) && userfaultfd_rwp(vma))
+		pte = pte_modify(pte, PAGE_NONE);
+
 	if ((vma->vm_flags & VM_WRITE) &&
 	    can_change_pte_writable(vma, address, pte)) {
 		if (folio_test_dirty(folio))
@@ -1041,7 +1045,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 				make_pte_marker(marker));
 		return 0;
 	}
-	if (!userfaultfd_wp(dst_vma))
+	if (!userfaultfd_protected(dst_vma))
 		pte = pte_swp_clear_uffd(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
@@ -1088,9 +1092,13 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/* All done, just insert the new page copy in the child */
 	pte = folio_mk_pte(new_folio, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
-	if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte)))
-		/* Uffd-wp needs to be delivered to dest pte as well */
+	if (userfaultfd_protected(dst_vma) && pte_uffd(ptep_get(src_pte))) {
+		/* The uffd bit needs to be delivered to the dest pte as well */
 		pte = pte_mkuffd(pte);
+		/* Restore PAGE_NONE so the RWP marker keeps trapping */
+		if (userfaultfd_rwp(dst_vma))
+			pte = pte_modify(pte, PAGE_NONE);
+	}
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -1100,9 +1108,31 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
 		pte_t pte, unsigned long addr, int nr)
 {
 	struct mm_struct *src_mm = src_vma->vm_mm;
+	bool writable;
+
+	/*
+	 * Snapshot writability before the RWP-disarm rewrite below: when the
+	 * child is not RWP-armed, pte_modify(pte, dst_vma->vm_page_prot) can
+	 * silently drop _PAGE_RW from a resolved (no-marker) writable PTE,
+	 * so a later pte_write(pte) check would skip the COW wrprotect and
+	 * leave the parent writable over a folio shared with the child.
+	 */
+	writable = pte_write(pte);
+
+	/*
+	 * Child is not RWP-armed: restore accessible protection so the
+	 * inherited PAGE_NONE does not cost a fault on first read. Gate on
+	 * pte_uffd(pte) so unrelated PAGE_NONE markers (e.g. NUMA balancing)
+	 * are not normalised away.
+	 */
+	if (!userfaultfd_protected(dst_vma)) {
+		if (userfaultfd_rwp(src_vma) && pte_uffd(pte))
+			pte = pte_modify(pte, dst_vma->vm_page_prot);
+		pte = pte_clear_uffd(pte);
+	}
 
 	/* If it's a COW mapping, write protect it both processes. */
-	if (is_cow_mapping(src_vma->vm_flags) && pte_write(pte)) {
+	if (is_cow_mapping(src_vma->vm_flags) && writable) {
 		wrprotect_ptes(src_mm, addr, src_pte, nr);
 		pte = pte_wrprotect(pte);
 	}
@@ -1112,9 +1142,6 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
-	if (!userfaultfd_wp(dst_vma))
-		pte = pte_clear_uffd(pte);
-
 	set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr);
 }
 
@@ -5041,6 +5068,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (pte_swp_uffd(vmf->orig_pte))
 		pte = pte_mkuffd(pte);
 
+	/*
+	 * A page reclaimed while RWP-protected carries the uffd bit on
+	 * its swap entry. Re-apply PAGE_NONE on swap-in so the first access
+	 * still traps as an RWP fault. pte_modify() preserves _PAGE_UFFD.
+	 */
+	if (pte_swp_uffd(vmf->orig_pte) && userfaultfd_rwp(vma))
+		pte = pte_modify(pte, PAGE_NONE);
+
 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
 	 * certainly not shared either because we just allocated them without
diff --git a/mm/migrate.c b/mm/migrate.c
index 4bdb5be7afbf..8d7fd0b056b6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -329,6 +329,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
 	if (pte_swp_uffd(old_pte))
 		newpte = pte_mkuffd(newpte);
 
+	/* See remove_migration_pte(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(old_pte) && userfaultfd_rwp(pvmw->vma))
+		newpte = pte_modify(newpte, PAGE_NONE);
+
 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
 
 	dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
@@ -394,6 +398,10 @@ static bool remove_migration_pte(struct folio *folio,
 		else if (pte_swp_uffd(old_pte))
 			pte = pte_mkuffd(pte);
 
+		/* See do_swap_page(): restore PAGE_NONE for RWP */
+		if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma))
+			pte = pte_modify(pte, PAGE_NONE);
+
 		if (folio_test_anon(folio) && !softleaf_is_migration_read(entry))
 			rmap_flags |= RMAP_EXCLUSIVE;
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7dcc94e7bfd6..cc85a8862c28 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -296,6 +296,16 @@ static __always_inline void change_present_ptes(struct mmu_gather *tlb,
 		else if (uffd_prot_resolve)
 			ptent = pte_clear_uffd(ptent);
 
+		/*
+		 * The uffd bit on a VM_UFFD_RWP VMA carries PROT_NONE
+		 * semantics. If mprotect() or NUMA hinting changed the
+		 * base protection, restore PAGE_NONE so the PTE still
+		 * traps on any access. pte_modify() preserves
+		 * _PAGE_UFFD.
+		 */
+		if (userfaultfd_rwp(vma) && pte_uffd(ptent))
+			ptent = pte_modify(ptent, PAGE_NONE);
+
 		/*
 		 * In some writable, shared mappings, we might want
 		 * to catch actual write access -- see
diff --git a/mm/mremap.c b/mm/mremap.c
index 12732a5c547e..8a46ec5831c8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -296,10 +296,19 @@ static int move_ptes(struct pagetable_move_control *pmc,
 			pte_clear(mm, new_addr, new_ptep);
 		else {
 			if (need_clear_uffd_wp) {
-				if (pte_present(pte))
+				if (pte_present(pte)) {
+					/*
+					 * See __copy_present_ptes(): normalise
+					 * RWP PTEs so the destination starts
+					 * accessible instead of taking a
+					 * numa-hinting fault on first access.
+					 */
+					if (userfaultfd_rwp(vma) && pte_uffd(pte))
+						pte = pte_modify(pte, vma->vm_page_prot);
 					pte = pte_clear_uffd(pte);
-				else
+				} else {
 					pte = pte_swp_clear_uffd(pte);
+				}
 			}
 			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
 		}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 15fdca2da1f7..27cc299ead9b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2559,6 +2559,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		new_pte = pte_mksoft_dirty(new_pte);
 	if (pte_swp_uffd(old_pte))
 		new_pte = pte_mkuffd(new_pte);
+
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma))
+		new_pte = pte_modify(new_pte, PAGE_NONE);
+
 setpte:
 	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry)));
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9d74be69873a..e30878e4e00b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1285,6 +1285,13 @@ static long move_present_ptes(struct mm_struct *mm,
 		if (pte_dirty(orig_src_pte))
 			orig_dst_pte = pte_mkdirty(orig_dst_pte);
 		orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
+
+		/* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
+		if (userfaultfd_rwp(dst_vma)) {
+			orig_dst_pte = pte_modify(orig_dst_pte, PAGE_NONE);
+			orig_dst_pte = pte_mkuffd(orig_dst_pte);
+		}
+
 		set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
 
 		src_addr += PAGE_SIZE;
@@ -1366,6 +1373,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
 	orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
 	if (pgtable_supports_soft_dirty())
 		orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+	/* Re-arm RWP on the moved swap entry if dst_vma is RWP-registered. */
+	if (userfaultfd_rwp(dst_vma))
+		orig_src_pte = pte_swp_mkuffd(orig_src_pte);
 	set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
 	double_pt_unlock(dst_ptl, src_ptl);
 
@@ -1392,6 +1402,13 @@ static int move_zeropage_pte(struct mm_struct *mm,
 
 	zero_pte = pte_mkspecial(pfn_pte(zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
+
+	/* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
+	if (userfaultfd_rwp(dst_vma)) {
+		zero_pte = pte_modify(zero_pte, PAGE_NONE);
+		zero_pte = pte_mkuffd(zero_pte);
+	}
+
 	ptep_clear_flush(src_vma, src_addr, src_pte);
 	set_pte_at(mm, dst_addr, dst_pte, zero_pte);
 	double_pt_unlock(dst_ptl, src_ptl);
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 08/15] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP
Date: Fri, 29 May 2026 18:26:37 +0100
Message-ID: <20260529172716.357179-9-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Three mm paths outside the fault handler gate on the uffd PTE bit today:
khugepaged (skip collapse on ranges carrying markers), rmap (cap unmap
batching), and GUP (force a fault through gup_can_follow_protnone).
Extend each to treat VM_UFFD_RWP the same as VM_UFFD_WP; otherwise
per-PTE RWP state is silently destroyed or bypassed.

khugepaged: try_collapse_pte_mapped_thp() and
file_backed_vma_is_retractable() already refuse to collapse or retract
page tables on ranges carrying the uffd PTE bit. Broaden the VMA
predicate from userfaultfd_wp() to userfaultfd_protected() so
VM_UFFD_RWP ranges get the same protection. hpage_collapse_scan_pmd()
needs no change: its existing pte_uffd() check already catches an RWP
PTE because it carries the uffd bit.

rmap: folio_unmap_pte_batch() caps batching at 1 for VM_UFFD_RWP so the
restore path handles each PTE with its own marker.

GUP: gup_can_follow_protnone() forces a fault on VM_UFFD_RWP VMAs
regardless of FOLL_HONOR_NUMA_FAULT. RWP uses protnone as an
access-tracking marker, not for NUMA hinting, so any GUP, read or
write, must go through the userfaultfd fault path.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft)
Reviewed-by: Lorenzo Stoakes
---
 include/linux/mm.h | 16 +++++++++++++++-
 mm/khugepaged.c    | 18 +++++++++++-------
 mm/rmap.c          |  2 +-
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3d4d5f9a6f1b..2b04f690b516 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4644,11 +4644,25 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
 
 /*
  * Indicates whether GUP can follow a PROT_NONE mapped page, or whether
- * a (NUMA hinting) fault is required.
+ * a (NUMA hinting or userfaultfd RWP) fault is required.
  */
 static inline bool gup_can_follow_protnone(const struct vm_area_struct *vma,
 					   unsigned int flags)
 {
+	/*
+	 * VM_UFFD_RWP uses protnone as an access-tracking marker, not for
+	 * NUMA hinting. GUP must always take a fault so the access is
+	 * delivered to userfaultfd, regardless of FOLL_HONOR_NUMA_FAULT.
+	 *
+	 * Only do so while the VMA is accessible. If it has been made
+	 * inaccessible (e.g. mprotect(PROT_NONE)), fall through to the guard
+	 * below: forcing a fault there would loop, as handle_mm_fault() makes
+	 * no progress on protnone in an inaccessible VMA, and the access is
+	 * denied regardless of RWP anyway.
+	 */
+	if ((vma->vm_flags & VM_UFFD_RWP) && vma_is_accessible(vma))
+		return false;
+
 	/*
 	 * If callers don't want to honor NUMA hinting faults, no need to
 	 * determine if we would actually have to trigger a NUMA hinting fault.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index afa218be15de..4f3fedcd75cf 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1895,8 +1895,11 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
 	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 
-	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
-	if (userfaultfd_wp(vma))
+	/*
+	 * Keep pmd pgtable while the uffd bit is in use; see comment in
+	 * retract_page_tables().
+	 */
+	if (userfaultfd_protected(vma))
 		return SCAN_PTE_UFFD;
 
 	folio = filemap_lock_folio(vma->vm_file->f_mapping,
@@ -2109,13 +2112,14 @@ static bool file_backed_vma_is_retractable(struct vm_area_struct *vma)
 		return false;
 
 	/*
-	 * When a vma is registered with uffd-wp, we cannot recycle
+	 * When a vma is registered with uffd-wp or RWP, we cannot recycle
 	 * the page table because there may be pte markers installed.
-	 * Other vmas can still have the same file mapped hugely, but
-	 * skip this one: it will always be mapped in small page size
-	 * for uffd-wp registered ranges.
+	 * VM_UFFD_RWP ranges similarly rely on per-PTE uffd state
+	 * and cannot be recycled to a shared PMD. Other vmas can still
+	 * have the same file mapped hugely, but skip this one: it will
+	 * always be mapped in small page size for these registrations.
 	 */
-	if (userfaultfd_wp(vma))
+	if (userfaultfd_protected(vma))
 		return false;
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 546bc1cf9391..9fb733489898 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1965,7 +1965,7 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
-	if (userfaultfd_wp(vma))
+	if (userfaultfd_protected(vma))
 		return 1;
 
 	/*
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 09/15] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing
Date: Fri, 29 May 2026 18:26:38 +0100
Message-ID: <20260529172716.357179-10-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add the userspace interface for read-write protection tracking:

  - UFFDIO_REGISTER_MODE_RWP   register a range for RWP tracking
  - UFFD_FEATURE_RWP           capability bit
  - UFFDIO_RWPROTECT           install / remove RWP on a range

Introduce CONFIG_USERFAULTFD_RWP, auto-selected on 64-bit kernels with
ARCH_HAS_PTE_PROTNONE and HAVE_ARCH_USERFAULTFD_WP. The symbol gates
VM_UFFD_RWP (previously aliased to VM_NONE) and the smaps/trace-flag
hooks added in the preparatory patches; without it the UAPI bits added
here have nothing to drive and would be unreachable.

Registration sets VM_UFFD_RWP on the VMA. Combining MODE_WP with
MODE_RWP is rejected because both modes claim the uffd PTE bit.

UFFDIO_RWPROTECT is the bidirectional counterpart of
UFFDIO_WRITEPROTECT:

  - MODE_RWP    change_protection() with MM_CP_UFFD_RWP installs
                PAGE_NONE and sets the uffd bit on present PTEs
  - !MODE_RWP   change_protection() with MM_CP_UFFD_RWP_RESOLVE
                restores vma->vm_page_prot and clears the bit

userfaultfd_clear_vma() runs the same resolve pass on unregister so RWP
state cannot outlive the uffd.

Re-registering a range must not drop a mode that installs per-PTE
markers (WP or RWP); doing so returns -EBUSY.
This also closes a pre-existing window where re-registering without
MODE_WP would strand uffd-wp markers: before, those caused extra
write-faults but were otherwise benign; with RWP preservation in place,
a subsequent mprotect() on a VM_UFFD_RWP VMA would silently promote the
stale markers to RWP.

The feature is not yet advertised. UFFDIO_REGISTER_MODE_RWP,
UFFD_FEATURE_RWP, and _UFFDIO_RWPROTECT are intentionally absent from
UFFD_API_REGISTER_MODES, UFFD_API_FEATURES, and UFFD_API_RANGE_IOCTLS,
so UFFDIO_API masks them out and the register-mode validator rejects
the bit. The follow-up patch adds fault dispatch and exposes the UAPI.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 Documentation/admin-guide/mm/userfaultfd.rst |  10 +
 include/linux/userfaultfd_k.h                |   2 +
 include/uapi/linux/userfaultfd.h             |  19 ++
 mm/Kconfig                                   |   9 +
 mm/userfaultfd.c                             | 189 ++++++++++++++++++-
 5 files changed, 226 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..1e533639fd50 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -131,6 +131,16 @@ userfaults on the range registered. Not all ioctls will necessarily be
 supported for all memory types (e.g. anonymous memory vs. shmem vs.
 hugetlbfs), or all types of intercepted faults.
 
+.. note::
+
+   Re-registering an already-registered range must not drop any of the
+   modes that install per-PTE markers -- currently
+   ``UFFDIO_REGISTER_MODE_WP`` and ``UFFDIO_REGISTER_MODE_RWP``. Doing
+   so would strand markers with no flag to describe them, so the call
+   is rejected with ``-EBUSY``; userspace must issue
+   ``UFFDIO_UNREGISTER`` first. This differs from older kernels, which
+   silently replaced the mode bits on re-registration.
+
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
 memory from the ``userfaultfd`` registered range). This means a userfault
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 5115827981a2..8e0833e6613f 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -150,6 +150,8 @@ static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_at
 
 extern long uffd_wp_range(struct vm_area_struct *vma,
 			  unsigned long start, unsigned long len, bool enable_wp);
+extern int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start,
+			    unsigned long len, bool enable_rwp);
 
 /* move_pages */
 void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2841e4ea8f2c..7b78aa3b5318 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -79,6 +79,7 @@
 #define _UFFDIO_WRITEPROTECT		(0x06)
 #define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_POISON			(0x08)
+#define _UFFDIO_RWPROTECT		(0x09)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -103,6 +104,8 @@
 				      struct uffdio_continue)
 #define UFFDIO_POISON		_IOWR(UFFDIO, _UFFDIO_POISON, \
 				      struct uffdio_poison)
+#define UFFDIO_RWPROTECT	_IOWR(UFFDIO, _UFFDIO_RWPROTECT, \
+				      struct uffdio_rwprotect)
 
 /* read() structure */
 struct uffd_msg {
@@ -158,6 +161,7 @@ struct uffd_msg {
 #define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* If this was a write fault */
 #define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* If reason is VM_UFFD_WP */
 #define UFFD_PAGEFAULT_FLAG_MINOR	(1<<2)	/* If reason is VM_UFFD_MINOR */
+#define UFFD_PAGEFAULT_FLAG_RWP		(1<<3)	/* If reason is VM_UFFD_RWP */
 
 struct uffdio_api {
 	/* userland asks for an API number and the features to enable */
@@ -230,6 +234,11 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MOVE indicates that the kernel supports moving an
* existing page contents from userspace. + * + * UFFD_FEATURE_RWP indicates that the kernel supports + * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking. + * Pages are made inaccessible via UFFDIO_RWPROTECT and faults + * are delivered when the pages are re-accessed. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -248,6 +257,7 @@ struct uffdio_api { #define UFFD_FEATURE_POISON (1<<14) #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) +#define UFFD_FEATURE_RWP (1<<17) __u64 features; =20 __u64 ioctls; @@ -263,6 +273,7 @@ struct uffdio_register { #define UFFDIO_REGISTER_MODE_MISSING ((__u64)1<<0) #define UFFDIO_REGISTER_MODE_WP ((__u64)1<<1) #define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2) +#define UFFDIO_REGISTER_MODE_RWP ((__u64)1<<3) __u64 mode; =20 /* @@ -356,6 +367,14 @@ struct uffdio_poison { __s64 updated; }; =20 +struct uffdio_rwprotect { + struct uffdio_range range; + /* !RWP means undo RWP-protection */ +#define UFFDIO_RWPROTECT_MODE_RWP ((__u64)1<<0) +#define UFFDIO_RWPROTECT_MODE_DONTWAKE ((__u64)1<<1) + __u64 mode; +}; + struct uffdio_move { __u64 dst; __u64 src; diff --git a/mm/Kconfig b/mm/Kconfig index 776b67c66e82..fac01bcfc0d1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1333,6 +1333,15 @@ config HAVE_ARCH_USERFAULTFD_MINOR help Arch has userfaultfd minor fault support =20 +config USERFAULTFD_RWP + def_bool y + depends on 64BIT && ARCH_HAS_PTE_PROTNONE && HAVE_ARCH_USERFAULTFD_WP + help + Userfaultfd read-write protection (UFFDIO_RWPROTECT) delivers a + userfaultfd notification on every access -- read or write -- to a + protected range, letting userspace observe the working set of a + process. 
+ menuconfig USERFAULTFD bool "Enable userfaultfd() system call" depends on MMU diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e30878e4e00b..c07e3232a01a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1157,6 +1157,75 @@ static int mwriteprotect_range(struct userfaultfd_ct= x *ctx, unsigned long start, return err; } =20 +int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start, + unsigned long len, bool enable_rwp) +{ + struct mm_struct *dst_mm =3D ctx->mm; + unsigned long end =3D start + len; + struct vm_area_struct *dst_vma; + unsigned int mm_cp_flags; + struct mmu_gather tlb; + bool found =3D false; + VMA_ITERATOR(vmi, dst_mm, start); + + VM_WARN_ON_ONCE(start & ~PAGE_MASK); + VM_WARN_ON_ONCE(len & ~PAGE_MASK); + VM_WARN_ON_ONCE(start + len <=3D start); + + guard(mmap_read_lock)(dst_mm); + guard(rwsem_read)(&ctx->map_changing_lock); + + if (atomic_read(&ctx->mmap_changing)) + return -EAGAIN; + + if (enable_rwp) + mm_cp_flags =3D MM_CP_UFFD_RWP; + else + mm_cp_flags =3D MM_CP_UFFD_RWP_RESOLVE; + + /* + * Pre-scan the range: validate every spanned VMA before applying + * any change_protection() so a partial failure cannot leave the + * process with only a prefix of the range re-protected. 
+ */ + for_each_vma_range(vmi, dst_vma, end) { + if (!userfaultfd_rwp(dst_vma)) + return -ENOENT; + + if (is_vm_hugetlb_page(dst_vma)) { + unsigned long page_mask; + + page_mask =3D vma_kernel_pagesize(dst_vma) - 1; + if ((start & page_mask) || (len & page_mask)) + return -EINVAL; + } + found =3D true; + } + if (!found) + return -ENOENT; + + vma_iter_set(&vmi, start); + tlb_gather_mmu(&tlb, dst_mm); + for_each_vma_range(vmi, dst_vma, end) { + unsigned long vma_start =3D max(dst_vma->vm_start, start); + unsigned long vma_end =3D min(dst_vma->vm_end, end); + unsigned int flags =3D mm_cp_flags; + + /* + * On resolve, try to upgrade writability per-VMA -- + * MM_CP_TRY_CHANGE_WRITABLE WARNs in + * maybe_change_pte_writable() if the VMA is not VM_WRITE, + * and RWP can be registered on PROT_READ-only mappings. + */ + if (!enable_rwp && vma_wants_manual_pte_write_upgrade(dst_vma)) + flags |=3D MM_CP_TRY_CHANGE_WRITABLE; + + change_protection(&tlb, dst_vma, vma_start, vma_end, flags); + } + tlb_finish_mmu(&tlb); + + return 0; +} =20 void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2) @@ -2145,6 +2214,15 @@ static bool vma_can_userfault(struct vm_area_struct = *vma, vm_flags_t vm_flags, !vma_is_anonymous(vma)) return false; =20 + /* + * RWP uses protnone as an access-tracking marker. PROT_NONE VMAs + * have vm_page_prot =3D=3D PAGE_NONE, so RWP resolution can't make a + * page accessible -- the next access would fault again. Reject up + * front instead of letting FOLL_FORCE loop on protnone+uffd PTEs. 
+ */ + if ((vm_flags & VM_UFFD_RWP) && !vma_is_accessible(vma)) + return false; + return ops->can_userfault(vma, vm_flags); } =20 @@ -2197,9 +2275,22 @@ static struct vm_area_struct *userfaultfd_clear_vma(= struct vma_iterator *vmi, if (start =3D=3D vma->vm_start && end =3D=3D vma->vm_end) give_up_on_oom =3D true; =20 - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, start, end - start, false); + /* Clear the uffd bit and/or restore protnone PTEs */ + if (userfaultfd_protected(vma)) { + unsigned int mm_cp_flags =3D 0; + struct mmu_gather tlb; + + if (userfaultfd_wp(vma)) + mm_cp_flags |=3D MM_CP_UFFD_WP_RESOLVE; + if (userfaultfd_rwp(vma)) + mm_cp_flags |=3D MM_CP_UFFD_RWP_RESOLVE; + if (vma_wants_manual_pte_write_upgrade(vma)) + mm_cp_flags |=3D MM_CP_TRY_CHANGE_WRITABLE; + + tlb_gather_mmu(&tlb, vma->vm_mm); + change_protection(&tlb, vma, start, end, mm_cp_flags); + tlb_finish_mmu(&tlb); + } =20 ret =3D vma_modify_flags_uffd(vmi, prev, vma, start, end, &new_vma_flags, NULL_VM_UFFD_CTX, @@ -2248,6 +2339,14 @@ static int userfaultfd_register_range(struct userfau= ltfd_ctx *ctx, vma_test_all_mask(vma, vma_flags)) goto skip; =20 + /* + * Pre-scan in userfaultfd_register() already rejected mode + * switches that would drop VM_UFFD_WP or VM_UFFD_RWP, so a + * stray bit here is a bug. 
+ */ + VM_WARN_ON_ONCE(vma->vm_userfaultfd_ctx.ctx =3D=3D ctx && + vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags); + if (vma->vm_start > start) start =3D vma->vm_start; vma_end =3D min(end, vma->vm_end); @@ -2514,6 +2613,8 @@ static inline struct uffd_msg userfault_msg(unsigned = long address, msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_WRITE; if (reason & VM_UFFD_WP) msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_WP; + if (reason & VM_UFFD_RWP) + msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_RWP; if (reason & VM_UFFD_MINOR) msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_MINOR; if (features & UFFD_FEATURE_THREAD_ID) @@ -3613,6 +3714,22 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, =20 vm_flags |=3D VM_UFFD_WP; } + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP) { + if (!pgtable_supports_uffd() || VM_UFFD_RWP =3D=3D VM_NONE) + goto out; + if (!(ctx->features & UFFD_FEATURE_RWP)) + goto out; + vm_flags |=3D VM_UFFD_RWP; + } + + /* + * WP and RWP share the uffd PTE bit and + * cannot coexist in the same VMA =E2=80=94 the bit would carry ambiguous + * semantics. Reject the combination up front. + */ + if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP)) + goto out; + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) { #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR goto out; @@ -3706,6 +3823,16 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, cur->vm_userfaultfd_ctx.ctx !=3D ctx) goto out_unlock; =20 + /* + * Mode switches that drop VM_UFFD_WP or VM_UFFD_RWP would + * leave PTE markers without the flag that describes them; + * subsequent mprotect() would then promote stale markers + * into the other mode. Require an unregister first. 
+ */ + if (cur->vm_userfaultfd_ctx.ctx =3D=3D ctx && + cur->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags) + goto out_unlock; + /* * Note vmas containing huge pages */ @@ -3739,6 +3866,10 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR)) ioctls_out &=3D ~((__u64)1 << _UFFDIO_CONTINUE); =20 + /* RWPROTECT is only supported for RWP ranges */ + if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP)) + ioctls_out &=3D ~((__u64)1 << _UFFDIO_RWPROTECT); + /* * Now that we scanned all vmas we can already tell * userland which ioctls methods are guaranteed to @@ -4086,6 +4217,55 @@ static int userfaultfd_writeprotect(struct userfault= fd_ctx *ctx, return ret; } =20 +static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx, + unsigned long arg) +{ + int ret; + struct uffdio_rwprotect uffdio_rwp; + struct userfaultfd_wake_range range; + bool mode_rwp, mode_dontwake; + + if (atomic_read(&ctx->mmap_changing)) + return -EAGAIN; + + if (copy_from_user(&uffdio_rwp, (void __user *)arg, + sizeof(uffdio_rwp))) + return -EFAULT; + + ret =3D validate_range(ctx->mm, uffdio_rwp.range.start, + uffdio_rwp.range.len); + if (ret) + return ret; + + if (uffdio_rwp.mode & ~(UFFDIO_RWPROTECT_MODE_DONTWAKE | + UFFDIO_RWPROTECT_MODE_RWP)) + return -EINVAL; + + mode_rwp =3D uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_RWP; + mode_dontwake =3D uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_DONTWAKE; + + if (mode_rwp && mode_dontwake) + return -EINVAL; + + if (mmget_not_zero(ctx->mm)) { + ret =3D mrwprotect_range(ctx, uffdio_rwp.range.start, + uffdio_rwp.range.len, mode_rwp); + mmput(ctx->mm); + } else { + return -ESRCH; + } + + if (ret) + return ret; + + if (!mode_rwp && !mode_dontwake) { + range.start =3D uffdio_rwp.range.start; + range.len =3D uffdio_rwp.range.len; + wake_userfault(ctx, &range); + } + return ret; +} + static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long= arg) { __s64 ret; @@ -4392,6 
+4572,9 @@ static long userfaultfd_ioctl(struct file *file, unsi= gned cmd, case UFFDIO_POISON: ret =3D userfaultfd_poison(ctx, arg); break; + case UFFDIO_RWPROTECT: + ret =3D userfaultfd_rwprotect(ctx, arg); + break; } return ret; } --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D1D1738C438; Fri, 29 May 2026 17:28:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075690; cv=none; b=Gcz3QYRjDEjbidsvGFQZ6CZT3t/WaxHm/mGULY2BdV0/wAiDtP91f8Qr/ntxCThHrHVZYh/C7RJS6AGEpfBzhuQxUWkbhp8u/hEo9+fKrTF8AHa2Q/JML1cZrAgfG+HbDeD1zny26hYNS0cBOF9hFun1ZOkABG3EZkORW7Rpg5Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075690; c=relaxed/simple; bh=/LRnsmd+xoD+KOjbYmU9DTOanGR5cBLbECCLD43M4PE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TYHjp4DPprENqEKY2NsUCFN4/NtlFq7PxrqU2kX9Iw8HvoMo4eomLWuJKKW1IPFjM9Mm133+KRPS4OE0f+b6YQEZAoY4Wp5axnSWMJvZOUgMi+n5Nlxu1dcybR0Oj6z8uxLcC0eggxtMd/pREDk7XXkdSZViZ9toJnS47owlaUc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a4WWIxCX; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a4WWIxCX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0011E1F0089A; Fri, 29 May 2026 17:28:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075684; bh=gv2+pRc1U6osORJP228ODxtBMek/urQFSJbHm0PtvWw=; h=From:To:Cc:Subject:Date:In-Reply-To:References; 
b=a4WWIxCXDnrZqTI1Qe+ZF2iXjRAi916ExUZ9vROr5qeOcQQFpFUrlbseSQ4pNCfgT 0yfOxP+eGUllXaX6F8Hrd+V2CIZ0xeeARqbNGouZMA+kGMZ5Xee4/gHstleug2lm+K NnlLPYO58Ak7pkHvvtFd0jfJxi6lenSed0y63UW3QHx88rx8mnVgmWBaCFo3f3R7Ku HxRf9Ont7gmTfgkwT6gnAHuTvLAEKiBFw3jeeG9OUNwfAVkqi2IeYxiBlWpcSymn+z IPULPjwm1xkQC2f8WOrcImMMFAvgrcc59KX1rPXEBmVk4EjB/8E1SvGt2pHHZoIdvo L75XMon651J4Q== Received: from phl-compute-01.internal (phl-compute-01.internal [10.202.2.41]) by mailfauth.phl.internal (Postfix) with ESMTP id 5DE70F4006F; Fri, 29 May 2026 13:28:03 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-01.internal (MEProxy); Fri, 29 May 2026 13:28:03 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTGEJshrF8A2I+ABT1+PcGe92P4pFG6+cE0vklYqRrHb+mEFiJirgfeISxfctvyDGZ GD+y1IeZv35tmnbzf3LLnXzuWli8IsJO39uylVNJfpIhw1RwqSaiqgJqpRIr+kvRh0VYKQ hEgzmgZT3VRasc06vpC33AigH7tICC2Iwt4d1dJuL7T8m+/ijDYDWTAi4mKbspJqr+bVrb MSH1cVn2xbuEamsG3tXwf9pNNkU33/NFHfeivk6kEXasVk0NjqFjFU9J4z/SYAXJRjmQ8p iEkseBPErjuJTuMKszjhjApDifllPe/w00800msCp6mk2mwgOI/kxoE6f7Uw227lFWaunq UK+PxxVafsjvjp98sZ4Vff8Pytl5ElNRoPtVaTtpoNEzigSzPNDIUbs5MldlpGCqGrNKx3 I1YS+TPjb0Zja8i605HxL19GV0P5r3ldivKTTByIOwBMw7eG/K4ljoUXm3VQkm/I0buwKX NUHj9lTPrY73szOibjqKsPpNwyzrRCHdlxn6PQZgscJDpQCRTPSQzjf7Cd3l8Jt4ymgobP rCBTt58EejelLFPzXMfrx1PGSeO2XZQc0jESDWn2J+ZvMS47P4bOyDvUuxei1rUjmKcANR adir3XVGGC7J60lYe4e6U6X00JCJqVIzUKWBXtBvkR02NT9DRDbtfqz3qmrw X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:28:02 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, 
 linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 10/15] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP
Date: Fri, 29 May 2026 18:26:39 +0100
Message-ID: <20260529172716.357179-11-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Wire the fault side of read-write protection tracking and turn the
userspace interface on.

An RWP-protected PTE is PAGE_NONE with the uffd bit set. The PAGE_NONE
protection triggers a fault on any access; the uffd bit distinguishes
the PTE from plain mprotect(PROT_NONE) or NUMA hinting.

Fault dispatch, per level:

  PTE      handle_pte_fault()   -> do_uffd_rwp()
  PMD      __handle_mm_fault()  -> do_huge_pmd_uffd_rwp()
  hugetlb  hugetlb_fault()      -> hugetlb_handle_userfault()

The RWP branches gate on userfaultfd_pte_rwp() /
userfaultfd_huge_pmd_rwp() (VM_UFFD_RWP plus the uffd bit) and fall
through to do_numa_page() / do_huge_pmd_numa_page() otherwise. Each
RWP branch delivers a UFFD_PAGEFAULT_FLAG_RWP message through
handle_userfault(); the userspace handler resolves the fault by calling
UFFDIO_RWPROTECT with UFFDIO_RWPROTECT_MODE_RWP cleared.

userfaultfd_must_wait() and userfaultfd_huge_must_wait() add matching
protnone+uffd waiters so sync-mode fault handlers block correctly.

Expose the UAPI:

  UFFDIO_REGISTER_MODE_RWP  -> UFFD_API_REGISTER_MODES
  UFFD_FEATURE_RWP          -> UFFD_API_FEATURES
  _UFFDIO_RWPROTECT         -> UFFD_API_RANGE_IOCTLS
                               UFFD_API_RANGE_IOCTLS_BASIC

UFFD_FEATURE_RWP is masked out at UFFDIO_API time when PROT_NONE is not
available or VM_UFFD_RWP aliases VM_NONE (32-bit), so userspace never
sees an advertised-but-broken feature.

Works on anonymous, shmem, and hugetlb memory.
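For illustration only (not part of the patch), the PTE-level dispatch
described above can be modelled in plain userspace C. This is a sketch
under stated assumptions: every model_* name is hypothetical and merely
stands in for the kernel predicate named in its comment, and the model
ignores non-present PTEs, locking, and the PMD/hugetlb levels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types: none of the model_* names exist in the kernel. */
enum model_fault_path {
	MODEL_PATH_UFFD_RWP,	/* stands in for do_uffd_rwp() */
	MODEL_PATH_NUMA,	/* stands in for do_numa_page() */
	MODEL_PATH_OTHER,	/* every other fault path, not modelled here */
};

struct model_vma {
	bool accessible;	/* mirrors vma_is_accessible() */
	bool uffd_rwp;		/* mirrors userfaultfd_rwp(), i.e. VM_UFFD_RWP */
};

struct model_pte {
	bool protnone;		/* mirrors pte_protnone() */
	bool uffd;		/* mirrors pte_uffd() */
};

/* Same branch order as the protnone check in handle_pte_fault(). */
enum model_fault_path model_pte_fault_path(struct model_vma vma,
					   struct model_pte pte)
{
	if (!pte.protnone || !vma.accessible)
		return MODEL_PATH_OTHER;
	if (vma.uffd_rwp && pte.uffd)
		return MODEL_PATH_UFFD_RWP;
	return MODEL_PATH_NUMA;
}

/* Walks the cases called out in the description above. */
void model_fault_path_selfcheck(void)
{
	struct model_vma rwp_vma = { .accessible = true, .uffd_rwp = true };
	struct model_vma wp_vma = { .accessible = true, .uffd_rwp = false };
	struct model_pte armed = { .protnone = true, .uffd = true };
	struct model_pte hinting = { .protnone = true, .uffd = false };

	/* protnone + uffd bit on a VM_UFFD_RWP VMA: RWP fault */
	assert(model_pte_fault_path(rwp_vma, armed) == MODEL_PATH_UFFD_RWP);
	/* protnone without the uffd bit is still NUMA hinting */
	assert(model_pte_fault_path(rwp_vma, hinting) == MODEL_PATH_NUMA);
	/* uffd bit without VM_UFFD_RWP (e.g. a WP VMA) is not an RWP fault */
	assert(model_pte_fault_path(wp_vma, armed) == MODEL_PATH_NUMA);
}
```

Calling model_fault_path_selfcheck() from a small main() exercises the
three cases. The point of the sketch is the ordering: the VMA flag and
the PTE marker must both be present before a protnone fault is routed
to userfaultfd, otherwise it stays on the NUMA hinting path.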
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- include/linux/huge_mm.h | 7 +++++++ include/linux/userfaultfd_k.h | 24 ++++++++++++++++++++++++ include/uapi/linux/userfaultfd.h | 12 ++++++++---- mm/huge_memory.c | 5 +++++ mm/hugetlb.c | 11 +++++++++++ mm/memory.c | 31 +++++++++++++++++++++++++++++-- mm/userfaultfd.c | 32 ++++++++++++++++++++++++++++++-- 7 files changed, 114 insertions(+), 8 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index edece3e26985..fe48d76957fb 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -529,6 +529,8 @@ static inline bool folio_test_pmd_mappable(struct folio= *folio) =20 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); =20 +vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf); + vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf); =20 extern struct folio *huge_zero_folio; @@ -716,6 +718,11 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *p= ud, return NULL; } =20 +static inline vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) +{ + return 0; +} + static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { return 0; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 8e0833e6613f..6b633ec694e1 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -236,6 +236,18 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_a= rea_struct *vma, return userfaultfd_wp(vma) && pmd_uffd(pmd); } =20 +static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma, + pte_t pte) +{ + return userfaultfd_rwp(vma) && pte_uffd(pte); +} + +static inline bool userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma, + pmd_t pmd) +{ + return userfaultfd_rwp(vma) && pmd_uffd(pmd); +} + static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return vma_test_any_mask(vma, __VMA_UFFD_FLAGS); @@ -366,6 +378,18 @@ static inline bool 
userfaultfd_huge_pmd_wp(struct vm_a= rea_struct *vma, return false; } =20 +static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma, + pte_t pte) +{ + return false; +} + +static inline bool userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma, + pmd_t pmd) +{ + return false; +} + static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return false; diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index 7b78aa3b5318..d803e76d47ad 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -25,7 +25,8 @@ #define UFFD_API ((__u64)0xAA) #define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING | \ UFFDIO_REGISTER_MODE_WP | \ - UFFDIO_REGISTER_MODE_MINOR) + UFFDIO_REGISTER_MODE_MINOR | \ + UFFDIO_REGISTER_MODE_RWP) #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \ UFFD_FEATURE_EVENT_FORK | \ UFFD_FEATURE_EVENT_REMAP | \ @@ -42,7 +43,8 @@ UFFD_FEATURE_WP_UNPOPULATED | \ UFFD_FEATURE_POISON | \ UFFD_FEATURE_WP_ASYNC | \ - UFFD_FEATURE_MOVE) + UFFD_FEATURE_MOVE | \ + UFFD_FEATURE_RWP) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -54,13 +56,15 @@ (__u64)1 << _UFFDIO_MOVE | \ (__u64)1 << _UFFDIO_WRITEPROTECT | \ (__u64)1 << _UFFDIO_CONTINUE | \ - (__u64)1 << _UFFDIO_POISON) + (__u64)1 << _UFFDIO_POISON | \ + (__u64)1 << _UFFDIO_RWPROTECT) #define UFFD_API_RANGE_IOCTLS_BASIC \ ((__u64)1 << _UFFDIO_WAKE | \ (__u64)1 << _UFFDIO_COPY | \ (__u64)1 << _UFFDIO_WRITEPROTECT | \ (__u64)1 << _UFFDIO_CONTINUE | \ - (__u64)1 << _UFFDIO_POISON) + (__u64)1 << _UFFDIO_POISON | \ + (__u64)1 << _UFFDIO_RWPROTECT) =20 /* * Valid ioctl command number range with this API is from 0x00 to diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6417d883d2e4..72cb44332004 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2289,6 +2289,11 @@ static inline bool can_change_pmd_writable(struct vm= _area_struct *vma, return pmd_dirty(pmd); } =20 
+vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) +{ + return handle_userfault(vmf, VM_UFFD_RWP); +} + /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0d8d39cd8888..d4da39d698b8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6062,6 +6062,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, goto out_mutex; } =20 + /* + * Protnone hugetlb PTEs with the uffd bit are used by + * userfaultfd RWP for access tracking. Plain PROT_NONE (without the + * marker) is not an RWP fault and is not expected on hugetlb (no + * NUMA hinting), so let normal hugetlb fault handling proceed. + */ + if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) && + userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) { + return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + } + /* * If we are going to COW/unshare the mapping later, we examine the * pending reservations for this page now. This will ensure that any diff --git a/mm/memory.c b/mm/memory.c index 06473285c0dc..4f8b8dff0b7f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6122,6 +6122,16 @@ static void numa_rebuild_large_mapping(struct vm_fau= lt *vmf, struct vm_area_stru if (!pte_present(ptent) || !pte_protnone(ptent)) continue; =20 + /* + * RWP-armed PTEs are also protnone but carry _PAGE_UFFD as a + * marker. Leave them alone -- rewriting to vm_page_prot would + * stop the RWP trap. Gate on userfaultfd_rwp(vma) too: + * NUMA balancing preserves _PAGE_UFFD on UFFD_WP-marked PTEs + * when applying PROT_NONE, and those still need rebuilding. 
+ */ + if (userfaultfd_rwp(vma) && pte_uffd(ptent)) + continue; + if (pfn_folio(pte_pfn(ptent)) !=3D folio) continue; =20 @@ -6137,6 +6147,12 @@ static void numa_rebuild_large_mapping(struct vm_fau= lt *vmf, struct vm_area_stru } } =20 +static vm_fault_t do_uffd_rwp(struct vm_fault *vmf) +{ + pte_unmap(vmf->pte); + return handle_userfault(vmf, VM_UFFD_RWP); +} + static vm_fault_t do_numa_page(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; @@ -6412,8 +6428,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) if (!pte_present(vmf->orig_pte)) return do_swap_page(vmf); =20 - if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) + if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) { + /* + * RWP-protected PTEs are protnone plus the uffd bit. On a + * VM_UFFD_RWP VMA, a protnone PTE without the uffd bit is + * NUMA hinting and must still fall through to do_numa_page(). + */ + if (userfaultfd_pte_rwp(vmf->vma, vmf->orig_pte)) + return do_uffd_rwp(vmf); return do_numa_page(vmf); + } =20 spin_lock(vmf->ptl); entry =3D vmf->orig_pte; @@ -6527,8 +6551,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_s= truct *vma, return 0; } if (pmd_trans_huge(vmf.orig_pmd)) { - if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) + if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) { + if (userfaultfd_huge_pmd_rwp(vma, vmf.orig_pmd)) + return do_huge_pmd_uffd_rwp(&vmf); return do_huge_pmd_numa_page(&vmf); + } =20 if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) && !pmd_write(vmf.orig_pmd)) { diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index c07e3232a01a..db3707b9d977 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -2668,6 +2668,12 @@ static inline bool userfaultfd_huge_must_wait(struct= userfaultfd_ctx *ctx, */ if (!huge_pte_write(pte) && (reason & VM_UFFD_WP)) return true; + /* + * PTE is still RW-protected (protnone with uffd bit), wait for + * resolution. 
Plain PROT_NONE without the marker is not an RWP fault. + */ + if (pte_protnone(pte) && huge_pte_uffd(pte) && (reason & VM_UFFD_RWP)) + return true; =20 return false; } @@ -2728,8 +2734,14 @@ static inline bool userfaultfd_must_wait(struct user= faultfd_ctx *ctx, if (!pmd_present(_pmd)) return false; =20 - if (pmd_trans_huge(_pmd)) - return !pmd_write(_pmd) && (reason & VM_UFFD_WP); + if (pmd_trans_huge(_pmd)) { + if (!pmd_write(_pmd) && (reason & VM_UFFD_WP)) + return true; + if (pmd_protnone(_pmd) && pmd_uffd(_pmd) && + (reason & VM_UFFD_RWP)) + return true; + return false; + } =20 pte =3D pte_offset_map(pmd, address); if (!pte) @@ -2765,6 +2777,13 @@ static inline bool userfaultfd_must_wait(struct user= faultfd_ctx *ctx, */ if (!pte_write(ptent) && (reason & VM_UFFD_WP)) goto out; + /* + * PTE is still RW-protected (protnone with uffd bit), wait for + * userspace to resolve. Plain PROT_NONE without the marker is not + * an RWP fault. + */ + if (pte_protnone(ptent) && pte_uffd(ptent) && (reason & VM_UFFD_RWP)) + goto out; =20 ret =3D false; out: @@ -4506,6 +4525,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *c= tx, uffdio_api.features &=3D ~UFFD_FEATURE_WP_UNPOPULATED; uffdio_api.features &=3D ~UFFD_FEATURE_WP_ASYNC; } + /* + * RWP needs both PROT_NONE support and the uffd-wp PTE bit. The + * VM_UFFD_RWP check covers compile-time unavailability; the + * pgtable_supports_uffd() check covers runtime (e.g. riscv + * without the SVRSW60T59B extension) where the PTE bit is declared + * but not actually usable. 
+ */ + if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) + uffdio_api.features &=3D ~UFFD_FEATURE_RWP; =20 ret =3D -EINVAL; if (features & ~uffdio_api.features) --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A607638B157; Fri, 29 May 2026 17:28:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075691; cv=none; b=RXInJoezA8/deXAEUAdiCnKwc8UYf0BRIuZVCtIFJmkzcCGTMIdXYl/x8vh5VuM2QHHBLZiB+vNM8StzOt1vp2dlDD1YXrygeaW5NVZa8aCTIMc4lgISZZF6hMCECoUY3mbAOOIlKjGXCPfyn8To8vFq+bdAyuipycP2NsGwzqA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075691; c=relaxed/simple; bh=YzBjPd63uEeu1EoXVayUwvySyhxW1q0luv/Xr7igBmg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=fxprKvN7kXO5+uBwRX2rrJlgt+kiUGTObJWrgZD9qA6O5NnbASkzotf2va7OjaBLt79eDywhy+KYbt/2/yeOcKMYjD4zpsuuEEDRGWfeCtt6V3JkkIqGcUuWBoUFyfXe46IsDQDiSllO79SVLTOaVVSQJ9IICyZRZwWIodrXeRM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bI/lKOtb; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bI/lKOtb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 964C51F0089D; Fri, 29 May 2026 17:28:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075686; bh=7FwHn2bRKJaKHPfoXVMI3en/RPbsnRravWt/lCAWGZY=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=bI/lKOtbBymmQ86llSP14z3ZJ6gXciiIAG6G8GjkXgPzkLIJqTAYP8cNlP9+6MZ0P 
ZkyQfDVeY3DrE+z/cKP0FGn8kf48AIyVN2m2lThJD6z1p6VFzMOkAMR1M3t7VtTm9B d0t/sJv2dewu10X2PUi3VznO4X1LwICqTLdLPRwMxvt2DN5rH7ilpnYYG1wzfLZIun jUbEG12n7n8QodkbeQDL+f+Fh/VvtOWZesTOvjwlPiRmxodQ5sxA+Ufs3hcl28qH5l oof7pebojFrwp2WODovrfUMUwjsM3rKCZV57hte3xIzyaoDDoD4sFdAEqHORVvGhno JnKSbXI0G7d4A== Received: from phl-compute-01.internal (phl-compute-01.internal [10.202.2.41]) by mailfauth.phl.internal (Postfix) with ESMTP id F1CDEF4006F; Fri, 29 May 2026 13:28:04 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-01.internal (MEProxy); Fri, 29 May 2026 13:28:05 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTEHS1Z+n1S80q4BaszFx9bNnHLQ6sgL6J9HpxOFXRfFtKsh6le5cKxaCZzZ/qH/SJ zONNCxWzVoPiQhYsD9dliy9AdrB2UY/Bhb82egQk/l96x3r4UUhrX92dr53s7Ptki96Mx4 DvJi8H/xrcKSa4upwFZAuWyRsvkJtw5DUL41fEdJBQDb5TWXZV3TgknVzb3pm5Cw5eabt3 ZKinBz9XCIvsS/RAa3Cb6hoIPIA7Rb0ceUklQcqu2F89W2umMVSE8bbF5HocBMl20u4Nxk 92MelAoY5r8qDI90ESLS21z7KfMewjRnztTxCHQfAtq0f2Qz4D8wiVPs5MI39B39+ARIMl 7vGlI3BeSZ/LbcxUUUzqd0Q3MmJoqw42mfHQ4uJNv9dJLruJ2+FcaOcFYdA/PGYdncCvJm NAsT0ivnUcG82UzbIQXKjCJBkce1l0gVAAL3WBrzEZ8nhtKDwdCfdUcNZW4ZX6rtTB1VpR 5wexrTcjiHB0QwWwIy7ckY06iH+3NrLTKNb6QSdAcP9seml2MROTMxQ1FoERhXV6sSIagC Hu0CHhXEQWU/ffyogkgH6zX/A79BD9K7JUjoDHaM0Elmp+YgJaZmleljAD0ZEeaYoBnAj0 vqvWj4uQnpamauZkHm7n9ri5/TQlFF0RAQGr89u4giMyd04mwY3ytj59lYRA X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:28:04 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, 
 kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 11/15] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking
Date: Fri, 29 May 2026 18:26:40 +0100
Message-ID: <20260529172716.357179-12-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

PAGEMAP_SCAN already reports PAGE_IS_WRITTEN from the inverted uffd PTE
bit, targeting the UFFDIO_WRITEPROTECT workflow. UFFDIO_RWPROTECT reuses
the same PTE bit as a marker for read-write protection, but "has been
written" and "has been accessed" are distinct semantic signals -- they
happen to share one PTE bit today only because the two implementations
share infrastructure.

Give RWP its own pagemap category so the UAPI does not conflate them:

  PAGE_IS_WRITTEN   reported on VM_UFFD_WP VMAs,  !pte_uffd(pte)
  PAGE_IS_ACCESSED  reported on VM_UFFD_RWP VMAs, !pte_uffd(pte)

Both still read the same PTE bit today, but each is scoped to the VMA
whose registered mode makes the bit meaningful. If a future
implementation moves RWP to a separate PTE bit, only PAGE_IS_ACCESSED
switches over.

This is a UAPI narrowing. Outside VM_UFFD_WP VMAs the uffd bit is always
clear, so PAGEMAP_SCAN used to flag PAGE_IS_WRITTEN on every present PTE
there -- a meaningless duplicate of PAGE_IS_PRESENT. Now PAGE_IS_WRITTEN
fires only inside VM_UFFD_WP VMAs.

pagemap_hugetlb_category() now takes the vma like its PTE/PMD peers.
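For illustration only (not part of the patch), the category rule for a
present PTE can be modelled in plain userspace C. This is a sketch under
stated assumptions: the model_* and MODEL_* names are hypothetical, the
flag values are arbitrary and are not the UAPI PAGE_IS_* bit positions,
and the VMA mode is a single enum because WP and RWP cannot be
registered on the same VMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags; values are arbitrary, NOT the UAPI PAGE_IS_* bits. */
enum {
	MODEL_PAGE_IS_PRESENT  = 1u << 0,
	MODEL_PAGE_IS_WRITTEN  = 1u << 1,
	MODEL_PAGE_IS_ACCESSED = 1u << 2,
};

/* Hypothetical stand-in for the VMA's registered uffd protection mode. */
enum model_uffd_mode {
	MODEL_VMA_PLAIN,	/* neither VM_UFFD_WP nor VM_UFFD_RWP */
	MODEL_VMA_UFFD_WP,	/* mirrors userfaultfd_wp(vma) */
	MODEL_VMA_UFFD_RWP,	/* mirrors userfaultfd_rwp(vma) */
};

/* Categories for a present PTE; pte_uffd mirrors pte_uffd(pte). */
unsigned int model_present_pte_categories(enum model_uffd_mode mode,
					  bool pte_uffd)
{
	unsigned int categories = MODEL_PAGE_IS_PRESENT;

	/* A clear uffd bit only means something inside a tracked VMA. */
	if (!pte_uffd) {
		if (mode == MODEL_VMA_UFFD_WP)
			categories |= MODEL_PAGE_IS_WRITTEN;
		if (mode == MODEL_VMA_UFFD_RWP)
			categories |= MODEL_PAGE_IS_ACCESSED;
	}
	return categories;
}

/* Walks the cases called out in the description above. */
void model_pagemap_selfcheck(void)
{
	/* untracked VMA: no longer a duplicate of "present" */
	assert(model_present_pte_categories(MODEL_VMA_PLAIN, false) ==
	       MODEL_PAGE_IS_PRESENT);
	/* WP VMA, marker cleared: written since write-protect */
	assert(model_present_pte_categories(MODEL_VMA_UFFD_WP, false) ==
	       (MODEL_PAGE_IS_PRESENT | MODEL_PAGE_IS_WRITTEN));
	/* RWP VMA, marker cleared: accessed since RWP was applied */
	assert(model_present_pte_categories(MODEL_VMA_UFFD_RWP, false) ==
	       (MODEL_PAGE_IS_PRESENT | MODEL_PAGE_IS_ACCESSED));
	/* marker still set: neither category is reported */
	assert(model_present_pte_categories(MODEL_VMA_UFFD_RWP, true) ==
	       MODEL_PAGE_IS_PRESENT);
}
```

The first assertion is the narrowing described above: an untracked VMA
used to report the written category on every present PTE and, in this
model, now reports only presence.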
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- Documentation/admin-guide/mm/pagemap.rst | 13 +++-- fs/proc/task_mmu.c | 63 +++++++++++++++++------- include/uapi/linux/fs.h | 1 + tools/include/uapi/linux/fs.h | 1 + 4 files changed, 57 insertions(+), 21 deletions(-) diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin= -guide/mm/pagemap.rst index c57e61b5d8aa..ffa690a171c8 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -19,8 +19,11 @@ There are four components to pagemap: * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst) * Bit 56 page exclusively mapped (since 4.2) - * Bit 57 pte is uffd-wp write-protected (since 5.13) (see - Documentation/admin-guide/mm/userfaultfd.rst) + * Bit 57 pte is tracked by userfaultfd (since 5.13) =E2=80=94 in a + ``VM_UFFD_WP`` VMA this indicates a write-protected PTE; in a + ``VM_UFFD_RWP`` VMA it indicates an RWP-protected PTE. WP and + RWP are mutually exclusive per VMA, so the meaning is + unambiguous. See Documentation/admin-guide/mm/userfaultfd.rst. * Bit 58 pte is a guard region (since 6.15) (see madvise (2) man p= age) * Bits 59-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) @@ -244,7 +247,8 @@ in this IOCTL: Following flags about pages are currently supported: =20 - ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled -- ``PAGE_IS_WRITTEN`` - Page has been written to from the time it was writ= e protected +- ``PAGE_IS_WRITTEN`` - Page in a ``UFFDIO_REGISTER_MODE_WP`` VMA has been + written to since it was write-protected. Only reported inside such VMAs. 
- ``PAGE_IS_FILE`` - Page is file backed - ``PAGE_IS_PRESENT`` - Page is present in the memory - ``PAGE_IS_SWAPPED`` - Page is in swapped @@ -252,6 +256,9 @@ Following flags about pages are currently supported: - ``PAGE_IS_HUGE`` - Page is PMD-mapped THP or Hugetlb backed - ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty - ``PAGE_IS_GUARD`` - Page is a part of a guard region +- ``PAGE_IS_ACCESSED`` - Page in a ``UFFDIO_REGISTER_MODE_RWP`` VMA has be= en + accessed since RWP was applied. Only reported inside such VMAs. See + Documentation/admin-guide/mm/userfaultfd.rst for the RWP workflow. =20 The ``struct pm_scan_arg`` is used as the argument of the IOCTL. =20 diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ca0f69b347e8..a1683d73b405 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2284,7 +2284,7 @@ static const struct mm_walk_ops pagemap_ops =3D { * Bits 5-54 swap offset if swapped * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) * Bit 56 page exclusively mapped - * Bit 57 pte is uffd-wp write-protected + * Bit 57 pte is tracked by userfaultfd (uffd-wp or RWP) * Bit 58 pte is a guard region * Bits 59-60 zero * Bit 61 page is file-page or shared-anon @@ -2419,7 +2419,7 @@ static int pagemap_release(struct inode *inode, struc= t file *file) PAGE_IS_FILE | PAGE_IS_PRESENT | \ PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \ PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \ - PAGE_IS_GUARD) + PAGE_IS_GUARD | PAGE_IS_ACCESSED) #define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC) =20 struct pagemap_scan_private { @@ -2444,8 +2444,12 @@ static unsigned long pagemap_page_category(struct pa= gemap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_uffd(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 if (p->masks_of_interest & PAGE_IS_FILE) { page =3D 
vm_normal_page(vma, addr, pte);
@@ -2462,8 +2466,12 @@ static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
 
 		categories = PAGE_IS_SWAPPED;
 
-		if (!pte_swp_uffd_any(pte))
-			categories |= PAGE_IS_WRITTEN;
+		if (!pte_swp_uffd_any(pte)) {
+			if (userfaultfd_wp(vma))
+				categories |= PAGE_IS_WRITTEN;
+			if (userfaultfd_rwp(vma))
+				categories |= PAGE_IS_ACCESSED;
+		}
 
 		entry = softleaf_from_pte(pte);
 		if (softleaf_is_guard_marker(entry))
@@ -2512,8 +2520,12 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
 		struct page *page;
 
 		categories |= PAGE_IS_PRESENT;
-		if (!pmd_uffd(pmd))
-			categories |= PAGE_IS_WRITTEN;
+		if (!pmd_uffd(pmd)) {
+			if (userfaultfd_wp(vma))
+				categories |= PAGE_IS_WRITTEN;
+			if (userfaultfd_rwp(vma))
+				categories |= PAGE_IS_ACCESSED;
+		}
 
 		if (p->masks_of_interest & PAGE_IS_FILE) {
 			page = vm_normal_page_pmd(vma, addr, pmd);
@@ -2527,8 +2539,12 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
 			categories |= PAGE_IS_SOFT_DIRTY;
 	} else {
 		categories |= PAGE_IS_SWAPPED;
-		if (!pmd_swp_uffd(pmd))
-			categories |= PAGE_IS_WRITTEN;
+		if (!pmd_swp_uffd(pmd)) {
+			if (userfaultfd_wp(vma))
+				categories |= PAGE_IS_WRITTEN;
+			if (userfaultfd_rwp(vma))
+				categories |= PAGE_IS_ACCESSED;
+		}
 		if (pmd_swp_soft_dirty(pmd))
 			categories |= PAGE_IS_SOFT_DIRTY;
 
@@ -2561,7 +2577,8 @@ static void make_uffd_wp_pmd(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_HUGETLB_PAGE
-static unsigned long pagemap_hugetlb_category(pte_t pte)
+static unsigned long pagemap_hugetlb_category(struct vm_area_struct *vma,
+					      pte_t pte)
 {
 	unsigned long categories = PAGE_IS_HUGE;
 
@@ -2576,8 +2593,12 @@ static unsigned long pagemap_hugetlb_category(pte_t pte)
 	if (pte_present(pte)) {
 		categories |= PAGE_IS_PRESENT;
 
-		if (!huge_pte_uffd(pte))
-			categories |= PAGE_IS_WRITTEN;
+		if (!huge_pte_uffd(pte)) {
+			if
(userfaultfd_wp(vma))
+				categories |= PAGE_IS_WRITTEN;
+			if (userfaultfd_rwp(vma))
+				categories |= PAGE_IS_ACCESSED;
+		}
 		if (!PageAnon(pte_page(pte)))
 			categories |= PAGE_IS_FILE;
 		if (is_zero_pfn(pte_pfn(pte)))
@@ -2587,8 +2608,12 @@ static unsigned long pagemap_hugetlb_category(pte_t pte)
 	} else {
 		categories |= PAGE_IS_SWAPPED;
 
-		if (!pte_swp_uffd_any(pte))
-			categories |= PAGE_IS_WRITTEN;
+		if (!pte_swp_uffd_any(pte)) {
+			if (userfaultfd_wp(vma))
+				categories |= PAGE_IS_WRITTEN;
+			if (userfaultfd_rwp(vma))
+				categories |= PAGE_IS_ACCESSED;
+		}
 		if (pte_swp_soft_dirty(pte))
 			categories |= PAGE_IS_SOFT_DIRTY;
 	}
@@ -2864,7 +2889,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 		goto flush_and_return;
 	}
 
-	if (!p->arg.category_anyof_mask && !p->arg.category_inverted &&
+	if (userfaultfd_wp(vma) && !p->arg.category_anyof_mask &&
+	    !p->arg.category_inverted &&
 	    p->arg.category_mask == PAGE_IS_WRITTEN &&
 	    p->arg.return_mask == PAGE_IS_WRITTEN) {
 		for (addr = start; addr < end; pte++, addr += PAGE_SIZE) {
@@ -2939,7 +2965,8 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, unsigned long hmask,
 		/* Go the short route when not write-protecting pages.
*/
 
 		pte = huge_ptep_get(walk->mm, start, ptep);
-		categories = p->cur_vma_category | pagemap_hugetlb_category(pte);
+		categories = p->cur_vma_category |
+			     pagemap_hugetlb_category(vma, pte);
 
 		if (!pagemap_scan_is_interesting_page(categories, p))
 			return 0;
@@ -2951,7 +2978,7 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, unsigned long hmask,
 	ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, ptep);
 
 	pte = huge_ptep_get(walk->mm, start, ptep);
-	categories = p->cur_vma_category | pagemap_hugetlb_category(pte);
+	categories = p->cur_vma_category | pagemap_hugetlb_category(vma, pte);
 
 	if (!pagemap_scan_is_interesting_page(categories, p))
 		goto out_unlock;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..c4aeaa0c31c7 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -455,6 +455,7 @@ typedef int __bitwise __kernel_rwf_t;
 #define PAGE_IS_HUGE		(1 << 6)
 #define PAGE_IS_SOFT_DIRTY	(1 << 7)
 #define PAGE_IS_GUARD		(1 << 8)
+#define PAGE_IS_ACCESSED	(1 << 9)
 
 /*
  * struct page_region - Page region with flags
diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h
index 24ddf7bc4f25..f0a26309b6d5 100644
--- a/tools/include/uapi/linux/fs.h
+++ b/tools/include/uapi/linux/fs.h
@@ -364,6 +364,7 @@ typedef int __bitwise __kernel_rwf_t;
 #define PAGE_IS_HUGE		(1 << 6)
 #define PAGE_IS_SOFT_DIRTY	(1 << 7)
 #define PAGE_IS_GUARD		(1 << 8)
+#define PAGE_IS_ACCESSED	(1 << 9)
 
 /*
  * struct page_region - Page region with flags
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 12/15] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution
Date: Fri, 29 May 2026 18:26:41 +0100
Message-ID: <20260529172716.357179-13-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>

Sync RWP delivers a message and blocks the faulting thread until the handler
resolves the fault. For working-set tracking the VMM does not need the
message: it just needs to know, at scan time, which pages were touched.

Async RWP serves that use case -- the kernel restores access in-place
and the faulting thread continues without blocking. The VMM reconstructs
the access pattern after the fact via PAGEMAP_SCAN: pages whose uffd bit
is still set (inverted PAGE_IS_ACCESSED) were not re-accessed since the
last RWP cycle.

Worth calling out: async resolution upgrades writable private anon PTEs
via pte_mkwrite() when can_change_pte_writable() allows, mirroring
do_numa_page(). Without it, every re-access of an RWP'd writable page
would COW-fault a second time.

UFFD_FEATURE_RWP_ASYNC requires UFFD_FEATURE_RWP.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/userfaultfd_k.h    |  6 ++++++
 include/uapi/linux/userfaultfd.h | 11 ++++++++++-
 mm/huge_memory.c                 | 25 ++++++++++++++++++++++++-
 mm/hugetlb.c                     | 32 +++++++++++++++++++++++++++++++-
 mm/memory.c                      | 27 +++++++++++++++++++++++++--
 mm/userfaultfd.c                 | 19 ++++++++++++++++++-
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 6b633ec694e1..dd3c8ba97296 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -281,6 +281,7 @@ extern void userfaultfd_unmap_complete(struct mm_struct *mm,
 				       struct list_head *uf);
 extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma);
 extern bool userfaultfd_wp_async(struct vm_area_struct *vma);
+extern bool userfaultfd_rwp_async(struct vm_area_struct *vma);
 
 static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
 {
@@ -459,6 +460,11 @@ static inline bool userfaultfd_wp_async(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_rwp_async(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool
vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
 {
 	return false;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index d803e76d47ad..c10f08f8a618 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -44,7 +44,8 @@
 			   UFFD_FEATURE_POISON |		\
 			   UFFD_FEATURE_WP_ASYNC |		\
 			   UFFD_FEATURE_MOVE |			\
-			   UFFD_FEATURE_RWP)
+			   UFFD_FEATURE_RWP |			\
+			   UFFD_FEATURE_RWP_ASYNC)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -243,6 +244,13 @@ struct uffdio_api {
 	 * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking.
 	 * Pages are made inaccessible via UFFDIO_RWPROTECT and faults
 	 * are delivered when the pages are re-accessed.
+	 *
+	 * UFFD_FEATURE_RWP_ASYNC indicates asynchronous mode for
+	 * UFFDIO_REGISTER_MODE_RWP. When set, faults on read-write
+	 * protected pages are auto-resolved by the kernel (PTE
+	 * permissions restored immediately) without delivering a message
+	 * to the userfaultfd handler. Use PAGEMAP_SCAN with inverted
+	 * PAGE_IS_ACCESSED to find pages that were not re-accessed.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -262,6 +270,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_WP_ASYNC			(1<<15)
 #define UFFD_FEATURE_MOVE			(1<<16)
 #define UFFD_FEATURE_RWP			(1<<17)
+#define UFFD_FEATURE_RWP_ASYNC			(1<<18)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 72cb44332004..8f120452d995 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2291,7 +2291,30 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
 
 vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf)
 {
-	return handle_userfault(vmf, VM_UFFD_RWP);
+	struct vm_area_struct *vma = vmf->vma;
+	pmd_t pmd;
+
+	if (!userfaultfd_rwp_async(vma))
+		return handle_userfault(vmf, VM_UFFD_RWP);
+
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(pmdp_get(vmf->pmd), vmf->orig_pmd))) {
+		spin_unlock(vmf->ptl);
+		return 0;
+	}
+	pmd = pmd_modify(vmf->orig_pmd, vma->vm_page_prot);
+	/* pmd_modify() preserves _PAGE_UFFD; drop it on resolution */
+	pmd = pmd_clear_uffd(pmd);
+	pmd = pmd_mkyoung(pmd);
+	if (!pmd_write(pmd) &&
+	    vma_wants_manual_pte_write_upgrade(vma) &&
+	    can_change_pmd_writable(vma, vmf->address, pmd))
+		pmd = pmd_mkwrite(pmd, vma);
+	set_pmd_at(vma->vm_mm, vmf->address & HPAGE_PMD_MASK,
+		   vmf->pmd, pmd);
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+	return 0;
 }
 
 /* NUMA hinting page fault entry point for trans huge pmds */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d4da39d698b8..9da52d95b3fb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6070,7 +6070,37 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) &&
 	    userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) {
-		return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP);
+		spinlock_t *ptl;
+		pte_t pte;
+
+		/* Sync: drop hugetlb locks before blocking in
handle_userfault() */
+		if (!userfaultfd_rwp_async(vma))
+			return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP);
+
+		ptl = huge_pte_lock(h, mm, vmf.pte);
+		pte = huge_ptep_get(mm, vmf.address, vmf.pte);
+		if (pte_protnone(pte) && huge_pte_uffd(pte)) {
+			unsigned int shift = huge_page_shift(h);
+
+			pte = huge_pte_modify(pte, vma->vm_page_prot);
+			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			/* huge_pte_modify() preserves _PAGE_UFFD; drop it on resolution */
+			pte = huge_pte_clear_uffd(pte);
+			pte = pte_mkyoung(pte);
+			/*
+			 * Unlike do_uffd_rwp(), do not upgrade to writable
+			 * here. Hugetlb lacks a can_change_huge_pte_writable()
+			 * equivalent, so a write access will take a separate
+			 * COW fault -- acceptable for the rare private hugetlb
+			 * case.
+			 */
+			set_huge_pte_at(mm, vmf.address, vmf.pte, pte,
+					huge_page_size(h));
+			update_mmu_cache(vma, vmf.address, vmf.pte);
+		}
+		spin_unlock(ptl);
+		ret = 0;
+		goto out_mutex;
 	}
 
 	/*
diff --git a/mm/memory.c b/mm/memory.c
index 4f8b8dff0b7f..43b5e63c368b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6149,8 +6149,31 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
 
 static vm_fault_t do_uffd_rwp(struct vm_fault *vmf)
 {
-	pte_unmap(vmf->pte);
-	return handle_userfault(vmf, VM_UFFD_RWP);
+	pte_t pte;
+
+	if (!userfaultfd_rwp_async(vmf->vma)) {
+		/* Sync mode: unmap PTE and deliver to userfaultfd handler */
+		pte_unmap(vmf->pte);
+		return handle_userfault(vmf, VM_UFFD_RWP);
+	}
+
+	spin_lock(vmf->ptl);
+	if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return 0;
+	}
+	pte = pte_modify(vmf->orig_pte, vmf->vma->vm_page_prot);
+	/* pte_modify() preserves _PAGE_UFFD; drop it on resolution */
+	pte = pte_clear_uffd(pte);
+	pte = pte_mkyoung(pte);
+	if (!pte_write(pte) &&
+	    vma_wants_manual_pte_write_upgrade(vmf->vma) &&
+	    can_change_pte_writable(vmf->vma, vmf->address, pte))
+		pte =
pte_mkwrite(pte, vmf->vma);
+	set_pte_at(vmf->vma->vm_mm, vmf->address, vmf->pte, pte);
+	update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
 }
 
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index db3707b9d977..f40bf473a6f6 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -2487,6 +2487,11 @@ static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx)
 	return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC);
 }
 
+static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx)
+{
+	return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC);
+}
+
 /*
  * Whether WP_UNPOPULATED is enabled on the uffd context.  It is only
  * meaningful when userfaultfd_wp()==true on the vma and when it's
@@ -4408,6 +4413,11 @@ bool userfaultfd_wp_async(struct vm_area_struct *vma)
 	return userfaultfd_wp_async_ctx(vma->vm_userfaultfd_ctx.ctx);
 }
 
+bool userfaultfd_rwp_async(struct vm_area_struct *vma)
+{
+	return userfaultfd_rwp_async_ctx(vma->vm_userfaultfd_ctx.ctx);
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
 	/*
@@ -4511,6 +4521,12 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	if (features & UFFD_FEATURE_WP_ASYNC)
 		features |= UFFD_FEATURE_WP_UNPOPULATED;
 
+	ret = -EINVAL;
+	/* RWP_ASYNC requires RWP */
+	if ((features & UFFD_FEATURE_RWP_ASYNC) &&
+	    !(features & UFFD_FEATURE_RWP))
+		goto err_out;
+
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
@@ -4533,7 +4549,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	 * but not actually usable.
 	 */
 	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
-		uffdio_api.features &= ~UFFD_FEATURE_RWP;
+		uffdio_api.features &=
+			~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC);
 
 	ret = -EINVAL;
 	if (features & ~uffdio_api.features)
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
From: "Kiryl Shutsemau (Meta)"
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org
Subject: [PATCH v6 13/15] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
Date: Fri, 29 May 2026 18:26:42 +0100
Message-ID: <20260529172716.357179-14-kas@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260529172716.357179-1-kas@kernel.org>
References: <20260529172716.357179-1-kas@kernel.org>

Add an ioctl to toggle async mode at runtime without re-registering the
userfaultfd. This allows a VMM to switch between sync and async RWP
modes on-the-fly -- for example, starting in async mode for working set
scanning, then switching to sync mode to intercept faults during page
eviction.

UFFDIO_SET_MODE takes an enable/disable bitmask of UFFD_FEATURE_* flags.
Only UFFD_FEATURE_RWP_ASYNC is toggleable today; the ioctl rejects any
other bit with -EINVAL. Enabling RWP_ASYNC also requires RWP to have
been negotiated at UFFDIO_API time, mirroring the UFFDIO_API invariant.

Fault-path readers of ctx->features run under mmap_read_lock or a
per-VMA lock; the RMW takes mmap_write_lock and calls vma_start_write()
on every UFFD-armed VMA, so those readers are fully excluded.
userfaultfd_show_fdinfo(), however, reads ctx->features without any
lock, so the RMW is written as a single WRITE_ONCE and fdinfo reads it
with READ_ONCE. That keeps the lockless observer from seeing a mid-RMW
intermediate and removes the audit burden when new toggleable bits are
added later.

When switching to async, pending sync waiters are woken so they retry
and auto-resolve under the new mode.
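
For illustration only, a rough userspace model of the validation and
update rules described above. It is not the kernel function: the bit
values and the enable/disable layout are taken from the uapi hunks in
this series, while the names set_mode_req and apply_set_mode, and the
"available" parameter standing in for the kernel's supported-feature
mask, are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Bit values as defined by the uapi hunks in this series. */
#define UFFD_FEATURE_RWP        (1ULL << 17)
#define UFFD_FEATURE_RWP_ASYNC  (1ULL << 18)
/* Only RWP_ASYNC is toggleable today. */
#define UFFD_FEATURE_TOGGLEABLE UFFD_FEATURE_RWP_ASYNC

/* Hypothetical stand-in with the same two fields as struct uffdio_set_mode. */
struct set_mode_req {
	uint64_t enable;
	uint64_t disable;
};

/*
 * Model of the checks only; locking and waking of pending waiters are
 * left out. "available" is an assumed mask of features the running
 * kernel/arch supports. Returns 0 and updates *features on success,
 * or -EINVAL and leaves *features untouched.
 */
static int apply_set_mode(uint64_t *features, uint64_t available,
			  const struct set_mode_req *req)
{
	/* A bit must not be both enabled and disabled. */
	if (req->enable & req->disable)
		return -EINVAL;

	/* Only supported, toggleable bits may be named at all. */
	if ((req->enable | req->disable) &
	    ~(available & UFFD_FEATURE_TOGGLEABLE))
		return -EINVAL;

	/* RWP_ASYNC can only be enabled if RWP was negotiated. */
	if ((req->enable & UFFD_FEATURE_RWP_ASYNC) &&
	    !(*features & UFFD_FEATURE_RWP))
		return -EINVAL;

	/* Compute the new value once and store it in a single write. */
	*features = (*features | req->enable) & ~req->disable;
	return 0;
}
```

In this model, a context that negotiated only UFFD_FEATURE_RWP accepts
enable = UFFD_FEATURE_RWP_ASYNC and ends up with both bits set. A
request that names UFFD_FEATURE_RWP itself, or that puts the same bit
in both enable and disable, gets -EINVAL and the features stay as they
were.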

Signed-off-by: Kiryl Shutsemau (Meta)
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 include/uapi/linux/userfaultfd.h |  14 +++
 mm/userfaultfd.c                 | 150 +++++++++++++++++++++++++------
 2 files changed, 136 insertions(+), 28 deletions(-)

diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index c10f08f8a618..cea11aad6b54 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -49,6 +49,7 @@
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
+	 (__u64)1 << _UFFDIO_SET_MODE |		\
 	 (__u64)1 << _UFFDIO_API)
 #define UFFD_API_RANGE_IOCTLS			\
 	((__u64)1 << _UFFDIO_WAKE |		\
@@ -85,6 +86,7 @@
 #define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_POISON			(0x08)
 #define _UFFDIO_RWPROTECT		(0x09)
+#define _UFFDIO_SET_MODE		(0x0A)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -111,6 +113,8 @@
 				      struct uffdio_poison)
 #define UFFDIO_RWPROTECT	_IOWR(UFFDIO, _UFFDIO_RWPROTECT, \
 				      struct uffdio_rwprotect)
+#define UFFDIO_SET_MODE		_IOW(UFFDIO, _UFFDIO_SET_MODE,	\
+				     struct uffdio_set_mode)
 
 /* read() structure */
 struct uffd_msg {
@@ -406,6 +410,16 @@ struct uffdio_move {
 	__s64 move;
 };
 
+struct uffdio_set_mode {
+	/*
+	 * Toggle async mode for features at runtime.
+	 * Supported: UFFD_FEATURE_RWP_ASYNC.
+	 * Setting a bit in both enable and disable is invalid.
+	 */
+	__u64 enable;
+	__u64 disable;
+};
+
 /*
  * Flags for the userfaultfd(2) system call itself.
  */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f40bf473a6f6..f172ec14a6c8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -2477,19 +2477,29 @@ struct userfaultfd_wake_range {
 /* internal indication that UFFD_API ioctl was successfully executed */
 #define UFFD_FEATURE_INITIALIZED		(1u << 31)
 
+/*
+ * UFFDIO_SET_MODE updates ctx->features under mmap_write_lock with
+ * WRITE_ONCE; readers that run outside mmap_read_lock or the per-VMA
+ * lock (poll/read_iter/ioctl, fdinfo) must pair with READ_ONCE.
+ */
+static unsigned int userfaultfd_features(struct userfaultfd_ctx *ctx)
+{
+	return READ_ONCE(ctx->features);
+}
+
 static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
 {
-	return ctx->features & UFFD_FEATURE_INITIALIZED;
+	return userfaultfd_features(ctx) & UFFD_FEATURE_INITIALIZED;
 }
 
 static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx)
 {
-	return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC);
+	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_WP_ASYNC);
 }
 
 static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx)
 {
-	return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC);
+	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_RWP_ASYNC);
 }
 
 /*
@@ -2504,7 +2514,7 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma)
 	if (!ctx)
 		return false;
 
-	return ctx->features & UFFD_FEATURE_WP_UNPOPULATED;
+	return userfaultfd_features(ctx) & UFFD_FEATURE_WP_UNPOPULATED;
 }
 
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
@@ -4290,6 +4300,109 @@ static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+/* Subset of UFFD_API_FEATURES actually supported by this kernel/arch */
+static __u64 uffd_api_available_features(void)
+{
+	__u64 f = UFFD_API_FEATURES;
+
+	if (!IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_MINOR))
+		f &= ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
+	if (!pgtable_supports_uffd())
+		f &=
~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	if (!uffd_supports_wp_marker())
+		f &= ~(UFFD_FEATURE_WP_HUGETLBFS_SHMEM |
+		       UFFD_FEATURE_WP_UNPOPULATED |
+		       UFFD_FEATURE_WP_ASYNC);
+	/*
+	 * RWP needs both PROT_NONE support and the uffd PTE bit. The
+	 * VM_UFFD_RWP check covers compile-time unavailability; the
+	 * pgtable_supports_uffd() check covers runtime (e.g. riscv
+	 * without the SVRSW60T59B extension) where the PTE bit is declared
+	 * but not actually usable.
+	 */
+	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
+		f &= ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC);
+	return f;
+}
+
+/* Async features that can be toggled at runtime via UFFDIO_SET_MODE */
+#define UFFD_FEATURE_TOGGLEABLE	UFFD_FEATURE_RWP_ASYNC
+
+static int userfaultfd_set_mode(struct userfaultfd_ctx *ctx,
+				unsigned long arg)
+{
+	struct uffdio_set_mode mode;
+	struct mm_struct *mm = ctx->mm;
+
+	if (copy_from_user(&mode, (void __user *)arg, sizeof(mode)))
+		return -EFAULT;
+
+	/* enable and disable must not overlap */
+	if (mode.enable & mode.disable)
+		return -EINVAL;
+
+	/* only toggleable features that this kernel/arch actually supports */
+	if ((mode.enable | mode.disable) &
+	    ~(uffd_api_available_features() & UFFD_FEATURE_TOGGLEABLE))
+		return -EINVAL;
+
+	/* RWP_ASYNC can only be enabled on contexts that negotiated RWP */
+	if ((mode.enable & UFFD_FEATURE_RWP_ASYNC) &&
+	    !(userfaultfd_features(ctx) & UFFD_FEATURE_RWP))
+		return -EINVAL;
+
+	if (!mmget_not_zero(mm))
+		return -ESRCH;
+
+	/*
+	 * Drain in-flight faults before flipping features. mmap_write_lock()
+	 * blocks new mmap_read_lock() callers, but per-VMA locked faults
+	 * (lock_vma_under_rcu() + FAULT_FLAG_VMA_LOCK) that acquired before
+	 * this point keep running. Calling vma_start_write() on each UFFD-
+	 * armed VMA waits for those readers to drop, so no in-flight fault
+	 * can observe the old features after mmap_write_unlock().
+	 */
+	mmap_write_lock(mm);
+	{
+		struct vm_area_struct *vma;
+		VMA_ITERATOR(vmi, mm, 0);
+
+		for_each_vma(vmi, vma) {
+			if (vma->vm_userfaultfd_ctx.ctx == ctx)
+				vma_start_write(vma);
+		}
+	}
+	/*
+	 * Single WRITE_ONCE so lockless readers (fdinfo, poll/read_iter
+	 * via userfaultfd_is_initialized(), and the userfaultfd_features()
+	 * helper used elsewhere) can't observe a mid-RMW intermediate
+	 * value. Hot-path readers already serialise through the mmap lock
+	 * + vma_start_write() drain above, so their load doesn't need an
+	 * annotation.
+	 */
+	WRITE_ONCE(ctx->features,
+		   (ctx->features | mode.enable) & ~mode.disable);
+	mmap_write_unlock(mm);
+
+	/*
+	 * If switching to async, wake threads blocked in handle_userfault().
+	 * They will retry the fault and auto-resolve under the new mode.
+	 * len=0 means wake all pending faults on this context.
+	 */
+	if (mode.enable & UFFD_FEATURE_RWP_ASYNC) {
+		struct userfaultfd_wake_range range = { .len = 0 };
+
+		spin_lock_irq(&ctx->fault_pending_wqh.lock);
+		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
+				     &range);
+		__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
+		spin_unlock_irq(&ctx->fault_pending_wqh.lock);
+	}
+
+	mmput(mm);
+	return 0;
+}
+
 static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 {
 	__s64 ret;
@@ -4528,29 +4641,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 
 	/* report all available features and ioctls to userland */
-	uffdio_api.features = UFFD_API_FEATURES;
-#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-	uffdio_api.features &=
-		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
-#endif
-	if (!pgtable_supports_uffd())
-		uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
-
-	if (!uffd_supports_wp_marker()) {
-		uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
-		uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
-		uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
-	}
-	/*
-	 * RWP needs both
PROT_NONE support and the uffd-wp PTE bit. The
-	 * VM_UFFD_RWP check covers compile-time unavailability; the
-	 * pgtable_supports_uffd() check covers runtime (e.g. riscv
-	 * without the SVRSW60T59B extension) where the PTE bit is declared
-	 * but not actually usable.
-	 */
-	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
-		uffdio_api.features &=
-			~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC);
+	uffdio_api.features = uffd_api_available_features();
 
 	ret = -EINVAL;
 	if (features & ~uffdio_api.features)
@@ -4620,6 +4711,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_RWPROTECT:
 		ret = userfaultfd_rwprotect(ctx, arg);
 		break;
+	case UFFDIO_SET_MODE:
+		ret = userfaultfd_set_mode(ctx, arg);
+		break;
 	}
 	return ret;
 }
@@ -4647,7 +4741,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
 	 *	protocols: aa:... bb:...
 	 */
 	seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n",
-		   pending, total, UFFD_API, ctx->features,
+		   pending, total, UFFD_API, userfaultfd_features(ctx),
 		   UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS);
 }
 #endif
-- 
2.54.0

From nobody Mon Jun 8 10:56:41 2026
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=t+NhWvPM13ozJuSeFofKGWgg7T5zB7V/eNVu3tXX7EWBX9ndxqptpuqCrbOitPn/PnPUL4lRVyhX66tYEbDZas1ekmuV96o78Xp/UFExavx+7/G42PrAWJ0fSpgRarn1a/t7+M/nXlxj9x2fafW+h8MGdKaWMfrOu7V3bmfopQU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ljUaxZEa; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ljUaxZEa" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CDBB31F0089D; Fri, 29 May 2026 17:28:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075692; bh=HhHhE0AQrKlVLBggXJvX1A2KMhQJOt0DmZdnuqiw3VQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=ljUaxZEa+08agKK3MuSVhkapXfgigLbEqKFfqCyEWQnSiza2JwQzVScvvBd/rcxL2 WTYV9GslTApBfARYmPNX6JFzlWjM3jmvS58Dq5jB6PPKEm3aX1s595ItvgJ3InZcWJ CnIUAOhhpRnX+xu2vpR7fLDa67mqFB/4HtQ5letg/sdNQ+GR7Ycgr94sxNk8EogUIs 9gAmgiKjoN/MILYV+10m/cMJidFV/Ytelk5hk2ZjTAZLoafi6pXgeT5ti0LhKx9LHd TqNXm/AKn89E7Sdv6h8snPnsy7ExJ3STYVpdD7x4YW2zvd/jLlV8OvbyaK9jAsc9YD bwWSnu7Tq4daw== Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfauth.phl.internal (Postfix) with ESMTP id 4BFB2F4006F; Fri, 29 May 2026 13:28:10 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-06.internal (MEProxy); Fri, 29 May 2026 13:28:10 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTEHS1Z+n1S80q4BaszFx9bNnHLQ6sgL6J9HpxOFXRfFtKsh6le5cKxaCZzZ/qH/SJ zONNCxWzVoPiQhYsD9dliy9AdrB2UY/Bhb82egQk/l96x3r4UUhrX92dr53s7Ptki96Mx4 DvJi8H/xrcKSa4upwFZAuWyRsvkJtw5DUL41fEdJBQDb5TWXZV3TgknVzb3pm5Cw5eabt3 ZKinBz9XCIvsS/RAa3Cb6hoIPIA7Rb0ceUklQcqu2F89W2umMVSE8bbF5HocBMl20u4Nxk 92MelAoY5r8qDI90ESLS21z7KfMewjRnztTxCHQfAtq0f2Qz4D8wiVPs5MI39B39+ARIGQ 
toaraB83DZx1MKp+787umVvzqQ2JbE75vd7MUy4ysmmk+rBYxtaH0RARCKQxF2G61VQFcl 4inWZRnDnejvmYWqFrPurcEl2AofQYWk4MAZKCjclffwtiL3qKxOwXiKafPBG6FFPEk5iY 8mnG0IZh5IBljh08tJgvamBAq0+l2mOTMpgzlBqvc90Kf2uhL4XBwf4P0aLced5sKbCmfC o0p/KGJdrxNuYB+ocWMFBbRP8FJEyCF6ZG8j9rzl2AAyb1d7pReB3MzwVE+Cx4bjWuHIzA wTNqhnIg5px5TYEeePqdtv0SYg4BxxNh61FF+PAhpHqt2/2A56AxMf0q3JZQ X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:28:09 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org Subject: [PATCH v6 14/15] selftests/mm: add userfaultfd RWP tests Date: Fri, 29 May 2026 18:26:43 +0100 Message-ID: <20260529172716.357179-15-kas@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260529172716.357179-1-kas@kernel.org> References: <20260529172716.357179-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Coverage for UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT: rwp-async async mode =E2=80=94 touch pages, verify permissions a= re auto-restored without a message rwp-sync sync mode =E2=80=94 access blocks, handler resolves via UFFDIO_RWPROTECT rwp-pagemap PAGEMAP_SCAN reports still-cold pages via inverted PAGE_IS_ACCESSED rwp-mprotect RWP survives mprotect(PROT_NONE) -> mprotect(PROT_READ|PROT_WRITE) round-trip rwp-gup GUP walks through a protnone RWP PTE 
(pipe write/read drives the GUP path) rwp-async-toggle UFFDIO_SET_MODE flips between sync and async without re-registering rwp-close closing the uffd restores page permissions rwp-fork RWP survives fork() with EVENT_FORK; child's PTEs keep the uffd bit rwp-fork-pin RWP survives fork() on an RO-longterm-pinned anon page (forces copy_present_page()); child read auto-resolves and clears the bit, proving PAGE_NONE was in place rwp-wp-exclusive register with MODE_WP|MODE_RWP returns -EINVAL All tests run against anon, shmem, shmem-private, hugetlb, and hugetlb-private memory, except rwp-fork-pin which is anon-only =E2=80=94 copy_present_page() is the private-anon pinned-exclusive fork path. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- tools/testing/selftests/mm/uffd-unit-tests.c | 765 +++++++++++++++++++ 1 file changed, 765 insertions(+) diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/s= elftests/mm/uffd-unit-tests.c index a6c14109e818..9f5a5ccf6044 100644 --- a/tools/testing/selftests/mm/uffd-unit-tests.c +++ b/tools/testing/selftests/mm/uffd-unit-tests.c @@ -7,6 +7,8 @@ =20 #include "uffd-common.h" =20 +#include +#include #include "../../../../mm/gup_test.h" =20 #ifdef __NR_userfaultfd @@ -109,6 +111,10 @@ static void uffd_test_skip(const char *message) =20 static void test_uffd_api(bool use_dev) { + const uint64_t expected_ioctls =3D + BIT_ULL(_UFFDIO_REGISTER) | + BIT_ULL(_UFFDIO_UNREGISTER) | + BIT_ULL(_UFFDIO_API); struct uffdio_api uffdio_api; int uffd; =20 @@ -148,6 +154,15 @@ static void test_uffd_api(bool use_dev) goto out; } =20 + /* Verify returned fd-level ioctls bitmask */ + if ((uffdio_api.ioctls & expected_ioctls) !=3D expected_ioctls) { + uffd_test_fail("UFFDIO_API missing expected ioctls: " + "got=3D0x%"PRIx64", expected=3D0x%"PRIx64, + (uint64_t)uffdio_api.ioctls, + expected_ioctls); + goto out; + } + /* Test double requests of UFFDIO_API with a random feature 
set */ uffdio_api.features = BIT_ULL(0); if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0) { @@ -602,6 +617,685 @@ void uffd_minor_collapse_test(uffd_global_test_opts_t *gopts, uffd_test_args_t * uffd_minor_test_common(gopts, true, false); } +static int uffd_register_rwp(int uffd, void *addr, uint64_t len) +{ + struct uffdio_register reg = { + .range = { .start = (unsigned long)addr, .len = len }, + .mode = UFFDIO_REGISTER_MODE_RWP, + }; + + if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1) + return -errno; + return 0; +} + +static void rwprotect_range(int uffd, __u64 start, __u64 len, bool protect) +{ + struct uffdio_rwprotect rwp = { + .range = { .start = start, .len = len }, + .mode = protect ? UFFDIO_RWPROTECT_MODE_RWP : 0, + }; + + if (ioctl(uffd, UFFDIO_RWPROTECT, &rwp)) + err("UFFDIO_RWPROTECT failed"); +} + +static void set_async_mode(int uffd, bool enable) +{ + struct uffdio_set_mode mode = { }; + + if (enable) + mode.enable = UFFD_FEATURE_RWP_ASYNC; + else + mode.disable = UFFD_FEATURE_RWP_ASYNC; + + if (ioctl(uffd, UFFDIO_SET_MODE, &mode)) + err("UFFDIO_SET_MODE failed"); +} + +/* + * Test async RWP faults on anonymous memory. + * Populate pages, register MODE_RWP with RWP_ASYNC, + * RW-protect, re-access, verify content preserved and no faults delivered.
+ */ +static void uffd_rwp_async_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + + /* Populate all pages with known content */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + /* Register MODE_RWP */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* RW-protect all pages (sets protnone) */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Access all pages =E2=80=94 should auto-resolve, no faults */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + unsigned char expected =3D p % 255 + 1; + + if (page[0] !=3D expected) { + uffd_test_fail("page %lu content mismatch: %u !=3D %u", + p, page[0], expected); + return; + } + } + + uffd_test_pass(); +} + +/* + * Fault handler for RWP =E2=80=94 unprotect the page via UFFDIO_RWPROTECT. + */ +static void uffd_handle_rwp_fault(uffd_global_test_opts_t *gopts, + struct uffd_msg *msg, + struct uffd_args *uargs) +{ + if (!(msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_RWP)) + err("expected RWP fault, got 0x%llx", + msg->arg.pagefault.flags); + + rwprotect_range(gopts->uffd, msg->arg.pagefault.address, + gopts->page_size, false); + uargs->minor_faults++; +} + +/* + * Test sync RWP faults on anonymous memory. + * Populate pages, register MODE_RWP (sync), RW-protect, + * access from worker thread, verify fault delivered, UFFDIO_RWPROTECT res= olves. 
+ */ +static void uffd_rwp_sync_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + pthread_t uffd_mon; + struct uffd_args uargs =3D { }; + bool failed =3D false; + char c =3D '\0'; + unsigned long p; + + uargs.gopts =3D gopts; + uargs.handle_fault =3D uffd_handle_rwp_fault; + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + /* Register MODE_RWP */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* RW-protect all pages */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Start fault handler thread */ + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs)) + err("uffd_poll_thread create"); + + /* Access all pages =E2=80=94 triggers sync RWP faults, handler unprotect= s */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + + if (page[0] !=3D (p % 255 + 1)) { + uffd_test_fail("page %lu content mismatch", p); + failed =3D true; + goto out; + } + } + +out: + /* + * Stop the handler before reading minor_faults: the last fault + * resolution rwprotect_range()s before incrementing the counter, + * so the main thread can race ahead of the increment. + */ + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); + + if (failed) + return; + if (uargs.minor_faults =3D=3D 0) + uffd_test_fail("expected RWP faults, got 0"); + else + uffd_test_pass(); +} + +/* + * Test PAGEMAP_SCAN detection of RW-protected (cold) pages. 
+ */ +static void uffd_rwp_pagemap_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + struct page_region regions[16]; + struct pm_scan_arg pm_arg; + int pagemap_fd; + long ret; + + /* Need at least 4 pages */ + if (nr_pages < 4) { + uffd_test_skip("need at least 4 pages"); + return; + } + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, 0xab, page_size); + + /* Register and RW-protect */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Touch first half of pages to re-activate them (async auto-resolve) */ + for (p =3D 0; p < nr_pages / 2; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + /* Scan for cold (still RW-protected) pages */ + pagemap_fd =3D open("/proc/self/pagemap", O_RDONLY); + if (pagemap_fd < 0) + err("open pagemap"); + + /* + * PAGE_IS_ACCESSED is set once the uffd-wp bit has been cleared + * (access happened, or the user resolved). Invert it to select + * still-protected (cold) pages. + */ + memset(&pm_arg, 0, sizeof(pm_arg)); + pm_arg.size =3D sizeof(pm_arg); + pm_arg.start =3D (uint64_t)gopts->area_dst; + pm_arg.end =3D (uint64_t)gopts->area_dst + nr_pages * page_size; + pm_arg.vec =3D (uint64_t)regions; + pm_arg.vec_len =3D ARRAY_SIZE(regions); + pm_arg.category_mask =3D PAGE_IS_ACCESSED; + pm_arg.category_inverted =3D PAGE_IS_ACCESSED; + pm_arg.return_mask =3D PAGE_IS_ACCESSED; + + ret =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg); + close(pagemap_fd); + + if (ret < 0) { + uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno)); + return; + } + + /* + * The second half of pages should be reported as RW-protected. + * They may be coalesced into one region. 
+ */ + if (ret < 1) { + uffd_test_fail("expected cold pages, got %ld regions", ret); + return; + } + + /* Verify the cold region covers the second half */ + uint64_t cold_start =3D regions[0].start; + uint64_t expected_start =3D (uint64_t)gopts->area_dst + + (nr_pages / 2) * page_size; + + if (cold_start !=3D expected_start) { + uffd_test_fail("cold region starts at 0x%lx, expected 0x%lx", + (unsigned long)cold_start, + (unsigned long)expected_start); + return; + } + + uffd_test_pass(); +} + +/* + * Test that RWP protection survives a mprotect(PROT_NONE) -> + * mprotect(PROT_READ|PROT_WRITE) round-trip. The uffd-wp bit on a + * VM_UFFD_RWP VMA must continue to carry PROT_NONE semantics after + * mprotect() changes the base protection; otherwise accesses would + * silently succeed and the pagemap bit would stick without a fault + * ever clearing it. + */ +static void uffd_rwp_mprotect_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + struct page_region regions[16]; + struct pm_scan_arg pm_arg; + int pagemap_fd; + long ret; + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, 0xab, page_size); + + /* Register and RW-protect the whole range */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Round-trip mprotect(): PROT_NONE -> PROT_READ|PROT_WRITE */ + if (mprotect(gopts->area_dst, nr_pages * page_size, PROT_NONE)) + err("mprotect() PROT_NONE"); + if (mprotect(gopts->area_dst, nr_pages * page_size, + PROT_READ | PROT_WRITE)) + err("mprotect() PROT_READ|PROT_WRITE"); + + /* Touch every page. Async RWP must auto-resolve each fault. 
*/ + for (p =3D 0; p < nr_pages; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + /* + * After touching, no page should remain RW-protected. A stuck + * uffd-wp bit would mean mprotect() silently dropped PROT_NONE and + * the access never faulted. + */ + pagemap_fd =3D open("/proc/self/pagemap", O_RDONLY); + if (pagemap_fd < 0) + err("open pagemap"); + + memset(&pm_arg, 0, sizeof(pm_arg)); + pm_arg.size =3D sizeof(pm_arg); + pm_arg.start =3D (uint64_t)gopts->area_dst; + pm_arg.end =3D (uint64_t)gopts->area_dst + nr_pages * page_size; + pm_arg.vec =3D (uint64_t)regions; + pm_arg.vec_len =3D ARRAY_SIZE(regions); + pm_arg.category_mask =3D PAGE_IS_ACCESSED; + pm_arg.category_inverted =3D PAGE_IS_ACCESSED; + pm_arg.return_mask =3D PAGE_IS_ACCESSED; + + ret =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg); + close(pagemap_fd); + + if (ret < 0) { + uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno)); + return; + } + if (ret !=3D 0) { + uffd_test_fail("expected no cold pages after mprotect()+touch, got %ld r= egions", + ret); + return; + } + + uffd_test_pass(); +} + +/* + * Test that GUP resolves through protnone PTEs (async mode). + * vmsplice() into a pipe pins user pages via get_user_pages_fast() -- + * unlike write(), which goes through copy_from_user() and ordinary + * hardware page faults -- so it exercises gup_can_follow_protnone() on + * the RW-protected PTE. In async mode the kernel auto-restores + * permissions and GUP returns the page. 
+ */ +static void uffd_rwp_gup_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + struct iovec iov; + char buf; + int pipefd[2]; + + /* Populate first page with known content */ + memset(gopts->area_dst, 0xCD, gopts->page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, gopts->page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + gopts->page_size, true); + + if (pipe(pipefd)) + err("pipe"); + + /* + * One byte's worth of iov is enough to GUP the containing page and + * keeps the pipe transfer well under any pipe-capacity limit even on + * hugetlb-backed runs. + */ + iov.iov_base =3D gopts->area_dst; + iov.iov_len =3D 1; + if (vmsplice(pipefd[1], &iov, 1, 0) !=3D 1) { + uffd_test_fail("vmsplice from RW-protected page failed: %s", + strerror(errno)); + goto out; + } + + if (read(pipefd[0], &buf, 1) !=3D 1) { + uffd_test_fail("read from pipe failed"); + goto out; + } + + if (buf !=3D (char)0xCD) { + uffd_test_fail("content mismatch: got 0x%02x, expected 0xCD", + (unsigned char)buf); + goto out; + } + + uffd_test_pass(); +out: + close(pipefd[0]); + close(pipefd[1]); +} + +/* + * Test runtime toggle between async and sync modes. + * Start in async mode (detection), flip to sync (eviction), verify faults + * block, resolve them, flip back to async. 
+ */ +static void uffd_rwp_async_toggle_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + struct uffd_args uargs =3D { }; + pthread_t uffd_mon; + char c =3D '\0'; + unsigned long p; + + uargs.gopts =3D gopts; + uargs.handle_fault =3D uffd_handle_rwp_fault; + + /* Populate */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* Phase 1: async detection =E2=80=94 RW-protect, access first half */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + for (p =3D 0; p < nr_pages / 2; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; /* auto-resolves in async mode */ + } + + /* Phase 2: flip to sync for eviction */ + set_async_mode(gopts->uffd, false); + + /* Start handler =E2=80=94 will receive faults for cold pages */ + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs)) + err("uffd_poll_thread create"); + + /* Access second half (cold pages) =E2=80=94 should trigger sync faults */ + for (p =3D nr_pages / 2; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + if (page[0] !=3D (p % 255 + 1)) { + uffd_test_fail("page %lu content mismatch", p); + goto out; + } + } + + /* + * Stop the handler before reading minor_faults: the last fault + * resolution rwprotect_range()s before incrementing the counter, + * so the main thread can race ahead of the increment. Stopping + * here also makes Phase 3 a clean async-only test -- with the + * handler still running it would silently resolve any sync fault + * the kernel erroneously delivers, masking a regression. 
+ */ + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); + + if (uargs.minor_faults =3D=3D 0) { + uffd_test_fail("expected sync faults, got 0"); + return; + } + + /* Phase 3: flip back to async */ + set_async_mode(gopts->uffd, true); + + /* RW-protect and access again =E2=80=94 should auto-resolve */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + for (p =3D 0; p < nr_pages; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + uffd_test_pass(); + return; +out: + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); +} + +/* + * Test that RW-protected pages become accessible after closing uffd. + */ +static void uffd_rwp_close_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + + /* Populate */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Close uffd =E2=80=94 should restore protnone PTEs */ + close(gopts->uffd); + gopts->uffd =3D -1; + + /* All pages should be accessible with original content */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + unsigned char expected =3D p % 255 + 1; + + if (page[0] !=3D expected) { + uffd_test_fail("page %lu not accessible after close", p); + return; + } + } + + uffd_test_pass(); +} + +/* + * Test that RWP protection is preserved across fork() when + * UFFD_FEATURE_EVENT_FORK is enabled. 
Without preservation, the child's + * PTEs would lose the uffd-wp marker and RWP-protected accesses would + * silently fall through to do_numa_page(). + */ +static void uffd_rwp_fork_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + int pagemap_fd; + uint64_t value; + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failed"); + + /* Populate + RWP-protect */ + *gopts->area_dst =3D 1; + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + page_size, true); + + /* Parent: verify uffd-wp bit is set before fork */ + pagemap_fd =3D pagemap_open(); + value =3D pagemap_get_entry(pagemap_fd, gopts->area_dst); + pagemap_check_wp(value, true); + + /* + * Fork with EVENT_FORK: child inherits VM_UFFD_RWP. Child reads + * its own pagemap and must still see the uffd-wp bit set. + */ + if (pagemap_test_fork(gopts, true, false)) { + uffd_test_fail("RWP marker lost in child after fork"); + goto out; + } + + uffd_test_pass(); +out: + close(pagemap_fd); +} + +/* + * Test that RWP protection on a pinned anon page is preserved across fork= (). + * Pinning forces copy_present_page() in the child path, which must restore + * PAGE_NONE on top of the uffd bit. Using async mode, a read in the child + * auto-resolves if =E2=80=94 and only if =E2=80=94 the PTE was actually p= rotnone+uffd; the + * cleared uffd bit afterward proves the fault path ran. + */ +static void uffd_rwp_fork_pin_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long page_size =3D gopts->page_size; + fork_event_args fevent_args =3D { .gopts =3D gopts, .child_uffd =3D -1 }; + pin_args pin_args =3D {}; + int pagemap_fd, status; + pthread_t fevent_thread; + uint64_t value; + pid_t child; + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, page_size)) + err("register failed"); + + /* Populate. 
*/ + *gopts->area_dst =3D 1; + + /* RO-longterm pin so fork() takes copy_present_page() for this PTE. */ + if (pin_pages(&pin_args, gopts->area_dst, page_size)) { + uffd_test_skip("Possibly CONFIG_GUP_TEST missing or unprivileged"); + uffd_unregister(gopts->uffd, gopts->area_dst, page_size); + return; + } + + /* RWP-protect: PTE is now PAGE_NONE + uffd bit. */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, page_size, true); + + pagemap_fd =3D pagemap_open(); + value =3D pagemap_get_entry(pagemap_fd, gopts->area_dst); + pagemap_check_wp(value, true); + + /* + * UFFD_FEATURE_EVENT_FORK is required so the child inherits + * VM_UFFD_RWP and the marker; without it dup_userfaultfd() resets + * the child VMA and the test would pass for the wrong reason. + * dup_userfaultfd() blocks until the EVENT_FORK message is consumed, + * so spawn a reader before the fork(). + */ + gopts->ready_for_fork =3D false; + if (pthread_create(&fevent_thread, NULL, fork_event_consumer, + &fevent_args)) + err("pthread_create() for fork event consumer"); + while (!gopts->ready_for_fork) + ; /* Wait for consumer to start polling. */ + + child =3D fork(); + if (child < 0) + err("fork"); + if (child =3D=3D 0) { + volatile char c; + int cfd; + + /* + * Read the pinned page. Only reaches the fault path if the + * child PTE is protnone + uffd; async mode auto-resolves and + * clears the uffd bit. If copy_present_page() dropped + * PAGE_NONE, the read would silently succeed and the bit + * would still be set. + */ + c =3D *(volatile char *)gopts->area_dst; + (void)c; + + cfd =3D pagemap_open(); + value =3D pagemap_get_entry(cfd, gopts->area_dst); + close(cfd); + _exit((value & PM_UFFD_WP) ? 
1 : 0); + } + if (waitpid(child, &status, 0) < 0) + err("waitpid"); + if (pthread_join(fevent_thread, NULL)) + err("pthread_join() for fork event consumer"); + if (fevent_args.child_uffd >= 0) + close(fevent_args.child_uffd); + + unpin_pages(&pin_args); + close(pagemap_fd); + if (uffd_unregister(gopts->uffd, gopts->area_dst, page_size)) + err("unregister failed"); + + if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) { + uffd_test_fail("RWP not enforced in child after pinned fork"); + return; + } + + uffd_test_pass(); +} + +/* + * WP and RWP share the uffd-wp PTE bit and cannot coexist in the same VMA. + * Registration requesting both modes must be rejected. + */ +static void uffd_rwp_wp_exclusive_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages = gopts->nr_pages; + unsigned long page_size = gopts->page_size; + struct uffdio_register reg = { }; + + reg.range.start = (unsigned long)gopts->area_dst; + reg.range.len = nr_pages * page_size; + reg.mode = UFFDIO_REGISTER_MODE_WP | UFFDIO_REGISTER_MODE_RWP; + + if (ioctl(gopts->uffd, UFFDIO_REGISTER, &reg) == 0) { + uffd_test_fail("register with WP|RWP unexpectedly succeeded"); + return; + } + if (errno != EINVAL) { + uffd_test_fail("register with WP|RWP: expected EINVAL, got %d", + errno); + return; + } + uffd_test_pass(); +} + static sigjmp_buf jbuf, *sigbuf; static void sighndl(int sig, siginfo_t *siginfo, void *ptr) @@ -1604,6 +2298,77 @@ uffd_test_case_t uffd_tests[] = { /* We can't test MADV_COLLAPSE, so try our luck */ .uffd_feature_required = UFFD_FEATURE_MINOR_SHMEM, }, + { + .name = "rwp-async", + .uffd_fn = uffd_rwp_async_test, + .mem_targets = MEM_ALL, + .uffd_feature_required = + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name = "rwp-sync", + .uffd_fn = uffd_rwp_sync_test, + .mem_targets = MEM_ALL, + .uffd_feature_required = UFFD_FEATURE_RWP, + }, + { + .name = "rwp-pagemap", + .uffd_fn =
uffd_rwp_pagemap_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-mprotect", + .uffd_fn =3D uffd_rwp_mprotect_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-gup", + .uffd_fn =3D uffd_rwp_gup_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-async-toggle", + .uffd_fn =3D uffd_rwp_async_toggle_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-close", + .uffd_fn =3D uffd_rwp_close_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D UFFD_FEATURE_RWP, + }, + { + .name =3D "rwp-fork", + .uffd_fn =3D uffd_rwp_fork_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-fork-pin", + .uffd_fn =3D uffd_rwp_fork_pin_test, + .mem_targets =3D MEM_ANON, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC | + UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-wp-exclusive", + .uffd_fn =3D uffd_rwp_wp_exclusive_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | + UFFD_FEATURE_PAGEFAULT_FLAG_WP | + UFFD_FEATURE_WP_HUGETLBFS_SHMEM, + }, { .name =3D "sigbus", .uffd_fn =3D uffd_sigbus_test, --=20 2.54.0 From nobody Mon Jun 8 10:56:41 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9AD61383C86 for ; Fri, 29 May 2026 17:28:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1780075702; cv=none; b=eG9UQFYXImUWTVtTUyd3KPayL2yK1eaSh2yQbtblOU5Cidpf2qPiWSZlaFd5Rg+MiBIdpI/tEnoc829zLTqySCw05qXI1G1oa1GUXZOns0mJKiH/F8QN18y/aGLg0/s9VWDCeK+z6wotxw2zfBDqrM0ZmCFL0cpZ4xgTzqRIgMw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780075702; c=relaxed/simple; bh=ECJp/SwN+SvABQG3MNxZ61tl15iK0iHjERJkwYyNR4s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=m5FooWUqupP+mCVlCrku1l0lES6y8WOxDM8z0bSCZhpHK5pp9hXLO5rrOO6A9t/lyVuKJgvfLMVE9wkaH00d2niTDzvVhtdQ+IkAAY9TMUWUkCcCMFuar6COC2D4NIPxYCyzNaPNh+pdF8RzTu9sWaqgwCH/Y3+6S+JGmj4bXb4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Wzjt4seb; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Wzjt4seb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8F50E1F0089F; Fri, 29 May 2026 17:28:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780075694; bh=MIQxxozRIu6+nnGJE0oJP6Nk3SNPcpDq2i+v8EPL78w=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=Wzjt4sebkPZ6jVBhJa+W3B7qhYvkYj/u3xwrMJvT43wGuv3oysK0vjE38by/BDDSD Muahw60ipDLwR4SVDM7YeuO6cRle2cbvGq0kkib51ZdYxCq/D89DVahF0tEOTX0zCc nloBqyNqoS9ARUc0yG31esZOHcMpZ40I5MlWdKPFcaAEwgqF7FnOgNp8564XeLwTO3 pmZUBwg3ovZ+Z1s8E1eSWsy1zFRwfTrObhUnyKqQbP56SkVAbd9TFZbfN+L6TORNw/ RXxajWOwaPljfscjWU6VcADGdo3LkZJrszFo6Xkkem65TtNWSmWVOTId9dEoag8ckt UcpOa8+uc5iLA== Received: from phl-compute-10.internal (phl-compute-10.internal [10.202.2.50]) by mailfauth.phl.internal (Postfix) with ESMTP id EDF8FF4006F; Fri, 29 May 2026 13:28:12 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-10.internal (MEProxy); Fri, 29 May 2026 13:28:12 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: 
dmFkZTGBi04+ZX5edMs6arnVhjz5oTtZuLX6rF9F7dBMO1f7bo0/ibgVknVxOLxLEaTw3V SP8XCWrfEIm+ORQdwtFgGmZbSjd5PWqHK7OyczVLOTkfernQnqiFyq/o6xwE8cbLWskv8V +iZNmtaHm6WeW6aqk5t6K7mKlErgu7aQt0UjXT2ADQ7mf9nBrz2Zxt+3DgC32qmbL5bfPS wp7fg604KU/dBYc/nC2f3KAKjHsrOYa0CCMIHhOSv9QOJzU1reEx7CzTRePC09oBWGnLrT o+Pu2UEaZ81TuKLWkyHIRAb9DY0C7evn8G3qQgUz8zVG7T+c5/VvKECJCsNluq96Jwivtj dSsiyOB9JK1+TqpuilG+0oS9Uz8ZJtNYcFKsnGxgHV/a00Q740eP/GqRi2+fD4HWViNz1N l1MUW3aPaBSPOJBJH4X1DAHkBkCWXUqal/HVx0w93oZbm+raHUlRozRwpFZ5gWcn0WjiVs /BHjthOcCrulBbD06r7frzLXf48yDnKzoh9G2ZWGjiLbqpOYQVglNT0r5bui58+iN0m2X+ 5boxVqkorYKY/H1175kziF5tVhhelAbabXyOQns2oz01ntmpZ2NT0KJLGVRJAn4Ms9FLOe zAedmCzLH+PXoATcn11B/eMhoIeBKJh/5BXhh1oQNTMgeTwD2qMp8P2YC86Q X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 29 May 2026 13:28:12 -0400 (EDT) From: "Kiryl Shutsemau (Meta)" To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, kas@kernel.org Subject: [PATCH v6 15/15] Documentation/userfaultfd: document RWP working set tracking Date: Fri, 29 May 2026 18:26:44 +0100 Message-ID: <20260529172716.357179-16-kas@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260529172716.357179-1-kas@kernel.org> References: <20260529172716.357179-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP: - sync and async 
fault models;
- UFFDIO_RWPROTECT semantics;
- UFFD_FEATURE_RWP_ASYNC;
- UFFDIO_SET_MODE runtime mode flips.

It also covers a typical VMM working-set-tracking workflow, from the
detection loop through sync-mode eviction and back to async.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
---
 Documentation/admin-guide/mm/userfaultfd.rst | 243 ++++++++++++++++++-
 1 file changed, 237 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 1e533639fd50..2a72e54962c8 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -275,16 +275,16 @@ tracking and it can be different in a few ways:
   - Dirty information will not get lost if the pte was zapped due to
     various reasons (e.g. during split of a shmem transparent huge page).
 
-  - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
-    set; dirty when uffd-wp bit cleared), it has different semantics on
-    some of the memory operations. For example: ``MADV_DONTNEED`` on
+  - Due to a reverted meaning of soft-dirty (page clean when the uffd bit
+    is set; dirty when the uffd bit is cleared), it has different semantics
+    on some of the memory operations. For example: ``MADV_DONTNEED`` on
     anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
-    dirtying of memory by dropping uffd-wp bit during the procedure.
+    dirtying of memory by dropping the uffd bit during the procedure.
 
 The user app can collect the "written/dirty" status by looking up the
-uffd-wp bit for the pages being interested in /proc/pagemap.
+uffd bit for the pages being interested in /proc/pagemap.
 
-The page will not be under track of uffd-wp async mode until the page is
+The page will not be under track of userfaultfd-wp async mode until the page is
 explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
 flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set.
Trying to resolve a page fault
 that was tracked by async mode userfaultfd-wp is invalid.
@@ -307,6 +307,237 @@ transparent to the guest, we want that same address range to act as if it was
 still poisoned, even though it's on a new physical host which ostensibly
 doesn't have a memory error in the exact same spot.
 
+Read-Write Protection
+---------------------
+
+``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a
+memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)``
+combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only
+traps accesses to *present* PTEs, so accesses to unpopulated addresses in a
+protected range fall through to the normal missing-page path. It uses the
+PROT_NONE hinting mechanism (same as NUMA balancing) to make pages
+inaccessible while keeping them resident in memory. It works on anonymous,
+shmem, and hugetlbfs memory.
+
+RWP is designed for VM memory managers that need to track the working set
+of guest memory for cold page eviction to tiered or remote storage.
+
+**Setup:**
+
+1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``.
+   Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well; it requires
+   ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call.
+
+2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP``
+   (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be
+   fetched back from storage).
+
+**Feature availability:**
+
+RWP is built on top of two kernel primitives: a spare PTE bit owned by
+userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and architecture support
+for present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When both
+are available on a 64-bit kernel, the build selects
+``CONFIG_USERFAULTFD_RWP=y`` and the ``VM_UFFD_RWP`` VMA flag becomes
+available.
+
+``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are unavailable when
+the running kernel or architecture does not support them: for example,
+32-bit kernels (where ``VM_UFFD_RWP`` is unavailable), kernels built
+without ``CONFIG_USERFAULTFD_RWP``, and architectures whose PTEs cannot
+carry the uffd bit at runtime (e.g. riscv without the ``SVRSW60T59B``
+extension). Requesting an unsupported feature in
+``uffdio_api.features`` makes ``UFFDIO_API`` fail with ``EINVAL`` and
+leaves the userfaultfd context uninitialized; the bitmask returned in
+``uffdio_api.features`` then advertises the features the kernel does
+support. The recommended probe sequence is therefore to open a
+throwaway userfaultfd, call ``UFFDIO_API`` once with ``features = 0``,
+inspect the returned bitmask, close that fd, then open the real one
+and call ``UFFDIO_API`` again with only the supported features set.
+
+**Protecting and Unprotecting:**
+
+Use ``UFFDIO_RWPROTECT`` to protect or unprotect a range, mirroring the
+``UFFDIO_WRITEPROTECT`` interface::
+
+    struct uffdio_rwprotect rwp = {
+        .range = { .start = addr, .len = len },
+        .mode = UFFDIO_RWPROTECT_MODE_RWP, /* protect */
+    };
+    ioctl(uffd, UFFDIO_RWPROTECT, &rwp);
+
+Setting ``UFFDIO_RWPROTECT_MODE_RWP`` sets PROT_NONE on present PTEs in the
+range. Pages stay resident and their physical frames are preserved; only
+access permissions are removed.
+
+Clearing ``UFFDIO_RWPROTECT_MODE_RWP`` restores normal VMA permissions and
+wakes any faulting threads (unless ``UFFDIO_RWPROTECT_MODE_DONTWAKE`` is set).
+
+**Scope of protection:**
+
+RWP protection is a property of *present* PTEs. ``UFFDIO_RWPROTECT`` only
+affects entries that are already populated. Unpopulated addresses within
+the range remain unpopulated; when first accessed they fault through the
+normal missing path (``do_anonymous_page()``, ``do_swap_page()``,
+``finish_fault()``) and the resulting PTE is not RWP-protected.
To observe
+the population itself, co-register the range with
+``UFFDIO_REGISTER_MODE_MISSING``.
+
+Protection is preserved across page reclaim: a page swapped out while
+RWP-protected carries the marker on its swap entry, and swap-in restores
+the PROT_NONE state so the first access after swap-in still faults. The
+same applies to pages temporarily replaced by migration entries.
+
+Operations that drop the PTE entirely (``MADV_DONTNEED`` on anonymous
+memory, hole-punch on shmem, truncation of a file mapping) also drop the
+RWP marker: the next access re-populates the range without protection.
+Unlike WP (which persists via ``PTE_MARKER_UFFD_WP``), there is no
+persistent RWP marker today. The user needs to re-arm the range with
+``UFFDIO_RWPROTECT`` after any operation that explicitly frees PTEs.
+
+**Fault Handling:**
+
+When a protected page is accessed:
+
+- **Sync mode** (default): The faulting thread blocks and a
+  ``UFFD_PAGEFAULT_FLAG_RWP`` message is delivered to the userfaultfd
+  handler. The handler resolves the fault with ``UFFDIO_RWPROTECT``
+  (``UFFDIO_RWPROTECT_MODE_RWP`` cleared), which restores the PTE
+  permissions and wakes the faulting thread.
+
+- **Async mode** (``UFFD_FEATURE_RWP_ASYNC``): The kernel automatically
+  restores PTE permissions and the thread continues without blocking. No
+  message is delivered to the handler.
+
+**Runtime Mode Switching:**
+
+``UFFDIO_SET_MODE`` toggles ``UFFD_FEATURE_RWP_ASYNC`` at runtime, allowing
+the VMM to switch between lightweight async detection and safe sync
+eviction without re-registering. The toggle takes ``mmap_write_lock()``
+and calls ``vma_start_write()`` on each UFFD-armed VMA, draining
+in-flight per-VMA-locked faults before the new mode takes effect.
+
+**Cold Page Detection with PAGEMAP_SCAN:**
+
+RWP-protected PTEs carry the uffd PTE bit; the fault-resolution path
+clears it.
``PAGEMAP_SCAN`` reports ``PAGE_IS_ACCESSED`` once the bit is
+clear on a ``VM_UFFD_RWP`` VMA, so inverting it efficiently reports the
+still-protected (cold) pages. Require ``PAGE_IS_PRESENT`` too so memory
+holes (which carry neither category bit) are filtered out::
+
+    struct pm_scan_arg arg = {
+        .size = sizeof(arg),
+        .start = guest_mem_start,
+        .end = guest_mem_end,
+        .vec = (uint64_t)regions,
+        .vec_len = regions_len,
+        .category_mask = PAGE_IS_PRESENT | PAGE_IS_ACCESSED,
+        .category_inverted = PAGE_IS_ACCESSED,
+        .return_mask = PAGE_IS_ACCESSED,
+    };
+    long n = ioctl(pagemap_fd, PAGEMAP_SCAN, &arg);
+
+The returned ``page_region`` array contains contiguous cold ranges that can
+then be evicted.
+
+**Cleanup:**
+
+When the userfaultfd is closed or the range is unregistered, all PROT_NONE
+PTEs are automatically restored to their normal VMA permissions. This
+prevents pages from becoming permanently inaccessible.
+
+**VMM Working Set Tracking Workflow:**
+
+The sketch below shows a typical VMM lifecycle for cold page eviction to
+tiered storage. It uses two mappings of the same shmem (or hugetlbfs)
+file: ``guest_mem`` is the RWP-registered mapping that vCPUs access, and
+``io_mem`` is a second, unregistered mapping for VMM-side I/O.
Reading ``io_mem`` does not go through
+the RWP-protected PTEs of ``guest_mem``, so the VMM's own ``pwrite()``
+to storage never triggers an RWP fault::
+
+    /* One-time setup */
+    fd = memfd_create("guest", MFD_CLOEXEC);
+    ftruncate(fd, guest_size);
+    guest_mem = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
+                     MAP_SHARED, fd, 0); /* vCPU view, RWP-registered */
+    io_mem = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
+                  MAP_SHARED, fd, 0); /* VMM I/O view, unprotected */
+
+    uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK);
+    struct uffdio_api api = {
+        .api = UFFD_API,
+        .features = UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC,
+    };
+    if (ioctl(uffd, UFFDIO_API, &api) ||
+        !(api.features & UFFD_FEATURE_RWP))
+        return -1; /* RWP unavailable on this kernel/arch -- fall back. */
+    ioctl(uffd, UFFDIO_REGISTER, &(struct uffdio_register){
+        .range = { (uintptr_t)guest_mem, guest_size },
+        .mode = UFFDIO_REGISTER_MODE_RWP |
+                UFFDIO_REGISTER_MODE_MISSING,
+    });
+
+    /* Tracking loop */
+    while (vm_running) {
+        /* 1. Detection phase (async -- no vCPU stalls) */
+        ioctl(uffd, UFFDIO_RWPROTECT, &(struct uffdio_rwprotect){
+            .range = full_range,
+            .mode = UFFDIO_RWPROTECT_MODE_RWP });
+        sleep(tracking_interval);
+
+        /*
+         * 2. Switch to sync BEFORE scanning. In async mode a vCPU
+         * access between the scan and any eviction step silently
+         * clears the uffd bit, so the scan would already disagree
+         * with the page state by the time eviction begins. Sync mode
+         * blocks vCPU accesses, freezing the cold snapshot for the
+         * rest of the iteration.
+         */
+        ioctl(uffd, UFFDIO_SET_MODE,
+              &(struct uffdio_set_mode){
+                  .disable = UFFD_FEATURE_RWP_ASYNC });
+
+        /* 3. Find cold pages (uffd bit still set, page present) */
+        ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){
+            .category_mask = PAGE_IS_PRESENT | PAGE_IS_ACCESSED,
+            .category_inverted = PAGE_IS_ACCESSED,
+            .return_mask = PAGE_IS_ACCESSED,
+            ...
+        });
+
+        /* 4.
Evict cold pages (vCPU faults block on guest_mem) */
+        for each cold range:
+            /* Read from io_mem -- bypasses RWP, no fault. */
+            pwrite(storage_fd, (char *)io_mem + cold_offset,
+                   len, cold_offset);
+            /* Drop the page from the shared file. */
+            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                      cold_offset, len);
+            /*
+             * Wake any vCPU blocked on the RWP fault for this range:
+             * fallocate() does not iterate ctx->fault_pending_wqh.
+             */
+            ioctl(uffd, UFFDIO_WAKE, &(struct uffdio_range){
+                .start = (uintptr_t)guest_mem + cold_offset,
+                .len = len });
+
+        /* 5. Resume async tracking */
+        ioctl(uffd, UFFDIO_SET_MODE,
+              &(struct uffdio_set_mode){
+                  .enable = UFFD_FEATURE_RWP_ASYNC });
+    }
+
+During step 4, a vCPU that accesses ``guest_mem + cold_offset`` blocks
+with a ``UFFD_PAGEFAULT_FLAG_RWP`` fault while the eviction is in
+progress. After ``fallocate()`` punches the page out and ``UFFDIO_WAKE``
+fires, the vCPU retries the access, faults as ``MISSING``, and the
+handler resolves it with ``UFFDIO_COPY`` from storage.
+
+This workflow targets shmem and hugetlbfs (both support a second
+``io_mem`` mapping over the same fd). Anonymous-memory backings need a
+different inner-loop strategy because the VMM has no way to read the
+page without going through the RWP-protected mapping.
+
 QEMU/KVM
 ========
 
-- 
2.54.0