From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org,
	alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 01/16] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING
Date: Fri, 22 May 2026 14:38:42 +0100
Message-ID: <20260522133857.552279-2-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

pte_protnone() and pmd_protnone() detect present-but-inaccessible page
table entries. This capability is useful beyond NUMA balancing -- for
example, userfaultfd working set tracking uses protnone PTEs to track
page access without unmapping pages.

Introduce CONFIG_ARCH_HAS_PTE_PROTNONE to decouple the protnone PTE
infrastructure from CONFIG_NUMA_BALANCING. The six architectures that
support protnone PTEs (x86_64, arm64, powerpc, s390, riscv, loongarch)
now select this option, and CONFIG_NUMA_BALANCING depends on it.

No functional change -- the same set of architectures continues to have
working protnone support, but the infrastructure is now available
independently of NUMA balancing.

Signed-off-by: Kiryl Shutsemau (Meta)
Assisted-by: Claude:claude-opus-4-6
Acked-by: SeongJae Park
Acked-by: Mike Rapoport (Microsoft)
---
 arch/arm64/Kconfig                           |  1 +
 arch/arm64/include/asm/pgtable.h             |  7 ++---
 arch/loongarch/Kconfig                       |  1 +
 arch/loongarch/include/asm/pgtable.h         |  4 +--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  8 ++---
 arch/powerpc/platforms/Kconfig.cputype       |  1 +
 arch/riscv/Kconfig                           |  1 +
 arch/riscv/include/asm/pgtable.h             |  7 ++---
 arch/s390/Kconfig                            |  1 +
 arch/s390/include/asm/pgtable.h              |  4 +--
 arch/x86/Kconfig                             |  1 +
 arch/x86/include/asm/pgtable.h               |  8 ++---
 include/linux/pgtable.h                      | 32 ++++++++++++++------
 init/Kconfig                                 |  8 +++++
 mm/debug_vm_pgtable.c                        |  4 +--
 15 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..319470b3b1bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_SUPPORTS_CFI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4dfa42b7d053..873f4ea2e288 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -553,10 +553,7 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	/*
@@ -575,7 +572,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_present(pmd) pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 3b042dbb2c41..229b3d1b7056 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -67,6 +67,7 @@ config LOONGARCH
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_RT
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 2a0b63ae421f..d295447a2763 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -619,7 +619,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline long pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & _PAGE_PROTNONE);
@@ -629,7 +629,7 @@ static inline long pmd_protnone(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _PAGE_PROTNONE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_leaf(pmd) ((pmd_val(pmd) & _PAGE_HUGE) != 0)
 #define pud_leaf(pud) ((pud_val(pud) & _PAGE_HUGE) != 0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index e67e64ac6e8c..53a0c5892548 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -490,13 +490,13 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
 		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_hw_valid(pte_t pte)
 {
@@ -1067,12 +1067,12 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #endif
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_write(pmd) pte_write(pmd_pte(pmd))
 
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bac02c83bb3e..36b64a24cf30 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -87,6 +87,7 @@ config PPC_BOOK3S_64
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	select ARCH_SUPPORTS_HUGETLBFS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d235396c4514..9eb4a9315bdf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,6 +71,7 @@ config RISCV
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
 	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
+	select ARCH_HAS_PTE_PROTNONE if MMU
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
 	select ARCH_SUPPORTS_SCHED_MC if SMP
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a1a7c6520a09..48a127323b21 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -524,10 +524,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 	PAGE_SIZE)
 #endif
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/asm-generic/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
@@ -537,7 +534,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 /* Modify page protection bits */
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ecbcbb781e40..bc5bef08454b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -151,6 +151,7 @@ config S390
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && CC_IS_CLANG
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c6cee8241e0..97241dea5573 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -842,7 +842,7 @@ static inline int pte_same(pte_t a, pte_t b)
 	return pte_val(a) == pte_val(b);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return pte_present(pte) && !(pte_val(pte) & _PAGE_READ);
@@ -853,7 +853,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	/* pmd_leaf(pmd) implies pmd_present(pmd) */
 	return pmd_leaf(pmd) && !(pmd_val(pmd) & _SEGMENT_ENTRY_READ);
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_swp_exclusive(pte_t pte)
 {
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..9da1119e8ff6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64
+	select ARCH_HAS_PTE_PROTNONE if X86_64
 	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI if X86_64
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2187e9cfcefa..c7f014cbf0a9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -985,11 +985,7 @@ static inline int pmd_present(pmd_t pmd)
 	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * These work without NUMA balancing but the kernel does not care. See the
- * comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
@@ -1001,7 +997,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	return (pmd_flags(pmd) & (_PAGE_PROTNONE | _PAGE_PRESENT))
 		== _PAGE_PROTNONE;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline int pmd_none(pmd_t pmd)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index cdd68ed3ae1a..b6516a11adfa 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -2052,18 +2052,26 @@ static inline int pud_trans_unstable(pud_t *pud)
 	return 0;
 }
 
-#ifndef CONFIG_NUMA_BALANCING
+#ifndef CONFIG_ARCH_HAS_PTE_PROTNONE
 /*
- * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is
- * perfectly valid to indicate "no" in that case, which is why our default
- * implementation defaults to "always no".
+ * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It
+ * is perfectly valid to indicate "no" in that case, which is why our
+ * default implementation defaults to "always no".
  *
- * In an accessible VMA, however, pte_protnone() reliably indicates PROT_NONE
- * page protection due to NUMA hinting. NUMA hinting faults only apply in
- * accessible VMAs.
+ * In an accessible VMA, pte_protnone() reliably indicates a present
+ * PROT_NONE page protection. Today the kernel uses such PTEs for two
+ * purposes: NUMA hinting faults, and userfaultfd RWP tracking on
+ * VM_UFFD_RWP VMAs. The two are distinguished by the uffd PTE bit and
+ * the VMA flag; see include/linux/userfaultfd_k.h.
  *
- * So, to reliably identify PROT_NONE PTEs that require a NUMA hinting fault,
- * looking at the VMA accessibility is sufficient.
+ * So, to reliably identify PROT_NONE PTEs that require kernel handling,
+ * looking at the VMA accessibility (and the uffd bit on RWP VMAs) is
+ * sufficient.
+ *
+ * Architectures without CONFIG_ARCH_HAS_PTE_PROTNONE get the always-zero
+ * stubs below; PAGE_NONE references that survive to runtime fire the
+ * BUILD_BUG() fallback, since callers should have folded such paths to
+ * dead code via IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE).
  */
 static inline int pte_protnone(pte_t pte)
 {
@@ -2074,7 +2082,11 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return 0;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+
+#ifndef PAGE_NONE
+#define PAGE_NONE ({ BUILD_BUG(); (pgprot_t){0}; })
+#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #endif /* CONFIG_MMU */
 
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308ae..58abb7f19206 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -944,6 +944,13 @@ config SCHED_PROXY_EXEC
 
 endmenu
 
+#
+# For architectures that support present-but-inaccessible (PROT_NONE) page
+# table entries detectable via pte_protnone() / pmd_protnone():
+#
+config ARCH_HAS_PTE_PROTNONE
+	bool
+
 #
 # For architectures that want to enable the support for NUMA-affine scheduler
 # balancing logic:
@@ -1010,6 +1017,7 @@ config ARCH_WANT_NUMA_VARIABLE_LOCALITY
 config NUMA_BALANCING
 	bool "Memory placement aware NUMA scheduler"
 	depends on ARCH_SUPPORTS_NUMA_BALANCING
+	depends on ARCH_HAS_PTE_PROTNONE
 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
 	depends on SMP && NUMA_MIGRATION && !PREEMPT_RT
 	help
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 23dc3ee09561..5e9f3a35f924 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -672,7 +672,7 @@ static void __init pte_protnone_tests(struct pgtable_debug_args *args)
 {
 	pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot_none);
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	pr_debug("Validating PTE protnone\n");
@@ -685,7 +685,7 @@ static void __init pmd_protnone_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	if (!has_transparent_hugepage())
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org,
	alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 02/16] mm: rename uffd-wp PTE bit macros to uffd
Date: Fri, 22 May 2026 14:38:43 +0100
Message-ID: <20260522133857.552279-3-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

The uffd-wp PTE bit is about to gain a second consumer: userfaultfd RWP
will use the same bit to mark
access-tracking PTEs, distinct from mprotect(PROT_NONE) or NUMA-hinting PTEs. WP vs RWP semantics come from the VMA flag; the bit is just "uffd has claimed this entry." Drop the "_wp" suffix from the arch-private bit macros so they reflect that. x86: _PAGE_BIT_UFFD_WP -> _PAGE_BIT_UFFD _PAGE_UFFD_WP -> _PAGE_UFFD _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD arm64: PTE_UFFD_WP -> PTE_UFFD PTE_SWP_UFFD_WP -> PTE_SWP_UFFD riscv: _PAGE_UFFD_WP -> _PAGE_UFFD _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD Pure mechanical rename -- no behavior change. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: SeongJae Park --- arch/arm64/include/asm/pgtable-prot.h | 8 ++++---- arch/arm64/include/asm/pgtable.h | 12 ++++++------ arch/riscv/include/asm/pgtable-bits.h | 12 ++++++------ arch/riscv/include/asm/pgtable.h | 14 +++++++------- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ arch/x86/include/asm/pgtable_types.h | 16 ++++++++-------- 6 files changed, 43 insertions(+), 43 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm= /pgtable-prot.h index 212ce1b02e15..09d7c00cf405 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -28,11 +28,11 @@ #define PTE_PRESENT_VALID_KERNEL (PTE_VALID | PTE_MAYBE_NG) =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ -#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ +#define PTE_UFFD (_AT(pteval_t, 1) << 58) /* userfaultfd tracking */ +#define PTE_SWP_UFFD (_AT(pteval_t, 1) << 3) /* only for swp ptes */ #else -#define PTE_UFFD_WP (_AT(pteval_t, 0)) -#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#define PTE_UFFD (_AT(pteval_t, 0)) +#define PTE_SWP_UFFD (_AT(pteval_t, 0)) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED) diff --git a/arch/arm64/include/asm/pgtable.h 
b/arch/arm64/include/asm/pgta= ble.h index 873f4ea2e288..3eecb2c17711 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -343,17 +343,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & PTE_UFFD_WP); + return !!(pte_val(pte) & PTE_UFFD); } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP))); + return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD))); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP)); + return clear_pte_bit(pte, __pgprot(PTE_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -539,17 +539,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); + return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } =20 static inline int pte_swp_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & PTE_SWP_UFFD_WP); + return !!(pte_val(pte) & PTE_SWP_UFFD); } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); + return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm= /pgtable-bits.h index b422d9691e60..d5a86b4df3ce 100644 --- a/arch/riscv/include/asm/pgtable-bits.h +++ b/arch/riscv/include/asm/pgtable-bits.h @@ -40,20 +40,20 @@ =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP =20 -/* ext_svrsw60t59b: Bit(60) for uffd-wp tracking */ -#define _PAGE_UFFD_WP \ +/* ext_svrsw60t59b: Bit(60) for userfaultfd tracking */ +#define _PAGE_UFFD \ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? 
\ (1UL << 60) : 0) /* * Bit 4 is not involved into swap entry computation, so we - * can borrow it for swap page uffd-wp tracking. + * can borrow it for swap page userfaultfd tracking. */ -#define _PAGE_SWP_UFFD_WP \ +#define _PAGE_SWP_UFFD \ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \ _PAGE_USER : 0) #else -#define _PAGE_UFFD_WP 0 -#define _PAGE_SWP_UFFD_WP 0 +#define _PAGE_UFFD 0 +#define _PAGE_SWP_UFFD 0 #endif =20 #define _PAGE_TABLE _PAGE_PRESENT diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 48a127323b21..ca69948b3ed8 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -405,32 +405,32 @@ static inline pte_t pte_wrprotect(pte_t pte) =20 static inline bool pte_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & _PAGE_UFFD_WP); + return !!(pte_val(pte) & _PAGE_UFFD); } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD_WP)); + return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD)); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_UFFD_WP)); + return __pte(pte_val(pte) & ~(_PAGE_UFFD)); } =20 static inline bool pte_swp_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & _PAGE_SWP_UFFD_WP); + return !!(pte_val(pte) & _PAGE_SWP_UFFD); } =20 static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SWP_UFFD_WP); + return __pte(pte_val(pte) | _PAGE_SWP_UFFD); } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD_WP)); + return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -1157,7 +1157,7 @@ static inline pud_t pud_modify(pud_t pud, pgprot_t ne= wprot) * bit 0: _PAGE_PRESENT (zero) * bit 1 to 2: (zero) * bit 3: _PAGE_SWP_SOFT_DIRTY - * bit 4: _PAGE_SWP_UFFD_WP + * bit 4: _PAGE_SWP_UFFD * bit 5: _PAGE_PROT_NONE (zero) * bit 6: exclusive marker * bits 7 
to 11: swap type diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index c7f014cbf0a9..038c806b50a2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -413,17 +413,17 @@ static inline pte_t pte_wrprotect(pte_t pte) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { - return pte_flags(pte) & _PAGE_UFFD_WP; + return pte_flags(pte) & _PAGE_UFFD; } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD_WP)); + return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD)); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return pte_clear_flags(pte, _PAGE_UFFD_WP); + return pte_clear_flags(pte, _PAGE_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -528,17 +528,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pmd_uffd_wp(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_UFFD_WP; + return pmd_flags(pmd) & _PAGE_UFFD; } =20 static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) { - return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD_WP)); + return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD)); } =20 static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_UFFD_WP); + return pmd_clear_flags(pmd, _PAGE_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -1550,32 +1550,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t = pmd) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); + return pte_set_flags(pte, _PAGE_SWP_UFFD); } =20 static inline int pte_swp_uffd_wp(pte_t pte) { - return pte_flags(pte) & _PAGE_SWP_UFFD_WP; + return pte_flags(pte) & _PAGE_SWP_UFFD; } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP); + return pte_clear_flags(pte, _PAGE_SWP_UFFD); } =20 static inline pmd_t 
pmd_swp_mkuffd_wp(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_SWP_UFFD_WP); + return pmd_set_flags(pmd, _PAGE_SWP_UFFD); } =20 static inline int pmd_swp_uffd_wp(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_SWP_UFFD_WP; + return pmd_flags(pmd) & _PAGE_SWP_UFFD; } =20 static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP); + return pmd_clear_flags(pmd, _PAGE_SWP_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pg= table_types.h index 2ec250ba467e..af08d98be930 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -31,7 +31,7 @@ =20 #define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1 #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 -#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */ +#define _PAGE_BIT_UFFD _PAGE_BIT_SOFTW2 /* userfaultfd tracking */ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted= to large */ =20 @@ -39,7 +39,7 @@ #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */ #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD)= */ #else -/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */ +/* Shared with _PAGE_BIT_UFFD which is not supported on 32 bit */ #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */ #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD)= */ #endif @@ -111,11 +111,11 @@ #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define _PAGE_UFFD_WP (_AT(pteval_t, 1) << _PAGE_BIT_UFFD_WP) -#define _PAGE_SWP_UFFD_WP _PAGE_USER +#define _PAGE_UFFD (_AT(pteval_t, 1) << _PAGE_BIT_UFFD) +#define _PAGE_SWP_UFFD _PAGE_USER #else -#define _PAGE_UFFD_WP (_AT(pteval_t, 0)) -#define _PAGE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#define _PAGE_UFFD (_AT(pteval_t, 0)) +#define 
_PAGE_SWP_UFFD	(_AT(pteval_t, 0))
 #endif
 
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
@@ -129,7 +129,7 @@
 /*
  * The hardware requires shadow stack to be Write=0,Dirty=1. However,
  * there are valid cases where the kernel might create read-only PTEs that
- * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). In
+ * are dirty (e.g., fork(), mprotect(), userfaultfd, soft-dirty tracking). In
  * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bit,
  * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have
  * (Write=0,SavedDirty=1,Dirty=0) set.
@@ -151,7 +151,7 @@
 #define _COMMON_PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |	\
 				 _PAGE_SPECIAL | _PAGE_ACCESSED |	\
 				 _PAGE_DIRTY_BITS | _PAGE_SOFT_DIRTY |	\
-				 _PAGE_CC | _PAGE_UFFD_WP)
+				 _PAGE_CC | _PAGE_UFFD)
 #define _PAGE_CHG_MASK	(_COMMON_PAGE_CHG_MASK | _PAGE_PAT)
 #define _HPAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_PAT_LARGE)
 
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org,
 rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org,
 alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 03/16] mm: rename uffd-wp PTE accessors to uffd
Date: Fri, 22 May 2026 14:38:44 +0100
Message-ID: <20260522133857.552279-4-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Userfaultfd RWP will reuse the uffd-wp PTE bit to mark access-tracking
PTEs, alongside the write-protected ones it already marks. The bit's
meaning now depends on the VMA flag (WP or RWP), not on its name.

Rename the kernel-internal names that describe the bit:

  - pte/pmd/huge_pte accessors (and swap variants)
  - pgtable_supports_uffd() capability query
  - SCAN_PTE_UFFD khugepaged enum

The ftrace string emitted by mm_khugepaged_scan_pmd for this enum is
kept as "pte_uffd_wp" so existing trace-based tooling keeps matching.

Pure mechanical rename -- no behavior change.

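The idea that one PTE bit is interpreted through the VMA flag can be
sketched in plain C. This is a toy model under stated assumptions, not
kernel code: the toy_ types, the bit positions, and TOY_VM_UFFD_RWP are
hypothetical (the commit message names the RWP mode, but this patch does
not show its flag). toy_pte_mkuffd() follows the shape of the x86
pte_mkuffd() in the diff, which sets the bit and write-protects the
entry; toy_pte_is_wp() follows the shape of userfaultfd_pte_wp().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef uint64_t toy_pte_t;

#define TOY_PAGE_WRITE	(1ULL << 1)	/* assumed position of the write bit */
#define TOY_PAGE_UFFD	(1ULL << 10)	/* the single shared software bit */

#define TOY_VM_UFFD_WP	(1UL << 0)	/* VMA registered for write-protect */
#define TOY_VM_UFFD_RWP	(1UL << 1)	/* assumed name for the RWP mode flag */

struct toy_vma {
	unsigned long vm_flags;
};

/* Test the bit without saying what it means. */
static inline bool toy_pte_uffd(toy_pte_t pte)
{
	return (pte & TOY_PAGE_UFFD) != 0;
}

/* Set the bit and drop write permission, as pte_mkuffd() does on x86. */
static inline toy_pte_t toy_pte_mkuffd(toy_pte_t pte)
{
	return (pte | TOY_PAGE_UFFD) & ~TOY_PAGE_WRITE;
}

/* Clear the bit; write permission is not restored here. */
static inline toy_pte_t toy_pte_clear_uffd(toy_pte_t pte)
{
	return pte & ~TOY_PAGE_UFFD;
}

/* The meaning comes from the VMA: write-protect tracking ... */
static inline bool toy_pte_is_wp(const struct toy_vma *vma, toy_pte_t pte)
{
	return (vma->vm_flags & TOY_VM_UFFD_WP) && toy_pte_uffd(pte);
}

/* ... or access tracking, for the very same stored bit. */
static inline bool toy_pte_is_rwp(const struct toy_vma *vma, toy_pte_t pte)
{
	return (vma->vm_flags & TOY_VM_UFFD_RWP) && toy_pte_uffd(pte);
}
```

For an entry produced by toy_pte_mkuffd(), toy_pte_is_wp() holds only
for a VMA carrying TOY_VM_UFFD_WP and toy_pte_is_rwp() only for one
carrying TOY_VM_UFFD_RWP, although the stored bit is identical. That is
the reason a mode-neutral accessor name reads better than one ending
in _wp.
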
Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
---
 arch/arm64/include/asm/pgtable.h   | 28 ++++++++--------
 arch/riscv/include/asm/pgtable.h   | 38 +++++++++++-----------
 arch/s390/include/asm/hugetlb.h    | 12 +++----
 arch/x86/include/asm/pgtable.h     | 24 +++++++-------
 fs/proc/task_mmu.c                 | 44 ++++++++++++-------------
 fs/userfaultfd.c                   |  4 +--
 include/asm-generic/hugetlb.h      | 18 +++++------
 include/asm-generic/pgtable_uffd.h | 32 +++++++++---------
 include/linux/leafops.h            |  4 +--
 include/linux/mm_inline.h          |  4 +--
 include/linux/swapops.h            |  4 +--
 include/linux/userfaultfd_k.h      | 14 ++++----
 include/trace/events/huge_memory.h |  2 +-
 mm/huge_memory.c                   | 52 +++++++++++++++---------------
 mm/hugetlb.c                       | 46 +++++++++++++-------------
 mm/internal.h                      |  4 +--
 mm/khugepaged.c                    | 20 ++++++------
 mm/memory.c                        | 34 +++++++++----------
 mm/migrate.c                       | 12 +++----
 mm/migrate_device.c                |  8 ++---
 mm/mprotect.c                      | 12 +++----
 mm/mremap.c                        |  4 +--
 mm/page_table_check.c              |  8 ++---
 mm/rmap.c                          | 16 ++++-----
 mm/swapfile.c                      |  4 +--
 mm/userfaultfd.c                   |  2 +-
 26 files changed, 225 insertions(+), 225 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 3eecb2c17711..c41e4d59dc9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -341,17 +341,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd)
 }
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-static inline int pte_uffd_wp(pte_t pte)
+static inline int pte_uffd(pte_t pte)
 {
 	return !!(pte_val(pte) & PTE_UFFD);
 }
 
-static inline pte_t pte_mkuffd_wp(pte_t pte)
+static inline pte_t pte_mkuffd(pte_t pte)
 {
 	return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD)));
 }
 
-static inline pte_t pte_clear_uffd_wp(pte_t pte)
+static inline pte_t pte_clear_uffd(pte_t pte)
 {
 	return clear_pte_bit(pte, __pgprot(PTE_UFFD));
 }
@@ -537,17 +537,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 }
 
 #ifdef
CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & PTE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } @@ -590,13 +590,13 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_mkvalid_k(pmd) pte_pmd(pte_mkvalid_k(pmd_pte(pmd))) #define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd))) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd)) -#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))) -#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))) -#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd)) -#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))) -#define pmd_swp_clear_uffd_wp(pmd) \ - pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))) +#define pmd_uffd(pmd) pte_uffd(pmd_pte(pmd)) +#define pmd_mkuffd(pmd) pte_pmd(pte_mkuffd(pmd_pte(pmd))) +#define pmd_clear_uffd(pmd) pte_pmd(pte_clear_uffd(pmd_pte(pmd))) +#define pmd_swp_uffd(pmd) pte_swp_uffd(pmd_pte(pmd)) +#define pmd_swp_mkuffd(pmd) pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))) +#define pmd_swp_clear_uffd(pmd) \ + pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #define pmd_write(pmd) pte_write(pmd_pte(pmd)) @@ -1512,7 +1512,7 @@ static inline pmd_t pmdp_establish(struct vm_area_str= uct *vma, * Encode and decode a swap entry: * bits 0-1: present (must be zero) * bits 2: remember PG_anon_exclusive - * bit 3: remember uffd-wp state + * bit 3: remember uffd state * bits 6-10: swap type * bit 11: PTE_PRESENT_INVALID (must be zero) * bits 12-61: swap offset diff --git a/arch/riscv/include/asm/pgtable.h 
b/arch/riscv/include/asm/pgta= ble.h index ca69948b3ed8..b111e134795e 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -400,35 +400,35 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pgtable_supports_uffd_wp() \ +#define pgtable_supports_uffd() \ riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B) =20 -static inline bool pte_uffd_wp(pte_t pte) +static inline bool pte_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_UFFD); } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_UFFD)); } =20 -static inline bool pte_swp_uffd_wp(pte_t pte) +static inline bool pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return __pte(pte_val(pte) | _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD)); } @@ -886,34 +886,34 @@ static inline pud_t pud_mkspecial(pud_t pud) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline bool pmd_uffd_wp(pmd_t pmd) +static inline bool pmd_uffd(pmd_t pmd) { - return pte_uffd_wp(pmd_pte(pmd)); + return pte_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { - return pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_clear_uffd(pmd_pte(pmd))); } =20 -static inline bool pmd_swp_uffd_wp(pmd_t pmd) 
+static inline bool pmd_swp_uffd(pmd_t pmd) { - return pte_swp_uffd_wp(pmd_pte(pmd)); + return pte_swp_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { - return pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 6983e52eaf81..cf8a176ff3d8 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -77,20 +77,20 @@ static inline void huge_ptep_set_wrprotect(struct mm_st= ruct *mm, __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte)); } =20 -#define __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { return 0; } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 038c806b50a2..d14c84b2a332 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -411,17 +411,17 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pte_uffd_wp(pte_t pte) +static inline int pte_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_UFFD; } =20 -static inline pte_t pte_mkuffd_wp(pte_t 
pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_UFFD); } @@ -526,17 +526,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pmd_uffd_wp(pmd_t pmd) +static inline int pmd_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_UFFD; } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD)); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_UFFD); } @@ -1548,32 +1548,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t = pmd) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return pte_set_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_SWP_UFFD; } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_SWP_UFFD); } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_SWP_UFFD; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_SWP_UFFD); } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 751b9ba160fb..5827074962e7 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1948,14 +1948,14 @@ static pagemap_entry_t 
pte_to_pagemap_entry(struct = pagemapread *pm, page =3D vm_normal_page(vma, addr, pte); if (pte_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_uffd_wp(pte)) + if (pte_uffd(pte)) flags |=3D PM_UFFD_WP; } else { softleaf_t entry; =20 if (pte_swp_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_swp_uffd_wp(pte)) + if (pte_swp_uffd(pte)) flags |=3D PM_UFFD_WP; entry =3D softleaf_from_pte(pte); if (pm->show_pfn) { @@ -2021,7 +2021,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_PRESENT; if (pmd_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_uffd_wp(pmd)) + if (pmd_uffd(pmd)) flags |=3D PM_UFFD_WP; if (pm->show_pfn) frame =3D pmd_pfn(pmd) + idx; @@ -2040,7 +2040,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_swp_uffd_wp(pmd)) + if (pmd_swp_uffd(pmd)) flags |=3D PM_UFFD_WP; VM_WARN_ON_ONCE(!pmd_is_migration_entry(pmd)); page =3D softleaf_to_page(entry); @@ -2146,14 +2146,14 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsig= ned long hmask, !hugetlb_pmd_shared(ptep)) flags |=3D PM_MMAP_EXCLUSIVE; =20 - if (huge_pte_uffd_wp(pte)) + if (huge_pte_uffd(pte)) flags |=3D PM_UFFD_WP; =20 flags |=3D PM_PRESENT; if (pm->show_pfn) frame =3D pte_pfn(pte) + ((addr & ~hmask) >> PAGE_SHIFT); - } else if (pte_swp_uffd_wp_any(pte)) { + } else if (pte_swp_uffd_any(pte)) { flags |=3D PM_UFFD_WP; } =20 @@ -2354,7 +2354,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd_wp(pte)) + if (!pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2372,7 +2372,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D PAGE_IS_WRITTEN; =20 entry =3D 
softleaf_from_pte(pte); @@ -2397,13 +2397,13 @@ static void make_uffd_wp_pte(struct vm_area_struct = *vma, pte_t old_pte; =20 old_pte =3D ptep_modify_prot_start(vma, addr, pte); - ptent =3D pte_mkuffd_wp(old_pte); + ptent =3D pte_mkuffd(old_pte); ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); } else if (pte_none(ptent)) { set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); } else { - ptent =3D pte_swp_mkuffd_wp(ptent); + ptent =3D pte_swp_mkuffd(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); } } @@ -2422,7 +2422,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, struct page *page; =20 categories |=3D PAGE_IS_PRESENT; - if (!pmd_uffd_wp(pmd)) + if (!pmd_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2437,7 +2437,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, categories |=3D PAGE_IS_SOFT_DIRTY; } else { categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd_wp(pmd)) + if (!pmd_swp_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; if (pmd_swp_soft_dirty(pmd)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2461,10 +2461,10 @@ static void make_uffd_wp_pmd(struct vm_area_struct = *vma, =20 if (pmd_present(pmd)) { old =3D pmdp_invalidate_ad(vma, addr, pmdp); - pmd =3D pmd_mkuffd_wp(old); + pmd =3D pmd_mkuffd(old); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } else if (pmd_is_migration_entry(pmd)) { - pmd =3D pmd_swp_mkuffd_wp(pmd); + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } } @@ -2486,7 +2486,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) if (pte_present(pte)) { categories |=3D PAGE_IS_PRESENT; =20 - if (!huge_pte_uffd_wp(pte)) + if (!huge_pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; if (!PageAnon(pte_page(pte))) categories |=3D PAGE_IS_FILE; @@ -2497,7 +2497,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) } else { categories |=3D PAGE_IS_SWAPPED; =20 - if 
(!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D PAGE_IS_WRITTEN; if (pte_swp_soft_dirty(pte)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2525,10 +2525,10 @@ static void make_uffd_wp_huge_pte(struct vm_area_st= ruct *vma, =20 if (softleaf_is_migration(entry)) set_huge_pte_at(vma->vm_mm, addr, ptep, - pte_swp_mkuffd_wp(ptent), psize); + pte_swp_mkuffd(ptent), psize); else huge_ptep_modify_prot_commit(vma, addr, ptep, ptent, - huge_pte_mkuffd_wp(ptent)); + huge_pte_mkuffd(ptent)); } #endif /* CONFIG_HUGETLB_PAGE */ =20 @@ -2759,8 +2759,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, for (addr =3D start; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; make_uffd_wp_pte(vma, addr, pte, ptent); if (!flush_end) @@ -2777,8 +2777,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, unsigned long next =3D addr + PAGE_SIZE; pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; ret =3D pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN, p, addr, &next); diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 4b53dc4a3266..0fdf28f62702 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1287,7 +1287,7 @@ static int userfaultfd_register(struct userfaultfd_ct= x *ctx, if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) vm_flags |=3D VM_UFFD_MISSING; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) { - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) goto out; =20 vm_flags |=3D VM_UFFD_WP; @@ -1997,7 +1997,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, uffdio_api.features &=3D ~(UFFD_FEATURE_MINOR_HUGETLBFS | 
UFFD_FEATURE_MINOR_SHMEM); #endif - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) uffdio_api.features &=3D ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; =20 if (!uffd_supports_wp_marker()) { diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index e1a2e1b7c8e7..635c41cc3479 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -37,24 +37,24 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t= newprot) return pte_modify(pte, newprot); } =20 -#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { - return huge_pte_wrprotect(pte_mkuffd_wp(pte)); + return huge_pte_wrprotect(pte_mkuffd(pte)); } #endif =20 -#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { - return pte_clear_uffd_wp(pte); + return pte_clear_uffd(pte); } #endif =20 -#ifndef __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { - return pte_uffd_wp(pte); + return pte_uffd(pte); } #endif =20 diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtab= le_uffd.h index 0d85791efdf7..30e88fc1de2f 100644 --- a/include/asm-generic/pgtable_uffd.h +++ b/include/asm-generic/pgtable_uffd.h @@ -2,79 +2,79 @@ #define _ASM_GENERIC_PGTABLE_UFFD_H =20 /* - * Some platforms can customize the uffd-wp bit, making it unavailable + * Some platforms can customize the uffd PTE bit, making it unavailable * even if the architecture provides the resource. * Adding this API allows architectures to add their own checks for the * devices on which the kernel is running. * Note: When overriding it, please make sure the * CONFIG_HAVE_ARCH_USERFAULTFD_WP is part of this macro. 
*/ -#ifndef pgtable_supports_uffd_wp -#define pgtable_supports_uffd_wp() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD= _WP) +#ifndef pgtable_supports_uffd +#define pgtable_supports_uffd() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) #endif =20 static inline bool uffd_supports_wp_marker(void) { - return pgtable_supports_uffd_wp() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP= ); + return pgtable_supports_uffd() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP); } =20 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static __always_inline int pte_uffd_wp(pte_t pte) +static __always_inline int pte_uffd(pte_t pte) { return 0; } =20 -static __always_inline int pmd_uffd_wp(pmd_t pmd) +static __always_inline int pmd_uffd(pmd_t pmd) { return 0; } =20 -static __always_inline pte_t pte_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_clear_uffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline int pte_swp_uffd_wp(pte_t pte) +static __always_inline int pte_swp_uffd(pte_t pte) { return 0; } =20 -static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte; } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd; } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return 0; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd; } 
diff --git a/include/linux/leafops.h b/include/linux/leafops.h index 992cd8bd8ed0..2ce2f37ac883 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -100,8 +100,8 @@ static inline softleaf_t softleaf_from_pmd(pmd_t pmd) =20 if (pmd_swp_soft_dirty(pmd)) pmd =3D pmd_swp_clear_soft_dirty(pmd); - if (pmd_swp_uffd_wp(pmd)) - pmd =3D pmd_swp_clear_uffd_wp(pmd); + if (pmd_swp_uffd(pmd)) + pmd =3D pmd_swp_clear_uffd(pmd); arch_entry =3D __pmd_to_swp_entry(pmd); =20 /* Temporary until swp_entry_t eliminated. */ diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f0..2811caf4188d 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -600,14 +600,14 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *= vma, unsigned long addr, return false; =20 /* A uffd-wp wr-protected normal pte */ - if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval))) + if (unlikely(pte_present(pteval) && pte_uffd(pteval))) arm_uffd_pte =3D true; =20 /* * A uffd-wp wr-protected swap pte. Note: this should even cover an * existing pte marker with uffd-wp bit set. 
*/ - if (unlikely(pte_swp_uffd_wp_any(pteval))) + if (unlikely(pte_swp_uffd_any(pteval))) arm_uffd_pte =3D true; =20 if (unlikely(arm_uffd_pte)) { diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 8cfc966eae48..15c6440e38dd 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -73,8 +73,8 @@ static inline pte_t pte_swp_clear_flags(pte_t pte) pte =3D pte_swp_clear_exclusive(pte); if (pte_swp_soft_dirty(pte)) pte =3D pte_swp_clear_soft_dirty(pte); - if (pte_swp_uffd_wp(pte)) - pte =3D pte_swp_clear_uffd_wp(pte); + if (pte_swp_uffd(pte)) + pte =3D pte_swp_clear_uffd(pte); return pte; } =20 diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index d2920f98ab86..98f546e83cd2 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -225,13 +225,13 @@ static inline bool userfaultfd_minor(struct vm_area_s= truct *vma) static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { - return userfaultfd_wp(vma) && pte_uffd_wp(pte); + return userfaultfd_wp(vma) && pte_uffd(pte); } =20 static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma, pmd_t pmd) { - return userfaultfd_wp(vma) && pmd_uffd_wp(pmd); + return userfaultfd_wp(vma) && pmd_uffd(pmd); } =20 static inline bool userfaultfd_armed(struct vm_area_struct *vma) @@ -308,10 +308,10 @@ static inline bool userfaultfd_wp_use_markers(struct = vm_area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. 
  */
-static inline bool pte_swp_uffd_wp_any(pte_t pte)
+static inline bool pte_swp_uffd_any(pte_t pte)
 {
 	if (!uffd_supports_wp_marker())
 		return false;
@@ -319,7 +319,7 @@ static inline bool pte_swp_uffd_wp_any(pte_t pte)
 	if (pte_present(pte))
 		return false;

-	if (pte_swp_uffd_wp(pte))
+	if (pte_swp_uffd(pte))
 		return true;

 	if (pte_is_uffd_wp_marker(pte))
@@ -460,7 +460,7 @@ static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
  * Returns true if this is a swap pte and was uffd-wp wr-protected in either
  * forms (pte marker or a normal swap pte), false otherwise.
  */
-static inline bool pte_swp_uffd_wp_any(pte_t pte)
+static inline bool pte_swp_uffd_any(pte_t pte)
 {
 	return false;
 }
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index bcdc57eea270..b4a314b06aef 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -16,7 +16,7 @@
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
 	EM( SCAN_PTE_NON_PRESENT,	"pte_non_present")		\
-	EM( SCAN_PTE_UFFD_WP,		"pte_uffd_wp")			\
+	EM( SCAN_PTE_UFFD,		"pte_uffd_wp")			\
 	EM( SCAN_PTE_MAPPED_HUGEPAGE,	"pte_mapped_hugepage")		\
 	EM( SCAN_LACK_REFERENCED_PAGE,	"lack_referenced_page")		\
 	EM( SCAN_PAGE_NULL,		"page_null")			\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..d88fcccd386d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1884,8 +1884,8 @@ static void copy_huge_non_present_pmd(
 		pmd = swp_entry_to_pmd(entry);
 		if (pmd_swp_soft_dirty(*src_pmd))
 			pmd = pmd_swp_mksoft_dirty(pmd);
-		if (pmd_swp_uffd_wp(*src_pmd))
-			pmd = pmd_swp_mkuffd_wp(pmd);
+		if (pmd_swp_uffd(*src_pmd))
+			pmd = pmd_swp_mkuffd(pmd);
 		set_pmd_at(src_mm, addr, src_pmd, pmd);
 	} else if (softleaf_is_device_private(entry)) {
 		/*
@@ -1898,8 +1898,8 @@ static void copy_huge_non_present_pmd(

 		if (pmd_swp_soft_dirty(*src_pmd))
 			pmd = pmd_swp_mksoft_dirty(pmd);
-		if (pmd_swp_uffd_wp(*src_pmd))
-			pmd = pmd_swp_mkuffd_wp(pmd);
+		if (pmd_swp_uffd(*src_pmd))
+			pmd = pmd_swp_mkuffd(pmd);
 		set_pmd_at(src_mm, addr, src_pmd, pmd);
 	}

@@ -1919,7 +1919,7 @@ static void copy_huge_non_present_pmd(
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 	if (!userfaultfd_wp(dst_vma))
-		pmd = pmd_swp_clear_uffd_wp(pmd);
+		pmd = pmd_swp_clear_uffd(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 }

@@ -2015,7 +2015,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	if (!userfaultfd_wp(dst_vma))
-		pmd = pmd_clear_uffd_wp(pmd);
+		pmd = pmd_clear_uffd(pmd);
 	pmd = pmd_wrprotect(pmd);
 set_pmd:
 	pmd = pmd_mkold(pmd);
@@ -2556,9 +2556,9 @@ static pmd_t clear_uffd_wp_pmd(pmd_t pmd)
 	if (pmd_none(pmd))
 		return pmd;
 	if (pmd_present(pmd))
-		pmd = pmd_clear_uffd_wp(pmd);
+		pmd = pmd_clear_uffd(pmd);
 	else
-		pmd = pmd_swp_clear_uffd_wp(pmd);
+		pmd = pmd_swp_clear_uffd(pmd);

 	return pmd;
 }
@@ -2643,9 +2643,9 @@ static void change_non_present_huge_pmd(struct mm_struct *mm,
 	}

 	if (uffd_wp)
-		newpmd = pmd_swp_mkuffd_wp(newpmd);
+		newpmd = pmd_swp_mkuffd(newpmd);
 	else if (uffd_wp_resolve)
-		newpmd = pmd_swp_clear_uffd_wp(newpmd);
+		newpmd = pmd_swp_clear_uffd(newpmd);
 	if (!pmd_same(*pmd, newpmd))
 		set_pmd_at(mm, addr, pmd, newpmd);
 }
@@ -2726,14 +2726,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

 	entry = pmd_modify(oldpmd, newprot);
 	if (uffd_wp)
-		entry = pmd_mkuffd_wp(entry);
+		entry = pmd_mkuffd(entry);
 	else if (uffd_wp_resolve)
 		/*
 		 * Leave the write bit to be handled by PF interrupt
 		 * handler, then things like COW could be properly
 		 * handled.
 		 */
-		entry = pmd_clear_uffd_wp(entry);
+		entry = pmd_clear_uffd(entry);

 	/* See change_pte_range(). */
 	if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
@@ -3076,8 +3076,8 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,

 		entry = pfn_pte(zero_pfn(addr), vma->vm_page_prot);
 		entry = pte_mkspecial(entry);
-		if (pmd_uffd_wp(old_pmd))
-			entry = pte_mkuffd_wp(entry);
+		if (pmd_uffd(old_pmd))
+			entry = pte_mkuffd(entry);
 		VM_BUG_ON(!pte_none(ptep_get(pte)));
 		set_pte_at(mm, addr, pte, entry);
 		pte++;
@@ -3161,7 +3161,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		folio = page_folio(page);

 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
+		uffd_wp = pmd_swp_uffd(old_pmd);

 		write = softleaf_is_migration_write(entry);
 		if (PageAnon(page))
@@ -3177,7 +3177,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		folio = page_folio(page);

 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
+		uffd_wp = pmd_swp_uffd(old_pmd);

 		write = softleaf_is_device_private_write(entry);
 		anon_exclusive = PageAnonExclusive(page);
@@ -3234,7 +3234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		write = pmd_write(old_pmd);
 		young = pmd_young(old_pmd);
 		soft_dirty = pmd_soft_dirty(old_pmd);
-		uffd_wp = pmd_uffd_wp(old_pmd);
+		uffd_wp = pmd_uffd(old_pmd);

 		VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
 		VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
@@ -3305,7 +3305,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			if (soft_dirty)
 				entry = pte_swp_mksoft_dirty(entry);
 			if (uffd_wp)
-				entry = pte_swp_mkuffd_wp(entry);
+				entry = pte_swp_mkuffd(entry);
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 			set_pte_at(mm, addr, pte + i, entry);
 		}
@@ -3332,7 +3332,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			if (soft_dirty)
 				entry = pte_swp_mksoft_dirty(entry);
 			if (uffd_wp)
-				entry = pte_swp_mkuffd_wp(entry);
+				entry = pte_swp_mkuffd(entry);
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 			set_pte_at(mm, addr, pte + i, entry);
 		}
@@ -3350,7 +3350,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		if (soft_dirty)
 			entry = pte_mksoft_dirty(entry);
 		if (uffd_wp)
-			entry = pte_mkuffd_wp(entry);
+			entry = pte_mkuffd(entry);

 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
@@ -5017,8 +5017,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
-	if (pmd_uffd_wp(pmdval))
-		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
+	if (pmd_uffd(pmdval))
+		pmdswp = pmd_swp_mkuffd(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
 	folio_remove_rmap_pmd(folio, page, vma);
 	folio_put(folio);
@@ -5048,8 +5048,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 			pmde = pmd_mksoft_dirty(pmde);
 		if (softleaf_is_migration_write(entry))
 			pmde = pmd_mkwrite(pmde, vma);
-		if (pmd_swp_uffd_wp(*pvmw->pmd))
-			pmde = pmd_mkuffd_wp(pmde);
+		if (pmd_swp_uffd(*pvmw->pmd))
+			pmde = pmd_mkuffd(pmde);
 		if (!softleaf_is_migration_young(entry))
 			pmde = pmd_mkold(pmde);
 		/* NOTE: this may contain setting soft-dirty on some archs */
@@ -5069,8 +5069,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)

 		if (pmd_swp_soft_dirty(*pvmw->pmd))
 			pmde = pmd_swp_mksoft_dirty(pmde);
-		if (pmd_swp_uffd_wp(*pvmw->pmd))
-			pmde = pmd_swp_mkuffd_wp(pmde);
+		if (pmd_swp_uffd(*pvmw->pmd))
+			pmde = pmd_swp_mkuffd(pmde);
 	}

 	if (folio_test_anon(folio)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24bf49be047..f770c6504e26 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4859,8 +4859,8 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add

 	__folio_mark_uptodate(new_folio);
 	hugetlb_add_new_anon_rmap(new_folio, vma, addr);
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
-		newpte = huge_pte_mkuffd_wp(newpte);
+	if (userfaultfd_wp(vma) && huge_pte_uffd(old))
+		newpte = huge_pte_mkuffd(newpte);
 	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
@@ -4934,10 +4934,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		softleaf = softleaf_from_pte(entry);
 		if (unlikely(softleaf_is_hwpoison(softleaf))) {
 			if (!userfaultfd_wp(dst_vma))
-				entry = huge_pte_clear_uffd_wp(entry);
+				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(softleaf_is_migration(softleaf))) {
-			bool uffd_wp = pte_swp_uffd_wp(entry);
+			bool uffd = pte_swp_uffd(entry);

 			if (!softleaf_is_migration_read(softleaf) && cow) {
 				/*
@@ -4947,12 +4947,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				softleaf = make_readable_migration_entry(
 							swp_offset(softleaf));
 				entry = swp_entry_to_pte(softleaf);
-				if (userfaultfd_wp(src_vma) && uffd_wp)
-					entry = pte_swp_mkuffd_wp(entry);
+				if (userfaultfd_wp(src_vma) && uffd)
+					entry = pte_swp_mkuffd(entry);
 				set_huge_pte_at(src, addr, src_pte, entry, sz);
 			}
 			if (!userfaultfd_wp(dst_vma))
-				entry = huge_pte_clear_uffd_wp(entry);
+				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(pte_is_marker(entry))) {
 			const pte_marker marker = copy_pte_marker(softleaf, dst_vma);
@@ -5028,7 +5028,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}

 			if (!userfaultfd_wp(dst_vma))
-				entry = huge_pte_clear_uffd_wp(entry);
+				entry = huge_pte_clear_uffd(entry);

 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 			hugetlb_count_add(npages, dst);
@@ -5076,9 +5076,9 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	} else {
 		if (need_clear_uffd_wp) {
 			if (pte_present(pte))
-				pte = huge_pte_clear_uffd_wp(pte);
+				pte = huge_pte_clear_uffd(pte);
 			else
-				pte = pte_swp_clear_uffd_wp(pte);
+				pte = pte_swp_clear_uffd(pte);
 		}
 		set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
 	}
@@ -5212,7 +5212,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			 * drop the uffd-wp bit in this zap, then replace the
 			 * pte with a marker.
 			 */
-			if (pte_swp_uffd_wp_any(pte) &&
+			if (pte_swp_uffd_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
 				set_huge_pte_at(mm, address, ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP),
@@ -5248,7 +5248,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (huge_pte_dirty(pte))
 			folio_mark_dirty(folio);
 		/* Leave a uffd-wp pte marker if needed */
-		if (huge_pte_uffd_wp(pte) &&
+		if (huge_pte_uffd(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
 			set_huge_pte_at(mm, address, ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP),
@@ -5452,7 +5452,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
 	 * can trigger this, because hugetlb_fault() will always resolve
 	 * uffd-wp bit first.
 	 */
-	if (!unshare && huge_pte_uffd_wp(pte))
+	if (!unshare && huge_pte_uffd(pte))
 		return 0;

 	/* Let's take out MAP_SHARED mappings first. */
@@ -5596,8 +5596,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
 		huge_ptep_clear_flush(vma, vmf->address, vmf->pte);
 		hugetlb_remove_rmap(old_folio);
 		hugetlb_add_new_anon_rmap(new_folio, vma, vmf->address);
-		if (huge_pte_uffd_wp(pte))
-			newpte = huge_pte_mkuffd_wp(newpte);
+		if (huge_pte_uffd(pte))
+			newpte = huge_pte_mkuffd(newpte);
 		set_huge_pte_at(mm, vmf->address, vmf->pte, newpte,
 				huge_page_size(h));
 		folio_set_hugetlb_migratable(new_folio);
@@ -5875,7 +5875,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	 * if populated.
 	 */
 	if (unlikely(pte_is_uffd_wp_marker(vmf->orig_pte)))
-		new_pte = huge_pte_mkuffd_wp(new_pte);
+		new_pte = huge_pte_mkuffd(new_pte);
 	set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h));

 	hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -6073,7 +6073,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out_ptl;

 	/* Handle userfault-wp first, before trying to lock more pages */
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(mm, vmf.address, vmf.pte)) &&
+	if (userfaultfd_wp(vma) && huge_pte_uffd(huge_ptep_get(mm, vmf.address, vmf.pte)) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) {
 		if (!userfaultfd_wp_async(vma)) {
 			spin_unlock(vmf.ptl);
@@ -6082,7 +6082,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return handle_userfault(&vmf, VM_UFFD_WP);
 		}

-		vmf.orig_pte = huge_pte_clear_uffd_wp(vmf.orig_pte);
+		vmf.orig_pte = huge_pte_clear_uffd(vmf.orig_pte);
 		set_huge_pte_at(mm, vmf.address, vmf.pte, vmf.orig_pte,
 				huge_page_size(hstate_vma(vma)));
 		/* Fallthrough to CoW */
@@ -6366,7 +6366,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 	_dst_pte = pte_mkyoung(_dst_pte);

 	if (wp_enabled)
-		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+		_dst_pte = huge_pte_mkuffd(_dst_pte);

 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte, size);

@@ -6490,9 +6490,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 			}

 			if (uffd_wp)
-				newpte = pte_swp_mkuffd_wp(newpte);
+				newpte = pte_swp_mkuffd(newpte);
 			else if (uffd_wp_resolve)
-				newpte = pte_swp_clear_uffd_wp(newpte);
+				newpte = pte_swp_clear_uffd(newpte);
 			if (!pte_same(pte, newpte))
 				set_huge_pte_at(mm, address, ptep, newpte, psize);
 		} else if (unlikely(pte_is_marker(pte))) {
@@ -6513,9 +6513,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 			pte = huge_pte_modify(old_pte, newprot);
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (uffd_wp)
-				pte = huge_pte_mkuffd_wp(pte);
+				pte = huge_pte_mkuffd(pte);
 			else if (uffd_wp_resolve)
-				pte = huge_pte_clear_uffd_wp(pte);
+				pte = huge_pte_clear_uffd(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 			tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b..b0c6d1621d7c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -413,8 +413,8 @@ static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
 		new = pte_swp_mksoft_dirty(new);
 	if (pte_swp_exclusive(pte))
 		new = pte_swp_mkexclusive(new);
-	if (pte_swp_uffd_wp(pte))
-		new = pte_swp_mkuffd_wp(new);
+	if (pte_swp_uffd(pte))
+		new = pte_swp_mkuffd(new);

 	return new;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..de0644bde400 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -37,7 +37,7 @@ enum scan_result {
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
 	SCAN_PTE_NON_PRESENT,
-	SCAN_PTE_UFFD_WP,
+	SCAN_PTE_UFFD,
 	SCAN_PTE_MAPPED_HUGEPAGE,
 	SCAN_LACK_REFERENCED_PAGE,
 	SCAN_PAGE_NULL,
@@ -566,8 +566,8 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_PTE_NON_PRESENT;
 			goto out;
 		}
-		if (pte_uffd_wp(pteval)) {
-			result = SCAN_PTE_UFFD_WP;
+		if (pte_uffd(pteval)) {
+			result = SCAN_PTE_UFFD;
 			goto out;
 		}
 		page = vm_normal_page(vma, addr, pteval);
@@ -1303,10 +1303,10 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries. Please see
-				 * comment below for pte_uffd_wp().
+				 * comment below for pte_uffd().
 				 */
-				if (pte_swp_uffd_wp_any(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
+				if (pte_swp_uffd_any(pteval)) {
+					result = SCAN_PTE_UFFD;
 					goto out_unmap;
 				}
 				continue;
@@ -1316,7 +1316,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_uffd_wp(pteval)) {
+		if (pte_uffd(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
 			 * PTEs are armed with uffd write protection.
@@ -1326,7 +1326,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 			 * userfault messages that falls outside of
 			 * the registered range. So, just be simple.
 			 */
-			result = SCAN_PTE_UFFD_WP;
+			result = SCAN_PTE_UFFD;
 			goto out_unmap;
 		}

@@ -1534,7 +1534,7 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign

 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
 	if (userfaultfd_wp(vma))
-		return SCAN_PTE_UFFD_WP;
+		return SCAN_PTE_UFFD;

 	folio = filemap_lock_folio(vma->vm_file->f_mapping,
 				   linear_page_index(vma, haddr));
@@ -2876,7 +2876,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_NO_PTE_TABLE:
 		case SCAN_PTE_NON_PRESENT:
-		case SCAN_PTE_UFFD_WP:
+		case SCAN_PTE_UFFD:
 		case SCAN_LACK_REFERENCED_PAGE:
 		case SCAN_PAGE_NULL:
 		case SCAN_PAGE_COUNT:
diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..f2e7e900b1b8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -877,8 +877,8 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);

-	if (pte_swp_uffd_wp(orig_pte))
-		pte = pte_mkuffd_wp(pte);
+	if (pte_swp_uffd(orig_pte))
+		pte = pte_mkuffd(pte);

 	if ((vma->vm_flags & VM_WRITE) &&
 	    can_change_pte_writable(vma, address, pte)) {
@@ -968,8 +968,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			pte = softleaf_to_pte(entry);
 			if (pte_swp_soft_dirty(orig_pte))
 				pte = pte_swp_mksoft_dirty(pte);
-			if (pte_swp_uffd_wp(orig_pte))
-				pte = pte_swp_mkuffd_wp(pte);
+			if (pte_swp_uffd(orig_pte))
+				pte = pte_swp_mkuffd(pte);
 			set_pte_at(src_mm, addr, src_pte, pte);
 		}
 	} else if (softleaf_is_device_private(entry)) {
@@ -1002,8 +1002,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			entry = make_readable_device_private_entry(
 							swp_offset(entry));
 			pte = swp_entry_to_pte(entry);
-			if (pte_swp_uffd_wp(orig_pte))
-				pte = pte_swp_mkuffd_wp(pte);
+			if (pte_swp_uffd(orig_pte))
+				pte = pte_swp_mkuffd(pte);
 			set_pte_at(src_mm, addr, src_pte, pte);
 		}
 	} else if (softleaf_is_device_exclusive(entry)) {
@@ -1026,7 +1026,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		return 0;
 	}
 	if (!userfaultfd_wp(dst_vma))
-		pte = pte_swp_clear_uffd_wp(pte);
+		pte = pte_swp_clear_uffd(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -1074,7 +1074,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
 	if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte)))
 		/* Uffd-wp needs to be delivered to dest pte as well */
-		pte = pte_mkuffd_wp(pte);
+		pte = pte_mkuffd(pte);
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -1097,7 +1097,7 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
 		pte = pte_mkold(pte);

 	if (!userfaultfd_wp(dst_vma))
-		pte = pte_clear_uffd_wp(pte);
+		pte = pte_clear_uffd(pte);

 	set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr);
 }
@@ -3909,8 +3909,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		if (unlikely(unshare)) {
 			if (pte_soft_dirty(vmf->orig_pte))
 				entry = pte_mksoft_dirty(entry);
-			if (pte_uffd_wp(vmf->orig_pte))
-				entry = pte_mkuffd_wp(entry);
+			if (pte_uffd(vmf->orig_pte))
+				entry = pte_mkuffd(entry);
 		} else {
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		}
@@ -4245,7 +4245,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		 * etc.) because we're only removing the uffd-wp bit,
 		 * which is completely invisible to the user.
 		 */
-		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));
+		pte = pte_clear_uffd(ptep_get(vmf->pte));

 		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 		/*
@@ -5077,8 +5077,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
-	if (pte_swp_uffd_wp(vmf->orig_pte))
-		pte = pte_mkuffd_wp(pte);
+	if (pte_swp_uffd(vmf->orig_pte))
+		pte = pte_mkuffd(pte);

 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
@@ -5294,7 +5294,7 @@ void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry), vma);
 	if (uffd_wp)
-		entry = pte_mkuffd_wp(entry);
+		entry = pte_mkuffd(entry);

 	folio_ref_add(folio, nr_pages - 1);
 	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
@@ -5360,7 +5360,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
 		if (vmf_orig_pte_uffd_wp(vmf))
-			entry = pte_mkuffd_wp(entry);
+			entry = pte_mkuffd(entry);
 		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);

 		/* No need to invalidate - it was non-present before */
@@ -5609,7 +5609,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 	else if (pte_write(entry) && folio_test_dirty(folio))
 		entry = pte_mkdirty(entry);
 	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
-		entry = pte_mkuffd_wp(entry);
+		entry = pte_mkuffd(entry);
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		VM_BUG_ON_FOLIO(nr != 1, folio);
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a64291ab5b4..9d81b7b881ec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -326,8 +326,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,

 	if (pte_swp_soft_dirty(old_pte))
 		newpte = pte_mksoft_dirty(newpte);
-	if (pte_swp_uffd_wp(old_pte))
-		newpte = pte_mkuffd_wp(newpte);
+	if (pte_swp_uffd(old_pte))
+		newpte = pte_mkuffd(newpte);

 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);

@@ -391,8 +391,8 @@ static bool remove_migration_pte(struct folio *folio,

 		if (softleaf_is_migration_write(entry))
 			pte = pte_mkwrite(pte, vma);
-		else if (pte_swp_uffd_wp(old_pte))
-			pte = pte_mkuffd_wp(pte);
+		else if (pte_swp_uffd(old_pte))
+			pte = pte_mkuffd(pte);

 		if (folio_test_anon(folio) && !softleaf_is_migration_read(entry))
 			rmap_flags |= RMAP_EXCLUSIVE;
@@ -407,8 +407,8 @@ static bool remove_migration_pte(struct folio *folio,
 			pte = softleaf_to_pte(entry);
 			if (pte_swp_soft_dirty(old_pte))
 				pte = pte_swp_mksoft_dirty(pte);
-			if (pte_swp_uffd_wp(old_pte))
-				pte = pte_swp_mkuffd_wp(pte);
+			if (pte_swp_uffd(old_pte))
+				pte = pte_swp_mkuffd(pte);
 		}

 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index fbfe5715f635..f4058688522d 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -445,13 +445,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (pte_present(pte)) {
 				if (pte_soft_dirty(pte))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_uffd_wp(pte))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_uffd(pte))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			} else {
 				if (pte_swp_soft_dirty(pte))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_swp_uffd_wp(pte))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_swp_uffd(pte))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			}
 			set_pte_at(mm, addr, ptep, swp_pte);

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 9cbf932b028c..8340c8b228c6 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -240,8 +240,8 @@ static long change_softleaf_pte(struct vm_area_struct *vma,
 		 */
 		entry = make_readable_device_private_entry(swp_offset(entry));
 		newpte = swp_entry_to_pte(entry);
-		if (pte_swp_uffd_wp(oldpte))
-			newpte = pte_swp_mkuffd_wp(newpte);
+		if (pte_swp_uffd(oldpte))
+			newpte = pte_swp_mkuffd(newpte);
 	} else if (softleaf_is_marker(entry)) {
 		/*
 		 * Ignore error swap entries unconditionally,
@@ -266,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *vma,
 	}

 	if (uffd_wp)
-		newpte = pte_swp_mkuffd_wp(newpte);
+		newpte = pte_swp_mkuffd(newpte);
 	else if (uffd_wp_resolve)
-		newpte = pte_swp_clear_uffd_wp(newpte);
+		newpte = pte_swp_clear_uffd(newpte);

 	if (!pte_same(oldpte, newpte)) {
 		set_pte_at(vma->vm_mm, addr, pte, newpte);
@@ -290,9 +290,9 @@ static __always_inline void change_present_ptes(struct mmu_gather *tlb,
 	ptent = pte_modify(oldpte, newprot);

 	if (uffd_wp)
-		ptent = pte_mkuffd_wp(ptent);
+		ptent = pte_mkuffd(ptent);
 	else if (uffd_wp_resolve)
-		ptent = pte_clear_uffd_wp(ptent);
+		ptent = pte_clear_uffd(ptent);

 	/*
 	 * In some writable, shared mappings, we might want
diff --git a/mm/mremap.c b/mm/mremap.c
index e9c8b1d05832..12732a5c547e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -297,9 +297,9 @@ static int move_ptes(struct pagetable_move_control *pmc,
 		else {
 			if (need_clear_uffd_wp) {
 				if (pte_present(pte))
-					pte = pte_clear_uffd_wp(pte);
+					pte = pte_clear_uffd(pte);
 				else
-					pte = pte_swp_clear_uffd_wp(pte);
+					pte = pte_swp_clear_uffd(pte);
 			}
 			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
 		}
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 53a8997ec043..3fb995e5d40d 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -188,8 +188,8 @@ static inline bool softleaf_cached_writable(softleaf_t entry)
 static void page_table_check_pte_flags(pte_t pte)
 {
 	if (pte_present(pte)) {
-		WARN_ON_ONCE(pte_uffd_wp(pte) && pte_write(pte));
-	} else if (pte_swp_uffd_wp(pte)) {
+		WARN_ON_ONCE(pte_uffd(pte) && pte_write(pte));
+	} else if (pte_swp_uffd(pte)) {
 		const softleaf_t entry = softleaf_from_pte(pte);

 		WARN_ON_ONCE(softleaf_cached_writable(entry));
@@ -216,9 +216,9 @@ EXPORT_SYMBOL(__page_table_check_ptes_set);
 static inline void page_table_check_pmd_flags(pmd_t pmd)
 {
 	if (pmd_present(pmd)) {
-		if (pmd_uffd_wp(pmd))
+		if (pmd_uffd(pmd))
 			WARN_ON_ONCE(pmd_write(pmd));
-	} else if (pmd_swp_uffd_wp(pmd)) {
+	} else if (pmd_swp_uffd(pmd)) {
 		const softleaf_t entry = softleaf_from_pmd(pmd);

 		WARN_ON_ONCE(softleaf_cached_writable(entry));
diff --git a/mm/rmap.c b/mm/rmap.c
index 78b7fb5f367c..05056c213203 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2316,13 +2316,13 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (likely(pte_present(pteval))) {
 				if (pte_soft_dirty(pteval))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_uffd(pteval))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			} else {
 				if (pte_swp_soft_dirty(pteval))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_swp_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_swp_uffd(pteval))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			}
 			set_pte_at(mm, address, pvmw.pte, swp_pte);
 		} else {
@@ -2690,14 +2690,14 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				swp_pte = swp_entry_to_pte(entry);
 				if (pte_soft_dirty(pteval))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_uffd(pteval))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			} else {
 				swp_pte = swp_entry_to_pte(entry);
 				if (pte_swp_soft_dirty(pteval))
 					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_swp_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+				if (pte_swp_uffd(pteval))
+					swp_pte = pte_swp_mkuffd(swp_pte);
 			}
 			if (folio_test_hugetlb(folio))
 				set_huge_pte_at(mm, address, pvmw.pte, swp_pte,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9174f1eeffb0..9119efef7fe6 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2336,8 +2336,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
 	if (pte_swp_soft_dirty(old_pte))
 		new_pte = pte_mksoft_dirty(new_pte);
-	if (pte_swp_uffd_wp(old_pte))
-		new_pte = pte_mkuffd_wp(new_pte);
+	if (pte_swp_uffd(old_pte))
+		new_pte = pte_mkuffd(new_pte);
 setpte:
 	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry)));
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 885da1e56466..d546ffd2f165 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -358,7 +358,7 @@ static int mfill_atomic_install_pte(pmd_t *dst_pmd,
 	if (writable)
 		_dst_pte = pte_mkwrite(_dst_pte, dst_vma);
 	if (flags & MFILL_ATOMIC_WP)
-		_dst_pte = pte_mkuffd_wp(_dst_pte);
+		_dst_pte = pte_mkuffd(_dst_pte);

 	ret = -EAGAIN;
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
Received: from fhigh-b1-smtp.messagingengine.com (fhigh-b1-smtp.messagingengine.com [202.12.124.152])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC8F54028E9;
	Fri, 22 May 2026 13:39:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.152
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779457164; cv=none;
	b=s8LWSvzhx2PCOywhyncQL9p3yHv84n+vMwIL1k2hwn1i2mjNlOWZBZbROhhf3At8h7cE0Jj3pmD17+sY/F+wAZX9wZgU3Kpdfcz+LBYetEOCaw0oDGgRv/LJHz2jJDUlON1yO303GaRYo1+a2NFARKBZF5Nr8NwIr1HQcu/EfcE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779457164; c=relaxed/simple;
	bh=PD2OIoVtkrwFsxresBEM/uW/32z1i1mCxENWFOonqc4=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
	b=Y0NcfzJmC699SyscZCEUgUKgP2gaW0cNK+eGjKd/o2gI2xZ9ey343ZpUifffuuSIffV/ojdMHM9UvH3lRsANlYpnMdG5xOw1ZPNbNcyClX2yO9et0RO9DEt1vBPp6sCO9wk5ge5wgk5VrpO599dTM0Uvezqyya9w25KeFlkhmI8=
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com,
corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 04/16] mm: add VM_UFFD_RWP VMA flag Date: Fri, 22 May 2026 14:38:45 +0100 Message-ID: <20260522133857.552279-5-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kiryl Shutsemau (Meta)" Preparatory patch for userfaultfd read-write protection (RWP). RWP extends userfaultfd protection from plain write-protection (WP) to full read-write protection: accesses to an RWP-protected range -- reads as well as writes -- trap through userfaultfd. Reserve VM_UFFD_RWP, add the userfaultfd_rwp() and userfaultfd_protected() helpers, and wire up the smaps "ur" entry and the trace-flag table the rest of the series will use. The flag is gated on CONFIG_USERFAULTFD_RWP, which is introduced together with the UAPI in a later patch; until then VM_UFFD_RWP aliases VM_NONE and every downstream check folds to dead code. Nothing sets or queries the flag yet. 
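As a rough illustration of the "aliases VM_NONE" behaviour described above, the sketch below models it in plain userspace C. It is not kernel code: every MODEL_/model_ name is hypothetical, and MODEL_CONFIG_USERFAULTFD_RWP merely stands in for the Kconfig symbol. With the option left undefined the flag is 0, so the helper always returns false and a compiler is free to drop any branch guarded by it.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for CONFIG_USERFAULTFD_RWP. Left undefined here,
 * which models the state of the tree before the UAPI patch lands.
 */
/* #define MODEL_CONFIG_USERFAULTFD_RWP 1 */

#define MODEL_VM_NONE 0ULL

#ifdef MODEL_CONFIG_USERFAULTFD_RWP
#define MODEL_VM_UFFD_RWP (1ULL << 43)	/* bit number taken from the patch */
#else
#define MODEL_VM_UFFD_RWP MODEL_VM_NONE	/* flag aliases "no flag" */
#endif

/* Minimal model of a VMA: only the flags word matters for this sketch. */
struct model_vma {
	unsigned long long vm_flags;
};

/* Mirrors the shape of userfaultfd_rwp(): a plain mask test. */
static bool model_userfaultfd_rwp(const struct model_vma *vma)
{
	/* With MODEL_VM_UFFD_RWP == 0 this is constant false. */
	return vma->vm_flags & MODEL_VM_UFFD_RWP;
}
```

Even a VMA with every flag bit set reports false from model_userfaultfd_rwp() while the option is off, which is the sense in which downstream checks fold to dead code.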
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: SeongJae Park --- Documentation/filesystems/proc.rst | 1 + fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 28 ++++++++++++++++--------- include/linux/userfaultfd_k.h | 33 ++++++++++++++++++++++++------ include/trace/events/mmflags.h | 7 +++++++ 5 files changed, 56 insertions(+), 16 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems= /proc.rst index db6167befb7b..db28207c5290 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -607,6 +607,7 @@ encoded manner. The codes are the following: um userfaultfd missing tracking uw userfaultfd wr-protect tracking ui userfaultfd minor fault + ur userfaultfd read-write-protect tracking ss shadow/guarded control stack page sl sealed lf lock on fault pages diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5827074962e7..fbaede228201 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1206,6 +1206,9 @@ static void show_smap_vma_flags(struct seq_file *m, s= truct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] =3D "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_USERFAULTFD_RWP + [ilog2(VM_UFFD_RWP)] =3D "ur", +#endif #ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK [ilog2(VM_SHADOW_STACK)] =3D "ss", #endif diff --git a/include/linux/mm.h b/include/linux/mm.h index 0b776907152e..3f53d1e978c0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -353,6 +353,7 @@ enum { #endif DECLARE_VMA_BIT(UFFD_MINOR, 41), DECLARE_VMA_BIT(SEALED, 42), + DECLARE_VMA_BIT(UFFD_RWP, 43), /* Flags that reuse flags above. 
*/ DECLARE_VMA_BIT_ALIAS(PKEY_BIT0, HIGH_ARCH_0), DECLARE_VMA_BIT_ALIAS(PKEY_BIT1, HIGH_ARCH_1), @@ -496,6 +497,11 @@ enum { #else #define VM_UFFD_MINOR VM_NONE #endif +#ifdef CONFIG_USERFAULTFD_RWP +#define VM_UFFD_RWP INIT_VM_FLAG(UFFD_RWP) +#else +#define VM_UFFD_RWP VM_NONE +#endif #ifdef CONFIG_64BIT #define VM_ALLOW_ANY_UNCACHED INIT_VM_FLAG(ALLOW_ANY_UNCACHED) #define VM_SEALED INIT_VM_FLAG(SEALED) @@ -633,22 +639,24 @@ enum { * reconsistuted upon page fault, so necessitate page table copying upon f= ork. * * Note that these flags should be compared with the DESTINATION VMA not t= he - * source, as VM_UFFD_WP may not be propagated to destination, while all o= ther - * flags will be. + * source: VM_UFFD_WP and VM_UFFD_RWP may be cleared on the destination + * (dup_userfaultfd() -> userfaultfd_reset_ctx() when the parent context d= id + * not negotiate UFFD_FEATURE_EVENT_FORK), while all other flags propagate. * * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot= be * reasonably reconstructed on page fault. * * VM_UFFD_WP - Encodes metadata about an installed uffd - * write protect handler, which cannot be - * reconstructed on page fault. + * VM_UFFD_RWP write- or read-write-protect handler, which + * cannot be reconstructed on page fault. * - * We always copy pgtables when dst_vma has uffd= -wp - * enabled even if it's file-backed - * (e.g. shmem). Because when uffd-wp is enabled, - * pgtable contains uffd-wp protection informati= on, - * that's something we can't retrieve from page = cache, - * and skip copying will lose those info. + * We always copy pgtables when dst_vma has the + * uffd PTE bit in use even if it's file-backed + * (e.g. shmem). Because when the uffd bit is + * in use, the pgtable contains the protection + * information, that's something we can't + * retrieve from page cache, and skip copying + * will lose those info. 
* * VM_MAYBE_GUARD - Could contain page guard region markers which * by design are a property of the page tables diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 98f546e83cd2..889c7b45fec8 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -21,10 +21,11 @@ #include =20 /* The set of all possible UFFD-related VM flags. */ -#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR) +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_MINOR | \ + VM_UFFD_WP | VM_UFFD_RWP) =20 -#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BI= T, \ - VMA_UFFD_MINOR_BIT) +#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_MINOR= _BIT, \ + VMA_UFFD_WP_BIT, VMA_UFFD_RWP_BIT) =20 /* * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining @@ -192,7 +193,7 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(stru= ct vm_area_struct *vma, */ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { - return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); + return vma->vm_flags & (VM_UFFD_MINOR | VM_UFFD_WP | VM_UFFD_RWP); } =20 /* @@ -222,6 +223,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return vma->vm_flags & VM_UFFD_MINOR; } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_UFFD_RWP; +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return userfaultfd_wp(vma) || userfaultfd_rwp(vma); +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -364,6 +375,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return false; } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return false; +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return false; +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -457,8 
+478,8 @@ static inline bool userfaultfd_wp_use_markers(struct vm= _area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. */ static inline bool pte_swp_uffd_any(pte_t pte) { diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index a6e5a44c9b42..bfface3d0203 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -194,6 +194,12 @@ IF_HAVE_PG_ARCH_3(arch_3) # define IF_HAVE_UFFD_MINOR(flag, name) #endif =20 +#ifdef CONFIG_USERFAULTFD_RWP +# define IF_HAVE_UFFD_RWP(flag, name) {flag, name}, +#else +# define IF_HAVE_UFFD_RWP(flag, name) +#endif + #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32) # define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name}, #else @@ -215,6 +221,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR, "uffd_minor" ) \ {VM_PFNMAP, "pfnmap" }, \ {VM_MAYBE_GUARD, "maybe_guard" }, \ {VM_UFFD_WP, "uffd_wp" }, \ +IF_HAVE_UFFD_RWP(VM_UFFD_RWP, "uffd_rwp" ) \ {VM_LOCKED, "locked" }, \ {VM_IO, "io" }, \ {VM_SEQ_READ, "seqread" }, \ --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026 Received: from fout-b2-smtp.messagingengine.com (fout-b2-smtp.messagingengine.com [202.12.124.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE53F403EB0; Fri, 22 May 2026 13:39:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457167; cv=none; b=rS6kdUfbbQGKNPRT5240dt1SyTtKiKPKx/Ff8eVkiC+5A8pwmQZF4rsYeYvQ2rxHDq1tCR6cU5f9veYAbMXWZQ9xFV7LRW++45eRyG5d7xRG48Nwsyl0ebHDqz0Ab/eqM1TlcW6s89tpd4mJhiirnlLyGBvaFwRpdJoLyr2BewY= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779457167; c=relaxed/simple; bh=Jqhqw/h5DBCLt16ikGg5CeqDdocXma2h1wIWf2vw208=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Hssxrr4OfvlPwgjnEfuAPLpTy+o0odLM2cpO5J5rbTM6zwrBgmGZz02Kcrw0GHAb81Zv9m1kB4mziRflCb22BcZJSJJoMYX0ktj4h9pkq8pqkoHdZ4np6idqo/+ovRgj8Thm8UVNciL4118Zima+DhHB8/FFbU0KpdrfpWaqTYg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=qzP/NMzV; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=DWpJZoir; arc=none smtp.client-ip=202.12.124.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="qzP/NMzV"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="DWpJZoir" Received: from phl-compute-01.internal (phl-compute-01.internal [10.202.2.41]) by mailfout.stl.internal (Postfix) with ESMTP id 8D2CA1D000E7; Fri, 22 May 2026 09:39:24 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-01.internal (MEProxy); Fri, 22 May 2026 09:39:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:date:date:from :from:in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1779457164; x= 1779543564; bh=AEyMHg1oOHsuwGIljLpw3KfnAjCT3KTeRYSFg2+miss=; b=q zP/NMzVXVKgSNlFJSft4zW/bChML7pyvZaAJ/F56xdIj9LPatzAWQyOpxeCP0b/C gIbNg1XXiKd4ibxnDnDd7GNYOx9RPG8kQRlAcGSSNRJK066nJeTKEY9ilxWubtP8 
3VeFOgGg5KQ8+u83EDjlWjtHsJNd8/3geRl2pSitb7xEXBDYU3w6eZS6GN8l5kYA myUwXFb+V2ZmS1qR0eyX5qJE3AxW4WxFetrCUP3OoX0TPDbqPK9Mfnf7gnjJSHel UK1MlopP4dD+Tk4XbbSvjEB7xq2XfJUnc/wk01g6c23Pe2LrDlyjDGoSUimEIKs8 yIGAsB3KXvbUpCZv7gFqw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1779457164; x=1779543564; bh=A EyMHg1oOHsuwGIljLpw3KfnAjCT3KTeRYSFg2+miss=; b=DWpJZoiry50QG/vQk OfR9VgpMdrpdZiIMQsW9ZVBxu/qRVRu+R/f1kvdLIZqUHLV8z3anuie2fe250zwv 7AyLVT1ai82ET05wrkWT4q2iEUuaUB6G+kmViEHwgYhzhqpi7EyRrZ0Rrr32TdjO T9clD4TyejXsfjdo0Rv19mtnxPDrA/OAHew98F/K07HMFyeyX5JWACjEgzYGsdDi 4TliQe/8EjM2xiIliH25RQMeSUogofHgwHNTjYwqgJhcm+6bzigzH8a2QGw1tKZm r82rq8tGL4YggAs9KEiLfQeuXxgBEMRjbFct8J5Q+rFoFgGTTp3MIzQuZ+R0GelW nRHEw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefhedrtddtgdduhedtfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucenucfjughrpefhvfevufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpefmihhrhihlucfuhhhuthhsvghmrghuuceokhhirhhilhhlsehs hhhuthgvmhhovhdrnhgrmhgvqeenucggtffrrghtthgvrhhnpeegveehtdfgvdfhudegff euuddvgeevjefhveevgefhvdevieevteeivdehjefhjeenucevlhhushhtvghrufhiiigv pedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehkihhrihhllhesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvdeipdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh eprhhpphhtsehkvghrnhgvlhdrohhrghdprhgtphhtthhopehpvghtvghrgiesrhgvughh rghtrdgtohhmpdhrtghpthhtohepuggrvhhiugeskhgvrhhnvghlrdhorhhgpdhrtghpth htoheplhhjsheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepshhurhgvnhgssehgohho ghhlvgdrtghomhdprhgtphhtthhopehvsggrsghkrgeskhgvrhhnvghlrdhorhhgpdhrtg 
hpthhtoheplhhirghmrdhhohiflhgvthhtsehorhgrtghlvgdrtghomhdprhgtphhtthho peiiihihsehnvhhiughirgdrtghomh X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 22 May 2026 09:39:23 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 05/16] mm: add MM_CP_UFFD_RWP change_protection() flag Date: Fri, 22 May 2026 14:38:46 +0100 Message-ID: <20260522133857.552279-6-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kiryl Shutsemau (Meta)" Preparatory patch. Add the change_protection() primitive that userfaultfd RWP will use. An RWP-protected PTE is PAGE_NONE with the uffd PTE bit set. The PROT_NONE half makes the CPU fault on any access; the uffd bit distinguishes an RWP fault from a plain mprotect(PROT_NONE) or NUMA hinting fault. MM_CP_UFFD_WP and MM_CP_UFFD_RWP share the same PTE bit, so the two cannot be used together on the same range. 
Two new change_protection() flags: MM_CP_UFFD_RWP install PAGE_NONE and set the uffd bit MM_CP_UFFD_RWP_RESOLVE restore vma->vm_page_prot, clear the uffd bit Both are wired through change_pte_range(), change_huge_pmd(), and hugetlb_change_protection() so anon, shmem, THP, and hugetlb all share the same semantics. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- include/linux/mm.h | 5 ++++ include/linux/userfaultfd_k.h | 1 - mm/huge_memory.c | 30 +++++++++++++---------- mm/hugetlb.c | 25 ++++++++++++++----- mm/mprotect.c | 46 +++++++++++++++++++++++++++-------- 5 files changed, 77 insertions(+), 30 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3f53d1e978c0..9054468774b5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3291,6 +3291,11 @@ int get_cmdline(struct task_struct *task, char *buff= er, int buflen); #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */ #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +/* Whether this change is for uffd RWP */ +#define MM_CP_UFFD_RWP (1UL << 4) /* do rwp */ +#define MM_CP_UFFD_RWP_RESOLVE (1UL << 5) /* resolve rwp */ +#define MM_CP_UFFD_RWP_ALL (MM_CP_UFFD_RWP | \ + MM_CP_UFFD_RWP_RESOLVE) =20 bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long add= r, pte_t pte); diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 889c7b45fec8..07766398d592 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -397,7 +397,6 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_ar= ea_struct *vma, return false; } =20 - static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d88fcccd386d..befc919de69e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2615,8 +2615,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsign= ed long old_addr, } =20 static void 
change_non_present_huge_pmd(struct mm_struct *mm, - unsigned long addr, pmd_t *pmd, bool uffd_wp, - bool uffd_wp_resolve) + unsigned long addr, pmd_t *pmd, bool uffd_prot, + bool uffd_prot_resolve) { softleaf_t entry =3D softleaf_from_pmd(*pmd); const struct folio *folio =3D softleaf_to_folio(entry); @@ -2642,9 +2642,9 @@ static void change_non_present_huge_pmd(struct mm_str= uct *mm, newpmd =3D *pmd; } =20 - if (uffd_wp) + if (uffd_prot) newpmd =3D pmd_swp_mkuffd(newpmd); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpmd =3D pmd_swp_clear_uffd(newpmd); if (!pmd_same(*pmd, newpmd)) set_pmd_at(mm, addr, pmd, newpmd); @@ -2665,8 +2665,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, spinlock_t *ptl; pmd_t oldpmd, entry; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); int ret =3D 1; =20 tlb_change_page_size(tlb, HPAGE_PMD_SIZE); @@ -2679,11 +2680,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct = vm_area_struct *vma, return 0; =20 if (thp_migration_supported() && pmd_is_valid_softleaf(*pmd)) { - change_non_present_huge_pmd(mm, addr, pmd, uffd_wp, - uffd_wp_resolve); + change_non_present_huge_pmd(mm, addr, pmd, uffd_prot, + uffd_prot_resolve); goto unlock; } =20 + /* Already in the desired state */ + if (prot_numa && pmd_protnone(*pmd)) + goto unlock; + if ((cp_flags & MM_CP_UFFD_RWP) && pmd_protnone(*pmd) && pmd_uffd(*pmd)) + goto unlock; + if (prot_numa) { =20 /* @@ -2694,9 +2701,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, if (is_huge_zero_pmd(*pmd)) goto unlock; =20 - if (pmd_protnone(*pmd)) - goto unlock; - if (!folio_can_map_prot_numa(pmd_folio(*pmd), vma, vma_is_single_threaded_private(vma))) goto unlock; @@ -2725,9 
+2729,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, oldpmd =3D pmdp_invalidate_ad(vma, addr, pmd); =20 entry =3D pmd_modify(oldpmd, newprot); - if (uffd_wp) + if (uffd_prot) entry =3D pmd_mkuffd(entry); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) /* * Leave the write bit to be handled by PF interrupt * handler, then things like COW could be properly diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f770c6504e26..3cdbf0057dce 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6409,6 +6409,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, unsigned long last_addr_mask; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; + bool uffd_rwp_resolve =3D cp_flags & MM_CP_UFFD_RWP_RESOLVE; struct mmu_gather tlb; =20 /* @@ -6434,6 +6436,11 @@ long hugetlb_change_protection(struct vm_area_struct= *vma, =20 ptep =3D hugetlb_walk(vma, address, psize); if (!ptep) { + /* + * uffd_wp installs a pte marker on the unpopulated + * entry; uffd_rwp does not install markers so the + * allocation is unnecessary for it. + */ if (!uffd_wp) { address |=3D last_addr_mask; continue; @@ -6455,7 +6462,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, * shouldn't happen at all. Warn about it if it * happened due to some reason. 
*/ - WARN_ON_ONCE(uffd_wp || uffd_wp_resolve); + WARN_ON_ONCE(uffd_wp || uffd_wp_resolve || + uffd_rwp || uffd_rwp_resolve); pages++; spin_unlock(ptl); address |=3D last_addr_mask; @@ -6489,9 +6497,9 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, pages++; } =20 - if (uffd_wp) + if (uffd_wp || uffd_rwp) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) newpte =3D pte_swp_clear_uffd(newpte); if (!pte_same(pte, newpte)) set_huge_pte_at(mm, address, ptep, newpte, psize); @@ -6502,19 +6510,24 @@ long hugetlb_change_protection(struct vm_area_struc= t *vma, * pte_marker_uffd_wp()=3D=3Dtrue implies !poison * because they're mutual exclusive. */ - if (pte_is_uffd_wp_marker(pte) && uffd_wp_resolve) + if (pte_is_uffd_wp_marker(pte) && + (uffd_wp_resolve || uffd_rwp_resolve)) /* Safe to modify directly (non-present->none). */ huge_pte_clear(mm, address, ptep, psize); } else { pte_t old_pte; unsigned int shift =3D huge_page_shift(hstate_vma(vma)); =20 + /* Already protnone with uffd bit set? Nothing to do. 
*/ + if (uffd_rwp && pte_protnone(pte) && huge_pte_uffd(pte)) + goto next; + old_pte =3D huge_ptep_modify_prot_start(vma, address, ptep); pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); - if (uffd_wp) + if (uffd_wp || uffd_rwp) pte =3D huge_pte_mkuffd(pte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) pte =3D huge_pte_clear_uffd(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; diff --git a/mm/mprotect.c b/mm/mprotect.c index 8340c8b228c6..4a6b35482aee 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -214,8 +214,9 @@ static __always_inline void set_write_prot_commit_flush= _ptes(struct vm_area_stru static long change_softleaf_pte(struct vm_area_struct *vma, unsigned long addr, pte_t *pte, pte_t oldpte, unsigned long cp_flags) { - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); softleaf_t entry =3D softleaf_from_pte(oldpte); pte_t newpte; =20 @@ -256,7 +257,7 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, * to unprotect it, drop it; the next page * fault will trigger without uffd trapping. 
*/ - if (uffd_wp_resolve) { + if (uffd_prot_resolve) { pte_clear(vma->vm_mm, addr, pte); return 1; } @@ -265,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, newpte =3D oldpte; } =20 - if (uffd_wp) + if (uffd_prot) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpte =3D pte_swp_clear_uffd(newpte); =20 if (!pte_same(oldpte, newpte)) { @@ -282,16 +283,17 @@ static __always_inline void change_present_ptes(struc= t mmu_gather *tlb, int nr_ptes, unsigned long end, pgprot_t newprot, struct folio *folio, struct page *page, unsigned long cp_flags) { - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); pte_t ptent, oldpte; =20 oldpte =3D modify_prot_start_ptes(vma, addr, ptep, nr_ptes); ptent =3D pte_modify(oldpte, newprot); =20 - if (uffd_wp) + if (uffd_prot) ptent =3D pte_mkuffd(ptent); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) ptent =3D pte_clear_uffd(ptent); =20 /* @@ -325,6 +327,7 @@ static long change_pte_range(struct mmu_gather *tlb, long pages =3D 0; bool is_private_single_threaded; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; int nr_ptes; =20 @@ -350,6 +353,14 @@ static long change_pte_range(struct mmu_gather *tlb, /* Already in the desired state. */ if (prot_numa && pte_protnone(oldpte)) continue; + /* + * RWP-protected PTEs carry _PAGE_UFFD as a marker on + * top of PROT_NONE. Skip only entries already in that + * exact state; plain PROT_NONE from mprotect() still needs + * to be promoted so future faults can be distinguished. 
+ */ + if (uffd_rwp && pte_protnone(oldpte) && pte_uffd(oldpte)) + continue; =20 page =3D vm_normal_page(vma, addr, oldpte); if (page) @@ -358,6 +369,8 @@ static long change_pte_range(struct mmu_gather *tlb, /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. + * Skip this filter for uffd RWP which + * must set protnone regardless of NUMA placement. */ if (prot_numa && !folio_can_map_prot_numa(folio, vma, @@ -667,7 +680,16 @@ long change_protection(struct mmu_gather *tlb, pgprot_t newprot =3D vma->vm_page_prot; long pages; =20 - BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL); + /* + * MM_CP_UFFD_{WP,RWP} and _RESOLVE are mutually exclusive within one + * change, and WP and RWP cannot mix. Miswired callers get a warn and + * a no-op; userspace cannot reach this state. + */ + if (WARN_ON_ONCE((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL = || + (cp_flags & MM_CP_UFFD_RWP_ALL) =3D=3D MM_CP_UFFD_RWP_ALL || + ((cp_flags & MM_CP_UFFD_WP_ALL) && + (cp_flags & MM_CP_UFFD_RWP_ALL)))) + return 0; =20 #ifdef CONFIG_NUMA_BALANCING /* @@ -681,6 +703,10 @@ long change_protection(struct mmu_gather *tlb, WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA); #endif =20 + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE) && + (cp_flags & MM_CP_UFFD_RWP)) + newprot =3D PAGE_NONE; + if (is_vm_hugetlb_page(vma)) pages =3D hugetlb_change_protection(vma, start, end, newprot, cp_flags); --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026 Received: from fout-b2-smtp.messagingengine.com (fout-b2-smtp.messagingengine.com [202.12.124.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B04D406270; Fri, 22 May 2026 13:39:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457170; cv=none; 
b=Ebm3nxND3Aa7y9KTMTeWtdzcc2EphXxxhqEnHLgDu3pxJmLhGVoCL0ufBmIvw+za3ErW7nRZVhLA2lEtDEYpHrvls05VUFrIqYEazc+Vm+Nr+JPlrQwXuWqsSS1CSmXn5oHzgXDHGYSClgduQCjuw9+ZTfMJxEnDBy3ZA71eWvQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457170; c=relaxed/simple; bh=T6SxWsRd5exyC151K5uKEWRngtSU7bGh3JzBGBLDasU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mgrS36M713ncs9AZiHfpzg9XR2HBatJuh3K+hBGsl81D3HfOTTdyWQf7oizBoGHNE1sv06io8Vo0SIa9G0NZnYq6H1xM60ieeVua57mwiU4s8iL8EnDtPdGcw8ZbHYKyk3QsKIeZ3m55f94Jp1l1PZp3xpn2kE+v320IU1A9qPE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=BpJuvh11; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=Y3Y/uCux; arc=none smtp.client-ip=202.12.124.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="BpJuvh11"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Y3Y/uCux" Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfout.stl.internal (Postfix) with ESMTP id 172621D000C0; Fri, 22 May 2026 09:39:27 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-03.internal (MEProxy); Fri, 22 May 2026 09:39:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:date:date:from :from:in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1779457166; x= 1779543566; 
bh=4O70bbPQBhsg9A1RHU2I5MV8zYc4hA13AS1pM3gYLMI=; b=B pJuvh114IBpvIJUAR1pE2ADh+zpjut1nRgon42CU8PlpWSEyYI63hMFpgCm1sab6 pg9m06xWZpI4kByZu3zYQUce4feMSifQHNJC2LgiUHz+TaRzhxo//qGF9LSSJxBD qWv0c96rcnQTCY+SBgBBAWoSW/rY7xO1kpGwk+NnGlEvegOJnKDYJSKRZzGR5WC2 /LmAs6xJSNmW7l1c678ZkH97aYRpnSxvMP7cVUrDFU+/3KjkB/Q0ujHH41STIr/s 9DdJUdUgbyuz9TBrUwFbnVBVCD3UADPuSIB+tN4mBIGM4FUmAfqrVlCN48J7UMtf vm8sg083MXSOZiPeDGm0w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1779457166; x=1779543566; bh=4 O70bbPQBhsg9A1RHU2I5MV8zYc4hA13AS1pM3gYLMI=; b=Y3Y/uCuxjU76IM30U WnGlzo+BCyv6sJ4YcJESMw5oUJ6d90t4L6wBLAX9oTxFirGB/ZI77wIsgkjwIZYP wPJu/kFg/piB5GkU4y5VuDUFXkL7dVdRyH1dOMQfZytPcPf/ECoFuXsrclW3fkGo iPRVGbC1tQfaFF/55wCl+BPMoSundIB/Fra1j5PDQWKzZXu400bT6Z3nL9YUsY/a atdPZbidD+LYhkPWldyDG2o2MjaVHe3UK3NpkF7nnO3GEEiXgs2FV9wjADH+upUM i753X2c4YGHO30DHbHEmAv2dXpxpxR4WC9T/kRsk9QUeiJnV9OQrzt4tDXetw2PI zRSSA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefhedrtddtgdduhedtfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucenucfjughrpefhvfevufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpefmihhrhihlucfuhhhuthhsvghmrghuuceokhhirhhilhhlsehs hhhuthgvmhhovhdrnhgrmhgvqeenucggtffrrghtthgvrhhnpeegveehtdfgvdfhudegff euuddvgeevjefhveevgefhvdevieevteeivdehjefhjeenucevlhhushhtvghrufhiiigv peefnecurfgrrhgrmhepmhgrihhlfhhrohhmpehkihhrihhllhesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvdeipdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh eprhhpphhtsehkvghrnhgvlhdrohhrghdprhgtphhtthhopehpvghtvghrgiesrhgvughh rghtrdgtohhmpdhrtghpthhtohepuggrvhhiugeskhgvrhhnvghlrdhorhhgpdhrtghpth 
htoheplhhjsheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepshhurhgvnhgssehgohho ghhlvgdrtghomhdprhgtphhtthhopehvsggrsghkrgeskhgvrhhnvghlrdhorhhgpdhrtg hpthhtoheplhhirghmrdhhohiflhgvthhtsehorhgrtghlvgdrtghomhdprhgtphhtthho peiiihihsehnvhhiughirgdrtghomh X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 22 May 2026 09:39:26 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 06/16] mm: preserve RWP marker across PTE rewrites Date: Fri, 22 May 2026 14:38:47 +0100 Message-ID: <20260522133857.552279-7-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kiryl Shutsemau (Meta)" The uffd PTE bit must survive any kernel path that rewrites a PTE on a VM_UFFD_RWP VMA, otherwise the marker that carries PAGE_NONE semantics is silently dropped and the next access leaks past RWP tracking. Wire the preservation through every path that rewrites a VM_UFFD_RWP PTE. Swap and device-exclusive: do_swap_page(), restore_exclusive_pte(), and unuse_pte() (swapoff()) re-apply PAGE_NONE when the swap PTE carries the uffd bit and the VMA has VM_UFFD_RWP. 

Migration:
  remove_migration_pte() and remove_migration_pmd() do the same after
  the migration entry is replaced with a real PTE/PMD.

Fork:
  __copy_present_ptes(), copy_present_page(), copy_nonpresent_pte(),
  copy_huge_pmd(), copy_huge_non_present_pmd(), and
  copy_hugetlb_page_range() keep the uffd bit on the child when the
  destination VMA has VM_UFFD_RWP, matching the existing VM_UFFD_WP
  handling. Add VM_UFFD_RWP to VM_COPY_ON_FORK so the flag itself
  propagates.

mprotect():
  change_pte_range() and change_huge_pmd() restore PAGE_NONE after
  pte_modify()/pmd_modify() have recomputed the base protection from a
  (possibly user-changed) vm_page_prot. pte_modify() preserves
  _PAGE_UFFD, so the bit stays; we just have to force PAGE_NONE back on
  top.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/mm.h |  3 ++-
 mm/huge_memory.c   | 47 ++++++++++++++++++++++++++++++++++++++++++----
 mm/hugetlb.c       | 40 ++++++++++++++++++++++++++++++---------
 mm/memory.c        | 47 +++++++++++++++++++++++++++++++++++++++-------
 mm/migrate.c       |  8 ++++++++
 mm/mprotect.c      | 10 ++++++++++
 mm/mremap.c        | 13 +++++++++++--
 mm/swapfile.c      |  5 +++++
 mm/userfaultfd.c   | 14 ++++++++++++++
 9 files changed, 164 insertions(+), 23 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9054468774b5..0598b1482aeb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -663,7 +663,8 @@ enum {
  * only and thus cannot be reconstructed on page
  * fault.
  */
-#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)
+#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_UFFD_RWP | \
+			 VM_MAYBE_GUARD)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index befc919de69e..189192ea45cf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1918,7 +1918,7 @@ static void copy_huge_non_present_pmd(
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-	if (!userfaultfd_wp(dst_vma))
+	if (!userfaultfd_protected(dst_vma))
 		pmd = pmd_swp_clear_uffd(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 }
@@ -2013,9 +2013,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-	pmdp_set_wrprotect(src_mm, addr, src_pmd);
-	if (!userfaultfd_wp(dst_vma))
+
+	/* See __copy_present_ptes(): restore accessible protection. */
+	if (!userfaultfd_protected(dst_vma)) {
+		if (userfaultfd_rwp(src_vma))
+			pmd = pmd_modify(pmd, dst_vma->vm_page_prot);
 		pmd = pmd_clear_uffd(pmd);
+	}
+
+	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	pmd = pmd_wrprotect(pmd);
 set_pmd:
 	pmd = pmd_mkold(pmd);
@@ -2601,8 +2607,16 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
 		}
 		pmd = move_soft_dirty_pmd(pmd);
-		if (vma_has_uffd_without_event_remap(vma))
+		if (vma_has_uffd_without_event_remap(vma)) {
+			/*
+			 * See __copy_present_ptes(): normalise RWP PMDs so
+			 * the destination starts accessible instead of taking
+			 * a numa-hinting fault on first access.
+			 */
+			if (pmd_present(pmd) && userfaultfd_rwp(vma))
+				pmd = pmd_modify(pmd, vma->vm_page_prot);
 			pmd = clear_uffd_wp_pmd(pmd);
+		}
 		set_pmd_at(mm, new_addr, new_pmd, pmd);
 		if (force_flush)
 			flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
@@ -2739,6 +2753,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 */
 		entry = pmd_clear_uffd(entry);
 
+	/* See change_pte_range(): preserve RWP protection across mprotect() */
+	if (userfaultfd_rwp(vma) && pmd_uffd(entry))
+		entry = pmd_modify(entry, PAGE_NONE);
+
 	/* See change_pte_range(). */
 	if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
 	    can_change_pmd_writable(vma, addr, entry))
@@ -2906,6 +2924,13 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 		_dst_pmd = move_soft_dirty_pmd(src_pmdval);
 		_dst_pmd = clear_uffd_wp_pmd(_dst_pmd);
 	}
+
+	/* Re-arm RWP on the moved PMD if dst_vma is RWP-registered. */
+	if (userfaultfd_rwp(dst_vma)) {
+		_dst_pmd = pmd_modify(_dst_pmd, PAGE_NONE);
+		_dst_pmd = pmd_mkuffd(_dst_pmd);
+	}
+
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
@@ -3082,6 +3107,11 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 		entry = pte_mkspecial(entry);
 		if (pmd_uffd(old_pmd))
 			entry = pte_mkuffd(entry);
+
+		/* Restore PAGE_NONE so an RWP marker keeps trapping */
+		if (userfaultfd_rwp(vma) && pmd_uffd(old_pmd))
+			entry = pte_modify(entry, PAGE_NONE);
+
 		VM_BUG_ON(!pte_none(ptep_get(pte)));
 		set_pte_at(mm, addr, pte, entry);
 		pte++;
@@ -3356,6 +3386,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		if (uffd_wp)
 			entry = pte_mkuffd(entry);
 
+		/* Restore PAGE_NONE so an RWP marker keeps trapping */
+		if (userfaultfd_rwp(vma) && uffd_wp)
+			entry = pte_modify(entry, PAGE_NONE);
+
 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 
@@ -5054,6 +5088,11 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		pmde = pmd_mkwrite(pmde, vma);
 	if (pmd_swp_uffd(*pvmw->pmd))
 		pmde = pmd_mkuffd(pmde);
+
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pmd_swp_uffd(*pvmw->pmd) && userfaultfd_rwp(vma))
+		pmde = pmd_modify(pmde, PAGE_NONE);
+
 	if (!softleaf_is_migration_young(entry))
 		pmde = pmd_mkold(pmde);
 	/* NOTE: this may contain setting soft-dirty on some archs */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3cdbf0057dce..eee32a325481 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4859,8 +4859,12 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
 
 	__folio_mark_uptodate(new_folio);
 	hugetlb_add_new_anon_rmap(new_folio, vma, addr);
-	if (userfaultfd_wp(vma) && huge_pte_uffd(old))
+	if (userfaultfd_protected(vma) && huge_pte_uffd(old)) {
 		newpte = huge_pte_mkuffd(newpte);
+		/* Restore PAGE_NONE so the RWP marker keeps trapping. */
+		if (userfaultfd_rwp(vma))
+			newpte = huge_pte_modify(newpte, PAGE_NONE);
+	}
 	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
@@ -4933,7 +4937,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 		softleaf = softleaf_from_pte(entry);
 		if (unlikely(softleaf_is_hwpoison(softleaf))) {
-			if (!userfaultfd_wp(dst_vma))
+			if (!userfaultfd_protected(dst_vma))
 				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(softleaf_is_migration(softleaf))) {
@@ -4947,11 +4951,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				softleaf = make_readable_migration_entry(
 							swp_offset(softleaf));
 				entry = swp_entry_to_pte(softleaf);
-				if (userfaultfd_wp(src_vma) && uffd)
+				if (userfaultfd_protected(src_vma) && uffd)
 					entry = pte_swp_mkuffd(entry);
 				set_huge_pte_at(src, addr, src_pte, entry, sz);
 			}
-			if (!userfaultfd_wp(dst_vma))
+			if (!userfaultfd_protected(dst_vma))
 				entry = huge_pte_clear_uffd(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 		} else if (unlikely(pte_is_marker(entry))) {
@@ -5015,6 +5019,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				goto next;
 			}
 
+			/* See __copy_present_ptes(): restore accessible protection. */
+			if (!userfaultfd_protected(dst_vma)) {
+				if (userfaultfd_rwp(src_vma))
+					entry = huge_pte_modify(entry, dst_vma->vm_page_prot);
+				entry = huge_pte_clear_uffd(entry);
+			}
+
 			if (cow) {
 				/*
 				 * No need to notify as we are downgrading page
@@ -5027,9 +5038,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				entry = huge_pte_wrprotect(entry);
 			}
 
-			if (!userfaultfd_wp(dst_vma))
-				entry = huge_pte_clear_uffd(entry);
-
 			set_huge_pte_at(dst, addr, dst_pte, entry, sz);
 			hugetlb_count_add(npages, dst);
 		}
@@ -5075,10 +5083,19 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 		huge_pte_clear(mm, new_addr, dst_pte, sz);
 	} else {
 		if (need_clear_uffd_wp) {
-			if (pte_present(pte))
+			if (pte_present(pte)) {
+				/*
+				 * See __copy_present_ptes(): normalise RWP
+				 * PTEs so the destination starts accessible
+				 * instead of taking a numa-hinting fault on
+				 * first access.
+				 */
+				if (userfaultfd_rwp(vma))
+					pte = huge_pte_modify(pte, vma->vm_page_prot);
 				pte = huge_pte_clear_uffd(pte);
-			else
+			} else {
 				pte = pte_swp_clear_uffd(pte);
+			}
 		}
 		set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
 	}
@@ -6529,6 +6546,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_mkuffd(pte);
 			else if (uffd_wp_resolve || uffd_rwp_resolve)
 				pte = huge_pte_clear_uffd(pte);
+
+			/* Preserve RWP protection across mprotect() */
+			if (userfaultfd_rwp(vma) && huge_pte_uffd(pte))
+				pte = huge_pte_modify(pte, PAGE_NONE);
+
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 			tlb_remove_huge_tlb_entry(h, &tlb, ptep, address);
diff --git a/mm/memory.c b/mm/memory.c
index f2e7e900b1b8..ea9616e3dbaf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -880,6 +880,10 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	if (pte_swp_uffd(orig_pte))
 		pte = pte_mkuffd(pte);
 
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(orig_pte) && userfaultfd_rwp(vma))
+		pte = pte_modify(pte, PAGE_NONE);
+
 	if ((vma->vm_flags & VM_WRITE) &&
 	    can_change_pte_writable(vma, address, pte)) {
 		if (folio_test_dirty(folio))
@@ -1025,7 +1029,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 				   make_pte_marker(marker));
 		return 0;
 	}
-	if (!userfaultfd_wp(dst_vma))
+	if (!userfaultfd_protected(dst_vma))
 		pte = pte_swp_clear_uffd(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
@@ -1072,9 +1076,13 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/* All done, just insert the new page copy in the child */
 	pte = folio_mk_pte(new_folio, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
-	if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte)))
-		/* Uffd-wp needs to be delivered to dest pte as well */
+	if (userfaultfd_protected(dst_vma) && pte_uffd(ptep_get(src_pte))) {
+		/* The uffd bit needs to be delivered to the dest pte as well */
 		pte = pte_mkuffd(pte);
+		/* Restore PAGE_NONE so the RWP marker keeps trapping */
+		if (userfaultfd_rwp(dst_vma))
+			pte = pte_modify(pte, PAGE_NONE);
+	}
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -1084,9 +1092,29 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
 		pte_t pte, unsigned long addr, int nr)
 {
 	struct mm_struct *src_mm = src_vma->vm_mm;
+	bool writable;
+
+	/*
+	 * Snapshot writability before the RWP-disarm rewrite below: when the
+	 * child is not RWP-armed, pte_modify(pte, dst_vma->vm_page_prot) can
+	 * silently drop _PAGE_RW from a resolved (no-marker) writable PTE,
+	 * so a later pte_write(pte) check would skip the COW wrprotect and
+	 * leave the parent writable over a folio shared with the child.
+	 */
+	writable = pte_write(pte);
+
+	/*
+	 * Child is not RWP-armed: restore accessible protection so the
+	 * inherited PAGE_NONE does not cost a fault on first read.
+	 */
+	if (!userfaultfd_protected(dst_vma)) {
+		if (userfaultfd_rwp(src_vma))
+			pte = pte_modify(pte, dst_vma->vm_page_prot);
+		pte = pte_clear_uffd(pte);
+	}
 
 	/* If it's a COW mapping, write protect it both processes. */
-	if (is_cow_mapping(src_vma->vm_flags) && pte_write(pte)) {
+	if (is_cow_mapping(src_vma->vm_flags) && writable) {
 		wrprotect_ptes(src_mm, addr, src_pte, nr);
 		pte = pte_wrprotect(pte);
 	}
@@ -1096,9 +1124,6 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
-	if (!userfaultfd_wp(dst_vma))
-		pte = pte_clear_uffd(pte);
-
 	set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr);
 }
 
@@ -5080,6 +5105,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (pte_swp_uffd(vmf->orig_pte))
 		pte = pte_mkuffd(pte);
 
+	/*
+	 * A page reclaimed while RWP-protected carries the uffd bit on
+	 * its swap entry. Re-apply PAGE_NONE on swap-in so the first access
+	 * still traps as an RWP fault. pte_modify() preserves _PAGE_UFFD.
+	 */
+	if (pte_swp_uffd(vmf->orig_pte) && userfaultfd_rwp(vma))
+		pte = pte_modify(pte, PAGE_NONE);
+
 	/*
 	 * Same logic as in do_wp_page(); however, optimize for pages that are
 	 * certainly not shared either because we just allocated them without
diff --git a/mm/migrate.c b/mm/migrate.c
index 9d81b7b881ec..633085130b7c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -329,6 +329,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
 	if (pte_swp_uffd(old_pte))
 		newpte = pte_mkuffd(newpte);
 
+	/* See remove_migration_pte(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(old_pte) && userfaultfd_rwp(pvmw->vma))
+		newpte = pte_modify(newpte, PAGE_NONE);
+
 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
 
 	dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
@@ -394,6 +398,10 @@ static bool remove_migration_pte(struct folio *folio,
 		else if (pte_swp_uffd(old_pte))
 			pte = pte_mkuffd(pte);
 
+		/* See do_swap_page(): restore PAGE_NONE for RWP */
+		if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma))
+			pte = pte_modify(pte, PAGE_NONE);
+
 		if (folio_test_anon(folio) && !softleaf_is_migration_read(entry))
 			rmap_flags |= RMAP_EXCLUSIVE;
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 4a6b35482aee..e0b5fe7c66b2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -296,6 +296,16 @@ static __always_inline void change_present_ptes(struct mmu_gather *tlb,
 			else if (uffd_prot_resolve)
 				ptent = pte_clear_uffd(ptent);
 
+			/*
+			 * The uffd bit on a VM_UFFD_RWP VMA carries PROT_NONE
+			 * semantics. If mprotect() or NUMA hinting changed the
+			 * base protection, restore PAGE_NONE so the PTE still
+			 * traps on any access. pte_modify() preserves
+			 * _PAGE_UFFD.
+			 */
+			if (userfaultfd_rwp(vma) && pte_uffd(ptent))
+				ptent = pte_modify(ptent, PAGE_NONE);
+
 			/*
 			 * In some writable, shared mappings, we might want
 			 * to catch actual write access -- see
diff --git a/mm/mremap.c b/mm/mremap.c
index 12732a5c547e..14e5df316f83 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -296,10 +296,19 @@ static int move_ptes(struct pagetable_move_control *pmc,
 			pte_clear(mm, new_addr, new_ptep);
 		else {
 			if (need_clear_uffd_wp) {
-				if (pte_present(pte))
+				if (pte_present(pte)) {
+					/*
+					 * See __copy_present_ptes(): normalise
+					 * RWP PTEs so the destination starts
+					 * accessible instead of taking a
+					 * numa-hinting fault on first access.
+					 */
+					if (userfaultfd_rwp(vma))
+						pte = pte_modify(pte, vma->vm_page_prot);
 					pte = pte_clear_uffd(pte);
-				else
+				} else {
 					pte = pte_swp_clear_uffd(pte);
+				}
 			}
 			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
 		}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9119efef7fe6..260239b260d5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2338,6 +2338,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		new_pte = pte_mksoft_dirty(new_pte);
 	if (pte_swp_uffd(old_pte))
 		new_pte = pte_mkuffd(new_pte);
+
+	/* See do_swap_page(): restore PAGE_NONE for RWP */
+	if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma))
+		new_pte = pte_modify(new_pte, PAGE_NONE);
+
 setpte:
 	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry)));
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d546ffd2f165..d4a1d340dab3 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1200,6 +1200,13 @@ static long move_present_ptes(struct mm_struct *mm,
 		if (pte_dirty(orig_src_pte))
 			orig_dst_pte = pte_mkdirty(orig_dst_pte);
 		orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
+
+		/* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
+		if (userfaultfd_rwp(dst_vma)) {
+			orig_dst_pte = pte_modify(orig_dst_pte, PAGE_NONE);
+			orig_dst_pte = pte_mkuffd(orig_dst_pte);
+		}
+
 		set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
 
 		src_addr += PAGE_SIZE;
@@ -1307,6 +1314,13 @@ static int move_zeropage_pte(struct mm_struct *mm,
 
 	zero_pte = pte_mkspecial(pfn_pte(zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
+
+	/* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
+	if (userfaultfd_rwp(dst_vma)) {
+		zero_pte = pte_modify(zero_pte, PAGE_NONE);
+		zero_pte = pte_mkuffd(zero_pte);
+	}
+
 	ptep_clear_flush(src_vma, src_addr, src_pte);
 	set_pte_at(mm, dst_addr, dst_pte, zero_pte);
 	double_pt_unlock(dst_ptl, src_ptl);
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com,
 ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com,
 pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com,
 linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 07/16] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP
Date: Fri, 22 May 2026 14:38:48 +0100
Message-ID: <20260522133857.552279-8-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: "Kiryl Shutsemau (Meta)"

Three mm paths outside the fault handler gate on the uffd PTE bit today:
khugepaged (skip collapse on ranges carrying markers), rmap (cap unmap
batching), and GUP (force a fault through gup_can_follow_protnone).
Extend each to treat VM_UFFD_RWP the same as VM_UFFD_WP; otherwise
per-PTE RWP state is silently destroyed or bypassed.

khugepaged:
  try_collapse_pte_mapped_thp() and file_backed_vma_is_retractable()
  already refuse to collapse or retract page tables on ranges carrying
  the uffd PTE bit. Broaden the VMA predicate from userfaultfd_wp() to
  userfaultfd_protected() so VM_UFFD_RWP ranges get the same protection.
  hpage_collapse_scan_pmd() needs no change: its existing pte_uffd()
  check already catches an RWP PTE because it carries the uffd bit.

rmap:
  folio_unmap_pte_batch() caps batching at 1 for VM_UFFD_RWP so the
  restore path handles each PTE with its own marker.

GUP:
  gup_can_follow_protnone() forces a fault on VM_UFFD_RWP VMAs
  regardless of FOLL_HONOR_NUMA_FAULT. RWP uses protnone as an
  access-tracking marker, not for NUMA hinting, so any GUP, read or
  write, must go through the userfaultfd fault path.
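
As an illustration of the rmap and GUP rules above, here is a minimal
userspace sketch. It is not kernel code: struct vma_model, the model_*
helpers and the flag values are hypothetical stand-ins, and it assumes
userfaultfd_protected() is true whenever VM_UFFD_WP or VM_UFFD_RWP is
set on the VMA. The GUP model is simplified to the two checks discussed
here.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch; the kernel's real bits differ. */
#define VM_UFFD_WP		(1UL << 0)
#define VM_UFFD_RWP		(1UL << 1)
#define FOLL_HONOR_NUMA_FAULT	(1U << 0)

/* Minimal stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
};

/* Assumed semantics: true when either marker-installing mode is registered. */
static bool model_uffd_protected(const struct vma_model *vma)
{
	return (vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP)) != 0;
}

/* rmap: unmap one PTE at a time on protected VMAs, else up to max_nr. */
static unsigned int model_unmap_batch(const struct vma_model *vma,
				      unsigned int max_nr)
{
	return model_uffd_protected(vma) ? 1 : max_nr;
}

/*
 * GUP: may a protnone PTE be followed without taking a fault?
 * Simplified: only the RWP check and the FOLL_HONOR_NUMA_FAULT check
 * are modelled.
 */
static bool model_gup_can_follow_protnone(const struct vma_model *vma,
					  unsigned int flags)
{
	/* RWP always faults so the access reaches userfaultfd. */
	if (vma->vm_flags & VM_UFFD_RWP)
		return false;
	/* Callers that do not honor NUMA hinting faults may follow. */
	return !(flags & FOLL_HONOR_NUMA_FAULT);
}
```

With these stand-ins, an RWP VMA always gets a batch size of 1 and is
never followed by GUP, whatever FOLL flags the caller passes.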

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/mm.h | 10 +++++++++-
 mm/khugepaged.c    | 18 +++++++++++-------
 mm/rmap.c          |  2 +-
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0598b1482aeb..0ecfb0973b01 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4605,11 +4605,19 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
 
 /*
  * Indicates whether GUP can follow a PROT_NONE mapped page, or whether
- * a (NUMA hinting) fault is required.
+ * a (NUMA hinting or userfaultfd RWP) fault is required.
  */
 static inline bool gup_can_follow_protnone(const struct vm_area_struct *vma,
 					   unsigned int flags)
 {
+	/*
+	 * VM_UFFD_RWP uses protnone as an access-tracking marker, not for
+	 * NUMA hinting. GUP must always take a fault so the access is
+	 * delivered to userfaultfd, regardless of FOLL_HONOR_NUMA_FAULT.
+	 */
+	if (vma->vm_flags & VM_UFFD_RWP)
+		return false;
+
 	/*
 	 * If callers don't want to honor NUMA hinting faults, no need to
 	 * determine if we would actually have to trigger a NUMA hinting fault.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index de0644bde400..a798c542c849 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1532,8 +1532,11 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
 	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return SCAN_VMA_CHECK;
 
-	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
-	if (userfaultfd_wp(vma))
+	/*
+	 * Keep pmd pgtable while the uffd bit is in use; see comment in
+	 * retract_page_tables().
+	 */
+	if (userfaultfd_protected(vma))
 		return SCAN_PTE_UFFD;
 
 	folio = filemap_lock_folio(vma->vm_file->f_mapping,
@@ -1746,13 +1749,14 @@ static bool file_backed_vma_is_retractable(struct vm_area_struct *vma)
 		return false;
 
 	/*
-	 * When a vma is registered with uffd-wp, we cannot recycle
+	 * When a vma is registered with uffd-wp or RWP, we cannot recycle
 	 * the page table because there may be pte markers installed.
-	 * Other vmas can still have the same file mapped hugely, but
-	 * skip this one: it will always be mapped in small page size
-	 * for uffd-wp registered ranges.
+	 * VM_UFFD_RWP ranges similarly rely on per-PTE uffd state
+	 * and cannot be recycled to a shared PMD. Other vmas can still
+	 * have the same file mapped hugely, but skip this one: it will
+	 * always be mapped in small page size for these registrations.
 	 */
-	if (userfaultfd_wp(vma))
+	if (userfaultfd_protected(vma))
 		return false;
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 05056c213203..1426d1ece917 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1965,7 +1965,7 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
-	if (userfaultfd_wp(vma))
+	if (userfaultfd_protected(vma))
 		return 1;
 
 	/*
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
gtphhtthhopehlihgrmhdrhhhofihlvghtthesohhrrggtlhgvrdgtohhmpdhrtghpthht ohepiihihiesnhhvihguihgrrdgtohhm X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 22 May 2026 09:39:31 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 08/16] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing Date: Fri, 22 May 2026 14:38:49 +0100 Message-ID: <20260522133857.552279-9-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" Add the userspace interface for read-write protection tracking: - UFFDIO_REGISTER_MODE_RWP register a range for RWP tracking - UFFD_FEATURE_RWP capability bit - UFFDIO_RWPROTECT install / remove RWP on a range Introduce CONFIG_USERFAULTFD_RWP, auto-selected on 64-bit kernels with ARCH_HAS_PTE_PROTNONE and HAVE_ARCH_USERFAULTFD_WP. The symbol gates VM_UFFD_RWP (previously aliased to VM_NONE) and the smaps/trace-flag hooks added in the preparatory patches; without it the UAPI bits added here have nothing to drive and would be unreachable. 

Registration sets VM_UFFD_RWP on the VMA. Combining MODE_WP with
MODE_RWP is rejected because both modes claim the uffd PTE bit.

UFFDIO_RWPROTECT is the bidirectional counterpart of
UFFDIO_WRITEPROTECT:

  - MODE_RWP     change_protection() with MM_CP_UFFD_RWP installs
                 PAGE_NONE and sets the uffd bit on present PTEs
  - !MODE_RWP    change_protection() with MM_CP_UFFD_RWP_RESOLVE
                 restores vma->vm_page_prot and clears the bit

userfaultfd_clear_vma() runs the same resolve pass on unregister so RWP
state cannot outlive the uffd.

Re-registering a range must not drop a mode that installs per-PTE
markers (WP or RWP); doing so returns -EBUSY. This also closes a
pre-existing window where re-registering without MODE_WP would strand
uffd-wp markers: before, those caused extra write-faults but were
otherwise benign; with RWP preservation in place, a subsequent
mprotect() on a VM_UFFD_RWP VMA would silently promote the stale
markers to RWP.

The feature is not yet advertised. UFFDIO_REGISTER_MODE_RWP,
UFFD_FEATURE_RWP, and _UFFDIO_RWPROTECT are intentionally absent from
UFFD_API_REGISTER_MODES, UFFD_API_FEATURES, and UFFD_API_RANGE_IOCTLS,
so UFFDIO_API masks them out and the register-mode validator rejects
the bit. The follow-up patch adds fault dispatch and exposes the UAPI.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 Documentation/admin-guide/mm/userfaultfd.rst | 10 ++
 fs/userfaultfd.c                             | 84 +++++++++++++++++
 include/linux/userfaultfd_k.h                |  2 +
 include/uapi/linux/userfaultfd.h             | 19 ++++
 mm/Kconfig                                   |  9 ++
 mm/userfaultfd.c                             | 96 +++++++++++++++++++-
 6 files changed, 217 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/a=
dmin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..1e533639fd50 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -131,6 +131,16 @@ userfaults on the range registered.
Not all ioctls wil= l necessarily be supported for all memory types (e.g. anonymous memory vs. shmem vs. hugetlbfs), or all types of intercepted faults. =20 +.. note:: + + Re-registering an already-registered range must not drop any of the + modes that install per-PTE markers =E2=80=94 currently + ``UFFDIO_REGISTER_MODE_WP`` and ``UFFDIO_REGISTER_MODE_RWP``. Doing + so would strand markers with no flag to describe them, so the call + is rejected with ``-EBUSY``; userspace must issue + ``UFFDIO_UNREGISTER`` first. This differs from older kernels, which + silently replaced the mode bits on re-registration. + Userland can use the ``uffdio_register.ioctls`` to manage the virtual address space in the background (to add or potentially also remove memory from the ``userfaultfd`` registered range). This means a userfault diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 0fdf28f62702..f2097c558165 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -215,6 +215,8 @@ static inline struct uffd_msg userfault_msg(unsigned lo= ng address, msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_WRITE; if (reason & VM_UFFD_WP) msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_WP; + if (reason & VM_UFFD_RWP) + msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_RWP; if (reason & VM_UFFD_MINOR) msg.arg.pagefault.flags |=3D UFFD_PAGEFAULT_FLAG_MINOR; if (features & UFFD_FEATURE_THREAD_ID) @@ -1292,6 +1294,22 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, =20 vm_flags |=3D VM_UFFD_WP; } + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP) { + if (!pgtable_supports_uffd() || VM_UFFD_RWP =3D=3D VM_NONE) + goto out; + if (!(ctx->features & UFFD_FEATURE_RWP)) + goto out; + vm_flags |=3D VM_UFFD_RWP; + } + + /* + * WP and RWP share the uffd PTE bit and + * cannot coexist in the same VMA =E2=80=94 the bit would carry ambiguous + * semantics. Reject the combination up front. 
+ */ + if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP)) + goto out; + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) { #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR goto out; @@ -1385,6 +1403,16 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, cur->vm_userfaultfd_ctx.ctx !=3D ctx) goto out_unlock; =20 + /* + * Mode switches that drop VM_UFFD_WP or VM_UFFD_RWP would + * leave PTE markers without the flag that describes them; + * subsequent mprotect() would then promote stale markers + * into the other mode. Require an unregister first. + */ + if (cur->vm_userfaultfd_ctx.ctx =3D=3D ctx && + cur->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags) + goto out_unlock; + /* * Note vmas containing huge pages */ @@ -1418,6 +1446,10 @@ static int userfaultfd_register(struct userfaultfd_c= tx *ctx, if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR)) ioctls_out &=3D ~((__u64)1 << _UFFDIO_CONTINUE); =20 + /* RWPROTECT is only supported for RWP ranges */ + if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP)) + ioctls_out &=3D ~((__u64)1 << _UFFDIO_RWPROTECT); + /* * Now that we scanned all vmas we can already tell * userland which ioctls methods are guaranteed to @@ -1765,6 +1797,55 @@ static int userfaultfd_writeprotect(struct userfault= fd_ctx *ctx, return ret; } =20 +static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx, + unsigned long arg) +{ + int ret; + struct uffdio_rwprotect uffdio_rwp; + struct userfaultfd_wake_range range; + bool mode_rwp, mode_dontwake; + + if (atomic_read(&ctx->mmap_changing)) + return -EAGAIN; + + if (copy_from_user(&uffdio_rwp, (void __user *)arg, + sizeof(uffdio_rwp))) + return -EFAULT; + + ret =3D validate_range(ctx->mm, uffdio_rwp.range.start, + uffdio_rwp.range.len); + if (ret) + return ret; + + if (uffdio_rwp.mode & ~(UFFDIO_RWPROTECT_MODE_DONTWAKE | + UFFDIO_RWPROTECT_MODE_RWP)) + return -EINVAL; + + mode_rwp =3D uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_RWP; + mode_dontwake =3D uffdio_rwp.mode & 
UFFDIO_RWPROTECT_MODE_DONTWAKE; + + if (mode_rwp && mode_dontwake) + return -EINVAL; + + if (mmget_not_zero(ctx->mm)) { + ret =3D mrwprotect_range(ctx, uffdio_rwp.range.start, + uffdio_rwp.range.len, mode_rwp); + mmput(ctx->mm); + } else { + return -ESRCH; + } + + if (ret) + return ret; + + if (!mode_rwp && !mode_dontwake) { + range.start =3D uffdio_rwp.range.start; + range.len =3D uffdio_rwp.range.len; + wake_userfault(ctx, &range); + } + return ret; +} + static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long= arg) { __s64 ret; @@ -2071,6 +2152,9 @@ static long userfaultfd_ioctl(struct file *file, unsi= gned cmd, case UFFDIO_POISON: ret =3D userfaultfd_poison(ctx, arg); break; + case UFFDIO_RWPROTECT: + ret =3D userfaultfd_rwprotect(ctx, arg); + break; } return ret; } diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 07766398d592..d46974be864e 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -162,6 +162,8 @@ extern int mwriteprotect_range(struct userfaultfd_ctx *= ctx, unsigned long start, unsigned long len, bool enable_wp); extern long uffd_wp_range(struct vm_area_struct *vma, unsigned long start, unsigned long len, bool enable_wp); +extern int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long sta= rt, + unsigned long len, bool enable_rwp); =20 /* move_pages */ void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2); diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index 2841e4ea8f2c..7b78aa3b5318 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -79,6 +79,7 @@ #define _UFFDIO_WRITEPROTECT (0x06) #define _UFFDIO_CONTINUE (0x07) #define _UFFDIO_POISON (0x08) +#define _UFFDIO_RWPROTECT (0x09) #define _UFFDIO_API (0x3F) =20 /* userfaultfd ioctl ids */ @@ -103,6 +104,8 @@ struct uffdio_continue) #define UFFDIO_POISON _IOWR(UFFDIO, _UFFDIO_POISON, \ struct uffdio_poison) +#define UFFDIO_RWPROTECT 
_IOWR(UFFDIO, _UFFDIO_RWPROTECT, \ + struct uffdio_rwprotect) =20 /* read() structure */ struct uffd_msg { @@ -158,6 +161,7 @@ struct uffd_msg { #define UFFD_PAGEFAULT_FLAG_WRITE (1<<0) /* If this was a write fault */ #define UFFD_PAGEFAULT_FLAG_WP (1<<1) /* If reason is VM_UFFD_WP */ #define UFFD_PAGEFAULT_FLAG_MINOR (1<<2) /* If reason is VM_UFFD_MINOR */ +#define UFFD_PAGEFAULT_FLAG_RWP (1<<3) /* If reason is VM_UFFD_RWP */ =20 struct uffdio_api { /* userland asks for an API number and the features to enable */ @@ -230,6 +234,11 @@ struct uffdio_api { * * UFFD_FEATURE_MOVE indicates that the kernel supports moving an * existing page contents from userspace. + * + * UFFD_FEATURE_RWP indicates that the kernel supports + * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking. + * Pages are made inaccessible via UFFDIO_RWPROTECT and faults + * are delivered when the pages are re-accessed. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -248,6 +257,7 @@ struct uffdio_api { #define UFFD_FEATURE_POISON (1<<14) #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) +#define UFFD_FEATURE_RWP (1<<17) __u64 features; =20 __u64 ioctls; @@ -263,6 +273,7 @@ struct uffdio_register { #define UFFDIO_REGISTER_MODE_MISSING ((__u64)1<<0) #define UFFDIO_REGISTER_MODE_WP ((__u64)1<<1) #define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2) +#define UFFDIO_REGISTER_MODE_RWP ((__u64)1<<3) __u64 mode; =20 /* @@ -356,6 +367,14 @@ struct uffdio_poison { __s64 updated; }; =20 +struct uffdio_rwprotect { + struct uffdio_range range; + /* !RWP means undo RWP-protection */ +#define UFFDIO_RWPROTECT_MODE_RWP ((__u64)1<<0) +#define UFFDIO_RWPROTECT_MODE_DONTWAKE ((__u64)1<<1) + __u64 mode; +}; + struct uffdio_move { __u64 dst; __u64 src; diff --git a/mm/Kconfig b/mm/Kconfig index e8bf1e9e6ad9..ccf534a8cbc9 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1347,6 +1347,15 @@ config HAVE_ARCH_USERFAULTFD_MINOR help Arch has userfaultfd minor 
fault support =20 +config USERFAULTFD_RWP + def_bool y + depends on 64BIT && ARCH_HAS_PTE_PROTNONE && HAVE_ARCH_USERFAULTFD_WP + help + Userfaultfd read-write protection (UFFDIO_RWPROTECT) delivers a + userfaultfd notification on every access -- read or write -- to a + protected range, letting userspace observe the working set of a + process. + menuconfig USERFAULTFD bool "Enable userfaultfd() system call" depends on MMU diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index d4a1d340dab3..c13452cb092b 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1072,6 +1072,75 @@ int mwriteprotect_range(struct userfaultfd_ctx *ctx,= unsigned long start, return err; } =20 +int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start, + unsigned long len, bool enable_rwp) +{ + struct mm_struct *dst_mm =3D ctx->mm; + unsigned long end =3D start + len; + struct vm_area_struct *dst_vma; + unsigned int mm_cp_flags; + struct mmu_gather tlb; + bool found =3D false; + VMA_ITERATOR(vmi, dst_mm, start); + + VM_WARN_ON_ONCE(start & ~PAGE_MASK); + VM_WARN_ON_ONCE(len & ~PAGE_MASK); + VM_WARN_ON_ONCE(start + len <=3D start); + + guard(mmap_read_lock)(dst_mm); + guard(rwsem_read)(&ctx->map_changing_lock); + + if (atomic_read(&ctx->mmap_changing)) + return -EAGAIN; + + if (enable_rwp) + mm_cp_flags =3D MM_CP_UFFD_RWP; + else + mm_cp_flags =3D MM_CP_UFFD_RWP_RESOLVE; + + /* + * Pre-scan the range: validate every spanned VMA before applying + * any change_protection() so a partial failure cannot leave the + * process with only a prefix of the range re-protected. 
+ */ + for_each_vma_range(vmi, dst_vma, end) { + if (!userfaultfd_rwp(dst_vma)) + return -ENOENT; + + if (is_vm_hugetlb_page(dst_vma)) { + unsigned long page_mask; + + page_mask =3D vma_kernel_pagesize(dst_vma) - 1; + if ((start & page_mask) || (len & page_mask)) + return -EINVAL; + } + found =3D true; + } + if (!found) + return -ENOENT; + + vma_iter_set(&vmi, start); + tlb_gather_mmu(&tlb, dst_mm); + for_each_vma_range(vmi, dst_vma, end) { + unsigned long vma_start =3D max(dst_vma->vm_start, start); + unsigned long vma_end =3D min(dst_vma->vm_end, end); + unsigned int flags =3D mm_cp_flags; + + /* + * On resolve, try to upgrade writability per-VMA -- + * MM_CP_TRY_CHANGE_WRITABLE WARNs in + * maybe_change_pte_writable() if the VMA is not VM_WRITE, + * and RWP can be registered on PROT_READ-only mappings. + */ + if (!enable_rwp && vma_wants_manual_pte_write_upgrade(dst_vma)) + flags |=3D MM_CP_TRY_CHANGE_WRITABLE; + + change_protection(&tlb, dst_vma, vma_start, vma_end, flags); + } + tlb_finish_mmu(&tlb); + + return 0; +} =20 void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2) @@ -2109,9 +2178,22 @@ struct vm_area_struct *userfaultfd_clear_vma(struct = vma_iterator *vmi, if (start =3D=3D vma->vm_start && end =3D=3D vma->vm_end) give_up_on_oom =3D true; =20 - /* Reset ptes for the whole vma range if wr-protected */ - if (userfaultfd_wp(vma)) - uffd_wp_range(vma, start, end - start, false); + /* Clear the uffd bit and/or restore protnone PTEs */ + if (userfaultfd_protected(vma)) { + unsigned int mm_cp_flags =3D 0; + struct mmu_gather tlb; + + if (userfaultfd_wp(vma)) + mm_cp_flags |=3D MM_CP_UFFD_WP_RESOLVE; + if (userfaultfd_rwp(vma)) + mm_cp_flags |=3D MM_CP_UFFD_RWP_RESOLVE; + if (vma_wants_manual_pte_write_upgrade(vma)) + mm_cp_flags |=3D MM_CP_TRY_CHANGE_WRITABLE; + + tlb_gather_mmu(&tlb, vma->vm_mm); + change_protection(&tlb, vma, start, end, mm_cp_flags); + tlb_finish_mmu(&tlb); + } =20 ret =3D vma_modify_flags_uffd(vmi, prev, vma, start, end, 
&new_vma_flags, NULL_VM_UFFD_CTX, @@ -2160,6 +2242,14 @@ int userfaultfd_register_range(struct userfaultfd_ct= x *ctx, vma_test_all_mask(vma, vma_flags)) goto skip; =20 + /* + * Pre-scan in userfaultfd_register() already rejected mode + * switches that would drop VM_UFFD_WP or VM_UFFD_RWP, so a + * stray bit here is a bug. + */ + VM_WARN_ON_ONCE(vma->vm_userfaultfd_ctx.ctx =3D=3D ctx && + vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags); + if (vma->vm_start > start) start =3D vma->vm_start; vma_end =3D min(end, vma->vm_end); --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026 Received: from fhigh-b1-smtp.messagingengine.com (fhigh-b1-smtp.messagingengine.com [202.12.124.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29C6C40F8E6; Fri, 22 May 2026 13:39:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457178; cv=none; b=PN1TZvVi188OHUD4C71yarFAvBWvHentdUR1uvd3yHLn5o67H77ICkYab/dRhsfhF+bboEnlDd5whOVhr/iYueyuV+Pj+4cDSLsaAHR6REgCIPIwcoaM3ZUyie/lrreIrw2z8knPSmSpVLigTurXoF4IRzzdPTblEQJGP83u3vM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457178; c=relaxed/simple; bh=DS32qPI7A650Vx2NBEIiP95LLUC/h1M5LH6wlpEV9XU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ldK9yBA4+VEQvR78wve6KtGX0wDEHtTSqN/EcGDkcJnEgXePBAWo+ncrGQwG+1ZVDjCdV2ST3CQRscV2BItiM/MMfktEK/5/WfNBN0oLFWkwJiT5XwV7syxwRJ3IcE6snYpWCZXcdcAo1PGWDhz6n1SmPyKLaIE6thZWrNSGqso= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=L+KnzUz/; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com 
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org,
 alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 09/16] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP
Date: Fri, 22 May 2026 14:38:50 +0100
Message-ID: <20260522133857.552279-10-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Wire the fault side of read-write protection tracking and turn the
userspace interface on.

An RWP-protected PTE is PAGE_NONE with the uffd bit set. The PROT_NONE
triggers a fault on any access; the uffd bit distinguishes it from
plain mprotect(PROT_NONE) or NUMA hinting.

Fault dispatch, per level:

  PTE      handle_pte_fault()   -> do_uffd_rwp()
  PMD      __handle_mm_fault()  -> do_huge_pmd_uffd_rwp()
  hugetlb  hugetlb_fault()      -> hugetlb_handle_userfault()

The RWP branches gate on userfaultfd_pte_rwp() /
userfaultfd_huge_pmd_rwp() (VM_UFFD_RWP plus the uffd bit) and fall
through to do_numa_page() / do_huge_pmd_numa_page() otherwise. Each
delivers a UFFD_PAGEFAULT_FLAG_RWP message through handle_userfault();
the handler resolves it with UFFDIO_RWPROTECT clearing MODE_RWP.

userfaultfd_must_wait() and userfaultfd_huge_must_wait() add matching
protnone+uffd waiters so sync-mode fault handlers block correctly.

Expose the UAPI:

  UFFDIO_REGISTER_MODE_RWP  -> UFFD_API_REGISTER_MODES
  UFFD_FEATURE_RWP          -> UFFD_API_FEATURES
  _UFFDIO_RWPROTECT         -> UFFD_API_RANGE_IOCTLS
                               UFFD_API_RANGE_IOCTLS_BASIC

UFFD_FEATURE_RWP is masked out at UFFDIO_API time when PROT_NONE is not
available or VM_UFFD_RWP aliases VM_NONE (32-bit), so userspace never
sees an advertised-but-broken feature.

Works on anonymous, shmem, and hugetlb memory.
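
For illustration only, the PTE-level dispatch order described above can
be modelled in plain userspace C. This is a sketch, not kernel code:
the model_* types and functions are hypothetical stand-ins for
pte_protnone(), pte_uffd(), vma_is_accessible() and userfaultfd_rwp(),
and the model assumes the check order of the handle_pte_fault() hunk in
this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Which handler the protnone branch would pick (model only). */
enum model_fault {
	MODEL_FAULT_OTHER,	/* not protnone, or VMA not accessible */
	MODEL_FAULT_NUMA,	/* would reach do_numa_page() */
	MODEL_FAULT_UFFD_RWP,	/* would reach do_uffd_rwp() */
};

/* Hypothetical stand-in for the two PTE predicates that matter here. */
struct model_pte {
	bool protnone;		/* models pte_protnone() */
	bool uffd;		/* models pte_uffd() */
};

/* Hypothetical stand-in for the two VMA predicates that matter here. */
struct model_vma {
	bool accessible;	/* models vma_is_accessible() */
	bool uffd_rwp;		/* models userfaultfd_rwp(), i.e. VM_UFFD_RWP */
};

/* Models userfaultfd_pte_rwp(): VM_UFFD_RWP on the VMA plus the uffd bit. */
static bool model_pte_rwp(const struct model_vma *vma, struct model_pte pte)
{
	return vma->uffd_rwp && pte.uffd;
}

/*
 * Models the protnone branch of handle_pte_fault(): RWP is checked
 * first, and anything else that is protnone on an accessible VMA is
 * treated as NUMA hinting.
 */
static enum model_fault model_classify(const struct model_vma *vma,
					struct model_pte pte)
{
	if (!pte.protnone || !vma->accessible)
		return MODEL_FAULT_OTHER;
	if (model_pte_rwp(vma, pte))
		return MODEL_FAULT_UFFD_RWP;
	return MODEL_FAULT_NUMA;
}
```

In this model a protnone PTE without the uffd bit on a VM_UFFD_RWP VMA
still classifies as NUMA hinting, which is the fall-through the message
above describes.
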
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- fs/userfaultfd.c | 32 ++++++++++++++++++++++++++++++-- include/linux/huge_mm.h | 7 +++++++ include/linux/userfaultfd_k.h | 24 ++++++++++++++++++++++++ include/uapi/linux/userfaultfd.h | 12 ++++++++---- mm/huge_memory.c | 5 +++++ mm/hugetlb.c | 11 +++++++++++ mm/memory.c | 21 +++++++++++++++++++-- 7 files changed, 104 insertions(+), 8 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index f2097c558165..f8f1619f5183 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -261,6 +261,12 @@ static inline bool userfaultfd_huge_must_wait(struct u= serfaultfd_ctx *ctx, */ if (!huge_pte_write(pte) && (reason & VM_UFFD_WP)) return true; + /* + * PTE is still RW-protected (protnone with uffd bit), wait for + * resolution. Plain PROT_NONE without the marker is not an RWP fault. + */ + if (pte_protnone(pte) && huge_pte_uffd(pte) && (reason & VM_UFFD_RWP)) + return true; =20 return false; } @@ -321,8 +327,14 @@ static inline bool userfaultfd_must_wait(struct userfa= ultfd_ctx *ctx, if (!pmd_present(_pmd)) return false; =20 - if (pmd_trans_huge(_pmd)) - return !pmd_write(_pmd) && (reason & VM_UFFD_WP); + if (pmd_trans_huge(_pmd)) { + if (!pmd_write(_pmd) && (reason & VM_UFFD_WP)) + return true; + if (pmd_protnone(_pmd) && pmd_uffd(_pmd) && + (reason & VM_UFFD_RWP)) + return true; + return false; + } =20 pte =3D pte_offset_map(pmd, address); if (!pte) @@ -347,6 +359,13 @@ static inline bool userfaultfd_must_wait(struct userfa= ultfd_ctx *ctx, */ if (!pte_write(ptent) && (reason & VM_UFFD_WP)) goto out; + /* + * PTE is still RW-protected (protnone with uffd bit), wait for + * userspace to resolve. Plain PROT_NONE without the marker is not + * an RWP fault. 
+ */ + if (pte_protnone(ptent) && pte_uffd(ptent) && (reason & VM_UFFD_RWP)) + goto out; =20 ret =3D false; out: @@ -2086,6 +2105,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *c= tx, uffdio_api.features &=3D ~UFFD_FEATURE_WP_UNPOPULATED; uffdio_api.features &=3D ~UFFD_FEATURE_WP_ASYNC; } + /* + * RWP needs both PROT_NONE support and the uffd-wp PTE bit. The + * VM_UFFD_RWP check covers compile-time unavailability; the + * pgtable_supports_uffd() check covers runtime (e.g. riscv + * without the SVRSW60T59B extension) where the PTE bit is declared + * but not actually usable. + */ + if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) + uffdio_api.features &=3D ~UFFD_FEATURE_RWP; =20 ret =3D -EINVAL; if (features & ~uffdio_api.features) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 2949e5acff35..e980909ee49e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -520,6 +520,8 @@ static inline bool folio_test_pmd_mappable(struct folio= *folio) =20 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); =20 +vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf); + vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf); =20 extern struct folio *huge_zero_folio; @@ -702,6 +704,11 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *p= ud, return NULL; } =20 +static inline vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) +{ + return 0; +} + static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { return 0; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index d46974be864e..1beae4f2f479 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -247,6 +247,18 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_a= rea_struct *vma, return userfaultfd_wp(vma) && pmd_uffd(pmd); } =20 +static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma, + pte_t pte) +{ + return userfaultfd_rwp(vma) && pte_uffd(pte); +} + +static inline bool 
userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma, + pmd_t pmd) +{ + return userfaultfd_rwp(vma) && pmd_uffd(pmd); +} + static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return vma->vm_flags & __VM_UFFD_FLAGS; @@ -399,6 +411,18 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_a= rea_struct *vma, return false; } =20 +static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma, + pte_t pte) +{ + return false; +} + +static inline bool userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma, + pmd_t pmd) +{ + return false; +} + static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return false; diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index 7b78aa3b5318..d803e76d47ad 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -25,7 +25,8 @@ #define UFFD_API ((__u64)0xAA) #define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING | \ UFFDIO_REGISTER_MODE_WP | \ - UFFDIO_REGISTER_MODE_MINOR) + UFFDIO_REGISTER_MODE_MINOR | \ + UFFDIO_REGISTER_MODE_RWP) #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \ UFFD_FEATURE_EVENT_FORK | \ UFFD_FEATURE_EVENT_REMAP | \ @@ -42,7 +43,8 @@ UFFD_FEATURE_WP_UNPOPULATED | \ UFFD_FEATURE_POISON | \ UFFD_FEATURE_WP_ASYNC | \ - UFFD_FEATURE_MOVE) + UFFD_FEATURE_MOVE | \ + UFFD_FEATURE_RWP) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -54,13 +56,15 @@ (__u64)1 << _UFFDIO_MOVE | \ (__u64)1 << _UFFDIO_WRITEPROTECT | \ (__u64)1 << _UFFDIO_CONTINUE | \ - (__u64)1 << _UFFDIO_POISON) + (__u64)1 << _UFFDIO_POISON | \ + (__u64)1 << _UFFDIO_RWPROTECT) #define UFFD_API_RANGE_IOCTLS_BASIC \ ((__u64)1 << _UFFDIO_WAKE | \ (__u64)1 << _UFFDIO_COPY | \ (__u64)1 << _UFFDIO_WRITEPROTECT | \ (__u64)1 << _UFFDIO_CONTINUE | \ - (__u64)1 << _UFFDIO_POISON) + (__u64)1 << _UFFDIO_POISON | \ + (__u64)1 << _UFFDIO_RWPROTECT) =20 /* * Valid ioctl command number range with this API is from 0x00 
to diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 189192ea45cf..76ca0fbaa802 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2264,6 +2264,11 @@ static inline bool can_change_pmd_writable(struct vm= _area_struct *vma, return pmd_dirty(pmd); } =20 +vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) +{ + return handle_userfault(vmf, VM_UFFD_RWP); +} + /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index eee32a325481..9fc31cbcba4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6067,6 +6067,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, goto out_mutex; } =20 + /* + * Protnone hugetlb PTEs with the uffd bit are used by + * userfaultfd RWP for access tracking. Plain PROT_NONE (without the + * marker) is not an RWP fault and is not expected on hugetlb (no + * NUMA hinting), so let normal hugetlb fault handling proceed. + */ + if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) && + userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) { + return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + } + /* * If we are going to COW/unshare the mapping later, we examine the * pending reservations for this page now. 
This will ensure that any diff --git a/mm/memory.c b/mm/memory.c index ea9616e3dbaf..e0dcf2c28d9d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6172,6 +6172,12 @@ static void numa_rebuild_large_mapping(struct vm_fau= lt *vmf, struct vm_area_stru } } =20 +static vm_fault_t do_uffd_rwp(struct vm_fault *vmf) +{ + pte_unmap(vmf->pte); + return handle_userfault(vmf, VM_UFFD_RWP); +} + static vm_fault_t do_numa_page(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; @@ -6446,8 +6452,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *= vmf) if (!pte_present(vmf->orig_pte)) return do_swap_page(vmf); =20 - if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) + if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) { + /* + * RWP-protected PTEs are protnone plus the uffd bit. On a + * VM_UFFD_RWP VMA, a protnone PTE without the uffd bit is + * NUMA hinting and must still fall through to do_numa_page(). + */ + if (userfaultfd_pte_rwp(vmf->vma, vmf->orig_pte)) + return do_uffd_rwp(vmf); return do_numa_page(vmf); + } =20 spin_lock(vmf->ptl); entry =3D vmf->orig_pte; @@ -6561,8 +6575,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_s= truct *vma, return 0; } if (pmd_trans_huge(vmf.orig_pmd)) { - if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) + if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) { + if (userfaultfd_huge_pmd_rwp(vma, vmf.orig_pmd)) + return do_huge_pmd_uffd_rwp(&vmf); return do_huge_pmd_numa_page(&vmf); + } =20 if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) && !pmd_write(vmf.orig_pmd)) { --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026 Received: from fhigh-b1-smtp.messagingengine.com (fhigh-b1-smtp.messagingengine.com [202.12.124.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2597641C2F1; Fri, 22 May 2026 13:39:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
hopehrphhptheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepphgvthgvrhigsehrvggu hhgrthdrtghomhdprhgtphhtthhopegurghvihgusehkvghrnhgvlhdrohhrghdprhgtph htthhopehljhhssehkvghrnhgvlhdrohhrghdprhgtphhtthhopehsuhhrvghnsgesghho ohhglhgvrdgtohhmpdhrtghpthhtohepvhgsrggskhgrsehkvghrnhgvlhdrohhrghdprh gtphhtthhopehlihgrmhdrhhhofihlvghtthesohhrrggtlhgvrdgtohhmpdhrtghpthht ohepiihihiesnhhvihguihgrrdgtohhm X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 22 May 2026 09:39:36 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 10/16] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking Date: Fri, 22 May 2026 14:38:51 +0100 Message-ID: <20260522133857.552279-11-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" PAGEMAP_SCAN already reports PAGE_IS_WRITTEN from the inverted uffd PTE bit, targeting the UFFDIO_WRITEPROTECT workflow. 
UFFDIO_RWPROTECT reuses the same PTE bit as a marker for read-write protection, but "has been written" and "has been accessed" are distinct semantic signals; they happen to share one PTE bit today only because the two implementations share infrastructure.

Give RWP its own pagemap category so the UAPI does not conflate them:

  PAGE_IS_WRITTEN   reported on VM_UFFD_WP VMAs,  !pte_uffd(pte)
  PAGE_IS_ACCESSED  reported on VM_UFFD_RWP VMAs, !pte_uffd(pte)

Both still read the same PTE bit today, but each is scoped to the VMA whose registered mode makes the bit meaningful. If a future implementation moves RWP to a separate PTE bit, only PAGE_IS_ACCESSED switches over.

This is a UAPI narrowing. Outside VM_UFFD_WP VMAs the uffd bit is always clear, so PAGEMAP_SCAN used to flag PAGE_IS_WRITTEN on every present PTE there, a meaningless duplicate of PAGE_IS_PRESENT. Now PAGE_IS_WRITTEN fires only inside VM_UFFD_WP VMAs.

pagemap_hugetlb_category() now takes the vma like its PTE/PMD peers.
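The scoping rule described above can be sketched as a small user-space model. This is not the kernel code: enum model_vma_mode and model_present_category() are hypothetical names made up for illustration, and the bit values for PAGE_IS_WRITTEN and PAGE_IS_PRESENT are assumed to match the existing uapi header (only PAGE_IS_ACCESSED is defined by this patch). It covers the present-PTE case only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * PAGE_IS_ACCESSED is the value added by this patch. The other two
 * values are assumptions about the existing uapi header.
 */
#define PAGE_IS_WRITTEN   (1UL << 1)
#define PAGE_IS_PRESENT   (1UL << 3)
#define PAGE_IS_ACCESSED  (1UL << 9)

/* Hypothetical stand-in for the uffd mode a VMA was registered with. */
enum model_vma_mode { MODEL_VMA_PLAIN, MODEL_VMA_WP, MODEL_VMA_RWP };

/*
 * Model of the category computed for a present PTE: the cleared uffd
 * bit is reported only under the name that the VMA's registered mode
 * makes meaningful.
 */
static unsigned long model_present_category(enum model_vma_mode mode,
					     bool uffd_bit)
{
	unsigned long categories = PAGE_IS_PRESENT;

	if (!uffd_bit) {
		if (mode == MODEL_VMA_WP)
			categories |= PAGE_IS_WRITTEN;
		if (mode == MODEL_VMA_RWP)
			categories |= PAGE_IS_ACCESSED;
	}
	return categories;
}

/* Checks the narrowing: a plain VMA no longer reports PAGE_IS_WRITTEN. */
static void model_category_selfcheck(void)
{
	assert(model_present_category(MODEL_VMA_PLAIN, false) ==
	       PAGE_IS_PRESENT);
	assert(model_present_category(MODEL_VMA_WP, false) ==
	       (PAGE_IS_PRESENT | PAGE_IS_WRITTEN));
	assert(model_present_category(MODEL_VMA_RWP, false) ==
	       (PAGE_IS_PRESENT | PAGE_IS_ACCESSED));
}
```

In this model a PTE that still carries the uffd bit reports PAGE_IS_PRESENT only, whatever the VMA mode, which matches the "not written" and "not accessed" reading of the bit.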
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- Documentation/admin-guide/mm/pagemap.rst | 13 ++++- fs/proc/task_mmu.c | 73 ++++++++++++++++++------ include/uapi/linux/fs.h | 1 + tools/include/uapi/linux/fs.h | 1 + 4 files changed, 67 insertions(+), 21 deletions(-) diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin= -guide/mm/pagemap.rst index c57e61b5d8aa..ffa690a171c8 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -19,8 +19,11 @@ There are four components to pagemap: * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst) * Bit 56 page exclusively mapped (since 4.2) - * Bit 57 pte is uffd-wp write-protected (since 5.13) (see - Documentation/admin-guide/mm/userfaultfd.rst) + * Bit 57 pte is tracked by userfaultfd (since 5.13) =E2=80=94 in a + ``VM_UFFD_WP`` VMA this indicates a write-protected PTE; in a + ``VM_UFFD_RWP`` VMA it indicates an RWP-protected PTE. WP and + RWP are mutually exclusive per VMA, so the meaning is + unambiguous. See Documentation/admin-guide/mm/userfaultfd.rst. * Bit 58 pte is a guard region (since 6.15) (see madvise (2) man p= age) * Bits 59-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) @@ -244,7 +247,8 @@ in this IOCTL: Following flags about pages are currently supported: =20 - ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled -- ``PAGE_IS_WRITTEN`` - Page has been written to from the time it was writ= e protected +- ``PAGE_IS_WRITTEN`` - Page in a ``UFFDIO_REGISTER_MODE_WP`` VMA has been + written to since it was write-protected. Only reported inside such VMAs. 
- ``PAGE_IS_FILE`` - Page is file backed - ``PAGE_IS_PRESENT`` - Page is present in the memory - ``PAGE_IS_SWAPPED`` - Page is in swapped @@ -252,6 +256,9 @@ Following flags about pages are currently supported: - ``PAGE_IS_HUGE`` - Page is PMD-mapped THP or Hugetlb backed - ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty - ``PAGE_IS_GUARD`` - Page is a part of a guard region +- ``PAGE_IS_ACCESSED`` - Page in a ``UFFDIO_REGISTER_MODE_RWP`` VMA has be= en + accessed since RWP was applied. Only reported inside such VMAs. See + Documentation/admin-guide/mm/userfaultfd.rst for the RWP workflow. =20 The ``struct pm_scan_arg`` is used as the argument of the IOCTL. =20 diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index fbaede228201..4e207b6216b1 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2197,7 +2197,7 @@ static const struct mm_walk_ops pagemap_ops =3D { * Bits 5-54 swap offset if swapped * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) * Bit 56 page exclusively mapped - * Bit 57 pte is uffd-wp write-protected + * Bit 57 pte is tracked by userfaultfd (uffd-wp or RWP) * Bit 58 pte is a guard region * Bits 59-60 zero * Bit 61 page is file-page or shared-anon @@ -2332,7 +2332,7 @@ static int pagemap_release(struct inode *inode, struc= t file *file) PAGE_IS_FILE | PAGE_IS_PRESENT | \ PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \ PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \ - PAGE_IS_GUARD) + PAGE_IS_GUARD | PAGE_IS_ACCESSED) #define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC) =20 struct pagemap_scan_private { @@ -2357,8 +2357,12 @@ static unsigned long pagemap_page_category(struct pa= gemap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_uffd(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 if (p->masks_of_interest & PAGE_IS_FILE) { page =3D 
vm_normal_page(vma, addr, pte); @@ -2375,8 +2379,12 @@ static unsigned long pagemap_page_category(struct pa= gemap_scan_private *p, =20 categories =3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_any(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_swp_uffd_any(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 entry =3D softleaf_from_pte(pte); if (softleaf_is_guard_marker(entry)) @@ -2425,8 +2433,12 @@ static unsigned long pagemap_thp_category(struct pag= emap_scan_private *p, struct page *page; =20 categories |=3D PAGE_IS_PRESENT; - if (!pmd_uffd(pmd)) - categories |=3D PAGE_IS_WRITTEN; + if (!pmd_uffd(pmd)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 if (p->masks_of_interest & PAGE_IS_FILE) { page =3D vm_normal_page_pmd(vma, addr, pmd); @@ -2440,8 +2452,12 @@ static unsigned long pagemap_thp_category(struct pag= emap_scan_private *p, categories |=3D PAGE_IS_SOFT_DIRTY; } else { categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd(pmd)) - categories |=3D PAGE_IS_WRITTEN; + if (!pmd_swp_uffd(pmd)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (pmd_swp_soft_dirty(pmd)) categories |=3D PAGE_IS_SOFT_DIRTY; =20 @@ -2474,7 +2490,8 @@ static void make_uffd_wp_pmd(struct vm_area_struct *v= ma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 #ifdef CONFIG_HUGETLB_PAGE -static unsigned long pagemap_hugetlb_category(pte_t pte) +static unsigned long pagemap_hugetlb_category(struct vm_area_struct *vma, + pte_t pte) { unsigned long categories =3D PAGE_IS_HUGE; =20 @@ -2489,8 +2506,12 @@ static unsigned long pagemap_hugetlb_category(pte_t = pte) if (pte_present(pte)) { categories |=3D PAGE_IS_PRESENT; =20 - if (!huge_pte_uffd(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!huge_pte_uffd(pte)) { + if 
(userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (!PageAnon(pte_page(pte))) categories |=3D PAGE_IS_FILE; if (is_zero_pfn(pte_pfn(pte))) @@ -2500,8 +2521,12 @@ static unsigned long pagemap_hugetlb_category(pte_t = pte) } else { categories |=3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_any(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_swp_uffd_any(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (pte_swp_soft_dirty(pte)) categories |=3D PAGE_IS_SOFT_DIRTY; } @@ -2586,6 +2611,16 @@ static int pagemap_scan_test_walk(unsigned long star= t, unsigned long end, bool wp_allowed =3D userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma); =20 + /* + * PM_SCAN_WP_MATCHING is the atomic read-and-reset flavour of the + * scan and is implemented for the WP marker only. Reject it on + * VM_UFFD_RWP VMAs explicitly so userspace gets a clear error + * instead of a silently-skipped range; re-arming is done with + * UFFDIO_RWPROTECT(MODE_RWP). + */ + if (userfaultfd_rwp(vma) && (p->arg.flags & PM_SCAN_WP_MATCHING)) + return -EINVAL; + if (!wp_allowed) { /* User requested explicit failure over wp-async capability */ if (p->arg.flags & PM_SCAN_CHECK_WPASYNC) @@ -2773,7 +2808,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, goto flush_and_return; } =20 - if (!p->arg.category_anyof_mask && !p->arg.category_inverted && + if (userfaultfd_wp(vma) && !p->arg.category_anyof_mask && + !p->arg.category_inverted && p->arg.category_mask =3D=3D PAGE_IS_WRITTEN && p->arg.return_mask =3D=3D PAGE_IS_WRITTEN) { for (addr =3D start; addr < end; pte++, addr +=3D PAGE_SIZE) { @@ -2848,7 +2884,8 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, un= signed long hmask, /* Go the short route when not write-protecting pages. 
*/ =20 pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + categories =3D p->cur_vma_category | + pagemap_hugetlb_category(vma, pte); =20 if (!pagemap_scan_is_interesting_page(categories, p)) return 0; @@ -2860,7 +2897,7 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, un= signed long hmask, ptl =3D huge_pte_lock(hstate_vma(vma), vma->vm_mm, ptep); =20 pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + categories =3D p->cur_vma_category | pagemap_hugetlb_category(vma, pte); =20 if (!pagemap_scan_is_interesting_page(categories, p)) goto out_unlock; diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 13f71202845e..c4aeaa0c31c7 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -455,6 +455,7 @@ typedef int __bitwise __kernel_rwf_t; #define PAGE_IS_HUGE (1 << 6) #define PAGE_IS_SOFT_DIRTY (1 << 7) #define PAGE_IS_GUARD (1 << 8) +#define PAGE_IS_ACCESSED (1 << 9) =20 /* * struct page_region - Page region with flags diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index 24ddf7bc4f25..f0a26309b6d5 100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -364,6 +364,7 @@ typedef int __bitwise __kernel_rwf_t; #define PAGE_IS_HUGE (1 << 6) #define PAGE_IS_SOFT_DIRTY (1 << 7) #define PAGE_IS_GUARD (1 << 8) +#define PAGE_IS_ACCESSED (1 << 9) =20 /* * struct page_region - Page region with flags --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026 Received: from fout-b2-smtp.messagingengine.com (fout-b2-smtp.messagingengine.com [202.12.124.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3257F421EE4; Fri, 22 May 2026 13:39:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.145 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457184; cv=none; b=nUOO9GTj1PSAvWvMsIc5wqhU09voQAHR4vWT6xckng4SINSMl6yV3HLWq0izqoe303mFwxYq41VyvKxUD8E40pivcRgBpC/lJe3FtAhXkK8SAfp+uRcxy6gvyJ/8k0ZuCkusxqzwh9Z5emQIEyR6HSIADF/7vsojhak9Lu58elA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779457184; c=relaxed/simple; bh=cSKuOsZv0NVHHmKN+oC2toFd1vGzLzQpfGLOYgKCltk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ig9JlZukZD/N9r5SfdWKipopcX1/nj/cAb04jfH9yJXoHxHGwHjXHxnDAntbeBfTqz5lMxKh/VQVVZ8BcR5H5C8f7M1C4PIfVCEJxWSjqJIOoOh2otvgFdP838b1itnPG6EC/sff4v4526kdR03sB2UoV7+/tfh8Jhdgm/3r0Lg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=rnwBrPQl; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=rXJfRNLq; arc=none smtp.client-ip=202.12.124.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="rnwBrPQl"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="rXJfRNLq" Received: from phl-compute-11.internal (phl-compute-11.internal [10.202.2.51]) by mailfout.stl.internal (Postfix) with ESMTP id D9FE11D00126; Fri, 22 May 2026 09:39:40 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-11.internal (MEProxy); Fri, 22 May 2026 09:39:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id 
:mime-version:references:reply-to:subject:subject:to:to; s=fm2; t=1779457180; x=1779543580; bh=apb55LlMDEBM2szOAEFKpKJCzTbXU9G5 gULUrddwYm8=; b=rnwBrPQlRTmnXYUYbPLj1AeZcPThyJBEYmqSMB+P/KHVqrD8 3qZZPFhINhZ0QJGQj4fDkqB7AWmMMlad9yZ6mxkpcxuYydR5P3TIwBBSdBV5JiNp Z6o7sIw8haZGLEDoP2RNnkZH78voKVoDX1DPfTNPH2OaRXd/gtQ0vOBdF98sSA05 5QJNYNgXnx40bS5eKV3QtdrMNx2hJnXeshOhCb46Zw6Nc4tvhEaIXojYrVVwIjo5 D6WvUXNI6nDLdMIb3cekp5CQZbJzcfze0i503C7qEvcZ4C0Su4ibAevZg+gpXAMv tm/lc8/Ds3RzCqpbYhJYZFlOPCdYd+uGDBxrGw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1779457180; x= 1779543580; bh=apb55LlMDEBM2szOAEFKpKJCzTbXU9G5gULUrddwYm8=; b=r XJfRNLq86D7SHLL/fE1ClibStuFy934sS/rA2jO3i16OuVINFZ9sS9FWMDyaFFN6 SNUhRCDi8NxwqHWERtGNh7rIvb0Tm3whVOCdbSv4ruCK4yf2WULMa61KmG4vuID3 uohFOAOSv7aVIIoHlSRIvfT4LuhYF/CBSKfdhmZyQpOIF/1Kdk10VaDbO+l8uS+O g+t67B2Lr/iQyaM+Tw67rdcx7KPxPwPSlkdqWpxXdILUxyI7uKkyl24RBXho4H87 H27NRs4xlX9XrZnZsJkU6H+gGy/6oWFtv2zpZQyW3TLgFVqkLPwHZrZZS4sPP9v6 s72FMCTTkLECtmv4uw3RA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefhedrtddtgdduhedtfeduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucenucfjughrpefhvfevufffkffojghfgggtgfesthekre dtredtjeenucfhrhhomhepmfhirhihlhcuufhhuhhtshgvmhgruhcuoehkihhrihhllhes shhhuhhtvghmohhvrdhnrghmvgeqnecuggftrfgrthhtvghrnhepteefveejgeffleefff egiedtieegiedugeekudehtedvjeetvdegieeikefffeevnecuvehluhhsthgvrhfuihii vgeptdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirhhilhhlsehshhhuthgvmhhovh drnhgrmhgvpdhnsggprhgtphhtthhopedviedpmhhouggvpehsmhhtphhouhhtpdhrtghp thhtoheprghkphhmsehlihhnuhigqdhfohhunhgurghtihhonhdrohhrghdprhgtphhtth 
hopehrphhptheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepphgvthgvrhigsehrvggu hhgrthdrtghomhdprhgtphhtthhopegurghvihgusehkvghrnhgvlhdrohhrghdprhgtph htthhopehljhhssehkvghrnhgvlhdrohhrghdprhgtphhtthhopehsuhhrvghnsgesghho ohhglhgvrdgtohhmpdhrtghpthhtohepvhgsrggskhgrsehkvghrnhgvlhdrohhrghdprh gtphhtthhopehlihgrmhdrhhhofihlvghtthesohhrrggtlhgvrdgtohhmpdhrtghpthht ohepiihihiesnhhvihguihgrrdgtohhm X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 22 May 2026 09:39:39 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)" Subject: [PATCH v3 11/16] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Date: Fri, 22 May 2026 14:38:52 +0100 Message-ID: <20260522133857.552279-12-kirill@shutemov.name> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name> References: <20260522133857.552279-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" Sync RWP delivers a message and blocks the faulting thread until the handler resolves the fault. For working-set tracking the VMM does not need the message: it just needs to know, at scan time, which pages were touched. 
Async RWP serves that use case =E2=80=94 the kernel restores access in-place and the faulting thread continues without blocking. The VMM reconstructs the access pattern after the fact via PAGEMAP_SCAN: pages whose uffd bit is still set (inverted PAGE_IS_ACCESSED) were not re-accessed since the last RWP cycle. Worth calling out: async resolution upgrades writable private anon PTEs via pte_mkwrite() when can_change_pte_writable() allows, mirroring do_numa_page(). Without it, every re-access of an RWP'd writable page would COW-fault a second time. UFFD_FEATURE_RWP_ASYNC requires UFFD_FEATURE_RWP. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- fs/userfaultfd.c | 19 ++++++++++++++++++- include/linux/userfaultfd_k.h | 6 ++++++ include/uapi/linux/userfaultfd.h | 11 ++++++++++- mm/huge_memory.c | 25 ++++++++++++++++++++++++- mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++- mm/memory.c | 27 +++++++++++++++++++++++++-- 6 files changed, 114 insertions(+), 6 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index f8f1619f5183..bb0ea60dc3e6 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -89,6 +89,11 @@ static bool userfaultfd_wp_async_ctx(struct userfaultfd_= ctx *ctx) return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC); } =20 +static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx) +{ + return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC); +} + /* * Whether WP_UNPOPULATED is enabled on the uffd context. 
It is only * meaningful when userfaultfd_wp()=3D=3Dtrue on the vma and when it's @@ -1988,6 +1993,11 @@ bool userfaultfd_wp_async(struct vm_area_struct *vma) return userfaultfd_wp_async_ctx(vma->vm_userfaultfd_ctx.ctx); } =20 +bool userfaultfd_rwp_async(struct vm_area_struct *vma) +{ + return userfaultfd_rwp_async_ctx(vma->vm_userfaultfd_ctx.ctx); +} + static inline unsigned int uffd_ctx_features(__u64 user_features) { /* @@ -2091,6 +2101,12 @@ static int userfaultfd_api(struct userfaultfd_ctx *c= tx, if (features & UFFD_FEATURE_WP_ASYNC) features |=3D UFFD_FEATURE_WP_UNPOPULATED; =20 + ret =3D -EINVAL; + /* RWP_ASYNC requires RWP */ + if ((features & UFFD_FEATURE_RWP_ASYNC) && + !(features & UFFD_FEATURE_RWP)) + goto err_out; + /* report all available features and ioctls to userland */ uffdio_api.features =3D UFFD_API_FEATURES; #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR @@ -2113,7 +2129,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, * but not actually usable. */ if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) - uffdio_api.features &=3D ~UFFD_FEATURE_RWP; + uffdio_api.features &=3D + ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC); =20 ret =3D -EINVAL; if (features & ~uffdio_api.features) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 1beae4f2f479..4fd2a5ff3064 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -295,6 +295,7 @@ extern void userfaultfd_unmap_complete(struct mm_struct= *mm, struct list_head *uf); extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma); extern bool userfaultfd_wp_async(struct vm_area_struct *vma); +extern bool userfaultfd_rwp_async(struct vm_area_struct *vma); =20 void userfaultfd_reset_ctx(struct vm_area_struct *vma); =20 @@ -492,6 +493,11 @@ static inline bool userfaultfd_wp_async(struct vm_area= _struct *vma) return false; } =20 +static inline bool userfaultfd_rwp_async(struct vm_area_struct *vma) +{ + return false; +} + static 
inline bool vma_has_uffd_without_event_remap(struct vm_area_struct = *vma) { return false; diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index d803e76d47ad..c10f08f8a618 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -44,7 +44,8 @@ UFFD_FEATURE_POISON | \ UFFD_FEATURE_WP_ASYNC | \ UFFD_FEATURE_MOVE | \ - UFFD_FEATURE_RWP) + UFFD_FEATURE_RWP | \ + UFFD_FEATURE_RWP_ASYNC) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -243,6 +244,13 @@ struct uffdio_api { * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking. * Pages are made inaccessible via UFFDIO_RWPROTECT and faults * are delivered when the pages are re-accessed. + * + * UFFD_FEATURE_RWP_ASYNC indicates asynchronous mode for + * UFFDIO_REGISTER_MODE_RWP. When set, faults on read-write + * protected pages are auto-resolved by the kernel (PTE + * permissions restored immediately) without delivering a message + * to the userfaultfd handler. Use PAGEMAP_SCAN with inverted + * PAGE_IS_ACCESSED to find pages that were not re-accessed. 
*/ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -262,6 +270,7 @@ struct uffdio_api { #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) #define UFFD_FEATURE_RWP (1<<17) +#define UFFD_FEATURE_RWP_ASYNC (1<<18) __u64 features; =20 __u64 ioctls; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 76ca0fbaa802..985eb4e2b5c0 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2266,7 +2266,30 @@ static inline bool can_change_pmd_writable(struct vm= _area_struct *vma, =20 vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) { - return handle_userfault(vmf, VM_UFFD_RWP); + struct vm_area_struct *vma =3D vmf->vma; + pmd_t pmd; + + if (!userfaultfd_rwp_async(vma)) + return handle_userfault(vmf, VM_UFFD_RWP); + + vmf->ptl =3D pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(pmdp_get(vmf->pmd), vmf->orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + pmd =3D pmd_modify(vmf->orig_pmd, vma->vm_page_prot); + /* pmd_modify() preserves _PAGE_UFFD; drop it on resolution */ + pmd =3D pmd_clear_uffd(pmd); + pmd =3D pmd_mkyoung(pmd); + if (!pmd_write(pmd) && + vma_wants_manual_pte_write_upgrade(vma) && + can_change_pmd_writable(vma, vmf->address, pmd)) + pmd =3D pmd_mkwrite(pmd, vma); + set_pmd_at(vma->vm_mm, vmf->address & HPAGE_PMD_MASK, + vmf->pmd, pmd); + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + return 0; } =20 /* NUMA hinting page fault entry point for trans huge pmds */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9fc31cbcba4b..84ed3784e3a0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6075,7 +6075,37 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, */ if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) && userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) { - return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + spinlock_t *ptl; + pte_t pte; + + /* Sync: drop hugetlb locks before blocking in 
handle_userfault() */ + if (!userfaultfd_rwp_async(vma)) + return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + + ptl =3D huge_pte_lock(h, mm, vmf.pte); + pte =3D huge_ptep_get(mm, vmf.address, vmf.pte); + if (pte_protnone(pte) && huge_pte_uffd(pte)) { + unsigned int shift =3D huge_page_shift(h); + + pte =3D huge_pte_modify(pte, vma->vm_page_prot); + pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); + /* huge_pte_modify() preserves _PAGE_UFFD; drop it on resolution */ + pte =3D huge_pte_clear_uffd(pte); + pte =3D pte_mkyoung(pte); + /* + * Unlike do_uffd_rwp(), do not upgrade to writable + * here. Hugetlb lacks a can_change_huge_pte_writable() + * equivalent, so a write access will take a separate + * COW fault =E2=80=94 acceptable for the rare private hugetlb + * case. + */ + set_huge_pte_at(mm, vmf.address, vmf.pte, pte, + huge_page_size(h)); + update_mmu_cache(vma, vmf.address, vmf.pte); + } + spin_unlock(ptl); + ret =3D 0; + goto out_mutex; } =20 /* diff --git a/mm/memory.c b/mm/memory.c index e0dcf2c28d9d..bfe6f218fb16 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6174,8 +6174,31 @@ static void numa_rebuild_large_mapping(struct vm_fau= lt *vmf, struct vm_area_stru =20 static vm_fault_t do_uffd_rwp(struct vm_fault *vmf) { - pte_unmap(vmf->pte); - return handle_userfault(vmf, VM_UFFD_RWP); + pte_t pte; + + if (!userfaultfd_rwp_async(vmf->vma)) { + /* Sync mode: unmap PTE and deliver to userfaultfd handler */ + pte_unmap(vmf->pte); + return handle_userfault(vmf, VM_UFFD_RWP); + } + + spin_lock(vmf->ptl); + if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } + pte =3D pte_modify(vmf->orig_pte, vmf->vma->vm_page_prot); + /* pte_modify() preserves _PAGE_UFFD; drop it on resolution */ + pte =3D pte_clear_uffd(pte); + pte =3D pte_mkyoung(pte); + if (!pte_write(pte) && + vma_wants_manual_pte_write_upgrade(vmf->vma) && + can_change_pte_writable(vmf->vma, vmf->address, pte)) + pte =3D 
pte_mkwrite(pte, vmf->vma);
+	set_pte_at(vmf->vma->vm_mm, vmf->address, vmf->pte, pte);
+	update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
 }
 
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org,
	"Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 12/16] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
Date: Fri, 22 May 2026 14:38:53 +0100
Message-ID: <20260522133857.552279-13-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Add an ioctl to toggle async mode at runtime without re-registering the
userfaultfd. This allows a VMM to switch between sync and async RWP
modes on-the-fly -- for example, starting in async mode for working set
scanning, then switching to sync mode to intercept faults during page
eviction.

UFFDIO_SET_MODE takes an enable/disable bitmask of UFFD_FEATURE_* flags.
Only UFFD_FEATURE_RWP_ASYNC is toggleable today; the ioctl rejects any
other bit with -EINVAL. Enabling RWP_ASYNC also requires RWP to have
been negotiated at UFFDIO_API time, mirroring the UFFDIO_API invariant.

Fault-path readers of ctx->features run under mmap_read_lock or a
per-VMA lock; the RMW takes mmap_write_lock and calls vma_start_write()
on every UFFD-armed VMA, so those readers are fully excluded.
userfaultfd_show_fdinfo(), however, reads ctx->features without any
lock, so the RMW is written as a single WRITE_ONCE and fdinfo reads it
with READ_ONCE. That keeps the lockless observer from seeing a mid-RMW
intermediate and removes the audit burden when new toggleable bits are
added later.

When switching to async, pending sync waiters are woken so they retry
and auto-resolve under the new mode.

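As a rough illustration of the enable/disable semantics described above,
the standalone C sketch below models the argument checks in userspace.
It is not the kernel code: the MODEL_* bit values are placeholders
picked for the example (not the uapi values), set_mode_model() is a
hypothetical helper name, and locking and the wake-up step are left out
entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only (not the uapi values). */
#define MODEL_FEATURE_RWP	(UINT64_C(1) << 0)
#define MODEL_FEATURE_RWP_ASYNC	(UINT64_C(1) << 1)
#define MODEL_TOGGLEABLE	MODEL_FEATURE_RWP_ASYNC

/*
 * Hypothetical model of the UFFDIO_SET_MODE argument checks.
 * Returns 0 and updates *features, or a negative errno value and
 * leaves *features untouched.
 */
static int set_mode_model(uint64_t *features, uint64_t enable,
			  uint64_t disable)
{
	/* A bit must not be requested in both directions. */
	if (enable & disable)
		return -EINVAL;

	/* Only bits in the toggleable set are accepted. */
	if ((enable | disable) & ~MODEL_TOGGLEABLE)
		return -EINVAL;

	/* Async can only be enabled if RWP was negotiated earlier. */
	if ((enable & MODEL_FEATURE_RWP_ASYNC) &&
	    !(*features & MODEL_FEATURE_RWP))
		return -EINVAL;

	/* New value is computed once and stored once. */
	*features = (*features | enable) & ~disable;
	return 0;
}
```

Starting from a context that negotiated only the RWP bit, enabling the
async bit succeeds and disabling it again returns the context to its
original state; a request naming the same bit in both masks, or any
non-toggleable bit, is refused with -EINVAL.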
Signed-off-by: Kiryl Shutsemau (Meta)
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 fs/userfaultfd.c                 | 150 +++++++++++++++++++++++++------
 include/uapi/linux/userfaultfd.h |  14 +++
 2 files changed, 136 insertions(+), 28 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bb0ea60dc3e6..7eacaa20baec 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -79,19 +79,29 @@ struct userfaultfd_wake_range {
 /* internal indication that UFFD_API ioctl was successfully executed */
 #define UFFD_FEATURE_INITIALIZED		(1u << 31)
 
+/*
+ * UFFDIO_SET_MODE updates ctx->features under mmap_write_lock with
+ * WRITE_ONCE; readers that run outside mmap_read_lock or the per-VMA
+ * lock (poll/read_iter/ioctl, fdinfo) must pair with READ_ONCE.
+ */
+static unsigned int userfaultfd_features(struct userfaultfd_ctx *ctx)
+{
+	return READ_ONCE(ctx->features);
+}
+
 static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
 {
-	return ctx->features & UFFD_FEATURE_INITIALIZED;
+	return userfaultfd_features(ctx) & UFFD_FEATURE_INITIALIZED;
 }
 
 static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx)
 {
-	return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC);
+	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_WP_ASYNC);
 }
 
 static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx)
 {
-	return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC);
+	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_RWP_ASYNC);
 }
 
 /*
@@ -106,7 +116,7 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma)
 	if (!ctx)
 		return false;
 
-	return ctx->features & UFFD_FEATURE_WP_UNPOPULATED;
+	return userfaultfd_features(ctx) & UFFD_FEATURE_WP_UNPOPULATED;
 }
 
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
@@ -1870,6 +1880,109 @@ static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+/* Subset of UFFD_API_FEATURES actually supported by this kernel/arch */
+static __u64 uffd_api_available_features(void)
+{
+	__u64 f = UFFD_API_FEATURES;
+
+	if (!IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_MINOR))
+		f &= ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
+	if (!pgtable_supports_uffd())
+		f &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	if (!uffd_supports_wp_marker())
+		f &= ~(UFFD_FEATURE_WP_HUGETLBFS_SHMEM |
+		       UFFD_FEATURE_WP_UNPOPULATED |
+		       UFFD_FEATURE_WP_ASYNC);
+	/*
+	 * RWP needs both PROT_NONE support and the uffd PTE bit. The
+	 * VM_UFFD_RWP check covers compile-time unavailability; the
+	 * pgtable_supports_uffd() check covers runtime (e.g. riscv
+	 * without the SVRSW60T59B extension) where the PTE bit is declared
+	 * but not actually usable.
+	 */
+	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
+		f &= ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC);
+	return f;
+}
+
+/* Async features that can be toggled at runtime via UFFDIO_SET_MODE */
+#define UFFD_FEATURE_TOGGLEABLE	UFFD_FEATURE_RWP_ASYNC
+
+static int userfaultfd_set_mode(struct userfaultfd_ctx *ctx,
+				unsigned long arg)
+{
+	struct uffdio_set_mode mode;
+	struct mm_struct *mm = ctx->mm;
+
+	if (copy_from_user(&mode, (void __user *)arg, sizeof(mode)))
+		return -EFAULT;
+
+	/* enable and disable must not overlap */
+	if (mode.enable & mode.disable)
+		return -EINVAL;
+
+	/* only toggleable features that this kernel/arch actually supports */
+	if ((mode.enable | mode.disable) &
+	    ~(uffd_api_available_features() & UFFD_FEATURE_TOGGLEABLE))
+		return -EINVAL;
+
+	/* RWP_ASYNC can only be enabled on contexts that negotiated RWP */
+	if ((mode.enable & UFFD_FEATURE_RWP_ASYNC) &&
+	    !(ctx->features & UFFD_FEATURE_RWP))
+		return -EINVAL;
+
+	if (!mmget_not_zero(mm))
+		return -ESRCH;
+
+	/*
+	 * Drain in-flight faults before flipping features. mmap_write_lock()
+	 * blocks new mmap_read_lock() callers, but per-VMA locked faults
+	 * (lock_vma_under_rcu() + FAULT_FLAG_VMA_LOCK) that acquired before
+	 * this point keep running. Calling vma_start_write() on each UFFD-
+	 * armed VMA waits for those readers to drop, so no in-flight fault
+	 * can observe the old features after mmap_write_unlock().
+	 */
+	mmap_write_lock(mm);
+	{
+		struct vm_area_struct *vma;
+		VMA_ITERATOR(vmi, mm, 0);
+
+		for_each_vma(vmi, vma) {
+			if (vma->vm_userfaultfd_ctx.ctx == ctx)
+				vma_start_write(vma);
+		}
+	}
+	/*
+	 * Single WRITE_ONCE so lockless readers (fdinfo, poll/read_iter
+	 * via userfaultfd_is_initialized(), and the userfaultfd_features()
+	 * helper used elsewhere) can't observe a mid-RMW intermediate
+	 * value. Hot-path readers already serialise through the mmap lock
+	 * + vma_start_write() drain above, so their load doesn't need an
+	 * annotation.
+	 */
+	WRITE_ONCE(ctx->features,
+		   (ctx->features | mode.enable) & ~mode.disable);
+	mmap_write_unlock(mm);
+
+	/*
+	 * If switching to async, wake threads blocked in handle_userfault().
+	 * They will retry the fault and auto-resolve under the new mode.
+	 * len=0 means wake all pending faults on this context.
+	 */
+	if (mode.enable & UFFD_FEATURE_RWP_ASYNC) {
+		struct userfaultfd_wake_range range = { .len = 0 };
+
+		spin_lock_irq(&ctx->fault_pending_wqh.lock);
+		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
+				     &range);
+		__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
+		spin_unlock_irq(&ctx->fault_pending_wqh.lock);
+	}
+
+	mmput(mm);
+	return 0;
+}
+
 static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 {
 	__s64 ret;
@@ -2108,29 +2221,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 
 	/* report all available features and ioctls to userland */
-	uffdio_api.features = UFFD_API_FEATURES;
-#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-	uffdio_api.features &=
-		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
-#endif
-	if (!pgtable_supports_uffd())
-		uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
-
-	if (!uffd_supports_wp_marker()) {
-		uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
-		uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
-		uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
-	}
-	/*
-	 * RWP needs both PROT_NONE support and the uffd-wp PTE bit. The
-	 * VM_UFFD_RWP check covers compile-time unavailability; the
-	 * pgtable_supports_uffd() check covers runtime (e.g. riscv
-	 * without the SVRSW60T59B extension) where the PTE bit is declared
-	 * but not actually usable.
-	 */
-	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
-		uffdio_api.features &=
-			~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC);
+	uffdio_api.features = uffd_api_available_features();
 
 	ret = -EINVAL;
 	if (features & ~uffdio_api.features)
@@ -2200,6 +2291,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_RWPROTECT:
 		ret = userfaultfd_rwprotect(ctx, arg);
 		break;
+	case UFFDIO_SET_MODE:
+		ret = userfaultfd_set_mode(ctx, arg);
+		break;
 	}
 	return ret;
 }
@@ -2227,7 +2321,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
 	 *	protocols: aa:... bb:...
 	 */
 	seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n",
-		   pending, total, UFFD_API, ctx->features,
+		   pending, total, UFFD_API, userfaultfd_features(ctx),
 		   UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS);
 }
 #endif
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index c10f08f8a618..cea11aad6b54 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -49,6 +49,7 @@
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
+	 (__u64)1 << _UFFDIO_SET_MODE |		\
 	 (__u64)1 << _UFFDIO_API)
 #define UFFD_API_RANGE_IOCTLS			\
 	((__u64)1 << _UFFDIO_WAKE |		\
@@ -85,6 +86,7 @@
 #define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_POISON			(0x08)
 #define _UFFDIO_RWPROTECT		(0x09)
+#define _UFFDIO_SET_MODE		(0x0A)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -111,6 +113,8 @@
 				      struct uffdio_poison)
 #define UFFDIO_RWPROTECT	_IOWR(UFFDIO, _UFFDIO_RWPROTECT, \
 				      struct uffdio_rwprotect)
+#define UFFDIO_SET_MODE		_IOW(UFFDIO, _UFFDIO_SET_MODE,	\
+				     struct uffdio_set_mode)
 
 /* read() structure */
 struct uffd_msg {
@@ -406,6 +410,16 @@ struct uffdio_move {
 	__s64 move;
 };
 
+struct uffdio_set_mode {
+	/*
+	 * Toggle async mode for features at runtime.
+	 * Supported: UFFD_FEATURE_RWP_ASYNC.
+	 * Setting a bit in both enable and disable is invalid.
+	 */
+	__u64 enable;
+	__u64 disable;
+};
+
 /*
  * Flags for the userfaultfd(2) system call itself.
  */
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org,
	"Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 13/16] selftests/mm: add userfaultfd RWP tests
Date: Fri, 22 May 2026 14:38:54 +0100
Message-ID: <20260522133857.552279-14-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Coverage for UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT:

  rwp-async         async mode -- touch pages, verify permissions are
                    auto-restored without a message
  rwp-sync          sync mode -- access blocks, handler resolves via
                    UFFDIO_RWPROTECT
  rwp-pagemap       PAGEMAP_SCAN reports still-cold pages via inverted
                    PAGE_IS_ACCESSED
  rwp-mprotect      RWP survives mprotect(PROT_NONE) ->
                    mprotect(PROT_READ|PROT_WRITE) round-trip
  rwp-gup           GUP walks through a protnone RWP PTE (pipe
                    write/read drives the GUP path)
  rwp-async-toggle  UFFDIO_SET_MODE flips between sync and async
                    without re-registering
  rwp-close         closing the uffd restores page permissions
  rwp-fork          RWP survives fork() with EVENT_FORK; child's PTEs
                    keep the uffd bit
  rwp-fork-pin      RWP survives fork() on an RO-longterm-pinned anon
                    page (forces copy_present_page()); child read
                    auto-resolves and clears the bit, proving
                    PAGE_NONE was in place
  rwp-wp-exclusive  register with MODE_WP|MODE_RWP returns -EINVAL

All tests run against anon, shmem, shmem-private, hugetlb, and
hugetlb-private memory, except rwp-fork-pin which is anon-only --
copy_present_page() is the private-anon pinned-exclusive fork path.

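The "inverted PAGE_IS_ACCESSED" selection used by rwp-pagemap can be
pictured with the small C model below. It assumes the usual
PAGEMAP_SCAN matching rule, as I understand it: a page's category bits
are XORed with category_inverted, and the page is selected only if every
bit of category_mask is then set. MODEL_PAGE_IS_ACCESSED is a
placeholder bit value and page_selected() is a hypothetical helper;
neither comes from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit, for illustration only (not the uapi value). */
#define MODEL_PAGE_IS_ACCESSED	(UINT64_C(1) << 0)

/*
 * Hypothetical model of category matching: flip the inverted bits,
 * then require every bit in the mask to be set.
 */
static bool page_selected(uint64_t categories, uint64_t category_mask,
			  uint64_t category_inverted)
{
	return ((categories ^ category_inverted) & category_mask) ==
	       category_mask;
}
```

With both category_mask and category_inverted set to the accessed bit,
as the test does, a page that has not been accessed (bit clear) is
selected and a page that has been accessed (bit set) is skipped, which
is why the scan is expected to return only the still-cold range.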
Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
---
 tools/testing/selftests/mm/uffd-unit-tests.c | 766 +++++++++++++++++++
 1 file changed, 766 insertions(+)

diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
index 6f5e404a446c..234d3ac0adfb 100644
--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -7,6 +7,8 @@
 
 #include "uffd-common.h"
 
+#include
+#include
 #include "../../../../mm/gup_test.h"
 
 #ifdef __NR_userfaultfd
@@ -128,6 +130,11 @@ static void uffd_test_skip(const char *message)
  */
 static int test_uffd_api(bool use_dev)
 {
+	const uint64_t expected_ioctls =
+		BIT_ULL(_UFFDIO_REGISTER) |
+		BIT_ULL(_UFFDIO_UNREGISTER) |
+		BIT_ULL(_UFFDIO_API) |
+		BIT_ULL(_UFFDIO_SET_MODE);
 	struct uffdio_api uffdio_api;
 	int uffd;
 
@@ -167,6 +174,15 @@ static int test_uffd_api(bool use_dev)
 		goto out;
 	}
 
+	/* Verify returned fd-level ioctls bitmask */
+	if ((uffdio_api.ioctls & expected_ioctls) != expected_ioctls) {
+		uffd_test_fail("UFFDIO_API missing expected ioctls: "
+			       "got=0x%"PRIx64", expected=0x%"PRIx64,
+			       (uint64_t)uffdio_api.ioctls,
+			       expected_ioctls);
+		goto out;
+	}
+
 	/* Test double requests of UFFDIO_API with a random feature set */
 	uffdio_api.features = BIT_ULL(0);
 	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0) {
@@ -623,6 +639,685 @@ void uffd_minor_collapse_test(uffd_global_test_opts_t *gopts, uffd_test_args_t *
 	uffd_minor_test_common(gopts, true, false);
 }
 
+static int uffd_register_rwp(int uffd, void *addr, uint64_t len)
+{
+	struct uffdio_register reg = {
+		.range = { .start = (unsigned long)addr, .len = len },
+		.mode = UFFDIO_REGISTER_MODE_RWP,
+	};
+
+	if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
+		return -errno;
+	return 0;
+}
+
+static void rwprotect_range(int uffd, __u64 start, __u64 len, bool protect)
+{
+	struct uffdio_rwprotect rwp = {
+		.range = { .start = start, .len = len },
+		.mode = protect ? UFFDIO_RWPROTECT_MODE_RWP : 0,
+	};
+
+	if (ioctl(uffd, UFFDIO_RWPROTECT, &rwp))
+		err("UFFDIO_RWPROTECT failed");
+}
+
+static void set_async_mode(int uffd, bool enable)
+{
+	struct uffdio_set_mode mode = { };
+
+	if (enable)
+		mode.enable = UFFD_FEATURE_RWP_ASYNC;
+	else
+		mode.disable = UFFD_FEATURE_RWP_ASYNC;
+
+	if (ioctl(uffd, UFFDIO_SET_MODE, &mode))
+		err("UFFDIO_SET_MODE failed");
+}
+
+/*
+ * Test async RWP faults on anonymous memory.
+ * Populate pages, register MODE_RWP with RWP_ASYNC,
+ * RW-protect, re-access, verify content preserved and no faults delivered.
+ */
+static void uffd_rwp_async_test(uffd_global_test_opts_t *gopts,
+				uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	unsigned long p;
+
+	/* Populate all pages with known content */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size);
+
+	/* Register MODE_RWP */
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+
+	/* RW-protect all pages (sets protnone) */
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	/* Access all pages -- should auto-resolve, no faults */
+	for (p = 0; p < nr_pages; p++) {
+		unsigned char *page = (unsigned char *)gopts->area_dst +
+				      p * page_size;
+		unsigned char expected = p % 255 + 1;
+
+		if (page[0] != expected) {
+			uffd_test_fail("page %lu content mismatch: %u != %u",
+				       p, page[0], expected);
+			return;
+		}
+	}
+
+	uffd_test_pass();
+}
+
+/*
+ * Fault handler for RWP -- unprotect the page via UFFDIO_RWPROTECT.
+ */
+static void uffd_handle_rwp_fault(uffd_global_test_opts_t *gopts,
+				  struct uffd_msg *msg,
+				  struct uffd_args *uargs)
+{
+	if (!(msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_RWP))
+		err("expected RWP fault, got 0x%llx",
+		    msg->arg.pagefault.flags);
+
+	rwprotect_range(gopts->uffd, msg->arg.pagefault.address,
+			gopts->page_size, false);
+	uargs->minor_faults++;
+}
+
+/*
+ * Test sync RWP faults on anonymous memory.
+ * Populate pages, register MODE_RWP (sync), RW-protect,
+ * access from worker thread, verify fault delivered, UFFDIO_RWPROTECT resolves.
+ */
+static void uffd_rwp_sync_test(uffd_global_test_opts_t *gopts,
+			       uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	pthread_t uffd_mon;
+	struct uffd_args uargs = { };
+	bool failed = false;
+	char c = '\0';
+	unsigned long p;
+
+	uargs.gopts = gopts;
+	uargs.handle_fault = uffd_handle_rwp_fault;
+
+	/* Populate all pages */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size);
+
+	/* Register MODE_RWP */
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+
+	/* RW-protect all pages */
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	/* Start fault handler thread */
+	if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs))
+		err("uffd_poll_thread create");
+
+	/* Access all pages -- triggers sync RWP faults, handler unprotects */
+	for (p = 0; p < nr_pages; p++) {
+		unsigned char *page = (unsigned char *)gopts->area_dst +
+				      p * page_size;
+
+		if (page[0] != (p % 255 + 1)) {
+			uffd_test_fail("page %lu content mismatch", p);
+			failed = true;
+			goto out;
+		}
+	}
+
+out:
+	/*
+	 * Stop the handler before reading minor_faults: the last fault
+	 * resolution rwprotect_range()s before incrementing the counter,
+	 * so the main thread can race ahead of the increment.
+	 */
+	if (write(gopts->pipefd[1], &c, sizeof(c)) != sizeof(c))
+		err("pipe write");
+	if (pthread_join(uffd_mon, NULL))
+		err("join() failed");
+
+	if (failed)
+		return;
+	if (uargs.minor_faults == 0)
+		uffd_test_fail("expected RWP faults, got 0");
+	else
+		uffd_test_pass();
+}
+
+/*
+ * Test PAGEMAP_SCAN detection of RW-protected (cold) pages.
+ */
+static void uffd_rwp_pagemap_test(uffd_global_test_opts_t *gopts,
+				  uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	unsigned long p;
+	struct page_region regions[16];
+	struct pm_scan_arg pm_arg;
+	int pagemap_fd;
+	long ret;
+
+	/* Need at least 4 pages */
+	if (nr_pages < 4) {
+		uffd_test_skip("need at least 4 pages");
+		return;
+	}
+
+	/* Populate all pages */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, 0xab, page_size);
+
+	/* Register and RW-protect */
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	/* Touch first half of pages to re-activate them (async auto-resolve) */
+	for (p = 0; p < nr_pages / 2; p++) {
+		volatile char *page = gopts->area_dst + p * page_size;
+		(void)*page;
+	}
+
+	/* Scan for cold (still RW-protected) pages */
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		err("open pagemap");
+
+	/*
+	 * PAGE_IS_ACCESSED is set once the uffd-wp bit has been cleared
+	 * (access happened, or the user resolved). Invert it to select
+	 * still-protected (cold) pages.
+	 */
+	memset(&pm_arg, 0, sizeof(pm_arg));
+	pm_arg.size = sizeof(pm_arg);
+	pm_arg.start = (uint64_t)gopts->area_dst;
+	pm_arg.end = (uint64_t)gopts->area_dst + nr_pages * page_size;
+	pm_arg.vec = (uint64_t)regions;
+	pm_arg.vec_len = ARRAY_SIZE(regions);
+	pm_arg.category_mask = PAGE_IS_ACCESSED;
+	pm_arg.category_inverted = PAGE_IS_ACCESSED;
+	pm_arg.return_mask = PAGE_IS_ACCESSED;
+
+	ret = ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg);
+	close(pagemap_fd);
+
+	if (ret < 0) {
+		uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno));
+		return;
+	}
+
+	/*
+	 * The second half of pages should be reported as RW-protected.
+	 * They may be coalesced into one region.
+	 */
+	if (ret < 1) {
+		uffd_test_fail("expected cold pages, got %ld regions", ret);
+		return;
+	}
+
+	/* Verify the cold region covers the second half */
+	uint64_t cold_start = regions[0].start;
+	uint64_t expected_start = (uint64_t)gopts->area_dst +
+				  (nr_pages / 2) * page_size;
+
+	if (cold_start != expected_start) {
+		uffd_test_fail("cold region starts at 0x%lx, expected 0x%lx",
+			       (unsigned long)cold_start,
+			       (unsigned long)expected_start);
+		return;
+	}
+
+	uffd_test_pass();
+}
+
+/*
+ * Test that RWP protection survives a mprotect(PROT_NONE) ->
+ * mprotect(PROT_READ|PROT_WRITE) round-trip. The uffd-wp bit on a
+ * VM_UFFD_RWP VMA must continue to carry PROT_NONE semantics after
+ * mprotect() changes the base protection; otherwise accesses would
+ * silently succeed and the pagemap bit would stick without a fault
+ * ever clearing it.
+ */
+static void uffd_rwp_mprotect_test(uffd_global_test_opts_t *gopts,
+				   uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	unsigned long p;
+	struct page_region regions[16];
+	struct pm_scan_arg pm_arg;
+	int pagemap_fd;
+	long ret;
+
+	/* Populate all pages */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, 0xab, page_size);
+
+	/* Register and RW-protect the whole range */
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	/* Round-trip mprotect(): PROT_NONE -> PROT_READ|PROT_WRITE */
+	if (mprotect(gopts->area_dst, nr_pages * page_size, PROT_NONE))
+		err("mprotect() PROT_NONE");
+	if (mprotect(gopts->area_dst, nr_pages * page_size,
+		     PROT_READ | PROT_WRITE))
+		err("mprotect() PROT_READ|PROT_WRITE");
+
+	/* Touch every page. Async RWP must auto-resolve each fault. */
+	for (p = 0; p < nr_pages; p++) {
+		volatile char *page = gopts->area_dst + p * page_size;
+		(void)*page;
+	}
+
+	/*
+	 * After touching, no page should remain RW-protected. A stuck
+	 * uffd-wp bit would mean mprotect() silently dropped PROT_NONE and
+	 * the access never faulted.
+	 */
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		err("open pagemap");
+
+	memset(&pm_arg, 0, sizeof(pm_arg));
+	pm_arg.size = sizeof(pm_arg);
+	pm_arg.start = (uint64_t)gopts->area_dst;
+	pm_arg.end = (uint64_t)gopts->area_dst + nr_pages * page_size;
+	pm_arg.vec = (uint64_t)regions;
+	pm_arg.vec_len = ARRAY_SIZE(regions);
+	pm_arg.category_mask = PAGE_IS_ACCESSED;
+	pm_arg.category_inverted = PAGE_IS_ACCESSED;
+	pm_arg.return_mask = PAGE_IS_ACCESSED;
+
+	ret = ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg);
+	close(pagemap_fd);
+
+	if (ret < 0) {
+		uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno));
+		return;
+	}
+	if (ret != 0) {
+		uffd_test_fail("expected no cold pages after mprotect()+touch, got %ld regions",
+			       ret);
+		return;
+	}
+
+	uffd_test_pass();
+}
+
+/*
+ * Test that GUP resolves through protnone PTEs (async mode).
+ * vmsplice() into a pipe pins user pages via get_user_pages_fast() --
+ * unlike write(), which goes through copy_from_user() and ordinary
+ * hardware page faults -- so it exercises gup_can_follow_protnone() on
+ * the RW-protected PTE. In async mode the kernel auto-restores
+ * permissions and GUP returns the page.
+ */
+static void uffd_rwp_gup_test(uffd_global_test_opts_t *gopts,
+			      uffd_test_args_t *args)
+{
+	struct iovec iov;
+	char buf;
+	int pipefd[2];
+
+	/* Populate first page with known content */
+	memset(gopts->area_dst, 0xCD, gopts->page_size);
+
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst, gopts->page_size))
+		err("register failure");
+
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			gopts->page_size, true);
+
+	if (pipe(pipefd))
+		err("pipe");
+
+	/*
+	 * One byte's worth of iov is enough to GUP the containing page and
+	 * keeps the pipe transfer well under any pipe-capacity limit even on
+	 * hugetlb-backed runs.
+	 */
+	iov.iov_base = gopts->area_dst;
+	iov.iov_len = 1;
+	if (vmsplice(pipefd[1], &iov, 1, 0) != 1) {
+		uffd_test_fail("vmsplice from RW-protected page failed: %s",
+			       strerror(errno));
+		goto out;
+	}
+
+	if (read(pipefd[0], &buf, 1) != 1) {
+		uffd_test_fail("read from pipe failed");
+		goto out;
+	}
+
+	if (buf != (char)0xCD) {
+		uffd_test_fail("content mismatch: got 0x%02x, expected 0xCD",
+			       (unsigned char)buf);
+		goto out;
+	}
+
+	uffd_test_pass();
+out:
+	close(pipefd[0]);
+	close(pipefd[1]);
+}
+
+/*
+ * Test runtime toggle between async and sync modes.
+ * Start in async mode (detection), flip to sync (eviction), verify faults
+ * block, resolve them, flip back to async.
+ */
+static void uffd_rwp_async_toggle_test(uffd_global_test_opts_t *gopts,
+				       uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	struct uffd_args uargs = { };
+	pthread_t uffd_mon;
+	char c = '\0';
+	unsigned long p;
+
+	uargs.gopts = gopts;
+	uargs.handle_fault = uffd_handle_rwp_fault;
+
+	/* Populate */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size);
+
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+
+	/* Phase 1: async detection -- RW-protect, access first half */
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	for (p = 0; p < nr_pages / 2; p++) {
+		volatile char *page = gopts->area_dst + p * page_size;
+		(void)*page; /* auto-resolves in async mode */
+	}
+
+	/* Phase 2: flip to sync for eviction */
+	set_async_mode(gopts->uffd, false);
+
+	/* Start handler -- will receive faults for cold pages */
+	if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs))
+		err("uffd_poll_thread create");
+
+	/* Access second half (cold pages) -- should trigger sync faults */
+	for (p = nr_pages / 2; p < nr_pages; p++) {
+		unsigned char *page = (unsigned char *)gopts->area_dst +
+				      p * page_size;
+		if (page[0] != (p % 255 + 1)) {
+			uffd_test_fail("page %lu content mismatch", p);
+			goto out;
+		}
+	}
+
+	/*
+	 * Stop the handler before reading minor_faults: the last fault
+	 * resolution rwprotect_range()s before incrementing the counter,
+	 * so the main thread can race ahead of the increment. Stopping
+	 * here also makes Phase 3 a clean async-only test -- with the
+	 * handler still running it would silently resolve any sync fault
+	 * the kernel erroneously delivers, masking a regression.
+	 */
+	if (write(gopts->pipefd[1], &c, sizeof(c)) != sizeof(c))
+		err("pipe write");
+	if (pthread_join(uffd_mon, NULL))
+		err("join() failed");
+
+	if (uargs.minor_faults == 0) {
+		uffd_test_fail("expected sync faults, got 0");
+		return;
+	}
+
+	/* Phase 3: flip back to async */
+	set_async_mode(gopts->uffd, true);
+
+	/* RW-protect and access again -- should auto-resolve */
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	for (p = 0; p < nr_pages; p++) {
+		volatile char *page = gopts->area_dst + p * page_size;
+		(void)*page;
+	}
+
+	uffd_test_pass();
+	return;
+out:
+	if (write(gopts->pipefd[1], &c, sizeof(c)) != sizeof(c))
+		err("pipe write");
+	if (pthread_join(uffd_mon, NULL))
+		err("join() failed");
+}
+
+/*
+ * Test that RW-protected pages become accessible after closing uffd.
+ */
+static void uffd_rwp_close_test(uffd_global_test_opts_t *gopts,
+				uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	unsigned long p;
+
+	/* Populate */
+	for (p = 0; p < nr_pages; p++)
+		memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size);
+
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failure");
+
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			nr_pages * page_size, true);
+
+	/* Close uffd -- should restore protnone PTEs */
+	close(gopts->uffd);
+	gopts->uffd = -1;
+
+	/* All pages should be accessible with original content */
+	for (p = 0; p < nr_pages; p++) {
+		unsigned char *page = (unsigned char *)gopts->area_dst +
+				      p * page_size;
+		unsigned char expected = p % 255 + 1;
+
+		if (page[0] != expected) {
+			uffd_test_fail("page %lu not accessible after close", p);
+			return;
+		}
+	}
+
+	uffd_test_pass();
+}
+
+/*
+ * Test that RWP protection is preserved across fork() when
+ * UFFD_FEATURE_EVENT_FORK is enabled. Without preservation, the child's
+ * PTEs would lose the uffd-wp marker and RWP-protected accesses would
+ * silently fall through to do_numa_page().
+ */
+static void uffd_rwp_fork_test(uffd_global_test_opts_t *gopts,
+			       uffd_test_args_t *args)
+{
+	unsigned long nr_pages = gopts->nr_pages;
+	unsigned long page_size = gopts->page_size;
+	int pagemap_fd;
+	uint64_t value;
+
+	if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
+			      nr_pages * page_size))
+		err("register failed");
+
+	/* Populate + RWP-protect */
+	*gopts->area_dst = 1;
+	rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
+			page_size, true);
+
+	/* Parent: verify uffd-wp bit is set before fork */
+	pagemap_fd = pagemap_open();
+	value = pagemap_get_entry(pagemap_fd, gopts->area_dst);
+	pagemap_check_wp(value, true);
+
+	/*
+	 * Fork with EVENT_FORK: child inherits VM_UFFD_RWP.
Child reads + * its own pagemap and must still see the uffd-wp bit set. + */ + if (pagemap_test_fork(gopts, true, false)) { + uffd_test_fail("RWP marker lost in child after fork"); + goto out; + } + + uffd_test_pass(); +out: + close(pagemap_fd); +} + +/* + * Test that RWP protection on a pinned anon page is preserved across fork= (). + * Pinning forces copy_present_page() in the child path, which must restore + * PAGE_NONE on top of the uffd bit. Using async mode, a read in the child + * auto-resolves if =E2=80=94 and only if =E2=80=94 the PTE was actually p= rotnone+uffd; the + * cleared uffd bit afterward proves the fault path ran. + */ +static void uffd_rwp_fork_pin_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long page_size =3D gopts->page_size; + fork_event_args fevent_args =3D { .gopts =3D gopts, .child_uffd =3D -1 }; + pin_args pin_args =3D {}; + int pagemap_fd, status; + pthread_t fevent_thread; + uint64_t value; + pid_t child; + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, page_size)) + err("register failed"); + + /* Populate. */ + *gopts->area_dst =3D 1; + + /* RO-longterm pin so fork() takes copy_present_page() for this PTE. */ + if (pin_pages(&pin_args, gopts->area_dst, page_size)) { + uffd_test_skip("Possibly CONFIG_GUP_TEST missing or unprivileged"); + uffd_unregister(gopts->uffd, gopts->area_dst, page_size); + return; + } + + /* RWP-protect: PTE is now PAGE_NONE + uffd bit. */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, page_size, true); + + pagemap_fd =3D pagemap_open(); + value =3D pagemap_get_entry(pagemap_fd, gopts->area_dst); + pagemap_check_wp(value, true); + + /* + * UFFD_FEATURE_EVENT_FORK is required so the child inherits + * VM_UFFD_RWP and the marker; without it dup_userfaultfd() resets + * the child VMA and the test would pass for the wrong reason. + * dup_userfaultfd() blocks until the EVENT_FORK message is consumed, + * so spawn a reader before the fork(). 
+ */ + gopts->ready_for_fork =3D false; + if (pthread_create(&fevent_thread, NULL, fork_event_consumer, + &fevent_args)) + err("pthread_create() for fork event consumer"); + while (!gopts->ready_for_fork) + ; /* Wait for consumer to start polling. */ + + child =3D fork(); + if (child < 0) + err("fork"); + if (child =3D=3D 0) { + volatile char c; + int cfd; + + /* + * Read the pinned page. Only reaches the fault path if the + * child PTE is protnone + uffd; async mode auto-resolves and + * clears the uffd bit. If copy_present_page() dropped + * PAGE_NONE, the read would silently succeed and the bit + * would still be set. + */ + c =3D *(volatile char *)gopts->area_dst; + (void)c; + + cfd =3D pagemap_open(); + value =3D pagemap_get_entry(cfd, gopts->area_dst); + close(cfd); + _exit((value & PM_UFFD_WP) ? 1 : 0); + } + if (waitpid(child, &status, 0) < 0) + err("waitpid"); + if (pthread_join(fevent_thread, NULL)) + err("pthread_join() for fork event consumer"); + if (fevent_args.child_uffd >=3D 0) + close(fevent_args.child_uffd); + + unpin_pages(&pin_args); + close(pagemap_fd); + if (uffd_unregister(gopts->uffd, gopts->area_dst, page_size)) + err("unregister failed"); + + if (!WIFEXITED(status) || WEXITSTATUS(status) !=3D 0) { + uffd_test_fail("RWP not enforced in child after pinned fork"); + return; + } + + uffd_test_pass(); +} + +/* + * WP and RWP share the uffd-wp PTE bit and cannot coexist in the same VMA. + * Registration requesting both modes must be rejected. 
+ */ +static void uffd_rwp_wp_exclusive_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + struct uffdio_register reg =3D { }; + + reg.range.start =3D (unsigned long)gopts->area_dst; + reg.range.len =3D nr_pages * page_size; + reg.mode =3D UFFDIO_REGISTER_MODE_WP | UFFDIO_REGISTER_MODE_RWP; + + if (ioctl(gopts->uffd, UFFDIO_REGISTER, ®) =3D=3D 0) { + uffd_test_fail("register with WP|RWP unexpectedly succeeded"); + return; + } + if (errno !=3D EINVAL) { + uffd_test_fail("register with WP|RWP: expected EINVAL, got %d", + errno); + return; + } + uffd_test_pass(); +} + static sigjmp_buf jbuf, *sigbuf; =20 static void sighndl(int sig, siginfo_t *siginfo, void *ptr) @@ -1625,6 +2320,77 @@ uffd_test_case_t uffd_tests[] =3D { /* We can't test MADV_COLLAPSE, so try our luck */ .uffd_feature_required =3D UFFD_FEATURE_MINOR_SHMEM, }, + { + .name =3D "rwp-async", + .uffd_fn =3D uffd_rwp_async_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-sync", + .uffd_fn =3D uffd_rwp_sync_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D UFFD_FEATURE_RWP, + }, + { + .name =3D "rwp-pagemap", + .uffd_fn =3D uffd_rwp_pagemap_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-mprotect", + .uffd_fn =3D uffd_rwp_mprotect_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-gup", + .uffd_fn =3D uffd_rwp_gup_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-async-toggle", + .uffd_fn =3D uffd_rwp_async_toggle_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-close", + 
.uffd_fn =3D uffd_rwp_close_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D UFFD_FEATURE_RWP, + }, + { + .name =3D "rwp-fork", + .uffd_fn =3D uffd_rwp_fork_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-fork-pin", + .uffd_fn =3D uffd_rwp_fork_pin_test, + .mem_targets =3D MEM_ANON, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC | + UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-wp-exclusive", + .uffd_fn =3D uffd_rwp_wp_exclusive_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | + UFFD_FEATURE_PAGEFAULT_FLAG_WP | + UFFD_FEATURE_WP_HUGETLBFS_SHMEM, + }, { .name =3D "sigbus", .uffd_fn =3D uffd_sigbus_test, --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org,
seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 14/16] Documentation/userfaultfd: document RWP working set tracking
Date: Fri, 22 May 2026 14:38:55 +0100
Message-ID: <20260522133857.552279-15-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: "Kiryl Shutsemau (Meta)"

Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP:

 - sync and async fault models;
 - UFFDIO_RWPROTECT semantics;
 - UFFD_FEATURE_RWP_ASYNC;
 - UFFDIO_SET_MODE runtime mode flips.

It also covers a typical VMM working-set-tracking workflow, from the
detection loop through sync-mode eviction and back to async.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
---
 Documentation/admin-guide/mm/userfaultfd.rst | 226 ++++++++++++++++++-
 1 file changed, 220 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 1e533639fd50..cb5d0e0c9fff 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -275,16 +275,16 @@ tracking and it can be different in a few ways: - Dirty information will not get lost if the pte was zapped due to various reasons (e.g. during split of a shmem transparent huge page).
=20 - - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit - set; dirty when uffd-wp bit cleared), it has different semantics on - some of the memory operations. For example: ``MADV_DONTNEED`` on + - Due to a reverted meaning of soft-dirty (page clean when the uffd bit + is set; dirty when the uffd bit is cleared), it has different semantics + on some of the memory operations. For example: ``MADV_DONTNEED`` on anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as - dirtying of memory by dropping uffd-wp bit during the procedure. + dirtying of memory by dropping the uffd bit during the procedure. =20 The user app can collect the "written/dirty" status by looking up the -uffd-wp bit for the pages being interested in /proc/pagemap. +uffd bit for the pages being interested in /proc/pagemap. =20 -The page will not be under track of uffd-wp async mode until the page is +The page will not be under track of userfaultfd-wp async mode until the pa= ge is explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault that was tracked by async mode userfaultfd-wp is invalid. @@ -307,6 +307,220 @@ transparent to the guest, we want that same address r= ange to act as if it was still poisoned, even though it's on a new physical host which ostensibly doesn't have a memory error in the exact same spot. =20 +Read-Write Protection +--------------------- + +``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a +memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)`` +combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only +traps accesses to *present* PTEs, so accesses to unpopulated addresses in a +protected range fall through to the normal missing-page path. It uses the +PROT_NONE hinting mechanism (same as NUMA balancing) to make pages +inaccessible while keeping them resident in memory. 
It works on anonymous, +shmem, and hugetlbfs memory. + +RWP is designed for VM memory managers that need to track the working set +of guest memory for cold page eviction to tiered or remote storage. + +**Setup:** + +1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``. + Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well =E2=80=94 it requires + ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call. + +2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP`` + (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be + fetched back from storage). + +**Feature availability:** + +RWP is built on top of two kernel primitives: a spare PTE bit owned by +userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and architecture support +for present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When both +are available on a 64-bit kernel, the build selects +``CONFIG_USERFAULTFD_RWP=3Dy`` and the ``VM_UFFD_RWP`` VMA flag becomes +available. + +``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are masked out of the +features returned by ``UFFDIO_API`` when the running kernel or architecture +cannot support them =E2=80=94 for example 32-bit kernels (where ``VM_UFFD_RWP`` is +unavailable), kernels built without ``CONFIG_USERFAULTFD_RWP``, and +architectures whose PTEs cannot carry the uffd bit at runtime (e.g. riscv +without the ``SVRSW60T59B`` extension). ``UFFDIO_API`` does not fail; +unsupported bits are simply absent from ``uffdio_api.features`` on return. +Callers should inspect the returned ``features`` after ``UFFDIO_API`` and +fall back to another tracking method when RWP is unavailable.
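The fallback decision in the "Feature availability" text can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the `EX_*` names and bit values are placeholders made up for this sketch so that it compiles on its own, not the real UAPI constants. Real code would test `UFFD_FEATURE_RWP` and `UFFD_FEATURE_RWP_ASYNC` from `<linux/userfaultfd.h>` against the `features` field returned by `UFFDIO_API`.

```c
#include <stdint.h>

/*
 * Placeholder bit values (assumption): the real UFFD_FEATURE_RWP and
 * UFFD_FEATURE_RWP_ASYNC come from <linux/userfaultfd.h> on a kernel
 * carrying this series. These stand-ins exist only for the sketch.
 */
#define EX_FEATURE_RWP        (UINT64_C(1) << 0)
#define EX_FEATURE_RWP_ASYNC  (UINT64_C(1) << 1)

enum ex_tracking_mode {
	EX_TRACK_FALLBACK,   /* RWP absent: use another tracking method */
	EX_TRACK_RWP_SYNC,   /* RWP only: every fault reaches the handler */
	EX_TRACK_RWP_ASYNC,  /* RWP + async: kernel resolves faults itself */
};

/* Pick a tracking mode from the feature mask handed back by UFFDIO_API. */
static enum ex_tracking_mode ex_pick_mode(uint64_t returned_features)
{
	/* Async is only meaningful together with RWP, so check RWP first. */
	if (!(returned_features & EX_FEATURE_RWP))
		return EX_TRACK_FALLBACK;
	if (returned_features & EX_FEATURE_RWP_ASYNC)
		return EX_TRACK_RWP_ASYNC;
	return EX_TRACK_RWP_SYNC;
}
```

A caller would pass `uffdio_api.features` as read back after the ioctl, not the mask it requested, since the text above says unsupported bits are silently dropped.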
+ +**Protecting and Unprotecting:** + +Use ``UFFDIO_RWPROTECT`` to protect or unprotect a range, mirroring the +``UFFDIO_WRITEPROTECT`` interface:: + + struct uffdio_rwprotect rwp =3D { + .range =3D { .start =3D addr, .len =3D len }, + .mode =3D UFFDIO_RWPROTECT_MODE_RWP, /* protect */ + }; + ioctl(uffd, UFFDIO_RWPROTECT, &rwp); + +Setting ``UFFDIO_RWPROTECT_MODE_RWP`` sets PROT_NONE on present PTEs in the +range. Pages stay resident and their physical frames are preserved =E2=80= =94 only +access permissions are removed. + +Clearing ``UFFDIO_RWPROTECT_MODE_RWP`` restores normal VMA permissions and +wakes any faulting threads (unless ``UFFDIO_RWPROTECT_MODE_DONTWAKE`` is s= et). + +**Scope of protection:** + +RWP protection is a property of *present* PTEs. ``UFFDIO_RWPROTECT`` only +affects entries that are already populated. Unpopulated addresses within +the range remain unpopulated; when first accessed they fault through the +normal missing path (``do_anonymous_page()``, ``do_swap_page()``, +``finish_fault()``) and the resulting PTE is not RWP-protected. To observe +the population itself, co-register the range with +``UFFDIO_REGISTER_MODE_MISSING``. + +Protection is preserved across page reclaim: a page swapped out while +RWP-protected carries the marker on its swap entry, and swap-in restores +the PROT_NONE state so the first access after swap-in still faults. The +same applies to pages temporarily replaced by migration entries. + +Operations that drop the PTE entirely =E2=80=94 ``MADV_DONTNEED`` on anony= mous +memory, hole-punch on shmem, truncation of a file mapping =E2=80=94 also d= rop the +RWP marker: the next access re-populates the range without protection. +Unlike WP (which persists via ``PTE_MARKER_UFFD_WP``), there is no +persistent RWP marker today. The user needs to re-arm the range with +``UFFDIO_RWPROTECT`` after any operation that explicitly frees PTEs. 
+ +**Fault Handling:** + +When a protected page is accessed: + +- **Sync mode** (default): The faulting thread blocks and a + ``UFFD_PAGEFAULT_FLAG_RWP`` message is delivered to the userfaultfd + handler. The handler resolves the fault with ``UFFDIO_RWPROTECT`` + (clearing ``MODE_RWP``), which restores the PTE permissions and wakes + the faulting thread. + +- **Async mode** (``UFFD_FEATURE_RWP_ASYNC``): The kernel automatically + restores PTE permissions and the thread continues without blocking. No + message is delivered to the handler. + +**Runtime Mode Switching:** + +``UFFDIO_SET_MODE`` toggles ``UFFD_FEATURE_RWP_ASYNC`` at runtime, allowing +the VMM to switch between lightweight async detection and safe sync +eviction without re-registering. The toggle takes ``mmap_write_lock()`` to +ensure all in-flight faults complete before the mode change takes effect. + +**Cold Page Detection with PAGEMAP_SCAN:** + +RWP-protected PTEs carry the uffd PTE bit; the fault-resolution path +clears it. ``PAGEMAP_SCAN`` reports ``PAGE_IS_ACCESSED`` once the bit is +clear on a ``VM_UFFD_RWP`` VMA, so inverting it efficiently reports the +still-protected (cold) pages:: + + struct pm_scan_arg arg =3D { + .size =3D sizeof(arg), + .start =3D guest_mem_start, + .end =3D guest_mem_end, + .vec =3D (uint64_t)regions, + .vec_len =3D regions_len, + .category_mask =3D PAGE_IS_ACCESSED, + .category_inverted =3D PAGE_IS_ACCESSED, + .return_mask =3D PAGE_IS_ACCESSED, + }; + long n =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &arg); + +The returned ``page_region`` array contains contiguous cold ranges that can +then be evicted. + +**Cleanup:** + +When the userfaultfd is closed or the range is unregistered, all PROT_NONE +PTEs are automatically restored to their normal VMA permissions. This +prevents pages from becoming permanently inaccessible. + +**VMM Working Set Tracking Workflow:** + +A typical VMM lifecycle for cold page eviction to tiered storage. 
Two +mappings of the same shmem (or hugetlbfs) file are used: ``guest_mem`` is +the RWP-registered mapping that vCPUs access through, and ``io_mem`` is a +second, unregistered mapping used only for VMM-side I/O. Reading ``io_mem`` +does not go through the RWP-protected PTEs of ``guest_mem``, so the VMM's +own ``pwrite()`` to storage never traps on RWP protection:: + + /* One-time setup */ + fd =3D memfd_create("guest", MFD_CLOEXEC); + ftruncate(fd, guest_size); + guest_mem =3D mmap(NULL, guest_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); /* vCPU view, RWP-registered */ + io_mem =3D mmap(NULL, guest_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); /* VMM I/O view, unprotected */ + + uffd =3D userfaultfd(O_CLOEXEC | O_NONBLOCK); + ioctl(uffd, UFFDIO_API, &(struct uffdio_api){ + .api =3D UFFD_API, + .features =3D UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }); + ioctl(uffd, UFFDIO_REGISTER, &(struct uffdio_register){ + .range =3D { (uintptr_t)guest_mem, guest_size }, + .mode =3D UFFDIO_REGISTER_MODE_RWP | + UFFDIO_REGISTER_MODE_MISSING, + }); + + /* Tracking loop */ + while (vm_running) { + /* 1. Detection phase (async =E2=80=94 no vCPU stalls) */ + ioctl(uffd, UFFDIO_RWPROTECT, &(struct uffdio_rwprotect){ + .range =3D full_range, + .mode =3D UFFDIO_RWPROTECT_MODE_RWP }); + sleep(tracking_interval); + + /* 2. Find cold pages (uffd bit still set) */ + ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){ + .category_mask =3D PAGE_IS_ACCESSED, + .category_inverted =3D PAGE_IS_ACCESSED, + .return_mask =3D PAGE_IS_ACCESSED, + ... + }); + + /* 3. Switch to sync for safe eviction */ + ioctl(uffd, UFFDIO_SET_MODE, + &(struct uffdio_set_mode){ + .disable =3D UFFD_FEATURE_RWP_ASYNC }); + + /* 4. Evict cold pages (vCPU faults block on guest_mem) */ + for each cold range: + /* Read from io_mem -- bypasses RWP, no fault. */ + pwrite(storage_fd, io_mem + cold_offset, len, offset); + /* Drop the page from the shared file.
*/ + fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + cold_offset, len); + /* + * Wake any vCPU blocked on the RWP fault for this range: + * fallocate() does not iterate ctx->fault_pending_wqh. + */ + ioctl(uffd, UFFDIO_WAKE, &(struct uffdio_range){ + .start =3D (uintptr_t)guest_mem + cold_offset, + .len =3D len }); + + /* 5. Resume async tracking */ + ioctl(uffd, UFFDIO_SET_MODE, + &(struct uffdio_set_mode){ + .enable =3D UFFD_FEATURE_RWP_ASYNC }); + } + +During step 4, a vCPU that accesses ``guest_mem + cold_offset`` blocks +with a ``UFFD_PAGEFAULT_FLAG_RWP`` fault while the eviction is in +progress. After ``fallocate()`` punches the page out and ``UFFDIO_WAKE`` +is issued, the vCPU retries the access, faults as ``MISSING``, and the +handler resolves it with ``UFFDIO_COPY`` from storage. + +This workflow targets shmem and hugetlbfs (both support a second, +unregistered ``io_mem`` mapping over the same fd). Anonymous-memory +backings need a different inner-loop strategy because the VMM has no way +to read the page without going through the RWP-protected mapping.
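The "for each cold range" step in the workflow leaves open how a `PAGEMAP_SCAN` result turns into the `cold_offset` and `len` used by `pwrite()`, `fallocate()` and `UFFDIO_WAKE`. A minimal C sketch of that glue follows, assuming `guest_mem` maps the file from offset 0 as in the setup code. `struct ex_page_region` is a local mirror of the kernel's `struct page_region` (start and end are virtual addresses, end exclusive); `struct ex_evict_req` and `ex_regions_to_reqs()` are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct page_region from <linux/fs.h>; end is exclusive. */
struct ex_page_region {
	uint64_t start;
	uint64_t end;
	uint64_t categories;
};

/* Hypothetical eviction request: a byte range within the backing file. */
struct ex_evict_req {
	uint64_t file_offset;
	uint64_t len;
};

/*
 * Translate cold regions reported for the guest_mem mapping into file
 * ranges. Assumes guest_mem maps the file from offset 0. Regions that are
 * empty or not fully inside [guest_base, guest_base + guest_size) are
 * skipped. out must have room for n entries. Returns the number written.
 */
static size_t ex_regions_to_reqs(const struct ex_page_region *regions,
				 size_t n, uint64_t guest_base,
				 uint64_t guest_size,
				 struct ex_evict_req *out)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		uint64_t start = regions[i].start;
		uint64_t end = regions[i].end;

		if (start < guest_base || end <= start ||
		    end - guest_base > guest_size)
			continue;

		out[nr].file_offset = start - guest_base;
		out[nr].len = end - start;
		nr++;
	}
	return nr;
}
```

Each resulting `file_offset`/`len` pair would feed the step 4 calls, with `guest_base + file_offset` as the `UFFDIO_WAKE` start address.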
+ QEMU/KVM =3D=3D=3D=3D=3D=3D=3D=3D =20 --=20 2.51.2 From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org, alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 15/16] userfaultfd.2: Add read-write protect mode
Date: Fri, 22 May 2026 14:38:56 +0100
Message-ID: <20260522133857.552279-16-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Read-write protect mode (UFFDIO_REGISTER_MODE_RWP) is supported starting
from Linux 7.2. It traps every access -- read or write -- to a present
page within a registered range. The matching UAPI consists of:

  - UFFDIO_REGISTER_MODE_RWP  registration-mode bit
  - UFFD_FEATURE_RWP          capability bit
  - UFFD_FEATURE_RWP_ASYNC    async (in-kernel) fault resolution
  - UFFDIO_RWPROTECT          install / remove RWP on a range
  - UFFDIO_SET_MODE           runtime sync/async toggle
  - UFFD_PAGEFAULT_FLAG_RWP   new pagefault.flags bit

Document the new registration-mode entry, the "Userfaultfd read-write
protect mode" section, the new pagefault flag, and a VERSIONS line.

Signed-off-by: Kiryl Shutsemau
---
 man2/userfaultfd.2 | 147 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index cee7c01d2512..0e702f2f4969 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -24,7 +24,7 @@
 .\" the source, must acknowledge the copyright and authors of this work.
 .\" %%%LICENSE_END
 .\"
-.TH USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
 .SH NAME
 userfaultfd \- create a file descriptor for handling page faults in user space
 .SH SYNOPSIS
@@ -105,6 +105,28 @@ The faulted thread will be stopped from execution until
 user-space write-unprotects the page using an
 .B UFFDIO_WRITEPROTECT
 ioctl.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+When registered with
+.B UFFDIO_REGISTER_MODE_RWP
+mode, user-space will receive a page-fault notification
+on any access \(em read or write \(em to a present page within the range.
+By default the faulted thread will be stopped from execution until
+user-space removes the protection using a
+.B UFFDIO_RWPROTECT
+ioctl;
+if
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated, the kernel restores access in place and the faulted
+thread continues without blocking.
+.IP
+.B UFFDIO_REGISTER_MODE_RWP
+and
+.B UFFDIO_REGISTER_MODE_WP
+cannot be combined on the same range; attempting to register with both
+bits set returns
+.BR EINVAL .
+See the "Userfaultfd read-write protect mode" section below.
 .PP
 Multiple modes can be enabled at the same time for the same memory range.
 .PP
@@ -186,6 +208,21 @@ The user needs to resolve the page fault by unprotecting the faulted page and
 kicking the faulted thread to continue.
 For more information,
 please refer to the "Userfaultfd write-protect mode" section.
+.PP
+Since Linux 7.2, userfaultfd can do read-write protection tracking, which
+traps every access (read or write) to a present page within a registered
+range.
+One should check against the feature bit
+.B UFFD_FEATURE_RWP
+before using this feature, and optionally negotiate
+.B UFFD_FEATURE_RWP_ASYNC
+to have the kernel auto-restore page permissions on fault without
+delivering a notification.
+This mode is intended for working-set tracking by VM memory managers and
+similar callers; cold pages can then be evicted using independent kernel
+interfaces.
+For more information,
+please refer to the "Userfaultfd read-write protect mode" section.
 .\"
 .SS Userfaultfd operation
 After the userfaultfd object is created with
@@ -322,6 +359,98 @@ should have the flag
 cleared upon the faulted page or range.
 .PP
 Write-protect mode supports only private anonymous memory.
+.SS Userfaultfd read-write protect mode (since Linux 7.2)
+Since Linux 7.2, userfaultfd supports read-write protect mode.
+Unlike write-protect mode, every access \(em read or write \(em to a
+protected present page generates a userfaultfd notification.
+It works on anonymous, shmem, and hugetlbfs mappings.
+.PP
+The user needs to first check availability of this feature using the
+.B UFFDIO_API
+ioctl against the feature bit
+.B UFFD_FEATURE_RWP
+before using this mode.
+On kernels or architectures that cannot support read-write protection,
+the bit is masked out from
+.I uffdio_api.features
+on return from
+.BR UFFDIO_API ;
+callers should inspect the returned features and fall back to another
+tracking mechanism when the bit is absent.
+.PP
+To register with userfaultfd read-write protect mode, the user needs to
+initiate the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_RWP
+set.
+.B UFFDIO_REGISTER_MODE_RWP
+cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+however it can be combined with
+.B UFFDIO_REGISTER_MODE_MISSING
+when the caller also wants notifications for fresh page populations.
+.PP
+After registration, the user can read-write-protect any existing memory
+within the range using the
+.B UFFDIO_RWPROTECT
+ioctl where
+.I uffdio_rwprotect.mode
+is set to
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+Read-write protection only affects pages that are currently populated
+in the range; unpopulated addresses remain unpopulated and fall through
+to the normal missing-page path on first access.
+.PP
+Protection is preserved across page reclaim and migration; it is
+.I not
+preserved across operations that drop the underlying page
+.RB ( "MADV_DONTNEED " "on anonymous memory, hole-punch on shmem,"
+truncation of a file mapping).
+Callers must re-arm the range with
+.B UFFDIO_RWPROTECT
+after any such operation.
+.PP
+When an access fault happens against a protected page, user-space will
+receive a page-fault notification whose
+.I uffd_msg.pagefault.flags
+field has the
+.B UFFD_PAGEFAULT_FLAG_RWP
+bit set.
+.PP
+To resolve a read-write-protect page fault, the user initiates another
+.B UFFDIO_RWPROTECT
+ioctl whose
+.I uffdio_rwprotect.mode
+has the
+.B UFFDIO_RWPROTECT_MODE_RWP
+flag cleared.
+This restores the original VMA permissions on the affected pages and
+wakes any blocked threads (unless
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+is also set).
+.PP
+If
+.B UFFD_FEATURE_RWP_ASYNC
+was negotiated alongside
+.BR UFFD_FEATURE_RWP ,
+the kernel resolves access faults in place without delivering a
+notification: page permissions are restored automatically and the
+faulting thread continues.
+Callers can later reconstruct which pages were touched by inspecting the
+.B PAGE_IS_ACCESSED
+bit returned by the
+.B PAGEMAP_SCAN
+ioctl described in
+.BR ioctl_userfaultfd (2)
+and
+.IR Documentation/admin\-guide/mm/pagemap.rst
+in the Linux kernel source.
+.PP
+The async mode can be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl, which lets a single userfaultfd switch between async detection
+and synchronous eviction without re-registering the range.
 .SS Reading from the userfaultfd structure
 Each
 .BR read (2)
@@ -473,6 +602,12 @@ If the address is in a range that was registered with the
 .B UFFDIO_REGISTER_MODE_WP
 flag, when this bit is set, it means it is a write-protect fault.
 Otherwise it is a page-missing fault.
+.TP
+.BR UFFD_PAGEFAULT_FLAG_RWP " (since Linux 7.2)"
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_RWP
+flag, this bit indicates that the fault was triggered by an access to a
+read-write-protected page (either a read or a write).
 .RE
 .TP
 .I pagefault.feat.pid
@@ -574,6 +709,16 @@ system call first appeared in Linux 4.3.
 .PP
 The support for hugetlbfs and shared memory areas and
 non-page-fault events was added in Linux 4.11
+.PP
+Read-write protect mode
+.RB ( UFFDIO_REGISTER_MODE_RWP ", " UFFD_FEATURE_RWP ", "
+.BR UFFDIO_RWPROTECT )
+was added in Linux 7.2,
+together with
+.B UFFD_FEATURE_RWP_ASYNC
+and the
+.B UFFDIO_SET_MODE
+runtime mode toggle.
 .SH CONFORMING TO
 .BR userfaultfd ()
 is Linux-specific and should not be used in programs intended to be
-- 
2.51.2

From nobody Thu Jun 11 10:19:55 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
    Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
    skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
    jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
    usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kvm@vger.kernel.org, kernel-team@meta.com, linux-man@vger.kernel.org,
    alx@kernel.org, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v3 16/16] ioctl_userfaultfd.2: Add read-write protect mode docs
Date: Fri, 22 May 2026 14:38:57 +0100
Message-ID: <20260522133857.552279-17-kirill@shutemov.name>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260522133857.552279-1-kirill@shutemov.name>
References: <20260522133857.552279-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List:
 linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Userfaultfd read-write protection (UFFDIO_REGISTER_MODE_RWP) is supported
starting from Linux 7.2. It traps every access -- read or write -- to a
present page within a registered range. The new UAPI documented here:

  - UFFD_FEATURE_RWP / UFFD_FEATURE_RWP_ASYNC     capability bits
  - UFFDIO_REGISTER_MODE_RWP                      registration-mode bit
  - 1 << _UFFDIO_RWPROTECT / _UFFDIO_SET_MODE     available-ioctls bits
  - UFFDIO_RWPROTECT                              install / remove RWP
  - UFFDIO_SET_MODE                               runtime sync/async toggle

Signed-off-by: Kiryl Shutsemau
---
 man2/ioctl_userfaultfd.2 | 209 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 208 insertions(+), 1 deletion(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 504f61d4b0cd..0a24a77ca32b 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -25,7 +25,7 @@
 .\" %%%LICENSE_END
 .\"
 .\"
-.TH IOCTL_USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH IOCTL_USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ioctl_userfaultfd \- create a file descriptor for handling page faults in user
 space
@@ -214,6 +214,33 @@ memory accesses to the regions registered with userfaultfd.
 If this feature bit is set,
 .I uffd_msg.pagefault.feat.ptid
 will be set to the faulted thread ID for each page-fault message.
+.TP
+.BR UFFD_FEATURE_RWP " (since Linux 7.2)"
+If this feature bit is set,
+the kernel supports read-write protection tracking, and the
+.B UFFDIO_REGISTER_MODE_RWP
+registration mode and the
+.B UFFDIO_RWPROTECT
+ioctl described below become available.
+On kernels or architectures that cannot support this mode, the bit is
+masked out from
+.I uffdio_api.features
+on return; callers should inspect the returned features and fall back
+to another tracking mechanism when the bit is absent.
+.TP
+.BR UFFD_FEATURE_RWP_ASYNC " (since Linux 7.2)"
+If this feature bit is set,
+the kernel will resolve read-write protect faults in place without
+delivering a notification, automatically restoring page permissions and
+letting the faulted thread continue.
+This bit requires
+.B UFFD_FEATURE_RWP
+to be set in the same
+.B UFFDIO_API
+call.
+The async mode can also be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl described below.
 .PP
 The returned
 .I ioctls
@@ -240,6 +267,21 @@ operation is supported.
 The
 .B UFFDIO_WRITEPROTECT
 operation is supported.
+.TP
+.BR "1 << _UFFDIO_RWPROTECT" " (since Linux 7.2)"
+The
+.B UFFDIO_RWPROTECT
+operation is supported.
+This bit is reported only when
+.B UFFD_FEATURE_RWP
+was negotiated successfully.
+.TP
+.BR "1 << _UFFDIO_SET_MODE" " (since Linux 7.2)"
+The
+.B UFFDIO_SET_MODE
+operation is supported.
+This is a file-descriptor-level ioctl and is reported once per
+userfaultfd, independent of any registered range.
 .PP
 This
 .BR ioctl (2)
@@ -327,6 +369,16 @@ Track page faults on missing pages.
 .TP
 .B UFFDIO_REGISTER_MODE_WP
 Track page faults on write-protected pages.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+Track page faults on read-write-protected pages.
+Every access (read or write) to a present page within the registered
+range generates a notification once the range has been protected with
+.BR UFFDIO_RWPROTECT .
+This mode cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+attempting to do so returns
+.BR EINVAL .
 .PP
 If the operation is successful, the kernel modifies the
 .I ioctls
@@ -735,6 +787,161 @@ or not registered with userfaultfd write-protect mode.
 .TP
 .B EFAULT
 Encountered a generic fault during processing.
+.SS UFFDIO_RWPROTECT (Since Linux 7.2)
+Read-write-protect or un-protect a userfaultfd-registered memory range
+registered with mode
+.BR UFFDIO_REGISTER_MODE_RWP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_rwprotect
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_rwprotect {
+    struct uffdio_range range; /* Range to change RWP on */
+    __u64 mode;                /* Mode flags */
+};
+.EE
+.in
+.PP
+The following mode bits are supported:
+.TP
+.B UFFDIO_RWPROTECT_MODE_RWP
+When this mode bit is set,
+the ioctl installs read-write protection on every present page in the
+range specified by
+.IR range .
+Otherwise the ioctl removes read-write protection from the range, which
+is also how a fault handler resolves an
+.B UFFD_PAGEFAULT_FLAG_RWP
+notification.
+.TP
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after
+the operation.
+This can be specified only if
+.B UFFDIO_RWPROTECT_MODE_RWP
+is not specified.
+.PP
+Read-write protection only affects pages that are currently populated
+in the range; unmapped addresses are left untouched.
+Protection is preserved across page reclaim and migration; callers must
+re-arm a range with
+.B UFFDIO_RWPROTECT
+after any operation that drops the underlying page
+.RB ( "MADV_DONTNEED " "on anonymous memory, hole-punch on shmem,"
+truncation of a file mapping).
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I uffdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid; or an invalid
+mode bit was specified; or
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+was specified together with
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+.TP
+.B EAGAIN
+The process was interrupted; retry this call.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or part of the range is not registered with
+.BR UFFDIO_REGISTER_MODE_RWP .
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
+.\"
+.SS UFFDIO_SET_MODE (Since Linux 7.2)
+Toggle userfaultfd features that may be flipped at runtime.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_set_mode
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_set_mode {
+    __u64 enable;  /* Feature bits to set */
+    __u64 disable; /* Feature bits to clear */
+};
+.EE
+.in
+.PP
+Bits set in
+.I enable
+turn the named features on; bits set in
+.I disable
+turn them off.
+The two fields must not overlap.
+Today only
+.B UFFD_FEATURE_RWP_ASYNC
+is a valid bit in either field; any other bit causes the ioctl to
+return
+.BR EINVAL .
+Enabling
+.B UFFD_FEATURE_RWP_ASYNC
+also requires
+.B UFFD_FEATURE_RWP
+to have been negotiated at
+.B UFFDIO_API
+time.
+.PP
+The toggle takes the per-process
+.I mmap_lock
+in write mode, ensuring that all in-flight fault handlers complete
+before the new mode takes effect.
+This allows a single userfaultfd to switch between lightweight async
+detection and synchronous eviction without re-registering its ranges.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+A bit other than
+.B UFFD_FEATURE_RWP_ASYNC
+was specified in
+.I enable
+or
+.IR disable ;
+the two fields overlap; or
+.B UFFD_FEATURE_RWP_ASYNC
+was requested without
+.B UFFD_FEATURE_RWP
+having been negotiated.
+.TP
+.B EFAULT
+.I argp
+refers to an address that is outside the calling process's accessible
+address space.
 .SH RETURN VALUE
 See descriptions of the individual operations, above.
 .SH ERRORS
-- 
2.51.2
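
The UFFDIO_SET_MODE argument rules documented in the patch above (only
UFFD_FEATURE_RWP_ASYNC may be toggled, enable and disable must not
overlap, and enabling async requires UFFD_FEATURE_RWP) can be modelled in
plain userspace C. This is only a sketch under stated assumptions: the
SKETCH_* bit values are placeholders because the patch gives no numeric
values, and sketch_set_mode() is a hypothetical helper that mimics the
documented checks on a local feature mask. It is not the kernel
implementation and does not call ioctl(2).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit positions (assumption): the patch gives no numeric values. */
#define SKETCH_FEATURE_RWP       (UINT64_C(1) << 0)
#define SKETCH_FEATURE_RWP_ASYNC (UINT64_C(1) << 1)

/* Mirrors the two-field layout of struct uffdio_set_mode in the man page text. */
struct sketch_set_mode {
    uint64_t enable;   /* Feature bits to set */
    uint64_t disable;  /* Feature bits to clear */
};

/* Two 64-bit fields, so the argument is 16 bytes. */
static_assert(sizeof(struct sketch_set_mode) == 16, "unexpected layout");

/*
 * Hypothetical helper: apply the documented UFFDIO_SET_MODE rules to a
 * userspace copy of the negotiated feature mask.
 * Returns 0 on success and -EINVAL when a documented rule is violated;
 * the mask is left unchanged on failure.
 */
int sketch_set_mode(uint64_t *features, const struct sketch_set_mode *arg)
{
    const uint64_t toggleable = SKETCH_FEATURE_RWP_ASYNC;

    /* Only the async bit is a valid bit in either field. */
    if ((arg->enable | arg->disable) & ~toggleable)
        return -EINVAL;

    /* The two fields must not overlap. */
    if (arg->enable & arg->disable)
        return -EINVAL;

    /* Enabling async requires RWP to have been negotiated. */
    if ((arg->enable & SKETCH_FEATURE_RWP_ASYNC) &&
        !(*features & SKETCH_FEATURE_RWP))
        return -EINVAL;

    *features |= arg->enable;
    *features &= ~arg->disable;
    return 0;
}
```

A caller would start from the mask returned by UFFDIO_API, pass
enable = async bit to switch to in-kernel resolution, and later pass
disable = async bit to return to synchronous notifications, checking the
return value each time.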