From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)",
 stable@vger.kernel.org, Sashiko AI review
Subject: [PATCH v5 01/18] fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
Date: Tue, 26 May 2026 14:04:49 +0100
Message-ID: <20260526130509.2748441-2-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Kiryl Shutsemau (Meta)"

make_uffd_wp_huge_pte() arms the UFFD_WP bit on a present HugeTLB PTE by
calling huge_ptep_modify_prot_commit() with a ptent snapshot that was
fetched without the corresponding
huge_ptep_modify_prot_start(). The start helper is what atomically clears
the entry so the kernel-owned snapshot stays consistent until the commit;
without it, the hardware may set Dirty or Accessed in the live PTE
between the original read and the commit, and
huge_ptep_modify_prot_commit() (whose generic implementation just calls
set_huge_pte_at()) then writes the stale snapshot back over the live
hardware bits, losing the update.

The non-hugetlb sibling make_uffd_wp_pte() does this correctly via
ptep_modify_prot_start() / ptep_modify_prot_commit(). Mirror that
pattern for the present-PTE branch. The migration case stays as-is --
migration entries are non-present, so there's no hardware update to race
against.

Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Cc: stable@vger.kernel.org
Reported-by: Sashiko AI review
Signed-off-by: Kiryl Shutsemau
---
 fs/proc/task_mmu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1e3a15bf46f4..e21a38ac745b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2610,12 +2610,16 @@ static void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
 	if (softleaf_is_hwpoison(entry) || softleaf_is_marker(entry))
 		return;
 
-	if (softleaf_is_migration(entry))
+	if (softleaf_is_migration(entry)) {
 		set_huge_pte_at(vma->vm_mm, addr, ptep,
 				pte_swp_mkuffd_wp(ptent), psize);
-	else
-		huge_ptep_modify_prot_commit(vma, addr, ptep, ptent,
-					     huge_pte_mkuffd_wp(ptent));
+	} else {
+		pte_t old_pte, new_pte;
+
+		old_pte = huge_ptep_modify_prot_start(vma, addr, ptep);
+		new_pte = huge_pte_mkuffd_wp(old_pte);
+		huge_ptep_modify_prot_commit(vma, addr, ptep, old_pte, new_pte);
+	}
 }
 #endif /* CONFIG_HUGETLB_PAGE */
 
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)",
 stable@vger.kernel.org, Sashiko AI review
Subject: [PATCH v5 02/18] mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
Date: Tue, 26 May 2026 14:04:50 +0100
Message-ID: <20260526130509.2748441-3-kirill@shutemov.name>
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>

From: "Kiryl Shutsemau (Meta)"

change_non_present_huge_pmd() rewrites a writable device-private PMD swap
entry into a readable one without carrying pmd_swp_uffd_wp() across. The
PTE-level change_softleaf_pte() does this correctly; mirror that here,
matching what copy_huge_pmd() does for the fork path.

Without the carry, a plain mprotect() over a UFFD_WP-marked
device-private THP strips the bit and the trap is bypassed on swap-in.

Fixes: 368076f52ebe ("mm/huge_memory: add device-private THP support to PMD operations")
Cc: stable@vger.kernel.org
Reported-by: Sashiko AI review
Signed-off-by: Kiryl Shutsemau
---
 mm/huge_memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 42b86e8ab7c0..b7c895b1d366 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2663,6 +2663,8 @@ static void change_non_present_huge_pmd(struct mm_struct *mm,
 	} else if (softleaf_is_device_private_write(entry)) {
 		entry = make_readable_device_private_entry(swp_offset(entry));
 		newpmd = swp_entry_to_pmd(entry);
+		if (pmd_swp_uffd_wp(*pmd))
+			newpmd = pmd_swp_mkuffd_wp(newpmd);
 	} else {
 		newpmd = *pmd;
 	}
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)",
 stable@vger.kernel.org, Sashiko AI review
Subject: [PATCH v5 03/18] userfaultfd: gate must_wait writability check on pte_present()
Date: Tue, 26 May 2026 14:04:51 +0100
Message-ID: <20260526130509.2748441-4-kirill@shutemov.name>
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>

From: "Kiryl Shutsemau (Meta)"

userfaultfd_must_wait() and userfaultfd_huge_must_wait() read the PTE
without taking the page table lock and then apply pte_write() /
huge_pte_write() to it. Those accessors decode bits from the present
encoding only; on a swap or migration entry they read the offset bits
that happen to share the same position and return an undefined result.

The intent of the check is "is this fault still WP-blocked?". A
non-marker swap entry means the page is in transit -- the userfault
context the original fault delivered against is no longer the same, and
the swap-in or migration completion path will re-deliver a fresh fault
if userspace still needs to handle it. Worst case under the current code
the garbage write bit says "wait", and the thread stays asleep until a
UFFDIO_WAKE that may never arrive.

Gate the writability check on pte_present() so the lockless re-check
only inspects present-PTE bits when the entry is actually present. The
non-present, non-marker case returns "don't wait" and lets the fault
path retry.

Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: stable@vger.kernel.org
Reported-by: Sashiko AI review
Signed-off-by: Kiryl Shutsemau
---
 mm/userfaultfd.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 35b206cc9aa6..f6d2a1c67019 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -2535,6 +2535,15 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	/* UFFD PTE markers require userspace to resolve the fault. */
 	if (pte_is_uffd_marker(pte))
 		return true;
+	/*
+	 * Concurrent migration may have replaced the present PTE with a
+	 * non-marker swap entry between fault delivery and this lockless
+	 * re-check. huge_pte_write() on a swap entry decodes random offset
+	 * bits, so gate it on pte_present(). The migration completion path
+	 * will re-deliver the fault if it still needs userspace.
+	 */
+	if (!pte_present(pte))
+		return false;
 	/*
 	 * If VMA has UFFD WP faults enabled and WP fault, wait for userspace to
 	 * resolve the fault.
@@ -2621,6 +2630,17 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	/* UFFD PTE markers require userspace to resolve the fault. */
 	if (pte_is_uffd_marker(ptent))
 		goto out;
+	/*
+	 * Concurrent swap-out / migration may have replaced the present PTE
+	 * with a non-marker swap entry between fault delivery and this
+	 * lockless re-check. pte_write() on a swap entry decodes random
+	 * offset bits, so gate it on pte_present(). The page-in path will
+	 * re-deliver the fault if it still needs userspace.
+	 */
+	if (!pte_present(ptent)) {
+		ret = false;
+		goto out;
+	}
 	/*
 	 * If VMA has UFFD WP faults enabled and WP fault, wait for userspace to
 	 * resolve the fault.
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)",
 stable@vger.kernel.org
Subject: [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags()
Date: Tue, 26 May 2026 14:04:52 +0100
Message-ID: <20260526130509.2748441-5-kirill@shutemov.name>
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>

From: "Kiryl Shutsemau (Meta)"

vma_flags_t is one unsigned long on 32-bit -- NUM_VMA_FLAG_BITS ==
BITS_PER_LONG by design, so VM_xxx-declared bits sit in the first word
and hit the single-long fast path.

But the bit enum declares some bits unconditionally above BITS_PER_LONG
(VMA_UFFD_MINOR_BIT == 41 today, with VM_UFFD_MINOR == VM_NONE on 32-bit
so no VMA actually carries the bit). Passing such a bit to
mk_vma_flags() goes through __set_bit(41, &one_long) and writes one word
past the end.

The compiler folds the OOB store with wraparound
(1UL << (41 % 32) == bit 9) into the first word. Bit 9 is already in
__VMA_UFFD_FLAGS so the mask happens to come out right today, but any
high-numbered bit whose mod-BITS_PER_LONG position is otherwise unused
would silently OR an extra bit into the mask.

Add VMA_NO_BIT and have DECLARE_VMA_BIT() resolve any bitnum out of
range to it. vma_flags_set_flag() drops negative bit values. The ternary
collapses at compile time, the runtime check folds away when the bit is
in range, and the common path is unchanged. Bits declared in the enum
are now safe to pass to mk_vma_flags() regardless of arch.

Fixes: 9ea35a25d51b ("mm: introduce VMA flags bitmap type")
Cc: stable@vger.kernel.org
Signed-off-by: Kiryl Shutsemau
---
 include/linux/mm.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0f2612a70fb1..71b11945e4fc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -286,8 +286,17 @@ extern unsigned int kobjsize(const void *objp);
  */
 typedef int __bitwise vma_flag_t;
 
-#define DECLARE_VMA_BIT(name, bitnum) \
-	VMA_ ## name ## _BIT = ((__force vma_flag_t)bitnum)
+/*
+ * VMA_NO_BIT means "no bit"; mk_vma_flags() skips it. DECLARE_VMA_BIT()
+ * below uses it for any bit number that doesn't fit in the bitmap, so
+ * callers don't need to track which bits are valid on the current build.
+ */
+#define VMA_NO_BIT ((__force vma_flag_t)-1)
+
+#define DECLARE_VMA_BIT(name, bitnum) \
+	VMA_ ## name ## _BIT = (((bitnum) < NUM_VMA_FLAG_BITS) ? \
+		((__force vma_flag_t)(bitnum)) : \
+		VMA_NO_BIT)
 #define DECLARE_VMA_BIT_ALIAS(name, aliased) \
 	VMA_ ## name ## _BIT = (VMA_ ## aliased ## _BIT)
 enum {
@@ -1081,6 +1090,8 @@ static __always_inline void vma_flags_set_flag(vma_flags_t *flags,
 {
 	unsigned long *bitmap = flags->__vma_flags;
 
+	if ((__force int)bit < 0)
+		return;
 	__set_bit((__force int)bit, bitmap);
 }
 
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 05/18] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING
Date: Tue, 26 May 2026 14:04:53 +0100
Message-ID: <20260526130509.2748441-6-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

pte_protnone() and pmd_protnone() detect present-but-inaccessible page
table entries. This capability is useful beyond NUMA balancing -- for
example, userfaultfd working set tracking uses protnone PTEs to track
page access without unmapping pages.

Introduce CONFIG_ARCH_HAS_PTE_PROTNONE to decouple the protnone PTE
infrastructure from CONFIG_NUMA_BALANCING. The six architectures that
support protnone PTEs (x86_64, arm64, powerpc, s390, riscv, loongarch)
now select this option, and CONFIG_NUMA_BALANCING depends on it.

No functional change -- the same set of architectures continues to have
working protnone support, but the infrastructure is now available
independently of NUMA balancing.

Signed-off-by: Kiryl Shutsemau (Meta)
Assisted-by: Claude:claude-opus-4-6
Acked-by: SeongJae Park
Acked-by: Mike Rapoport (Microsoft)
---
 arch/arm64/Kconfig                           |  1 +
 arch/arm64/include/asm/pgtable.h             |  7 ++---
 arch/loongarch/Kconfig                       |  1 +
 arch/loongarch/include/asm/pgtable.h         |  4 +--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  8 ++---
 arch/powerpc/platforms/Kconfig.cputype       |  1 +
 arch/riscv/Kconfig                           |  1 +
 arch/riscv/include/asm/pgtable.h             |  7 ++---
 arch/s390/Kconfig                            |  1 +
 arch/s390/include/asm/pgtable.h              |  4 +--
 arch/x86/Kconfig                             |  1 +
 arch/x86/include/asm/pgtable.h               |  8 ++---
 include/linux/pgtable.h                      | 32 ++++++++++++++------
 init/Kconfig                                 |  8 +++++
 mm/debug_vm_pgtable.c                        |  4 +--
 15 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..319470b3b1bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -78,6 +78,7 @@ config ARM64
 	select ARCH_SUPPORTS_CFI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4dfa42b7d053..873f4ea2e288 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -553,10 +553,7 @@ static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	/*
@@ -575,7 +572,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_present(pmd) pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 606597da46b8..c085f5067b3b 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -67,6 +67,7 @@ config LOONGARCH
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE if 64BIT
 	select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_RT
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 2a0b63ae421f..d295447a2763 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -619,7 +619,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline long pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & _PAGE_PROTNONE);
@@ -629,7 +629,7 @@ static inline long pmd_protnone(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _PAGE_PROTNONE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_leaf(pmd) ((pmd_val(pmd) & _PAGE_HUGE) != 0)
 #define pud_leaf(pud) ((pud_val(pud) & _PAGE_HUGE) != 0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index e67e64ac6e8c..53a0c5892548 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -490,13 +490,13 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
 		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_hw_valid(pte_t pte)
 {
@@ -1067,12 +1067,12 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #endif
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #define pmd_write(pmd) pte_write(pmd_pte(pmd))
 
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bac02c83bb3e..36b64a24cf30 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -87,6 +87,7 @@ config PPC_BOOK3S_64
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	select ARCH_SUPPORTS_HUGETLBFS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c5754942cf85..e2c5776d18cf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,6 +71,7 @@ config RISCV
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS if 64BIT && MMU
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
 	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
+	select ARCH_HAS_PTE_PROTNONE if MMU
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
 	select ARCH_SUPPORTS_SCHED_MC if SMP
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a1a7c6520a09..48a127323b21 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -524,10 +524,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 					PAGE_SIZE)
 #endif
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * See the comment in include/asm-generic/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
@@ -537,7 +534,7 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return pte_protnone(pmd_pte(pmd));
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 /* Modify page protection bits */
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ecbcbb781e40..bc5bef08454b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -151,6 +151,7 @@ config S390
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && CC_IS_CLANG
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	select ARCH_HAS_PTE_PROTNONE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c6cee8241e0..97241dea5573 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -842,7 +842,7 @@ static inline int pte_same(pte_t a, pte_t b)
 	return pte_val(a) == pte_val(b);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return pte_present(pte) && !(pte_val(pte) & _PAGE_READ);
@@ -853,7 +853,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	/* pmd_leaf(pmd) implies pmd_present(pmd) */
 	return pmd_leaf(pmd) && !(pmd_val(pmd) & _SEGMENT_ENTRY_READ);
 }
-#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline bool pte_swp_exclusive(pte_t pte)
 {
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..9da1119e8ff6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64
+	select ARCH_HAS_PTE_PROTNONE if X86_64
 	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI if X86_64
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2187e9cfcefa..c7f014cbf0a9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -985,11 +985,7 @@ static inline int pmd_present(pmd_t pmd)
 	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
 }
 
-#ifdef CONFIG_NUMA_BALANCING
-/*
- * These work without NUMA balancing but the kernel does not care. See the
- * comment in include/linux/pgtable.h
- */
+#ifdef CONFIG_ARCH_HAS_PTE_PROTNONE
 static inline int pte_protnone(pte_t pte)
 {
 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
@@ -1001,7 +997,7 @@ static inline int pmd_protnone(pmd_t pmd)
 	return (pmd_flags(pmd) & (_PAGE_PROTNONE | _PAGE_PRESENT))
 		== _PAGE_PROTNONE;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 static inline int pmd_none(pmd_t pmd)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index cdd68ed3ae1a..b6516a11adfa 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -2052,18 +2052,26 @@ static inline int pud_trans_unstable(pud_t *pud)
 	return 0;
 }
 
-#ifndef CONFIG_NUMA_BALANCING
+#ifndef CONFIG_ARCH_HAS_PTE_PROTNONE
 /*
- * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is
- * perfectly valid to indicate "no" in that case, which is why our default
- * implementation defaults to "always no".
+ * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It
+ * is perfectly valid to indicate "no" in that case, which is why our
+ * default implementation defaults to "always no".
  *
- * In an accessible VMA, however, pte_protnone() reliably indicates PROT_NONE
- * page protection due to NUMA hinting. NUMA hinting faults only apply in
- * accessible VMAs.
+ * In an accessible VMA, pte_protnone() reliably indicates a present
+ * PROT_NONE page protection. Today the kernel uses such PTEs for two
+ * purposes: NUMA hinting faults, and userfaultfd RWP tracking on
+ * VM_UFFD_RWP VMAs. The two are distinguished by the uffd PTE bit and
+ * the VMA flag; see include/linux/userfaultfd_k.h.
  *
- * So, to reliably identify PROT_NONE PTEs that require a NUMA hinting fault,
- * looking at the VMA accessibility is sufficient.
+ * So, to reliably identify PROT_NONE PTEs that require kernel handling,
+ * looking at the VMA accessibility (and the uffd bit on RWP VMAs) is
+ * sufficient.
+ *
+ * Architectures without CONFIG_ARCH_HAS_PTE_PROTNONE get the always-zero
+ * stubs below; PAGE_NONE references that survive to runtime fire the
+ * BUILD_BUG() fallback, since callers should have folded such paths to
+ * dead code via IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE).
  */
 static inline int pte_protnone(pte_t pte)
 {
@@ -2074,7 +2082,11 @@ static inline int pmd_protnone(pmd_t pmd)
 {
 	return 0;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+
+#ifndef PAGE_NONE
+#define PAGE_NONE ({ BUILD_BUG(); (pgprot_t){0}; })
+#endif
+#endif /* CONFIG_ARCH_HAS_PTE_PROTNONE */
 
 #endif /* CONFIG_MMU */
 
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308ae..58abb7f19206 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -944,6 +944,13 @@ config SCHED_PROXY_EXEC
 
 endmenu
 
+#
+# For architectures that support present-but-inaccessible (PROT_NONE) page
+# table entries detectable via pte_protnone() / pmd_protnone():
+#
+config ARCH_HAS_PTE_PROTNONE
+	bool
+
 #
 # For architectures that want to enable the support for NUMA-affine scheduler
 # balancing logic:
@@ -1010,6 +1017,7 @@ config ARCH_WANT_NUMA_VARIABLE_LOCALITY
 config NUMA_BALANCING
 	bool "Memory placement aware NUMA scheduler"
 	depends on ARCH_SUPPORTS_NUMA_BALANCING
+	depends on ARCH_HAS_PTE_PROTNONE
 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
 	depends on SMP && NUMA_MIGRATION && !PREEMPT_RT
 	help
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 23dc3ee09561..5e9f3a35f924 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -672,7 +672,7 @@ static void __init pte_protnone_tests(struct pgtable_debug_args *args)
 {
 	pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot_none);
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	pr_debug("Validating PTE protnone\n");
@@ -685,7 +685,7 @@ static void __init pmd_protnone_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE))
 		return;
 
 	if (!has_transparent_hugepage())
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
	jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
	usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 06/18] mm: rename uffd-wp PTE bit macros to uffd
Date: Tue, 26 May 2026 14:04:54 +0100
Message-ID: <20260526130509.2748441-7-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

The uffd-wp PTE bit is about to gain a second consumer: userfaultfd RWP
will use the same bit to mark access-tracking PTEs, distinct from
mprotect(PROT_NONE) or NUMA-hinting PTEs. WP vs RWP semantics come from
the VMA flag; the bit is just "uffd has claimed this entry."

Drop the "_wp" suffix from the arch-private bit macros so they reflect
that.
x86: _PAGE_BIT_UFFD_WP -> _PAGE_BIT_UFFD _PAGE_UFFD_WP -> _PAGE_UFFD _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD arm64: PTE_UFFD_WP -> PTE_UFFD PTE_SWP_UFFD_WP -> PTE_SWP_UFFD riscv: _PAGE_UFFD_WP -> _PAGE_UFFD _PAGE_SWP_UFFD_WP -> _PAGE_SWP_UFFD Pure mechanical rename -- no behavior change. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: SeongJae Park --- arch/arm64/include/asm/pgtable-prot.h | 8 ++++---- arch/arm64/include/asm/pgtable.h | 12 ++++++------ arch/riscv/include/asm/pgtable-bits.h | 12 ++++++------ arch/riscv/include/asm/pgtable.h | 14 +++++++------- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ arch/x86/include/asm/pgtable_types.h | 16 ++++++++-------- 6 files changed, 43 insertions(+), 43 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm= /pgtable-prot.h index 212ce1b02e15..09d7c00cf405 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -28,11 +28,11 @@ #define PTE_PRESENT_VALID_KERNEL (PTE_VALID | PTE_MAYBE_NG) =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ -#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ +#define PTE_UFFD (_AT(pteval_t, 1) << 58) /* userfaultfd tracking */ +#define PTE_SWP_UFFD (_AT(pteval_t, 1) << 3) /* only for swp ptes */ #else -#define PTE_UFFD_WP (_AT(pteval_t, 0)) -#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#define PTE_UFFD (_AT(pteval_t, 0)) +#define PTE_SWP_UFFD (_AT(pteval_t, 0)) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 873f4ea2e288..3eecb2c17711 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -343,17 +343,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd) #ifdef 
CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & PTE_UFFD_WP); + return !!(pte_val(pte) & PTE_UFFD); } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP))); + return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD))); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP)); + return clear_pte_bit(pte, __pgprot(PTE_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -539,17 +539,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); + return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } =20 static inline int pte_swp_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & PTE_SWP_UFFD_WP); + return !!(pte_val(pte) & PTE_SWP_UFFD); } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); + return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm= /pgtable-bits.h index b422d9691e60..d5a86b4df3ce 100644 --- a/arch/riscv/include/asm/pgtable-bits.h +++ b/arch/riscv/include/asm/pgtable-bits.h @@ -40,20 +40,20 @@ =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP =20 -/* ext_svrsw60t59b: Bit(60) for uffd-wp tracking */ -#define _PAGE_UFFD_WP \ +/* ext_svrsw60t59b: Bit(60) for userfaultfd tracking */ +#define _PAGE_UFFD \ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \ (1UL << 60) : 0) /* * Bit 4 is not involved into swap entry computation, so we - * can borrow it for swap page uffd-wp tracking. + * can borrow it for swap page userfaultfd tracking. */ -#define _PAGE_SWP_UFFD_WP \ +#define _PAGE_SWP_UFFD \ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? 
\ _PAGE_USER : 0) #else -#define _PAGE_UFFD_WP 0 -#define _PAGE_SWP_UFFD_WP 0 +#define _PAGE_UFFD 0 +#define _PAGE_SWP_UFFD 0 #endif =20 #define _PAGE_TABLE _PAGE_PRESENT diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 48a127323b21..ca69948b3ed8 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -405,32 +405,32 @@ static inline pte_t pte_wrprotect(pte_t pte) =20 static inline bool pte_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & _PAGE_UFFD_WP); + return !!(pte_val(pte) & _PAGE_UFFD); } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD_WP)); + return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD)); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_UFFD_WP)); + return __pte(pte_val(pte) & ~(_PAGE_UFFD)); } =20 static inline bool pte_swp_uffd_wp(pte_t pte) { - return !!(pte_val(pte) & _PAGE_SWP_UFFD_WP); + return !!(pte_val(pte) & _PAGE_SWP_UFFD); } =20 static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return __pte(pte_val(pte) | _PAGE_SWP_UFFD_WP); + return __pte(pte_val(pte) | _PAGE_SWP_UFFD); } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD_WP)); + return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD)); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -1157,7 +1157,7 @@ static inline pud_t pud_modify(pud_t pud, pgprot_t ne= wprot) * bit 0: _PAGE_PRESENT (zero) * bit 1 to 2: (zero) * bit 3: _PAGE_SWP_SOFT_DIRTY - * bit 4: _PAGE_SWP_UFFD_WP + * bit 4: _PAGE_SWP_UFFD * bit 5: _PAGE_PROT_NONE (zero) * bit 6: exclusive marker * bits 7 to 11: swap type diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index c7f014cbf0a9..038c806b50a2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -413,17 +413,17 @@ static inline pte_t pte_wrprotect(pte_t pte) #ifdef 
CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { - return pte_flags(pte) & _PAGE_UFFD_WP; + return pte_flags(pte) & _PAGE_UFFD; } =20 static inline pte_t pte_mkuffd_wp(pte_t pte) { - return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD_WP)); + return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD)); } =20 static inline pte_t pte_clear_uffd_wp(pte_t pte) { - return pte_clear_flags(pte, _PAGE_UFFD_WP); + return pte_clear_flags(pte, _PAGE_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -528,17 +528,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pmd_uffd_wp(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_UFFD_WP; + return pmd_flags(pmd) & _PAGE_UFFD; } =20 static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) { - return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD_WP)); + return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD)); } =20 static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_UFFD_WP); + return pmd_clear_flags(pmd, _PAGE_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 @@ -1550,32 +1550,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t = pmd) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline pte_t pte_swp_mkuffd_wp(pte_t pte) { - return pte_set_flags(pte, _PAGE_SWP_UFFD_WP); + return pte_set_flags(pte, _PAGE_SWP_UFFD); } =20 static inline int pte_swp_uffd_wp(pte_t pte) { - return pte_flags(pte) & _PAGE_SWP_UFFD_WP; + return pte_flags(pte) & _PAGE_SWP_UFFD; } =20 static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) { - return pte_clear_flags(pte, _PAGE_SWP_UFFD_WP); + return pte_clear_flags(pte, _PAGE_SWP_UFFD); } =20 static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_SWP_UFFD_WP); + return pmd_set_flags(pmd, _PAGE_SWP_UFFD); } =20 static inline int pmd_swp_uffd_wp(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_SWP_UFFD_WP; + return pmd_flags(pmd) & _PAGE_SWP_UFFD; } =20 static inline pmd_t 
pmd_swp_clear_uffd_wp(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_SWP_UFFD_WP); + return pmd_clear_flags(pmd, _PAGE_SWP_UFFD); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pg= table_types.h index 2ec250ba467e..af08d98be930 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -31,7 +31,7 @@ =20 #define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1 #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 -#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */ +#define _PAGE_BIT_UFFD _PAGE_BIT_SOFTW2 /* userfaultfd tracking */ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted= to large */ =20 @@ -39,7 +39,7 @@ #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */ #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD)= */ #else -/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */ +/* Shared with _PAGE_BIT_UFFD which is not supported on 32 bit */ #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */ #define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD)= */ #endif @@ -111,11 +111,11 @@ #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define _PAGE_UFFD_WP (_AT(pteval_t, 1) << _PAGE_BIT_UFFD_WP) -#define _PAGE_SWP_UFFD_WP _PAGE_USER +#define _PAGE_UFFD (_AT(pteval_t, 1) << _PAGE_BIT_UFFD) +#define _PAGE_SWP_UFFD _PAGE_USER #else -#define _PAGE_UFFD_WP (_AT(pteval_t, 0)) -#define _PAGE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#define _PAGE_UFFD (_AT(pteval_t, 0)) +#define _PAGE_SWP_UFFD (_AT(pteval_t, 0)) #endif =20 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) @@ -129,7 +129,7 @@ /* * The hardware requires shadow stack to be Write=3D0,Dirty=3D1. 
However, * there are valid cases where the kernel might create read-only PTEs that - * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). In + * are dirty (e.g., fork(), mprotect(), userfaultfd, soft-dirty tracking).= In * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bi= t, * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have * (Write=3D0,SavedDirty=3D1,Dirty=3D0) set. @@ -151,7 +151,7 @@ #define _COMMON_PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ _PAGE_SPECIAL | _PAGE_ACCESSED | \ _PAGE_DIRTY_BITS | _PAGE_SOFT_DIRTY | \ - _PAGE_CC | _PAGE_UFFD_WP) + _PAGE_CC | _PAGE_UFFD) #define _PAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PAT) #define _HPAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_PAT_LAR= GE) =20 --=20 2.54.0
From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 07/18] mm: rename uffd-wp PTE accessors to uffd
Date: Tue, 26 May 2026 14:04:55
 +0100
Message-ID: <20260526130509.2748441-8-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Userfaultfd RWP will reuse the uffd-wp PTE bit to mark access-tracking
PTEs, alongside the write-protected ones it already marks. The bit's
meaning now depends on the VMA flag (WP or RWP), not on its name.

Rename the kernel-internal names that describe the bit:

  - pte/pmd/huge_pte accessors (and swap variants)
  - pgtable_supports_uffd() capability query
  - SCAN_PTE_UFFD khugepaged enum

The ftrace string emitted by mm_khugepaged_scan_pmd for this enum is
kept as "pte_uffd_wp" so existing trace-based tooling keeps matching.

Pure mechanical rename -- no behavior change.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
---
 arch/arm64/include/asm/pgtable.h   | 28 +++++++--------
 arch/riscv/include/asm/pgtable.h   | 38 ++++++++++----------
 arch/s390/include/asm/hugetlb.h    | 12 +++----
 arch/x86/include/asm/pgtable.h     | 24 ++++++-------
 fs/proc/task_mmu.c                 | 44 +++++++++++------------
 include/asm-generic/hugetlb.h      | 18 +++++-----
 include/asm-generic/pgtable_uffd.h | 32 ++++++++---------
 include/linux/leafops.h            |  4 +--
 include/linux/mm_inline.h          |  4 +--
 include/linux/swapops.h            |  4 +--
 include/linux/userfaultfd_k.h      | 14 ++++----
 include/trace/events/huge_memory.h |  2 +-
 mm/huge_memory.c                   | 56 +++++++++++++++---------------
 mm/hugetlb.c                       | 46 ++++++++++++------------
 mm/internal.h                      |  4 +--
 mm/khugepaged.c                    | 22 ++++++------
 mm/memory.c                        | 34 +++++++++---------
 mm/migrate.c                       | 12 +++----
 mm/migrate_device.c                |  8 ++---
 mm/mprotect.c                      | 12 +++----
 mm/mremap.c                        |  4 +--
 mm/page_table_check.c              |  8 ++---
 mm/rmap.c                          | 16 ++++-----
 mm/swapfile.c                      |  4 +--
 mm/userfaultfd.c                   |  6 ++--
 25 files changed, 228 insertions(+), 228 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 3eecb2c17711..c41e4d59dc9f 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -341,17 +341,17 @@ static inline pmd_t pmd_mknoncont(pmd_t pmd) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pte_uffd_wp(pte_t pte) +static inline int pte_uffd(pte_t pte) { return !!(pte_val(pte) & PTE_UFFD); } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD))); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_UFFD)); } @@ -537,17 +537,17 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & PTE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD)); } @@ -590,13 +590,13 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_mkvalid_k(pmd) pte_pmd(pte_mkvalid_k(pmd_pte(pmd))) #define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd))) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd)) -#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))) -#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))) -#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd)) -#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))) -#define pmd_swp_clear_uffd_wp(pmd) \ - pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))) +#define pmd_uffd(pmd) pte_uffd(pmd_pte(pmd)) +#define pmd_mkuffd(pmd) pte_pmd(pte_mkuffd(pmd_pte(pmd))) +#define pmd_clear_uffd(pmd) pte_pmd(pte_clear_uffd(pmd_pte(pmd))) +#define pmd_swp_uffd(pmd) pte_swp_uffd(pmd_pte(pmd)) +#define pmd_swp_mkuffd(pmd) pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))) +#define pmd_swp_clear_uffd(pmd) \ + pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))) #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 #define pmd_write(pmd) pte_write(pmd_pte(pmd)) @@ -1512,7 +1512,7 @@ static inline pmd_t pmdp_establish(struct vm_area_str= uct *vma, * Encode and decode a swap entry: * bits 0-1: present (must be zero) * bits 2: remember PG_anon_exclusive - * bit 3: remember uffd-wp state + * bit 3: remember uffd state * bits 6-10: swap type * bit 11: PTE_PRESENT_INVALID (must be zero) * bits 12-61: swap offset diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgta= ble.h index 
ca69948b3ed8..b111e134795e 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -400,35 +400,35 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -#define pgtable_supports_uffd_wp() \ +#define pgtable_supports_uffd() \ riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B) =20 -static inline bool pte_uffd_wp(pte_t pte) +static inline bool pte_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_UFFD); } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_UFFD)); } =20 -static inline bool pte_swp_uffd_wp(pte_t pte) +static inline bool pte_swp_uffd(pte_t pte) { return !!(pte_val(pte) & _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return __pte(pte_val(pte) | _PAGE_SWP_UFFD); } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD)); } @@ -886,34 +886,34 @@ static inline pud_t pud_mkspecial(pud_t pud) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline bool pmd_uffd_wp(pmd_t pmd) +static inline bool pmd_uffd(pmd_t pmd) { - return pte_uffd_wp(pmd_pte(pmd)); + return pte_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { - return pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_clear_uffd(pmd_pte(pmd))); } =20 -static inline bool pmd_swp_uffd_wp(pmd_t pmd) +static inline bool pmd_swp_uffd(pmd_t pmd) { - 
return pte_swp_uffd_wp(pmd_pte(pmd)); + return pte_swp_uffd(pmd_pte(pmd)); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { - return pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_mkuffd(pmd_pte(pmd))); } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { - return pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))); + return pte_pmd(pte_swp_clear_uffd(pmd_pte(pmd))); } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ =20 diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 6983e52eaf81..cf8a176ff3d8 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -77,20 +77,20 @@ static inline void huge_ptep_set_wrprotect(struct mm_st= ruct *mm, __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte)); } =20 -#define __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { return pte; } =20 -#define __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#define __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { return 0; } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 038c806b50a2..d14c84b2a332 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -411,17 +411,17 @@ static inline pte_t pte_wrprotect(pte_t pte) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pte_uffd_wp(pte_t pte) +static inline int pte_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_UFFD; } =20 -static inline pte_t pte_mkuffd_wp(pte_t pte) +static inline pte_t pte_mkuffd(pte_t pte) { 
return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD)); } =20 -static inline pte_t pte_clear_uffd_wp(pte_t pte) +static inline pte_t pte_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_UFFD); } @@ -526,17 +526,17 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) } =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline int pmd_uffd_wp(pmd_t pmd) +static inline int pmd_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_UFFD; } =20 -static inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD)); } =20 -static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_UFFD); } @@ -1548,32 +1548,32 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t = pmd) #endif =20 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static inline pte_t pte_swp_mkuffd(pte_t pte) { return pte_set_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline int pte_swp_uffd_wp(pte_t pte) +static inline int pte_swp_uffd(pte_t pte) { return pte_flags(pte) & _PAGE_SWP_UFFD; } =20 -static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte_clear_flags(pte, _PAGE_SWP_UFFD); } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_SWP_UFFD); } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return pmd_flags(pmd) & _PAGE_SWP_UFFD; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_SWP_UFFD); } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e21a38ac745b..1e5f6ee8a3b6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2035,14 +2035,14 @@ static pagemap_entry_t pte_to_pagemap_entry(struct = pagemapread *pm, page =3D 
vm_normal_page(vma, addr, pte); if (pte_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_uffd_wp(pte)) + if (pte_uffd(pte)) flags |=3D PM_UFFD_WP; } else { softleaf_t entry; =20 if (pte_swp_soft_dirty(pte)) flags |=3D PM_SOFT_DIRTY; - if (pte_swp_uffd_wp(pte)) + if (pte_swp_uffd(pte)) flags |=3D PM_UFFD_WP; entry =3D softleaf_from_pte(pte); if (pm->show_pfn) { @@ -2108,7 +2108,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_PRESENT; if (pmd_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_uffd_wp(pmd)) + if (pmd_uffd(pmd)) flags |=3D PM_UFFD_WP; if (pm->show_pfn) frame =3D pmd_pfn(pmd) + idx; @@ -2127,7 +2127,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigne= d long addr, flags |=3D PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |=3D PM_SOFT_DIRTY; - if (pmd_swp_uffd_wp(pmd)) + if (pmd_swp_uffd(pmd)) flags |=3D PM_UFFD_WP; VM_WARN_ON_ONCE(!pmd_is_migration_entry(pmd)); page =3D softleaf_to_page(entry); @@ -2233,14 +2233,14 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsig= ned long hmask, !hugetlb_pmd_shared(ptep)) flags |=3D PM_MMAP_EXCLUSIVE; =20 - if (huge_pte_uffd_wp(pte)) + if (huge_pte_uffd(pte)) flags |=3D PM_UFFD_WP; =20 flags |=3D PM_PRESENT; if (pm->show_pfn) frame =3D pte_pfn(pte) + ((addr & ~hmask) >> PAGE_SHIFT); - } else if (pte_swp_uffd_wp_any(pte)) { + } else if (pte_swp_uffd_any(pte)) { flags |=3D PM_UFFD_WP; } =20 @@ -2441,7 +2441,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd_wp(pte)) + if (!pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2459,7 +2459,7 @@ static unsigned long pagemap_page_category(struct pag= emap_scan_private *p, =20 categories =3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D PAGE_IS_WRITTEN; =20 entry =3D softleaf_from_pte(pte); @@ -2484,13 +2484,13 @@ static void 
make_uffd_wp_pte(struct vm_area_struct = *vma, pte_t old_pte; =20 old_pte =3D ptep_modify_prot_start(vma, addr, pte); - ptent =3D pte_mkuffd_wp(old_pte); + ptent =3D pte_mkuffd(old_pte); ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); } else if (pte_none(ptent)) { set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); } else { - ptent =3D pte_swp_mkuffd_wp(ptent); + ptent =3D pte_swp_mkuffd(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); } } @@ -2509,7 +2509,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, struct page *page; =20 categories |=3D PAGE_IS_PRESENT; - if (!pmd_uffd_wp(pmd)) + if (!pmd_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; =20 if (p->masks_of_interest & PAGE_IS_FILE) { @@ -2524,7 +2524,7 @@ static unsigned long pagemap_thp_category(struct page= map_scan_private *p, categories |=3D PAGE_IS_SOFT_DIRTY; } else { categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd_wp(pmd)) + if (!pmd_swp_uffd(pmd)) categories |=3D PAGE_IS_WRITTEN; if (pmd_swp_soft_dirty(pmd)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2548,10 +2548,10 @@ static void make_uffd_wp_pmd(struct vm_area_struct = *vma, =20 if (pmd_present(pmd)) { old =3D pmdp_invalidate_ad(vma, addr, pmdp); - pmd =3D pmd_mkuffd_wp(old); + pmd =3D pmd_mkuffd(old); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } else if (pmd_is_migration_entry(pmd)) { - pmd =3D pmd_swp_mkuffd_wp(pmd); + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } } @@ -2573,7 +2573,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) if (pte_present(pte)) { categories |=3D PAGE_IS_PRESENT; =20 - if (!huge_pte_uffd_wp(pte)) + if (!huge_pte_uffd(pte)) categories |=3D PAGE_IS_WRITTEN; if (!PageAnon(pte_page(pte))) categories |=3D PAGE_IS_FILE; @@ -2584,7 +2584,7 @@ static unsigned long pagemap_hugetlb_category(pte_t p= te) } else { categories |=3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_wp_any(pte)) + if (!pte_swp_uffd_any(pte)) categories |=3D 
PAGE_IS_WRITTEN; if (pte_swp_soft_dirty(pte)) categories |=3D PAGE_IS_SOFT_DIRTY; @@ -2612,12 +2612,12 @@ static void make_uffd_wp_huge_pte(struct vm_area_st= ruct *vma, =20 if (softleaf_is_migration(entry)) { set_huge_pte_at(vma->vm_mm, addr, ptep, - pte_swp_mkuffd_wp(ptent), psize); + pte_swp_mkuffd(ptent), psize); } else { pte_t old_pte, new_pte; =20 old_pte =3D huge_ptep_modify_prot_start(vma, addr, ptep); - new_pte =3D huge_pte_mkuffd_wp(old_pte); + new_pte =3D huge_pte_mkuffd(old_pte); huge_ptep_modify_prot_commit(vma, addr, ptep, old_pte, new_pte); } } @@ -2850,8 +2850,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, for (addr =3D start; addr !=3D end; pte++, addr +=3D PAGE_SIZE) { pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; make_uffd_wp_pte(vma, addr, pte, ptent); if (!flush_end) @@ -2868,8 +2868,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, unsigned long next =3D addr + PAGE_SIZE; pte_t ptent =3D ptep_get(pte); =20 - if ((pte_present(ptent) && pte_uffd_wp(ptent)) || - pte_swp_uffd_wp_any(ptent)) + if ((pte_present(ptent) && pte_uffd(ptent)) || + pte_swp_uffd_any(ptent)) continue; ret =3D pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN, p, addr, &next); diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index e1a2e1b7c8e7..635c41cc3479 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -37,24 +37,24 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t= newprot) return pte_modify(pte, newprot); } =20 -#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP -static inline pte_t huge_pte_mkuffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD +static inline pte_t huge_pte_mkuffd(pte_t pte) { - return huge_pte_wrprotect(pte_mkuffd_wp(pte)); + return huge_pte_wrprotect(pte_mkuffd(pte)); } #endif =20 
-#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP -static inline pte_t huge_pte_clear_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD +static inline pte_t huge_pte_clear_uffd(pte_t pte) { - return pte_clear_uffd_wp(pte); + return pte_clear_uffd(pte); } #endif =20 -#ifndef __HAVE_ARCH_HUGE_PTE_UFFD_WP -static inline int huge_pte_uffd_wp(pte_t pte) +#ifndef __HAVE_ARCH_HUGE_PTE_UFFD +static inline int huge_pte_uffd(pte_t pte) { - return pte_uffd_wp(pte); + return pte_uffd(pte); } #endif =20 diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtab= le_uffd.h index 0d85791efdf7..30e88fc1de2f 100644 --- a/include/asm-generic/pgtable_uffd.h +++ b/include/asm-generic/pgtable_uffd.h @@ -2,79 +2,79 @@ #define _ASM_GENERIC_PGTABLE_UFFD_H =20 /* - * Some platforms can customize the uffd-wp bit, making it unavailable + * Some platforms can customize the uffd PTE bit, making it unavailable * even if the architecture provides the resource. * Adding this API allows architectures to add their own checks for the * devices on which the kernel is running. * Note: When overriding it, please make sure the * CONFIG_HAVE_ARCH_USERFAULTFD_WP is part of this macro. 
*/ -#ifndef pgtable_supports_uffd_wp -#define pgtable_supports_uffd_wp() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD= _WP) +#ifndef pgtable_supports_uffd +#define pgtable_supports_uffd() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) #endif =20 static inline bool uffd_supports_wp_marker(void) { - return pgtable_supports_uffd_wp() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP= ); + return pgtable_supports_uffd() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP); } =20 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP -static __always_inline int pte_uffd_wp(pte_t pte) +static __always_inline int pte_uffd(pte_t pte) { return 0; } =20 -static __always_inline int pmd_uffd_wp(pmd_t pmd) +static __always_inline int pmd_uffd(pmd_t pmd) { return 0; } =20 -static __always_inline pte_t pte_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_mkuffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_mkuffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_clear_uffd(pte_t pte) { return pte; } =20 -static __always_inline pmd_t pmd_clear_uffd_wp(pmd_t pmd) +static __always_inline pmd_t pmd_clear_uffd(pmd_t pmd) { return pmd; } =20 -static __always_inline pte_t pte_swp_mkuffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_mkuffd(pte_t pte) { return pte; } =20 -static __always_inline int pte_swp_uffd_wp(pte_t pte) +static __always_inline int pte_swp_uffd(pte_t pte) { return 0; } =20 -static __always_inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +static __always_inline pte_t pte_swp_clear_uffd(pte_t pte) { return pte; } =20 -static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_mkuffd(pmd_t pmd) { return pmd; } =20 -static inline int pmd_swp_uffd_wp(pmd_t pmd) +static inline int pmd_swp_uffd(pmd_t pmd) { return 0; } =20 -static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) +static inline pmd_t pmd_swp_clear_uffd(pmd_t pmd) { return pmd; } 
diff --git a/include/linux/leafops.h b/include/linux/leafops.h index 992cd8bd8ed0..2ce2f37ac883 100644 --- a/include/linux/leafops.h +++ b/include/linux/leafops.h @@ -100,8 +100,8 @@ static inline softleaf_t softleaf_from_pmd(pmd_t pmd) =20 if (pmd_swp_soft_dirty(pmd)) pmd =3D pmd_swp_clear_soft_dirty(pmd); - if (pmd_swp_uffd_wp(pmd)) - pmd =3D pmd_swp_clear_uffd_wp(pmd); + if (pmd_swp_uffd(pmd)) + pmd =3D pmd_swp_clear_uffd(pmd); arch_entry =3D __pmd_to_swp_entry(pmd); =20 /* Temporary until swp_entry_t eliminated. */ diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f0..2811caf4188d 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -600,14 +600,14 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *= vma, unsigned long addr, return false; =20 /* A uffd-wp wr-protected normal pte */ - if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval))) + if (unlikely(pte_present(pteval) && pte_uffd(pteval))) arm_uffd_pte =3D true; =20 /* * A uffd-wp wr-protected swap pte. Note: this should even cover an * existing pte marker with uffd-wp bit set. 
*/ - if (unlikely(pte_swp_uffd_wp_any(pteval))) + if (unlikely(pte_swp_uffd_any(pteval))) arm_uffd_pte =3D true; =20 if (unlikely(arm_uffd_pte)) { diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 8cfc966eae48..15c6440e38dd 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -73,8 +73,8 @@ static inline pte_t pte_swp_clear_flags(pte_t pte) pte =3D pte_swp_clear_exclusive(pte); if (pte_swp_soft_dirty(pte)) pte =3D pte_swp_clear_soft_dirty(pte); - if (pte_swp_uffd_wp(pte)) - pte =3D pte_swp_clear_uffd_wp(pte); + if (pte_swp_uffd(pte)) + pte =3D pte_swp_clear_uffd(pte); return pte; } =20 diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 3ec8e1071673..f4cf5763f92c 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -211,13 +211,13 @@ static inline bool userfaultfd_minor(struct vm_area_s= truct *vma) static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { - return userfaultfd_wp(vma) && pte_uffd_wp(pte); + return userfaultfd_wp(vma) && pte_uffd(pte); } =20 static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma, pmd_t pmd) { - return userfaultfd_wp(vma) && pmd_uffd_wp(pmd); + return userfaultfd_wp(vma) && pmd_uffd(pmd); } =20 static inline bool userfaultfd_armed(struct vm_area_struct *vma) @@ -272,10 +272,10 @@ static inline bool userfaultfd_wp_use_markers(struct = vm_area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. 
*/ -static inline bool pte_swp_uffd_wp_any(pte_t pte) +static inline bool pte_swp_uffd_any(pte_t pte) { if (!uffd_supports_wp_marker()) return false; @@ -283,7 +283,7 @@ static inline bool pte_swp_uffd_wp_any(pte_t pte) if (pte_present(pte)) return false; =20 - if (pte_swp_uffd_wp(pte)) + if (pte_swp_uffd(pte)) return true; =20 if (pte_is_uffd_wp_marker(pte)) @@ -424,7 +424,7 @@ static inline bool userfaultfd_wp_use_markers(struct vm= _area_struct *vma) * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er * forms (pte marker or a normal swap pte), false otherwise. */ -static inline bool pte_swp_uffd_wp_any(pte_t pte) +static inline bool pte_swp_uffd_any(pte_t pte) { return false; } diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge= _memory.h index 291fae364c62..5a48c5406cce 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -16,7 +16,7 @@ EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \ EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \ - EM( SCAN_PTE_UFFD_WP, "pte_uffd_wp") \ + EM( SCAN_PTE_UFFD, "pte_uffd_wp") \ EM( SCAN_PTE_MAPPED_HUGEPAGE, "pte_mapped_hugepage") \ EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \ EM( SCAN_PAGE_NULL, "page_null") \ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b7c895b1d366..d43c2255f47d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1909,8 +1909,8 @@ static void copy_huge_non_present_pmd( pmd =3D swp_entry_to_pmd(entry); if (pmd_swp_soft_dirty(*src_pmd)) pmd =3D pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - pmd =3D pmd_swp_mkuffd_wp(pmd); + if (pmd_swp_uffd(*src_pmd)) + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(src_mm, addr, src_pmd, pmd); } else if (softleaf_is_device_private(entry)) { /* @@ -1923,8 +1923,8 @@ static void copy_huge_non_present_pmd( =20 if (pmd_swp_soft_dirty(*src_pmd)) pmd =3D pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - 
pmd =3D pmd_swp_mkuffd_wp(pmd); + if (pmd_swp_uffd(*src_pmd)) + pmd =3D pmd_swp_mkuffd(pmd); set_pmd_at(src_mm, addr, src_pmd, pmd); } =20 @@ -1944,7 +1944,7 @@ static void copy_huge_non_present_pmd( mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); if (!userfaultfd_wp(dst_vma)) - pmd =3D pmd_swp_clear_uffd_wp(pmd); + pmd =3D pmd_swp_clear_uffd(pmd); set_pmd_at(dst_mm, addr, dst_pmd, pmd); } =20 @@ -2040,7 +2040,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm= _struct *src_mm, pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); pmdp_set_wrprotect(src_mm, addr, src_pmd); if (!userfaultfd_wp(dst_vma)) - pmd =3D pmd_clear_uffd_wp(pmd); + pmd =3D pmd_clear_uffd(pmd); pmd =3D pmd_wrprotect(pmd); set_pmd: pmd =3D pmd_mkold(pmd); @@ -2581,9 +2581,9 @@ static pmd_t clear_uffd_wp_pmd(pmd_t pmd) if (pmd_none(pmd)) return pmd; if (pmd_present(pmd)) - pmd =3D pmd_clear_uffd_wp(pmd); + pmd =3D pmd_clear_uffd(pmd); else - pmd =3D pmd_swp_clear_uffd_wp(pmd); + pmd =3D pmd_swp_clear_uffd(pmd); =20 return pmd; } @@ -2663,16 +2663,16 @@ static void change_non_present_huge_pmd(struct mm_s= truct *mm, } else if (softleaf_is_device_private_write(entry)) { entry =3D make_readable_device_private_entry(swp_offset(entry)); newpmd =3D swp_entry_to_pmd(entry); - if (pmd_swp_uffd_wp(*pmd)) - newpmd =3D pmd_swp_mkuffd_wp(newpmd); + if (pmd_swp_uffd(*pmd)) + newpmd =3D pmd_swp_mkuffd(newpmd); } else { newpmd =3D *pmd; } =20 if (uffd_wp) - newpmd =3D pmd_swp_mkuffd_wp(newpmd); + newpmd =3D pmd_swp_mkuffd(newpmd); else if (uffd_wp_resolve) - newpmd =3D pmd_swp_clear_uffd_wp(newpmd); + newpmd =3D pmd_swp_clear_uffd(newpmd); if (!pmd_same(*pmd, newpmd)) set_pmd_at(mm, addr, pmd, newpmd); } @@ -2753,14 +2753,14 @@ int change_huge_pmd(struct mmu_gather *tlb, struct = vm_area_struct *vma, =20 entry =3D pmd_modify(oldpmd, newprot); if (uffd_wp) - entry =3D pmd_mkuffd_wp(entry); + entry =3D pmd_mkuffd(entry); else if (uffd_wp_resolve) /* * Leave the write bit to 
be handled by PF interrupt * handler, then things like COW could be properly * handled. */ - entry =3D pmd_clear_uffd_wp(entry); + entry =3D pmd_clear_uffd(entry); =20 /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && @@ -3103,8 +3103,8 @@ static void __split_huge_zero_page_pmd(struct vm_area= _struct *vma, =20 entry =3D pfn_pte(zero_pfn(addr), vma->vm_page_prot); entry =3D pte_mkspecial(entry); - if (pmd_uffd_wp(old_pmd)) - entry =3D pte_mkuffd_wp(entry); + if (pmd_uffd(old_pmd)) + entry =3D pte_mkuffd(entry); VM_BUG_ON(!pte_none(ptep_get(pte))); set_pte_at(mm, addr, pte, entry); pte++; @@ -3188,7 +3188,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, folio =3D page_folio(page); =20 soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + uffd_wp =3D pmd_swp_uffd(old_pmd); =20 write =3D softleaf_is_migration_write(entry); if (PageAnon(page)) @@ -3204,7 +3204,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, folio =3D page_folio(page); =20 soft_dirty =3D pmd_swp_soft_dirty(old_pmd); - uffd_wp =3D pmd_swp_uffd_wp(old_pmd); + uffd_wp =3D pmd_swp_uffd(old_pmd); =20 write =3D softleaf_is_device_private_write(entry); anon_exclusive =3D PageAnonExclusive(page); @@ -3261,7 +3261,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, write =3D pmd_write(old_pmd); young =3D pmd_young(old_pmd); soft_dirty =3D pmd_soft_dirty(old_pmd); - uffd_wp =3D pmd_uffd_wp(old_pmd); + uffd_wp =3D pmd_uffd(old_pmd); =20 VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); @@ -3332,7 +3332,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_swp_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + entry =3D pte_swp_mkuffd(entry); VM_WARN_ON(!pte_none(ptep_get(pte + i))); set_pte_at(mm, addr, pte + 
i, entry); } @@ -3359,7 +3359,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_swp_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + entry =3D pte_swp_mkuffd(entry); VM_WARN_ON(!pte_none(ptep_get(pte + i))); set_pte_at(mm, addr, pte + i, entry); } @@ -3377,7 +3377,7 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, if (soft_dirty) entry =3D pte_mksoft_dirty(entry); if (uffd_wp) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); =20 for (i =3D 0; i < HPAGE_PMD_NR; i++) VM_WARN_ON(!pte_none(ptep_get(pte + i))); @@ -5018,8 +5018,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_wa= lk *pvmw, pmdswp =3D swp_entry_to_pmd(entry); if (pmd_soft_dirty(pmdval)) pmdswp =3D pmd_swp_mksoft_dirty(pmdswp); - if (pmd_uffd_wp(pmdval)) - pmdswp =3D pmd_swp_mkuffd_wp(pmdswp); + if (pmd_uffd(pmdval)) + pmdswp =3D pmd_swp_mkuffd(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); folio_remove_rmap_pmd(folio, page, vma); folio_put(folio); @@ -5049,8 +5049,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk= *pvmw, struct page *new) pmde =3D pmd_mksoft_dirty(pmde); if (softleaf_is_migration_write(entry)) pmde =3D pmd_mkwrite(pmde, vma); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde =3D pmd_mkuffd_wp(pmde); + if (pmd_swp_uffd(*pvmw->pmd)) + pmde =3D pmd_mkuffd(pmde); if (!softleaf_is_migration_young(entry)) pmde =3D pmd_mkold(pmde); /* NOTE: this may contain setting soft-dirty on some archs */ @@ -5070,8 +5070,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk= *pvmw, struct page *new) =20 if (pmd_swp_soft_dirty(*pvmw->pmd)) pmde =3D pmd_swp_mksoft_dirty(pmde); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde =3D pmd_swp_mkuffd_wp(pmde); + if (pmd_swp_uffd(*pvmw->pmd)) + pmde =3D pmd_swp_mkuffd(pmde); } =20 if (folio_test_anon(folio)) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 571212b80835..d0c81a056ae2 100644 --- a/mm/hugetlb.c +++ 
b/mm/hugetlb.c @@ -4843,8 +4843,8 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte= _t *ptep, unsigned long add =20 __folio_mark_uptodate(new_folio); hugetlb_add_new_anon_rmap(new_folio, vma, addr); - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old)) - newpte =3D huge_pte_mkuffd_wp(newpte); + if (userfaultfd_wp(vma) && huge_pte_uffd(old)) + newpte =3D huge_pte_mkuffd(newpte); set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz); hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); folio_set_hugetlb_migratable(new_folio); @@ -4918,10 +4918,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, softleaf =3D softleaf_from_pte(entry); if (unlikely(softleaf_is_hwpoison(softleaf))) { if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(softleaf_is_migration(softleaf))) { - bool uffd_wp =3D pte_swp_uffd_wp(entry); + bool uffd =3D pte_swp_uffd(entry); =20 if (!softleaf_is_migration_read(softleaf) && cow) { /* @@ -4931,12 +4931,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, softleaf =3D make_readable_migration_entry( swp_offset(softleaf)); entry =3D swp_entry_to_pte(softleaf); - if (userfaultfd_wp(src_vma) && uffd_wp) - entry =3D pte_swp_mkuffd_wp(entry); + if (userfaultfd_wp(src_vma) && uffd) + entry =3D pte_swp_mkuffd(entry); set_huge_pte_at(src, addr, src_pte, entry, sz); } if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(pte_is_marker(entry))) { const pte_marker marker =3D copy_pte_marker(softleaf, dst_vma); @@ -5013,7 +5013,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, } =20 if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd_wp(entry); + entry =3D huge_pte_clear_uffd(entry); =20 
set_huge_pte_at(dst, addr, dst_pte, entry, sz); hugetlb_count_add(npages, dst); @@ -5061,9 +5061,9 @@ static void move_huge_pte(struct vm_area_struct *vma,= unsigned long old_addr, } else { if (need_clear_uffd_wp) { if (pte_present(pte)) - pte =3D huge_pte_clear_uffd_wp(pte); + pte =3D huge_pte_clear_uffd(pte); else - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); } set_huge_pte_at(mm, new_addr, dst_pte, pte, sz); } @@ -5197,7 +5197,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, s= truct vm_area_struct *vma, * drop the uffd-wp bit in this zap, then replace the * pte with a marker. */ - if (pte_swp_uffd_wp_any(pte) && + if (pte_swp_uffd_any(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) set_huge_pte_at(mm, address, ptep, make_pte_marker(PTE_MARKER_UFFD_WP), @@ -5233,7 +5233,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, s= truct vm_area_struct *vma, if (huge_pte_dirty(pte)) folio_mark_dirty(folio); /* Leave a uffd-wp pte marker if needed */ - if (huge_pte_uffd_wp(pte) && + if (huge_pte_uffd(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) set_huge_pte_at(mm, address, ptep, make_pte_marker(PTE_MARKER_UFFD_WP), @@ -5437,7 +5437,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf) * can trigger this, because hugetlb_fault() will always resolve * uffd-wp bit first. */ - if (!unshare && huge_pte_uffd_wp(pte)) + if (!unshare && huge_pte_uffd(pte)) return 0; =20 /* Let's take out MAP_SHARED mappings first. 
*/ @@ -5581,8 +5581,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf) huge_ptep_clear_flush(vma, vmf->address, vmf->pte); hugetlb_remove_rmap(old_folio); hugetlb_add_new_anon_rmap(new_folio, vma, vmf->address); - if (huge_pte_uffd_wp(pte)) - newpte =3D huge_pte_mkuffd_wp(newpte); + if (huge_pte_uffd(pte)) + newpte =3D huge_pte_mkuffd(newpte); set_huge_pte_at(mm, vmf->address, vmf->pte, newpte, huge_page_size(h)); folio_set_hugetlb_migratable(new_folio); @@ -5860,7 +5860,7 @@ static vm_fault_t hugetlb_no_page(struct address_spac= e *mapping, * if populated. */ if (unlikely(pte_is_uffd_wp_marker(vmf->orig_pte))) - new_pte =3D huge_pte_mkuffd_wp(new_pte); + new_pte =3D huge_pte_mkuffd(new_pte); set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h)); =20 hugetlb_count_add(pages_per_huge_page(h), mm); @@ -6058,7 +6058,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, goto out_ptl; =20 /* Handle userfault-wp first, before trying to lock more pages */ - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(mm, vmf.address= , vmf.pte)) && + if (userfaultfd_wp(vma) && huge_pte_uffd(huge_ptep_get(mm, vmf.address, v= mf.pte)) && (flags & FAULT_FLAG_WRITE) && !huge_pte_write(vmf.orig_pte)) { if (!userfaultfd_wp_async(vma)) { spin_unlock(vmf.ptl); @@ -6067,7 +6067,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct= vm_area_struct *vma, return handle_userfault(&vmf, VM_UFFD_WP); } =20 - vmf.orig_pte =3D huge_pte_clear_uffd_wp(vmf.orig_pte); + vmf.orig_pte =3D huge_pte_clear_uffd(vmf.orig_pte); set_huge_pte_at(mm, vmf.address, vmf.pte, vmf.orig_pte, huge_page_size(hstate_vma(vma))); /* Fallthrough to CoW */ @@ -6352,7 +6352,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, _dst_pte =3D pte_mkyoung(_dst_pte); =20 if (wp_enabled) - _dst_pte =3D huge_pte_mkuffd_wp(_dst_pte); + _dst_pte =3D huge_pte_mkuffd(_dst_pte); =20 set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte, size); =20 @@ -6476,9 +6476,9 @@ long 
hugetlb_change_protection(struct vm_area_struct = *vma, } =20 if (uffd_wp) - newpte =3D pte_swp_mkuffd_wp(newpte); + newpte =3D pte_swp_mkuffd(newpte); else if (uffd_wp_resolve) - newpte =3D pte_swp_clear_uffd_wp(newpte); + newpte =3D pte_swp_clear_uffd(newpte); if (!pte_same(pte, newpte)) set_huge_pte_at(mm, address, ptep, newpte, psize); } else if (unlikely(pte_is_marker(pte))) { @@ -6499,9 +6499,9 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); if (uffd_wp) - pte =3D huge_pte_mkuffd_wp(pte); + pte =3D huge_pte_mkuffd(pte); else if (uffd_wp_resolve) - pte =3D huge_pte_clear_uffd_wp(pte); + pte =3D huge_pte_clear_uffd(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; tlb_remove_huge_tlb_entry(h, &tlb, ptep, address); diff --git a/mm/internal.h b/mm/internal.h index 5602393054f3..9325eefbea6a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -412,8 +412,8 @@ static inline pte_t pte_move_swp_offset(pte_t pte, long= delta) new =3D pte_swp_mksoft_dirty(new); if (pte_swp_exclusive(pte)) new =3D pte_swp_mkexclusive(new); - if (pte_swp_uffd_wp(pte)) - new =3D pte_swp_mkuffd_wp(new); + if (pte_swp_uffd(pte)) + new =3D pte_swp_mkuffd(new); =20 return new; } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4549a020bf73..afa218be15de 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -37,7 +37,7 @@ enum scan_result { SCAN_EXCEED_SWAP_PTE, SCAN_EXCEED_SHARED_PTE, SCAN_PTE_NON_PRESENT, - SCAN_PTE_UFFD_WP, + SCAN_PTE_UFFD, SCAN_PTE_MAPPED_HUGEPAGE, SCAN_LACK_REFERENCED_PAGE, SCAN_PAGE_NULL, @@ -712,8 +712,8 @@ static enum scan_result __collapse_huge_page_isolate(st= ruct vm_area_struct *vma, result =3D SCAN_PTE_NON_PRESENT; goto out; } - if (pte_uffd_wp(pteval)) { - result =3D SCAN_PTE_UFFD_WP; + if (pte_uffd(pteval)) { + result =3D SCAN_PTE_UFFD; goto out; } page =3D vm_normal_page(vma, addr, pteval); @@ -1566,7 +1566,7 @@ static 
int mthp_collapse(struct mm_struct *mm, struct= vm_area_struct *vma, case SCAN_PAGE_NULL: case SCAN_DEL_PAGE_LRU: case SCAN_PTE_NON_PRESENT: - case SCAN_PTE_UFFD_WP: + case SCAN_PTE_UFFD: case SCAN_ALLOC_HUGE_PAGE_FAIL: case SCAN_PAGE_LAZYFREE: goto next_order; @@ -1666,15 +1666,15 @@ static enum scan_result collapse_scan_pmd(struct mm= _struct *mm, /* * Always be strict with uffd-wp * enabled swap entries. Please see - * comment below for pte_uffd_wp(). + * comment below for pte_uffd(). */ - if (pte_swp_uffd_wp_any(pteval)) { - result =3D SCAN_PTE_UFFD_WP; + if (pte_swp_uffd_any(pteval)) { + result =3D SCAN_PTE_UFFD; goto out_unmap; } continue; } - if (pte_uffd_wp(pteval)) { + if (pte_uffd(pteval)) { /* * Don't collapse the page if any of the small * PTEs are armed with uffd write protection. @@ -1684,7 +1684,7 @@ static enum scan_result collapse_scan_pmd(struct mm_s= truct *mm, * userfault messages that falls outside of * the registered range. So, just be simple. */ - result =3D SCAN_PTE_UFFD_WP; + result =3D SCAN_PTE_UFFD; goto out_unmap; } =20 @@ -1897,7 +1897,7 @@ static enum scan_result try_collapse_pte_mapped_thp(s= truct mm_struct *mm, unsign =20 /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ if (userfaultfd_wp(vma)) - return SCAN_PTE_UFFD_WP; + return SCAN_PTE_UFFD; =20 folio =3D filemap_lock_folio(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); @@ -3244,7 +3244,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsi= gned long start, /* Whitelisted set of results where continuing OK */ case SCAN_NO_PTE_TABLE: case SCAN_PTE_NON_PRESENT: - case SCAN_PTE_UFFD_WP: + case SCAN_PTE_UFFD: case SCAN_LACK_REFERENCED_PAGE: case SCAN_PAGE_NULL: case SCAN_PAGE_COUNT: diff --git a/mm/memory.c b/mm/memory.c index 7c020995eafc..c4fd5cb4a08f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -893,8 +893,8 @@ static void restore_exclusive_pte(struct vm_area_struct= *vma, if (pte_swp_soft_dirty(orig_pte)) pte =3D pte_mksoft_dirty(pte); =20 
- if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_mkuffd(pte); =20 if ((vma->vm_flags & VM_WRITE) && can_change_pte_writable(vma, address, pte)) { @@ -984,8 +984,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm= _struct *src_mm, pte =3D softleaf_to_pte(entry); if (pte_swp_soft_dirty(orig_pte)) pte =3D pte_swp_mksoft_dirty(pte); - if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_swp_mkuffd(pte); set_pte_at(src_mm, addr, src_pte, pte); } } else if (softleaf_is_device_private(entry)) { @@ -1018,8 +1018,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, entry =3D make_readable_device_private_entry( swp_offset(entry)); pte =3D swp_entry_to_pte(entry); - if (pte_swp_uffd_wp(orig_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(orig_pte)) + pte =3D pte_swp_mkuffd(pte); set_pte_at(src_mm, addr, src_pte, pte); } } else if (softleaf_is_device_exclusive(entry)) { @@ -1042,7 +1042,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, return 0; } if (!userfaultfd_wp(dst_vma)) - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); set_pte_at(dst_mm, addr, dst_pte, pte); return 0; } @@ -1090,7 +1090,7 @@ copy_present_page(struct vm_area_struct *dst_vma, str= uct vm_area_struct *src_vma pte =3D maybe_mkwrite(pte_mkdirty(pte), dst_vma); if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte))) /* Uffd-wp needs to be delivered to dest pte as well */ - pte =3D pte_mkuffd_wp(pte); + pte =3D pte_mkuffd(pte); set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); return 0; } @@ -1113,7 +1113,7 @@ static __always_inline void __copy_present_ptes(struc= t vm_area_struct *dst_vma, pte =3D pte_mkold(pte); =20 if (!userfaultfd_wp(dst_vma)) - pte =3D pte_clear_uffd_wp(pte); + pte =3D pte_clear_uffd(pte); =20 set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr); } @@ -3925,8 +3925,8 @@ static vm_fault_t 
wp_page_copy(struct vm_fault *vmf) if (unlikely(unshare)) { if (pte_soft_dirty(vmf->orig_pte)) entry =3D pte_mksoft_dirty(entry); - if (pte_uffd_wp(vmf->orig_pte)) - entry =3D pte_mkuffd_wp(entry); + if (pte_uffd(vmf->orig_pte)) + entry =3D pte_mkuffd(entry); } else { entry =3D maybe_mkwrite(pte_mkdirty(entry), vma); } @@ -4261,7 +4261,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * etc.) because we're only removing the uffd-wp bit, * which is completely invisible to the user. */ - pte =3D pte_clear_uffd_wp(ptep_get(vmf->pte)); + pte =3D pte_clear_uffd(ptep_get(vmf->pte)); =20 set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); /* @@ -5038,8 +5038,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte =3D mk_pte(page, vma->vm_page_prot); if (pte_swp_soft_dirty(vmf->orig_pte)) pte =3D pte_mksoft_dirty(pte); - if (pte_swp_uffd_wp(vmf->orig_pte)) - pte =3D pte_mkuffd_wp(pte); + if (pte_swp_uffd(vmf->orig_pte)) + pte =3D pte_mkuffd(pte); =20 /* * Same logic as in do_wp_page(); however, optimize for pages that are @@ -5255,7 +5255,7 @@ void map_anon_folio_pte_nopf(struct folio *folio, pte= _t *pte, if (vma->vm_flags & VM_WRITE) entry =3D pte_mkwrite(pte_mkdirty(entry), vma); if (uffd_wp) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); =20 folio_ref_add(folio, nr_pages - 1); folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE); @@ -5322,7 +5322,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *= vmf) return handle_userfault(vmf, VM_UFFD_MISSING); } if (vmf_orig_pte_uffd_wp(vmf)) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); set_pte_at(vma->vm_mm, addr, vmf->pte, entry); =20 /* No need to invalidate - it was non-present before */ @@ -5572,7 +5572,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio= *folio, else if (pte_write(entry) && folio_test_dirty(folio)) entry =3D pte_mkdirty(entry); if (unlikely(vmf_orig_pte_uffd_wp(vmf))) - entry =3D pte_mkuffd_wp(entry); + entry =3D pte_mkuffd(entry); /* 
copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { VM_BUG_ON_FOLIO(nr !=3D 1, folio); diff --git a/mm/migrate.c b/mm/migrate.c index 0c6a0ab6ecce..4bdb5be7afbf 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -326,8 +326,8 @@ static bool try_to_map_unused_to_zeropage(struct page_v= ma_mapped_walk *pvmw, =20 if (pte_swp_soft_dirty(old_pte)) newpte =3D pte_mksoft_dirty(newpte); - if (pte_swp_uffd_wp(old_pte)) - newpte =3D pte_mkuffd_wp(newpte); + if (pte_swp_uffd(old_pte)) + newpte =3D pte_mkuffd(newpte); =20 set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); =20 @@ -391,8 +391,8 @@ static bool remove_migration_pte(struct folio *folio, =20 if (softleaf_is_migration_write(entry)) pte =3D pte_mkwrite(pte, vma); - else if (pte_swp_uffd_wp(old_pte)) - pte =3D pte_mkuffd_wp(pte); + else if (pte_swp_uffd(old_pte)) + pte =3D pte_mkuffd(pte); =20 if (folio_test_anon(folio) && !softleaf_is_migration_read(entry)) rmap_flags |=3D RMAP_EXCLUSIVE; @@ -407,8 +407,8 @@ static bool remove_migration_pte(struct folio *folio, pte =3D softleaf_to_pte(entry); if (pte_swp_soft_dirty(old_pte)) pte =3D pte_swp_mksoft_dirty(pte); - if (pte_swp_uffd_wp(old_pte)) - pte =3D pte_swp_mkuffd_wp(pte); + if (pte_swp_uffd(old_pte)) + pte =3D pte_swp_mkuffd(pte); } =20 #ifdef CONFIG_HUGETLB_PAGE diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 554754eb26ff..17da1bab0248 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -445,13 +445,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (pte_present(pte)) { if (pte_soft_dirty(pte)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pte)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { if (pte_swp_soft_dirty(pte)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pte)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_swp_uffd(pte)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } set_pte_at(mm, addr, ptep, swp_pte); 
=20 diff --git a/mm/mprotect.c b/mm/mprotect.c index 9cbf932b028c..8340c8b228c6 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -240,8 +240,8 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, */ entry =3D make_readable_device_private_entry(swp_offset(entry)); newpte =3D swp_entry_to_pte(entry); - if (pte_swp_uffd_wp(oldpte)) - newpte =3D pte_swp_mkuffd_wp(newpte); + if (pte_swp_uffd(oldpte)) + newpte =3D pte_swp_mkuffd(newpte); } else if (softleaf_is_marker(entry)) { /* * Ignore error swap entries unconditionally, @@ -266,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, } =20 if (uffd_wp) - newpte =3D pte_swp_mkuffd_wp(newpte); + newpte =3D pte_swp_mkuffd(newpte); else if (uffd_wp_resolve) - newpte =3D pte_swp_clear_uffd_wp(newpte); + newpte =3D pte_swp_clear_uffd(newpte); =20 if (!pte_same(oldpte, newpte)) { set_pte_at(vma->vm_mm, addr, pte, newpte); @@ -290,9 +290,9 @@ static __always_inline void change_present_ptes(struct = mmu_gather *tlb, ptent =3D pte_modify(oldpte, newprot); =20 if (uffd_wp) - ptent =3D pte_mkuffd_wp(ptent); + ptent =3D pte_mkuffd(ptent); else if (uffd_wp_resolve) - ptent =3D pte_clear_uffd_wp(ptent); + ptent =3D pte_clear_uffd(ptent); =20 /* * In some writable, shared mappings, we might want diff --git a/mm/mremap.c b/mm/mremap.c index e9c8b1d05832..12732a5c547e 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -297,9 +297,9 @@ static int move_ptes(struct pagetable_move_control *pmc, else { if (need_clear_uffd_wp) { if (pte_present(pte)) - pte =3D pte_clear_uffd_wp(pte); + pte =3D pte_clear_uffd(pte); else - pte =3D pte_swp_clear_uffd_wp(pte); + pte =3D pte_swp_clear_uffd(pte); } set_ptes(mm, new_addr, new_ptep, pte, nr_ptes); } diff --git a/mm/page_table_check.c b/mm/page_table_check.c index 53a8997ec043..3fb995e5d40d 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -188,8 +188,8 @@ static inline bool softleaf_cached_writable(softleaf_t = entry) static void 
page_table_check_pte_flags(pte_t pte) { if (pte_present(pte)) { - WARN_ON_ONCE(pte_uffd_wp(pte) && pte_write(pte)); - } else if (pte_swp_uffd_wp(pte)) { + WARN_ON_ONCE(pte_uffd(pte) && pte_write(pte)); + } else if (pte_swp_uffd(pte)) { const softleaf_t entry =3D softleaf_from_pte(pte); =20 WARN_ON_ONCE(softleaf_cached_writable(entry)); @@ -216,9 +216,9 @@ EXPORT_SYMBOL(__page_table_check_ptes_set); static inline void page_table_check_pmd_flags(pmd_t pmd) { if (pmd_present(pmd)) { - if (pmd_uffd_wp(pmd)) + if (pmd_uffd(pmd)) WARN_ON_ONCE(pmd_write(pmd)); - } else if (pmd_swp_uffd_wp(pmd)) { + } else if (pmd_swp_uffd(pmd)) { const softleaf_t entry =3D softleaf_from_pmd(pmd); =20 WARN_ON_ONCE(softleaf_cached_writable(entry)); diff --git a/mm/rmap.c b/mm/rmap.c index 1c77d5dc06e9..546bc1cf9391 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2318,13 +2318,13 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, if (likely(pte_present(pteval))) { if (pte_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { if (pte_swp_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_swp_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { @@ -2692,14 +2692,14 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, swp_pte =3D swp_entry_to_pte(entry); if (pte_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (pte_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } else { swp_pte =3D swp_entry_to_pte(entry); if (pte_swp_soft_dirty(pteval)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); 
+ if (pte_swp_uffd(pteval)) + swp_pte =3D pte_swp_mkuffd(swp_pte); } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, diff --git a/mm/swapfile.c b/mm/swapfile.c index e3d126602a1e..15fdca2da1f7 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2557,8 +2557,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, new_pte =3D pte_mkold(mk_pte(page, vma->vm_page_prot)); if (pte_swp_soft_dirty(old_pte)) new_pte =3D pte_mksoft_dirty(new_pte); - if (pte_swp_uffd_wp(old_pte)) - new_pte =3D pte_mkuffd_wp(new_pte); + if (pte_swp_uffd(old_pte)) + new_pte =3D pte_mkuffd(new_pte); setpte: set_pte_at(vma->vm_mm, addr, pte, new_pte); folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry))); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index f6d2a1c67019..9d74be69873a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -394,7 +394,7 @@ static int mfill_atomic_install_pte(pmd_t *dst_pmd, if (writable) _dst_pte =3D pte_mkwrite(_dst_pte, dst_vma); if (flags & MFILL_ATOMIC_WP) - _dst_pte =3D pte_mkuffd_wp(_dst_pte); + _dst_pte =3D pte_mkuffd(_dst_pte); =20 ret =3D -EAGAIN; dst_pte =3D pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); @@ -3591,7 +3591,7 @@ static int userfaultfd_register(struct userfaultfd_ct= x *ctx, if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) vm_flags |=3D VM_UFFD_MISSING; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) { - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) goto out; =20 vm_flags |=3D VM_UFFD_WP; @@ -4301,7 +4301,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, uffdio_api.features &=3D ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); #endif - if (!pgtable_supports_uffd_wp()) + if (!pgtable_supports_uffd()) uffdio_api.features &=3D ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; =20 if (!uffd_supports_wp_marker()) { --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fhigh-c5-smtp.messagingengine.com 
(fhigh-b5-smtp.messagingengine.com [202.12.124.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9739D3C9EF4; Tue, 26 May 2026 13:05:40 +0000 (UTC)
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag
Date: Tue, 26 May 2026 14:04:56 +0100
Message-ID: <20260526130509.2748441-9-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Preparatory patch for userfaultfd read-write protection (RWP). RWP
extends userfaultfd protection from plain write-protection (WP) to full
read-write protection: accesses to an RWP-protected range -- reads as
well as writes -- trap through userfaultfd.

Reserve VM_UFFD_RWP, add the userfaultfd_rwp() and
userfaultfd_protected() helpers, and wire up the smaps "ur" entry and
the trace-flag table the rest of the series will use.

The flag is gated on CONFIG_USERFAULTFD_RWP, which is introduced
together with the UAPI in a later patch; until then VM_UFFD_RWP aliases
VM_NONE and every downstream check folds to dead code. Nothing sets or
queries the flag yet.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
Reviewed-by: SeongJae Park
---
 Documentation/filesystems/proc.rst |  1 +
 fs/proc/task_mmu.c                 |  3 +++
 include/linux/mm.h                 | 28 +++++++++++++++++----------
 include/linux/userfaultfd_k.h      | 31 +++++++++++++++++++++++++-----
 include/trace/events/mmflags.h     |  7 +++++++
 5 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index db6167befb7b..db28207c5290 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -607,6 +607,7 @@ encoded manner. The codes are the following:
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
     ui    userfaultfd minor fault
+    ur    userfaultfd read-write-protect tracking
     ss    shadow/guarded control stack page
     sl    sealed
     lf    lock on fault pages
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1e5f6ee8a3b6..974c5f4aa533 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1237,6 +1237,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_USERFAULTFD_RWP
+		[ilog2(VM_UFFD_RWP)]	= "ur",
+#endif
 #ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
 		[ilog2(VM_SHADOW_STACK)] = "ss",
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71b11945e4fc..6499cfb61dc4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -362,6 +362,7 @@ enum {
 #endif
 	DECLARE_VMA_BIT(UFFD_MINOR, 41),
 	DECLARE_VMA_BIT(SEALED, 42),
+	DECLARE_VMA_BIT(UFFD_RWP, 43),
 	/* Flags that reuse flags above. */
 	DECLARE_VMA_BIT_ALIAS(PKEY_BIT0, HIGH_ARCH_0),
 	DECLARE_VMA_BIT_ALIAS(PKEY_BIT1, HIGH_ARCH_1),
@@ -505,6 +506,11 @@ enum {
 #else
 #define VM_UFFD_MINOR	VM_NONE
 #endif
+#ifdef CONFIG_USERFAULTFD_RWP
+#define VM_UFFD_RWP	INIT_VM_FLAG(UFFD_RWP)
+#else
+#define VM_UFFD_RWP	VM_NONE
+#endif
 #ifdef CONFIG_64BIT
 #define VM_ALLOW_ANY_UNCACHED	INIT_VM_FLAG(ALLOW_ANY_UNCACHED)
 #define VM_SEALED	INIT_VM_FLAG(SEALED)
@@ -642,22 +648,24 @@ enum {
  * reconsistuted upon page fault, so necessitate page table copying upon fork.
  *
  * Note that these flags should be compared with the DESTINATION VMA not the
- * source, as VM_UFFD_WP may not be propagated to destination, while all other
- * flags will be.
+ * source: VM_UFFD_WP and VM_UFFD_RWP may be cleared on the destination
+ * (dup_userfaultfd() -> userfaultfd_reset_ctx() when the parent context did
+ * not negotiate UFFD_FEATURE_EVENT_FORK), while all other flags propagate.
* * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot= be * reasonably reconstructed on page fault. * * VM_UFFD_WP - Encodes metadata about an installed uffd - * write protect handler, which cannot be - * reconstructed on page fault. + * VM_UFFD_RWP write- or read-write-protect handler, which + * cannot be reconstructed on page fault. * - * We always copy pgtables when dst_vma has uffd= -wp - * enabled even if it's file-backed - * (e.g. shmem). Because when uffd-wp is enabled, - * pgtable contains uffd-wp protection informati= on, - * that's something we can't retrieve from page = cache, - * and skip copying will lose those info. + * We always copy pgtables when dst_vma has the + * uffd PTE bit in use even if it's file-backed + * (e.g. shmem). Because when the uffd bit is + * in use, the pgtable contains the protection + * information, that's something we can't + * retrieve from page cache, and skip copying + * will lose those info. * * VM_MAYBE_GUARD - Could contain page guard region markers which * by design are a property of the page tables diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index f4cf5763f92c..0aef628514df 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -21,10 +21,11 @@ #include =20 /* The set of all possible UFFD-related VM flags. 
*/ -#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR) +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_MINOR | \ + VM_UFFD_WP | VM_UFFD_RWP) =20 #define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BI= T, \ - VMA_UFFD_MINOR_BIT) + VMA_UFFD_MINOR_BIT, VMA_UFFD_RWP_BIT) =20 /* * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining @@ -178,7 +179,7 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(stru= ct vm_area_struct *vma, */ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { - return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); + return vma->vm_flags & (VM_UFFD_MINOR | VM_UFFD_WP | VM_UFFD_RWP); } =20 /* @@ -208,6 +209,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return vma->vm_flags & VM_UFFD_MINOR; } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_UFFD_RWP; +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return userfaultfd_wp(vma) || userfaultfd_rwp(vma); +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -328,6 +339,16 @@ static inline bool userfaultfd_minor(struct vm_area_st= ruct *vma) return false; } =20 +static inline bool userfaultfd_rwp(struct vm_area_struct *vma) +{ + return false; +} + +static inline bool userfaultfd_protected(struct vm_area_struct *vma) +{ + return false; +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -421,8 +442,8 @@ static inline bool userfaultfd_wp_use_markers(struct vm= _area_struct *vma) } =20 /* - * Returns true if this is a swap pte and was uffd-wp wr-protected in eith= er - * forms (pte marker or a normal swap pte), false otherwise. + * Returns true if this swap pte carries uffd-tracked state in either + * form (pte marker or a normal swap pte), false otherwise. 
*/ static inline bool pte_swp_uffd_any(pte_t pte) { diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index a6e5a44c9b42..bfface3d0203 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -194,6 +194,12 @@ IF_HAVE_PG_ARCH_3(arch_3) # define IF_HAVE_UFFD_MINOR(flag, name) #endif =20 +#ifdef CONFIG_USERFAULTFD_RWP +# define IF_HAVE_UFFD_RWP(flag, name) {flag, name}, +#else +# define IF_HAVE_UFFD_RWP(flag, name) +#endif + #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32) # define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name}, #else @@ -215,6 +221,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR, "uffd_minor" ) \ {VM_PFNMAP, "pfnmap" }, \ {VM_MAYBE_GUARD, "maybe_guard" }, \ {VM_UFFD_WP, "uffd_wp" }, \ +IF_HAVE_UFFD_RWP(VM_UFFD_RWP, "uffd_rwp" ) \ {VM_LOCKED, "locked" }, \ {VM_IO, "io" }, \ {VM_SEQ_READ, "seqread" }, \ --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fout-c3-smtp.messagingengine.com (fout-b3-smtp.messagingengine.com [202.12.124.146]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 721BB3DDDC6; Tue, 26 May 2026 13:05:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.146 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800745; cv=none; b=X8117TQ8ah9Ot4oSQ2OU9ZSo3FEnPaZ5GdVSUczQNUnzuHTKy2TSl35cxd0mD94hj2XI7GxoqHqy33kOf0Md+7WiaGOHiRvp5X0Z4CoB5s/mc0Nh5mfeTAHPmUsqbRLXiZC41PNhuzj96C/wy1tC5IH8Ha9EopquQGtYZZRv1jg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800745; c=relaxed/simple; bh=XGe2KRu+dOIOwnHC2hGYADWHbfGY/drdIGdkCbNQJUM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O3dE9PfX9fPTgkUEXF62pcRpgjApqqCq7YJltUbCB13fXP4ReF5ozHrW8ode2EMu85fTc5bts8J8AxwhJT3KH9sSkEEdcaYA32OR5ot81OpbZNPkgcjliRKsqZSZcPLKXCNsLTNyB7AaoyEt97oerGMPoHx7X2L8b1I2MQHeeLs= 
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 09/18] mm: add MM_CP_UFFD_RWP change_protection() flag
Date: Tue, 26 May 2026 14:04:57 +0100
Message-ID: <20260526130509.2748441-10-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

Preparatory patch. Add the change_protection() primitive that userfaultfd RWP will use.

An RWP-protected PTE is PAGE_NONE with the uffd PTE bit set. The PROT_NONE half makes the CPU fault on any access; the uffd bit distinguishes an RWP fault from a plain mprotect(PROT_NONE) or NUMA hinting fault. MM_CP_UFFD_WP and MM_CP_UFFD_RWP share the same PTE bit, so the two cannot be used together on the same range.

Two new change_protection() flags:

  MM_CP_UFFD_RWP          install PAGE_NONE and set the uffd bit
  MM_CP_UFFD_RWP_RESOLVE  restore vma->vm_page_prot, clear the uffd bit

Both are wired through change_pte_range(), change_huge_pmd(), and hugetlb_change_protection() so anon, shmem, THP, and hugetlb all share the same semantics.
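
The flag rules and the PTE encoding described above can be sketched as a small userspace model. This is a hedged illustration, not the kernel implementation: cp_uffd_flags_valid(), struct toy_pte, enum fault_kind and classify_fault() are hypothetical names made up for this example. The MM_CP_UFFD_RWP* and MM_CP_UFFD_WP_RESOLVE values match the mm.h hunk in this patch; the value (1UL << 2) for MM_CP_UFFD_WP is an assumption, since the hunk does not show it.

```c
#include <assert.h>
#include <stdbool.h>

/* MM_CP_UFFD_WP's value is assumed; the other values follow this patch. */
#define MM_CP_UFFD_WP		(1UL << 2)
#define MM_CP_UFFD_WP_RESOLVE	(1UL << 3)
#define MM_CP_UFFD_WP_ALL	(MM_CP_UFFD_WP | MM_CP_UFFD_WP_RESOLVE)
#define MM_CP_UFFD_RWP		(1UL << 4)
#define MM_CP_UFFD_RWP_RESOLVE	(1UL << 5)
#define MM_CP_UFFD_RWP_ALL	(MM_CP_UFFD_RWP | MM_CP_UFFD_RWP_RESOLVE)

/*
 * Hypothetical helper: true if the uffd bits in cp_flags form a legal
 * request. Protect and resolve are mutually exclusive, and WP and RWP
 * cannot be mixed because they share one PTE bit.
 */
bool cp_uffd_flags_valid(unsigned long cp_flags)
{
	if ((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL)
		return false;
	if ((cp_flags & MM_CP_UFFD_RWP_ALL) == MM_CP_UFFD_RWP_ALL)
		return false;
	if ((cp_flags & MM_CP_UFFD_WP_ALL) && (cp_flags & MM_CP_UFFD_RWP_ALL))
		return false;
	return true;
}

/* Toy PTE carrying only the two properties the commit message discusses. */
struct toy_pte {
	bool protnone;	/* PTE has PAGE_NONE protection */
	bool uffd;	/* uffd PTE bit is set */
};

enum fault_kind {
	FAULT_OTHER,			/* not a PROT_NONE-style fault */
	FAULT_PROT_NONE_OR_NUMA,	/* mprotect(PROT_NONE) or NUMA hinting */
	FAULT_UFFD_RWP,			/* RWP-protected entry */
};

/* Hypothetical classifier: the uffd bit separates RWP from plain protnone. */
enum fault_kind classify_fault(struct toy_pte pte)
{
	if (!pte.protnone)
		return FAULT_OTHER;
	return pte.uffd ? FAULT_UFFD_RWP : FAULT_PROT_NONE_OR_NUMA;
}

/* Small usage example for the model above. */
void check_rwp_model(void)
{
	struct toy_pte rwp = { .protnone = true, .uffd = true };

	assert(cp_uffd_flags_valid(MM_CP_UFFD_RWP));
	assert(!cp_uffd_flags_valid(MM_CP_UFFD_WP | MM_CP_UFFD_RWP));
	assert(classify_fault(rwp) == FAULT_UFFD_RWP);
}
```

The model treats NUMA hinting and mprotect(PROT_NONE) as one bucket because, as far as this commit message goes, the uffd bit is only needed to tell RWP apart from both of them.
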
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: SeongJae Park --- include/linux/mm.h | 5 ++++ include/linux/userfaultfd_k.h | 1 - mm/huge_memory.c | 30 +++++++++++++---------- mm/hugetlb.c | 25 ++++++++++++++----- mm/mprotect.c | 46 +++++++++++++++++++++++++++-------- 5 files changed, 77 insertions(+), 30 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6499cfb61dc4..f79801816f32 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3297,6 +3297,11 @@ int get_cmdline(struct task_struct *task, char *buff= er, int buflen); #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */ #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +/* Whether this change is for uffd RWP */ +#define MM_CP_UFFD_RWP (1UL << 4) /* do rwp */ +#define MM_CP_UFFD_RWP_RESOLVE (1UL << 5) /* resolve rwp */ +#define MM_CP_UFFD_RWP_ALL (MM_CP_UFFD_RWP | \ + MM_CP_UFFD_RWP_RESOLVE) =20 bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long add= r, pte_t pte); diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 0aef628514df..564eb2aac321 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -361,7 +361,6 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_ar= ea_struct *vma, return false; } =20 - static inline bool userfaultfd_armed(struct vm_area_struct *vma) { return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d43c2255f47d..40c65bf2d6dc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2640,8 +2640,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsign= ed long old_addr, } =20 static void change_non_present_huge_pmd(struct mm_struct *mm, - unsigned long addr, pmd_t *pmd, bool uffd_wp, - bool uffd_wp_resolve) + unsigned long addr, pmd_t *pmd, bool uffd_prot, + bool uffd_prot_resolve) { softleaf_t entry =3D softleaf_from_pmd(*pmd); const struct folio *folio =3D 
softleaf_to_folio(entry); @@ -2669,9 +2669,9 @@ static void change_non_present_huge_pmd(struct mm_str= uct *mm, newpmd =3D *pmd; } =20 - if (uffd_wp) + if (uffd_prot) newpmd =3D pmd_swp_mkuffd(newpmd); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpmd =3D pmd_swp_clear_uffd(newpmd); if (!pmd_same(*pmd, newpmd)) set_pmd_at(mm, addr, pmd, newpmd); @@ -2692,8 +2692,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, spinlock_t *ptl; pmd_t oldpmd, entry; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); int ret =3D 1; =20 tlb_change_page_size(tlb, HPAGE_PMD_SIZE); @@ -2706,11 +2707,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct = vm_area_struct *vma, return 0; =20 if (thp_migration_supported() && pmd_is_valid_softleaf(*pmd)) { - change_non_present_huge_pmd(mm, addr, pmd, uffd_wp, - uffd_wp_resolve); + change_non_present_huge_pmd(mm, addr, pmd, uffd_prot, + uffd_prot_resolve); goto unlock; } =20 + /* Already in the desired state */ + if (prot_numa && pmd_protnone(*pmd)) + goto unlock; + if ((cp_flags & MM_CP_UFFD_RWP) && pmd_protnone(*pmd) && pmd_uffd(*pmd)) + goto unlock; + if (prot_numa) { =20 /* @@ -2721,9 +2728,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, if (is_huge_zero_pmd(*pmd)) goto unlock; =20 - if (pmd_protnone(*pmd)) - goto unlock; - if (!folio_can_map_prot_numa(pmd_folio(*pmd), vma, vma_is_single_threaded_private(vma))) goto unlock; @@ -2752,9 +2756,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm= _area_struct *vma, oldpmd =3D pmdp_invalidate_ad(vma, addr, pmd); =20 entry =3D pmd_modify(oldpmd, newprot); - if (uffd_wp) + if (uffd_prot) entry =3D pmd_mkuffd(entry); - else if (uffd_wp_resolve) + else if 
(uffd_prot_resolve) /* * Leave the write bit to be handled by PF interrupt * handler, then things like COW could be properly diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d0c81a056ae2..4d75b69d4272 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6395,6 +6395,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, unsigned long last_addr_mask; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; + bool uffd_rwp_resolve =3D cp_flags & MM_CP_UFFD_RWP_RESOLVE; struct mmu_gather tlb; =20 /* @@ -6420,6 +6422,11 @@ long hugetlb_change_protection(struct vm_area_struct= *vma, =20 ptep =3D hugetlb_walk(vma, address, psize); if (!ptep) { + /* + * uffd_wp installs a pte marker on the unpopulated + * entry; uffd_rwp does not install markers so the + * allocation is unnecessary for it. + */ if (!uffd_wp) { address |=3D last_addr_mask; continue; @@ -6441,7 +6448,8 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, * shouldn't happen at all. Warn about it if it * happened due to some reason. */ - WARN_ON_ONCE(uffd_wp || uffd_wp_resolve); + WARN_ON_ONCE(uffd_wp || uffd_wp_resolve || + uffd_rwp || uffd_rwp_resolve); pages++; spin_unlock(ptl); address |=3D last_addr_mask; @@ -6475,9 +6483,9 @@ long hugetlb_change_protection(struct vm_area_struct = *vma, pages++; } =20 - if (uffd_wp) + if (uffd_wp || uffd_rwp) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) newpte =3D pte_swp_clear_uffd(newpte); if (!pte_same(pte, newpte)) set_huge_pte_at(mm, address, ptep, newpte, psize); @@ -6488,19 +6496,24 @@ long hugetlb_change_protection(struct vm_area_struc= t *vma, * pte_marker_uffd_wp()=3D=3Dtrue implies !poison * because they're mutual exclusive. 
*/ - if (pte_is_uffd_wp_marker(pte) && uffd_wp_resolve) + if (pte_is_uffd_wp_marker(pte) && + (uffd_wp_resolve || uffd_rwp_resolve)) /* Safe to modify directly (non-present->none). */ huge_pte_clear(mm, address, ptep, psize); } else { pte_t old_pte; unsigned int shift =3D huge_page_shift(hstate_vma(vma)); =20 + /* Already protnone with uffd bit set? Nothing to do. */ + if (uffd_rwp && pte_protnone(pte) && huge_pte_uffd(pte)) + goto next; + old_pte =3D huge_ptep_modify_prot_start(vma, address, ptep); pte =3D huge_pte_modify(old_pte, newprot); pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); - if (uffd_wp) + if (uffd_wp || uffd_rwp) pte =3D huge_pte_mkuffd(pte); - else if (uffd_wp_resolve) + else if (uffd_wp_resolve || uffd_rwp_resolve) pte =3D huge_pte_clear_uffd(pte); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; diff --git a/mm/mprotect.c b/mm/mprotect.c index 8340c8b228c6..4a6b35482aee 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -214,8 +214,9 @@ static __always_inline void set_write_prot_commit_flush= _ptes(struct vm_area_stru static long change_softleaf_pte(struct vm_area_struct *vma, unsigned long addr, pte_t *pte, pte_t oldpte, unsigned long cp_flags) { - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); softleaf_t entry =3D softleaf_from_pte(oldpte); pte_t newpte; =20 @@ -256,7 +257,7 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, * to unprotect it, drop it; the next page * fault will trigger without uffd trapping. 
*/ - if (uffd_wp_resolve) { + if (uffd_prot_resolve) { pte_clear(vma->vm_mm, addr, pte); return 1; } @@ -265,9 +266,9 @@ static long change_softleaf_pte(struct vm_area_struct *= vma, newpte =3D oldpte; } =20 - if (uffd_wp) + if (uffd_prot) newpte =3D pte_swp_mkuffd(newpte); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) newpte =3D pte_swp_clear_uffd(newpte); =20 if (!pte_same(oldpte, newpte)) { @@ -282,16 +283,17 @@ static __always_inline void change_present_ptes(struc= t mmu_gather *tlb, int nr_ptes, unsigned long end, pgprot_t newprot, struct folio *folio, struct page *page, unsigned long cp_flags) { - const bool uffd_wp_resolve =3D cp_flags & MM_CP_UFFD_WP_RESOLVE; - const bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; + const bool uffd_prot =3D cp_flags & (MM_CP_UFFD_WP | MM_CP_UFFD_RWP); + const bool uffd_prot_resolve =3D cp_flags & + (MM_CP_UFFD_WP_RESOLVE | MM_CP_UFFD_RWP_RESOLVE); pte_t ptent, oldpte; =20 oldpte =3D modify_prot_start_ptes(vma, addr, ptep, nr_ptes); ptent =3D pte_modify(oldpte, newprot); =20 - if (uffd_wp) + if (uffd_prot) ptent =3D pte_mkuffd(ptent); - else if (uffd_wp_resolve) + else if (uffd_prot_resolve) ptent =3D pte_clear_uffd(ptent); =20 /* @@ -325,6 +327,7 @@ static long change_pte_range(struct mmu_gather *tlb, long pages =3D 0; bool is_private_single_threaded; bool prot_numa =3D cp_flags & MM_CP_PROT_NUMA; + bool uffd_rwp =3D cp_flags & MM_CP_UFFD_RWP; bool uffd_wp =3D cp_flags & MM_CP_UFFD_WP; int nr_ptes; =20 @@ -350,6 +353,14 @@ static long change_pte_range(struct mmu_gather *tlb, /* Already in the desired state. */ if (prot_numa && pte_protnone(oldpte)) continue; + /* + * RWP-protected PTEs carry _PAGE_UFFD as a marker on + * top of PROT_NONE. Skip only entries already in that + * exact state; plain PROT_NONE from mprotect() still needs + * to be promoted so future faults can be distinguished. 
+ */ + if (uffd_rwp && pte_protnone(oldpte) && pte_uffd(oldpte)) + continue; =20 page =3D vm_normal_page(vma, addr, oldpte); if (page) @@ -358,6 +369,8 @@ static long change_pte_range(struct mmu_gather *tlb, /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. + * Skip this filter for uffd RWP which + * must set protnone regardless of NUMA placement. */ if (prot_numa && !folio_can_map_prot_numa(folio, vma, @@ -667,7 +680,16 @@ long change_protection(struct mmu_gather *tlb, pgprot_t newprot =3D vma->vm_page_prot; long pages; =20 - BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL); + /* + * MM_CP_UFFD_{WP,RWP} and _RESOLVE are mutually exclusive within one + * change, and WP and RWP cannot mix. Miswired callers get a warn and + * a no-op; userspace cannot reach this state. + */ + if (WARN_ON_ONCE((cp_flags & MM_CP_UFFD_WP_ALL) =3D=3D MM_CP_UFFD_WP_ALL = || + (cp_flags & MM_CP_UFFD_RWP_ALL) =3D=3D MM_CP_UFFD_RWP_ALL || + ((cp_flags & MM_CP_UFFD_WP_ALL) && + (cp_flags & MM_CP_UFFD_RWP_ALL)))) + return 0; =20 #ifdef CONFIG_NUMA_BALANCING /* @@ -681,6 +703,10 @@ long change_protection(struct mmu_gather *tlb, WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA); #endif =20 + if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_PROTNONE) && + (cp_flags & MM_CP_UFFD_RWP)) + newprot =3D PAGE_NONE; + if (is_vm_hugetlb_page(vma)) pages =3D hugetlb_change_protection(vma, start, end, newprot, cp_flags); --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fhigh-c5-smtp.messagingengine.com (fhigh-b5-smtp.messagingengine.com [202.12.124.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02AFE3FB074; Tue, 26 May 2026 13:05:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800748; cv=none; 
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 10/18] mm: preserve RWP marker across PTE rewrites
Date: Tue, 26 May 2026 14:04:58 +0100
Message-ID: <20260526130509.2748441-11-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kiryl Shutsemau (Meta)"

The uffd PTE bit must survive any kernel path that rewrites a PTE on a VM_UFFD_RWP VMA, otherwise the marker that carries PAGE_NONE semantics is silently dropped and the next access leaks past RWP tracking.

Wire the preservation through every path that rewrites a VM_UFFD_RWP PTE.

Swap and device-exclusive: do_swap_page(), restore_exclusive_pte(), and unuse_pte() (swapoff()) re-apply PAGE_NONE when the swap PTE carries the uffd bit and the VMA has VM_UFFD_RWP.

Migration: remove_migration_pte() and remove_migration_pmd() do the same after the migration entry is replaced with a real PTE/PMD.
Fork: __copy_present_ptes(), copy_present_page(), copy_nonpresent_pte(), copy_huge_pmd(), copy_huge_non_present_pmd(), and copy_hugetlb_page_range() keep the uffd bit on the child when the destination VMA has VM_UFFD_RWP, matching the existing VM_UFFD_WP handling. Add VM_UFFD_RWP to VM_COPY_ON_FORK so the flag itself propagates. mprotect(): change_pte_range() and change_huge_pmd() restore PAGE_NONE after pte_modify()/pmd_modify() have recomputed the base protection from a (possibly user-changed) vm_page_prot. pte_modify() preserves _PAGE_UFFD, so the bit stays; we just have to force PAGE_NONE back on top. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- include/linux/mm.h | 3 ++- mm/huge_memory.c | 47 +++++++++++++++++++++++++++++++++++++---- mm/hugetlb.c | 52 ++++++++++++++++++++++++++++++++++++++-------- mm/memory.c | 49 ++++++++++++++++++++++++++++++++++++------- mm/migrate.c | 8 +++++++ mm/mprotect.c | 10 +++++++++ mm/mremap.c | 13 ++++++++++-- mm/swapfile.c | 5 +++++ mm/userfaultfd.c | 17 +++++++++++++++ 9 files changed, 181 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index f79801816f32..9e62946af654 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -672,7 +672,8 @@ enum { * only and thus cannot be reconstructed on page * fault. 
*/ -#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_G= UARD) +#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_UFFD_RW= P | \ + VM_MAYBE_GUARD) =20 /* * mapping from the currently active vm_flags protection bits (the diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 40c65bf2d6dc..6417d883d2e4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1943,7 +1943,7 @@ static void copy_huge_non_present_pmd( add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - if (!userfaultfd_wp(dst_vma)) + if (!userfaultfd_protected(dst_vma)) pmd =3D pmd_swp_clear_uffd(pmd); set_pmd_at(dst_mm, addr, dst_pmd, pmd); } @@ -2038,9 +2038,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct m= m_struct *src_mm, out_zero_page: mm_inc_nr_ptes(dst_mm); pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) + + /* See __copy_present_ptes(): restore accessible protection. */ + if (!userfaultfd_protected(dst_vma)) { + if (userfaultfd_rwp(src_vma) && pmd_uffd(pmd)) + pmd =3D pmd_modify(pmd, dst_vma->vm_page_prot); pmd =3D pmd_clear_uffd(pmd); + } + + pmdp_set_wrprotect(src_mm, addr, src_pmd); pmd =3D pmd_wrprotect(pmd); set_pmd: pmd =3D pmd_mkold(pmd); @@ -2626,8 +2632,16 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsig= ned long old_addr, pgtable_trans_huge_deposit(mm, new_pmd, pgtable); } pmd =3D move_soft_dirty_pmd(pmd); - if (vma_has_uffd_without_event_remap(vma)) + if (vma_has_uffd_without_event_remap(vma)) { + /* + * See __copy_present_ptes(): normalise RWP PMDs so + * the destination starts accessible instead of taking + * a numa-hinting fault on first access. 
+ */ + if (pmd_present(pmd) && userfaultfd_rwp(vma)) + pmd =3D pmd_modify(pmd, vma->vm_page_prot); pmd =3D clear_uffd_wp_pmd(pmd); + } set_pmd_at(mm, new_addr, new_pmd, pmd); if (force_flush) flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); @@ -2766,6 +2780,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct v= m_area_struct *vma, */ entry =3D pmd_clear_uffd(entry); =20 + /* See change_pte_range(): preserve RWP protection across mprotect() */ + if (userfaultfd_rwp(vma) && pmd_uffd(entry)) + entry =3D pmd_modify(entry, PAGE_NONE); + /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && can_change_pmd_writable(vma, addr, entry)) @@ -2933,6 +2951,13 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t = *dst_pmd, pmd_t *src_pmd, pm _dst_pmd =3D move_soft_dirty_pmd(src_pmdval); _dst_pmd =3D clear_uffd_wp_pmd(_dst_pmd); } + + /* Re-arm RWP on the moved PMD if dst_vma is RWP-registered. */ + if (userfaultfd_rwp(dst_vma)) { + _dst_pmd =3D pmd_modify(_dst_pmd, PAGE_NONE); + _dst_pmd =3D pmd_mkuffd(_dst_pmd); + } + set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd); =20 src_pgtable =3D pgtable_trans_huge_withdraw(mm, src_pmd); @@ -3109,6 +3134,11 @@ static void __split_huge_zero_page_pmd(struct vm_are= a_struct *vma, entry =3D pte_mkspecial(entry); if (pmd_uffd(old_pmd)) entry =3D pte_mkuffd(entry); + + /* Restore PAGE_NONE so an RWP marker keeps trapping */ + if (userfaultfd_rwp(vma) && pmd_uffd(old_pmd)) + entry =3D pte_modify(entry, PAGE_NONE); + VM_BUG_ON(!pte_none(ptep_get(pte))); set_pte_at(mm, addr, pte, entry); pte++; @@ -3383,6 +3413,10 @@ static void __split_huge_pmd_locked(struct vm_area_s= truct *vma, pmd_t *pmd, if (uffd_wp) entry =3D pte_mkuffd(entry); =20 + /* Restore PAGE_NONE so an RWP marker keeps trapping */ + if (userfaultfd_rwp(vma) && uffd_wp) + entry =3D pte_modify(entry, PAGE_NONE); + for (i =3D 0; i < HPAGE_PMD_NR; i++) VM_WARN_ON(!pte_none(ptep_get(pte + i))); =20 @@ -5055,6 +5089,11 @@ void 
remove_migration_pmd(struct page_vma_mapped_wal= k *pvmw, struct page *new) pmde =3D pmd_mkwrite(pmde, vma); if (pmd_swp_uffd(*pvmw->pmd)) pmde =3D pmd_mkuffd(pmde); + + /* See do_swap_page(): restore PAGE_NONE for RWP */ + if (pmd_swp_uffd(*pvmw->pmd) && userfaultfd_rwp(vma)) + pmde =3D pmd_modify(pmde, PAGE_NONE); + if (!softleaf_is_migration_young(entry)) pmde =3D pmd_mkold(pmde); /* NOTE: this may contain setting soft-dirty on some archs */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4d75b69d4272..0d8d39cd8888 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4843,8 +4843,16 @@ hugetlb_install_folio(struct vm_area_struct *vma, pt= e_t *ptep, unsigned long add =20 __folio_mark_uptodate(new_folio); hugetlb_add_new_anon_rmap(new_folio, vma, addr); - if (userfaultfd_wp(vma) && huge_pte_uffd(old)) + if (userfaultfd_protected(vma) && huge_pte_uffd(old)) { newpte =3D huge_pte_mkuffd(newpte); + /* Restore PAGE_NONE so the RWP marker keeps trapping. */ + if (userfaultfd_rwp(vma)) { + unsigned int shift =3D huge_page_shift(hstate_vma(vma)); + + newpte =3D huge_pte_modify(newpte, PAGE_NONE); + newpte =3D arch_make_huge_pte(newpte, shift, vma->vm_flags); + } + } set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz); hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); folio_set_hugetlb_migratable(new_folio); @@ -4917,7 +4925,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, =20 softleaf =3D softleaf_from_pte(entry); if (unlikely(softleaf_is_hwpoison(softleaf))) { - if (!userfaultfd_wp(dst_vma)) + if (!userfaultfd_protected(dst_vma)) entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(softleaf_is_migration(softleaf))) { @@ -4931,11 +4939,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, = struct mm_struct *src, softleaf =3D make_readable_migration_entry( swp_offset(softleaf)); entry =3D swp_entry_to_pte(softleaf); - if (userfaultfd_wp(src_vma) && uffd) + if 
(userfaultfd_protected(src_vma) && uffd) entry =3D pte_swp_mkuffd(entry); set_huge_pte_at(src, addr, src_pte, entry, sz); } - if (!userfaultfd_wp(dst_vma)) + if (!userfaultfd_protected(dst_vma)) entry =3D huge_pte_clear_uffd(entry); set_huge_pte_at(dst, addr, dst_pte, entry, sz); } else if (unlikely(pte_is_marker(entry))) { @@ -5000,6 +5008,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, s= truct mm_struct *src, goto next; } =20 + /* See __copy_present_ptes(): restore accessible protection. */ + if (!userfaultfd_protected(dst_vma)) { + if (userfaultfd_rwp(src_vma) && huge_pte_uffd(entry)) { + entry =3D huge_pte_modify(entry, dst_vma->vm_page_prot); + entry =3D arch_make_huge_pte(entry, huge_page_shift(h), + dst_vma->vm_flags); + } + entry =3D huge_pte_clear_uffd(entry); + } + if (cow) { /* * No need to notify as we are downgrading page @@ -5012,9 +5030,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, st= ruct mm_struct *src, entry =3D huge_pte_wrprotect(entry); } =20 - if (!userfaultfd_wp(dst_vma)) - entry =3D huge_pte_clear_uffd(entry); - set_huge_pte_at(dst, addr, dst_pte, entry, sz); hugetlb_count_add(npages, dst); } @@ -5060,10 +5075,22 @@ static void move_huge_pte(struct vm_area_struct *vm= a, unsigned long old_addr, huge_pte_clear(mm, new_addr, dst_pte, sz); } else { if (need_clear_uffd_wp) { - if (pte_present(pte)) + if (pte_present(pte)) { + /* + * See __copy_present_ptes(): normalise RWP + * PTEs so the destination starts accessible + * instead of taking a numa-hinting fault on + * first access. 
+ */ + if (userfaultfd_rwp(vma)) { + pte =3D huge_pte_modify(pte, vma->vm_page_prot); + pte =3D arch_make_huge_pte(pte, huge_page_shift(h), + vma->vm_flags); + } pte =3D huge_pte_clear_uffd(pte); - else + } else { pte =3D pte_swp_clear_uffd(pte); + } } set_huge_pte_at(mm, new_addr, dst_pte, pte, sz); } @@ -6515,6 +6542,13 @@ long hugetlb_change_protection(struct vm_area_struct= *vma, pte =3D huge_pte_mkuffd(pte); else if (uffd_wp_resolve || uffd_rwp_resolve) pte =3D huge_pte_clear_uffd(pte); + + /* Preserve RWP protection across mprotect() */ + if (userfaultfd_rwp(vma) && huge_pte_uffd(pte)) { + pte =3D huge_pte_modify(pte, PAGE_NONE); + pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); + } + huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; tlb_remove_huge_tlb_entry(h, &tlb, ptep, address); diff --git a/mm/memory.c b/mm/memory.c index c4fd5cb4a08f..06473285c0dc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -896,6 +896,10 @@ static void restore_exclusive_pte(struct vm_area_struc= t *vma, if (pte_swp_uffd(orig_pte)) pte =3D pte_mkuffd(pte); =20 + /* See do_swap_page(): restore PAGE_NONE for RWP */ + if (pte_swp_uffd(orig_pte) && userfaultfd_rwp(vma)) + pte =3D pte_modify(pte, PAGE_NONE); + if ((vma->vm_flags & VM_WRITE) && can_change_pte_writable(vma, address, pte)) { if (folio_test_dirty(folio)) @@ -1041,7 +1045,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct = mm_struct *src_mm, make_pte_marker(marker)); return 0; } - if (!userfaultfd_wp(dst_vma)) + if (!userfaultfd_protected(dst_vma)) pte =3D pte_swp_clear_uffd(pte); set_pte_at(dst_mm, addr, dst_pte, pte); return 0; @@ -1088,9 +1092,13 @@ copy_present_page(struct vm_area_struct *dst_vma, st= ruct vm_area_struct *src_vma /* All done, just insert the new page copy in the child */ pte =3D folio_mk_pte(new_folio, dst_vma->vm_page_prot); pte =3D maybe_mkwrite(pte_mkdirty(pte), dst_vma); - if (userfaultfd_pte_wp(dst_vma, ptep_get(src_pte))) - /* Uffd-wp needs to be delivered to 
dest pte as well */ + if (userfaultfd_protected(dst_vma) && pte_uffd(ptep_get(src_pte))) { + /* The uffd bit needs to be delivered to the dest pte as well */ pte =3D pte_mkuffd(pte); + /* Restore PAGE_NONE so the RWP marker keeps trapping */ + if (userfaultfd_rwp(dst_vma)) + pte =3D pte_modify(pte, PAGE_NONE); + } set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); return 0; } @@ -1100,9 +1108,31 @@ static __always_inline void __copy_present_ptes(stru= ct vm_area_struct *dst_vma, pte_t pte, unsigned long addr, int nr) { struct mm_struct *src_mm =3D src_vma->vm_mm; + bool writable; + + /* + * Snapshot writability before the RWP-disarm rewrite below: when the + * child is not RWP-armed, pte_modify(pte, dst_vma->vm_page_prot) can + * silently drop _PAGE_RW from a resolved (no-marker) writable PTE, + * so a later pte_write(pte) check would skip the COW wrprotect and + * leave the parent writable over a folio shared with the child. + */ + writable =3D pte_write(pte); + + /* + * Child is not RWP-armed: restore accessible protection so the + * inherited PAGE_NONE does not cost a fault on first read. Gate on + * pte_uffd(pte) so unrelated PAGE_NONE markers (e.g. NUMA balancing) + * are not normalised away. + */ + if (!userfaultfd_protected(dst_vma)) { + if (userfaultfd_rwp(src_vma) && pte_uffd(pte)) + pte =3D pte_modify(pte, dst_vma->vm_page_prot); + pte =3D pte_clear_uffd(pte); + } =20 /* If it's a COW mapping, write protect it both processes. 
*/ - if (is_cow_mapping(src_vma->vm_flags) && pte_write(pte)) { + if (is_cow_mapping(src_vma->vm_flags) && writable) { wrprotect_ptes(src_mm, addr, src_pte, nr); pte =3D pte_wrprotect(pte); } @@ -1112,9 +1142,6 @@ static __always_inline void __copy_present_ptes(struc= t vm_area_struct *dst_vma, pte =3D pte_mkclean(pte); pte =3D pte_mkold(pte); =20 - if (!userfaultfd_wp(dst_vma)) - pte =3D pte_clear_uffd(pte); - set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr); } =20 @@ -5041,6 +5068,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (pte_swp_uffd(vmf->orig_pte)) pte =3D pte_mkuffd(pte); =20 + /* + * A page reclaimed while RWP-protected carries the uffd bit on + * its swap entry. Re-apply PAGE_NONE on swap-in so the first access + * still traps as an RWP fault. pte_modify() preserves _PAGE_UFFD. + */ + if (pte_swp_uffd(vmf->orig_pte) && userfaultfd_rwp(vma)) + pte =3D pte_modify(pte, PAGE_NONE); + /* * Same logic as in do_wp_page(); however, optimize for pages that are * certainly not shared either because we just allocated them without diff --git a/mm/migrate.c b/mm/migrate.c index 4bdb5be7afbf..8d7fd0b056b6 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -329,6 +329,10 @@ static bool try_to_map_unused_to_zeropage(struct page_= vma_mapped_walk *pvmw, if (pte_swp_uffd(old_pte)) newpte =3D pte_mkuffd(newpte); =20 + /* See remove_migration_pte(): restore PAGE_NONE for RWP */ + if (pte_swp_uffd(old_pte) && userfaultfd_rwp(pvmw->vma)) + newpte =3D pte_modify(newpte, PAGE_NONE); + set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); =20 dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); @@ -394,6 +398,10 @@ static bool remove_migration_pte(struct folio *folio, else if (pte_swp_uffd(old_pte)) pte =3D pte_mkuffd(pte); =20 + /* See do_swap_page(): restore PAGE_NONE for RWP */ + if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma)) + pte =3D pte_modify(pte, PAGE_NONE); + if (folio_test_anon(folio) && !softleaf_is_migration_read(entry)) rmap_flags |=3D 
RMAP_EXCLUSIVE; =20 diff --git a/mm/mprotect.c b/mm/mprotect.c index 4a6b35482aee..e0b5fe7c66b2 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -296,6 +296,16 @@ static __always_inline void change_present_ptes(struct= mmu_gather *tlb, else if (uffd_prot_resolve) ptent =3D pte_clear_uffd(ptent); =20 + /* + * The uffd bit on a VM_UFFD_RWP VMA carries PROT_NONE + * semantics. If mprotect() or NUMA hinting changed the + * base protection, restore PAGE_NONE so the PTE still + * traps on any access. pte_modify() preserves + * _PAGE_UFFD. + */ + if (userfaultfd_rwp(vma) && pte_uffd(ptent)) + ptent =3D pte_modify(ptent, PAGE_NONE); + /* * In some writable, shared mappings, we might want * to catch actual write access -- see diff --git a/mm/mremap.c b/mm/mremap.c index 12732a5c547e..8a46ec5831c8 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -296,10 +296,19 @@ static int move_ptes(struct pagetable_move_control *p= mc, pte_clear(mm, new_addr, new_ptep); else { if (need_clear_uffd_wp) { - if (pte_present(pte)) + if (pte_present(pte)) { + /* + * See __copy_present_ptes(): normalise + * RWP PTEs so the destination starts + * accessible instead of taking a + * numa-hinting fault on first access. 
+ */ + if (userfaultfd_rwp(vma) && pte_uffd(pte)) + pte =3D pte_modify(pte, vma->vm_page_prot); pte =3D pte_clear_uffd(pte); - else + } else { pte =3D pte_swp_clear_uffd(pte); + } } set_ptes(mm, new_addr, new_ptep, pte, nr_ptes); } diff --git a/mm/swapfile.c b/mm/swapfile.c index 15fdca2da1f7..27cc299ead9b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2559,6 +2559,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd= _t *pmd, new_pte =3D pte_mksoft_dirty(new_pte); if (pte_swp_uffd(old_pte)) new_pte =3D pte_mkuffd(new_pte); + + /* See do_swap_page(): restore PAGE_NONE for RWP */ + if (pte_swp_uffd(old_pte) && userfaultfd_rwp(vma)) + new_pte =3D pte_modify(new_pte, PAGE_NONE); + setpte: set_pte_at(vma->vm_mm, addr, pte, new_pte); folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry))); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 9d74be69873a..e30878e4e00b 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1285,6 +1285,13 @@ static long move_present_ptes(struct mm_struct *mm, if (pte_dirty(orig_src_pte)) orig_dst_pte =3D pte_mkdirty(orig_dst_pte); orig_dst_pte =3D pte_mkwrite(orig_dst_pte, dst_vma); + + /* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */ + if (userfaultfd_rwp(dst_vma)) { + orig_dst_pte =3D pte_modify(orig_dst_pte, PAGE_NONE); + orig_dst_pte =3D pte_mkuffd(orig_dst_pte); + } + set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte); =20 src_addr +=3D PAGE_SIZE; @@ -1366,6 +1373,9 @@ static int move_swap_pte(struct mm_struct *mm, struct= vm_area_struct *dst_vma, orig_src_pte =3D ptep_get_and_clear(mm, src_addr, src_pte); if (pgtable_supports_soft_dirty()) orig_src_pte =3D pte_swp_mksoft_dirty(orig_src_pte); + /* Re-arm RWP on the moved swap entry if dst_vma is RWP-registered. 
*/ + if (userfaultfd_rwp(dst_vma)) + orig_src_pte =3D pte_swp_mkuffd(orig_src_pte); set_pte_at(mm, dst_addr, dst_pte, orig_src_pte); double_pt_unlock(dst_ptl, src_ptl); =20 @@ -1392,6 +1402,13 @@ static int move_zeropage_pte(struct mm_struct *mm, =20 zero_pte =3D pte_mkspecial(pfn_pte(zero_pfn(dst_addr), dst_vma->vm_page_prot)); + + /* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */ + if (userfaultfd_rwp(dst_vma)) { + zero_pte =3D pte_modify(zero_pte, PAGE_NONE); + zero_pte =3D pte_mkuffd(zero_pte); + } + ptep_clear_flush(src_vma, src_addr, src_pte); set_pte_at(mm, dst_addr, dst_pte, zero_pte); double_pt_unlock(dst_ptl, src_ptl); --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fhigh-c5-smtp.messagingengine.com (fhigh-b5-smtp.messagingengine.com [202.12.124.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7B9D3FBB4E; Tue, 26 May 2026 13:05:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800750; cv=none; b=RLv15gUWJ2SWiF5rMjBXLjZ86gUF5dn8KAjfDUUpJeak+Ip+m2c/q0rpsXFTtqVGfp2CJZMVmnPvtlctEXIPTUrGE6/xR8WzSM8PoOu71i7C0+2RsSwrU8Ttmmg/COAscuwmHy6SUu3AvJ+K3rNCwmVYv7YsgddEL5o7KKnG01k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800750; c=relaxed/simple; bh=wNdbRBxbi6KRek/PdU/wCEoE+a2dVe8DJ//BzMoH0j8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=jTtEpmfkhoWzo+InimGx3y7vUFkW7r94R3f0X/MzzJcsCa5fzp+bKdVNkKzVSXhhTgqR8qoc9MKSmpB3tdqn/PO0ovT2P4xjj5GDv5cXUjzKQB6C89+DgSMcR0kPOtHAvUM55gH4Kx79qFv2s7bW6uPMv1bu38FmSBHiCl28sKI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name 
header.i=@shutemov.name header.b=noa0c8Mu; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=TC5fe7Wc; arc=none smtp.client-ip=202.12.124.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="noa0c8Mu"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="TC5fe7Wc" Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfhigh.stl.internal (Postfix) with ESMTP id 8CE487A0084; Tue, 26 May 2026 09:05:47 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); Tue, 26 May 2026 09:05:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:subject:subject:to:to; s=fm2; t=1779800747; x=1779887147; bh=1V28tiHbdQZRMQ0FP+jJ6/BxhOD+MMN4 HsGSl4XnVFI=; b=noa0c8Muk2ynP86tmjaF/SEF5jlhafsg+IQ51lc+WZm8vU4s uMf5JxI633Dy+YniEkD9zjjGXc8OyzYwjrU8y48gapBfWwtaF2iKOiotIq0gbx7S lxLAD3MO9RVCWv/ef0rLA2AG+HECy5zQhUL6W70AOCbB7bckfkp4xz4VmU2L9sxM DF1YjxZZqgM0JGRhUGaUTOK8Tjn+VLZH7NSkp0rC9iIoMuTd+V5v6hv3RvDDg9Sb O8z3mht26pyhzLPwTwkyrSWutG8UvtC9kCH3YTGSooNJWRgU9w14JRZfdPdYc1wz gJU7mxxe01kkrp4rqNkGN5niVRo2TCa9NMjAuQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1779800747; x= 1779887147; 
bh=1V28tiHbdQZRMQ0FP+jJ6/BxhOD+MMN4HsGSl4XnVFI=; b=T C5fe7WcjMuIYwccTSfz704Kp/a7YuOPKZw6x7oXIZH5+gsNeQEzNiH36W86HwS9I dHBy7pGIW8KdTqibmqFgCuG8PiyereQi1EXKr4wojp4Dy9GGymQEFNXQZqi9PdvU o35ZCf/A6g+yXx3R1O0nXNDWEHPjaruZamuRklmdRQCap8699dw6CQnoU6uPH75N 21MdBN1ercuJA8aH8FL8kqDVBV5oSghfEf77jcIQq/WWLNyklnmtkC/O/VPcxPc0 oTrl2THIUF8NUpJPWRL2DMWmWXQdSsMRFT+uRW7WrvNZibn0qkDbc604vN+lmlMl dty+s9x5Zh9+yozpCeIRA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTFRxKInQFecwZPuEt+LuNBlOJGP8RsxGpm588gQviVhNq1Pa+AqWD7JQeVUR2Swte eyzLZOEYFGJPBoftF7cmhN7hkCk3wRua/o4v8Bw72KaC1CFplNf1sAsIMRN76GRZVrrXJo 4g3If4FQDoPgtrZ9saDLAexzQgvrfsTwQ3/KP0AGKCFmLiVvQMOmXCsts9ZOCGk1E95eOU mgzRkXibxyISOqgf2hZqBdgrV6ktnG5u3x8/zXY22S6cQZP9xSB0eE/pqpJrpVnOId2p4l G0binpu+ah/FFZH0esjGo/wTdPAqY9SKUVzl+zWcrHaZzSyBzEWlNRkrUo/n9HkffJzpVB tAXTyvFcGJ4L+WqhautPziTC+ot800EDoZJbryAf3DXYgwcjBJwcEUaPS9IlisE5rDyaeP 2xnQ1aNKHrCfEz/y/t7ppZXR+kp7a3bd3dRZbeeynQkzsx2Z+tXuaRc/yrxOppSCJK9yO2 tT2GNtMSSDJasr0JASxzy0b96TMuFUk06SmZwOnPhgeNNy+xy2G8pOKs0UUeabrWtvLHNS n5MXArGj3C0PDXWsCrayZxwUm3ECrUTXD1xCEH4qjhMa+Z7CWol0k1dxzdbC9NUieQXnfK 0JsXtu6F8klDC6fmPWosNL6f/JzM8PF5J9z1IdV4RbazBitWRGHwbNK9A3ow X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 26 May 2026 09:05:46 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)" Subject: [PATCH v5 11/18] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP Date: Tue, 26 May 2026 14:04:59 +0100 Message-ID: 
<20260526130509.2748441-12-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: "Kiryl Shutsemau (Meta)"

Three mm paths outside the fault handler gate on the uffd PTE bit today:
khugepaged (skip collapse on ranges carrying markers), rmap (cap unmap
batching), and GUP (force a fault through gup_can_follow_protnone()).
Extend each to treat VM_UFFD_RWP the same as VM_UFFD_WP; otherwise
per-PTE RWP state is silently destroyed or bypassed.

khugepaged: try_collapse_pte_mapped_thp() and
file_backed_vma_is_retractable() already refuse to collapse or retract
page tables on ranges carrying the uffd PTE bit. Broaden the VMA
predicate from userfaultfd_wp() to userfaultfd_protected() so
VM_UFFD_RWP ranges get the same protection. hpage_collapse_scan_pmd()
needs no change: its existing pte_uffd() check already catches an RWP
PTE because it carries the uffd bit.

rmap: folio_unmap_pte_batch() caps batching at 1 for VM_UFFD_RWP so the
restore path handles each PTE with its own marker.

GUP: gup_can_follow_protnone() forces a fault on VM_UFFD_RWP VMAs
regardless of FOLL_HONOR_NUMA_FAULT. RWP uses protnone as an
access-tracking marker, not for NUMA hinting, so any GUP, read or
write, must go through the userfaultfd fault path.
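
The GUP rule above can be modelled in userspace as a small decision
function. This is only an illustrative sketch, not the kernel code: the
demo_* names and flag values are made up for the example, the
"accessible" field stands in for vma_is_accessible(), and the final
return is an assumption about the part of gup_can_follow_protnone()
that the hunk does not show.

```c
#include <stdbool.h>

/* Illustrative stand-ins only; the real bits live in kernel headers. */
#define DEMO_VM_UFFD_RWP           (1UL << 0)
#define DEMO_FOLL_HONOR_NUMA_FAULT (1U << 0)

struct demo_vma {
	unsigned long vm_flags;
	bool accessible;	/* stand-in for vma_is_accessible() */
};

/* Returns true when GUP may follow a protnone entry without faulting. */
bool demo_gup_can_follow_protnone(const struct demo_vma *vma,
				  unsigned int flags)
{
	/* RWP uses protnone as an access marker: always take the fault. */
	if (vma->vm_flags & DEMO_VM_UFFD_RWP)
		return false;

	/* Caller does not ask to honor NUMA hinting faults. */
	if (!(flags & DEMO_FOLL_HONOR_NUMA_FAULT))
		return true;

	/*
	 * Assumption: an accessible VMA with a protnone entry means a
	 * NUMA hinting fault is pending, so GUP must not follow it.
	 */
	return !vma->accessible;
}
```

With this ordering the RWP check wins even when the caller passes no
FOLL flags at all, which is the point of placing it first.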
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- include/linux/mm.h | 10 +++++++++- mm/khugepaged.c | 18 +++++++++++------- mm/rmap.c | 2 +- 3 files changed, 21 insertions(+), 9 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 9e62946af654..87db714e1364 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4611,11 +4611,19 @@ static inline int vm_fault_to_errno(vm_fault_t vm_f= ault, int foll_flags) =20 /* * Indicates whether GUP can follow a PROT_NONE mapped page, or whether - * a (NUMA hinting) fault is required. + * a (NUMA hinting or userfaultfd RWP) fault is required. */ static inline bool gup_can_follow_protnone(const struct vm_area_struct *vm= a, unsigned int flags) { + /* + * VM_UFFD_RWP uses protnone as an access-tracking marker, not for + * NUMA hinting. GUP must always take a fault so the access is + * delivered to userfaultfd, regardless of FOLL_HONOR_NUMA_FAULT. + */ + if (vma->vm_flags & VM_UFFD_RWP) + return false; + /* * If callers don't want to honor NUMA hinting faults, no need to * determine if we would actually have to trigger a NUMA hinting fault. diff --git a/mm/khugepaged.c b/mm/khugepaged.c index afa218be15de..4f3fedcd75cf 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1895,8 +1895,11 @@ static enum scan_result try_collapse_pte_mapped_thp(= struct mm_struct *mm, unsign if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD= _ORDER)) return SCAN_VMA_CHECK; =20 - /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ - if (userfaultfd_wp(vma)) + /* + * Keep pmd pgtable while the uffd bit is in use; see comment in + * retract_page_tables(). 
+ */ + if (userfaultfd_protected(vma)) return SCAN_PTE_UFFD; =20 folio =3D filemap_lock_folio(vma->vm_file->f_mapping, @@ -2109,13 +2112,14 @@ static bool file_backed_vma_is_retractable(struct v= m_area_struct *vma) return false; =20 /* - * When a vma is registered with uffd-wp, we cannot recycle + * When a vma is registered with uffd-wp or RWP, we cannot recycle * the page table because there may be pte markers installed. - * Other vmas can still have the same file mapped hugely, but - * skip this one: it will always be mapped in small page size - * for uffd-wp registered ranges. + * VM_UFFD_RWP ranges similarly rely on per-PTE uffd state + * and cannot be recycled to a shared PMD. Other vmas can still + * have the same file mapped hugely, but skip this one: it will + * always be mapped in small page size for these registrations. */ - if (userfaultfd_wp(vma)) + if (userfaultfd_protected(vma)) return false; =20 /* diff --git a/mm/rmap.c b/mm/rmap.c index 546bc1cf9391..9fb733489898 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1965,7 +1965,7 @@ static inline unsigned int folio_unmap_pte_batch(stru= ct folio *folio, if (pte_unused(pte)) return 1; =20 - if (userfaultfd_wp(vma)) + if (userfaultfd_protected(vma)) return 1; =20 /* --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fhigh-c5-smtp.messagingengine.com (fhigh-b5-smtp.messagingengine.com [202.12.124.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D9813FC5DA; Tue, 26 May 2026 13:05:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800753; cv=none; b=VR1q5NWhshbUjicN19R1XQzldHPJoeHIHWWsBGn3YkmFuNCpCp6sQsNjB28CSAaFCPTpxz46T4DNpYkVus3kk/IdxcwbIOwoBTd7dcj1zhYashPa9mNxQdxsPVyNOKCawSncZLYXL5hIqIxHVXfjQsyuUybiplIrFMq4Hbxg7kE= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779800753; c=relaxed/simple; bh=fMwxsT+K2cfgO8/YDqNTQveTTUzEgcAcgUjYVTjeS4w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ttp8tm2a1xzr/VlR55p91oA0+S2gdpanUWnpWDTpK0534Xc0RPx6QyGmu0Q/s9C4ngGMEmnDQXtb2IQvhZdZED2BAn1LdKoNWNb217WnrRBlVOaIk36iG3jVBvoJyNgvHV0L6A0yFusJCxPbkEHa3fykFg+OqSYlJpLol6G5KXk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=lC8io2UL; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=fvWlrjs2; arc=none smtp.client-ip=202.12.124.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="lC8io2UL"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="fvWlrjs2" Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfhigh.stl.internal (Postfix) with ESMTP id 50E1E7A0146; Tue, 26 May 2026 09:05:50 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); Tue, 26 May 2026 09:05:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:subject:subject:to:to; s=fm2; t=1779800750; x=1779887150; bh=0e4WGwhn0PB0Lb/bUSloM/7IFtEUrah/ eLYoarii5hk=; b=lC8io2ULtHRkLhUO+JAUFAnqeV+JrjGqOnCfL0cGQjufTXHT eTXds1hzRHzxAg/FAEj+OYeTM0f6+zhjbj9dwmCftGKDgpIYgxyOL7Wj1tWj0tKm 
ymn776dhK74C5e6poHKSfT6fOpqpeyPzhfZgYY7fcNcGbqw2gTkfnVmxftCyMLB8 HYrbQUNkF+7dIHoPwb9054bahjx1Xmkn8q0YjcZDG3SpPyuS2/LXO6dyIPqgcG1w hBmO88JNCsj+vxaVE+GCz5PCUTacqPuP1DaZTUA9zk47A54rw9cDsPR5uI8x3PIs zmfJLCEQM3pOdFAvcKwIvNmWk7eWEJZrEsp3lg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1779800750; x= 1779887150; bh=0e4WGwhn0PB0Lb/bUSloM/7IFtEUrah/eLYoarii5hk=; b=f vWlrjs2PMpezOSavRQBJvQcUBfPduV8Vvo05fY4ZR84zLIauGNVFURBxnK+FW2Ie EuSUIBBN+c/O2tTzBamFGufMkYP+VeJtTKrB0AuqczF+vMONj9DKECHmu/rg3odY W2xDkubqAK1Ere0/r3Dd3RReBG8rAQc4Oej2KYun1FCczSaiDAgwVQRz8uurfYG6 zQeekFvr/gLyO5QC3VsCKR1JKe0STpcFxZiiFKWETgEOPxWWcVB+RLlhpWObxnlM izILu7S/BH322rT3aGvhJFpeUuxAy/mpd/zkl/Kl7Yk9w0oDcQLRlBfhNoVR/1kl hiyZ6ax1hHG9nr9UtuhWA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTFJ2bJI4xDmQgR07sJBeQn9uM13l1IDWb+O1snDuz7v5Ax8JR5dVS3tEEBYspKeR7 MIDNN1riH0BYtd9JI2yimrIxWvps4Q2SrjhhycZkM4QIBZFjrNtDvfd9yE7lY/trP7Euvo 9A1MtXG3R7pWCmhu8Xv2H+6bXibGjLwmxPYzTSQ39I/RE7M0YISHZNzeDetuGBtRspLUQb qwrBGmt3FeA1FDZuAhJpNLgyNYx8/4GBXc2FpNzxLFKAKoQWeB/nDQgSR9dKbSc5evxZSG 3jVRv3HVRInqyo7nrq6ydixC4naTHn8paXd04Htll73n7ZurarfHg1MsebK6ALakiAhuCn oDlclrUKU/Pko645/zg1XNdcArs2BVseMIXjWxVNeBUZeMjS4uoQYw8lojFWwf5+jmB2S7 kapJI/Rn4gwso1QkoFhvzAr/8e1Y2LTylcJPE5kwE8Xuaqm3nC4lBy0pcqgowFr+oVgni7 hw9DoS8LOYTeWlcJo7sOw3PthveWZq7rgH1fqOz93b4jm0ghjNUP8Z7MqgKbw0vHsCg0Iw J9NhHwuuN18HuQddabvA+Ic4waJ37XqgF5Wjig5vQ6vxw6EIkFNi2QFKqJ9eNoixEK9ZKO 1TDBT7vjzGMfzMp7q8GJnhuxALkZX+1AOTz6ePMwfOzqiJ51wRryb+4MAhzw X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 26 May 2026 09:05:49 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: 
 ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
 skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com,
 jthoughton@google.com, aarcange@redhat.com, sj@kernel.org,
 usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 12/18] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing
Date: Tue, 26 May 2026 14:05:00 +0100
Message-ID: <20260526130509.2748441-13-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: "Kiryl Shutsemau (Meta)"

Add the userspace interface for read-write protection tracking:

  - UFFDIO_REGISTER_MODE_RWP  register a range for RWP tracking
  - UFFD_FEATURE_RWP          capability bit
  - UFFDIO_RWPROTECT          install / remove RWP on a range

Introduce CONFIG_USERFAULTFD_RWP, auto-selected on 64-bit kernels with
ARCH_HAS_PTE_PROTNONE and HAVE_ARCH_USERFAULTFD_WP. The symbol gates
VM_UFFD_RWP (previously aliased to VM_NONE) and the smaps/trace-flag
hooks added in the preparatory patches; without it the UAPI bits added
here have nothing to drive and would be unreachable.

Registration sets VM_UFFD_RWP on the VMA. Combining MODE_WP with
MODE_RWP is rejected because both modes claim the uffd PTE bit.
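
The mutual exclusion between the two marker modes can be sketched as a
standalone check. This is a hypothetical userspace model, not the
validator in mm/userfaultfd.c: the DEMO_MODE_* values copy the
UFFDIO_REGISTER_MODE_* bits from the UAPI header in this patch, while
demo_register_mode_ok() and its handling of empty or unknown mode sets
are assumptions made for illustration. It also ignores whether the mode
is advertised yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as UFFDIO_REGISTER_MODE_* in the UAPI header. */
#define DEMO_MODE_MISSING ((uint64_t)1 << 0)
#define DEMO_MODE_WP      ((uint64_t)1 << 1)
#define DEMO_MODE_MINOR   ((uint64_t)1 << 2)
#define DEMO_MODE_RWP     ((uint64_t)1 << 3)

/* Hypothetical helper: true if the requested mode set is acceptable. */
bool demo_register_mode_ok(uint64_t mode)
{
	const uint64_t known = DEMO_MODE_MISSING | DEMO_MODE_WP |
			       DEMO_MODE_MINOR | DEMO_MODE_RWP;

	/* Assumption: an empty or unknown mode set is refused. */
	if (!mode || (mode & ~known))
		return false;

	/* WP and RWP both claim the single uffd PTE bit. */
	if ((mode & DEMO_MODE_WP) && (mode & DEMO_MODE_RWP))
		return false;

	return true;
}
```

The sketch returns a bool rather than an errno because the text above
does not say which error code the combined WP and RWP request gets.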
UFFDIO_RWPROTECT is the bidirectional counterpart of
UFFDIO_WRITEPROTECT:

  - MODE_RWP   change_protection() with MM_CP_UFFD_RWP installs
               PAGE_NONE and sets the uffd bit on present PTEs
  - !MODE_RWP  change_protection() with MM_CP_UFFD_RWP_RESOLVE restores
               vma->vm_page_prot and clears the bit

userfaultfd_clear_vma() runs the same resolve pass on unregister so RWP
state cannot outlive the uffd.

Re-registering a range must not drop a mode that installs per-PTE
markers (WP or RWP); doing so returns -EBUSY. This also closes a
pre-existing window where re-registering without MODE_WP would strand
uffd-wp markers: before, those caused extra write-faults but were
otherwise benign; with RWP preservation in place, a subsequent
mprotect() on a VM_UFFD_RWP VMA would silently promote the stale
markers to RWP.

The feature is not yet advertised. UFFDIO_REGISTER_MODE_RWP,
UFFD_FEATURE_RWP, and _UFFDIO_RWPROTECT are intentionally absent from
UFFD_API_REGISTER_MODES, UFFD_API_FEATURES, and UFFD_API_RANGE_IOCTLS,
so UFFDIO_API masks them out and the register-mode validator rejects
the bit. The follow-up patch adds fault dispatch and exposes the UAPI.

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 Documentation/admin-guide/mm/userfaultfd.rst |  10 +
 include/linux/userfaultfd_k.h                |   2 +
 include/uapi/linux/userfaultfd.h             |  19 ++
 mm/Kconfig                                   |   9 +
 mm/userfaultfd.c                             | 189 ++++++++++++++++++-
 5 files changed, 226 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..1e533639fd50 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -131,6 +131,16 @@ userfaults on the range registered. Not all ioctls will necessarily be
 supported for all memory types (e.g. anonymous memory vs. shmem vs.
 hugetlbfs), or all types of intercepted faults.
 
+..
note:: + + Re-registering an already-registered range must not drop any of the + modes that install per-PTE markers =E2=80=94 currently + ``UFFDIO_REGISTER_MODE_WP`` and ``UFFDIO_REGISTER_MODE_RWP``. Doing + so would strand markers with no flag to describe them, so the call + is rejected with ``-EBUSY``; userspace must issue + ``UFFDIO_UNREGISTER`` first. This differs from older kernels, which + silently replaced the mode bits on re-registration. + Userland can use the ``uffdio_register.ioctls`` to manage the virtual address space in the background (to add or potentially also remove memory from the ``userfaultfd`` registered range). This means a userfault diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 564eb2aac321..28fc44733302 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -150,6 +150,8 @@ static inline uffd_flags_t uffd_flags_set_mode(uffd_fla= gs_t flags, enum mfill_at =20 extern long uffd_wp_range(struct vm_area_struct *vma, unsigned long start, unsigned long len, bool enable_wp); +extern int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long sta= rt, + unsigned long len, bool enable_rwp); =20 /* move_pages */ void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2); diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index 2841e4ea8f2c..7b78aa3b5318 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -79,6 +79,7 @@ #define _UFFDIO_WRITEPROTECT (0x06) #define _UFFDIO_CONTINUE (0x07) #define _UFFDIO_POISON (0x08) +#define _UFFDIO_RWPROTECT (0x09) #define _UFFDIO_API (0x3F) =20 /* userfaultfd ioctl ids */ @@ -103,6 +104,8 @@ struct uffdio_continue) #define UFFDIO_POISON _IOWR(UFFDIO, _UFFDIO_POISON, \ struct uffdio_poison) +#define UFFDIO_RWPROTECT _IOWR(UFFDIO, _UFFDIO_RWPROTECT, \ + struct uffdio_rwprotect) =20 /* read() structure */ struct uffd_msg { @@ -158,6 +161,7 @@ struct uffd_msg { #define 
UFFD_PAGEFAULT_FLAG_WRITE (1<<0) /* If this was a write fault */ #define UFFD_PAGEFAULT_FLAG_WP (1<<1) /* If reason is VM_UFFD_WP */ #define UFFD_PAGEFAULT_FLAG_MINOR (1<<2) /* If reason is VM_UFFD_MINOR */ +#define UFFD_PAGEFAULT_FLAG_RWP (1<<3) /* If reason is VM_UFFD_RWP */ =20 struct uffdio_api { /* userland asks for an API number and the features to enable */ @@ -230,6 +234,11 @@ struct uffdio_api { * * UFFD_FEATURE_MOVE indicates that the kernel supports moving an * existing page contents from userspace. + * + * UFFD_FEATURE_RWP indicates that the kernel supports + * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking. + * Pages are made inaccessible via UFFDIO_RWPROTECT and faults + * are delivered when the pages are re-accessed. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -248,6 +257,7 @@ struct uffdio_api { #define UFFD_FEATURE_POISON (1<<14) #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) +#define UFFD_FEATURE_RWP (1<<17) __u64 features; =20 __u64 ioctls; @@ -263,6 +273,7 @@ struct uffdio_register { #define UFFDIO_REGISTER_MODE_MISSING ((__u64)1<<0) #define UFFDIO_REGISTER_MODE_WP ((__u64)1<<1) #define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2) +#define UFFDIO_REGISTER_MODE_RWP ((__u64)1<<3) __u64 mode; =20 /* @@ -356,6 +367,14 @@ struct uffdio_poison { __s64 updated; }; =20 +struct uffdio_rwprotect { + struct uffdio_range range; + /* !RWP means undo RWP-protection */ +#define UFFDIO_RWPROTECT_MODE_RWP ((__u64)1<<0) +#define UFFDIO_RWPROTECT_MODE_DONTWAKE ((__u64)1<<1) + __u64 mode; +}; + struct uffdio_move { __u64 dst; __u64 src; diff --git a/mm/Kconfig b/mm/Kconfig index 776b67c66e82..fac01bcfc0d1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1333,6 +1333,15 @@ config HAVE_ARCH_USERFAULTFD_MINOR help Arch has userfaultfd minor fault support =20 +config USERFAULTFD_RWP + def_bool y + depends on 64BIT && ARCH_HAS_PTE_PROTNONE && HAVE_ARCH_USERFAULTFD_WP + help + Userfaultfd 
read-write protection (UFFDIO_RWPROTECT) delivers a + userfaultfd notification on every access -- read or write -- to a + protected range, letting userspace observe the working set of a + process. + menuconfig USERFAULTFD bool "Enable userfaultfd() system call" depends on MMU diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e30878e4e00b..c07e3232a01a 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1157,6 +1157,75 @@ static int mwriteprotect_range(struct userfaultfd_ct= x *ctx, unsigned long start, return err; } =20 +int mrwprotect_range(struct userfaultfd_ctx *ctx, unsigned long start, + unsigned long len, bool enable_rwp) +{ + struct mm_struct *dst_mm =3D ctx->mm; + unsigned long end =3D start + len; + struct vm_area_struct *dst_vma; + unsigned int mm_cp_flags; + struct mmu_gather tlb; + bool found =3D false; + VMA_ITERATOR(vmi, dst_mm, start); + + VM_WARN_ON_ONCE(start & ~PAGE_MASK); + VM_WARN_ON_ONCE(len & ~PAGE_MASK); + VM_WARN_ON_ONCE(start + len <=3D start); + + guard(mmap_read_lock)(dst_mm); + guard(rwsem_read)(&ctx->map_changing_lock); + + if (atomic_read(&ctx->mmap_changing)) + return -EAGAIN; + + if (enable_rwp) + mm_cp_flags =3D MM_CP_UFFD_RWP; + else + mm_cp_flags =3D MM_CP_UFFD_RWP_RESOLVE; + + /* + * Pre-scan the range: validate every spanned VMA before applying + * any change_protection() so a partial failure cannot leave the + * process with only a prefix of the range re-protected. 
+	 */
+	for_each_vma_range(vmi, dst_vma, end) {
+		if (!userfaultfd_rwp(dst_vma))
+			return -ENOENT;
+
+		if (is_vm_hugetlb_page(dst_vma)) {
+			unsigned long page_mask;
+
+			page_mask = vma_kernel_pagesize(dst_vma) - 1;
+			if ((start & page_mask) || (len & page_mask))
+				return -EINVAL;
+		}
+		found = true;
+	}
+	if (!found)
+		return -ENOENT;
+
+	vma_iter_set(&vmi, start);
+	tlb_gather_mmu(&tlb, dst_mm);
+	for_each_vma_range(vmi, dst_vma, end) {
+		unsigned long vma_start = max(dst_vma->vm_start, start);
+		unsigned long vma_end = min(dst_vma->vm_end, end);
+		unsigned int flags = mm_cp_flags;
+
+		/*
+		 * On resolve, try to upgrade writability per-VMA --
+		 * MM_CP_TRY_CHANGE_WRITABLE WARNs in
+		 * maybe_change_pte_writable() if the VMA is not VM_WRITE,
+		 * and RWP can be registered on PROT_READ-only mappings.
+		 */
+		if (!enable_rwp && vma_wants_manual_pte_write_upgrade(dst_vma))
+			flags |= MM_CP_TRY_CHANGE_WRITABLE;
+
+		change_protection(&tlb, dst_vma, vma_start, vma_end, flags);
+	}
+	tlb_finish_mmu(&tlb);
+
+	return 0;
+}

 void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2)
@@ -2145,6 +2214,15 @@ static bool vma_can_userfault(struct vm_area_struct *vma, vm_flags_t vm_flags,
 	    !vma_is_anonymous(vma))
 		return false;

+	/*
+	 * RWP uses protnone as an access-tracking marker. PROT_NONE VMAs
+	 * have vm_page_prot == PAGE_NONE, so RWP resolution can't make a
+	 * page accessible -- the next access would fault again. Reject up
+	 * front instead of letting FOLL_FORCE loop on protnone+uffd PTEs.
+	 */
+	if ((vm_flags & VM_UFFD_RWP) && !vma_is_accessible(vma))
+		return false;
+
 	return ops->can_userfault(vma, vm_flags);
 }

@@ -2197,9 +2275,22 @@ static struct vm_area_struct *userfaultfd_clear_vma(struct vma_iterator *vmi,
 	if (start == vma->vm_start && end == vma->vm_end)
 		give_up_on_oom = true;

-	/* Reset ptes for the whole vma range if wr-protected */
-	if (userfaultfd_wp(vma))
-		uffd_wp_range(vma, start, end - start, false);
+	/* Clear the uffd bit and/or restore protnone PTEs */
+	if (userfaultfd_protected(vma)) {
+		unsigned int mm_cp_flags = 0;
+		struct mmu_gather tlb;
+
+		if (userfaultfd_wp(vma))
+			mm_cp_flags |= MM_CP_UFFD_WP_RESOLVE;
+		if (userfaultfd_rwp(vma))
+			mm_cp_flags |= MM_CP_UFFD_RWP_RESOLVE;
+		if (vma_wants_manual_pte_write_upgrade(vma))
+			mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
+
+		tlb_gather_mmu(&tlb, vma->vm_mm);
+		change_protection(&tlb, vma, start, end, mm_cp_flags);
+		tlb_finish_mmu(&tlb);
+	}

 	ret = vma_modify_flags_uffd(vmi, prev, vma, start, end,
 				    &new_vma_flags, NULL_VM_UFFD_CTX,
@@ -2248,6 +2339,14 @@ static int userfaultfd_register_range(struct userfaultfd_ctx *ctx,
 		    vma_test_all_mask(vma, vma_flags))
 			goto skip;

+		/*
+		 * Pre-scan in userfaultfd_register() already rejected mode
+		 * switches that would drop VM_UFFD_WP or VM_UFFD_RWP, so a
+		 * stray bit here is a bug.
+		 */
+		VM_WARN_ON_ONCE(vma->vm_userfaultfd_ctx.ctx == ctx &&
+				vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags);
+
 		if (vma->vm_start > start)
 			start = vma->vm_start;
 		vma_end = min(end, vma->vm_end);
@@ -2514,6 +2613,8 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason & VM_UFFD_WP)
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (reason & VM_UFFD_RWP)
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_RWP;
 	if (reason & VM_UFFD_MINOR)
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
 	if (features & UFFD_FEATURE_THREAD_ID)
@@ -3613,6 +3714,22 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,

 		vm_flags |= VM_UFFD_WP;
 	}
+	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP) {
+		if (!pgtable_supports_uffd() || VM_UFFD_RWP == VM_NONE)
+			goto out;
+		if (!(ctx->features & UFFD_FEATURE_RWP))
+			goto out;
+		vm_flags |= VM_UFFD_RWP;
+	}
+
+	/*
+	 * WP and RWP share the uffd PTE bit and
+	 * cannot coexist in the same VMA -- the bit would carry ambiguous
+	 * semantics. Reject the combination up front.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP))
+		goto out;
+
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		goto out;
@@ -3706,6 +3823,16 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		    cur->vm_userfaultfd_ctx.ctx != ctx)
 			goto out_unlock;

+		/*
+		 * Mode switches that drop VM_UFFD_WP or VM_UFFD_RWP would
+		 * leave PTE markers without the flag that describes them;
+		 * subsequent mprotect() would then promote stale markers
+		 * into the other mode. Require an unregister first.
+		 */
+		if (cur->vm_userfaultfd_ctx.ctx == ctx &&
+		    cur->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP) & ~vm_flags)
+			goto out_unlock;
+
 		/*
 		 * Note vmas containing huge pages
 		 */
@@ -3739,6 +3866,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR))
 			ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);

+		/* RWPROTECT is only supported for RWP ranges */
+		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_RWP))
+			ioctls_out &= ~((__u64)1 << _UFFDIO_RWPROTECT);
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
@@ -4086,6 +4217,55 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }

+static int userfaultfd_rwprotect(struct userfaultfd_ctx *ctx,
+				 unsigned long arg)
+{
+	int ret;
+	struct uffdio_rwprotect uffdio_rwp;
+	struct userfaultfd_wake_range range;
+	bool mode_rwp, mode_dontwake;
+
+	if (atomic_read(&ctx->mmap_changing))
+		return -EAGAIN;
+
+	if (copy_from_user(&uffdio_rwp, (void __user *)arg,
+			   sizeof(uffdio_rwp)))
+		return -EFAULT;
+
+	ret = validate_range(ctx->mm, uffdio_rwp.range.start,
+			     uffdio_rwp.range.len);
+	if (ret)
+		return ret;
+
+	if (uffdio_rwp.mode & ~(UFFDIO_RWPROTECT_MODE_DONTWAKE |
+				UFFDIO_RWPROTECT_MODE_RWP))
+		return -EINVAL;
+
+	mode_rwp = uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_RWP;
+	mode_dontwake = uffdio_rwp.mode & UFFDIO_RWPROTECT_MODE_DONTWAKE;
+
+	if (mode_rwp && mode_dontwake)
+		return -EINVAL;
+
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mrwprotect_range(ctx, uffdio_rwp.range.start,
+				       uffdio_rwp.range.len, mode_rwp);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
+	if (ret)
+		return ret;
+
+	if (!mode_rwp && !mode_dontwake) {
+		range.start = uffdio_rwp.range.start;
+		range.len = uffdio_rwp.range.len;
+		wake_userfault(ctx, &range);
+	}
+	return ret;
+}
+
 static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 {
 	__s64 ret;
@@ -4392,6 +4572,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_POISON:
 		ret = userfaultfd_poison(ctx, arg);
 		break;
+	case UFFDIO_RWPROTECT:
+		ret = userfaultfd_rwprotect(ctx, arg);
+		break;
 	}
 	return ret;
 }
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 13/18] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP
Date: Tue, 26 May 2026 14:05:01 +0100
Message-ID: <20260526130509.2748441-14-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
From: "Kiryl Shutsemau (Meta)"

Wire the fault
side of read-write protection tracking and turn the userspace
interface on.

An RWP-protected PTE is PAGE_NONE with the uffd bit set. The PROT_NONE
triggers a fault on any access; the uffd bit distinguishes it from
plain mprotect(PROT_NONE) or NUMA hinting.

Fault dispatch, per level:

  PTE      handle_pte_fault()   -> do_uffd_rwp()
  PMD      __handle_mm_fault()  -> do_huge_pmd_uffd_rwp()
  hugetlb  hugetlb_fault()      -> hugetlb_handle_userfault()

The RWP branches gate on userfaultfd_pte_rwp() /
userfaultfd_huge_pmd_rwp() (VM_UFFD_RWP plus the uffd bit) and fall
through to do_numa_page() / do_huge_pmd_numa_page() otherwise. Each
delivers a UFFD_PAGEFAULT_FLAG_RWP message through handle_userfault();
the handler resolves it with UFFDIO_RWPROTECT clearing MODE_RWP.

userfaultfd_must_wait() and userfaultfd_huge_must_wait() add matching
protnone+uffd waiters so sync-mode fault handlers block correctly.

Expose the UAPI:

  UFFDIO_REGISTER_MODE_RWP  -> UFFD_API_REGISTER_MODES
  UFFD_FEATURE_RWP          -> UFFD_API_FEATURES
  _UFFDIO_RWPROTECT         -> UFFD_API_RANGE_IOCTLS
                               UFFD_API_RANGE_IOCTLS_BASIC

UFFD_FEATURE_RWP is masked out at UFFDIO_API time when PROT_NONE is not
available or VM_UFFD_RWP aliases VM_NONE (32-bit), so userspace never
sees an advertised-but-broken feature.

Works on anonymous, shmem, and hugetlb memory.
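
For illustration only (not part of the patch), the userspace side of
the resolve step described above might look roughly like the sketch
below. The flag and mode values are copied from the UAPI changes in
this series; struct sketch_rwprotect and sketch_resolve_request() are
hypothetical stand-ins so the example builds without the new headers,
and the actual ioctl(uffd, UFFDIO_RWPROTECT, ...) call is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the UAPI header changes in this series. */
#define UFFD_PAGEFAULT_FLAG_WRITE	(1 << 0)
#define UFFD_PAGEFAULT_FLAG_WP		(1 << 1)
#define UFFD_PAGEFAULT_FLAG_MINOR	(1 << 2)
#define UFFD_PAGEFAULT_FLAG_RWP		(1 << 3)

#define UFFDIO_RWPROTECT_MODE_RWP	((uint64_t)1 << 0)
#define UFFDIO_RWPROTECT_MODE_DONTWAKE	((uint64_t)1 << 1)

/* Hypothetical mirror of struct uffdio_rwprotect (range + mode). */
struct sketch_rwprotect {
	uint64_t start;
	uint64_t len;
	uint64_t mode;
};

/*
 * Hypothetical helper: given the flags and address from a pagefault
 * message, build the request that would resolve an RWP fault for the
 * containing page. Returns 1 if the fault is an RWP fault and *req was
 * filled in, 0 if the fault belongs to some other mode.
 */
static int sketch_resolve_request(uint64_t fault_flags, uint64_t fault_addr,
				  uint64_t page_size,
				  struct sketch_rwprotect *req)
{
	/* page_size must be a non-zero power of two for the mask below. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (!(fault_flags & UFFD_PAGEFAULT_FLAG_RWP))
		return 0;

	req->start = fault_addr & ~(page_size - 1);
	req->len = page_size;
	/* MODE_RWP clear means "undo RWP"; DONTWAKE clear wakes the waiter. */
	req->mode = 0;
	return 1;
}
```

A real handler would read a struct uffd_msg from the userfaultfd, pass
msg.arg.pagefault.flags and msg.arg.pagefault.address to a helper like
this, record the access, and then issue the ioctl with the resulting
range.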

Signed-off-by: Kiryl Shutsemau
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Mike Rapoport (Microsoft)
---
 include/linux/huge_mm.h          |  7 +++++++
 include/linux/userfaultfd_k.h    | 24 ++++++++++++++++++++++++
 include/uapi/linux/userfaultfd.h | 12 ++++++++----
 mm/huge_memory.c                 |  5 +++++
 mm/hugetlb.c                     | 11 +++++++++++
 mm/memory.c                      | 29 +++++++++++++++++++++++++++--
 mm/userfaultfd.c                 | 32 ++++++++++++++++++++++++++++++--
 7 files changed, 112 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index edece3e26985..fe48d76957fb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -529,6 +529,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)

 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);

+vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf);
+
 vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);

 extern struct folio *huge_zero_folio;
@@ -716,6 +718,11 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 	return NULL;
 }

+static inline vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf)
+{
+	return 0;
+}
+
 static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 {
 	return 0;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 28fc44733302..332fad1560ec 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -233,6 +233,18 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
 	return userfaultfd_wp(vma) && pmd_uffd(pmd);
 }

+static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma,
+				       pte_t pte)
+{
+	return userfaultfd_rwp(vma) && pte_uffd(pte);
+}
+
+static inline bool userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma,
+					    pmd_t pmd)
+{
+	return userfaultfd_rwp(vma) && pmd_uffd(pmd);
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & __VM_UFFD_FLAGS;
@@ -363,6 +375,18 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
 	return false;
 }

+static inline bool userfaultfd_pte_rwp(struct vm_area_struct *vma,
+				       pte_t pte)
+{
+	return false;
+}
+
+static inline bool userfaultfd_huge_pmd_rwp(struct vm_area_struct *vma,
+					    pmd_t pmd)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7b78aa3b5318..d803e76d47ad 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -25,7 +25,8 @@
 #define UFFD_API ((__u64)0xAA)
 #define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING | \
 				 UFFDIO_REGISTER_MODE_WP | \
-				 UFFDIO_REGISTER_MODE_MINOR)
+				 UFFDIO_REGISTER_MODE_MINOR | \
+				 UFFDIO_REGISTER_MODE_RWP)
 #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \
 			   UFFD_FEATURE_EVENT_FORK | \
 			   UFFD_FEATURE_EVENT_REMAP | \
@@ -42,7 +43,8 @@
 			   UFFD_FEATURE_WP_UNPOPULATED | \
 			   UFFD_FEATURE_POISON | \
 			   UFFD_FEATURE_WP_ASYNC | \
-			   UFFD_FEATURE_MOVE)
+			   UFFD_FEATURE_MOVE | \
+			   UFFD_FEATURE_RWP)
 #define UFFD_API_IOCTLS \
 	((__u64)1 << _UFFDIO_REGISTER | \
 	 (__u64)1 << _UFFDIO_UNREGISTER | \
@@ -54,13 +56,15 @@
 	 (__u64)1 << _UFFDIO_MOVE | \
 	 (__u64)1 << _UFFDIO_WRITEPROTECT | \
 	 (__u64)1 << _UFFDIO_CONTINUE | \
-	 (__u64)1 << _UFFDIO_POISON)
+	 (__u64)1 << _UFFDIO_POISON | \
+	 (__u64)1 << _UFFDIO_RWPROTECT)
 #define UFFD_API_RANGE_IOCTLS_BASIC \
 	((__u64)1 << _UFFDIO_WAKE | \
 	 (__u64)1 << _UFFDIO_COPY | \
 	 (__u64)1 << _UFFDIO_WRITEPROTECT | \
 	 (__u64)1 << _UFFDIO_CONTINUE | \
-	 (__u64)1 << _UFFDIO_POISON)
+	 (__u64)1 << _UFFDIO_POISON | \
+	 (__u64)1 << _UFFDIO_RWPROTECT)

 /*
  * Valid ioctl command number range with this API is from 0x00 to
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6417d883d2e4..72cb44332004 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2289,6 +2289,11 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
 	return pmd_dirty(pmd);
 }

+vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf)
+{
+	return handle_userfault(vmf, VM_UFFD_RWP);
+}
+
 /* NUMA hinting page fault entry point for trans huge pmds */
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0d8d39cd8888..d4da39d698b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6062,6 +6062,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out_mutex;
 	}

+	/*
+	 * Protnone hugetlb PTEs with the uffd bit are used by
+	 * userfaultfd RWP for access tracking. Plain PROT_NONE (without the
+	 * marker) is not an RWP fault and is not expected on hugetlb (no
+	 * NUMA hinting), so let normal hugetlb fault handling proceed.
+	 */
+	if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) &&
+	    userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) {
+		return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP);
+	}
+
 	/*
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that any
diff --git a/mm/memory.c b/mm/memory.c
index 06473285c0dc..111fdae14120 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6122,6 +6122,14 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
 		if (!pte_present(ptent) || !pte_protnone(ptent))
 			continue;

+		/*
+		 * RWP-armed PTEs are also protnone but carry _PAGE_UFFD as a
+		 * marker. Leave them alone -- rewriting to vm_page_prot would
+		 * stop the RWP trap.
+		 */
+		if (pte_uffd(ptent))
+			continue;
+
 		if (pfn_folio(pte_pfn(ptent)) != folio)
 			continue;

@@ -6137,6 +6145,12 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
 	}
 }

+static vm_fault_t do_uffd_rwp(struct vm_fault *vmf)
+{
+	pte_unmap(vmf->pte);
+	return handle_userfault(vmf, VM_UFFD_RWP);
+}
+
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -6412,8 +6426,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (!pte_present(vmf->orig_pte))
 		return do_swap_page(vmf);

-	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
+	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
+		/*
+		 * RWP-protected PTEs are protnone plus the uffd bit. On a
+		 * VM_UFFD_RWP VMA, a protnone PTE without the uffd bit is
+		 * NUMA hinting and must still fall through to do_numa_page().
+		 */
+		if (userfaultfd_pte_rwp(vmf->vma, vmf->orig_pte))
+			return do_uffd_rwp(vmf);
 		return do_numa_page(vmf);
+	}

 	spin_lock(vmf->ptl);
 	entry = vmf->orig_pte;
@@ -6527,8 +6549,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 			return 0;
 		}
 		if (pmd_trans_huge(vmf.orig_pmd)) {
-			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
+			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
+				if (userfaultfd_huge_pmd_rwp(vma, vmf.orig_pmd))
+					return do_huge_pmd_uffd_rwp(&vmf);
 				return do_huge_pmd_numa_page(&vmf);
+			}

 			if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
 			    !pmd_write(vmf.orig_pmd)) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index c07e3232a01a..db3707b9d977 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -2668,6 +2668,12 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		return true;
+	/*
+	 * PTE is still RW-protected (protnone with uffd bit), wait for
+	 * resolution. Plain PROT_NONE without the marker is not an RWP fault.
+	 */
+	if (pte_protnone(pte) && huge_pte_uffd(pte) && (reason & VM_UFFD_RWP))
+		return true;

 	return false;
 }
@@ -2728,8 +2734,14 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	if (!pmd_present(_pmd))
 		return false;

-	if (pmd_trans_huge(_pmd))
-		return !pmd_write(_pmd) && (reason & VM_UFFD_WP);
+	if (pmd_trans_huge(_pmd)) {
+		if (!pmd_write(_pmd) && (reason & VM_UFFD_WP))
+			return true;
+		if (pmd_protnone(_pmd) && pmd_uffd(_pmd) &&
+		    (reason & VM_UFFD_RWP))
+			return true;
+		return false;
+	}

 	pte = pte_offset_map(pmd, address);
 	if (!pte)
@@ -2765,6 +2777,13 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (!pte_write(ptent) && (reason & VM_UFFD_WP))
 		goto out;
+	/*
+	 * PTE is still RW-protected (protnone with uffd bit), wait for
+	 * userspace to resolve. Plain PROT_NONE without the marker is not
+	 * an RWP fault.
+	 */
+	if (pte_protnone(ptent) && pte_uffd(ptent) && (reason & VM_UFFD_RWP))
+		goto out;

 	ret = false;
 out:
@@ -4506,6 +4525,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
 		uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
 	}
+	/*
+	 * RWP needs both PROT_NONE support and the uffd-wp PTE bit. The
+	 * VM_UFFD_RWP check covers compile-time unavailability; the
+	 * pgtable_supports_uffd() check covers runtime (e.g. riscv
+	 * without the SVRSW60T59B extension) where the PTE bit is declared
+	 * but not actually usable.
+	 */
+	if (VM_UFFD_RWP == VM_NONE || !pgtable_supports_uffd())
+		uffdio_api.features &= ~UFFD_FEATURE_RWP;

 	ret = -EINVAL;
 	if (features & ~uffdio_api.features)
-- 
2.54.0

From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)"
Subject: [PATCH v5 14/18] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking
Date: Tue, 26 May 2026 14:05:02 +0100
Message-ID: <20260526130509.2748441-15-kirill@shutemov.name>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name>
References: <20260526130509.2748441-1-kirill@shutemov.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: "Kiryl Shutsemau (Meta)"

PAGEMAP_SCAN already reports
PAGE_IS_WRITTEN from the inverted uffd PTE bit, targeting the
UFFDIO_WRITEPROTECT workflow. UFFDIO_RWPROTECT reuses the same PTE bit
as a marker for read-write protection, but "has been written" and "has
been accessed" are distinct semantic signals -- they happen to share
one PTE bit today only because the two implementations share
infrastructure.

Give RWP its own pagemap category so the UAPI does not conflate them:

  PAGE_IS_WRITTEN   reported on VM_UFFD_WP VMAs,  !pte_uffd(pte)
  PAGE_IS_ACCESSED  reported on VM_UFFD_RWP VMAs, !pte_uffd(pte)

Both still read the same PTE bit today, but each is scoped to the VMA
whose registered mode makes the bit meaningful. If a future
implementation moves RWP to a separate PTE bit, only PAGE_IS_ACCESSED
switches over.

This is a UAPI narrowing. Outside VM_UFFD_WP VMAs the uffd bit is
always clear, so PAGEMAP_SCAN used to flag PAGE_IS_WRITTEN on every
present PTE there -- a meaningless duplicate of PAGE_IS_PRESENT. Now
PAGE_IS_WRITTEN fires only inside VM_UFFD_WP VMAs.

pagemap_hugetlb_category() now takes the vma like its PTE/PMD peers.
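
For illustration only, the scoping rule described above can be modelled
in a few lines of standalone C. This is a sketch, not the kernel code:
demo_uffd_categories() and the DEMO_ constants are hypothetical names,
the three booleans stand in for userfaultfd_wp(vma),
userfaultfd_rwp(vma) and pte_uffd(pte), and the bit values are assumed
to match PAGE_IS_WRITTEN (1 << 1) and PAGE_IS_ACCESSED (1 << 9) in
include/uapi/linux/fs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the UAPI bits; the DEMO_ prefix marks them as
 * part of this sketch rather than the real header. */
#define DEMO_PAGE_IS_WRITTEN  (UINT64_C(1) << 1)
#define DEMO_PAGE_IS_ACCESSED (UINT64_C(1) << 9)

/*
 * Model of the per-entry decision: the uffd bit is inverted, so a
 * clear bit means "touched since protection was applied". Which
 * category that maps to depends on the mode registered on the VMA.
 */
static uint64_t demo_uffd_categories(bool vma_wp, bool vma_rwp,
				     bool uffd_bit_set)
{
	uint64_t categories = 0;

	/* The series documents WP and RWP as mutually exclusive per VMA. */
	assert(!(vma_wp && vma_rwp));

	if (!uffd_bit_set) {
		if (vma_wp)
			categories |= DEMO_PAGE_IS_WRITTEN;
		if (vma_rwp)
			categories |= DEMO_PAGE_IS_ACCESSED;
	}

	return categories;
}
```

In this model a VMA registered in neither mode yields no category from
the uffd bit at all, which is the narrowing described above.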
Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- Documentation/admin-guide/mm/pagemap.rst | 13 +++- fs/proc/task_mmu.c | 75 ++++++++++++++++++------ include/uapi/linux/fs.h | 1 + tools/include/uapi/linux/fs.h | 1 + 4 files changed, 69 insertions(+), 21 deletions(-) diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin= -guide/mm/pagemap.rst index c57e61b5d8aa..ffa690a171c8 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -19,8 +19,11 @@ There are four components to pagemap: * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst) * Bit 56 page exclusively mapped (since 4.2) - * Bit 57 pte is uffd-wp write-protected (since 5.13) (see - Documentation/admin-guide/mm/userfaultfd.rst) + * Bit 57 pte is tracked by userfaultfd (since 5.13) =E2=80=94 in a + ``VM_UFFD_WP`` VMA this indicates a write-protected PTE; in a + ``VM_UFFD_RWP`` VMA it indicates an RWP-protected PTE. WP and + RWP are mutually exclusive per VMA, so the meaning is + unambiguous. See Documentation/admin-guide/mm/userfaultfd.rst. * Bit 58 pte is a guard region (since 6.15) (see madvise (2) man p= age) * Bits 59-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) @@ -244,7 +247,8 @@ in this IOCTL: Following flags about pages are currently supported: =20 - ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled -- ``PAGE_IS_WRITTEN`` - Page has been written to from the time it was writ= e protected +- ``PAGE_IS_WRITTEN`` - Page in a ``UFFDIO_REGISTER_MODE_WP`` VMA has been + written to since it was write-protected. Only reported inside such VMAs. 
- ``PAGE_IS_FILE`` - Page is file backed - ``PAGE_IS_PRESENT`` - Page is present in the memory - ``PAGE_IS_SWAPPED`` - Page is in swapped @@ -252,6 +256,9 @@ Following flags about pages are currently supported: - ``PAGE_IS_HUGE`` - Page is PMD-mapped THP or Hugetlb backed - ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty - ``PAGE_IS_GUARD`` - Page is a part of a guard region +- ``PAGE_IS_ACCESSED`` - Page in a ``UFFDIO_REGISTER_MODE_RWP`` VMA has be= en + accessed since RWP was applied. Only reported inside such VMAs. See + Documentation/admin-guide/mm/userfaultfd.rst for the RWP workflow. =20 The ``struct pm_scan_arg`` is used as the argument of the IOCTL. =20 diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 974c5f4aa533..0db29c3a8639 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2284,7 +2284,7 @@ static const struct mm_walk_ops pagemap_ops =3D { * Bits 5-54 swap offset if swapped * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dir= ty.rst) * Bit 56 page exclusively mapped - * Bit 57 pte is uffd-wp write-protected + * Bit 57 pte is tracked by userfaultfd (uffd-wp or RWP) * Bit 58 pte is a guard region * Bits 59-60 zero * Bit 61 page is file-page or shared-anon @@ -2419,7 +2419,7 @@ static int pagemap_release(struct inode *inode, struc= t file *file) PAGE_IS_FILE | PAGE_IS_PRESENT | \ PAGE_IS_SWAPPED | PAGE_IS_PFNZERO | \ PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY | \ - PAGE_IS_GUARD) + PAGE_IS_GUARD | PAGE_IS_ACCESSED) #define PM_SCAN_FLAGS (PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC) =20 struct pagemap_scan_private { @@ -2444,8 +2444,12 @@ static unsigned long pagemap_page_category(struct pa= gemap_scan_private *p, =20 categories =3D PAGE_IS_PRESENT; =20 - if (!pte_uffd(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_uffd(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 if (p->masks_of_interest & PAGE_IS_FILE) { page =3D 
vm_normal_page(vma, addr, pte); @@ -2462,8 +2466,12 @@ static unsigned long pagemap_page_category(struct pa= gemap_scan_private *p, =20 categories =3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_any(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_swp_uffd_any(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 entry =3D softleaf_from_pte(pte); if (softleaf_is_guard_marker(entry)) @@ -2512,8 +2520,12 @@ static unsigned long pagemap_thp_category(struct pag= emap_scan_private *p, struct page *page; =20 categories |=3D PAGE_IS_PRESENT; - if (!pmd_uffd(pmd)) - categories |=3D PAGE_IS_WRITTEN; + if (!pmd_uffd(pmd)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } =20 if (p->masks_of_interest & PAGE_IS_FILE) { page =3D vm_normal_page_pmd(vma, addr, pmd); @@ -2527,8 +2539,12 @@ static unsigned long pagemap_thp_category(struct pag= emap_scan_private *p, categories |=3D PAGE_IS_SOFT_DIRTY; } else { categories |=3D PAGE_IS_SWAPPED; - if (!pmd_swp_uffd(pmd)) - categories |=3D PAGE_IS_WRITTEN; + if (!pmd_swp_uffd(pmd)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (pmd_swp_soft_dirty(pmd)) categories |=3D PAGE_IS_SOFT_DIRTY; =20 @@ -2561,7 +2577,8 @@ static void make_uffd_wp_pmd(struct vm_area_struct *v= ma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 #ifdef CONFIG_HUGETLB_PAGE -static unsigned long pagemap_hugetlb_category(pte_t pte) +static unsigned long pagemap_hugetlb_category(struct vm_area_struct *vma, + pte_t pte) { unsigned long categories =3D PAGE_IS_HUGE; =20 @@ -2576,8 +2593,12 @@ static unsigned long pagemap_hugetlb_category(pte_t = pte) if (pte_present(pte)) { categories |=3D PAGE_IS_PRESENT; =20 - if (!huge_pte_uffd(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!huge_pte_uffd(pte)) { + if 
(userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (!PageAnon(pte_page(pte))) categories |=3D PAGE_IS_FILE; if (is_zero_pfn(pte_pfn(pte))) @@ -2587,8 +2608,12 @@ static unsigned long pagemap_hugetlb_category(pte_t = pte) } else { categories |=3D PAGE_IS_SWAPPED; =20 - if (!pte_swp_uffd_any(pte)) - categories |=3D PAGE_IS_WRITTEN; + if (!pte_swp_uffd_any(pte)) { + if (userfaultfd_wp(vma)) + categories |=3D PAGE_IS_WRITTEN; + if (userfaultfd_rwp(vma)) + categories |=3D PAGE_IS_ACCESSED; + } if (pte_swp_soft_dirty(pte)) categories |=3D PAGE_IS_SOFT_DIRTY; } @@ -2677,6 +2702,18 @@ static int pagemap_scan_test_walk(unsigned long star= t, unsigned long end, bool wp_allowed =3D userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma); =20 + /* + * PM_SCAN_WP_MATCHING is the atomic read-and-reset flavour of the + * scan and is implemented for the WP marker only. Silently skip + * VM_UFFD_RWP VMAs, matching the convention used below for VMAs + * that lack the WP-async capability. Returning -EINVAL here would + * abort the walk after preceding VMAs had already been mutated, + * destroying the atomic read-and-reset guarantee. Re-arming RWP + * is done with UFFDIO_RWPROTECT(MODE_RWP). 
+ */ + if (userfaultfd_rwp(vma) && (p->arg.flags & PM_SCAN_WP_MATCHING)) + return 1; + if (!wp_allowed) { /* User requested explicit failure over wp-async capability */ if (p->arg.flags & PM_SCAN_CHECK_WPASYNC) @@ -2864,7 +2901,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, goto flush_and_return; } =20 - if (!p->arg.category_anyof_mask && !p->arg.category_inverted && + if (userfaultfd_wp(vma) && !p->arg.category_anyof_mask && + !p->arg.category_inverted && p->arg.category_mask =3D=3D PAGE_IS_WRITTEN && p->arg.return_mask =3D=3D PAGE_IS_WRITTEN) { for (addr =3D start; addr < end; pte++, addr +=3D PAGE_SIZE) { @@ -2939,7 +2977,8 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, un= signed long hmask, /* Go the short route when not write-protecting pages. */ =20 pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + categories =3D p->cur_vma_category | + pagemap_hugetlb_category(vma, pte); =20 if (!pagemap_scan_is_interesting_page(categories, p)) return 0; @@ -2951,7 +2990,7 @@ static int pagemap_scan_hugetlb_entry(pte_t *ptep, un= signed long hmask, ptl =3D huge_pte_lock(hstate_vma(vma), vma->vm_mm, ptep); =20 pte =3D huge_ptep_get(walk->mm, start, ptep); - categories =3D p->cur_vma_category | pagemap_hugetlb_category(pte); + categories =3D p->cur_vma_category | pagemap_hugetlb_category(vma, pte); =20 if (!pagemap_scan_is_interesting_page(categories, p)) goto out_unlock; diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 13f71202845e..c4aeaa0c31c7 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -455,6 +455,7 @@ typedef int __bitwise __kernel_rwf_t; #define PAGE_IS_HUGE (1 << 6) #define PAGE_IS_SOFT_DIRTY (1 << 7) #define PAGE_IS_GUARD (1 << 8) +#define PAGE_IS_ACCESSED (1 << 9) =20 /* * struct page_region - Page region with flags diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index 24ddf7bc4f25..f0a26309b6d5 
100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -364,6 +364,7 @@ typedef int __bitwise __kernel_rwf_t; #define PAGE_IS_HUGE (1 << 6) #define PAGE_IS_SOFT_DIRTY (1 << 7) #define PAGE_IS_GUARD (1 << 8) +#define PAGE_IS_ACCESSED (1 << 9) =20 /* * struct page_region - Page region with flags --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fout-c3-smtp.messagingengine.com (fout-b3-smtp.messagingengine.com [202.12.124.146]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58D4E3FDC05; Tue, 26 May 2026 13:05:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.146 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800761; cv=none; b=dLiZUV+OyidqsSbHuoy/3Ij9GTn/yvyl7rTw2b1joPdpKYvsobXgZ62C3KoeUj6sn6w8SwUK2RBgAHpH2QYF7PDqVfH7fPk86bbGLOJEj8EkUZup2O31tXOklPCK4quY1MeyIVcwJEUMuBol/smpmcoIRUvVJBh3+/octO36v0Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800761; c=relaxed/simple; bh=svRUpcdmLDUZp4HUWogd6/2hf4u66q++ufpaElVqXrU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qOgbsF3G5gRtuLIFIRmbvVy8SA1ZpGF036GsIXp6h84CZ+YbkV//3P9pD8EuQKZlxMNLZcE5emhwMu/luY73KmFG4nslOUeNH/QD5gz4TvPTYhNqw6I9c1MvxOJ6WiU0yooHAvGzOw+4F1yc9SILxtbRJ2aeT+MVBCrqZ52f3MA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=Pj5lWzCu; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=QXRbsLnl; arc=none smtp.client-ip=202.12.124.146 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="Pj5lWzCu"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="QXRbsLnl" Received: from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45]) by mailfout.stl.internal (Postfix) with ESMTP id 125D71D00136; Tue, 26 May 2026 09:05:58 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-05.internal (MEProxy); Tue, 26 May 2026 09:05:58 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:subject:subject:to:to; s=fm2; t=1779800757; x=1779887157; bh=PX7WVh3vuRh+AWp3LK4dcDvLj/dnD9e3 A7FTfSbgriA=; b=Pj5lWzCue8Q8I4gSNZUws3uuEoltzDPmchbsGtHhWMBA2CHn Gq63JIIzUI3DIkU44mTmrQJmQ0pj+GFyH+b/Zg1N0qAY5EoIlE7p6WzmGFLdOZFS YxXHdO5Y2q5SjtQTX21L5Ac0K6dYZIXt+9RsDiYt2Gm0iDG5K8TcFPdpwpoCQnVG Dc6JhD5NzxU7/MqsE7aDo6eYV8cuy2JYsbMYM4puHjakgx3clYG+EyUHNE6VjS4K oaltnFd6Xy/b6wn1hFGhu5exrowndMhxAhgYwTGaJV2IAn1Uzc4AeX/wdB3/x0L+ E0755kfdcmP5yR0ExbLVWTx5HcCaYTSN87HJhA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1779800757; x= 1779887157; bh=PX7WVh3vuRh+AWp3LK4dcDvLj/dnD9e3A7FTfSbgriA=; b=Q XRbsLnlb66uL1qv+t9az7DSjL14e+4qFSDIzEyRmgbGf/XSrVkJ/WlrSg391Idhi DEu0FkdAzcM454QVho6NFc7gEmpPsOpQLkV1677XH9B0qY3nngc9rnHAQ/jebENg K6fh1mwAh/YqNxeKFDECsl4owTML4VguW84dRAlzZnUDYOMprldePgQ9WfNEMGg4 gRpaL5GIdrsW86GLQGHbsun3TWSyEXdGDiOqZ9wSxSd7Q3uEoO4N16LAa/1nP6Ys 
St0SO4cDuFKf9cfylwscr8X2swCGz8JpoJJTEeVEoS3RJ81IxBvPTXbq7gPOSLX/ oD185uKnQU4aQ6XK0hllg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTFRxKInQFecwZPuEt+LuNBlOJGP8RsxGpm588gQviVhNq1Pa+AqWD7JQeVUR2Swte eyzLZOEYFGJPBoftF7cmhN7hkCk3wRua/o4v8Bw72KaC1CFplNf1sAsIMRN76GRZVrrXJo 4g3If4FQDoPgtrZ9saDLAexzQgvrfsTwQ3/KP0AGKCFmLiVvQMOmXCsts9ZOCGk1E95eOU mgzRkXibxyISOqgf2hZqBdgrV6ktnG5u3x8/zXY22S6cQZP9xSB0eE/pqpJrpVnOId2p4l G0binpu+ah/FFZH0esjGo/wTdPAqY9SKUVzl+zWcrHaZzSyBzEWlNRkrUo/n9HkffJzpIG YNd96Qkbq/e1nWeI9VaXxlgB6bhigF2yF9kzoOWWlkbPaWEfZgm9ZlbVFTye1qLkpHcjku Iaz8PgLGUHYvM/ync2TXfGgwLkD6Vz5TZO6pdeXMBTFDj3tOs7FT0dO3o4taXMCZwG8J1q 8DhkwncpnIntNd6tei3BLHi69PfftgA8eOctrYKfllNQgHdVxpSG6lOkga+WwCKwOsRuZt e7C3UM1H7m50VgshlO21HEXHe8MXOntZkRuVTS3U0qT8joOsFHgDYmxr+HnD62HyL5BgHH ESVpY1z4Mo5iBeMXEk03q9wHa91ACKa+7sQXLltbUnBuUffFpzCIUErPhXCA X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 26 May 2026 09:05:57 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)" Subject: [PATCH v5 15/18] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Date: Tue, 26 May 2026 14:05:03 +0100 Message-ID: <20260526130509.2748441-16-kirill@shutemov.name> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name> References: <20260526130509.2748441-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" Sync RWP delivers a message and blocks the faulting thread until the handler resolves the fault. For working-set tracking the VMM does not need the message: it just needs to know, at scan time, which pages were touched. Async RWP serves that use case =E2=80=94 the kernel restores access in-place and the faulting thread continues without blocking. The VMM reconstructs the access pattern after the fact via PAGEMAP_SCAN: pages whose uffd bit is still set (inverted PAGE_IS_ACCESSED) were not re-accessed since the last RWP cycle. Worth calling out: async resolution upgrades writable private anon PTEs via pte_mkwrite() when can_change_pte_writable() allows, mirroring do_numa_page(). Without it, every re-access of an RWP'd writable page would COW-fault a second time. UFFD_FEATURE_RWP_ASYNC requires UFFD_FEATURE_RWP. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Acked-by: Mike Rapoport (Microsoft) --- include/linux/userfaultfd_k.h | 6 ++++++ include/uapi/linux/userfaultfd.h | 11 ++++++++++- mm/huge_memory.c | 25 ++++++++++++++++++++++++- mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++- mm/memory.c | 27 +++++++++++++++++++++++++-- mm/userfaultfd.c | 19 ++++++++++++++++++- 6 files changed, 114 insertions(+), 6 deletions(-) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 332fad1560ec..87386b79049e 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -278,6 +278,7 @@ extern void userfaultfd_unmap_complete(struct mm_struct= *mm, struct list_head *uf); extern bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma); extern bool userfaultfd_wp_async(struct vm_area_struct *vma); +extern bool userfaultfd_rwp_async(struct vm_area_struct *vma); =20 static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma) { @@ -456,6 +457,11 @@ static inline bool 
userfaultfd_wp_async(struct vm_area= _struct *vma) return false; } =20 +static inline bool userfaultfd_rwp_async(struct vm_area_struct *vma) +{ + return false; +} + static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct = *vma) { return false; diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index d803e76d47ad..c10f08f8a618 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -44,7 +44,8 @@ UFFD_FEATURE_POISON | \ UFFD_FEATURE_WP_ASYNC | \ UFFD_FEATURE_MOVE | \ - UFFD_FEATURE_RWP) + UFFD_FEATURE_RWP | \ + UFFD_FEATURE_RWP_ASYNC) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -243,6 +244,13 @@ struct uffdio_api { * UFFDIO_REGISTER_MODE_RWP for read-write protection tracking. * Pages are made inaccessible via UFFDIO_RWPROTECT and faults * are delivered when the pages are re-accessed. + * + * UFFD_FEATURE_RWP_ASYNC indicates asynchronous mode for + * UFFDIO_REGISTER_MODE_RWP. When set, faults on read-write + * protected pages are auto-resolved by the kernel (PTE + * permissions restored immediately) without delivering a message + * to the userfaultfd handler. Use PAGEMAP_SCAN with inverted + * PAGE_IS_ACCESSED to find pages that were not re-accessed. 
*/ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -262,6 +270,7 @@ struct uffdio_api { #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) #define UFFD_FEATURE_RWP (1<<17) +#define UFFD_FEATURE_RWP_ASYNC (1<<18) __u64 features; =20 __u64 ioctls; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 72cb44332004..8f120452d995 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2291,7 +2291,30 @@ static inline bool can_change_pmd_writable(struct vm= _area_struct *vma, =20 vm_fault_t do_huge_pmd_uffd_rwp(struct vm_fault *vmf) { - return handle_userfault(vmf, VM_UFFD_RWP); + struct vm_area_struct *vma =3D vmf->vma; + pmd_t pmd; + + if (!userfaultfd_rwp_async(vma)) + return handle_userfault(vmf, VM_UFFD_RWP); + + vmf->ptl =3D pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(pmdp_get(vmf->pmd), vmf->orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + pmd =3D pmd_modify(vmf->orig_pmd, vma->vm_page_prot); + /* pmd_modify() preserves _PAGE_UFFD; drop it on resolution */ + pmd =3D pmd_clear_uffd(pmd); + pmd =3D pmd_mkyoung(pmd); + if (!pmd_write(pmd) && + vma_wants_manual_pte_write_upgrade(vma) && + can_change_pmd_writable(vma, vmf->address, pmd)) + pmd =3D pmd_mkwrite(pmd, vma); + set_pmd_at(vma->vm_mm, vmf->address & HPAGE_PMD_MASK, + vmf->pmd, pmd); + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + return 0; } =20 /* NUMA hinting page fault entry point for trans huge pmds */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d4da39d698b8..9da52d95b3fb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6070,7 +6070,37 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struc= t vm_area_struct *vma, */ if (pte_protnone(vmf.orig_pte) && vma_is_accessible(vma) && userfaultfd_rwp(vma) && huge_pte_uffd(vmf.orig_pte)) { - return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + spinlock_t *ptl; + pte_t pte; + + /* Sync: drop hugetlb locks before blocking in 
handle_userfault() */ + if (!userfaultfd_rwp_async(vma)) + return hugetlb_handle_userfault(&vmf, mapping, VM_UFFD_RWP); + + ptl =3D huge_pte_lock(h, mm, vmf.pte); + pte =3D huge_ptep_get(mm, vmf.address, vmf.pte); + if (pte_protnone(pte) && huge_pte_uffd(pte)) { + unsigned int shift =3D huge_page_shift(h); + + pte =3D huge_pte_modify(pte, vma->vm_page_prot); + pte =3D arch_make_huge_pte(pte, shift, vma->vm_flags); + /* huge_pte_modify() preserves _PAGE_UFFD; drop it on resolution */ + pte =3D huge_pte_clear_uffd(pte); + pte =3D pte_mkyoung(pte); + /* + * Unlike do_uffd_rwp(), do not upgrade to writable + * here. Hugetlb lacks a can_change_huge_pte_writable() + * equivalent, so a write access will take a separate + * COW fault =E2=80=94 acceptable for the rare private hugetlb + * case. + */ + set_huge_pte_at(mm, vmf.address, vmf.pte, pte, + huge_page_size(h)); + update_mmu_cache(vma, vmf.address, vmf.pte); + } + spin_unlock(ptl); + ret =3D 0; + goto out_mutex; } =20 /* diff --git a/mm/memory.c b/mm/memory.c index 111fdae14120..5f56dcc2f265 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6147,8 +6147,31 @@ static void numa_rebuild_large_mapping(struct vm_fau= lt *vmf, struct vm_area_stru =20 static vm_fault_t do_uffd_rwp(struct vm_fault *vmf) { - pte_unmap(vmf->pte); - return handle_userfault(vmf, VM_UFFD_RWP); + pte_t pte; + + if (!userfaultfd_rwp_async(vmf->vma)) { + /* Sync mode: unmap PTE and deliver to userfaultfd handler */ + pte_unmap(vmf->pte); + return handle_userfault(vmf, VM_UFFD_RWP); + } + + spin_lock(vmf->ptl); + if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } + pte =3D pte_modify(vmf->orig_pte, vmf->vma->vm_page_prot); + /* pte_modify() preserves _PAGE_UFFD; drop it on resolution */ + pte =3D pte_clear_uffd(pte); + pte =3D pte_mkyoung(pte); + if (!pte_write(pte) && + vma_wants_manual_pte_write_upgrade(vmf->vma) && + can_change_pte_writable(vmf->vma, vmf->address, pte)) + pte =3D 
pte_mkwrite(pte, vmf->vma); + set_pte_at(vmf->vma->vm_mm, vmf->address, vmf->pte, pte); + update_mmu_cache(vmf->vma, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; } =20 static vm_fault_t do_numa_page(struct vm_fault *vmf) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index db3707b9d977..f40bf473a6f6 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -2487,6 +2487,11 @@ static bool userfaultfd_wp_async_ctx(struct userfaul= tfd_ctx *ctx) return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC); } =20 +static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx) +{ + return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC); +} + /* * Whether WP_UNPOPULATED is enabled on the uffd context. It is only * meaningful when userfaultfd_wp()=3D=3Dtrue on the vma and when it's @@ -4408,6 +4413,11 @@ bool userfaultfd_wp_async(struct vm_area_struct *vma) return userfaultfd_wp_async_ctx(vma->vm_userfaultfd_ctx.ctx); } =20 +bool userfaultfd_rwp_async(struct vm_area_struct *vma) +{ + return userfaultfd_rwp_async_ctx(vma->vm_userfaultfd_ctx.ctx); +} + static inline unsigned int uffd_ctx_features(__u64 user_features) { /* @@ -4511,6 +4521,12 @@ static int userfaultfd_api(struct userfaultfd_ctx *c= tx, if (features & UFFD_FEATURE_WP_ASYNC) features |=3D UFFD_FEATURE_WP_UNPOPULATED; =20 + ret =3D -EINVAL; + /* RWP_ASYNC requires RWP */ + if ((features & UFFD_FEATURE_RWP_ASYNC) && + !(features & UFFD_FEATURE_RWP)) + goto err_out; + /* report all available features and ioctls to userland */ uffdio_api.features =3D UFFD_API_FEATURES; #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR @@ -4533,7 +4549,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, * but not actually usable. 
*/ if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) - uffdio_api.features &=3D ~UFFD_FEATURE_RWP; + uffdio_api.features &=3D + ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC); =20 ret =3D -EINVAL; if (features & ~uffdio_api.features) --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fout-c3-smtp.messagingengine.com (fout-b3-smtp.messagingengine.com [202.12.124.146]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E21083FE371; Tue, 26 May 2026 13:06:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.146 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800764; cv=none; b=BCg2hCVb1h+ZKqy/EATaBinhmJk6jv7p/wj5+AlDEUtLTTfLk0VaZH6JJF0Rm2mqnxrwFlRws66Q6i/Q6hAOkZb8dt+uE4+HkBz/8fEMZ6SaCQh2rrlk5xFmhmk39pcH/XANqVS8ZC32v+wmsHq88OY/HqNZO00LXvmQ/6NB/os= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800764; c=relaxed/simple; bh=Fg0ZtKI9jIo3f0yObFyHzAfy+QBXhQpS4B42gpXHlVI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fgccOWC+bc7LQ+1tr3MTuRS4VOsqXt44KXsXNNi2nW/DsXLEWWixlyKwowE6L1u8nn0rwLDo5XZRfUPURpNt0YZcRF/C9EQWAi582CO2dDNmkHOccgMYMiKy/VKb9dYf7fCTItnSo7IgHWp5p/BP82wak7ISITaZGF3zp9B04cc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=Q4wLGDE0; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=ew91eedb; arc=none smtp.client-ip=202.12.124.146 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="Q4wLGDE0"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="ew91eedb" Received: from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45]) by mailfout.stl.internal (Postfix) with ESMTP id AB6101D00137; Tue, 26 May 2026 09:06:00 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-05.internal (MEProxy); Tue, 26 May 2026 09:06:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:date:date:from :from:in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1779800760; x= 1779887160; bh=GacYIKsvEC3Q0G1cEeb4eXR0U11nh62L+pkNkKxBJpI=; b=Q 4wLGDE0XMixu2WxHKxn7Cm8xuYKFqaat8Kj+JE+/HUbQGwUZts83X5c9tcRDmxLv IznhyyD7EI8xqzNg775D9LB9GVibUdBYVnN0QRjJj0d19BS9iHIHPiR0tOPZFb45 H4LFTH8OrI4Z0l03UBfii32PfTLZKOX1DYf1WTGyC1+6KSTNGbjiU3Izp+iboW0e 1gXvkgHJFc091o/vjEwbni6RKSf5EHIf/zZAtyzH7CEbnjYSufTSzqvs3N3antBa 0ivuiLpkDkvQ1qrHrAT4/fteBMd5ML8IqH0137xOD3Ll39YeqQYqRr/qvQzAOGXZ TmHCwVXZIJdiUf8AYhWLQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1779800760; x=1779887160; bh=G acYIKsvEC3Q0G1cEeb4eXR0U11nh62L+pkNkKxBJpI=; b=ew91eedbQJG4DdDQH aDBaNKfQQ7dczaGMtRwncBf1OxMQ7aDsNJbWdplYyKkmBQrW56cRE3n0wGF7IvzC 6IGgY/TquYm/X9naofh74WcCf7cf3Qwb5utRQNO7Q9E1acO7kdt4MVzYeKmGQuLC q0aFkqFkjUtjlJUfgjGf6iBPjqqUxXI2ZCyuBuE/FV1AM2K6QcvenIxN2uw1p3M2 mPVW4NhKVez8rofkdjcMYyAhZQEG7BzDgsy23ObDghk3oEPgdvZ53vpu4vYwc4Wz SjtW3CFN+Rm/jTZeKE7sajzsIWcqSnyEhpGpB9aKYa83g/rd6yJ1M/S6PQ0lxlnO Ndgxw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: 
dmFkZTEJiC033GhjL2GdnX/sXIn31a6xYH9Q9kJLS3tMsbkgwO/cImnHqXQKfEc55wOLx4 i2tqVUZQjDS/XsxdV1Neul2ks/uVgINeMKnTDgBTDd49K013fUhaa7mbXp5KOF+CzusOdT UiYD/yezdXIvKfe8z5w2tcDXcgJNRSU5eSwZFLQl4UtnpivyD/d0gOKuFq22BcEw4tqdAi ZrLfx6gsOEL+8JLB9yUFNWntFWViMy1Q2hMU0QsTFycK4r8MnF1MjvTDo2QKxDh/PpEdKA ZnnGzw8DPiuoI+QbA8cmDBSDEJ+Eojxn1EXRLRmRJNLXOxLbY2LwfoxO6aBp6BbF/3ouAV +rtH4r1JZbSnpOY5yE9tRPk1Hf2qqFNV17pMimdGxS0WJ3e9vVWWBY0OhpskHqwuc58Q/H c88Z/36lFToSrmUzOzYQPXAz042mFSuRSJR03wCPSDvZuE74YShMvkkUGfyynvupd6D1j0 lqPpgLnubsUDEZywqaFQ4fP1K9nyQ7vNxJUG36180U/0iZssi5sfjTWzv1W9oRpU5JvzFT mHex9P94YPFqBQUPxjSe1NxFfXNob2HrZj/btLjiMFvusBZp5jbZOLPxHoP41BinZPdTv5 +q3fjPHduTRpB2hTfs9lXf/ThqnJ9M+7rq0upIV0+dZ5IIGjE0wTjaKcbA9g X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 26 May 2026 09:05:59 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)" Subject: [PATCH v5 16/18] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Date: Tue, 26 May 2026 14:05:04 +0100 Message-ID: <20260526130509.2748441-17-kirill@shutemov.name> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name> References: <20260526130509.2748441-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kiryl Shutsemau (Meta)" Add an ioctl to toggle async 
mode at runtime without re-registering the userfaultfd. This allows a VMM to switch between sync and async RWP modes on-the-fly -- for example, starting in async mode for working set scanning, then switching to sync mode to intercept faults during page eviction. UFFDIO_SET_MODE takes an enable/disable bitmask of UFFD_FEATURE_* flags. Only UFFD_FEATURE_RWP_ASYNC is toggleable today; the ioctl rejects any other bit with -EINVAL. Enabling RWP_ASYNC also requires RWP to have been negotiated at UFFDIO_API time, mirroring the UFFDIO_API invariant. Fault-path readers of ctx->features run under mmap_read_lock or a per-VMA lock; the RMW takes mmap_write_lock and calls vma_start_write() on every UFFD-armed VMA, so those readers are fully excluded. userfaultfd_show_fdinfo(), however, reads ctx->features without any lock, so the RMW is written as a single WRITE_ONCE and fdinfo reads it with READ_ONCE. That keeps the lockless observer from seeing a mid-RMW intermediate and removes the audit burden when new toggleable bits are added later. When switching to async, pending sync waiters are woken so they retry and auto-resolve under the new mode. 
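The argument checks described above can be modelled in plain C. This is only a userspace sketch, not the kernel code: model_set_mode() and the MODEL_FEATURE_* names are hypothetical, and the bit positions are placeholders, since the real UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC values are defined in the uapi header and are not shown here. It assumes the caller passes the set of features available on the running kernel/arch.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bit positions (assumption); not the real UFFD_FEATURE_* values. */
#define MODEL_FEATURE_RWP        (UINT64_C(1) << 0)
#define MODEL_FEATURE_RWP_ASYNC  (UINT64_C(1) << 1)
/* Only the async bit may be toggled at runtime. */
#define MODEL_FEATURE_TOGGLEABLE MODEL_FEATURE_RWP_ASYNC

/*
 * Hypothetical model of the UFFDIO_SET_MODE validation.
 * On success, stores the updated feature word in *features and returns 0.
 * On failure, returns -EINVAL and leaves *features unchanged.
 */
int model_set_mode(uint64_t *features, uint64_t available,
		   uint64_t enable, uint64_t disable)
{
	/* A bit set in both masks is contradictory. */
	if (enable & disable)
		return -EINVAL;

	/* Reject any bit that is not both toggleable and available. */
	if ((enable | disable) & ~(available & MODEL_FEATURE_TOGGLEABLE))
		return -EINVAL;

	/* Async mode needs RWP to have been negotiated already. */
	if ((enable & MODEL_FEATURE_RWP_ASYNC) &&
	    !(*features & MODEL_FEATURE_RWP))
		return -EINVAL;

	/* Compute the new value once and store it once. */
	*features = (*features | enable) & ~disable;
	return 0;
}
```

Computing the new value in a single expression before storing it mirrors the single-store update described above; the model leaves out the locking and the wakeup of pending waiters.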
Signed-off-by: Kiryl Shutsemau (Meta) Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- include/uapi/linux/userfaultfd.h | 14 +++ mm/userfaultfd.c | 150 +++++++++++++++++++++++++------ 2 files changed, 136 insertions(+), 28 deletions(-) diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index c10f08f8a618..cea11aad6b54 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -49,6 +49,7 @@ #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ + (__u64)1 << _UFFDIO_SET_MODE | \ (__u64)1 << _UFFDIO_API) #define UFFD_API_RANGE_IOCTLS \ ((__u64)1 << _UFFDIO_WAKE | \ @@ -85,6 +86,7 @@ #define _UFFDIO_CONTINUE (0x07) #define _UFFDIO_POISON (0x08) #define _UFFDIO_RWPROTECT (0x09) +#define _UFFDIO_SET_MODE (0x0A) #define _UFFDIO_API (0x3F) =20 /* userfaultfd ioctl ids */ @@ -111,6 +113,8 @@ struct uffdio_poison) #define UFFDIO_RWPROTECT _IOWR(UFFDIO, _UFFDIO_RWPROTECT, \ struct uffdio_rwprotect) +#define UFFDIO_SET_MODE _IOW(UFFDIO, _UFFDIO_SET_MODE, \ + struct uffdio_set_mode) =20 /* read() structure */ struct uffd_msg { @@ -406,6 +410,16 @@ struct uffdio_move { __s64 move; }; =20 +struct uffdio_set_mode { + /* + * Toggle async mode for features at runtime. + * Supported: UFFD_FEATURE_RWP_ASYNC. + * Setting a bit in both enable and disable is invalid. + */ + __u64 enable; + __u64 disable; +}; + /* * Flags for the userfaultfd(2) system call itself. 
*/ diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index f40bf473a6f6..f172ec14a6c8 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -2477,19 +2477,29 @@ struct userfaultfd_wake_range { /* internal indication that UFFD_API ioctl was successfully executed */ #define UFFD_FEATURE_INITIALIZED (1u << 31) =20 +/* + * UFFDIO_SET_MODE updates ctx->features under mmap_write_lock with + * WRITE_ONCE; readers that run outside mmap_read_lock or the per-VMA + * lock (poll/read_iter/ioctl, fdinfo) must pair with READ_ONCE. + */ +static unsigned int userfaultfd_features(struct userfaultfd_ctx *ctx) +{ + return READ_ONCE(ctx->features); +} + static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx) { - return ctx->features & UFFD_FEATURE_INITIALIZED; + return userfaultfd_features(ctx) & UFFD_FEATURE_INITIALIZED; } =20 static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx) { - return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC); + return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_WP_ASYNC); } =20 static bool userfaultfd_rwp_async_ctx(struct userfaultfd_ctx *ctx) { - return ctx && (ctx->features & UFFD_FEATURE_RWP_ASYNC); + return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_RWP_ASYNC); } =20 /* @@ -2504,7 +2514,7 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct= *vma) if (!ctx) return false; =20 - return ctx->features & UFFD_FEATURE_WP_UNPOPULATED; + return userfaultfd_features(ctx) & UFFD_FEATURE_WP_UNPOPULATED; } =20 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, @@ -4290,6 +4300,109 @@ static int userfaultfd_rwprotect(struct userfaultfd= _ctx *ctx, return ret; } =20 +/* Subset of UFFD_API_FEATURES actually supported by this kernel/arch */ +static __u64 uffd_api_available_features(void) +{ + __u64 f =3D UFFD_API_FEATURES; + + if (!IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_MINOR)) + f &=3D ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); + if (!pgtable_supports_uffd()) + f &=3D 
~UFFD_FEATURE_PAGEFAULT_FLAG_WP; + if (!uffd_supports_wp_marker()) + f &=3D ~(UFFD_FEATURE_WP_HUGETLBFS_SHMEM | + UFFD_FEATURE_WP_UNPOPULATED | + UFFD_FEATURE_WP_ASYNC); + /* + * RWP needs both PROT_NONE support and the uffd PTE bit. The + * VM_UFFD_RWP check covers compile-time unavailability; the + * pgtable_supports_uffd() check covers runtime (e.g. riscv + * without the SVRSW60T59B extension) where the PTE bit is declared + * but not actually usable. + */ + if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) + f &=3D ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC); + return f; +} + +/* Async features that can be toggled at runtime via UFFDIO_SET_MODE */ +#define UFFD_FEATURE_TOGGLEABLE UFFD_FEATURE_RWP_ASYNC + +static int userfaultfd_set_mode(struct userfaultfd_ctx *ctx, + unsigned long arg) +{ + struct uffdio_set_mode mode; + struct mm_struct *mm =3D ctx->mm; + + if (copy_from_user(&mode, (void __user *)arg, sizeof(mode))) + return -EFAULT; + + /* enable and disable must not overlap */ + if (mode.enable & mode.disable) + return -EINVAL; + + /* only toggleable features that this kernel/arch actually supports */ + if ((mode.enable | mode.disable) & + ~(uffd_api_available_features() & UFFD_FEATURE_TOGGLEABLE)) + return -EINVAL; + + /* RWP_ASYNC can only be enabled on contexts that negotiated RWP */ + if ((mode.enable & UFFD_FEATURE_RWP_ASYNC) && + !(userfaultfd_features(ctx) & UFFD_FEATURE_RWP)) + return -EINVAL; + + if (!mmget_not_zero(mm)) + return -ESRCH; + + /* + * Drain in-flight faults before flipping features. mmap_write_lock() + * blocks new mmap_read_lock() callers, but per-VMA locked faults + * (lock_vma_under_rcu() + FAULT_FLAG_VMA_LOCK) that acquired before + * this point keep running. Calling vma_start_write() on each UFFD- + * armed VMA waits for those readers to drop, so no in-flight fault + * can observe the old features after mmap_write_unlock(). 
+ */ + mmap_write_lock(mm); + { + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + + for_each_vma(vmi, vma) { + if (vma->vm_userfaultfd_ctx.ctx =3D=3D ctx) + vma_start_write(vma); + } + } + /* + * Single WRITE_ONCE so lockless readers (fdinfo, poll/read_iter + * via userfaultfd_is_initialized(), and the userfaultfd_features() + * helper used elsewhere) can't observe a mid-RMW intermediate + * value. Hot-path readers already serialise through the mmap lock + * + vma_start_write() drain above, so their load doesn't need an + * annotation. + */ + WRITE_ONCE(ctx->features, + (ctx->features | mode.enable) & ~mode.disable); + mmap_write_unlock(mm); + + /* + * If switching to async, wake threads blocked in handle_userfault(). + * They will retry the fault and auto-resolve under the new mode. + * len=3D0 means wake all pending faults on this context. + */ + if (mode.enable & UFFD_FEATURE_RWP_ASYNC) { + struct userfaultfd_wake_range range =3D { .len =3D 0 }; + + spin_lock_irq(&ctx->fault_pending_wqh.lock); + __wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, + &range); + __wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range); + spin_unlock_irq(&ctx->fault_pending_wqh.lock); + } + + mmput(mm); + return 0; +} + static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long= arg) { __s64 ret; @@ -4528,29 +4641,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *c= tx, goto err_out; =20 /* report all available features and ioctls to userland */ - uffdio_api.features =3D UFFD_API_FEATURES; -#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR - uffdio_api.features &=3D - ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); -#endif - if (!pgtable_supports_uffd()) - uffdio_api.features &=3D ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; - - if (!uffd_supports_wp_marker()) { - uffdio_api.features &=3D ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM; - uffdio_api.features &=3D ~UFFD_FEATURE_WP_UNPOPULATED; - uffdio_api.features &=3D ~UFFD_FEATURE_WP_ASYNC; - } - /* - * RWP needs both 
PROT_NONE support and the uffd-wp PTE bit. The - * VM_UFFD_RWP check covers compile-time unavailability; the - * pgtable_supports_uffd() check covers runtime (e.g. riscv - * without the SVRSW60T59B extension) where the PTE bit is declared - * but not actually usable. - */ - if (VM_UFFD_RWP =3D=3D VM_NONE || !pgtable_supports_uffd()) - uffdio_api.features &=3D - ~(UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC); + uffdio_api.features =3D uffd_api_available_features(); =20 ret =3D -EINVAL; if (features & ~uffdio_api.features) @@ -4620,6 +4711,9 @@ static long userfaultfd_ioctl(struct file *file, unsi= gned cmd, case UFFDIO_RWPROTECT: ret =3D userfaultfd_rwprotect(ctx, arg); break; + case UFFDIO_SET_MODE: + ret =3D userfaultfd_set_mode(ctx, arg); + break; } return ret; } @@ -4647,7 +4741,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *= m, struct file *f) * protocols: aa:... bb:... */ seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n", - pending, total, UFFD_API, ctx->features, + pending, total, UFFD_API, userfaultfd_features(ctx), UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS); } #endif --=20 2.54.0 From nobody Mon Jun 8 22:01:34 2026 Received: from fhigh-c5-smtp.messagingengine.com (fhigh-b5-smtp.messagingengine.com [202.12.124.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4B533FE666; Tue, 26 May 2026 13:06:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800767; cv=none; b=ks3CzbxWYLeESGGNhm59u4pBrXvGl453Nq1zUP7BXls6L1vmpXSkz9lDh7BAphqopPznjHuZzN7h3WABl9gsKaDupihbxYe3jrZdJ8bV8D8XeDvcO694rM2v5VVgl6xAViI3JUHLmAWfcKgGO/9WG9CkKCQtMaC9IySgz0Zrw74= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779800767; c=relaxed/simple; bh=WPzMyRebTjLU90QW9vjUj2VVA5T/AQkzQ00eDWKasgM=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=pxoYQ0dbl9b1+qU1wb7hg3h99p6mnLcCdaBYUPjUz3CEwqWvjexhMQGL8Mts/T1wkdIK6Ag/K7dWVNakKn0bBuI4ZrHQqN5klfK2lopGJan/btyvehgK08+ojPdWgasPyBLkhBHXQaLHZ2m8YC/+EXNDO1pC7WVnaK6+9SXzeOE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name; spf=pass smtp.mailfrom=shutemov.name; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b=HkPCh+NH; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=B7u3p3UH; arc=none smtp.client-ip=202.12.124.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shutemov.name Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shutemov.name header.i=@shutemov.name header.b="HkPCh+NH"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="B7u3p3UH" Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfhigh.stl.internal (Postfix) with ESMTP id 983197A0176; Tue, 26 May 2026 09:06:03 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Tue, 26 May 2026 09:06:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:subject:subject:to:to; s=fm2; t=1779800763; x=1779887163; bh=eOz+0brXo9TSVG/xiKRnhSykq2dVBP2j zZ+IvBYXcOM=; b=HkPCh+NHQwTr+ian2JPVFnUaSxtiRn7LfDKcAPkGpzsoL0nR G81z42d3nHL7pEhHozpPaxrNFhNt/CyFvxCwo7ZWnZcoHY2z+N52jd3hPQm82AMP +X60UbEtFM8V2M2MEQUFnk/IzAGvnE5EUfylxBSXc0x4jQkh4Z9lzQt1jlKLVpcK JxABYGjCZpByUmdCDHsb6c98DMMzz7Dker882IS9OBH1MXth40vuzqumZV/F0KbR 
Dr4o40fc6OZWJhZsEbXp5PIEpAwM6TFtZ55YRmAKZa3aBk0DP4xQO6a2GyOTSCLj fPaiNFCpYmfg6XJ9oqW+UyRGFr/XuUUzZTCaLQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1779800763; x= 1779887163; bh=eOz+0brXo9TSVG/xiKRnhSykq2dVBP2jzZ+IvBYXcOM=; b=B 7u3p3UHDvz/GV5rsArcC26PET9qGWp1Bsus2dnqtsgDmmpipEVIVGfzv/YW3L2tf Gt5dCwdsv1ZZu9VkC7cozbahR5ikeV8A/yPwQ1k47fJbHM6eTPaDEN7gfQR3abbE c2Q0fyofQcY4Wmx8IXP84HSLX3uaLQuhfXhrKgxwucWXN4WqpmvPY2Q82YEcsRx+ Ddk2rQY/xrikq8ofOVFL5Pl55BiaK0PN/8VaNYs94doM9svDzJOcMzJw4boTzBQw kExMATsH1TYN2ySkRwvJKNjCRqM6TF0poIcHTXVGGW2HYSIqjwGpXCANJvm2gxI/ gjEjfFAEGVKDt4dtlCLXg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: dmFkZTFRxKInQFecwZPuEt+LuNBlOJGP8RsxGpm588gQviVhNq1Pa+AqWD7JQeVUR2Swte eyzLZOEYFGJPBoftF7cmhN7hkCk3wRua/o4v8Bw72KaC1CFplNf1sAsIMRN76GRZVrrXJo 4g3If4FQDoPgtrZ9saDLAexzQgvrfsTwQ3/KP0AGKCFmLiVvQMOmXCsts9ZOCGk1E95eOU mgzRkXibxyISOqgf2hZqBdgrV6ktnG5u3x8/zXY22S6cQZP9xSB0eE/pqpJrpVnOId2p4l G0binpu+ah/FFZH0esjGo/wTdPAqY9SKUVzl+zWcrHaZzSyBzEWlNRkrUo/n9HkffJzpMI rBdKczXG69UeuPoqgq3OhcTnRfp8BVktpV9GIjADXWZcobs2IqSa8Ff8CDv5FMsBfjhSBh edLmMFE3ebqGb4wW/zccR0JxojNzk99oqMYoEBMD3rUuLzKMPZeqnskHcV5BvQBDKjx5iC C0Yc4A7Fl7pQTLDywDgzX7wuHks4kmSNkp7XCLPMkZKpuMO1p1rl/BoLtD1txNSXIi1+P1 7cr9BzA5Ta3j/2pqYKq6MH5/kASkQhPYtCII2PMFRivC9/xsrt6N4LMgtJsG9u6116cXW+ 5nlqQoTTh3veHOLV6uwiL/pf5W6ZPvTPvSPDFZhngpzNr091Yd8LU70EOR1w X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 26 May 2026 09:06:02 -0400 (EDT) From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, 
skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)" Subject: [PATCH v5 17/18] selftests/mm: add userfaultfd RWP tests Date: Tue, 26 May 2026 14:05:05 +0100 Message-ID: <20260526130509.2748441-18-kirill@shutemov.name> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name> References: <20260526130509.2748441-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" Coverage for UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT: rwp-async async mode =E2=80=94 touch pages, verify permissions a= re auto-restored without a message rwp-sync sync mode =E2=80=94 access blocks, handler resolves via UFFDIO_RWPROTECT rwp-pagemap PAGEMAP_SCAN reports still-cold pages via inverted PAGE_IS_ACCESSED rwp-mprotect RWP survives mprotect(PROT_NONE) -> mprotect(PROT_READ|PROT_WRITE) round-trip rwp-gup GUP walks through a protnone RWP PTE (vmsplice() into a pipe drives the GUP path) rwp-async-toggle UFFDIO_SET_MODE flips between sync and async without re-registering rwp-close closing the uffd restores page permissions rwp-fork RWP survives fork() with EVENT_FORK; child's PTEs keep the uffd bit rwp-fork-pin RWP survives fork() on an RO-longterm-pinned anon page (forces copy_present_page()); child read auto-resolves and clears the bit, proving PAGE_NONE was in place rwp-wp-exclusive register with MODE_WP|MODE_RWP returns -EINVAL All tests run against anon, shmem, shmem-private, hugetlb, and hugetlb-private memory, except rwp-fork-pin which is anon-only =E2=80=94 copy_present_page() is the 
private-anon pinned-exclusive fork path. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Mike Rapoport (Microsoft) --- tools/testing/selftests/mm/uffd-unit-tests.c | 765 +++++++++++++++++++ 1 file changed, 765 insertions(+) diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/s= elftests/mm/uffd-unit-tests.c index a6c14109e818..9f5a5ccf6044 100644 --- a/tools/testing/selftests/mm/uffd-unit-tests.c +++ b/tools/testing/selftests/mm/uffd-unit-tests.c @@ -7,6 +7,8 @@ =20 #include "uffd-common.h" =20 +#include +#include #include "../../../../mm/gup_test.h" =20 #ifdef __NR_userfaultfd @@ -109,6 +111,10 @@ static void uffd_test_skip(const char *message) =20 static void test_uffd_api(bool use_dev) { + const uint64_t expected_ioctls =3D + BIT_ULL(_UFFDIO_REGISTER) | + BIT_ULL(_UFFDIO_UNREGISTER) | + BIT_ULL(_UFFDIO_API); struct uffdio_api uffdio_api; int uffd; =20 @@ -148,6 +154,15 @@ static void test_uffd_api(bool use_dev) goto out; } =20 + /* Verify returned fd-level ioctls bitmask */ + if ((uffdio_api.ioctls & expected_ioctls) !=3D expected_ioctls) { + uffd_test_fail("UFFDIO_API missing expected ioctls: " + "got=3D0x%"PRIx64", expected=3D0x%"PRIx64, + (uint64_t)uffdio_api.ioctls, + expected_ioctls); + goto out; + } + /* Test double requests of UFFDIO_API with a random feature set */ uffdio_api.features =3D BIT_ULL(0); if (ioctl(uffd, UFFDIO_API, &uffdio_api) =3D=3D 0) { @@ -602,6 +617,685 @@ void uffd_minor_collapse_test(uffd_global_test_opts_t= *gopts, uffd_test_args_t * uffd_minor_test_common(gopts, true, false); } =20 +static int uffd_register_rwp(int uffd, void *addr, uint64_t len) +{ + struct uffdio_register reg =3D { + .range =3D { .start =3D (unsigned long)addr, .len =3D len }, + .mode =3D UFFDIO_REGISTER_MODE_RWP, + }; + + if (ioctl(uffd, UFFDIO_REGISTER, &reg) =3D=3D -1) + return -errno; + return 0; +} + +static void rwprotect_range(int uffd, __u64 start, __u64 len, bool protect) +{ + struct uffdio_rwprotect rwp 
=3D { + .range =3D { .start =3D start, .len =3D len }, + .mode =3D protect ? UFFDIO_RWPROTECT_MODE_RWP : 0, + }; + + if (ioctl(uffd, UFFDIO_RWPROTECT, &rwp)) + err("UFFDIO_RWPROTECT failed"); +} + +static void set_async_mode(int uffd, bool enable) +{ + struct uffdio_set_mode mode =3D { }; + + if (enable) + mode.enable =3D UFFD_FEATURE_RWP_ASYNC; + else + mode.disable =3D UFFD_FEATURE_RWP_ASYNC; + + if (ioctl(uffd, UFFDIO_SET_MODE, &mode)) + err("UFFDIO_SET_MODE failed"); +} + +/* + * Test async RWP faults on anonymous memory. + * Populate pages, register MODE_RWP with RWP_ASYNC, + * RW-protect, re-access, verify content preserved and no faults delivered. + */ +static void uffd_rwp_async_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + + /* Populate all pages with known content */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + /* Register MODE_RWP */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* RW-protect all pages (sets protnone) */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Access all pages =E2=80=94 should auto-resolve, no faults */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + unsigned char expected =3D p % 255 + 1; + + if (page[0] !=3D expected) { + uffd_test_fail("page %lu content mismatch: %u !=3D %u", + p, page[0], expected); + return; + } + } + + uffd_test_pass(); +} + +/* + * Fault handler for RWP =E2=80=94 unprotect the page via UFFDIO_RWPROTECT. 
+ */ +static void uffd_handle_rwp_fault(uffd_global_test_opts_t *gopts, + struct uffd_msg *msg, + struct uffd_args *uargs) +{ + if (!(msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_RWP)) + err("expected RWP fault, got 0x%llx", + msg->arg.pagefault.flags); + + rwprotect_range(gopts->uffd, msg->arg.pagefault.address, + gopts->page_size, false); + uargs->minor_faults++; +} + +/* + * Test sync RWP faults on anonymous memory. + * Populate pages, register MODE_RWP (sync), RW-protect, + * access from worker thread, verify fault delivered, UFFDIO_RWPROTECT res= olves. + */ +static void uffd_rwp_sync_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + pthread_t uffd_mon; + struct uffd_args uargs =3D { }; + bool failed =3D false; + char c =3D '\0'; + unsigned long p; + + uargs.gopts =3D gopts; + uargs.handle_fault =3D uffd_handle_rwp_fault; + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + /* Register MODE_RWP */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* RW-protect all pages */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Start fault handler thread */ + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs)) + err("uffd_poll_thread create"); + + /* Access all pages =E2=80=94 triggers sync RWP faults, handler unprotect= s */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + + if (page[0] !=3D (p % 255 + 1)) { + uffd_test_fail("page %lu content mismatch", p); + failed =3D true; + goto out; + } + } + +out: + /* + * Stop the handler before reading minor_faults: the last fault + * resolution rwprotect_range()s before incrementing the counter, + * so the main thread can race ahead of the 
increment. + */ + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); + + if (failed) + return; + if (uargs.minor_faults =3D=3D 0) + uffd_test_fail("expected RWP faults, got 0"); + else + uffd_test_pass(); +} + +/* + * Test PAGEMAP_SCAN detection of RW-protected (cold) pages. + */ +static void uffd_rwp_pagemap_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + struct page_region regions[16]; + struct pm_scan_arg pm_arg; + int pagemap_fd; + long ret; + + /* Need at least 4 pages */ + if (nr_pages < 4) { + uffd_test_skip("need at least 4 pages"); + return; + } + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, 0xab, page_size); + + /* Register and RW-protect */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Touch first half of pages to re-activate them (async auto-resolve) */ + for (p =3D 0; p < nr_pages / 2; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + /* Scan for cold (still RW-protected) pages */ + pagemap_fd =3D open("/proc/self/pagemap", O_RDONLY); + if (pagemap_fd < 0) + err("open pagemap"); + + /* + * PAGE_IS_ACCESSED is set once the uffd-wp bit has been cleared + * (access happened, or the user resolved). Invert it to select + * still-protected (cold) pages. 
+ */ + memset(&pm_arg, 0, sizeof(pm_arg)); + pm_arg.size =3D sizeof(pm_arg); + pm_arg.start =3D (uint64_t)gopts->area_dst; + pm_arg.end =3D (uint64_t)gopts->area_dst + nr_pages * page_size; + pm_arg.vec =3D (uint64_t)regions; + pm_arg.vec_len =3D ARRAY_SIZE(regions); + pm_arg.category_mask =3D PAGE_IS_ACCESSED; + pm_arg.category_inverted =3D PAGE_IS_ACCESSED; + pm_arg.return_mask =3D PAGE_IS_ACCESSED; + + ret =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg); + close(pagemap_fd); + + if (ret < 0) { + uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno)); + return; + } + + /* + * The second half of pages should be reported as RW-protected. + * They may be coalesced into one region. + */ + if (ret < 1) { + uffd_test_fail("expected cold pages, got %ld regions", ret); + return; + } + + /* Verify the cold region covers the second half */ + uint64_t cold_start =3D regions[0].start; + uint64_t expected_start =3D (uint64_t)gopts->area_dst + + (nr_pages / 2) * page_size; + + if (cold_start !=3D expected_start) { + uffd_test_fail("cold region starts at 0x%lx, expected 0x%lx", + (unsigned long)cold_start, + (unsigned long)expected_start); + return; + } + + uffd_test_pass(); +} + +/* + * Test that RWP protection survives a mprotect(PROT_NONE) -> + * mprotect(PROT_READ|PROT_WRITE) round-trip. The uffd-wp bit on a + * VM_UFFD_RWP VMA must continue to carry PROT_NONE semantics after + * mprotect() changes the base protection; otherwise accesses would + * silently succeed and the pagemap bit would stick without a fault + * ever clearing it. 
+ */ +static void uffd_rwp_mprotect_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + struct page_region regions[16]; + struct pm_scan_arg pm_arg; + int pagemap_fd; + long ret; + + /* Populate all pages */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, 0xab, page_size); + + /* Register and RW-protect the whole range */ + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Round-trip mprotect(): PROT_NONE -> PROT_READ|PROT_WRITE */ + if (mprotect(gopts->area_dst, nr_pages * page_size, PROT_NONE)) + err("mprotect() PROT_NONE"); + if (mprotect(gopts->area_dst, nr_pages * page_size, + PROT_READ | PROT_WRITE)) + err("mprotect() PROT_READ|PROT_WRITE"); + + /* Touch every page. Async RWP must auto-resolve each fault. */ + for (p =3D 0; p < nr_pages; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + /* + * After touching, no page should remain RW-protected. A stuck + * uffd-wp bit would mean mprotect() silently dropped PROT_NONE and + * the access never faulted. 
+ */ + pagemap_fd =3D open("/proc/self/pagemap", O_RDONLY); + if (pagemap_fd < 0) + err("open pagemap"); + + memset(&pm_arg, 0, sizeof(pm_arg)); + pm_arg.size =3D sizeof(pm_arg); + pm_arg.start =3D (uint64_t)gopts->area_dst; + pm_arg.end =3D (uint64_t)gopts->area_dst + nr_pages * page_size; + pm_arg.vec =3D (uint64_t)regions; + pm_arg.vec_len =3D ARRAY_SIZE(regions); + pm_arg.category_mask =3D PAGE_IS_ACCESSED; + pm_arg.category_inverted =3D PAGE_IS_ACCESSED; + pm_arg.return_mask =3D PAGE_IS_ACCESSED; + + ret =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg); + close(pagemap_fd); + + if (ret < 0) { + uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno)); + return; + } + if (ret !=3D 0) { + uffd_test_fail("expected no cold pages after mprotect()+touch, got %ld r= egions", + ret); + return; + } + + uffd_test_pass(); +} + +/* + * Test that GUP resolves through protnone PTEs (async mode). + * vmsplice() into a pipe pins user pages via get_user_pages_fast() -- + * unlike write(), which goes through copy_from_user() and ordinary + * hardware page faults -- so it exercises gup_can_follow_protnone() on + * the RW-protected PTE. In async mode the kernel auto-restores + * permissions and GUP returns the page. + */ +static void uffd_rwp_gup_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + struct iovec iov; + char buf; + int pipefd[2]; + + /* Populate first page with known content */ + memset(gopts->area_dst, 0xCD, gopts->page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, gopts->page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + gopts->page_size, true); + + if (pipe(pipefd)) + err("pipe"); + + /* + * One byte's worth of iov is enough to GUP the containing page and + * keeps the pipe transfer well under any pipe-capacity limit even on + * hugetlb-backed runs. 
+ */ + iov.iov_base =3D gopts->area_dst; + iov.iov_len =3D 1; + if (vmsplice(pipefd[1], &iov, 1, 0) !=3D 1) { + uffd_test_fail("vmsplice from RW-protected page failed: %s", + strerror(errno)); + goto out; + } + + if (read(pipefd[0], &buf, 1) !=3D 1) { + uffd_test_fail("read from pipe failed"); + goto out; + } + + if (buf !=3D (char)0xCD) { + uffd_test_fail("content mismatch: got 0x%02x, expected 0xCD", + (unsigned char)buf); + goto out; + } + + uffd_test_pass(); +out: + close(pipefd[0]); + close(pipefd[1]); +} + +/* + * Test runtime toggle between async and sync modes. + * Start in async mode (detection), flip to sync (eviction), verify faults + * block, resolve them, flip back to async. + */ +static void uffd_rwp_async_toggle_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + struct uffd_args uargs =3D { }; + pthread_t uffd_mon; + char c =3D '\0'; + unsigned long p; + + uargs.gopts =3D gopts; + uargs.handle_fault =3D uffd_handle_rwp_fault; + + /* Populate */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + /* Phase 1: async detection =E2=80=94 RW-protect, access first half */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + for (p =3D 0; p < nr_pages / 2; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; /* auto-resolves in async mode */ + } + + /* Phase 2: flip to sync for eviction */ + set_async_mode(gopts->uffd, false); + + /* Start handler =E2=80=94 will receive faults for cold pages */ + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs)) + err("uffd_poll_thread create"); + + /* Access second half (cold pages) =E2=80=94 should trigger sync faults */ + for (p =3D nr_pages / 2; p < nr_pages; p++) { + 
unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + if (page[0] !=3D (p % 255 + 1)) { + uffd_test_fail("page %lu content mismatch", p); + goto out; + } + } + + /* + * Stop the handler before reading minor_faults: the last fault + * resolution rwprotect_range()s before incrementing the counter, + * so the main thread can race ahead of the increment. Stopping + * here also makes Phase 3 a clean async-only test -- with the + * handler still running it would silently resolve any sync fault + * the kernel erroneously delivers, masking a regression. + */ + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); + + if (uargs.minor_faults =3D=3D 0) { + uffd_test_fail("expected sync faults, got 0"); + return; + } + + /* Phase 3: flip back to async */ + set_async_mode(gopts->uffd, true); + + /* RW-protect and access again =E2=80=94 should auto-resolve */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + for (p =3D 0; p < nr_pages; p++) { + volatile char *page =3D gopts->area_dst + p * page_size; + (void)*page; + } + + uffd_test_pass(); + return; +out: + if (write(gopts->pipefd[1], &c, sizeof(c)) !=3D sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("join() failed"); +} + +/* + * Test that RW-protected pages become accessible after closing uffd. 
+ */ +static void uffd_rwp_close_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + unsigned long p; + + /* Populate */ + for (p =3D 0; p < nr_pages; p++) + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size); + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failure"); + + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + nr_pages * page_size, true); + + /* Close uffd =E2=80=94 should restore protnone PTEs */ + close(gopts->uffd); + gopts->uffd =3D -1; + + /* All pages should be accessible with original content */ + for (p =3D 0; p < nr_pages; p++) { + unsigned char *page =3D (unsigned char *)gopts->area_dst + + p * page_size; + unsigned char expected =3D p % 255 + 1; + + if (page[0] !=3D expected) { + uffd_test_fail("page %lu not accessible after close", p); + return; + } + } + + uffd_test_pass(); +} + +/* + * Test that RWP protection is preserved across fork() when + * UFFD_FEATURE_EVENT_FORK is enabled. Without preservation, the child's + * PTEs would lose the uffd-wp marker and RWP-protected accesses would + * silently fall through to do_numa_page(). + */ +static void uffd_rwp_fork_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + int pagemap_fd; + uint64_t value; + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, + nr_pages * page_size)) + err("register failed"); + + /* Populate + RWP-protect */ + *gopts->area_dst =3D 1; + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, + page_size, true); + + /* Parent: verify uffd-wp bit is set before fork */ + pagemap_fd =3D pagemap_open(); + value =3D pagemap_get_entry(pagemap_fd, gopts->area_dst); + pagemap_check_wp(value, true); + + /* + * Fork with EVENT_FORK: child inherits VM_UFFD_RWP. 
Child reads + * its own pagemap and must still see the uffd-wp bit set. + */ + if (pagemap_test_fork(gopts, true, false)) { + uffd_test_fail("RWP marker lost in child after fork"); + goto out; + } + + uffd_test_pass(); +out: + close(pagemap_fd); +} + +/* + * Test that RWP protection on a pinned anon page is preserved across fork= (). + * Pinning forces copy_present_page() in the child path, which must restore + * PAGE_NONE on top of the uffd bit. Using async mode, a read in the child + * auto-resolves if =E2=80=94 and only if =E2=80=94 the PTE was actually p= rotnone+uffd; the + * cleared uffd bit afterward proves the fault path ran. + */ +static void uffd_rwp_fork_pin_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long page_size =3D gopts->page_size; + fork_event_args fevent_args =3D { .gopts =3D gopts, .child_uffd =3D -1 }; + pin_args pin_args =3D {}; + int pagemap_fd, status; + pthread_t fevent_thread; + uint64_t value; + pid_t child; + + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, page_size)) + err("register failed"); + + /* Populate. */ + *gopts->area_dst =3D 1; + + /* RO-longterm pin so fork() takes copy_present_page() for this PTE. */ + if (pin_pages(&pin_args, gopts->area_dst, page_size)) { + uffd_test_skip("Possibly CONFIG_GUP_TEST missing or unprivileged"); + uffd_unregister(gopts->uffd, gopts->area_dst, page_size); + return; + } + + /* RWP-protect: PTE is now PAGE_NONE + uffd bit. */ + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, page_size, true); + + pagemap_fd =3D pagemap_open(); + value =3D pagemap_get_entry(pagemap_fd, gopts->area_dst); + pagemap_check_wp(value, true); + + /* + * UFFD_FEATURE_EVENT_FORK is required so the child inherits + * VM_UFFD_RWP and the marker; without it dup_userfaultfd() resets + * the child VMA and the test would pass for the wrong reason. + * dup_userfaultfd() blocks until the EVENT_FORK message is consumed, + * so spawn a reader before the fork(). 
+ */ + gopts->ready_for_fork =3D false; + if (pthread_create(&fevent_thread, NULL, fork_event_consumer, + &fevent_args)) + err("pthread_create() for fork event consumer"); + while (!gopts->ready_for_fork) + ; /* Wait for consumer to start polling. */ + + child =3D fork(); + if (child < 0) + err("fork"); + if (child =3D=3D 0) { + volatile char c; + int cfd; + + /* + * Read the pinned page. Only reaches the fault path if the + * child PTE is protnone + uffd; async mode auto-resolves and + * clears the uffd bit. If copy_present_page() dropped + * PAGE_NONE, the read would silently succeed and the bit + * would still be set. + */ + c =3D *(volatile char *)gopts->area_dst; + (void)c; + + cfd =3D pagemap_open(); + value =3D pagemap_get_entry(cfd, gopts->area_dst); + close(cfd); + _exit((value & PM_UFFD_WP) ? 1 : 0); + } + if (waitpid(child, &status, 0) < 0) + err("waitpid"); + if (pthread_join(fevent_thread, NULL)) + err("pthread_join() for fork event consumer"); + if (fevent_args.child_uffd >=3D 0) + close(fevent_args.child_uffd); + + unpin_pages(&pin_args); + close(pagemap_fd); + if (uffd_unregister(gopts->uffd, gopts->area_dst, page_size)) + err("unregister failed"); + + if (!WIFEXITED(status) || WEXITSTATUS(status) !=3D 0) { + uffd_test_fail("RWP not enforced in child after pinned fork"); + return; + } + + uffd_test_pass(); +} + +/* + * WP and RWP share the uffd-wp PTE bit and cannot coexist in the same VMA. + * Registration requesting both modes must be rejected. 
+ */ +static void uffd_rwp_wp_exclusive_test(uffd_global_test_opts_t *gopts, + uffd_test_args_t *args) +{ + unsigned long nr_pages =3D gopts->nr_pages; + unsigned long page_size =3D gopts->page_size; + struct uffdio_register reg =3D { }; + + reg.range.start =3D (unsigned long)gopts->area_dst; + reg.range.len =3D nr_pages * page_size; + reg.mode =3D UFFDIO_REGISTER_MODE_WP | UFFDIO_REGISTER_MODE_RWP; + + if (ioctl(gopts->uffd, UFFDIO_REGISTER, &reg) =3D=3D 0) { + uffd_test_fail("register with WP|RWP unexpectedly succeeded"); + return; + } + if (errno !=3D EINVAL) { + uffd_test_fail("register with WP|RWP: expected EINVAL, got %d", + errno); + return; + } + uffd_test_pass(); +} + static sigjmp_buf jbuf, *sigbuf; =20 static void sighndl(int sig, siginfo_t *siginfo, void *ptr) @@ -1604,6 +2298,77 @@ uffd_test_case_t uffd_tests[] =3D { /* We can't test MADV_COLLAPSE, so try our luck */ .uffd_feature_required =3D UFFD_FEATURE_MINOR_SHMEM, }, + { + .name =3D "rwp-async", + .uffd_fn =3D uffd_rwp_async_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-sync", + .uffd_fn =3D uffd_rwp_sync_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D UFFD_FEATURE_RWP, + }, + { + .name =3D "rwp-pagemap", + .uffd_fn =3D uffd_rwp_pagemap_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-mprotect", + .uffd_fn =3D uffd_rwp_mprotect_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-gup", + .uffd_fn =3D uffd_rwp_gup_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-async-toggle", + .uffd_fn =3D uffd_rwp_async_toggle_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }, + { + .name =3D "rwp-close", + 
.uffd_fn =3D uffd_rwp_close_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D UFFD_FEATURE_RWP, + }, + { + .name =3D "rwp-fork", + .uffd_fn =3D uffd_rwp_fork_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-fork-pin", + .uffd_fn =3D uffd_rwp_fork_pin_test, + .mem_targets =3D MEM_ANON, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC | + UFFD_FEATURE_EVENT_FORK, + }, + { + .name =3D "rwp-wp-exclusive", + .uffd_fn =3D uffd_rwp_wp_exclusive_test, + .mem_targets =3D MEM_ALL, + .uffd_feature_required =3D + UFFD_FEATURE_RWP | + UFFD_FEATURE_PAGEFAULT_FLAG_WP | + UFFD_FEATURE_WP_HUGETLBFS_SHMEM, + }, { .name =3D "sigbus", .uffd_fn =3D uffd_sigbus_test, --=20 2.54.0
From nobody Mon Jun 8 22:01:34 2026
From: Kiryl Shutsemau To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com, david@kernel.org Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org, Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net, skhan@linuxfoundation.org, seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com, sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, kvm@vger.kernel.org, kernel-team@meta.com, "Kiryl Shutsemau (Meta)" Subject: [PATCH v5 18/18] 
Documentation/userfaultfd: document RWP working set tracking Date: Tue, 26 May 2026 14:05:06 +0100 Message-ID: <20260526130509.2748441-19-kirill@shutemov.name> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526130509.2748441-1-kirill@shutemov.name> References: <20260526130509.2748441-1-kirill@shutemov.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kiryl Shutsemau (Meta)" Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP: - sync and async fault models; - UFFDIO_RWPROTECT semantics; - UFFD_FEATURE_RWP_ASYNC; - UFFDIO_SET_MODE runtime mode flips. It also covers a typical VMM working-set-tracking workflow, from the detection loop through sync-mode eviction and back to async. Signed-off-by: Kiryl Shutsemau Assisted-by: Claude:claude-opus-4-6 --- Documentation/admin-guide/mm/userfaultfd.rst | 238 ++++++++++++++++++- 1 file changed, 232 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/a= dmin-guide/mm/userfaultfd.rst index 1e533639fd50..1db99b5355f7 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -275,16 +275,16 @@ tracking and it can be different in a few ways: - Dirty information will not get lost if the pte was zapped due to various reasons (e.g. during split of a shmem transparent huge page). =20 - - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit - set; dirty when uffd-wp bit cleared), it has different semantics on - some of the memory operations. For example: ``MADV_DONTNEED`` on + - Due to a reverted meaning of soft-dirty (page clean when the uffd bit + is set; dirty when the uffd bit is cleared), it has different semantics + on some of the memory operations. 
For example: ``MADV_DONTNEED`` on anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as - dirtying of memory by dropping uffd-wp bit during the procedure. + dirtying of memory by dropping the uffd bit during the procedure. =20 The user app can collect the "written/dirty" status by looking up the -uffd-wp bit for the pages being interested in /proc/pagemap. +uffd bit for the pages of interest in /proc/pagemap. =20 -The page will not be under track of uffd-wp async mode until the page is +The page will not be tracked by userfaultfd-wp async mode until the page is explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault that was tracked by async mode userfaultfd-wp is invalid. @@ -307,6 +307,232 @@ transparent to the guest, we want that same address r= ange to act as if it was still poisoned, even though it's on a new physical host which ostensibly doesn't have a memory error in the exact same spot. =20 +Read-Write Protection +--------------------- + +``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a +memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)`` +combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only +traps accesses to *present* PTEs, so accesses to unpopulated addresses in a +protected range fall through to the normal missing-page path. It uses the +PROT_NONE hinting mechanism (same as NUMA balancing) to make pages +inaccessible while keeping them resident in memory. It works on anonymous, +shmem, and hugetlbfs memory. + +RWP is designed for VM memory managers that need to track the working set +of guest memory for cold page eviction to tiered or remote storage. + +**Setup:** + +1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``. 
+ Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well =E2=80=94 it requ= ires + ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call. + +2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP`` + (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be + fetched back from storage). + +**Feature availability:** + +RWP is built on top of two kernel primitives: a spare PTE bit owned by +userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and architecture support +for present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When= both +are available on a 64-bit kernel, the build selects +``CONFIG_USERFAULTFD_RWP=3Dy`` and the ``VM_UFFD_RWP`` VMA flag becomes +available. + +``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are masked out of the +features returned by ``UFFDIO_API`` when the running kernel or architecture +cannot support them =E2=80=94 for example 32-bit kernels (where ``VM_UFFD_= RWP`` is +unavailable), kernels built without ``CONFIG_USERFAULTFD_RWP``, and +architectures whose ptes cannot carry the uffd bit at runtime (e.g. riscv +without the ``SVRSW60T59B`` extension). ``UFFDIO_API`` does not fail; +unsupported bits are simply absent from ``uffdio_api.features`` on return. +Callers should inspect the returned ``features`` after ``UFFDIO_API`` and +fall back to another tracking method when RWP is unavailable. + +**Protecting and Unprotecting:** + +Use ``UFFDIO_RWPROTECT`` to protect or unprotect a range, mirroring the +``UFFDIO_WRITEPROTECT`` interface:: + + struct uffdio_rwprotect rwp =3D { + .range =3D { .start =3D addr, .len =3D len }, + .mode =3D UFFDIO_RWPROTECT_MODE_RWP, /* protect */ + }; + ioctl(uffd, UFFDIO_RWPROTECT, &rwp); + +Setting ``UFFDIO_RWPROTECT_MODE_RWP`` sets PROT_NONE on present PTEs in the +range. Pages stay resident and their physical frames are preserved =E2=80= =94 only +access permissions are removed. 
+ +Clearing ``UFFDIO_RWPROTECT_MODE_RWP`` restores normal VMA permissions and +wakes any faulting threads (unless ``UFFDIO_RWPROTECT_MODE_DONTWAKE`` is s= et). + +**Scope of protection:** + +RWP protection is a property of *present* PTEs. ``UFFDIO_RWPROTECT`` only +affects entries that are already populated. Unpopulated addresses within +the range remain unpopulated; when first accessed they fault through the +normal missing path (``do_anonymous_page()``, ``do_swap_page()``, +``finish_fault()``) and the resulting PTE is not RWP-protected. To observe +the population itself, co-register the range with +``UFFDIO_REGISTER_MODE_MISSING``. + +Protection is preserved across page reclaim: a page swapped out while +RWP-protected carries the marker on its swap entry, and swap-in restores +the PROT_NONE state so the first access after swap-in still faults. The +same applies to pages temporarily replaced by migration entries. + +Operations that drop the PTE entirely =E2=80=94 ``MADV_DONTNEED`` on anony= mous +memory, hole-punch on shmem, truncation of a file mapping =E2=80=94 also d= rop the +RWP marker: the next access re-populates the range without protection. +Unlike WP (which persists via ``PTE_MARKER_UFFD_WP``), there is no +persistent RWP marker today. The user needs to re-arm the range with +``UFFDIO_RWPROTECT`` after any operation that explicitly frees PTEs. + +**Fault Handling:** + +When a protected page is accessed: + +- **Sync mode** (default): The faulting thread blocks and a + ``UFFD_PAGEFAULT_FLAG_RWP`` message is delivered to the userfaultfd + handler. The handler resolves the fault with ``UFFDIO_RWPROTECT`` + (clearing ``MODE_RWP``), which restores the PTE permissions and wakes + the faulting thread. + +- **Async mode** (``UFFD_FEATURE_RWP_ASYNC``): The kernel automatically + restores PTE permissions and the thread continues without blocking. No + message is delivered to the handler. 
+ +**Runtime Mode Switching:** + +``UFFDIO_SET_MODE`` toggles ``UFFD_FEATURE_RWP_ASYNC`` at runtime, allowing +the VMM to switch between lightweight async detection and safe sync +eviction without re-registering. The toggle takes ``mmap_write_lock()`` +and calls ``vma_start_write()`` on each UFFD-armed VMA, draining +in-flight per-VMA-locked faults before the new mode takes effect. + +**Cold Page Detection with PAGEMAP_SCAN:** + +RWP-protected PTEs carry the uffd PTE bit; the fault-resolution path +clears it. ``PAGEMAP_SCAN`` reports ``PAGE_IS_ACCESSED`` once the bit is +clear on a ``VM_UFFD_RWP`` VMA, so inverting it efficiently reports the +still-protected (cold) pages:: + + struct pm_scan_arg arg =3D { + .size =3D sizeof(arg), + .start =3D guest_mem_start, + .end =3D guest_mem_end, + .vec =3D (uint64_t)regions, + .vec_len =3D regions_len, + .category_mask =3D PAGE_IS_ACCESSED, + .category_inverted =3D PAGE_IS_ACCESSED, + .return_mask =3D PAGE_IS_ACCESSED, + }; + long n =3D ioctl(pagemap_fd, PAGEMAP_SCAN, &arg); + +The returned ``page_region`` array contains contiguous cold ranges that can +then be evicted. + +**Cleanup:** + +When the userfaultfd is closed or the range is unregistered, all PROT_NONE +PTEs are automatically restored to their normal VMA permissions. This +prevents pages from becoming permanently inaccessible. + +**VMM Working Set Tracking Workflow:** + +A typical VMM lifecycle for cold page eviction to tiered storage uses two +mappings of the same shmem (or hugetlbfs) file: ``guest_mem`` is the +RWP-registered mapping through which vCPUs access memory, and ``io_mem`` +is a separate, unregistered mapping for VMM-side I/O. 
Reading ``io_mem`` does not go through +the RWP-protected PTEs of ``guest_mem``, so the VMM's own ``pwrite()`` +never traps on its own protection:: + + /* One-time setup */ + fd =3D memfd_create("guest", MFD_CLOEXEC); + ftruncate(fd, guest_size); + guest_mem =3D mmap(NULL, guest_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); /* vCPU view, RWP-registered */ + io_mem =3D mmap(NULL, guest_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); /* VMM I/O view, unprotected */ + + uffd =3D userfaultfd(O_CLOEXEC | O_NONBLOCK); + struct uffdio_api api =3D { + .api =3D UFFD_API, + .features =3D UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC, + }; + ioctl(uffd, UFFDIO_API, &api); + if (!(api.features & UFFD_FEATURE_RWP)) + goto fallback; /* RWP unavailable on this kernel/arch */ + ioctl(uffd, UFFDIO_REGISTER, &(struct uffdio_register){ + .range =3D { (uintptr_t)guest_mem, guest_size }, + .mode =3D UFFDIO_REGISTER_MODE_RWP | + UFFDIO_REGISTER_MODE_MISSING, + }); + + /* Tracking loop */ + while (vm_running) { + /* 1. Detection phase (async -- no vCPU stalls) */ + ioctl(uffd, UFFDIO_RWPROTECT, &(struct uffdio_rwprotect){ + .range =3D full_range, + .mode =3D UFFDIO_RWPROTECT_MODE_RWP }); + sleep(tracking_interval); + + /* + * 2. Switch to sync BEFORE scanning. In async mode a vCPU + * access between the scan and any eviction step silently + * clears the uffd bit, so the scan would already disagree + * with the page state by the time eviction begins. Sync mode + * blocks vCPU accesses, freezing the cold snapshot for the + * rest of the iteration. + */ + ioctl(uffd, UFFDIO_SET_MODE, + &(struct uffdio_set_mode){ + .disable =3D UFFD_FEATURE_RWP_ASYNC }); + + /* 3. Find cold pages (uffd bit still set) */ + ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){ + .category_mask =3D PAGE_IS_ACCESSED, + .category_inverted =3D PAGE_IS_ACCESSED, + .return_mask =3D PAGE_IS_ACCESSED, + ... + }); + + /* 4. 
Evict cold pages (vCPU faults block on guest_mem) */ + for each cold range: + /* Read from io_mem -- bypasses RWP, no fault. */ + pwrite(storage_fd, (char *)io_mem + cold_offset, + len, cold_offset); + /* Drop the page from the shared file. */ + fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + cold_offset, len); + /* + * Wake any vCPU blocked on the RWP fault for this range: + * fallocate() does not iterate ctx->fault_pending_wqh. + */ + ioctl(uffd, UFFDIO_WAKE, &(struct uffdio_range){ + .start =3D (uintptr_t)guest_mem + cold_offset, + .len =3D len }); + + /* 5. Resume async tracking */ + ioctl(uffd, UFFDIO_SET_MODE, + &(struct uffdio_set_mode){ + .enable =3D UFFD_FEATURE_RWP_ASYNC }); + } + +During step 4, a vCPU that accesses ``guest_mem + cold_offset`` blocks +with a ``UFFD_PAGEFAULT_FLAG_RWP`` fault while the eviction is in +progress. After ``fallocate()`` punches the page out and ``UFFDIO_WAKE`` +fires, the vCPU retries the access, faults as ``MISSING``, and the +handler resolves it with ``UFFDIO_COPY`` from storage. + +This workflow targets shmem and hugetlbfs (both support a separate +``io_mem`` mapping over the same fd). Anonymous-memory backings need a +different inner-loop strategy because the VMM has no way to read the +page without going through the RWP-protected mapping. + QEMU/KVM =3D=3D=3D=3D=3D=3D=3D=3D =20 --=20 2.54.0