From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com,
    david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
    riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
    baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com,
    kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
    anshuman.khandual@arm.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH 1/9] mm/rmap: make nr_pages signed in try_to_unmap_one
Date: Tue, 10 Mar 2026 13:00:05 +0530
Message-Id: <20260310073013.4069309-2-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

Currently, nr_pages is defined as unsigned long. We use nr_pages to
manipulate the mm rss counters for lazyfree folios as follows:

  add_mm_counter(mm, MM_ANONPAGES, -nr_pages);

Suppose nr_pages == 3. -nr_pages is computed in unsigned arithmetic, so
it wraps around to ULONG_MAX - 2. Then, since add_mm_counter() interprets
the value as a long, ULONG_MAX - 2 does not fit into the positive range
of long and is converted back to -3. Eventually all of this works out,
but to keep things simple, declare nr_pages as a signed variable.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/rmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 6398d7eef393f..087c9f5b884fe 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1979,9 +1979,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	struct page *subpage;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
-	unsigned long nr_pages = 1, end_addr;
+	unsigned long end_addr;
 	unsigned long pfn;
 	unsigned long hsz = 0;
+	long nr_pages = 1;
 	int ptes = 0;
 
 	/*
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com,
    david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
    riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
    baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com,
    kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
    anshuman.khandual@arm.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH 2/9] mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one
Date: Tue, 10 Mar 2026 13:00:06 +0530
Message-Id: <20260310073013.4069309-3-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

Initialize nr_pages to 1 at the start of the loop, similar to what is
done in folio_referenced_one(). Otherwise, the nr_pages computed by a
previous call to folio_unmap_pte_batch() may be reused on a later
iteration that never goes through folio_unmap_pte_batch() itself,
leaving a stale batch size behind.
Note that I don't think there is an actual bug right now. A bug would
require that, within a single call to try_to_unmap_one(), we first take
the pte_present(pteval) branch and then take the else branch doing
pte_clear() for device-exclusive ptes; that would mean a lazyfree folio
is mapped by both present entries and device-exclusive entries. Since a
device-exclusive pte implies that a GUP reference on the underlying
folio is held, the lazyfree unmapping path will notice that reference
and abort try_to_unmap_one().

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/rmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 087c9f5b884fe..1fa020edd954a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1982,7 +1982,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	unsigned long end_addr;
 	unsigned long pfn;
 	unsigned long hsz = 0;
-	long nr_pages = 1;
+	long nr_pages;
 	int ptes = 0;
 
 	/*
@@ -2019,6 +2019,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
+		nr_pages = 1;
+
 		/*
 		 * If the folio is in an mlock()d vma, we must not swap it out.
 		 */
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com,
    david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
    riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
    baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com,
    kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
    anshuman.khandual@arm.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH 3/9] mm/rmap: refactor lazyfree unmap commit path to commit_ttu_lazyfree_folio()
Date: Tue, 10 Mar 2026 13:00:07 +0530
Message-Id: <20260310073013.4069309-4-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

Clean up the code by refactoring the post-pte-clearing path of lazyfree
folio unmapping into commit_ttu_lazyfree_folio(). No functional change
is intended.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/rmap.c | 93 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 54 insertions(+), 39 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1fa020edd954a..a61978141ee3f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1966,6 +1966,57 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 			FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
 }
 
+static inline int commit_ttu_lazyfree_folio(struct vm_area_struct *vma,
+		struct folio *folio, unsigned long address, pte_t *ptep,
+		pte_t pteval, long nr_pages)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int ref_count, map_count;
+
+	/*
+	 * Synchronize with gup_pte_range():
+	 * - clear PTE; barrier; read refcount
+	 * - inc refcount; barrier; read PTE
+	 */
+	smp_mb();
+
+	ref_count = folio_ref_count(folio);
+	map_count = folio_mapcount(folio);
+
+	/*
+	 * Order reads for page refcount and dirty flag
+	 * (see comments in __remove_mapping()).
+	 */
+	smp_rmb();
+
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		/*
+		 * redirtied either using the page table or a previously
+		 * obtained GUP reference.
+		 */
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		folio_set_swapbacked(folio);
+		return 1;
+	}
+
+	if (ref_count != 1 + map_count) {
+		/*
+		 * Additional reference. Could be a GUP reference or any
+		 * speculative reference. GUP users must mark the folio
+		 * dirty if there was a modification. This folio cannot be
+		 * reclaimed right now either way, so act just like nothing
+		 * happened.
+		 * We'll come back here later and detect if the folio was
+		 * dirtied when the additional reference is gone.
+		 */
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		return 1;
+	}
+
+	add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
+	return 0;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -2227,46 +2278,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		/* MADV_FREE page check */
 		if (!folio_test_swapbacked(folio)) {
-			int ref_count, map_count;
-
-			/*
-			 * Synchronize with gup_pte_range():
-			 * - clear PTE; barrier; read refcount
-			 * - inc refcount; barrier; read PTE
-			 */
-			smp_mb();
-
-			ref_count = folio_ref_count(folio);
-			map_count = folio_mapcount(folio);
-
-			/*
-			 * Order reads for page refcount and dirty flag
-			 * (see comments in __remove_mapping()).
-			 */
-			smp_rmb();
-
-			if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
-				/*
-				 * redirtied either using the page table or a previously
-				 * obtained GUP reference.
-				 */
-				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
-				folio_set_swapbacked(folio);
+			if (commit_ttu_lazyfree_folio(vma, folio, address,
+						      pvmw.pte, pteval,
+						      nr_pages))
 				goto walk_abort;
-			} else if (ref_count != 1 + map_count) {
-				/*
-				 * Additional reference. Could be a GUP reference or any
-				 * speculative reference. GUP users must mark the folio
-				 * dirty if there was a modification. This folio cannot be
-				 * reclaimed right now either way, so act just like nothing
-				 * happened.
-				 * We'll come back here later and detect if the folio was
-				 * dirtied when the additional reference is gone.
-				 */
-				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
-				goto walk_abort;
-			}
-			add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 			goto discard;
 		}
 
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com,
    david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
    riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
    baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com,
    kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
    anshuman.khandual@arm.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH 4/9] mm/memory: Batch set uffd-wp markers during zapping
Date: Tue, 10 Mar 2026 13:00:08 +0530
Message-Id: <20260310073013.4069309-5-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

In preparation for the next patch, enable batch setting of uffd-wp ptes.
The code paths passing nr > 1 to zap_install_uffd_wp_if_needed() produce
that nr through either folio_pte_batch() or swap_pte_batch(), which
guarantees that all ptes in the batch belong to the same type of VMA
(anonymous or non-anonymous, wp-armed or not), and that either all of
them are marked with uffd-wp or none of them are.

Note that we have to use set_pte_at() in a loop instead of set_ptes(),
since the latter cannot handle a present->non-present conversion for
nr_pages > 1.

Convert the documentation of install_uffd_wp_ptes_if_needed() to
kerneldoc format.

No functional change is intended.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 include/linux/mm_inline.h | 37 +++++++++++++++++++++++--------------
 mm/memory.c               | 20 +-------------------
 mm/rmap.c                 |  2 +-
 3 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ad50688d89dba..d69b9abbdf2a7 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -560,21 +560,30 @@ static inline pte_marker copy_pte_marker(
 	return dstm;
 }
 
-/*
- * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
- * replace a none pte.  NOTE!  This should only be called when *pte is already
+/**
+ * install_uffd_wp_ptes_if_needed - install uffd-wp marker on PTEs that map
+ * consecutive pages of the same large folio.
+ * @vma: The VMA the pages are mapped into.
+ * @addr: Address the first page of this batch is mapped at.
+ * @ptep: Page table pointer for the first entry of this batch.
+ * @pteval: old value of the entry pointed to by ptep.
+ * @nr: Number of entries to clear (batch size).
+ *
+ * If the ptes were wr-protected by uffd-wp in any form, arm special ptes to
+ * replace none ptes.  NOTE!  This should only be called when *pte is already
  * cleared so we will never accidentally replace something valuable.  Meanwhile
  * none pte also means we are not demoting the pte so tlb flushed is not needed.
  * E.g., when pte cleared the caller should have taken care of the tlb flush.
  *
- * Must be called with pgtable lock held so that no thread will see the none
- * pte, and if they see it, they'll fault and serialize at the pgtable lock.
+ * Context: The caller holds the page table lock.  The PTEs map consecutive
+ * pages that belong to the same folio.  The PTEs are all in the same PMD
+ * and the same VMA.
 *
- * Returns true if an uffd-wp pte was installed, false otherwise.
+ * Returns true if uffd-wp ptes were installed, false otherwise.
 */
 static inline bool
-pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
-			      pte_t *pte, pte_t pteval)
+install_uffd_wp_ptes_if_needed(struct vm_area_struct *vma, unsigned long addr,
+			       pte_t *pte, pte_t pteval, unsigned int nr)
 {
 	bool arm_uffd_pte = false;
 
@@ -604,13 +613,13 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
 	if (unlikely(pte_swp_uffd_wp_any(pteval)))
 		arm_uffd_pte = true;
 
-	if (unlikely(arm_uffd_pte)) {
-		set_pte_at(vma->vm_mm, addr, pte,
-			   make_pte_marker(PTE_MARKER_UFFD_WP));
-		return true;
-	}
+	if (likely(!arm_uffd_pte))
+		return false;
 
-	return false;
+	for (int i = 0; i < nr; ++i, ++pte, addr += PAGE_SIZE)
+		set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP));
+
+	return true;
 }
 
 static inline bool vma_has_recency(const struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index 38062f8e11656..768646c0b3b6a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1594,29 +1594,11 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
 		pte_t *pte, int nr, struct zap_details *details, pte_t pteval)
 {
-	bool was_installed = false;
-
-	if (!uffd_supports_wp_marker())
-		return false;
-
-	/* Zap on anonymous always means dropping everything */
-	if (vma_is_anonymous(vma))
-		return false;
-
 	if (zap_drop_markers(details))
 		return false;
 
-	for (;;) {
-		/* the PFN in the PTE is irrelevant. */
-		if (pte_install_uffd_wp_if_needed(vma, addr, pte, pteval))
-			was_installed = true;
-		if (--nr == 0)
-			break;
-		pte++;
-		addr += PAGE_SIZE;
-	}
+	return install_uffd_wp_ptes_if_needed(vma, addr, pte, pteval, nr);
 
-	return was_installed;
 }
 
 static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
diff --git a/mm/rmap.c b/mm/rmap.c
index a61978141ee3f..a7570cd037344 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2235,7 +2235,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 * we may want to replace a none pte with a marker pte if
 		 * it's file-backed, so we don't lose the tracking info.
 		 */
-		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+		install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1);
 
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com,
    david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
    riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
    baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com,
    kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
    anshuman.khandual@arm.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
Date: Tue, 10 Mar 2026 13:00:09 +0530
Message-Id: <20260310073013.4069309-6-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

The ptes in a batch all belong to the same type of VMA, and either all
of them are marked with uffd-wp or none of them are. Therefore we can
batch-set uffd-wp markers through install_uffd_wp_ptes_if_needed(), and
enable batched unmapping of folios belonging to uffd-wp VMAs by dropping
that condition from folio_unmap_pte_batch().

It may happen that we do not batch over the entire folio in one go, in
which case we must skip over the current batch. Add a helper for that:
page_vma_mapped_walk_jump() increments the relevant fields of pvmw by nr
pages. I think we could get away with incrementing only pvmw->pte and
pvmw->address, since, looking at the code in page_vma_mapped.c, pvmw->pfn
and pvmw->nr_pages are used in conjunction, and pvmw->pgoff and
pvmw->nr_pages (in vma_address_end()) are used in conjunction, cancelling
out the increment and decrement in the respective fields. But let us not
rely on the pvmw implementation and keep this simple. Export the function
via rmap.h to enable future reuse.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 include/linux/rmap.h | 10 ++++++++++
 mm/rmap.c            |  8 +++-----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8dc0871e5f001..1b7720c66ac87 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 		spin_unlock(pvmw->ptl);
 }
 
+static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
+		unsigned int nr)
+{
+	pvmw->pfn += nr;
+	pvmw->nr_pages -= nr;
+	pvmw->pgoff += nr;
+	pvmw->pte += nr;
+	pvmw->address += nr * PAGE_SIZE;
+}
+
 /**
  * page_vma_mapped_walk_restart - Restart the page table walk.
  * @pvmw: Pointer to struct page_vma_mapped_walk.
diff --git a/mm/rmap.c b/mm/rmap.c
index a7570cd037344..dd638429c963e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1953,9 +1953,6 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
-	if (userfaultfd_wp(vma))
-		return 1;
-
 	/*
 	 * If unmap fails, we need to restore the ptes. To avoid accidentally
 	 * upgrading write permissions for ptes that were not originally
@@ -2235,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 * we may want to replace a none pte with a marker pte if
 		 * it's file-backed, so we don't lose the tracking info.
 		 */
-		install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1);
+		install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, nr_pages);
 
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
@@ -2359,8 +2356,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 * If we are sure that we batched the entire folio and cleared
 		 * all PTEs, we can just optimize and stop right here.
 		 */
-		if (nr_pages == folio_nr_pages(folio))
+		if (likely(nr_pages == folio_nr_pages(folio)))
 			goto walk_done;
+		page_vma_mapped_walk_jump(&pvmw, nr_pages - 1);
 		continue;
 walk_abort:
 		ret = false;
-- 
2.34.1
dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 646211BA8; Tue, 10 Mar 2026 00:31:36 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id BC8C73F73B; Tue, 10 Mar 2026 00:31:33 -0700 (PDT) From: Dev Jain To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, david@kernel.org, hughd@google.com, chrisl@kernel.org, kasong@tencent.com Cc: weixugc@google.com, ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, riel@surriel.com, harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org, youngjun.park@lge.com, ziy@nvidia.com, kas@kernel.org, willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH 6/9] mm/swapfile: Make folio_dup_swap batchable Date: Tue, 10 Mar 2026 13:00:10 +0530 Message-Id: <20260310073013.4069309-7-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com> References: <20260310073013.4069309-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Teach folio_dup_swap to handle a batch of consecutive pages. 
Note that folio_dup_swap already can handle a subset of this:
nr_pages == 1 and nr_pages == folio_nr_pages(folio). Generalize this
to any nr_pages.

Currently we have a not-so-nice logic of passing in subpage == NULL if
we mean to exercise the logic on the entire folio, and subpage != NULL
if we want to exercise the logic on only that subpage. Remove this
indirection, and explicitly pass subpage != NULL, and the number of
pages required.

Signed-off-by: Dev Jain
---
 mm/rmap.c     |  2 +-
 mm/shmem.c    |  2 +-
 mm/swap.h     |  5 +++--
 mm/swapfile.c | 12 +++++-------
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index dd638429c963e..f6d5b187cf09b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2282,7 +2282,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				goto discard;
 			}
 
-			if (folio_dup_swap(folio, subpage) < 0) {
+			if (folio_dup_swap(folio, subpage, 1) < 0) {
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				goto walk_abort;
 			}
diff --git a/mm/shmem.c b/mm/shmem.c
index 5e7dcf5bc5d3c..86ee34c9b40b3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1695,7 +1695,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
 		spin_unlock(&shmem_swaplist_lock);
 	}
 
-	folio_dup_swap(folio, NULL);
+	folio_dup_swap(folio, folio_page(folio, 0), folio_nr_pages(folio));
 	shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
 
 	BUG_ON(folio_mapped(folio));
diff --git a/mm/swap.h b/mm/swap.h
index a77016f2423b9..d9cb58ebbddd1 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -206,7 +206,7 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp);
  * folio_put_swap(): does the opposite thing of folio_dup_swap().
  */
 int folio_alloc_swap(struct folio *folio);
-int folio_dup_swap(struct folio *folio, struct page *subpage);
+int folio_dup_swap(struct folio *folio, struct page *subpage, unsigned int nr_pages);
 void folio_put_swap(struct folio *folio, struct page *subpage);
 
 /* For internal use */
@@ -390,7 +390,8 @@ static inline int folio_alloc_swap(struct folio *folio)
 	return -EINVAL;
 }
 
-static inline int folio_dup_swap(struct folio *folio, struct page *page)
+static inline int folio_dup_swap(struct folio *folio, struct page *page,
+		unsigned int nr_pages)
 {
 	return -EINVAL;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 915bc93964dbd..eaf61ae6c3817 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1738,7 +1738,8 @@ int folio_alloc_swap(struct folio *folio)
 /**
  * folio_dup_swap() - Increase swap count of swap entries of a folio.
  * @folio: folio with swap entries bounded.
- * @subpage: if not NULL, only increase the swap count of this subpage.
+ * @subpage: Increase the swap count of this subpage till nr number of
+ *	     pages forward.
  *
  * Typically called when the folio is unmapped and have its swap entry to
  * take its place: Swap entries allocated to a folio has count == 0 and pinned
@@ -1752,18 +1753,15 @@ int folio_alloc_swap(struct folio *folio)
  * swap_put_entries_direct on its swap entry before this helper returns, or
  * the swap count may underflow.
  */
-int folio_dup_swap(struct folio *folio, struct page *subpage)
+int folio_dup_swap(struct folio *folio, struct page *subpage,
+		unsigned int nr_pages)
 {
 	swp_entry_t entry = folio->swap;
-	unsigned long nr_pages = folio_nr_pages(folio);
 
 	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio);
 
-	if (subpage) {
-		entry.val += folio_page_idx(folio, subpage);
-		nr_pages = 1;
-	}
+	entry.val += folio_page_idx(folio, subpage);
 
 	return swap_dup_entries_cluster(swap_entry_to_info(entry),
 			swp_offset(entry), nr_pages);
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain
Subject: [PATCH 7/9] mm/swapfile: Make folio_put_swap batchable
Date: Tue, 10 Mar 2026 13:00:11 +0530
Message-Id: <20260310073013.4069309-8-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

Teach folio_put_swap to handle a batch of consecutive pages.
Note that folio_put_swap already can handle a subset of this:
nr_pages == 1 and nr_pages == folio_nr_pages(folio). Generalize this
to any nr_pages.

Currently we have a not-so-nice logic of passing in subpage == NULL if
we mean to exercise the logic on the entire folio, and subpage != NULL
if we want to exercise the logic on only that subpage.
Remove this indirection, and explicitly pass subpage != NULL, and the
number of pages required.

Signed-off-by: Dev Jain
---
 mm/memory.c   |  6 +++---
 mm/rmap.c     |  4 ++--
 mm/shmem.c    |  6 +++---
 mm/swap.h     |  5 +++--
 mm/swapfile.c | 13 +++++--------
 5 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 768646c0b3b6a..8249a9b7083ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5002,7 +5002,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(folio != swapcache)) {
 		folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
 		folio_add_lru_vma(folio, vma);
-		folio_put_swap(swapcache, NULL);
+		folio_put_swap(swapcache, folio_page(swapcache, 0), folio_nr_pages(swapcache));
 	} else if (!folio_test_anon(folio)) {
 		/*
 		 * We currently only expect !anon folios that are fully
@@ -5011,12 +5011,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		VM_WARN_ON_ONCE_FOLIO(folio_nr_pages(folio) != nr_pages, folio);
 		VM_WARN_ON_ONCE_FOLIO(folio_mapped(folio), folio);
 		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
-		folio_put_swap(folio, NULL);
+		folio_put_swap(folio, folio_page(folio, 0), folio_nr_pages(folio));
 	} else {
 		VM_WARN_ON_ONCE(nr_pages != 1 && nr_pages != folio_nr_pages(folio));
 		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
					 rmap_flags);
-		folio_put_swap(folio, nr_pages == 1 ? page : NULL);
+		folio_put_swap(folio, page, nr_pages);
 	}
 
 	VM_BUG_ON(!folio_test_anon(folio) ||
diff --git a/mm/rmap.c b/mm/rmap.c
index f6d5b187cf09b..42f6b00cced01 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2293,7 +2293,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * so we'll not check/care.
 			 */
 			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				folio_put_swap(folio, subpage);
+				folio_put_swap(folio, subpage, 1);
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				goto walk_abort;
 			}
@@ -2301,7 +2301,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			/* See folio_try_share_anon_rmap(): clear PTE first. */
 			if (anon_exclusive &&
 			    folio_try_share_anon_rmap_pte(folio, subpage)) {
-				folio_put_swap(folio, subpage);
+				folio_put_swap(folio, subpage, 1);
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				goto walk_abort;
 			}
diff --git a/mm/shmem.c b/mm/shmem.c
index 86ee34c9b40b3..d9d216ea28ecb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1716,7 +1716,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
 		/* Swap entry might be erased by racing shmem_free_swap() */
 		if (!error) {
 			shmem_recalc_inode(inode, 0, -nr_pages);
-			folio_put_swap(folio, NULL);
+			folio_put_swap(folio, folio_page(folio, 0), folio_nr_pages(folio));
 		}
 
 		/*
@@ -2196,7 +2196,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 
 	nr_pages = folio_nr_pages(folio);
 	folio_wait_writeback(folio);
-	folio_put_swap(folio, NULL);
+	folio_put_swap(folio, folio_page(folio, 0), folio_nr_pages(folio));
 	swap_cache_del_folio(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
@@ -2426,7 +2426,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	folio_put_swap(folio, NULL);
+	folio_put_swap(folio, folio_page(folio, 0), folio_nr_pages(folio));
 	swap_cache_del_folio(folio);
 	folio_mark_dirty(folio);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index d9cb58ebbddd1..73fd9faa67608 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -207,7 +207,7 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp);
  */
 int folio_alloc_swap(struct folio *folio);
 int folio_dup_swap(struct folio *folio, struct page *subpage, unsigned int nr_pages);
-void folio_put_swap(struct folio *folio, struct page *subpage);
+void folio_put_swap(struct folio *folio, struct page *subpage, unsigned int nr_pages);
 
 /* For internal use */
 extern void __swap_cluster_free_entries(struct swap_info_struct *si,
@@ -396,7 +396,8 @@ static inline int folio_dup_swap(struct folio *folio, struct page *page,
 	return -EINVAL;
 }
 
-static inline void folio_put_swap(struct folio *folio, struct page *page)
+static inline void folio_put_swap(struct folio *folio, struct page *page,
+		unsigned int nr_pages)
 {
 }
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index eaf61ae6c3817..c66aa6d15d479 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1770,25 +1770,22 @@ int folio_dup_swap(struct folio *folio, struct page *subpage,
 /**
  * folio_put_swap() - Decrease swap count of swap entries of a folio.
  * @folio: folio with swap entries bounded, must be in swap cache and locked.
- * @subpage: if not NULL, only decrease the swap count of this subpage.
+ * @subpage: decrease the swap count of this subpage till nr_pages.
  *
  * This won't free the swap slots even if swap count drops to zero, they are
  * still pinned by the swap cache. User may call folio_free_swap to free them.
  * Context: Caller must ensure the folio is locked and in the swap cache.
  */
-void folio_put_swap(struct folio *folio, struct page *subpage)
+void folio_put_swap(struct folio *folio, struct page *subpage,
+		unsigned int nr_pages)
 {
 	swp_entry_t entry = folio->swap;
-	unsigned long nr_pages = folio_nr_pages(folio);
 	struct swap_info_struct *si = __swap_entry_to_info(entry);
 
 	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio);
 
-	if (subpage) {
-		entry.val += folio_page_idx(folio, subpage);
-		nr_pages = 1;
-	}
+	entry.val += folio_page_idx(folio, subpage);
 
 	swap_put_entries_cluster(si, swp_offset(entry), nr_pages, false);
 }
@@ -2334,7 +2331,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		new_pte = pte_mkuffd_wp(new_pte);
setpte:
 	set_pte_at(vma->vm_mm, addr, pte, new_pte);
-	folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry)));
+	folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry)), 1);
out:
 	if (pte)
 		pte_unmap_unlock(pte, ptl);
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain
Subject: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Date: Tue, 10 Mar 2026 13:00:12 +0530
Message-Id: <20260310073013.4069309-9-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

In the quest of enabling batched unmapping of anonymous folios, we
need to handle the sharing of exclusive pages. Hence, a batched
version of folio_try_share_anon_rmap_pte is required.

Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap
is to do some rmap sanity checks. Add a helper to clear the
PageAnonExclusive bit on a batch of nr_pages. Note that
__folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from
the PMD path, but currently we only clear the bit on the head page.
Retain this behaviour by setting nr_pages = 1 in case the caller is
folio_try_share_anon_rmap_pmd.

Signed-off-by: Dev Jain
---
 include/linux/page-flags.h | 11 +++++++++++
 include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
 mm/rmap.c                  |  2 +-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9d..1d74ed9a28c41 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
 	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
 }
 
+static __always_inline void ClearPagesAnonExclusive(struct page *page,
+		unsigned int nr)
+{
+	for (;;) {
+		ClearPageAnonExclusive(page);
+		if (--nr == 0)
+			break;
+		++page;
+	}
+}
+
 #ifdef CONFIG_MMU
 #define __PG_MLOCKED		(1UL << PG_mlocked)
 #else
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1b7720c66ac87..7a67776dca3fe 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
 	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
+	/* We only clear anon-exclusive from head page of PMD folio */
+	if (level == PGTABLE_LEVEL_PMD)
+		nr_pages = 1;
+
 	/* device private folios cannot get pinned via GUP. */
 	if (unlikely(folio_is_device_private(folio))) {
-		ClearPageAnonExclusive(page);
+		ClearPagesAnonExclusive(page, nr_pages);
 		return 0;
 	}
 
@@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
 
 	if (unlikely(folio_maybe_dma_pinned(folio)))
 		return -EBUSY;
-	ClearPageAnonExclusive(page);
+	ClearPagesAnonExclusive(page, nr_pages);
 
 	/*
 	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
@@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
 	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
 }
 
+/**
+ * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
+ *				    mapped by PTEs possibly shared to prepare
+ *				    for KSM or temporary unmapping
+ * @folio: The folio to share a mapping of
+ * @page: The first mapped exclusive page of the batch
+ * @nr_pages: The number of pages to share (batch size)
+ *
+ * See folio_try_share_anon_rmap_pte for full description.
+ *
+ * Context: The caller needs to hold the page table lock and has to have the
+ * page table entries cleared/invalidated. Those PTEs used to map consecutive
+ * pages of the folio passed here. The PTEs are all in the same PMD and VMA.
+ */
+static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
+		struct page *page, unsigned int nr)
+{
+	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
+}
+
 /**
  * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
  *				   range mapped by a PMD possibly shared to
diff --git a/mm/rmap.c b/mm/rmap.c
index 42f6b00cced01..bba5b571946d8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 			/* See folio_try_share_anon_rmap(): clear PTE first. */
 			if (anon_exclusive &&
-			    folio_try_share_anon_rmap_pte(folio, subpage)) {
+			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
 				folio_put_swap(folio, subpage, 1);
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				goto walk_abort;
-- 
2.34.1

From nobody Thu Apr 9 08:08:50 2026
From: Dev Jain
Subject: [PATCH 9/9] mm/rmap: enable batch unmapping of anonymous folios
Date: Tue, 10 Mar 2026 13:00:13 +0530
Message-Id: <20260310073013.4069309-10-dev.jain@arm.com>
In-Reply-To: <20260310073013.4069309-1-dev.jain@arm.com>
References: <20260310073013.4069309-1-dev.jain@arm.com>

Enable batch clearing of ptes, and batch setting of swap ptes, for
anon folio unmapping. Processing all ptes of a large folio in one go
lets us batch across atomics (add_mm_counter() etc.), barriers (in
__folio_try_share_anon_rmap()), and repeated calls to
page_vma_mapped_walk(), to name a few. In general, batching executes
similar code together, making the program more memory- and
CPU-friendly.

The handling of anon-exclusivity is very similar to commit
cac1db8c3aad ("mm: optimize mprotect() by PTE batching"). Since
folio_unmap_pte_batch() won't look at the bits of the underlying
pages, we need to process sub-batches of ptes pointing to pages which
are the same w.r.t. exclusivity, and batch-set only those ptes to swap
ptes in one go.
Hence, export page_anon_exclusive_sub_batch() to internal.h and reuse
it.

arch_unmap_one() is only defined for sparc64; I am not comfortable
changing that bit of code to enable batching, given the nuances of
retrieving the pfn from pte_pfn() versus from
(paddr = pte_val(oldpte) & _PAGE_PADDR_4V) (and pte_next_pfn() can't
even be called from arch_unmap_one() because that file does not
include pgtable.h), especially when I have no way to test the code.
So just disable the "sparc64-anon-swapbacked" case for now.

We need to take care of rmap accounting (folio_remove_rmap_ptes) and
reference accounting (folio_put_refs) when the anon folio unmap
succeeds. In case we partially batch the large folio and fail, we need
to correctly do the accounting for the pages which were successfully
unmapped. So, put this accounting code in
__commit_ttu_anon_swapbacked_folio() itself, instead of doing some
horrible goto jumping at the callsite of
commit_ttu_anon_swapbacked_folio(). Similarly, do the
jumping-over-batch immediately after we succeed in unmapping the
entire batch, and continue to the next (unlikely) iteration.

Add a comment at the relevant places to say that we are on a
device-exclusive entry and not a present entry.
Signed-off-by: Dev Jain
---
 mm/internal.h |  26 ++++++++
 mm/mprotect.c |  17 -----
 mm/rmap.c     | 170 +++++++++++++++++++++++++++++++++++---------------
 3 files changed, 144 insertions(+), 69 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 95b583e7e4f75..c29ecc334a06b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -393,6 +393,32 @@ static inline unsigned int folio_pte_batch_flags(struct folio *folio,
 unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
 		unsigned int max_nr);
 
+/**
+ * page_anon_exclusive_sub_batch - Determine length of consecutive exclusive
+ *				   or maybe shared pages
+ * @start_idx: Starting index of the page array to scan from
+ * @max_len: Maximum length to look at
+ * @first_page: First page of the page array
+ * @expected_anon_exclusive: Whether to look for exclusive or !exclusive pages
+ *
+ * Determines length of consecutive ptes, pointing to pages being the same
+ * w.r.t the PageAnonExclusive bit.
+ *
+ * Context: The ptes point to consecutive pages of the same large folio. The
+ * ptes belong to the same PMD and VMA.
+ */
+static inline int page_anon_exclusive_sub_batch(int start_idx, int max_len,
+		struct page *first_page, bool expected_anon_exclusive)
+{
+	int idx;
+
+	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
+		if (expected_anon_exclusive != PageAnonExclusive(first_page + idx))
+			break;
+	}
+	return idx - start_idx;
+}
+
 /**
  * pte_move_swp_offset - Move the swap entry offset field of a swap pte
  *	forward or backward by delta
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 9681f055b9fca..9403171d648b6 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -138,23 +138,6 @@ static void prot_commit_flush_ptes(struct vm_area_struct *vma, unsigned long add
 	tlb_flush_pte_range(tlb, addr, nr_ptes * PAGE_SIZE);
 }
 
-/*
- * Get max length of consecutive ptes pointing to PageAnonExclusive() pages or
- * !PageAnonExclusive() pages, starting from start_idx. Caller must enforce
- * that the ptes point to consecutive pages of the same anon large folio.
- */
-static int page_anon_exclusive_sub_batch(int start_idx, int max_len,
-		struct page *first_page, bool expected_anon_exclusive)
-{
-	int idx;
-
-	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
-		if (expected_anon_exclusive != PageAnonExclusive(first_page + idx))
-			break;
-	}
-	return idx - start_idx;
-}
-
 /*
  * This function is a result of trying our very best to retain the
  * "avoid the write-fault handler" optimization. In can_change_pte_writable(),
diff --git a/mm/rmap.c b/mm/rmap.c
index bba5b571946d8..334350caf40b0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1946,11 +1946,11 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	end_addr = pmd_addr_end(addr, vma->vm_end);
 	max_nr = (end_addr - addr) >> PAGE_SHIFT;
 
-	/* We only support lazyfree or file folios batching for now ... */
-	if (folio_test_anon(folio) && folio_test_swapbacked(folio))
+	if (pte_unused(pte))
 		return 1;
 
-	if (pte_unused(pte))
+	if (__is_defined(__HAVE_ARCH_UNMAP_ONE) && folio_test_anon(folio) &&
+	    folio_test_swapbacked(folio))
 		return 1;
 
 	/*
@@ -1963,6 +1963,112 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 			FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
 }
 
+static inline void set_swp_ptes(struct mm_struct *mm, unsigned long address,
+	pte_t *ptep, swp_entry_t entry, pte_t pteval, bool anon_exclusive,
+	unsigned int nr_pages)
+{
+	pte_t swp_pte = swp_entry_to_pte(entry);
+
+	if (anon_exclusive)
+		swp_pte = pte_swp_mkexclusive(swp_pte);
+
+	if (likely(pte_present(pteval))) {
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+	} else {
+		/* Device-exclusive entry: nr_pages is 1. */
+		if (pte_swp_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_swp_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+	}
+
+	for (int i = 0; i < nr_pages; ++i, ++ptep, address += PAGE_SIZE) {
+		set_pte_at(mm, address, ptep, swp_pte);
+		swp_pte = pte_next_swp_offset(swp_pte);
+	}
+}
+
+static inline int __commit_ttu_anon_swapbacked_folio(struct vm_area_struct *vma,
+	struct folio *folio, struct page *subpage, unsigned long address,
+	pte_t *ptep, pte_t pteval, long nr_pages, bool anon_exclusive)
+{
+	swp_entry_t entry = page_swap_entry(subpage);
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (folio_dup_swap(folio, subpage, nr_pages) < 0) {
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		return 1;
+	}
+
+	/*
+	 * arch_unmap_one() is expected to be a NOP on
+	 * architectures where we could have PFN swap PTEs,
+	 * so we'll not check/care.
+	 */
+	if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+		VM_WARN_ON(nr_pages != 1);
+		folio_put_swap(folio, subpage, nr_pages);
+		set_pte_at(mm, address, ptep, pteval);
+		return 1;
+	}
+
+	/* See folio_try_share_anon_rmap(): clear PTE first. */
+	if (anon_exclusive && folio_try_share_anon_rmap_ptes(folio, subpage, nr_pages)) {
+		folio_put_swap(folio, subpage, nr_pages);
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		return 1;
+	}
+
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+
+	add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
+	add_mm_counter(mm, MM_SWAPENTS, nr_pages);
+	set_swp_ptes(mm, address, ptep, entry, pteval, anon_exclusive, nr_pages);
+	folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_drain_local();
+	folio_put_refs(folio, nr_pages);
+	return 0;
+}
+
+static inline int commit_ttu_anon_swapbacked_folio(struct vm_area_struct *vma,
+	struct folio *folio, struct page *first_page, unsigned long address,
+	pte_t *ptep, pte_t pteval, long nr_pages)
+{
+	bool expected_anon_exclusive;
+	int sub_batch_idx = 0;
+	int len, err;
+
+	for (;;) {
+		expected_anon_exclusive = PageAnonExclusive(first_page + sub_batch_idx);
+		len = page_anon_exclusive_sub_batch(sub_batch_idx, nr_pages,
+				first_page, expected_anon_exclusive);
+		err = __commit_ttu_anon_swapbacked_folio(vma, folio, first_page + sub_batch_idx,
+				address, ptep, pteval, len, expected_anon_exclusive);
+		if (err)
+			return err;
+
+		nr_pages -= len;
+		if (!nr_pages)
+			break;
+
+		pteval = pte_advance_pfn(pteval, len);
+		address += len * PAGE_SIZE;
+		sub_batch_idx += len;
+		ptep += len;
+	}
+
+	return 0;
+}
+
 static inline int commit_ttu_lazyfree_folio(struct vm_area_struct *vma,
 		struct folio *folio, unsigned long address, pte_t *ptep,
 		pte_t pteval, long nr_pages)
@@ -2022,7 +2128,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	bool anon_exclusive, ret = true;
+	bool ret = true;
 	pte_t pteval;
 	struct page *subpage;
 	struct mmu_notifier_range range;
@@ -2148,8 +2254,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
-		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
 
 		if (folio_test_hugetlb(folio)) {
 			bool anon = folio_test_anon(folio);
@@ -2224,6 +2328,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				if (pte_dirty(pteval))
 					folio_mark_dirty(folio);
 			} else {
+				/* Device-exclusive entry */
 				pte_clear(mm, address, pvmw.pte);
 			}
 
@@ -2261,8 +2366,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			dec_mm_counter(mm, mm_counter(folio));
 		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = page_swap_entry(subpage);
-			pte_t swp_pte;
 			/*
 			 * Store the swap location in the pte.
 			 * See handle_pte_fault() ...
@@ -2282,52 +2385,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				goto discard;
 			}
 
-			if (folio_dup_swap(folio, subpage, 1) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
+			if (commit_ttu_anon_swapbacked_folio(vma, folio, subpage,
+							     address, pvmw.pte,
+							     pteval, nr_pages))
 				goto walk_abort;
-			}
 
-			/*
-			 * arch_unmap_one() is expected to be a NOP on
-			 * architectures where we could have PFN swap PTEs,
-			 * so we'll not check/care.
-			 */
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				folio_put_swap(folio, subpage, 1);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				goto walk_abort;
-			}
-
-			/* See folio_try_share_anon_rmap(): clear PTE first. */
-			if (anon_exclusive &&
-			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
-				folio_put_swap(folio, subpage, 1);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				goto walk_abort;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-			swp_pte = swp_entry_to_pte(entry);
-			if (anon_exclusive)
-				swp_pte = pte_swp_mkexclusive(swp_pte);
-			if (likely(pte_present(pteval))) {
-				if (pte_soft_dirty(pteval))
-					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			} else {
-				if (pte_swp_soft_dirty(pteval))
-					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_swp_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			}
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
+			if (likely(nr_pages == folio_nr_pages(folio)))
+				goto walk_done;
+			page_vma_mapped_walk_jump(&pvmw, nr_pages - 1);
+			continue;
 		} else {
 			/*
 			 * This is a locked file-backed folio,
-- 
2.34.1