From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1B74D33F5A9 for ; Fri, 10 Apr 2026 10:32:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817151; cv=none; b=jH3uo09x9javHLbWbjmtKiAzzdoDBrAInR6baNaiS1PhGuryLECcUesZmAmxz++YOSAZAJKaKhKfeQQs62MmrVhX/zuesx+FzQq885rroehjcKf9lp385DgVC//g4BZS8ux6UjnxXqiW4EtXI3tW0sqtm1MU4ZdDNYPHDzxtskY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817151; c=relaxed/simple; bh=b08HICgezPVD7gUzyQf2pI1dALNHj0lOGqN3C5nNR0Q=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=RvVTVzFdOiMpEJ8+Md6/7cT/B/GXHni7RCWKoTlnfud30+8NQ7FvyoGxNAvAS/G6vaTEHytWGflZ3wEbuIaO+Hfuj/3jPQEAROwnakRlQJi+j6l12hxfxwMb2euRoqIckyW4O+00wz0s08XRPoNY00dD/69nEDnNDGzIdnzm7WM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=UXc3vPoC; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="UXc3vPoC" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D07B41D13; Fri, 10 Apr 2026 03:32:23 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5C4853FAF5; Fri, 10 Apr 2026 03:32:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817149; 
bh=b08HICgezPVD7gUzyQf2pI1dALNHj0lOGqN3C5nNR0Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UXc3vPoCgwsQbgN+hDlEu3IRjHexFrPZmZsCCe4+iKS7P2XZFV6VtHkFHXQ6E0Ubr 1FEXUb/fAwuvXn/4Yq7VI3p4t9ki5EU1Jl953XMq83vVMouhMrLnLJsM7YJR3IDaEt VkOcTq+8HOT3JjA5js+iX624XwvwQwLxVI0HBGd4= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 1/9] mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one Date: Fri, 10 Apr 2026 16:01:56 +0530 Message-Id: <20260410103204.120409-2-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Initialize nr_pages to 1 at the start of each loop iteration, like folio_referenced_one() does. Without this, nr_pages computed by a previous folio_unmap_pte_batch() call can be reused on a later iteration that does not run folio_unmap_pte_batch() again. I don=E2=80=99t think this is causing a bug today, but it is fragile. A real bug would require this sequence within the same try_to_unmap_one() call: 1. Hit the pte_present(pteval) branch and set nr_pages > 1. 2. 
Later hit the else branch and do pte_clear() for device-exclusive PTE, and execute rest of the code with nr_pages > 1. Executing the above would imply a lazyfree folio is mapped by a mix of present PTEs and device-exclusive PTEs. In practice, device-exclusive PTEs imply a GUP pin on the folio, and lazyfree unmapping aborts try_to_unmap_one() when it detects that condition. So today this likely does not manifest, but initializing nr_pages per-iteration is still the correct and safer behavior. Signed-off-by: Dev Jain Acked-by: Barry Song --- mm/rmap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/rmap.c b/mm/rmap.c index 78b7fb5f367ce..62a8c912fd788 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1991,7 +1991,8 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, struct page *subpage; struct mmu_notifier_range range; enum ttu_flags flags =3D (enum ttu_flags)(long)arg; - unsigned long nr_pages =3D 1, end_addr; + unsigned long nr_pages; + unsigned long end_addr; unsigned long pfn; unsigned long hsz =3D 0; int ptes =3D 0; @@ -2030,6 +2031,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, mmu_notifier_invalidate_range_start(&range); =20 while (page_vma_mapped_walk(&pvmw)) { + nr_pages =3D 1; /* * If the folio is in an mlock()d vma, we must not swap it out. 
*/ --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 40C4A33DEE0 for ; Fri, 10 Apr 2026 10:32:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817160; cv=none; b=pJxjStbT8qFgbvnggBT5MUW6KeQlXTQivlCIEdOje/ofHJjTMioqGWHz2XZiOuGANM5wKIXMgKhzXmU26oTt8nVZlGQZEDNSSelpu1AoMZL4rboNx+HQdcWIMmLcGxBuu4kwxpLz2M9/u/Gy33djJ8KPwXpZpKI1ha+yMqL64Vw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817160; c=relaxed/simple; bh=KqK/BLpNLmdQmMsGUTYd1FsmAxRcfYnyllCg9XkaRKY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=OilSvucTUO03hLFyOCzhd4WGVQDqAerOaa7id+Kmbhn+OHfQaSfd8kmzq1V7V2YQv2O5kzZgUITPE4ilygXFAcw/QCSE23xy8qsVmi+iuBzaFpLq8xujzmgEbx0eoQmEymt+6wcJifeoc4KsX9Q80M5ezHg+MDOoK8/VEQC2C9g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=hlfgag3m; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="hlfgag3m" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 90C291D15; Fri, 10 Apr 2026 03:32:32 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 1D7403FAF5; Fri, 10 Apr 2026 03:32:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; 
t=1775817158; bh=KqK/BLpNLmdQmMsGUTYd1FsmAxRcfYnyllCg9XkaRKY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hlfgag3mwCqDdiXjfNp6lxjv9ajfCvqnk5tv2ugz0FTuP0baBeKU00MNal0bmhW/c wKEcljzN0/PwWMn7eJTz8IKalf1LNPmZMCozknskvDnUGYDX93b1f+AZTCCBfCQ/8J CPk8cUx3BxA54lJSKuwhbRRgJzyFaZfSDXpZ0DxE= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one Date: Fri, 10 Apr 2026 16:01:57 +0530 Message-Id: <20260410103204.120409-3-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Simplify the code by refactoring the folio_test_hugetlb() branch into a new function. While at it, relax the VM_BUG_ON_PAGE() and VM_BUG_ON() assertions in that branch to VM_WARN_ON_PAGE() and VM_WARN_ON(). No other functional change is intended. 
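For reviewers, the return-value contract of the new helper can be modelled in userspace roughly as follows. This is only an illustrative sketch, not kernel code: struct model_state and model_unmap_hugetlb() are hypothetical names, and the three booleans merely stand in for folio_test_anon(), hugetlb_vma_trylock_write() and huge_pmd_unshare().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the conditions the real helper evaluates. */
struct model_state {
	bool anon;         /* models folio_test_anon(folio) */
	bool trylock_ok;   /* models hugetlb_vma_trylock_write(vma) succeeding */
	bool pmd_unshared; /* models huge_pmd_unshare() reporting an unshare */
};

/*
 * Userspace model of the (return value, *walk_done) pairs:
 *   false, *walk_done = true  : lock not taken, caller must stop the walk
 *   true,  *walk_done = true  : shared PMD dropped, folio already unmapped
 *   true,  *walk_done = false : PTE cleared, caller continues as before
 */
static bool model_unmap_hugetlb(const struct model_state *s, bool *walk_done)
{
	if (!s->anon) {
		if (!s->trylock_ok) {
			*walk_done = true;
			return false;
		}
		if (s->pmd_unshared) {
			*walk_done = true;
			return true;
		}
	}
	*walk_done = false;
	return true;
}
```

The caller only needs to test walk_done to decide whether to leave the loop; the boolean result becomes the return value of try_to_unmap_one() in that case.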
Signed-off-by: Dev Jain --- mm/rmap.c | 116 +++++++++++++++++++++++++++++++----------------------- 1 file changed, 67 insertions(+), 49 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 62a8c912fd788..a9c43e2f6e695 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1978,6 +1978,67 @@ static inline unsigned int folio_unmap_pte_batch(str= uct folio *folio, FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY); } =20 +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma, + struct folio *folio, struct page_vma_mapped_walk *pvmw, + struct page *page, enum ttu_flags flags, pte_t *pteval, + struct mmu_notifier_range *range, bool *walk_done) +{ + /* + * The try_to_unmap() is only passed a hugetlb page + * in the case where the hugetlb page is poisoned. + */ + VM_WARN_ON_PAGE(!PageHWPoison(page), page); + /* + * huge_pmd_unshare may unmap an entire PMD page. + * There is no way of knowing exactly which PMDs may + * be cached for this mm, so we must flush them all. + * start/end were already adjusted above to cover this + * range. + */ + flush_cache_range(vma, range->start, range->end); + + /* + * To call huge_pmd_unshare, i_mmap_rwsem must be + * held in write mode. Caller needs to explicitly + * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and fail + * if unsuccessful. + */ + if (!folio_test_anon(folio)) { + struct mmu_gather tlb; + + VM_WARN_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + *walk_done =3D true; + return false; + } + + tlb_gather_mmu_vma(&tlb, vma); + if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) { + hugetlb_vma_unlock_write(vma); + huge_pmd_unshare_flush(&tlb, vma); + tlb_finish_mmu(&tlb); + /* + * The PMD table was unmapped, + * consequently unmapping the folio. 
+ */ + *walk_done =3D true; + return true; + } + hugetlb_vma_unlock_write(vma); + tlb_finish_mmu(&tlb); + } + *pteval =3D huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte); + if (pte_dirty(*pteval)) + folio_mark_dirty(folio); + + *walk_done =3D false; + return true; +} + /* * @arg: enum ttu_flags will be passed to this argument */ @@ -2115,56 +2176,13 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, PageAnonExclusive(subpage); =20 if (folio_test_hugetlb(folio)) { - bool anon =3D folio_test_anon(folio); - - /* - * The try_to_unmap() is only passed a hugetlb page - * in the case where the hugetlb page is poisoned. - */ - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); - /* - * huge_pmd_unshare may unmap an entire PMD page. - * There is no way of knowing exactly which PMDs may - * be cached for this mm, so we must flush them all. - * start/end were already adjusted above to cover this - * range. - */ - flush_cache_range(vma, range.start, range.end); + bool walk_done; =20 - /* - * To call huge_pmd_unshare, i_mmap_rwsem must be - * held in write mode. Caller needs to explicitly - * do this outside rmap routines. - * - * We also must hold hugetlb vma_lock in write mode. - * Lock order dictates acquiring vma_lock BEFORE - * i_mmap_rwsem. We can only try lock here and fail - * if unsuccessful. - */ - if (!anon) { - struct mmu_gather tlb; - - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); - if (!hugetlb_vma_trylock_write(vma)) - goto walk_abort; - - tlb_gather_mmu_vma(&tlb, vma); - if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) { - hugetlb_vma_unlock_write(vma); - huge_pmd_unshare_flush(&tlb, vma); - tlb_finish_mmu(&tlb); - /* - * The PMD table was unmapped, - * consequently unmapping the folio. 
- */ - goto walk_done; - } - hugetlb_vma_unlock_write(vma); - tlb_finish_mmu(&tlb); - } - pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); - if (pte_dirty(pteval)) - folio_mark_dirty(folio); + ret =3D unmap_hugetlb_folio(vma, folio, &pvmw, subpage, + flags, &pteval, &range, + &walk_done); + if (walk_done) + goto walk_done; } else if (likely(pte_present(pteval))) { nr_pages =3D folio_unmap_pte_batch(folio, &pvmw, flags, pteval); end_addr =3D address + nr_pages * PAGE_SIZE; --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 90EE633DEE0 for ; Fri, 10 Apr 2026 10:32:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817169; cv=none; b=qGTmDWbT1v/5B6orVwuTayBIn8z2NF+W/AYneJwmHJt5aH+1Xa8nUhQJMpGArf9upJeCUVNx0O5lBPany/lDDS78qmMefL1bjn1eTy68KVuGjYiCce5IbmhK1cSx4KXO5c82vUABQGUDrhPW8ZPXNu8xVHfkJy6Xo72rXV/EZy4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817169; c=relaxed/simple; bh=X7DlvXowYL6w+6n3AW64YW79PXkvSOgsC12lJIyy+DA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bpxSERVOX4kDk4yBB/g1ddRa4F/veRNPYfwlC98wm7bbj/LBLW8hZ9C2aqkfGN0evmG+ps+p3tt21JSqPEYQt15uE3SKRYxzmFlbNE6YWNpxdQAcG2LYlM+Nyc/+EawPjRZ1eaGdnb+ZdsNte1mB5481MReN2sCV+GszxMgRRok= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=chmVQdNF; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) 
header.d=arm.com header.i=@arm.com header.b="chmVQdNF" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4EA5F2681; Fri, 10 Apr 2026 03:32:41 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id D22F33FAF5; Fri, 10 Apr 2026 03:32:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817167; bh=X7DlvXowYL6w+6n3AW64YW79PXkvSOgsC12lJIyy+DA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=chmVQdNFRnxFZW1c2PeiQ45e2n4251vCjuRZc+/Ml+7/MHJC7bt7lwbm7ixvDeX+C 0gfeMHhQQxxQnRTUjY3X6UgbYTRXDReYyHWTuq1h9hyGe4I4EgCriYLjWt5vcyDVcq RJ5e1f2yFbEjw2vL/ZLDYEMQLNaWmCAoSBeZklIM= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 3/9] mm/rmap: refactor some code around lazyfree folio unmapping Date: Fri, 10 Apr 2026 16:01:58 +0530 Message-Id: <20260410103204.120409-4-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For lazyfree folio unmapping, 
after clearing the ptes we must abort the operation if the folio got dirtied or it has unexpected references. Refactor this logic into a function which will return whether we need to abort or not. If we abort, we restore the ptes and bail out of try_to_unmap_one. Otherwise adjust the rss stats of the mm and jump to a label. Also rename that label from "discard" to "finish_unmap"; the former is appropriate in the lazyfree context, but the code following the label is executed for other successful unmap code paths too, so 'discard' does not sound correct for them. Signed-off-by: Dev Jain --- mm/rmap.c | 95 ++++++++++++++++++++++++++++++++----------------------- 1 file changed, 55 insertions(+), 40 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index a9c43e2f6e695..fa5d6599dedf0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1978,6 +1978,56 @@ static inline unsigned int folio_unmap_pte_batch(str= uct folio *folio, FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY); } =20 +static inline bool can_unmap_lazyfree_folio_range(struct vm_area_struct *v= ma, + struct folio *folio, unsigned long address, pte_t *ptep, + pte_t pteval, unsigned long nr_pages) +{ + struct mm_struct *mm =3D vma->vm_mm; + int ref_count, map_count; + + /* + * Synchronize with gup_pte_range(): + * - clear PTE; barrier; read refcount + * - inc refcount; barrier; read PTE + */ + smp_mb(); + + ref_count =3D folio_ref_count(folio); + map_count =3D folio_mapcount(folio); + + /* + * Order reads for page refcount and dirty flag + * (see comments in __remove_mapping()). + */ + smp_rmb(); + + if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) { + /* + * redirtied either using the page table or a previously + * obtained GUP reference. + */ + set_ptes(mm, address, ptep, pteval, nr_pages); + folio_set_swapbacked(folio); + return false; + } + + if (ref_count !=3D 1 + map_count) { + /* + * Additional reference. Could be a GUP reference or any + * speculative reference. 
GUP users must mark the folio + * dirty if there was a modification. This folio cannot be + * reclaimed right now either way, so act just like nothing + * happened. + * We'll come back here later and detect if the folio was + * dirtied when the additional reference is gone. + */ + set_ptes(mm, address, ptep, pteval, nr_pages); + return false; + } + + return true; +} + static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma, struct folio *folio, struct page_vma_mapped_walk *pvmw, struct page *page, enum ttu_flags flags, pte_t *pteval, @@ -2256,47 +2306,12 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, =20 /* MADV_FREE page check */ if (!folio_test_swapbacked(folio)) { - int ref_count, map_count; - - /* - * Synchronize with gup_pte_range(): - * - clear PTE; barrier; read refcount - * - inc refcount; barrier; read PTE - */ - smp_mb(); - - ref_count =3D folio_ref_count(folio); - map_count =3D folio_mapcount(folio); - - /* - * Order reads for page refcount and dirty flag - * (see comments in __remove_mapping()). - */ - smp_rmb(); - - if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) { - /* - * redirtied either using the page table or a previously - * obtained GUP reference. - */ - set_ptes(mm, address, pvmw.pte, pteval, nr_pages); - folio_set_swapbacked(folio); + if (!can_unmap_lazyfree_folio_range(vma, folio, address, + pvmw.pte, pteval, nr_pages)) goto walk_abort; - } else if (ref_count !=3D 1 + map_count) { - /* - * Additional reference. Could be a GUP reference or any - * speculative reference. GUP users must mark the folio - * dirty if there was a modification. This folio cannot be - * reclaimed right now either way, so act just like nothing - * happened. - * We'll come back here later and detect if the folio was - * dirtied when the additional reference is gone. 
- */ - set_ptes(mm, address, pvmw.pte, pteval, nr_pages); - goto walk_abort; - } + add_mm_counter(mm, MM_ANONPAGES, -nr_pages); - goto discard; + goto finish_unmap; } =20 if (folio_dup_swap(folio, subpage) < 0) { @@ -2359,7 +2374,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, */ add_mm_counter(mm, mm_counter_file(folio), -nr_pages); } -discard: +finish_unmap: if (unlikely(folio_test_hugetlb(folio))) { hugetlb_remove_rmap(folio); } else { --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 654F33630AE for ; Fri, 10 Apr 2026 10:32:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817177; cv=none; b=rrAm3WHb1enoGJmYvQMJnCy+iQ/5hln2ExVPim98dIZ1xEEi4qTLgFZ+wmsE3XCw3XR3x98ztcHm2QAX872oa+hl5VVn0+MS/5dJM+9jPMFzTSnVB4MnPLJTF/vba3kVyb84pOmO1/0iaoqhjr6PtGqbukudTRAwyMFSn33FzdE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817177; c=relaxed/simple; bh=caKXDhVV7OSoFkbvLj/dPJkPGoMKHBzSGDJtbDEn0Dc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=VHY39O+6YuU/Z/EoGTHZh8X4Znfx9Du0ZtRO1wDDbHm2c3WDavE5IElFqk3eFwNjZcDynCvdMAwyTPSpLWTWDjk5q+kBK9uCKrqRK0cbi3loCulu7FMdaTp0OTrhKTt+6CKsviAeaN5CNQWRTpGH0njXkqJih5B0Pgn/XXPfs6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=b6+PS95B; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com 
header.i=@arm.com header.b="b6+PS95B" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0EBBE2682; Fri, 10 Apr 2026 03:32:50 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 92EFB3FAF5; Fri, 10 Apr 2026 03:32:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817175; bh=caKXDhVV7OSoFkbvLj/dPJkPGoMKHBzSGDJtbDEn0Dc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=b6+PS95BFzLUjvFSdizyN/iTJgGePZWWNKL7DzMAc+OfVgATLlrKI08QXy64iceAK XgPjoTcY6qT13gn1dOcGbCFa2UQa0W6IqGGg+D5l5PHTNCCZjqioviSd7oopUC5FqV SdLSw1d9FJ0kuBbPlvxMvfWsxe272DAdl+3S+srU= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 4/9] mm/memory: Batch set uffd-wp markers during zapping Date: Fri, 10 Apr 2026 16:01:59 +0530 Message-Id: <20260410103204.120409-5-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In preparation for the next patch, enable batch setting 
of uffd-wp ptes. The code paths passing nr > 1 to zap_install_uffd_wp_if_needed() produce that nr through either folio_pte_batch or swap_pte_batch, guaranteeing that all ptes are the same w.r.t. belonging to the same type of VMA (anonymous or non-anonymous, wp-armed or non-wp-armed), and all being marked with uffd-wp or all being not marked. Note that we will have to use set_pte_at() in a loop instead of set_ptes() since the latter cannot handle present->non-present conversion for nr_pages > 1. Convert documentation of install_uffd_wp_ptes_if_needed() to kerneldoc format. No functional change is intended. Signed-off-by: Dev Jain --- include/linux/mm_inline.h | 32 ++++++++++++++++++++------------ mm/memory.c | 20 +------------------- mm/rmap.c | 2 +- 3 files changed, 22 insertions(+), 32 deletions(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f05..20c34d14ad539 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -566,9 +566,17 @@ static inline pte_marker copy_pte_marker( return dstm; } =20 -/* - * If this pte is wr-protected by uffd-wp in any form, arm the special pte= to - * replace a none pte. NOTE! This should only be called when *pte is alr= eady +/** + * install_uffd_wp_ptes_if_needed - install uffd-wp marker on PTEs that map + * consecutive pages of the same large folio. + * @vma: The VMA the pages are mapped into. + * @addr: Address the first page of this batch is mapped at. + * @pte: Page table pointer for the first entry of this batch. + * @pteval: old value of the entry pointed to by pte. + * @nr_ptes: Number of entries to install markers on (batch size). + * + * If the ptes were wr-protected by uffd-wp in any form, arm special ptes = to + * replace none ptes. NOTE! This should only be called when *pte is alre= ady * cleared so we will never accidentally replace something valuable. Mean= while * none pte also means we are not demoting the pte so tlb flushed is not n= eeded. 
* E.g., when pte cleared the caller should have taken care of the tlb flu= sh. @@ -576,11 +584,11 @@ static inline pte_marker copy_pte_marker( * Must be called with pgtable lock held so that no thread will see the no= ne * pte, and if they see it, they'll fault and serialize at the pgtable loc= k. * - * Returns true if an uffd-wp pte was installed, false otherwise. + * Returns true if uffd-wp ptes were installed, false otherwise. */ static inline bool -pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long ad= dr, - pte_t *pte, pte_t pteval) +install_uffd_wp_ptes_if_needed(struct vm_area_struct *vma, unsigned long a= ddr, + pte_t *pte, pte_t pteval, unsigned long nr_ptes) { bool arm_uffd_pte =3D false; =20 @@ -610,13 +618,13 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *= vma, unsigned long addr, if (unlikely(pte_swp_uffd_wp_any(pteval))) arm_uffd_pte =3D true; =20 - if (unlikely(arm_uffd_pte)) { - set_pte_at(vma->vm_mm, addr, pte, - make_pte_marker(PTE_MARKER_UFFD_WP)); - return true; - } + if (likely(!arm_uffd_pte)) + return false; =20 - return false; + for (int i =3D 0; i < nr_ptes; ++i, ++pte, addr +=3D PAGE_SIZE) + set_pte_at(vma->vm_mm, addr, pte, make_pte_marker(PTE_MARKER_UFFD_WP)); + + return true; } =20 static inline bool vma_has_recency(const struct vm_area_struct *vma) diff --git a/mm/memory.c b/mm/memory.c index ea65685711311..eef144fa293d4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1594,29 +1594,11 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct= *vma, unsigned long addr, pte_t *pte, int nr, struct zap_details *details, pte_t pteval) { - bool was_installed =3D false; - - if (!uffd_supports_wp_marker()) - return false; - - /* Zap on anonymous always means dropping everything */ - if (vma_is_anonymous(vma)) - return false; - if (zap_drop_markers(details)) return false; =20 - for (;;) { - /* the PFN in the PTE is irrelevant. 
*/ - if (pte_install_uffd_wp_if_needed(vma, addr, pte, pteval)) - was_installed =3D true; - if (--nr =3D=3D 0) - break; - pte++; - addr +=3D PAGE_SIZE; - } + return install_uffd_wp_ptes_if_needed(vma, addr, pte, pteval, nr); =20 - return was_installed; } =20 static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb, diff --git a/mm/rmap.c b/mm/rmap.c index fa5d6599dedf0..20e1fb81c33fc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2263,7 +2263,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, * we may want to replace a none pte with a marker pte if * it's file-backed, so we don't lose the tracking info. */ - pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); + install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1); =20 /* Update high watermark before we lower rss */ update_hiwater_rss(mm); --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2E28A36826C for ; Fri, 10 Apr 2026 10:33:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817186; cv=none; b=mGueyYPSGw9qhHD2Enl4XqKMV8SL/asF+KxXJK9aqxR74wTWH7GKJ+KDHS/9GZuIoJdnTfUk37VBEWiPPsxUWWbAUpqTRc6bIP9Hdwy/5mn8A27S5Ur0SmTWfilnf5x3nWd59S+UBBE8X7VH4kCezrKOKBOrdgcif75UqTcsnAE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817186; c=relaxed/simple; bh=nznb2888N9JDa3zHHM9+wuulyWdFYl45pOUn7o59yFs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=uQSDlxJ0n0OLKZ3/wwsJH7jBvcrz5ss5cuVUSfvK93epDs8yFHJyp4vVJwQi5oR+p95G0NJNQFcGoPNXoTHL7v5+jGqeiokFTTwZhKuebrx2U4QT3fvVXbMn5IKwXn1k9DiSYQVpAzxOuFTNpGTqyt1RLyTlT/ewPAJiXDapzlw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; 
dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=iEparMsG; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="iEparMsG" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C74E92696; Fri, 10 Apr 2026 03:32:58 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5339A3FAF5; Fri, 10 Apr 2026 03:32:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817184; bh=nznb2888N9JDa3zHHM9+wuulyWdFYl45pOUn7o59yFs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iEparMsGcO02O+o0tOuEdJ9CGG67+dAO90ssHctZ5f6vH9drv5WHMtg8RGsfYurCr HYjyTkZI62cLthPoG6x/jpOjNc2ekYXuQ1GS1fcYlTTa6cz84BHBrNAGnI11W/MxQy fDyhOmC5zVX9X81xgpqRslqHvf4Lbz3M27sSp3eo= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs Date: Fri, 10 Apr 2026 16:02:00 +0530 Message-Id: <20260410103204.120409-6-dev.jain@arm.com> X-Mailer: 
git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Commit a67fe41e214f ("mm: rmap: support batched unmapping for file large folios") extended batched unmapping for file folios. That also required making pte_install_uffd_wp_if_needed() support batching, but that was left out for the time being, and correctness was maintained by stopping batching in case the VMA the folio belongs to is marked uffd-wp.

Now that we have a batched version called install_uffd_wp_ptes_if_needed(), simply call that. folio_unmap_pte_batch() ensures that the original state of the ptes is either all uffd or all non-uffd, so we maintain correctness.

If the uffd-wp bit is set, we have the following transitions of ptes after unmapping:

1) anon folio: present -> uffd-wp swap
2) file folio: present -> uffd-wp marker

We must ensure that these ptes are not reprocessed by the while loop - if the batch length is less than the number of pages in the folio, then we must skip over this batch. The page_vma_mapped_walk API ensures this - check_pte() will return true only if any of [pvmw->pfn, pvmw->pfn + nr_pages) is mapped by the pte. There is no pfn underlying either a uffd-wp swap pte or a uffd-wp marker pte, so check_pte() returns false and we keep skipping until we hit a present entry, which is where we want to batch from next.
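For illustration only (not part of the patch): a minimal userspace sketch of the skipping argument above. All toy_* names are hypothetical stand-ins and none of this is kernel code; toy_check_pte() models only the "no pfn behind a swap or marker pte" property of check_pte().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pte states for a toy model; these are not kernel types. */
enum toy_pte {
	TOY_PRESENT = 0,
	TOY_UFFD_WP_SWAP,	/* what an anon pte becomes after unmap */
	TOY_UFFD_WP_MARKER,	/* what a file pte becomes after unmap */
};

/* Stand-in for check_pte(): only a present entry maps a pfn of the folio. */
static int toy_check_pte(enum toy_pte pte)
{
	return pte == TOY_PRESENT;
}

/* Convert up to max_batch consecutive present entries starting at idx. */
static size_t toy_unmap_batch(enum toy_pte *ptes, size_t idx, size_t n,
			      size_t max_batch, int is_anon)
{
	size_t j;

	assert(max_batch >= 1);
	for (j = idx; j < n && j - idx < max_batch && ptes[j] == TOY_PRESENT; j++)
		ptes[j] = is_anon ? TOY_UFFD_WP_SWAP : TOY_UFFD_WP_MARKER;
	return j - idx;
}

/* Walk all n entries and return how many batches were processed. */
static size_t toy_walk(enum toy_pte *ptes, size_t n, size_t max_batch,
		       int is_anon)
{
	size_t batches = 0;

	for (size_t i = 0; i < n; i++) {
		/* Entries converted by an earlier batch fail the check. */
		if (!toy_check_pte(ptes[i]))
			continue;
		toy_unmap_batch(ptes, i, n, max_batch, is_anon);
		batches++;
	}
	return batches;
}
```

In this model the walk never revisits a converted entry: each one fails toy_check_pte() and is stepped over, so the next batch starts at the next present entry.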
Signed-off-by: Dev Jain --- mm/rmap.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 20e1fb81c33fc..7a150edd96819 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1965,9 +1965,6 @@ static inline unsigned int folio_unmap_pte_batch(stru= ct folio *folio, if (pte_unused(pte)) return 1; =20 - if (userfaultfd_wp(vma)) - return 1; - /* * If unmap fails, we need to restore the ptes. To avoid accidentally * upgrading write permissions for ptes that were not originally @@ -2263,7 +2260,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, * we may want to replace a none pte with a marker pte if * it's file-backed, so we don't lose the tracking info. */ - install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1); + install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, nr_pages); =20 /* Update high watermark before we lower rss */ update_hiwater_rss(mm); --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4394337D121 for ; Fri, 10 Apr 2026 10:33:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817197; cv=none; b=Dinvwc+KRk9xNEut7ZIgEIg4jDf4y0h0Nfn3+iGTZ/D6WBNok4b0Vij92SKcB48YTPUDS2N4NCMTZfzzZaL0iOkoAg6yPjEM0JkgkYfCay0P3n+HjiPWkGjA4ZcGPKOkagm5hSpJ2w+M/Se3J9T4XaNKHSyACraF0q4pm8lPh7Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817197; c=relaxed/simple; bh=ZjAFZQGht/MkomLmPetxzLAvC/RxdkzRQ9Ld8jFfkCA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Gc/sgfJHUCclT18E1hJJNqriEwepiTb8JKFl1d+SQYh5+ZlauKLHe6Z5wIZ7rvvpPFpOlQcRawBTWrfkp3aGpPdPrYxl4ZQzvyVXophaRL04T385mCY/8usCgqk6EfIgu3jv2qBQLLjjM2CtZYct2DBqxezsgqDBhs/HoLcYBnw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=PNexbu38; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="PNexbu38" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8890026AC; Fri, 10 Apr 2026 03:33:07 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 149533FAF5; Fri, 10 Apr 2026 03:33:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817193; bh=ZjAFZQGht/MkomLmPetxzLAvC/RxdkzRQ9Ld8jFfkCA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PNexbu38UIpmS8ZurlXAd4ooPHyE+rz9wcSHywjpLXQ11yC1pGXoJoxDKsLldKMPV +zQiwWNDDMKDh/wdoVY1SIGQqe9pwaIOh9+AtTbNIYcDGDb7jywOncX7VWXlg2lV81 oRFQlx2HzlW0TrxUet4ELXPGTXdzPGfypmTHRvKo= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 6/9] mm/swapfile: Add batched version of folio_dup_swap Date: Fri, 10 Apr 2026 
16:02:01 +0530 Message-Id: <20260410103204.120409-7-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add folio_dup_swap_pages() to handle a batch of consecutive pages. Note that folio_dup_swap() can already handle two special cases of this: nr_pages == 1 and nr_pages == folio_nr_pages(folio). Generalize this to any nr_pages.

Currently we have the awkward convention of passing subpage == NULL to operate on the entire folio, and subpage != NULL to operate on only that subpage. Remove this indirection: the caller invokes folio_dup_swap_pages() if it wants to operate on a range of pages in the folio (i.e. nr_pages may be anything from 1 to folio_nr_pages()), and invokes folio_dup_swap() if it wants to operate on the entire folio.
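For illustration only (not part of the patch): a minimal userspace sketch of the range helper plus whole-folio wrapper split described above. struct toy_folio, TOY_FOLIO_PAGES and the toy_* functions are hypothetical; the per-page counter array merely stands in for the swap count tracking, and the bounds check is an assumption of the model rather than something the kernel helper does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FOLIO_PAGES 8	/* hypothetical folio size for the model */

/* Toy folio: one swap count per page. */
struct toy_folio {
	unsigned int swap_count[TOY_FOLIO_PAGES];
};

/* Range variant: bump the count of nr_pages entries from first_idx on. */
static int toy_dup_swap_pages(struct toy_folio *folio, size_t first_idx,
			      size_t nr_pages)
{
	/* Reject empty or out-of-bounds ranges (model-only check). */
	if (nr_pages == 0 || first_idx >= TOY_FOLIO_PAGES ||
	    nr_pages > TOY_FOLIO_PAGES - first_idx)
		return -1;

	for (size_t i = 0; i < nr_pages; i++)
		folio->swap_count[first_idx + i]++;
	return 0;
}

/* Whole-folio variant: a thin wrapper around the range variant. */
static int toy_dup_swap(struct toy_folio *folio)
{
	return toy_dup_swap_pages(folio, 0, TOY_FOLIO_PAGES);
}
```

The design point is that the whole-folio case needs no sentinel argument: it is just the range case with the first page and the full page count.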
Signed-off-by: Dev Jain --- mm/rmap.c | 2 +- mm/shmem.c | 2 +- mm/swap.h | 12 ++++++++++-- mm/swapfile.c | 20 ++++++++++++-------- 4 files changed, 24 insertions(+), 12 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7a150edd96819..6412103fcd6cb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2311,7 +2311,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, goto finish_unmap; } =20 - if (folio_dup_swap(folio, subpage) < 0) { + if (folio_dup_swap_pages(folio, subpage, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } diff --git a/mm/shmem.c b/mm/shmem.c index 5aa43657886c3..3f9523c97b9ed 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1695,7 +1695,7 @@ int shmem_writeout(struct folio *folio, struct swap_i= ocb **plug, spin_unlock(&shmem_swaplist_lock); } =20 - folio_dup_swap(folio, NULL); + folio_dup_swap(folio); shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap)); =20 BUG_ON(folio_mapped(folio)); diff --git a/mm/swap.h b/mm/swap.h index a77016f2423b9..3c25f914e908b 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -206,7 +206,9 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gf= p_t gfp); * folio_put_swap(): does the opposite thing of folio_dup_swap(). 
*/ int folio_alloc_swap(struct folio *folio); -int folio_dup_swap(struct folio *folio, struct page *subpage); +int folio_dup_swap(struct folio *folio); +int folio_dup_swap_pages(struct folio *folio, struct page *page, + unsigned long nr_pages); void folio_put_swap(struct folio *folio, struct page *subpage); =20 /* For internal use */ @@ -390,7 +392,13 @@ static inline int folio_alloc_swap(struct folio *folio) return -EINVAL; } =20 -static inline int folio_dup_swap(struct folio *folio, struct page *page) +static inline int folio_dup_swap(struct folio *folio) +{ + return -EINVAL; +} + +static inline int folio_dup_swap_pages(struct folio *folio, struct page *p= age, + unsigned long nr_pages) { return -EINVAL; } diff --git a/mm/swapfile.c b/mm/swapfile.c index ff315b752afd3..22be05a0bb200 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1740,9 +1740,10 @@ int folio_alloc_swap(struct folio *folio) } =20 /** - * folio_dup_swap() - Increase swap count of swap entries of a folio. + * folio_dup_swap_pages() - Increase swap count of swap entries of a folio. * @folio: folio with swap entries bounded. - * @subpage: if not NULL, only increase the swap count of this subpage. + * @page: the first page in the folio to increase the swap count for. + * @nr_pages: the number of pages in the folio to increase the swap count = for. * * Typically called when the folio is unmapped and have its swap entry to * take its place: Swap entries allocated to a folio has count =3D=3D 0 an= d pinned @@ -1756,23 +1757,26 @@ int folio_alloc_swap(struct folio *folio) * swap_put_entries_direct on its swap entry before this helper returns, or * the swap count may underflow. 
*/ -int folio_dup_swap(struct folio *folio, struct page *subpage) +int folio_dup_swap_pages(struct folio *folio, struct page *page, + unsigned long nr_pages) { swp_entry_t entry =3D folio->swap; - unsigned long nr_pages =3D folio_nr_pages(folio); =20 VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio); VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio); =20 - if (subpage) { - entry.val +=3D folio_page_idx(folio, subpage); - nr_pages =3D 1; - } + entry.val +=3D folio_page_idx(folio, page); =20 return swap_dup_entries_cluster(swap_entry_to_info(entry), swp_offset(entry), nr_pages); } =20 +int folio_dup_swap(struct folio *folio) +{ + return folio_dup_swap_pages(folio, folio_page(folio, 0), + folio_nr_pages(folio)); +} + /** * folio_put_swap() - Decrease swap count of swap entries of a folio. * @folio: folio with swap entries bounded, must be in swap cache and lock= ed. --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EB6F038655D for ; Fri, 10 Apr 2026 10:33:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817206; cv=none; b=h8PSDcRNmAzALY3XFmy086pOkB4NdnWCH6AZHE67Zo/XeU7hmbVOgOvAdmrNA9cYJFzMFu2UXzznlidmblU5FtMRs6ZF88YcMbQomu7POhv+YSMKMF3rCGAW4nUbtbcIhU3Kbm3rm1EoVdbmAqENjrHCek3D4sOJ9+mZqJTdHSU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817206; c=relaxed/simple; bh=bsHtG+pqYrgnynFwDuTLl6TEYR+rvmf81J2ef4PKREM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=l8nDdV+H8uYAcOhPhw2C1hWt5fuXlBOM/HUzGjpM58i8G3Ar6S0GvSD6WPyUl7IVABxLFhB0Pa5oJO7ziNRG4NOlyBpt2dAg2a6tAbfFn7KP7UIhk1BiSdD4un2gm2WGgH0n88DHZwY58crzTroYOZ8gFvXc2QJVeXiqXh8Cs6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass 
smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=ih7hSGVa; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="ih7hSGVa" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 45ECA2682; Fri, 10 Apr 2026 03:33:16 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id C96383FAF5; Fri, 10 Apr 2026 03:33:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817202; bh=bsHtG+pqYrgnynFwDuTLl6TEYR+rvmf81J2ef4PKREM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ih7hSGVaCiEv29B/vfG2bp2W/3IRDX6R2oWnvXwewVRKpaz+2G5dpkF18231ZixhL +9M4kNcEOzXeB+6ll4JGrAEpD6Lcro1YA/xl06cZPTqFl9hEwNpsLTA7m8bekSEsEg ZPvVeH6CL19ARqJIppYk+maFKDV+VULRm9w6OWuw= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 7/9] mm/swapfile: Add batched version of folio_put_swap Date: Fri, 10 Apr 2026 16:02:02 +0530 Message-Id: 
<20260410103204.120409-8-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add folio_put_swap_pages() to handle a batch of consecutive pages. Note that folio_put_swap() can already handle two special cases of this: nr_pages == 1 and nr_pages == folio_nr_pages(folio). Generalize this to any nr_pages.

Currently we have the awkward convention of passing subpage == NULL to operate on the entire folio, and subpage != NULL to operate on only that subpage. Remove this indirection: the caller invokes folio_put_swap_pages() if it wants to operate on a range of pages in the folio (i.e. nr_pages may be anything from 1 to folio_nr_pages()), and invokes folio_put_swap() if it wants to operate on the entire folio.
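For illustration only (not part of the patch): a userspace sketch contrasting the old NULL-means-whole-folio calling convention with the explicit range convention, to show why a call site such as "nr_pages == 1 ? page : NULL" can be replaced by passing page and nr_pages directly. All toy_* names are hypothetical, and a pointer into the counter array stands in for struct page.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PAGES 8	/* hypothetical folio size for the model */

struct toy_folio {
	unsigned int swap_count[TOY_NR_PAGES];
};

/* Old convention: page == NULL means whole folio, else only that page. */
static void toy_put_swap_old(struct toy_folio *folio, unsigned int *page)
{
	size_t first = 0, nr = TOY_NR_PAGES;

	if (page) {
		first = (size_t)(page - folio->swap_count);
		nr = 1;
	}
	for (size_t i = 0; i < nr; i++)
		folio->swap_count[first + i]--;
}

/* New convention: explicit first page and page count. */
static void toy_put_swap_pages(struct toy_folio *folio, unsigned int *page,
			       size_t nr_pages)
{
	size_t first = (size_t)(page - folio->swap_count);

	assert(first + nr_pages <= TOY_NR_PAGES);
	for (size_t i = 0; i < nr_pages; i++)
		folio->swap_count[first + i]--;
}

/* Whole-folio wrapper, so no sentinel argument is needed. */
static void toy_put_swap(struct toy_folio *folio)
{
	toy_put_swap_pages(folio, &folio->swap_count[0], TOY_NR_PAGES);
}
```

Both conventions agree on the single-page and whole-folio cases; only the range convention can express a partial batch such as two pages starting at index 4.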
Signed-off-by: Dev Jain --- mm/memory.c | 6 +++--- mm/rmap.c | 4 ++-- mm/shmem.c | 6 +++--- mm/swap.h | 11 +++++++++-- mm/swapfile.c | 22 +++++++++++++--------- 5 files changed, 30 insertions(+), 19 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index eef144fa293d4..d6da01867baf9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5088,7 +5088,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (unlikely(folio !=3D swapcache)) { folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); folio_add_lru_vma(folio, vma); - folio_put_swap(swapcache, NULL); + folio_put_swap(swapcache); } else if (!folio_test_anon(folio)) { /* * We currently only expect !anon folios that are fully @@ -5097,12 +5097,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) VM_WARN_ON_ONCE_FOLIO(folio_nr_pages(folio) !=3D nr_pages, folio); VM_WARN_ON_ONCE_FOLIO(folio_mapped(folio), folio); folio_add_new_anon_rmap(folio, vma, address, rmap_flags); - folio_put_swap(folio, NULL); + folio_put_swap(folio); } else { VM_WARN_ON_ONCE(nr_pages !=3D 1 && nr_pages !=3D folio_nr_pages(folio)); folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address, rmap_flags); - folio_put_swap(folio, nr_pages =3D=3D 1 ? page : NULL); + folio_put_swap_pages(folio, page, nr_pages); } =20 VM_BUG_ON(!folio_test_anon(folio) || diff --git a/mm/rmap.c b/mm/rmap.c index 6412103fcd6cb..9b20ef7f211e1 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2322,7 +2322,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, * so we'll not check/care. */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { - folio_put_swap(folio, subpage); + folio_put_swap_pages(folio, subpage, 1); set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } @@ -2330,7 +2330,7 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, /* See folio_try_share_anon_rmap(): clear PTE first. 
*/ if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) { - folio_put_swap(folio, subpage); + folio_put_swap_pages(folio, subpage, 1); set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } diff --git a/mm/shmem.c b/mm/shmem.c index 3f9523c97b9ed..f49bf07e806a7 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1716,7 +1716,7 @@ int shmem_writeout(struct folio *folio, struct swap_i= ocb **plug, /* Swap entry might be erased by racing shmem_free_swap() */ if (!error) { shmem_recalc_inode(inode, 0, -nr_pages); - folio_put_swap(folio, NULL); + folio_put_swap(folio); } =20 /* @@ -2196,7 +2196,7 @@ static void shmem_set_folio_swapin_error(struct inode= *inode, pgoff_t index, =20 nr_pages =3D folio_nr_pages(folio); folio_wait_writeback(folio); - folio_put_swap(folio, NULL); + folio_put_swap(folio); swap_cache_del_folio(folio); /* * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks @@ -2426,7 +2426,7 @@ static int shmem_swapin_folio(struct inode *inode, pg= off_t index, if (sgp =3D=3D SGP_WRITE) folio_mark_accessed(folio); =20 - folio_put_swap(folio, NULL); + folio_put_swap(folio); swap_cache_del_folio(folio); folio_mark_dirty(folio); put_swap_device(si); diff --git a/mm/swap.h b/mm/swap.h index 3c25f914e908b..343547469927a 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -209,7 +209,9 @@ int folio_alloc_swap(struct folio *folio); int folio_dup_swap(struct folio *folio); int folio_dup_swap_pages(struct folio *folio, struct page *page, unsigned long nr_pages); -void folio_put_swap(struct folio *folio, struct page *subpage); +void folio_put_swap(struct folio *folio); +void folio_put_swap_pages(struct folio *folio, struct page *page, + unsigned long nr_pages); =20 /* For internal use */ extern void __swap_cluster_free_entries(struct swap_info_struct *si, @@ -403,7 +405,12 @@ static inline int folio_dup_swap_pages(struct folio *f= olio, struct page *page, return -EINVAL; } =20 -static inline void folio_put_swap(struct folio *folio, struct page 
*page) +static inline void folio_put_swap(struct folio *folio) +{ +} + +static inline void folio_put_swap_pages(struct folio *folio, struct page *= page, + unsigned long nr_pages) { } =20 diff --git a/mm/swapfile.c b/mm/swapfile.c index 22be05a0bb200..d8fae3925e171 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1778,31 +1778,34 @@ int folio_dup_swap(struct folio *folio) } =20 /** - * folio_put_swap() - Decrease swap count of swap entries of a folio. + * folio_put_swap_pages() - Decrease swap count of swap entries of a folio. * @folio: folio with swap entries bounded, must be in swap cache and lock= ed. - * @subpage: if not NULL, only decrease the swap count of this subpage. + * @page: the first page in the folio to decrease the swap count for. + * @nr_pages: the number of pages in the folio to decrease the swap count = for. * * This won't free the swap slots even if swap count drops to zero, they a= re * still pinned by the swap cache. User may call folio_free_swap to free t= hem. * Context: Caller must ensure the folio is locked and in the swap cache. 
*/ -void folio_put_swap(struct folio *folio, struct page *subpage) +void folio_put_swap_pages(struct folio *folio, struct page *page, + unsigned long nr_pages) { swp_entry_t entry =3D folio->swap; - unsigned long nr_pages =3D folio_nr_pages(folio); struct swap_info_struct *si =3D __swap_entry_to_info(entry); =20 VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio); VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio); =20 - if (subpage) { - entry.val +=3D folio_page_idx(folio, subpage); - nr_pages =3D 1; - } + entry.val +=3D folio_page_idx(folio, page); =20 swap_put_entries_cluster(si, swp_offset(entry), nr_pages, false); } =20 +void folio_put_swap(struct folio *folio) +{ + folio_put_swap_pages(folio, folio_page(folio, 0), folio_nr_pages(folio)); +} + /* * When we get a swap entry, if there aren't some other ways to * prevent swapoff, such as the folio in swap cache is locked, RCU @@ -2443,7 +2446,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_= t *pmd, new_pte =3D pte_mkuffd_wp(new_pte); setpte: set_pte_at(vma->vm_mm, addr, pte, new_pte); - folio_put_swap(swapcache, folio_file_page(swapcache, swp_offset(entry))); + folio_put_swap_pages(swapcache, + folio_file_page(swapcache, swp_offset(entry)), 1); out: if (pte) pte_unmap_unlock(pte, ptl); --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8B7B433F5A9 for ; Fri, 10 Apr 2026 10:33:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817214; cv=none; b=TUEwuHPmNjumXBPfJoPXHLftCAdcQWfCuqhSyK8Wa83MERB66kisUAUgEFVps+2n8JWfG/hcZGKxqpYYPVhhNQ/FBxScdubTjUIUVENg27Je9KuCPefylc2GD120u78o22Y172g4VnIkl1Gs0tMfjF4pY1cEzcX6t1D6ATVuCmI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817214; c=relaxed/simple; 
bh=tF3VW2lbJuABvGu5WBZjmvW3rTKVGcTyExhhZXtmOk8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=uIZ+sYK4/UwK2Cvob0W+5rDPVtx+BN0dlSpAjkssC9j2vNf+XGEkivsXAGbAomVQ2hw43We5ey5iFpf1gNnEjjQSTHgvr30+KcXP/bIcz1RaQ8VS7SyapjMZB6ySit+04J8gs6tEaWopWuRoCSWJFwIsKcya0t3T4LaYtmiae6Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=B8Zxi1oo; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="B8Zxi1oo" Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0B6944529; Fri, 10 Apr 2026 03:33:25 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 8A01F3FAF5; Fri, 10 Apr 2026 03:33:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817210; bh=tF3VW2lbJuABvGu5WBZjmvW3rTKVGcTyExhhZXtmOk8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=B8Zxi1oowJQksrIiPBubX8jrt3yniQITzN3UZ8rLvHMLHv/Ien8ryoISTvpCoiy6I qWpA8/CW72pBeVLfZYBV9iNeewvz+tzYciTIn+MDm3gV85bUmGJExuFSPQIKV6ow5V 7Q80cSOMa3itWQ//epcQbaltosKJVSg8wXCNYYK4= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, 
jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 8/9] mm/rmap: Add batched version of folio_try_share_anon_rmap_pte Date: Fri, 10 Apr 2026 16:02:03 +0530 Message-Id: <20260410103204.120409-9-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

To enable batched unmapping of anonymous folios, we need to handle the sharing of exclusive pages. Hence, a batched version of folio_try_share_anon_rmap_pte() is required.

Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap() is to do some rmap sanity checks. Add a helper to clear the PageAnonExclusive bit on a batch of nr_pages. Note that __folio_try_share_anon_rmap() can receive nr_pages == HPAGE_PMD_NR from the PMD path, but currently we only clear the bit on the head page. Retain this behaviour by setting nr_pages = 1 in case the caller is folio_try_share_anon_rmap_pmd().

While at it, convert nr_pages to unsigned long to future-proof against overflow in case P4D-huge mappings etc. get supported down the road.
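For illustration only (not part of the patch): a minimal userspace sketch of the batched clear and of the PMD special case described above. An array of bool flags stands in for the per-page PageAnonExclusive bit, and the toy_* names and enum are hypothetical; the pin checks and barriers of the real function are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page table levels for the model. */
enum toy_level { TOY_LEVEL_PTE, TOY_LEVEL_PMD };

/*
 * Clear the flag on nr_pages consecutive entries. The loop has the same
 * shape as the helper in the patch, so nr_pages must be at least 1.
 */
static void toy_clear_pages_anon_exclusive(bool *excl, unsigned long nr_pages)
{
	assert(nr_pages >= 1);
	for (;;) {
		*excl = false;
		if (--nr_pages == 0)
			break;
		++excl;
	}
}

/* Model of the sharing step: PMD callers only touch the head page. */
static void toy_try_share(bool *excl, unsigned long nr_pages,
			  enum toy_level level)
{
	if (level == TOY_LEVEL_PMD)
		nr_pages = 1;
	toy_clear_pages_anon_exclusive(excl, nr_pages);
}
```

Clamping nr_pages inside the shared function keeps the PMD path unchanged while letting PTE callers pass an arbitrary batch size.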
Signed-off-by: Dev Jain --- include/linux/mm.h | 11 +++++++++++ include/linux/rmap.h | 27 ++++++++++++++++++++------- 2 files changed, 31 insertions(+), 7 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 633bbf9a184a6..2d20954da652a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -243,6 +243,17 @@ static inline unsigned long folio_page_idx(const struc= t folio *folio, return page - &folio->page; } =20 +static __always_inline void folio_clear_pages_anon_exclusive(struct page *= page, + unsigned long nr_pages) +{ + for (;;) { + ClearPageAnonExclusive(page); + if (--nr_pages =3D=3D 0) + break; + ++page; + } +} + static inline struct folio *lru_to_folio(struct list_head *head) { return list_entry((head)->prev, struct folio, lru); diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 8dc0871e5f001..f3b3ee3955afc 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -706,15 +706,19 @@ static inline int folio_try_dup_anon_rmap_pmd(struct = folio *folio, } =20 static __always_inline int __folio_try_share_anon_rmap(struct folio *folio, - struct page *page, int nr_pages, enum pgtable_level level) + struct page *page, unsigned long nr_pages, enum pgtable_level level) { VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio); __folio_rmap_sanity_checks(folio, page, nr_pages, level); =20 + /* We only clear anon-exclusive from head page of PMD folio */ + if (level =3D=3D PGTABLE_LEVEL_PMD) + nr_pages =3D 1; + /* device private folios cannot get pinned via GUP. 
*/ if (unlikely(folio_is_device_private(folio))) { - ClearPageAnonExclusive(page); + folio_clear_pages_anon_exclusive(page, nr_pages); return 0; } =20 @@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(= struct folio *folio, =20 if (unlikely(folio_maybe_dma_pinned(folio))) return -EBUSY; - ClearPageAnonExclusive(page); + folio_clear_pages_anon_exclusive(page, nr_pages); =20 /* * This is conceptually a smp_wmb() paired with the smp_rmb() in @@ -778,11 +782,12 @@ static __always_inline int __folio_try_share_anon_rma= p(struct folio *folio, } =20 /** - * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page - * mapped by a PTE possibly shared to prepare + * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages + * mapped by PTEs possibly shared to prepare * for KSM or temporary unmapping * @folio: The folio to share a mapping of - * @page: The mapped exclusive page + * @page: The first mapped exclusive page of the batch in the folio + * @nr_pages: The number of pages to share in the folio (batch size) * * The caller needs to hold the page table lock and has to have the page t= able * entries cleared/invalidated. @@ -797,11 +802,19 @@ static __always_inline int __folio_try_share_anon_rma= p(struct folio *folio, * * Returns 0 if marking the mapped page possibly shared succeeded. Returns * -EBUSY otherwise. + * + * The caller needs to hold the page table lock. 
*/ +static inline int folio_try_share_anon_rmap_ptes(struct folio *folio, + struct page *page, unsigned long nr_pages) +{ + return __folio_try_share_anon_rmap(folio, page, nr_pages, PGTABLE_LEVEL_P= TE); +} + static inline int folio_try_share_anon_rmap_pte(struct folio *folio, struct page *page) { - return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE); + return folio_try_share_anon_rmap_ptes(folio, page, 1); } =20 /** --=20 2.34.1 From nobody Sat Jun 20 19:55:16 2026 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B32EF38838A for ; Fri, 10 Apr 2026 10:33:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817224; cv=none; b=lqmIUgYYMtHbR7ovIPoaYwbwVw2Vb7E4ZM6aBCpvb7XAwdOWEDgXiPafQBO+i8Bo84pwtD2R/T2pvcaNTtU3VAMldpz16EFxgmxMj/A68FQRvCbMRKncU0cTvWilnN3Sm9L1fYcMECCqY6wsqjHheYhytIhVYAccabSN5TmtX1U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775817224; c=relaxed/simple; bh=IYbaIbF+HIELGkWoWytxfeLOJBlJUqn4EEQxhn4v5mE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Rkr0Ct5DSqVUXifKTDWwXdUfhPq5cP9sZ7sB4ByVQUs59SVcfOetX7K95u+rvEhyPjqWVhuw6NKGjzJbECdzUB8qUbzRXLoDp61p/xjyFSHEP1MeZbin+g8OCvaK0KWzrhg3Tx6Q7CZ8kDk8dhWQNWqzWPYrQm0W7cbUEqAjNPg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b=SSUwwv6K; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.b="SSUwwv6K" 
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C265A453F; Fri, 10 Apr 2026 03:33:33 -0700 (PDT) Received: from a080796.blr.arm.com (a080796.arm.com [10.164.21.51]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 503083FAF5; Fri, 10 Apr 2026 03:33:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=arm.com; s=foss; t=1775817219; bh=IYbaIbF+HIELGkWoWytxfeLOJBlJUqn4EEQxhn4v5mE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SSUwwv6KmxM5an1P1ml4mo9ZiDEOcCqJ17Cgua9iP7oh/FRV8mDjV5mh6zBIWMCzA AE2brYwiBSZR1LQT8DiuVrzHIUpBplvp5Yw7emuhH4tLT5Ke616mYWLa6iJWd3h76k HZ6tHKFGrynXQ/RJFk6cYJvf6fOBwhn5+qIgOh28= From: Dev Jain To: akpm@linux-foundation.org, david@kernel.org, hughd@google.com, chrisl@kernel.org Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, kasong@tencent.com, qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, riel@surriel.com, harry@kernel.org, jannh@google.com, pfalcato@suse.de, baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, anshuman.khandual@arm.com, Dev Jain Subject: [PATCH v2 9/9] mm/rmap: enable batch unmapping of anonymous folios Date: Fri, 10 Apr 2026 16:02:04 +0530 Message-Id: <20260410103204.120409-10-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260410103204.120409-1-dev.jain@arm.com> References: <20260410103204.120409-1-dev.jain@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Enable batch clearing of ptes, and batch swap setting of ptes for anon folio unmapping. 
Processing all ptes of a large folio in one go lets us batch atomics
(add_mm_counter() etc.), barriers (in __folio_try_share_anon_rmap()) and
repeated calls to page_vma_mapped_walk(), to name a few. In general,
batching executes similar code together, which makes execution more
memory and CPU friendly. On arm64-contpte, batching also avoids redundant
ptep_get() calls and TLB flushes while breaking the contpte mapping.

The handling of anon-exclusivity is very similar to commit cac1db8c3aad
("mm: optimize mprotect() by PTE batching"). Since folio_unmap_pte_batch()
does not look at the bits of the underlying page, we need to process
sub-batches of ptes pointing to pages that agree on exclusivity, and
batch-set only those ptes to swap ptes in one go. Hence move
page_anon_exclusive_sub_batch() to internal.h and reuse it.

arch_unmap_one() is only defined for sparc64; I am not sure about the
nuances between retrieving the pfn from pte_pfn() and from
(paddr = pte_val(oldpte) & _PAGE_PADDR_4V). (Also, pte_next_pfn() can't
even be called from arch_unmap_one() because that file does not include
pgtable.h.) So just disable the "sparc64-anon-swapbacked" case for now.

We need to take care of rmap accounting (folio_remove_rmap_ptes()) and
reference accounting (folio_put_refs()) when anon folio unmap succeeds. If
we unmap only part of the batch and then fail, we must still do the
accounting for the pages that were successfully unmapped. So put this
accounting code in __unmap_anon_folio_range() itself, instead of doing
some horrible goto jumping at the callsite of unmap_anon_folio_range().

Add a comment at relevant places to say that we are on a device-exclusive
entry and not a present entry.

If the batch length is less than the number of pages in the folio, then we
must skip over this batch.
The page_vma_mapped_walk API ensures this: check_pte() will return true
only if any of [pvmw->pfn, pvmw->pfn + nr_pages) is mapped by the pte.
There is no pfn underlying a swap pte, so check_pte() returns false and we
keep skipping until we hit a present pte, which is where we want to start
unmapping from next.

Signed-off-by: Dev Jain
---
 mm/internal.h |  26 +++++++
 mm/mprotect.c |  17 -----
 mm/rmap.c     | 188 ++++++++++++++++++++++++++++++++++----------------
 3 files changed, 153 insertions(+), 78 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index c693646e5b3f0..f7033c9626767 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -393,6 +393,32 @@ static inline unsigned int folio_pte_batch_flags(struct folio *folio,
 unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
 		unsigned int max_nr);
 
+/**
+ * page_anon_exclusive_sub_batch - Determine length of consecutive exclusive
+ * or maybe shared pages
+ * @start_idx: Starting index of the page array to scan from
+ * @max_len: Maximum length to look at
+ * @first_page: First page of the page array
+ * @expected_anon_exclusive: Whether to look for exclusive or !exclusive pages
+ *
+ * Determines length of consecutive ptes, pointing to pages being the same
+ * w.r.t the PageAnonExclusive bit.
+ *
+ * Context: The ptes point to consecutive pages of the same large folio. The
+ * ptes belong to the same PMD and VMA.
+ */
+static __always_inline int page_anon_exclusive_sub_batch(int start_idx, int max_len,
+		struct page *first_page, bool expected_anon_exclusive)
+{
+	int idx;
+
+	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
+		if (expected_anon_exclusive != PageAnonExclusive(first_page + idx))
+			break;
+	}
+	return idx - start_idx;
+}
+
 /**
  * pte_move_swp_offset - Move the swap entry offset field of a swap pte
  *	 forward or backward by delta
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 9cbf932b028cf..949fd7022b5cf 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -138,23 +138,6 @@ static __always_inline void prot_commit_flush_ptes(struct vm_area_struct *vma,
 	tlb_flush_pte_range(tlb, addr, nr_ptes * PAGE_SIZE);
 }
 
-/*
- * Get max length of consecutive ptes pointing to PageAnonExclusive() pages or
- * !PageAnonExclusive() pages, starting from start_idx. Caller must enforce
- * that the ptes point to consecutive pages of the same anon large folio.
- */
-static __always_inline int page_anon_exclusive_sub_batch(int start_idx, int max_len,
-		struct page *first_page, bool expected_anon_exclusive)
-{
-	int idx;
-
-	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
-		if (expected_anon_exclusive != PageAnonExclusive(first_page + idx))
-			break;
-	}
-	return idx - start_idx;
-}
-
 /*
  * This function is a result of trying our very best to retain the
  * "avoid the write-fault handler" optimization. In can_change_pte_writable(),
diff --git a/mm/rmap.c b/mm/rmap.c
index 9b20ef7f211e1..ca071641965bd 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1958,11 +1958,11 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	end_addr = pmd_addr_end(addr, vma->vm_end);
 	max_nr = (end_addr - addr) >> PAGE_SHIFT;
 
-	/* We only support lazyfree or file folios batching for now ... */
-	if (folio_test_anon(folio) && folio_test_swapbacked(folio))
+	if (pte_unused(pte))
 		return 1;
 
-	if (pte_unused(pte))
+	if (__is_defined(__HAVE_ARCH_UNMAP_ONE) && folio_test_anon(folio) &&
+	    folio_test_swapbacked(folio))
 		return 1;
 
 	/*
@@ -1975,6 +1975,122 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 			     FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
 }
 
+static inline void set_swp_ptes(struct mm_struct *mm, unsigned long address,
+		pte_t *ptep, swp_entry_t entry, pte_t pteval, bool anon_exclusive,
+		unsigned long nr_pages)
+{
+	pte_t swp_pte = swp_entry_to_pte(entry);
+
+	if (anon_exclusive)
+		swp_pte = pte_swp_mkexclusive(swp_pte);
+
+	if (likely(pte_present(pteval))) {
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+	} else {
+		/* Device-exclusive entry */
+		VM_WARN_ON(nr_pages != 1);
+		if (pte_swp_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_swp_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+	}
+
+	for (int i = 0; i < nr_pages; ++i, ++ptep, address += PAGE_SIZE) {
+		set_pte_at(mm, address, ptep, swp_pte);
+		swp_pte = pte_next_swp_offset(swp_pte);
+	}
+}
+
+static inline void finish_folio_unmap(struct vm_area_struct *vma,
+		struct folio *folio, struct page *subpage, unsigned long nr_pages)
+{
+	if (unlikely(folio_test_hugetlb(folio)))
+		hugetlb_remove_rmap(folio);
+	else
+		folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_drain_local();
+	folio_put_refs(folio, nr_pages);
+}
+
+static inline bool __unmap_anon_folio_range(struct vm_area_struct *vma, struct folio *folio,
+		struct page *subpage, unsigned long address, pte_t *ptep,
+		pte_t pteval, unsigned long nr_pages, bool anon_exclusive)
+{
+	swp_entry_t entry = page_swap_entry(subpage);
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (folio_dup_swap_pages(folio, subpage, nr_pages) < 0) {
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		return false;
+	}
+
+	/*
+	 * arch_unmap_one() is expected to be a NOP on
+	 * architectures where we could have PFN swap PTEs,
+	 * so we'll not check/care.
+	 */
+	if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+		VM_WARN_ON(nr_pages != 1);
+		folio_put_swap_pages(folio, subpage, nr_pages);
+		set_pte_at(mm, address, ptep, pteval);
+		return false;
+	}
+
+	/* See folio_try_share_anon_rmap(): clear PTE first. */
+	if (anon_exclusive && folio_try_share_anon_rmap_ptes(folio, subpage, nr_pages)) {
+		folio_put_swap_pages(folio, subpage, nr_pages);
+		set_ptes(mm, address, ptep, pteval, nr_pages);
+		return false;
+	}
+
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+
+	add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
+	add_mm_counter(mm, MM_SWAPENTS, nr_pages);
+	set_swp_ptes(mm, address, ptep, entry, pteval, anon_exclusive, nr_pages);
+	finish_folio_unmap(vma, folio, subpage, nr_pages);
+	return true;
+}
+
+static inline bool unmap_anon_folio_range(struct vm_area_struct *vma, struct folio *folio,
+		struct page *first_page, unsigned long address, pte_t *ptep,
+		pte_t pteval, unsigned long nr_pages)
+{
+	bool expected_anon_exclusive;
+	int sub_batch_idx = 0;
+	int len, ret;
+
+	for (;;) {
+		expected_anon_exclusive = PageAnonExclusive(first_page + sub_batch_idx);
+		len = page_anon_exclusive_sub_batch(sub_batch_idx, nr_pages,
+				first_page, expected_anon_exclusive);
+		ret = __unmap_anon_folio_range(vma, folio, first_page + sub_batch_idx,
+				address, ptep, pteval, len, expected_anon_exclusive);
+		if (!ret)
+			return ret;
+
+		nr_pages -= len;
+		if (!nr_pages)
+			break;
+
+		pteval = pte_advance_pfn(pteval, len);
+		address += len * PAGE_SIZE;
+		sub_batch_idx += len;
+		ptep += len;
+	}
+
+	return true;
+}
+
 static inline bool can_unmap_lazyfree_folio_range(struct vm_area_struct *vma,
 		struct folio *folio, unsigned long address, pte_t *ptep,
 		pte_t pteval, unsigned long nr_pages)
@@ -2094,7 +2210,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	bool anon_exclusive, ret = true;
+	bool ret = true;
 	pte_t pteval;
 	struct page *subpage;
 	struct mmu_notifier_range range;
@@ -2219,8 +2335,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
-		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
 
 		if (folio_test_hugetlb(folio)) {
 			bool walk_done;
@@ -2252,6 +2366,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else {
+			/* Device-exclusive entry */
 			pte_clear(mm, address, pvmw.pte);
 		}
 
@@ -2289,8 +2404,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			dec_mm_counter(mm, mm_counter(folio));
 		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = page_swap_entry(subpage);
-			pte_t swp_pte;
 			/*
 			 * Store the swap location in the pte.
 			 * See handle_pte_fault() ...
@@ -2306,57 +2419,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				if (!can_unmap_lazyfree_folio_range(vma, folio, address,
 						pvmw.pte, pteval, nr_pages))
 					goto walk_abort;
-				add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 				goto finish_unmap;
 			}
 
-			if (folio_dup_swap_pages(folio, subpage, 1) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
+			if (!unmap_anon_folio_range(vma, folio, subpage, address,
+					pvmw.pte, pteval, nr_pages))
 				goto walk_abort;
-			}
 
-			/*
-			 * arch_unmap_one() is expected to be a NOP on
-			 * architectures where we could have PFN swap PTEs,
-			 * so we'll not check/care.
-			 */
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				folio_put_swap_pages(folio, subpage, 1);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				goto walk_abort;
-			}
-
-			/* See folio_try_share_anon_rmap(): clear PTE first. */
-			if (anon_exclusive &&
-			    folio_try_share_anon_rmap_pte(folio, subpage)) {
-				folio_put_swap_pages(folio, subpage, 1);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				goto walk_abort;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-			swp_pte = swp_entry_to_pte(entry);
-			if (anon_exclusive)
-				swp_pte = pte_swp_mkexclusive(swp_pte);
-			if (likely(pte_present(pteval))) {
-				if (pte_soft_dirty(pteval))
-					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			} else {
-				if (pte_swp_soft_dirty(pteval))
-					swp_pte = pte_swp_mksoft_dirty(swp_pte);
-				if (pte_swp_uffd_wp(pteval))
-					swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			}
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
+			if (nr_pages == folio_nr_pages(folio))
+				goto walk_done;
+			continue;
 		} else {
 			/*
 			 * This is a locked file-backed folio,
@@ -2372,14 +2445,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
 		}
 finish_unmap:
-		if (unlikely(folio_test_hugetlb(folio))) {
-			hugetlb_remove_rmap(folio);
-		} else {
-			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
-		}
-		if (vma->vm_flags & VM_LOCKED)
-			mlock_drain_local();
-		folio_put_refs(folio, nr_pages);
+		finish_folio_unmap(vma, folio, subpage, nr_pages);
 
 		/*
 		 * If we are sure that we batched the entire folio and cleared
-- 
2.34.1
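
For readers following the sub-batching logic, here is a minimal userspace
sketch of how a batch is split into runs of pages that agree on
exclusivity. It is not kernel code: the bool array stands in for the
PageAnonExclusive() bit of consecutive pages, and sub_batch_len() and
split_runs() are hypothetical names chosen for illustration. Only the
scanning loop mirrors page_anon_exclusive_sub_batch() from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for page_anon_exclusive_sub_batch(): length of the
 * run starting at start_idx (at most max_len entries) whose flags all
 * equal 'expected'. The caller passes expected == flags[start_idx].
 */
static int sub_batch_len(const bool *flags, int start_idx, int max_len,
			 bool expected)
{
	int idx;

	assert(max_len > 0);
	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
		if (flags[idx] != expected)
			break;
	}
	return idx - start_idx;
}

/*
 * Hypothetical driver shaped like the loop in unmap_anon_folio_range():
 * walk nr entries, cut them into runs of equal flags and store each run
 * length in lens[]. Returns the number of runs found.
 */
static int split_runs(const bool *flags, int nr, int *lens)
{
	int start = 0, nruns = 0;

	while (nr > 0) {
		int len = sub_batch_len(flags, start, nr, flags[start]);

		lens[nruns++] = len;
		start += len;
		nr -= len;
	}
	return nruns;
}
```

In the patch, each run is handed to __unmap_anon_folio_range() with a
single anon_exclusive value, so the swap ptes for that run can be set in
one go.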