From: Baolin Wang
To: akpm@linux-foundation.org, david@kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, lorenzo.stoakes@oracle.com,
 ryan.roberts@arm.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
 rppt@kernel.org, surenb@google.com, mhocko@suse.com, riel@surriel.com,
 harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
 baohua@kernel.org, dev.jain@arm.com, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
 baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 4/6] mm: add a batched helper to clear the young flag for large folios
Date: Fri, 27 Feb 2026 17:44:38 +0800
Message-ID: <589d743f4e048dc749002a7e1a1aec5d511c406b.1772185080.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.47.3
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, MGLRU calls ptep_test_and_clear_young_notify() to check and clear
the young flag for each PTE sequentially, which is inefficient for large folio
reclamation.

Moreover, on Arm64, which supports contiguous PTEs, the Arm64-specific
ptep_test_and_clear_young() already implements an optimization to clear the
young flags for PTEs within a contiguous range. However, this is not
sufficient.

Similar to the Arm64-specific clear_flush_young_ptes(), we can extend this to
perform batched operations for the entire large folio (which might exceed the
contiguous range: CONT_PTE_SIZE).

Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
wrapper test_and_clear_young_ptes_notify(), both consistent with the existing
functions, to check and clear the young flags of a large folio in one batch.
This can improve performance during large folio reclamation when MGLRU is
enabled. Architectures that implement a more efficient batch operation will
override the helper in the following patches.

Signed-off-by: Baolin Wang
---
 include/linux/pgtable.h | 38 ++++++++++++++++++++++++++++++++++++++
 mm/internal.h           | 16 +++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 776993d4567b..29bd9fd04e1e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1103,6 +1103,44 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
 }
 #endif
 
+#ifndef test_and_clear_young_ptes
+/**
+ * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
+ *			       folio as old
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear access bit.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_test_and_clear_young().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ *
+ * Returns: whether any PTE was young.
+ */
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t *ptep,
+					    unsigned int nr)
+{
+	int young = 0;
+
+	for (;;) {
+		young |= ptep_test_and_clear_young(vma, addr, ptep);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+	}
+
+	return young;
+}
+#endif
+
 /*
  * On some architectures hardware does not set page access bit when accessing
  * memory page, it is responsibility of software setting this bit. It brings
diff --git a/mm/internal.h b/mm/internal.h
index af04b177f21f..a5f0a264ad56 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1814,13 +1814,13 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
 	return young;
 }
 
-static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *ptep)
+static inline int test_and_clear_young_ptes_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	int young;
 
-	young = ptep_test_and_clear_young(vma, addr, ptep);
-	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
 	return young;
 }
 
@@ -1838,9 +1838,15 @@ static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 
 #define clear_flush_young_ptes_notify clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
-#define ptep_test_and_clear_young_notify ptep_test_and_clear_young
+#define test_and_clear_young_ptes_notify test_and_clear_young_ptes
 #define pmdp_test_and_clear_young_notify pmdp_test_and_clear_young
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
+static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
+{
+	return test_and_clear_young_ptes_notify(vma, addr, ptep, 1);
+}
+
 #endif /* __MM_INTERNAL_H */
-- 
2.47.3