From: Baolin Wang
To: akpm@linux-foundation.org, david@kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, lorenzo.stoakes@oracle.com,
    ryan.roberts@arm.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com, mhocko@suse.com, riel@surriel.com,
    harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
    baohua@kernel.org, dev.jain@arm.com, axelrasmussen@google.com,
    yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
    zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
    baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/6] mm: add a batched helper to clear the young flag for large folios
Date: Fri, 6 Mar 2026 14:43:40 +0800
Message-ID: <23ec671bfcc06cd24ee0fbff8e329402742274a0.1772778858.git.baolin.wang@linux.alibaba.com>

Currently, MGLRU will call ptep_test_and_clear_young_notify() to check and
clear the young flag for each PTE sequentially, which is inefficient for
large folio reclamation.

Moreover, on the Arm64 architecture, which supports contiguous PTEs, the
Arm64-specific ptep_test_and_clear_young() already implements an
optimization to clear the young flags for PTEs within a contiguous range.
However, this is not sufficient.
Similar to the Arm64-specific clear_flush_young_ptes(), we can extend this
to perform batched operations for the entire large folio (which might
exceed the contiguous range: CONT_PTE_SIZE).

Thus, we can introduce a new batched helper, test_and_clear_young_ptes(),
and its wrapper test_and_clear_young_ptes_notify(), which are consistent
with the existing functions, to perform batched checking of the young flags
for large folios. This can help improve performance during large folio
reclamation when MGLRU is enabled. It will be overridden by the
architecture that implements a more efficient batch operation in the
following patches.

Signed-off-by: Baolin Wang
---
 include/linux/pgtable.h | 37 +++++++++++++++++++++++++++++++++++++
 mm/internal.h           | 16 +++++++++++-----
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d2767a4c027b..17d961c612fc 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1103,6 +1103,43 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
 }
 #endif
 
+#ifndef test_and_clear_young_ptes
+/**
+ * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
+ *			       folio as old
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear access bit.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_test_and_clear_young().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ *
+ * Returns: whether any PTE was young.
+ */
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	int young = 0;
+
+	for (;;) {
+		young |= ptep_test_and_clear_young(vma, addr, ptep);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+	}
+
+	return young;
+}
+#endif
+
 /*
  * On some architectures hardware does not set page access bit when accessing
  * memory page, it is responsibility of software setting this bit. It brings
diff --git a/mm/internal.h b/mm/internal.h
index f45f97df0d28..8cdd5d8e43fb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1819,13 +1819,13 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
 	return young;
 }
 
-static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *ptep)
+static inline int test_and_clear_young_ptes_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	int young;
 
-	young = ptep_test_and_clear_young(vma, addr, ptep);
-	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
 	return young;
 }
 
@@ -1843,9 +1843,15 @@ static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 
 #define clear_flush_young_ptes_notify clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
-#define ptep_test_and_clear_young_notify ptep_test_and_clear_young
+#define test_and_clear_young_ptes_notify test_and_clear_young_ptes
 #define pmdp_test_and_clear_young_notify pmdp_test_and_clear_young
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
+static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
+{
+	return test_and_clear_young_ptes_notify(vma, addr, ptep, 1);
+}
+
 #endif /* __MM_INTERNAL_H */
-- 
2.47.3
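
For readers who want to poke at the loop semantics outside the kernel, the
generic fallback above can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions: struct mock_pte and both mock_*
functions are hypothetical stand-ins invented for illustration, not kernel
APIs, and the model keeps only the young bit (no vma, no address, no MMU
notifier).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE: only the young (accessed) bit is modelled. */
struct mock_pte {
	bool young;
};

/* Models ptep_test_and_clear_young(): return the old young bit, then clear it. */
static int mock_test_and_clear_young(struct mock_pte *ptep)
{
	int was_young = ptep->young;

	ptep->young = false;
	return was_young;
}

/*
 * Models the generic test_and_clear_young_ptes() loop from the patch.
 * The loop body runs before nr is checked, so nr must be at least 1;
 * nr == 0 would wrap around. The assert makes that precondition explicit
 * in this sketch (the kernel helper has no such check).
 */
static int mock_test_and_clear_young_ptes(struct mock_pte *ptep, unsigned int nr)
{
	int young = 0;

	assert(nr >= 1);

	for (;;) {
		/* Accumulate: the result is non-zero if any entry was young. */
		young |= mock_test_and_clear_young(ptep);
		if (--nr == 0)
			break;
		ptep++;
	}

	return young;
}
```

Calling mock_test_and_clear_young_ptes() on an array where at least one
entry is young returns non-zero and leaves every entry old; a second call on
the same array returns 0. With nr == 1 it degenerates to the single-PTE
case, which mirrors how ptep_test_and_clear_young_notify() is kept as a
wrapper passing nr = 1.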