From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74DBD34E768 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=prjpBKH2Pizk30LFoaOUJ1zdSvdKij944F5w16PNCkn9ikndjuqC05UkLZk0fbUyxjd0jsQd5gG8ckeSfnUnWmJ1L8qnAtgacMvCwfYncTq+N9pGF0OlaMtGKWCVVkn/Q7gtwqyaaMzXZrx6Nd1eQvfCtvXPMIvD0Mpr3mXwCVo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=A+uGFSOjp4jpmBRuOIkwLTviq/mFB5GM+JaXwoKxQ8Y=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=avsOuokWKd5EChhIOXzMg2fuHx5sDPH+ovxomJ1rCU5d+i8yIDo/+nVvYHqs7ZECP0M9bK+bEoHKnJHgj+bzHqmMZQGD7D5KtZG1hEUk7alopFnkY8CdYPotTvNz91zWYCIJR5AgwU3v/yxxQtVpGiYbCTfPidJ4OG1/M3Kl31k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ekkdZMD5; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ekkdZMD5" Received: by smtp.kernel.org (Postfix) with ESMTPS id 16744C2BCC4; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=A+uGFSOjp4jpmBRuOIkwLTviq/mFB5GM+JaXwoKxQ8Y=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=ekkdZMD5eQDMQovCA/dQWkHuNNOANDIJNLlFIJqBvmXXezb8NsS2/sN3GmcCTlF6M q6q2xt6NZxVcYVNnAbDXWfCtoJGLa+JVEFHhx4djfcwAqO9fDIIL8Fa8BF5iXQr3q3 Sa/ygvFFaaJ2tw2En/+pqzuU0CkpUxCq6Mxbl7wiCgqRp2+1/FMdnKDgkiryEPa9E3 
aymwL9S3AZS8TTCKnNDQW8YGJHGWMXkdv4NaBvv/9rZAaEqIdZoRjQ36Bs9hrEmIm5 Ijmm6cPaQK/6dS34WXVW3FkfJoGrkPBLv3FBIJZsBShHfvluW7f1xgftxBustsDLf4 nTmhvOL4byrOw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF2EBCD3423; Fri, 1 May 2026 21:04:11 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:47 +0800 Subject: [PATCH RFC 01/32] mm/memcontrol: make lru_zone_size atomic and simplify sanity check Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-1-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=4666; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=IkFEgPrNnMULleiYp1UZ+qFJc4vNB+W+MgG/aidkTiM=; b=6EU6Ld0m0s7fh+UjdoyV2mk70KaMjPaHaHpHNPNMEwb7zV+0OY9j1iIwRJGitS6PnymWdKxNA Fg86q+QfaGpCOJmVc3UGWg+5+l9KKYo0E3RxEGsZjoPuIOrcmx5BO1o X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent 
with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

Commit ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size")
introduced a sanity check to catch memcg counter underflow. That check is
really a workaround for another problem: lru_zone_size is unsigned, so an
underflow wraps it around to an enormous value, and the memcg shrinker
then loops almost forever because the calculated number of folios to
shrink is huge.

That commit also checked that a zero value matches an empty LRU list, so
the LRU lock had to be held and the counter had to be updated differently
depending on whether nr_pages is negative. Later, commit b4536f0c829c
("mm, memcg: fix the active list aging for lowmem requests when memcg is
enabled") removed the LRU emptiness check, so updating the counter
differently by sign no longer serves a purpose. And if the counter is
turned into an atomic long, underflow isn't a big issue either, since it
can be checked on the reader side. The reader side is called much less
frequently than the updater.

So turn the counter into an atomic long and check on the reader side
instead, which has lower overhead. Use an atomic to avoid potential
locking issues.

The underflow correction is removed. This should be fine: if the LRU size
counter leaks massively, something else has probably gone very wrong too,
and the leaking site should be fixed instead. Besides, a sanity check in
the updater is unlikely to catch the leaking site. For example, if a folio
was removed minutes ago without updating the counter while other folios
remain on the LRU, the WARN won't trigger until those other folios are
removed, likely from a correct call site.

For now, keep the LRU lock context. In theory it could be dropped too,
since the update is atomic, if a temporarily inaccurate reading can be
tolerated, but there is currently no benefit in doing so.
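For illustration only (not part of the patch), the difference between the
old unsigned counter and the reader-side check can be modeled in user
space. This is a minimal sketch that assumes C11 <stdatomic.h> as a
stand-in for atomic_long_t; struct counters, model_update(), model_read()
and model_demo() are hypothetical names, not kernel functions:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct counters {
	unsigned long old_size;	/* models the former unsigned long counter */
	atomic_long new_size;	/* models the atomic_long_t replacement */
};

/* Updater: a single add per counter, with no sign-dependent branches. */
static void model_update(struct counters *c, long nr_pages)
{
	c->old_size += (unsigned long)nr_pages;		/* wraps on underflow */
	atomic_fetch_add(&c->new_size, nr_pages);	/* just goes negative */
}

/* Reader: a negative value is reported as zero instead of a huge number. */
static unsigned long model_read(struct counters *c)
{
	long val = atomic_load(&c->new_size);

	return val < 0 ? 0 : (unsigned long)val;
}

/* One unmatched removal: the old counter wraps, the new reader sees 0. */
static void model_demo(void)
{
	struct counters c;

	c.old_size = 0;
	atomic_init(&c.new_size, 0);

	model_update(&c, -1);
	assert(c.old_size == ULONG_MAX);
	assert(model_read(&c) == 0);
}
```

The clamp in model_read() mirrors the "val < 0" check that this patch adds
to mem_cgroup_get_zone_lru_size().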
Signed-off-by: Kairui Song --- include/linux/memcontrol.h | 9 +++++++-- mm/memcontrol.c | 18 +----------------- mm/vmscan.c | 5 ----- 3 files changed, 8 insertions(+), 24 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dc3fa687759b..345a6ba8a3a7 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -112,7 +112,7 @@ struct mem_cgroup_per_node { /* Fields which get updated often at the end. */ struct lruvec lruvec; CACHELINE_PADDING(_pad2_); - unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; + atomic_long_t lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; struct mem_cgroup_reclaim_iter iter; =20 /* @@ -884,10 +884,15 @@ static inline unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx) { + long val; struct mem_cgroup_per_node *mz; =20 mz =3D container_of(lruvec, struct mem_cgroup_per_node, lruvec); - return READ_ONCE(mz->lru_zone_size[zone_idx][lru]); + val =3D atomic_long_read(&mz->lru_zone_size[zone_idx][lru]); + if (WARN_ON_ONCE(val < 0)) + return 0; + + return val; } =20 void __mem_cgroup_handle_over_high(gfp_t gfp_mask); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c03d4787d466..71fad2239973 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1484,28 +1484,12 @@ void mem_cgroup_update_lru_size(struct lruvec *lruv= ec, enum lru_list lru, int zid, long nr_pages) { struct mem_cgroup_per_node *mz; - unsigned long *lru_size; - long size; =20 if (mem_cgroup_disabled()) return; =20 mz =3D container_of(lruvec, struct mem_cgroup_per_node, lruvec); - lru_size =3D &mz->lru_zone_size[zid][lru]; - - if (nr_pages < 0) - *lru_size +=3D nr_pages; - - size =3D *lru_size; - if (WARN_ONCE(size < 0, - "%s(%p, %d, %ld): lru_size %ld\n", - __func__, lruvec, lru, nr_pages, size)) { - VM_BUG_ON(1); - *lru_size =3D 0; - } - - if (nr_pages > 0) - *lru_size +=3D nr_pages; + atomic_long_add(nr_pages, &mz->lru_zone_size[zid][lru]); } =20 /** diff --git a/mm/vmscan.c 
b/mm/vmscan.c index 8df21364ef71..53b43e3f5795 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1641,10 +1641,6 @@ unsigned int reclaim_clean_pages_from_list(struct zo= ne *zone, return nr_reclaimed; } =20 -/* - * Update LRU sizes after isolating pages. The LRU size updates must - * be complete before mem_cgroup_update_lru_size due to a sanity check. - */ static __always_inline void update_lru_sizes(struct lruvec *lruvec, enum lru_list lru, unsigned long *nr_zone_taken) { @@ -1656,7 +1652,6 @@ static __always_inline void update_lru_sizes(struct l= ruvec *lruvec, =20 update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]); } - } =20 /* --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74E5B423A74 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=gWk25P0SdUpS8qFMN/zMQ1zDJMAy0Mc/RXUUD6cgvxTKpf4UJm2WuKJ5F2zi4SRl2Rl99ScO77T+1LBLlfvowPiZ+hBtPEDrNlZvTHzp1uT8eGQcWlT1773N/+WBQ9QKplwVh+/Gdli3ZTBaY+MFQfvP9PWVEPUr8Ip3UHCIWSQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=BtZCQ7Vt3/NuhMTCXCuVEsIVBPtMyx52Q47wxXig3d8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=di2p8PKNk+nj3oQg2y+0J+p0PLclrys7jGEpYAg938oEKXLRDrSyqMk31o8itvo31ND0IdSHKig6fTCTu4VS2TmVvu7rX6CBE445Q1YAy3WhZMhyx91RkrMMwLtF2LjioLGWURI7Fn3f44oeAgx+BCrqGvthj3mzonjk32CCN8U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hQ+RGrdA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hQ+RGrdA" Received: by smtp.kernel.org (Postfix) with ESMTPS id 217E8C2BCC7; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=BtZCQ7Vt3/NuhMTCXCuVEsIVBPtMyx52Q47wxXig3d8=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=hQ+RGrdAVg+Y7Czpeuyr/HnIGGt/gwxGoFve+P81oYA0YXQ3+nZ73exutv8Vl8i19 52QHz1UnEcEnNXBMGIoR2XGljZbV0686WKQKdEkbJL2bw2+5yl63/HY7V/NcEpyVZC E1mltgN6ym8uGIM9K48pJVxQo5+SqfRham8W0i0TN+vCMDjNWrXd58oZcEFjdNUOI1 MPltQAnP541kb0+KKgfeenXb0o9NGWqLQRORs/BOW49o3aBpl+xX/x1+2pXLXmwL/l yUZbWrg+aY7dslTIUBIHkGmOItSKilIJ9aK1jm27KH0VcoKXn4W8GhnPwB6/Ez6Du6 P6dWBJMkmQqRQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F957CD3425; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:48 +0800 Subject: [PATCH RFC 02/32] mm/memcontrol: allow update of LRU statistic without holding LRU lock Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-2-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , 
Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=2443; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=7DMvb7Upt/bAaZ8grY+SULfcG8KtwGIhgqyxd6Mp/ho=; b=akq/u4487d3d37xhZaBqbl7RVG5EPHWQXhVgyD5RxHWhgIuLeyYAv7xavx0FgEr6W1ZuEMAME 3Ik58RxwOAlAu+wSikCBBnd3YkAKUEsEbo3Lq/yaKRgvyfSyBBFTfS+ X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

To enable moving file pages directly and lazily in folio_mark_accessed()
for MGLRU, allow updating the LRU statistics atomically without holding
the LRU lock.

This may cause a temporary counter underflow, which should be fine: the
counter is still eventually consistent, and it only serves as one factor
in calculating the reclaim budget in vmscan. A small inaccuracy has no
visible effect.
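As a rough illustration of why a transient underflow is tolerable (a
user-space sketch, not kernel code), the following replays one assumed
interleaving sequentially: the removal of some pages is accounted before
the matching addition. It assumes C11 <stdatomic.h> in place of
atomic_long_t; lru_size_model, model_mod_size(), model_get_size() and
model_replay_move() are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a single per-zone LRU size counter. */
static atomic_long lru_size_model;

/* Lockless updater: callers may run in any order relative to each other. */
static void model_mod_size(long nr_pages)
{
	atomic_fetch_add_explicit(&lru_size_model, nr_pages,
				  memory_order_relaxed);
}

/* Reader: clamps a transient negative value to zero, without warning. */
static unsigned long model_get_size(void)
{
	long val = atomic_load_explicit(&lru_size_model,
					memory_order_relaxed);

	return val < 0 ? 0 : (unsigned long)val;
}

/*
 * Replay "removal accounted first, addition accounted later" for nr pages.
 * *mid is what a reader could observe in between, *end is the value once
 * both updates have landed.
 */
static void model_replay_move(long nr, unsigned long *mid, unsigned long *end)
{
	model_mod_size(-nr);		/* may dip below zero */
	*mid = model_get_size();
	model_mod_size(nr);		/* matching update arrives */
	*end = model_get_size();
}
```

In this model the intermediate reading can be too small, but once both
updates have been applied the counter matches the real list size again,
which is all the reclaim budget calculation relies on.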
Signed-off-by: Kairui Song --- include/linux/memcontrol.h | 2 +- include/linux/mm_inline.h | 3 +-- mm/memcontrol.c | 4 ++-- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 345a6ba8a3a7..2552f24afe38 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -889,7 +889,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruve= c *lruvec, =20 mz =3D container_of(lruvec, struct mem_cgroup_per_node, lruvec); val =3D atomic_long_read(&mz->lru_zone_size[zone_idx][lru]); - if (WARN_ON_ONCE(val < 0)) + if (val < 0) return 0; =20 return val; diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a171070e15f0..045f9ee3880a 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -36,11 +36,10 @@ static __always_inline void __update_lru_size(struct lr= uvec *lruvec, { struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); =20 - lockdep_assert_held(&lruvec->lru_lock); WARN_ON_ONCE(nr_pages !=3D (int)nr_pages); =20 mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); - __mod_zone_page_state(&pgdat->node_zones[zid], + mod_zone_page_state(&pgdat->node_zones[zid], NR_ZONE_LRU_BASE + lru, nr_pages); } =20 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 71fad2239973..a3571763e813 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1477,8 +1477,8 @@ struct lruvec *folio_lruvec_lock_irqsave(struct folio= *folio, * @zid: zone id of the accounted pages * @nr_pages: positive when adding or negative when removing * - * This function must be called under lru_lock, just before a page is added - * to or just after a page is removed from an lru list. + * This function must be called when a page is added to or removed from + * an lru list. The caller needs to protect the lruvec from being freed. 
*/ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, long nr_pages) --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 66E3D423145 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=jUVPEJVN4w0ITp3z1JFEvVaNEavqKzUyw5jMPluCNsMoiov16VZHK+wrKrsPX73UvQYMtaMnvaKYwoNSzIXytFUQQicwLaMXWbO2tDi6Oi7VJGapMwaz38owoYtA2OBA8BWw7aKEfutcegsai4Vcdw4hQGBUpCTUXa111UoqdIk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=FZCKQ7dAiZgVTu75+CYq7b7DpOsNd2uVESBQWCbPnj4=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=goAA0lWQDNyjcLDj260keRxKSO4za07+zn5X/2dq49wm8zLszkAywvhIzCCbracAh6napYCRWtwsRnng+pcntcHMiCSCz53qphT1fvPyC9WzIX3Veoc1ecB9uFWM5nPCa6XmYeuDEA3lRdvia9FXo6seE062KKxIxF4eaoQXRSI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SGzqDyGy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SGzqDyGy" Received: by smtp.kernel.org (Postfix) with ESMTPS id 385E9C2BCF5; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=FZCKQ7dAiZgVTu75+CYq7b7DpOsNd2uVESBQWCbPnj4=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=SGzqDyGyJ4w5hRkgd6dPrS/GfXZyo7yCVpkPFZD2nWzK5IdJQzIitwKdewAq9Cz2V 
f9IAnTG3cpgTqD3T12tQuGQa+2lF/4g9wIVeY1EXq48g/Lkiok4myShQoo1MMxWCTN Rfan5MHAyulfM150Hty9+GSC8fKMuw7bcmIt1SXCEoAYmIIVkkGGZfr2CQ4zcQqqQE Z+XVYR2FTaKK9uVwt9Og87PfLkAwFfNQLcPhgrg17kBW6ha/WKp+4CXz/bAT4XOTu3 UQ0sYIaf/kpMdbVwKK9ljgnNmu2ZU6jwWKFP7yuWJoDAdio/G8Ahl8blbJksiT4lqK ulCn7LBWoTY4A== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24F1ACCFA13; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:49 +0800 Subject: [PATCH RFC 03/32] mm/mglru: wrap all access to folio flags with accessor Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-3-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=7063; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=08Uk+DXB3RZOMcAhxRkisbJ6KbJA8hUQR439Qy3O5lM=; b=KPwlw0vvTZCg+2YG2oAv4raYUSjN+YfzLdm0ytNJbQ93DQJD0LmoX3PWfR3ci+xShOYwcYaXv xNtRqOERu7hAaMtRSKQiNQNitjmcDDSeyoGL+avgAUWsFZklJdVxIzU X-Developer-Key: i=kasong@tencent.com; a=ed25519; 
pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

Instead of accessing folio->flags.f directly, use the folio_flags()
helper, which is designed exactly for this and performs more checks.

Signed-off-by: Kairui Song --- include/linux/mm_inline.h | 12 ++++++------ mm/swap.c | 8 ++++---- mm/vmscan.c | 16 ++++++++-------- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 045f9ee3880a..9c8ad8af37de 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -143,7 +143,7 @@ static inline int lru_tier_from_refs(int refs, bool wor= kingset) =20 static inline int folio_lru_refs(const struct folio *folio) { - unsigned long flags =3D READ_ONCE(folio->flags.f); + unsigned long flags =3D READ_ONCE(*const_folio_flags(folio, 0)); =20 if (!(flags & BIT(PG_referenced))) return 0; @@ -156,7 +156,7 @@ static inline int folio_lru_refs(const struct folio *fo= lio) =20 static inline int folio_lru_gen(const struct folio *folio) { - unsigned long flags =3D READ_ONCE(folio->flags.f); + unsigned long flags =3D READ_ONCE(*const_folio_flags(folio, 0)); =20 return ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; } @@ -269,7 +269,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lru= vec, struct folio *folio, gen =3D lru_gen_from_seq(seq); flags =3D (gen + 1UL) << LRU_GEN_PGOFF; /* see the comment on MIN_NR_GENS about PG_active */ - set_mask_bits(&folio->flags.f, LRU_GEN_MASK | BIT(PG_active), flags); + set_mask_bits(folio_flags(folio, 0), LRU_GEN_MASK | BIT(PG_active), flags= ); =20 lru_gen_update_size(lruvec, folio, -1, gen); /* for folio_rotate_reclaimable() */ @@ -294,7 +294,7 @@ static inline bool lru_gen_del_folio(struct lruvec *lru= vec, struct folio *folio, =20 /* for folio_migrate_flags() */ flags =3D !reclaiming && lru_gen_is_active(lruvec, gen) ? 
BIT(PG_active) = : 0; - flags =3D set_mask_bits(&folio->flags.f, LRU_GEN_MASK, flags); + flags =3D set_mask_bits(folio_flags(folio, 0), LRU_GEN_MASK, flags); gen =3D ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; =20 lru_gen_update_size(lruvec, folio, gen, -1); @@ -305,9 +305,9 @@ static inline bool lru_gen_del_folio(struct lruvec *lru= vec, struct folio *folio, =20 static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) { - unsigned long refs =3D READ_ONCE(old->flags.f) & LRU_REFS_MASK; + unsigned long refs =3D READ_ONCE(*const_folio_flags(old, 0)) & LRU_REFS_M= ASK; =20 - set_mask_bits(&new->flags.f, LRU_REFS_MASK, refs); + set_mask_bits(folio_flags(new, 0), LRU_REFS_MASK, refs); } #else /* !CONFIG_LRU_GEN */ =20 diff --git a/mm/swap.c b/mm/swap.c index e3cf703ccb89..e7037ea2c10f 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -392,14 +392,14 @@ static void __lru_cache_activate_folio(struct folio *= folio) =20 static void lru_gen_inc_refs(struct folio *folio) { - unsigned long new_flags, old_flags =3D READ_ONCE(folio->flags.f); + unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); =20 if (folio_test_unevictable(folio)) return; =20 /* see the comment on LRU_REFS_FLAGS */ if (!folio_test_referenced(folio)) { - set_mask_bits(&folio->flags.f, LRU_REFS_MASK, BIT(PG_referenced)); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); return; } =20 @@ -411,7 +411,7 @@ static void lru_gen_inc_refs(struct folio *folio) } =20 new_flags =3D old_flags + BIT(LRU_REFS_PGOFF); - } while (!try_cmpxchg(&folio->flags.f, &old_flags, new_flags)); + } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); } =20 static bool lru_gen_clear_refs(struct folio *folio) @@ -423,7 +423,7 @@ static bool lru_gen_clear_refs(struct folio *folio) if (gen < 0) return true; =20 - set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS | BIT(PG_workingset), 0); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS | 
BIT(PG_workingset),= 0); =20 rcu_read_lock(); seq =3D READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]); diff --git a/mm/vmscan.c b/mm/vmscan.c index 53b43e3f5795..7a1f08147dee 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -844,11 +844,11 @@ static bool lru_gen_set_refs(struct folio *folio) { /* see the comment on LRU_REFS_FLAGS */ if (!folio_test_referenced(folio) && !folio_test_workingset(folio)) { - set_mask_bits(&folio->flags.f, LRU_REFS_MASK, BIT(PG_referenced)); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); return false; } =20 - set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_workingset)); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS, BIT(PG_workingset)); return true; } #else @@ -3193,13 +3193,13 @@ static bool positive_ctrl_err(struct ctrl_pos *sp, = struct ctrl_pos *pv) /* promote pages accessed through page tables */ static int folio_update_gen(struct folio *folio, int gen) { - unsigned long new_flags, old_flags =3D READ_ONCE(folio->flags.f); + unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); =20 VM_WARN_ON_ONCE(gen >=3D MAX_NR_GENS); =20 /* see the comment on LRU_REFS_FLAGS */ if (!folio_test_referenced(folio) && !folio_test_workingset(folio)) { - set_mask_bits(&folio->flags.f, LRU_REFS_MASK, BIT(PG_referenced)); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); return -1; } =20 @@ -3210,7 +3210,7 @@ static int folio_update_gen(struct folio *folio, int = gen) =20 new_flags =3D old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS); new_flags |=3D ((gen + 1UL) << LRU_GEN_PGOFF) | BIT(PG_workingset); - } while (!try_cmpxchg(&folio->flags.f, &old_flags, new_flags)); + } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; } @@ -3221,7 +3221,7 @@ static int folio_inc_gen(struct lruvec *lruvec, struc= t folio *folio) int type =3D folio_is_file_lru(folio); struct lru_gen_folio *lrugen =3D 
&lruvec->lrugen; int new_gen, old_gen =3D lru_gen_from_seq(lrugen->min_seq[type]); - unsigned long new_flags, old_flags =3D READ_ONCE(folio->flags.f); + unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); =20 VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio); =20 @@ -4639,7 +4639,7 @@ static bool isolate_folio(struct lruvec *lruvec, stru= ct folio *folio, struct sca =20 /* see the comment on LRU_REFS_FLAGS */ if (!folio_test_referenced(folio)) - set_mask_bits(&folio->flags.f, LRU_REFS_MASK, 0); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, 0); =20 success =3D lru_gen_del_folio(lruvec, folio, true); VM_WARN_ON_ONCE_FOLIO(!success, folio); @@ -4855,7 +4855,7 @@ static int evict_folios(unsigned long nr_to_scan, str= uct lruvec *lruvec, =20 /* don't add rejected folios to the oldest generation */ if (lru_gen_folio_seq(lruvec, folio, false) =3D=3D min_seq[type]) - set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active)); + set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS, BIT(PG_active)); } =20 move_folios_to_lru(&list); --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91D49423A9B for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=XSssEX8SIb515p7qv0+wW2p71mOhbzTG68VhtJcXB3sKO5EIa50zWmt1BvFozGfQkDW1vOn2u8BHslL6ff8vjj35yOcJqav8ma95sqmQSsuxhlCdO4xXRarpuhrMWeu/lrpX6G24GUC1TW9nxtcqUV/1WEvjYuC0183HCs5ZSTA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=+yHr1JUmHMBO/y47bjd3AbfSo7VRCUCZK/eF6pT8Uug=; 
h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=i0gkBYAmV8+VhH6FRFKaQNug3v3LuhIA0mHUO5rn46a501j2zfJTgYusdDcZSun0vpr0mbs5RbKVH2efsvyxL6ewwxZTqWnc6m8uLaT2NPrn00gYdHGLEi01gxUovfXQ7s6EgljIbCBK1cmaYqJ7L5MWSoCibaMraoVNvZlDCF8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OAmSEqbq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OAmSEqbq" Received: by smtp.kernel.org (Postfix) with ESMTPS id 46A74C2BCFC; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=+yHr1JUmHMBO/y47bjd3AbfSo7VRCUCZK/eF6pT8Uug=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=OAmSEqbqJCQDBhXLKYIAnCuNdNDAI6pKrDwh/DjUCWjNu8cLS3tHTGXHZVorD14Li OPmJkGKv7vsnpOFRfHRYH3qXtPMskwMS5wRKs2XeO3GczliySj/3Gij6q1sIsWpsVU bwNaE7FHnKJDxTeozfbgMzIN6st+BJqi2vGhtYuxS3efBAJJtntLAtM05A3qAm/8nr u80HjvTN33int/JjlncUHKHtkX5mxssSD72rh44777ddeFdt6nKxrxDvyhSGhA/499 Xg6BJatkK0YUkFhB9MNADd0k7w3MW3e6HQsZOk4rWEff6OXSX81R/1o2lVtqo0z3oU Z5oXEeJaBhFGQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A224CD3428; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:50 +0800 Subject: [PATCH RFC 04/32] mm/mglru: introduce and use helpers for updating lru_gen refs and gen Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-4-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> 
To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=11174; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=mzihrAcNLx/NNYmCzxNjqi9MDgaAfA8pw6JJnBYfLik=; b=j0O5d5xSHhkEfT9k9tlkSBlFTxKwdl0D0BGzjPwChutOeH8vLWMM6u/U27G5/QSuQHgqcNsvU AL5Gr7VHj2sB1LF7d9VyPPUEF4dpyE0lonDCo0bpHOgWsQzqejyhscP X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

Instead of touching the raw page flags directly, use helpers for
adjusting a folio's refs and gen info, making the code easier to debug
and understand.

Signed-off-by: Kairui Song --- include/linux/mm_inline.h | 79 +++++++++++++++++++++++++++++++++++++++++--= ---- include/linux/mmzone.h | 2 ++ mm/migrate.c | 2 -- mm/swap.c | 17 +++++----- mm/vmscan.c | 52 ++++++++++++++++++------------- 5 files changed, 111 insertions(+), 41 deletions(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 9c8ad8af37de..eade9f2d6afc 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -141,10 +141,42 @@ static inline int lru_tier_from_refs(int refs, bool w= orkingset) return workingset ? 
MAX_NR_TIERS - 1 : order_base_2(refs); } =20 -static inline int folio_lru_refs(const struct folio *folio) +/** + * lru_gen_from_flags - Return the LRU generation number from folio flags. + * @flags: folio flags + * + * Returns: A number between 0 and LRU_GEN_MAX, inclusive. Returns -1 if t= he + * flags indicate the folio is off the list (e.g., isolated). + */ +static inline int lru_gen_from_flags(unsigned long flags) { - unsigned long flags =3D READ_ONCE(*const_folio_flags(folio, 0)); + int gen =3D ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF); =20 + gen -=3D 1; + VM_WARN_ON_ONCE(gen !=3D -1 && gen > LRU_GEN_MAX); + return gen; +} + +/** + * lru_gen_set_flags - Set the LRU generation number to specified folio fl= ags. + * @flags: pointer to the folio flags + * @gen: generation number, between 0 and LRU_GEN_MAX, inclusive. + */ +static inline void lru_gen_set_flags(unsigned long *flags, int gen) +{ + VM_WARN_ON_ONCE(gen > LRU_GEN_MAX || gen < 0); + BUILD_BUG_ON((LRU_GEN_MAX + 1) !=3D MAX_NR_GENS); + + *flags &=3D ~LRU_GEN_MASK; + *flags |=3D (gen + 1UL) << LRU_GEN_PGOFF; +} + +/** + * lru_refs_from_flags - Return LRU referenced / access count from folio f= lags. + * @flags: folio flags + */ +static inline int lru_refs_from_flags(unsigned long flags) +{ if (!(flags & BIT(PG_referenced))) return 0; /* @@ -154,18 +186,46 @@ static inline int folio_lru_refs(const struct folio *= folio) return ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) + 1; } =20 -static inline int folio_lru_gen(const struct folio *folio) +/** + * lru_refs_set_flags - Set the LRU referenced / access count to specified= folio flags. + * @flags: pointer to the folio flags + * @refs: referenced / access count number, between 0 and LRU_REFS_MAX, in= clusive. 
+ */ +static inline void lru_refs_set_flags(unsigned long *flags, unsigned int r= efs) +{ + VM_WARN_ON_ONCE(refs > LRU_REFS_MAX); + + *flags &=3D ~LRU_REFS_FLAGS; + if (!refs) + return; + *flags |=3D (BIT(PG_referenced) | ((refs - 1UL) << LRU_REFS_PGOFF)); +} + +static inline int folio_lru_refs(const struct folio *folio) +{ + return lru_refs_from_flags(READ_ONCE(*const_folio_flags(folio, 0))); +} + +static inline void folio_set_lru_refs(struct folio *folio, unsigned int re= fs) { - unsigned long flags =3D READ_ONCE(*const_folio_flags(folio, 0)); + unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); + + do { + new_flags =3D old_flags; + lru_refs_set_flags(&new_flags, refs); + } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); +} =20 - return ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; +static inline int folio_lru_gen(const struct folio *folio) +{ + return lru_gen_from_flags(READ_ONCE(*const_folio_flags(folio, 0))); } =20 static inline bool lru_gen_is_active(const struct lruvec *lruvec, int gen) { unsigned long max_seq =3D lruvec->lrugen.max_seq; =20 - VM_WARN_ON_ONCE(gen >=3D MAX_NR_GENS); + VM_WARN_ON_ONCE(gen > LRU_GEN_MAX); =20 /* see the comment on MIN_NR_GENS */ return gen =3D=3D lru_gen_from_seq(max_seq) || gen =3D=3D lru_gen_from_se= q(max_seq - 1); @@ -305,9 +365,7 @@ static inline bool lru_gen_del_folio(struct lruvec *lru= vec, struct folio *folio, =20 static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) { - unsigned long refs =3D READ_ONCE(*const_folio_flags(old, 0)) & LRU_REFS_M= ASK; - - set_mask_bits(folio_flags(new, 0), LRU_REFS_MASK, refs); + folio_set_lru_refs(new, folio_lru_refs(old)); } #else /* !CONFIG_LRU_GEN */ =20 @@ -338,7 +396,8 @@ static inline bool lru_gen_del_folio(struct lruvec *lru= vec, struct folio *folio, =20 static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) { - + if (folio_test_referenced(old)) + folio_set_referenced(new); } 
#endif /* CONFIG_LRU_GEN */ =20 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9adb2ad21da5..e4c51961ec27 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -493,7 +493,9 @@ enum lruvec_flags { #ifndef __GENERATING_BOUNDS_H =20 #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) +#define LRU_GEN_MAX (BIT(LRU_GEN_WIDTH - 1) - 1) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) +#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH) - 1) =20 /* * For folios accessed multiple times through file descriptors, diff --git a/mm/migrate.c b/mm/migrate.c index 8a64291ab5b4..23248484a165 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -763,8 +763,6 @@ void folio_migrate_flags(struct folio *newfolio, struct= folio *folio) { int cpupid; =20 - if (folio_test_referenced(folio)) - folio_set_referenced(newfolio); if (folio_test_uptodate(folio)) folio_mark_uptodate(newfolio); if (folio_test_clear_active(folio)) { diff --git a/mm/swap.c b/mm/swap.c index e7037ea2c10f..6204496d48f5 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -393,24 +393,26 @@ static void __lru_cache_activate_folio(struct folio *= folio) static void lru_gen_inc_refs(struct folio *folio) { unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); + int refs; =20 if (folio_test_unevictable(folio)) return; =20 - /* see the comment on LRU_REFS_FLAGS */ - if (!folio_test_referenced(folio)) { - set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); + if (!folio_lru_refs(folio)) { + folio_set_lru_refs(folio, 1); return; } =20 + /* see the comment on LRU_REFS_FLAGS */ do { - if ((old_flags & LRU_REFS_MASK) =3D=3D LRU_REFS_MASK) { + new_flags =3D old_flags; + refs =3D lru_refs_from_flags(old_flags); + if (refs =3D=3D LRU_REFS_MAX) { if (!folio_test_workingset(folio)) folio_set_workingset(folio); return; } - - new_flags =3D old_flags + BIT(LRU_REFS_PGOFF); + lru_refs_set_flags(&new_flags, refs + 1); } while (!try_cmpxchg(folio_flags(folio, 
0), &old_flags, new_flags)); } =20 @@ -423,7 +425,8 @@ static bool lru_gen_clear_refs(struct folio *folio) if (gen < 0) return true; =20 - set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS | BIT(PG_workingset),= 0); + folio_set_lru_refs(folio, 0); + folio_clear_workingset(folio); =20 rcu_read_lock(); seq =3D READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]); diff --git a/mm/vmscan.c b/mm/vmscan.c index 7a1f08147dee..2ca1d6d80259 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -843,12 +843,14 @@ enum folio_references { static bool lru_gen_set_refs(struct folio *folio) { /* see the comment on LRU_REFS_FLAGS */ - if (!folio_test_referenced(folio) && !folio_test_workingset(folio)) { - set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); + if (!folio_lru_refs(folio) && !folio_test_workingset(folio)) { + folio_set_lru_refs(folio, 1); return false; } =20 - set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS, BIT(PG_workingset)); + folio_set_lru_refs(folio, 0); + folio_set_workingset(folio); + return true; } #else @@ -3191,28 +3193,31 @@ static bool positive_ctrl_err(struct ctrl_pos *sp, = struct ctrl_pos *pv) *************************************************************************= *****/ =20 /* promote pages accessed through page tables */ -static int folio_update_gen(struct folio *folio, int gen) +static int folio_update_gen(struct folio *folio, int new_gen) { unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); - - VM_WARN_ON_ONCE(gen >=3D MAX_NR_GENS); + int old_gen; =20 /* see the comment on LRU_REFS_FLAGS */ - if (!folio_test_referenced(folio) && !folio_test_workingset(folio)) { - set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, BIT(PG_referenced)); + if (!lru_refs_from_flags(old_flags) && !folio_test_workingset(folio)) { + folio_set_lru_refs(folio, 1); return -1; } =20 do { + old_gen =3D lru_gen_from_flags(old_flags); + new_flags =3D old_flags; + /* lru_gen_del_folio() has isolated this page? 
*/ - if (!(old_flags & LRU_GEN_MASK)) - return -1; + if (old_gen < 0) + break; =20 - new_flags =3D old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS); - new_flags |=3D ((gen + 1UL) << LRU_GEN_PGOFF) | BIT(PG_workingset); + lru_gen_set_flags(&new_flags, new_gen); + lru_refs_set_flags(&new_flags, 0); + new_flags |=3D BIT(PG_workingset); } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 - return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; + return old_gen; } =20 /* protect pages accessed multiple times through file descriptors */ @@ -3220,17 +3225,18 @@ static int folio_inc_gen(struct lruvec *lruvec, str= uct folio *folio) { int type =3D folio_is_file_lru(folio); struct lru_gen_folio *lrugen =3D &lruvec->lrugen; - int new_gen, old_gen =3D lru_gen_from_seq(lrugen->min_seq[type]); + int old_gen, new_gen, min_gen =3D lru_gen_from_seq(lrugen->min_seq[type]); unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); =20 - VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio); - do { - new_gen =3D ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; + old_gen =3D lru_gen_from_flags(old_flags); + VM_WARN_ON_ONCE_FOLIO(old_gen < 0, folio); + /* folio_update_gen() has promoted this page? 
*/ - if (new_gen >=3D 0 && new_gen !=3D old_gen) - return new_gen; + if (old_gen >=3D 0 && old_gen !=3D min_gen) + return old_gen; =20 + new_flags =3D old_flags; new_gen =3D (old_gen + 1) % MAX_NR_GENS; =20 new_flags =3D old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS); @@ -4639,7 +4645,7 @@ static bool isolate_folio(struct lruvec *lruvec, stru= ct folio *folio, struct sca =20 /* see the comment on LRU_REFS_FLAGS */ if (!folio_test_referenced(folio)) - set_mask_bits(folio_flags(folio, 0), LRU_REFS_MASK, 0); + folio_set_lru_refs(folio, 0); =20 success =3D lru_gen_del_folio(lruvec, folio, true); VM_WARN_ON_ONCE_FOLIO(!success, folio); @@ -4854,8 +4860,10 @@ static int evict_folios(unsigned long nr_to_scan, st= ruct lruvec *lruvec, } =20 /* don't add rejected folios to the oldest generation */ - if (lru_gen_folio_seq(lruvec, folio, false) =3D=3D min_seq[type]) - set_mask_bits(folio_flags(folio, 0), LRU_REFS_FLAGS, BIT(PG_active)); + if (lru_gen_folio_seq(lruvec, folio, false) =3D=3D min_seq[type]) { + folio_set_lru_refs(folio, 0); + folio_set_active(folio); + } } =20 move_folios_to_lru(&list); --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0A2F425CC3 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=fr8CG9e8CueatRbNpEfHA/nxpQ/o4xXzPJ1BQs7fzcYGATOFPmpr6uo2BpXUfbkD75P/CJ3HvwcTsUfxyoOtfgIdPgSki/TjYfGWq5EKMaKnJwWwNQ6OXttbB6obMZCVBYAEqJPgr6X7Ak7AmcAMuz59iZ3xbDtg3a6iukPiyxs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=jmvg4ef5qUyTqHGcbdIbJiby1sIH8SV8blDyMOM5pJs=; 
h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=AL7DNdK95tiHrGDy6WkZCUIXUYTUcxJF9FCoahlAN8fMPSH3oEpCQIYZhhxk/80lKagrc0MfCjOWZZzteE5cX/Otn40F9Tt3iA5aJqLLd2v0fY39xAsDu4jca4TuOpMb1uMPfeIZP00F9cqedM2aCBrqDdoCMsuRHtAbjoYyRPY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lEMAmBpw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lEMAmBpw" Received: by smtp.kernel.org (Postfix) with ESMTPS id 585DBC2BD01; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=jmvg4ef5qUyTqHGcbdIbJiby1sIH8SV8blDyMOM5pJs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=lEMAmBpwinUPOn+VOa+VxZQccVjhzOCi2jxDLmzY8AWQSOp7kQ6Gxxowwb9gGocCS qMM8PA8rtEvRauTjYZoq9i0Ju9MaujRSD/WKLrFTOKLa9E0a1oM6815wFfp4U7OjGc HBZpdEFB3fA6KX39tbuKhHnxC3mGRqP1ff16oCK8xTCe4EhL5vGLtOh0psv2P3dCXV BoIAfzDu6aItQaLsk6Zq30+srTO8bBkw3qJ0GxGHKFkyTasPvZczux7GnKbY3kygR8 MHUfdjCvknaL1bqdeYoBu2d3NKdSsZF3law98bIs8Ce0PmCEwgznn3FhlaSy0nCYfR vhKC1iTda4Qwg== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4FC53CD3427; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:51 +0800 Subject: [PATCH RFC 05/32] mm/mglru: make generation page counters atomic Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-5-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org 
Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=5037; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=ANTm+1uGlwjkYEz3d0r4O5bWVxUXbGeSOOlpFXDQZtE=; b=6f8SHaIAMJ2FbcFM8OPiHetkk+iOjEr7ZTqQSex6tvBYvhrw7VFABVnSnuDklt2MY9OcOWzLH u8dQtyGdgMGCiVN6bTI+xzsJUmC67F2zwSdZ40u6ye9d4hfp6B7q05g X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

No functional change. Convert the generation page counters to atomic_long_t so they can be updated without holding the LRU lock. There is no risk of overflow. Readers always clamp the value and use zero if a counter reads negative, so the counters are eventually consistent.
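As a rough userspace illustration (not kernel code), the update and read pattern described above could be sketched as follows. It uses C11 <stdatomic.h> in place of the kernel's atomic_long_t API, and sketch_nr_pages, sketch_update_size() and sketch_read_size() are made-up names that only mimic the shape of lru_gen_update_size() and its readers.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_GENS 4

/* Hypothetical stand-in for lrugen->nr_pages[gen] (type and zone omitted). */
static atomic_long sketch_nr_pages[SKETCH_NR_GENS];

/*
 * Account `delta` pages leaving old_gen and entering new_gen without any
 * lock. A gen of -1 means "not on a list", as in the patch.
 */
static void sketch_update_size(int old_gen, int new_gen, long delta)
{
	assert(delta >= 0);

	if (old_gen >= 0)
		atomic_fetch_sub_explicit(&sketch_nr_pages[old_gen], delta,
					  memory_order_relaxed);
	if (new_gen >= 0)
		atomic_fetch_add_explicit(&sketch_nr_pages[new_gen], delta,
					  memory_order_relaxed);
}

/* A reader may observe a negative counter, so clamp the value at zero. */
static long sketch_read_size(int gen)
{
	long v = atomic_load_explicit(&sketch_nr_pages[gen],
				      memory_order_relaxed);

	return v > 0 ? v : 0;
}
```

For example, after adding 8 pages to gen 0, moving 3 of them to gen 1 and then removing 5 pages from gen 1, sketch_read_size(0) returns 5 while sketch_read_size(1) returns 0 even though the raw counter is -2.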
Signed-off-by: Kairui Song --- include/linux/mm_inline.h | 6 ++---- include/linux/mmzone.h | 2 +- mm/vmscan.c | 18 ++++++++---------- 3 files changed, 11 insertions(+), 15 deletions(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index eade9f2d6afc..8a3fb357dc15 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -245,11 +245,9 @@ static inline void lru_gen_update_size(struct lruvec *= lruvec, struct folio *foli VM_WARN_ON_ONCE(old_gen =3D=3D -1 && new_gen =3D=3D -1); =20 if (old_gen >=3D 0) - WRITE_ONCE(lrugen->nr_pages[old_gen][type][zone], - lrugen->nr_pages[old_gen][type][zone] - delta); + atomic_long_sub(delta, &lrugen->nr_pages[old_gen][type][zone]); if (new_gen >=3D 0) - WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone], - lrugen->nr_pages[new_gen][type][zone] + delta); + atomic_long_add(delta, &lrugen->nr_pages[new_gen][type][zone]); =20 /* addition */ if (old_gen < 0) { diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e4c51961ec27..721d0db8b0f9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -569,7 +569,7 @@ struct lru_gen_folio { /* the multi-gen LRU lists, lazily sorted on eviction */ struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the multi-gen LRU sizes, eventually consistent */ - long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; + atomic_long_t nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the exponential moving average of refaulted */ unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; /* the exponential moving average of evicted+protected */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 2ca1d6d80259..a5b4750a5028 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3280,8 +3280,7 @@ static void reset_batch_size(struct lru_gen_mm_walk *= walk) continue; =20 walk->nr_pages[gen][type][zone] =3D 0; - WRITE_ONCE(lrugen->nr_pages[gen][type][zone], - lrugen->nr_pages[gen][type][zone] + delta); + atomic_long_add(delta, 
&lrugen->nr_pages[gen][type][zone]); =20 if (lru_gen_is_active(lruvec, gen)) lru +=3D LRU_ACTIVE; @@ -3971,8 +3970,8 @@ static bool inc_max_seq(struct lruvec *lruvec, unsign= ed long seq, int swappiness for (type =3D 0; type < ANON_AND_FILE; type++) { for (zone =3D 0; zone < MAX_NR_ZONES; zone++) { enum lru_list lru =3D type * LRU_INACTIVE_FILE; - long delta =3D lrugen->nr_pages[prev][type][zone] - - lrugen->nr_pages[next][type][zone]; + long delta =3D atomic_long_read(&lrugen->nr_pages[prev][type][zone]) - + atomic_long_read(&lrugen->nr_pages[next][type][zone]); =20 if (!delta) continue; @@ -4090,7 +4089,7 @@ static unsigned long lruvec_evictable_size(struct lru= vec *lruvec, int swappiness for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) { gen =3D lru_gen_from_seq(seq); for (zone =3D 0; zone < MAX_NR_ZONES; zone++) - total +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L); + total +=3D max(atomic_long_read(&lrugen->nr_pages[gen][type][zone]), 0= L); } } =20 @@ -4525,7 +4524,7 @@ static void __lru_gen_reparent_memcg(struct lruvec *c= hild_lruvec, struct lruvec =20 for (i =3D 0; i < get_nr_gens(child_lruvec, type); i++) { int gen =3D lru_gen_from_seq(child_lrugen->max_seq - i); - long nr_pages =3D child_lrugen->nr_pages[gen][type][zone]; + long nr_pages =3D atomic_long_read(&child_lrugen->nr_pages[gen][type][zo= ne]); int child_lru_active =3D lru_gen_is_active(child_lruvec, gen) ? LRU_ACTI= VE : 0; int parent_lru_active =3D lru_gen_is_active(parent_lruvec, gen) ? 
LRU_AC= TIVE : 0; =20 @@ -4533,9 +4532,8 @@ static void __lru_gen_reparent_memcg(struct lruvec *c= hild_lruvec, struct lruvec list_splice_tail_init(&child_lrugen->folios[gen][type][zone], &parent_lrugen->folios[gen][type][zone]); =20 - WRITE_ONCE(child_lrugen->nr_pages[gen][type][zone], 0); - WRITE_ONCE(parent_lrugen->nr_pages[gen][type][zone], - parent_lrugen->nr_pages[gen][type][zone] + nr_pages); + atomic_long_set(&child_lrugen->nr_pages[gen][type][zone], 0); + atomic_long_add(nr_pages, &parent_lrugen->nr_pages[gen][type][zone]); =20 if (lru_gen_is_active(child_lruvec, gen) !=3D lru_gen_is_active(parent_l= ruvec, gen)) { __update_lru_size(child_lruvec, lru + child_lru_active, zone, -nr_pages= ); @@ -5555,7 +5553,7 @@ static int lru_gen_seq_show(struct seq_file *m, void = *v) char mark =3D full && seq < min_seq[type] ? 'x' : ' '; =20 for (zone =3D 0; zone < MAX_NR_ZONES; zone++) - size +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L); + size +=3D max(atomic_long_read(&lrugen->nr_pages[gen][type][zone]), 0L= ); =20 seq_printf(m, " %10lu%c", size, mark); } --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7639425CC5 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=sKCIUhFZAz9gyVvrqM2YMi3a9RZzBZ6Z6M9pKTnThtCLBK1KEoT8vyZD6JLw9+UPHy+vNpnCSGSSV0ts7AJ8tbqKh5HLS7yRjKIq57d4DpNaUuhZHEHOv38aZcxpJdlQlERl2UWfM+pzF6n1pws3olYASKU3irHhtWXjs8YHJ8Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=jy+lfB6WG519rQVWXB9k0U5SPmV0HyD/ptBklKs+JGE=; 
h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=TAgXKe5BKAn4DC12yCurhNBAKOhH1QhCKxPH/Y2CzVVbY2M+uZ5XO1Tm0d9D/slw2fFN9EWLZ2W2gvxSyCszlE/sBgKIZVgc1W/5Bj3ISIPw6sLVVh2PUGXV+5TB1Xp/5htb1Bk5k2y8qJ4NwBQ0C3ivf6fkayoKKzd+WiUuo7Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hW93sxax; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hW93sxax" Received: by smtp.kernel.org (Postfix) with ESMTPS id 6E264C2BCB4; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=jy+lfB6WG519rQVWXB9k0U5SPmV0HyD/ptBklKs+JGE=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=hW93sxaxUJWPEvomFjJazR6iTblgPm0xZZ4EYlXappbWUPo92bmfUOH1GRH7SiJAt 1b99L1opus4jgZF22p4TnC0fo5PlHcdUlBnP5/pUXIF/HDrWl/AlAY6fPisNZWvvZY U7lqayn0rXnA+b9WN4PbMXFyC2ZdEqdz+4ZjGC2oHuEwrCkrw0eaK+/xHsjSb48vSm EheMlhR+Slo9dt6LGem+Bqu4YAO+v/0GqsjWzubxnZmPKUM0J49btR5KEOReK4T4OM eputwSB8vOtF3u7xMVBz9rt+Gxt75aMGEuS7ihsEvnLLZfg97g1UyfC9GfDp1dzmc8 KnDS9Cqao5uAA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64B5CCCFA13; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:52 +0800 Subject: [PATCH RFC 06/32] mm/mglru: frequency guided workingset promotion (MGLRU-FG) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-6-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: 
linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=31159; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=nqgE3bRYpiEGBjHmyiK9hsIuBnvRLThtCXFVokE5+4U=; b=lgNk6FfuVePNIV8QzMKahhRT7F44WxnkFykS6orRUsI+oHbRHfv7ngt9czSFSGkHbKpPwVcQb 271Dr919oI5DlImkyL5ftk5jqXHd7kS+M4R4aWDONQgTwKNxMffah1E X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song

For file folios, MGLRU mostly relies on tier protection to protect them on eviction. This leads to at least two problems:

- It has a long feedback loop: higher-tier folios won't be protected until enough refaults have been triggered. By the time protection kicks in, the folio is likely no longer hot.
- The number of tiers is limited, so once the access count of a set of folios goes beyond the tier limit, MGLRU can no longer tell which folio is hotter.

Tiering also does not work efficiently for anon folios. Anon folios rarely stay on any tier other than tier 0 and tier 3, because page accesses promote them and change their tier aggressively.

So tweak the tiering mechanism a bit: for file folios, promote them more adaptively.
For folios whose access count goes beyond the maximum tiering count (LRU_REFS_MAX), reset their tier and promote them into the next gen. This is just like protection, but it happens at access time rather than eviction time.

Compared to the previous protection mechanism, this happens more proactively and still won't over-protect file pages. The promotion is very conservative: one gen at a time, and each promotion resets the reference count. Each promotion requires 8 accesses.

Tiering is now also much more immune to the count overflow issue, as an overflowed folio is promoted and has its count reset. A lower tier in a newer gen has a higher protection priority (is less likely to be evicted) than a higher tier in an older gen, which seems reasonable.

For workingset tracking, we can simply consider all folios with a reference count >= 2 (LRU_REFS_WORKINGSET) as workingset. This makes workingset tracking similar to what the active/inactive LRU does, so in-kernel checks for workingset (e.g., PSI, readahead) behave more consistently on MGLRU. Currently, MGLRU causes a lower PSI reading in some workloads.

Note that PG_workingset and PG_referenced are no longer used as individual flags under MGLRU; they are now the low two bits of the LRU reference count encoding. Adjusting the existing raw folio_test_*() callers to the new semantics is left to a follow-up.

The behavior of the active/inactive LRU is not changed.
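To make the encoding and the promotion rule above more concrete, here is a minimal userspace sketch. It assumes LRU_REFS_WIDTH is 1 (so LRU_REFS_MAX is 7) and takes the tier formula fls(refs - 1) and the reset value LRU_REFS_PROTECTED from the diff below. The SK_* bit positions, struct sk_folio and the sk_*() helpers are hypothetical, and the sketch ignores generation wrap-around, locking and all other conditions the real access path checks.

```c
#include <assert.h>

/* Constants as defined in the patch, assuming LRU_REFS_WIDTH == 1. */
#define LRU_REFS_WIDTH		1
#define LRU_REFS_WORKINGSET	2u
#define LRU_REFS_PROTECTED	3u
#define LRU_REFS_MAX		((1u << (LRU_REFS_WIDTH + 2)) - 1)	/* 7 */

/* Hypothetical bit positions, chosen only for this sketch. */
#define SK_REFERENCED		(1ul << 2)
#define SK_WORKINGSET		(1ul << 6)
#define SK_REFS_PGOFF		20
#define SK_REFS_MASK		(((1ul << LRU_REFS_WIDTH) - 1) << SK_REFS_PGOFF)

/* Pack a count: bit 0 and bit 1 go to the two flags, the rest to the mask. */
static unsigned long sk_refs_to_flags(unsigned int refs)
{
	unsigned long flags = 0;

	assert(refs <= LRU_REFS_MAX);
	if (refs & 1u)
		flags |= SK_REFERENCED;
	if (refs & 2u)
		flags |= SK_WORKINGSET;
	flags |= ((unsigned long)refs >> 2) << SK_REFS_PGOFF;
	return flags;
}

/* Inverse of sk_refs_to_flags(). */
static unsigned int sk_refs_from_flags(unsigned long flags)
{
	unsigned int refs = 0;

	if (flags & SK_REFERENCED)
		refs |= 1u;
	if (flags & SK_WORKINGSET)
		refs |= 2u;
	refs |= (unsigned int)((flags & SK_REFS_MASK) >> SK_REFS_PGOFF) << 2;
	return refs;
}

/* Tier is fls(refs - 1) for workingset folios and 0 otherwise. */
static int sk_tier_from_refs(unsigned int refs)
{
	unsigned int x;
	int tier = 0;

	if (refs < LRU_REFS_WORKINGSET)
		return 0;
	for (x = refs - 1; x; x >>= 1)
		tier++;
	return tier;
}

/* Simplified folio: a larger gen means a younger generation here. */
struct sk_folio {
	unsigned long flags;
	int gen;
};

/* One access: count up, or promote by one gen once the count is saturated. */
static void sk_access(struct sk_folio *folio)
{
	unsigned int refs = sk_refs_from_flags(folio->flags);

	if (refs == LRU_REFS_MAX) {
		folio->gen++;
		refs = LRU_REFS_PROTECTED;
	} else {
		refs++;
	}
	folio->flags = sk_refs_to_flags(refs);
}
```

Starting from a count of 0, the eighth call to sk_access() is the one that bumps the generation, which matches the "8 accesses" figure above. After that promotion the count restarts from LRU_REFS_PROTECTED, so the folio is still treated as workingset.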
Signed-off-by: Kairui Song --- include/linux/mm_inline.h | 58 +++++++++++++---------- include/linux/mmzone.h | 104 ++++++++++++++++++++++++++--------------- kernel/bounds.c | 2 +- mm/swap.c | 93 +++++++++++++++++++++++++++++-------- mm/vmscan.c | 115 ++++++++++++++++++++++--------------------= ---- mm/workingset.c | 31 +++++++------ 6 files changed, 248 insertions(+), 155 deletions(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 8a3fb357dc15..a9ed9a79364e 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -133,12 +133,12 @@ static inline int lru_hist_from_seq(unsigned long seq) return seq % NR_HIST_GENS; } =20 -static inline int lru_tier_from_refs(int refs, bool workingset) +static inline int lru_tier_from_refs(unsigned int refs) { - VM_WARN_ON_ONCE(refs > BIT(LRU_REFS_WIDTH)); - - /* see the comment on MAX_NR_TIERS */ - return workingset ? MAX_NR_TIERS - 1 : order_base_2(refs); + VM_WARN_ON_ONCE(refs > LRU_REFS_MAX); + if (refs < LRU_REFS_WORKINGSET) + return 0; + return fls(refs - 1); } =20 /** @@ -177,13 +177,16 @@ static inline void lru_gen_set_flags(unsigned long *f= lags, int gen) */ static inline int lru_refs_from_flags(unsigned long flags) { - if (!(flags & BIT(PG_referenced))) - return 0; + int refs; + /* - * Return the total number of accesses including PG_referenced. Also see - * the comment on LRU_REFS_FLAGS. + * Return the total number of accesses. Also see the comment on + * LRU_REFS_FLAGS. */ - return ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) + 1; + refs =3D (flags & BIT(PG_referenced)) ? BIT(0) : 0; + refs +=3D (flags & BIT(PG_workingset)) ? 
BIT(1) : 0; + refs +=3D ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 2; + return refs; } =20 /** @@ -196,9 +199,11 @@ static inline void lru_refs_set_flags(unsigned long *f= lags, unsigned int refs) VM_WARN_ON_ONCE(refs > LRU_REFS_MAX); =20 *flags &=3D ~LRU_REFS_FLAGS; - if (!refs) - return; - *flags |=3D (BIT(PG_referenced) | ((refs - 1UL) << LRU_REFS_PGOFF)); + if (refs & BIT(0)) + *flags |=3D BIT(PG_referenced); + if (refs & BIT(1)) + *flags |=3D BIT(PG_workingset); + *flags |=3D (((unsigned long)refs) >> 2) << LRU_REFS_PGOFF; } =20 static inline int folio_lru_refs(const struct folio *folio) @@ -280,23 +285,24 @@ static inline unsigned long lru_gen_folio_seq(const s= truct lruvec *lruvec, bool reclaiming) { int gen; + int refs =3D folio_lru_refs(folio); int type =3D folio_is_file_lru(folio); const struct lru_gen_folio *lrugen =3D &lruvec->lrugen; =20 /* - * +-----------------------------------+---------------------------------= --+ - * | Accessed through page tables and | Accessed through file descriptor= s | - * | promoted by folio_update_gen() | and protected by folio_inc_gen()= | - * +-----------------------------------+---------------------------------= --+ - * | PG_active (set while isolated) | = | - * +-----------------+-----------------+-----------------+---------------= --+ - * | PG_workingset | PG_referenced | PG_workingset | LRU_REFS_FLAG= S | - * +-----------------------------------+---------------------------------= --+ - * |<---------- MIN_NR_GENS ---------->| = | - * |<---------------------------- MAX_NR_GENS ---------------------------= ->| + * +-------------------------------------------+-------------------------= -----------------+ + * | Accessed through page tables and | Accessed through fil= e descriptors | + * | promoted by folio_update_gen() | and protected by fol= io_inc_gen() | + * +------0------------------------------------+-------------------------= -----------------+ + * | PG_active (set while isolated) | = | + * 
+---------------------+---------------------+--------------------+----= -----------------+ + * | LRU_REFS_MAX | LRU_REFS_REFERENCED | LRU_REFS_MAX | LRU= _REFS_REFERENCED | + * +-------------------------------------------+-------------------------= -----------------+ + * |<-------------- MIN_NR_GENS -------------->| = | + * |<------------------------------------ MAX_NR_GENS -------------------= ---------------->| */ if (folio_test_active(folio)) - gen =3D MIN_NR_GENS - folio_test_workingset(folio); + gen =3D MIN_NR_GENS - (refs >=3D LRU_REFS_WORKINGSET); else if (reclaiming) gen =3D MAX_NR_GENS; else if ((!folio_is_file_lru(folio) && !folio_test_swapcache(folio)) || @@ -304,7 +310,7 @@ static inline unsigned long lru_gen_folio_seq(const str= uct lruvec *lruvec, (folio_test_dirty(folio) || folio_test_writeback(folio)))) gen =3D MIN_NR_GENS; else - gen =3D MAX_NR_GENS - folio_test_workingset(folio); + gen =3D MAX_NR_GENS - (refs >=3D LRU_REFS_REFERENCED); =20 return max(READ_ONCE(lrugen->max_seq) - gen + 1, READ_ONCE(lrugen->min_se= q[type])); } @@ -396,6 +402,8 @@ static inline void folio_migrate_refs(struct folio *new= , const struct folio *old { if (folio_test_referenced(old)) folio_set_referenced(new); + if (folio_test_workingset(old)) + folio_set_workingset(new); } #endif /* CONFIG_LRU_GEN */ =20 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 721d0db8b0f9..393bbea75838 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -470,55 +470,87 @@ enum lruvec_flags { =20 /* * Each generation is divided into multiple tiers. A folio accessed N times - * through file descriptors is in tier order_base_2(N). A folio in the fir= st - * tier (N=3D0,1) is marked by PG_referenced unless it was faulted in thro= ugh page - * tables or read ahead. A folio in the last tier (MAX_NR_TIERS-1) is mark= ed by - * PG_workingset. A folio in any other tier (1flags. 
+ * through file descriptors will be categorized as higher tier as shown be= low: * - * In contrast to moving across generations which requires the LRU lock, m= oving - * across tiers only involves atomic operations on folio->flags and theref= ore - * has a negligible cost in the buffered access path. In the eviction path, - * comparisons of refaulted/(evicted+protected) from the first tier and th= e rest - * infer whether folios accessed multiple times through file descriptors a= re - * statistically hot and thus worth protecting. + * MGLRU (Workingset Protection) + * Access Tier | + * 0 0 | - Should be mostly cold folio, or readahead & reclaim= ing. [1] + * 1 0 | - Folios that are used for at least once. [2] + * --WORKINGSET--<-\ - Considered workingset and future access may increas= e its gen. [3] + * 2 1 | - Reclaimable workingset. [4] + * 3 2 | - PID protected workingset. [5] + * 4* 2 | - Extended tiers [6] + * 5* 3 | + * 6* 3 | + * 7* 3 | - Promotion candidate workingset. [7] + * --PROMOTION--->-/ + * + * 1. The tier is fls(N-1) for N > 1, 0 otherwise. Ideally each tier + * represents folios of similar access patterns, lower tiers are folios th= at + * are less important and should be evicted faster. Readahead (PG_readahea= d) + * or reclaiming (PG_reclaim) folios are force put in tier 0. + * + * 2. Folios accessed at least once are still on tier 0: + * (folio_lru_refs(folio) =3D=3D LRU_REFS_REFERENCED =3D=3D 1). + * One time usage doesn't make the folio qualified to be protected in anyw= ay. + * If a workingset folio is accessed and in the oldest gen, it will be mov= ed + * to second oldest gen. Gen moving is lazily done without involving + * LRU lock so it's cheap. It won't be promoted further if it's not on + * oldest gen to avoid over-promoting of caches. + * + * 3. Folios that are accessed more than once are considered workingset: + * (folio_lru_refs(folio) =3D=3D LRU_REFS_WORKINGSET =3D=3D 2). + * + * 4. 
A folio qualified as workingset is still on tier 1 unless it
+ * is accessed again. That keeps the workingset easily reclaimable, since
+ * the initial move in (1) and the promotion in (6) already protect recent
+ * workingset proactively, and we want to reclaim caches fast when under
+ * pressure: pinning a folio in a high tier combined with the PID protection
+ * in (4) would delay the eviction of old caches by a lot.
+ *
+ * 5. Starting from tier 1, PID protection will take effect. Lower tiers
+ * will be sacrificed to protect a higher tier when the higher tier has a
+ * higher refault rate (see the PID protection part of vmscan.c).
+ *
+ * 6. Tier > 2 is only available when LRU_REFS_WIDTH >= 1. Ideally we will
+ * always have it, but in some extreme configs there are just not enough page
+ * flags. This will be improved in the future. Without tiers > 2, things still
+ * work, but file caches are promoted more aggressively, maybe unexpectedly.
+ *
+ * 7. If the referenced count == LRU_REFS_MAX, a future access will increase
+ * the gen of the folio by one and reset its referenced count to
+ * LRU_REFS_PROTECTED. It is still considered workingset but is moved to a
+ * higher gen, representing higher hotness and reclaim bias.
+ *
+ * Tiering uses PG_workingset and PG_referenced as the lower two bits, and
+ * LRU_REFS_MASK as the higher bits.
+ *
+ * A folio's referenced count never goes backwards except upon gen increase
+ * in (7) or a promotion. Passive protection by PID will reset a folio with a
+ * higher referenced count to LRU_REFS_WORKINGSET. Refault of a reclaimed folio
+ * may restore its referenced count via lru_gen_refault().
+ *
+ * In the eviction path, comparisons of refaulted/(evicted+protected) from
+ * the first tier and the rest infer whether folios accessed multiple times
+ * through file descriptors are statistically hot and thus worth protecting.
* * MAX_NR_TIERS is set to 4 so that the multi-gen LRU can support twice the * number of categories of the active/inactive LRU when keeping track of - * accesses through file descriptors. This uses MAX_NR_TIERS-2 spare bits = in + * accesses through file descriptors. This uses MAX_NR_TIERS-3 spare bits = in * folio->flags, masked by LRU_REFS_MASK. */ #define MAX_NR_TIERS 4U +#define LRU_REFS_REFERENCED 0x1 +#define LRU_REFS_WORKINGSET 0x2 +#define LRU_REFS_PROTECTED 0x3 =20 #ifndef __GENERATING_BOUNDS_H =20 #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_GEN_MAX (BIT(LRU_GEN_WIDTH - 1) - 1) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) -#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH) - 1) - -/* - * For folios accessed multiple times through file descriptors, - * lru_gen_inc_refs() sets additional bits of LRU_REFS_WIDTH in folio->fla= gs - * after PG_referenced, then PG_workingset after LRU_REFS_WIDTH. After all= its - * bits are set, i.e., LRU_REFS_FLAGS|BIT(PG_workingset), a folio is lazily - * promoted into the second oldest generation in the eviction path. And wh= en - * folio_inc_gen() does that, it clears LRU_REFS_FLAGS so that - * lru_gen_inc_refs() can start over. Note that for this case, LRU_REFS_MA= SK is - * only valid when PG_referenced is set. - * - * For folios accessed multiple times through page tables, folio_update_ge= n() - * from a page table walk or lru_gen_set_refs() from a rmap walk sets - * PG_referenced after the accessed bit is cleared for the first time. - * Thereafter, those two paths set PG_workingset and promote folios to the - * youngest generation. Like folio_inc_gen(), folio_update_gen() also clea= rs - * PG_referenced. Note that for this case, LRU_REFS_MASK is not used. - * - * For both cases above, after PG_workingset is set on a folio, it remains= until - * this folio is either reclaimed, or "deactivated" by lru_gen_clear_refs(= ). 
It - * can be set again if lru_gen_test_recent() returns true upon a refault. - */ -#define LRU_REFS_FLAGS (LRU_REFS_MASK | BIT(PG_referenced)) +#define LRU_REFS_FLAGS (LRU_REFS_MASK | BIT(PG_referenced) | BIT(PG_worki= ngset)) +#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH + 2) - 1) =20 struct lruvec; struct page_vma_mapped_walk; diff --git a/kernel/bounds.c b/kernel/bounds.c index 02b619eb6106..06a034713b5d 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -25,7 +25,7 @@ int main(void) DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t)); #ifdef CONFIG_LRU_GEN DEFINE(LRU_GEN_WIDTH, order_base_2(MAX_NR_GENS + 1)); - DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 2); + DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 3); #else DEFINE(LRU_GEN_WIDTH, 0); DEFINE(__LRU_REFS_WIDTH, 0); diff --git a/mm/swap.c b/mm/swap.c index 6204496d48f5..5fc8a9ffbedb 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -311,9 +311,9 @@ static void lru_activate(struct lruvec *lruvec, struct = folio *folio) if (folio_test_active(folio) || folio_test_unevictable(folio)) return; =20 - lruvec_del_folio(lruvec, folio); folio_set_active(folio); + folio_set_lru_refs(folio, LRU_REFS_WORKINGSET); lruvec_add_folio(lruvec, folio); trace_mm_lru_activate(folio); =20 @@ -390,30 +390,86 @@ static void __lru_cache_activate_folio(struct folio *= folio) =20 #ifdef CONFIG_LRU_GEN =20 -static void lru_gen_inc_refs(struct folio *folio) +static void folio_inc_lru_refs(struct folio *folio) { - unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); - int refs; + int type, refs, gen, new_gen, max_gen, min_gen; + unsigned long new_flags, old_flags, max_seq; + struct lru_gen_folio *lrugen =3D NULL; + struct lruvec *lruvec =3D NULL; + bool isolated =3D false; =20 if (folio_test_unevictable(folio)) return; =20 - if (!folio_lru_refs(folio)) { - folio_set_lru_refs(folio, 1); - return; - } - - /* see the comment on LRU_REFS_FLAGS */ + old_flags =3D READ_ONCE(*folio_flags(folio, 0)); do { new_flags =3D old_flags; - refs =3D 
lru_refs_from_flags(old_flags); - if (refs =3D=3D LRU_REFS_MAX) { - if (!folio_test_workingset(folio)) - folio_set_workingset(folio); - return; + gen =3D lru_gen_from_flags(old_flags); + refs =3D lru_refs_from_flags(old_flags) + 1; + new_gen =3D gen; + if (gen < 0) + goto out; + + /* + * To promote frequently used folios, prevent isolation + * first, it's a lazy promotion so no LRU lock needed. + */ + if (!isolated) { + if (!folio_test_clear_lru(folio)) + goto out; + isolated =3D true; + old_flags &=3D ~BIT(PG_lru); + new_flags =3D old_flags; + rcu_read_lock(); + lruvec =3D folio_lruvec(folio); + lrugen =3D &lruvec->lrugen; } - lru_refs_set_flags(&new_flags, refs + 1); + + max_seq =3D READ_ONCE(lrugen->max_seq); + max_gen =3D lru_gen_from_seq(max_seq); + if (gen =3D=3D max_gen) + goto out; + + /* + * Always promote if we hit LRU_REFS_MAX, else, only promote + * from oldest gen. + */ + if (refs <=3D LRU_REFS_MAX) { + type =3D folio_is_file_lru(folio); + min_gen =3D lru_gen_from_seq(READ_ONCE(lrugen->min_seq[type])); + if (gen !=3D min_gen) + goto out; + } else { + refs =3D LRU_REFS_PROTECTED; + } + + new_gen =3D (gen + 1UL) % MAX_NR_GENS; + lru_gen_set_flags(&new_flags, new_gen); +out: + lru_refs_set_flags(&new_flags, min(refs, LRU_REFS_MAX)); + if (isolated) + new_flags |=3D BIT(PG_lru); } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); + + if (isolated) { + bool reactive =3D false; + + if (new_gen !=3D gen) { + /* + * It's possible that the folio is concurrently promoted to + * latest gen, so the promotion above causes gen inversion. + * The window is tiny but in such case, just activate the folio. 
+ */ + if (max_seq !=3D READ_ONCE(lrugen->max_seq)) + reactive =3D true; + lru_gen_update_size(lruvec, folio, gen, new_gen); + } + + rcu_read_unlock(); + + if (reactive) + folio_activate(folio); + } } =20 static bool lru_gen_clear_refs(struct folio *folio) @@ -426,7 +482,6 @@ static bool lru_gen_clear_refs(struct folio *folio) return true; =20 folio_set_lru_refs(folio, 0); - folio_clear_workingset(folio); =20 rcu_read_lock(); seq =3D READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]); @@ -437,7 +492,7 @@ static bool lru_gen_clear_refs(struct folio *folio) =20 #else /* !CONFIG_LRU_GEN */ =20 -static void lru_gen_inc_refs(struct folio *folio) +static void folio_inc_lru_refs(struct folio *folio) { } =20 @@ -466,7 +521,7 @@ void folio_mark_accessed(struct folio *folio) if (folio_test_dropbehind(folio)) return; if (lru_gen_enabled()) { - lru_gen_inc_refs(folio); + folio_inc_lru_refs(folio); return; } =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index a5b4750a5028..026b56828fdb 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -840,21 +840,30 @@ enum folio_references { * with PG_active set. In contrast, the aging (page table walk) path uses * folio_update_gen(). */ -static bool lru_gen_set_refs(struct folio *folio) +static bool folio_promote_lru_refs(struct folio *folio) { - /* see the comment on LRU_REFS_FLAGS */ - if (!folio_lru_refs(folio) && !folio_test_workingset(folio)) { - folio_set_lru_refs(folio, 1); - return false; - } + unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); + int refs; =20 - folio_set_lru_refs(folio, 0); - folio_set_workingset(folio); + do { + new_flags =3D old_flags; + refs =3D lru_refs_from_flags(old_flags); + /* + * Bump refs by one up to LRU_REFS_MAX. Once we are at + * LRU_REFS_MAX, leave the flags alone: the caller treats a + * return of true (refs >=3D LRU_REFS_WORKINGSET) as a cue to + * activate the folio, which resets refs to + * LRU_REFS_WORKINGSET in lru_activate(). 
+ */ + if (refs =3D=3D LRU_REFS_MAX) + break; + lru_refs_set_flags(&new_flags, ++refs); + } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 - return true; + return refs >=3D LRU_REFS_WORKINGSET; } #else -static bool lru_gen_set_refs(struct folio *folio) +static bool folio_promote_lru_refs(struct folio *folio) { return false; } @@ -889,7 +898,7 @@ static enum folio_references folio_check_references(str= uct folio *folio, if (!referenced_ptes) return FOLIOREF_RECLAIM; =20 - return lru_gen_set_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP; + return folio_promote_lru_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP; } =20 referenced_folio =3D folio_test_clear_referenced(folio); @@ -3196,39 +3205,39 @@ static bool positive_ctrl_err(struct ctrl_pos *sp, = struct ctrl_pos *pv) static int folio_update_gen(struct folio *folio, int new_gen) { unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); - int old_gen; + int refs, gen, ret; =20 /* see the comment on LRU_REFS_FLAGS */ - if (!lru_refs_from_flags(old_flags) && !folio_test_workingset(folio)) { - folio_set_lru_refs(folio, 1); - return -1; - } - do { - old_gen =3D lru_gen_from_flags(old_flags); + gen =3D lru_gen_from_flags(old_flags); + refs =3D lru_refs_from_flags(old_flags); new_flags =3D old_flags; =20 - /* lru_gen_del_folio() has isolated this page? 
*/ - if (old_gen < 0) - break; - - lru_gen_set_flags(&new_flags, new_gen); - lru_refs_set_flags(&new_flags, 0); - new_flags |=3D BIT(PG_workingset); + if (gen >=3D 0 && gen !=3D new_gen && refs) { + ret =3D gen; + lru_gen_set_flags(&new_flags, new_gen); + lru_refs_set_flags(&new_flags, LRU_REFS_WORKINGSET); + } else { + ret =3D -1; + lru_refs_set_flags(&new_flags, min(++refs, LRU_REFS_MAX)); + } } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 - return old_gen; + return ret; } =20 /* protect pages accessed multiple times through file descriptors */ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio) { + int refs; int type =3D folio_is_file_lru(folio); struct lru_gen_folio *lrugen =3D &lruvec->lrugen; int old_gen, new_gen, min_gen =3D lru_gen_from_seq(lrugen->min_seq[type]); unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); =20 do { + new_flags =3D old_flags; + refs =3D lru_refs_from_flags(old_flags); old_gen =3D lru_gen_from_flags(old_flags); VM_WARN_ON_ONCE_FOLIO(old_gen < 0, folio); =20 @@ -3236,12 +3245,10 @@ static int folio_inc_gen(struct lruvec *lruvec, str= uct folio *folio) if (old_gen >=3D 0 && old_gen !=3D min_gen) return old_gen; =20 - new_flags =3D old_flags; new_gen =3D (old_gen + 1) % MAX_NR_GENS; - - new_flags =3D old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS); - new_flags |=3D (new_gen + 1UL) << LRU_GEN_PGOFF; - } while (!try_cmpxchg(&folio->flags.f, &old_flags, new_flags)); + lru_gen_set_flags(&new_flags, new_gen); + lru_refs_set_flags(&new_flags, min(refs, LRU_REFS_WORKINGSET)); + } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 lru_gen_update_size(lruvec, folio, old_gen, new_gen); =20 @@ -3451,7 +3458,7 @@ static void walk_update_folio(struct lru_gen_mm_walk = *walk, struct folio *folio, old_gen =3D folio_update_gen(folio, new_gen); if (old_gen >=3D 0 && old_gen !=3D new_gen) update_batch_size(walk, folio, old_gen, new_gen); - } else if 
(lru_gen_set_refs(folio)) { + } else if (folio_promote_lru_refs(folio)) { old_gen =3D folio_lru_gen(folio); if (old_gen >=3D 0 && old_gen !=3D new_gen) folio_activate(folio); @@ -3846,7 +3853,8 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty= pe, int swappiness) while (!list_empty(head)) { struct folio *folio =3D lru_to_folio(head); int refs =3D folio_lru_refs(folio); - bool workingset =3D folio_test_workingset(folio); + int delta =3D folio_nr_pages(folio); + int tier =3D lru_tier_from_refs(refs); =20 VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio); VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio); @@ -3856,14 +3864,8 @@ static bool inc_min_seq(struct lruvec *lruvec, int t= ype, int swappiness) new_gen =3D folio_inc_gen(lruvec, folio); list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]); =20 - /* don't count the workingset being lazily promoted */ - if (refs + workingset !=3D BIT(LRU_REFS_WIDTH) + 1) { - int tier =3D lru_tier_from_refs(refs, workingset); - int delta =3D folio_nr_pages(folio); - - WRITE_ONCE(lrugen->protected[hist][type][tier], - lrugen->protected[hist][type][tier] + delta); - } + WRITE_ONCE(lrugen->protected[hist][type][tier], + lrugen->protected[hist][type][tier] + delta); =20 if (!--remaining) return false; @@ -4580,8 +4582,7 @@ static bool sort_folio(struct lruvec *lruvec, struct = folio *folio, struct scan_c int zone =3D folio_zonenum(folio); int delta =3D folio_nr_pages(folio); int refs =3D folio_lru_refs(folio); - bool workingset =3D folio_test_workingset(folio); - int tier =3D lru_tier_from_refs(refs, workingset); + int tier =3D lru_tier_from_refs(refs); struct lru_gen_folio *lrugen =3D &lruvec->lrugen; =20 VM_WARN_ON_ONCE_FOLIO(gen >=3D MAX_NR_GENS, folio); @@ -4603,17 +4604,15 @@ static bool sort_folio(struct lruvec *lruvec, struc= t folio *folio, struct scan_c } =20 /* protected */ - if (tier > tier_idx || refs + workingset =3D=3D BIT(LRU_REFS_WIDTH) + 1) { + if (tier > tier_idx) { + int hist =3D 
lru_hist_from_seq(lrugen->min_seq[type]); + gen =3D folio_inc_gen(lruvec, folio); - list_move(&folio->lru, &lrugen->folios[gen][type][zone]); + list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]); =20 - /* don't count the workingset being lazily promoted */ - if (refs + workingset !=3D BIT(LRU_REFS_WIDTH) + 1) { - int hist =3D lru_hist_from_seq(lrugen->min_seq[type]); + WRITE_ONCE(lrugen->protected[hist][type][tier], + lrugen->protected[hist][type][tier] + delta); =20 - WRITE_ONCE(lrugen->protected[hist][type][tier], - lrugen->protected[hist][type][tier] + delta); - } return true; } =20 @@ -4641,10 +4640,6 @@ static bool isolate_folio(struct lruvec *lruvec, str= uct folio *folio, struct sca return false; } =20 - /* see the comment on LRU_REFS_FLAGS */ - if (!folio_test_referenced(folio)) - folio_set_lru_refs(folio, 0); - success =3D lru_gen_del_folio(lruvec, folio, true); VM_WARN_ON_ONCE_FOLIO(!success, folio); =20 @@ -4732,13 +4727,13 @@ static int get_tier_idx(struct lruvec *lruvec, int = type) struct ctrl_pos sp, pv =3D {}; =20 /* - * To leave a margin for fluctuations, use a larger gain factor (2:3). + * To leave a margin for fluctuations, use a larger gain factor (1:2). * This value is chosen because any other tier would have at least twice * as many refaults as the first tier. 
*/ - read_ctrl_pos(lruvec, type, 0, 2, &sp); + read_ctrl_pos(lruvec, type, 0, 1, &sp); for (tier =3D 1; tier < MAX_NR_TIERS; tier++) { - read_ctrl_pos(lruvec, type, tier, 3, &pv); + read_ctrl_pos(lruvec, type, tier, 2, &pv); if (!positive_ctrl_err(&sp, &pv)) break; } @@ -4858,10 +4853,8 @@ static int evict_folios(unsigned long nr_to_scan, st= ruct lruvec *lruvec, } =20 /* don't add rejected folios to the oldest generation */ - if (lru_gen_folio_seq(lruvec, folio, false) =3D=3D min_seq[type]) { - folio_set_lru_refs(folio, 0); + if (lru_gen_folio_seq(lruvec, folio, false) =3D=3D min_seq[type]) folio_set_active(folio); - } } =20 move_folios_to_lru(&list); diff --git a/mm/workingset.c b/mm/workingset.c index 07e6836d0502..bdb8df6009af 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -189,6 +189,13 @@ #define EVICTION_MASK (~0UL >> EVICTION_SHIFT) #define EVICTION_MASK_ANON (~0UL >> EVICTION_SHIFT_ANON) =20 +/* + * LRU refs uses LRU_REFS_WIDTH + 2 bits, the 2 bits are PG_workingset and + * PG_referenced. But here we record PG_workingset separately (to reuse + * pack_shadow). + */ +#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 2) - 1) + /* * Eviction timestamps need to be able to cover the full range of * actionable refaults. 
However, bits are tight in the xarray @@ -242,13 +249,12 @@ static void *lru_gen_eviction(struct folio *folio) int type =3D folio_is_file_lru(folio); int delta =3D folio_nr_pages(folio); int refs =3D folio_lru_refs(folio); - bool workingset =3D folio_test_workingset(folio); - int tier =3D lru_tier_from_refs(refs, workingset); + int tier =3D lru_tier_from_refs(refs); struct mem_cgroup *memcg; struct pglist_data *pgdat =3D folio_pgdat(folio); unsigned short memcg_id; =20 - BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH > + BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_BITS > BITS_PER_LONG - max(EVICTION_SHIFT, EVICTION_SHIFT_ANON)); =20 rcu_read_lock(); @@ -256,14 +262,14 @@ static void *lru_gen_eviction(struct folio *folio) lruvec =3D mem_cgroup_lruvec(memcg, pgdat); lrugen =3D &lruvec->lrugen; min_seq =3D READ_ONCE(lrugen->min_seq[type]); - token =3D (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0); + token =3D (min_seq << LRU_REFS_BITS) | refs >> 1; =20 hist =3D lru_hist_from_seq(min_seq); atomic_long_add(delta, &lrugen->evicted[hist][type][tier]); memcg_id =3D mem_cgroup_private_id(memcg); rcu_read_unlock(); =20 - return pack_shadow(memcg_id, pgdat, token, workingset, type); + return pack_shadow(memcg_id, pgdat, token, refs & 1, type); } =20 /* @@ -284,9 +290,9 @@ static bool lru_gen_test_recent(void *shadow, struct lr= uvec **lruvec, *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); =20 max_seq =3D READ_ONCE((*lruvec)->lrugen.max_seq); - max_seq &=3D (file ? EVICTION_MASK : EVICTION_MASK_ANON) >> LRU_REFS_WIDT= H; + max_seq &=3D (file ? 
EVICTION_MASK : EVICTION_MASK_ANON) >> LRU_REFS_BITS; =20 - return abs_diff(max_seq, *token >> LRU_REFS_WIDTH) < MAX_NR_GENS; + return abs_diff(max_seq, *token >> LRU_REFS_BITS) < MAX_NR_GENS; } =20 static void lru_gen_refault(struct folio *folio, void *shadow) @@ -314,8 +320,8 @@ static void lru_gen_refault(struct folio *folio, void *= shadow) lrugen =3D &lruvec->lrugen; =20 hist =3D lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type])); - refs =3D (token & (BIT(LRU_REFS_WIDTH) - 1)) + 1; - tier =3D lru_tier_from_refs(refs, workingset); + refs =3D ((token & (BIT(LRU_REFS_BITS) - 1)) << 1) + workingset; + tier =3D lru_tier_from_refs(refs); =20 atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]); =20 @@ -323,11 +329,10 @@ static void lru_gen_refault(struct folio *folio, void= *shadow) if (lru_gen_in_fault()) mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta); =20 - if (workingset) { - folio_set_workingset(folio); + if (refs) { + folio_set_lru_refs(folio, refs); mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta); - } else - set_mask_bits(&folio->flags.f, LRU_REFS_MASK, (refs - 1UL) << LRU_REFS_P= GOFF); + } unlock: rcu_read_unlock(); } --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0F83425CDF for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=d1L9oDVWyaZ0cGtsq0e68VqLqyAsusPInUJ+x9JT7L8pMEfoKvEVGx14NSucDpQAjbIdJbZe90GspAFV5J2oJAhw/LpnhP/YZJyKldF0qIFuBJIPbNlBRMEgnLFjgvHM7ZNmMNDL8pJJWKTVQvygNqrf2t8O2hZbma+6Q7n3MQQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; 
c=relaxed/simple; bh=uw7HtZ5ow/Jkd3cZzVS81z4VEBFR+IcoXXPZBjeqpgM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=fq+xXtkYwUXsTVMDlDW8BXHFYHm6lYCHj62ehpx5z7rBkvAN/JLmTCCURnBXzxDBabPzKp1EvWm6muOwjQVbRGqCIVTJbNTII7Ar6yULVUz29juzSxJUYxGWQIBdtUUJxeyoJX3IMNtm9AcrltxM+d4PxnbaR+l0H9arjhDDavY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vGgGEF+e; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vGgGEF+e" Received: by smtp.kernel.org (Postfix) with ESMTPS id 83997C32781; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=uw7HtZ5ow/Jkd3cZzVS81z4VEBFR+IcoXXPZBjeqpgM=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=vGgGEF+etW/Tod5Sq+y12DVbJ5FnObSl2pAt60sfSXE+yYpL/C/uH3I9WIkQfO15s 8QOxRXttW1ztRfZIgZL5WWLWo1OmoQW/2xYZ5ca9cnmaH2GQXAp1qjmfzV8FgJqHCJ loFSz7CYDgMraLMFgwcdWGqgOoS+fAYDu4z9Vq9VGqAvrlxqYKl5J8TQKHXiAGOd0S zcBhrgZhfg/2ozyidnWa1I3SXt2+78GU+ytgUSMqqrNwC67kE+AONHuM4H9u8kmwZl B0f5b6S1hQ/CACt+8mtXIoPzIY0YQZxCFN+qfyaiu7tH0b0g9dYMwNnmy8ic6+T21K V267vjg5tSCSA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76EBECD3425; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:53 +0800 Subject: [PATCH RFC 07/32] mm/mglru: don't reset folios LRU refs count on protection by default Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-7-913619b014d9@tencent.com> References: 
 <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin ,
 Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song ,
 Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka ,
 Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park ,
 Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang ,
 Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins ,
 SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang ,
 Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org,
 Kairui Song
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=2486;
 i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id;
 bh=74TJpGUGa2omgt1UHlZXT4tEjvZggbuhmpXctXzyYeI=;
 b=TdrXt7tP211NvOJLgnbSiBqOADLMZLnktd8QIY73I2m6phygDPVNzZRmaGwGoUSDIgF/Q1Xyl
 raKvqc1OKAtDdWdoH5Lxp8YH0KrFwc8Th8Gyv44avqwtpbxBGDzZMz2
X-Developer-Key: i=kasong@tencent.com; a=ed25519;
 pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI=
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent
 with auth_id=562
X-Original-From: Kairui Song
Reply-To: kasong@tencent.com

From: Kairui Song

Only reset a folio's reference count when the folio is being protected by
the PID controller, to avoid over-protection. There is no need to reset
it when the folio is moved to a younger generation for any other reason.

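For illustration only (not part of the patch), the rule above can be
sketched in userspace C. This is a minimal model under stated assumptions:
the helper names refs_after_gen_inc() and refs_after_gen_inc_selfcheck()
are hypothetical, and the constants are copied from the LRU_REFS_* macros
added earlier in this series. The real folio_inc_gen() updates
folio->flags with try_cmpxchg() and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the LRU_REFS_* macros added earlier in this series. */
#define LRU_REFS_REFERENCED	0x1
#define LRU_REFS_WORKINGSET	0x2

/*
 * Hypothetical helper: the reference count a folio keeps after its
 * generation is incremented. Only the PID protection path is assumed
 * to pass reset == true.
 */
unsigned int refs_after_gen_inc(unsigned int refs, bool reset)
{
	if (!reset)
		return refs;	/* any other move: keep the count as-is */

	/* PID protection: clamp to min(refs, LRU_REFS_WORKINGSET) */
	return refs < LRU_REFS_WORKINGSET ? refs : LRU_REFS_WORKINGSET;
}

/* Self-check covering both paths of the model above. */
void refs_after_gen_inc_selfcheck(void)
{
	/* protected folio: a high count is clamped to LRU_REFS_WORKINGSET */
	assert(refs_after_gen_inc(7, true) == LRU_REFS_WORKINGSET);
	/* protected folio: a count below the clamp is left alone */
	assert(refs_after_gen_inc(LRU_REFS_REFERENCED, true) ==
	       LRU_REFS_REFERENCED);
	/* non-protection move: the count is preserved */
	assert(refs_after_gen_inc(7, false) == 7);
}
```

In this model a folio with a count of 7 keeps 7 when it is moved for being
in an ineligible zone, but drops to LRU_REFS_WORKINGSET when the move is
PID protection, so a protected folio has to be accessed again to gain
further protection.
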
Signed-off-by: Kairui Song --- mm/vmscan.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 026b56828fdb..c6857a933ebf 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3227,7 +3227,7 @@ static int folio_update_gen(struct folio *folio, int = new_gen) } =20 /* protect pages accessed multiple times through file descriptors */ -static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio) +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool = reset) { int refs; int type =3D folio_is_file_lru(folio); @@ -3247,7 +3247,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struc= t folio *folio) =20 new_gen =3D (old_gen + 1) % MAX_NR_GENS; lru_gen_set_flags(&new_flags, new_gen); - lru_refs_set_flags(&new_flags, min(refs, LRU_REFS_WORKINGSET)); + if (reset) + lru_refs_set_flags(&new_flags, min(refs, LRU_REFS_WORKINGSET)); } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); =20 lru_gen_update_size(lruvec, folio, old_gen, new_gen); @@ -3861,7 +3862,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty= pe, int swappiness) VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) !=3D type, folio); VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) !=3D zone, folio); =20 - new_gen =3D folio_inc_gen(lruvec, folio); + new_gen =3D folio_inc_gen(lruvec, folio, false); list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]); =20 WRITE_ONCE(lrugen->protected[hist][type][tier], @@ -4607,7 +4608,7 @@ static bool sort_folio(struct lruvec *lruvec, struct = folio *folio, struct scan_c if (tier > tier_idx) { int hist =3D lru_hist_from_seq(lrugen->min_seq[type]); =20 - gen =3D folio_inc_gen(lruvec, folio); + gen =3D folio_inc_gen(lruvec, folio, true); list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]); =20 WRITE_ONCE(lrugen->protected[hist][type][tier], @@ -4618,7 +4619,7 @@ static bool sort_folio(struct lruvec *lruvec, struct = folio *folio, struct scan_c =20 /* ineligible */ if 
(zone > sc->reclaim_idx) { - gen =3D folio_inc_gen(lruvec, folio); + gen =3D folio_inc_gen(lruvec, folio, false); list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]); return true; } --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA125425CE0 for ; Fri, 1 May 2026 21:04:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; cv=none; b=TDtW+5KFjuhi8ioDINxZT3Mhbi8mRaNBI9OhLBqQdWzrFwA6PzthQqaz1vTRZ/uf5Qkgy8IJ3MmC4KEE1rWOdc1gy2PNzltTKvjIm+m6IGToQSUrq0INxTMjVJt+NZzFD3k+gRe3ff+ihwk4yD4vSkkOISw3u9t1DFFJYCG2f9I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669452; c=relaxed/simple; bh=ApEOFvC2VgjTNekwT8E17q1ZVKs84dstNvvmMSucXp8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=caZFFzvFvtGh8Oumgng3RosJLKlLtTVCFPr1jKDWuZIXlgV3N0rlTw7xo4/uQTqWU01u+0SvTImZcF2gFn2FriJ/af+L0dZByFthaQT4VFevnLREVnigQnuYA98e6iDTjeMudameQoA8Ek/BtwERRnkQWtHMd/UYceG2T524k54= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=HqXOV6T/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="HqXOV6T/" Received: by smtp.kernel.org (Postfix) with ESMTPS id 92C1BC2BCF6; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=ApEOFvC2VgjTNekwT8E17q1ZVKs84dstNvvmMSucXp8=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; 
b=HqXOV6T/5NYzzetWHipxDTUBDAGs7OV05csm35yH1ube6fFSiCw6XPu/kMVORVENA Qr2mY25i8OOofF5GI470hVeq8iV04ljNWZRJ70x132sfVtYxz9o/k4rjIARU5/rNdP JkUxo1fpWoyx+fDqv2TtCgBo4ljbuCjILp4PP2b4rKKZQfZX1jTuZiZk0nGIg+Ieu4 dYOrGnu919foXSs+ylaqxH6mo3zHLSXwPGqArCJ8qZE0VNMewBxVyrXNEyPvj+WVWX RJo9obiL3QmGdn+jwY4UTtnoQnAS8GDG1/SLQsVrlcQQI2F1fiBXBHGdBUycnjYaPH u14ncMJdPZ3gQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88E06CD3427; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:54 +0800 Subject: [PATCH RFC 08/32] mm: make folio lru referenced times count a generic API Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-8-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=9380; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=kFsnhem71+vNlAD+uETGIDzz3S2DgnshxmHavEobZls=; b=ZcQPWJu7KfLDxZjQPqZ8h8aF5A2CQesX2x+TUpUsKZ3cYMSL9TEgR2qq0n4FukOnjz+ugfmQ+ 
 t8XROTxV3wDBrzDHokgy4gl5yiQOj36pBlwUNU8gEDp1DYXLPS6eEdb
X-Developer-Key: i=kasong@tencent.com; a=ed25519;
 pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI=
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent
 with auth_id=562
X-Original-From: Kairui Song
Reply-To: kasong@tencent.com

From: Kairui Song

To prepare for unifying the API that checks a folio's referenced status,
expose the reference counting as a generic API. For MGLRU, this lets other
subsystems build on the reference count. For non-MGLRU, the encoding stays
bitwise compatible, so no major behavior change is expected.

Signed-off-by: Kairui Song
---
 include/linux/mm_inline.h | 229 +++++++++++++++++++++++++++++++++-------------
 mm/migrate.c              |   2 -
 2 files changed, 167 insertions(+), 64 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index a9ed9a79364e..a108695424fb 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -94,6 +94,173 @@ static __always_inline enum lru_list folio_lru_list(const struct folio *folio)
 	return lru;
 }
 
+/**
+ * lru_refs_from_flags - Return LRU referenced / access count from folio flags.
+ * @flags: folio flags
+ */
+static inline int lru_refs_from_flags(unsigned long flags)
+{
+	int refs;
+
+	/*
+	 * Return the total number of accesses. Also see the comment on
+	 * LRU_REFS_FLAGS.
+	 */
+	refs = (flags & BIT(PG_referenced)) ? BIT(0) : 0;
+	refs += (flags & BIT(PG_workingset)) ? BIT(1) : 0;
+	refs += ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 2;
+	return refs;
+}
+
+/**
+ * lru_refs_set_flags - Set the LRU referenced / access count in folio flags.
+ * @flags: pointer to the folio flags
+ * @refs: referenced / access count number, between 0 and LRU_REFS_MAX, inclusive.
+ */
+static inline void lru_refs_set_flags(unsigned long *flags, unsigned int refs)
+{
+	VM_WARN_ON_ONCE(refs > LRU_REFS_MAX);
+
+	*flags &= ~LRU_REFS_FLAGS;
+	if (refs & BIT(0))
+		*flags |= BIT(PG_referenced);
+	if (refs & BIT(1))
+		*flags |= BIT(PG_workingset);
+	*flags |= (((unsigned long)refs) >> 2) << LRU_REFS_PGOFF;
+}
+
+static inline int folio_lru_refs(const struct folio *folio)
+{
+	return lru_refs_from_flags(READ_ONCE(*const_folio_flags(folio, 0)));
+}
+
+static inline void folio_set_lru_refs(struct folio *folio, unsigned int refs)
+{
+	unsigned long new_flags, old_flags = READ_ONCE(*folio_flags(folio, 0));
+
+	do {
+		new_flags = old_flags;
+		lru_refs_set_flags(&new_flags, refs);
+	} while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags));
+}
+
+/**
+ * folio_is_referenced - Tell if a folio was accessed before.
+ * @folio: the folio.
+ *
+ * This helper currently only works as intended for MGLRU, as it checks
+ * all LRU_REFS_FLAGS. It might be fine for non-MGLRU to replace
+ * folio_test_referenced in some cases but the user should be careful.
+ *
+ * Returns: true if the folio's LRU referenced / accessed count > 0.
+ */
+static inline bool folio_is_referenced(const struct folio *folio)
+{
+	return folio_lru_refs(folio) >= LRU_REFS_REFERENCED;
+}
+
+/**
+ * folio_mark_referenced - Mark a folio as referenced.
+ * @folio: the folio.
+ *
+ * Ensures the folio's LRU referenced count is at least
+ * LRU_REFS_REFERENCED. Does nothing if the count is already at or above
+ * that. This helper currently only works as intended for MGLRU.
+ * Not a drop-in replacement, but should be fine for non-MGLRU to replace
+ * folio_set_referenced with this after audit.
+ */
+static inline void folio_mark_referenced(struct folio *folio)
+{
+	unsigned long new_flags, old_flags = READ_ONCE(*folio_flags(folio, 0));
+
+	do {
+		new_flags = old_flags;
+		if (lru_refs_from_flags(new_flags) >= LRU_REFS_REFERENCED)
+			return;
+		lru_refs_set_flags(&new_flags, LRU_REFS_REFERENCED);
+	} while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags));
+}
+
+/**
+ * __folio_init_referenced - Force init a folio as referenced non-atomically.
+ * @folio: the folio.
+ *
+ * Force set a folio's LRU referenced count to LRU_REFS_REFERENCED
+ * non-atomically. Can be used to replace __folio_set_referenced safely.
+ */
+static inline void __folio_init_referenced(struct folio *folio)
+{
+	lru_refs_set_flags(folio_flags(folio, 0), LRU_REFS_REFERENCED);
+}
+
+/**
+ * folio_mark_referenced_by_bit - Mark a folio as referenced by bit.
+ * @folio: the folio.
+ *
+ * Non-MGLRU code may want to use the lowest LRU referenced count bit
+ * explicitly as a referenced mark.
+ */
+static inline void folio_mark_referenced_by_bit(struct folio *folio)
+{
+	set_mask_bits(folio_flags(folio, 0), BIT(PG_referenced), BIT(PG_referenced));
+}
+
+/**
+ * folio_clear_referenced_by_bit - Clear the referenced bit of a folio.
+ * @folio: the folio.
+ */
+static inline void folio_clear_referenced_by_bit(struct folio *folio)
+{
+	set_mask_bits(folio_flags(folio, 0), BIT(PG_referenced), 0);
+}
+
+/**
+ * folio_test_clear_referenced_bit - Test and clear the referenced bit.
+ * @folio: the folio.
+ */
+static inline bool folio_test_clear_referenced_bit(struct folio *folio)
+{
+	return test_and_clear_bit(PG_referenced, folio_flags(folio, 0));
+}
+
+/**
+ * folio_is_referenced_by_bit - Tell if a folio's referenced bit is set.
+ * @folio: the folio.
+ */
+static inline bool folio_is_referenced_by_bit(const struct folio *folio)
+{
+	return test_bit(PG_referenced, const_folio_flags(folio, 0));
+}
+
+/**
+ * folio_is_workingset - Tell if a folio is part of the workingset.
+ * @folio: the folio. + * + * Can be used to replace folio_test_workingset safely. For MGLRU the LRU + * referenced count tells if a folio is a workingset as intended. For non-= MGLRU, + * the check below only holds true if the PG_workingset bit is set. + */ +static inline bool folio_is_workingset(const struct folio *folio) +{ + return folio_lru_refs(folio) >=3D LRU_REFS_WORKINGSET; +} + +/** + * folio_mark_workingset_by_bit - Mark a folio as part of the workingset. + * @folio: the folio. + * + * Force set a folio's LRU referenced count to at least LRU_REFS_WORKINGSE= T. + */ +static inline void folio_mark_workingset_by_bit(struct folio *folio) +{ + set_mask_bits(folio_flags(folio, 0), BIT(PG_workingset), BIT(PG_workingse= t)); +} + +static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) +{ + folio_set_lru_refs(new, folio_lru_refs(old)); +} + #ifdef CONFIG_LRU_GEN =20 static inline bool lru_gen_switching(void) @@ -171,56 +338,6 @@ static inline void lru_gen_set_flags(unsigned long *fl= ags, int gen) *flags |=3D (gen + 1UL) << LRU_GEN_PGOFF; } =20 -/** - * lru_refs_from_flags - Return LRU referenced / access count from folio f= lags. - * @flags: folio flags - */ -static inline int lru_refs_from_flags(unsigned long flags) -{ - int refs; - - /* - * Return the total number of accesses. Also see the comment on - * LRU_REFS_FLAGS. - */ - refs =3D (flags & BIT(PG_referenced)) ? BIT(0) : 0; - refs +=3D (flags & BIT(PG_workingset)) ? BIT(1) : 0; - refs +=3D ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 2; - return refs; -} - -/** - * lru_refs_set_flags - Set the LRU referenced / access count to specified= folio flags. - * @flags: pointer to the folio flags - * @refs: referenced / access count number, between 0 and LRU_REFS_MAX, in= clusive. 
- */ -static inline void lru_refs_set_flags(unsigned long *flags, unsigned int r= efs) -{ - VM_WARN_ON_ONCE(refs > LRU_REFS_MAX); - - *flags &=3D ~LRU_REFS_FLAGS; - if (refs & BIT(0)) - *flags |=3D BIT(PG_referenced); - if (refs & BIT(1)) - *flags |=3D BIT(PG_workingset); - *flags |=3D (((unsigned long)refs) >> 2) << LRU_REFS_PGOFF; -} - -static inline int folio_lru_refs(const struct folio *folio) -{ - return lru_refs_from_flags(READ_ONCE(*const_folio_flags(folio, 0))); -} - -static inline void folio_set_lru_refs(struct folio *folio, unsigned int re= fs) -{ - unsigned long new_flags, old_flags =3D READ_ONCE(*folio_flags(folio, 0)); - - do { - new_flags =3D old_flags; - lru_refs_set_flags(&new_flags, refs); - } while (!try_cmpxchg(folio_flags(folio, 0), &old_flags, new_flags)); -} - static inline int folio_lru_gen(const struct folio *folio) { return lru_gen_from_flags(READ_ONCE(*const_folio_flags(folio, 0))); @@ -366,11 +483,6 @@ static inline bool lru_gen_del_folio(struct lruvec *lr= uvec, struct folio *folio, =20 return true; } - -static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) -{ - folio_set_lru_refs(new, folio_lru_refs(old)); -} #else /* !CONFIG_LRU_GEN */ =20 static inline bool lru_gen_enabled(void) @@ -398,13 +510,6 @@ static inline bool lru_gen_del_folio(struct lruvec *lr= uvec, struct folio *folio, return false; } =20 -static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) -{ - if (folio_test_referenced(old)) - folio_set_referenced(new); - if (folio_test_workingset(old)) - folio_set_workingset(new); -} #endif /* CONFIG_LRU_GEN */ =20 static __always_inline diff --git a/mm/migrate.c b/mm/migrate.c index 23248484a165..bb52bb8565f4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -770,8 +770,6 @@ void folio_migrate_flags(struct folio *newfolio, struct= folio *folio) folio_set_active(newfolio); } else if (folio_test_clear_unevictable(folio)) folio_set_unevictable(newfolio); - if 
(folio_test_workingset(folio))
-		folio_set_workingset(newfolio);
 	if (folio_test_checked(folio))
 		folio_set_checked(newfolio);
 	/*
-- 
2.54.0

From nobody Sun Jun 14 06:10:55 2026
From: Kairui Song via B4 Relay
Date: Sat, 02 May 2026 05:03:55 +0800
Subject: [PATCH RFC 09/32] mm: replace folio_set_workingset with folio_mark_workingset
Message-Id: <20260502-mglru-fg-v1-9-913619b014d9@tencent.com>
References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
Reply-To: kasong@tencent.com

From: Kairui Song

No functional change: folio_mark_workingset_by_bit() is bit-wise
identical to folio_set_workingset(). This only prepares for the removal
of PG_workingset.

Signed-off-by: Kairui Song
---
 mm/madvise.c    | 4 ++--
 mm/vmscan.c     | 2 +-
 mm/workingset.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 69708e953cf5..930939c55cd5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -428,7 +428,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		folio_clear_referenced(folio);
 		folio_test_clear_young(folio);
 		if (folio_test_active(folio))
-			folio_set_workingset(folio);
+			folio_mark_workingset_by_bit(folio);
 		if (pageout) {
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
@@ -543,7 +543,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		folio_clear_referenced(folio);
 		folio_test_clear_young(folio);
 		if (folio_test_active(folio))
-			folio_set_workingset(folio);
+			folio_mark_workingset_by_bit(folio);
 		if (pageout) {
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6857a933ebf..2fd62d02a83a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2141,7 +2141,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		folio_clear_active(folio);	/* we are de-activating */
-		folio_set_workingset(folio);
+		folio_mark_workingset_by_bit(folio);
 		list_add(&folio->lru, &l_inactive);
 	}
 
diff --git a/mm/workingset.c b/mm/workingset.c
index bdb8df6009af..bdee91f54e61 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -586,7 +586,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	/* Folio was active prior to eviction */
 	if (workingset) {
-		folio_set_workingset(folio);
+		folio_mark_workingset_by_bit(folio);
 		/*
 		 * XXX: Move to folio_add_lru() when it supports new vs
 		 * putback
-- 
2.54.0

From nobody Sun Jun 14 06:10:55 2026
From: Kairui Song via B4 Relay
Date: Sat, 02 May 2026 05:03:56 +0800
Subject: [PATCH RFC 10/32] mm: replace folio_test_workingset with folio_is_workingset
Message-Id: <20260502-mglru-fg-v1-10-913619b014d9@tencent.com>
References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
Reply-To: kasong@tencent.com

From: Kairui Song

With the new folio LRU refs tracking API, a folio is considered a
workingset folio if its referenced count is greater than 1. This
definition is compatible with the old behavior and reasonable for both
LRU implementations:

PG_referenced and PG_workingset are used as the two lowest bits of the
LRU refs counter, and MGLRU uses extra flag bits as the higher bits.
When MGLRU is disabled, the higher bits are always 0, so the check is
bit-wise equivalent to the old PG_workingset test.

The active/inactive LRU sets PG_workingset explicitly for folios moved
from the active list to the inactive list, which makes
folio_is_workingset() see a referenced count > 1. With PG_workingset
cleared, the count is always <= 1.

When MGLRU is enabled, a folio referenced twice is considered a
workingset folio, which is basically how the active/inactive LRU
promotes a file page to the active list. Note that for active/inactive,
a folio has to be deactivated and marked PG_workingset before eviction.
MGLRU has no demotion process, so this simplified check gives a stable
definition and keeps the PSI and readahead readings as accurate as
before.
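
The equivalence argument above can be checked with a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
the sk_ names, the bit positions, the 2-bit width of the upper field and
the threshold value 2 are all hypothetical stand-ins for the real
PG_referenced, PG_workingset, LRU_REFS_MASK and LRU_REFS_WORKINGSET
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: the real bit positions come from the kernel. */
#define SK_PG_REFERENCED	(1UL << 0)
#define SK_PG_WORKINGSET	(1UL << 1)
#define SK_REFS_PGOFF		2
#define SK_REFS_MASK		(3UL << SK_REFS_PGOFF)
#define SK_REFS_WORKINGSET	2u	/* assumed: "referenced count > 1" */

/* Counter value: bit 0 = referenced, bit 1 = workingset, rest = upper field. */
static inline unsigned int sk_refs_from_flags(unsigned long flags)
{
	unsigned int refs;

	refs = (flags & SK_PG_REFERENCED) ? 1u : 0u;
	refs += (flags & SK_PG_WORKINGSET) ? 2u : 0u;
	refs += (unsigned int)((flags & SK_REFS_MASK) >> SK_REFS_PGOFF) << 2;
	return refs;
}

/* New-style check: compare the whole counter against the threshold. */
static inline bool sk_is_workingset(unsigned long flags)
{
	return sk_refs_from_flags(flags) >= SK_REFS_WORKINGSET;
}

/* Old-style check: test the workingset bit only. */
static inline bool sk_test_workingset(unsigned long flags)
{
	return (flags & SK_PG_WORKINGSET) != 0;
}

/* With the upper field at 0 (no MGLRU), both checks must agree. */
static inline void sk_check_equivalence(void)
{
	unsigned long flags;

	for (flags = 0; flags < 4; flags++)
		assert(sk_is_workingset(flags) == sk_test_workingset(flags));
}
```

In this model the two checks only diverge once the upper field is
non-zero, which by the description above can only happen with MGLRU
enabled.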
Signed-off-by: Kairui Song --- fs/btrfs/compression.c | 3 ++- mm/filemap.c | 10 +++++----- mm/page_io.c | 3 ++- mm/readahead.c | 8 ++++---- mm/workingset.c | 2 +- 5 files changed, 14 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index b2393a48a8fe..ac7bfb0017c6 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "misc.h" #include "ctree.h" #include "fs.h" @@ -461,7 +462,7 @@ static noinline int add_ra_bio_pages(struct inode *inod= e, continue; } =20 - if (!*memstall && folio_test_workingset(folio)) { + if (!*memstall && folio_is_workingset(folio)) { psi_memstall_enter(pflags); *memstall =3D 1; } diff --git a/mm/filemap.c b/mm/filemap.c index 4e636647100c..50897ca1d74e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1254,7 +1254,7 @@ static inline int folio_wait_bit_common(struct folio = *folio, int bit_nr, bool in_thrashing; =20 if (bit_nr =3D=3D PG_locked && - !folio_test_uptodate(folio) && folio_test_workingset(folio)) { + !folio_test_uptodate(folio) && folio_is_workingset(folio)) { delayacct_thrashing_start(&in_thrashing); psi_memstall_enter(&pflags); thrashing =3D true; @@ -1409,7 +1409,7 @@ void softleaf_entry_wait_on_locked(softleaf_t entry, = spinlock_t *ptl) struct folio *folio =3D softleaf_to_folio(entry); =20 q =3D folio_waitqueue(folio); - if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) { + if (!folio_test_uptodate(folio) && folio_is_workingset(folio)) { delayacct_thrashing_start(&in_thrashing); psi_memstall_enter(&pflags); thrashing =3D true; @@ -2492,7 +2492,7 @@ static void filemap_get_read_batch(struct address_spa= ce *mapping, static int filemap_read_folio(struct file *file, filler_t filler, struct folio *folio) { - bool workingset =3D folio_test_workingset(folio); + bool workingset =3D folio_is_workingset(folio); unsigned long pflags; int error; =20 @@ -3787,7 +3787,7 @@ static vm_fault_t 
filemap_map_folio_range(struct vm_f= ault *vmf, * Don't decrease mmap_miss in this scenario to make sure * we can stop read-ahead. */ - if (!folio_test_workingset(folio)) + if (!folio_is_workingset(folio)) (*mmap_miss)++; =20 /* @@ -3845,7 +3845,7 @@ static vm_fault_t filemap_map_order0_folio(struct vm_= fault *vmf, goto out; =20 /* See comment of filemap_map_folio_range() */ - if (!folio_test_workingset(folio)) + if (!folio_is_workingset(folio)) (*mmap_miss)++; =20 /* diff --git a/mm/page_io.c b/mm/page_io.c index 70cea9e24d2f..a78a4e753650 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "swap.h" =20 static void __end_swap_bio_write(struct bio *bio) @@ -614,7 +615,7 @@ void swap_read_folio(struct folio *folio, struct swap_i= ocb **plug) { struct swap_info_struct *sis =3D __swap_entry_to_info(folio->swap); bool synchronous =3D sis->flags & SWP_SYNCHRONOUS_IO; - bool workingset =3D folio_test_workingset(folio); + bool workingset =3D folio_is_workingset(folio); unsigned long pflags; bool in_thrashing; =20 diff --git a/mm/readahead.c b/mm/readahead.c index 7b05082c89ea..f3b03d6e7828 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -291,7 +291,7 @@ void page_cache_ra_unbounded(struct readahead_control *= ractl, } if (i =3D=3D mark) folio_set_readahead(folio); - ractl->_workingset |=3D folio_test_workingset(folio); + ractl->_workingset |=3D folio_is_workingset(folio); ractl->_nr_pages +=3D min_nrpages; i +=3D min_nrpages; } @@ -460,7 +460,7 @@ static inline int ra_alloc_folio(struct readahead_contr= ol *ractl, pgoff_t index, } =20 ractl->_nr_pages +=3D 1UL << order; - ractl->_workingset |=3D folio_test_workingset(folio); + ractl->_workingset |=3D folio_is_workingset(folio); return 0; } =20 @@ -797,7 +797,7 @@ void readahead_expand(struct readahead_control *ractl, folio_put(folio); return; } - if (unlikely(folio_test_workingset(folio)) && + if (unlikely(folio_is_workingset(folio)) && !ractl->_workingset) { 
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
@@ -826,7 +826,7 @@ void readahead_expand(struct readahead_control *ractl,
 			folio_put(folio);
 			return;
 		}
-		if (unlikely(folio_test_workingset(folio)) &&
+		if (unlikely(folio_is_workingset(folio)) &&
 				!ractl->_workingset) {
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
diff --git a/mm/workingset.c b/mm/workingset.c
index bdee91f54e61..fa644948c80e 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -415,7 +415,7 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 	eviction >>= bucket_order[file];
 	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
 	return pack_shadow(memcgid, pgdat, eviction,
-				folio_test_workingset(folio), file);
+				folio_is_workingset(folio), file);
 }
 
 /**
-- 
2.54.0

From nobody Sun Jun 14 06:10:55 2026
From: Kairui Song via B4 Relay
Date: Sat, 02 May 2026 05:03:57 +0800
Subject: [PATCH RFC 11/32] mm/smap: report workingset folios as referenced
Message-Id: <20260502-mglru-fg-v1-11-913619b014d9@tencent.com>
References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com>
Reply-To: kasong@tencent.com

From: Kairui Song

For MGLRU, switch smaps to the folio refs count API so that smaps
reports every folio with a referenced count >= 1 as "Referenced".

smaps currently checks PG_referenced, which makes folios flicker between
referenced and not-referenced status: for both MGLRU and the
active/inactive LRU, PG_referenced may get cleared on the second access.
(Increasing the LRU referenced count for MGLRU and moving the folio to
the active list for active/inactive both clear that bit.)

After this change, the reading is more reliable and useful for MGLRU.
Signed-off-by: Kairui Song --- fs/proc/task_mmu.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 751b9ba160fb..617f457c75bb 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -913,6 +913,22 @@ static void smaps_page_accumulate(struct mem_size_stat= s *mss, } } =20 +static bool smap_check_folio_referenced(struct folio *folio) +{ + if (lru_gen_enabled()) + return folio_is_referenced(folio); + else + return folio_is_referenced_by_bit(folio); +} + +static void smap_clear_folio_referenced(struct folio *folio) +{ + if (lru_gen_enabled()) + folio_set_lru_refs(folio, 0); + else + folio_clear_referenced_by_bit(folio); +} + static void smaps_account(struct mem_size_stats *mss, struct page *page, bool compound, bool young, bool dirty, bool locked, bool present) @@ -939,7 +955,7 @@ static void smaps_account(struct mem_size_stats *mss, s= truct page *page, =20 mss->resident +=3D size; /* Accumulate the size in pages that have been accessed. */ - if (young || folio_test_young(folio) || folio_test_referenced(folio)) + if (young || folio_test_young(folio) || smap_check_folio_referenced(folio= )) mss->referenced +=3D size; =20 /* @@ -1701,7 +1717,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned = long addr, /* Clear accessed and referenced bits. */ pmdp_test_and_clear_young(vma, addr, pmd); folio_test_clear_young(folio); - folio_clear_referenced(folio); + smap_clear_folio_referenced(folio); out: spin_unlock(ptl); return 0; @@ -1730,7 +1746,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned = long addr, /* Clear accessed and referenced bits. 
*/ ptep_test_and_clear_young(vma, addr, pte); folio_test_clear_young(folio); - folio_clear_referenced(folio); + smap_clear_folio_referenced(folio); } pte_unmap_unlock(pte - 1, ptl); cond_resched(); --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C2F042668B for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=czBZidO6McMDbJcA6vBCwc6ZG6eAnc7Zz+V+K/TREn8QGPqqacHneLoqjUE6+n5mManyQJUWW8T7XPuEE3gqfecKy7/RLdez2VImdPSjYh+nvxpcDvw7Jh4UD51lMO5g4lr+fY8izcxIDzV1aV8KPKQQISki/TytLsaaUZw5gwA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=POr4zid/4RAX0e9kjwSEM1rU1FDoknetSEYkoF6ny4E=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=AauNInWsXshg0wjZyXTqio4l92Nf0O2cn8uK+Mp7d2zSWQO/f9h7AM3gdoiQaIEMEvN6a5yrTTigExo62sMgHMrzt6Y01tVEbW+0NjN6YPjVIBc0z9wdbUfC4wGHL7ZZQRRGTsrlwDbwtRr0NABYcHsyNl3ebXHYRXtaotRSpw8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bDzk3XUq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bDzk3XUq" Received: by smtp.kernel.org (Postfix) with ESMTPS id DDA41C2BCC9; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669452; bh=POr4zid/4RAX0e9kjwSEM1rU1FDoknetSEYkoF6ny4E=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; 
b=bDzk3XUqNo9kUbbyVs69WOx+nKE5NIn/Zj8cboVaatIVWblhdAr3pzpymdvN38QQ9 ygnDOvOS9LMDze4Bq1b0LX4WczKjSPi5PBi22ZA9MbeNNKFV0oloMUkaEjkL+ZwEM2 AmNVa86XU96dQL6b4qTUWVIr1T0YN9hEronUvR7002N/8a9HW9PfkhHz6zr6ok5TMR g8Oyu1EFHLhIQDlB7xnSJC2q85sKN/x9rJ/0GROQKD7LkG/2BUQm58bE3mH1YxHN3g mG35kjkNxLoqDXimXdhZO5U4rUAJziqfuqWLnjiPB5gQH18MbcRwtvprwAkIcaHB4G IooXSwEhPKOvg== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2807CD3427; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:58 +0800 Subject: [PATCH RFC 12/32] mm/huge_memory: mark file folio as accessed more accurately on split Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-12-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1863; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=k9aFXwHSgVQ00njEcryM7TIGI1VaTNApzaY86FSiuv4=; b=QqCHLzigIMmqtIvHrFqvpIctYLj4s2so+ICW0Z0obHQIhpFKNjP3tQFD/+qeGoRVwmacXTFqZ 
uxnxT2yFKm2DzyUAKs9L1I8OUW5nQDZogw7ArbrYJaph0LCJt25LX1B X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song The behavior of updating the folio's access info isn't consistent for huge mapping splitting or ordinary unmapping. Actually right now, there is no huge mapping splitting for such folios, we simply unmap it, so the page table's young flag has to be translated into folio's access info. Right now it only check and set folio's referenced flag, which isn't enough since folio flags update on access have its rules. For example, ordinary unmapping (zapping) calls folio_mark_accessed(), and it also checks if the VMA has recency to avoid false updates. So use the right helper here and be consistent. Signed-off-by: Kairui Song --- mm/huge_memory.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1f0d0b780943..87a2640e3396 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3023,8 +3023,8 @@ static void __split_huge_pud_locked(struct vm_area_st= ruct *vma, pud_t *pud, =20 if (!folio_test_dirty(folio) && pud_dirty(old_pud)) folio_mark_dirty(folio); - if (!folio_test_referenced(folio) && pud_young(old_pud)) - folio_set_referenced(folio); + if (pud_young(old_pud) && vma_has_recency(vma)) + folio_mark_accessed(folio); folio_remove_rmap_pud(folio, page, vma); folio_put(folio); add_mm_counter(vma->vm_mm, mm_counter_file(folio), @@ -3141,8 +3141,8 @@ static void __split_huge_pmd_locked(struct vm_area_st= ruct *vma, pmd_t *pmd, folio =3D page_folio(page); if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) folio_mark_dirty(folio); - if (!folio_test_referenced(folio) && pmd_young(old_pmd)) - folio_set_referenced(folio); + if (pmd_young(old_pmd) && vma_has_recency(vma)) + 
folio_mark_accessed(folio); folio_remove_rmap_pmd(folio, page, vma); folio_put(folio); } --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 188FF426692 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=fTi58V0Lobmlrsz5A44N//gaPtXi/pTrdwKMfK0LwQY0J4dcluWtqjArE+msLCg8SnarxXPQpMHL4IjB28K3jrrTE9//HKrlKxwQxZcaB+gryanisxj9icUSUm19AeEs0g6QAP/XRhQHb+NCjqy20exIjvAXXeWSbADSSl79c4U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=tEQvyj84bVP6Fstmv7rAnw0q1fJ254JkDNTXkCqnrAs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=PW4drzd3oCQZNGND4SsvXyoV+AOsxWsecFih046s5FuQ5G7eLhOG3OmMAgJHf4BWgkpdAgsFs1DABmy3ik7akCZgIiTxr+wUcda42l84JIMqSddSg5zlRBnNYARDIpu6n3KOfquo3RpwCEpIuShQ+Eyie+dhYDNml3dI9UA8Zwg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PFBseTCo; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PFBseTCo" Received: by smtp.kernel.org (Postfix) with ESMTPS id ED3BCC2BD00; Fri, 1 May 2026 21:04:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=tEQvyj84bVP6Fstmv7rAnw0q1fJ254JkDNTXkCqnrAs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=PFBseTCohhWhGfR0FM/Nmk7ev53x9Ud0IzpvJXTUQ3CDHhJDeDoRXsp88mW6xPEUm y/OiQgDmlbBRmGeAdS4C9yQShd+qM8HSsuBbMm219+FN834qRqZoQJL354OYVQ1Wqc 
Ua5Ov5SpZuGZX2jDpZDLUnjKpHDNbFA/rH3c+N60doj1nSRzSh98JfljhU17STbLlh mQEOfAb0RCI5AhIDpY7Rl90jC381G4qmzrzLVYPNSwUtGpcZIE45K95BRDMR5evUup 5mtYg/VrkuB+/cW4YSsjkbP/M5YX80FK9y1fKg1rpKQu94JJdudcAJphb9AJ4T6MPX b8a3nqGbymECw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id E445BCCFA13; Fri, 1 May 2026 21:04:12 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:03:59 +0800 Subject: [PATCH RFC 13/32] mm/khugepaged: consider workingset folios as referenced Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-13-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1950; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=UByiLYjFt2+kqAxm04OpzowbaWLuDjdJ8bJGsAYOXBE=; b=rN+ZBnZrzZD6vSDJMUHL0q+E5ZYHa/RaFdPJvCTAyQgAUBzFuz+g/IdZd5U6SrCWwaRFYnHSv Iys2yTORjLYCkFTarebaM66SXYa0dlv8YaJMk+cjtcUCldB69BmOlrO X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= 
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song The folio_test_referenced() check here is clearly trying to test whether the folio was ever referenced. It was first introduced by commit 8ee53820edfd ("thp: mmu_notifier_test_young") as a supplement to the young bit check. Folios have PG_referenced set on first access, but the second access clears it again. So checking only the referenced flag is not accurate enough. Switch to the new helper so that the second and subsequent accesses are covered on the MGLRU side. For non-MGLRU, this makes the check return positive for workingset folios too, which should be OK. Signed-off-by: Kairui Song --- mm/khugepaged.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7d48d4fbd5f3..af9217e19eac 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -815,7 +815,7 @@ static enum scan_result __collapse_huge_page_isolate(st= ruct vm_area_struct *vma, */ if (cc->is_khugepaged && (pte_young(pteval) || folio_test_young(folio) || - folio_test_referenced(folio) || + folio_is_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, addr))) referenced++; } @@ -1743,7 +1743,7 @@ static enum scan_result collapse_scan_pmd(struct mm_s= truct *mm, */ if (cc->is_khugepaged && (pte_young(pteval) || folio_test_young(folio) || - folio_test_referenced(folio) || + folio_is_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, addr))) referenced++; } @@ -2729,7 +2729,7 @@ static enum scan_result collapse_scan_file(struct mm_= struct *mm, /* * We probably should check if the folio is referenced * here, but nobody would transfer pte_young() to - * folio_test_referenced() for us. And rmap walk here + * folio_is_referenced() for us. And rmap walk here * is just too costly... 
*/ =20 --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2EEB242669B for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=HTzfJMMNAqFUHqanGf+4HAb/T2nUojf/HkKv5iB94ck6CWEuzR4Wak3rEecrlDsFSR7fzj6uR45PqKCagH7+dW+xAR3TK5JGonN9OFz3pPJVGSNVDxryGpyL4F3MMeBrn4a+eHuwRr/PbEp4r+4Vae2GJ4w81QWZipW/XfCQa7o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=CSC1oi4wyTxesQHZPPunJdMxPvfSzuzhFGjoMvX2PIQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=emgMlMLuQMZYj9/Gc+CXh8ef8dlfq3JDBOaeiWJ20kLke/3yT1uPz5CM1fqL+2C1ueHsSam9ag2I8mCsNp68mY/iACT/e5uNlELbb9cHM3DyxpZz3fqxqdx/7MsyXWtmjdBAt+8zifYebHNSvyRRNJTTbhP4wU7NNEOSnkUcSoA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DmFIAXtO; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DmFIAXtO" Received: by smtp.kernel.org (Postfix) with ESMTPS id 0EE78C2BCF5; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=CSC1oi4wyTxesQHZPPunJdMxPvfSzuzhFGjoMvX2PIQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=DmFIAXtOGlRJorh94KbBVFxch7PD6wkK/70yQ+Bj8F2VxZkvozpE7G64d/BvqhlfP Xc6Q2/WLLjgaPKGmBtbnVdZ65tQQGdkr1rm0Lliu054sZSIM5ElJ/T3rtEnz19IgXu CUF/9JHL/R5nWq+izgCXDCr+Y+yWA5yWASNPkhwQIqxyQwgFBUzwV6gXjv8uZAI1u5 
ESRKeugdqKRPmcyAkOz9lt+WFYOjS7CnUq2oqOkttlHOFCh9Pkehx/gwW8qB0Wpdcv 1Shruaf1GD2DGbjLykT8YkVOUyBgr4TWe3kUmEf4RIXeu/Bcfo1zIeVDBvSQiPjbbH 8z2sWMtLfPfdQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03237CD3423; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:00 +0800 Subject: [PATCH RFC 14/32] mm: convert rest folio LRU referenced usages to new helpers Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-14-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1270; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=s9IM5saCZOi1yMrQ9wd37CGnZKAxDY3IwusY5KAp2fM=; b=AjjRfwfd76/c82mEMrSgQlxsNUhdUausxOPl6q2Enso4MgBpO0AY3ccxZG3B8/BDqvylfIBb0 ERx+uZevY/aDpfqVoUhUizG7Af7iJlG4HWFUFRPpixkNa46KQ/KyKiL X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with 
auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song No functional change: everything is bitwise identical to before. This just prepares for dropping the page flag. Signed-off-by: Kairui Song --- mm/filemap.c | 2 +- mm/swap.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 50897ca1d74e..91a1d1c03475 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2015,7 +2015,7 @@ struct folio *__filemap_get_folio_mpol(struct address= _space *mapping, =20 /* Init accessed so avoid atomic mark_page_accessed later */ if (fgp_flags & FGP_ACCESSED) - __folio_set_referenced(folio); + __folio_init_referenced(folio); if (fgp_flags & FGP_DONTCACHE) __folio_set_dropbehind(folio); =20 diff --git a/mm/swap.c b/mm/swap.c index 5fc8a9ffbedb..6e0397ff881d 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -514,7 +514,7 @@ static bool lru_gen_clear_refs(struct folio *folio) * * active,unreferenced -> active,referenced * * When a newly allocated folio is not yet visible, so safe for non-atomic= ops, - * __folio_set_referenced() may be substituted for folio_mark_accessed(). + * __folio_init_referenced() may be substituted for folio_mark_accessed(). 
*/ void folio_mark_accessed(struct folio *folio) { --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EACB4266A6 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=tbkCuDpjs0AhoUt2ZdmKk+WdMQkMdpD9a2qZ00lSnt/ckmzgI85YXFHaTs+HtDjrS2ISbkSJY2f4H2Ggc5f3hRpLhPQDlr61U+8z9Km99railS7hAmvso0gxoaz+wCw41SCDfE2m4p38w4UcJcQbx3WJ5iQjUm1d2nyEROQTRHo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=6HpqHKl1SwEhSBNLQO59Km2e9sgznTR6swPQ6bwX5OY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=NJEPphbRee//hUUtfzjgf09G0yAowl/q5rJIkiEuTbxTTe1uUUblOPYhhLB9Fg7+1U8Dm0Y/Pq5MZMbFO8ckZ0eUC4SyEH7GtD9KiX81IGCI78/QMLQ69j2cPpPeiKKs2VO263yo+7r/oH3LgKIp3JPuWEjO7HllwZFJ4In/wxA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NnUeZIXW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NnUeZIXW" Received: by smtp.kernel.org (Postfix) with ESMTPS id 21368C2BCB4; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=6HpqHKl1SwEhSBNLQO59Km2e9sgznTR6swPQ6bwX5OY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=NnUeZIXWBuxNKOdej3BXDeB8GMf0T1irLfF9GFl9K0VefrCJl8OoVFjh2Oe+ciV1x stj7+T4bojxgrthBfP8ACDhsoLfSwrZB9CtIF2QRwtTjWQ3Z9aky6vOSZEAAARe51M 
p1pum8NximN/MgF5wzz641i0LVhoJ943/pwng9N/ZJbnKSOCSRrXmD3KYw9cWYhPha wDNHZnMuxfmY4yiqkNS66QKIJ8aZrONFyztD2Nei3s0E8s703Lw6etE5jJ16PFYort 9cPb3LgfsdmnuSuA4kJzSpJIZPnwMK6xYq77dJgmHNBg/YyBgyck80IkRhig1JIHxI tzqRLJPrqy4HQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18590CCFA13; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:01 +0800 Subject: [PATCH RFC 15/32] mm/gup: use new helpers for marking folios as referenced Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-15-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1672; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=zzj2htF5O3Boz+i6Si7xd6bJD7nkf/a9RkvvaueQ9so=; b=sZi3qcG1zBUsLAgNczKqAvDXvOrUzPJS593cEkUbvLcZSL+TA/nS/UnQHUw8/BbKhSltO+z1A MbjywAvxq4tDfpfGluN3y7BIjR9yZGpQIA1+BNH1U2QKP35eubRjqAL X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= 
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Marking the folio as referenced in GUP-fast was first introduced by commit 8ee53820edfd ("thp: mmu_notifier_test_young"). It was later refactored, but stayed basically the same, in commit e93480537fd7 ("mm/gup: Mark all pages PageReferenced in generic get_user_pages_fast()"). The implementation may have changed over time, but the goal is the same: provide hotness info for better memory management. All users that care about a folio's hotness info have been converted to the new helper, so convert this updater too. NOTE: it would be better to convert to folio_mark_referenced later. Signed-off-by: Kairui Song --- mm/gup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 21ea90baefb1..78ffa364ff18 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2898,7 +2898,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp,= unsigned long addr, gup_put_folio(folio, 1, flags); goto pte_unmap; } - folio_set_referenced(folio); + folio_mark_referenced_by_bit(folio); pages[*nr] =3D page; (*nr)++; } while (ptep++, addr +=3D PAGE_SIZE, addr !=3D end); @@ -2967,7 +2967,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp,= unsigned long addr, *nr +=3D refs; for (; refs; refs--) *(pages++) =3D page++; - folio_set_referenced(folio); + folio_mark_referenced_by_bit(folio); return 1; } =20 @@ -3011,7 +3011,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp,= unsigned long addr, *nr +=3D refs; for (; refs; refs--) *(pages++) =3D page++; - folio_set_referenced(folio); + folio_mark_referenced_by_bit(folio); return 1; } =20 --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id 53F584266AC for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=tWY6AQCD02Hqt9P48f9waJZZ9Adrkk2TlWrcp46aWMQarlCa7vx1mfacD/k4LB2GaH29al83gFUv+eQKn48CX+EtsYNmpHeaD7EQfg1fXJUoTe0sE2E+MFyjmyWymhQ6x1C9/OaCHFXiIFU+Kizz07FY2zTykurjqpBwNPsmBNs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=CpnKWsO08wiB8cn9JIys6r/uZDfOE9UFk/t21QroBUo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=c3DaJ3dL91RA3bfptsKH3fyfjcU5ZrZw51zgFUvGyMH4xWZ/ryjiDUYph3w61xZgcCClISVRh4f0GWdRy2VrsDez5Ab6KZf/zKyVu7+jfmSj3zvlGLnc0DrFD9Ttm4gCWOUiNJ8FrqirPC+i/XMCXXe6p4b0sN1kc2LcMfIzK6k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qgCn+zcQ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qgCn+zcQ" Received: by smtp.kernel.org (Postfix) with ESMTPS id 35E58C2BCFA; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=CpnKWsO08wiB8cn9JIys6r/uZDfOE9UFk/t21QroBUo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=qgCn+zcQB3xec9hvFg2q3r7Bc+7E6ej6hyfkKHfddvOMOoY3qjN9PfqjofAuKlhyD df590To5SsJAVfOC/QHT/HkxibWVqHQQf7A6esRMHxd1+TbnwQmOgzjHifWr1RQEbX sZKU5sIJ8y2I/riJOPQxGENju7YabOKWVQcOIu8M1W5cmtfIk5pqa30DjA+AzraO1w uJN4pdHndGbBmc3Rlc2IRjDn3DS198mM6gxXmCsEFGgU4K39C7T0jYLZUR7IPwzokU m4txlLiP1udkMfoUiOOVU0n9WFEf7o5cEu8XRQQ6+nJx5W6bu7W7J53JrCOz3Bwxs0 Q9tzVmTF6mWzw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id 2B720CD3423; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:02 +0800 Subject: [PATCH RFC 16/32] mm: convert folio referenced flag usages to new bitwise identical helpers Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-16-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=999; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=+5qun8U2KQYDmQH0ZIm/6ejG1bASGFZfAJQuD/re23Y=; b=lSMsnUvxGGUaYG0E/v+f1nxrr3TYC9U+s6XCF5kTjEYfQM7zbRpIjFFhSJ4RiZ/Cp9u0YGRqE oaQcIqhCOdFBSP97uzGceWG0hU5snTAOcZ2U8TXEw2Y4nZ5E9KEBA6c X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song To prepare for removal of this page flag, use the new introduced standalone helper. 
Signed-off-by: Kairui Song --- mm/swap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 6e0397ff881d..2250d9db1395 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -525,8 +525,8 @@ void folio_mark_accessed(struct folio *folio) return; } =20 - if (!folio_test_referenced(folio)) { - folio_set_referenced(folio); + if (!folio_is_referenced_by_bit(folio)) { + folio_mark_referenced_by_bit(folio); } else if (folio_test_unevictable(folio)) { /* * Unevictable pages are on the "LRU_UNEVICTABLE" list. But, @@ -544,7 +544,7 @@ void folio_mark_accessed(struct folio *folio) folio_activate(folio); else __lru_cache_activate_folio(folio); - folio_clear_referenced(folio); + folio_clear_referenced_by_bit(folio); workingset_activation(folio); } if (folio_test_idle(folio)) --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65F1D4266BD for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=H+vRcCJr3h4CocP+s28o1wgifn1KmJ8VM8fFu5uvZZkMziNWkEbeKWHhcJg0BKw0nGjdnAOhG55S9Jv8HttwLiRbcVVAaTiUSqe1EPg5c1aIHhMmak7Jo3P2qGyxU1HO7YQJqKjAv8yhtfsygH5Hme625TjiUs9YA9ZmwTUG+8w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=m5RrhLaoLypBw5aoyEJ47nhbPvpGV8LZ8VpVCX9MAyQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=iTn3lUebTx4DvbN4tB54p1pRFDjGtLNq0Aly3nBlUVFkgkBzpAH4WIzw368YNj1RSTt3ITsyxJjw3Y9r0SPi52ps9bTFFz7QN7/HN6AUcn3Id5xFsHvQDbdasdUweQlYzElxnu/sBB6xMC8gzS/7RXnFSZH2cW1txrKQefxXt+Y= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vBUsA9Bm; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vBUsA9Bm" Received: by smtp.kernel.org (Postfix) with ESMTPS id 473CEC2BCC6; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=m5RrhLaoLypBw5aoyEJ47nhbPvpGV8LZ8VpVCX9MAyQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=vBUsA9BmMV7k+paFfpI1TuIstYuG1OuNs5X1JDjpk4jvQg3bCg0jI2LCKfgmc+lxR Bo9FQ+PXCWWOz7X7nom8s+KRzQq0nnqSATTADz9U1CrcV8BcbrLZYRDKhCeygzgXoK x9ojAGH48F+HRcYDpMAKA3bWOxcS1u4Byy3ShljzI6r/s2Fm4XrlTN3YWvif6VJUYc XG0A9qlWUhoV6DBuEGP51+FidhBrITGN4fWouoKuXRSznxn3T+PAWg0fXQoJtAYBzw /M0EinOQQ4MzDMycqCQlKZSdrqQ38VfLKR2olh5kNf8JEaRDNg0bxqWkcPBkOSuQ4o 1qH/V3kcxSkyw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EA6ECCFA13; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:03 +0800 Subject: [PATCH RFC 17/32] mm/shmem: mark folio as referenced use new helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-17-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price 
, "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=790; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=W5OkGQddy603UwQXhANOZyCkcCJwnjswXrRu8xbbvTA=; b=GUAAPFmU7MszYrfNginsHgKTjAvgcDq9vIizybjDxJCcex326WcuVpbP3Hh9Dp2GtnN6N7719 rpCI6b4Y3uvCwBERqQGsOLtcnYs0KEYEyRMq4NDXYQ6X2mn2AtV3Ild X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Shmem marks a newly allocated folio as referenced for a write request; convert it to use the new helper. Keep the operation atomic because the folio, although locked, is already in the shmem mapping and could be shared or even have its flags field modified by others. Signed-off-by: Kairui Song --- mm/shmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/shmem.c b/mm/shmem.c index bab3529af23c..2aed99c9f70e 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2595,7 +2595,7 @@ static int shmem_get_folio_gfp(struct inode *inode, p= goff_t index, } =20 if (sgp =3D=3D SGP_WRITE) - folio_set_referenced(folio); + folio_mark_referenced(folio); /* * Let SGP_FALLOC use the SGP_WRITE optimization on a new folio. 
*/ --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7592E426D05 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=mMynI267HF1EcmhW5U6kJbDvSY1hn57vX343jwO7LJs7Ps69Jgk/tSoR5mV9rZ5opRUto8QIBnU/xtF53Oet2Ei3bbhc0OXp/mrrA1x729sQ4c4vzlFJcvS1rEqGVnKqDn6aPk5lX4/mpYtUOv6DdpHfWAlu+XIM7dOVMUxZuJU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=QmJnoJw3TwuGtbylrumLElnsij6ui3k2H3ZChDmHvxE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Zo19ockHOcJhJMM5HXGHCOOITjE00eAvZXY0cyZ41+2el5A0WNl7wXarmWaAk2WbtfIKFseVs/6JrsougXD40JCgNZEjSEchDNEW03tJ8Tu+ji6tNdabW7bw/P6KfDCs2pNnfXIVSLBwYOjF7LRMSMB+pJjLfXZXKuht247uCHU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dGP598je; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dGP598je" Received: by smtp.kernel.org (Postfix) with ESMTPS id 5727FC2BCC7; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=QmJnoJw3TwuGtbylrumLElnsij6ui3k2H3ZChDmHvxE=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=dGP598jexZOuQRzxnUGrd6SKnlQjrYLkhnamtJXYN+oGlkhGUCoglNSvzZEl8uhB1 ozz9EnMoQ/TdeCFQSVg5nO6ESun/vEXNtUUnnupEq3jviqEbux/OpTxkrjEpxIWwmS ikAYymnXoxW6VE4J/WwBUTWinOrtOuihalLyGqrXqCAzedycFP+vJyBLadvNRuFEMq 
kGzBeYZ2e5BHBbMpqfEGcoIQZlaei5+fRvjslN9X9yQl54yARJ9sDNiYQldOCuyCqL EMvwd3WMAyj6TSF2eElBpasIl+6H7YEavrYswrqZ6Y6VdrRs9A1QJDUszWhzBA/28K JBiuTMumIdptQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F731CD3427; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:04 +0800 Subject: [PATCH RFC 18/32] mm/vmscan: convert to new bitwise identical helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-18-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1005; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=g0CjYTRIe9ngM5FeeFowlBKRaYV4naFyb1v+5e6oewY=; b=1O08Fjuv7dDsUSN65/Nh/EKQX5Zn+CfMlt8A8CHIlb4oXMYvuDZNoL/N8ajhstsKC7eU8DLsw Xvzd7PgG+K0BufVemZT87auAAygwRgKn3y6d7OMjSqe/ca16Fjp1BzM X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with 
auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song No feature change, just prepare for removal of PG_referenced. Signed-off-by: Kairui Song --- mm/vmscan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2fd62d02a83a..a3200020b9c2 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -901,7 +901,7 @@ static enum folio_references folio_check_references(str= uct folio *folio, return folio_promote_lru_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP; } =20 - referenced_folio =3D folio_test_clear_referenced(folio); + referenced_folio =3D folio_test_clear_referenced_bit(folio); =20 if (referenced_ptes) { /* @@ -918,7 +918,7 @@ static enum folio_references folio_check_references(str= uct folio *folio, * so that recently deactivated but used folios are * quickly recovered. */ - folio_set_referenced(folio); + folio_mark_referenced_by_bit(folio); =20 if (referenced_folio || referenced_ptes > 1) return FOLIOREF_ACTIVATE; --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 88D6B426D06 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=rj/tAYsjMFBYwY+iMliYdn8rH3GI8wAfkLsHD8Hp+GWvwyOu5xlFB3FTrpQMxEcDRZBmui7s1oMJ25EJ5YMS9lzDz0an//qTssc1U4E2ZNNc/+Yydb8QAnvI31Vg6Q7qkSqo6iuCUoBARI5dKqxUT/PgZIgj82T2nl4sGK8qXzU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=JbSryOj61KkH2epK9V3PZBxm87AJiD8qW4MrulV6+TQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=gg1UlpD912nfYpuYOILlhk+1jwW3tSYK6OKJsEd3RZSUMnyOKru9MuIgykrnad7Drg5+kHQRcqbaAwdlTJ5GMiczUFdqIB/yj0ooETB0tZ5KoCUzKDjT1bjVc3teB4p6+TDM4Sq9kZBtrnuYk5X1QibszA/8U1vjmAPOT5qqYrA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sow84dOV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sow84dOV" Received: by smtp.kernel.org (Postfix) with ESMTPS id 6B885C2BCF7; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=JbSryOj61KkH2epK9V3PZBxm87AJiD8qW4MrulV6+TQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=sow84dOVIH+CGAEOSD+vgPjVAdkuBcH3OA6XMqPexmwsb7bmFaH6prpyKpEOkWveo p4DsoAkmbhi1NGljLRqyed6ITPYMWoFRWsH0BqExqCgOKC5ln0Yf9JXVbhPvg9XhZy 53kJW6wLTkCqG/YaJtkTOLyiJQemSVAnB/QZZqpY0kAN19qQgVD+KuNOEqFnbGEHZn kxFDg7Icoz5kUdtXJJmU8cZnoH0KzxhVtRIk6pu0YdwBhspqV9Fqi5cq+3QS1ruJ6X AkGi2N9eg9iu9qQA4bZqLpsP7AGvOVy+gKXDOhzJ6GQ8LK+sg89Jj8BRbIygKN5+Hv 9yhAV5qvrWNSw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60F57CD342C; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:05 +0800 Subject: [PATCH RFC 19/32] mm/madvise: convert to new lru refs API and better support for MGLRU Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-19-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman 
Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=2598; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=nsdGkqQnIhHZ9iD9iPaPkGtCBxehUPo1DWGXdON6M/Y=; b=GSF6KP7HamzrOX9fc6KaEbpwG7F0Vpq9uEZhUQSmAd6gKmp3/uajUtaj2M8du2Qu6ADDYhGQB k20Lra60ZzLCHE+D9B7reUUbU0PFsXCZpaolFWS7eQ2d+CUIx2JH4Pz X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song For active/inactive LRU, madvise wants evicted folios from active LRU to be considered for PSI too, so some special handling are added. But MGLRU doesn't really need this, as it has a different activation logic. Switch to new helpers and improve the support for MGLRU here. Signed-off-by: Kairui Song --- mm/madvise.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 930939c55cd5..e42266a03fe9 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -351,6 +351,27 @@ static inline int madvise_folio_pte_batch(unsigned lon= g addr, unsigned long end, FPB_MERGE_YOUNG_DIRTY); } =20 +/* + * We are deactivating a folio for accelerating reclaiming. + * VM couldn't reclaim the folio unless we clear PG_young. 
+ * As a side effect, it makes confuse idle-page tracking + * because they will miss recent referenced history. + */ +static void madvise_cold_or_pageout_prep_folio(struct folio *folio) +{ + folio_test_clear_young(folio); + + /* + * MGLRU clears all reference flags in folio_deactivate, + * no need to touch it here. + */ + if (!lru_gen_enabled()) { + folio_clear_referenced_by_bit(folio); + if (folio_test_active(folio)) + folio_mark_workingset_by_bit(folio); + } +} + static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -425,10 +446,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pm= d, tlb_remove_pmd_tlb_entry(tlb, pmd, addr); } =20 - folio_clear_referenced(folio); - folio_test_clear_young(folio); - if (folio_test_active(folio)) - folio_mark_workingset_by_bit(folio); + madvise_cold_or_pageout_prep_folio(folio); if (pageout) { if (folio_isolate_lru(folio)) { if (folio_test_unevictable(folio)) @@ -534,16 +552,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pm= d, tlb_remove_tlb_entries(tlb, pte, nr, addr); } =20 - /* - * We are deactivating a folio for accelerating reclaiming. - * VM couldn't reclaim the folio unless we clear PG_young. - * As a side effect, it makes confuse idle-page tracking - * because they will miss recent referenced history. 
- */ - folio_clear_referenced(folio); - folio_test_clear_young(folio); - if (folio_test_active(folio)) - folio_mark_workingset_by_bit(folio); + madvise_cold_or_pageout_prep_folio(folio); if (pageout) { if (folio_isolate_lru(folio)) { if (folio_test_unevictable(folio)) --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DF53426D1F for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=sH602nePOZ2q5Mt/rB4kQsI0JcAMcHCbNBkgCDsfzjB1X/UAIUGP3T9l7bvJNYA9DKg7beoyriA2CmyBwPfT668LChYo7PNAM+Jf/Z66jB2txmg9GAg/nEFX/ItTGRsLLWPPou0qoDsxcxQq8z0bFrhMdI7EgJEQvowsRLzWwwg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=J9XkOZA3RXA56Ea9/X8t4L7qZbu/jTW68DVd2kgzDhY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=B+YZU4bWfETvkmqfgyy3qcmlsgjPtBbyUN3OAw8SDYn0RISaMy+wIYIdKunFl4k5tNuhU1yc5jznl/KRp5a1BP9baNFaqi032XzLPNeROrkImLSKnNnSBTcvq5jZWFwRUrz3gkbZqVumPhFoB9ic0OHG1BlS+0gaakEnfFt/aKU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=O3cr2C5x; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="O3cr2C5x" Received: by smtp.kernel.org (Postfix) with ESMTPS id 7D890C2BCB4; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=J9XkOZA3RXA56Ea9/X8t4L7qZbu/jTW68DVd2kgzDhY=; 
h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=O3cr2C5x32oLyL4h7xuz2UvIo9eA6PphtmST6sK7EeejTnm4kfJAKsNxMjQZzt+sU D2PGf/lXwKbmY3Eo2apjgoYlxxUP3tWg1Hhw9JXOt4EXVTSw1VcZ3vbOzXUz/Xb5px UCqjmdT+XolQ+/+lrW6HTjbcIMXp800eCF5YTseQWv98nor8nFCVar247XRJE7vmem s/pl/tCGoiZilGmzSbO8j7ObmgQL0S2Fq9fjq2CwK5jPSUde1agsqQAfF7LokQjKOa 6wcRVbWQlVMvUWPhYB2+rm22sNdVQR2CEUykBY3rR39ajXi9IP5oeYicP6DVEY5OXo Mw7WG/EZDtIdw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73752CD3425; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:06 +0800 Subject: [PATCH RFC 20/32] mm/damon: don't clear the lruref for MGLRU Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-20-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=635; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=aHsc1nk/FwDwd1lG6O1KFNt7Wktso+y56WV9FF+FyW4=; 
b=mQ1ilv6x/6GtozJZhXvoobsbJ9AwqNI8htvVQf2Xp8Esl58NF8vs+4uOuU/WIwknj7Ts5Z1Ok wXwKliweL9CDMi4L7r0NeohOyUPT3TIx6Z/hv/jKA2Tw2fVQF04dSsD X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Signed-off-by: Kairui Song --- mm/damon/paddr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c index 5cdcc5037cbc..7718cd9ac959 100644 --- a/mm/damon/paddr.c +++ b/mm/damon/paddr.c @@ -186,7 +186,8 @@ static unsigned long damon_pa_pageout(struct damon_regi= on *r, else *sz_filter_passed +=3D folio_size(folio) / addr_unit; =20 - folio_clear_referenced(folio); + if (!lru_gen_enabled()) + folio_clear_referenced_by_bit(folio); folio_test_clear_young(folio); if (!folio_isolate_lru(folio)) goto put_folio; --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B01EE426D20 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=pxcR8ozsCS5W7JjsW2kBt5Y5TcR+UI3QcekFBwzkLUzGD29RJAT+QLrELo9AQVQOjtJpT3sG5p9WtgPXDjomziM8m/fOP/j4Uiqd63VqbM+Tax2PRCJ0/wA5FUDRW3M6Nx08wCcIvglCjEw2vH1fAV9oqsevaPyANoI8tU0a67w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=IFixioiEcWVbaHjdKqjkGe9G9//zLNm5c6Kw2RIarj0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=fShb9K5nIwq2WgSJRgijVmwIkecanQdbt0INWpZ2GaUHE7ozQOJVnAqzfjDHFSNt+9lnVUoizU8JZ8EyfxaflH45tMn3ed83GE88Nx/Q2l5UEWutY0E8Ly5OSAgslTnjKB9VKjU7yCMxgSwX0uxiCRYnkje8oiI7rWtoHBL6v6g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BRgJrQ0U; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BRgJrQ0U" Received: by smtp.kernel.org (Postfix) with ESMTPS id 8FD85C2BCC9; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=IFixioiEcWVbaHjdKqjkGe9G9//zLNm5c6Kw2RIarj0=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=BRgJrQ0UJNYt/UnBGgEtVE7K4uth2p9QU7i0zC3eOCBDIQIUkVKMymdhxln4WtRjx xiRcX5l8j7V00vbGsJjuLdzqe2ndll6ds8DKIveuZP4j2Lx+cGFG5Z8f7nWDGledac lXb6NMOZxDh0eWSrSl56OoXz+LS5/zRFMNxvEDfXr//94RVtogcDEL0WMqzVa11dl0 eD87Z/teeUjaJQB0lD5juOzyrP5BPhyFaNeU0AHmdXMy0f52ypZebtNQU8jFy/brE1 voPUY4hSTwRzilsnCESw2B7j4TUVRoCHl+J9gurm8FinX822NQwgbH/erSui10j45z DyRmBWLN05ASw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84F0BCCFA13; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:07 +0800 Subject: [PATCH RFC 21/32] mm/swap: convert to new bitwise identical helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-21-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , 
David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=1323; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=eeiiyy+su7zYF2XxxCJkB0ii+3kNLayc6qTsNhWAGbI=; b=FyZO3KOY9qKBaiu/DIWFcROSEhhGKBxTqumQl6RrGU/2X5SJ5Pyl15i0sP1V8AB44wIWdPUMO AKnyGEbam5xBgtdf2RYhHf1fGBdJpGpa9Ss1FHDNVjXiSSeafhu8cev X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song No feature change, just prepare for removal of PG_referenced. 
Signed-off-by: Kairui Song --- mm/swap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 2250d9db1395..043c4ec708d7 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -629,7 +629,7 @@ static void lru_deactivate_file(struct lruvec *lruvec, = struct folio *folio) =20 lruvec_del_folio(lruvec, folio); folio_clear_active(folio); - folio_clear_referenced(folio); + folio_clear_referenced_by_bit(folio); =20 if (folio_test_writeback(folio) || folio_test_dirty(folio)) { /* @@ -665,7 +665,7 @@ static void lru_deactivate(struct lruvec *lruvec, struc= t folio *folio) =20 lruvec_del_folio(lruvec, folio); folio_clear_active(folio); - folio_clear_referenced(folio); + folio_clear_referenced_by_bit(folio); lruvec_add_folio(lruvec, folio); =20 __count_vm_events(PGDEACTIVATE, nr_pages); @@ -685,7 +685,7 @@ static void lru_lazyfree(struct lruvec *lruvec, struct = folio *folio) if (lru_gen_enabled()) lru_gen_clear_refs(folio); else - folio_clear_referenced(folio); + folio_clear_referenced_by_bit(folio); /* * Lazyfree folios are clean anonymous folios. 
They have * the swapbacked flag cleared, to distinguish them from normal --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C0DE7426D26 for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=Di4a6lC+lwugZkFBxh4KD5yRoyUvT1KxQV68BJzM+Az0SRaP8Nz+osc1ibhcEABmmQSD1EuDQW7gKsu4zi1x6mzbVJFJ7n4+WJbDji2F2SH76Ogz2QtmwzKeHuHgBcMPBy/Nnr4HhktelC4+qZgQH7p3Ik05fJH+zVqFpDm+7eM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=/ikrs+4SbaVJg0tvGLo0u4ETHIPCTVBH3OAFRKXnBFM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=YMJZAX4Efb/lhpAiU/LLEHgVTztIjxDIIJ/L6FW4FB4jXJDp6Fv/jafmJ2NQYh8nzFUuZAbcwBQnjk8wGR0jVjvEZpMzerWsf11XTh+1Awz1ZRbi4s15P70fisoj835OD4dfWV0SNmShTX7z5ZJP73zEBwj7bMpbOCShVbPWFTs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qdonvicz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qdonvicz" Received: by smtp.kernel.org (Postfix) with ESMTPS id A2966C2BCF4; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; bh=/ikrs+4SbaVJg0tvGLo0u4ETHIPCTVBH3OAFRKXnBFM=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=qdonviczKU/JHnQExYpvrihI3vLh+h7BdQJWUDXa4SZs4mL+BfY7JtlvgY4zZSdZ0 OjBCTavQqahWGkbsPiQA1QJx1C6SSWfsGD0txaJ3Msdyqb1FrPlFgmkn0Dx2FcLgst 
KGqGQBvUdLWB443xpadV5T+T3v1gqPaCqdVJHFeEzHPiPjQ4P8Lljo2W+B8chR1tkD a2Dqp2jGC7GOUbIskmlwQ2Wpr5W5Oi7PghJX6AyrdeGL7jZoVsciPsfHhK0Jq0t6rN OeFhyeNyXtbQhBgHcZ8RA1KgJT+TsRzS4mkAA57g7Zx2M8MqX0hHgemvrOobMbGmY0 /86SB+ucaaNQA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9913CCD3427; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:08 +0800 Subject: [PATCH RFC 22/32] mm/workingset: simplify and use a more intuitive model Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-22-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=20953; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=1dTSECn4f3IJgXaVU1EBjh87x63FnKLXmOSYf0cJJvQ=; b=rv32xcR6Z5PlYssPrbZDyzFCu3sTTQOO6bRxH6wtLlCZqpLs8DRhxnGd2cUR/nRfzaUxgAgPQ 0YzGKc+n1TzAnXOG3oTJn2GdOm5aLK2rymKj8EDupUCULkcF0SNz039 X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= 
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562
X-Original-From: Kairui Song
Reply-To: kasong@tencent.com

From: Kairui Song

Remove the workingset activation hook and rework the refault distance
comparison accordingly. The non-resident age counter (NA) is now
incremented only on eviction. This is also preparation for adapting
refault-distance-based file page protection to multi-gen LRU.

Refault-based re-activation on the active/inactive LRU helps identify
the working set. On refault, it estimates the page's access distance
and, if small enough, activates the page directly instead of letting it
re-age through the inactive list.

The existing estimation (see the original comment in mm/workingset.c)
is built on two assumptions:

1. Activating an inactive page left-shifts all LRU pages by one
   (treating the LRU head as the right).
2. Evicting an inactive page left-shifts all LRU pages by one.

Assumption 2 is exact. Assumption 1 is a bit subtle. The old model
counts events (activations and evictions), which is internally
consistent, but it means NA advances at the rate of the busiest pages,
not at the rate of actual eviction pressure. A workload with an
established active set whose pages cycle inactive -> active ->
inactive -> active produces many NA increments per eviction. This is
not an edge case: a folio activated even once contributes two NA
increments per eviction, which is the common case. Legitimate
working-set candidates then get aged out of the refault window faster
than real eviction pressure would justify.

MGLRU does not fit this model well either. Pages are aged in
generations and are promoted frequently between them, so tying NA to
activations is even less meaningful. Note that this patch does not
change the MGLRU path yet; it is only preparation.

New model
=========

Treat evicted pages as if they were still resident, each carrying an
eviction timestamp (NA at eviction time).
These timestamps logically form a read-only "Shadow LRU" extending to
the left of the real LRU.

For a refaulting page, define:

    SP = NA@refault - NA@eviction

SP is the page's offset back in the shadow LRU: how far in eviction
history it sits. Since every evicted page was once at the head of the
INACTIVE list, its in-memory access distance would have needed to be at
least:

    SP + NR_INACTIVE

to avoid eviction. So an upper bound on when activating the page could
keep it resident is:

    SP + NR_INACTIVE <= NR_INACTIVE + NR_ACTIVE

which simplifies to:

    SP <= NR_ACTIVE

(NR_INACTIVE is read at refault time, not eviction time; the two can
differ but are assumed comparable for workloads stable enough that the
heuristic is meaningful at all.)

The derivation above is the upper bound on feasibility. The policy
applied here is stricter: compare SP against the average of NR_ACTIVE
and NR_INACTIVE rather than NR_ACTIVE alone:

    SP <= (NR_ACTIVE + NR_INACTIVE) / 2

The number 2 here is not a magic number. Two arguments motivate this
threshold. They stem from different reasoning but converge on the same
operating point.

1. Empirical self-balancing. Relative to the feasibility bound
   SP <= NR_ACTIVE:

   - When NR_ACTIVE is short, no working set has been established yet.
     Moving the threshold above NR_ACTIVE lets more refaulting pages
     activate, so a working set can form faster.
   - When NR_ACTIVE is long, a working set is already established.
     Moving the threshold below NR_ACTIVE keeps it stable, so one-time
     refaults do not easily displace established active pages.

   Because NR_ACTIVE + NR_INACTIVE is roughly M (total cache memory),
   the threshold is roughly M/2 regardless of how the A/I split moves.
   The policy self-stabilizes around half of memory. Also note that for
   refaults, re-activation is purely hypothesis-based, so the lowest
   bar, 1:1, is used.

2. Bounded MRU-like protection against LRU thrashing.
   On a sequential cyclic scan over a set of size S on memory M, every
   refault in steady state has SP ~= S - M. With the threshold at M/2,
   activation happens exactly when S <= 1.5 * M. In that regime, the
   first refaults of the second pass land on the active list and stop
   being evicted. Roughly M/2 pages get frozen active; the remaining
   M/2 pages churn through the inactive list. Hit rate on the frozen
   half is near 100%, compared to pure LRU's classic ~0% on any cyclic
   scan with S > M. This is a large improvement in the "slightly too
   large to fit" regime, which is a common workload shape when memory
   is sized close to but under the working set (analytics passes, batch
   jobs, cold-start caches).

   Above 1.5 * M the heuristic disengages. Since we don't know if the
   workload is sequential or random with skew, partial protection of
   much-larger sets has diminishing returns, and indiscriminate
   activation of pages with very large SP would pollute the active list
   with pages unlikely to be re-accessed before eviction.

   For uniform random access with no structure, SP is widely spread
   around its mean of S - M; activation is essentially neutral because
   no page is more valuable than another. The policy neither helps nor
   hurts.

The two arguments are independent but give a similar conclusion, and
the threshold is easy to calculate, which is a useful coincidence and
gives some confidence that the constant is well chosen.

A secondary effect: NA still over-counts pages that are refaulted and
then re-evicted, because such pages contribute to NA twice over their
lifetime. This inflates SP for older shadows under heavy refault churn.
The feasibility bound SP <= NR_ACTIVE would over-activate in that
regime; tightening to (A+I)/2 dampens this without a separate
correction.

Performance
===========

Removing the NA update from folio_mark_accessed() and from
move_folios_to_lru() avoids a memcg parent walk and an atomic add on
those hot paths, so the no-pressure case should cost slightly less.
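
As an illustration of the activation rule from the "New model" section,
here is a minimal userspace sketch in C. It is not the code in
mm/workingset.c: the struct and function names (lru_sizes,
shadow_position, within_feasibility_bound, should_activate,
example_short_active_list) are hypothetical, and the sketch assumes
plain unsigned long counters while leaving out the eviction mask, the
memcg lookup and the anon/file accounting done by the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two list sizes read at refault time. */
struct lru_sizes {
	unsigned long nr_active;
	unsigned long nr_inactive;
};

/*
 * SP = NA@refault - NA@eviction.
 * Unsigned subtraction keeps the result correct if NA wrapped around
 * once between eviction and refault.
 */
static unsigned long shadow_position(unsigned long na_refault,
				     unsigned long na_eviction)
{
	return na_refault - na_eviction;
}

/* Feasibility bound from the derivation: SP <= NR_ACTIVE. */
static bool within_feasibility_bound(unsigned long sp,
				     const struct lru_sizes *s)
{
	return sp <= s->nr_active;
}

/* Policy threshold: SP <= (NR_ACTIVE + NR_INACTIVE) / 2. */
static bool should_activate(unsigned long sp, const struct lru_sizes *s)
{
	return sp <= (s->nr_active + s->nr_inactive) / 2;
}

/* Worked example for the short-active-list case of argument 1. */
static void example_short_active_list(void)
{
	struct lru_sizes s = { .nr_active = 100, .nr_inactive = 300 };
	unsigned long sp = shadow_position(1500, 1350);	/* SP is 150 */

	assert(!within_feasibility_bound(sp, &s));	/* 150 > 100 */
	assert(should_activate(sp, &s));		/* 150 <= 200 */
}
```

With NR_ACTIVE = 100 and NR_INACTIVE = 300, a refault with SP = 150
misses the feasibility bound but passes the averaged threshold, which
is the "short active list" case. Swapping the two sizes gives the
opposite outcome for SP = 250: inside the feasibility bound (250 <= 300)
but above the threshold (250 > 200).
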
Under pressure the behavior still seems good. Testing with several
benchmarks didn't show any regression, and the simplification of code
leads to a direct gain in many cases. Results can be seen in a
previously posted version of this patch [1].

Link: https://lwn.net/ml/linux-kernel/20230920190244.16839-2-ryncsn@gmail.com/ [1]
Signed-off-by: Kairui Song
---
 include/linux/swap.h |   2 -
 mm/swap.c            |   1 -
 mm/vmscan.c          |   2 -
 mm/workingset.c      | 225 ++++++++++++++++++++++++---------------------------
 4 files changed, 107 insertions(+), 123 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1930f81e6be4..04806dc8b2b1 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -297,10 +297,8 @@ static inline swp_entry_t page_swap_entry(struct page *page)
 /* linux/mm/workingset.c */
 bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 				bool flush);
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
 void workingset_refault(struct folio *folio, void *shadow);
-void workingset_activation(struct folio *folio);
 
 /* linux/mm/page_alloc.c */
 extern unsigned long totalreserve_pages;
diff --git a/mm/swap.c b/mm/swap.c
index 043c4ec708d7..97c820b85db1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -545,7 +545,6 @@ void folio_mark_accessed(struct folio *folio)
 		else
 			__lru_cache_activate_folio(folio);
 		folio_clear_referenced_by_bit(folio);
-		workingset_activation(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a3200020b9c2..e6631ad03caa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1923,8 +1923,6 @@ static unsigned int move_folios_to_lru(struct list_head *list)
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
 		nr_moved += nr_pages;
-		if (folio_test_active(folio))
-			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
 	if (lruvec)
diff --git a/mm/workingset.c b/mm/workingset.c
index fa644948c80e..01f6e9a72cdb 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -66,74 +66,89 @@
  * thrashing on the inactive list, after which refaulting pages can be
  * activated optimistically to compete with the existing active pages.
  *
- *		Approximating inactive page access frequency - Observations:
- *
- * 1. When a page is accessed for the first time, it is added to the
- *    head of the inactive list, slides every existing inactive page
- *    towards the tail by one slot, and pushes the current tail page
- *    out of memory.
- *
- * 2. When a page is accessed for the second time, it is promoted to
- *    the active list, shrinking the inactive list by one slot. This
- *    also slides all inactive pages that were faulted into the cache
- *    more recently than the activated page towards the tail of the
- *    inactive list.
- *
- * Thus:
- *
- * 1. The sum of evictions and activations between any two points in
- *    time indicate the minimum number of inactive pages accessed in
- *    between.
- *
- * 2. Moving one inactive page N page slots towards the tail of the
- *    list requires at least N inactive page accesses.
- *
- * Combining these:
- *
- * 1. When a page is finally evicted from memory, the number of
- *    inactive pages accessed while the page was in cache is at least
- *    the number of page slots on the inactive list.
- *
- * 2. In addition, measuring the sum of evictions and activations (E)
- *    at the time of a page's eviction, and comparing it to another
- *    reading (R) at the time the page faults back into memory tells
- *    the minimum number of accesses while the page was not cached.
- *    This is called the refault distance.
- *
- * Because the first access of the page was the fault and the second
- * access the refault, we combine the in-cache distance with the
- * out-of-cache distance to get the complete minimum access distance
- * of this page:
- *
- *      NR_inactive + (R - E)
- *
- * And knowing the minimum access distance of a page, we can easily
- * tell if the page would be able to stay in cache assuming all page
- * slots in the cache were available:
- *
- *   NR_inactive + (R - E) <= NR_inactive + NR_active
- *
- * If we have swap we should consider about NR_inactive_anon and
- * NR_active_anon, so for page cache and anonymous respectively:
- *
- *   NR_inactive_file + (R - E) <= NR_inactive_file + NR_active_file
- *   + NR_inactive_anon + NR_active_anon
- *
- *   NR_inactive_anon + (R - E) <= NR_inactive_anon + NR_active_anon
- *   + NR_inactive_file + NR_active_file
- *
- * Which can be further simplified to:
- *
- *   (R - E) <= NR_active_file + NR_inactive_anon + NR_active_anon
- *
- *   (R - E) <= NR_active_anon + NR_inactive_file + NR_active_file
- *
- * Put into words, the refault distance (out-of-cache) can be seen as
- * a deficit in inactive list space (in-cache). If the inactive list
- * had (R - E) more page slots, the page would not have been evicted
- * in between accesses, but activated instead. And on a full system,
- * the only thing eating into inactive list space is active pages.
- *
+ * For such approximation, introduce a counter `nonresident_age` (NA)
+ * per lruvec. NA is incremented once for every evicted page, and each
+ * evicted page's shadow entry records the NA value at eviction time as
+ * a timestamp. So when an evicted page is faulted in again, we have:
+ *
+ * Let SP = ((NA's reading @ current) - (NA's reading @ eviction))
+ *
+ *                           +-memory available to cache-+
+ *                           |                           |
+ * +-------------------------+===============+===========+
+ * | shadows      *          | INACTIVE      | ACTIVE    |
+ * +-+------------^----------+===============+===========+
+ *   |            |
+ *   +------------+
+ *   |     SP
+ *   oldest shadow    * -> The refaulting page's shadow in the
+ *                         imaginary "Shadow LRU"
+ *
+ * SP stands for how far back in eviction history the refaulting page
+ * is. Since every evicted page was once at the head of the INACTIVE
+ * list, the minimum in-memory access distance this page would have
+ * needed to avoid eviction is:
+ *
+ *   SP + NR_INACTIVE
+ *
+ * So the page is a plausible workingset candidate if:
+ *
+ *   SP + NR_INACTIVE <= NR_INACTIVE + NR_ACTIVE
+ *
+ * which simplifies to:
+ *
+ *   SP <= NR_ACTIVE
+ *
+ * Note NR_INACTIVE is read at refault time, not at eviction time; the
+ * two can differ, but the difference is assumed small for workloads
+ * stable enough that the refault-distance heuristic is meaningful at
+ * all.
+ *
+ * The derivation above gives the upper bound on when activation could
+ * keep the page resident. The actual policy used here is stricter:
+ * the refault distance is compared against the average of NR_ACTIVE
+ * and NR_INACTIVE rather than NR_ACTIVE alone:
+ *
+ *   SP <= (NR_ACTIVE + NR_INACTIVE) / 2
+ *
+ * Two arguments motivate this threshold and converge on the same
+ * operating point:
+ *
+ * 1. Self-balancing around the feasibility bound. Relative to
+ *    SP <= NR_ACTIVE:
+ *
+ *    - when NR_ACTIVE is short (no established workingset), the
+ *      threshold sits above NR_ACTIVE, allowing more activations so
+ *      a workingset can form faster;
+ *    - when NR_ACTIVE is long (established workingset), the threshold
+ *      sits below NR_ACTIVE, so one-time refaults do not easily
+ *      displace established active pages.
+ *
+ *    Because NR_ACTIVE + NR_INACTIVE is roughly M (total cache
+ *    memory), the threshold is roughly M/2 regardless of how the A/I
+ *    split moves.
+ *
+ * 2. Bounded MRU-like protection. For a sequential cyclic scan over
+ *    a set of size S on memory M, every refault has SP roughly equal
+ *    to S - M in steady state. With the threshold at M/2, activation
+ *    happens exactly when S <= 1.5 * M. In that regime the heuristic
+ *    freezes roughly M/2 pages onto the active list, yielding a large
+ *    hit-rate improvement over pure LRU (which has ~0% hits on any
+ *    cyclic scan with S > M). Above 1.5 * M the heuristic disengages:
+ *    partial protection has diminishing returns as S/M grows, and
+ *    indiscriminate activation would pollute the active list with
+ *    pages unlikely to be re-accessed before eviction. For random
+ *    access with skew, the SP-based threshold naturally selects
+ *    hotter pages (shorter inter-access times give smaller SP), so
+ *    the benefit extends across a broader range of S/M as a smooth
+ *    transition rather than a cliff.
+ *
+ * A secondary effect: NA over-counts pages that are refaulted and
+ * then re-evicted (they contribute to NA twice over their lifetime),
+ * which inflates SP for older shadows under heavy refault churn. The
+ * feasibility bound SP <= NR_ACTIVE would over-activate in that
+ * regime; tightening to (A+I)/2 dampens this without a separate
+ * correction.
  *
  *		Refaulting inactive pages
  *
@@ -142,19 +157,14 @@
  * time there is actually a good chance that pages on the active list
  * are no longer in active use.
  *
- * So when a refault distance of (R - E) is observed and there are at
- * least (R - E) pages in the userspace workingset, the refaulting page
- * is activated optimistically in the hope that (R - E) pages are actually
- * used less frequently than the refaulting page - or even not used at
- * all anymore.
- *
- * That means if inactive cache is refaulting with a suitable refault
- * distance, we assume the cache workingset is transitioning and put
- * pressure on the current workingset.
+ * So when a refault distance SP satisfies the rule above, the
+ * refaulting page is activated optimistically in the hope that roughly
+ * (NR_ACTIVE + NR_INACTIVE) / 2 pages on the active side are used less
+ * frequently than the refaulting page - or even not used at all anymore.
  *
  * If this is wrong and demotion kicks in, the pages which are truly
  * used more frequently will be reactivated while the less frequently
- * used once will be evicted from memory.
+ * used ones will be evicted from memory.
  *
  * But if this is right, the stale pages will be pushed out of memory
  * and the used pages get to stay in cache.
@@ -170,11 +180,11 @@
  *
  *		Implementation
  *
- * For each node's LRU lists, a counter for inactive evictions and
- * activations is maintained (node->nonresident_age).
+ * For each lruvec, a non-resident age counter (lruvec->nonresident_age)
+ * is maintained. It is incremented once per evicted page.
  *
  * On eviction, a snapshot of this counter (along with some bits to
- * identify the node) is stored in the now empty page cache
+ * identify the lruvec) is stored in the now empty page cache
  * slot of the evicted page. This is called a shadow entry.
  *
  * On cache misses for which there are shadow entries, an eligible
@@ -366,7 +376,7 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
  * to the in-memory dimensions. This function allows reclaim and LRU
  * operations to drive the non-resident aging along in parallel.
  */
-void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
+static void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
 {
 	/*
 	 * Reclaiming a cgroup means reclaiming all its children in a
@@ -433,14 +443,12 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
 bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 				bool flush)
 {
+	unsigned long refault, distance, active, inactive;
 	struct mem_cgroup *eviction_memcg;
 	struct lruvec *eviction_lruvec;
-	unsigned long refault_distance;
-	unsigned long workingset_size;
-	unsigned long refault;
-	int memcgid;
 	struct pglist_data *pgdat;
 	unsigned long eviction;
+	int memcgid;
 
 	if (lru_gen_enabled()) {
 		bool recent;
@@ -511,8 +519,8 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 	 * longest time, so the occasional inappropriate activation
 	 * leading to pressure on the active list is not a problem.
 	 */
-	refault_distance = ((refault - eviction) &
-			    (file ? EVICTION_MASK : EVICTION_MASK_ANON));
+	distance = ((refault - eviction) &
+		    (file ? EVICTION_MASK : EVICTION_MASK_ANON));
 
 	/*
 	 * Compare the distance to the existing workingset size. We
@@ -521,22 +529,21 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 	 * workingset competition needs to consider anon or not depends
 	 * on having free swap space.
*/ - workingset_size =3D lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE); - if (!file) { - workingset_size +=3D lruvec_page_state(eviction_lruvec, - NR_INACTIVE_FILE); - } + active =3D lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE); + inactive =3D lruvec_page_state(eviction_lruvec, NR_INACTIVE_FILE); + if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) { - workingset_size +=3D lruvec_page_state(eviction_lruvec, - NR_ACTIVE_ANON); - if (file) { - workingset_size +=3D lruvec_page_state(eviction_lruvec, - NR_INACTIVE_ANON); - } + active +=3D lruvec_page_state(eviction_lruvec, NR_ACTIVE_ANON); + inactive +=3D lruvec_page_state(eviction_lruvec, NR_INACTIVE_ANON); } =20 mem_cgroup_put(eviction_memcg); - return refault_distance <=3D workingset_size; + + /* + * Be cautious about challenging the existing active working set; + * sacrificing the inactive part of the opposite type should be safe. + */ + return distance <=3D (active + inactive) / 2; } =20 /** @@ -581,7 +588,6 @@ void workingset_refault(struct folio *folio, void *shad= ow) goto out; =20 folio_set_active(folio); - workingset_age_nonresident(lruvec, nr); mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + file, nr); =20 /* Folio was active prior to eviction */ @@ -598,23 +604,6 @@ void workingset_refault(struct folio *folio, void *sha= dow) mem_cgroup_put(memcg); } =20 -/** - * workingset_activation - note a page activation - * @folio: Folio that is being activated. - */ -void workingset_activation(struct folio *folio) -{ - /* - * Filter non-memcg pages here, e.g. unmap can call - * mark_page_accessed() on VDSO pages. 
- */ - if (mem_cgroup_disabled() || folio_memcg_charged(folio)) { - rcu_read_lock(); - workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio)); - rcu_read_unlock(); - } -} - /* * Shadow entries reflect the share of the working set that does not * fit into memory, so their number depends on the access pattern of --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D5DD8426D2D for ; Fri, 1 May 2026 21:04:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; cv=none; b=u9F9qvRI3HaqovAdsQy44w6E+olr5MidDB4GDRG1MHvzeim0AuFPTIpSUIl4SW0764g944LidEyeWnTVTKCMh+mKE90Eh8a0+dXkWZdmrzfcyus93hbFnt9OyIrBTsTLHY6VWTFPtpod3SzQvCyBxVPgfDulMAJAEtO5EFHmfSc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669453; c=relaxed/simple; bh=Ftr4whPMGYa53GufT+XufxiyT6tMyEQKBh59LOxSBJg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=UVWWGm1C+wrmKntlkGh+lgsuRkEytB6hoNVgmqI4JImdf9VaqkhOUwQtHkeDTFaiYRgWAutWrP1g3XAwp1NlBOt31lARW9F0bwzM9EfJGMdRaMYKLHQLnq7TrNzRmUg9Ilm+C+u5NNuKRx77H3YNlwqO6jsbRBEY9GUzJ73shVE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oQObQMgm; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oQObQMgm" Received: by smtp.kernel.org (Postfix) with ESMTPS id B6655C2BCC7; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669453; 
bh=Ftr4whPMGYa53GufT+XufxiyT6tMyEQKBh59LOxSBJg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=oQObQMgmFXVM+ojV1ZQhu2uRkqYo1dWgZeEHTqq7n0bTL1gBkGrarL6XzRMHWhA6d Fy8F3sWH++ILt09ufGzg2DqedIzH0Tvs4oEn2SngyuFPkIpElaoluAe2snmq9to+zl gDRftzA/xd4E+Se18cfLrNQ2MyCr4hazdtAk/t4PWFrMptTIUFZl/LcBx9JOu8ccdg gvAjezGliysBGiQVm6HCZDPtxATW3h0p8onipO7xouF6RhVnQkvl0tT856xBa2afNR OXfWDdKfSWZq0mgV01vR9ORYo6xkPP+ZbUgf6qa5WyC5QAQOocIXLOWEVlzlSlSUeC 8QmZhkm7PGqug== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC107CD3423; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:09 +0800 Subject: [PATCH RFC 23/32] mm/workingset: rename the nonresident age counter Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-23-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=4861; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=RZ75CFKx9NHBtI4cGuTrtP6eqoZNh7E9CBvqbHcahSE=;
b=6CUaoS9oIH44bJ2AntcJ9Npik2Ryxt5O6dYw7eQBmQWbjWLTWtWLbYQjxusIpbsC+fbHre9/X LYWbK2P1T1OCsWFUfQfEBXJKiNAl+i7YwNq4D52rScVPa3Jkg1GqBLa X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song nonresident_age now simply counts how many pages have been evicted from this lruvec. Rename it to `evictions` to reflect that. Signed-off-by: Kairui Song --- include/linux/mmzone.h | 4 ++-- mm/workingset.c | 27 ++++++++++----------------- 2 files changed, 12 insertions(+), 19 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 393bbea75838..6747e1c6079c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -799,8 +799,8 @@ struct lruvec { */ unsigned long anon_cost; unsigned long file_cost; - /* Non-resident age, driven by LRU movement */ - atomic_long_t nonresident_age; + /* How many evictions have happened */ + atomic_long_t evictions; /* Refaults at the time of last reclaim cycle */ unsigned long refaults[ANON_AND_FILE]; /* Various lruvec state flags (enum lruvec_flags) */ diff --git a/mm/workingset.c b/mm/workingset.c index 01f6e9a72cdb..aac68cecdfbf 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -66,12 +66,12 @@ * thrashing on the inactive list, after which refaulting pages can be * activated optimistically to compete with the existing active pages. * - * For such approximation, introduce a counter `nonresident_age` (NA) - * per lruvec. NA is incremented once for every evicted page, and each - * evicted page's shadow entry records the NA value at eviction time as + * For such approximation, introduce a counter `evictions` (E) + * per lruvec. E is incremented once for every evicted page, and each + * evicted page's shadow entry records the E value at eviction time as * a timestamp.
So when an evicted page is faulted in again, we have: * - * Let SP =3D ((NA's reading @ current) - (NA's reading @ eviction)) + * Let SP =3D ((E's reading @ current) - (E's reading @ eviction)) * * +-memory available to cache-+ * | | @@ -143,13 +143,6 @@ * the benefit extends across a broader range of S/M as a smooth * transition rather than a cliff. * - * A secondary effect: NA over-counts pages that are refaulted and - * then re-evicted (they contribute to NA twice over their lifetime), - * which inflates SP for older shadows under heavy refault churn. The - * feasibility bound SP <=3D NR_ACTIVE would over-activate in that - * regime; tightening to (A+I)/2 dampens this without a separate - * correction. - * * Refaulting inactive pages * * All that is known about the active list is that the pages have been @@ -180,7 +173,7 @@ * * Implementation * - * For each lruvec, a non-resident age counter (lruvec->nonresident_age) + * For each lruvec, a non-resident age counter (lruvec->evictions) * is maintained. It is incremented once per evicted page. * * On eviction, a snapshot of this counter (along with some bits to @@ -390,7 +383,7 @@ static void workingset_age_nonresident(struct lruvec *l= ruvec, unsigned long nr_p * the root cgroup's, age as well. 
*/ do { - atomic_long_add(nr_pages, &lruvec->nonresident_age); + atomic_long_add(nr_pages, &lruvec->evictions); } while ((lruvec =3D parent_lruvec(lruvec))); } =20 @@ -421,7 +414,7 @@ void *workingset_eviction(struct folio *folio, struct m= em_cgroup *target_memcg) lruvec =3D mem_cgroup_lruvec(target_memcg, pgdat); /* XXX: target_memcg can be NULL, go through lruvec */ memcgid =3D mem_cgroup_private_id(lruvec_memcg(lruvec)); - eviction =3D atomic_long_read(&lruvec->nonresident_age); + eviction =3D atomic_long_read(&lruvec->evictions); eviction >>=3D bucket_order[file]; workingset_age_nonresident(lruvec, folio_nr_pages(folio)); return pack_shadow(memcgid, pgdat, eviction, @@ -501,17 +494,17 @@ bool workingset_test_recent(void *shadow, bool file, = bool *workingset, mem_cgroup_flush_stats_ratelimited(eviction_memcg); =20 eviction_lruvec =3D mem_cgroup_lruvec(eviction_memcg, pgdat); - refault =3D atomic_long_read(&eviction_lruvec->nonresident_age); + refault =3D atomic_long_read(&eviction_lruvec->evictions); =20 /* * Calculate the refault distance * * The unsigned subtraction here gives an accurate distance - * across nonresident_age overflows in most cases. There is a + * across evictions overflows in most cases. There is a * special case: usually, shadow entries have a short lifetime * and are either refaulted or reclaimed along with the inode * before they get too old. But it is not impossible for the - * nonresident_age to lap a shadow entry in the field, which + * evictions to lap a shadow entry in the field, which * can then result in a false small refault distance, leading * to a false activation should this old entry actually * refault again. 
However, earlier kernels used to deactivate --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37C5F426D3F for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=jGlwbmRSDRezMUB0J5DTPSwEwjjSkmnajfVWzZCRJFnq9qkCyIhl+qJQb/h6CJx+uUPmTzz6i0dczTNYd9u2A1ibzy/tlKW+n7asZljVdxSdYOqJCHHH/EqNF/VEbC2SiMYn+LlHi5SvTwmvXnf7uVZEAfIrBdAdOJZn3hktvhA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=rfg28QEetQjYOVfW5ELG9kC4fyTzABEeHPgECsGTHdk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=dr7jxvNTYhRNr9R7VUdYyUjHe0FvJOpjQE9g9nBCYZ5n8ZAuTj1CbNaYQ1Poj6itn05TSDevRcHTDNi63vlV0w0XVvuPynKjaLNp7WqdOBPnesGRh+bjcjsP/Y172CeGbl12awzojeqjlYGW2Z5lXwEjaIeT6fuce4rmBT2PYkk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XrFpadVA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XrFpadVA" Received: by smtp.kernel.org (Postfix) with ESMTPS id E63E3C2BCC9; Fri, 1 May 2026 21:04:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=rfg28QEetQjYOVfW5ELG9kC4fyTzABEeHPgECsGTHdk=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=XrFpadVAjcYnMwMtQRC5h0OL5Xfhb3B6UTtFDiLq4FAa/+XxUZgn9DCTjiSSp30Jz bWeoEu2HA71j9lgN/W7GYzpXvOk6qfLO25ACzmrRIlxzXo2soQUAWlShbjC7+Hm1x9 
bJ/JjCUkoNDad+klBWvDBU4BD24pv9mr3Zhz7254cReN+m+w8NlHLb3rzP2ZcqoI4S CXeam9HWUDjKYffswf1x1LJhhw6aup1A8rsUSkDqxIWXfM+RvD/jHz2RjGSMewUvmv iLjISaDHYckoZoOFgMJZUdWkgrduIdve8FkpSR0v9Nwh0LTgg8wMb71O7A7t+v2cTD JRc3jNxbokDHQ== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8AFBCD3423; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:10 +0800 Subject: [PATCH RFC 24/32] mm/workingset: use a single atomic operation for read and age Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-24-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=2020; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=wgOBUng+uTQYm5twq6Yy1hA/jkGiMMQo7dEmYz74ui0=; b=ir6e0a5kqtAwGqvbIavgrfnZJL15MwneqaCVg2lgFlnO9FBbjWzjQcm7j0Nc4ykvHkerskNIx VRDLrU4BOj7CxHp3RBUP3d15PVMDXtxI4aCutuGdbgjQxeqMnPjkN6q X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= 
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song workingset_eviction() reads the eviction counter and then ages it with a separate atomic add. Do both with a single atomic fetch-and-add instead: one atomic operation should be faster than two on most architectures, and the snapshot is more accurate because no other eviction can slip in between the read and the add. Signed-off-by: Kairui Song --- mm/workingset.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index aac68cecdfbf..a77a6c0c3e15 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -369,8 +369,10 @@ static void lru_gen_refault(struct folio *folio, void = *shadow) * to the in-memory dimensions. This function allows reclaim and LRU * operations to drive the non-resident aging along in parallel. */ -static void workingset_age_nonresident(struct lruvec *lruvec, unsigned lon= g nr_pages) +static long workingset_age_nonresident(struct lruvec *lruvec, unsigned lon= g nr_pages) { + unsigned long eviction; + /* * Reclaiming a cgroup means reclaiming all its children in a * round-robin fashion. That means that each cgroup has an LRU @@ -382,9 +384,10 @@ static void workingset_age_nonresident(struct lruvec *= lruvec, unsigned long nr_p * the virtual inactive lists of all its parents, including * the root cgroup's, age as well.
*/ - do { + eviction =3D atomic_long_fetch_add_relaxed(nr_pages, &lruvec->evictions); + while ((lruvec =3D parent_lruvec(lruvec))) atomic_long_add(nr_pages, &lruvec->evictions); - } while ((lruvec =3D parent_lruvec(lruvec))); + return eviction; } =20 /** @@ -414,9 +417,8 @@ void *workingset_eviction(struct folio *folio, struct m= em_cgroup *target_memcg) lruvec =3D mem_cgroup_lruvec(target_memcg, pgdat); /* XXX: target_memcg can be NULL, go through lruvec */ memcgid =3D mem_cgroup_private_id(lruvec_memcg(lruvec)); - eviction =3D atomic_long_read(&lruvec->evictions); + eviction =3D workingset_age_nonresident(lruvec, folio_nr_pages(folio)); eviction >>=3D bucket_order[file]; - workingset_age_nonresident(lruvec, folio_nr_pages(folio)); return pack_shadow(memcgid, pgdat, eviction, folio_is_workingset(folio), file); } --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37CF6426EA0 for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=hgyrYCWrM1bFkhI5bDkM4PcdGwmM7iznxgmIZ84hqfDbZJY+D3nNsWGU2N/0G4vw+oLSyoui+9x62mW3/DV/4cR3ZxzxBxf3kLLVgaY3pdD6z+X+bonxM03J9tDNTHmxD31hw3qXEmOBD7M6dyTF6NMXrZ8EhhbI0IM5X/j0hfY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=uio2f/d9kySlX7LZuVl0j2Zqx+2SWezHKqCvlhfuxeg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=uSCwOSHM7RVILHWhHuIYEkTwlxPqWSEmQ1gpCOXU++t3lQslPgXp1xpQJ/rXAkXdz3K9/wvn2vHYXFQYZ8WJdHIwy0efQPsEXVrhg25oii1fFtP4d5HlHlf3jeqoHjOaH2mnhsJT5tjfsiPRA3BAvRL6rmoMtUhXkAjLEMeM/TQ= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pZZ+MemW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pZZ+MemW" Received: by smtp.kernel.org (Postfix) with ESMTPS id 04843C2BCC6; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=uio2f/d9kySlX7LZuVl0j2Zqx+2SWezHKqCvlhfuxeg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=pZZ+MemWp7rO7SJpDU+K2Imd3EW8rsVfuJEprxVxlr87evIL0C4rbqq/11Z1X1Eem dr4mmiokiixCklaUe4wYF7mwaQxqmgQTfsJe1kRgOq81x78w7plA48Zx01QeOvyHgm 1WYqq5U91iAJAVqtuMxaloNHJoNH+hvVqvgOqxFNESM6R8QRoPHMbprSUTo69t9w7Z JDQL/EvPRRF6cHK64crd7zcPJaL6BLlZxv3UeX0uXm3svENW4MF6A06vD/UfQGJtPs HK26S3wVwWjxMJnyOzvU7txHRRHHjrAkBDopJK6YzzjnlvYNZ3uThxBdU97M41btWX 24mGajt82/uwg== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFB3BCCFA13; Fri, 1 May 2026 21:04:13 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:11 +0800 Subject: [PATCH RFC 25/32] mm/workingset, lru_gen: simplify lru_gen recent Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-25-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , 
"Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=4396; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=KbFTzFXmhJEXcTUabj/0KGmAdVaMVZsImEKHZnTCyNA=; b=RZb8GTAX1xI9h34O7aHTxF8zmM5P2cs1w7khJykb58iQUXdSl8fVGbsw1OfZz9Bg07EHTDjJK Yd9ndllnZ4bBx8uRFgZBRXfjzVFN8DheBQywHs+6Y1ygX8I9k5cFIlo X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Move the common part into the common caller. Also let it return early if the memcg doesn't match. Signed-off-by: Kairui Song --- mm/workingset.c | 55 ++++++++++++++++++++++++++---------------------------= -- 1 file changed, 26 insertions(+), 29 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index a77a6c0c3e15..b472ac34943e 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -277,44 +277,41 @@ static void *lru_gen_eviction(struct folio *folio) =20 /* * Tests if the shadow entry is for a folio that was recently evicted. - * Fills in @lruvec, @token, @workingset with the values unpacked from sha= dow.
*/ -static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec, - unsigned long *token, bool *workingset, bool file) +static bool lru_gen_test_recent(struct lruvec *lruvec, + unsigned long token, bool file) { - int memcg_id; unsigned long max_seq; - struct mem_cgroup *memcg; - struct pglist_data *pgdat; - - unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset); =20 - memcg =3D mem_cgroup_from_private_id(memcg_id); - *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); - - max_seq =3D READ_ONCE((*lruvec)->lrugen.max_seq); + max_seq =3D READ_ONCE((lruvec)->lrugen.max_seq); max_seq &=3D (file ? EVICTION_MASK : EVICTION_MASK_ANON) >> LRU_REFS_BITS; =20 - return abs_diff(max_seq, *token >> LRU_REFS_BITS) < MAX_NR_GENS; + return abs_diff(max_seq, token >> LRU_REFS_BITS) < MAX_NR_GENS; } =20 static void lru_gen_refault(struct folio *folio, void *shadow) { bool recent; + int memcg_id; int hist, tier, refs; bool workingset; unsigned long token; struct lruvec *lruvec; + struct mem_cgroup *memcg; + struct pglist_data *pgdat; struct lru_gen_folio *lrugen; int type =3D folio_is_file_lru(folio); int delta =3D folio_nr_pages(folio); =20 - rcu_read_lock(); + unpack_shadow(shadow, &memcg_id, &pgdat, &token, &workingset); =20 - recent =3D lru_gen_test_recent(shadow, &lruvec, &token, &workingset, type= ); + rcu_read_lock(); + memcg =3D mem_cgroup_from_private_id(memcg_id); + lruvec =3D mem_cgroup_lruvec(memcg, pgdat); if (lruvec !=3D folio_lruvec(folio)) goto unlock; =20 + recent =3D lru_gen_test_recent(lruvec, token, type); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta); =20 if (!recent) @@ -347,8 +344,8 @@ static void *lru_gen_eviction(struct folio *folio) return NULL; } =20 -static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec, - unsigned long *token, bool *workingset, bool file) +static bool lru_gen_test_recent(struct lruvec *lruvec, + unsigned long token, bool file) { return false; } @@ -445,19 +442,8 @@ bool workingset_test_recent(void 
*shadow, bool file, b= ool *workingset, unsigned long eviction; int memcgid; =20 - if (lru_gen_enabled()) { - bool recent; - - rcu_read_lock(); - recent =3D lru_gen_test_recent(shadow, &eviction_lruvec, &eviction, - workingset, file); - rcu_read_unlock(); - return recent; - } - rcu_read_lock(); unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset); - eviction <<=3D bucket_order[file]; =20 /* * Look up the memcg associated with the stored ID. It might @@ -482,6 +468,17 @@ bool workingset_test_recent(void *shadow, bool file, b= ool *workingset, =20 if (!mem_cgroup_disabled() && !eviction_memcg) return false; + + eviction_lruvec =3D mem_cgroup_lruvec(eviction_memcg, pgdat); + + if (lru_gen_enabled()) { + bool recent; + + recent =3D lru_gen_test_recent(eviction_lruvec, eviction, file); + mem_cgroup_put(eviction_memcg); + return recent; + } + /* * Flush stats (and potentially sleep) outside the RCU read section. * @@ -495,7 +492,6 @@ bool workingset_test_recent(void *shadow, bool file, bo= ol *workingset, if (flush) mem_cgroup_flush_stats_ratelimited(eviction_memcg); =20 - eviction_lruvec =3D mem_cgroup_lruvec(eviction_memcg, pgdat); refault =3D atomic_long_read(&eviction_lruvec->evictions); =20 /* @@ -514,6 +510,7 @@ bool workingset_test_recent(void *shadow, bool file, bo= ol *workingset, * longest time, so the occasional inappropriate activation * leading to pressure on the active list is not a problem. */ + eviction <<=3D bucket_order[file]; distance =3D ((refault - eviction) & (file ? 
EVICTION_MASK : EVICTION_MASK_ANON)); =20 --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 39D48426EA1 for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=T0EsG1cfp/OlXXn+jvDFHNit4yfxAYMf0VhJBydZ4e2I066UKcPkpLXFOpYCrIW+tYHJoOa/YCc6PYf/iDJ/YqLaGf1p9DcbsyPobzOA/u0xWcLnuaIMU81DGZ+UjPvBq7Hi8dQKf+KhU/JmDznoteZvog6Or9GSUEJVMygSX/0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=tHaw3tfLQDdMlRkLC7d39ZVgj9socWvo29ap6QmaxjM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Oom+oFqqhF9Esxt7R/W/MYLHAW4A01D3rBR9s1iYEt6c6558jwjfgJIqhGOSQtY2NlghiS4JwuvtT3z87RIcfKxPqDw++q969X7ZbI9jpKiazY6d6wBYek5b49/wk3X4epdfwGZvjk5kwZ3WzX4wH9kdnQA7ULjrh+orHNsEScM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sWQhubef; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sWQhubef" Received: by smtp.kernel.org (Postfix) with ESMTPS id 19DE7C2BCB8; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=tHaw3tfLQDdMlRkLC7d39ZVgj9socWvo29ap6QmaxjM=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=sWQhubefyPkGM4BWTPRy9Ywc1mFfVZImQgysH0xEuLNngQhjfRu+AiaIBOLsyV7V1 nnKFhSHSZn2qJboqJtZouUN/2TiEGYKfuUnaV8NHxbAJZMOfv3lKHZXY7WZq/cYZ2z 
pKOiIwTqiJrM7Yg4lhsWbQLXdV8H9qiHaeNESV9VwRHOIvwG22h11eKXxpRgUWyR/u bmgyw/643c9wWx4/4qQ9NRCmIXcu1eirs6+hoWeFLMM1eFfwkHUDFUPl3bsPqd0nRy ArphTz5gDU/w0rF9V6bGrhIUUlT4kSsTHebmWqLz6SRxwgGPJSeOXJ30OWqM9TulNK WkBUXlkQmXAJg== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 103FECD3423; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:12 +0800 Subject: [PATCH RFC 26/32] mm/workingset: properly define the format of a folio shadow Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-26-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=6409; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=BHsPRvLdIXuKL3/5pLqvcCuBLzJEpjMu3hI5uV7aekI=; b=AcTOgAnvvy963ARcizG0u5Uont1pyDQrOxeIwl5U9ZU8DRZnP0MT4tUKr19pgaGqHDFeirdzz XLtUak8Rg5sBgd6KvulitmR+E8YDSYWip1TC4e5UormFnuF5hLwTJ2u X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= 
X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song The shadow of an evicted folio can be roughly divided into two parts: - The common and mandatory pack info, which contains the memcg info, the workingset bit, and the pgdat. - The LRU-specific eviction info, which is a "timestamp" for the Active/Inactive LRU and a generation sequence number for MGLRU. The pack info is the same for both the Active/Inactive LRU and MGLRU, and it is stored exactly. The eviction info, by contrast, may be truncated. That is acceptable because the eviction info is only a hint for the LRU to decide what to do with a refaulted folio, and in the worst case a wrong hint has only a limited effect on the system's performance. Add some comments on this, and consolidate the macros for these two parts. Signed-off-by: Kairui Song --- mm/workingset.c | 61 +++++++++++++++++++++++++++++++++++++----------------= ---- 1 file changed, 40 insertions(+), 21 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index b472ac34943e..622e00ac28b6 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -184,13 +184,35 @@ * refault distance will immediately activate the refaulting page. */ =20 -#define WORKINGSET_SHIFT 1 -#define EVICTION_SHIFT ((BITS_PER_LONG - BITS_PER_XA_VALUE) + \ - WORKINGSET_SHIFT + NODES_SHIFT + \ - MEM_CGROUP_ID_SHIFT) -#define EVICTION_SHIFT_ANON (EVICTION_SHIFT + SWAP_COUNT_SHIFT) -#define EVICTION_MASK (~0UL >> EVICTION_SHIFT) -#define EVICTION_MASK_ANON (~0UL >> EVICTION_SHIFT_ANON) +/* + * Active/Inactive LRU, MGLRU have different info embedded in the shadow.
+ * Shadow format: + * / LRU Eviction Info \ / LRU Pack Info \ + * +----------------------------+----------------+-+ + * non-MGLRU: |SC| eviction timestamp | NID | MCID | W |1| + * MGLRU: |SC| seq number | refs | NID | MCID | W |1| + * ^ ^ ^ ^ ^ + * Swap Count (anon only) NUMA ID (NODES_SHIFT)-+ | | XA_VALUE + * Memory Cgroup ID (MEM_CGROUP_ID_SHIFT) --------+ | mark + * Workingset Bit (WORKINGSET_SHIFT) --------+ + * + * Shadow is a XA_VALUE, 63 / 31 bits are usable. + * + * The LRU pack info part is used to identify which lruvec a folio was + * evicted from. This part is always accurate so we never lose the + * basic track of faults on each lruvec. + * + * Eviction info is either a snapshot of the `evictions` counter of an + * lruvec when the folio was evicted (lru timestamp, for active/inactive + * LRU), or the min_seq number when the folio was evicted (MGLRU). This + * part may have shrunk, so we may get inaccurate info, which is usually + * fine and could be tolerated. + */ +#define WORKINGSET_SHIFT 1 +#define LRU_PACK_BITS (NODES_SHIFT + MEM_CGROUP_ID_SHIFT + \ + WORKINGSET_SHIFT) +#define LRU_EVICT_BITS (BITS_PER_XA_VALUE - LRU_PACK_BITS) +#define LRU_EVICT_BITS_ANON (LRU_EVICT_BITS - SWAP_COUNT_SHIFT) =20 /* * LRU refs uses LRU_REFS_WIDTH + 2 bits, the 2 bits are PG_workingset and @@ -212,7 +234,9 @@ static unsigned int bucket_order[ANON_AND_FILE] __read_= mostly; static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long evic= tion, bool workingset, bool file) { - eviction &=3D file ? EVICTION_MASK : EVICTION_MASK_ANON; + BUILD_BUG_ON(LRU_EVICT_BITS_ANON <=3D SWAP_COUNT_SHIFT); + + eviction &=3D BIT(file ? 
LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - 1; eviction =3D (eviction << MEM_CGROUP_ID_SHIFT) | memcgid; eviction =3D (eviction << NODES_SHIFT) | pgdat->node_id; eviction =3D (eviction << WORKINGSET_SHIFT) | workingset; @@ -257,8 +281,7 @@ static void *lru_gen_eviction(struct folio *folio) struct pglist_data *pgdat =3D folio_pgdat(folio); unsigned short memcg_id; =20 - BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_BITS > - BITS_PER_LONG - max(EVICTION_SHIFT, EVICTION_SHIFT_ANON)); + BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_BITS > LRU_EVICT_BITS_ANON); =20 rcu_read_lock(); memcg =3D folio_memcg(folio); @@ -284,7 +307,7 @@ static bool lru_gen_test_recent(struct lruvec *lruvec, unsigned long max_seq; =20 max_seq =3D READ_ONCE((lruvec)->lrugen.max_seq); - max_seq &=3D (file ? EVICTION_MASK : EVICTION_MASK_ANON) >> LRU_REFS_BITS; + max_seq &=3D BIT((file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - LRU_REFS= _BITS) - 1; =20 return abs_diff(max_seq, token >> LRU_REFS_BITS) < MAX_NR_GENS; } @@ -512,7 +535,7 @@ bool workingset_test_recent(void *shadow, bool file, bo= ol *workingset, */ eviction <<=3D bucket_order[file]; distance =3D ((refault - eviction) & - (file ? EVICTION_MASK : EVICTION_MASK_ANON)); + (BIT(file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - 1)); =20 /* * Compare the distance to the existing workingset size. We @@ -781,12 +804,10 @@ static struct lock_class_key shadow_nodes_key; =20 static int __init workingset_init(void) { - unsigned int timestamp_bits, timestamp_bits_anon; struct shrinker *workingset_shadow_shrinker; unsigned int max_order; int ret =3D -ENOMEM; =20 - BUILD_BUG_ON(BITS_PER_LONG < EVICTION_SHIFT); /* * Calculate the eviction bucket size to cover the longest * actionable refault distance, which is currently half of @@ -794,15 +815,13 @@ static int __init workingset_init(void) * some more pages at runtime, so keep working with up to * double the initial memory by using totalram_pages as-is. 
*/ - timestamp_bits =3D BITS_PER_LONG - EVICTION_SHIFT; - timestamp_bits_anon =3D BITS_PER_LONG - EVICTION_SHIFT_ANON; max_order =3D fls_long(totalram_pages() - 1); - if (max_order > (BITS_PER_LONG - EVICTION_SHIFT)) - bucket_order[WORKINGSET_FILE] =3D max_order - timestamp_bits; - if (max_order > timestamp_bits_anon) - bucket_order[WORKINGSET_ANON] =3D max_order - timestamp_bits_anon; + if (max_order > LRU_EVICT_BITS) + bucket_order[WORKINGSET_FILE] =3D max_order - LRU_EVICT_BITS; + if (max_order > LRU_EVICT_BITS_ANON) + bucket_order[WORKINGSET_ANON] =3D max_order - LRU_EVICT_BITS_ANON; pr_info("workingset: timestamp_bits=3D%d (anon: %d) max_order=3D%d bucket= _order=3D%u (anon: %d)\n", - timestamp_bits, timestamp_bits_anon, max_order, + LRU_EVICT_BITS, LRU_EVICT_BITS_ANON, max_order, bucket_order[WORKINGSET_FILE], bucket_order[WORKINGSET_ANON]); =20 workingset_shadow_shrinker =3D shrinker_alloc(SHRINKER_NUMA_AWARE | --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4BE66426EAC for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=vDuyHN88hmflV3eYtTBn5+ZX8mP3Yu8W1Xal+9OKGHup/gV8pVjicqGDqlArxjPGb5srsjdnZ1NyGtwNIu+y145iUYMbv5sQpWz+AJXqTKw2TmZsMoN7Nl8XdtHpCAJP9Yu7sNN5cavNx/QWzLg4tla0u4hFQzDx5sXCIPgK7VQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=EZl06XMXdG5hHyjUR1NFTv/KzrDGXtu69mN8a47gXHc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=R0CYE5ozPlzY6WTP4d28rzUrb/uLvr53/dVkctjywAbcrx+o8XK9apn0XkUi1dkLMMC3ovKwTlgaE6hAMbzBJdLAuPaR2AYZBL1x3N3j3+BPyNyzvXZFknsJF8/oTVWec31DJs6ImVEZdrer0mIsydhdBtRBQrjzOBzL8S0xouI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mA6Rfsig; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mA6Rfsig" Received: by smtp.kernel.org (Postfix) with ESMTPS id 2B62DC2BCB4; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=EZl06XMXdG5hHyjUR1NFTv/KzrDGXtu69mN8a47gXHc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=mA6Rfsig+XZYNj7U/Zy2ZXvj1j3kRtHYm3zR+THVmyImSOxOlZb3zsOFC5OEh8zM+ Nch/I4+D6ug9K0m5kVNEoG96iNJDyUZG9PjJGWM72F6gLG8848hPP0byDeRYOwpRVb 6azUYxUx1/MqnUg9bIYgOQwLZell8kR0YlR3U1eYDxobAaBnb+PinMofGCObwz2APp Rt8ppNlxzlh9opLmOtrAaKXfKx3MaT3hP7iSb9LC6hxlXcHS4Bdq9+kaDQmuyaedt+ cMc8WCriDX0N8DVeE98tCUQ2x7xBO4dr0E1G05Tt3pv7ZrAQWI09auVzbFTzXziljx jsRa3PkYCOC0A== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2347FCD342C; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:13 +0800 Subject: [PATCH RFC 27/32] mm/workingset: move refault distance checking into a helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-27-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , 
Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=7913; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=bwYzIJmY3Hqa83k+2m4zOivhjtGZdKIukV17N5RFO4g=; b=406q4TDrkyFVDQ5IMpGCHCP44WNYxwuZQGPzaniAOeZA2dwckbGSgUqvgkWv/KqqeqF9PROsP Cb4l7vx1BkkC9r/wmxg3Sx2QxppoNWXmWgFDYcIRETQ5+fcG45eU5gT X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song There isn't any feature change, just move the refault distance checking logic into a standalone helper so it can be reused later. Signed-off-by: Kairui Song --- mm/workingset.c | 136 ++++++++++++++++++++++++++++++++--------------------= ---- 1 file changed, 78 insertions(+), 58 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index 622e00ac28b6..e756b0cc14b5 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -234,9 +234,6 @@ static unsigned int bucket_order[ANON_AND_FILE] __read_= mostly; static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long evic= tion, bool workingset, bool file) { - BUILD_BUG_ON(LRU_EVICT_BITS_ANON <=3D SWAP_COUNT_SHIFT); - - eviction &=3D BIT(file ? 
LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - 1; eviction =3D (eviction << MEM_CGROUP_ID_SHIFT) | memcgid; eviction =3D (eviction << NODES_SHIFT) | pgdat->node_id; eviction =3D (eviction << WORKINGSET_SHIFT) | workingset; @@ -264,6 +261,77 @@ static void unpack_shadow(void *shadow, int *memcgidp,= pg_data_t **pgdat, *workingsetp =3D workingset; } =20 +/** + * lru_eviction - notifies eviction of an folio on an lruvec + * @lruvec: the lruvec the folio belongs to + * @nr_pages: size of the folio + * + * As in-memory folio is evicted, increase the eviction counter on + * the LRU and return its current reading. + */ +static inline unsigned long lru_eviction(struct lruvec *lruvec, int nr_pag= es, + int bits, int bucket_order) +{ + unsigned long eviction; + + /* + * Reclaiming a cgroup means reclaiming all its children in a + * round-robin fashion. That means that each cgroup has an LRU + * order that is composed of the LRU orders of its child + * cgroups; and every page has an LRU position not just in the + * cgroup that owns it, but in all of that group's ancestors. + * + * So when the physical inactive list of a leaf cgroup ages, + * the virtual inactive lists of all its parents, including + * the root cgroup's, age as well. 
+ */ + BUILD_BUG_ON(LRU_EVICT_BITS_ANON <=3D SWAP_COUNT_SHIFT); + eviction =3D atomic_long_fetch_add_relaxed(nr_pages, &lruvec->evictions); + while ((lruvec =3D parent_lruvec(lruvec))) + atomic_long_add(nr_pages, &lruvec->evictions); + + /* Truncate the timestamp to fit in limited bits */ + eviction >>=3D bucket_order; + eviction &=3D (BIT(bits) - 1); + return eviction; +} + +/** + * lru_distance - calculate the refault distance of a refaulted folio + * @lruvec: the lruvec the folio belongs to before eviction + * @eviction: eviction timestamp recorded in the shadow + * @bits: number of bits used to encode the timestamp + * @bucket_order: bucket order used to truncate the timestamp + * + * Read the lruvec's current eviction counter and return the refault + * distance. + */ +static inline unsigned long lru_distance(struct lruvec *lruvec, + unsigned long eviction, + int bits, int bucket_order) +{ + unsigned long refault; + + eviction <<=3D bucket_order; + refault =3D atomic_long_read(&lruvec->evictions); + + /* + * The unsigned subtraction here gives an accurate distance + * across evictions overflows in most cases. There is a + * special case: usually, shadow entries have a short lifetime + * and are either refaulted or reclaimed along with the inode + * before they get too old. But it is not impossible for the + * evictions to lap a shadow entry in the field, which + * can then result in a false small refault distance, leading + * to a false activation should this old entry actually + * refault again. However, earlier kernels used to deactivate + * unconditionally with *every* reclaim invocation for the + * longest time, so the occasional inappropriate activation + * leading to pressure on the active list is not a problem. 
+ */ + return (refault - eviction) & (BIT(bits) - 1); +} + #ifdef CONFIG_LRU_GEN =20 static void *lru_gen_eviction(struct folio *folio) @@ -379,37 +447,6 @@ static void lru_gen_refault(struct folio *folio, void = *shadow) =20 #endif /* CONFIG_LRU_GEN */ =20 -/** - * workingset_age_nonresident - age non-resident entries as LRU ages - * @lruvec: the lruvec that was aged - * @nr_pages: the number of pages to count - * - * As in-memory pages are aged, non-resident pages need to be aged as - * well, in order for the refault distances later on to be comparable - * to the in-memory dimensions. This function allows reclaim and LRU - * operations to drive the non-resident aging along in parallel. - */ -static long workingset_age_nonresident(struct lruvec *lruvec, unsigned lon= g nr_pages) -{ - unsigned long eviction; - - /* - * Reclaiming a cgroup means reclaiming all its children in a - * round-robin fashion. That means that each cgroup has an LRU - * order that is composed of the LRU orders of its child - * cgroups; and every page has an LRU position not just in the - * cgroup that owns it, but in all of that group's ancestors. - * - * So when the physical inactive list of a leaf cgroup ages, - * the virtual inactive lists of all its parents, including - * the root cgroup's, age as well. 
- */ - eviction =3D atomic_long_fetch_add_relaxed(nr_pages, &lruvec->evictions); - while ((lruvec =3D parent_lruvec(lruvec))) - atomic_long_add(nr_pages, &lruvec->evictions); - return eviction; -} - /** * workingset_eviction - note the eviction of a folio from memory * @target_memcg: the cgroup that is causing the reclaim @@ -437,8 +474,9 @@ void *workingset_eviction(struct folio *folio, struct m= em_cgroup *target_memcg) lruvec =3D mem_cgroup_lruvec(target_memcg, pgdat); /* XXX: target_memcg can be NULL, go through lruvec */ memcgid =3D mem_cgroup_private_id(lruvec_memcg(lruvec)); - eviction =3D workingset_age_nonresident(lruvec, folio_nr_pages(folio)); - eviction >>=3D bucket_order[file]; + eviction =3D lru_eviction(lruvec, folio_nr_pages(folio), + file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON, + bucket_order[file]); return pack_shadow(memcgid, pgdat, eviction, folio_is_workingset(folio), file); } @@ -458,7 +496,7 @@ void *workingset_eviction(struct folio *folio, struct m= em_cgroup *target_memcg) bool workingset_test_recent(void *shadow, bool file, bool *workingset, bool flush) { - unsigned long refault, distance, active, inactive; + unsigned long distance, active, inactive; struct mem_cgroup *eviction_memcg; struct lruvec *eviction_lruvec; struct pglist_data *pgdat; @@ -515,27 +553,9 @@ bool workingset_test_recent(void *shadow, bool file, b= ool *workingset, if (flush) mem_cgroup_flush_stats_ratelimited(eviction_memcg); =20 - refault =3D atomic_long_read(&eviction_lruvec->evictions); - - /* - * Calculate the refault distance - * - * The unsigned subtraction here gives an accurate distance - * across evictions overflows in most cases. There is a - * special case: usually, shadow entries have a short lifetime - * and are either refaulted or reclaimed along with the inode - * before they get too old. 
But it is not impossible for the - * evictions to lap a shadow entry in the field, which - * can then result in a false small refault distance, leading - * to a false activation should this old entry actually - * refault again. However, earlier kernels used to deactivate - * unconditionally with *every* reclaim invocation for the - * longest time, so the occasional inappropriate activation - * leading to pressure on the active list is not a problem. - */ - eviction <<=3D bucket_order[file]; - distance =3D ((refault - eviction) & - (BIT(file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - 1)); + distance =3D lru_distance(eviction_lruvec, eviction, + file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON, + bucket_order[file]); =20 /* * Compare the distance to the existing workingset size. We --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E92E426EB1 for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=TOzoTzJlrPJWJuYIZmLuyqVDXparmu2BKrW7uSb2sojVCpHusCfcIUIiO0zpwFXkIez7ILxoIp/i4tuiLDpxj2wHgi8SaZEpcT66yoqyNFGauY/D2WZ/+UDkgcecjdmvg45pGEWDdpZb8L575g7E23AM19Ictu53GlPOc86UqRc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=iJExWTQRPkiwcYj7KBz1453v6H0elkPygOR5Ae/2kfg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=LbxegST5DaipujxYtn1nZo49antiq5gTcx82YhZXhLQs3qNmgseNa61sBj8Tjqfe47u8feL7QWTX7PkADTISWs5YofQMdheLHZP/trXLshsPDdD+4IhAv8TeOIOYO2hyFzfNCm6ceYhxctldqV3sexgfYFZvHccOVJd+mw3JcpI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit 
key) header.d=kernel.org header.i=@kernel.org header.b=C03jirgi; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="C03jirgi" Received: by smtp.kernel.org (Postfix) with ESMTPS id 40056C2BCC4; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=iJExWTQRPkiwcYj7KBz1453v6H0elkPygOR5Ae/2kfg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=C03jirginn7K28FqB0HDkVVvV1f+S0NqIZLWLYIXy24kD/B1Y+oxUepgM0HeS0uTf 1JA74vEc3bZZeaN2Ajac6Ra9siO9/v+fsxahelnX9SLYyrF2u5eSkD0x2GQqcjcdNV uQN06rDp98RCK0xgD47RnEvkkmMag7ecp8y04ygwAqGLOfEufyHrh+k1fOjEyUwtnp OXxeFb1faKjven7+/FmtcAckVSc0noL953cG7Ev2c8wZ8aq6FvAf3aD4F1zRc4noKU fqa4bGpZ86k75m+llHYDr54XInq1ILZmsEO3jV4vLX7hIfIwp1vO2xkKTI/HO0qMh9 05tkudkewn54Q== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DF7CCFA13; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:14 +0800 Subject: [PATCH RFC 28/32] mm/workingset: split lruvec retrieving and flush into a helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-28-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , 
Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=8304; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=CannGTZuhdb+chYR/x/lEwpMQkSHEkSpZfE0GVG1Xvg=; b=oXBiImfpMiql1tAMBZqHjNGBH+90Hveu3ZZG7nw1NvBuBrivKpGFuLylAq0UHD0SBKaV/+h7T khWYi0AglAVD1r4EUq/bDzIigkek7M7F7l5+pz4g9OXg6Qjj0Zb0PwG X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song Soon MGLRU will share the common routine for refault distance checking, so make a few helpers for that. No feature change. Signed-off-by: Kairui Song --- mm/workingset.c | 189 +++++++++++++++++++++++++++++-----------------------= ---- 1 file changed, 98 insertions(+), 91 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index e756b0cc14b5..5c52dd835a92 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -261,6 +261,60 @@ static void unpack_shadow(void *shadow, int *memcgidp,= pg_data_t **pgdat, *workingsetp =3D workingset; } =20 +static struct lruvec *try_unpack_get_lruvec(void *shadow, + unsigned long *eviction, + bool *workingset, bool flush) +{ + int memcgid; + struct mem_cgroup *memcg; + struct pglist_data *pgdat; + + unpack_shadow(shadow, &memcgid, &pgdat, eviction, workingset); + + /* + * Look up the memcg associated with the stored ID. It might + * have been deleted since the folio's eviction. + * + * Note that in rare events the ID could have been recycled + * for a new cgroup that refaults a shared folio. This is + * impossible to tell from the available data. 
However, this + * should be a rare and limited disturbance, and activations + * are always speculative anyway. Ultimately, it's the aging + * algorithm's job to shake out the minimum access frequency + * for the active cache. + * + * XXX: On !CONFIG_MEMCG, this will always return NULL; it + * would be better if the root_mem_cgroup existed in all + * configurations instead. + */ + rcu_read_lock(); + memcg =3D mem_cgroup_from_private_id(memcgid); + if (!mem_cgroup_tryget(memcg)) + memcg =3D NULL; + rcu_read_unlock(); + + if (!mem_cgroup_disabled() && !memcg) + return NULL; + + /* + * Flush stats (and potentially sleep) outside the RCU read section. + * XXX: With per-memcg flushing and thresholding, is ratelimiting + * still needed here? + */ + if (memcg && flush) + mem_cgroup_flush_stats_ratelimited(memcg); + + return mem_cgroup_lruvec(memcg, pgdat); +} + +static void put_lruvec(struct lruvec *lruvec) +{ + if (mem_cgroup_disabled()) + return; + + mem_cgroup_put(lruvec_memcg(lruvec)); +} + /** * lru_eviction - notifies eviction of an folio on an lruvec * @lruvec: the lruvec the folio belongs to @@ -383,30 +437,25 @@ static bool lru_gen_test_recent(struct lruvec *lruvec, static void lru_gen_refault(struct folio *folio, void *shadow) { bool recent; - int memcg_id; int hist, tier, refs; bool workingset; unsigned long token; struct lruvec *lruvec; - struct mem_cgroup *memcg; - struct pglist_data *pgdat; struct lru_gen_folio *lrugen; int type =3D folio_is_file_lru(folio); int delta =3D folio_nr_pages(folio); =20 - unpack_shadow(shadow, &memcg_id, &pgdat, &token, &workingset); - - rcu_read_lock(); - memcg =3D mem_cgroup_from_private_id(memcg_id); - lruvec =3D mem_cgroup_lruvec(memcg, pgdat); + lruvec =3D try_unpack_get_lruvec(shadow, &token, &workingset, false); + if (!lruvec) + return; if (lruvec !=3D folio_lruvec(folio)) - goto unlock; + goto out_put; =20 recent =3D lru_gen_test_recent(lruvec, token, type); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta); 
=20 if (!recent) - goto unlock; + goto out_put; =20 lrugen =3D &lruvec->lrugen; =20 @@ -424,8 +473,8 @@ static void lru_gen_refault(struct folio *folio, void *= shadow) folio_set_lru_refs(folio, refs); mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta); } -unlock: - rcu_read_unlock(); +out_put: + put_lruvec(lruvec); } =20 #else /* !CONFIG_LRU_GEN */ @@ -494,91 +543,49 @@ void *workingset_eviction(struct folio *folio, struct= mem_cgroup *target_memcg) * Return: true if the shadow is for a recently evicted folio; false other= wise. */ bool workingset_test_recent(void *shadow, bool file, bool *workingset, - bool flush) + bool flush) { - unsigned long distance, active, inactive; - struct mem_cgroup *eviction_memcg; - struct lruvec *eviction_lruvec; - struct pglist_data *pgdat; + struct lruvec *lruvec; unsigned long eviction; - int memcgid; - - rcu_read_lock(); - unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset); - - /* - * Look up the memcg associated with the stored ID. It might - * have been deleted since the folio's eviction. - * - * Note that in rare events the ID could have been recycled - * for a new cgroup that refaults a shared folio. This is - * impossible to tell from the available data. However, this - * should be a rare and limited disturbance, and activations - * are always speculative anyway. Ultimately, it's the aging - * algorithm's job to shake out the minimum access frequency - * for the active cache. - * - * XXX: On !CONFIG_MEMCG, this will always return NULL; it - * would be better if the root_mem_cgroup existed in all - * configurations instead. 
- */ - eviction_memcg =3D mem_cgroup_from_private_id(memcgid); - if (!mem_cgroup_tryget(eviction_memcg)) - eviction_memcg =3D NULL; - rcu_read_unlock(); - - if (!mem_cgroup_disabled() && !eviction_memcg) - return false; - - eviction_lruvec =3D mem_cgroup_lruvec(eviction_memcg, pgdat); + unsigned long active, inactive; + unsigned long distance; + bool recent; =20 if (lru_gen_enabled()) { - bool recent; - - recent =3D lru_gen_test_recent(eviction_lruvec, eviction, file); - mem_cgroup_put(eviction_memcg); - return recent; - } - - /* - * Flush stats (and potentially sleep) outside the RCU read section. - * - * Note that workingset_test_recent() itself might be called in RCU read - * section (for e.g, in cachestat) - these callers need to skip flushing - * stats (via the flush argument). - * - * XXX: With per-memcg flushing and thresholding, is ratelimiting - * still needed here? - */ - if (flush) - mem_cgroup_flush_stats_ratelimited(eviction_memcg); - - distance =3D lru_distance(eviction_lruvec, eviction, - file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON, - bucket_order[file]); - - /* - * Compare the distance to the existing workingset size. We - * don't activate pages that couldn't stay resident even if - * all the memory was available to the workingset. Whether - * workingset competition needs to consider anon or not depends - * on having free swap space. 
- */ - active =3D lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE); - inactive =3D lruvec_page_state(eviction_lruvec, NR_INACTIVE_FILE); - - if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) { - active +=3D lruvec_page_state(eviction_lruvec, NR_ACTIVE_ANON); - inactive +=3D lruvec_page_state(eviction_lruvec, NR_INACTIVE_ANON); + lruvec =3D try_unpack_get_lruvec(shadow, &eviction, workingset, false); + if (!lruvec) + return false; + recent =3D lru_gen_test_recent(lruvec, eviction, file); + } else { + lruvec =3D try_unpack_get_lruvec(shadow, &eviction, workingset, flush); + if (!lruvec) + return false; + distance =3D lru_distance(lruvec, eviction, + file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON, + bucket_order[file]); + /* + * Compare the distance to the existing workingset size. We + * don't activate pages that couldn't stay resident even if + * all the memory was available to the workingset. Whether + * workingset competition needs to consider anon or not depends + * on having free swap space. + */ + active =3D lruvec_page_state(lruvec, NR_ACTIVE_FILE); + inactive =3D lruvec_page_state(lruvec, NR_INACTIVE_FILE); + if (mem_cgroup_get_nr_swap_pages(lruvec_memcg(lruvec)) > 0) { + active +=3D lruvec_page_state(lruvec, NR_ACTIVE_ANON); + inactive +=3D lruvec_page_state(lruvec, NR_INACTIVE_ANON); + } + /* + * Be cautious about challenging the existing active working + * set; sacrificing the inactive part of the opposite type + * should be safe. + */ + recent =3D distance <=3D (active + inactive) / 2; } =20 - mem_cgroup_put(eviction_memcg); - - /* - * Be cautious about challenging the existing active working set; - * sacrificing the inactive part of the opposite type should be safe. 
- */ - return distance <=3D (active + inactive) / 2; + put_lruvec(lruvec); + return recent; } =20 /** --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E2FC426EB6 for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=uId4l0ZeV0kCkrNj6r+kQjx4QD/4to0/7f8SsW3A88Uo0AfGyVtLXYlA2f0Ni1/b7lBqTLbPlL0fHBukIFzze50XIywrbQUCgaV+N3qLFGKQcn9uaZj6Vm2D8nj2esR7iorOow53vF1rT6EtItXAmo8TV6PTB+qMQqjtF6UZqRQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=lVTNcFnFgfnB5g69KkGFxL7/LFAt8CkEJCpVqbo55Qg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=PKVudHdU53mjpo+JB0HP5EZNJSncd1FXZ5DYek9LZWStF9KORnR43NsHyp6pTQTpx/0j2uDKWw+4kJwTqRKz1nMxuyRuX4X0yxYG+4fGctAR8XR81hjISr8x+pH9+18rOEg7LXXEuHZr/BVfwvC4TBZCJvcwJ/+fwGrVV57ZujQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KlNZVWzB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KlNZVWzB" Received: by smtp.kernel.org (Postfix) with ESMTPS id 52A39C2BCF6; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=lVTNcFnFgfnB5g69KkGFxL7/LFAt8CkEJCpVqbo55Qg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=KlNZVWzBaixt05Wnl+6RSDBZDXM6yR7Mz1zwDfJFWU405rbH/My0xpog0FIRyEa4G 
iURkjKqVpkU9WjegsF4LwIpWO20ARiHqX5dhX2+nnC+TII019JXnzSCHLoukVHZsgF paDxZtieL2WTTbb5nbVKN0dq7iupqQtkznvlTc5a1N9MYbEfHpCkYLxkjQpp2RTgZZ e0ngp3ODlzrn0TQ2dtyjWw+J1AYq/Zpuk208cRjkhKaC0Mi9mQnKthwxeZsPN6XCKv Sp8p7VFPfXr0QxQplDBCBEu0AkaglrzMgt+rmam/500glY/ANm4f0EMmUJxuxkt8X7 XmVBOJNl6XEbw== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48BECCD342C; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:15 +0800 Subject: [PATCH RFC 29/32] mm/mglru: convert avg_total and avg_refaulted to atomic Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-29-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=3247; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=hmT6vqop0NqUJUmZIf0OMLsTgQI+2TaaPPU/LnVzSzc=; b=g+0w/grZtdu1EfjPTqryehzK0cTREjDFDKafsOtlliKbG3uXxYucXFymfVL6KoJwofcPFE2cs gfKrnc8/qUECWpRaVLAaJ6HpGiIuc964Lmc3JDsXDYQuw5ArvJXTStI X-Developer-Key: i=kasong@tencent.com; a=ed25519; 
pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song No feature change; make it possible to update these values in parallel. Signed-off-by: Kairui Song --- include/linux/mmzone.h | 4 ++-- mm/vmscan.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 6747e1c6079c..aa27627b0406 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -603,9 +603,9 @@ struct lru_gen_folio { /* the multi-gen LRU sizes, eventually consistent */ atomic_long_t nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the exponential moving average of refaulted */ - unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; + atomic_long_t avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; /* the exponential moving average of evicted+protected */ - unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS]; + atomic_long_t avg_total[ANON_AND_FILE][MAX_NR_TIERS]; /* can only be modified under the LRU lock */ unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; /* can be modified without holding the LRU lock */ diff --git a/mm/vmscan.c b/mm/vmscan.c index e6631ad03caa..22e77450c1b4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3140,9 +3140,9 @@ static void read_ctrl_pos(struct lruvec *lruvec, int = type, int tier, int gain, pos->refaulted =3D pos->total =3D 0; =20 for (i =3D tier % MAX_NR_TIERS; i <=3D min(tier, MAX_NR_TIERS - 1); i++) { - pos->refaulted +=3D lrugen->avg_refaulted[type][i] + + pos->refaulted +=3D atomic_long_read(&lrugen->avg_refaulted[type][i]) + atomic_long_read(&lrugen->refaulted[hist][type][i]); - pos->total +=3D lrugen->avg_total[type][i] + + pos->total +=3D atomic_long_read(&lrugen->avg_total[type][i]) + lrugen->protected[hist][type][i] + atomic_long_read(&lrugen->evicted[hist][type][i]); } @@ -3166,14 +3166,14 
@@ static void reset_ctrl_pos(struct lruvec *lruvec, i= nt type, bool carryover) if (carryover) { unsigned long sum; =20 - sum =3D lrugen->avg_refaulted[type][tier] + + sum =3D atomic_long_read(&lrugen->avg_refaulted[type][tier]) + atomic_long_read(&lrugen->refaulted[hist][type][tier]); - WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2); + atomic_long_set(&lrugen->avg_refaulted[type][tier], sum / 2); =20 - sum =3D lrugen->avg_total[type][tier] + + sum =3D atomic_long_read(&lrugen->avg_total[type][tier]) + lrugen->protected[hist][type][tier] + atomic_long_read(&lrugen->evicted[hist][type][tier]); - WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2); + atomic_long_set(&lrugen->avg_total[type][tier], sum / 2); } =20 if (clear) { @@ -5466,8 +5466,8 @@ static void lru_gen_seq_show_full(struct seq_file *m,= struct lruvec *lruvec, =20 if (seq =3D=3D max_seq) { s =3D "RTx"; - n[0] =3D READ_ONCE(lrugen->avg_refaulted[type][tier]); - n[1] =3D READ_ONCE(lrugen->avg_total[type][tier]); + n[0] =3D atomic_long_read(&lrugen->avg_refaulted[type][tier]); + n[1] =3D atomic_long_read(&lrugen->avg_total[type][tier]); } else if (seq =3D=3D min_seq[type] || NR_HIST_GENS > 1) { s =3D "rep"; n[0] =3D atomic_long_read(&lrugen->refaulted[hist][type][tier]); --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 85A87426EBC for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=XbSpXMhI4VEKgxg2i05aIlSTOxYQQRPRwThAfHAdILJAUt1kP1qJkiGEztBq9ANfm5gbgj7KbnV8rJe3wSWnWRBfNgZ6N5HXZ6kOFIKD2qXD3D/2KtF9UAA7Mne9VvPqvwiIeAEb+/LK9AZM5MY0O3yhxEWtWALvIiDVeHjWHbA= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=9bXLr8DWOxGjog7CVEQ5YwCoruTB5F+PQs0K9k/GJFo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=AbRe20C4riq3t92RU5AFxc+tMs6hPReDg/HZ12mJTE0jbFimNxm8EW/AjFzGCJlfk35KwqL86m7e387mO/Al4ivWXAYZEvNne4M30eUbp52kcQl+WsIABT1mjpXGR1EAARpZE9KtTVhzGH4uPIfnACWW2vegq4THECCV8HE6o7U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=r7+EWpuE; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="r7+EWpuE" Received: by smtp.kernel.org (Postfix) with ESMTPS id 67E3BC2BCB8; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=9bXLr8DWOxGjog7CVEQ5YwCoruTB5F+PQs0K9k/GJFo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=r7+EWpuErCrOya0sV1kyeAzHASSkfmQlVbh+udt5YtToGeyv/eMeS4NAZTuQVu9zm IN6VT9NAHiQkfX27XqdT4xnHA9kbYSQO3qlgWc2st25o6hZDmrUppmTzj1jMrrTOv6 wlw/eVAA/tKo03bHbTe2dBS5SLbe8b+0PHogFuE/zetbcQFmgtcaJgXzoqR7r/36AS B7eL+t7pb9HHPjUimJjyeO8VoWIdEFKGgXJDvSkL8+YNHEVE4YKLD3WEjQFoZ9Scvt +zUVnepCOhZXKMewrQAuDOk9hutQ5juR7nCqcam3SDsqGXKV7+k1H2IhpjBWuY1pDi oohcoaVnL87NA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D4E6CCFA13; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:16 +0800 Subject: [PATCH RFC 30/32] mm/mglru, workingset: apply refault-distance based re-activation Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: 
<20260502-mglru-fg-v1-30-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=9815; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=3SDg1UIwE2lOt/3QEIDXKbHVzTms1xk8wQaCfFECvFg=; b=IVyQkXaCapdKIeTGy01jKs/3iCm+Yy/0+bi99dDFnHITieJXGhW1YYpbViK7xMYWbGx8vZTm7 ri+vMaIdqD0Ceelh1IhYMLmwcQ1jYrRjxSGiLndWjW7SCZTFXvN1zJV X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song I noticed MGLRU not working very well on certain workloads, as observed on some heavily stressed databases. This happens when the file page workingset size exceeds total memory and the access distance of file pages is also larger than total memory. All file pages can get stuck on the oldest generation, being read in and then evicted over and over. Although the anon pages are idle, they never get aged. The PID controller doesn't kick in until there are some minor access pattern changes, and file pages are not promoted or reused.
Even though memory can't cover the whole workingset, refault-distance based re-activation can hold part of the workingset in memory and reduce the IO workload significantly, so apply it to MGLRU as well. The updated refault-distance model fits MGLRU well in most cases if we treat the last two generations as the inactive LRU and the first two generations as the active LRU. Some adjustments are made to fit the logic better, and the refault distance now also contributes to page tiering and PID refault detection in MGLRU. NOTE: This also changes the meaning of the workingset_* fields in /proc/vmstat. Signed-off-by: Kairui Song --- mm/workingset.c | 116 +++++++++++++++++++++++++++++++++++++++-------------= ---- 1 file changed, 81 insertions(+), 35 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index 5c52dd835a92..25a8eda233ef 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -190,7 +190,7 @@ * / LRU Eviction Info \ / LRU Pack Info \ * +----------------------------+----------------+-+ * non-MGLRU: |SC| eviction timestamp | NID | MCID | W |1| - * MGLRU: |SC| seq number | refs | NID | MCID | W |1| + * MGLRU: |SC| refs|eviction timestamp | NID | MCID | W |1| * ^ ^ ^ ^ ^ * Swap Count (anon only) NUMA ID (NODES_SHIFT)-+ | | XA_VALUE * Memory Cgroup ID (MEM_CGROUP_ID_SHIFT) --------+ | mark @@ -219,7 +219,9 @@ * PG_referenced. But here we record PG_workingset separately (to reuse * pack_shadow). */ -#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 2) - 1) +#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 2) - 1) +#define LRU_GEN_EVICT_BITS (LRU_EVICT_BITS - LRU_REFS_BITS) +#define LRU_GEN_EVICT_BITS_ANON (LRU_EVICT_BITS_ANON - LRU_REFS_BITS) =20 /* * Eviction timestamps need to be able to cover the full range of @@ -230,6 +232,7 @@ * evictions into coarser buckets by shaving off lower timestamp bits. 
*/ static unsigned int bucket_order[ANON_AND_FILE] __read_mostly; +static unsigned int lru_gen_bucket_order[ANON_AND_FILE] __read_mostly; =20 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long evic= tion, bool workingset, bool file) @@ -392,7 +395,6 @@ static void *lru_gen_eviction(struct folio *folio) { int hist; unsigned long token; - unsigned long min_seq; struct lruvec *lruvec; struct lru_gen_folio *lrugen; int type =3D folio_is_file_lru(folio); @@ -403,16 +405,19 @@ static void *lru_gen_eviction(struct folio *folio) struct pglist_data *pgdat =3D folio_pgdat(folio); unsigned short memcg_id; =20 - BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_BITS > LRU_EVICT_BITS_ANON); + BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_BITS > LRU_GEN_EVICT_BITS_ANON); =20 rcu_read_lock(); memcg =3D folio_memcg(folio); lruvec =3D mem_cgroup_lruvec(memcg, pgdat); lrugen =3D &lruvec->lrugen; - min_seq =3D READ_ONCE(lrugen->min_seq[type]); - token =3D (min_seq << LRU_REFS_BITS) | refs >> 1; + hist =3D lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type])); =20 - hist =3D lru_hist_from_seq(min_seq); + token =3D refs >> 1; + token <<=3D type ? LRU_GEN_EVICT_BITS : LRU_GEN_EVICT_BITS_ANON; + token |=3D lru_eviction(lruvec, delta, + type ? LRU_GEN_EVICT_BITS : LRU_GEN_EVICT_BITS_ANON, + lru_gen_bucket_order[type]); atomic_long_add(delta, &lrugen->evicted[hist][type][tier]); memcg_id =3D mem_cgroup_private_id(memcg); rcu_read_unlock(); @@ -423,56 +428,87 @@ static void *lru_gen_eviction(struct folio *folio) /* * Tests if the shadow entry is for a folio that was recently evicted. */ -static bool lru_gen_test_recent(struct lruvec *lruvec, - unsigned long token, bool file) +static bool lru_gen_test_recent(struct lruvec *lruvec, bool file, + unsigned long distance) { - unsigned long max_seq; + struct lru_gen_folio *lrugen; + unsigned long recent =3D 0; + int hist, tier; =20 - max_seq =3D READ_ONCE((lruvec)->lrugen.max_seq); - max_seq &=3D BIT((file ? 
LRU_EVICT_BITS : LRU_EVICT_BITS_ANON) - LRU_REFS= _BITS) - 1; + lrugen =3D &lruvec->lrugen; + hist =3D lru_hist_from_seq(READ_ONCE(lrugen->min_seq[file])); + for (tier =3D 0; tier < MAX_NR_TIERS; tier++) + recent +=3D atomic_long_read(&lrugen->evicted[hist][file][tier]); =20 - return abs_diff(max_seq, token >> LRU_REFS_BITS) < MAX_NR_GENS; + return distance <=3D recent; } =20 static void lru_gen_refault(struct folio *folio, void *shadow) { bool recent; - int hist, tier, refs; bool workingset; - unsigned long token; + int hist, tier, refs; struct lruvec *lruvec; struct lru_gen_folio *lrugen; int type =3D folio_is_file_lru(folio); int delta =3D folio_nr_pages(folio); + unsigned long token, distance, total; =20 - lruvec =3D try_unpack_get_lruvec(shadow, &token, &workingset, false); + lruvec =3D try_unpack_get_lruvec(shadow, &token, &workingset, true); if (!lruvec) return; if (lruvec !=3D folio_lruvec(folio)) goto out_put; =20 - recent =3D lru_gen_test_recent(lruvec, token, type); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta); =20 - if (!recent) - goto out_put; - lrugen =3D &lruvec->lrugen; - hist =3D lru_hist_from_seq(READ_ONCE(lrugen->min_seq[type])); - refs =3D ((token & (BIT(LRU_REFS_BITS) - 1)) << 1) + workingset; - tier =3D lru_tier_from_refs(refs); =20 - atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]); + distance =3D lru_distance(lruvec, token, + type ? LRU_GEN_EVICT_BITS : LRU_GEN_EVICT_BITS_ANON, + lru_gen_bucket_order[type]); + recent =3D lru_gen_test_recent(lruvec, type, distance); + + total =3D lruvec_page_state(lruvec, NR_ACTIVE_FILE) + + lruvec_page_state(lruvec, NR_INACTIVE_FILE); + if (!type || mem_cgroup_get_nr_swap_pages(lruvec_memcg(lruvec))) { + total +=3D lruvec_page_state(lruvec, NR_ACTIVE_ANON) + + lruvec_page_state(lruvec, NR_INACTIVE_ANON); + } + + /* Return if it's neither recently evicted nor fits workingset. 
*/ + if (!recent && distance > total) + goto out_put; =20 - /* see folio_add_lru() where folio_set_active() will be called */ - if (lru_gen_in_fault()) + token >>=3D type ? LRU_GEN_EVICT_BITS : LRU_GEN_EVICT_BITS_ANON; + token &=3D (BIT(LRU_REFS_BITS) - 1); + refs =3D (token << 1) + workingset; + tier =3D lru_tier_from_refs(refs); + + /* Set refault as active. */ + if (distance < total / 2) { + folio_set_active(folio); mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta); + } =20 + /* Restore reference count. */ if (refs) { folio_set_lru_refs(folio, refs); mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta); } + + /* + * If it's recently evicted, update the recent gen's counter. Else, + * update the global counter, increase total too to avoid having + * a refault rate > 1. + */ + if (recent) { + atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]); + } else { + atomic_long_add(delta, &lrugen->avg_total[type][tier]); + atomic_long_add(delta, &lrugen->avg_refaulted[type][tier]); + } out_put: put_lruvec(lruvec); } @@ -484,8 +520,8 @@ static void *lru_gen_eviction(struct folio *folio) return NULL; } =20 -static bool lru_gen_test_recent(struct lruvec *lruvec, - unsigned long token, bool file) +static bool lru_gen_test_recent(struct lruvec *lruvec, bool file, + unsigned long distance) { return false; } @@ -551,15 +587,16 @@ bool workingset_test_recent(void *shadow, bool file, = bool *workingset, unsigned long distance; bool recent; =20 + lruvec =3D try_unpack_get_lruvec(shadow, &eviction, workingset, flush); + if (!lruvec) + return false; + if (lru_gen_enabled()) { - lruvec =3D try_unpack_get_lruvec(shadow, &eviction, workingset, false); - if (!lruvec) - return false; - recent =3D lru_gen_test_recent(lruvec, eviction, file); + distance =3D lru_distance(lruvec, eviction, + file ? 
LRU_GEN_EVICT_BITS : LRU_GEN_EVICT_BITS_ANON, + lru_gen_bucket_order[file]); + recent =3D lru_gen_test_recent(lruvec, file, distance); } else { - lruvec =3D try_unpack_get_lruvec(shadow, &eviction, workingset, flush); - if (!lruvec) - return false; distance =3D lru_distance(lruvec, eviction, file ? LRU_EVICT_BITS : LRU_EVICT_BITS_ANON, bucket_order[file]); @@ -850,6 +887,15 @@ static int __init workingset_init(void) pr_info("workingset: timestamp_bits=3D%d (anon: %d) max_order=3D%d bucket= _order=3D%u (anon: %d)\n", LRU_EVICT_BITS, LRU_EVICT_BITS_ANON, max_order, bucket_order[WORKINGSET_FILE], bucket_order[WORKINGSET_ANON]); +#ifdef CONFIG_LRU_GEN + if (max_order > LRU_GEN_EVICT_BITS) + lru_gen_bucket_order[WORKINGSET_FILE] =3D max_order - LRU_GEN_EVICT_BITS; + if (max_order > LRU_GEN_EVICT_BITS_ANON) + lru_gen_bucket_order[WORKINGSET_ANON] =3D max_order - LRU_GEN_EVICT_BITS= _ANON; + pr_info("workingset: lru_gen_timestamp_bits=3D%d (anon: %d) lru_gen_bucke= t_order=3D%u (anon %d)\n", + LRU_GEN_EVICT_BITS, LRU_GEN_EVICT_BITS_ANON, + lru_gen_bucket_order[WORKINGSET_FILE], lru_gen_bucket_order[WORKINGSET_A= NON]); +#endif =20 workingset_shadow_shrinker =3D shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B316426EBE for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=M13jOfEbYiLSBR8THIlsre7+/Y+3ABUqzRYpBE236jFkW9jMa4Q1LcbNYE9+gGq4Mp9FGU2tEjHViC57vaB20J8RnP8mmj0bO6q1u55CGRJ1rPZ5m/hrA497xonPKRWtEnv5uGJnouHleDo431TFqOQrObQd5slQLdM8PFqsAYo= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=EfYFoJPzrm/LKZNnV8IsPykaRdCK7lu5nugZRzWhxPY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=J3IFCi9+8LB604sc2SKdiaF3YpHUw5hIFj7/qq8WKkp/ugyOZzhvDKyaj+VDrBRA7GwBM212MzyRSL5Xqjcu9cbtziEre0wakdeKoUq1V+OswVz0GiW9IEFUJjxjkU7V7doY3wIXBAwrEFt9ydB1NReQimxLXfgMdAQejztM8Qk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=U2EiVSAp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="U2EiVSAp" Received: by smtp.kernel.org (Postfix) with ESMTPS id 7B1AEC2BCC6; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=EfYFoJPzrm/LKZNnV8IsPykaRdCK7lu5nugZRzWhxPY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=U2EiVSApD5C5TEkHm24HPGShVRGE0py7AvOwvwpzh5rW1nfFFVgSZtwTFzCMiKd7+ KDXIAchOBJ7hHlW6wqQ+SxcnBWaK5JAFlP/DUsJrWF3zqQhY21Vsd63L1qnFa22z1r 0ZfZ3Tqeu+1ASmGQCAdf+SsnFRxfWgEYPWVW67RCuGeBjoJnNVOhaK3GjJmfgi8gWi XiWYgAvyaqzMOiNM1WRxFhIlEM2wTwCr4HG3MQfloJhXQAdMZGGFf8YFOaUn5U4f6D /Fe14xFmxRYo83KOIMLmL71L7f7oQHeAi4oBrp62RUkcQqLb44aEqF53mMkpZG320/ CefyqHoI7h8bA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7159DCD3427; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:17 +0800 Subject: [PATCH RFC 31/32] mm: remove PG_workingset Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-31-913619b014d9@tencent.com> References: 
<20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=9202; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=+4Zon/rLWOFCjp3678u/teBTE57oMcI/CfYNFDguoyE=; b=hoXxIcA+fm83eacooynsDJqnJlneSepVXkhwliW4wpuEj1vJ4nW6YSdD0HFKLVUk1Vuj7uymL 5VdUtAhxPBrC7HosG9Cs+h9keewvJQ0YCjrJg7DlMjpcwGqXSD451Jq X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song PG_workingset is now used as the second lowest bit of the folio's access count for MGLRU, and for non-MGLRU, access to this bit is wrapped by new helpers. We can now remove this bit to avoid the ugly dance when setting the folio's reference count. Extend the folio's access count field width and convert the helpers to use the lowest bit directly. Merge PG_workingset into the LRU referenced bits; no functional change. 
Signed-off-by: Kairui Song --- fs/erofs/zdata.c | 3 ++- fs/fuse/dev.c | 1 - include/linux/mm_inline.h | 10 ++++------ include/linux/mmzone.h | 10 +++++----- include/linux/page-flags.h | 5 +---- include/trace/events/mmflags.h | 1 - kernel/bounds.c | 2 +- mm/huge_memory.c | 1 - mm/slub.c | 2 +- mm/workingset.c | 8 ++++---- 10 files changed, 18 insertions(+), 25 deletions(-) diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 43bb5a6a9924..44bcf16fff6a 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -8,6 +8,7 @@ #include #include #include +#include =20 #define Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES 12288 #define Z_EROFS_PCLUSTER_MAX_PAGES (Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE) @@ -1734,7 +1735,7 @@ static void z_erofs_submit_queue(struct z_erofs_front= end *f, DBG_BUGON(bvec.bv_len < sb->s_blocksize); } =20 - if (unlikely(PageWorkingset(bvec.bv_page)) && + if (unlikely(folio_is_workingset(page_folio(bvec.bv_page))) && !memstall) { psi_memstall_enter(&pflags); memstall =3D 1; diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 5dda7080f4a9..12270c5f647c 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -966,7 +966,6 @@ static int fuse_check_folio(struct folio *folio) 1 << PG_referenced | 1 << PG_lru | 1 << PG_active | - 1 << PG_workingset | 1 << PG_reclaim | 1 << PG_waiters | LRU_GEN_MASK | LRU_REFS_MASK))) { diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a108695424fb..446597e594e8 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -107,8 +107,7 @@ static inline int lru_refs_from_flags(unsigned long fla= gs) * LRU_REFS_FLAGS. */ refs =3D (flags & BIT(PG_referenced)) ? BIT(0) : 0; - refs +=3D (flags & BIT(PG_workingset)) ? 
BIT(1) : 0; - refs +=3D ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 2; + refs +=3D ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 1; return refs; } =20 @@ -124,9 +123,7 @@ static inline void lru_refs_set_flags(unsigned long *fl= ags, unsigned int refs) *flags &=3D ~LRU_REFS_FLAGS; if (refs & BIT(0)) *flags |=3D BIT(PG_referenced); - if (refs & BIT(1)) - *flags |=3D BIT(PG_workingset); - *flags |=3D (((unsigned long)refs) >> 2) << LRU_REFS_PGOFF; + *flags |=3D (((unsigned long)refs) >> 1) << LRU_REFS_PGOFF; } =20 static inline int folio_lru_refs(const struct folio *folio) @@ -253,7 +250,8 @@ static inline bool folio_is_workingset(const struct fol= io *folio) */ static inline void folio_mark_workingset_by_bit(struct folio *folio) { - set_mask_bits(folio_flags(folio, 0), BIT(PG_workingset), BIT(PG_workingse= t)); + BUILD_BUG_ON(LRU_REFS_WIDTH < 2); + set_mask_bits(folio_flags(folio, 0), BIT(LRU_REFS_PGOFF + 1), BIT(LRU_REF= S_PGOFF + 1)); } =20 static inline void folio_migrate_refs(struct folio *new, const struct foli= o *old) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index aa27627b0406..f55d256fd200 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -522,8 +522,8 @@ enum lruvec_flags { * LRU_REFS_PROTECTED. It still considered workingset but moved to a higher * gen representing a higher hotness and reclaim bias. * - * Tiering uses PG_workingset and PG_referenced and the lower two bits, - * LRU_REFS_MASK as the higher bits. + * Tiering uses PG_referenced as the lowest bit and LRU_REFS_MASK as the + * higher bits. * * A folio's referenced count never goes backwards except upon gen increase * in (7) or a promotion. Passive protect by PID will reset a folio with h= igher @@ -536,7 +536,7 @@ enum lruvec_flags { * * MAX_NR_TIERS is set to 4 so that the multi-gen LRU can support twice the * number of categories of the active/inactive LRU when keeping track of - * accesses through file descriptors. 
This uses MAX_NR_TIERS-3 spare bits = in + * accesses through file descriptors. This uses MAX_NR_TIERS-2 spare bits = in * folio->flags, masked by LRU_REFS_MASK. */ #define MAX_NR_TIERS 4U @@ -549,8 +549,8 @@ enum lruvec_flags { #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_GEN_MAX (BIT(LRU_GEN_WIDTH - 1) - 1) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) -#define LRU_REFS_FLAGS (LRU_REFS_MASK | BIT(PG_referenced) | BIT(PG_worki= ngset)) -#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH + 2) - 1) +#define LRU_REFS_FLAGS (LRU_REFS_MASK | BIT(PG_referenced)) +#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH + 1) - 1) =20 struct lruvec; struct page_vma_mapped_walk; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 0e03d816e8b9..451e96ca89f7 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -100,7 +100,6 @@ enum pageflags { PG_head, /* Must be in bit 6 */ PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and= in the same byte as "PG_locked" */ PG_active, - PG_workingset, PG_owner_priv_1, /* Owner use. If pagecache, fs may use */ PG_owner_2, /* Owner use. 
If pagecache, fs may use */ PG_arch_1, @@ -190,7 +189,7 @@ enum pageflags { =20 /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned =3D PG_active, - PG_large_rmappable =3D PG_workingset, /* anon or file-backed */ + PG_large_rmappable =3D PG_swapbacked, /* anon or file-backed */ PG_partially_mapped =3D PG_reclaim, /* was identified to be partially map= ped */ }; =20 @@ -554,8 +553,6 @@ PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, P= F_HEAD) FOLIO_FLAG(active, FOLIO_HEAD_PAGE) __FOLIO_CLEAR_FLAG(active, FOLIO_HEAD_PAGE) FOLIO_TEST_CLEAR_FLAG(active, FOLIO_HEAD_PAGE) -PAGEFLAG(Workingset, workingset, PF_HEAD) - TESTCLEARFLAG(Workingset, workingset, PF_HEAD) PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems = */ =20 /* Xen */ diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index a6e5a44c9b42..c13575d8c7ee 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -147,7 +147,6 @@ TRACE_DEFINE_ENUM(___GFP_LAST_BIT); DEF_PAGEFLAG_NAME(dirty), \ DEF_PAGEFLAG_NAME(lru), \ DEF_PAGEFLAG_NAME(active), \ - DEF_PAGEFLAG_NAME(workingset), \ DEF_PAGEFLAG_NAME(owner_priv_1), \ DEF_PAGEFLAG_NAME(owner_2), \ DEF_PAGEFLAG_NAME(arch_1), \ diff --git a/kernel/bounds.c b/kernel/bounds.c index 06a034713b5d..02b619eb6106 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -25,7 +25,7 @@ int main(void) DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t)); #ifdef CONFIG_LRU_GEN DEFINE(LRU_GEN_WIDTH, order_base_2(MAX_NR_GENS + 1)); - DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 3); + DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 2); #else DEFINE(LRU_GEN_WIDTH, 0); DEFINE(__LRU_REFS_WIDTH, 0); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 87a2640e3396..f30445c01f18 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3643,7 +3643,6 @@ static void __split_folio_to_order(struct folio *foli= o, int old_order, (1L << PG_mlocked) | (1L << PG_uptodate) | (1L << PG_active) | - (1L << 
PG_workingset) | (1L << PG_locked) | (1L << PG_unevictable) | #ifdef CONFIG_ARCH_USES_PG_ARCH_2 diff --git a/mm/slub.c b/mm/slub.c index 161079ac5ba1..4b363a81c6b0 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -191,7 +191,7 @@ */ enum slab_flags { SL_locked =3D PG_locked, - SL_partial =3D PG_workingset, /* Historical reasons for this bit */ + SL_partial =3D PG_owner_priv_1, /* Use a flag that is never set for slab = folio */ SL_pfmemalloc =3D PG_active, /* Historical reasons for this bit */ }; =20 diff --git a/mm/workingset.c b/mm/workingset.c index 25a8eda233ef..f9de276a1404 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -215,11 +215,11 @@ #define LRU_EVICT_BITS_ANON (LRU_EVICT_BITS - SWAP_COUNT_SHIFT) =20 /* - * LRU refs uses LRU_REFS_WIDTH + 2 bits, the 2 bits are PG_workingset and - * PG_referenced. But here we record PG_workingset separately (to reuse - * pack_shadow). + * LRU refs uses LRU_REFS_WIDTH + 1 bit in folio->flags, the extra 1 bit + * is PG_referenced. But here we record the low bit of refs separately + * (to reuse pack_shadow). 
*/ -#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 2) - 1) +#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 1) - 1) #define LRU_GEN_EVICT_BITS (LRU_EVICT_BITS - LRU_REFS_BITS) #define LRU_GEN_EVICT_BITS_ANON (LRU_EVICT_BITS_ANON - LRU_REFS_BITS) =20 --=20 2.54.0 From nobody Sun Jun 14 06:10:55 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AAC78426EC7 for ; Fri, 1 May 2026 21:04:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; cv=none; b=e1nf4P8jUHvchhOibp8HljeIRzyEP6Iha84rHx8i4qFoEdevmg0lH2jhdSxcZeTltER5YpHni4AZiWZzmiSNtw0AP1DNFCqWhPDw1qPl4SLiFQhrMG6jvGKbKp0gzGUtLmo5HXY0qMParna8QHYp3RSwO7dDm37layM23C7fjYQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777669454; c=relaxed/simple; bh=doK35nLkz5KNd5wFsipWvD5mR9lAV1HrW1BUxLjjaQQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=MKnr0RZ7eTbhqy2oiLsa8vhsjwTTI2A2hUbVbX/t8By84HOJ3uoTCoVgLlybTV5xSV74MqG4thN0v7PJqf5UljDd1HDrFSJeskuKfAzwW7CtzMY83mOpXBMLisUqfheuZ/xGfDfjTb52dVmhLIgEKEBobbSQV00SmgDm5ehPOXY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JHlM8jrg; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JHlM8jrg" Received: by smtp.kernel.org (Postfix) with ESMTPS id 8C045C2BCC4; Fri, 1 May 2026 21:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777669454; bh=doK35nLkz5KNd5wFsipWvD5mR9lAV1HrW1BUxLjjaQQ=; 
h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=JHlM8jrgo4S1gaO3BtXvkmeotdlraqcQjESCI5XIcy8Q6plVV34YcYFNGwHZDvnlU GBnYJBYfQmlxOFcngT3SqFcRGtv+5HM8c0rlZ8i/BM/C60gYVYd+8RbEoHtcv/i0OD b+iJ1bl7XQjJNOuo8ZWKQDoaOuSl/zQQQbEumqG5dtdX5up0gSK+Nob3MQhaAEUUnA jdC1lbAlxQH9kDZIY53Nj2meIc2ib27DRQ6t86hMs+NsITnsh4OsVzBNfgFoj+SLoT jaXdMAoUR5+aYBV9Pvm2jQemRxkA1sZ/610N8XIHP1tdebyVDGeLm01FCyIFdJOAm1 TuBsvBGgCAXPg== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8444CCCFA13; Fri, 1 May 2026 21:04:14 +0000 (UTC) From: Kairui Song via B4 Relay Date: Sat, 02 May 2026 05:04:18 +0800 Subject: [PATCH RFC 32/32] mm: remove PG_referenced Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260502-mglru-fg-v1-32-913619b014d9@tencent.com> References: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> In-Reply-To: <20260502-mglru-fg-v1-0-913619b014d9@tencent.com> To: linux-mm@kvack.org Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , David Hildenbrand , Lorenzo Stoakes , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Vlastimil Babka , Suren Baghdasaryan , Kemeng Shi , Nhat Pham , Baoquan He , Youngjun Park , Zi Yan , Gregory Price , "Matthew Wilcox (Oracle)" , Baolin Wang , Nico Pache , Ryan Roberts , Dev Jain , Lance Yang , Hugh Dickins , SeongJae Park , David Rientjes , Yu Zhao , Vernon Yang , Zicheng Wang , Chen Ridong , Tal Zussman , Kairui Song , linux-kernel@vger.kernel.org, Kairui Song X-Mailer: b4 0.15.2 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777669448; l=11205; i=kasong@tencent.com; s=kasong-sign-tencent; h=from:subject:message-id; bh=g2ZuyybWv1/6mZFk+0JvM1YtPzR/czoYZ6oSGgPM8VM=; b=BB/IakPj7gn2GvJbqMjUa/xsbX9CqxXM0JOhr2DmQ3jA/C5RE6ocQAbcrAFp3Jb3uiVnU25Y4 
H8zQijTh9l6Bm18Vt+jYQDKNLoWIROPjzk6QvOFTH1frABIZ0jcRLAx X-Developer-Key: i=kasong@tencent.com; a=ed25519; pk=kCdoBuwrYph+KrkJnrr7Sm1pwwhGDdZKcKrqiK8Y1mI= X-Endpoint-Received: by B4 Relay for kasong@tencent.com/kasong-sign-tencent with auth_id=562 X-Original-From: Kairui Song Reply-To: kasong@tencent.com From: Kairui Song PG_referenced is now used as the lowest bit of the folio's access count for MGLRU, and for non-MGLRU, access to this bit is wrapped by new helpers. We can now remove this flag to avoid the ugly dance when setting the folio's reference count. Extend the folio's access count field width and convert the helpers to use the lowest bit directly. Merge PG_referenced into the LRU referenced bits; no functional change. Signed-off-by: Kairui Song --- fs/fuse/dev.c | 1 - fs/proc/page.c | 1 - include/linux/mm.h | 2 +- include/linux/mm_inline.h | 18 ++++++------------ include/linux/mmzone.h | 9 ++++----- include/linux/page-flags.h | 8 ++------ include/trace/events/mmflags.h | 1 - include/uapi/linux/kernel-page-flags.h | 1 - kernel/bounds.c | 2 +- mm/huge_memory.c | 3 +-- mm/workingset.c | 8 ++++---- tools/mm/page-types.c | 1 - 12 files changed, 19 insertions(+), 36 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 12270c5f647c..4fb9d3193ccf 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -963,7 +963,6 @@ static int fuse_check_folio(struct folio *folio) folio->mapping !=3D NULL || (folio->flags.f & PAGE_FLAGS_CHECK_AT_PREP & ~(1 << PG_locked | - 1 << PG_referenced | 1 << PG_lru | 1 << PG_active | 1 << PG_reclaim | diff --git a/fs/proc/page.c b/fs/proc/page.c index f9b2c2c906cd..018454279f66 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -221,7 +221,6 @@ u64 stable_page_flags(const struct page *page) u |=3D kpf_copy_bit(k, KPF_WRITEBACK, PG_writeback); =20 u |=3D kpf_copy_bit(k, KPF_LRU, PG_lru); - u |=3D kpf_copy_bit(k, KPF_REFERENCED, PG_referenced); u |=3D kpf_copy_bit(k, KPF_ACTIVE, PG_active); u |=3D kpf_copy_bit(k, KPF_RECLAIM, PG_reclaim); =20 diff 
--git a/include/linux/mm.h b/include/linux/mm.h index 1d76da6e0791..008bd9187b9d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3570,7 +3570,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, = pud_t *pud, unsigned long a #endif /* CONFIG_MMU */ =20 enum pt_flags { - PT_kernel =3D PG_referenced, + PT_kernel =3D PG_owner_priv_1, PT_reserved =3D PG_reserved, /* High bits are used for zone/node/section */ }; diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 446597e594e8..7574dd3e244b 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -100,15 +100,11 @@ static __always_inline enum lru_list folio_lru_list(c= onst struct folio *folio) */ static inline int lru_refs_from_flags(unsigned long flags) { - int refs; - /* * Return the total number of accesses. Also see the comment on * LRU_REFS_FLAGS. */ - refs =3D (flags & BIT(PG_referenced)) ? BIT(0) : 0; - refs +=3D ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) << 1; - return refs; + return (flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF; } =20 /** @@ -121,9 +117,7 @@ static inline void lru_refs_set_flags(unsigned long *fl= ags, unsigned int refs) VM_WARN_ON_ONCE(refs > LRU_REFS_MAX); =20 *flags &=3D ~LRU_REFS_FLAGS; - if (refs & BIT(0)) - *flags |=3D BIT(PG_referenced); - *flags |=3D (((unsigned long)refs) >> 1) << LRU_REFS_PGOFF; + *flags |=3D ((unsigned long)refs) << LRU_REFS_PGOFF; } =20 static inline int folio_lru_refs(const struct folio *folio) @@ -199,7 +193,7 @@ static inline void __folio_init_referenced(struct folio= *folio) */ static inline void folio_mark_referenced_by_bit(struct folio *folio) { - set_mask_bits(folio_flags(folio, 0), BIT(PG_referenced), BIT(PG_reference= d)); + set_mask_bits(folio_flags(folio, 0), BIT(LRU_REFS_PGOFF), BIT(LRU_REFS_PG= OFF)); } =20 /** @@ -208,7 +202,7 @@ static inline void folio_mark_referenced_by_bit(struct = folio *folio) */ static inline void folio_clear_referenced_by_bit(struct folio *folio) { - 
set_mask_bits(folio_flags(folio, 0), BIT(PG_referenced), 0); + set_mask_bits(folio_flags(folio, 0), BIT(LRU_REFS_PGOFF), 0); } =20 /** @@ -217,7 +211,7 @@ static inline void folio_clear_referenced_by_bit(struct= folio *folio) */ static inline bool folio_test_clear_referenced_bit(struct folio *folio) { - return test_and_clear_bit(PG_referenced, folio_flags(folio, 0)); + return test_and_clear_bit(LRU_REFS_PGOFF, folio_flags(folio, 0)); } =20 /** @@ -226,7 +220,7 @@ static inline bool folio_test_clear_referenced_bit(stru= ct folio *folio) */ static inline bool folio_is_referenced_by_bit(const struct folio *folio) { - return test_bit(PG_referenced, const_folio_flags(folio, 0)); + return test_bit(LRU_REFS_PGOFF, const_folio_flags(folio, 0)); } =20 /** diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index f55d256fd200..575afff010eb 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -522,8 +522,7 @@ enum lruvec_flags { * LRU_REFS_PROTECTED. It still considered workingset but moved to a higher * gen representing a higher hotness and reclaim bias. * - * Tiering uses PG_referenced as the lowest bit and LRU_REFS_MASK as the - * higher bits. + * Tiering uses LRU_REFS_WIDTH bits in folio->flags, masked by LRU_REFS_MA= SK. * * A folio's referenced count never goes backwards except upon gen increase * in (7) or a promotion. Passive protect by PID will reset a folio with h= igher @@ -536,7 +535,7 @@ enum lruvec_flags { * * MAX_NR_TIERS is set to 4 so that the multi-gen LRU can support twice the * number of categories of the active/inactive LRU when keeping track of - * accesses through file descriptors. This uses MAX_NR_TIERS-2 spare bits = in + * accesses through file descriptors. This uses MAX_NR_TIERS-1 bits in * folio->flags, masked by LRU_REFS_MASK. 
*/ #define MAX_NR_TIERS 4U @@ -549,8 +548,8 @@ enum lruvec_flags { #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_GEN_MAX (BIT(LRU_GEN_WIDTH - 1) - 1) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) -#define LRU_REFS_FLAGS (LRU_REFS_MASK | BIT(PG_referenced)) -#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH + 1) - 1) +#define LRU_REFS_FLAGS LRU_REFS_MASK +#define LRU_REFS_MAX (BIT(LRU_REFS_WIDTH) - 1) =20 struct lruvec; struct page_vma_mapped_walk; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 451e96ca89f7..ba82f5c755f8 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -63,7 +63,7 @@ * might lose their PG_swapbacked flag when they simply can be dropped (e.= g. as * a result of MADV_FREE). * - * PG_referenced, PG_reclaim are used for page reclaim for anonymous and + * PG_reclaim are used for page reclaim for anonymous and * file-backed pagecache (see mm/vmscan.c). * * PG_arch_1 is an architecture specific page state bit. The generic code @@ -93,13 +93,12 @@ enum pageflags { PG_locked, /* Page is locked. Don't touch. */ PG_writeback, /* Page is under writeback */ - PG_referenced, PG_uptodate, PG_dirty, PG_lru, + PG_active, PG_head, /* Must be in bit 6 */ PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and= in the same byte as "PG_locked" */ - PG_active, PG_owner_priv_1, /* Owner use. If pagecache, fs may use */ PG_owner_2, /* Owner use. 
If pagecache, fs may use */ PG_arch_1, @@ -543,9 +542,6 @@ static inline int TestClearPage##uname(struct page *pag= e) { return 0; } =20 __PAGEFLAG(Locked, locked, PF_NO_TAIL) FOLIO_FLAG(waiters, FOLIO_HEAD_PAGE) -FOLIO_FLAG(referenced, FOLIO_HEAD_PAGE) - FOLIO_TEST_CLEAR_FLAG(referenced, FOLIO_HEAD_PAGE) - __FOLIO_SET_FLAG(referenced, FOLIO_HEAD_PAGE) PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD) __CLEARPAGEFLAG(Dirty, dirty, PF_HEAD) PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD) diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index c13575d8c7ee..b411475e82d1 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -142,7 +142,6 @@ TRACE_DEFINE_ENUM(___GFP_LAST_BIT); #define __def_pageflag_names \ DEF_PAGEFLAG_NAME(locked), \ DEF_PAGEFLAG_NAME(waiters), \ - DEF_PAGEFLAG_NAME(referenced), \ DEF_PAGEFLAG_NAME(uptodate), \ DEF_PAGEFLAG_NAME(dirty), \ DEF_PAGEFLAG_NAME(lru), \ diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/ke= rnel-page-flags.h index ff8032227876..adff709a86f6 100644 --- a/include/uapi/linux/kernel-page-flags.h +++ b/include/uapi/linux/kernel-page-flags.h @@ -8,7 +8,6 @@ =20 #define KPF_LOCKED 0 #define KPF_ERROR 1 /* Now unused */ -#define KPF_REFERENCED 2 #define KPF_UPTODATE 3 #define KPF_DIRTY 4 #define KPF_LRU 5 diff --git a/kernel/bounds.c b/kernel/bounds.c index 02b619eb6106..4171cf03f296 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -25,7 +25,7 @@ int main(void) DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t)); #ifdef CONFIG_LRU_GEN DEFINE(LRU_GEN_WIDTH, order_base_2(MAX_NR_GENS + 1)); - DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 2); + DEFINE(__LRU_REFS_WIDTH, MAX_NR_TIERS - 1); #else DEFINE(LRU_GEN_WIDTH, 0); DEFINE(__LRU_REFS_WIDTH, 0); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index f30445c01f18..02d356d38cea 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3637,8 +3637,7 @@ static void 
__split_folio_to_order(struct folio *foli= o, int old_order, */ new_folio->flags.f &=3D ~PAGE_FLAGS_CHECK_AT_PREP; new_folio->flags.f |=3D (folio->flags.f & - ((1L << PG_referenced) | - (1L << PG_swapbacked) | + ((1L << PG_swapbacked) | (1L << PG_swapcache) | (1L << PG_mlocked) | (1L << PG_uptodate) | diff --git a/mm/workingset.c b/mm/workingset.c index f9de276a1404..6b0120c3456f 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -215,11 +215,11 @@ #define LRU_EVICT_BITS_ANON (LRU_EVICT_BITS - SWAP_COUNT_SHIFT) =20 /* - * LRU refs uses LRU_REFS_WIDTH + 1 bit in folio->flags, the extra 1 bit - * is PG_referenced. But here we record the low bit of refs separately - * (to reuse pack_shadow). + * LRU refs uses LRU_REFS_WIDTH bits in folio->flags. Its low bit is stored + * separately as the "workingset" bit in the shadow (to reuse pack_shadow); + * the remaining high bits are packed into the token. */ -#define LRU_REFS_BITS ((LRU_REFS_WIDTH + 1) - 1) +#define LRU_REFS_BITS (LRU_REFS_WIDTH - 1) #define LRU_GEN_EVICT_BITS (LRU_EVICT_BITS - LRU_REFS_BITS) #define LRU_GEN_EVICT_BITS_ANON (LRU_EVICT_BITS_ANON - LRU_REFS_BITS) =20 diff --git a/tools/mm/page-types.c b/tools/mm/page-types.c index d7e5e8902af8..de5c7b6f3135 100644 --- a/tools/mm/page-types.c +++ b/tools/mm/page-types.c @@ -101,7 +101,6 @@ static const char * const page_flag_names[] =3D { [KPF_LOCKED] =3D "L:locked", [KPF_ERROR] =3D "E:error", - [KPF_REFERENCED] =3D "R:referenced", [KPF_UPTODATE] =3D "U:uptodate", [KPF_DIRTY] =3D "D:dirty", [KPF_LRU] =3D "l:lru", --=20 2.54.0
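
For anyone who wants to sanity-check the "no feature change" claim outside the kernel, here is a minimal userspace sketch of the lru_refs_from_flags() / lru_refs_set_flags() arithmetic before and after this patch. It is an illustration only, not kernel code: the bit positions (PG_REFERENCED = 2, LRU_REFS_PGOFF = 8), the old_*/new_* names and layouts_agree() are all made up for this example. Only the shifts and masks mirror the removed and added lines in include/linux/mm_inline.h and include/linux/mmzone.h, with MAX_NR_TIERS assumed to be 4.

```c
#include <stdbool.h>

#define BIT(n)          (1UL << (n))
#define MAX_NR_TIERS    4U

/* Hypothetical bit positions, chosen only for this sketch. */
#define PG_REFERENCED   2
#define LRU_REFS_PGOFF  8

/* Old layout: a (MAX_NR_TIERS - 2)-bit field plus PG_referenced as bit 0. */
#define OLD_REFS_WIDTH  (MAX_NR_TIERS - 2)
#define OLD_REFS_MASK   ((BIT(OLD_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
#define OLD_REFS_FLAGS  (OLD_REFS_MASK | BIT(PG_REFERENCED))
#define OLD_REFS_MAX    (BIT(OLD_REFS_WIDTH + 1) - 1)

/* New layout: one contiguous (MAX_NR_TIERS - 1)-bit field. */
#define NEW_REFS_WIDTH  (MAX_NR_TIERS - 1)
#define NEW_REFS_MASK   ((BIT(NEW_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
#define NEW_REFS_MAX    (BIT(NEW_REFS_WIDTH) - 1)

/* Old decode: PG_referenced is bit 0, the field holds the upper bits. */
static unsigned int old_refs_from_flags(unsigned long flags)
{
	unsigned int refs = (flags & BIT(PG_REFERENCED)) ? 1 : 0;

	refs += (unsigned int)((flags & OLD_REFS_MASK) >> LRU_REFS_PGOFF) << 1;
	return refs;
}

/* Old encode: split the count across PG_referenced and the field. */
static void old_refs_set_flags(unsigned long *flags, unsigned int refs)
{
	*flags &= ~OLD_REFS_FLAGS;
	if (refs & 1)
		*flags |= BIT(PG_REFERENCED);
	*flags |= ((unsigned long)refs >> 1) << LRU_REFS_PGOFF;
}

/* New decode: the whole count lives in one field. */
static unsigned int new_refs_from_flags(unsigned long flags)
{
	return (unsigned int)((flags & NEW_REFS_MASK) >> LRU_REFS_PGOFF);
}

/* New encode: store the count directly, no separate low bit. */
static void new_refs_set_flags(unsigned long *flags, unsigned int refs)
{
	*flags &= ~NEW_REFS_MASK;
	*flags |= (unsigned long)refs << LRU_REFS_PGOFF;
}

/* Returns true if both layouts round-trip every count in [0, max]. */
static bool layouts_agree(void)
{
	unsigned int refs;

	if (OLD_REFS_MAX != NEW_REFS_MAX)
		return false;

	for (refs = 0; refs <= NEW_REFS_MAX; refs++) {
		unsigned long old_flags = 0, new_flags = 0;

		old_refs_set_flags(&old_flags, refs);
		new_refs_set_flags(&new_flags, refs);
		if (old_refs_from_flags(old_flags) != refs ||
		    new_refs_from_flags(new_flags) != refs)
			return false;
	}
	return true;
}
```

Under these assumptions both layouts represent the same range of counts (0 through 7), so layouts_agree() returns true when called from a small test main(). The only difference is where the low bit is stored, which is why the *_by_bit helpers can switch from PG_referenced to BIT(LRU_REFS_PGOFF).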