From: "Barry Song (Xiaomi)"
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Barry Song (Xiaomi)", Lance Yang, Xueyuan Chen, Pedro Falcato, Kairui Song, Qi Zheng, Shakeel Butt, wangzicheng, Suren Baghdasaryan, Lei Liu, Matthew Wilcox, Axel Rasmussen, Yuanchu Xie, Wei Xu, Will Deacon, Kalesh Singh
Subject: [PATCH v3] mm/mglru: use folio_mark_accessed to replace folio_set_active
Date: Tue, 26 May 2026 21:09:38 +0800
Message-Id: <20260526130938.66253-1-baohua@kernel.org>

MGLRU gives high priority to folios mapped in page tables. As a result,
folio_set_active() is invoked for all folios read during page faults.
In practice, however, readahead can bring in many folios that are never
accessed via page tables. A previous attempt by Lei Liu proposed
introducing a separate LRU for readahead [1] to make readahead pages
easier to reclaim, but that approach is likely over-engineered.

Before commit 4d5d14a01e2c ("mm/mglru: rework workingset protection"),
folios with PG_active were always placed in the youngest generation,
leading to over-protection and increased refaults. After that commit,
PG_active folios are placed in the second youngest generation, which is
still too optimistic given the presence of readahead. In contrast, the
classic active/inactive scheme is more conservative.

This patch switches to folio_mark_accessed() and starts prefaulted file
folios in the second oldest generation instead of the active
generations.
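The placement change described above can be illustrated with a small userspace model of the fault path in folio_add_lru() and of lru_gen_folio_seq(). This is a sketch under stated assumptions, not kernel code: struct folio_model, add_lru_in_fault(), add_lru_in_fault_old() and folio_seq() are hypothetical names; MIN_NR_GENS = 2 and MAX_NR_GENS = 4 are assumed to match mainline; the PG_active branch of lru_gen_folio_seq() is assumed from mainline (it is not part of the hunk); and the reclaiming, anon and dirty/writeback branches are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match MIN_NR_GENS/MAX_NR_GENS in mainline mmzone.h. */
#define MIN_NR_GENS 2UL
#define MAX_NR_GENS 4UL

/* Hypothetical snapshot of the page flags that matter here. */
struct folio_model {
	bool active;     /* PG_active */
	bool workingset; /* PG_workingset */
	bool referenced; /* PG_referenced */
};

/* Pre-patch fault path in folio_add_lru(): always folio_set_active(). */
static void add_lru_in_fault_old(struct folio_model *f)
{
	f->active = true;
}

/*
 * Patched fault path: refaulted workingset folios get PG_active; other
 * folios get PG_referenced via folio_mark_accessed() unless already set.
 */
static void add_lru_in_fault(struct folio_model *f)
{
	if (f->workingset)
		f->active = true;
	else if (!f->referenced)
		f->referenced = true;
}

/*
 * Patched lru_gen_folio_seq() for a clean file folio outside reclaim.
 * gen counts back from max_seq: gen == 1 yields max_seq (youngest).
 */
static unsigned long folio_seq(const struct folio_model *f,
			       unsigned long max_seq, unsigned long min_seq)
{
	unsigned long gen, seq;

	if (f->active)
		gen = MIN_NR_GENS - f->workingset;
	else
		gen = MAX_NR_GENS - (f->workingset || f->referenced);

	assert(max_seq + 1 >= gen);
	seq = max_seq - gen + 1;
	return seq > min_seq ? seq : min_seq;
}
```

With four generations (min_seq = 4, max_seq = 7), a non-workingset file folio added during a fault lands in seq 6 (second youngest) under the old path and in seq 5 (second oldest) under the patched one, while a refaulted workingset folio still goes to seq 7.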
This patch also adjusts the following accordingly:
- WORKINGSET_ACTIVATE: aligned with setting PG_active only for refaulted
  workingset folios;
- lru_gen_folio_seq(): place (pre)faulted file folios into the second
  oldest generation;
- folio_check_references(): promote folios to workingset on the second
  scan. This now depends on folio_lru_refs() > 1: we previously relied on
  PG_referenced being set during the first scan, but PG_referenced is now
  set earlier, at fault time.

Test: kernel build with 20 threads on x86, inside a memcg with a 1GB
memory limit.

               w/o patch     w/ patch  active/inactive LRU
real           1m50.764s    1m48.879s            1m49.928s
user          25m32.305s   25m29.224s           25m28.196s
sys             4m0.012s    3m37.421s            3m40.740s
pswpin           1333245       568480               463452
pswpout          4366443      2322657              2309119
pgpgin           6962592      4073416              4438856
pgpgout         17780712      9613408              9568628
swpout_zero      1019603       593275               743704
swpin_zero         14764         9118                 7244
refault_file      287794       262505               562555
refault_anon     1347963       577550               470694

Lance and Xueyuan contributed substantially to this patch through testing.
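To put the kernel-build numbers above in relative terms, the helper below computes the percentage change between the "w/o patch" and "w/ patch" columns. It only does arithmetic on the reported counters; struct counter, results[], pct_change() and print_changes() are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

/* One counter from the kernel-build results, before and after the patch. */
struct counter {
	const char *name;
	double without_patch;
	double with_patch;
};

static const struct counter results[] = {
	{ "pswpin",        1333245,  568480 },
	{ "pswpout",       4366443, 2322657 },
	{ "pgpgin",        6962592, 4073416 },
	{ "pgpgout",      17780712, 9613408 },
	{ "swpout_zero",   1019603,  593275 },
	{ "swpin_zero",      14764,    9118 },
	{ "refault_file",   287794,  262505 },
	{ "refault_anon",  1347963,  577550 },
};

/* Relative change in percent; negative means the counter went down. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Print one line per counter with its relative change. */
static void print_changes(void)
{
	size_t n = sizeof(results) / sizeof(results[0]);

	for (size_t i = 0; i < n; i++)
		printf("%-13s %+6.1f%%\n", results[i].name,
		       pct_change(results[i].without_patch,
				  results[i].with_patch));
}
```

From the reported figures, pswpout drops by about 46.8% and refault_anon by about 57.2% with the patch, while refault_file drops by about 8.8%.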
[1] https://lore.kernel.org/linux-mm/20250916072226.220426-1-liulei.rjpt@vivo.com/

Signed-off-by: Barry Song (Xiaomi)
Tested-by: Lance Yang
Tested-by: Xueyuan Chen
Cc: Pedro Falcato
Cc: Kairui Song
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: wangzicheng
Cc: Suren Baghdasaryan
Cc: Lei Liu
Cc: Matthew Wilcox (Oracle)
Cc: Axel Rasmussen
Cc: Yuanchu Xie
Cc: Wei Xu
Cc: Will Deacon
Cc: Kalesh Singh
---
-v3:
 * Fix 2nd pte-scanned promotion in folio_check_references(), per Kairui;
 * Restore anon folios behaviour in lru_gen_folio_seq(), per Kairui;
-v2: https://lore.kernel.org/linux-mm/20260525123205.51874-1-baohua@kernel.org/
 * Fix WORKINGSET_ACTIVATE - workingset folios will be set active during refault;
 * Avoid unconditionally protecting anon folios in lru_gen_folio_seq();
 * Also adjust workingset setting accordingly in folio_check_references().
-v1: https://lore.kernel.org/linux-mm/20260418120233.7162-1-baohua@kernel.org/
-rfc was: [PATCH RFC] mm/mglru: lazily activate folios while folios are really mapped
https://lore.kernel.org/linux-mm/20260225212642.15219-1-21cnbao@gmail.com/

 include/linux/mm_inline.h |  2 +-
 mm/swap.c                 | 16 +++++++++++++---
 mm/vmscan.c               |  6 +++++-
 mm/workingset.c           | 10 ++++++----
 4 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index a171070e15f0..a8430a7ae054 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -247,7 +247,7 @@ static inline unsigned long lru_gen_folio_seq(const struct lruvec *lruvec,
 		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
 		gen = MIN_NR_GENS;
 	else
-		gen = MAX_NR_GENS - folio_test_workingset(folio);
+		gen = MAX_NR_GENS - (folio_test_workingset(folio) || folio_test_referenced(folio));
 
 	return max(READ_ONCE(lrugen->max_seq) - gen + 1, READ_ONCE(lrugen->min_seq[type]));
 }
diff --git a/mm/swap.c b/mm/swap.c
index 5cc44f0de987..a44829dcde7a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -509,10 +509,20 @@ void folio_add_lru(struct folio *folio)
 			folio_test_unevictable(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 
-	/* see the comment in lru_gen_folio_seq() */
+	/*
+	 * For refaulted workingset folios, set PG_active so they
+	 * can be added to active generations.
+	 * For prefaulted file folios, folio_mark_accessed() sets
+	 * PG_referenced so lru_gen_folio_seq() places them into
+	 * the second oldest generation.
+	 */
 	if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
-	    lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
-		folio_set_active(folio);
+	    lru_gen_in_fault() && !(current->flags & PF_MEMALLOC)) {
+		if (folio_test_workingset(folio))
+			folio_set_active(folio);
+		else if (!folio_test_referenced(folio))
+			folio_mark_accessed(folio);
+	}
 
 	folio_batch_add_and_move(folio, lru_add);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e452cb043d46..745a55a3f7de 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -848,7 +848,11 @@ static bool lru_gen_set_refs(struct folio *folio)
 		return false;
 	}
 
-	set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_workingset));
+	/* Promote on second access */
+	if (folio_lru_refs(folio) > 1)
+		set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_workingset));
+	else
+		folio_mark_accessed(folio);
 	return true;
 }
 #else
diff --git a/mm/workingset.c b/mm/workingset.c
index 07e6836d0502..f351798e723a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -319,11 +319,13 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 
 	atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
 
-	/* see folio_add_lru() where folio_set_active() will be called */
-	if (lru_gen_in_fault())
-		mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
-
 	if (workingset) {
+		/*
+		 * see folio_add_lru(), where folio_set_active() is
+		 * called for workingset folios
+		 */
+		if (lru_gen_in_fault())
+			mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
 		folio_set_workingset(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	} else
-- 
2.34.1
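The mm/vmscan.c hunk changes when a folio found young during scanning is promoted to workingset. As the commit message explains, PG_referenced is now already set at fault time, so without the folio_lru_refs() > 1 check the first scan would promote. The model below is a hypothetical userspace sketch: struct refs_model, mark_accessed(), lru_refs(), first_sighting(), promote(), set_refs() and set_refs_old() are illustrative names. It assumes, based on mainline and not on the hunk itself, that the early-return branch of lru_gen_set_refs() only records PG_referenced for a folio with neither PG_referenced nor PG_workingset, that folio_lru_refs() returns 0 without PG_referenced and otherwise the counter plus one, and that LRU_REFS_FLAGS covers both the counter and PG_referenced. The counter's saturation value is made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical saturation point of the modeled reference counter. */
#define MODEL_REFS_MAX 3u

/* Hypothetical stand-in for the MGLRU reference bits of one folio. */
struct refs_model {
	bool referenced;   /* PG_referenced */
	bool workingset;   /* PG_workingset */
	unsigned int refs; /* models the LRU_REFS_MASK counter */
};

/* Assumed folio_lru_refs(): 0 without PG_referenced, else counter + 1. */
static unsigned int lru_refs(const struct refs_model *f)
{
	return f->referenced ? f->refs + 1 : 0;
}

/* Simplified folio_mark_accessed(): set PG_referenced, then count. */
static void mark_accessed(struct refs_model *f)
{
	if (!f->referenced)
		f->referenced = true;
	else if (f->refs < MODEL_REFS_MAX)
		f->refs++;
}

/* Models set_mask_bits(..., LRU_REFS_FLAGS, BIT(PG_workingset)). */
static void promote(struct refs_model *f)
{
	f->referenced = false;
	f->refs = 0;
	f->workingset = true;
}

/* Assumed early branch: a first sighting only records PG_referenced. */
static bool first_sighting(struct refs_model *f)
{
	if (f->referenced || f->workingset)
		return false;
	f->referenced = true;
	f->refs = 0;
	return true;
}

/* Pre-patch tail: promote as soon as the early branch is skipped. */
static bool set_refs_old(struct refs_model *f)
{
	if (first_sighting(f))
		return false;
	promote(f);
	return true;
}

/* Patched tail: promote only once lru_refs() exceeds 1. */
static bool set_refs(struct refs_model *f)
{
	if (first_sighting(f))
		return false;
	if (lru_refs(f) > 1)
		promote(f);
	else
		mark_accessed(f);
	return true;
}
```

For a folio marked accessed at fault time, set_refs_old() promotes it on the first scan, whereas set_refs() only bumps the counter on the first scan and promotes on the second, which is the "promote on second access" behaviour the hunk's comment describes.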