From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song, Qi Zheng, Chen Ridong
Subject: [PATCH v6 01/33] mm: memcontrol: remove dead code of checking parent memory cgroup
Date: Thu, 5 Mar 2026 19:52:19 +0800

From: Muchun Song

The non-hierarchical mode was deprecated by commit bef8620cd8e0 ("mm: memcg: deprecate the non-hierarchical mode"). As a result, parent_mem_cgroup() returns NULL only when passed the root memcg, and the root memcg can never be offlined. Hence, it is safe to remove the check on the return value of parent_mem_cgroup().
Remove the corresponding dead code.

Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Reviewed-by: Chen Ridong
Acked-by: Shakeel Butt
---
 mm/memcontrol.c | 5 -----
 mm/shrinker.c   | 6 +-----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index db59fad3503f2..aab863e1822d4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3424,9 +3424,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
 		return;
 
 	parent = parent_mem_cgroup(memcg);
-	if (!parent)
-		parent = root_mem_cgroup;
-
 	memcg_reparent_list_lrus(memcg, parent);
 
 	/*
@@ -3706,8 +3703,6 @@ struct mem_cgroup *mem_cgroup_private_id_get_online(struct mem_cgroup *memcg, un
 			break;
 		}
 		memcg = parent_mem_cgroup(memcg);
-		if (!memcg)
-			memcg = root_mem_cgroup;
 	}
 	return memcg;
 }
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 94646ee0af63b..4cd33222256ef 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -286,14 +286,10 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 {
 	int nid, index, offset;
 	long nr;
-	struct mem_cgroup *parent;
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 	struct shrinker_info *child_info, *parent_info;
 	struct shrinker_info_unit *child_unit, *parent_unit;
 
-	parent = parent_mem_cgroup(memcg);
-	if (!parent)
-		parent = root_mem_cgroup;
-
 	/* Prevent from concurrent shrinker_info expand */
 	mutex_lock(&shrinker_mutex);
 	for_each_node(nid) {
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 02/33] mm: workingset: use folio_lruvec() in workingset_refault()
Date: Thu, 5 Mar 2026 19:52:20 +0800
Message-ID: <11bd2fbbf082f4f7972a1113ca42a61fbe2876a9.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

Use folio_lruvec() to simplify the code.
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Shakeel Butt
---
 mm/workingset.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 37a94979900f1..5e8b6e62a6175 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -541,8 +541,6 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 void workingset_refault(struct folio *folio, void *shadow)
 {
 	bool file = folio_is_file_lru(folio);
-	struct pglist_data *pgdat;
-	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 	bool workingset;
 	long nr;
@@ -564,10 +562,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 * locked to guarantee folio_memcg() stability throughout.
 	 */
 	nr = folio_nr_pages(folio);
-	memcg = folio_memcg(folio);
-	pgdat = folio_pgdat(folio);
-	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
+	lruvec = folio_lruvec(folio);
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 03/33] mm: rename unlock_page_lruvec_irq and its variants
Date: Thu, 5 Mar 2026 19:52:21 +0800
Message-ID: <4e5e05271a250df4d1812e1832be65636a78c957.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

It is inappropriate to use folio_lruvec_lock() variants in conjunction with unlock_page_lruvec() variants, as this involves the inconsistent operation of locking a folio while unlocking a page. To rectify this, the functions unlock_page_lruvec{,_irq,_irqrestore} are renamed to lruvec_unlock{,_irq,_irqrestore}.

Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Acked-by: Johannes Weiner
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Reviewed-by: Chen Ridong
Acked-by: David Hildenbrand (Red Hat)
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h | 10 +++++-----
 mm/compaction.c            | 14 +++++++-------
 mm/huge_memory.c           |  2 +-
 mm/mlock.c                 |  2 +-
 mm/swap.c                  | 12 ++++++------
 mm/vmscan.c                |  4 ++--
 6 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5695776f32c83..52b1d8f3942e1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1480,17 +1480,17 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 	return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
 }
 
-static inline void unlock_page_lruvec(struct lruvec *lruvec)
+static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
 }
 
-static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+static inline void lruvec_unlock_irq(struct lruvec *lruvec)
 {
 	spin_unlock_irq(&lruvec->lru_lock);
 }
 
-static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec,
 		unsigned long flags)
 {
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
@@ -1512,7 +1512,7 @@ static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio,
 		if (folio_matches_lruvec(folio, locked_lruvec))
 			return locked_lruvec;
 
-		unlock_page_lruvec_irq(locked_lruvec);
+		lruvec_unlock_irq(locked_lruvec);
 	}
 
 	return folio_lruvec_lock_irq(folio);
@@ -1526,7 +1526,7 @@ static inline void folio_lruvec_relock_irqsave(struct folio *folio,
 		if (folio_matches_lruvec(folio, *lruvecp))
 			return;
 
-		unlock_page_lruvec_irqrestore(*lruvecp, *flags);
+		lruvec_unlock_irqrestore(*lruvecp, *flags);
 	}
 
 	*lruvecp = folio_lruvec_lock_irqsave(folio, flags);
diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6..c3e338aaa0ffb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -913,7 +913,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 */
 	if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
 		if (locked) {
-			unlock_page_lruvec_irqrestore(locked, flags);
+			lruvec_unlock_irqrestore(locked, flags);
 			locked = NULL;
 		}
 
@@ -964,7 +964,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 		/* for alloc_contig case */
 		if (locked) {
-			unlock_page_lruvec_irqrestore(locked, flags);
+			lruvec_unlock_irqrestore(locked, flags);
 			locked = NULL;
 		}
 
@@ -1053,7 +1053,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (unlikely(page_has_movable_ops(page)) &&
 		    !PageMovableOpsIsolated(page)) {
 			if (locked) {
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 				locked = NULL;
 			}
 
@@ -1158,7 +1158,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		/* If we already hold the lock, we can skip some rechecking */
 		if (lruvec != locked) {
 			if (locked)
-				unlock_page_lruvec_irqrestore(locked, flags);
+				lruvec_unlock_irqrestore(locked, flags);
 
 			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
 			locked = lruvec;
@@ -1226,7 +1226,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 isolate_fail_put:
 		/* Avoid potential deadlock in freeing page under lru_lock */
 		if (locked) {
-			unlock_page_lruvec_irqrestore(locked, flags);
+			lruvec_unlock_irqrestore(locked, flags);
 			locked = NULL;
 		}
 		folio_put(folio);
@@ -1242,7 +1242,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 */
 	if (nr_isolated) {
 		if (locked) {
-			unlock_page_lruvec_irqrestore(locked, flags);
+			lruvec_unlock_irqrestore(locked, flags);
 			locked = NULL;
 		}
 		putback_movable_pages(&cc->migratepages);
@@ -1274,7 +1274,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_abort:
 	if (locked)
-		unlock_page_lruvec_irqrestore(locked, flags);
+		lruvec_unlock_irqrestore(locked, flags);
 	if (folio) {
 		folio_set_lru(folio);
 		folio_put(folio);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8003d3a498220..f6c0a86055bdc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3902,7 +3902,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	folio_ref_unfreeze(folio, folio_cache_ref_count(folio) + 1);
 
 	if (do_lru)
-		unlock_page_lruvec(lruvec);
+		lruvec_unlock(lruvec);
 
 	if (ci)
 		swap_cluster_unlock(ci);
diff --git a/mm/mlock.c b/mm/mlock.c
index 2f699c3497a57..66740e16679c3 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -205,7 +205,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 	}
 
 	if (lruvec)
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 	folios_put(fbatch);
 }
 
diff --git a/mm/swap.c b/mm/swap.c
index bb19ccbece464..245ba159e01d7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -91,7 +91,7 @@ static void page_cache_release(struct folio *folio)
 
 	__page_cache_release(folio, &lruvec, &flags);
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 }
 
 void __folio_put(struct folio *folio)
@@ -175,7 +175,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	}
 
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 	folios_put(fbatch);
 }
 
@@ -349,7 +349,7 @@ void folio_activate(struct folio *folio)
 
 	lruvec = folio_lruvec_lock_irq(folio);
 	lru_activate(lruvec, folio);
-	unlock_page_lruvec_irq(lruvec);
+	lruvec_unlock_irq(lruvec);
 	folio_set_lru(folio);
 }
 #endif
@@ -963,7 +963,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 
 		if (folio_is_zone_device(folio)) {
 			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec_unlock_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
 			if (folio_ref_sub_and_test(folio, nr_refs))
@@ -977,7 +977,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		/* hugetlb has its own memcg */
 		if (folio_test_hugetlb(folio)) {
 			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec_unlock_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
 			free_huge_folio(folio);
@@ -991,7 +991,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		j++;
 	}
 	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+		lruvec_unlock_irqrestore(lruvec, flags);
 	if (!j) {
 		folio_batch_reinit(folios);
 		return;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7effd01a78287..223d584421a9e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1835,7 +1835,7 @@ bool folio_isolate_lru(struct folio *folio)
 		folio_get(folio);
 		lruvec = folio_lruvec_lock_irq(folio);
 		lruvec_del_folio(lruvec, folio);
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 		ret = true;
 	}
 
@@ -7861,7 +7861,7 @@ void check_move_unevictable_folios(struct folio_batch *fbatch)
 	if (lruvec) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
-		unlock_page_lruvec_irq(lruvec);
+		lruvec_unlock_irq(lruvec);
 	} else if (pgscanned) {
 		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 04/33] mm: vmscan: prepare for refactoring move_folios_to_lru()
Date: Thu, 5 Mar 2026 19:52:22 +0800

From: Qi Zheng

Once we refactor move_folios_to_lru(), its callers will no longer have to hold the lruvec lock. For shrink_inactive_list(), shrink_active_list() and evict_folios(), IRQ disabling is only needed for __count_vm_events() and __mod_node_page_state().

To avoid using local_irq_disable() on PREEMPT_RT kernels, let's make all callers of move_folios_to_lru() use the IRQ-safe count_vm_events() and mod_node_page_state().

Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Reviewed-by: Chen Ridong
Reviewed-by: Harry Yoo
Acked-by: Muchun Song
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 223d584421a9e..2a32dce8d8394 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2025,7 +2025,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 
 	mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
 			 stat.nr_demoted);
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
 	mod_lruvec_state(lruvec, item, nr_reclaimed);
 	mod_lruvec_state(lruvec, PGSTEAL_ANON + file, nr_reclaimed);
@@ -2171,10 +2171,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	nr_activate = move_folios_to_lru(lruvec, &l_active);
 	nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
 
-	__count_vm_events(PGDEACTIVATE, nr_deactivate);
+	count_vm_events(PGDEACTIVATE, nr_deactivate);
 	count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 
 	lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated);
 	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1772711694; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=776a3yXyenvO+S9oFBjCtWS3YtcAt/IfBD8X6MP7Tkk=; b=ZCFQjPpy7n7KsEKOJibwAi4V8KKy9q02gRd4MMrZ1mYxNuY8XwwAUELFJH6x4XWdiH0BY4 DkGDOBg+cRt5GNhLvRFW/LYglR3Xs+5k2ZnGsM1gToBfBG6E6WpTTvTvXtEB1xX6vuB1ey pZBywQjZx8RntdcEM2xd6z9l8RKHfco= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v6 05/33] mm: vmscan: refactor move_folios_to_lru() Date: Thu, 5 Mar 2026 19:52:23 +0800 Message-ID: <6f1dac88b61e2e3cb7a3e90bacdf06b654acfc15.1772711148.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In a subsequent patch, we'll reparent the LRU folios. The folios that are moved to the appropriate LRU list can undergo reparenting during the move_folios_to_lru() process. Hence, it's incorrect for the caller to hold a lruvec lock. 
Instead, we should utilize the more general interface of folio_lruvec_relock_irq() to obtain the correct lruvec lock. This patch involves only code refactoring and doesn't introduce any functional changes. Signed-off-by: Muchun Song Acked-by: Johannes Weiner Signed-off-by: Qi Zheng Acked-by: Shakeel Butt Reviewed-by: Harry Yoo --- mm/vmscan.c | 46 +++++++++++++++++++++------------------------- 1 file changed, 21 insertions(+), 25 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2a32dce8d8394..61303ec85d587 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1889,24 +1889,27 @@ static bool too_many_isolated(struct pglist_data *p= gdat, int file, /* * move_folios_to_lru() moves folios from private @list to appropriate LRU= list. * - * Returns the number of pages moved to the given lruvec. + * Returns the number of pages moved to the appropriate lruvec. + * + * Note: The caller must not hold any lruvec lock. */ -static unsigned int move_folios_to_lru(struct lruvec *lruvec, - struct list_head *list) +static unsigned int move_folios_to_lru(struct list_head *list) { int nr_pages, nr_moved =3D 0; + struct lruvec *lruvec =3D NULL; struct folio_batch free_folios; =20 folio_batch_init(&free_folios); while (!list_empty(list)) { struct folio *folio =3D lru_to_folio(list); =20 + lruvec =3D folio_lruvec_relock_irq(folio, lruvec); VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); list_del(&folio->lru); if (unlikely(!folio_evictable(folio))) { - spin_unlock_irq(&lruvec->lru_lock); + lruvec_unlock_irq(lruvec); folio_putback_lru(folio); - spin_lock_irq(&lruvec->lru_lock); + lruvec =3D NULL; continue; } =20 @@ -1928,19 +1931,15 @@ static unsigned int move_folios_to_lru(struct lruve= c *lruvec, =20 folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) =3D=3D 0) { - spin_unlock_irq(&lruvec->lru_lock); + lruvec_unlock_irq(lruvec); mem_cgroup_uncharge_folios(&free_folios); free_unref_folios(&free_folios); - spin_lock_irq(&lruvec->lru_lock); + lruvec =3D NULL; } 
=20 continue; } =20 - /* - * All pages were isolated from the same lruvec (and isolation - * inhibits memcg migration). - */ VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); lruvec_add_folio(lruvec, folio); nr_pages =3D folio_nr_pages(folio); @@ -1949,11 +1948,12 @@ static unsigned int move_folios_to_lru(struct lruve= c *lruvec, workingset_age_nonresident(lruvec, nr_pages); } =20 + if (lruvec) + lruvec_unlock_irq(lruvec); + if (free_folios.nr) { - spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); free_unref_folios(&free_folios); - spin_lock_irq(&lruvec->lru_lock); } =20 return nr_moved; @@ -2020,8 +2020,7 @@ static unsigned long shrink_inactive_list(unsigned lo= ng nr_to_scan, nr_reclaimed =3D shrink_folio_list(&folio_list, pgdat, sc, &stat, false, lruvec_memcg(lruvec)); =20 - spin_lock_irq(&lruvec->lru_lock); - move_folios_to_lru(lruvec, &folio_list); + move_folios_to_lru(&folio_list); =20 mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc), stat.nr_demoted); @@ -2030,6 +2029,7 @@ static unsigned long shrink_inactive_list(unsigned lo= ng nr_to_scan, mod_lruvec_state(lruvec, item, nr_reclaimed); mod_lruvec_state(lruvec, PGSTEAL_ANON + file, nr_reclaimed); =20 + spin_lock_irq(&lruvec->lru_lock); lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); =20 @@ -2166,16 +2166,14 @@ static void shrink_active_list(unsigned long nr_to_= scan, /* * Move folios back to the lru list. 
*/ - spin_lock_irq(&lruvec->lru_lock); - - nr_activate =3D move_folios_to_lru(lruvec, &l_active); - nr_deactivate =3D move_folios_to_lru(lruvec, &l_inactive); + nr_activate =3D move_folios_to_lru(&l_active); + nr_deactivate =3D move_folios_to_lru(&l_inactive); =20 count_vm_events(PGDEACTIVATE, nr_deactivate); count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); =20 + spin_lock_irq(&lruvec->lru_lock); lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated); trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate, nr_deactivate, nr_rotated, sc->priority, file); @@ -4731,14 +4729,14 @@ static int evict_folios(unsigned long nr_to_scan, s= truct lruvec *lruvec, set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active)); } =20 - spin_lock_irq(&lruvec->lru_lock); - - move_folios_to_lru(lruvec, &list); + move_folios_to_lru(&list); =20 walk =3D current->reclaim_state->mm_walk; if (walk && walk->batched) { walk->lruvec =3D lruvec; + spin_lock_irq(&lruvec->lru_lock); reset_batch_size(walk); + spin_unlock_irq(&lruvec->lru_lock); } =20 mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc), @@ -4748,8 +4746,6 @@ static int evict_folios(unsigned long nr_to_scan, str= uct lruvec *lruvec, mod_lruvec_state(lruvec, item, reclaimed); mod_lruvec_state(lruvec, PGSTEAL_ANON + type, reclaimed); =20 - spin_unlock_irq(&lruvec->lru_lock); - list_splice_init(&clean, &list); =20 if (!list_empty(&list)) { --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng Subject: [PATCH v6 06/33] mm: memcontrol: allocate object cgroup for non-kmem case Date: Thu, 5 Mar 2026 19:52:24 +0800 From: Muchun Song To allow LRU page reparenting, the objcg infrastructure is no longer solely applicable to the kmem case. In this patch, we extend the scope of the objcg infrastructure beyond the kmem case, enabling LRU folios to reuse it for folio charging purposes.
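[Editorial note] The payoff of routing folio->memcg_data through an objcg is that reparenting becomes a single pointer update instead of a walk over every charged folio. A minimal userspace sketch of that indirection (plain C with invented names, not kernel code):

```c
#include <assert.h>

struct mem_cgroup { int id; };
struct obj_cgroup { struct mem_cgroup *memcg; };
/* stand-in for folio->memcg_data pointing at an obj_cgroup */
struct folio { struct obj_cgroup *objcg; };

/* Reparenting: retarget the child's objcg at the parent. Every folio
 * that references this objcg now resolves to the parent memcg without
 * being touched individually. */
static void reparent(struct obj_cgroup *objcg, struct mem_cgroup *parent)
{
	objcg->memcg = parent;
}

static int folio_memcg_id(const struct folio *f)
{
	return f->objcg->memcg->id;
}
```

Under this model, offlining a memcg touches one obj_cgroup rather than every LRU folio charged to it.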
Note that LRU folios are not accounted at the root level, yet folio->memcg_data of such folios still points to the root_mem_cgroup, so folio->memcg_data of an LRU folio is always a valid pointer. However, the root_mem_cgroup does not have an object cgroup, so we also allocate an object cgroup for the root_mem_cgroup. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt Reviewed-by: Chen Ridong --- mm/memcontrol.c | 51 +++++++++++++++++++++++-------------------------- 1 file changed, 24 insertions(+), 27 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index aab863e1822d4..508ee182c032e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -207,10 +207,10 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } =20 -static void memcg_reparent_objcgs(struct mem_cgroup *memcg, - struct mem_cgroup *parent) +static void memcg_reparent_objcgs(struct mem_cgroup *memcg) { struct obj_cgroup *objcg, *iter; + struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); =20 objcg =3D rcu_replace_pointer(memcg->objcg, NULL, true); =20 @@ -3387,30 +3387,17 @@ void folio_split_memcg_refs(struct folio *folio, un= signed old_order, css_get_many(&__folio_memcg(folio)->css, new_refs); } =20 -static int memcg_online_kmem(struct mem_cgroup *memcg) +static void memcg_online_kmem(struct mem_cgroup *memcg) { - struct obj_cgroup *objcg; - if (mem_cgroup_kmem_disabled()) - return 0; + return; =20 if (unlikely(mem_cgroup_is_root(memcg))) - return 0; - - objcg =3D obj_cgroup_alloc(); - if (!objcg) - return -ENOMEM; - - objcg->memcg =3D memcg; - rcu_assign_pointer(memcg->objcg, objcg); - obj_cgroup_get(objcg); - memcg->orig_objcg =3D objcg; + return; =20 static_branch_enable(&memcg_kmem_online_key); =20 memcg->kmemcg_id =3D memcg->id.id; - - return 0; } =20 static void memcg_offline_kmem(struct mem_cgroup *memcg) @@ -3425,12 +3412,6 @@ static void memcg_offline_kmem(struct mem_cgroup
*me= mcg) =20 parent =3D parent_mem_cgroup(memcg); memcg_reparent_list_lrus(memcg, parent); - - /* - * Objcg's reparenting must be after list_lru's, make sure list_lru - * helpers won't use parent's list_lru until child is drained. - */ - memcg_reparent_objcgs(memcg, parent); } =20 #ifdef CONFIG_CGROUP_WRITEBACK @@ -3931,9 +3912,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) static int mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + struct obj_cgroup *objcg; =20 - if (memcg_online_kmem(memcg)) - goto remove_id; + memcg_online_kmem(memcg); =20 /* * A memcg must be visible for expand_shrinker_info() @@ -3943,6 +3924,15 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) if (alloc_shrinker_info(memcg)) goto offline_kmem; =20 + objcg =3D obj_cgroup_alloc(); + if (!objcg) + goto free_shrinker; + + objcg->memcg =3D memcg; + rcu_assign_pointer(memcg->objcg, objcg); + obj_cgroup_get(objcg); + memcg->orig_objcg =3D objcg; + if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled()) queue_delayed_work(system_dfl_wq, &stats_flush_dwork, FLUSH_TIME); @@ -3965,9 +3955,10 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL); =20 return 0; +free_shrinker: + free_shrinker_info(memcg); offline_kmem: memcg_offline_kmem(memcg); -remove_id: mem_cgroup_private_id_remove(memcg); return -ENOMEM; } @@ -3985,6 +3976,12 @@ static void mem_cgroup_css_offline(struct cgroup_sub= sys_state *css) =20 memcg_offline_kmem(memcg); reparent_deferred_split_queue(memcg); + /* + * The reparenting of objcg must be after the reparenting of the + * list_lru and deferred_split_queue above, which ensures that they will + * not mistakenly get the parent list_lru and deferred_split_queue. 
+ */ + memcg_reparent_objcgs(memcg); reparent_shrinker_deferred(memcg); wb_memcg_offline(memcg); lru_gen_offline_memcg(memcg); --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng Subject: [PATCH v6 07/33] mm: memcontrol: return root object cgroup for root memory cgroup Date: Thu, 5 Mar 2026 19:52:25 +0800 From: Muchun Song Memory cgroup functions such as get_mem_cgroup_from_folio() and get_mem_cgroup_from_mm() return a valid memory cgroup pointer, even for the root memory cgroup. The situation for object cgroups has been different: the root object cgroup could not be returned because it did not exist.
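[Editorial note] With a root objcg in place, the helpers can special-case it so the root is still never refcounted or charged. A hedged userspace model of the short-circuit this patch adds (invented names, not the kernel implementation):

```c
#include <assert.h>
#include <stddef.h>

struct obj_cgroup { long refcnt; };

/* stand-in for the global root_obj_cgroup */
static struct obj_cgroup root_objcg_model;

static int objcg_is_root(const struct obj_cgroup *o)
{
	return o == &root_objcg_model;
}

/* get/put become no-ops for the root objcg, mirroring how
 * obj_cgroup_get()/obj_cgroup_put() skip the percpu_ref operations
 * after this patch */
static void objcg_get(struct obj_cgroup *o)
{
	if (!objcg_is_root(o))
		o->refcnt++;
}

static void objcg_put(struct obj_cgroup *o)
{
	if (o && !objcg_is_root(o))
		o->refcnt--;
}
```

The design choice mirrors mem_cgroup_is_root(): callers get a valid pointer unconditionally, and the root check is buried in the accessors instead of in every call site.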
Now that a valid root object cgroup exists, for the sake of consistency, it's necessary to align the behavior of object-cgroup-related operations with that of memory cgroup APIs. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Acked-by: Johannes Weiner Acked-by: Shakeel Butt Reviewed-by: Harry Yoo --- include/linux/memcontrol.h | 26 +++++++++++++++++----- mm/memcontrol.c | 45 ++++++++++++++++++++------------------ mm/percpu.c | 2 +- 3 files changed, 45 insertions(+), 28 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 52b1d8f3942e1..f4b6158b77d8e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -333,6 +333,7 @@ struct mem_cgroup { #define MEMCG_CHARGE_BATCH 64U =20 extern struct mem_cgroup *root_mem_cgroup; +extern struct obj_cgroup *root_obj_cgroup; =20 enum page_memcg_data_flags { /* page->memcg_data is a pointer to an slabobj_ext vector */ @@ -549,6 +550,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgrou= p *memcg) return (memcg =3D=3D root_mem_cgroup); } =20 +static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg) +{ + return objcg =3D=3D root_obj_cgroup; +} + static inline bool mem_cgroup_disabled(void) { return !cgroup_subsys_enabled(memory_cgrp_subsys); @@ -775,23 +781,26 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_= subsys_state *css){ =20 static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg) { + if (obj_cgroup_is_root(objcg)) + return true; return percpu_ref_tryget(&objcg->refcnt); } =20 -static inline void obj_cgroup_get(struct obj_cgroup *objcg) +static inline void obj_cgroup_get_many(struct obj_cgroup *objcg, + unsigned long nr) { - percpu_ref_get(&objcg->refcnt); + if (!obj_cgroup_is_root(objcg)) + percpu_ref_get_many(&objcg->refcnt, nr); } =20 -static inline void obj_cgroup_get_many(struct obj_cgroup *objcg, - unsigned long nr) +static inline void obj_cgroup_get(struct obj_cgroup *objcg) { - percpu_ref_get_many(&objcg->refcnt, nr); + 
obj_cgroup_get_many(objcg, 1); } =20 static inline void obj_cgroup_put(struct obj_cgroup *objcg) { - if (objcg) + if (objcg && !obj_cgroup_is_root(objcg)) percpu_ref_put(&objcg->refcnt); } =20 @@ -1088,6 +1097,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgr= oup *memcg) return true; } =20 +static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg) +{ + return true; +} + static inline bool mem_cgroup_disabled(void) { return true; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 508ee182c032e..a60b692fb75a9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -84,6 +84,8 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; EXPORT_SYMBOL(root_mem_cgroup); =20 +struct obj_cgroup *root_obj_cgroup __read_mostly; + /* Active memory cgroup to use from an interrupt context */ DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg); @@ -2740,15 +2742,14 @@ struct mem_cgroup *mem_cgroup_from_virt(void *p) =20 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *m= emcg) { - struct obj_cgroup *objcg =3D NULL; + for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { + struct obj_cgroup *objcg =3D rcu_dereference(memcg->objcg); =20 - for (; !mem_cgroup_is_root(memcg); memcg =3D parent_mem_cgroup(memcg)) { - objcg =3D rcu_dereference(memcg->objcg); if (likely(objcg && obj_cgroup_tryget(objcg))) - break; - objcg =3D NULL; + return objcg; } - return objcg; + + return NULL; } =20 static struct obj_cgroup *current_objcg_update(void) @@ -2822,18 +2823,17 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) * Objcg reference is kept by the task, so it's safe * to use the objcg by the current task. */ - return objcg; + return objcg ? 
: root_obj_cgroup; } =20 memcg =3D this_cpu_read(int_active_memcg); if (unlikely(memcg)) goto from_memcg; =20 - return NULL; + return root_obj_cgroup; =20 from_memcg: - objcg =3D NULL; - for (; !mem_cgroup_is_root(memcg); memcg =3D parent_mem_cgroup(memcg)) { + for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { /* * Memcg pointer is protected by scope (see set_active_memcg()) * and is pinning the corresponding objcg, so objcg can't go @@ -2842,10 +2842,10 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) */ objcg =3D rcu_dereference_check(memcg->objcg, 1); if (likely(objcg)) - break; + return objcg; } =20 - return objcg; + return root_obj_cgroup; } =20 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio) @@ -2859,14 +2859,8 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct = folio *folio) objcg =3D __folio_objcg(folio); obj_cgroup_get(objcg); } else { - struct mem_cgroup *memcg; - rcu_read_lock(); - memcg =3D __folio_memcg(folio); - if (memcg) - objcg =3D __get_obj_cgroup_from_memcg(memcg); - else - objcg =3D NULL; + objcg =3D __get_obj_cgroup_from_memcg(__folio_memcg(folio)); rcu_read_unlock(); } return objcg; @@ -2969,7 +2963,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t= gfp, int order) int ret =3D 0; =20 objcg =3D current_obj_cgroup(); - if (objcg) { + if (objcg && !obj_cgroup_is_root(objcg)) { ret =3D obj_cgroup_charge_pages(objcg, gfp, 1 << order); if (!ret) { obj_cgroup_get(objcg); @@ -3270,7 +3264,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *= s, struct list_lru *lru, * obj_cgroup_get() is used to get a permanent reference. 
*/ objcg =3D current_obj_cgroup(); - if (!objcg) + if (!objcg || obj_cgroup_is_root(objcg)) return true; =20 /* @@ -3928,6 +3922,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) if (!objcg) goto free_shrinker; =20 + if (unlikely(mem_cgroup_is_root(memcg))) + root_obj_cgroup =3D objcg; + objcg->memcg =3D memcg; rcu_assign_pointer(memcg->objcg, objcg); obj_cgroup_get(objcg); @@ -5558,6 +5555,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg= , size_t size) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; =20 + if (obj_cgroup_is_root(objcg)) + return; + VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC)); =20 /* PF_MEMALLOC context, charging must succeed */ @@ -5587,6 +5587,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *obj= cg, size_t size) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; =20 + if (obj_cgroup_is_root(objcg)) + return; + obj_cgroup_uncharge(objcg, size); =20 rcu_read_lock(); diff --git a/mm/percpu.c b/mm/percpu.c index a2107bdebf0b5..b0676b8054ed0 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1622,7 +1622,7 @@ static bool pcpu_memcg_pre_alloc_hook(size_t size, gf= p_t gfp, return true; =20 objcg =3D current_obj_cgroup(); - if (!objcg) + if (!objcg || obj_cgroup_is_root(objcg)) return true; =20 if (obj_cgroup_charge(objcg, gfp, pcpu_obj_full_size(size))) --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng Subject: [PATCH v6 08/33] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio() Date: Thu, 5 Mar 2026 19:52:26 +0800 From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released.
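[Editorial note] The css_tryget() retry loop this patch introduces can be pictured with a small single-threaded userspace model (invented names; the slot update inside the failing tryget stands in for the concurrent reparenting that RCU protects against): if the refcount already hit zero, tryget fails and the loop re-reads folio_memcg(), which by then resolves to the new, live target.

```c
#include <assert.h>

struct mem_cgroup {
	int refs;
	int online;
	struct mem_cgroup *parent;
};

/* stand-in for what folio_memcg(folio) currently resolves to */
static struct mem_cgroup *slot_memcg;

static int css_tryget_model(struct mem_cgroup *m)
{
	if (!m->online) {
		/* emulate the concurrent writer: the folio's memcg has
		 * been reparented, so the next read sees the parent */
		slot_memcg = m->parent;
		return 0;
	}
	m->refs++;
	return 1;
}

/* shape of the new loop in get_mem_cgroup_from_folio() */
static struct mem_cgroup *get_memcg_model(void)
{
	struct mem_cgroup *m;

	do {
		m = slot_memcg;		/* folio_memcg(folio) */
	} while (!css_tryget_model(m));
	return m;
}
```

In the kernel the loop runs under rcu_read_lock(), which is what keeps the stale memcg's memory valid long enough to fail the tryget and retry.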
In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in get_mem_cgroup_from_folio(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Shakeel Butt --- mm/memcontrol.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a60b692fb75a9..4820919c0d219 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -997,14 +997,18 @@ struct mem_cgroup *get_mem_cgroup_from_current(void) */ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 if (mem_cgroup_disabled()) return NULL; =20 + if (!folio_memcg_charged(folio)) + return root_mem_cgroup; + rcu_read_lock(); - if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css))) - memcg =3D root_mem_cgroup; + do { + memcg =3D folio_memcg(folio); + } while (unlikely(!css_tryget(&memcg->css))); rcu_read_unlock(); return memcg; } --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng Subject: [PATCH v6 09/33] buffer: prevent memory cgroup release in folio_alloc_buffers() Date: Thu, 5 Mar 2026 19:52:27 +0800 From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the function get_mem_cgroup_from_folio() is employed to safeguard against the release of the memory cgroup. This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- fs/buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 22b43642ba574..343c97eab9e57 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -922,8 +922,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, long offset; struct mem_cgroup *memcg, *old_memcg; =20 - /* The folio lock pins the memcg */ - memcg =3D folio_memcg(folio); + memcg =3D get_mem_cgroup_from_folio(folio); old_memcg =3D set_active_memcg(memcg); =20 head =3D NULL; @@ -944,6 +943,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, } out: set_active_memcg(old_memcg); + mem_cgroup_put(memcg); return head; /* * In case anything failed, we just free everything we got.
--=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng Subject: [PATCH v6 10/33] writeback: prevent memory cgroup release in writeback module Date: Thu, 5 Mar 2026 19:52:28 +0800 Message-ID: <645f99bc344575417f67def3744f975596df2793.1772711148.git.zhengqi.arch@bytedance.com> From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released.
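[Editorial note] Once a reference is taken up front, every exit path must drop it, which is why the early returns in wbc_account_cgroup_owner() become 'goto out' in this patch. A hedged userspace sketch of that get/put pairing (invented names, not the kernel code):

```c
#include <assert.h>

struct css_model { int refs; int online; };

static struct css_model *css_get_model(struct css_model *c)
{
	c->refs++;
	return c;
}

static void css_put_model(struct css_model *c)
{
	c->refs--;
}

/* shape of the accounting flow after the patch: the reference is
 * acquired first, so even the dead-cgroup early exit must fall
 * through 'out' to balance it */
static int account_model(struct css_model *c, int bytes, int *total)
{
	struct css_model *css = css_get_model(c);
	int ret = 0;

	if (!css->online)
		goto out;	/* dead cgroup: skip, but still put the ref */
	*total += bytes;
	ret = 1;
out:
	css_put_model(css);
	return ret;
}
```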
In the current patch, the function get_mem_cgroup_css_from_folio() and the
RCU read lock are employed to safeguard against the release of the memory
cgroup. This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 fs/fs-writeback.c                | 22 +++++++++++-----------
 include/linux/memcontrol.h       |  9 +++++++--
 include/trace/events/writeback.h |  3 +++
 mm/memcontrol.c                  | 14 ++++++++------
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 7c75ed7e89799..c3442a38450ca 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -280,15 +280,13 @@ void __inode_attach_wb(struct inode *inode, struct folio *folio)
 	if (inode_cgwb_enabled(inode)) {
 		struct cgroup_subsys_state *memcg_css;
 
-		if (folio) {
-			memcg_css = mem_cgroup_css_from_folio(folio);
-			wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
-		} else {
-			/* must pin memcg_css, see wb_get_create() */
+		/* must pin memcg_css, see wb_get_create() */
+		if (folio)
+			memcg_css = get_mem_cgroup_css_from_folio(folio);
+		else
 			memcg_css = task_get_css(current, memory_cgrp_id);
-			wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
-			css_put(memcg_css);
-		}
+		wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
+		css_put(memcg_css);
 	}
 
 	if (!wb)
@@ -979,16 +977,16 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct folio *folio
 	if (!wbc->wb || wbc->no_cgroup_owner)
 		return;
 
-	css = mem_cgroup_css_from_folio(folio);
+	css = get_mem_cgroup_css_from_folio(folio);
 	/* dead cgroups shouldn't contribute to inode ownership arbitration */
 	if (!css_is_online(css))
-		return;
+		goto out;
 
 	id = css->id;
 
 	if (id == wbc->wb_id) {
 		wbc->wb_bytes += bytes;
-		return;
+		goto out;
 	}
 
 	if (id == wbc->wb_lcand_id)
@@ -1001,6 +999,8 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct folio *folio
 		wbc->wb_tcand_bytes += bytes;
 	else
 		wbc->wb_tcand_bytes -= min(bytes, wbc->wb_tcand_bytes);
+out:
+	css_put(css);
 }
 EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner);
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f4b6158b77d8e..20d38262b984b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -895,7 +895,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm,
 	return match;
 }
 
-struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio);
+struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *folio);
 ino_t page_cgroup_ino(struct page *page);
 
 static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
@@ -1564,9 +1564,14 @@ static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
 	if (mem_cgroup_disabled())
 		return;
 
+	if (!folio_memcg_charged(folio))
+		return;
+
+	rcu_read_lock();
 	memcg = folio_memcg(folio);
-	if (unlikely(memcg && &memcg->css != wb->memcg_css))
+	if (unlikely(&memcg->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
+	rcu_read_unlock();
 }
 
 void mem_cgroup_flush_foreign(struct bdi_writeback *wb);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 4d3d8c8f3a1bc..b849b8cc96b1e 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -294,7 +294,10 @@ TRACE_EVENT(track_foreign_dirty,
 		__entry->ino = inode ? inode->i_ino : 0;
 		__entry->memcg_id = wb->memcg_css->id;
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
+
+		rcu_read_lock();
 		__entry->page_cgroup_ino = cgroup_ino(folio_memcg(folio)->css.cgroup);
+		rcu_read_unlock();
 	),
 
 	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4820919c0d219..a4bb8b8b2c457 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -244,7 +244,7 @@ DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
 EXPORT_SYMBOL(memcg_bpf_enabled_key);
 
 /**
- * mem_cgroup_css_from_folio - css of the memcg associated with a folio
+ * get_mem_cgroup_css_from_folio - acquire a css of the memcg associated with a folio
  * @folio: folio of interest
  *
  * If memcg is bound to the default hierarchy, css of the memcg associated
@@ -254,14 +254,16 @@ EXPORT_SYMBOL(memcg_bpf_enabled_key);
  * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup
  * is returned.
  */
-struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio)
+struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *folio)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 
-	if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		memcg = root_mem_cgroup;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return &root_mem_cgroup->css;
 
-	return &memcg->css;
+	memcg = get_mem_cgroup_from_folio(folio);
+
+	return memcg ? &memcg->css : &root_mem_cgroup->css;
 }
 
 /**
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 11/33] mm: memcontrol: prevent memory cgroup release in count_memcg_folio_events()
Date: Thu, 5 Mar 2026 19:52:29 +0800

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the RCU read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.
In the current patch, the RCU read lock is employed to safeguard against
the release of the memory cgroup in count_memcg_folio_events(). This
serves as a preparatory measure for the reparenting of the LRU pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20d38262b984b..0fdfb8044458e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -976,10 +976,15 @@ void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 static inline void count_memcg_folio_events(struct folio *folio,
 		enum vm_event_item idx, unsigned long nr)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 
-	if (memcg)
-		count_memcg_events(memcg, idx, nr);
+	if (!folio_memcg_charged(folio))
+		return;
+
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	count_memcg_events(memcg, idx, nr);
+	rcu_read_unlock();
 }
 
 static inline void count_memcg_events_mm(struct mm_struct *mm,
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 12/33] mm: page_io: prevent memory cgroup release in page_io module
Date: Thu, 5 Mar 2026 19:52:30 +0800
Message-ID: <7c3708358412fb02c482d0985feb5e9513a863ef.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the RCU read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the RCU read lock is employed to safeguard against
the release of the memory cgroup in swap_writeout() and
bio_associate_blkg_from_page(). This serves as a preparatory measure for
the reparenting of the LRU pages.
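The discipline this series introduces — take a reference on the memcg before using it, drop it on every exit path — can be sketched in plain userspace C. Everything below (`fake_memcg`, `memcg_get`, `memcg_put`, `use_memcg`) is a hypothetical stand-in for `css_get()`/`css_put()` and is not kernel API; it only models the acquire/use/release shape under the assumption that the last reference frees the object.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a memcg whose lifetime is governed by an
 * atomic reference count; the last put marks it freed. */
struct fake_memcg {
	atomic_int refcnt;
	int freed;
};

static void memcg_get(struct fake_memcg *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

static void memcg_put(struct fake_memcg *m)
{
	/* fetch_sub returns the old value; 1 means we dropped the last ref */
	if (atomic_fetch_sub(&m->refcnt, 1) == 1)
		m->freed = 1;	/* would be the actual free in the kernel */
}

/* The pattern the patches enforce: pin before use, unpin on the way out. */
static int use_memcg(struct fake_memcg *m)
{
	memcg_get(m);		/* cannot be freed while we hold this */
	int ok = !m->freed;
	memcg_put(m);
	return ok;
}
```

The same balance requirement explains the `goto out` rewrite in `wbc_account_cgroup_owner()` earlier in the series: once the lookup takes a reference, every return path must pass through a single `css_put()`.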
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 mm/page_io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index a2c034660c805..63b262f4c5a9b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -276,10 +276,14 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
 		count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
 		goto out_unlock;
 	}
+
+	rcu_read_lock();
 	if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
+		rcu_read_unlock();
 		folio_mark_dirty(folio);
 		return AOP_WRITEPAGE_ACTIVATE;
 	}
+	rcu_read_unlock();
 
 	__swap_writepage(folio, swap_plug);
 	return 0;
@@ -307,11 +311,11 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct folio *folio)
 	struct cgroup_subsys_state *css;
 	struct mem_cgroup *memcg;
 
-	memcg = folio_memcg(folio);
-	if (!memcg)
+	if (!folio_memcg_charged(folio))
 		return;
 
 	rcu_read_lock();
+	memcg = folio_memcg(folio);
 	css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
 	bio_associate_blkg_from_css(bio, css);
 	rcu_read_unlock();
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 13/33] mm: migrate: prevent memory cgroup release in folio_migrate_mapping()
Date: Thu, 5 Mar 2026 19:52:31 +0800
Message-ID: <0f156c2f1188f256855617953f8305f43e066065.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the RCU read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the RCU read lock is employed in
__folio_migrate_mapping() to safeguard against the release of the memory
cgroup. This serves as a preparatory measure for the reparenting of the
LRU pages.
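The invariant behind this patch is that the `folio_memcg()` lookup and every use of its result must sit inside one read-side critical section. A minimal userspace sketch, in which `fake_rcu_read_lock()`/`fake_rcu_read_unlock()` are hypothetical stand-ins for `rcu_read_lock()`/`rcu_read_unlock()` (a depth counter, not real RCU) and `fake_folio_memcg()` stands in for the kernel lookup:

```c
#include <assert.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): a depth
 * counter lets us assert the critical section is balanced. */
static int fake_rcu_depth;
static void fake_rcu_read_lock(void)   { fake_rcu_depth++; }
static void fake_rcu_read_unlock(void) { assert(fake_rcu_depth-- > 0); }

struct fake_memcg { long nr_dirty; };
static struct fake_memcg fake_memcg_of_folio;
static struct fake_memcg *fake_folio_memcg(void)
{
	return &fake_memcg_of_folio;
}

/* The shape of the fix in __folio_migrate_mapping(): lookup and all uses
 * of the memcg happen inside the same read-side section. */
static long fake_migrate_dirty_stats(long nr)
{
	long seen;

	fake_rcu_read_lock();
	struct fake_memcg *memcg = fake_folio_memcg();	/* lookup inside */
	memcg->nr_dirty += nr;				/* uses inside too */
	seen = memcg->nr_dirty;
	fake_rcu_read_unlock();
	return seen;
}
```

Under real RCU the unlock is what allows a concurrent reparenting to free the old memcg, which is why hoisting the lookup above the lock would reintroduce the use-after-free this patch prevents.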
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6cc654858da65..27e6103037c29 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -672,6 +672,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 		struct lruvec *old_lruvec, *new_lruvec;
 		struct mem_cgroup *memcg;
 
+		rcu_read_lock();
 		memcg = folio_memcg(folio);
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
@@ -699,6 +700,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 			mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
 			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
+		rcu_read_unlock();
 	}
 	local_irq_enable();
 
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 14/33] mm: mglru: prevent memory cgroup release in mglru
Date: Thu, 5 Mar 2026 19:52:32 +0800
Message-ID: <9d887662a9d39c425742dd8468e3123316bccfe3.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the RCU read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the RCU read lock is employed to safeguard against
the release of the memory cgroup in mglru. This serves as a preparatory
measure for the reparenting of the LRU pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: Shakeel Butt
Reviewed-by: Harry Yoo
---
 mm/vmscan.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 61303ec85d587..024ff870b1a03 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3443,8 +3443,10 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
 
+	rcu_read_lock();
 	if (folio_memcg(folio) != memcg)
-		return NULL;
+		folio = NULL;
+	rcu_read_unlock();
 
 	return folio;
 }
@@ -4202,12 +4204,12 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	unsigned long addr = pvmw->address;
 	struct vm_area_struct *vma = pvmw->vma;
 	struct folio *folio = pfn_folio(pvmw->pfn);
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat = folio_pgdat(folio);
-	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
-	DEFINE_MAX_SEQ(lruvec);
-	int gen = lru_gen_from_seq(max_seq);
+	struct lruvec *lruvec;
+	struct lru_gen_mm_state *mm_state;
+	unsigned long max_seq;
+	int gen;
 
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
@@ -4242,6 +4244,12 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		}
 	}
 
+	memcg = get_mem_cgroup_from_folio(folio);
+	lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+	gen = lru_gen_from_seq(max_seq);
+	mm_state = get_mm_state(lruvec);
+
 	lazy_mmu_mode_enable();
 
 	pte -= (addr - start) / PAGE_SIZE;
@@ -4282,6 +4290,8 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
 
+	mem_cgroup_put(memcg);
+
 	return true;
 }
 
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 15/33] mm: memcontrol: prevent memory cgroup release in mem_cgroup_swap_full()
Date: Thu, 5 Mar 2026 19:52:33 +0800
Message-ID: <21d1abab7342615745ea4c18a88237335ab44d13.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the RCU read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the RCU read lock is employed to safeguard against
the release of the memory cgroup in mem_cgroup_swap_full(). This serves as
a preparatory measure for the reparenting of the LRU pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 mm/memcontrol.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4bb8b8b2c457..063fdfdd58223 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5345,27 +5345,29 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
 bool mem_cgroup_swap_full(struct folio *folio)
 {
 	struct mem_cgroup *memcg;
+	bool ret = false;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
 	if (vm_swap_full())
 		return true;
-	if (do_memsw_account())
-		return false;
+	if (do_memsw_account() || !folio_memcg_charged(folio))
+		return ret;
 
+	rcu_read_lock();
 	memcg = folio_memcg(folio);
-	if (!memcg)
-		return false;
-
 	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
 		unsigned long usage = page_counter_read(&memcg->swap);
 
 		if (usage * 2 >= READ_ONCE(memcg->swap.high) ||
-		    usage * 2 >= READ_ONCE(memcg->swap.max))
-			return true;
+		    usage * 2 >= READ_ONCE(memcg->swap.max)) {
+			ret = true;
+			break;
+		}
 	}
+	rcu_read_unlock();
 
-	return false;
+	return ret;
 }
 
 static int __init setup_swap_account(char *s)
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1772711814; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZbG23+zJ5fPDETZsA4JYiiScdZSpnAwzCU/omjW9+7Y=; b=fe+nMrSJbiyNEjT8QnfP2jiU7zSCtJUFDhUkZ6xW8MCf6wGdRSGgSko4Kwhc4zPQBedWa5 DmrzF7ubIzkhASX91nLHrdY9b0eWVYN48Ccc0XZJ8LXDN4O7BgEnue4o3MoZ2JmNDreaDv bQfDWzg57Ggq6hClfYxiaK4sGUfw26M= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v6 16/33] mm: workingset: prevent memory cgroup release in lru_gen_eviction() Date: Thu, 5 Mar 2026 19:52:34 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. 
In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in lru_gen_eviction(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- mm/workingset.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index 5e8b6e62a6175..6971aa163e466 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -244,12 +244,15 @@ static void *lru_gen_eviction(struct folio *folio) int refs =3D folio_lru_refs(folio); bool workingset =3D folio_test_workingset(folio); int tier =3D lru_tier_from_refs(refs, workingset); - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; struct pglist_data *pgdat =3D folio_pgdat(folio); + unsigned short memcg_id; =20 BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH > BITS_PER_LONG - max(EVICTION_SHIFT, EVICTION_SHIFT_ANON)); =20 + rcu_read_lock(); + memcg =3D folio_memcg(folio); lruvec =3D mem_cgroup_lruvec(memcg, pgdat); lrugen =3D &lruvec->lrugen; min_seq =3D READ_ONCE(lrugen->min_seq[type]); @@ -257,8 +260,10 @@ static void *lru_gen_eviction(struct folio *folio) =20 hist =3D lru_hist_from_seq(min_seq); atomic_long_add(delta, &lrugen->evicted[hist][type][tier]); + memcg_id =3D mem_cgroup_private_id(memcg); + rcu_read_unlock(); =20 - return pack_shadow(mem_cgroup_private_id(memcg), pgdat, token, workingset= , type); + return pack_shadow(memcg_id, pgdat, token, workingset, type); } =20 /* --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7412396D35 for ; Thu, 5 Mar 2026 11:57:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 17/33] mm: thp: prevent memory cgroup release in folio_split_queue_lock{_irqsave}()
Date: Thu, 5 Mar 2026 19:52:35 +0800

From: Qi Zheng

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in
folio_split_queue_lock{_irqsave}().

Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: David Hildenbrand (Red Hat)
Acked-by: Muchun Song
---
 mm/huge_memory.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f6c0a86055bdc..56db54fa48181 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1157,13 +1157,29 @@ split_queue_lock_irqsave(int nid, struct mem_cgroup *memcg, unsigned long *flags)
 
 static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 {
-	return split_queue_lock(folio_nid(folio), folio_memcg(folio));
+	struct deferred_split *queue;
+
+	rcu_read_lock();
+	queue = split_queue_lock(folio_nid(folio), folio_memcg(folio));
+	/*
+	 * The memcg destruction path is acquiring the split queue lock for
+	 * reparenting. Once you have it locked, it's safe to drop the rcu lock.
+	 */
+	rcu_read_unlock();
+
+	return queue;
 }
 
 static struct deferred_split *
 folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
 {
-	return split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), flags);
+	struct deferred_split *queue;
+
+	rcu_read_lock();
+	queue = split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), flags);
+	rcu_read_unlock();
+
+	return queue;
 }
 
 static inline void split_queue_unlock(struct deferred_split *queue)
-- 
2.20.1
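The comment in folio_split_queue_lock() above describes a lock-handoff idiom: it is safe to leave the RCU read section while still holding the per-queue lock, because the teardown path must take that same lock before it can reparent or free the queue. A userspace analogy of the idiom, with all names hypothetical, a trivial test-and-set spinlock standing in for the split queue lock, and no-op stubs in place of real RCU:

```c
#include <stdatomic.h>

/* Hypothetical queue guarded by a trivial test-and-set spinlock. */
struct queue {
	atomic_flag lock;
	int nr_items;
};

static struct queue global_q = { ATOMIC_FLAG_INIT, 0 };
static _Atomic(struct queue *) current_q = &global_q;

/* No-op stand-ins for rcu_read_lock()/rcu_read_unlock(); in the kernel
 * these keep the looked-up queue from being freed underneath us. */
static void fake_rcu_read_lock(void)   { }
static void fake_rcu_read_unlock(void) { }

/* Mirrors the shape of folio_split_queue_lock(): resolve the pointer
 * inside the read-side critical section, lock the result, then drop
 * "RCU".  That is safe only because the destruction path must also
 * take q->lock before it retargets current_q. */
static struct queue *queue_lock(void)
{
	struct queue *q;

	fake_rcu_read_lock();
	q = atomic_load(&current_q);
	while (atomic_flag_test_and_set(&q->lock))
		;	/* spin until the lock is ours */
	fake_rcu_read_unlock();
	return q;
}

static void queue_unlock(struct queue *q)
{
	atomic_flag_clear(&q->lock);
}
```

The key property is the ordering: the reference is resolved and the lock acquired before the read section ends, so there is no window in which the object is neither RCU-protected nor locked.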
From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 18/33] mm: zswap: prevent memory cgroup release in zswap_compress()
Date: Thu, 5 Mar 2026 19:52:36 +0800
Message-ID: <340f315050fb8a67caaf01b4836d4f38a41cf1a8.1772711148.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu
read lock or acquire a reference to the memory cgroup returned by
folio_memcg(), thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard
against the release of the memory cgroup in zswap_compress().

Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: Muchun Song
Reviewed-by: Harry Yoo
---
 mm/zswap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index a399f7a108304..fb525874a1b6b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -893,11 +893,14 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 * to the active LRU list in the case.
 	 */
 	if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
+		rcu_read_lock();
 		if (!mem_cgroup_zswap_writeback_enabled(
 					folio_memcg(page_folio(page)))) {
+			rcu_read_unlock();
 			comp_ret = comp_ret ? comp_ret : -EINVAL;
 			goto unlock;
 		}
+		rcu_read_unlock();
 		comp_ret = 0;
 		dlen = PAGE_SIZE;
 		dst = kmap_local_page(page);
-- 
2.20.1
From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 19/33] mm: workingset: prevent lruvec release in workingset_refault()
Date: Thu, 5 Mar 2026 19:52:37 +0800

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released
without the rcu read lock or a reference to its memory cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in workingset_refault().

This serves as a preparatory measure for the reparenting of the LRU
pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Shakeel Butt
---
 mm/workingset.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 6971aa163e466..2de2a355f0f86 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -546,6 +546,7 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
 void workingset_refault(struct folio *folio, void *shadow)
 {
 	bool file = folio_is_file_lru(folio);
+	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 	bool workingset;
 	long nr;
@@ -567,11 +568,12 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 * locked to guarantee folio_memcg() stability throughout.
 	 */
 	nr = folio_nr_pages(folio);
-	lruvec = folio_lruvec(folio);
+	memcg = get_mem_cgroup_from_folio(folio);
+	lruvec = mem_cgroup_lruvec(memcg, folio_pgdat(folio));
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
-		return;
+		goto out;
 
 	folio_set_active(folio);
 	workingset_age_nonresident(lruvec, nr);
@@ -587,6 +589,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 		lru_note_cost_refault(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
 	}
+out:
+	mem_cgroup_put(memcg);
 }
 
 /**
-- 
2.20.1
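Unlike the earlier patches, workingset_refault() uses the series' other tool: instead of wrapping the whole function in one RCU read section, it pins the memcg with a get/put reference pair taken while the object is still guaranteed alive. The underlying pattern can be sketched in userspace C; all names are hypothetical, and a bare atomic counter stands in for the kernel's css/objcg reference machinery:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_cgroup with a plain refcount. */
struct memcg {
	atomic_int refcnt;
	long refault_stat;
};

static struct memcg *memcg_alloc(void)
{
	struct memcg *m = calloc(1, sizeof(*m));

	atomic_store(&m->refcnt, 1);	/* allocation holds one reference */
	return m;
}

/* Mirrors get_mem_cgroup_from_folio(): the extra reference is taken
 * while the object is still guaranteed alive (under rcu_read_lock()
 * in the kernel) and keeps it alive after the read section ends. */
static struct memcg *memcg_get(struct memcg *m)
{
	atomic_fetch_add(&m->refcnt, 1);
	return m;
}

/* Mirrors mem_cgroup_put(): the last put frees the object. */
static void memcg_put(struct memcg *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) == 1)
		free(m);
}
```

The trade-off versus pure RCU is that a reference survives sleeping operations and function boundaries, at the cost of an atomic on each get/put.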
From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 20/33] mm: zswap: prevent lruvec release in zswap_folio_swapin()
Date: Thu, 5 Mar 2026 19:52:38 +0800
Message-ID: <02b3f76ee8d1132f69ac5baaedce38fb82b09a48.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released
without the rcu read lock or a reference to its memory cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in zswap_folio_swapin().

This serves as a preparatory measure for the reparenting of the LRU
pages.

Signed-off-by: Muchun Song
Acked-by: Nhat Pham
Reviewed-by: Chengming Zhou
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 mm/zswap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index fb525874a1b6b..bdd24430f6ff2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -664,8 +664,10 @@ void zswap_folio_swapin(struct folio *folio)
 	struct lruvec *lruvec;
 
 	if (folio) {
+		rcu_read_lock();
 		lruvec = folio_lruvec(folio);
 		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_disk_swapins);
+		rcu_read_unlock();
 	}
 }
 
-- 
2.20.1
From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 21/33] mm: swap: prevent lruvec release in lru_gen_clear_refs()
Date: Thu, 5 Mar 2026 19:52:39 +0800
Message-ID: <986cd26227191a48a7c34a2a15812d361f4ebd53.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. So an lruvec returned by folio_lruvec() could be released
without the rcu read lock or a reference to its memory cgroup.

In the current patch, the rcu read lock is employed to safeguard
against the release of the lruvec in lru_gen_clear_refs().

This serves as a preparatory measure for the reparenting of the LRU
pages.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 mm/swap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 245ba159e01d7..cb1148a92d8ec 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -412,18 +412,20 @@ static void lru_gen_inc_refs(struct folio *folio)
 
 static bool lru_gen_clear_refs(struct folio *folio)
 {
-	struct lru_gen_folio *lrugen;
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
+	unsigned long seq;
 
 	if (gen < 0)
 		return true;
 
 	set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS | BIT(PG_workingset), 0);
 
-	lrugen = &folio_lruvec(folio)->lrugen;
+	rcu_read_lock();
+	seq = READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]);
+	rcu_read_unlock();
 	/* whether can do without shuffling under the LRU lock */
-	return gen == lru_gen_from_seq(READ_ONCE(lrugen->min_seq[type]));
+	return gen == lru_gen_from_seq(seq);
 }
 
 #else /* !CONFIG_LRU_GEN */
-- 
2.20.1
b=tNGbkh4zJXrvjxFbmSrLg1Bm7p6OUeA2DWjH3zAGBnoGqAc90OLKd3PCpbWo/siygm3eS6G3fFhrCI4V3X/blrQJeDzC+32CD0bvF6/1yiGcza3dHN9vyZM8XOmSp6LRWFkCQa4OmEFi+3AqIiK5j0v7TqAvWnN0NHYJ7u1Dc2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=nqgE8YuM; arc=none smtp.client-ip=91.218.175.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="nqgE8YuM" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1772711880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f+e+MrbxStkCyw/Uqa2jpcsS2d2D2I5PIhTEX7t4Mug=; b=nqgE8YuMYpDeMSH70KNVeXPS0W/PRINNke44gBx7AlwJ4YmwlZE50vf2Y4Jo1VuXMWZp62 KPKir4dzqgTTlpd9/Attzl0mRO9jxg8AH2zpwt1vEZFsP+YMJW7CyX4oq3KY/eLsrnIwsf T0X1I8DTJfskCOaiY4Ntpo66q467K2k= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com Cc: linux-mm@kvack.org, 
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v6 22/33] mm: workingset: prevent lruvec release in workingset_activation() Date: Thu, 5 Mar 2026 19:52:40 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. So an lruvec returned by folio_lruvec() could be released without the rcu read lock or a reference to its memory cgroup. In the current patch, the rcu read lock is employed to safeguard against the release of the lruvec in workingset_activation(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- mm/workingset.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/workingset.c b/mm/workingset.c index 2de2a355f0f86..95d722a452e1c 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -603,8 +603,11 @@ void workingset_activation(struct folio *folio) * Filter non-memcg pages here, e.g. unmap can call * mark_page_accessed() on VDSO pages. 
 	 */
-	if (mem_cgroup_disabled() || folio_memcg_charged(folio))
+	if (mem_cgroup_disabled() || folio_memcg_charged(folio)) {
+		rcu_read_lock();
 		workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
+		rcu_read_unlock();
+	}
 }
 
 /*
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng
Subject: [PATCH v6 23/33] mm: do not open-code lruvec lock
Date: Thu, 5 Mar 2026 19:52:41 +0800
Message-ID: <2d0bafe7564e17ece46dfd58197af22ce57017dc.1772711148.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

We now have lruvec_unlock(), lruvec_unlock_irq() and
lruvec_unlock_irqrestore(), but not the paired lruvec_lock(),
lruvec_lock_irq() and lruvec_lock_irqsave().
There is currently no use case for lruvec_lock_irqsave(), so only
introduce lruvec_lock_irq(), and convert all open-coded places to this
helper function. This is cleaner and prepares for reparenting LRU
pages, preventing users from missing the RCU lock calls that the
open-coded lruvec lock would require.

Signed-off-by: Qi Zheng
Acked-by: Muchun Song
Acked-by: Shakeel Butt
Reviewed-by: Harry Yoo
---
 include/linux/memcontrol.h |  5 +++++
 mm/vmscan.c                | 38 +++++++++++++++++++-------------------
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0fdfb8044458e..c7c207a830c50 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1499,6 +1499,11 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 	return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
 }
 
+static inline void lruvec_lock_irq(struct lruvec *lruvec)
+{
+	spin_lock_irq(&lruvec->lru_lock);
+}
+
 static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 024ff870b1a03..08ed278115f70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2002,7 +2002,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 
 	nr_taken = isolate_lru_folios(nr_to_scan, lruvec, &folio_list,
 				      &nr_scanned, sc, lru);
@@ -2012,7 +2012,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	mod_lruvec_state(lruvec, item, nr_scanned);
 	mod_lruvec_state(lruvec, PGSCAN_ANON + file, nr_scanned);
 
-	spin_unlock_irq(&lruvec->lru_lock);
+	lruvec_unlock_irq(lruvec);
 
 	if (nr_taken == 0)
 		return 0;
@@ -2029,7 +2029,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	mod_lruvec_state(lruvec, item, nr_reclaimed);
 	mod_lruvec_state(lruvec, PGSTEAL_ANON + file, nr_reclaimed);
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 	lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout,
 			nr_scanned - nr_reclaimed);
 
@@ -2108,7 +2108,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 
 	nr_taken = isolate_lru_folios(nr_to_scan, lruvec, &l_hold,
 				      &nr_scanned, sc, lru);
@@ -2117,7 +2117,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	mod_lruvec_state(lruvec, PGREFILL, nr_scanned);
 
-	spin_unlock_irq(&lruvec->lru_lock);
+	lruvec_unlock_irq(lruvec);
 
 	while (!list_empty(&l_hold)) {
 		struct folio *folio;
@@ -2173,7 +2173,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 	mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 	lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated);
 	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
@@ -3794,9 +3794,9 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
 	}
 
 	if (walk->batched) {
-		spin_lock_irq(&lruvec->lru_lock);
+		lruvec_lock_irq(lruvec);
 		reset_batch_size(walk);
-		spin_unlock_irq(&lruvec->lru_lock);
+		lruvec_unlock_irq(lruvec);
 	}
 
 	cond_resched();
@@ -3956,7 +3956,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
 	if (seq < READ_ONCE(lrugen->max_seq))
 		return false;
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 
 	VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
 
@@ -3971,7 +3971,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
 		if (inc_min_seq(lruvec, type, swappiness))
 			continue;
 
-		spin_unlock_irq(&lruvec->lru_lock);
+		lruvec_unlock_irq(lruvec);
 		cond_resched();
 		goto restart;
 	}
@@ -4006,7 +4006,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
 	/* make sure preceding modifications appear */
 	smp_store_release(&lrugen->max_seq, lrugen->max_seq + 1);
 unlock:
-	spin_unlock_irq(&lruvec->lru_lock);
+	lruvec_unlock_irq(lruvec);
 
 	return success;
 }
@@ -4697,7 +4697,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
-	spin_lock_irq(&lruvec->lru_lock);
+	lruvec_lock_irq(lruvec);
 
 	scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness, &type, &list);
 
@@ -4706,7 +4706,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen->max_seq)
 		scanned = 0;
 
-	spin_unlock_irq(&lruvec->lru_lock);
+	lruvec_unlock_irq(lruvec);
 
 	if (list_empty(&list))
 		return scanned;
@@ -4744,9 +4744,9 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	walk = current->reclaim_state->mm_walk;
 	if (walk && walk->batched) {
 		walk->lruvec = lruvec;
-		spin_lock_irq(&lruvec->lru_lock);
+		lruvec_lock_irq(lruvec);
 		reset_batch_size(walk);
-		spin_unlock_irq(&lruvec->lru_lock);
+		lruvec_unlock_irq(lruvec);
 	}
 
 	mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
@@ -5178,7 +5178,7 @@ static void lru_gen_change_state(bool enabled)
 		for_each_node(nid) {
 			struct lruvec *lruvec = get_lruvec(memcg, nid);
 
-			spin_lock_irq(&lruvec->lru_lock);
+			lruvec_lock_irq(lruvec);
 
 			VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
 			VM_WARN_ON_ONCE(!state_is_valid(lruvec));
 
 			lruvec->lrugen.enabled = enabled;
 
 			while (!(enabled ?
 			       fill_evictable(lruvec) : drain_evictable(lruvec))) {
-				spin_unlock_irq(&lruvec->lru_lock);
+				lruvec_unlock_irq(lruvec);
 				cond_resched();
-				spin_lock_irq(&lruvec->lru_lock);
+				lruvec_lock_irq(lruvec);
 			}
 
-			spin_unlock_irq(&lruvec->lru_lock);
+			lruvec_unlock_irq(lruvec);
 		}
 
 		cond_resched();
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v6 24/33] mm: memcontrol: prepare for reparenting LRU pages for lruvec lock
Date: Thu, 5 Mar 2026 19:52:42 +0800
Message-ID: <23f22cbb1419f277a3483018b32158ae2b86c666.1772711148.git.zhengqi.arch@bytedance.com>

From: Muchun Song

The following diagram illustrates how to ensure the safety of the folio
lruvec lock when LRU folios undergo reparenting.
In the folio_lruvec_lock(folio) function:

```
rcu_read_lock();
retry:
	lruvec = folio_lruvec(folio);

	/* There is a possibility of folio reparenting at this point. */
	spin_lock(&lruvec->lru_lock);

	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
		/*
		 * The wrong lruvec lock was acquired, and a retry is required.
		 * This is because the folio resides on the parent memcg lruvec
		 * list.
		 */
		spin_unlock(&lruvec->lru_lock);
		goto retry;
	}

	/* Reaching here indicates that folio_memcg() is stable. */
```

In the memcg_reparent_objcgs(memcg) function:

```
spin_lock(&lruvec->lru_lock);
spin_lock(&lruvec_parent->lru_lock);

/* Transfer folios from the lruvec list to the parent's. */

spin_unlock(&lruvec_parent->lru_lock);
spin_unlock(&lruvec->lru_lock);
```

After acquiring the lruvec lock, it is necessary to verify whether the
folio has been reparented. If reparenting has occurred, the new lruvec
lock must be reacquired. During the LRU folio reparenting process, the
lruvec lock will also be acquired (this will be implemented in a
subsequent patch), so folio_memcg() remains unchanged while the lruvec
lock is held.

Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio)
after the lruvec lock is acquired, the lruvec_memcg_debug() check is
redundant. Hence, it is removed.

This patch serves as a preparation for the reparenting of LRU folios.
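The retry protocol in the diagram can be exercised outside the kernel. The standalone C sketch below uses toy types and a deterministic, single-threaded stand-in for the race (none of these names are kernel APIs); it shows the mismatch check catching a reparenting that lands between reading the folio's lruvec and acquiring its lock:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for memcg/lruvec/folio; not the kernel types. */
struct toy_memcg { struct toy_lruvec *lruvec; };
struct toy_lruvec { struct toy_memcg *memcg; int locked; };
struct toy_folio { struct toy_memcg *memcg; };

static int retries;

/* Simulate the race deterministically: the first time a lock is taken,
 * the folio is reparented to @parent, so the post-lock check must fail
 * exactly once. */
static void toy_lock(struct toy_lruvec *lruvec, struct toy_folio *folio,
		     struct toy_memcg *parent)
{
	lruvec->locked = 1;
	if (parent)
		folio->memcg = parent;	/* reparenting slipped in */
}

static struct toy_lruvec *toy_folio_lruvec_lock(struct toy_folio *folio,
						struct toy_memcg *race_parent)
{
	struct toy_lruvec *lruvec;

retry:
	lruvec = folio->memcg->lruvec;		/* like folio_lruvec() */
	toy_lock(lruvec, folio, race_parent);
	race_parent = NULL;			/* the race fires only once */
	if (lruvec->memcg != folio->memcg) {	/* stale binding: unlock, retry */
		lruvec->locked = 0;
		retries++;
		goto retry;
	}
	return lruvec;	/* binding is now stable while the lock is held */
}
```

The invariant is the one the patch relies on: once the lock is held and the memcg comparison passes, the folio's lruvec binding can no longer change until the lock is released.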
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 include/linux/memcontrol.h | 34 ++++++++++++------------
 include/linux/swap.h       |  3 +--
 mm/compaction.c            | 29 ++++++++++++++++-----
 mm/memcontrol.c            | 53 +++++++++++++++++++-------------------
 mm/swap.c                  |  6 ++++-
 5 files changed, 73 insertions(+), 52 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c7c207a830c50..d2748e672fd88 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -742,7 +742,15 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
  * folio_lruvec - return lruvec for isolating/putting an LRU folio
  * @folio: Pointer to the folio.
  *
- * This function relies on folio->mem_cgroup being stable.
+ * Call with rcu_read_lock() held to ensure the lifetime of the returned lruvec.
+ * Note that this alone will NOT guarantee the stability of the folio->lruvec
+ * association; the folio can be reparented to an ancestor if this races with
+ * cgroup deletion.
+ *
+ * Use folio_lruvec_lock() to ensure both lifetime and stability of the binding.
+ * Once a lruvec is locked, folio_lruvec() can be called on other folios, and
+ * their binding is stable if the returned lruvec matches the one the caller has
+ * locked. Useful for lock batching.
  */
 static inline struct lruvec *folio_lruvec(struct folio *folio)
 {
@@ -765,15 +773,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 						unsigned long *flags);
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio);
-#else
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-#endif
-
 static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
 {
 	return css ?
 		container_of(css, struct mem_cgroup, css) : NULL;
@@ -1199,11 +1198,6 @@ static inline struct lruvec *folio_lruvec(struct folio *folio)
 	return &pgdat->__lruvec;
 }
 
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
 	return NULL;
@@ -1262,6 +1256,7 @@ static inline struct lruvec *folio_lruvec_lock(struct folio *folio)
 {
 	struct pglist_data *pgdat = folio_pgdat(folio);
 
+	rcu_read_lock();
 	spin_lock(&pgdat->__lruvec.lru_lock);
 	return &pgdat->__lruvec;
 }
@@ -1270,6 +1265,7 @@ static inline struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 {
 	struct pglist_data *pgdat = folio_pgdat(folio);
 
+	rcu_read_lock();
 	spin_lock_irq(&pgdat->__lruvec.lru_lock);
 	return &pgdat->__lruvec;
 }
@@ -1279,6 +1275,7 @@ static inline struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 {
 	struct pglist_data *pgdat = folio_pgdat(folio);
 
+	rcu_read_lock();
 	spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp);
 	return &pgdat->__lruvec;
 }
@@ -1501,23 +1498,26 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 
 static inline void lruvec_lock_irq(struct lruvec *lruvec)
 {
+	rcu_read_lock();
 	spin_lock_irq(&lruvec->lru_lock);
 }
 
 static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
 static inline void lruvec_unlock_irq(struct lruvec *lruvec)
 {
 	spin_unlock_irq(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
-static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec,
-					    unsigned long flags)
+static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec, unsigned long flags)
 {
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+	rcu_read_unlock();
 }
 
 /* Test requires a stable folio->memcg binding, see folio_memcg() */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0effe3cc50f5f..00ce4c5e6e714 100644
--- a/include/linux/swap.h
+++
b/include/linux/swap.h
@@ -312,8 +312,7 @@ extern unsigned long totalreserve_pages;
 
 /* linux/mm/swap.c */
 void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
-		unsigned int nr_io, unsigned int nr_rotated)
-		__releases(lruvec->lru_lock);
+		unsigned int nr_io, unsigned int nr_rotated);
 void lru_note_cost_refault(struct folio *);
 void folio_add_lru(struct folio *);
 void folio_add_lru_vma(struct folio *, struct vm_area_struct *);
diff --git a/mm/compaction.c b/mm/compaction.c
index c3e338aaa0ffb..3648ce22c8072 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -518,6 +518,24 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
+static struct lruvec *
+compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags,
+				  struct compact_control *cc)
+{
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
+	compact_lock_irqsave(&lruvec->lru_lock, flags, cc);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
+
+	return lruvec;
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended.
 * The lock should be periodically unlocked to avoid
@@ -839,7 +857,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 {
 	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 	struct lruvec *locked = NULL;
 	struct folio *folio = NULL;
@@ -1153,18 +1171,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!folio_test_clear_lru(folio))
 			goto isolate_fail_put;
 
-		lruvec = folio_lruvec(folio);
+		if (locked)
+			lruvec = folio_lruvec(folio);
 
 		/* If we already hold the lock, we can skip some rechecking */
-		if (lruvec != locked) {
+		if (lruvec != locked || !locked) {
 			if (locked)
 				lruvec_unlock_irqrestore(locked, flags);
 
-			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
+			lruvec = compact_folio_lruvec_lock_irqsave(folio, &flags, cc);
 			locked = lruvec;
 
-			lruvec_memcg_debug(lruvec, folio);
-
 			/*
 			 * Try get exclusive access under lock. If marked for
 			 * skip, the scan is aborted unless the current context
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 063fdfdd58223..49a076b7c2e42 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1207,23 +1207,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	}
 }
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	if (mem_cgroup_disabled())
-		return;
-
-	memcg = folio_memcg(folio);
-
-	if (!memcg)
-		VM_BUG_ON_FOLIO(!mem_cgroup_is_root(lruvec_memcg(lruvec)), folio);
-	else
-		VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) != memcg, folio);
-}
-#endif
-
 /**
  * folio_lruvec_lock - Lock the lruvec for a folio.
  * @folio: Pointer to the folio.
@@ -1233,14 +1216,20 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
  * - folio_test_lru false
  * - folio frozen (refcount of 0)
  *
- * Return: The lruvec this folio is on with its lock held.
+ * Return: The lruvec this folio is on with its lock held and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1255,14 +1244,20 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irq(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1278,15 +1273,21 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 		unsigned long *flags)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
 
 	return lruvec;
 }
diff --git a/mm/swap.c b/mm/swap.c
index cb1148a92d8ec..d5bfe6a76ca45 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -240,6 +240,7 @@ void folio_rotate_reclaimable(struct folio *folio)
 void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
 		unsigned int nr_io, unsigned int nr_rotated)
 	__releases(lruvec->lru_lock)
+	__releases(rcu)
 {
 	unsigned long cost;
 
@@ -253,6 +254,7 @@ void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
 	cost = nr_io * SWAP_CLUSTER_MAX + nr_rotated;
 	if (!cost) {
 		spin_unlock_irq(&lruvec->lru_lock);
+		rcu_read_unlock();
 		return;
 	}
 
@@ -285,8 +287,10 @@ void lru_note_cost_unlock_irq(struct lruvec *lruvec, bool file,
 
 		spin_unlock_irq(&lruvec->lru_lock);
 		lruvec = parent_lruvec(lruvec);
-		if (!lruvec)
+		if (!lruvec) {
+			rcu_read_unlock();
 			break;
+		}
 		spin_lock_irq(&lruvec->lru_lock);
 	}
 }
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng
Subject: [PATCH v6 25/33] mm: vmscan: prepare for reparenting traditional LRU folios
Date: Thu, 5 Mar 2026 19:52:43 +0800

From: Qi Zheng

To resolve the dying memcg issue, we need to reparent the LRU folios of
a child memcg to its parent memcg. For the traditional LRU, each lruvec
of every memcg comprises four LRU lists. Due to the symmetry of the LRU
lists, it is feasible to transfer the LRU lists from a memcg to its
parent memcg during the reparenting process.
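Because the lists are symmetric, the transfer can be a constant-time splice rather than a folio-by-folio walk. Below is a minimal userspace model of that splice: a hand-rolled circular doubly-linked list in the shape of the kernel's `list_head` and `list_splice_tail_init()` (the names here are local stand-ins, not the kernel implementation):

```c
#include <assert.h>

/* Minimal circular doubly-linked list, modeled on the kernel's list_head. */
struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *h) { h->prev = h->next = h; }
static int toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_list_add_tail(struct toy_list *item, struct toy_list *h)
{
	item->prev = h->prev;
	item->next = h;
	h->prev->next = item;
	h->prev = item;
}

/* Move everything on @child onto the tail of @parent in O(1), preserving
 * order, and reinitialize @child -- the shape of list_splice_tail_init(). */
static void toy_splice_tail_init(struct toy_list *child, struct toy_list *parent)
{
	if (toy_list_empty(child))
		return;
	child->next->prev = parent->prev;	/* old parent tail <-> child head */
	parent->prev->next = child->next;
	child->prev->next = parent;		/* child tail becomes new parent tail */
	parent->prev = child->prev;
	toy_list_init(child);			/* source list is now empty */
}
```

The splice touches only four pointers on the boundary nodes, which is why the cost is independent of how many folios the child list holds.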
This commit implements the specific function, which will be used during
the reparenting process.

Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
Acked-by: Muchun Song
Acked-by: Shakeel Butt
---
 include/linux/swap.h | 21 +++++++++++++++++++++
 mm/swap.c            | 33 +++++++++++++++++++++++++++++++++
 mm/vmscan.c          | 19 -------------------
 3 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 00ce4c5e6e714..64af9462ae8af 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -548,6 +548,8 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 
 	return READ_ONCE(memcg->swappiness);
 }
+
+void lru_reparent_memcg(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid);
 #else
 static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 {
@@ -612,5 +614,24 @@ static inline bool mem_cgroup_swap_full(struct folio *folio)
 }
 #endif
 
+/* for_each_managed_zone_pgdat - helper macro to iterate over all managed zones in a pgdat up to
+ * and including the specified highidx
+ * @zone: The current zone in the iterator
+ * @pgdat: The pgdat which node_zones are being iterated
+ * @idx: The index variable
+ * @highidx: The index of the highest zone to return
+ *
+ * This macro iterates through all managed zones up to and including the specified highidx.
+ * The zone iterator enters an invalid state after macro call and must be reinitialized
+ * before it can be used again.
+ */
+#define for_each_managed_zone_pgdat(zone, pgdat, idx, highidx)	\
+	for ((idx) = 0, (zone) = (pgdat)->node_zones;		\
+	     (idx) <= (highidx);				\
+	     (idx)++, (zone)++)					\
+		if (!managed_zone(zone))			\
+			continue;				\
+		else
+
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/swap.c b/mm/swap.c
index d5bfe6a76ca45..e255be6a0f34f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1090,6 +1090,39 @@ void folio_batch_remove_exceptionals(struct folio_batch *fbatch)
 	fbatch->nr = j;
 }
 
+#ifdef CONFIG_MEMCG
+static void lruvec_reparent_lru(struct lruvec *child_lruvec,
+				struct lruvec *parent_lruvec,
+				enum lru_list lru, int nid)
+{
+	int zid;
+	struct zone *zone;
+
+	if (lru != LRU_UNEVICTABLE)
+		list_splice_tail_init(&child_lruvec->lists[lru], &parent_lruvec->lists[lru]);
+
+	for_each_managed_zone_pgdat(zone, NODE_DATA(nid), zid, MAX_NR_ZONES - 1) {
+		unsigned long size = mem_cgroup_get_zone_lru_size(child_lruvec, lru, zid);
+
+		mem_cgroup_update_lru_size(parent_lruvec, lru, zid, size);
+	}
+}
+
+void lru_reparent_memcg(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid)
+{
+	enum lru_list lru;
+	struct lruvec *child_lruvec, *parent_lruvec;
+
+	child_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
+	parent_lruvec = mem_cgroup_lruvec(parent, NODE_DATA(nid));
+	parent_lruvec->anon_cost += child_lruvec->anon_cost;
+	parent_lruvec->file_cost += child_lruvec->file_cost;
+
+	for_each_lru(lru)
+		lruvec_reparent_lru(child_lruvec, parent_lruvec, lru, nid);
+}
+#endif
+
 static const struct ctl_table swap_sysctl_table[] = {
 	{
 		.procname = "page-cluster",
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 08ed278115f70..606b4ecf77ef3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -269,25 +269,6 @@ static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
 }
 #endif
 
-/* for_each_managed_zone_pgdat - helper macro to iterate over all managed zones in a pgdat up to
- * and including the specified highidx
- * @zone: The current zone in the iterator
- * @pgdat: The pgdat which node_zones are being iterated
- * @idx: The index variable
- * @highidx: The index of the highest zone to return
- *
- * This macro iterates through all managed zones up to and including the specified highidx.
- * The zone iterator enters an invalid state after macro call and must be reinitialized
- * before it can be used again.
- */
-#define for_each_managed_zone_pgdat(zone, pgdat, idx, highidx)	\
-	for ((idx) = 0, (zone) = (pgdat)->node_zones;		\
-	     (idx) <= (highidx);				\
-	     (idx)++, (zone)++)					\
-		if (!managed_zone(zone))			\
-			continue;				\
-		else
-
 static void set_task_reclaim_state(struct task_struct *task,
 				   struct reclaim_state *rs)
 {
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 26/33] mm: vmscan: prepare for reparenting MGLRU folios
Date: Thu, 5 Mar 2026 19:52:44 +0800
From: Qi Zheng

Similar to traditional LRU folios, in order to solve the dying memcg
problem, we also need to reparent MGLRU folios to the parent memcg when
a memcg is offlined. However, this poses the following challenges:

1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, and
   the number of generations of the parent and child memcg may differ,
   so we cannot simply transfer MGLRU folios in the child memcg to the
   parent memcg as we did for traditional LRU folios.
2. The generation information is stored in folio->flags, but we cannot
   traverse these folios while holding the lru lock, otherwise it may
   cause a softlockup.
3. In walk_update_folio(), the gen of a folio and the corresponding lru
   size may be updated, but the folio is not immediately moved to the
   corresponding lru list. Therefore, there may be folios of different
   generations on an LRU list.
4. In lru_gen_del_folio(), the generation to which a folio belongs is
   found based on the generation information in folio->flags, and the
   corresponding LRU size is updated. Therefore, we need to update the
   lru size correctly during reparenting, otherwise the lru size may be
   updated incorrectly in lru_gen_del_folio().

In the end, this patch takes a compromise approach: splice each lru list
in the child memcg onto the lru list of the same generation in the
parent memcg during reparenting. To ensure that the parent memcg has the
same generations, the number of generations in the parent memcg is first
increased to MAX_NR_GENS before reparenting.

Of course, the same generation has different meanings in the parent and
child memcg, so this will blur the hot and cold information of folios.
But other than that, this method is simple, the lru sizes stay correct,
and there is no need to consider certain concurrency issues (such as
lru_gen_del_folio()).
To prepare for the above work, this commit implements the specific
functions, which will be used during reparenting.

Suggested-by: Harry Yoo
Suggested-by: Imran Khan
Signed-off-by: Qi Zheng
Acked-by: Harry Yoo
---
 include/linux/mmzone.h |  17 +++++
 mm/vmscan.c            | 142 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 159 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 546bca95ca40c..e7a8cd41619b2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -637,6 +637,9 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg);
 void lru_gen_offline_memcg(struct mem_cgroup *memcg);
 void lru_gen_release_memcg(struct mem_cgroup *memcg);
 void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid);
+void max_lru_gen_memcg(struct mem_cgroup *memcg, int nid);
+bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg, int nid);
+void lru_gen_reparent_memcg(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid);
 
 #else /* !CONFIG_LRU_GEN */
 
@@ -677,6 +680,20 @@ static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
 {
 }
 
+static inline void max_lru_gen_memcg(struct mem_cgroup *memcg, int nid)
+{
+}
+
+static inline bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg, int nid)
+{
+	return true;
+}
+
+static inline
+void lru_gen_reparent_memcg(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid)
+{
+}
+
 #endif /* CONFIG_LRU_GEN */
 
 struct lruvec {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 606b4ecf77ef3..0fb81fb7985e2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4408,6 +4408,148 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
 	lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
 }
 
+bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg, int nid)
+{
+	struct lruvec *lruvec = get_lruvec(memcg, nid);
+	int type;
+
+	for (type = 0; type < ANON_AND_FILE; type++) {
+		if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
+			return false;
+	}
+
+	return true;
+}
+
+static void try_to_inc_max_seq_nowalk(struct mem_cgroup *memcg,
+				      struct lruvec *lruvec)
+{
+	struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
+	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
+	int swappiness = mem_cgroup_swappiness(memcg);
+	DEFINE_MAX_SEQ(lruvec);
+	bool success = false;
+
+	/*
+	 * We are not iterating the mm_list here, updating mm_state->seq is just
+	 * to make mm walkers work properly.
+	 */
+	if (mm_state) {
+		spin_lock(&mm_list->lock);
+		VM_WARN_ON_ONCE(mm_state->seq + 1 < max_seq);
+		if (max_seq > mm_state->seq) {
+			WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+			success = true;
+		}
+		spin_unlock(&mm_list->lock);
+	} else {
+		success = true;
+	}
+
+	if (success)
+		inc_max_seq(lruvec, max_seq, swappiness);
+}
+
+/*
+ * We need to ensure that the folios of the child memcg can be reparented to
+ * the same gen of the parent memcg, so the gens of the parent memcg need to
+ * be incremented to MAX_NR_GENS before reparenting.
+ */
+void max_lru_gen_memcg(struct mem_cgroup *memcg, int nid)
+{
+	struct lruvec *lruvec = get_lruvec(memcg, nid);
+	int type;
+
+	for (type = 0; type < ANON_AND_FILE; type++) {
+		while (get_nr_gens(lruvec, type) < MAX_NR_GENS) {
+			try_to_inc_max_seq_nowalk(memcg, lruvec);
+			cond_resched();
+		}
+	}
+}
+
+/*
+ * Compared to traditional LRU, MGLRU faces the following challenges:
+ *
+ * 1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the
+ *    number of generations of the parent and child memcg may be different,
+ *    so we cannot simply transfer MGLRU folios in the child memcg to the
+ *    parent memcg as we did for traditional LRU folios.
+ * 2. The generation information is stored in folio->flags, but we cannot
+ *    traverse these folios while holding the lru lock, otherwise it may
+ *    cause softlockup.
+ * 3. In walk_update_folio(), the gen of folio and corresponding lru size
+ *    may be updated, but the folio is not immediately moved to the
+ *    corresponding lru list.
 Therefore, there may be folios of different
+ *    generations on an LRU list.
+ * 4. In lru_gen_del_folio(), the generation to which the folio belongs is
+ *    found based on the generation information in folio->flags, and the
+ *    corresponding LRU size will be updated. Therefore, we need to update
+ *    the lru size correctly during reparenting, otherwise the lru size may
+ *    be updated incorrectly in lru_gen_del_folio().
+ *
+ * Finally, we choose a compromise method, which is to splice the lru list in
+ * the child memcg to the lru list of the same generation in the parent memcg
+ * during reparenting.
+ *
+ * The same generation has different meanings in the parent and child memcg,
+ * so this compromise method will cause the LRU inversion problem. But as the
+ * system runs, this problem will be fixed automatically.
+ */
+static void __lru_gen_reparent_memcg(struct lruvec *child_lruvec, struct lruvec *parent_lruvec,
+				     int zone, int type)
+{
+	struct lru_gen_folio *child_lrugen, *parent_lrugen;
+	enum lru_list lru = type * LRU_INACTIVE_FILE;
+	int i;
+
+	child_lrugen = &child_lruvec->lrugen;
+	parent_lrugen = &parent_lruvec->lrugen;
+
+	for (i = 0; i < get_nr_gens(child_lruvec, type); i++) {
+		int gen = lru_gen_from_seq(child_lrugen->max_seq - i);
+		long nr_pages = child_lrugen->nr_pages[gen][type][zone];
+		int child_lru_active = lru_gen_is_active(child_lruvec, gen) ? LRU_ACTIVE : 0;
+		int parent_lru_active = lru_gen_is_active(parent_lruvec, gen) ?
 LRU_ACTIVE : 0;
+
+		/* Assuming that child pages are colder than parent pages */
+		list_splice_init(&child_lrugen->folios[gen][type][zone],
+				 &parent_lrugen->folios[gen][type][zone]);
+
+		WRITE_ONCE(child_lrugen->nr_pages[gen][type][zone], 0);
+		WRITE_ONCE(parent_lrugen->nr_pages[gen][type][zone],
+			   parent_lrugen->nr_pages[gen][type][zone] + nr_pages);
+
+		if (lru_gen_is_active(child_lruvec, gen) != lru_gen_is_active(parent_lruvec, gen)) {
+			__update_lru_size(child_lruvec, lru + child_lru_active, zone, -nr_pages);
+			__update_lru_size(parent_lruvec, lru + parent_lru_active, zone, nr_pages);
+		}
+	}
+}
+
+void lru_gen_reparent_memcg(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid)
+{
+	struct lruvec *child_lruvec, *parent_lruvec;
+	int type, zid;
+	struct zone *zone;
+	enum lru_list lru;
+
+	child_lruvec = get_lruvec(memcg, nid);
+	parent_lruvec = get_lruvec(parent, nid);
+
+	for_each_managed_zone_pgdat(zone, NODE_DATA(nid), zid, MAX_NR_ZONES - 1)
+		for (type = 0; type < ANON_AND_FILE; type++)
+			__lru_gen_reparent_memcg(child_lruvec, parent_lruvec, zid, type);
+
+	for_each_lru(lru) {
+		for_each_managed_zone_pgdat(zone, NODE_DATA(nid), zid, MAX_NR_ZONES - 1) {
+			unsigned long size = mem_cgroup_get_zone_lru_size(child_lruvec, lru, zid);
+
+			mem_cgroup_update_lru_size(parent_lruvec, lru, zid, size);
+		}
+	}
+}
+
 #endif /* CONFIG_MEMCG */
 
 /******************************************************************************
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 27/33] mm: memcontrol: refactor memcg_reparent_objcgs()
Date: Thu, 5 Mar 2026 19:52:45 +0800
Message-ID: <2e5696db1993e593a51004c1dacedbc261689629.1772711148.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

Refactor memcg_reparent_objcgs() to facilitate subsequently reparenting
LRU folios here as well.
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Reviewed-by: Harry Yoo
Reviewed-by: Muchun Song
---
 mm/memcontrol.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 49a076b7c2e42..5929e397c3c31 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -209,15 +209,12 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
 	return objcg;
 }
 
-static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
+static inline struct obj_cgroup *__memcg_reparent_objcgs(struct mem_cgroup *memcg,
+							 struct mem_cgroup *parent)
 {
 	struct obj_cgroup *objcg, *iter;
-	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 
 	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
-
-	spin_lock_irq(&objcg_lock);
-
 	/* 1) Ready to reparent active objcg. */
 	list_add(&objcg->list, &memcg->objcg_list);
 	/* 2) Reparent active objcg and already reparented objcgs to parent. */
@@ -226,7 +223,29 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
 	/* 3) Move already reparented objcgs to the parent's list */
 	list_splice(&memcg->objcg_list, &parent->objcg_list);
 
+	return objcg;
+}
+
+static inline void reparent_locks(struct mem_cgroup *memcg, struct mem_cgroup *parent)
+{
+	spin_lock_irq(&objcg_lock);
+}
+
+static inline void reparent_unlocks(struct mem_cgroup *memcg, struct mem_cgroup *parent)
+{
 	spin_unlock_irq(&objcg_lock);
+}
+
+static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
+{
+	struct obj_cgroup *objcg;
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+
+	reparent_locks(memcg, parent);
+
+	objcg = __memcg_reparent_objcgs(memcg, parent);
+
+	reparent_unlocks(memcg, parent);
 
 	percpu_ref_kill(&objcg->refcnt);
 }
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 28/33] mm: workingset: use lruvec_lru_size() to get the
 number of lru pages
Date: Thu, 5 Mar 2026 19:52:46 +0800

From: Qi Zheng

For cgroup v2, count_shadow_nodes() is the only place that reads
non-hierarchical stats (lruvec_stats->state_local). To avoid having to
consider cgroup v2 during the subsequent reparenting of non-hierarchical
stats, use lruvec_lru_size() instead of lruvec_page_state_local() to get
the number of lru pages.
For the NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B cases, it
appears that the statistics here have already been problematic for a
while, since slab pages have been reparented. So just ignore them for
now.

Signed-off-by: Qi Zheng
Acked-by: Shakeel Butt
Acked-by: Muchun Song
---
 include/linux/swap.h | 1 +
 mm/vmscan.c          | 3 +--
 mm/workingset.c      | 5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 64af9462ae8af..9d0e292875398 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -354,6 +354,7 @@ extern void swap_setup(void);
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
+unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fb81fb7985e2..7f9f66e0b40e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -390,8 +390,7 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
  * @lru: lru to use
  * @zone_idx: zones to consider (use MAX_NR_ZONES - 1 for the whole LRU list)
  */
-static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
-				     int zone_idx)
+unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
 {
 	unsigned long size = 0;
 	int zid;
diff --git a/mm/workingset.c b/mm/workingset.c
index 95d722a452e1c..07e6836d05020 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -691,9 +691,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
 	mem_cgroup_flush_stats_ratelimited(sc->memcg);
 	lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
+
 	for (pages = 0, i = 0; i < NR_LRU_LISTS; i++)
-		pages += lruvec_page_state_local(lruvec,
-						 NR_LRU_BASE + i);
+		pages += lruvec_lru_size(lruvec, i, MAX_NR_ZONES - 1);
+
 	pages +=
 lruvec_page_state_local(
 			lruvec, NR_SLAB_RECLAIMABLE_B) >> PAGE_SHIFT;
 	pages += lruvec_page_state_local(
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 29/33] mm: memcontrol: refactor mod_memcg_state() and
 mod_memcg_lruvec_state()
Date: Thu, 5 Mar 2026 19:52:47 +0800
Message-ID: <7f8bd3aacec2270b9453428fc8585cca9f10751e.1772711148.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

Refactor mod_memcg_state() and mod_memcg_lruvec_state() to facilitate
the subsequent reparenting of non-hierarchical stats.
Co-developed-by: Yosry Ahmed
Signed-off-by: Yosry Ahmed
Signed-off-by: Qi Zheng
---
 mm/memcontrol.c | 50 ++++++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5929e397c3c31..23b70bd80ddc9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -718,21 +718,12 @@ static int memcg_state_val_in_pages(int idx, int val)
 	return max(val * unit / PAGE_SIZE, 1UL);
 }
 
-/**
- * mod_memcg_state - update cgroup memory statistics
- * @memcg: the memory cgroup
- * @idx: the stat item - can be enum memcg_stat_item or enum node_stat_item
- * @val: delta to add to the counter, can be negative
- */
-void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
-		     int val)
+static void __mod_memcg_state(struct mem_cgroup *memcg,
+			      enum memcg_stat_item idx, int val)
 {
 	int i = memcg_stats_index(idx);
 	int cpu;
 
-	if (mem_cgroup_disabled())
-		return;
-
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
@@ -746,6 +737,21 @@ void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	put_cpu();
 }
 
+/**
+ * mod_memcg_state - update cgroup memory statistics
+ * @memcg: the memory cgroup
+ * @idx: the stat item - can be enum memcg_stat_item or enum node_stat_item
+ * @val: delta to add to the counter, can be negative
+ */
+void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
+		     int val)
+{
+	if (mem_cgroup_disabled())
+		return;
+
+	__mod_memcg_state(memcg, idx, val);
+}
+
 #ifdef CONFIG_MEMCG_V1
 /* idx can be of type enum memcg_stat_item or node_stat_item.
 */
 unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
@@ -765,21 +771,16 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 }
 #endif
 
-static void mod_memcg_lruvec_state(struct lruvec *lruvec,
-				   enum node_stat_item idx,
-				   int val)
+static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
+				     enum node_stat_item idx, int val)
 {
-	struct mem_cgroup_per_node *pn;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg = pn->memcg;
 	int i = memcg_stats_index(idx);
 	int cpu;
 
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
-	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	memcg = pn->memcg;
-
 	cpu = get_cpu();
 
 	/* Update memcg */
@@ -795,6 +796,17 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 	put_cpu();
 }
 
+static void mod_memcg_lruvec_state(struct lruvec *lruvec,
+				   enum node_stat_item idx,
+				   int val)
+{
+	struct mem_cgroup_per_node *pn;
+
+	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+
+	__mod_memcg_lruvec_state(pn, idx, val);
+}
+
 /**
  * mod_lruvec_state - update lruvec memory statistics
  * @lruvec: the lruvec
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
Subject: [PATCH v6 30/33] mm: memcontrol: prepare for reparenting
 non-hierarchical stats
Date: Thu, 5 Mar 2026 19:52:48 +0800

From: Qi Zheng

To resolve the dying memcg issue, we need to reparent LRU folios of a
child memcg to its parent memcg. This could cause problems for
non-hierarchical stats. As Yosry Ahmed pointed out:

```
In short, if memory is charged to a dying cgroup at the time of
reparenting, when the memory gets uncharged the stats updates will
occur at the parent. This will update both hierarchical and
non-hierarchical stats of the parent, which would corrupt the parent's
non-hierarchical stats (because those counters were never incremented
when the memory was charged).
```

Now we have the following two types of non-hierarchical stats, and they
are only used in CONFIG_MEMCG_V1:

a. memcg->vmstats->state_local[i]
b. pn->lruvec_stats->state_local[i]

To ensure that these non-hierarchical stats work properly, we need to
reparent them after reparenting LRU folios. To this end, this commit
makes the following preparations:

1. implement reparent_state_local() to reparent non-hierarchical stats
2.
make css_killed_work_fn() be called from an RCU work item, and implement get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid races between mod_memcg_state()/mod_memcg_lruvec_state() and reparent_state_local() Co-developed-by: Yosry Ahmed Signed-off-by: Yosry Ahmed Signed-off-by: Qi Zheng Acked-by: Shakeel Butt --- kernel/cgroup/cgroup.c | 9 ++-- mm/memcontrol-v1.c | 16 +++++++ mm/memcontrol-v1.h | 7 +++ mm/memcontrol.c | 97 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 125 insertions(+), 4 deletions(-) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index be1d71dda3179..a9007f8aa029e 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -6044,8 +6044,9 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const= char *name, umode_t mode) */ static void css_killed_work_fn(struct work_struct *work) { - struct cgroup_subsys_state *css =3D - container_of(work, struct cgroup_subsys_state, destroy_work); + struct cgroup_subsys_state *css; + + css =3D container_of(to_rcu_work(work), struct cgroup_subsys_state, destr= oy_rwork); =20 cgroup_lock(); =20 @@ -6066,8 +6067,8 @@ static void css_killed_ref_fn(struct percpu_ref *ref) container_of(ref, struct cgroup_subsys_state, refcnt); =20 if (atomic_dec_and_test(&css->online_cnt)) { - INIT_WORK(&css->destroy_work, css_killed_work_fn); - queue_work(cgroup_offline_wq, &css->destroy_work); + INIT_RCU_WORK(&css->destroy_rwork, css_killed_work_fn); + queue_rcu_work(cgroup_offline_wq, &css->destroy_rwork); } } =20 diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index fe42ef664f1e1..51fb4406f45cf 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -1897,6 +1897,22 @@ static const unsigned int memcg1_events[] =3D { PGMAJFAULT, }; =20 +void reparent_memcg1_state_local(struct mem_cgroup *memcg, struct mem_cgro= up *parent) +{ + int i; + + for (i =3D 0; i < ARRAY_SIZE(memcg1_stats); i++) + reparent_memcg_state_local(memcg, parent, memcg1_stats[i]); +} + +void
reparent_memcg1_lruvec_state_local(struct mem_cgroup *memcg, struct m= em_cgroup *parent) +{ + int i; + + for (i =3D 0; i < NR_LRU_LISTS; i++) + reparent_memcg_lruvec_state_local(memcg, parent, i); +} + void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) { unsigned long memory, memsw; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 4041b5027a94b..05e6ff40f7556 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -77,6 +77,13 @@ void memcg1_uncharge_batch(struct mem_cgroup *memcg, uns= igned long pgpgout, unsigned long nr_memory, int nid); =20 void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); +void reparent_memcg1_state_local(struct mem_cgroup *memcg, struct mem_cgro= up *parent); +void reparent_memcg1_lruvec_state_local(struct mem_cgroup *memcg, struct m= em_cgroup *parent); + +void reparent_memcg_state_local(struct mem_cgroup *memcg, + struct mem_cgroup *parent, int idx); +void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg, + struct mem_cgroup *parent, int idx); =20 void memcg1_account_kmem(struct mem_cgroup *memcg, int nr_pages); static inline bool memcg1_tcpmem_active(struct mem_cgroup *memcg) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 23b70bd80ddc9..b0519a16f5684 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -226,6 +226,34 @@ static inline struct obj_cgroup *__memcg_reparent_objc= gs(struct mem_cgroup *memc return objcg; } =20 +#ifdef CONFIG_MEMCG_V1 +static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force); + +static inline void reparent_state_local(struct mem_cgroup *memcg, struct m= em_cgroup *parent) +{ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return; + + /* + * Reparent stats exposed non-hierarchically. Flush @memcg's stats first + * to read its stats accurately, and conservatively flush @parent's + * stats after reparenting to avoid hiding a potentially large stat + * update (e.g. from callers of mem_cgroup_flush_stats_ratelimited()).
+ */ + __mem_cgroup_flush_stats(memcg, true); + + /* The following counts are all non-hierarchical and need to be reparente= d. */ + reparent_memcg1_state_local(memcg, parent); + reparent_memcg1_lruvec_state_local(memcg, parent); + + __mem_cgroup_flush_stats(parent, true); +} +#else +static inline void reparent_state_local(struct mem_cgroup *memcg, struct m= em_cgroup *parent) +{ +} +#endif + static inline void reparent_locks(struct mem_cgroup *memcg, struct mem_cgr= oup *parent) { spin_lock_irq(&objcg_lock); @@ -473,6 +501,30 @@ unsigned long lruvec_page_state_local(struct lruvec *l= ruvec, return x; } =20 +#ifdef CONFIG_MEMCG_V1 +static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn, + enum node_stat_item idx, int val); + +void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg, + struct mem_cgroup *parent, int idx) +{ + int nid; + + for_each_node(nid) { + struct lruvec *child_lruvec =3D mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + struct lruvec *parent_lruvec =3D mem_cgroup_lruvec(parent, NODE_DATA(nid= )); + unsigned long value =3D lruvec_page_state_local(child_lruvec, idx); + struct mem_cgroup_per_node *child_pn, *parent_pn; + + child_pn =3D container_of(child_lruvec, struct mem_cgroup_per_node, lruv= ec); + parent_pn =3D container_of(parent_lruvec, struct mem_cgroup_per_node, lr= uvec); + + __mod_memcg_lruvec_state(child_pn, idx, -value); + __mod_memcg_lruvec_state(parent_pn, idx, value); + } +} +#endif + /* Subset of vm_event_item to report for memcg event stats */ static const unsigned int memcg_vm_event_stat[] =3D { #ifdef CONFIG_MEMCG_V1 @@ -718,6 +770,42 @@ static int memcg_state_val_in_pages(int idx, int val) return max(val * unit / PAGE_SIZE, 1UL); } =20 +#ifdef CONFIG_MEMCG_V1 +/* + * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race wi= th + * reparenting of non-hierarchical state_locals. 
+ */ +static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgro= up *memcg) +{ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return memcg; + + rcu_read_lock(); + + while (memcg_is_dying(memcg)) + memcg =3D parent_mem_cgroup(memcg); + + return memcg; +} + +static inline void get_non_dying_memcg_end(void) +{ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return; + + rcu_read_unlock(); +} +#else +static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgro= up *memcg) +{ + return memcg; +} + +static inline void get_non_dying_memcg_end(void) +{ +} +#endif + static void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val) { @@ -769,6 +857,15 @@ unsigned long memcg_page_state_local(struct mem_cgroup= *memcg, int idx) #endif return x; } + +void reparent_memcg_state_local(struct mem_cgroup *memcg, + struct mem_cgroup *parent, int idx) +{ + unsigned long value =3D memcg_page_state_local(memcg, idx); + + __mod_memcg_state(memcg, idx, -value); + __mod_memcg_state(parent, idx, value); +} #endif =20 static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn, --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com,
usamaarif642@gmail.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v6 31/33] mm: memcontrol: convert objcg to be per-memcg per-node type Date: Thu, 5 Mar 2026 19:52:49 +0800 Message-ID: <56c04b1c5d54f75ccdc12896df6c1ca35403ecc3.1772711148.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng Convert objcg to a per-memcg per-node type, so that when we reparent LRU folios later, we can hold the lru lock at the node level, thus avoiding holding too many lru locks at once. Signed-off-by: Qi Zheng Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 23 +++++------ include/linux/sched.h | 2 +- mm/memcontrol.c | 79 +++++++++++++++++++++++--------------- 3 files changed, 62 insertions(+), 42 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index d2748e672fd88..57d86decf2830 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -116,6 +116,16 @@ struct mem_cgroup_per_node { unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; struct mem_cgroup_reclaim_iter iter; =20 + /* + * objcg is wiped out as a part of the objcg reparenting process. + * orig_objcg preserves a pointer (and a reference) to the original + * objcg until the end of life of memcg.
+ */ + struct obj_cgroup __rcu *objcg; + struct obj_cgroup *orig_objcg; + /* list of inherited objcgs, protected by objcg_lock */ + struct list_head objcg_list; + #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC /* slab stats for nmi context */ atomic_t slab_reclaimable; @@ -180,6 +190,7 @@ struct obj_cgroup { struct list_head list; /* protected by objcg_lock */ struct rcu_head rcu; }; + bool is_root; }; =20 /* @@ -258,15 +269,6 @@ struct mem_cgroup { seqlock_t socket_pressure_seqlock; #endif int kmemcg_id; - /* - * memcg->objcg is wiped out as a part of the objcg repaprenting - * process. memcg->orig_objcg preserves a pointer (and a reference) - * to the original objcg until the end of live of memcg. - */ - struct obj_cgroup __rcu *objcg; - struct obj_cgroup *orig_objcg; - /* list of inherited objcgs, protected by objcg_lock */ - struct list_head objcg_list; =20 struct memcg_vmstats_percpu __percpu *vmstats_percpu; =20 @@ -333,7 +335,6 @@ struct mem_cgroup { #define MEMCG_CHARGE_BATCH 64U =20 extern struct mem_cgroup *root_mem_cgroup; -extern struct obj_cgroup *root_obj_cgroup; =20 enum page_memcg_data_flags { /* page->memcg_data is a pointer to an slabobj_ext vector */ @@ -552,7 +553,7 @@ static inline bool mem_cgroup_is_root(struct mem_cgroup= *memcg) =20 static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg) { - return objcg =3D=3D root_obj_cgroup; + return objcg->is_root; } =20 static inline bool mem_cgroup_disabled(void) diff --git a/include/linux/sched.h b/include/linux/sched.h index a7b4a980eb2f0..7b63b7b74f414 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1533,7 +1533,7 @@ struct task_struct { /* Used by memcontrol for targeted memcg charge: */ struct mem_cgroup *active_memcg; =20 - /* Cache for current->cgroups->memcg->objcg lookups: */ + /* Cache for current->cgroups->memcg->nodeinfo[nid]->objcg lookups: */ struct obj_cgroup *objcg; #endif =20 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 
b0519a16f5684..e31c58bc89188 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -84,8 +84,6 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; EXPORT_SYMBOL(root_mem_cgroup); =20 -struct obj_cgroup *root_obj_cgroup __read_mostly; - /* Active memory cgroup to use from an interrupt context */ DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg); @@ -210,18 +208,21 @@ static struct obj_cgroup *obj_cgroup_alloc(void) } =20 static inline struct obj_cgroup *__memcg_reparent_objcgs(struct mem_cgroup= *memcg, - struct mem_cgroup *parent) + struct mem_cgroup *parent, + int nid) { struct obj_cgroup *objcg, *iter; + struct mem_cgroup_per_node *pn =3D memcg->nodeinfo[nid]; + struct mem_cgroup_per_node *parent_pn =3D parent->nodeinfo[nid]; =20 - objcg =3D rcu_replace_pointer(memcg->objcg, NULL, true); + objcg =3D rcu_replace_pointer(pn->objcg, NULL, true); /* 1) Ready to reparent active objcg. */ - list_add(&objcg->list, &memcg->objcg_list); + list_add(&objcg->list, &pn->objcg_list); /* 2) Reparent active objcg and already reparented objcgs to parent. 
*/ - list_for_each_entry(iter, &memcg->objcg_list, list) + list_for_each_entry(iter, &pn->objcg_list, list) WRITE_ONCE(iter->memcg, parent); /* 3) Move already reparented objcgs to the parent's list */ - list_splice(&memcg->objcg_list, &parent->objcg_list); + list_splice(&pn->objcg_list, &parent_pn->objcg_list); =20 return objcg; } @@ -268,14 +269,17 @@ static void memcg_reparent_objcgs(struct mem_cgroup *= memcg) { struct obj_cgroup *objcg; struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); + int nid; =20 - reparent_locks(memcg, parent); + for_each_node(nid) { + reparent_locks(memcg, parent); =20 - objcg =3D __memcg_reparent_objcgs(memcg, parent); + objcg =3D __memcg_reparent_objcgs(memcg, parent, nid); =20 - reparent_unlocks(memcg, parent); + reparent_unlocks(memcg, parent); =20 - percpu_ref_kill(&objcg->refcnt); + percpu_ref_kill(&objcg->refcnt); + } } =20 /* @@ -2877,8 +2881,10 @@ struct mem_cgroup *mem_cgroup_from_virt(void *p) =20 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *m= emcg) { + int nid =3D numa_node_id(); + for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { - struct obj_cgroup *objcg =3D rcu_dereference(memcg->objcg); + struct obj_cgroup *objcg =3D rcu_dereference(memcg->nodeinfo[nid]->objcg= ); =20 if (likely(objcg && obj_cgroup_tryget(objcg))) return objcg; @@ -2942,6 +2948,7 @@ __always_inline struct obj_cgroup *current_obj_cgroup= (void) { struct mem_cgroup *memcg; struct obj_cgroup *objcg; + int nid =3D numa_node_id(); =20 if (IS_ENABLED(CONFIG_MEMCG_NMI_UNSAFE) && in_nmi()) return NULL; @@ -2958,14 +2965,14 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) * Objcg reference is kept by the task, so it's safe * to use the objcg by the current task. */ - return objcg ? : root_obj_cgroup; + return objcg ? 
: rcu_dereference_check(root_mem_cgroup->nodeinfo[nid]->o= bjcg, 1); } =20 memcg =3D this_cpu_read(int_active_memcg); if (unlikely(memcg)) goto from_memcg; =20 - return root_obj_cgroup; + return rcu_dereference_check(root_mem_cgroup->nodeinfo[nid]->objcg, 1); =20 from_memcg: for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { @@ -2975,12 +2982,12 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) * away and can be used within the scope without any additional * protection. */ - objcg =3D rcu_dereference_check(memcg->objcg, 1); + objcg =3D rcu_dereference_check(memcg->nodeinfo[nid]->objcg, 1); if (likely(objcg)) return objcg; } =20 - return root_obj_cgroup; + return rcu_dereference_check(root_mem_cgroup->nodeinfo[nid]->objcg, 1); } =20 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio) @@ -3877,6 +3884,8 @@ static bool alloc_mem_cgroup_per_node_info(struct mem= _cgroup *memcg, int node) if (!pn->lruvec_stats_percpu) goto fail; =20 + INIT_LIST_HEAD(&pn->objcg_list); + lruvec_init(&pn->lruvec); pn->memcg =3D memcg; =20 @@ -3891,10 +3900,12 @@ static void __mem_cgroup_free(struct mem_cgroup *me= mcg) { int node; =20 - obj_cgroup_put(memcg->orig_objcg); + for_each_node(node) { + struct mem_cgroup_per_node *pn =3D memcg->nodeinfo[node]; =20 - for_each_node(node) - free_mem_cgroup_per_node_info(memcg->nodeinfo[node]); + obj_cgroup_put(pn->orig_objcg); + free_mem_cgroup_per_node_info(pn); + } memcg1_free_events(memcg); kfree(memcg->vmstats); free_percpu(memcg->vmstats_percpu); @@ -3965,7 +3976,6 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem= _cgroup *parent) #endif memcg1_memcg_init(memcg); memcg->kmemcg_id =3D -1; - INIT_LIST_HEAD(&memcg->objcg_list); #ifdef CONFIG_CGROUP_WRITEBACK INIT_LIST_HEAD(&memcg->cgwb_list); for (i =3D 0; i < MEMCG_CGWB_FRN_CNT; i++) @@ -4042,6 +4052,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) { struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); struct obj_cgroup *objcg; + int 
nid; =20 memcg_online_kmem(memcg); =20 @@ -4053,17 +4064,19 @@ static int mem_cgroup_css_online(struct cgroup_subs= ys_state *css) if (alloc_shrinker_info(memcg)) goto offline_kmem; =20 - objcg =3D obj_cgroup_alloc(); - if (!objcg) - goto free_shrinker; + for_each_node(nid) { + objcg =3D obj_cgroup_alloc(); + if (!objcg) + goto free_objcg; =20 - if (unlikely(mem_cgroup_is_root(memcg))) - root_obj_cgroup =3D objcg; + if (unlikely(mem_cgroup_is_root(memcg))) + objcg->is_root =3D true; =20 - objcg->memcg =3D memcg; - rcu_assign_pointer(memcg->objcg, objcg); - obj_cgroup_get(objcg); - memcg->orig_objcg =3D objcg; + objcg->memcg =3D memcg; + rcu_assign_pointer(memcg->nodeinfo[nid]->objcg, objcg); + obj_cgroup_get(objcg); + memcg->nodeinfo[nid]->orig_objcg =3D objcg; + } =20 if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled()) queue_delayed_work(system_dfl_wq, &stats_flush_dwork, @@ -4087,7 +4100,13 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL); =20 return 0; -free_shrinker: +free_objcg: + for_each_node(nid) { + struct mem_cgroup_per_node *pn =3D memcg->nodeinfo[nid]; + + if (pn && pn->orig_objcg) + obj_cgroup_put(pn->orig_objcg); + } free_shrinker_info(memcg); offline_kmem: memcg_offline_kmem(memcg); --=20 2.20.1 From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v6 32/33] mm: memcontrol: eliminate the problem of dying memory cgroup for LRU folios Date: Thu, 5 Mar 2026 19:52:50 +0800 Message-ID: <80cb7af198dc6f2173fe616d1207a4c315ece141.1772711148.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Now that everything is set up, switch folio->memcg_data pointers to objcgs, update the accessors, and execute reparenting on cgroup death. Finally, folio->memcg_data of LRU folios and kmem folios will always point to an object cgroup pointer.
The folio->memcg_data of slab folios will point to a vector of object cgroups. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 77 +++++---------- mm/memcontrol-v1.c | 15 +-- mm/memcontrol.c | 194 ++++++++++++++++++++++--------------- 3 files changed, 151 insertions(+), 135 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 57d86decf2830..1b0dbc70c6b08 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -370,9 +370,6 @@ enum objext_flags { #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1) =20 #ifdef CONFIG_MEMCG - -static inline bool folio_memcg_kmem(struct folio *folio); - /* * After the initialization objcg->memcg is always pointing at * a valid memcg, but can be atomically swapped to the parent memcg. @@ -386,43 +383,19 @@ static inline struct mem_cgroup *obj_cgroup_memcg(str= uct obj_cgroup *objcg) } =20 /* - * __folio_memcg - Get the memory cgroup associated with a non-kmem folio - * @folio: Pointer to the folio. - * - * Returns a pointer to the memory cgroup associated with the folio, - * or NULL. This function assumes that the folio is known to have a - * proper memory cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * kmem folios. - */ -static inline struct mem_cgroup *__folio_memcg(struct folio *folio) -{ - unsigned long memcg_data =3D folio->memcg_data; - - VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio); - - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); -} - -/* - * __folio_objcg - get the object cgroup associated with a kmem folio. + * folio_objcg - get the object cgroup associated with a folio. * @folio: Pointer to the folio. * * Returns a pointer to the object cgroup associated with the folio, * or NULL.
This function assumes that the folio is known to have a - * proper object cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * LRU folios. + * proper object cgroup pointer. */ -static inline struct obj_cgroup *__folio_objcg(struct folio *folio) +static inline struct obj_cgroup *folio_objcg(struct folio *folio) { unsigned long memcg_data =3D folio->memcg_data; =20 VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio); =20 return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } @@ -436,21 +409,30 @@ static inline struct obj_cgroup *__folio_objcg(struct= folio *folio) * proper memory cgroup pointer. It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios. * - * For a non-kmem folio any of the following ensures folio and memcg bindi= ng - * stability: + * For a folio any of the following ensures folio and objcg binding stabil= ity: * * - the folio lock * - LRU isolation * - exclusive reference * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * Based on the stable binding of folio and objcg, for a folio any of the + * following ensures folio and memcg binding stability: + * + * - cgroup_mutex + * - the lruvec lock + * + * If the caller only wants to ensure that the page counters of memcg are + * updated correctly, the binding stability of folio and objcg is + * sufficient. + * + * Note: The caller should hold an rcu read lock or cgroup_mutex to protect + * memcg associated with a folio from being released.
*/ static inline struct mem_cgroup *folio_memcg(struct folio *folio) { - if (folio_memcg_kmem(folio)) - return obj_cgroup_memcg(__folio_objcg(folio)); - return __folio_memcg(folio); + struct obj_cgroup *objcg =3D folio_objcg(folio); + + return objcg ? obj_cgroup_memcg(objcg) : NULL; } =20 /* @@ -474,15 +456,10 @@ static inline bool folio_memcg_charged(struct folio *= folio) * has an associated memory cgroup pointer or an object cgroups vector or * an object cgroup. * - * For a non-kmem folio any of the following ensures folio and memcg bindi= ng - * stability: + * For the page and objcg or memcg binding rules, refer to folio_memcg(). * - * - the folio lock - * - LRU isolation - * - exclusive reference - * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * A caller should hold an rcu read lock to protect the memcg associated + * with a page from being released. */ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) { @@ -491,18 +468,14 @@ static inline struct mem_cgroup *folio_memcg_check(st= ruct folio *folio) * for slabs, READ_ONCE() should be used here. */ unsigned long memcg_data =3D READ_ONCE(folio->memcg_data); + struct obj_cgroup *objcg; =20 if (memcg_data & MEMCG_DATA_OBJEXTS) return NULL; =20 - if (memcg_data & MEMCG_DATA_KMEM) { - struct obj_cgroup *objcg; - - objcg =3D (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); - return obj_cgroup_memcg(objcg); - } + objcg =3D (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); =20 - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); + return objcg ?
 		obj_cgroup_memcg(objcg) : NULL;
 }
 
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 51fb4406f45cf..427cc45c3c369 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -613,6 +613,7 @@ void memcg1_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 void memcg1_swapout(struct folio *folio, swp_entry_t entry)
 {
 	struct mem_cgroup *memcg, *swap_memcg;
+	struct obj_cgroup *objcg;
 	unsigned int nr_entries;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
@@ -624,12 +625,13 @@ void memcg1_swapout(struct folio *folio, swp_entry_t entry)
 	if (!do_memsw_account())
 		return;
 
-	memcg = folio_memcg(folio);
-
-	VM_WARN_ON_ONCE_FOLIO(!memcg, folio);
-	if (!memcg)
+	objcg = folio_objcg(folio);
+	VM_WARN_ON_ONCE_FOLIO(!objcg, folio);
+	if (!objcg)
 		return;
 
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
 	/*
 	 * In case the memcg owning these pages has been offlined and doesn't
 	 * have an ID allocated to it anymore, charge the closest online
@@ -644,7 +646,7 @@ void memcg1_swapout(struct folio *folio, swp_entry_t entry)
 	folio_unqueue_deferred_split(folio);
 	folio->memcg_data = 0;
 
-	if (!mem_cgroup_is_root(memcg))
+	if (!obj_cgroup_is_root(objcg))
 		page_counter_uncharge(&memcg->memory, nr_entries);
 
 	if (memcg != swap_memcg) {
@@ -665,7 +667,8 @@ void memcg1_swapout(struct folio *folio, swp_entry_t entry)
 	preempt_enable_nested();
 	memcg1_check_events(memcg, folio_nid(folio));
 
-	css_put(&memcg->css);
+	rcu_read_unlock();
+	obj_cgroup_put(objcg);
 }
 
 /*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e31c58bc89188..992a3f5caa62b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -255,13 +255,17 @@ static inline void reparent_state_local(struct mem_cgroup *memcg, struct mem_cgr
 }
 #endif
 
-static inline void reparent_locks(struct mem_cgroup *memcg, struct mem_cgroup *parent)
+static inline void reparent_locks(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid)
 {
 	spin_lock_irq(&objcg_lock);
+	spin_lock_nested(&mem_cgroup_lruvec(memcg, NODE_DATA(nid))->lru_lock, 1);
+	spin_lock_nested(&mem_cgroup_lruvec(parent, NODE_DATA(nid))->lru_lock, 2);
 }
 
-static inline void reparent_unlocks(struct mem_cgroup *memcg, struct mem_cgroup *parent)
+static inline void reparent_unlocks(struct mem_cgroup *memcg, struct mem_cgroup *parent, int nid)
 {
+	spin_unlock(&mem_cgroup_lruvec(parent, NODE_DATA(nid))->lru_lock);
+	spin_unlock(&mem_cgroup_lruvec(memcg, NODE_DATA(nid))->lru_lock);
 	spin_unlock_irq(&objcg_lock);
 }
 
@@ -272,14 +276,31 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
 	int nid;
 
 	for_each_node(nid) {
-		reparent_locks(memcg, parent);
+retry:
+		if (lru_gen_enabled())
+			max_lru_gen_memcg(parent, nid);
+
+		reparent_locks(memcg, parent, nid);
+
+		if (lru_gen_enabled()) {
+			if (!recheck_lru_gen_max_memcg(parent, nid)) {
+				reparent_unlocks(memcg, parent, nid);
+				cond_resched();
+				goto retry;
+			}
+			lru_gen_reparent_memcg(memcg, parent, nid);
+		} else {
+			lru_reparent_memcg(memcg, parent, nid);
+		}
 
 		objcg = __memcg_reparent_objcgs(memcg, parent, nid);
 
-		reparent_unlocks(memcg, parent);
+		reparent_unlocks(memcg, parent, nid);
 
 		percpu_ref_kill(&objcg->refcnt);
 	}
+
+	reparent_state_local(memcg, parent);
 }
 
 /*
@@ -824,6 +845,7 @@ static void __mod_memcg_state(struct mem_cgroup *memcg,
 	this_cpu_add(memcg->vmstats_percpu->state[i], val);
 	val = memcg_state_val_in_pages(idx, val);
 	memcg_rstat_updated(memcg, val, cpu);
+	trace_mod_memcg_state(memcg, idx, val);
 
 	put_cpu();
@@ -841,7 +863,9 @@ void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	memcg = get_non_dying_memcg_start(memcg);
 	__mod_memcg_state(memcg, idx, val);
+	get_non_dying_memcg_end();
 }
 
 #ifdef CONFIG_MEMCG_V1
@@ -901,11 +925,17 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 				   enum node_stat_item idx,
 				   int val)
 {
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup_per_node *pn;
+	struct mem_cgroup *memcg;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	memcg = get_non_dying_memcg_start(pn->memcg);
+	pn = memcg->nodeinfo[pgdat->node_id];
 
 	__mod_memcg_lruvec_state(pn, idx, val);
+
+	get_non_dying_memcg_end();
 }
 
 /**
@@ -1128,6 +1158,8 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
 /**
  * get_mem_cgroup_from_folio - Obtain a reference on a given folio's memcg.
  * @folio: folio from which memcg should be extracted.
+ *
+ * See folio_memcg() for folio->objcg/memcg binding rules.
 */
 struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
 {
@@ -2769,17 +2801,17 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return try_charge_memcg(memcg, gfp_mask, nr_pages);
 }
 
-static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
+static void commit_charge(struct folio *folio, struct obj_cgroup *objcg)
 {
 	VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
 	/*
-	 * Any of the following ensures page's memcg stability:
+	 * Any of the following ensures folio's objcg stability:
 	 *
 	 * - the page lock
 	 * - LRU isolation
 	 * - exclusive reference
 	 */
-	folio->memcg_data = (unsigned long)memcg;
+	folio->memcg_data = (unsigned long)objcg;
 }
 
 #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC
@@ -2893,6 +2925,17 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 	return NULL;
 }
 
+static inline struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
+{
+	struct obj_cgroup *objcg;
+
+	rcu_read_lock();
+	objcg = __get_obj_cgroup_from_memcg(memcg);
+	rcu_read_unlock();
+
+	return objcg;
+}
+
 static struct obj_cgroup *current_objcg_update(void)
 {
 	struct mem_cgroup *memcg;
@@ -2994,17 +3037,10 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
 	struct obj_cgroup *objcg;
 
-	if (!memcg_kmem_online())
-		return NULL;
-
-	if (folio_memcg_kmem(folio)) {
-		objcg = __folio_objcg(folio);
+	objcg = folio_objcg(folio);
+	if (objcg)
 		obj_cgroup_get(objcg);
-	} else {
-		rcu_read_lock();
-		objcg = __get_obj_cgroup_from_memcg(__folio_memcg(folio));
-		rcu_read_unlock();
-	}
+
 	return objcg;
 }
 
@@ -3520,7 +3556,7 @@ void folio_split_memcg_refs(struct folio *folio, unsigned old_order,
 		return;
 
 	new_refs = (1 << (old_order - new_order)) - 1;
-	css_get_many(&__folio_memcg(folio)->css, new_refs);
+	obj_cgroup_get_many(folio_objcg(folio), new_refs);
 }
 
 static void memcg_online_kmem(struct mem_cgroup *memcg)
@@ -4949,16 +4985,20 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 			gfp_t gfp)
 {
-	int ret;
-
-	ret = try_charge(memcg, gfp, folio_nr_pages(folio));
-	if (ret)
-		goto out;
+	int ret = 0;
+	struct obj_cgroup *objcg;
 
-	css_get(&memcg->css);
-	commit_charge(folio, memcg);
+	objcg = get_obj_cgroup_from_memcg(memcg);
+	/* Do not account at the root objcg level. */
+	if (!obj_cgroup_is_root(objcg))
+		ret = try_charge_memcg(memcg, gfp, folio_nr_pages(folio));
+	if (ret) {
+		obj_cgroup_put(objcg);
+		return ret;
+	}
+	commit_charge(folio, objcg);
 	memcg1_commit_charge(folio, memcg);
-out:
+
 	return ret;
 }
 
@@ -5044,7 +5084,7 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 }
 
 struct uncharge_gather {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
@@ -5058,58 +5098,52 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
 
 static void uncharge_batch(const struct uncharge_gather *ug)
 {
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(ug->objcg);
 	if (ug->nr_memory) {
-		memcg_uncharge(ug->memcg, ug->nr_memory);
+		memcg_uncharge(memcg, ug->nr_memory);
 		if (ug->nr_kmem) {
-			mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem);
-			memcg1_account_kmem(ug->memcg, -ug->nr_kmem);
+			mod_memcg_state(memcg, MEMCG_KMEM, -ug->nr_kmem);
+			memcg1_account_kmem(memcg, -ug->nr_kmem);
 		}
-		memcg1_oom_recover(ug->memcg);
+		memcg1_oom_recover(memcg);
 	}
 
-	memcg1_uncharge_batch(ug->memcg, ug->pgpgout, ug->nr_memory, ug->nid);
+	memcg1_uncharge_batch(memcg, ug->pgpgout, ug->nr_memory, ug->nid);
+	rcu_read_unlock();
 
 	/* drop reference from uncharge_folio */
-	css_put(&ug->memcg->css);
+	obj_cgroup_put(ug->objcg);
 }
 
 static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
{
 	long nr_pages;
-	struct mem_cgroup *memcg;
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * folio memcg or objcg at this point, we have fully
-	 * exclusive access to the folio.
+	 * folio objcg at this point, we have fully exclusive
+	 * access to the folio.
 	 */
-	if (folio_memcg_kmem(folio)) {
-		objcg = __folio_objcg(folio);
-		/*
-		 * This get matches the put at the end of the function and
-		 * kmem pages do not hold memcg references anymore.
-		 */
-		memcg = get_mem_cgroup_from_objcg(objcg);
-	} else {
-		memcg = __folio_memcg(folio);
-	}
-
-	if (!memcg)
+	objcg = folio_objcg(folio);
+	if (!objcg)
 		return;
 
-	if (ug->memcg != memcg) {
-		if (ug->memcg) {
+	if (ug->objcg != objcg) {
+		if (ug->objcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = memcg;
+		ug->objcg = objcg;
 		ug->nid = folio_nid(folio);
 
-		/* pairs with css_put in uncharge_batch */
-		css_get(&memcg->css);
+		/* pairs with obj_cgroup_put in uncharge_batch */
+		obj_cgroup_get(objcg);
 	}
 
 	nr_pages = folio_nr_pages(folio);
@@ -5117,20 +5151,17 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	if (folio_memcg_kmem(folio)) {
 		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
-
-		folio->memcg_data = 0;
-		obj_cgroup_put(objcg);
 	} else {
 		/* LRU pages aren't accounted at the root level */
-		if (!mem_cgroup_is_root(memcg))
+		if (!obj_cgroup_is_root(objcg))
 			ug->nr_memory += nr_pages;
 		ug->pgpgout++;
 
 		WARN_ON_ONCE(folio_unqueue_deferred_split(folio));
-		folio->memcg_data = 0;
 	}
 
-	css_put(&memcg->css);
+	folio->memcg_data = 0;
+	obj_cgroup_put(objcg);
 }
 
 void __mem_cgroup_uncharge(struct folio *folio)
@@ -5154,7 +5185,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 	uncharge_gather_clear(&ug);
 	for (i = 0; i < folios->nr; i++)
 		uncharge_folio(folios->folios[i], &ug);
-	if (ug.memcg)
+	if (ug.objcg)
 		uncharge_batch(&ug);
 }
 
@@ -5171,6 +5202,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 {
 	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	long nr_pages = folio_nr_pages(new);
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(old), old);
@@ -5185,21 +5217,24 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 	if (folio_memcg_charged(new))
 		return;
 
-	memcg = folio_memcg(old);
-	VM_WARN_ON_ONCE_FOLIO(!memcg, old);
-	if (!memcg)
+	objcg = folio_objcg(old);
+	VM_WARN_ON_ONCE_FOLIO(!objcg, old);
+	if (!objcg)
 		return;
 
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
 	/* Force-charge the new page. The old one will be freed soon */
-	if (!mem_cgroup_is_root(memcg)) {
+	if (!obj_cgroup_is_root(objcg)) {
 		page_counter_charge(&memcg->memory, nr_pages);
 		if (do_memsw_account())
 			page_counter_charge(&memcg->memsw, nr_pages);
 	}
 
-	css_get(&memcg->css);
-	commit_charge(new, memcg);
+	obj_cgroup_get(objcg);
+	commit_charge(new, objcg);
 	memcg1_commit_charge(new, memcg);
+	rcu_read_unlock();
 }
 
 /**
@@ -5215,7 +5250,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 */
 void mem_cgroup_migrate(struct folio *old, struct folio *new)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(old), old);
 	VM_BUG_ON_FOLIO(!folio_test_locked(new), new);
@@ -5226,18 +5261,18 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	if (mem_cgroup_disabled())
 		return;
 
-	memcg = folio_memcg(old);
+	objcg = folio_objcg(old);
 	/*
-	 * Note that it is normal to see !memcg for a hugetlb folio.
+	 * Note that it is normal to see !objcg for a hugetlb folio.
 	 * For e.g, it could have been allocated when memory_hugetlb_accounting
 	 * was not selected.
 	 */
-	VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !memcg, old);
-	if (!memcg)
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !objcg, old);
+	if (!objcg)
 		return;
 
-	/* Transfer the charge and the css ref */
-	commit_charge(new, memcg);
+	/* Transfer the charge and the objcg ref */
+	commit_charge(new, objcg);
 
 	/* Warning should never happen, so don't worry about refcount non-0 */
 	WARN_ON_ONCE(folio_unqueue_deferred_split(old));
@@ -5420,22 +5455,27 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)
 	unsigned int nr_pages = folio_nr_pages(folio);
 	struct page_counter *counter;
 	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	if (do_memsw_account())
 		return 0;
 
-	memcg = folio_memcg(folio);
-
-	VM_WARN_ON_ONCE_FOLIO(!memcg, folio);
-	if (!memcg)
+	objcg = folio_objcg(folio);
+	VM_WARN_ON_ONCE_FOLIO(!objcg, folio);
+	if (!objcg)
 		return 0;
 
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
 	if (!entry.val) {
 		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
+		rcu_read_unlock();
 		return 0;
 	}
 
 	memcg = mem_cgroup_private_id_get_online(memcg, nr_pages);
+	/* memcg is pinned by memcg ID. */
+	rcu_read_unlock();
 
 	if (!mem_cgroup_is_root(memcg) &&
 	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
-- 
2.20.1

From nobody Wed Apr 1 22:34:21 2026
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, yosry.ahmed@linux.dev, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com, usamaarif642@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song, Qi Zheng
Subject: [PATCH v6 33/33] mm: lru: add VM_WARN_ON_ONCE_FOLIO to lru maintenance helpers
Date: Thu, 5 Mar 2026 19:52:51 +0800
Message-ID: <2c90fc006d9d730331a3caeef96f7e5dabe2036d.1772711148.git.zhengqi.arch@bytedance.com>
In-Reply-To:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Muchun Song

We must ensure the folio is deleted from or added to the correct lruvec
list. So, add VM_WARN_ON_ONCE_FOLIO() to catch invalid users. The
VM_BUG_ON_FOLIO() in move_folios_to_lru() can be removed, since
lruvec_add_folio() now performs the necessary check.
Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Signed-off-by: Qi Zheng
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 include/linux/mm_inline.h | 6 ++++++
 mm/vmscan.c               | 1 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index fa2d6ba811b53..ad50688d89dba 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -342,6 +342,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio)
 {
 	enum lru_list lru = folio_lru_list(folio);
 
+	VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio);
+
 	if (lru_gen_add_folio(lruvec, folio, false))
 		return;
 
@@ -356,6 +358,8 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio)
 {
 	enum lru_list lru = folio_lru_list(folio);
 
+	VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio);
+
 	if (lru_gen_add_folio(lruvec, folio, true))
 		return;
 
@@ -370,6 +374,8 @@ void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio)
 {
 	enum lru_list lru = folio_lru_list(folio);
 
+	VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio);
+
 	if (lru_gen_del_folio(lruvec, folio, false))
 		return;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7f9f66e0b40e1..73bfa93696a27 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1920,7 +1920,6 @@ static unsigned int move_folios_to_lru(struct list_head *list)
 			continue;
 		}
 
-		VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio);
 		lruvec_add_folio(lruvec, folio);
 		nr_pages = folio_nr_pages(folio);
 		nr_moved += nr_pages;
-- 
2.20.1