From nobody Fri Dec 19 16:00:40 2025 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAEC4335BBE for ; Wed, 17 Dec 2025 07:28:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956528; cv=none; b=ZwRs9V/UW7Knazcp8nux1SoFAMCwB6fMAFoYlIt4K0jOd73oTCv0kHv6N3g+cv2EWeMN7NOfzX9FVVSpUy8ECREQHnMOkqg4wZPGd4niBDibGUo667wJwMp2qA1pZ5tnvWg6P1hGt54ttHIjJro4X1hXXllpj7f5+shrzvXkmvo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956528; c=relaxed/simple; bh=Y0Tx92gw19P0/LOqlsVOLK3dXPTBECuL7Q52++0vsHc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YEoErGyG4OFaEnRu72XjQOQdSdRBa4HjLUhG2kCCEwp6OW7Hvrvc3m2kSaQyPqy/jxSDMP/g47WyPnybfW2Q/WUwqgr4DPvGqGlhcza61qcUp8UyOy2fXNnIzkrx2WvWLM3IjdtL2BP/sWtIF/cggfdOAwb0VwfAD/4yQs6QxBI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=qFETWJ3/; arc=none smtp.client-ip=91.218.175.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="qFETWJ3/" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956517; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PzAKpbiIZxa6n28+BpnjQ9tXWvyZHSCscw68X1Foo9A=; b=qFETWJ3/QbdSXlWUUX5sj/GLxv/uSjNALIuZ6SkUIGRjkqLuFad+gZwNeXXCVqqGKcwF8m Ph4pcxZJZGxxofIQxCgpMmaoeumjFkDPE/0ZabisZiaJrL0RllkrAspiKspyj++VDnh8vH ee7Bw8gYmx4i0tjC6dQoLoP21p+2XUk= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng , Chen Ridong Subject: [PATCH v2 01/28] mm: memcontrol: remove dead code of checking parent memory cgroup Date: Wed, 17 Dec 2025 15:27:25 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Since the no-hierarchy mode has been deprecated after the commit: commit bef8620cd8e0 ("mm: memcg: deprecate the non-hierarchical mode"). 
As a result, parent_mem_cgroup() will not return NULL except when passing the root memcg, and the root memcg cannot be offline. Hence, it's safe to remove the check on the returned value of parent_mem_cgroup(). Remove the corresponding dead code. Signed-off-by: Muchun Song Acked-by: Roman Gushchin Acked-by: Johannes Weiner Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Reviewed-by: Chen Ridong Acked-by: Shakeel Butt --- mm/memcontrol.c | 5 ----- mm/shrinker.c | 6 +----- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e2e49f4ec9e0e..ae234518d023c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3331,9 +3331,6 @@ static void memcg_offline_kmem(struct mem_cgroup *mem= cg) return; =20 parent =3D parent_mem_cgroup(memcg); - if (!parent) - parent =3D root_mem_cgroup; - memcg_reparent_list_lrus(memcg, parent); =20 /* @@ -3624,8 +3621,6 @@ struct mem_cgroup *mem_cgroup_id_get_online(struct me= m_cgroup *memcg) break; } memcg =3D parent_mem_cgroup(memcg); - if (!memcg) - memcg =3D root_mem_cgroup; } return memcg; } diff --git a/mm/shrinker.c b/mm/shrinker.c index 4a93fd433689a..e8e092a2f7f41 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -286,14 +286,10 @@ void reparent_shrinker_deferred(struct mem_cgroup *me= mcg) { int nid, index, offset; long nr; - struct mem_cgroup *parent; + struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); struct shrinker_info *child_info, *parent_info; struct shrinker_info_unit *child_unit, *parent_unit; =20 - parent =3D parent_mem_cgroup(memcg); - if (!parent) - parent =3D root_mem_cgroup; - /* Prevent from concurrent shrinker_info expand */ mutex_lock(&shrinker_mutex); for_each_node(nid) { --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1396335551 for ; Wed, 17 Dec 2025 07:29:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.186 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956546; cv=none; b=FrwpRTQKGSsf8SOLoOvFAZH3ePYY4CGIp9YlnbTW1MU5j2rYSV+vqbl/HWCj9HLe9flADDJCGiKA2Bpym8dQN/tIThRKu7DTQMlzdT0LgK2LnOguW07g06/nCrqSlg/pyfQ/P3AtsAa1yFIYnUr9pOIfaIdD0db7/T3jccxk9as= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956546; c=relaxed/simple; bh=91L5mf5b6B6koXKiVxxgao2s4CjXsExMntWK/4YdUZA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TdYUi6K8SgUUZQwx6XpxYdxZc9N3nngyJU/Aj/8SMN5uKt/HBl7JrEM7HnCSdp31uukE0qQ/fM0L3FMmZbQ+kzclLm5Qt1+dHM2lP8W4Btx77k3wl3A/zXOGYUW+QeQEY92OPJklPzpXPBWdwkJ4jBb35RE/AWIcusu5JCdS8pw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=HzWVNay8; arc=none smtp.client-ip=91.218.175.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="HzWVNay8" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956535; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OTqh+R9AqZ6ZGZqMsV71qrw0XbHGmnb9j25C52g1tSo=; b=HzWVNay8nSdGORHcukbBXgAzr4n9E8zTAuZ3u9psS9ZbHKHC+kPQLttd9NUck2y7Un5PnU wguQekl8agZMRrWohpjuuXBgzV0tm6QYsctXzvnjUqRyPG3qVudTG9pjvg2VZCLkuSNrrs 4cXYorAWepxqxVXWPDP/ut7O1GjqHvQ= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 02/28] mm: workingset: use folio_lruvec() in workingset_refault() Date: Wed, 17 Dec 2025 15:27:26 +0800 Message-ID: <08c00b5f429b44a6df3f3798e43046ebd5825415.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Use folio_lruvec() to simplify the code. Signed-off-by: Muchun Song Acked-by: Johannes Weiner Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Shakeel Butt --- mm/workingset.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index e9f05634747a7..e41b44e29944b 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -534,8 +534,6 @@ bool workingset_test_recent(void *shadow, bool file, bo= ol *workingset, void workingset_refault(struct folio *folio, void *shadow) { bool file =3D folio_is_file_lru(folio); - struct pglist_data *pgdat; - struct mem_cgroup *memcg; struct lruvec *lruvec; bool workingset; long nr; @@ -557,10 +555,7 @@ void workingset_refault(struct folio *folio, void *sha= dow) * locked to guarantee folio_memcg() stability throughout. 
*/ nr =3D folio_nr_pages(folio); - memcg =3D folio_memcg(folio); - pgdat =3D folio_pgdat(folio); - lruvec =3D mem_cgroup_lruvec(memcg, pgdat); - + lruvec =3D folio_lruvec(folio); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr); =20 if (!workingset_test_recent(shadow, file, &workingset, true)) --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E20D028136C for ; Wed, 17 Dec 2025 07:29:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956559; cv=none; b=iqS2fZRmebSmpO84oNIPfQKO2bmodHiazYbmjQsa/RcIELEf28AxvokrgNtHfkOsXbu1jG5k+j7Y/+FaShfA4i7g5YUQZYspAZ43alQM+8AHzJkMMSClkTcU9J11S0lXSZFr6Pf/BSEd9YFYY86HyjwS1GFnnK9Lfd6eHOSQZbo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956559; c=relaxed/simple; bh=U6m2VzWknbLlaBHJywfGw7jn0XU1TPaDYRztNSOF6kE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ESY4DNwB1PkZfRZtsOv2oGSqnH1jlkgBNBSo5l9sWwvvY/+i2/8x2aNE35b6Sebj5Yl+D96N8yoXeSAZzRTigBYbhEO32HMLSWt7szPzBk4HBhL4Kg6sjirkmBArmirOxVVjF2tS9mBtLqMCGsq1ukh9uZ/Irb4nZ5ExniaARFM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=b8lGVFMU; arc=none smtp.client-ip=91.218.175.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="b8lGVFMU" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956550; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4L3YNXkehnIUzmA6CqnAEojCUbAuuP/FY++XgJVDs2Q=; b=b8lGVFMUdp31z3ViKZO9ump0anJKy72LA/L/JhZ57uNsA6yNNyApzS7YROJV21E4UwWLmI b4dAJfey+LBXc3d7gLEIlKJc+YLaxUhW+zi+6ERo3nQQ0imO8Hc0plH3kE/UAKciz786CR 9hGDSdlAMoSLd6IADJTjF6eif5Bppxw= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng , Chen Ridong Subject: [PATCH v2 03/28] mm: rename unlock_page_lruvec_irq and its variants Date: Wed, 17 Dec 2025 15:27:27 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song It is inappropriate to use folio_lruvec_lock() variants in conjunction with unlock_page_lruvec() variants, as this involves the inconsistent operation of locking a folio while unlocking a page. To rectify this, the functions unlock_page_lruvec{_irq, _irqrestore} are renamed to lruvec_unlock{_irq,_irqrestore}. 
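(Illustration only, not part of the diff below; the caller shown is simplified.) With the rename, both sides of the critical section are expressed in terms of the lruvec:

	/* before: lock named after the folio, unlock named after the page */
	lruvec = folio_lruvec_lock_irq(folio);
	lruvec_del_folio(lruvec, folio);
	unlock_page_lruvec_irq(lruvec);

	/* after: consistent lruvec naming on both ends */
	lruvec = folio_lruvec_lock_irq(folio);
	lruvec_del_folio(lruvec, folio);
	lruvec_unlock_irq(lruvec);
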
Signed-off-by: Muchun Song Acked-by: Roman Gushchin Acked-by: Johannes Weiner Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Reviewed-by: Chen Ridong Acked-by: David Hildenbrand (Red Hat) Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 10 +++++----- mm/compaction.c | 14 +++++++------- mm/huge_memory.c | 2 +- mm/mlock.c | 2 +- mm/swap.c | 12 ++++++------ mm/vmscan.c | 4 ++-- 6 files changed, 22 insertions(+), 22 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6a48398a1f4e7..288dd6337f80f 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1465,17 +1465,17 @@ static inline struct lruvec *parent_lruvec(struct l= ruvec *lruvec) return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec)); } =20 -static inline void unlock_page_lruvec(struct lruvec *lruvec) +static inline void lruvec_unlock(struct lruvec *lruvec) { spin_unlock(&lruvec->lru_lock); } =20 -static inline void unlock_page_lruvec_irq(struct lruvec *lruvec) +static inline void lruvec_unlock_irq(struct lruvec *lruvec) { spin_unlock_irq(&lruvec->lru_lock); } =20 -static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, +static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec, unsigned long flags) { spin_unlock_irqrestore(&lruvec->lru_lock, flags); @@ -1497,7 +1497,7 @@ static inline struct lruvec *folio_lruvec_relock_irq(= struct folio *folio, if (folio_matches_lruvec(folio, locked_lruvec)) return locked_lruvec; =20 - unlock_page_lruvec_irq(locked_lruvec); + lruvec_unlock_irq(locked_lruvec); } =20 return folio_lruvec_lock_irq(folio); @@ -1511,7 +1511,7 @@ static inline void folio_lruvec_relock_irqsave(struct= folio *folio, if (folio_matches_lruvec(folio, *lruvecp)) return; =20 - unlock_page_lruvec_irqrestore(*lruvecp, *flags); + lruvec_unlock_irqrestore(*lruvecp, *flags); } =20 *lruvecp =3D folio_lruvec_lock_irqsave(folio, flags); diff --git a/mm/compaction.c b/mm/compaction.c index 1e8f8eca318c6..c3e338aaa0ffb 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -913,7 +913,7 @@ isolate_migratepages_block(struct compact_control *cc, = unsigned long low_pfn, */ if (!(low_pfn % COMPACT_CLUSTER_MAX)) { if (locked) { - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); locked =3D NULL; } =20 @@ -964,7 +964,7 @@ isolate_migratepages_block(struct compact_control *cc, = unsigned long low_pfn, } /* for alloc_contig case */ if (locked) { - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); locked =3D NULL; } =20 @@ -1053,7 +1053,7 @@ isolate_migratepages_block(struct compact_control *cc= , unsigned long low_pfn, if (unlikely(page_has_movable_ops(page)) && !PageMovableOpsIsolated(page)) { if (locked) { - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); locked =3D NULL; } =20 @@ -1158,7 +1158,7 @@ isolate_migratepages_block(struct compact_control *cc= , unsigned long low_pfn, /* If we already hold the lock, we can skip some rechecking */ if (lruvec !=3D locked) { if (locked) - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); =20 compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); locked =3D lruvec; @@ -1226,7 +1226,7 @@ isolate_migratepages_block(struct compact_control *cc= , unsigned long low_pfn, isolate_fail_put: /* Avoid potential deadlock in freeing page under lru_lock */ if (locked) { - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); locked =3D NULL; } 
folio_put(folio); @@ -1242,7 +1242,7 @@ isolate_migratepages_block(struct compact_control *cc= , unsigned long low_pfn, */ if (nr_isolated) { if (locked) { - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); locked =3D NULL; } putback_movable_pages(&cc->migratepages); @@ -1274,7 +1274,7 @@ isolate_migratepages_block(struct compact_control *cc= , unsigned long low_pfn, =20 isolate_abort: if (locked) - unlock_page_lruvec_irqrestore(locked, flags); + lruvec_unlock_irqrestore(locked, flags); if (folio) { folio_set_lru(folio); folio_put(folio); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 40cf59301c21a..12b46215b30c1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3899,7 +3899,7 @@ static int __folio_freeze_and_split_unmapped(struct f= olio *folio, unsigned int n folio_ref_unfreeze(folio, folio_cache_ref_count(folio) + 1); =20 if (do_lru) - unlock_page_lruvec(lruvec); + lruvec_unlock(lruvec); =20 if (ci) swap_cluster_unlock(ci); diff --git a/mm/mlock.c b/mm/mlock.c index 2f699c3497a57..66740e16679c3 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -205,7 +205,7 @@ static void mlock_folio_batch(struct folio_batch *fbatc= h) } =20 if (lruvec) - unlock_page_lruvec_irq(lruvec); + lruvec_unlock_irq(lruvec); folios_put(fbatch); } =20 diff --git a/mm/swap.c b/mm/swap.c index 2260dcd2775e7..ec0c654e128dc 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -91,7 +91,7 @@ static void page_cache_release(struct folio *folio) =20 __page_cache_release(folio, &lruvec, &flags); if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec_unlock_irqrestore(lruvec, flags); } =20 void __folio_put(struct folio *folio) @@ -175,7 +175,7 @@ static void folio_batch_move_lru(struct folio_batch *fb= atch, move_fn_t move_fn) } =20 if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec_unlock_irqrestore(lruvec, flags); folios_put(fbatch); } =20 @@ -349,7 +349,7 @@ void folio_activate(struct folio *folio) =20 lruvec =3D folio_lruvec_lock_irq(folio); lru_activate(lruvec, folio); - unlock_page_lruvec_irq(lruvec); + lruvec_unlock_irq(lruvec); folio_set_lru(folio); } #endif @@ -963,7 +963,7 @@ void folios_put_refs(struct folio_batch *folios, unsign= ed int *refs) =20 if (folio_is_zone_device(folio)) { if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec_unlock_irqrestore(lruvec, flags); lruvec =3D NULL; } if (folio_ref_sub_and_test(folio, nr_refs)) @@ -977,7 +977,7 @@ void folios_put_refs(struct folio_batch *folios, unsign= ed int *refs) /* hugetlb has its own memcg */ if (folio_test_hugetlb(folio)) { if (lruvec) { - unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec_unlock_irqrestore(lruvec, flags); lruvec =3D NULL; } free_huge_folio(folio); @@ -991,7 +991,7 @@ void folios_put_refs(struct folio_batch *folios, unsign= ed int *refs) j++; } if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec_unlock_irqrestore(lruvec, flags); if (!j) { folio_batch_reinit(folios); return; diff --git a/mm/vmscan.c b/mm/vmscan.c index 670fe9fae5baa..28d9b3af47130 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1829,7 +1829,7 @@ bool folio_isolate_lru(struct folio *folio) folio_get(folio); lruvec =3D folio_lruvec_lock_irq(folio); lruvec_del_folio(lruvec, folio); - unlock_page_lruvec_irq(lruvec); + lruvec_unlock_irq(lruvec); ret =3D true; } =20 @@ -7855,7 +7855,7 @@ void check_move_unevictable_folios(struct folio_batch= *fbatch) if (lruvec) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - 
unlock_page_lruvec_irq(lruvec); + lruvec_unlock_irq(lruvec); } else if (pgscanned) { count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E89761D5160 for ; Wed, 17 Dec 2025 07:29:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956572; cv=none; b=sQ3o6JRax9K+AhWBahe7GGu7ELNOrkOm1O9cyD0uT/MGt6jrPdPt9HzF737cOAqBm12onY5fwVg/8LZMGGZhmi44Xp+fVd/+rkOXVYXFfV+ZJB6euDrJbk9k7RFHf5VvDbySHMbZhZln9jouOfQYLoQXYGB5FOZ0E9cY45qyc9c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956572; c=relaxed/simple; bh=XElbJXTmKGRKcs6YyHuy5k2fGz3B4Sx4NWCQTaH8EoM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=j8Chb7Pf4hP0BwcW7sOzxoQU03QNkgysMx4YzLJ8NO3CPm83Tb/q55/YnWGO4asQ2h1zJZCtQkbeAWn/dQsmEcS4levzmoYcNkV4PeC0Iuhj3AOrBiNNZXS5tIwQk2Bji0v2LISJtLPO/OdzjgWDp0KWBBgEul2buHWoDK89Br0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=TGrKrdr6; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="TGrKrdr6" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956563; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2pE9uf1TLSHkLubOWIgm4VTvwh88IAohI6sYrjRqdOw=; b=TGrKrdr6WtZDtwmUq5jN9ccsCd7ukQJQBZZNWllJP0uWeb2+ozNTjHjwdyOUg/IzcxmjCl VxzTkWLpcvHPtjwoj2edd4HShUUS29/dYCI9/pLEjwSRZvJW2apkUhBi80y2FQbmJCmaGM BG4fA/ZZNwOv1UCSh3TzY6hIaNXKIyY= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v2 04/28] mm: vmscan: prepare for the refactoring of move_folios_to_lru() Date: Wed, 17 Dec 2025 15:27:28 +0800 Message-ID: <4a7ca63e3d872b7e4d117cf4e2696486772facb6.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng After the upcoming refactoring of move_folios_to_lru(), its callers will no longer need to hold the lruvec lock; IRQs would only have to be disabled for __count_vm_events() and __mod_node_page_state(). On PREEMPT_RT kernels, local_irq_disable() cannot be used. To avoid local_irq_disable() and to shrink the IRQ-disabled critical section, make all callers of move_folios_to_lru() use the IRQ-safe count_vm_events() and mod_node_page_state() instead.
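(Illustration only, not part of the diff below.) The double-underscore statistics helpers assume the caller already runs with IRQs disabled, e.g. under the lruvec lock; the plain variants are safe to call with IRQs enabled:

	/* old: correct only because the caller held lru_lock with IRQs off */
	spin_lock_irq(&lruvec->lru_lock);
	__count_vm_events(PGDEACTIVATE, nr_deactivate);
	spin_unlock_irq(&lruvec->lru_lock);

	/* new: IRQ-safe variant, usable once the lock is no longer held here */
	count_vm_events(PGDEACTIVATE, nr_deactivate);
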
Signed-off-by: Qi Zheng Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- mm/vmscan.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 28d9b3af47130..49e5661746213 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2021,12 +2021,12 @@ static unsigned long shrink_inactive_list(unsigned = long nr_to_scan, =20 mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc), stat.nr_demoted); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); + mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); item =3D PGSTEAL_KSWAPD + reclaimer_offset(sc); if (!cgroup_reclaim(sc)) - __count_vm_events(item, nr_reclaimed); + count_vm_events(item, nr_reclaimed); count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); - __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); + count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); =20 lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); @@ -2171,10 +2171,10 @@ static void shrink_active_list(unsigned long nr_to_= scan, nr_activate =3D move_folios_to_lru(lruvec, &l_active); nr_deactivate =3D move_folios_to_lru(lruvec, &l_inactive); =20 - __count_vm_events(PGDEACTIVATE, nr_deactivate); + count_vm_events(PGDEACTIVATE, nr_deactivate); count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); =20 - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); + mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); =20 lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated); trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate, @@ -4751,9 +4751,9 @@ static int evict_folios(unsigned long nr_to_scan, str= uct lruvec *lruvec, =20 item =3D PGSTEAL_KSWAPD + reclaimer_offset(sc); if (!cgroup_reclaim(sc)) - __count_vm_events(item, reclaimed); + count_vm_events(item, reclaimed); count_memcg_events(memcg, item, reclaimed); - __count_vm_events(PGSTEAL_ANON + type, reclaimed); + count_vm_events(PGSTEAL_ANON + type, reclaimed); =20 spin_unlock_irq(&lruvec->lru_lock); =20 --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0051121772A for ; Wed, 17 Dec 2025 07:29:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956586; cv=none; b=Ucni1dDsPTY0tGlv9eAXa4k6o+M9j+ANL/T59fygpvjP89s2ZCwOYFg0cLiYP0KF939DX3BJXgqoZh9oqc8xS4t2UF+UmTfKG2lq3lB17h4qNjOpnPFWVghOtthteK9kudDRsf+mVYBTo+RlYRF0aiIA2Sy9Ld7QFFC0C7eSbVg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956586; c=relaxed/simple; bh=8vgx2ykexnnO8DAvB1JfddChw0WTru0tk8binaaVqjU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mrp1RaUDY70fhpOXo/YR79HXKDbiz2G9HcqAmbh+CROaQH+V2VB4QA28ZtmmsEKS7qZ/5vo1IYp1wSAvFwfK81s2H1dml2L7Ft9j8m84sRreseIkStaxmQEfD0j3rI4Nwh419RhHwwlb3saLE/p7gsbXON++nOtBJSvZAQnSmjE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=G49xQSCD; arc=none smtp.client-ip=91.218.175.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="G49xQSCD" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956577; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q0OheEBAXk7XPYwqSkJ0Le26eixYKExsbR31tbstIws=; b=G49xQSCDDpZAppo/wZsDsqQWibcEkNZSb7m2ERxT7qti7MrBuU10SmnxCYBc+WUIHf4TCS iMR8QuBNfphzQC4ptasTWFpd/SM2MIr5k84EuasV7jJsBHSTrlV0TZPz/uS0quxKVuaEio Rs436+PJI9YW/vq4R6xRXGnOPF2R2A4= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 05/28] mm: vmscan: refactor move_folios_to_lru() Date: Wed, 17 Dec 2025 15:27:29 +0800 Message-ID: <0140f3b290fd259d58e11f86f1f04f732e8096f1.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In a subsequent patch, we'll reparent the LRU folios. The folios that are moved to the appropriate LRU list can undergo reparenting during the move_folios_to_lru() process. Hence, it's incorrect for the caller to hold a lruvec lock. Instead, we should utilize the more general interface of folio_lruvec_relock_irq() to obtain the correct lruvec lock. This patch involves only code refactoring and doesn't introduce any functional changes. Signed-off-by: Muchun Song Acked-by: Johannes Weiner Signed-off-by: Qi Zheng Acked-by: Shakeel Butt --- mm/vmscan.c | 46 +++++++++++++++++++++------------------------- 1 file changed, 21 insertions(+), 25 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 49e5661746213..354b19f7365d4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1883,24 +1883,27 @@ static bool too_many_isolated(struct pglist_data *p= gdat, int file, /* * move_folios_to_lru() moves folios from private @list to appropriate LRU= list. * - * Returns the number of pages moved to the given lruvec. + * Returns the number of pages moved to the appropriate lruvec. + * + * Note: The caller must not hold any lruvec lock. 
*/ -static unsigned int move_folios_to_lru(struct lruvec *lruvec, - struct list_head *list) +static unsigned int move_folios_to_lru(struct list_head *list) { int nr_pages, nr_moved =3D 0; + struct lruvec *lruvec =3D NULL; struct folio_batch free_folios; =20 folio_batch_init(&free_folios); while (!list_empty(list)) { struct folio *folio =3D lru_to_folio(list); =20 + lruvec =3D folio_lruvec_relock_irq(folio, lruvec); VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); list_del(&folio->lru); if (unlikely(!folio_evictable(folio))) { - spin_unlock_irq(&lruvec->lru_lock); + lruvec_unlock_irq(lruvec); folio_putback_lru(folio); - spin_lock_irq(&lruvec->lru_lock); + lruvec =3D NULL; continue; } =20 @@ -1922,19 +1925,15 @@ static unsigned int move_folios_to_lru(struct lruve= c *lruvec, =20 folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) =3D=3D 0) { - spin_unlock_irq(&lruvec->lru_lock); + lruvec_unlock_irq(lruvec); mem_cgroup_uncharge_folios(&free_folios); free_unref_folios(&free_folios); - spin_lock_irq(&lruvec->lru_lock); + lruvec =3D NULL; } =20 continue; } =20 - /* - * All pages were isolated from the same lruvec (and isolation - * inhibits memcg migration). - */ VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); lruvec_add_folio(lruvec, folio); nr_pages =3D folio_nr_pages(folio); @@ -1943,11 +1942,12 @@ static unsigned int move_folios_to_lru(struct lruve= c *lruvec, workingset_age_nonresident(lruvec, nr_pages); } =20 + if (lruvec) + lruvec_unlock_irq(lruvec); + if (free_folios.nr) { - spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); free_unref_folios(&free_folios); - spin_lock_irq(&lruvec->lru_lock); } =20 return nr_moved; @@ -2016,8 +2016,7 @@ static unsigned long shrink_inactive_list(unsigned lo= ng nr_to_scan, nr_reclaimed =3D shrink_folio_list(&folio_list, pgdat, sc, &stat, false, lruvec_memcg(lruvec)); =20 - spin_lock_irq(&lruvec->lru_lock); - move_folios_to_lru(lruvec, &folio_list); + move_folios_to_lru(&folio_list); =20 mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc), stat.nr_demoted); @@ -2028,6 +2027,7 @@ static unsigned long shrink_inactive_list(unsigned lo= ng nr_to_scan, count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); =20 + spin_lock_irq(&lruvec->lru_lock); lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); =20 @@ -2166,16 +2166,14 @@ static void shrink_active_list(unsigned long nr_to_= scan, /* * Move folios back to the lru list. 
*/ - spin_lock_irq(&lruvec->lru_lock); - - nr_activate =3D move_folios_to_lru(lruvec, &l_active); - nr_deactivate =3D move_folios_to_lru(lruvec, &l_inactive); + nr_activate =3D move_folios_to_lru(&l_active); + nr_deactivate =3D move_folios_to_lru(&l_inactive); =20 count_vm_events(PGDEACTIVATE, nr_deactivate); count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); =20 + spin_lock_irq(&lruvec->lru_lock); lru_note_cost_unlock_irq(lruvec, file, 0, nr_rotated); trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate, nr_deactivate, nr_rotated, sc->priority, file); @@ -4736,14 +4734,14 @@ static int evict_folios(unsigned long nr_to_scan, s= truct lruvec *lruvec, set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active)); } =20 - spin_lock_irq(&lruvec->lru_lock); - - move_folios_to_lru(lruvec, &list); + move_folios_to_lru(&list); =20 walk =3D current->reclaim_state->mm_walk; if (walk && walk->batched) { walk->lruvec =3D lruvec; + spin_lock(&lruvec->lru_lock); reset_batch_size(walk); + spin_unlock(&lruvec->lru_lock); } =20 mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc), @@ -4755,8 +4753,6 @@ static int evict_folios(unsigned long nr_to_scan, str= uct lruvec *lruvec, count_memcg_events(memcg, item, reclaimed); count_vm_events(PGSTEAL_ANON + type, reclaimed); =20 - spin_unlock_irq(&lruvec->lru_lock); - list_splice_init(&clean, &list); =20 if (!list_empty(&list)) { --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-180.mta0.migadu.com (out-180.mta0.migadu.com [91.218.175.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32AE9336EC7 for ; Wed, 17 Dec 2025 07:29:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956599; cv=none; b=LoRObtRzXVMXYPbD7fXUJlLZWTCtW6dkP9MsSOc4mDhBcsyDjPbmABFgmVCBNo/ZQYTSrBQM5rYuA/nolmn1Dr7Zd7NOUHa0TWArxVFJ40yOw9AT9IqY2rBRa9T0nSHooAukwN+QABOA2TcSlef0Z8+NLcz0jJwdVgwX1dnA+L0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956599; c=relaxed/simple; bh=01Z3FN+O1DEpQl38IZhAAcMBWaXEbnhc6mZhMPDq68Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RJ+zUGQ1PVTUZ+tWRxjVh+ea84X9S9o1ygaNQerqMrBgzWmNfExr51nEy+JLlA389E0m220U8dNH3lqb035riH5r9jwNfbLE3pdrb9/i0SW7+qW9p1kg2FNKQxmw/RvIDytlI89p+9TwU6UfUzFmHpUqBE7FlGUuImTjTXNX1Hw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Wphsswng; arc=none smtp.client-ip=91.218.175.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Wphsswng" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956592; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w4dfEcEvEfUWW7owlGocpCYghDEZ1Bptpjv9xHEdbhs=; b=Wphsswng/as3O4tABZ8G7iOBDRI+iGwrqKHsK8293bBW8/bK/AKP7cwvDUzYdHMl5pOdpp qE8F7Udp6woqzO9/M9NUloZqhwqogpco7PSNhaKAvVMLBGZlpiXaOyZ0reMDEzkdWujQb1 kMIJLp2Ly4sVJ2ibeyQqjUuKFVtZZWQ= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 06/28] mm: memcontrol: allocate object cgroup for non-kmem case Date: Wed, 17 Dec 2025 15:27:30 +0800 Message-ID: <897be76398cb2027d08d1bcda05260ede54dc134.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Pagecache pages are charged at allocation time and hold a reference to the original memory cgroup until reclaimed. Depending on memory pressure, page sharing patterns between different cgroups and cgroup creation/destruction rates, many dying memory cgroups can be pinned by pagecache pages, reducing page reclaim efficiency and wasting memory. Converting LRU folios and most other raw memory cgroup pins to the object cgroup direction can fix this long-living problem. As a result, the objcg infrastructure is no longer solely applicable to the kmem case. In this patch, we extend the scope of the objcg infrastructure beyond the kmem case, enabling LRU folios to reuse it for folio charging purposes. It should be noted that LRU folios are not accounted for at the root level, yet the folio->memcg_data points to the root_mem_cgroup. Hence, the folio->memcg_data of LRU folios always points to a valid pointer. However, the root_mem_cgroup does not possess an object cgroup. Therefore, we also allocate an object cgroup for the root_mem_cgroup. 
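(Illustration only, a rough sketch of the resulting behaviour rather than the exact hunk below; error handling is simplified.) After this change every memory cgroup that comes online, including the root, owns an object cgroup, so later lookups through folio->memcg_data can always go via a valid objcg:

	objcg = obj_cgroup_alloc();		/* now done for every memcg, not just kmem */
	if (!objcg)
		return -ENOMEM;
	objcg->memcg = memcg;
	rcu_assign_pointer(memcg->objcg, objcg);
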
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- mm/memcontrol.c | 51 +++++++++++++++++++++++-------------------------- 1 file changed, 24 insertions(+), 27 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ae234518d023c..544b3200db12d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -204,10 +204,10 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } =20 -static void memcg_reparent_objcgs(struct mem_cgroup *memcg, - struct mem_cgroup *parent) +static void memcg_reparent_objcgs(struct mem_cgroup *memcg) { struct obj_cgroup *objcg, *iter; + struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); =20 objcg =3D rcu_replace_pointer(memcg->objcg, NULL, true); =20 @@ -3294,30 +3294,17 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *m= emcg, bool swap) return val; } =20 -static int memcg_online_kmem(struct mem_cgroup *memcg) +static void memcg_online_kmem(struct mem_cgroup *memcg) { - struct obj_cgroup *objcg; - if (mem_cgroup_kmem_disabled()) - return 0; + return; =20 if (unlikely(mem_cgroup_is_root(memcg))) - return 0; - - objcg =3D obj_cgroup_alloc(); - if (!objcg) - return -ENOMEM; - - objcg->memcg =3D memcg; - rcu_assign_pointer(memcg->objcg, objcg); - obj_cgroup_get(objcg); - memcg->orig_objcg =3D objcg; + return; =20 static_branch_enable(&memcg_kmem_online_key); =20 memcg->kmemcg_id =3D memcg->id.id; - - return 0; } =20 static void memcg_offline_kmem(struct mem_cgroup *memcg) @@ -3332,12 +3319,6 @@ static void memcg_offline_kmem(struct mem_cgroup *me= mcg) =20 parent =3D parent_mem_cgroup(memcg); memcg_reparent_list_lrus(memcg, parent); - - /* - * Objcg's reparenting must be after list_lru's, make sure list_lru - * helpers won't use parent's list_lru until child is drained. - */ - memcg_reparent_objcgs(memcg, parent); } =20 #ifdef CONFIG_CGROUP_WRITEBACK @@ -3854,9 +3835,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) static int mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); + struct obj_cgroup *objcg; =20 - if (memcg_online_kmem(memcg)) - goto remove_id; + memcg_online_kmem(memcg); =20 /* * A memcg must be visible for expand_shrinker_info() @@ -3866,6 +3847,15 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) if (alloc_shrinker_info(memcg)) goto offline_kmem; =20 + objcg =3D obj_cgroup_alloc(); + if (!objcg) + goto free_shrinker; + + objcg->memcg =3D memcg; + rcu_assign_pointer(memcg->objcg, objcg); + obj_cgroup_get(objcg); + memcg->orig_objcg =3D objcg; + if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled()) queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); @@ -3888,9 +3878,10 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) xa_store(&mem_cgroup_ids, memcg->id.id, memcg, GFP_KERNEL); =20 return 0; +free_shrinker: + free_shrinker_info(memcg); offline_kmem: memcg_offline_kmem(memcg); -remove_id: mem_cgroup_id_remove(memcg); return -ENOMEM; } @@ -3908,6 +3899,12 @@ static void mem_cgroup_css_offline(struct cgroup_sub= sys_state *css) =20 memcg_offline_kmem(memcg); reparent_deferred_split_queue(memcg); + /* + * The reparenting of objcg must be after the reparenting of the + * list_lru and deferred_split_queue above, which ensures that they will + * not mistakenly get the parent list_lru and deferred_split_queue. 
+ */ + memcg_reparent_objcgs(memcg); reparent_shrinker_deferred(memcg); wb_memcg_offline(memcg); lru_gen_offline_memcg(memcg); --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B8204233D9E for ; Wed, 17 Dec 2025 07:30:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.188 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956611; cv=none; b=LzvJ8FoA0gKBmqQDA4qpF1PqwijjxK/MSyospGfrp2E9BOdSLIgfxa+wTsAqwboMzP9bF8cO1oJLqXdGR+5Ly+iMwLY0c+bgE12yFdk0izFqw4t+fMbeGRGnbw9fDcmOhBU/NQFE9yXx1GCS2A3QeG6y4GIn1FDvJ93zvMK2v0Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956611; c=relaxed/simple; bh=Ceqg36g8lhW7NbRG1DbazTUt181t9tkKOgM3UcsDSl0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XswO38/4C25cWfdbRUkr3jm3GH/vljPk4o0inz4cExksF8mjg/lx+TsnUAwWGeN24lRgwvp7IeUSjPkOFheWCnu/3BsaXdHtvn4x7C667Q4I/Izl80UT31pCpClbmHxh3ZzUwX7/D3iOEXpjlPaCUmkkMg3a532XzYaHyPaTxZw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=u3ENEzXx; arc=none smtp.client-ip=91.218.175.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="u3ENEzXx" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956602; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MnE9D8GcmnKRbpRKs57u5JAtmYg0S0V4yTV24f2KZzo=; b=u3ENEzXxDb64wtMs933am3zKugIqhdmbD57tRwNjWeyhnsAiReZEVeXYbf/N1PYvNFWTdH HkmXFLPyJ+pqHOImJNdVCSRahBZ2ZcwRkABHvQtvTMGJeIj1cNo5P2seUKiPABLUEs84+u cFx3aKrJcnFk2RxraqJQm8kATG+0c+o= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 07/28] mm: memcontrol: return root object cgroup for root memory cgroup Date: Wed, 17 Dec 2025 15:27:31 +0800 Message-ID: <3e454b151f3926dbd67d5df6dc2b129edd927101.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Memory cgroup functions such as get_mem_cgroup_from_folio() and get_mem_cgroup_from_mm() return a valid memory cgroup pointer, even for the root memory cgroup. In contrast, the situation for object cgroups has been different. Previously, the root object cgroup couldn't be returned because it didn't exist. Now that a valid root object cgroup exists, for the sake of consistency, it's necessary to align the behavior of object-cgroup-related operations with that of memory cgroup APIs. 
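(Illustration only, not part of the diff below.) The objcg API now mirrors the memcg one: lookups return the root object cgroup instead of NULL, and callers that must not charge at the root level test for it explicitly:

	objcg = current_obj_cgroup();	/* never NULL, may be root_obj_cgroup */
	if (obj_cgroup_is_root(objcg))
		return true;		/* skip charging; kmem is not accounted at the root */
	ret = obj_cgroup_charge(objcg, gfp, size);
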
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- include/linux/memcontrol.h | 26 +++++++++++++++++----- mm/memcontrol.c | 45 ++++++++++++++++++++------------------ mm/percpu.c | 2 +- 3 files changed, 45 insertions(+), 28 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 288dd6337f80f..776d9be1f446a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -332,6 +332,7 @@ struct mem_cgroup { #define MEMCG_CHARGE_BATCH 64U =20 extern struct mem_cgroup *root_mem_cgroup; +extern struct obj_cgroup *root_obj_cgroup; =20 enum page_memcg_data_flags { /* page->memcg_data is a pointer to an slabobj_ext vector */ @@ -549,6 +550,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgrou= p *memcg) return (memcg =3D=3D root_mem_cgroup); } =20 +static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg) +{ + return objcg =3D=3D root_obj_cgroup; +} + static inline bool mem_cgroup_disabled(void) { return !cgroup_subsys_enabled(memory_cgrp_subsys); @@ -773,23 +779,26 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_= subsys_state *css){ =20 static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg) { + if (obj_cgroup_is_root(objcg)) + return true; return percpu_ref_tryget(&objcg->refcnt); } =20 -static inline void obj_cgroup_get(struct obj_cgroup *objcg) +static inline void obj_cgroup_get_many(struct obj_cgroup *objcg, + unsigned long nr) { - percpu_ref_get(&objcg->refcnt); + if (!obj_cgroup_is_root(objcg)) + percpu_ref_get_many(&objcg->refcnt, nr); } =20 -static inline void obj_cgroup_get_many(struct obj_cgroup *objcg, - unsigned long nr) +static inline void obj_cgroup_get(struct obj_cgroup *objcg) { - percpu_ref_get_many(&objcg->refcnt, nr); + obj_cgroup_get_many(objcg, 1); } =20 static inline void obj_cgroup_put(struct obj_cgroup *objcg) { - if (objcg) + if (objcg && !obj_cgroup_is_root(objcg)) percpu_ref_put(&objcg->refcnt); } =20 @@ -1084,6 +1093,11 @@ static inline bool mem_cgroup_is_root(struct mem_cgr= oup *memcg) return true; } =20 +static inline bool obj_cgroup_is_root(const struct obj_cgroup *objcg) +{ + return true; +} + static inline bool mem_cgroup_disabled(void) { return true; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 544b3200db12d..21b5aad34cae7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -83,6 +83,8 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; EXPORT_SYMBOL(root_mem_cgroup); =20 +struct obj_cgroup *root_obj_cgroup __read_mostly; + /* Active memory cgroup to use from an interrupt context */ DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg); @@ -2634,15 +2636,14 @@ struct mem_cgroup *mem_cgroup_from_slab_obj(void *p) =20 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *m= emcg) { - struct obj_cgroup *objcg =3D NULL; + for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { + struct obj_cgroup *objcg =3D rcu_dereference(memcg->objcg); =20 - for (; !mem_cgroup_is_root(memcg); memcg =3D parent_mem_cgroup(memcg)) { - objcg =3D rcu_dereference(memcg->objcg); if (likely(objcg && obj_cgroup_tryget(objcg))) - break; - objcg =3D NULL; + return objcg; } - return objcg; + + return NULL; } =20 static struct obj_cgroup *current_objcg_update(void) @@ -2716,18 +2717,17 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) * Objcg reference is kept by the task, so it's safe * to use the objcg by the current task. 
*/ - return objcg; + return objcg ? : root_obj_cgroup; } =20 memcg =3D this_cpu_read(int_active_memcg); if (unlikely(memcg)) goto from_memcg; =20 - return NULL; + return root_obj_cgroup; =20 from_memcg: - objcg =3D NULL; - for (; !mem_cgroup_is_root(memcg); memcg =3D parent_mem_cgroup(memcg)) { + for (; memcg; memcg =3D parent_mem_cgroup(memcg)) { /* * Memcg pointer is protected by scope (see set_active_memcg()) * and is pinning the corresponding objcg, so objcg can't go @@ -2736,10 +2736,10 @@ __always_inline struct obj_cgroup *current_obj_cgro= up(void) */ objcg =3D rcu_dereference_check(memcg->objcg, 1); if (likely(objcg)) - break; + return objcg; } =20 - return objcg; + return root_obj_cgroup; } =20 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio) @@ -2753,14 +2753,8 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct = folio *folio) objcg =3D __folio_objcg(folio); obj_cgroup_get(objcg); } else { - struct mem_cgroup *memcg; - rcu_read_lock(); - memcg =3D __folio_memcg(folio); - if (memcg) - objcg =3D __get_obj_cgroup_from_memcg(memcg); - else - objcg =3D NULL; + objcg =3D __get_obj_cgroup_from_memcg(__folio_memcg(folio)); rcu_read_unlock(); } return objcg; @@ -2863,7 +2857,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t= gfp, int order) int ret =3D 0; =20 objcg =3D current_obj_cgroup(); - if (objcg) { + if (objcg && !obj_cgroup_is_root(objcg)) { ret =3D obj_cgroup_charge_pages(objcg, gfp, 1 << order); if (!ret) { obj_cgroup_get(objcg); @@ -3164,7 +3158,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *= s, struct list_lru *lru, * obj_cgroup_get() is used to get a permanent reference. */ objcg =3D current_obj_cgroup(); - if (!objcg) + if (!objcg || obj_cgroup_is_root(objcg)) return true; =20 /* @@ -3851,6 +3845,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) if (!objcg) goto free_shrinker; =20 + if (unlikely(mem_cgroup_is_root(memcg))) + root_obj_cgroup =3D objcg; + objcg->memcg =3D memcg; rcu_assign_pointer(memcg->objcg, objcg); obj_cgroup_get(objcg); @@ -5471,6 +5468,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg= , size_t size) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; =20 + if (obj_cgroup_is_root(objcg)) + return; + VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC)); =20 /* PF_MEMALLOC context, charging must succeed */ @@ -5498,6 +5498,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *obj= cg, size_t size) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) return; =20 + if (obj_cgroup_is_root(objcg)) + return; + obj_cgroup_uncharge(objcg, size); =20 rcu_read_lock(); diff --git a/mm/percpu.c b/mm/percpu.c index 81462ce5866e1..5c1a9b77d6b93 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1616,7 +1616,7 @@ static bool pcpu_memcg_pre_alloc_hook(size_t size, gf= p_t gfp, return true; =20 objcg =3D current_obj_cgroup(); - if (!objcg) + if (!objcg || obj_cgroup_is_root(objcg)) return true; =20 if (obj_cgroup_charge(objcg, gfp, pcpu_obj_full_size(size))) --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA1EE21772A for ; Wed, 17 Dec 2025 07:30:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956626; cv=none; 
b=VTPHmhkHTSt5JJS8JGJjvJIcbUm50trXRK1040SNhBSkUv97NZPuIYRqzyUPoBhmNS3wleu9ihT7zdQu2khP3yf6kqDgdo9xAv3UFep/U2GdnrTNge7yVuLy4XcsYW0xmCYSE9YUc4VwKidrTF6bs/v6kYOg35kq+zQmyo+tYVo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956626; c=relaxed/simple; bh=MxYgV8/w8PhCxmkMua0bb3Oh8xEkGZUmjUJTojVlETQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gIOAQPI+/I2rES/olBg0E9cmZjhxBqyAXfbs7+Gbx6drSjba0INMNpQpe/fnk596RmagnLyE2BRo35kI1ZfkNQmnNlofz3fJFT+Xb4QrduAe4vC2O2tlH2kSNmXwlyfIAsUHVKKJezEOk0JASYeKWFoDGt3p73/N+FQiVPFU4Mw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=ht/pt0s4; arc=none smtp.client-ip=91.218.175.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="ht/pt0s4" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956616; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WJaSY8/LO9jJ6S19J29RE7irvv8d++fvHvljBrkMe5g=; b=ht/pt0s4g+3r53ot1RQ62n1/WqHwohZA9oU7UDB4uUGCVJQQfb73NonfcosdEPDPprev9u kp/QvCPPE1NpyYcUK+QXcQhfINyowuqc1uV/DUY3uHn9/sZ24vHwfj08oOrtlHG1H5nQDq esXNLr5XLmnrkUJOnln+Nwks5ycdKwc= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio() Date: Wed, 17 Dec 2025 15:27:32 +0800 Message-ID: <29e5c116de15e55be082a544e3f24d8ddb6b3476.1765956025.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in get_mem_cgroup_from_folio(). This serves as a preparatory measure for the reparenting of the LRU pages. 
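(Illustration only, not part of the diff below.) Once folios no longer pin their memory cgroup, callers of folio_memcg() have two safe patterns:

	/* short, non-sleeping access under RCU */
	rcu_read_lock();
	memcg = folio_memcg(folio);
	/* ... use memcg without sleeping ... */
	rcu_read_unlock();

	/* longer-lived access via a reference */
	memcg = get_mem_cgroup_from_folio(folio);
	/* ... */
	mem_cgroup_put(memcg);
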
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo --- mm/memcontrol.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 21b5aad34cae7..431b3154c70c5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void) */ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 if (mem_cgroup_disabled()) return NULL; =20 + if (!folio_memcg_charged(folio)) + return root_mem_cgroup; + rcu_read_lock(); - if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css))) - memcg =3D root_mem_cgroup; +retry: + memcg =3D folio_memcg(folio); + if (unlikely(!css_tryget(&memcg->css))) + goto retry; rcu_read_unlock(); return memcg; } --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E8D321772A for ; Wed, 17 Dec 2025 07:30:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956640; cv=none; b=X2OnxnQaPatNJLa5FXjovb/RLm4CqTyBbk7jJrwqR8CPiXRUI4JGOuCr2ror/IbDCzgM0n+2UWDyHlKSawZEStpFb4/N4Sf0+wYg9yzg7u2UiYps0eoWw8jzwTWTTJ1Uge9rZIE8m0cAlDeDM6za2fJskfi8ctpmsmgWA6kdIjc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956640; c=relaxed/simple; bh=5t8mtaKEbTaBeCOppY8BEGcL/Ru6coBEX3Lct8PQw8s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=THTEC2D492OELmdZ4IhIDycd+vzKC6TvlIA4cZuvaK7wA7gAkI/hM+i9FUPMWUQwDg4/zA8RGMbY2y2E7EaVchNlFZN5UcCRnmtQRH0I0JtbhTRG/HNrwzRuLEGRKQoNOYdW+2z6zjFLVmwS5XlVc6tS4PvF6H+GiHUBdEtLrrQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=BKBDmMTT; arc=none smtp.client-ip=91.218.175.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="BKBDmMTT" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956630; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TRju/G2AicNAtYXExappHAHtEmzxl9wl3ybhh14XiuA=; b=BKBDmMTTYzW8+KMzw0Cb3xu5yMNfhSfyczOl2Mv78ulFxvyEK/rvQ2MT6DdrmHYG8sBJ4f 0ubQx0rUizrQc/VOotOqMHlErCaCFTLP7UWSLt0s/+GV3xQMxDZwaPZ74xKb/hK4pj6g8l moKY4bPgdI/MUiDypKfWktPBmaozvTU= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 09/28] buffer: prevent memory cgroup release in folio_alloc_buffers() Date: Wed, 17 Dec 2025 15:27:33 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the function get_mem_cgroup_from_folio() is employed to safeguard against the release of the memory cgroup. This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- fs/buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index fd53b806ab7eb..4552d9cab0dbd 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -925,8 +925,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, long offset; struct mem_cgroup *memcg, *old_memcg; =20 - /* The folio lock pins the memcg */ - memcg =3D folio_memcg(folio); + memcg =3D get_mem_cgroup_from_folio(folio); old_memcg =3D set_active_memcg(memcg); =20 head =3D NULL; @@ -947,6 +946,7 @@ struct buffer_head *folio_alloc_buffers(struct folio *f= olio, unsigned long size, } out: set_active_memcg(old_memcg); + mem_cgroup_put(memcg); return head; /* * In case anything failed, we just free everything we got. 
--=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42304338F45 for ; Wed, 17 Dec 2025 07:30:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956652; cv=none; b=r7nzC1qLXp6a7Isun4Mgx4ANhn2uUec6qlYPppVqZ8tF0hsIN3q5zLmlGiBTsxeHsEpOGsZlkcWB76oLq7+41w9BVbacRm93PGsDkQeZ3ZYAHbIo1umwEwiC12EsRtZ6vUwGqpjWvvNWB+J31TJHLviSo/lTOXKduPRklu8yQDA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956652; c=relaxed/simple; bh=bIUIaX7AaZAWTGaQIgLVs77/5oKLusC8ZUMhKNKwwpw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NmPnnWPS1R3KZYn7VBpizosULReAemwW77LMqBl2UJD5F07HrPdP/6KAcRXZOARYySNha623Y37s5RbScJyBxUdobSIgyFhP8Yc1JRF7oBDbKIT7OwWJ/HIz2OopWj9KWyBJuC/SkoEZOhLJxAW4W+aHVTo/cz+Pt+/5hjDIH5A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=ZrvJVvvW; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="ZrvJVvvW" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956644; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cb8NWIyZ/ts5sKA8lazTxoYJ+MrmJmaOEPm0Hg1IF8Q=; b=ZrvJVvvWuNeMXQL9pMRHM4uWZyvvIGS4eeGanGz2oB/X+b3VfcEuyVIps/0tthZkAdqdER WYYXtlOzeXulJisgMUeJS2MAmu/yMhGhzaCnO4JeGiEvFWXmp/3gOQMVYWVe8xAjW17tZ+ omsRM5kP270T8TDwabvlJGBS3joq4mA= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 10/28] writeback: prevent memory cgroup release in writeback module Date: Wed, 17 Dec 2025 15:27:34 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. 
To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the function get_mem_cgroup_css_from_folio() and the rcu read lock are employed to safeguard against the release of the memory cgroup. This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner Acked-by: Shakeel Butt --- fs/fs-writeback.c | 22 +++++++++++----------- include/linux/memcontrol.h | 9 +++++++-- include/trace/events/writeback.h | 3 +++ mm/memcontrol.c | 14 ++++++++------ 4 files changed, 29 insertions(+), 19 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 5dd6e89a6d29e..2e57b7e2b4453 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -279,15 +279,13 @@ void __inode_attach_wb(struct inode *inode, struct fo= lio *folio) if (inode_cgwb_enabled(inode)) { struct cgroup_subsys_state *memcg_css; =20 - if (folio) { - memcg_css =3D mem_cgroup_css_from_folio(folio); - wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); - } else { - /* must pin memcg_css, see wb_get_create() */ + /* must pin memcg_css, see wb_get_create() */ + if (folio) + memcg_css =3D get_mem_cgroup_css_from_folio(folio); + else memcg_css =3D task_get_css(current, memory_cgrp_id); - wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); - css_put(memcg_css); - } + wb =3D wb_get_create(bdi, memcg_css, GFP_ATOMIC); + css_put(memcg_css); } =20 if (!wb) @@ -979,16 +977,16 @@ void wbc_account_cgroup_owner(struct writeback_contro= l *wbc, struct folio *folio if (!wbc->wb || wbc->no_cgroup_owner) return; =20 - css =3D mem_cgroup_css_from_folio(folio); + css =3D get_mem_cgroup_css_from_folio(folio); /* dead cgroups shouldn't contribute to inode ownership arbitration */ if (!css_is_online(css)) - return; + goto out; =20 id =3D css->id; =20 if (id =3D=3D wbc->wb_id) { wbc->wb_bytes +=3D bytes; - return; + goto out; } =20 if (id =3D=3D wbc->wb_lcand_id) @@ -1001,6 +999,8 @@ void wbc_account_cgroup_owner(struct writeback_control= *wbc, struct folio *folio wbc->wb_tcand_bytes +=3D bytes; else wbc->wb_tcand_bytes -=3D min(bytes, wbc->wb_tcand_bytes); +out: + css_put(css); } EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner); =20 diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 776d9be1f446a..bc526e0d37e0b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -895,7 +895,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm, return match; } =20 -struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio); +struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *fo= lio); ino_t page_cgroup_ino(struct page *page); =20 static inline bool mem_cgroup_online(struct mem_cgroup *memcg) @@ -1549,9 +1549,14 @@ static inline void mem_cgroup_track_foreign_dirty(st= ruct folio *folio, if (mem_cgroup_disabled()) return; =20 + if (!folio_memcg_charged(folio)) + return; + + rcu_read_lock(); memcg =3D folio_memcg(folio); - if (unlikely(memcg && &memcg->css !=3D wb->memcg_css)) + if (unlikely(&memcg->css !=3D wb->memcg_css)) mem_cgroup_track_foreign_dirty_slowpath(folio, wb); + rcu_read_unlock(); } =20 void mem_cgroup_flush_foreign(struct bdi_writeback *wb); diff --git a/include/trace/events/writeback.h b/include/trace/events/writeb= ack.h index 311a341e6fe42..f5bfe8c1a160a 100644 --- a/include/trace/events/writeback.h +++ 
b/include/trace/events/writeback.h @@ -295,7 +295,10 @@ TRACE_EVENT(track_foreign_dirty, __entry->ino =3D inode ? inode->i_ino : 0; __entry->memcg_id =3D wb->memcg_css->id; __entry->cgroup_ino =3D __trace_wb_assign_cgroup(wb); + + rcu_read_lock(); __entry->page_cgroup_ino =3D cgroup_ino(folio_memcg(folio)->css.cgroup); + rcu_read_unlock(); ), =20 TP_printk("bdi %s[%llu]: ino=3D%lu memcg_id=3D%u cgroup_ino=3D%lu page_cg= roup_ino=3D%lu", diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 431b3154c70c5..131f940c03fa0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -241,7 +241,7 @@ DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key); EXPORT_SYMBOL(memcg_bpf_enabled_key); =20 /** - * mem_cgroup_css_from_folio - css of the memcg associated with a folio + * get_mem_cgroup_css_from_folio - acquire a css of the memcg associated w= ith a folio * @folio: folio of interest * * If memcg is bound to the default hierarchy, css of the memcg associated @@ -251,14 +251,16 @@ EXPORT_SYMBOL(memcg_bpf_enabled_key); * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup * is returned. */ -struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio) +struct cgroup_subsys_state *get_mem_cgroup_css_from_folio(struct folio *fo= lio) { - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; =20 - if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) - memcg =3D root_mem_cgroup; + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return &root_mem_cgroup->css; =20 - return &memcg->css; + memcg =3D get_mem_cgroup_from_folio(folio); + + return memcg ? &memcg->css : &root_mem_cgroup->css; } =20 /** --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 505FF339878 for ; Wed, 17 Dec 2025 07:31:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956666; cv=none; b=Umfcuxsmvj+ONHyUlH/n49XB/EMbOrS6pvG05CU7WBCpydgJDBrLLVl9bhS5XuCFhYsxGrk/2Y9RIOMd7se6lU3VxQ85s2jWt0hpoFzRBgylg6UO4DZQfSe5aQ5gdxUbKmve2RGEwl0fi1MziKg6CJURABvB23C5+VZjx+gTWws= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956666; c=relaxed/simple; bh=x5shAAneGRsuZovquvw5ex7nSqfnEHCxfAtSl/Y/Wl8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=moLrpA9r4fnVAyq4Nb19jYRElbBPJ3WAHYsZiqvglm2unQhGJjXJL78E7rwre79x0VZ9fWmupEfe6APduYe2pwm2XCTYpmmM1S1qB0gyn2doqO6qRLTK49PAoCuGXGIs2GhrTk+vl+tbZEPVnmBVz4e9+3I7yaq7R7tCHhFmLCg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Vm1diLNu; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Vm1diLNu" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 11/28] mm: memcontrol: prevent memory cgroup release in count_memcg_folio_events()
Date: Wed, 17 Dec 2025 15:27:35 +0800
Message-ID: <5f8032bc300b7c12e61446ba4f3d28fba5a7d9d5.1765956025.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in count_memcg_folio_events().

This serves as a preparatory measure for the reparenting of the LRU pages.
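As a usage illustration (a hypothetical call site, not taken from this
series): callers of count_memcg_folio_events() do not need any locking of
their own, because the helper brackets the folio_memcg() access with the
rcu read lock internally.

  /* Hypothetical call site, for illustration only. */
  static void example_count_swapin(struct folio *folio)
  {
          /* No rcu_read_lock() needed here; the helper takes it internally. */
          count_memcg_folio_events(folio, PSWPIN, folio_nr_pages(folio));
  }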
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
---
 include/linux/memcontrol.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bc526e0d37e0b..69c4bcfb3c3cd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -974,10 +974,15 @@ void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 static inline void count_memcg_folio_events(struct folio *folio,
                 enum vm_event_item idx, unsigned long nr)
 {
-        struct mem_cgroup *memcg = folio_memcg(folio);
+        struct mem_cgroup *memcg;
 
-        if (memcg)
-                count_memcg_events(memcg, idx, nr);
+        if (!folio_memcg_charged(folio))
+                return;
+
+        rcu_read_lock();
+        memcg = folio_memcg(folio);
+        count_memcg_events(memcg, idx, nr);
+        rcu_read_unlock();
 }
 
 static inline void count_memcg_events_mm(struct mm_struct *mm,
-- 
2.20.1

From nobody Fri Dec 19 16:00:40 2025
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 12/28] mm: page_io: prevent memory cgroup release in page_io module
Date: Wed, 17 Dec 2025 15:27:36 +0800
Message-ID: <30588f984137d557e4663ae8dcf398b8c408169b.1765956025.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in swap_writeout() and
bio_associate_blkg_from_page().

This serves as a preparatory measure for the reparenting of the LRU pages.
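The rule being applied here can be reduced to a small sketch (hypothetical
helper, not part of the patch): filter out folios that were never charged,
and keep every use of the memcg pointer inside the rcu read-side section.

  /* Illustrative sketch with a hypothetical helper name. */
  static void example_memcg_scope(struct folio *folio)
  {
          struct mem_cgroup *memcg;

          if (!folio_memcg_charged(folio))        /* never charged: nothing to do */
                  return;

          rcu_read_lock();
          memcg = folio_memcg(folio);             /* only stable under RCU */
          /* ... short, non-sleeping use of memcg goes here ... */
          rcu_read_unlock();
          /* memcg must not be dereferenced past this point. */
  }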
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
---
 mm/page_io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index 3c342db77ce38..ec7720762042c 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -276,10 +276,14 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
                 count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
                 goto out_unlock;
         }
+
+        rcu_read_lock();
         if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
+                rcu_read_unlock();
                 folio_mark_dirty(folio);
                 return AOP_WRITEPAGE_ACTIVATE;
         }
+        rcu_read_unlock();
 
         __swap_writepage(folio, swap_plug);
         return 0;
@@ -307,11 +311,11 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct folio *folio)
         struct cgroup_subsys_state *css;
         struct mem_cgroup *memcg;
 
-        memcg = folio_memcg(folio);
-        if (!memcg)
+        if (!folio_memcg_charged(folio))
                 return;
 
         rcu_read_lock();
+        memcg = folio_memcg(folio);
         css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
         bio_associate_blkg_from_css(bio, css);
         rcu_read_unlock();
-- 
2.20.1

From nobody Fri Dec 19 16:00:40 2025
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 13/28] mm: migrate: prevent memory cgroup release in folio_migrate_mapping()
Date: Wed, 17 Dec 2025 15:27:37 +0800
Message-ID: <1554459c705a46324b83799ede617b670b9e22fb.1765956025.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in folio_migrate_mapping().

This serves as a preparatory measure for the reparenting of the LRU pages.
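For illustration, the shape of the conversion is roughly the following
(hypothetical function, simplified from the real hunk below): both lruvec
lookups and the statistics transfer happen under a single rcu read-side
section, so the memcg cannot go away between them.

  /* Simplified, hypothetical sketch of the pattern used below. */
  static void example_transfer_stats(struct folio *folio, struct zone *oldzone,
                                     struct zone *newzone)
  {
          struct lruvec *old_lruvec, *new_lruvec;
          struct mem_cgroup *memcg;

          rcu_read_lock();
          memcg = folio_memcg(folio);
          old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
          new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
          /* ... move NR_FILE_PAGES etc. from old_lruvec to new_lruvec ... */
          rcu_read_unlock();
  }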
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f606..8bcd588c083ca 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -671,6 +671,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
                 struct lruvec *old_lruvec, *new_lruvec;
                 struct mem_cgroup *memcg;
 
+                rcu_read_lock();
                 memcg = folio_memcg(folio);
                 old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
                 new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
@@ -698,6 +699,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
                         mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
                         __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
                 }
+                rcu_read_unlock();
         }
         local_irq_enable();
 
-- 
2.20.1

From nobody Fri Dec 19 16:00:40 2025
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 14/28] mm: mglru: prevent memory cgroup release in mglru
Date: Wed, 17 Dec 2025 15:27:38 +0800

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in mglru.

This serves as a preparatory measure for the reparenting of the LRU pages.
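One of the conversions can be illustrated as follows (hypothetical helper,
not part of the patch): when only an ownership comparison is needed, the
folio's memcg is compared under the rcu read lock and the pointer is never
used after the section ends.

  /* Illustrative sketch, hypothetical helper name. */
  static bool example_folio_owned_by(struct folio *folio, struct mem_cgroup *memcg)
  {
          bool match;

          rcu_read_lock();
          match = folio_memcg(folio) == memcg;    /* compare only, no later use */
          rcu_read_unlock();

          return match;
  }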
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng --- mm/vmscan.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 354b19f7365d4..814498a2c1bd6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3444,8 +3444,10 @@ static struct folio *get_pfn_folio(unsigned long pfn= , struct mem_cgroup *memcg, if (folio_nid(folio) !=3D pgdat->node_id) return NULL; =20 + rcu_read_lock(); if (folio_memcg(folio) !=3D memcg) - return NULL; + folio =3D NULL; + rcu_read_unlock(); =20 return folio; } @@ -4202,12 +4204,12 @@ bool lru_gen_look_around(struct page_vma_mapped_wal= k *pvmw) unsigned long addr =3D pvmw->address; struct vm_area_struct *vma =3D pvmw->vma; struct folio *folio =3D pfn_folio(pvmw->pfn); - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; struct pglist_data *pgdat =3D folio_pgdat(folio); - struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); - struct lru_gen_mm_state *mm_state =3D get_mm_state(lruvec); - DEFINE_MAX_SEQ(lruvec); - int gen =3D lru_gen_from_seq(max_seq); + struct lruvec *lruvec; + struct lru_gen_mm_state *mm_state; + unsigned long max_seq; + int gen; =20 lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); @@ -4242,6 +4244,13 @@ bool lru_gen_look_around(struct page_vma_mapped_walk= *pvmw) } } =20 + rcu_read_lock(); + memcg =3D folio_memcg(folio); + lruvec =3D mem_cgroup_lruvec(memcg, pgdat); + max_seq =3D READ_ONCE((lruvec)->lrugen.max_seq); + gen =3D lru_gen_from_seq(max_seq); + mm_state =3D get_mm_state(lruvec); + arch_enter_lazy_mmu_mode(); =20 pte -=3D (addr - start) / PAGE_SIZE; @@ -4282,6 +4291,8 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); =20 + rcu_read_unlock(); + return true; } =20 --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B94B33A9E8 for ; Wed, 17 Dec 2025 07:31:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956720; cv=none; b=j7bTp1e2NXnMIMdpOMYalx+ZGvUk5I2w28amTs3wXvzNGQFn2nCfbQLeXXAEhUHc1Dmg1FvzAaZsVGBALKW8lDaWTmFJFFJvhnsizGlYHL+0+0e6GQctnwB+UyFp6uXeEmEEgZoKS+tNK+xtYaUtPYNEi2SInZ1MZLFM4AXMLvg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956720; c=relaxed/simple; bh=ZYekYnCofdyXKSWuuxA7WoQrLCHZb+9ckgI4J2K0dw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZJ04YnQtbznf8yU3x/k3u0+FDu/QUVl8U2spNimtlWODGJxu8reaDlxa8U7raqvCnRuNc/ZE9GJN4rNG3CwYzOb6pDQ1rUyAYd/O2q4CcUK+q5n4BkbfHk2UXULL3arPa6GqAqMZk8XtA1ivrA3y1Y+5pzjVGqDgm0P3aZijyik= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=soa+opLi; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev 
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 15/28] mm: memcontrol: prevent memory cgroup release in mem_cgroup_swap_full()
Date: Wed, 17 Dec 2025 15:27:39 +0800
Message-ID: <4dd1fb48ef4367e0932dbe7265d876bd95880808.1765956025.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in mem_cgroup_swap_full().

This serves as a preparatory measure for the reparenting of the LRU pages.
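The resulting pattern can be sketched like this (hypothetical helper,
simplified from the hunk below): the walk up the hierarchy via
parent_mem_cgroup() stays under the rcu read lock, and every return path
drops it.

  /* Simplified, hypothetical sketch of the pattern. */
  static bool example_swap_pressure(struct folio *folio)
  {
          struct mem_cgroup *memcg;

          if (!folio_memcg_charged(folio))
                  return false;

          rcu_read_lock();
          for (memcg = folio_memcg(folio); !mem_cgroup_is_root(memcg);
               memcg = parent_mem_cgroup(memcg)) {
                  if (page_counter_read(&memcg->swap) * 2 >= READ_ONCE(memcg->swap.max)) {
                          rcu_read_unlock();      /* drop the lock on early return */
                          return true;
                  }
          }
          rcu_read_unlock();

          return false;
  }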
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Harry Yoo
Acked-by: Johannes Weiner
---
 mm/memcontrol.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 131f940c03fa0..f2c891c1f49d5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5267,17 +5267,21 @@ bool mem_cgroup_swap_full(struct folio *folio)
         if (do_memsw_account())
                 return false;
 
-        memcg = folio_memcg(folio);
-        if (!memcg)
+        if (!folio_memcg_charged(folio))
                 return false;
 
+        rcu_read_lock();
+        memcg = folio_memcg(folio);
         for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
                 unsigned long usage = page_counter_read(&memcg->swap);
 
                 if (usage * 2 >= READ_ONCE(memcg->swap.high) ||
-                    usage * 2 >= READ_ONCE(memcg->swap.max))
+                    usage * 2 >= READ_ONCE(memcg->swap.max)) {
+                        rcu_read_unlock();
                         return true;
+                }
         }
+        rcu_read_unlock();
 
         return false;
 }
-- 
2.20.1

From nobody Fri Dec 19 16:00:40 2025
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng
Subject: [PATCH v2 16/28] mm: workingset: prevent memory cgroup release in lru_gen_eviction()
Date: Wed, 17 Dec 2025 15:27:40 +0800
Message-ID: <86b0573753db20e40315c61f5d6e01bdc6a8313a.1765956025.git.zhengqi.arch@bytedance.com>

From: Muchun Song

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in lru_gen_eviction().

This serves as a preparatory measure for the reparenting of the LRU pages.
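The idea can be illustrated with a small sketch (hypothetical helper):
anything that must outlive the rcu read-side section is captured as a
plain value, here the memcg id, while the memcg pointer itself never
escapes the section.

  /* Illustrative sketch, hypothetical helper name. */
  static unsigned short example_memcg_token(struct folio *folio)
  {
          unsigned short memcg_id;

          rcu_read_lock();
          memcg_id = mem_cgroup_id(folio_memcg(folio));   /* a plain id is safe to keep */
          rcu_read_unlock();

          return memcg_id;        /* usable after unlock, unlike the memcg pointer */
  }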
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner --- mm/workingset.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index e41b44e29944b..445fc634196d8 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -241,11 +241,14 @@ static void *lru_gen_eviction(struct folio *folio) int refs =3D folio_lru_refs(folio); bool workingset =3D folio_test_workingset(folio); int tier =3D lru_tier_from_refs(refs, workingset); - struct mem_cgroup *memcg =3D folio_memcg(folio); + struct mem_cgroup *memcg; struct pglist_data *pgdat =3D folio_pgdat(folio); + unsigned short memcg_id; =20 BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH > BITS_PER_LONG - EVICTION_SH= IFT); =20 + rcu_read_lock(); + memcg =3D folio_memcg(folio); lruvec =3D mem_cgroup_lruvec(memcg, pgdat); lrugen =3D &lruvec->lrugen; min_seq =3D READ_ONCE(lrugen->min_seq[type]); @@ -253,8 +256,10 @@ static void *lru_gen_eviction(struct folio *folio) =20 hist =3D lru_hist_from_seq(min_seq); atomic_long_add(delta, &lrugen->evicted[hist][type][tier]); + memcg_id =3D mem_cgroup_id(memcg); + rcu_read_unlock(); =20 - return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset); + return pack_shadow(memcg_id, pgdat, token, workingset); } =20 /* --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-180.mta0.migadu.com (out-180.mta0.migadu.com [91.218.175.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 81B42265CAD for ; Wed, 17 Dec 2025 07:32:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956747; cv=none; b=C2Kn9ffDCPPsMCfjo/mgJ/Y0zZYlxXOwqsV+Vz8tHWkQ1hy5yBWsDEN1km/okaiDr7/dYLGrNwXqMl3ql8OT/IiaPjUJqPXo2vHN1Cfw40uMB65hsGwRYrscq4zs5l3UmHo/atVubDrSbLjL+F2ozv48tUoRB1op8Fdz4vlDTeE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956747; c=relaxed/simple; bh=69RjVH8TirWS3KliuY4x7mW+QRT+PoHoP27Nooi0lYo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=b8sbH08OVJBOmyA/UlmxR+KwKp7VkVCDxSaTJizB3YJMk+6YFJ83olty1O9S8mrQ09Zp9A8xDU7CU/czOCWfvYjFDgBx34K7VMQdquf6Ag+4sOvv4xngnj4uziTNl3vWdnbebmm8eOoYOIxqnQe4d7OJ5dCjSd0TqL9tF3lnSWk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=sWrtvrRI; arc=none smtp.client-ip=91.218.175.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="sWrtvrRI" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Qi Zheng
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng
Subject: [PATCH v2 17/28] mm: thp: prevent memory cgroup release in folio_split_queue_lock{_irqsave}()
Date: Wed, 17 Dec 2025 15:27:41 +0800
Message-ID: <4cb81ea06298a3b41873b7086bfc68f64b2ba8be.1765956025.git.zhengqi.arch@bytedance.com>

From: Qi Zheng

In the near future, a folio will no longer pin its corresponding memory
cgroup. To ensure safety, it will only be appropriate to hold the rcu read
lock or acquire a reference to the memory cgroup returned by folio_memcg(),
thereby preventing it from being released.

In the current patch, the rcu read lock is employed to safeguard against
the release of the memory cgroup in folio_split_queue_lock{_irqsave}().
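The helper changed in the diff below follows a pattern worth spelling out:
the split queue is resolved from folio_memcg() under the rcu read lock,
and the spinlock acquired inside that section is what keeps the queue
stable once rcu_read_unlock() runs. A minimal sketch of the same idea,
mirroring the patch's folio_split_queue_lock():

  /* Sketch mirroring folio_split_queue_lock() in the hunk below. */
  static struct deferred_split *example_split_queue_lock(struct folio *folio)
  {
          struct deferred_split *queue;

          rcu_read_lock();
          queue = split_queue_lock(folio_nid(folio), folio_memcg(folio));
          rcu_read_unlock();

          return queue;   /* pinned by the held spinlock, not by RCU */
  }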
Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: David Hildenbrand (Red Hat) Acked-by: Johannes Weiner --- mm/huge_memory.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 12b46215b30c1..b9e6855ec0b6a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1154,13 +1154,25 @@ split_queue_lock_irqsave(int nid, struct mem_cgroup= *memcg, unsigned long *flags =20 static struct deferred_split *folio_split_queue_lock(struct folio *folio) { - return split_queue_lock(folio_nid(folio), folio_memcg(folio)); + struct deferred_split *queue; + + rcu_read_lock(); + queue =3D split_queue_lock(folio_nid(folio), folio_memcg(folio)); + rcu_read_unlock(); + + return queue; } =20 static struct deferred_split * folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags) { - return split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), fla= gs); + struct deferred_split *queue; + + rcu_read_lock(); + queue =3D split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), = flags); + rcu_read_unlock(); + + return queue; } =20 static inline void split_queue_unlock(struct deferred_split *queue) --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 22E29265CAD for ; Wed, 17 Dec 2025 07:32:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.188 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956758; cv=none; b=DcXbDzzaDkyAHhqvHnnvEJtj/FhuuWrkvEQFzrZf0q0aIuUbXSIjgNI5ISNK4IXqBwWEtUUuablsLsQq4UR/8Rl94i5cYw6iqeBjhjduwLhivxE23D52ojqz8/MdimyyOWbZ78AWN2dKeW8y2B/0KRlf98mZJQ4hBTHdp031cLU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956758; c=relaxed/simple; bh=wdQJrAQvik1RORRRSK5agyJxG2LLdyS3lJaaV2kuLg4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YNJEzKQI9RPjUP15exK291vN+lu6f/DBXzXclWX5XK0cePlWSCS1Mb3OS7mCaPWeyvSsBU4taj/65j8jRIg3J/0IB6m+GH5t1AnnVIsLTVUnO8it6RqGE81R7/fRl+BnhoiZzFfQbEd/dP0DAToshMdVJxW5KaNzRJu3iANkBkI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=T5ORc/6N; arc=none smtp.client-ip=91.218.175.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="T5ORc/6N" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956750; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cYvZQgglq3AruPpCGYCizx0FTzSaTRzKdYal9LtEFnk=; b=T5ORc/6NzynyN75t5NLdbC/TN+V/CiKMrTESzw+PZr/EQRm4YPNNI9B4IDpZ3qzQYCPxeH DDkJEN4Y943MwYhsMvFQ5Auw0qsZmXdRaJ2uuvTOoQc+lWq0smWRfXT9CM0t+5XV4ls3As ABaeMLsbr7xgIQ2tXXD0YxFdDs/X3kw= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v2 18/28] mm: zswap: prevent memory cgroup release in zswap_compress() Date: Wed, 17 Dec 2025 15:27:42 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng In the near future, a folio will no longer pin its corresponding memory cgroup. To ensure safety, it will only be appropriate to hold the rcu read lock or acquire a reference to the memory cgroup returned by folio_memcg(), thereby preventing it from being released. In the current patch, the rcu read lock is employed to safeguard against the release of the memory cgroup in zswap_compress(). Signed-off-by: Qi Zheng Acked-by: Johannes Weiner --- mm/zswap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/zswap.c b/mm/zswap.c index 5d0f8b13a958d..b468046a90754 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -894,11 +894,14 @@ static bool zswap_compress(struct page *page, struct = zswap_entry *entry, * to the active LRU list in the case. */ if (comp_ret || !dlen || dlen >=3D PAGE_SIZE) { + rcu_read_lock(); if (!mem_cgroup_zswap_writeback_enabled( folio_memcg(page_folio(page)))) { + rcu_read_unlock(); comp_ret =3D comp_ret ? 
comp_ret : -EINVAL; goto unlock; } + rcu_read_unlock(); comp_ret =3D 0; dlen =3D PAGE_SIZE; dst =3D kmap_local_page(page); --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-183.mta0.migadu.com (out-183.mta0.migadu.com [91.218.175.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7542833D4EB; Wed, 17 Dec 2025 07:32:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.183 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956773; cv=none; b=CUqu+i8QYxNQLSQizMgCnL8c1OdDMUg3fYqd4mxxb6p8gWHdHUVs+PCQgZe8ljjtKkQ+CoiW9H5pzl1Ad5zOXlVMGvJTWabPvDNECYhC1OXrhtkuqEtjbk6ldhwoINjCmNLA3hZKCZ6tjFvSLFP6zc2v7yHWNX+evmrDnMv0r0A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956773; c=relaxed/simple; bh=ansN195gKGsOuB8x9C0oB4o91QVBkG5P7dmvk9I+fIg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iVRcyLPUK5xEXXHfk0qOPtGw6YW6eEpK4887TamTtqjD1JbIeyrPQjPlH+lAGuBv0Htv0zrrTFzyV1OEzhldZBE2chPG8Q3VpNXP2/2uKjFnn0HiHu+oK7K0H5YevwodOgBMk5wwOdVG7U/2u6EtP4c0bAS8v2iTM/JE7H28xRA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=utjimqar; arc=none smtp.client-ip=91.218.175.183 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="utjimqar" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956765; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=r7R2YsecSy/bMyWfSR2fXTDFkSfi8ZmuktkAOmJUkaM=; b=utjimqarpJXzPTVvhid+eE3cwOsJ/hVXX6Fz447+S0Nrj5Mhr4vYtYMabOisntRkuyl9gg MQYGXnv6tjh2xy/Ve9Lmnbfr5XyjYeZXIoIuOKmxsE5zZxZmcG1tdfavDUQWt8ksYgQ4hQ Asqcc8Q9449fvBZaJ9EYLaptzYLE8rs= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 19/28] mm: workingset: prevent lruvec release in workingset_refault() Date: Wed, 17 Dec 2025 15:27:43 +0800 Message-ID: <1b6ad26b5199b8134de37506b669ad4e3c0b6356.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. So an lruvec returned by folio_lruvec() could be released without the rcu read lock or a reference to its memory cgroup. In the current patch, the rcu read lock is employed to safeguard against the release of the lruvec in workingset_refault(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo --- mm/workingset.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/workingset.c b/mm/workingset.c index 445fc634196d8..427ca1a5625e8 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -560,11 +560,12 @@ void workingset_refault(struct folio *folio, void *sh= adow) * locked to guarantee folio_memcg() stability throughout. 
*/ nr =3D folio_nr_pages(folio); + rcu_read_lock(); lruvec =3D folio_lruvec(folio); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr); =20 if (!workingset_test_recent(shadow, file, &workingset, true)) - return; + goto out; =20 folio_set_active(folio); workingset_age_nonresident(lruvec, nr); @@ -580,6 +581,8 @@ void workingset_refault(struct folio *folio, void *shad= ow) lru_note_cost_refault(folio); mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr); } +out: + rcu_read_unlock(); } =20 /** --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 688F533E376 for ; Wed, 17 Dec 2025 07:33:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956786; cv=none; b=GUoPysBpcsJ80gj7VdeINHNV8HpYcVqdBXq6f/33cHnHlNFysOT8Nd10MyaPTxbxFdt9mWdb7Ni8C3n7Y3w5bt/p+bJh4HS/dkzh39d6SOvsp4Asmt/WORg9vXBDYXHPUDMoqQefbrGyhluyjNK6frOpYwDiz2QW6llu++FKyM8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956786; c=relaxed/simple; bh=V2QjUp/J2A3sgD5j0GufhNzAkOwa8ipgKv/DGTi1o4s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sOc8CSZHqsKS4gbAk0V23b7xRGJ6T6U9Fgx7n8OTp5Dgg7OeqSgYLkJq3Hw0bjFZOeNPydFS0Cp2y2rfxV1rOmo+MODMC6OzspshcLh1InRng7oT9GDt/43Xh+kkxMmW52Ami3fqAql53fNMhZiKoYnln1imsdr554QqwYfN4dk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=qss7+xs4; arc=none smtp.client-ip=91.218.175.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="qss7+xs4" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956778; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3hCVtkwWInjeWp1lZK86oR9Lr8eWsGStrPfk/NkSL0k=; b=qss7+xs4+fcC/k4TRH856pLkCsZ3zf6NmhCCKBEDP94wPzgJX5Se5bzxYs1nQvCcPAFf1H IUyFBywc0v7UZh6JX+MCryCQnvJm9NcBRnkhxr55rLWMPZBrpBYI7A4pbR3nSLhuCbDRTI 7cQwLAFU/zWeAVJrioGetOwSuqdGWIE= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Nhat Pham , Chengming Zhou , Qi Zheng Subject: [PATCH v2 20/28] mm: zswap: prevent lruvec release in zswap_folio_swapin() Date: Wed, 17 Dec 2025 15:27:44 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. So an lruvec returned by folio_lruvec() could be released without the rcu read lock or a reference to its memory cgroup. In the current patch, the rcu read lock is employed to safeguard against the release of the lruvec in zswap_folio_swapin(). This serves as a preparatory measure for the reparenting of the LRU pages. 
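[Illustrative note, not part of the patch] The shape of the fix here, and in the neighbouring patches of this series, is simply to bracket the folio_lruvec() call and the short lruvec access with an RCU read-side critical section. A minimal sketch with a made-up caller name:

```
/*
 * Illustrative only: once LRU folios are reparented, the memcg (and with
 * it the lruvec) returned by folio_lruvec() can be freed unless the
 * caller holds the rcu read lock or the lruvec lock.
 */
static void touch_lruvec_example(struct folio *folio)
{
	struct lruvec *lruvec;

	rcu_read_lock();
	lruvec = folio_lruvec(folio);
	/* ... short, non-sleeping access to *lruvec ... */
	rcu_read_unlock();
}
```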
Signed-off-by: Muchun Song Acked-by: Nhat Pham Reviewed-by: Chengming Zhou Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner --- mm/zswap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/zswap.c b/mm/zswap.c index b468046a90754..738d914e53549 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -664,8 +664,10 @@ void zswap_folio_swapin(struct folio *folio) struct lruvec *lruvec; =20 if (folio) { + rcu_read_lock(); lruvec =3D folio_lruvec(folio); atomic_long_inc(&lruvec->zswap_lruvec_state.nr_disk_swapins); + rcu_read_unlock(); } } =20 --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EF6B340D93 for ; Wed, 17 Dec 2025 07:33:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956798; cv=none; b=srr2fek6+9sISQtdkFjb1X4B+ebpdvEbdOWVykPiCwWryB1CHgVppbX+gMizWHm/Zuwhx34KlN2dW5+lEVb1Uk7+rh5xI3KYZnGPwEEdpS2+a5NTvIr2L5a+tjaILref6Sc3EKcSYv+M3giKjxl68oXtQqQj/lwkMLegwWTrxKs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956798; c=relaxed/simple; bh=oOwZu8LB2xUfXg/u4xHF/LdzhGO4WBWTbELElUTt3aQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HUj7ORk0e7nvSol2sd8AByE8EN5fr4kwOzBQqD6Alt5ulvtySo7suJnRAB6nmmmczXV45FM6ZYZhTLih0qhKtqaeCrMUKPFwojVHAjIPTgZ5IRvaG1DL19y7AW+g4p6SpflaPADNOuWpozg80Z1M7E3rMML20bz6DFlKUtee8Rk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=tfCThczT; arc=none smtp.client-ip=91.218.175.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="tfCThczT" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956790; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TEKzYkGMehRKc5BRVPJO6+VCdZxZFdG/Vbt/yGKMiiM=; b=tfCThczTZ5dXkREXxScD5XtwTpBC3Z49+zvUce5Q/nbsjgKPLwV4v8wGIQWkwug47qe12I N6JYxxROFlYv9wp09tles8iGrq3+oTnJifX3x5SldxBXk8H1lwOEOLDqkg8XlgU+LhZJAA 4q/eNpVSgu5kYQ9ENtmtE7Ar3sRAt84= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 21/28] mm: swap: prevent lruvec release in lru_gen_clear_refs() Date: Wed, 17 Dec 2025 15:27:45 +0800 Message-ID: <42682f81686e31019504a6e025fa08d2c9dea718.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. So an lruvec returned by folio_lruvec() could be released without the rcu read lock or a reference to its memory cgroup. In the current patch, the rcu read lock is employed to safeguard against the release of the lruvec in lru_gen_clear_refs(). This serves as a preparatory measure for the reparenting of the LRU pages. 
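[Illustrative note, not part of the patch] Only the plain min_seq value is copied out of the RCU section; the lrugen pointer lives inside the lruvec and must not be dereferenced once rcu_read_unlock() has been called. A minimal sketch of the safe shape, mirroring the hunk below (the helper name is made up):

```
/* Illustrative only: plain values may leave the RCU section, the
 * lruvec/lrugen pointer itself may not. */
static unsigned long folio_min_seq_example(struct folio *folio, int type)
{
	unsigned long seq;

	rcu_read_lock();
	seq = READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]);
	rcu_read_unlock();

	return seq;
}
```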
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner --- mm/swap.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index ec0c654e128dc..0606795f3ccf3 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -412,18 +412,20 @@ static void lru_gen_inc_refs(struct folio *folio) =20 static bool lru_gen_clear_refs(struct folio *folio) { - struct lru_gen_folio *lrugen; int gen =3D folio_lru_gen(folio); int type =3D folio_is_file_lru(folio); + unsigned long seq; =20 if (gen < 0) return true; =20 set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS | BIT(PG_workingset), 0); =20 - lrugen =3D &folio_lruvec(folio)->lrugen; + rcu_read_lock(); + seq =3D READ_ONCE(folio_lruvec(folio)->lrugen.min_seq[type]); + rcu_read_unlock(); /* whether can do without shuffling under the LRU lock */ - return gen =3D=3D lru_gen_from_seq(READ_ONCE(lrugen->min_seq[type])); + return gen =3D=3D lru_gen_from_seq(seq); } =20 #else /* !CONFIG_LRU_GEN */ --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C333B341061 for ; Wed, 17 Dec 2025 07:33:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956809; cv=none; b=RLRIUe7n7niejVcn2uooELYOgdNdYukCSsWsb3B109Eu/w8SSCCsYWOHXJD3fJ35MyMzpKyQ0q9EcS0Uq26VyQhQIA4E46kQOWQRpxPrWxk5pcoDYwpSt/ffnvUjHvLgx9Ci0t5Yvg3MzVRjd0Z/Da5s9/c6aBFg5/DjSZh9FxM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956809; c=relaxed/simple; bh=OHFZ0LNEpKuTW12Z6kaHmQe+wuzpZdGmCCr/dTwWHsg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aai5pKbfWD0ZDfqbOLV0+Ks62lGrzlfHrBlZzlaibTrh9xjOZ0yL6zeNlU26X3AuKC/uI594spPjYtw+PmEDZMT/X6jggZMQtwdNq07WJJE63f9sxSuy0UTLrE9J3uRH11Mla7bcYg0PAOHlNUx814j9rZAe4MXc0Bqluqzj3UU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Dusw2/39; arc=none smtp.client-ip=91.218.175.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Dusw2/39" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956801; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MLWSJF0nLNh7N5C/0x5tciJKSygGd9RMW0b7OeDmmlY=; b=Dusw2/39bPZpCKgWRmixvObH3oye1W0d2l/EI+/2d4touyB7sDcx41H6ThV+apVJtk/Yzi Mh3NZrd4Dfg10moPMGz7Vru+UO3Po4bUjn2yLFwclZU0w1iEms+dNnGleWkuQpAKJ4cqlr 7M4Gnng6vNejpEIsBHdQU/5f4k5uVa0= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 22/28] mm: workingset: prevent lruvec release in workingset_activation() Date: Wed, 17 Dec 2025 15:27:46 +0800 Message-ID: <195a8cb47b90e48cd1ec6cb93bc33a8e794847f6.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song In the near future, a folio will no longer pin its corresponding memory cgroup. So an lruvec returned by folio_lruvec() could be released without the rcu read lock or a reference to its memory cgroup. In the current patch, the rcu read lock is employed to safeguard against the release of the lruvec in workingset_activation(). This serves as a preparatory measure for the reparenting of the LRU pages. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner --- mm/workingset.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/workingset.c b/mm/workingset.c index 427ca1a5625e8..d6484f7a3ad28 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -595,8 +595,11 @@ void workingset_activation(struct folio *folio) * Filter non-memcg pages here, e.g. unmap can call * mark_page_accessed() on VDSO pages. 
*/ - if (mem_cgroup_disabled() || folio_memcg_charged(folio)) + if (mem_cgroup_disabled() || folio_memcg_charged(folio)) { + rcu_read_lock(); workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio)); + rcu_read_unlock(); + } } =20 /* --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6EF5342532 for ; Wed, 17 Dec 2025 07:33:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956823; cv=none; b=SkmiIoV4SOnVp9EwMNPpfMn2l+cBWa4D8hzapd3PVG34hTKI7UlV7aHWFafkBAPsAVII9YZVkvNtIWXpkNcTNxkx2Mvys65RI1raqjjPsFWnMxUCGiP6IuVaT5U8PmWqlnlhEbmjJkGY+LjYMbP/bDzCBQcfY2Nfyo6HizHjJxM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956823; c=relaxed/simple; bh=iZw9sD9n4TGNzBelAqfFvNoggjdYiuQEZV7FLFy86wo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L2uaUy+cQAsVKpHa0n0ckt1EMG/7cSGJBHmiGKn/75z5mdS5kgzeVQiVK3R6CoHbC2KBcHoYggXJe5sFJGZKYjjipoVe6wVFOCul0QAVp0AcjRROHN2Pck1QWBBH+D5zARytlOJ2V3RNhrficjbDv8iCcjSAKCZ87/0qfEtsxSM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=tidJtTr2; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="tidJtTr2" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956815; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R8YR9gN+9LKeUz1NQLMLCUGYh2Hy/QLvO4JfzYTy1RU=; b=tidJtTr2EuymlnKpDmZPd1dmfRgbULwp4WtdUDc16XnyIjODf6yUCldeSWrpQWBTBYp4BY UWVJwLpBTE+3v8zXTelxiSGHmsRYdpSlwHHU4p65p7Y3feB/FoZev4ao7VwqCdGH+bHpcJ zZOnHwD3qZ4ZgJ2CBPl+BiBu4QdxlHM= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 23/28] mm: memcontrol: prepare for reparenting LRU pages for lruvec lock Date: Wed, 17 Dec 2025 15:27:47 +0800 Message-ID: <6d643ea41dd89134eb3c7af96f5bfb3531da7aa7.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song The following diagram illustrates how to ensure the safety of the folio lruvec lock when LRU folios undergo reparenting. In the folio_lruvec_lock(folio) function: ``` rcu_read_lock(); retry: lruvec =3D folio_lruvec(folio); /* There is a possibility of folio reparenting at this point. */ spin_lock(&lruvec->lru_lock); if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { /* * The wrong lruvec lock was acquired, and a retry is required. * This is because the folio resides on the parent memcg lruvec * list. */ spin_unlock(&lruvec->lru_lock); goto retry; } /* Reaching here indicates that folio_memcg() is stable. */ ``` In the memcg_reparent_objcgs(memcg) function: ``` spin_lock(&lruvec->lru_lock); spin_lock(&lruvec_parent->lru_lock); /* Transfer folios from the lruvec list to the parent's. */ spin_unlock(&lruvec_parent->lru_lock); spin_unlock(&lruvec->lru_lock); ``` After acquiring the lruvec lock, it is necessary to verify whether the folio has been reparented. If reparenting has occurred, the new lruvec lock must be reacquired. During the LRU folio reparenting process, the lruvec lock will also be acquired (this will be implemented in a subsequent patch). Therefore, folio_memcg() remains unchanged while the lruvec lock is held. Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio) after the lruvec lock is acquired, the lruvec_memcg_debug() check is redundant. Hence, it is removed. This patch serves as a preparation for the reparenting of LRU folios. 
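[Illustrative note, not part of the patch] The rcu read lock taken inside folio_lruvec_lock*() is only dropped by the matching lruvec_unlock*() helper, so callers keep their existing lock/unlock pairing and can rely on a stable folio_memcg() binding for the whole critical section. Roughly, from a caller's point of view (the caller name is made up):

```
/*
 * Illustrative only: folio_lruvec_lock() now enters an RCU read-side
 * critical section and takes the lru_lock of the folio's current lruvec;
 * lruvec_unlock() releases both.
 */
static void lru_walk_example(struct folio *folio)
{
	struct lruvec *lruvec;

	lruvec = folio_lruvec_lock(folio);
	/* lruvec_memcg(lruvec) == folio_memcg(folio) is guaranteed here */
	lruvec_unlock(lruvec);
}
```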
Signed-off-by: Muchun Song Signed-off-by: Qi Zheng Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 26 ++++++++----------- mm/compaction.c | 29 ++++++++++++++++----- mm/memcontrol.c | 53 +++++++++++++++++++------------------- 3 files changed, 61 insertions(+), 47 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 69c4bcfb3c3cd..85265b28c5d18 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -740,7 +740,11 @@ static inline struct lruvec *mem_cgroup_lruvec(struct = mem_cgroup *memcg, * folio_lruvec - return lruvec for isolating/putting an LRU folio * @folio: Pointer to the folio. * - * This function relies on folio->mem_cgroup being stable. + * The user should hold an rcu read lock to protect lruvec associated with + * the folio from being released. But it does not prevent binding stability + * between the folio and the returned lruvec from being changed to its par= ent + * or ancestor (e.g. like folio_lruvec_lock() does that holds LRU lock to + * prevent the change). */ static inline struct lruvec *folio_lruvec(struct folio *folio) { @@ -763,15 +767,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *fol= io); struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags); =20 -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio); -#else -static inline -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ -} -#endif - static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? container_of(css, struct mem_cgroup, css) : NULL; @@ -1194,11 +1189,6 @@ static inline struct lruvec *folio_lruvec(struct fol= io *folio) return &pgdat->__lruvec; } =20 -static inline -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ -} - static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memc= g) { return NULL; @@ -1257,6 +1247,7 @@ static inline struct lruvec *folio_lruvec_lock(struct= folio *folio) { struct pglist_data *pgdat =3D folio_pgdat(folio); =20 + rcu_read_lock(); spin_lock(&pgdat->__lruvec.lru_lock); return &pgdat->__lruvec; } @@ -1265,6 +1256,7 @@ static inline struct lruvec *folio_lruvec_lock_irq(st= ruct folio *folio) { struct pglist_data *pgdat =3D folio_pgdat(folio); =20 + rcu_read_lock(); spin_lock_irq(&pgdat->__lruvec.lru_lock); return &pgdat->__lruvec; } @@ -1274,6 +1266,7 @@ static inline struct lruvec *folio_lruvec_lock_irqsav= e(struct folio *folio, { struct pglist_data *pgdat =3D folio_pgdat(folio); =20 + rcu_read_lock(); spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp); return &pgdat->__lruvec; } @@ -1487,17 +1480,20 @@ static inline struct lruvec *parent_lruvec(struct l= ruvec *lruvec) static inline void lruvec_unlock(struct lruvec *lruvec) { spin_unlock(&lruvec->lru_lock); + rcu_read_unlock(); } =20 static inline void lruvec_unlock_irq(struct lruvec *lruvec) { spin_unlock_irq(&lruvec->lru_lock); + rcu_read_unlock(); } =20 static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec, unsigned long flags) { spin_unlock_irqrestore(&lruvec->lru_lock, flags); + rcu_read_unlock(); } =20 /* Test requires a stable folio->memcg binding, see folio_memcg() */ diff --git a/mm/compaction.c b/mm/compaction.c index c3e338aaa0ffb..3648ce22c8072 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -518,6 +518,24 @@ static bool compact_lock_irqsave(spinlock_t *lock, uns= igned long *flags, return true; } =20 +static struct lruvec * 
+compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flag= s, + struct compact_control *cc) +{ + struct lruvec *lruvec; + + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); + compact_lock_irqsave(&lruvec->lru_lock, flags, cc); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } + + return lruvec; +} + /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. The lock should be periodically unlocked to avo= id @@ -839,7 +857,7 @@ isolate_migratepages_block(struct compact_control *cc, = unsigned long low_pfn, { pg_data_t *pgdat =3D cc->zone->zone_pgdat; unsigned long nr_scanned =3D 0, nr_isolated =3D 0; - struct lruvec *lruvec; + struct lruvec *lruvec =3D NULL; unsigned long flags =3D 0; struct lruvec *locked =3D NULL; struct folio *folio =3D NULL; @@ -1153,18 +1171,17 @@ isolate_migratepages_block(struct compact_control *= cc, unsigned long low_pfn, if (!folio_test_clear_lru(folio)) goto isolate_fail_put; =20 - lruvec =3D folio_lruvec(folio); + if (locked) + lruvec =3D folio_lruvec(folio); =20 /* If we already hold the lock, we can skip some rechecking */ - if (lruvec !=3D locked) { + if (lruvec !=3D locked || !locked) { if (locked) lruvec_unlock_irqrestore(locked, flags); =20 - compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + lruvec =3D compact_folio_lruvec_lock_irqsave(folio, &flags, cc); locked =3D lruvec; =20 - lruvec_memcg_debug(lruvec, folio); - /* * Try get exclusive access under lock. If marked for * skip, the scan is aborted unless the current context diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f2c891c1f49d5..930dacd6ce31a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1184,23 +1184,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg, } } =20 -#ifdef CONFIG_DEBUG_VM -void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio) -{ - struct mem_cgroup *memcg; - - if (mem_cgroup_disabled()) - return; - - memcg =3D folio_memcg(folio); - - if (!memcg) - VM_BUG_ON_FOLIO(!mem_cgroup_is_root(lruvec_memcg(lruvec)), folio); - else - VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) !=3D memcg, folio); -} -#endif - /** * folio_lruvec_lock - Lock the lruvec for a folio. * @folio: Pointer to the folio. @@ -1210,14 +1193,20 @@ void lruvec_memcg_debug(struct lruvec *lruvec, stru= ct folio *folio) * - folio_test_lru false * - folio frozen (refcount of 0) * - * Return: The lruvec this folio is on with its lock held. + * Return: The lruvec this folio is on with its lock held and rcu read loc= k held. */ struct lruvec *folio_lruvec_lock(struct folio *folio) { - struct lruvec *lruvec =3D folio_lruvec(folio); + struct lruvec *lruvec; =20 + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); spin_lock(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock(&lruvec->lru_lock); + goto retry; + } =20 return lruvec; } @@ -1232,14 +1221,20 @@ struct lruvec *folio_lruvec_lock(struct folio *foli= o) * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts - * disabled. + * disabled and rcu read lock held. 
*/ struct lruvec *folio_lruvec_lock_irq(struct folio *folio) { - struct lruvec *lruvec =3D folio_lruvec(folio); + struct lruvec *lruvec; =20 + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); spin_lock_irq(&lruvec->lru_lock); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock_irq(&lruvec->lru_lock); + goto retry; + } =20 return lruvec; } @@ -1255,15 +1250,21 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *= folio) * - folio frozen (refcount of 0) * * Return: The lruvec this folio is on with its lock held and interrupts - * disabled. + * disabled and rcu read lock held. */ struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags) { - struct lruvec *lruvec =3D folio_lruvec(folio); + struct lruvec *lruvec; =20 + rcu_read_lock(); +retry: + lruvec =3D folio_lruvec(folio); spin_lock_irqsave(&lruvec->lru_lock, *flags); - lruvec_memcg_debug(lruvec, folio); + if (unlikely(lruvec_memcg(lruvec) !=3D folio_memcg(folio))) { + spin_unlock_irqrestore(&lruvec->lru_lock, *flags); + goto retry; + } =20 return lruvec; } --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 355933431FA for ; Wed, 17 Dec 2025 07:33:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956835; cv=none; b=Qsbe7rGcSS63INvtMIgutSj96B8gvgUy/SRojX7zzrqmrB90H+6LdVC4KnHnsYvmsARYGsVsnffnCNKhZPr+YnPecgKnAFKzytkpJA9FWgc2Avaz2D0zNYrv9pwQnwQ/+ycyzbzIuUwqnG4TAxU3lXa1xio4qxUJv6xTweCEF/g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956835; c=relaxed/simple; bh=d1uvfBYPvgUMKuoz46yjDNePoh2ydg4Ri+wGNDlR2WU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MEDBO1lq7/NqeW1AAF29aNSRoJzuT9KSpKAAwQKPiRD3YWb1VNhPJfW6ZjhoFOHu2FvRkO6XG/Z2TvUsLSxHdGeiJqvhaXw2fJ6p7hZQYZ3XlDiS+6Zz5ZGRX02ZpVvqz2GQtS/2ccKOQ9biQludR3ZZmxO3sN4DZ4X9BYpRtKk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=fqyO8smi; arc=none smtp.client-ip=91.218.175.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="fqyO8smi" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xOMfKFXTzfX2R2XpU+f1ml9Q+i3fk14KvQzSfKjMTSY=; b=fqyO8smiuddWbVlG0vZAYnu6k2elDEhHFJPXEaCWp6peGVZcY6pMReFkk9h4m3bVD6QPdn shz0rtopSn/vAxHqNv0DClGCEw4T5XSRRwgSQh5i2gVr6Kn3qlit2xtndhCGfKu2Siggu+ Cy42Gx14dN+VV6cgtHLhEVoRabHM4vI= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v2 24/28] mm: vmscan: prepare for reparenting traditional LRU folios Date: Wed, 17 Dec 2025 15:27:48 +0800 Message-ID: <800faf905149ee1e1699d9fd319842550d343f43.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng To reslove the dying memcg issue, we need to reparent LRU folios of child memcg to its parent memcg. For traditional LRU list, each lruvec of every memcg comprises four LRU lists. Due to the symmetry of the LRU lists, it is feasible to transfer the LRU lists from a memcg to its parent memcg during the reparenting process. This commit implements the specific function, which will be used during the reparenting process. 
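[Illustrative note, not part of the patch] lru_reparent_memcg() is meant to be called during memcg offlining with the per-node lru_lock of both the child and the parent held, so that no folio can be added to or removed from either set of LRU lists while they are spliced; the actual call site is wired up later in this series. A rough sketch of the intended usage (the wrapper name is made up):

```
/*
 * Illustrative only: the real call site is added by a later patch, from
 * memcg_reparent_objcgs(), after objcg_lock and the per-node lru_locks
 * of both src and dst have been taken.
 */
static void reparent_lru_example(struct mem_cgroup *src, struct mem_cgroup *dst)
{
	/* caller holds src's and dst's lru_lock for every node */
	lru_reparent_memcg(src, dst);
}
```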
Signed-off-by: Qi Zheng Reviewed-by: Harry Yoo Acked-by: Johannes Weiner --- include/linux/mmzone.h | 4 ++++ mm/vmscan.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 75ef7c9f9307f..08132012aa8b8 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -366,6 +366,10 @@ enum lruvec_flags { LRUVEC_NODE_CONGESTED, }; =20 +#ifdef CONFIG_MEMCG +void lru_reparent_memcg(struct mem_cgroup *src, struct mem_cgroup *dst); +#endif /* CONFIG_MEMCG */ + #endif /* !__GENERATING_BOUNDS_H */ =20 /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 814498a2c1bd6..5fd0f97c3719c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2648,6 +2648,44 @@ static bool can_age_anon_pages(struct lruvec *lruvec, lruvec_memcg(lruvec)); } =20 +#ifdef CONFIG_MEMCG +static void lruvec_reparent_lru(struct lruvec *src, struct lruvec *dst, + enum lru_list lru) +{ + int zid; + struct mem_cgroup_per_node *mz_src, *mz_dst; + + mz_src =3D container_of(src, struct mem_cgroup_per_node, lruvec); + mz_dst =3D container_of(dst, struct mem_cgroup_per_node, lruvec); + + if (lru !=3D LRU_UNEVICTABLE) + list_splice_tail_init(&src->lists[lru], &dst->lists[lru]); + + for (zid =3D 0; zid < MAX_NR_ZONES; zid++) { + mz_dst->lru_zone_size[zid][lru] +=3D mz_src->lru_zone_size[zid][lru]; + mz_src->lru_zone_size[zid][lru] =3D 0; + } +} + +void lru_reparent_memcg(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + int nid; + + for_each_node(nid) { + enum lru_list lru; + struct lruvec *src_lruvec, *dst_lruvec; + + src_lruvec =3D mem_cgroup_lruvec(src, NODE_DATA(nid)); + dst_lruvec =3D mem_cgroup_lruvec(dst, NODE_DATA(nid)); + dst_lruvec->anon_cost +=3D src_lruvec->anon_cost; + dst_lruvec->file_cost +=3D src_lruvec->file_cost; + + for_each_lru(lru) + lruvec_reparent_lru(src_lruvec, dst_lruvec, lru); + } +} +#endif + #ifdef CONFIG_LRU_GEN =20 #ifdef CONFIG_LRU_GEN_ENABLED --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1492343D8A for ; Wed, 17 Dec 2025 07:34:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.188 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956847; cv=none; b=Cg+S2s+tlHXpiLeDlR3g9XNqyc/H6Qb5bB81YJr3+mJJQJVjlvEVfS6RWGEfK3PA+H/uTjfvLsJ6Eem3NOFkk7BVjyRnttr6a8bmPiczlCa2R0KwZr47/is3VRTWz6YBj6+BzEsFwVHjBZDTr/Ch8qDDunoOE9r2u7p1CYZe214= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956847; c=relaxed/simple; bh=W0ZKn+6mb+Gj1V90Ma3loP3jlen8wTxTVGftcomO3iw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J+GxlgAYS4hePYOuNFnwtN8lj85Y5MMUOYEUT9G/x/ez9fVMcDISt5XYDJPyQ6DqulLSI8Q8ttZZptKn7lxueR38vDiPaxTiZMHo+pfum37Jzm0si1OOO8Q12d8nXXl31odFljpmwbxWXZ3rKxYZUPn8PeVEI1uelAl6EKrjnKk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=P8PB9ZU9; arc=none smtp.client-ip=91.218.175.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="P8PB9ZU9" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956841; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w8JtVsEwuAYX9hiBQJNmTbS3AoZrY1ADR5PkaJDwFdU=; b=P8PB9ZU9Q6MDmfdfPtWGse616l/crVah1JQtTwg41R/rrJo/+jWao2sbIDhN1hY5RoJtHp jlE7lGcLKeM5XVvntPmnQE+xWOdXeJEYeMqK5Oa+LRWnNdkcHO3OcicIiJkDDt4VMRH0x6 2chPgmBGuZSYyx8QQqXE1HsH5/pQ28w= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v2 25/28] mm: vmscan: prepare for reparenting MGLRU folios Date: Wed, 17 Dec 2025 15:27:49 +0800 Message-ID: <93cf8a847992563a096fdf9b24b18529606c29ee.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng Similar to traditional LRU folios, in order to solve the dying memcg problem, we also need to reparenting MGLRU folios to the parent memcg when memcg offline. However, there are the following challenges: 1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the number of generations of the parent and child memcg may be different, so we cannot simply transfer MGLRU folios in the child memcg to the parent memcg as we did for traditional LRU folios. 2. The generation information is stored in folio->flags, but we cannot traverse these folios while holding the lru lock, otherwise it may cause softlockup. 3. In walk_update_folio(), the gen of folio and corresponding lru size may be updated, but the folio is not immediately moved to the corresponding lru list. Therefore, there may be folios of different generations on an LRU list. 4. In lru_gen_del_folio(), the generation to which the folio belongs is found based on the generation information in folio->flags, and the corresponding LRU size will be updated. Therefore, we need to update the lru size correctly during reparenting, otherwise the lru size may be updated incorrectly in lru_gen_del_folio(). Finally, this patch chose a compromise method, which is to splice the lru list in the child memcg to the lru list of the same generation in the parent memcg during reparenting. And in order to ensure that the parent memcg has the same generation, we need to increase the generations in the parent memcg to the MAX_NR_GENS before reparenting. Of course, the same generation has different meanings in the parent and child memcg, this will cause confusion in the hot and cold information of folios. 
But other than that, this method is simple enough, the lru size is correct, and there is no need to consider some concurrency issues (such as lru_gen_del_folio()). To prepare for the above work, this commit implements the specific functions, which will be used during reparenting. Suggested-by: Harry Yoo Suggested-by: Imran Khan Signed-off-by: Qi Zheng --- include/linux/mmzone.h | 16 +++++ mm/vmscan.c | 141 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 157 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 08132012aa8b8..67c0e55da1220 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -628,6 +628,9 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg); void lru_gen_offline_memcg(struct mem_cgroup *memcg); void lru_gen_release_memcg(struct mem_cgroup *memcg); void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid); +void max_lru_gen_memcg(struct mem_cgroup *memcg); +bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg); +void lru_gen_reparent_memcg(struct mem_cgroup *src, struct mem_cgroup *dst= ); =20 #else /* !CONFIG_LRU_GEN */ =20 @@ -668,6 +671,19 @@ static inline void lru_gen_soft_reclaim(struct mem_cgr= oup *memcg, int nid) { } =20 +static inline void max_lru_gen_memcg(struct mem_cgroup *memcg) +{ +} + +static inline bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg) +{ + return true; +} + +static inline void lru_gen_reparent_memcg(struct mem_cgroup *src, struct m= em_cgroup *dst) +{ +} + #endif /* CONFIG_LRU_GEN */ =20 struct lruvec { diff --git a/mm/vmscan.c b/mm/vmscan.c index 5fd0f97c3719c..64a85eea26dc6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4466,6 +4466,147 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg,= int nid) lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD); } =20 +bool recheck_lru_gen_max_memcg(struct mem_cgroup *memcg) +{ + int nid; + + for_each_node(nid) { + struct lruvec *lruvec =3D get_lruvec(memcg, nid); + int type; + + for (type =3D 0; type < ANON_AND_FILE; type++) { + if (get_nr_gens(lruvec, type) !=3D MAX_NR_GENS) + return false; + } + } + + return true; +} + +static void try_to_inc_max_seq_nowalk(struct mem_cgroup *memcg, + struct lruvec *lruvec) +{ + struct lru_gen_mm_list *mm_list =3D get_mm_list(memcg); + struct lru_gen_mm_state *mm_state =3D get_mm_state(lruvec); + int swappiness =3D mem_cgroup_swappiness(memcg); + DEFINE_MAX_SEQ(lruvec); + bool success =3D false; + + /* + * We are not iterating the mm_list here, updating mm_state->seq is just + * to make mm walkers work properly. + */ + if (mm_state) { + spin_lock(&mm_list->lock); + VM_WARN_ON_ONCE(mm_state->seq + 1 < max_seq); + if (max_seq > mm_state->seq) { + WRITE_ONCE(mm_state->seq, mm_state->seq + 1); + success =3D true; + } + spin_unlock(&mm_list->lock); + } else { + success =3D true; + } + + if (success) + inc_max_seq(lruvec, max_seq, swappiness); +} + +/* + * We need to ensure that the folios of child memcg can be reparented to t= he + * same gen of the parent memcg, so the gens of the parent memcg needed be + * incremented to the MAX_NR_GENS before reparenting. + */ +void max_lru_gen_memcg(struct mem_cgroup *memcg) +{ + int nid; + + for_each_node(nid) { + struct lruvec *lruvec =3D get_lruvec(memcg, nid); + int type; + + for (type =3D 0; type < ANON_AND_FILE; type++) { + while (get_nr_gens(lruvec, type) < MAX_NR_GENS) { + try_to_inc_max_seq_nowalk(memcg, lruvec); + cond_resched(); + } + } + } +} + +/* + * Compared to traditional LRU, MGLRU faces the following challenges: + * + * 1. 
Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the + * number of generations of the parent and child memcg may be different, + * so we cannot simply transfer MGLRU folios in the child memcg to the + * parent memcg as we did for traditional LRU folios. + * 2. The generation information is stored in folio->flags, but we cannot + * traverse these folios while holding the lru lock, otherwise it may + * cause softlockup. + * 3. In walk_update_folio(), the gen of folio and corresponding lru size + * may be updated, but the folio is not immediately moved to the + * corresponding lru list. Therefore, there may be folios of different + * generations on an LRU list. + * 4. In lru_gen_del_folio(), the generation to which the folio belongs is + * found based on the generation information in folio->flags, and the + * corresponding LRU size will be updated. Therefore, we need to update + * the lru size correctly during reparenting, otherwise the lru size may + * be updated incorrectly in lru_gen_del_folio(). + * + * Finally, we choose a compromise method, which is to splice the lru list= in + * the child memcg to the lru list of the same generation in the parent me= mcg + * during reparenting. + * + * The same generation has different meanings in the parent and child memc= g, + * so this compromise method will cause the LRU inversion problem. But as = the + * system runs, this problem will be fixed automatically. + */ +static void __lru_gen_reparent_memcg(struct lruvec *src_lruvec, struct lru= vec *dst_lruvec, + int zone, int type) +{ + struct lru_gen_folio *src_lrugen, *dst_lrugen; + enum lru_list lru =3D type * LRU_INACTIVE_FILE; + int i; + + src_lrugen =3D &src_lruvec->lrugen; + dst_lrugen =3D &dst_lruvec->lrugen; + + for (i =3D 0; i < get_nr_gens(src_lruvec, type); i++) { + int gen =3D lru_gen_from_seq(src_lrugen->max_seq - i); + long nr_pages =3D src_lrugen->nr_pages[gen][type][zone]; + int src_lru_active =3D lru_gen_is_active(src_lruvec, gen) ? LRU_ACTIVE := 0; + int dst_lru_active =3D lru_gen_is_active(dst_lruvec, gen) ? 
LRU_ACTIVE := 0; + + list_splice_tail_init(&src_lrugen->folios[gen][type][zone], + &dst_lrugen->folios[gen][type][zone]); + + WRITE_ONCE(src_lrugen->nr_pages[gen][type][zone], 0); + WRITE_ONCE(dst_lrugen->nr_pages[gen][type][zone], + dst_lrugen->nr_pages[gen][type][zone] + nr_pages); + + __update_lru_size(src_lruvec, lru + src_lru_active, zone, -nr_pages); + __update_lru_size(dst_lruvec, lru + dst_lru_active, zone, nr_pages); + } +} + +void lru_gen_reparent_memcg(struct mem_cgroup *src, struct mem_cgroup *dst) +{ + int nid; + + for_each_node(nid) { + struct lruvec *src_lruvec, *dst_lruvec; + int type, zone; + + src_lruvec =3D get_lruvec(src, nid); + dst_lruvec =3D get_lruvec(dst, nid); + + for (zone =3D 0; zone < MAX_NR_ZONES; zone++) + for (type =3D 0; type < ANON_AND_FILE; type++) + __lru_gen_reparent_memcg(src_lruvec, dst_lruvec, zone, type); + } +} + #endif /* CONFIG_MEMCG */ =20 /*************************************************************************= ***** --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-180.mta0.migadu.com (out-180.mta0.migadu.com [91.218.175.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6FF8344056 for ; Wed, 17 Dec 2025 07:34:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956859; cv=none; b=sPlit5hoVzZSguurRyrorV4xrwC/rToXhK5ZaA+ZcIzdr9kdd0D2VZ2yuQVJ9p48q5klOG3ICqR+m09Ws7wB1vrKuDz+uvNyUeErm49YHUs18TM88ojQuZSR4KPJEbcPCDm5dz6jAx7pJ5vBEETB8JM6Esl4CFxrSIgUWtqdHIk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956859; c=relaxed/simple; bh=ApGhz8PrqdtnYhglElt3PQ8WpfkLuxmUgCZ4HCQcF7E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AreATwyJx+55DvOiBC2949CFB2O+D/Nlrc2dH+ppYj1SoLmXF5S6ez2nY9fZcTIRWAbFfDq865AsfLmIlFKHHNIoGd5kih1FBuxXDRwPeAVDleIOGbSfa/2ZCWNKf9YpQY8DvDX7ZlMUx6Ah4v1T6e81Fh8MxjcH9ZleYzMQlsc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=AXc74V+X; arc=none smtp.client-ip=91.218.175.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="AXc74V+X" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956850; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EV6q27Ww1Pi8gdjbSvPXKtgD9Ua4KYVSU4hxepzaQ/c=; b=AXc74V+X5J85APZtey4ZYmzvZIJ2n7TB8CPnOTzJ7cjlFcn2vZ3x3ewYMJKyeAZsbiEzDu FKOiBmq0kxoybbzgvggqr2RteHqX3hID9EFLyHUtckL6onKDbRzk9Ec261fXIQOpz1aFq2 eiGoRqCmHtDMmXsNBl3TymksjA3OhxA= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Qi Zheng Subject: [PATCH v2 26/28] mm: memcontrol: refactor memcg_reparent_objcgs() Date: Wed, 17 Dec 2025 15:27:50 +0800 Message-ID: <8e4dff3139390fc0f18546a770d2b35c9c148b8b.1765956026.git.zhengqi.arch@bytedance.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Qi Zheng Refactor the memcg_reparent_objcgs() to facilitate subsequent reparenting LRU folios here. Signed-off-by: Qi Zheng Acked-by: Johannes Weiner --- mm/memcontrol.c | 37 +++++++++++++++++++++++++++---------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 930dacd6ce31a..3daa99a0c65fe 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -206,24 +206,41 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } =20 -static void memcg_reparent_objcgs(struct mem_cgroup *memcg) +static inline void __memcg_reparent_objcgs(struct mem_cgroup *src, + struct mem_cgroup *dst) { struct obj_cgroup *objcg, *iter; - struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); - - objcg =3D rcu_replace_pointer(memcg->objcg, NULL, true); - - spin_lock_irq(&objcg_lock); =20 + objcg =3D rcu_replace_pointer(src->objcg, NULL, true); /* 1) Ready to reparent active objcg. */ - list_add(&objcg->list, &memcg->objcg_list); + list_add(&objcg->list, &src->objcg_list); /* 2) Reparent active objcg and already reparented objcgs to parent. 
*/ - list_for_each_entry(iter, &memcg->objcg_list, list) - WRITE_ONCE(iter->memcg, parent); + list_for_each_entry(iter, &src->objcg_list, list) + WRITE_ONCE(iter->memcg, dst); /* 3) Move already reparented objcgs to the parent's list */ - list_splice(&memcg->objcg_list, &parent->objcg_list); + list_splice(&src->objcg_list, &dst->objcg_list); +} + +static inline void reparent_locks(struct mem_cgroup *src, struct mem_cgrou= p *dst) +{ + spin_lock_irq(&objcg_lock); +} =20 +static inline void reparent_unlocks(struct mem_cgroup *src, struct mem_cgr= oup *dst) +{ spin_unlock_irq(&objcg_lock); +} + +static void memcg_reparent_objcgs(struct mem_cgroup *src) +{ + struct obj_cgroup *objcg =3D rcu_dereference_protected(src->objcg, true); + struct mem_cgroup *dst =3D parent_mem_cgroup(src); + + reparent_locks(src, dst); + + __memcg_reparent_objcgs(src, dst); + + reparent_unlocks(src, dst); =20 percpu_ref_kill(&objcg->refcnt); } --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE1EC345CCC for ; Wed, 17 Dec 2025 07:34:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.186 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956883; cv=none; b=KAtOr079/KjMBnaZCyNGklXR8E/SnT0UXYAQSiLOGH4aeBuDrIYj67AP0nlJf9oV64YVtQ/1f1Qyj0VKmAWWzsRwk1oBB0s+TSMfDwa9DWw5Xo4QQCjpui28UsBEUfdcOw51sXIAfYCoZBd/hJo8xT3G5I6mc7ahmfOB8b8XsE8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956883; c=relaxed/simple; bh=MHsuKuVRRcq3sIq8agVpEuV5gWicTAMPQuS4B2TBrjo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DCAUHqddPGEkjIyv4KDWiIhznk6JJVndKqnyd+LLoq1md+7EWwaAnpn5m2A0eN5si+EzK95bhBNYv3BdsB+poJfOKjcN9d/1ct5R0mTauTWelKpQ5M48//jnv2g8W03QEXyJY60D3dTjNdhiyPURva03hqk9INe9eGbIRG+RP40= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=UacKPeUo; arc=none smtp.client-ip=91.218.175.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="UacKPeUo" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956863; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/gN2incYyd3bEfYUDdyG/px7wt+Bzu1XD/CTk+F4CEY=; b=UacKPeUoE72RU6WlaViTblNzq31GLZjQTFmJ8WgKimuCJhWC3wgTm5WfyBVq4iB2PkQtsA G36vivMut2LKuUW3ffrmaSX067rJSKhPz+rknuw8x3SdshYTtk+TKD2i+l0bTKSEKhZa2i /Q04oTEd9QyPMpOaTqpe0rOVtF61QZE= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 27/28] mm: memcontrol: eliminate the problem of dying memory cgroup for LRU folios Date: Wed, 17 Dec 2025 15:27:51 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song Pagecache pages are charged at allocation time and hold a reference to the original memory cgroup until reclaimed. Depending on memory pressure, page sharing patterns between different cgroups and cgroup creation/destruction rates, many dying memory cgroups can be pinned by pagecache pages, reducing page reclaim efficiency and wasting memory. Converting LRU folios and most other raw memory cgroup pins to the object cgroup direction can fix this long-living problem. Finally, folio->memcg_data of LRU folios and kmem folios will always point to an object cgroup pointer. The folio->memcg_data of slab folios will point to an vector of object cgroups. Signed-off-by: Muchun Song Signed-off-by: Qi Zheng --- include/linux/memcontrol.h | 77 +++++---------- mm/memcontrol-v1.c | 15 +-- mm/memcontrol.c | 189 +++++++++++++++++++++++-------------- 3 files changed, 150 insertions(+), 131 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 85265b28c5d18..9be52ce72f2c5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -369,9 +369,6 @@ enum objext_flags { #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1) =20 #ifdef CONFIG_MEMCG - -static inline bool folio_memcg_kmem(struct folio *folio); - /* * After the initialization objcg->memcg is always pointing at * a valid memcg, but can be atomically swapped to the parent memcg. @@ -385,43 +382,19 @@ static inline struct mem_cgroup *obj_cgroup_memcg(str= uct obj_cgroup *objcg) } =20 /* - * __folio_memcg - Get the memory cgroup associated with a non-kmem folio - * @folio: Pointer to the folio. - * - * Returns a pointer to the memory cgroup associated with the folio, - * or NULL. This function assumes that the folio is known to have a - * proper memory cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * kmem folios. 
- */ -static inline struct mem_cgroup *__folio_memcg(struct folio *folio) -{ - unsigned long memcg_data =3D folio->memcg_data; - - VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio); - - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); -} - -/* - * __folio_objcg - get the object cgroup associated with a kmem folio. + * folio_objcg - get the object cgroup associated with a folio. * @folio: Pointer to the folio. * * Returns a pointer to the object cgroup associated with the folio, * or NULL. This function assumes that the folio is known to have a - * proper object cgroup pointer. It's not safe to call this function - * against some type of folios, e.g. slab folios or ex-slab folios or - * LRU folios. + * proper object cgroup pointer. */ -static inline struct obj_cgroup *__folio_objcg(struct folio *folio) +static inline struct obj_cgroup *folio_objcg(struct folio *folio) { unsigned long memcg_data =3D folio->memcg_data; =20 VM_BUG_ON_FOLIO(folio_test_slab(folio), folio); VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio); - VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio); =20 return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); } @@ -435,21 +408,30 @@ static inline struct obj_cgroup *__folio_objcg(struct= folio *folio) * proper memory cgroup pointer. It's not safe to call this function * against some type of folios, e.g. slab folios or ex-slab folios. * - * For a non-kmem folio any of the following ensures folio and memcg bindi= ng - * stability: + * For a folio any of the following ensures folio and objcg binding stabil= ity: * * - the folio lock * - LRU isolation * - exclusive reference * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * Based on the stable binding of folio and objcg, for a folio any of the + * following ensures folio and memcg binding stability: + * + * - cgroup_mutex + * - the lruvec lock + * + * If the caller only want to ensure that the page counters of memcg are + * updated correctly, ensure that the binding stability of folio and objcg + * is sufficient. + * + * Note: The caller should hold an rcu read lock or cgroup_mutex to protect + * memcg associated with a folio from being released. */ static inline struct mem_cgroup *folio_memcg(struct folio *folio) { - if (folio_memcg_kmem(folio)) - return obj_cgroup_memcg(__folio_objcg(folio)); - return __folio_memcg(folio); + struct obj_cgroup *objcg =3D folio_objcg(folio); + + return objcg ? obj_cgroup_memcg(objcg) : NULL; } =20 /* @@ -473,15 +455,10 @@ static inline bool folio_memcg_charged(struct folio *= folio) * has an associated memory cgroup pointer or an object cgroups vector or * an object cgroup. * - * For a non-kmem folio any of the following ensures folio and memcg bindi= ng - * stability: + * The page and objcg or memcg binding rules can refer to folio_memcg(). * - * - the folio lock - * - LRU isolation - * - exclusive reference - * - * For a kmem folio a caller should hold an rcu read lock to protect memcg - * associated with a kmem folio from being released. + * A caller should hold an rcu read lock to protect memcg associated with a + * page from being released. */ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio) { @@ -490,18 +467,14 @@ static inline struct mem_cgroup *folio_memcg_check(st= ruct folio *folio) * for slabs, READ_ONCE() should be used here. 
*/ unsigned long memcg_data =3D READ_ONCE(folio->memcg_data); + struct obj_cgroup *objcg; =20 if (memcg_data & MEMCG_DATA_OBJEXTS) return NULL; =20 - if (memcg_data & MEMCG_DATA_KMEM) { - struct obj_cgroup *objcg; - - objcg =3D (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); - return obj_cgroup_memcg(objcg); - } + objcg =3D (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK); =20 - return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK); + return objcg ? obj_cgroup_memcg(objcg) : NULL; } =20 static inline struct mem_cgroup *page_memcg_check(struct page *page) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 6eed14bff7426..23c07df2063c8 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -591,6 +591,7 @@ void memcg1_commit_charge(struct folio *folio, struct m= em_cgroup *memcg) void memcg1_swapout(struct folio *folio, swp_entry_t entry) { struct mem_cgroup *memcg, *swap_memcg; + struct obj_cgroup *objcg; unsigned int nr_entries; =20 VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); @@ -602,12 +603,13 @@ void memcg1_swapout(struct folio *folio, swp_entry_t = entry) if (!do_memsw_account()) return; =20 - memcg =3D folio_memcg(folio); - - VM_WARN_ON_ONCE_FOLIO(!memcg, folio); - if (!memcg) + objcg =3D folio_objcg(folio); + VM_WARN_ON_ONCE_FOLIO(!objcg, folio); + if (!objcg) return; =20 + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(objcg); /* * In case the memcg owning these pages has been offlined and doesn't * have an ID allocated to it anymore, charge the closest online @@ -625,7 +627,7 @@ void memcg1_swapout(struct folio *folio, swp_entry_t en= try) folio_unqueue_deferred_split(folio); folio->memcg_data =3D 0; =20 - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) page_counter_uncharge(&memcg->memory, nr_entries); =20 if (memcg !=3D swap_memcg) { @@ -646,7 +648,8 @@ void memcg1_swapout(struct folio *folio, swp_entry_t en= try) preempt_enable_nested(); memcg1_check_events(memcg, folio_nid(folio)); =20 - css_put(&memcg->css); + rcu_read_unlock(); + obj_cgroup_put(objcg); } =20 /* diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3daa99a0c65fe..cd2f0f0c0f5ce 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -223,22 +223,55 @@ static inline void __memcg_reparent_objcgs(struct mem= _cgroup *src, =20 static inline void reparent_locks(struct mem_cgroup *src, struct mem_cgrou= p *dst) { + int nid, nest =3D 0; + spin_lock_irq(&objcg_lock); + for_each_node(nid) { + spin_lock_nested(&mem_cgroup_lruvec(src, + NODE_DATA(nid))->lru_lock, nest++); + spin_lock_nested(&mem_cgroup_lruvec(dst, + NODE_DATA(nid))->lru_lock, nest++); + } } =20 static inline void reparent_unlocks(struct mem_cgroup *src, struct mem_cgr= oup *dst) { + int nid; + + for_each_node(nid) { + spin_unlock(&mem_cgroup_lruvec(dst, NODE_DATA(nid))->lru_lock); + spin_unlock(&mem_cgroup_lruvec(src, NODE_DATA(nid))->lru_lock); + } spin_unlock_irq(&objcg_lock); } =20 +static void memcg_reparent_lru_folios(struct mem_cgroup *src, + struct mem_cgroup *dst) +{ + if (lru_gen_enabled()) + lru_gen_reparent_memcg(src, dst); + else + lru_reparent_memcg(src, dst); +} + static void memcg_reparent_objcgs(struct mem_cgroup *src) { struct obj_cgroup *objcg =3D rcu_dereference_protected(src->objcg, true); struct mem_cgroup *dst =3D parent_mem_cgroup(src); =20 +retry: + if (lru_gen_enabled()) + max_lru_gen_memcg(dst); + reparent_locks(src, dst); + if (lru_gen_enabled() && !recheck_lru_gen_max_memcg(dst)) { + reparent_unlocks(src, dst); + cond_resched(); + goto retry; + } =20 __memcg_reparent_objcgs(src, dst); + 
memcg_reparent_lru_folios(src, dst); =20 reparent_unlocks(src, dst); =20 @@ -989,6 +1022,8 @@ struct mem_cgroup *get_mem_cgroup_from_current(void) /** * get_mem_cgroup_from_folio - Obtain a reference on a given folio's memcg. * @folio: folio from which memcg should be extracted. + * + * The folio and objcg or memcg binding rules can refer to folio_memcg(). */ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio) { @@ -2557,17 +2592,17 @@ static inline int try_charge(struct mem_cgroup *mem= cg, gfp_t gfp_mask, return try_charge_memcg(memcg, gfp_mask, nr_pages); } =20 -static void commit_charge(struct folio *folio, struct mem_cgroup *memcg) +static void commit_charge(struct folio *folio, struct obj_cgroup *objcg) { VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio); /* - * Any of the following ensures page's memcg stability: + * Any of the following ensures folio's objcg stability: * * - the page lock * - LRU isolation * - exclusive reference */ - folio->memcg_data =3D (unsigned long)memcg; + folio->memcg_data =3D (unsigned long)objcg; } =20 #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC @@ -2671,6 +2706,17 @@ static struct obj_cgroup *__get_obj_cgroup_from_memc= g(struct mem_cgroup *memcg) return NULL; } =20 +static inline struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgro= up *memcg) +{ + struct obj_cgroup *objcg; + + rcu_read_lock(); + objcg =3D __get_obj_cgroup_from_memcg(memcg); + rcu_read_unlock(); + + return objcg; +} + static struct obj_cgroup *current_objcg_update(void) { struct mem_cgroup *memcg; @@ -2771,17 +2817,10 @@ struct obj_cgroup *get_obj_cgroup_from_folio(struct= folio *folio) { struct obj_cgroup *objcg; =20 - if (!memcg_kmem_online()) - return NULL; - - if (folio_memcg_kmem(folio)) { - objcg =3D __folio_objcg(folio); + objcg =3D folio_objcg(folio); + if (objcg) obj_cgroup_get(objcg); - } else { - rcu_read_lock(); - objcg =3D __get_obj_cgroup_from_memcg(__folio_memcg(folio)); - rcu_read_unlock(); - } + return objcg; } =20 @@ -3288,7 +3327,7 @@ void folio_split_memcg_refs(struct folio *folio, unsi= gned old_order, return; =20 new_refs =3D (1 << (old_order - new_order)) - 1; - css_get_many(&__folio_memcg(folio)->css, new_refs); + obj_cgroup_get_many(folio_objcg(folio), new_refs); } =20 unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) @@ -4737,16 +4776,20 @@ void mem_cgroup_calculate_protection(struct mem_cgr= oup *root, static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, gfp_t gfp) { - int ret; - - ret =3D try_charge(memcg, gfp, folio_nr_pages(folio)); - if (ret) - goto out; + int ret =3D 0; + struct obj_cgroup *objcg; =20 - css_get(&memcg->css); - commit_charge(folio, memcg); + objcg =3D get_obj_cgroup_from_memcg(memcg); + /* Do not account at the root objcg level. 
*/ + if (!obj_cgroup_is_root(objcg)) + ret =3D try_charge(memcg, gfp, folio_nr_pages(folio)); + if (ret) { + obj_cgroup_put(objcg); + return ret; + } + commit_charge(folio, objcg); memcg1_commit_charge(folio, memcg); -out: + return ret; } =20 @@ -4832,7 +4875,7 @@ int mem_cgroup_swapin_charge_folio(struct folio *foli= o, struct mm_struct *mm, } =20 struct uncharge_gather { - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; unsigned long nr_memory; unsigned long pgpgout; unsigned long nr_kmem; @@ -4846,58 +4889,52 @@ static inline void uncharge_gather_clear(struct unc= harge_gather *ug) =20 static void uncharge_batch(const struct uncharge_gather *ug) { + struct mem_cgroup *memcg; + + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(ug->objcg); if (ug->nr_memory) { - memcg_uncharge(ug->memcg, ug->nr_memory); + memcg_uncharge(memcg, ug->nr_memory); if (ug->nr_kmem) { - mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem); - memcg1_account_kmem(ug->memcg, -ug->nr_kmem); + mod_memcg_state(memcg, MEMCG_KMEM, -ug->nr_kmem); + memcg1_account_kmem(memcg, -ug->nr_kmem); } - memcg1_oom_recover(ug->memcg); + memcg1_oom_recover(memcg); } =20 - memcg1_uncharge_batch(ug->memcg, ug->pgpgout, ug->nr_memory, ug->nid); + memcg1_uncharge_batch(memcg, ug->pgpgout, ug->nr_memory, ug->nid); + rcu_read_unlock(); =20 /* drop reference from uncharge_folio */ - css_put(&ug->memcg->css); + obj_cgroup_put(ug->objcg); } =20 static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) { long nr_pages; - struct mem_cgroup *memcg; struct obj_cgroup *objcg; =20 VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); =20 /* * Nobody should be changing or seriously looking at - * folio memcg or objcg at this point, we have fully - * exclusive access to the folio. + * folio objcg at this point, we have fully exclusive + * access to the folio. */ - if (folio_memcg_kmem(folio)) { - objcg =3D __folio_objcg(folio); - /* - * This get matches the put at the end of the function and - * kmem pages do not hold memcg references anymore. 
- */ - memcg =3D get_mem_cgroup_from_objcg(objcg); - } else { - memcg =3D __folio_memcg(folio); - } - - if (!memcg) + objcg =3D folio_objcg(folio); + if (!objcg) return; =20 - if (ug->memcg !=3D memcg) { - if (ug->memcg) { + if (ug->objcg !=3D objcg) { + if (ug->objcg) { uncharge_batch(ug); uncharge_gather_clear(ug); } - ug->memcg =3D memcg; + ug->objcg =3D objcg; ug->nid =3D folio_nid(folio); =20 - /* pairs with css_put in uncharge_batch */ - css_get(&memcg->css); + /* pairs with obj_cgroup_put in uncharge_batch */ + obj_cgroup_get(objcg); } =20 nr_pages =3D folio_nr_pages(folio); @@ -4905,20 +4942,17 @@ static void uncharge_folio(struct folio *folio, str= uct uncharge_gather *ug) if (folio_memcg_kmem(folio)) { ug->nr_memory +=3D nr_pages; ug->nr_kmem +=3D nr_pages; - - folio->memcg_data =3D 0; - obj_cgroup_put(objcg); } else { /* LRU pages aren't accounted at the root level */ - if (!mem_cgroup_is_root(memcg)) + if (!obj_cgroup_is_root(objcg)) ug->nr_memory +=3D nr_pages; ug->pgpgout++; =20 WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); - folio->memcg_data =3D 0; } =20 - css_put(&memcg->css); + folio->memcg_data =3D 0; + obj_cgroup_put(objcg); } =20 void __mem_cgroup_uncharge(struct folio *folio) @@ -4942,7 +4976,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch = *folios) uncharge_gather_clear(&ug); for (i =3D 0; i < folios->nr; i++) uncharge_folio(folios->folios[i], &ug); - if (ug.memcg) + if (ug.objcg) uncharge_batch(&ug); } =20 @@ -4959,6 +4993,7 @@ void __mem_cgroup_uncharge_folios(struct folio_batch = *folios) void mem_cgroup_replace_folio(struct folio *old, struct folio *new) { struct mem_cgroup *memcg; + struct obj_cgroup *objcg; long nr_pages =3D folio_nr_pages(new); =20 VM_BUG_ON_FOLIO(!folio_test_locked(old), old); @@ -4973,21 +5008,24 @@ void mem_cgroup_replace_folio(struct folio *old, st= ruct folio *new) if (folio_memcg_charged(new)) return; =20 - memcg =3D folio_memcg(old); - VM_WARN_ON_ONCE_FOLIO(!memcg, old); - if (!memcg) + objcg =3D folio_objcg(old); + VM_WARN_ON_ONCE_FOLIO(!objcg, old); + if (!objcg) return; =20 + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(objcg); /* Force-charge the new page. The old one will be freed soon */ - if (!mem_cgroup_is_root(memcg)) { + if (!obj_cgroup_is_root(objcg)) { page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_charge(&memcg->memsw, nr_pages); } =20 - css_get(&memcg->css); - commit_charge(new, memcg); + obj_cgroup_get(objcg); + commit_charge(new, objcg); memcg1_commit_charge(new, memcg); + rcu_read_unlock(); } =20 /** @@ -5003,7 +5041,7 @@ void mem_cgroup_replace_folio(struct folio *old, stru= ct folio *new) */ void mem_cgroup_migrate(struct folio *old, struct folio *new) { - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; =20 VM_BUG_ON_FOLIO(!folio_test_locked(old), old); VM_BUG_ON_FOLIO(!folio_test_locked(new), new); @@ -5014,18 +5052,18 @@ void mem_cgroup_migrate(struct folio *old, struct f= olio *new) if (mem_cgroup_disabled()) return; =20 - memcg =3D folio_memcg(old); + objcg =3D folio_objcg(old); /* - * Note that it is normal to see !memcg for a hugetlb folio. + * Note that it is normal to see !objcg for a hugetlb folio. * For e.g, itt could have been allocated when memory_hugetlb_accounting * was not selected. 
*/ - VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !memcg, old); - if (!memcg) + VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !objcg, old); + if (!objcg) return; =20 - /* Transfer the charge and the css ref */ - commit_charge(new, memcg); + /* Transfer the charge and the objcg ref */ + commit_charge(new, objcg); =20 /* Warning should never happen, so don't worry about refcount non-0 */ WARN_ON_ONCE(folio_unqueue_deferred_split(old)); @@ -5200,22 +5238,27 @@ int __mem_cgroup_try_charge_swap(struct folio *foli= o, swp_entry_t entry) unsigned int nr_pages =3D folio_nr_pages(folio); struct page_counter *counter; struct mem_cgroup *memcg; + struct obj_cgroup *objcg; =20 if (do_memsw_account()) return 0; =20 - memcg =3D folio_memcg(folio); - - VM_WARN_ON_ONCE_FOLIO(!memcg, folio); - if (!memcg) + objcg =3D folio_objcg(folio); + VM_WARN_ON_ONCE_FOLIO(!objcg, folio); + if (!objcg) return 0; =20 + rcu_read_lock(); + memcg =3D obj_cgroup_memcg(objcg); if (!entry.val) { memcg_memory_event(memcg, MEMCG_SWAP_FAIL); + rcu_read_unlock(); return 0; } =20 memcg =3D mem_cgroup_id_get_online(memcg); + /* memcg is pined by memcg ID. */ + rcu_read_unlock(); =20 if (!mem_cgroup_is_root(memcg) && !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) { --=20 2.20.1 From nobody Fri Dec 19 16:00:40 2025 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABB75346778 for ; Wed, 17 Dec 2025 07:34:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956887; cv=none; b=UAs0T6NNP/dhL8TYPpZj9BjiwPTjRhy6I0l9rXpAyRES+Soc0C+/KLUxz0Y5U21dpKew665EyRPkhnJsRoosTrDibPoYPmmDyCGr3e3w2Oz1tUotcFjT/M8jU8Q4YE0zUwhFLsErDhL9Ui+adz8m7aa0j8SgdTBJvnpkwJvJdG0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765956887; c=relaxed/simple; bh=yYv7xS8s4Ims/kzpShNgtsffxq0sVO5GQfr68oj1Q48=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sYFdlIsKxUi9Xem8XGlAMETYUFF3DuTC+naj5igO6XTKLbi7nOp8SxJsCvL7u+1vPmmT5PBhYtB3K5saBC2diD56mO5VYAEJhlYd2TlIQ509SqDhi1MspiQgj/Q7F1tO/8Gbp74E309dO/RO4rks39YexTCsk+VGtj2Z//gEzb8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=FrCcOy5z; arc=none smtp.client-ip=91.218.175.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="FrCcOy5z" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1765956878; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qToP01h6jjC7s+UEzKCCG2SIQKxDhYzXBfQMHZwu8oY=; b=FrCcOy5zejKPmIe6RZhxcc37rX0Thqf+a7WCqahbplcOnDgkMNMZvKvGVtfQsbsk7ikWoa wG02IG7PU6FRiYDU1ldrOYpSY/YflikOoZMb575URPc1K4yiDO1KifgR1iOm6uyd9ovkHd GyaYb4aNIiF22aPDMQKVbN8C1QfQPdM= From: Qi Zheng To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, chenridong@huaweicloud.com, mkoutny@suse.com, akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com, lance.yang@linux.dev Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Muchun Song , Qi Zheng Subject: [PATCH v2 28/28] mm: lru: add VM_WARN_ON_ONCE_FOLIO to lru maintenance helpers Date: Wed, 17 Dec 2025 15:27:52 +0800 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Muchun Song We must ensure that a folio is deleted from or added to the correct lruvec list, so add VM_WARN_ON_ONCE_FOLIO() to the lru maintenance helpers to catch invalid users. The VM_BUG_ON_FOLIO() in move_folios_to_lru() can then be removed, as lruvec_add_folio() performs the same check. Signed-off-by: Muchun Song Acked-by: Roman Gushchin Signed-off-by: Qi Zheng Acked-by: Johannes Weiner --- include/linux/mm_inline.h | 6 ++++++ mm/vmscan.c | 1 - 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index fa2d6ba811b53..ad50688d89dba 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -342,6 +342,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct fol= io *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_add_folio(lruvec, folio, false)) return; =20 @@ -356,6 +358,8 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struc= t folio *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_add_folio(lruvec, folio, true)) return; =20 @@ -370,6 +374,8 @@ void lruvec_del_folio(struct lruvec *lruvec, struct fol= io *folio) { enum lru_list lru =3D folio_lru_list(folio); =20 + VM_WARN_ON_ONCE_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); + if (lru_gen_del_folio(lruvec, folio, false)) return; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index 64a85eea26dc6..2dc3ae432a017 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1934,7 +1934,6 @@ static unsigned int move_folios_to_lru(struct list_he= ad *list) continue; } =20 - VM_BUG_ON_FOLIO(!folio_matches_lruvec(folio, lruvec), folio); lruvec_add_folio(lruvec, folio); nr_pages =3D folio_nr_pages(folio); nr_moved +=3D nr_pages; --=20 2.20.1
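
[Editorial illustration, not part of the posted series.] To make the core idea of patch 27 concrete, here is a minimal userspace C sketch of the objcg indirection. This is not kernel code: the struct layouts are invented for illustration, and RCU, reference counting and the MEMCG_DATA_* flag bits are deliberately omitted. The point it demonstrates is the one the commit message makes: folio->memcg_data stores an obj_cgroup pointer, the memcg is always resolved through objcg->memcg, so reparenting a dying cgroup only has to redirect objcg->memcg to the parent instead of visiting every charged folio.

#include <stdio.h>

struct mem_cgroup {
	const char *name;
};

struct obj_cgroup {
	struct mem_cgroup *memcg;	/* redirected to the parent on offline */
};

struct folio {
	unsigned long memcg_data;	/* stores an obj_cgroup pointer */
};

static struct obj_cgroup *folio_objcg(const struct folio *folio)
{
	return (struct obj_cgroup *)folio->memcg_data;
}

static struct mem_cgroup *folio_memcg(const struct folio *folio)
{
	struct obj_cgroup *objcg = folio_objcg(folio);

	return objcg ? objcg->memcg : NULL;
}

/* rough analogue of memcg_reparent_objcgs(): swing the objcg to the parent */
static void reparent_objcg(struct obj_cgroup *objcg, struct mem_cgroup *parent)
{
	objcg->memcg = parent;
}

int main(void)
{
	struct mem_cgroup parent = { .name = "parent" };
	struct mem_cgroup child = { .name = "child" };
	struct obj_cgroup child_objcg = { .memcg = &child };
	struct folio folio = { .memcg_data = (unsigned long)&child_objcg };

	printf("before offline: %s\n", folio_memcg(&folio)->name);
	reparent_objcg(&child_objcg, &parent);	/* child cgroup goes offline */
	printf("after offline:  %s\n", folio_memcg(&folio)->name);
	return 0;
}

Once every LRU folio pin goes through the objcg in this way, a dying memcg is no longer kept alive by page cache folios, which is exactly the problem the commit message describes.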
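
[Editorial illustration, not part of the posted series.] The check that patch 28 moves into the lruvec helpers can be sketched the same way. Again this is plain userspace C with invented stand-ins for folio_matches_lruvec() and VM_WARN_ON_ONCE_FOLIO(), so the types and the warn-once helper are assumptions rather than kernel API. The idea it shows: the helpers themselves verify that the folio's memcg and node really map to the lruvec being operated on, which is why the open-coded assertion in the reclaim path becomes redundant.

#include <stdbool.h>
#include <stdio.h>

struct lruvec {
	int memcg_id;	/* cgroup this lruvec belongs to */
	int nid;	/* NUMA node this lruvec belongs to */
};

struct folio {
	int memcg_id;
	int nid;
};

/* stand-in for folio_matches_lruvec() */
static bool folio_matches_lruvec(const struct folio *folio,
				 const struct lruvec *lruvec)
{
	return folio->memcg_id == lruvec->memcg_id && folio->nid == lruvec->nid;
}

/* stand-in for VM_WARN_ON_ONCE_FOLIO(): warn once, never crash */
static void warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

static void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio)
{
	warn_on_once(!folio_matches_lruvec(folio, lruvec),
		     "folio added to the wrong lruvec");
	/* actual LRU list manipulation elided */
}

int main(void)
{
	struct lruvec lruvec = { .memcg_id = 1, .nid = 0 };
	struct folio ok = { .memcg_id = 1, .nid = 0 };
	struct folio bad = { .memcg_id = 2, .nid = 0 };

	lruvec_add_folio(&lruvec, &ok);		/* matches: silent */
	lruvec_add_folio(&lruvec, &bad);	/* mismatch: warns once */
	return 0;
}

Putting the assertion inside the helpers means every caller gets the same sanity check, not just the one reclaim path that previously open-coded it.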