From: Joshua Hahn
To: Minchan Kim, Sergey Senozhatsky
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Muchun Song, Axel Rasmussen, Yuanchu Xie,
 Wei Xu, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com
Subject: [PATCH 8/8] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec
Date: Thu, 26 Feb 2026 11:29:31 -0800
Message-ID: <20260226192936.3190275-9-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260226192936.3190275-1-joshua.hahnjy@gmail.com>
References: <20260226192936.3190275-1-joshua.hahnjy@gmail.com>

Now that memcg charging happens in the zsmalloc layer, where we have both
objcg and page information, we can specify which node's memcg-lruvec the
zswapped memory should be accounted to.

Move MEMCG_ZSWAP_B and MEMCG_ZSWAPPED_B from enum memcg_stat_item to enum
node_stat_item and the memcg_node_stat_items array. Rename their prefix
from MEMCG to NR to reflect this move as well. In addition, decouple the
updates of node stats (vmstat) and memcg-lruvec stats, since node stats
can only track values at a PAGE_SIZE granularity. Finally, track the
moving charges whenever a compressed object migrates from one zspage to
another.

memcg-lruvec stats are now updated precisely and proportionally when
compressed objects are split across pages. For node stats, unfortunately,
only NR_ZSWAP_B can be kept accurate. NR_ZSWAPPED_B works as a good
best-effort value, but cannot proportionally account for compressed
objects split across pages, due to the coarse PAGE_SIZE granularity of
node stats. For such objects, NR_ZSWAPPED_B is accounted to the first
zpdesc's node stats. Note that this is not a new inaccuracy, just one
that these changes do not fix. The small inaccuracy is accepted in place
of invasive changes across the vmstat infrastructure to begin tracking
node stats at byte granularity.
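To make the proportional split concrete, here is a small standalone sketch
of the arithmetic that __zs_mod_memcg_lruvec() below applies when an object
spills into a next zpdesc on a different node. This is an illustration only,
not part of the patch; the page size, object size, offset, and node names
are made-up example values.

#include <stdio.h>

#define EXAMPLE_PAGE_SIZE 4096	/* stand-in for PAGE_SIZE */

int main(void)
{
	/* hypothetical compressed object: 2048 bytes starting at offset 3072 */
	int size = 2048, offset = 3072;
	int compressed_size = size, original_size = EXAMPLE_PAGE_SIZE;

	if (offset + size > EXAMPLE_PAGE_SIZE) {
		/* assume the next zpdesc sits on a different node */
		compressed_size = EXAMPLE_PAGE_SIZE - offset;			/* head on first node */
		original_size = (EXAMPLE_PAGE_SIZE * compressed_size) / size;	/* proportional share */
	}

	printf("node A: NR_ZSWAP_B += %d, NR_ZSWAPPED_B += %d\n",
	       compressed_size, original_size);
	printf("node B: NR_ZSWAP_B += %d, NR_ZSWAPPED_B += %d\n",
	       size - compressed_size, EXAMPLE_PAGE_SIZE - original_size);
	return 0;
}

With these numbers, node A is charged 1024 bytes of NR_ZSWAP_B and 2048
bytes of NR_ZSWAPPED_B, and node B the remaining 1024 and 2048 bytes, so
the per-node contributions still sum to the compressed size and PAGE_SIZE
respectively.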
Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/memcontrol.h |  5 +--
 include/linux/mmzone.h     |  2 ++
 mm/memcontrol.c            | 18 +++++-----
 mm/vmstat.c                |  2 ++
 mm/zsmalloc.c              | 72 ++++++++++++++++++++++++++++++--------
 mm/zswap.c                 |  4 +--
 6 files changed, 76 insertions(+), 27 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3952c918fd4..ba97b86d9104 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,8 +37,6 @@ enum memcg_stat_item {
 	MEMCG_PERCPU_B,
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
-	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED_B,
 	MEMCG_NR_STAT,
 };
 
@@ -932,6 +930,9 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		     int val);
 
+void mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+			    int val);
+
 static inline void mod_memcg_page_state(struct page *page,
 					enum memcg_stat_item idx, int val)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..ae16a90491ac 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -258,6 +258,8 @@ enum node_stat_item {
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
+	NR_ZSWAP_B,
+	NR_ZSWAPPED_B,
 	NR_BALLOON_PAGES,
 	NR_KERNEL_FILE_PAGES,
 	NR_VM_NODE_STAT_ITEMS
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b662902d4e03..dc7cfff97296 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -331,6 +331,8 @@ static const unsigned int memcg_node_stat_items[] = {
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
+	NR_ZSWAP_B,
+	NR_ZSWAPPED_B,
 };
 
 static const unsigned int memcg_stat_items[] = {
@@ -339,8 +341,6 @@ static const unsigned int memcg_stat_items[] = {
 	MEMCG_PERCPU_B,
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
-	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED_B,
 };
 
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -726,7 +726,7 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 }
 #endif
 
-static void mod_memcg_lruvec_state(struct lruvec *lruvec,
+void mod_memcg_lruvec_state(struct lruvec *lruvec,
 			    enum node_stat_item idx,
 			    int val)
 {
@@ -1344,8 +1344,8 @@ static const struct memory_stat memory_stats[] = {
 	{ "vmalloc",			MEMCG_VMALLOC },
 	{ "shmem",			NR_SHMEM },
 #ifdef CONFIG_ZSWAP
-	{ "zswap",			MEMCG_ZSWAP_B },
-	{ "zswapped",			MEMCG_ZSWAPPED_B },
+	{ "zswap",			NR_ZSWAP_B },
+	{ "zswapped",			NR_ZSWAPPED_B },
 #endif
 	{ "file_mapped",		NR_FILE_MAPPED },
 	{ "file_dirty",			NR_FILE_DIRTY },
@@ -1392,8 +1392,8 @@ static int memcg_page_state_unit(int item)
 {
 	switch (item) {
 	case MEMCG_PERCPU_B:
-	case MEMCG_ZSWAP_B:
-	case MEMCG_ZSWAPPED_B:
+	case NR_ZSWAP_B:
+	case NR_ZSWAPPED_B:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 		return 1;
@@ -5424,7 +5424,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
 
 		/* Force flush to get accurate stats for charging */
 		__mem_cgroup_flush_stats(memcg, true);
-		pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
+		pages = memcg_page_state(memcg, NR_ZSWAP_B) / PAGE_SIZE;
 		if (pages < max)
 			continue;
 		ret = false;
@@ -5453,7 +5453,7 @@ static u64 zswap_current_read(struct cgroup_subsys_state *css,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
 	mem_cgroup_flush_stats(memcg);
-	return memcg_page_state(memcg, MEMCG_ZSWAP_B);
+	return memcg_page_state(memcg, NR_ZSWAP_B);
 }
 
 static int zswap_max_show(struct seq_file *m, void *v)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 99270713e0c1..4b10610bd999 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1279,6 +1279,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_HUGETLB_PAGE
 	[I(NR_HUGETLB)]			= "nr_hugetlb",
 #endif
+	[I(NR_ZSWAP_B)]			= "zswap",
+	[I(NR_ZSWAPPED_B)]		= "zswapped",
 	[I(NR_BALLOON_PAGES)]		= "nr_balloon_pages",
 	[I(NR_KERNEL_FILE_PAGES)]	= "nr_kernel_file_pages",
 #undef I
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6794927c60fb..548e7f4b8bf6 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -810,6 +810,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 				struct zspage *zspage)
 {
 	struct zpdesc *zpdesc, *next;
+	bool objcg = !!zpdesc_objcgs(zspage->first_zpdesc);
 
 	assert_spin_locked(&class->lock);
 
@@ -823,6 +824,8 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 		reset_zpdesc(zpdesc);
 		zpdesc_unlock(zpdesc);
 		zpdesc_dec_zone_page_state(zpdesc);
+		if (objcg)
+			dec_node_page_state(zpdesc_page(zpdesc), NR_ZSWAP_B);
 		zpdesc_put(zpdesc);
 		zpdesc = next;
 	} while (zpdesc != NULL);
@@ -963,11 +966,45 @@ static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
 	return true;
 }
 
-static void zs_charge_objcg(struct zpdesc *zpdesc, struct obj_cgroup *objcg,
-			    int size, unsigned long offset)
+static void __zs_mod_memcg_lruvec(struct zpdesc *zpdesc,
+				  struct obj_cgroup *objcg, int size,
+				  int sign, unsigned long offset)
 {
 	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
+	int compressed_size = size, original_size = PAGE_SIZE;
+	int nid = page_to_nid(zpdesc_page(zpdesc));
+	int next_nid = nid;
+
+	if (offset + size > PAGE_SIZE) {
+		struct zpdesc *next_zpdesc = get_next_zpdesc(zpdesc);
+
+		next_nid = page_to_nid(zpdesc_page(next_zpdesc));
+		if (nid != next_nid) {
+			compressed_size = PAGE_SIZE - offset;
+			original_size = (PAGE_SIZE * compressed_size) / size;
+		}
+	}
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
+	mod_memcg_lruvec_state(lruvec, NR_ZSWAP_B, sign * compressed_size);
+	mod_memcg_lruvec_state(lruvec, NR_ZSWAPPED_B, sign * original_size);
+
+	if (nid != next_nid) {
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(next_nid));
+		mod_memcg_lruvec_state(lruvec, NR_ZSWAP_B,
+				       sign * (size - compressed_size));
+		mod_memcg_lruvec_state(lruvec, NR_ZSWAPPED_B,
+				       sign * (PAGE_SIZE - original_size));
+	}
+	rcu_read_unlock();
+}
 
+static void zs_charge_objcg(struct zpdesc *zpdesc, struct obj_cgroup *objcg,
+			    int size, unsigned long offset)
+{
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
@@ -977,28 +1014,30 @@ static void zs_charge_objcg(struct zpdesc *zpdesc, struct obj_cgroup *objcg,
 	if (obj_cgroup_charge(objcg, GFP_KERNEL, size))
 		VM_WARN_ON_ONCE(1);
 
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
-	mod_memcg_state(memcg, MEMCG_ZSWAPPED_B, 1);
-	rcu_read_unlock();
+	__zs_mod_memcg_lruvec(zpdesc, objcg, size, 1, offset);
+
+	/*
+	 * Node-level vmstats are charged in PAGE_SIZE units. As a
+	 * best-effort, always charge NR_ZSWAPPED_B to the first zpdesc.
+	 */
+	inc_node_page_state(zpdesc_page(zpdesc), NR_ZSWAPPED_B);
 }
 
 static void zs_uncharge_objcg(struct zpdesc *zpdesc, struct obj_cgroup *objcg,
 			      int size, unsigned long offset)
 {
-	struct mem_cgroup *memcg;
-
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
 	obj_cgroup_uncharge(objcg, size);
 
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
-	mod_memcg_state(memcg, MEMCG_ZSWAPPED_B, -1);
-	rcu_read_unlock();
+	__zs_mod_memcg_lruvec(zpdesc, objcg, size, -1, offset);
+
+	/*
+	 * Node-level vmstats are uncharged in PAGE_SIZE units. As a
+	 * best-effort, always uncharge NR_ZSWAPPED_B to the first zpdesc.
+	 */
+	dec_node_page_state(zpdesc_page(zpdesc), NR_ZSWAPPED_B);
 }
 
 static void migrate_obj_objcg(unsigned long used_obj, unsigned long free_obj,
@@ -1135,6 +1174,8 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		__zpdesc_set_zsmalloc(zpdesc);
 
 		zpdesc_inc_zone_page_state(zpdesc);
+		if (objcg)
+			inc_node_page_state(zpdesc_page(zpdesc), NR_ZSWAP_B);
 		zpdescs[i] = zpdesc;
 	}
 
@@ -1149,6 +1190,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 err:
 	while (--i >= 0) {
 		zpdesc_dec_zone_page_state(zpdescs[i]);
+		if (objcg)
+			dec_node_page_state(zpdesc_page(zpdescs[i]),
+					    NR_ZSWAP_B);
 		free_zpdesc(zpdescs[i]);
 	}
 	cache_free_zspage(zspage);
diff --git a/mm/zswap.c b/mm/zswap.c
index 97f38d0afa86..9e845e1d7214 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1214,9 +1214,9 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	 */
 	if (!mem_cgroup_disabled()) {
 		mem_cgroup_flush_stats(memcg);
-		nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B);
+		nr_backing = memcg_page_state(memcg, NR_ZSWAP_B);
 		nr_backing >>= PAGE_SHIFT;
-		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED_B);
+		nr_stored = memcg_page_state(memcg, NR_ZSWAPPED_B);
 		nr_stored >>= PAGE_SHIFT;
 	} else {
 		nr_backing = zswap_total_pages();
-- 
2.47.3