From nobody Tue Dec 16 11:04:00 2025 Received: from out-179.mta1.migadu.com (out-179.mta1.migadu.com [95.215.58.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1294B1D7E21 for ; Fri, 28 Feb 2025 07:58:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729514; cv=none; b=EMedFlBhpMzp8/42mP+mMBxOscEQttSoAfUOoL+uBhsr4TGhCjsrqJLgLZnLu6ICISSNuUVC7yP1e8iwlrWdA9awPR2FCkB1a/lR6B4qXi1cdC8Ui1MTeZwT0ExxG3M0Pl6QKPaCfLTMLM149eInITNr34+UEP076oV7KecK3Gs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729514; c=relaxed/simple; bh=QRx7jZRT55AKyjpFQGyFiao/6ngnuUeVAto4r0nfsVg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XLFEtnLcVCPrjwo8KtEoaS4CVX6R2pUSs+jdGRvHtnRdBWuIYAXaljRYXKTwEN6HJB4XhQ8Mir0W7go4lvgfXIXhaO4RHq8dWlQH3G/l20WP8xXtVdXE5zXDHzxJxLy2LdNL7BVqTB+G0CxyElDzDxba7ylWHHE7/HDKeqroKQU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=pKGhwYIy; arc=none smtp.client-ip=95.215.58.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="pKGhwYIy" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1740729511; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4VCEGjyZe+NN0DZWtHUpr5+jCBnuJyJ0VIJUe5VXlq8=; b=pKGhwYIythzfPZUUvu9WlOfs4j0TLxTYveugs6VVz6vtAmEdqTKcfPp3JrH0JyMA0ymAC4 hX0Qdr6CY54uCEink45UtRfouqr3vp5DvPqBt9JxhObqqgUcqXSLkl1G7truPbl8j0SF+i YVXLPk+zopZiy6Wrq9/L+SrnXUM7cjg= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Subject: [PATCH 1/3] memcg: don't call propagate_protected_usage() for v1 Date: Thu, 27 Feb 2025 23:58:06 -0800 Message-ID: <20250228075808.207484-2-shakeel.butt@linux.dev> In-Reply-To: <20250228075808.207484-1-shakeel.butt@linux.dev> References: <20250228075808.207484-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Memcg-v1 does not support memory protection (min/low) and thus there is no need to track protected memory usage for it. 
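For illustration only (not part of the patch), the effect of passing the on-default-hierarchy flag as the third argument of page_counter_init() can be sketched in userspace C. The names demo_counter, demo_counter_init, demo_charge and propagate_calls are hypothetical simplifications; the real page_counter code uses atomics and does considerably more work per level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page_counter. */
struct demo_counter {
	long usage;
	bool protection_support;	/* false models a v1 memcg in this sketch */
	struct demo_counter *parent;
};

/* Counts how often the stubbed protection bookkeeping runs. */
static int propagate_calls;

static void demo_propagate_protected_usage(struct demo_counter *c)
{
	(void)c;
	propagate_calls++;
}

static void demo_counter_init(struct demo_counter *c,
			      struct demo_counter *parent,
			      bool protection_support)
{
	c->usage = 0;
	c->parent = parent;
	c->protection_support = protection_support;
}

/* Charge nr_pages up the hierarchy; skip protection tracking if unsupported. */
static void demo_charge(struct demo_counter *counter, long nr_pages)
{
	bool protection = counter->protection_support;
	struct demo_counter *c;

	for (c = counter; c; c = c->parent) {
		c->usage += nr_pages;
		if (protection)
			demo_propagate_protected_usage(c);
	}
}
```

In this sketch a counter initialised with protection_support == false still accounts usage at every level, but never enters the protection bookkeeping, which is the saving the patch aims for on v1.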
Signed-off-by: Shakeel Butt --- mm/memcontrol.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 55b0e9482c00..36b2dfbc86c0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3601,6 +3601,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) { struct mem_cgroup *parent =3D mem_cgroup_from_css(parent_css); struct mem_cgroup *memcg, *old_memcg; + bool memcg_on_dfl =3D cgroup_subsys_on_dfl(memory_cgrp_subsys); =20 old_memcg =3D set_active_memcg(parent); memcg =3D mem_cgroup_alloc(parent); @@ -3618,7 +3619,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) if (parent) { WRITE_ONCE(memcg->swappiness, mem_cgroup_swappiness(parent)); =20 - page_counter_init(&memcg->memory, &parent->memory, true); + page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl); page_counter_init(&memcg->swap, &parent->swap, false); #ifdef CONFIG_MEMCG_V1 WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable)); @@ -3638,7 +3639,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) return &memcg->css; } =20 - if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket) + if (memcg_on_dfl && !cgroup_memory_nosocket) static_branch_inc(&memcg_sockets_enabled_key); =20 if (!cgroup_memory_nobpf) --=20 2.43.5 From nobody Tue Dec 16 11:04:00 2025 Received: from out-187.mta1.migadu.com (out-187.mta1.migadu.com [95.215.58.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 102C31D61B9 for ; Fri, 28 Feb 2025 07:58:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.187 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729521; cv=none; 
b=rAiNxCjSa5/7a5hHnuniSSdSI4OCk0CS7AXz39pGogCIU5KnrtXKrlDjKTlp7ubAbc4vXDHKy/J3sBLY0XkLLsCTYXseVc6brECpYvCbhriUek80PzCve4GGnq4zQDTNWgK5FUfe/24LAFwabWn3/uZhDTSm+iAulcTSqWwnlNM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729521; c=relaxed/simple; bh=zA6PXsDktXbXxM6E2AG0NGTa/L70w9OXygFrIpQLjFQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O93z9pAbI7aetI040TX9vun8BhniRJQ3yMGq8aYVI/WDZhdI7UNyLnn96wtS68kJd6STOxJbi9WSeGLQouG9/zUA7G8RUwWZwLCD5R3gF64bK0exk8F+mrdEQnY/4g1FpUjWyOQrQQQR5/syvIGPhIsMemAJX5jPwEatQvpgKRY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=KVQzTsVV; arc=none smtp.client-ip=95.215.58.187 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="KVQzTsVV" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1740729517; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jgR5BcL3MvfMipx+x4bMFzdCySvKkv9t6LJsU3AWxSI=; b=KVQzTsVVF++szQiHnUrzJKSOalzgMh4NejwrixnLYUQE8tb2MjSfqWnQ+d2+qFkNoVDAT3 I84I69ippat79QUZDAFdLwExq1udUPcG/9iMdGZ65f8pjfBnsMXVjxE4uNRuUpXvaxUt94 59CEZ1R0Z9C1uw+1ODGG9YuchIaZMT8= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Subject: [PATCH 2/3] page_counter: track failcnt only for legacy cgroups Date: Thu, 27 Feb 2025 23:58:07 -0800 Message-ID: <20250228075808.207484-3-shakeel.butt@linux.dev> In-Reply-To: <20250228075808.207484-1-shakeel.butt@linux.dev> References: <20250228075808.207484-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Currently page_counter tracks failcnt for counters used by both v1 and v2 controllers. However, failcnt is only exported in v1 deployments, so there is no need to maintain it in v2. The oom report does expose failcnt for memory and swap in v2, but v2 already maintains the MEMCG_MAX and MEMCG_SWAP_MAX event counters, which can be used instead.
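For illustration only (not part of the patch), the conditional failcnt update can be sketched in userspace C. The names demo_counter, demo_counter_init and demo_try_charge are hypothetical, and the sketch checks the limit before adding the charge, whereas the kernel charges atomically first and backs out on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page_counter. */
struct demo_counter {
	long usage;
	long max;
	unsigned long failcnt;	/* only maintained when track_failcnt is set */
	bool track_failcnt;	/* true models a v1 (legacy) counter */
	struct demo_counter *parent;
};

static void demo_counter_init(struct demo_counter *c,
			      struct demo_counter *parent,
			      long max, bool track_failcnt)
{
	c->usage = 0;
	c->max = max;
	c->failcnt = 0;
	c->track_failcnt = track_failcnt;
	c->parent = parent;
}

/*
 * Try to charge nr_pages at every level. On failure, undo the charges
 * already applied below the failing level and report that level in *fail.
 */
static bool demo_try_charge(struct demo_counter *counter, long nr_pages,
			    struct demo_counter **fail)
{
	bool track_failcnt = counter->track_failcnt;
	struct demo_counter *c, *u;

	for (c = counter; c; c = c->parent) {
		if (c->usage + nr_pages > c->max) {
			if (track_failcnt)
				c->failcnt++;
			*fail = c;
			goto failed;
		}
		c->usage += nr_pages;
	}
	return true;

failed:
	for (u = counter; u != *fail; u = u->parent)
		u->usage -= nr_pages;
	return false;
}
```

As in the patch, the flag is read once from the counter being charged rather than per ancestor, so a v2-style counter never touches failcnt on any level it walks.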
Signed-off-by: Shakeel Butt --- include/linux/page_counter.h | 4 +++- mm/hugetlb_cgroup.c | 31 ++++++++++++++----------------- mm/memcontrol.c | 12 ++++++++++-- mm/page_counter.c | 4 +++- 4 files changed, 30 insertions(+), 21 deletions(-) diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h index 46406f3fe34d..e4bd8fd427be 100644 --- a/include/linux/page_counter.h +++ b/include/linux/page_counter.h @@ -28,12 +28,13 @@ struct page_counter { unsigned long watermark; /* Latest cg2 reset watermark */ unsigned long local_watermark; - unsigned long failcnt; + unsigned long failcnt; /* v1-only field */ =20 /* Keep all the read most fields in a separete cacheline. */ CACHELINE_PADDING(_pad2_); =20 bool protection_support; + bool track_failcnt; unsigned long min; unsigned long low; unsigned long high; @@ -58,6 +59,7 @@ static inline void page_counter_init(struct page_counter = *counter, counter->max =3D PAGE_COUNTER_MAX; counter->parent =3D parent; counter->protection_support =3D protection_support; + counter->track_failcnt =3D false; } =20 static inline unsigned long page_counter_read(struct page_counter *counter) diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index bb9578bd99f9..58e895f3899a 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -101,10 +101,9 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup = *h_cgroup, int idx; =20 for (idx =3D 0; idx < HUGE_MAX_HSTATE; idx++) { - struct page_counter *fault_parent =3D NULL; - struct page_counter *rsvd_parent =3D NULL; + struct page_counter *fault, *fault_parent =3D NULL; + struct page_counter *rsvd, *rsvd_parent =3D NULL; unsigned long limit; - int ret; =20 if (parent_h_cgroup) { fault_parent =3D hugetlb_cgroup_counter_from_cgroup( @@ -112,24 +111,22 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup= *h_cgroup, rsvd_parent =3D hugetlb_cgroup_counter_from_cgroup_rsvd( parent_h_cgroup, idx); } - page_counter_init(hugetlb_cgroup_counter_from_cgroup(h_cgroup, - idx), - 
fault_parent, false); - page_counter_init( - hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx), - rsvd_parent, false); + fault =3D hugetlb_cgroup_counter_from_cgroup(h_cgroup, idx); + rsvd =3D hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx); + + page_counter_init(fault, fault_parent, false); + page_counter_init(rsvd, rsvd_parent, false); + + if (!cgroup_subsys_on_dfl(hugetlb_cgrp_subsys)) { + fault->track_failcnt =3D true; + rsvd->track_failcnt =3D true; + } =20 limit =3D round_down(PAGE_COUNTER_MAX, pages_per_huge_page(&hstates[idx])); =20 - ret =3D page_counter_set_max( - hugetlb_cgroup_counter_from_cgroup(h_cgroup, idx), - limit); - VM_BUG_ON(ret); - ret =3D page_counter_set_max( - hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx), - limit); - VM_BUG_ON(ret); + VM_BUG_ON(page_counter_set_max(fault, limit)); + VM_BUG_ON(page_counter_set_max(rsvd, limit)); } } =20 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 36b2dfbc86c0..030fadbd5bf2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1572,16 +1572,23 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup= *memcg) /* Use static buffer, for the caller is holding oom_lock. 
*/ static char buf[SEQ_BUF_SIZE]; struct seq_buf s; + unsigned long memory_failcnt; =20 lockdep_assert_held(&oom_lock); =20 + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + memory_failcnt =3D atomic_long_read(&memcg->memory_events[MEMCG_MAX]); + else + memory_failcnt =3D memcg->memory.failcnt; + pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n", K((u64)page_counter_read(&memcg->memory)), - K((u64)READ_ONCE(memcg->memory.max)), memcg->memory.failcnt); + K((u64)READ_ONCE(memcg->memory.max)), memory_failcnt); if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) pr_info("swap: usage %llukB, limit %llukB, failcnt %lu\n", K((u64)page_counter_read(&memcg->swap)), - K((u64)READ_ONCE(memcg->swap.max)), memcg->swap.failcnt); + K((u64)READ_ONCE(memcg->swap.max)), + atomic_long_read(&memcg->memory_events[MEMCG_SWAP_MAX])); #ifdef CONFIG_MEMCG_V1 else { pr_info("memory+swap: usage %llukB, limit %llukB, failcnt %lu\n", @@ -3622,6 +3629,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *pare= nt_css) page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl); page_counter_init(&memcg->swap, &parent->swap, false); #ifdef CONFIG_MEMCG_V1 + memcg->memory.track_failcnt =3D !memcg_on_dfl; WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable)); page_counter_init(&memcg->kmem, &parent->kmem, false); page_counter_init(&memcg->tcpmem, &parent->tcpmem, false); diff --git a/mm/page_counter.c b/mm/page_counter.c index af23f927611b..661e0f2a5127 100644 --- a/mm/page_counter.c +++ b/mm/page_counter.c @@ -121,6 +121,7 @@ bool page_counter_try_charge(struct page_counter *count= er, { struct page_counter *c; bool protection =3D track_protection(counter); + bool track_failcnt =3D counter->track_failcnt; =20 for (c =3D counter; c; c =3D c->parent) { long new; @@ -146,7 +147,8 @@ bool page_counter_try_charge(struct page_counter *count= er, * inaccuracy in the failcnt which is only used * to report stats. 
*/ - data_race(c->failcnt++); + if (track_failcnt) + data_race(c->failcnt++); *fail =3D c; goto failed; } --=20 2.43.5 From nobody Tue Dec 16 11:04:00 2025 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D84911DA614 for ; Fri, 28 Feb 2025 07:58:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729525; cv=none; b=m7m2YM/OFKvDWSHDKMhDK/6tvEFkW89b2gxXRxUMD0irE5RWqRkl5RaLBGgeTvlkjQ4WiRMB3Su+6JxixKtrD4Af4mZI+qWQuTzi0GrnwyR8Mz9jxPLdgK/Gs8PhZYuQFUHsO2pMrlEJKnb3Y/8daasL3NSABosO3Tff1r+nxzk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740729525; c=relaxed/simple; bh=6PbihMNTnIkkQvZLlThPRIBzMBGa5NVhsdPURXHpnno=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=q5qZP9XZkl08Z4HCQG1Dp2TujYwF9b+bwZmTuJLK7AJyFWukS5x5HodgjRH8i2UjcLKzutLGy1PyCLq7V0YZIOTZIVgfbwUrqZaQeL5V906dKTYfelfQhkVu6j+3anigFGvaj4ogpmNpfuRJTq0qdGYIEX98BPaH0Ok6DTWJEgY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=a4goRKM7; arc=none smtp.client-ip=91.218.175.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="a4goRKM7" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1740729521; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BK/lHOWMvSnS+WIRroS9/stVhRR6z5AG9NC04iFYhYg=; b=a4goRKM7J4ZIFDTPgn+GjDlc39QJ1F7e412TgXrydAts3sFCMK9dW4A0cowefVceam44tp K3tH5jwmqqjoQI8//8jeZpIbMkvjfX4T0xF1SsSOoa542AKlyodxF5cSdAiGkuD8KaGpJn j3xB5Zl+kCBqKe9Kkn2PM8DhkM1po7Y= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Subject: [PATCH 3/3] page_counter: reduce struct page_counter size Date: Thu, 27 Feb 2025 23:58:08 -0800 Message-ID: <20250228075808.207484-4-shakeel.butt@linux.dev> In-Reply-To: <20250228075808.207484-1-shakeel.butt@linux.dev> References: <20250228075808.207484-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" The struct page_counter has explicit padding for better cache alignment. The commit c6f53ed8f213a ("mm, memcg: cg2 memory{.swap,}.peak write handlers") added a field to the struct page_counter and accidentally increased its size. Let's move the failcnt field, which is a v1-only field, to the same cacheline as usage to reduce the size of struct page_counter.
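For illustration only (not part of the patch), the size effect of moving one field across a padding boundary can be sketched in userspace C11. The structs demo_before and demo_after, the field groups tracking and limits, and the 64-byte DEMO_CACHELINE are assumptions made for this sketch; they do not reproduce the exact layout of struct page_counter.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64	/* assumed cacheline size for this sketch */

/* Before: failcnt is the ninth word after the first padding boundary. */
struct demo_before {
	uint64_t usage;					/* hot field, own cacheline */
	alignas(DEMO_CACHELINE) uint64_t tracking[8];	/* fills one cacheline */
	uint64_t failcnt;				/* spills into the next one */
	alignas(DEMO_CACHELINE) uint64_t limits[4];	/* read-mostly group */
};

/* After: failcnt shares the otherwise mostly empty cacheline of usage. */
struct demo_after {
	uint64_t usage;
	uint64_t failcnt;				/* v1-only in the patch */
	alignas(DEMO_CACHELINE) uint64_t tracking[8];
	alignas(DEMO_CACHELINE) uint64_t limits[4];
};

/* Moving the single spilled word saves a whole cacheline. */
static_assert(sizeof(struct demo_before) == 4 * DEMO_CACHELINE,
	      "before: four cachelines");
static_assert(sizeof(struct demo_after) == 3 * DEMO_CACHELINE,
	      "after: three cachelines");
```

The trade-off in this sketch matches the one described above: usage stays alone on its cacheline as long as failcnt is never written, which is the case for counters that do not track failcnt.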
Signed-off-by: Shakeel Butt --- include/linux/page_counter.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h index e4bd8fd427be..d649b6bbbc87 100644 --- a/include/linux/page_counter.h +++ b/include/linux/page_counter.h @@ -9,10 +9,12 @@ =20 struct page_counter { /* - * Make sure 'usage' does not share cacheline with any other field. The - * memcg->memory.usage is a hot member of struct mem_cgroup. + * Make sure 'usage' does not share cacheline with any other field in + * v2. The memcg->memory.usage is a hot member of struct mem_cgroup. */ atomic_long_t usage; + unsigned long failcnt; /* v1-only field */ + CACHELINE_PADDING(_pad1_); =20 /* effective memory.min and memory.min usage tracking */ @@ -28,7 +30,6 @@ struct page_counter { unsigned long watermark; /* Latest cg2 reset watermark */ unsigned long local_watermark; - unsigned long failcnt; /* v1-only field */ =20 /* Keep all the read most fields in a separete cacheline. */ CACHELINE_PADDING(_pad2_); --=20 2.43.5