From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs
Date: Mon, 12 May 2025 20:13:10 -0700
Message-ID: <20250513031316.2147548-2-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>

The function memcg_rstat_updated() is used to track memcg stat updates in order to optimize the flushes. At the moment it is not re-entrant safe, so its callers must disable irqs before calling it. However, to achieve the goal of updating memcg stats without disabling irqs, memcg_rstat_updated() itself needs to be re-entrant safe against irqs.

This patch makes memcg_rstat_updated() re-entrant safe against irqs. It does so using atomic_* ops, which on x86 add a lock prefix to the instructions. Since this is per-cpu data, the this_cpu_* ops would be preferred.
However, the percpu pointer is stored in struct mem_cgroup, and doing the upward traversal through struct mem_cgroup may cause two extra cache misses compared to traversing through the struct memcg_vmstats_percpu pointer.

NOTE: explore whether there is an atomic_* ops alternative without the lock prefix.

Signed-off-by: Shakeel Butt
---
 mm/memcontrol.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6cfa3550f300..2c4c095bf26c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -503,7 +503,7 @@ static inline int memcg_events_index(enum vm_event_item idx)
 
 struct memcg_vmstats_percpu {
 	/* Stats updates since the last flush */
-	unsigned int stats_updates;
+	atomic_t stats_updates;
 
 	/* Cached pointers for fast iteration in memcg_rstat_updated() */
 	struct memcg_vmstats_percpu *parent;
@@ -590,12 +590,15 @@ static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	struct memcg_vmstats_percpu *statc;
-	int cpu = smp_processor_id();
-	unsigned int stats_updates;
+	int cpu;
+	int stats_updates;
 
 	if (!val)
 		return;
 
+	/* Don't assume callers have preemption disabled. */
+	cpu = get_cpu();
+
 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
@@ -607,14 +610,16 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 		if (memcg_vmstats_needs_flush(statc->vmstats))
 			break;
 
-		stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
-		WRITE_ONCE(statc->stats_updates, stats_updates);
+		stats_updates = atomic_add_return(abs(val), &statc->stats_updates);
 		if (stats_updates < MEMCG_CHARGE_BATCH)
 			continue;
 
-		atomic64_add(stats_updates, &statc->vmstats->stats_updates);
-		WRITE_ONCE(statc->stats_updates, 0);
+		stats_updates = atomic_xchg(&statc->stats_updates, 0);
+		if (stats_updates)
+			atomic64_add(stats_updates,
+				     &statc->vmstats->stats_updates);
 	}
+	put_cpu();
 }
 
 static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
@@ -4155,7 +4160,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 
 		mem_cgroup_stat_aggregate(&ac);
 
 	}
-	WRITE_ONCE(statc->stats_updates, 0);
+	atomic_set(&statc->stats_updates, 0);
 	/* We are in a per-cpu loop here, only do the atomic write once */
 	if (atomic64_read(&memcg->vmstats->stats_updates))
 		atomic64_set(&memcg->vmstats->stats_updates, 0);
-- 
2.47.1
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated
Date: Mon, 12 May 2025 20:13:11 -0700
Message-ID: <20250513031316.2147548-3-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>

Let's move the explicit preempt-disable code to the callers of memcg_rstat_updated(), and remove memcg_stats_lock() and its related functions, which existed to ensure that callers of the stats update functions had disabled preemption. The stats update functions now disable preemption explicitly themselves.
Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 mm/memcontrol.c | 74 +++++++++++++------------------------------------
 1 file changed, 19 insertions(+), 55 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2c4c095bf26c..62450e7991d8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -558,47 +558,21 @@ static u64 flush_last_time;
 
 #define FLUSH_TIME (2UL*HZ)
 
-/*
- * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
- * not rely on this as part of an acquired spinlock_t lock. These functions are
- * never used in hardirq context on PREEMPT_RT and therefore disabling preemtion
- * is sufficient.
- */
-static void memcg_stats_lock(void)
-{
-	preempt_disable_nested();
-	VM_WARN_ON_IRQS_ENABLED();
-}
-
-static void __memcg_stats_lock(void)
-{
-	preempt_disable_nested();
-}
-
-static void memcg_stats_unlock(void)
-{
-	preempt_enable_nested();
-}
-
-
 static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
 	return atomic64_read(&vmstats->stats_updates) >
 		MEMCG_CHARGE_BATCH * num_online_cpus();
 }
 
-static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
+static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val,
+				       int cpu)
 {
 	struct memcg_vmstats_percpu *statc;
-	int cpu;
 	int stats_updates;
 
 	if (!val)
 		return;
 
-	/* Don't assume callers have preemption disabled. */
-	cpu = get_cpu();
-
 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
@@ -619,7 +593,6 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 			atomic64_add(stats_updates,
 				     &statc->vmstats->stats_updates);
 	}
-	put_cpu();
 }
 
 static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
@@ -717,6 +690,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		       int val)
 {
 	int i = memcg_stats_index(idx);
+	int cpu;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -724,12 +698,14 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
-	memcg_stats_lock();
+	cpu = get_cpu();
+
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
 	val = memcg_state_val_in_pages(idx, val);
-	memcg_rstat_updated(memcg, val);
+	memcg_rstat_updated(memcg, val, cpu);
 	trace_mod_memcg_state(memcg, idx, val);
-	memcg_stats_unlock();
+
+	put_cpu();
 }
 
 #ifdef CONFIG_MEMCG_V1
@@ -758,6 +734,7 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
 	int i = memcg_stats_index(idx);
+	int cpu;
 
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
@@ -765,24 +742,7 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
-	/*
-	 * The caller from rmap relies on disabled preemption because they never
-	 * update their counter from in-interrupt context. For these two
-	 * counters we check that the update is never performed from an
-	 * interrupt context while other caller need to have disabled interrupt.
-	 */
-	__memcg_stats_lock();
-	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
-		switch (idx) {
-		case NR_ANON_MAPPED:
-		case NR_FILE_MAPPED:
-		case NR_ANON_THPS:
-			WARN_ON_ONCE(!in_task());
-			break;
-		default:
-			VM_WARN_ON_IRQS_ENABLED();
-		}
-	}
+	cpu = get_cpu();
 
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
@@ -791,9 +751,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
 
 	val = memcg_state_val_in_pages(idx, val);
-	memcg_rstat_updated(memcg, val);
+	memcg_rstat_updated(memcg, val, cpu);
 	trace_mod_memcg_lruvec_state(memcg, idx, val);
-	memcg_stats_unlock();
+
+	put_cpu();
 }
 
 /**
@@ -873,6 +834,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 			  unsigned long count)
 {
 	int i = memcg_events_index(idx);
+	int cpu;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -880,11 +842,13 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
-	memcg_stats_lock();
+	cpu = get_cpu();
+
 	__this_cpu_add(memcg->vmstats_percpu->events[i], count);
-	memcg_rstat_updated(memcg, count);
+	memcg_rstat_updated(memcg, count, cpu);
 	trace_count_memcg_events(memcg, idx, count);
-	memcg_stats_unlock();
+
+	put_cpu();
 }
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event)
-- 
2.47.1
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs
Date: Mon, 12 May 2025 20:13:12 -0700
Message-ID: <20250513031316.2147548-4-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>

Let's make mod_memcg_state() re-entrant safe against irqs. The only change needed is to convert the usage of __this_cpu_add() to this_cpu_add(). In addition, with re-entrant safety, there is no need to disable irqs. mod_memcg_state() is not safe against nmi, so add a warning if someone tries to call it in nmi context.
Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 include/linux/memcontrol.h | 20 ++------------------
 mm/memcontrol.c            | 12 ++++++++----
 2 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ed9acb68652a..84e2cea7e666 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -911,19 +911,9 @@ struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim,
 					    struct mem_cgroup *oom_domain);
 void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 
-void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
-		       int val);
-
 /* idx can be of type enum memcg_stat_item or node_stat_item */
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
-				   enum memcg_stat_item idx, int val)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__mod_memcg_state(memcg, idx, val);
-	local_irq_restore(flags);
-}
+void mod_memcg_state(struct mem_cgroup *memcg,
+		     enum memcg_stat_item idx, int val);
 
 static inline void mod_memcg_page_state(struct page *page,
 					enum memcg_stat_item idx, int val)
@@ -1390,12 +1380,6 @@ static inline void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 {
 }
 
-static inline void __mod_memcg_state(struct mem_cgroup *memcg,
-				     enum memcg_stat_item idx,
-				     int nr)
-{
-}
-
 static inline void mod_memcg_state(struct mem_cgroup *memcg,
 				   enum memcg_stat_item idx,
 				   int nr)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 62450e7991d8..373d36cae069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -681,12 +681,12 @@ static int memcg_state_val_in_pages(int idx, int val)
 }
 
 /**
- * __mod_memcg_state - update cgroup memory statistics
+ * mod_memcg_state - update cgroup memory statistics
  * @memcg: the memory cgroup
  * @idx: the stat item - can be enum memcg_stat_item or enum node_stat_item
  * @val: delta to add to the counter, can be negative
  */
-void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
+void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		       int val)
 {
 	int i = memcg_stats_index(idx);
@@ -698,9 +698,13 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
+	if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+		      __func__, idx))
+		return;
+
 	cpu = get_cpu();
 
-	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
+	this_cpu_add(memcg->vmstats_percpu->state[i], val);
 	val = memcg_state_val_in_pages(idx, val);
 	memcg_rstat_updated(memcg, val, cpu);
 	trace_mod_memcg_state(memcg, idx, val);
@@ -2969,7 +2973,7 @@ static void drain_obj_stock(struct obj_stock_pcp *stock)
 
 	memcg = get_mem_cgroup_from_objcg(old);
 
-	__mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+	mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
 	memcg1_account_kmem(memcg, -nr_pages);
 	if (!mem_cgroup_is_root(memcg))
 		memcg_uncharge(memcg, nr_pages);
-- 
2.47.1
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [RFC PATCH 4/7] memcg: make count_memcg_events re-entrant safe against irqs
Date: Mon, 12 May 2025 20:13:13 -0700
Message-ID: <20250513031316.2147548-5-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>

Let's make count_memcg_events() re-entrant safe against irqs. The only change needed is to convert the usage of __this_cpu_add() to this_cpu_add(). In addition, with re-entrant safety, there is no need to disable irqs. Also add a warning for in_nmi(), as the function is not safe against nmi context.

Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 include/linux/memcontrol.h | 21 ++-------------------
 mm/memcontrol-v1.c         |  6 +++---
 mm/memcontrol.c            | 10 +++++++---
 mm/swap.c                  |  8 ++++----
 mm/vmscan.c                | 14 +++++++-------
 5 files changed, 23 insertions(+), 36 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 84e2cea7e666..31b9ab93d4e1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -952,19 +952,8 @@ static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 	local_irq_restore(flags);
 }
 
-void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
-			  unsigned long count);
-
-static inline void count_memcg_events(struct mem_cgroup *memcg,
-				      enum vm_event_item idx,
-				      unsigned long count)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__count_memcg_events(memcg, idx, count);
-	local_irq_restore(flags);
-}
+void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
+			unsigned long count);
 
 static inline void count_memcg_folio_events(struct folio *folio,
 		enum vm_event_item idx, unsigned long nr)
@@ -1438,12 +1427,6 @@ static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 }
 
 static inline void count_memcg_events(struct mem_cgroup *memcg,
-				      enum vm_event_item idx,
-				      unsigned long count)
-{
-}
-
-static inline void __count_memcg_events(struct mem_cgroup *memcg,
 				      enum vm_event_item idx,
 				      unsigned long count)
 {
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 3852f0713ad2..581c960ba19b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -512,9 +512,9 @@ static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_pages)
 {
 	/* pagein of a big page is an event. So, ignore page size */
 	if (nr_pages > 0)
-		__count_memcg_events(memcg, PGPGIN, 1);
+		count_memcg_events(memcg, PGPGIN, 1);
 	else {
-		__count_memcg_events(memcg, PGPGOUT, 1);
+		count_memcg_events(memcg, PGPGOUT, 1);
 		nr_pages = -nr_pages; /* for event */
 	}
 
@@ -689,7 +689,7 @@ void memcg1_uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__count_memcg_events(memcg, PGPGOUT, pgpgout);
+	count_memcg_events(memcg, PGPGOUT, pgpgout);
 	__this_cpu_add(memcg->events_percpu->nr_page_events, nr_memory);
 	memcg1_check_events(memcg, nid);
 	local_irq_restore(flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 373d36cae069..9e7dc90cc460 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -829,12 +829,12 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 }
 
 /**
- * __count_memcg_events - account VM events in a cgroup
+ * count_memcg_events - account VM events in a cgroup
  * @memcg: the memory cgroup
  * @idx: the event item
  * @count: the number of events that occurred
  */
-void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
+void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 			unsigned long count)
 {
 	int i = memcg_events_index(idx);
@@ -846,9 +846,13 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
+	if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+		      __func__, idx))
+		return;
+
 	cpu = get_cpu();
 
-	__this_cpu_add(memcg->vmstats_percpu->events[i], count);
+	this_cpu_add(memcg->vmstats_percpu->events[i], count);
 	memcg_rstat_updated(memcg, count, cpu);
 	trace_count_memcg_events(memcg, idx, count);
 
diff --git a/mm/swap.c b/mm/swap.c
index 77b2d5997873..4fc322f7111a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -309,7 +309,7 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
 	trace_mm_lru_activate(folio);
 
 	__count_vm_events(PGACTIVATE, nr_pages);
-	__count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, nr_pages);
+	count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, nr_pages);
 }
 
 #ifdef CONFIG_SMP
@@ -581,7 +581,7 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
 
 	if (active) {
 		__count_vm_events(PGDEACTIVATE, nr_pages);
-		__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
+		count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
 				   nr_pages);
 	}
 }
@@ -599,7 +599,7 @@ static void lru_deactivate(struct lruvec *lruvec, struct folio *folio)
 	lruvec_add_folio(lruvec, folio);
 
 	__count_vm_events(PGDEACTIVATE, nr_pages);
-	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages);
+	count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages);
 }
 
 static void lru_lazyfree(struct lruvec *lruvec, struct folio *folio)
@@ -625,7 +625,7 @@ static void lru_lazyfree(struct lruvec *lruvec, struct folio *folio)
 	lruvec_add_folio(lruvec, folio);
 
 	__count_vm_events(PGLAZYFREE, nr_pages);
-	__count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, nr_pages);
+	count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, nr_pages);
 }
 
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5efd939d8c76..f86d264558f5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2028,7 +2028,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	item = PGSCAN_KSWAPD + reclaimer_offset(sc);
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_scanned);
-	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
+	count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
 	__count_vm_events(PGSCAN_ANON + file, nr_scanned);
 
 	spin_unlock_irq(&lruvec->lru_lock);
@@ -2048,7 +2048,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
-	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
+	count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
 	spin_unlock_irq(&lruvec->lru_lock);
 
@@ -2138,7 +2138,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(PGREFILL, nr_scanned);
-	__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
+	count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
 
 	spin_unlock_irq(&lruvec->lru_lock);
 
@@ -2195,7 +2195,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
 
 	__count_vm_events(PGDEACTIVATE, nr_deactivate);
-	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
+	count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&lruvec->lru_lock);
@@ -4616,8 +4616,8 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 		__count_vm_events(item, isolated);
 		__count_vm_events(PGREFILL, sorted);
 	}
-	__count_memcg_events(memcg, item, isolated);
-	__count_memcg_events(memcg, PGREFILL, sorted);
+	count_memcg_events(memcg, item, isolated);
+	count_memcg_events(memcg, PGREFILL, sorted);
 	__count_vm_events(PGSCAN_ANON + type, isolated);
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
 				    scanned, skipped, isolated,
@@ -4769,7 +4769,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, reclaimed);
-	__count_memcg_events(memcg, item, reclaimed);
+	count_memcg_events(memcg, item, reclaimed);
 	__count_vm_events(PGSTEAL_ANON + type, reclaimed);
 
 	spin_unlock_irq(&lruvec->lru_lock);
-- 
2.47.1
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1747106058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AODqjUq8YBPAVfDEVDBp4w1/avIi3o3A3GYYWWucdOg=; b=nMEpv0paVndzE8CayYRfrJko55+hok+VSNUS4tnLfpg19zLqu/PvnMnIxqCvJWjK48L3bJ pgdw7Iur+sFTvXj2v9Wm4AgeQIsJIFY5CJhHjQVZ5kZpIc7+TLNW/z16Mp1RmWkUmKZHQW 31iyv5GGYhMMXu1kJxlCeCI14I5l6pk= From: Shakeel Butt To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Vlastimil Babka , Alexei Starovoitov , Sebastian Andrzej Siewior , Harry Yoo , Yosry Ahmed , bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Subject: [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs Date: Mon, 12 May 2025 20:13:14 -0700 Message-ID: <20250513031316.2147548-6-shakeel.butt@linux.dev> In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev> References: <20250513031316.2147548-1-shakeel.butt@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" Let's make __mod_memcg_lruvec_state re-entrant safe and name it mod_memcg_lruvec_state(). The only thing needed is to convert the usage of __this_cpu_add() to this_cpu_add(). There are two callers of mod_memcg_lruvec_state() and one of them i.e. __mod_objcg_mlstate() will be re-entrant safe as well, so, rename it mod_objcg_mlstate(). The last caller __mod_lruvec_state() still calls __mod_node_page_state() which is not re-entrant safe yet, so keep it as is. 
Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 mm/memcontrol.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e7dc90cc460..adf2f1922118 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -731,7 +731,7 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 }
 #endif
 
-static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
+static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 				     enum node_stat_item idx,
 				     int val)
 {
@@ -743,16 +743,20 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
+	if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+		      __func__, idx))
+		return;
+
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
 	cpu = get_cpu();
 
 	/* Update memcg */
-	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
+	this_cpu_add(memcg->vmstats_percpu->state[i], val);
 
 	/* Update lruvec */
-	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
+	this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
 
 	val = memcg_state_val_in_pages(idx, val);
 	memcg_rstat_updated(memcg, val, cpu);
@@ -779,7 +783,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 
 	/* Update memcg and lruvec */
 	if (!mem_cgroup_disabled())
-		__mod_memcg_lruvec_state(lruvec, idx, val);
+		mod_memcg_lruvec_state(lruvec, idx, val);
 }
 
 void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
@@ -2559,7 +2563,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 	folio->memcg_data = (unsigned long)memcg;
 }
 
-static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
+static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
 				       struct pglist_data *pgdat,
 				       enum node_stat_item idx, int nr)
 {
@@ -2570,7 +2574,7 @@ static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
 	memcg = obj_cgroup_memcg(objcg);
 	if (likely(!in_nmi())) {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);
-		__mod_memcg_lruvec_state(lruvec, idx, nr);
+		mod_memcg_lruvec_state(lruvec, idx, nr);
 	} else {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[pgdat->node_id];
 
@@ -2901,12 +2905,12 @@ static void __account_obj_stock(struct obj_cgroup *objcg,
 		struct pglist_data *oldpg = stock->cached_pgdat;
 
 		if (stock->nr_slab_reclaimable_b) {
-			__mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
+			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
 					    stock->nr_slab_reclaimable_b);
 			stock->nr_slab_reclaimable_b = 0;
 		}
 		if (stock->nr_slab_unreclaimable_b) {
-			__mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
+			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
 					    stock->nr_slab_unreclaimable_b);
 			stock->nr_slab_unreclaimable_b = 0;
 		}
@@ -2932,7 +2936,7 @@ static void __account_obj_stock(struct obj_cgroup *objcg,
 		}
 	}
 	if (nr)
-		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
+		mod_objcg_mlstate(objcg, pgdat, idx, nr);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
@@ -3004,13 +3008,13 @@ static void drain_obj_stock(struct obj_stock_pcp *stock)
 	 */
 	if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
 		if (stock->nr_slab_reclaimable_b) {
-			__mod_objcg_mlstate(old, stock->cached_pgdat,
+			mod_objcg_mlstate(old, stock->cached_pgdat,
 					    NR_SLAB_RECLAIMABLE_B,
 					    stock->nr_slab_reclaimable_b);
 			stock->nr_slab_reclaimable_b = 0;
 		}
 		if (stock->nr_slab_unreclaimable_b) {
-			__mod_objcg_mlstate(old, stock->cached_pgdat,
+			mod_objcg_mlstate(old, stock->cached_pgdat,
 					    NR_SLAB_UNRECLAIMABLE_B,
 					    stock->nr_slab_unreclaimable_b);
 			stock->nr_slab_unreclaimable_b = 0;
@@ -3050,7 +3054,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 
 	if (unlikely(in_nmi())) {
 		if (pgdat)
-			__mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
+			mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
 		nr_pages = nr_bytes >> PAGE_SHIFT;
 		nr_bytes = nr_bytes & (PAGE_SIZE - 1);
 		atomic_add(nr_bytes, &objcg->nr_charged_bytes);
-- 
2.47.1

From nobody Fri Dec 19 17:33:19 2025
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
 Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 Meta kernel team
Subject: [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling
Date: Mon, 12 May 2025 20:13:15 -0700
Message-ID: <20250513031316.2147548-7-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

There is no need to disable irqs to use the objcg per-cpu stock, so let's
stop doing that; consume_obj_stock() and refill_obj_stock() need to use a
trylock instead to keep the per-cpu stock safe. One consequence of this
change is that charge requests from irq context may take the slowpath more
often, but that should be rare.
Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 mm/memcontrol.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index adf2f1922118..af7df675d733 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1918,18 +1918,17 @@ static void drain_local_memcg_stock(struct work_struct *dummy)
 static void drain_local_obj_stock(struct work_struct *dummy)
 {
 	struct obj_stock_pcp *stock;
-	unsigned long flags;
 
 	if (WARN_ONCE(!in_task(), "drain in non-task context"))
 		return;
 
-	local_lock_irqsave(&obj_stock.lock, flags);
+	local_lock(&obj_stock.lock);
 
 	stock = this_cpu_ptr(&obj_stock);
 	drain_obj_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	local_unlock_irqrestore(&obj_stock.lock, flags);
+	local_unlock(&obj_stock.lock);
 }
 
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
@@ -2062,14 +2061,13 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct obj_stock_pcp *obj_st;
-	unsigned long flags;
 
 	obj_st = &per_cpu(obj_stock, cpu);
 
 	/* drain_obj_stock requires objstock.lock */
-	local_lock_irqsave(&obj_stock.lock, flags);
+	local_lock(&obj_stock.lock);
 	drain_obj_stock(obj_st);
-	local_unlock_irqrestore(&obj_stock.lock, flags);
+	local_unlock(&obj_stock.lock);
 
 	/* no need for the local lock */
 	drain_stock_fully(&per_cpu(memcg_stock, cpu));
@@ -2943,14 +2941,12 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			      struct pglist_data *pgdat, enum node_stat_item idx)
 {
 	struct obj_stock_pcp *stock;
-	unsigned long flags;
 	bool ret = false;
 
-	if (unlikely(in_nmi()))
+	if (unlikely(in_nmi()) ||
+	    !local_trylock(&obj_stock.lock))
 		return ret;
 
-	local_lock_irqsave(&obj_stock.lock, flags);
-
 	stock = this_cpu_ptr(&obj_stock);
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
@@ -2960,7 +2956,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		__account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
 	}
 
-	local_unlock_irqrestore(&obj_stock.lock, flags);
+	local_unlock(&obj_stock.lock);
 
 	return ret;
 }
@@ -3049,10 +3045,10 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     enum node_stat_item idx)
 {
 	struct obj_stock_pcp *stock;
-	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	if (unlikely(in_nmi())) {
+	if (unlikely(in_nmi()) ||
+	    !local_trylock(&obj_stock.lock)) {
 		if (pgdat)
 			mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
 		nr_pages = nr_bytes >> PAGE_SHIFT;
@@ -3061,8 +3057,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		goto out;
 	}
 
-	local_lock_irqsave(&obj_stock.lock, flags);
-
 	stock = this_cpu_ptr(&obj_stock);
 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
 		drain_obj_stock(stock);
@@ -3083,7 +3077,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	local_unlock_irqrestore(&obj_stock.lock, flags);
+	local_unlock(&obj_stock.lock);
 out:
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
-- 
2.47.1

From nobody Fri Dec 19 17:33:19 2025
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
 Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 Meta kernel team
Subject: [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug
Date: Mon, 12 May 2025 20:13:16 -0700
Message-ID: <20250513031316.2147548-8-shakeel.butt@linux.dev>
In-Reply-To: <20250513031316.2147548-1-shakeel.butt@linux.dev>
References: <20250513031316.2147548-1-shakeel.butt@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Previously, on cpu hot-unplug the kernel would call drain_obj_stock() with
the objcg local lock held. The local lock was not needed, since the stock
being accessed belongs to a dead cpu, but it was kept to disable irqs, as
drain_obj_stock() may call mod_objcg_mlstate(), which used to require irqs
disabled. Now that mod_objcg_mlstate() no longer needs irqs disabled, we
can remove the local lock altogether from the cpu hot-unplug path.
Signed-off-by: Shakeel Butt
Acked-by: Vlastimil Babka
---
 mm/memcontrol.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7df675d733..539cd76e1492 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2060,16 +2060,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
-	struct obj_stock_pcp *obj_st;
-
-	obj_st = &per_cpu(obj_stock, cpu);
-
-	/* drain_obj_stock requires objstock.lock */
-	local_lock(&obj_stock.lock);
-	drain_obj_stock(obj_st);
-	local_unlock(&obj_stock.lock);
-
 	/* no need for the local lock */
+	drain_obj_stock(&per_cpu(obj_stock, cpu));
 	drain_stock_fully(&per_cpu(memcg_stock, cpu));
 
 	return 0;
-- 
2.47.1