[PATCH net-next v2] net-memcg: Fix scope of sockmem pressure indicators

Abel Wu posted 1 patch 2 years, 5 months ago
include/linux/memcontrol.h | 9 +++++++--
mm/vmpressure.c            | 8 ++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
[PATCH net-next v2] net-memcg: Fix scope of sockmem pressure indicators
Posted by Abel Wu 2 years, 5 months ago
Now there are two indicators of socket memory pressure sit inside
struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
memory reclaim pressure in memcg->memory and ->tcpmem respectively.

When in legacy mode (cgroupv1), the socket memory is charged into
->tcpmem which is independent of ->memory, so socket_pressure has
nothing to do with socket's pressure at all. Things could be worse
by taking socket_pressure into consideration in legacy mode, as a
pressure in ->memory can lead to premature reclamation/throttling
in socket.

While for the default mode (cgroupv2), the socket memory is charged
into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.

So {socket,tcpmem}_pressure are only used in default/legacy mode
respectively for indicating socket memory pressure. This patch fixes
the pieces of code that make mixed use of both.

Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
v2:
 - Add a comment to clarify the scope of indicators (Roman)
 - Drop code cleanup on tcpmem_pressure (Roman)
 - Fix the lack of a Fixes: tag (Eric)
---
 include/linux/memcontrol.h | 9 +++++++--
 mm/vmpressure.c            | 8 ++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5818af8eca5a..dbf26bc89dd4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -284,6 +284,11 @@ struct mem_cgroup {
 	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
 	atomic_long_t		memory_events_local[MEMCG_NR_MEMORY_EVENTS];
 
+	/*
+	 * Hint of reclaim pressure for socket memroy management. Note
+	 * that this indicator should NOT be used in legacy cgroup mode
+	 * where socket memory is accounted/charged separately.
+	 */
 	unsigned long		socket_pressure;
 
 	/* Legacy tcp memory accounting */
@@ -1727,8 +1732,8 @@ void mem_cgroup_sk_alloc(struct sock *sk);
 void mem_cgroup_sk_free(struct sock *sk);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
-		return true;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return !!memcg->tcpmem_pressure;
 	do {
 		if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
 			return true;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index b52644771cc4..22c6689d9302 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -244,6 +244,14 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
 	if (mem_cgroup_disabled())
 		return;
 
+	/*
+	 * The in-kernel users only care about the reclaim efficiency
+	 * for this @memcg rather than the whole subtree, and there
+	 * isn't and won't be any in-kernel user in a legacy cgroup.
+	 */
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !tree)
+		return;
+
 	vmpr = memcg_to_vmpressure(memcg);
 
 	/*
-- 
2.37.3
Re: [PATCH net-next v2] net-memcg: Fix scope of sockmem pressure indicators
Posted by Shakeel Butt 2 years, 5 months ago
On Mon, Aug 14, 2023 at 12:09 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>
> Now there are two indicators of socket memory pressure sit inside
> struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
> memory reclaim pressure in memcg->memory and ->tcpmem respectively.
>
> When in legacy mode (cgroupv1), the socket memory is charged into
> ->tcpmem which is independent of ->memory, so socket_pressure has
> nothing to do with socket's pressure at all. Things could be worse
> by taking socket_pressure into consideration in legacy mode, as a
> pressure in ->memory can lead to premature reclamation/throttling
> in socket.
>
> While for the default mode (cgroupv2), the socket memory is charged
> into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
>
> So {socket,tcpmem}_pressure are only used in default/legacy mode
> respectively for indicating socket memory pressure. This patch fixes
> the pieces of code that make mixed use of both.
>
> Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>

So, this is undoing the unintended exposure of v2 functionality for
the v1. I wonder if someone might have started depending upon that
behavior but I am more convinced that no one is using v1's tcpmem
accounting due to performance impact. So, this looks good to me.

Acked-by: Shakeel Butt <shakeelb@google.com>

I do think we should start the deprecation process of v1's tcpmem accounting.
Re: Re: [PATCH net-next v2] net-memcg: Fix scope of sockmem pressure indicators
Posted by Abel Wu 2 years, 5 months ago
On 8/15/23 4:18 AM, Shakeel Butt wrote:
> On Mon, Aug 14, 2023 at 12:09 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>>
>> Now there are two indicators of socket memory pressure sit inside
>> struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
>> memory reclaim pressure in memcg->memory and ->tcpmem respectively.
>>
>> When in legacy mode (cgroupv1), the socket memory is charged into
>> ->tcpmem which is independent of ->memory, so socket_pressure has
>> nothing to do with socket's pressure at all. Things could be worse
>> by taking socket_pressure into consideration in legacy mode, as a
>> pressure in ->memory can lead to premature reclamation/throttling
>> in socket.
>>
>> While for the default mode (cgroupv2), the socket memory is charged
>> into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
>>
>> So {socket,tcpmem}_pressure are only used in default/legacy mode
>> respectively for indicating socket memory pressure. This patch fixes
>> the pieces of code that make mixed use of both.
>>
>> Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> 
> So, this is undoing the unintended exposure of v2 functionality for

Exactly.

> the v1. I wonder if someone might have started depending upon that
> behavior but I am more convinced that no one is using v1's tcpmem
> accounting due to performance impact. So, this looks good to me.

Agreed. The performance impact is not negligible. While not accounting
tcpmem is also undesired for Resource Manager to do provision properly.
So we have to migrate to cgroupv2, and now we encountered a new issue.
Some discussion with Roman can be found here:

https://lore.kernel.org/netdev/29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com/

It would be great if you can shed some light on this!

> 
> Acked-by: Shakeel Butt <shakeelb@google.com>

Thanks!
	Abel

> 
> I do think we should start the deprecation process of v1's tcpmem accounting.