[PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.

Kuniyuki Iwashima posted 13 patches 1 month, 3 weeks ago
[PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by Kuniyuki Iwashima 1 month, 3 weeks ago
Some protocols (e.g., TCP, UDP) have their own memory accounting for
socket buffers and charge that memory to global per-protocol counters
such as /proc/sys/net/ipv4/tcp_mem.
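
To make the global accounting concrete, here is a heavily simplified,
illustrative userspace sketch (not the kernel code; names, values, and the
lack of atomics/per-CPU batching are all simplifications) of sockets
charging pages against one shared counter that is compared with the three
tcp_mem thresholds (min, pressure, max):

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative thresholds in pages, standing in for tcp_mem[0..2]:
     * below tcp_mem[0] no limiting, above tcp_mem[1] "memory pressure",
     * above tcp_mem[2] new charges fail.
     */
    static long tcp_mem[3] = { 382614, 510152, 765228 };

    /* One counter shared by every TCP socket on the host, regardless of
     * which cgroup the socket belongs to -- hence the noisy-neighbour
     * problem described above.
     */
    static long memory_allocated;

    static bool proto_charge(long pages)
    {
            long new_total = memory_allocated + pages;

            if (new_total > tcp_mem[2])
                    return false;   /* over the hard limit */

            memory_allocated = new_total;

            if (new_total > tcp_mem[1])
                    printf("protocol under memory pressure\n");

            return true;
    }

    int main(void)
    {
            printf("charge accepted: %d\n", proto_charge(1000));
            return 0;
    }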

When running under a non-root cgroup, this memory is also charged to
the memcg under the "sock" entry in memory.stat.
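
The per-memcg charge can be observed as the "sock" counter (in bytes) in
memory.stat, e.g.:

    # Socket buffer memory currently charged to this cgroup.
    grep ^sock /sys/fs/cgroup/<group>/memory.stat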

Sockets using such protocols are still subject to the global limits
and are thus affected by noisy neighbours outside the cgroup.

This makes it difficult to accurately estimate and configure appropriate
global limits.

If all workloads were guaranteed to be controlled under memcg, the issue
could be worked around by setting tcp_mem[0..2] to UINT_MAX.
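
For reference, that workaround looks like this (values illustrative;
tcp_mem is in pages and UINT_MAX is 4294967295):

    # Effectively disable the global TCP limits.  Only safe when every
    # workload on the host is known to be constrained by a memcg.
    sysctl -w net.ipv4.tcp_mem="4294967295 4294967295 4294967295"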

However, this assumption does not always hold, and a single workload that
opts out of memcg can consume memory up to the global limit, which is
problematic.

This series introduces a new per-memcg knob to allow decoupling memcg
from the global memory accounting.  With the decoupling in place, tcp_mem
can stay at the Linux defaults, which then only apply to workloads that do
not use memcg, simplifying the memcg configuration while keeping the
global limits within a reasonable range.
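
A minimal usage sketch of the new knob (added in patch 11; assuming it is
a boolean cgroup v2 toggle written as 0 or 1):

    # Opt this cgroup's sockets out of the global per-protocol limits;
    # their buffer memory is then accounted to the memcg only.
    echo 1 > /sys/fs/cgroup/<group>/memory.socket_isolated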

Overview of the series:

  patch 1 is a bug fix for MPTCP
  patches 2-9 move sk->sk_memcg accesses to a single place
  patch 10 moves sk_memcg under CONFIG_MEMCG
  patches 11 & 12 introduce a flag and store it in the lowest bit of
    sk->sk_memcg (see the sketch after this list)
  patch 13 decouples memcg from sk_prot->memory_allocated based on the flag
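
The bit-stuffing in patches 11 & 12 and the decision in patch 13 can be
roughly sketched as follows (generic userspace C for illustration only;
the actual patches use kernel types and helpers):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define SK_MEMCG_ISOLATED 0x1UL   /* lowest bit of an aligned pointer */

    struct memcg;   /* opaque here; aligned, so bit 0 is always free */

    /* Pack the isolation flag into the unused low bit of the pointer. */
    static void *memcg_pack(struct memcg *memcg, bool isolated)
    {
            return (void *)((uintptr_t)memcg |
                            (isolated ? SK_MEMCG_ISOLATED : 0));
    }

    static struct memcg *memcg_unpack(void *sk_memcg)
    {
            return (struct memcg *)((uintptr_t)sk_memcg & ~SK_MEMCG_ISOLATED);
    }

    static bool memcg_isolated(void *sk_memcg)
    {
            return (uintptr_t)sk_memcg & SK_MEMCG_ISOLATED;
    }

    /* Patch 13's idea in one branch: isolated sockets skip the global
     * per-protocol counter and are charged to their memcg only.
     */
    static void charge(void *sk_memcg, long pages)
    {
            if (!memcg_isolated(sk_memcg))
                    printf("charge %ld pages to the global counter\n", pages);
            printf("charge %ld pages to the memcg\n", pages);
    }

    int main(void)
    {
            struct memcg *memcg = (struct memcg *)0x1000;  /* fake address */
            void *tagged = memcg_pack(memcg, true);

            charge(tagged, 4);
            printf("memcg pointer restored: %p\n", (void *)memcg_unpack(tagged));
            return 0;
    }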


Kuniyuki Iwashima (13):
  mptcp: Fix up subflow's memcg when CONFIG_SOCK_CGROUP_DATA=n.
  mptcp: Use tcp_under_memory_pressure() in mptcp_epollin_ready().
  tcp: Simplify error path in inet_csk_accept().
  net: Call trace_sock_exceed_buf_limit() for memcg failure with
    SK_MEM_RECV.
  net: Clean up __sk_mem_raise_allocated().
  net-memcg: Introduce mem_cgroup_from_sk().
  net-memcg: Introduce mem_cgroup_sk_enabled().
  net-memcg: Pass struct sock to mem_cgroup_sk_(un)?charge().
  net-memcg: Pass struct sock to mem_cgroup_sk_under_memory_pressure().
  net: Define sk_memcg under CONFIG_MEMCG.
  net-memcg: Add memory.socket_isolated knob.
  net-memcg: Store memcg->socket_isolated in sk->sk_memcg.
  net-memcg: Allow decoupling memcg from global protocol memory
    accounting.

 Documentation/admin-guide/cgroup-v2.rst | 16 +++++
 include/linux/memcontrol.h              | 50 ++++++++-----
 include/net/proto_memory.h              | 10 ++-
 include/net/sock.h                      | 66 +++++++++++++++++
 include/net/tcp.h                       | 10 ++-
 mm/memcontrol.c                         | 84 +++++++++++++++++++---
 net/core/sock.c                         | 95 ++++++++++++++++---------
 net/ipv4/inet_connection_sock.c         | 35 +++++----
 net/ipv4/tcp_output.c                   | 13 ++--
 net/mptcp/protocol.h                    |  4 +-
 net/mptcp/subflow.c                     | 11 +--
 11 files changed, 299 insertions(+), 95 deletions(-)

-- 
2.50.0.727.gbf7dc18ff4-goog
Re: [PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by Shakeel Butt 1 month, 3 weeks ago
On Mon, Jul 21, 2025 at 08:35:19PM +0000, Kuniyuki Iwashima wrote:
> Some protocols (e.g., TCP, UDP) has their own memory accounting for
> socket buffers and charge memory to global per-protocol counters such
> as /proc/net/ipv4/tcp_mem.
> 
> When running under a non-root cgroup, this memory is also charged to
> the memcg as sock in memory.stat.
> 
> Sockets using such protocols are still subject to the global limits,
> thus affected by a noisy neighbour outside cgroup.
> 
> This makes it difficult to accurately estimate and configure appropriate
> global limits.
> 
> If all workloads were guaranteed to be controlled under memcg, the issue
> can be worked around by setting tcp_mem[0~2] to UINT_MAX.
> 
> However, this assumption does not always hold, and a single workload that
> opts out of memcg can consume memory up to the global limit, which is
> problematic.
> 
> This series introduces a new per-memcg know to allow decoupling memcg
> from the global memory accounting, which simplifies the memcg
> configuration while keeping the global limits within a reasonable range.

Sorry, the above para is confusing. What is a per-memcg know? Or maybe it
is knob. Also, please go into a bit more detail on how decoupling keeps
the global limits within a reasonable range.
Re: [PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by Eric Dumazet 1 month, 3 weeks ago
On Tue, Jul 22, 2025 at 8:04 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Mon, Jul 21, 2025 at 08:35:19PM +0000, Kuniyuki Iwashima wrote:
> > Some protocols (e.g., TCP, UDP) has their own memory accounting for
> > socket buffers and charge memory to global per-protocol counters such
> > as /proc/net/ipv4/tcp_mem.
> >
> > When running under a non-root cgroup, this memory is also charged to
> > the memcg as sock in memory.stat.
> >
> > Sockets using such protocols are still subject to the global limits,
> > thus affected by a noisy neighbour outside cgroup.
> >
> > This makes it difficult to accurately estimate and configure appropriate
> > global limits.
> >
> > If all workloads were guaranteed to be controlled under memcg, the issue
> > can be worked around by setting tcp_mem[0~2] to UINT_MAX.
> >
> > However, this assumption does not always hold, and a single workload that
> > opts out of memcg can consume memory up to the global limit, which is
> > problematic.
> >
> > This series introduces a new per-memcg know to allow decoupling memcg
> > from the global memory accounting, which simplifies the memcg
> > configuration while keeping the global limits within a reasonable range.
>
> Sorry, the above para is confusing. What is per-memcg know? Or maybe it
> is knob. Also please go a bit in more detail how decoupling helps the
> global limits within a reasonable range?

The intent is to no longer have to increase tcp_mem[0..2] just to
allow a big job to use 90% of physical memory entirely for TCP sockets
and buffers.

Leave the Linux default values; they have been considered reasonable
for decades.

They will only be used by applications not using memcg to limit TCP
memory usage.
Re: [PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by MPTCP CI 1 month, 3 weeks ago
Hi Kuniyuki,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16436561761

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/fdb62a4fb078
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=984458


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: [PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by MPTCP CI 1 month, 3 weeks ago
Hi Kuniyuki,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Critical: Global Timeout ❌
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16436561761

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/fdb62a4fb078
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=984458


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: [PATCH v1 net-next 00/13] net-memcg: Allow decoupling memcg from sk->sk_prot->memory_allocated.
Posted by MPTCP CI 1 month, 3 weeks ago
Hi Kuniyuki,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Unstable: 2 failed test(s): packetdrill_sockopts selftest_mptcp_join 🔴
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Unstable: 1 failed test(s): bpftest_test_progs-cpuv4_mptcp 🔴
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16427954278

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/2b90b2bcf308
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=984458


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)