[PATCH mptcp-net 0/9] net: sysctl: avoid using current->nsproxy

Matthieu Baerts (NGI0) posted 9 patches 1 week, 6 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20250107-mptcp-sysfs-netns-v1-0-2fa7075d9970@kernel.org
There is a newer version of this series
net/mptcp/ctrl.c  | 17 +++++++++--------
net/rds/tcp.c     | 39 ++++++++++++++++++++++++++++++++-------
net/sctp/sysctl.c | 14 ++++++++------
3 files changed, 49 insertions(+), 21 deletions(-)
[PATCH mptcp-net 0/9] net: sysctl: avoid using current->nsproxy
Posted by Matthieu Baerts (NGI0) 1 week, 6 days ago
As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
structure via 'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'net' or 'pernet' structure can be obtained from the table->data
using container_of().

Note that table->data could also be used directly in more places, but
that would increase the size of this fix to replace all accesses via
'net'.

Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
in MPTCP, SCTP and RDS. There are multiple patches to ease the
backports.

Patch 1 is not directly linked to this, but it is a small fix for MPTCP
available_schedulers sysctl knob to explicitly mark it as read-only.

Please note that this series does not address Al's comment [2]. In SCTP,
some sysctl knobs set other sysfs-exposed variables for the min/max: two
processes could then write two linked values at the same time, resulting
in new values being outside the new boundaries.

Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Link: https://lore.kernel.org/netdev/20250105211158.GL1977892@ZenIV/ [2]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (9):
      mptcp: sysctl: avail sched: remove write access
      mptcp: sysctl: sched: avoid using current->nsproxy
      mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
      sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
      sctp: sysctl: rto_min/max: avoid using current->nsproxy
      sctp: sysctl: auth_enable: avoid using current->nsproxy
      sctp: sysctl: udp_port: avoid using current->nsproxy
      sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
      rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy

 net/mptcp/ctrl.c  | 17 +++++++++--------
 net/rds/tcp.c     | 39 ++++++++++++++++++++++++++++++++-------
 net/sctp/sysctl.c | 14 ++++++++------
 3 files changed, 49 insertions(+), 21 deletions(-)
---
base-commit: 474ab61b2b3a9a0c31267efe807c4fed12d66d28
change-id: 20250105-mptcp-sysfs-netns-23723ba85401

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>
Re: [PATCH mptcp-net 0/9] net: sysctl: avoid using current->nsproxy
Posted by Mat Martineau 1 week, 6 days ago
On Tue, 7 Jan 2025, Matthieu Baerts (NGI0) wrote:

> As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
> structure via 'current' is not recommended for different reasons:
>
> - Inconsistency: getting info from the reader's/writer's netns vs only
>  from the opener's netns.
>
> - current->nsproxy can be NULL in some cases, resulting in an 'Oops'
>  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
>  syzbot [1] using acct(2).
>
> The 'net' or 'pernet' structure can be obtained from the table->data
> using container_of().
>
> Note that table->data could also be used directly in more places, but
> that would increase the size of this fix to replace all accesses via
> 'net'.
>
> Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
> in MPTCP, SCTP and RDS. There are multiple patches to ease the
> backports.
>
> Patch 1 is not directly linked to this, but it is a small fix for MPTCP
> available_schedulers sysctl knob to explicitly mark it as read-only.
>
> Please note that this series does not address Al's comment [2]. In SCTP,
> some sysctl knobs set other sysfs-exposed variables for the min/max: two
> processes could then write two linked values at the same time, resulting
> in new values being outside the new boundaries.


Thanks Matthieu,

The mptcp patches (1-3) LGTM:

Reviewed-by: Mat Martineau <martineau@kernel.org>


The sctp and rds patches look fine as well.


- Mat


>
> Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
> Link: https://lore.kernel.org/netdev/20250105211158.GL1977892@ZenIV/ [2]
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Matthieu Baerts (NGI0) (9):
>      mptcp: sysctl: avail sched: remove write access
>      mptcp: sysctl: sched: avoid using current->nsproxy
>      mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
>      sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
>      sctp: sysctl: rto_min/max: avoid using current->nsproxy
>      sctp: sysctl: auth_enable: avoid using current->nsproxy
>      sctp: sysctl: udp_port: avoid using current->nsproxy
>      sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
>      rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
>
> net/mptcp/ctrl.c  | 17 +++++++++--------
> net/rds/tcp.c     | 39 ++++++++++++++++++++++++++++++++-------
> net/sctp/sysctl.c | 14 ++++++++------
> 3 files changed, 49 insertions(+), 21 deletions(-)
> ---
> base-commit: 474ab61b2b3a9a0c31267efe807c4fed12d66d28
> change-id: 20250105-mptcp-sysfs-netns-23723ba85401
>
> Best regards,
> -- 
> Matthieu Baerts (NGI0) <matttbe@kernel.org>
>
>
>
Re: [PATCH mptcp-net 0/9] net: sysctl: avoid using current->nsproxy
Posted by Matthieu Baerts 1 week, 6 days ago
Hi Mat,

On 08/01/2025 02:49, Mat Martineau wrote:
> On Tue, 7 Jan 2025, Matthieu Baerts (NGI0) wrote:
> 
>> As pointed out by Al Viro and Eric Dumazet in [1], using the 'net'
>> structure via 'current' is not recommended for different reasons:
>>
>> - Inconsistency: getting info from the reader's/writer's netns vs only
>>  from the opener's netns.
>>
>> - current->nsproxy can be NULL in some cases, resulting in an 'Oops'
>>  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
>>  syzbot [1] using acct(2).
>>
>> The 'net' or 'pernet' structure can be obtained from the table->data
>> using container_of().
>>
>> Note that table->data could also be used directly in more places, but
>> that would increase the size of this fix to replace all accesses via
>> 'net'.
>>
>> Patches 2-9 remove access of net via current->nsproxy in sysfs handlers
>> in MPTCP, SCTP and RDS. There are multiple patches to ease the
>> backports.
>>
>> Patch 1 is not directly linked to this, but it is a small fix for MPTCP
>> available_schedulers sysctl knob to explicitly mark it as read-only.
>>
>> Please note that this series does not address Al's comment [2]. In SCTP,
>> some sysctl knobs set other sysfs-exposed variables for the min/max: two
>> processes could then write two linked values at the same time, resulting
>> in new values being outside the new boundaries.
> 
> 
> Thanks Matthieu,
> 
> The mptcp patches (1-3) LGTM:
> 
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> 
> 
> The sctp and rds patches look fine as well.

Thank you for this review!

Patches are now in our tree (fixes for -net) with your RvB tag on the 3
first ones. I will upstream them later on today.

New patches for t/upstream-net and t/upstream:
- 968e9bb5a4cd: mptcp: sysctl: avail sched: remove write access
- eda363130f80: mptcp: sysctl: sched: avoid using current->nsproxy
- c838529f87ee: mptcp: sysctl: blackhole timeout: avoid using
current->nsproxy
- fb1a7a8fb16c: sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
- a18c9552dca0: sctp: sysctl: rto_min/max: avoid using current->nsproxy
- 8408f9182e52: sctp: sysctl: auth_enable: avoid using current->nsproxy
- c45cb99c9aa0: sctp: sysctl: udp_port: avoid using current->nsproxy
- 3df3f684ee7e: sctp: sysctl: plpmtud_probe_interval: avoid using
current->nsproxy
- a77b912deb35: rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using
current->nsproxy
- Results: bd3d4b1ab1bc..d66993cd4a59 (export-net)
- Results: ebe8ebeee62e..f289bc80702c (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/6c101bff7e1613c0087472d2ee78c986e221435e/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/0e96d0da96b041f087c5c0b46bc34978572ccffe/checks

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net 0/9] net: sysctl: avoid using current->nsproxy
Posted by MPTCP CI 1 week, 6 days ago
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12656267762

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/6e18f999ad48
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=923074


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)