[PATCH net 0/6] Make TCP-MD5-diag slightly less broken

Dmitry Safonov via B4 Relay posted 6 patches 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20241106-tcp-md5-diag-prep-v1-0-d62debf3dded@gmail.com
There is a newer version of this series
include/linux/inet_diag.h |  3 +-
include/net/tcp.h         |  1 -
net/ipv4/inet_diag.c      | 87 ++++++++++++++++++++++++++++++++++++++---------
net/ipv4/tcp_diag.c       | 69 ++++++++++++++++++-------------------
net/mptcp/diag.c          | 20 -----------
net/netlink/af_netlink.c  |  4 +--
net/tls/tls_main.c        | 17 ---------
7 files changed, 109 insertions(+), 92 deletions(-)
[PATCH net 0/6] Make TCP-MD5-diag slightly less broken
Posted by Dmitry Safonov via B4 Relay 2 weeks ago
My original intent was to replace the last non-upstream Arista's TCP-AO
piece. That is per-netns procfs seqfile which lists AO keys. In my view
an acceptable upstream alternative would be TCP-AO-diag uAPI.

So, I started by looking and reviewing TCP-MD5-diag code. And straight
away I saw a bunch of issues:

1. Similarly to TCP_MD5SIG_EXT, which doesn't check tcpm_flags for
   unknown flags and so being non-extendable setsockopt(), the same
   way tcp_diag_put_md5sig() dumps md5 keys in an array of
   tcp_diag_md5sig, which makes it ABI non-extendable structure
   as userspace can't tolerate any new members in it.

2. Inet-diag allocates netlink message for sockets in
   inet_diag_dump_one_icsk(), which uses a TCP-diag callback
   .idiag_get_aux_size(), that pre-calculates the needed space for
   TCP-diag related information. But as neither socket lock nor
   rcu_readlock() are held between allocation and the actual TCP
   info filling, the TCP-related space requirement may change before
   reaching tcp_diag_put_md5sig(). I.e., the number of TCP-MD5 keys on
   a socket. Thankfully, TCP-MD5-diag won't overwrite the skb, but will
   return EMSGSIZE, triggering WARN_ON() in inet_diag_dump_one_icsk().

3. Inet-diag "do" request* can create skb of any message required size.
   But "dump" request* the skb size, since d35c99ff77ec ("netlink: do
   not enter direct reclaim from netlink_dump()") is limited by
   32 KB. Having in mind that sizeof(struct tcp_diag_md5sig) = 100 bytes, 
   dumps for sockets that have more than 327 keys are going to fail
   (not counting other diag infos, which lower this limit futher).
   That is much lower than the number of TCP-MD5 keys that can be
   allocated on a socket with the current default
   optmem_max limit (128Kb).

So, then I went and written selftests for TCP-MD5-diag and besides
confirming that (2) and (3) are not theoretical issues, I also
discovered another issues, that I didn't notice on code inspection:

4. nlattr::nla_len is __u16, which limits the largest netlink attibute
   by 64Kb or by 655 tcp_diag_md5sig keys in the diag array. What
   happens de-facto is that the netlink attribute gets u16 overflow,
   breaking the userspace parsing - RTA_NEXT(), that should point
   to the next attribute, points into the middle of md5 keys array.

In this patch set issues (2) and (4) are addressed.
(2) by not returning EMSGSIZE when the dump raced with modifying
TCP-MD5 keys on a socket, but mark the dump inconsistent by setting
NLM_F_DUMP_INTR nlmsg flag. Which changes uAPI in situations where
previously kernel did WARN() and errored the dump.
(4) by artificially limiting the maximum attribute size by U16_MAX - 1.

In order to remove the new limit from (4) solution, my plan is to
convert the dump of TCP-MD5 keys from an array to
NL_ATTR_TYPE_NESTED_ARRAY (or alike), which should also address (1).
And for (3), it's needed to teach tcp-diag how-to remember not only
socket on which previous recvmsg() stopped, but potentially TCP-MD5
key as well.

I plan in the next part of patch set address (3), (1) and the new limit
for (4), together with adding new TCP-AO-diag.

* Terminology from Documentation/userspace-api/netlink/intro.rst

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
---
Dmitry Safonov (6):
      net/diag: Do not race on dumping MD5 keys with adding new MD5 keys
      net/diag: Warn only once on EMSGSIZE
      net/diag: Pre-allocate optional info only if requested
      net/diag: Always pre-allocate tcp_ulp info
      net/diag: Limit TCP-MD5-diag array by max attribute length
      net/netlink: Correct the comment on netlink message max cap

 include/linux/inet_diag.h |  3 +-
 include/net/tcp.h         |  1 -
 net/ipv4/inet_diag.c      | 87 ++++++++++++++++++++++++++++++++++++++---------
 net/ipv4/tcp_diag.c       | 69 ++++++++++++++++++-------------------
 net/mptcp/diag.c          | 20 -----------
 net/netlink/af_netlink.c  |  4 +--
 net/tls/tls_main.c        | 17 ---------
 7 files changed, 109 insertions(+), 92 deletions(-)
---
base-commit: 2e1b3cc9d7f790145a80cb705b168f05dab65df2
change-id: 20241106-tcp-md5-diag-prep-2f0dcf371d90

Best regards,
-- 
Dmitry Safonov <0x7f454c46@gmail.com>
Re: [PATCH net 0/6] Make TCP-MD5-diag slightly less broken
Posted by MPTCP CI 2 weeks ago
Hi Dmitry,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Unstable: 1 failed test(s): bpftest_test_progs_mptcp 🔴
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/11710160691

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/67a329533b43
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=907077


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)