From: Menglong Dong <imagedong@tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET4_POST_BIND() in
__inet_bind() is not handled properly. When the return value
is non-zero, __inet_bind() just sets inet_saddr and inet_rcv_saddr
to 0 and exits:

	err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
	if (err) {
		inet->inet_saddr = inet->inet_rcv_saddr = 0;
		goto out_release_sock;
	}

Take UDP as an example. A UDP socket is added to
'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' once
sk->sk_prot->get_port() succeeds. If 'inet->inet_rcv_saddr' was
specified in bind(), 'sk' is left in an 'hslot2' of 'hash2' that it
no longer belongs to (because its address has been reset to 0), and
received UDP packets will not be delivered to this sock. If
'inet->inet_rcv_saddr' was not specified, the sock keeps working and
can receive packets properly, which is weird, as the 'bind()' has
already failed.

I'm not sure what should be done here. Maybe we should unhash the
sock for UDP, so that the user can try to bind another port?
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
net/ipv4/af_inet.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 04067b249bf3..9e5710f40a39 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -530,7 +530,14 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 	if (!(flags & BIND_FROM_BPF)) {
 		err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
 		if (err) {
+			if (sk->sk_prot == &udp_prot)
+				sk->sk_prot->unhash(sk);
+			else if (sk->sk_prot == &tcp_prot)
+				inet_put_port(sk);
+
 			inet->inet_saddr = inet->inet_rcv_saddr = 0;
+			err = -EPERM;
+
 			goto out_release_sock;
 		}
 	}
--
2.30.2
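To make the two UDP cases from the commit message concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: struct toy_sock, toy_slot(), toy_hash(), toy_deliver() and the two toy_fail_*() helpers are hypothetical names, and the XOR-based slot function merely stands in for the real hash. It also assumes that lookup probes the slot for the exact destination address first and the wildcard slot second.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 8u

/* Toy stand-in for a socket: only the fields the lookup key depends on. */
struct toy_sock {
	uint32_t rcv_saddr;	/* bound local address, 0 means "any" */
	uint16_t port;		/* bound local port */
	unsigned int slot;	/* slot chosen when the socket was hashed */
	bool hashed;
};

/* Hypothetical slot function; the real hash is different. */
unsigned int toy_slot(uint32_t addr, uint16_t port)
{
	return (addr ^ port) % TOY_SLOTS;
}

/* Models a successful get_port(): hash by the current (addr, port). */
void toy_hash(struct toy_sock *sk)
{
	sk->slot = toy_slot(sk->rcv_saddr, sk->port);
	sk->hashed = true;
}

/* A socket is found only in the slot derived from the probed key. */
static bool toy_match(const struct toy_sock *sk, uint32_t key_addr,
		      uint16_t dport)
{
	return sk->hashed &&
	       sk->slot == toy_slot(key_addr, dport) &&
	       sk->rcv_saddr == key_addr &&
	       sk->port == dport;
}

/* Receive path: probe the exact address first, then the wildcard. */
bool toy_deliver(const struct toy_sock *sk, uint32_t daddr, uint16_t dport)
{
	return toy_match(sk, daddr, dport) || toy_match(sk, 0, dport);
}

/* Failure path as described: address zeroed, socket stays hashed. */
void toy_fail_keep_hashed(struct toy_sock *sk)
{
	sk->rcv_saddr = 0;
}

/* Failure path with unhashing: the socket leaves the table entirely. */
void toy_fail_unhash(struct toy_sock *sk)
{
	sk->hashed = false;
	sk->rcv_saddr = 0;
}
```

In this model, a socket hashed with a specific address and then passed through toy_fail_keep_hashed() is stranded in a slot that neither probe matches, while a socket hashed with the wildcard address keeps receiving after the same failure. Only toy_fail_unhash() leaves both cases in a consistent "not bound" state.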
On Mon, 27 Dec 2021 14:20:35 +0800 menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> The return value of BPF_CGROUP_RUN_PROG_INET4_POST_BIND() in
> __inet_bind() is not handled properly. When the return value
> is non-zero, __inet_bind() just sets inet_saddr and inet_rcv_saddr
> to 0 and exits:
>
> 	err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
> 	if (err) {
> 		inet->inet_saddr = inet->inet_rcv_saddr = 0;
> 		goto out_release_sock;
> 	}
>
> Take UDP as an example. A UDP socket is added to
> 'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' once
> sk->sk_prot->get_port() succeeds. If 'inet->inet_rcv_saddr' was
> specified in bind(), 'sk' is left in an 'hslot2' of 'hash2' that it
> no longer belongs to (because its address has been reset to 0), and
> received UDP packets will not be delivered to this sock. If
> 'inet->inet_rcv_saddr' was not specified, the sock keeps working and
> can receive packets properly, which is weird, as the 'bind()' has
> already failed.
>
> I'm not sure what should be done here. Maybe we should unhash the
> sock for UDP, so that the user can try to bind another port?
Enumerating the L4 unwind paths in L3 code seems like a fairly clear
layering violation. A new callback to undo ->sk_prot->get_port() may
be better.
Does IPv6 not need a similar change?
You need to provide a selftest to validate the expected behavior.
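As a rough illustration of the callback idea, the sketch below gives each protocol an undo hook next to get_port(), so the generic bind path never compares against a specific protocol. Everything here is an assumption made for illustration: struct toy_proto, the put_port member name, toy_bind() and the hook functions are hypothetical and do not reproduce any existing kernel interface.

```c
#include <errno.h>
#include <stddef.h>

struct toy_sock;

/* Toy per-protocol operations table (hypothetical, for illustration). */
struct toy_proto {
	int  (*get_port)(struct toy_sock *sk, unsigned short port);
	void (*put_port)(struct toy_sock *sk);	/* undoes get_port() */
};

struct toy_sock {
	const struct toy_proto *prot;
	unsigned int saddr;	/* bound address, 0 when unbound */
	unsigned short num;	/* bound port, 0 when unbound */
	int hashed;		/* non-zero while in the lookup table */
};

/* Protocol side: claim the port and hash the socket. */
int toy_udp_get_port(struct toy_sock *sk, unsigned short port)
{
	sk->num = port;
	sk->hashed = 1;
	return 0;
}

/* Protocol side: release the port and unhash the socket. */
void toy_udp_put_port(struct toy_sock *sk)
{
	sk->hashed = 0;
	sk->num = 0;
}

const struct toy_proto toy_udp_prot = {
	.get_port = toy_udp_get_port,
	.put_port = toy_udp_put_port,
};

/* Stand-ins for a post-bind hook that allows or rejects the bind. */
int toy_hook_allow(struct toy_sock *sk) { (void)sk; return 0; }
int toy_hook_deny(struct toy_sock *sk)  { (void)sk; return 1; }

/* Generic bind path: unwinds through the callback, whatever the protocol. */
int toy_bind(struct toy_sock *sk, unsigned int addr, unsigned short port,
	     int (*post_bind)(struct toy_sock *sk))
{
	int err;

	sk->saddr = addr;
	err = sk->prot->get_port(sk, port);
	if (err) {
		sk->saddr = 0;
		return err;
	}

	if (post_bind && post_bind(sk)) {
		if (sk->prot->put_port)
			sk->prot->put_port(sk);
		sk->saddr = 0;
		return -EPERM;
	}

	return 0;
}
```

With this shape, a bind rejected by the hook leaves the socket fully unbound, and a later toy_bind() on another port can succeed. An IPv6 bind path could call the same callback without duplicating the per-protocol branches.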
On Thu, Dec 30, 2021 at 5:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Enumerating the L4 unwind paths in L3 code seems like a fairly clear
> layering violation. A new callback to undo ->sk_prot->get_port() may
> be better.
Yeah, it seems there isn't an easier way to solve this problem; a new
callback is needed.
>
> Does IPv6 not need a similar change?
>
IPv6 needs the same change too. This patch was just meant to gather suggestions :/
> You need to provide a selftest to validate the expected behavior.
I'll add it.
Thanks!
Menglong Dong