net/mptcp/pm_netlink.c | 2 ++ 1 file changed, 2 insertions(+)
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
net/mptcp/pm_netlink.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3e4ad801786f..d4cbf7dcf983 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1430,8 +1430,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
+ spin_lock_bh(&msk->pm.lock);
list_del(&entry->list);
kfree(entry);
+ spin_unlock_bh(&msk->pm.lock);
return true;
}
--
2.43.0
Hello, Thank you for this patch! On 04/09/2024 03:01, Edward Adam Davis wrote: > There are two paths to access mptcp_pm_del_add_timer, result in a race > condition: > > CPU1 CPU2 > ==== ==== > net_rx_action > napi_poll netlink_sendmsg > __napi_poll netlink_unicast > process_backlog netlink_unicast_kernel > __netif_receive_skb genl_rcv > __netif_receive_skb_one_core netlink_rcv_skb > NF_HOOK genl_rcv_msg > ip_local_deliver_finish genl_family_rcv_msg > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > tcp_data_queue remove_anno_list_by_saddr > mptcp_incoming_options mptcp_pm_del_add_timer > mptcp_pm_del_add_timer kfree(entry) > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > zone protected by "pm.lock", the entry will be released, which leads to the > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d Please add a Fixes tag and Cc stable. And add 'net' after PATCH in the subject: [PATCH net v3] > Signed-off-by: Edward Adam Davis <eadavis@qq.com> > --- > net/mptcp/pm_netlink.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c > index 3e4ad801786f..d4cbf7dcf983 100644 > --- a/net/mptcp/pm_netlink.c > +++ b/net/mptcp/pm_netlink.c > @@ -1430,8 +1430,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk, > > entry = mptcp_pm_del_add_timer(msk, addr, false); > if (entry) { > + spin_lock_bh(&msk->pm.lock); > list_del(&entry->list); > kfree(entry); > + spin_unlock_bh(&msk->pm.lock); Mmh, I can understand it would help to reduce issues here, but I don't think that's enough: in mptcp_pm_del_add_timer(), CPU1 can get the entry from the list under the lock, then immediately after, the free can happen on CPU2, while CPU1 is trying to access entry->add_timer outside the lock, no? Something like this: CPU1 CPU2 ==== ==== entry = (...) kfree(entry) entry->add_timer What about keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer"? I'm thinking about something like that to be applied *on top* of your patch, WDYT? https://lore.kernel.org/20240904170517.237863-2-matttbe@kernel.org Cheers, Matt -- Sponsored by the NGI0 Core fund.
On Wed, 4 Sep 2024 22:39:10 +0200, Matthieu Baerts wrote: >On 04/09/2024 03:01, Edward Adam Davis wrote: >> There are two paths to access mptcp_pm_del_add_timer, result in a race >> condition: >> >> CPU1 CPU2 >> ==== ==== >> net_rx_action >> napi_poll netlink_sendmsg >> __napi_poll netlink_unicast >> process_backlog netlink_unicast_kernel >> __netif_receive_skb genl_rcv >> __netif_receive_skb_one_core netlink_rcv_skb >> NF_HOOK genl_rcv_msg >> ip_local_deliver_finish genl_family_rcv_msg >> ip_protocol_deliver_rcu genl_family_rcv_msg_doit >> tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit >> tcp_v4_do_rcv mptcp_nl_remove_addrs_list >> tcp_rcv_established mptcp_pm_remove_addrs_and_subflows >> tcp_data_queue remove_anno_list_by_saddr >> mptcp_incoming_options mptcp_pm_del_add_timer >> mptcp_pm_del_add_timer kfree(entry) >> >> In remove_anno_list_by_saddr(running on CPU2), after leaving the critical >> zone protected by "pm.lock", the entry will be released, which leads to the >> occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). >> >> Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com >> Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > >Please add a Fixes tag and Cc stable. > >And add 'net' after PATCH in the subject: Got it, I have added them in V3 patch. > > [PATCH net v3] > >> Signed-off-by: Edward Adam Davis <eadavis@qq.com> >> --- >> net/mptcp/pm_netlink.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c >> index 3e4ad801786f..d4cbf7dcf983 100644 >> --- a/net/mptcp/pm_netlink.c >> +++ b/net/mptcp/pm_netlink.c >> @@ -1430,8 +1430,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk, >> >> entry = mptcp_pm_del_add_timer(msk, addr, false); >> if (entry) { >> + spin_lock_bh(&msk->pm.lock); >> list_del(&entry->list); >> kfree(entry); >> + spin_unlock_bh(&msk->pm.lock); > >Mmh, I can understand it would help to reduce issues here, but I don't >think that's enough: in mptcp_pm_del_add_timer(), CPU1 can get the entry >from the list under the lock, then immediately after, the free can >happen on CPU2, while CPU1 is trying to access entry->add_timer outside >the lock, no? Something like this: > > CPU1 CPU2 > ==== ==== > entry = (...) > kfree(entry) > entry->add_timer > > >What about keeping a reference to add_timer inside the lock, and calling >sk_stop_timer_sync() with this reference, instead of "entry->add_timer"? >I'm thinking about something like that to be applied *on top* of your >patch, WDYT? I strongly agree. This can avoid accessing the entry outside the lock. I have integrated your code to my patch. BR, Edward
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
net/mptcp/pm_netlink.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3e4ad801786f..7ddb373cc6ad 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -329,17 +329,21 @@ struct mptcp_pm_add_entry *
mptcp_pm_del_add_timer(struct mptcp_sock *msk,
const struct mptcp_addr_info *addr, bool check_id)
{
- struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
+ struct mptcp_pm_add_entry *entry;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1430,8 +1434,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
+ spin_lock_bh(&msk->pm.lock);
list_del(&entry->list);
kfree(entry);
+ spin_unlock_bh(&msk->pm.lock);
return true;
}
--
2.43.0
On 9/5/24 14:27, Edward Adam Davis wrote: > There are two paths to access mptcp_pm_del_add_timer, result in a race > condition: > > CPU1 CPU2 > ==== ==== > net_rx_action > napi_poll netlink_sendmsg > __napi_poll netlink_unicast > process_backlog netlink_unicast_kernel > __netif_receive_skb genl_rcv > __netif_receive_skb_one_core netlink_rcv_skb > NF_HOOK genl_rcv_msg > ip_local_deliver_finish genl_family_rcv_msg > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > tcp_data_queue remove_anno_list_by_saddr > mptcp_incoming_options mptcp_pm_del_add_timer > mptcp_pm_del_add_timer kfree(entry) > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > zone protected by "pm.lock", the entry will be released, which leads to the > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > Keeping a reference to add_timer inside the lock, and calling > sk_stop_timer_sync() with this reference, instead of "entry->add_timer". > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > Cc: stable@vger.kernel.org > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Signed-off-by: Edward Adam Davis <eadavis@qq.com> > --- > net/mptcp/pm_netlink.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c > index 3e4ad801786f..7ddb373cc6ad 100644 > --- a/net/mptcp/pm_netlink.c > +++ b/net/mptcp/pm_netlink.c > @@ -329,17 +329,21 @@ struct mptcp_pm_add_entry * > mptcp_pm_del_add_timer(struct mptcp_sock *msk, > const struct mptcp_addr_info *addr, bool check_id) > { > - struct mptcp_pm_add_entry *entry; > struct sock *sk = (struct sock *)msk; > + struct timer_list *add_timer = NULL; > + struct mptcp_pm_add_entry *entry; > > spin_lock_bh(&msk->pm.lock); > entry = mptcp_lookup_anno_list_by_saddr(msk, addr); > - if (entry && (!check_id || entry->addr.id == addr->id)) > + if (entry && (!check_id || entry->addr.id == addr->id)) { > entry->retrans_times = ADD_ADDR_RETRANS_MAX; > + add_timer = &entry->add_timer; > + } > spin_unlock_bh(&msk->pm.lock); > > - if (entry && (!check_id || entry->addr.id == addr->id)) > - sk_stop_timer_sync(sk, &entry->add_timer); > + /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */ > + if (add_timer) > + sk_stop_timer_sync(sk, add_timer); > > return entry; > } > @@ -1430,8 +1434,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk, > > entry = mptcp_pm_del_add_timer(msk, addr, false); > if (entry) { > + spin_lock_bh(&msk->pm.lock); > list_del(&entry->list); > kfree(entry); > + spin_unlock_bh(&msk->pm.lock); I'm sorry for the late feedback. I think this is not enough to fix races for good, i.e. mptcp_nl_remove_subflow_and_signal_addr() -> mptcp_pm_remove_anno_addr() -> remove_anno_list_by_saddr() could race with: mptcp_pm_remove_addrs() -> remove_anno_list_by_saddr() and both CPUs could see the same 'entry' returned by mptcp_pm_del_add_timer(). I think the list_del() in remove_anno_list_by_saddr() should moved under the pm lock protection inside mptcp_pm_del_add_timer(), and no need to add spin_lock_bh() around the kfree call. Thanks, Paolo
On Mon, 9 Sep 2024 15:07:21 +0200, Paolo Abeni wrote: > On 9/5/24 14:27, Edward Adam Davis wrote: > > There are two paths to access mptcp_pm_del_add_timer, result in a race > > condition: > > > > CPU1 CPU2 > > ==== ==== > > net_rx_action > > napi_poll netlink_sendmsg > > __napi_poll netlink_unicast > > process_backlog netlink_unicast_kernel > > __netif_receive_skb genl_rcv > > __netif_receive_skb_one_core netlink_rcv_skb > > NF_HOOK genl_rcv_msg > > ip_local_deliver_finish genl_family_rcv_msg > > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > > tcp_data_queue remove_anno_list_by_saddr > > mptcp_incoming_options mptcp_pm_del_add_timer > > mptcp_pm_del_add_timer kfree(entry) > > > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > > zone protected by "pm.lock", the entry will be released, which leads to the > > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > > > Keeping a reference to add_timer inside the lock, and calling > > sk_stop_timer_sync() with this reference, instead of "entry->add_timer". > > > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > > Cc: stable@vger.kernel.org > > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > > Signed-off-by: Edward Adam Davis <eadavis@qq.com> > > --- > > net/mptcp/pm_netlink.c | 14 ++++++++++---- > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c > > index 3e4ad801786f..7ddb373cc6ad 100644 > > --- a/net/mptcp/pm_netlink.c > > +++ b/net/mptcp/pm_netlink.c > > @@ -329,17 +329,21 @@ struct mptcp_pm_add_entry * > > mptcp_pm_del_add_timer(struct mptcp_sock *msk, > > const struct mptcp_addr_info *addr, bool check_id) > > { > > - struct mptcp_pm_add_entry *entry; > > struct sock *sk = (struct sock *)msk; > > + struct timer_list *add_timer = NULL; > > + struct mptcp_pm_add_entry *entry; > > > > spin_lock_bh(&msk->pm.lock); > > entry = mptcp_lookup_anno_list_by_saddr(msk, addr); > > - if (entry && (!check_id || entry->addr.id == addr->id)) > > + if (entry && (!check_id || entry->addr.id == addr->id)) { > > entry->retrans_times = ADD_ADDR_RETRANS_MAX; > > + add_timer = &entry->add_timer; > > + } > > spin_unlock_bh(&msk->pm.lock); > > > > - if (entry && (!check_id || entry->addr.id == addr->id)) > > - sk_stop_timer_sync(sk, &entry->add_timer); > > + /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */ > > + if (add_timer) > > + sk_stop_timer_sync(sk, add_timer); > > > > return entry; > > } > > @@ -1430,8 +1434,10 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk, > > > > entry = mptcp_pm_del_add_timer(msk, addr, false); > > if (entry) { > > + spin_lock_bh(&msk->pm.lock); > > list_del(&entry->list); > > kfree(entry); > > + spin_unlock_bh(&msk->pm.lock); > > I'm sorry for the late feedback. > > I think this is not enough to fix races for good, i.e. > > mptcp_nl_remove_subflow_and_signal_addr() -> mptcp_pm_remove_anno_addr() > -> remove_anno_list_by_saddr() > > could race with: > > mptcp_pm_remove_addrs() -> remove_anno_list_by_saddr() > > and both CPUs could see the same 'entry' returned by > mptcp_pm_del_add_timer(). Yes, you are right. > > I think the list_del() in remove_anno_list_by_saddr() should moved under > the pm lock protection inside mptcp_pm_del_add_timer(), and no need to > add spin_lock_bh() around the kfree call. Agreed, I will update the patch. BR, Edward
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
net/mptcp/pm_netlink.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3e4ad801786f..f195b577c367 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -331,15 +331,21 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
+ if (!check_id && entry)
+ list_del(&entry->list);
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1430,7 +1436,6 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
- list_del(&entry->list);
kfree(entry);
return true;
}
--
2.43.0
Hi Edward, On 10/09/2024 11:58, Edward Adam Davis wrote: > There are two paths to access mptcp_pm_del_add_timer, result in a race > condition: > > CPU1 CPU2 > ==== ==== > net_rx_action > napi_poll netlink_sendmsg > __napi_poll netlink_unicast > process_backlog netlink_unicast_kernel > __netif_receive_skb genl_rcv > __netif_receive_skb_one_core netlink_rcv_skb > NF_HOOK genl_rcv_msg > ip_local_deliver_finish genl_family_rcv_msg > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > tcp_data_queue remove_anno_list_by_saddr > mptcp_incoming_options mptcp_pm_del_add_timer > mptcp_pm_del_add_timer kfree(entry) > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > zone protected by "pm.lock", the entry will be released, which leads to the > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > Keeping a reference to add_timer inside the lock, and calling > sk_stop_timer_sync() with this reference, instead of "entry->add_timer". > > Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, > do not directly access any members of the entry outside the pm lock, which > can avoid similar "entry->x" uaf. > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > Cc: stable@vger.kernel.org > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Signed-off-by: Edward Adam Davis <eadavis@qq.com> It looks also good to me, but no need for me to add a reviewed-by tag I suppose. This v4 can be applied in netdev directly. Cheers, Matt -- Sponsored by the NGI0 Core fund.
On 9/10/24 11:58, Edward Adam Davis wrote: > There are two paths to access mptcp_pm_del_add_timer, result in a race > condition: > > CPU1 CPU2 > ==== ==== > net_rx_action > napi_poll netlink_sendmsg > __napi_poll netlink_unicast > process_backlog netlink_unicast_kernel > __netif_receive_skb genl_rcv > __netif_receive_skb_one_core netlink_rcv_skb > NF_HOOK genl_rcv_msg > ip_local_deliver_finish genl_family_rcv_msg > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > tcp_data_queue remove_anno_list_by_saddr > mptcp_incoming_options mptcp_pm_del_add_timer > mptcp_pm_del_add_timer kfree(entry) > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > zone protected by "pm.lock", the entry will be released, which leads to the > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > Keeping a reference to add_timer inside the lock, and calling > sk_stop_timer_sync() with this reference, instead of "entry->add_timer". > > Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, > do not directly access any members of the entry outside the pm lock, which > can avoid similar "entry->x" uaf. > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > Cc: stable@vger.kernel.org > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com>
Hi Edward, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10790383980 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/480c7ad533ba Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=888806 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
net/mptcp/pm_netlink.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 3e4ad801786f..f195b577c367 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -331,15 +331,21 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
+ if (!check_id && entry)
+ list_del(&entry->list);
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1430,7 +1436,6 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
- list_del(&entry->list);
kfree(entry);
return true;
}
--
2.43.0
Hi Edward, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10790389747 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/e58211a3272b Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=888810 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
Hi Edward, On 05/09/2024 14:27, Edward Adam Davis wrote: > There are two paths to access mptcp_pm_del_add_timer, result in a race > condition: > > CPU1 CPU2 > ==== ==== > net_rx_action > napi_poll netlink_sendmsg > __napi_poll netlink_unicast > process_backlog netlink_unicast_kernel > __netif_receive_skb genl_rcv > __netif_receive_skb_one_core netlink_rcv_skb > NF_HOOK genl_rcv_msg > ip_local_deliver_finish genl_family_rcv_msg > ip_protocol_deliver_rcu genl_family_rcv_msg_doit > tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit > tcp_v4_do_rcv mptcp_nl_remove_addrs_list > tcp_rcv_established mptcp_pm_remove_addrs_and_subflows > tcp_data_queue remove_anno_list_by_saddr > mptcp_incoming_options mptcp_pm_del_add_timer > mptcp_pm_del_add_timer kfree(entry) > > In remove_anno_list_by_saddr(running on CPU2), after leaving the critical > zone protected by "pm.lock", the entry will be released, which leads to the > occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). > > Keeping a reference to add_timer inside the lock, and calling > sk_stop_timer_sync() with this reference, instead of "entry->add_timer". > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > Cc: stable@vger.kernel.org > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> According to the doc [1], a 'Co-dev' tag is supposed to be added before this SoB: Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> But I'm sure that's fine without it. [1] https://docs.kernel.org/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by This v3 now looks good to me. I don't know if it needs to be reviewed by someone else as I'm now listed as co-developed. Cheers, Matt -- Sponsored by the NGI0 Core fund.
On Fri, 6 Sep 2024 20:55:20 +0200 Matthieu Baerts wrote: > > Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") > > Cc: stable@vger.kernel.org > > Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com > > Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > > According to the doc [1], a 'Co-dev' tag is supposed to be added before > this SoB: > > Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > > But I'm sure that's fine without it. To be clear, would you like us to pick this up directly for net? Or you'll send it back to us with other MPTCP patches?
Hi Jakub, On 07/09/2024 00:02, Jakub Kicinski wrote: > On Fri, 6 Sep 2024 20:55:20 +0200 Matthieu Baerts wrote: >>> Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") >>> Cc: stable@vger.kernel.org >>> Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com >>> Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d >>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> >> >> According to the doc [1], a 'Co-dev' tag is supposed to be added before >> this SoB: >> >> Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> >> >> But I'm sure that's fine without it. > > To be clear, would you like us to pick this up directly for net? Sorry, I forgot that: yes, can you pick this up directly please? I think that's best to do that for fixes that are ready. Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Edward, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Unstable: 1 failed test(s): mptcp_connect_mmap 🔴 - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10721999077 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/b6a976b908cd Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=887227 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
Hi Edward, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Script error! ❓ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10720554167 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/a896db33c525 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=887227 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
Hi Edward, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10693305915 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/39f9be3b7f68 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=886515 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2024 Red Hat, Inc.