net/ipv4/ipmr.c | 2 ++ 1 file changed, 2 insertions(+)
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the lock is not held, causing the following error
when `RCU_PROVE` is not held.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by RetransmitAggre/3519:
#0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
#1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
stack backtrace:
lockdep_rcu_suspicious
mr_table_dump
ipmr_rtm_dumproute
rtnl_dump_all
rtnl_dumpit
netlink_dump
__netlink_dump_start
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds
`rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to
prevent RCU-list traversal in non-reader section.
Since `mr_table_dump()` is the only function that touches the list, that
might be the only critical section in `ipmr_rtm_dumproute()` that needs
to be protected in ipmr_rtm_dumproute().
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
---
net/ipv4/ipmr.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
return -ENOENT;
}
+ rcu_read_lock();
err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
&mfc_unres_lock, &filter);
+ rcu_read_unlock();
return skb->len ? : err;
}
---
base-commit: 25d70702142ac2115e75e01a0a985c6ea1d78033
change-id: 20241107-ipmr_rcu-291d85400b16
Best regards,
--
Breno Leitao <leitao@debian.org>
On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote: > > Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the > following code flow, the lock is not held, causing the following error > when `RCU_PROVE` is not held. > > 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N > ----------------------------- > net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!! > > rcu_scheduler_active = 2, debug_locks = 1 > 2 locks held by RetransmitAggre/3519: > #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290 > #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90 > > stack backtrace: > lockdep_rcu_suspicious > mr_table_dump > ipmr_rtm_dumproute > rtnl_dump_all > rtnl_dumpit > netlink_dump > __netlink_dump_start > rtnetlink_rcv_msg > netlink_rcv_skb > netlink_unicast > netlink_sendmsg > > Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds > `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to > prevent RCU-list traversal in non-reader section. > > Since `mr_table_dump()` is the only function that touches the list, that > might be the only critical section in `ipmr_rtm_dumproute()` that needs > to be protected in ipmr_rtm_dumproute(). > > Signed-off-by: Breno Leitao <leitao@debian.org> > Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") > --- > net/ipv4/ipmr.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c > index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644 > --- a/net/ipv4/ipmr.c > +++ b/net/ipv4/ipmr.c > @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb) > NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist"); > return -ENOENT; > } > + rcu_read_lock(); > err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute, > &mfc_unres_lock, &filter); > + rcu_read_unlock(); > return skb->len ? : err; > } > > What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ? In my opinion, since we still hold RTNL in these paths, we should change the lockdep annotation. Then later we can remove RTNL from these dump operations. diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb 100644 --- a/net/ipv4/ipmr_base.c +++ b/net/ipv4/ipmr_base.c @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb, if (filter->filter_set) flags |= NLM_F_DUMP_FILTERED; - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) { + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list, + lockdep_rtnl_is_held()) { if (e < s_e) goto next_entry; if (filter->dev &&
Hello Eric, On Thu, Nov 07, 2024 at 02:13:14PM +0100, Eric Dumazet wrote: > On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote: > > > > Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the > > following code flow, the lock is not held, causing the following error > > when `RCU_PROVE` is not held. > > > > 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N > > ----------------------------- > > net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!! > > > > rcu_scheduler_active = 2, debug_locks = 1 > > 2 locks held by RetransmitAggre/3519: > > #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290 > > #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90 > > > > stack backtrace: > > lockdep_rcu_suspicious > > mr_table_dump > > ipmr_rtm_dumproute > > rtnl_dump_all > > rtnl_dumpit > > netlink_dump > > __netlink_dump_start > > rtnetlink_rcv_msg > > netlink_rcv_skb > > netlink_unicast > > netlink_sendmsg > > > > Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds > > `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to > > prevent RCU-list traversal in non-reader section. > > > > Since `mr_table_dump()` is the only function that touches the list, that > > might be the only critical section in `ipmr_rtm_dumproute()` that needs > > to be protected in ipmr_rtm_dumproute(). > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") > > --- > > net/ipv4/ipmr.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c > > index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644 > > --- a/net/ipv4/ipmr.c > > +++ b/net/ipv4/ipmr.c > > @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb) > > NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist"); > > return -ENOENT; > > } > > + rcu_read_lock(); > > err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute, > > &mfc_unres_lock, &filter); > > + rcu_read_unlock(); > > return skb->len ? : err; > > } > > > > > > What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ? That one might require as well. > In my opinion, since we still hold RTNL in these paths, we should > change the lockdep annotation. I don't have much experience mixing locks like this. Is it safe to mix and match rtnl and RCUs like this? I have the impression that, when iterating a RCU protected list *without* being in the read-side critical sections, the RCU doesn't know that someone might be traversing the list, and remove the element mid air (mroute_clean_tables()?). Is this model incorrect? > Then later we can remove RTNL from these dump operations. Do you mean that, execute the dump operation without holding the RTNL, thus, relying solely on RCU? > diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c > index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb > 100644 > --- a/net/ipv4/ipmr_base.c > +++ b/net/ipv4/ipmr_base.c > @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb, > if (filter->filter_set) > flags |= NLM_F_DUMP_FILTERED; > > - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) { > + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list, > + lockdep_rtnl_is_held()) { > if (e < s_e) > goto next_entry; > if (filter->dev && Clarifying next steps: Would you like me to review/test and submit, or are you planning to send it officially? Thanks for your feedback, --breno
On Thu, Nov 7, 2024 at 2:51 PM Breno Leitao <leitao@debian.org> wrote: > > Hello Eric, > > On Thu, Nov 07, 2024 at 02:13:14PM +0100, Eric Dumazet wrote: > > On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote: > > > > > > Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the > > > following code flow, the lock is not held, causing the following error > > > when `RCU_PROVE` is not held. > > > > > > 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N > > > ----------------------------- > > > net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!! > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > 2 locks held by RetransmitAggre/3519: > > > #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290 > > > #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90 > > > > > > stack backtrace: > > > lockdep_rcu_suspicious > > > mr_table_dump > > > ipmr_rtm_dumproute > > > rtnl_dump_all > > > rtnl_dumpit > > > netlink_dump > > > __netlink_dump_start > > > rtnetlink_rcv_msg > > > netlink_rcv_skb > > > netlink_unicast > > > netlink_sendmsg > > > > > > Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds > > > `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to > > > prevent RCU-list traversal in non-reader section. > > > > > > Since `mr_table_dump()` is the only function that touches the list, that > > > might be the only critical section in `ipmr_rtm_dumproute()` that needs > > > to be protected in ipmr_rtm_dumproute(). > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") > > > --- > > > net/ipv4/ipmr.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c > > > index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644 > > > --- a/net/ipv4/ipmr.c > > > +++ b/net/ipv4/ipmr.c > > > @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb) > > > NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist"); > > > return -ENOENT; > > > } > > > + rcu_read_lock(); > > > err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute, > > > &mfc_unres_lock, &filter); > > > + rcu_read_unlock(); > > > return skb->len ? : err; > > > } > > > > > > > > > > What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ? > > That one might require as well. > > > In my opinion, since we still hold RTNL in these paths, we should > > change the lockdep annotation. > > I don't have much experience mixing locks like this. Is it safe to mix > and match rtnl and RCUs like this? If holding RTNL is preventing any updates, then surely holding RTNL is enough to iterate through the list. > > I have the impression that, when iterating a RCU protected list *without* being in the read-side > critical sections, the RCU doesn't know that someone might be traversing > the list, and remove the element mid air (mroute_clean_tables()?). Is > this model incorrect? > > > Then later we can remove RTNL from these dump operations. > > Do you mean that, execute the dump operation without holding the RTNL, > thus, relying solely on RCU? Right, you might have seen patches adding RTNL_FLAG_DUMP_UNLOCKED in some places ? More patches are welcomed, in net-next. > > > diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c > > index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb > > 100644 > > --- a/net/ipv4/ipmr_base.c > > +++ b/net/ipv4/ipmr_base.c > > @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb, > > if (filter->filter_set) > > flags |= NLM_F_DUMP_FILTERED; > > > > - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) { > > + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list, > > + lockdep_rtnl_is_held()) { > > if (e < s_e) > > goto next_entry; > > if (filter->dev && > > Clarifying next steps: Would you like me to review/test and submit, or > are you planning to send it officially? For this kind of feedback, I am usually expecting you to send a new version (after waiting one day, maybe other reviewers have something to say) Thank you.
Hello Eric, On Thu, Nov 07, 2024 at 03:04:19PM +0100, Eric Dumazet wrote: > > Do you mean that, execute the dump operation without holding the RTNL, > > thus, relying solely on RCU? > > Right, you might have seen patches adding RTNL_FLAG_DUMP_UNLOCKED in > some places ? > > More patches are welcomed, in net-next. Sure. Do you have any explict driver that you need help, and no one is looking at? > > Clarifying next steps: Would you like me to review/test and submit, or > > are you planning to send it officially? > > For this kind of feedback, I am usually expecting you to send a new > version (after waiting one day, > maybe other reviewers have something to say) Thanks. I am testing it now, and I will submit the patch tomorrow if no one else have any further comments.
© 2016 - 2024 Red Hat, Inc.