net/ipv4/ipmr.c | 2 ++ 1 file changed, 2 insertions(+)
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the lock is not held, causing the following error
when `RCU_PROVE` is not held.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by RetransmitAggre/3519:
#0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
#1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
stack backtrace:
lockdep_rcu_suspicious
mr_table_dump
ipmr_rtm_dumproute
rtnl_dump_all
rtnl_dumpit
netlink_dump
__netlink_dump_start
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds
`rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to
prevent RCU-list traversal in non-reader section.
Since `mr_table_dump()` is the only function that touches the list, that
might be the only critical section in `ipmr_rtm_dumproute()` that needs
to be protected in ipmr_rtm_dumproute().
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
---
net/ipv4/ipmr.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
return -ENOENT;
}
+ rcu_read_lock();
err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
&mfc_unres_lock, &filter);
+ rcu_read_unlock();
return skb->len ? : err;
}
---
base-commit: 25d70702142ac2115e75e01a0a985c6ea1d78033
change-id: 20241107-ipmr_rcu-291d85400b16
Best regards,
--
Breno Leitao <leitao@debian.org>
On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote:
>
> Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
> following code flow, the lock is not held, causing the following error
> when `RCU_PROVE` is not held.
>
> 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
> -----------------------------
> net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
>
> rcu_scheduler_active = 2, debug_locks = 1
> 2 locks held by RetransmitAggre/3519:
> #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
> #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
>
> stack backtrace:
> lockdep_rcu_suspicious
> mr_table_dump
> ipmr_rtm_dumproute
> rtnl_dump_all
> rtnl_dumpit
> netlink_dump
> __netlink_dump_start
> rtnetlink_rcv_msg
> netlink_rcv_skb
> netlink_unicast
> netlink_sendmsg
>
> Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds
> `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to
> prevent RCU-list traversal in non-reader section.
>
> Since `mr_table_dump()` is the only function that touches the list, that
> might be the only critical section in `ipmr_rtm_dumproute()` that needs
> to be protected in ipmr_rtm_dumproute().
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
> ---
> net/ipv4/ipmr.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
> NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
> return -ENOENT;
> }
> + rcu_read_lock();
> err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
> &mfc_unres_lock, &filter);
> + rcu_read_unlock();
> return skb->len ? : err;
> }
>
>
What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ?
In my opinion, since we still hold RTNL in these paths, we should
change the lockdep annotation.
Then later we can remove RTNL from these dump operations.
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb
100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
if (filter->filter_set)
flags |= NLM_F_DUMP_FILTERED;
- list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
+ list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list,
+ lockdep_rtnl_is_held()) {
if (e < s_e)
goto next_entry;
if (filter->dev &&
Hello Eric,
On Thu, Nov 07, 2024 at 02:13:14PM +0100, Eric Dumazet wrote:
> On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
> > following code flow, the lock is not held, causing the following error
> > when `RCU_PROVE` is not held.
> >
> > 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
> > -----------------------------
> > net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
> >
> > rcu_scheduler_active = 2, debug_locks = 1
> > 2 locks held by RetransmitAggre/3519:
> > #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
> > #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
> >
> > stack backtrace:
> > lockdep_rcu_suspicious
> > mr_table_dump
> > ipmr_rtm_dumproute
> > rtnl_dump_all
> > rtnl_dumpit
> > netlink_dump
> > __netlink_dump_start
> > rtnetlink_rcv_msg
> > netlink_rcv_skb
> > netlink_unicast
> > netlink_sendmsg
> >
> > Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds
> > `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to
> > prevent RCU-list traversal in non-reader section.
> >
> > Since `mr_table_dump()` is the only function that touches the list, that
> > might be the only critical section in `ipmr_rtm_dumproute()` that needs
> > to be protected in ipmr_rtm_dumproute().
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
> > ---
> > net/ipv4/ipmr.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> > index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644
> > --- a/net/ipv4/ipmr.c
> > +++ b/net/ipv4/ipmr.c
> > @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
> > NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
> > return -ENOENT;
> > }
> > + rcu_read_lock();
> > err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
> > &mfc_unres_lock, &filter);
> > + rcu_read_unlock();
> > return skb->len ? : err;
> > }
> >
> >
>
> What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ?
That one might require as well.
> In my opinion, since we still hold RTNL in these paths, we should
> change the lockdep annotation.
I don't have much experience mixing locks like this. Is it safe to mix
and match rtnl and RCUs like this?
I have the impression that, when iterating a RCU protected list *without* being in the read-side
critical sections, the RCU doesn't know that someone might be traversing
the list, and remove the element mid air (mroute_clean_tables()?). Is
this model incorrect?
> Then later we can remove RTNL from these dump operations.
Do you mean that, execute the dump operation without holding the RTNL,
thus, relying solely on RCU?
> diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
> index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb
> 100644
> --- a/net/ipv4/ipmr_base.c
> +++ b/net/ipv4/ipmr_base.c
> @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
> if (filter->filter_set)
> flags |= NLM_F_DUMP_FILTERED;
>
> - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
> + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list,
> + lockdep_rtnl_is_held()) {
> if (e < s_e)
> goto next_entry;
> if (filter->dev &&
Clarifying next steps: Would you like me to review/test and submit, or
are you planning to send it officially?
Thanks for your feedback,
--breno
On Thu, Nov 7, 2024 at 2:51 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Eric,
>
> On Thu, Nov 07, 2024 at 02:13:14PM +0100, Eric Dumazet wrote:
> > On Thu, Nov 7, 2024 at 12:03 PM Breno Leitao <leitao@debian.org> wrote:
> > >
> > > Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
> > > following code flow, the lock is not held, causing the following error
> > > when `RCU_PROVE` is not held.
> > >
> > > 6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
> > > -----------------------------
> > > net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
> > >
> > > rcu_scheduler_active = 2, debug_locks = 1
> > > 2 locks held by RetransmitAggre/3519:
> > > #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
> > > #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
> > >
> > > stack backtrace:
> > > lockdep_rcu_suspicious
> > > mr_table_dump
> > > ipmr_rtm_dumproute
> > > rtnl_dump_all
> > > rtnl_dumpit
> > > netlink_dump
> > > __netlink_dump_start
> > > rtnetlink_rcv_msg
> > > netlink_rcv_skb
> > > netlink_unicast
> > > netlink_sendmsg
> > >
> > > Fix accessing `mfc_cache_list` without holding the RCU read lock. Adds
> > > `rcu_read_lock()` and `rcu_read_unlock()` around `mr_table_dump()` to
> > > prevent RCU-list traversal in non-reader section.
> > >
> > > Since `mr_table_dump()` is the only function that touches the list, that
> > > might be the only critical section in `ipmr_rtm_dumproute()` that needs
> > > to be protected in ipmr_rtm_dumproute().
> > >
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
> > > ---
> > > net/ipv4/ipmr.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> > > index 089864c6a35eec146a1ba90c22d79245f8e48158..bb855f32f328024f384a2fa58f42fc227705206e 100644
> > > --- a/net/ipv4/ipmr.c
> > > +++ b/net/ipv4/ipmr.c
> > > @@ -2612,8 +2612,10 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
> > > NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
> > > return -ENOENT;
> > > }
> > > + rcu_read_lock();
> > > err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
> > > &mfc_unres_lock, &filter);
> > > + rcu_read_unlock();
> > > return skb->len ? : err;
> > > }
> > >
> > >
> >
> > What about net/ipv6/ip6mr.c ip6mr_rtm_dumproute() ?
>
> That one might require as well.
>
> > In my opinion, since we still hold RTNL in these paths, we should
> > change the lockdep annotation.
>
> I don't have much experience mixing locks like this. Is it safe to mix
> and match rtnl and RCUs like this?
If holding RTNL is preventing any updates, then surely holding RTNL is enough
to iterate through the list.
>
> I have the impression that, when iterating a RCU protected list *without* being in the read-side
> critical sections, the RCU doesn't know that someone might be traversing
> the list, and remove the element mid air (mroute_clean_tables()?). Is
> this model incorrect?
>
> > Then later we can remove RTNL from these dump operations.
>
> Do you mean that, execute the dump operation without holding the RTNL,
> thus, relying solely on RCU?
Right, you might have seen patches adding RTNL_FLAG_DUMP_UNLOCKED in
some places ?
More patches are welcomed, in net-next.
>
> > diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
> > index 271dc03fc6dbd9b35db4d5782716679134f225e4..f0af12a2f70bcdf5ba54321bf7ebebe798318abb
> > 100644
> > --- a/net/ipv4/ipmr_base.c
> > +++ b/net/ipv4/ipmr_base.c
> > @@ -310,7 +310,8 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
> > if (filter->filter_set)
> > flags |= NLM_F_DUMP_FILTERED;
> >
> > - list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
> > + list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list,
> > + lockdep_rtnl_is_held()) {
> > if (e < s_e)
> > goto next_entry;
> > if (filter->dev &&
>
> Clarifying next steps: Would you like me to review/test and submit, or
> are you planning to send it officially?
For this kind of feedback, I am usually expecting you to send a new
version (after waiting one day,
maybe other reviewers have something to say)
Thank you.
Hello Eric, On Thu, Nov 07, 2024 at 03:04:19PM +0100, Eric Dumazet wrote: > > Do you mean that, execute the dump operation without holding the RTNL, > > thus, relying solely on RCU? > > Right, you might have seen patches adding RTNL_FLAG_DUMP_UNLOCKED in > some places ? > > More patches are welcomed, in net-next. Sure. Do you have any explict driver that you need help, and no one is looking at? > > Clarifying next steps: Would you like me to review/test and submit, or > > are you planning to send it officially? > > For this kind of feedback, I am usually expecting you to send a new > version (after waiting one day, > maybe other reviewers have something to say) Thanks. I am testing it now, and I will submit the patch tomorrow if no one else have any further comments.
© 2016 - 2026 Red Hat, Inc.