include/net/l3mdev.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
When delete l3s ipvlan:
ip link del link eth0 ipvlan1 type ipvlan mode l3s
This may cause a null pointer dereference:
Call trace:
ip_rcv_finish+0x48/0xd0
ip_rcv+0x5c/0x100
__netif_receive_skb_one_core+0x64/0xb0
__netif_receive_skb+0x20/0x80
process_backlog+0xb4/0x204
napi_poll+0xe8/0x294
net_rx_action+0xd8/0x22c
__do_softirq+0x12c/0x354
This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after
ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process
like this:
(CPU1) | (CPU2)
l3mdev_l3_rcv() |
check dev->priv_flags: |
master = skb->dev; |
|
| ipvlan_l3s_unregister()
| set dev->priv_flags
| dev->l3mdev_ops = NULL;
|
visit master->l3mdev_ops |
Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve
this issue by add check for master->l3mdev_ops.
Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
---
include/net/l3mdev.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index f7fe796e8429..b5af87a35a9f 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -165,6 +165,7 @@ static inline
struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
{
struct net_device *master = NULL;
+ const struct l3mdev_ops *l3mdev_ops;
if (netif_is_l3_slave(skb->dev))
master = netdev_master_upper_dev_get_rcu(skb->dev);
@@ -172,8 +173,12 @@ struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
netif_has_l3_rx_handler(skb->dev))
master = skb->dev;
- if (master && master->l3mdev_ops->l3mdev_l3_rcv)
- skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);
+ if (!master)
+ return skb;
+
+ l3mdev_ops = master->l3mdev_ops;
+ if (l3mdev_ops && l3mdev_ops->l3mdev_l3_rcv)
+ skb = l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);
return skb;
}
--
2.34.1
On Thu, Mar 13, 2025 at 09:27:13AM +0800, Wang Liang wrote:
> When delete l3s ipvlan:
>
> ip link del link eth0 ipvlan1 type ipvlan mode l3s
>
> This may cause a null pointer dereference:
>
> Call trace:
> ip_rcv_finish+0x48/0xd0
> ip_rcv+0x5c/0x100
> __netif_receive_skb_one_core+0x64/0xb0
> __netif_receive_skb+0x20/0x80
> process_backlog+0xb4/0x204
> napi_poll+0xe8/0x294
> net_rx_action+0xd8/0x22c
> __do_softirq+0x12c/0x354
>
> This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after
> ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process
> like this:
>
> (CPU1) | (CPU2)
> l3mdev_l3_rcv() |
> check dev->priv_flags: |
> master = skb->dev; |
> |
> | ipvlan_l3s_unregister()
> | set dev->priv_flags
> | dev->l3mdev_ops = NULL;
> |
> visit master->l3mdev_ops |
>
> Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve
> this issue by add check for master->l3mdev_ops.
Hi Wang Liang,
It seems to me that checking master->l3mdev_ops like this is racy.
>
> Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes")
> Signed-off-by: Wang Liang <wangliang74@huawei.com>
> ---
> include/net/l3mdev.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
> index f7fe796e8429..b5af87a35a9f 100644
> --- a/include/net/l3mdev.h
> +++ b/include/net/l3mdev.h
> @@ -165,6 +165,7 @@ static inline
> struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
> {
> struct net_device *master = NULL;
> + const struct l3mdev_ops *l3mdev_ops;
If you end up respinning for some other reason, please rearrange the lines
above in reverse xmas tree order - longest line to shortest.
>
> if (netif_is_l3_slave(skb->dev))
> master = netdev_master_upper_dev_get_rcu(skb->dev);
> @@ -172,8 +173,12 @@ struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
> netif_has_l3_rx_handler(skb->dev))
> master = skb->dev;
>
> - if (master && master->l3mdev_ops->l3mdev_l3_rcv)
> - skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);
> + if (!master)
> + return skb;
> +
> + l3mdev_ops = master->l3mdev_ops;
> + if (l3mdev_ops && l3mdev_ops->l3mdev_l3_rcv)
> + skb = l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);
>
> return skb;
> }
> --
> 2.34.1
>
On 3/18/25 3:38 PM, Simon Horman wrote: > On Thu, Mar 13, 2025 at 09:27:13AM +0800, Wang Liang wrote: >> When delete l3s ipvlan: >> >> ip link del link eth0 ipvlan1 type ipvlan mode l3s >> >> This may cause a null pointer dereference: >> >> Call trace: >> ip_rcv_finish+0x48/0xd0 >> ip_rcv+0x5c/0x100 >> __netif_receive_skb_one_core+0x64/0xb0 >> __netif_receive_skb+0x20/0x80 >> process_backlog+0xb4/0x204 >> napi_poll+0xe8/0x294 >> net_rx_action+0xd8/0x22c >> __do_softirq+0x12c/0x354 >> >> This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after >> ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process >> like this: >> >> (CPU1) | (CPU2) >> l3mdev_l3_rcv() | >> check dev->priv_flags: | >> master = skb->dev; | >> | >> | ipvlan_l3s_unregister() >> | set dev->priv_flags >> | dev->l3mdev_ops = NULL; >> | >> visit master->l3mdev_ops | >> >> Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve >> this issue by add check for master->l3mdev_ops. > > Hi Wang Liang, > > It seems to me that checking master->l3mdev_ops like this is racy. vrf device leaves the l3mdev ops set; that is probably the better way to go.
在 2025/3/18 23:02, David Ahern 写道: > On 3/18/25 3:38 PM, Simon Horman wrote: >> On Thu, Mar 13, 2025 at 09:27:13AM +0800, Wang Liang wrote: >>> When delete l3s ipvlan: >>> >>> ip link del link eth0 ipvlan1 type ipvlan mode l3s >>> >>> This may cause a null pointer dereference: >>> >>> Call trace: >>> ip_rcv_finish+0x48/0xd0 >>> ip_rcv+0x5c/0x100 >>> __netif_receive_skb_one_core+0x64/0xb0 >>> __netif_receive_skb+0x20/0x80 >>> process_backlog+0xb4/0x204 >>> napi_poll+0xe8/0x294 >>> net_rx_action+0xd8/0x22c >>> __do_softirq+0x12c/0x354 >>> >>> This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after >>> ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process >>> like this: >>> >>> (CPU1) | (CPU2) >>> l3mdev_l3_rcv() | >>> check dev->priv_flags: | >>> master = skb->dev; | >>> | >>> | ipvlan_l3s_unregister() >>> | set dev->priv_flags >>> | dev->l3mdev_ops = NULL; >>> | >>> visit master->l3mdev_ops | >>> >>> Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve >>> this issue by add check for master->l3mdev_ops. >> Hi Wang Liang, >> >> It seems to me that checking master->l3mdev_ops like this is racy. > vrf device leaves the l3mdev ops set; that is probably the better way to go. Thanks. Only l3s ipvlan set the dev->l3mdev_ops to NULL at present, I will delete 'dev->l3mdev_ops = NULL' in ipvlan_l3s_unregister(), is that ok?
On Wed, Mar 19, 2025 at 10:07:03AM +0800, Wang Liang wrote: > > 在 2025/3/18 23:02, David Ahern 写道: > > On 3/18/25 3:38 PM, Simon Horman wrote: > > > On Thu, Mar 13, 2025 at 09:27:13AM +0800, Wang Liang wrote: > > > > When delete l3s ipvlan: > > > > > > > > ip link del link eth0 ipvlan1 type ipvlan mode l3s > > > > > > > > This may cause a null pointer dereference: > > > > > > > > Call trace: > > > > ip_rcv_finish+0x48/0xd0 > > > > ip_rcv+0x5c/0x100 > > > > __netif_receive_skb_one_core+0x64/0xb0 > > > > __netif_receive_skb+0x20/0x80 > > > > process_backlog+0xb4/0x204 > > > > napi_poll+0xe8/0x294 > > > > net_rx_action+0xd8/0x22c > > > > __do_softirq+0x12c/0x354 > > > > > > > > This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after > > > > ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process > > > > like this: > > > > > > > > (CPU1) | (CPU2) > > > > l3mdev_l3_rcv() | > > > > check dev->priv_flags: | > > > > master = skb->dev; | > > > > | > > > > | ipvlan_l3s_unregister() > > > > | set dev->priv_flags > > > > | dev->l3mdev_ops = NULL; > > > > | > > > > visit master->l3mdev_ops | > > > > > > > > Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve > > > > this issue by add check for master->l3mdev_ops. > > > Hi Wang Liang, > > > > > > It seems to me that checking master->l3mdev_ops like this is racy. > > vrf device leaves the l3mdev ops set; that is probably the better way to go. > > Thanks. > > Only l3s ipvlan set the dev->l3mdev_ops to NULL at present, I will delete > 'dev->l3mdev_ops = NULL' in ipvlan_l3s_unregister(), is that ok? TBH, I am somewhat unclear on the correct tear down behaviour. But I also understood that to be David's suggestion. And I think that implementing that, and posting it as v2 would be a good next step.
在 2025/3/20 21:37, Simon Horman 写道: > On Wed, Mar 19, 2025 at 10:07:03AM +0800, Wang Liang wrote: >> 在 2025/3/18 23:02, David Ahern 写道: >>> On 3/18/25 3:38 PM, Simon Horman wrote: >>>> On Thu, Mar 13, 2025 at 09:27:13AM +0800, Wang Liang wrote: >>>>> When delete l3s ipvlan: >>>>> >>>>> ip link del link eth0 ipvlan1 type ipvlan mode l3s >>>>> >>>>> This may cause a null pointer dereference: >>>>> >>>>> Call trace: >>>>> ip_rcv_finish+0x48/0xd0 >>>>> ip_rcv+0x5c/0x100 >>>>> __netif_receive_skb_one_core+0x64/0xb0 >>>>> __netif_receive_skb+0x20/0x80 >>>>> process_backlog+0xb4/0x204 >>>>> napi_poll+0xe8/0x294 >>>>> net_rx_action+0xd8/0x22c >>>>> __do_softirq+0x12c/0x354 >>>>> >>>>> This is because l3mdev_l3_rcv() visit dev->l3mdev_ops after >>>>> ipvlan_l3s_unregister() assign the dev->l3mdev_ops to NULL. The process >>>>> like this: >>>>> >>>>> (CPU1) | (CPU2) >>>>> l3mdev_l3_rcv() | >>>>> check dev->priv_flags: | >>>>> master = skb->dev; | >>>>> | >>>>> | ipvlan_l3s_unregister() >>>>> | set dev->priv_flags >>>>> | dev->l3mdev_ops = NULL; >>>>> | >>>>> visit master->l3mdev_ops | >>>>> >>>>> Add lock for dev->priv_flags and dev->l3mdev_ops is too expensive. Resolve >>>>> this issue by add check for master->l3mdev_ops. >>>> Hi Wang Liang, >>>> >>>> It seems to me that checking master->l3mdev_ops like this is racy. >>> vrf device leaves the l3mdev ops set; that is probably the better way to go. >> Thanks. >> >> Only l3s ipvlan set the dev->l3mdev_ops to NULL at present, I will delete >> 'dev->l3mdev_ops = NULL' in ipvlan_l3s_unregister(), is that ok? > TBH, I am somewhat unclear on the correct tear down behaviour. > But I also understood that to be David's suggestion. > And I think that implementing that, and posting it as v2 would > be a good next step. Ok. I will send a patch v2 later, please check it. Thanks.
On 3/18/25 3:07 AM, Wang Liang wrote: >>> It seems to me that checking master->l3mdev_ops like this is racy. >> vrf device leaves the l3mdev ops set; that is probably the better way >> to go. > > Thanks. > > Only l3s ipvlan set the dev->l3mdev_ops to NULL at present, I will delete > 'dev->l3mdev_ops = NULL' in ipvlan_l3s_unregister(), is that ok? > I think that avoids the race you saw. vrf has had the ops for 9 years or so and not seen that problem.
© 2016 - 2025 Red Hat, Inc.