[PATCH net-next 0/3] net: move netdev_compute_master_upper_features to ndo_set_features

Hangbin Liu posted 3 patches 4 weeks, 1 day ago
There is a newer version of this series
drivers/net/bonding/bond_main.c | 14 +++++----
drivers/net/net_failover.c      | 67 +++++------------------------------------
drivers/net/team/team_core.c    | 15 ++++-----
include/net/net_failover.h      |  7 -----
net/8021q/vlan.c                |  2 --
net/bridge/br_device.c          |  7 +++++
net/bridge/br_if.c              |  6 ----
net/core/dev.c                  |  8 +++--
net/hsr/hsr_slave.c             |  1 -
9 files changed, 37 insertions(+), 90 deletions(-)
[PATCH net-next 0/3] net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Hangbin Liu 4 weeks, 1 day ago
Currently, master devices (bonding, bridge, team) manually call
netdev_compute_master_upper_features() scattered throughout their port
add/remove operations. This approach requires each driver to remember
to update features at the right times and leads to code duplication.

The series moves netdev_compute_master_upper_features() to callback
ndo_set_features so that the offload compute could automatically
invoked during feature updates when upper/lower device relationships
change. This centralizes the feature computation flow and removes the
burden from individual drivers.

---
Hangbin Liu (3):
      net: use ndo_set_features to set offload features for bonding/bridge/team
      failover: use ndo_set_features for failover offload compute
      net: no need to disable LRO specifically

 drivers/net/bonding/bond_main.c | 14 +++++----
 drivers/net/net_failover.c      | 67 +++++------------------------------------
 drivers/net/team/team_core.c    | 15 ++++-----
 include/net/net_failover.h      |  7 -----
 net/8021q/vlan.c                |  2 --
 net/bridge/br_device.c          |  7 +++++
 net/bridge/br_if.c              |  6 ----
 net/core/dev.c                  |  8 +++--
 net/hsr/hsr_slave.c             |  1 -
 9 files changed, 37 insertions(+), 90 deletions(-)
---
base-commit: 52ede1bce557c66309f41ac29dd190be23ca9129
change-id: 20260310-offload_compute-4c0bafa2e022

Best regards,
-- 
Hangbin Liu <liuhangbin@gmail.com>
Re: [PATCH net-next 0/3] net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Paolo Abeni 3 weeks, 6 days ago
On 3/10/26 8:45 AM, Hangbin Liu wrote:
> Currently, master devices (bonding, bridge, team) manually call
> netdev_compute_master_upper_features() scattered throughout their port
> add/remove operations. This approach requires each driver to remember
> to update features at the right times and leads to code duplication.
> 
> The series moves netdev_compute_master_upper_features() to callback
> ndo_set_features so that the offload compute could automatically
> invoked during feature updates when upper/lower device relationships
> change. This centralizes the feature computation flow and removes the
> burden from individual drivers.
> 
> ---
> Hangbin Liu (3):
>       net: use ndo_set_features to set offload features for bonding/bridge/team
>       failover: use ndo_set_features for failover offload compute
>       net: no need to disable LRO specifically
> 
>  drivers/net/bonding/bond_main.c | 14 +++++----
>  drivers/net/net_failover.c      | 67 +++++------------------------------------
>  drivers/net/team/team_core.c    | 15 ++++-----
>  include/net/net_failover.h      |  7 -----
>  net/8021q/vlan.c                |  2 --
>  net/bridge/br_device.c          |  7 +++++
>  net/bridge/br_if.c              |  6 ----
>  net/core/dev.c                  |  8 +++--
>  net/hsr/hsr_slave.c             |  1 -
>  9 files changed, 37 insertions(+), 90 deletions(-)

I'm dropping this series from PW due to the self-tests failures. I guess
it could be restored later if it turns out the real problem is elsewhere.

/P
[syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by syzbot ci 4 weeks, 1 day ago
syzbot ci has tested the following series

[v1] net: move netdev_compute_master_upper_features to ndo_set_features
https://lore.kernel.org/all/20260310-offload_compute-v1-0-3df79c09ea65@gmail.com
* [PATCH net-next 1/3] net: use ndo_set_features to set offload features for bonding/bridge/team
* [PATCH net-next 2/3] failover: use ndo_set_features for failover offload compute
* [PATCH net-next 3/3] net: no need to disable LRO specifically

and found the following issue:
WARNING in rtmsg_ifinfo_build_skb

Full report is available here:
https://ci.syzbot.org/series/d5001c4a-a51e-49c1-9106-624836e43ec2

***

WARNING in rtmsg_ifinfo_build_skb

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      52ede1bce557c66309f41ac29dd190be23ca9129
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/281e7e02-f6af-4b4a-845e-d1d5842a9301/config

batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
hsr_slave_0: entered promiscuous mode
hsr_slave_1: entered promiscuous mode
------------[ cut here ]------------
err == -EMSGSIZE
WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
Modules linked in:
CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
FS:  0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 rtnetlink_event+0x1b7/0x270
 notifier_call_chain+0x1be/0x400
 netdev_change_features+0x95/0xe0
 __netdev_upper_dev_link+0xb20/0xc80
 netdev_upper_dev_link+0xb0/0x100
 macsec_newlink+0xb11/0x1200
 rtnl_newlink_create+0x329/0xb70
 rtnl_newlink+0x1666/0x1be0
 rtnetlink_rcv_msg+0x7d5/0xbe0
 netlink_rcv_skb+0x232/0x4b0
 netlink_unicast+0x80f/0x9b0
 netlink_sendmsg+0x813/0xb40
 __sys_sendto+0x672/0x710
 __x64_sys_sendto+0xde/0x100
 do_syscall_64+0x14d/0xf80
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ffadf757917
Code: 48 89 fa 4c 89 df e8 a8 56 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffd0eb79f80 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000555557c4a500 RCX: 00007ffadf757917
RDX: 0000000000000054 RSI: 00007ffae0544670 RDI: 0000000000000003
RBP: 0000000000000001 R08: 00007ffd0eb79fe4 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000000000000 R14: 00007ffae0544670 R15: 0000000000000000
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Sabrina Dubroca 4 weeks, 1 day ago
2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> hsr_slave_0: entered promiscuous mode
> hsr_slave_1: entered promiscuous mode
> ------------[ cut here ]------------
> err == -EMSGSIZE
> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496

I'm not sure this one is caused by this series, but either way,
reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really
unpleasant :/

Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size:
 - IFLA_PARENT_DEV_NAME
 - IFLA_PARENT_DEV_BUS_NAME
   (both from 00e77ed8e64d ("rtnetlink: add
   IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to
   if_nlmsg_size)
 - rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the
   IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size
   only counts the nest, and its caller (rtnl_link_get_size) doesn't
   have anything more about the slave info. This may be what syzbot is
   tripping on here.


But there's a

    + nla_total_size(4) /* IFLA_WEIGHT */

that doesn't get filled anywhere.


> Modules linked in:
> CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
> Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
> RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
> RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
> RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
> RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
> R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
> R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
> FS:  0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
> Call Trace:
>  <TASK>
>  rtnetlink_event+0x1b7/0x270
>  notifier_call_chain+0x1be/0x400
>  netdev_change_features+0x95/0xe0
>  __netdev_upper_dev_link+0xb20/0xc80
>  netdev_upper_dev_link+0xb0/0x100
>  macsec_newlink+0xb11/0x1200
>  rtnl_newlink_create+0x329/0xb70
>  rtnl_newlink+0x1666/0x1be0

-- 
Sabrina
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Hangbin Liu 4 weeks, 1 day ago
On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> > hsr_slave_0: entered promiscuous mode
> > hsr_slave_1: entered promiscuous mode
> > ------------[ cut here ]------------
> > err == -EMSGSIZE
> > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> 
> I'm not sure this one is caused by this series, but either way,


 rtnetlink_event+0x1b7/0x270
 notifier_call_chain+0x1be/0x400
 netdev_change_features+0x95/0xe0
 __netdev_upper_dev_link+0xb20/0xc80
 netdev_upper_dev_link+0xb0/0x100


This patch calls netdev_change_features() after __netdev_upper_dev_link(),
Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
to fill the new link info. Maybe the event is a bit early and macsec has
data not ready?

Thanks
Hangbin

> reviewing if_nlmsg_size/rtnl_fill_ifinfo for mismatches is really
> unpleasant :/
> 
> Things I see in rtnl_fill_ifinfo but don't find in if_nlmsg_size:
>  - IFLA_PARENT_DEV_NAME
>  - IFLA_PARENT_DEV_BUS_NAME
>    (both from 00e77ed8e64d ("rtnetlink: add
>    IFLA_PARENT_[DEV|DEV_BUS]_NAME"), which doesn't include a change to
>    if_nlmsg_size)
>  - rtnl_link_slave_info_fill also outputs IFLA_INFO_SLAVE_KIND + the
>    IFLA_INFO_SLAVE_DATA nest, but rtnl_link_get_slave_info_data_size
>    only counts the nest, and its caller (rtnl_link_get_size) doesn't
>    have anything more about the slave info. This may be what syzbot is
>    tripping on here.
> 
> 
> But there's a
> 
>     + nla_total_size(4) /* IFLA_WEIGHT */
> 
> that doesn't get filled anywhere.
> 
> 
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 6496 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) 
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:rtmsg_ifinfo_build_skb+0x218/0x260
> > Code: f6 ba 01 00 00 00 89 e9 e8 25 15 3a 00 4c 89 f0 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d e9 7f 3a 2e 02 cc e8 49 3b 42 f8 90 <0f> 0b 90 eb 90 89 d9 80 e1 07 fe c1 38 c1 0f 8c 95 fe ff ff 48 89
> > RSP: 0018:ffffc9000637e9a0 EFLAGS: 00010293
> > RAX: ffffffff89835e27 RBX: 0000000000000000 RCX: ffff8881b80a57c0
> > RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
> > RBP: 00000000ffffffa6 R08: 0000000000000004 R09: 0000000000000004
> > R10: fffff52000c6fcdc R11: 0000000000000000 R12: 1ffff110235ddc21
> > R13: 0000000000000000 R14: ffff8881133dc780 R15: ffff88811aeee000
> > FS:  0000555557c4a500(0000) GS:ffff88818de65000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055555e1838c8 CR3: 0000000168b80000 CR4: 00000000000006f0
> > Call Trace:
> >  <TASK>
> >  rtnetlink_event+0x1b7/0x270
> >  notifier_call_chain+0x1be/0x400
> >  netdev_change_features+0x95/0xe0
> >  __netdev_upper_dev_link+0xb20/0xc80
> >  netdev_upper_dev_link+0xb0/0x100
> >  macsec_newlink+0xb11/0x1200
> >  rtnl_newlink_create+0x329/0xb70
> >  rtnl_newlink+0x1666/0x1be0
> 
> -- 
> Sabrina
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Sabrina Dubroca 4 weeks ago
2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> > 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> > > batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> > > hsr_slave_0: entered promiscuous mode
> > > hsr_slave_1: entered promiscuous mode
> > > ------------[ cut here ]------------
> > > err == -EMSGSIZE
> > > WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> > 
> > I'm not sure this one is caused by this series, but either way,
> 
> 
>  rtnetlink_event+0x1b7/0x270
>  notifier_call_chain+0x1be/0x400
>  netdev_change_features+0x95/0xe0
>  __netdev_upper_dev_link+0xb20/0xc80
>  netdev_upper_dev_link+0xb0/0x100
> 
> 
> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> to fill the new link info. Maybe the event is a bit early and macsec has
> data not ready?

But this would still mean that there's a mismatch between
if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
revealing it.

I'll send fixes for the stuff I mentioned, no idea if that's what
syzbot saw since we don't have a repro.

-- 
Sabrina
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Paolo Abeni 3 weeks, 6 days ago
On 3/11/26 10:18 PM, Sabrina Dubroca wrote:
> 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
>> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
>>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
>>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
>>>> hsr_slave_0: entered promiscuous mode
>>>> hsr_slave_1: entered promiscuous mode
>>>> ------------[ cut here ]------------
>>>> err == -EMSGSIZE
>>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
>>>
>>> I'm not sure this one is caused by this series, but either way,
>>
>>
>>  rtnetlink_event+0x1b7/0x270
>>  notifier_call_chain+0x1be/0x400
>>  netdev_change_features+0x95/0xe0
>>  __netdev_upper_dev_link+0xb20/0xc80
>>  netdev_upper_dev_link+0xb0/0x100
>>
>>
>> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
>> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
>> to fill the new link info. Maybe the event is a bit early and macsec has
>> data not ready?
> 
> But this would still mean that there's a mismatch between
> if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> revealing it.
> 
> I'll send fixes for the stuff I mentioned, no idea if that's what
> syzbot saw since we don't have a repro.

It looks like even the nipa CI is reproducing the issue, i.e.:

https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/

more failures here:

https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0

the fail in mascsec-offload looks quite deterministic, could you please
have a look?

/P
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Sabrina Dubroca 3 weeks, 6 days ago
2026-03-12, 10:40:43 +0100, Paolo Abeni wrote:
> On 3/11/26 10:18 PM, Sabrina Dubroca wrote:
> > 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
> >> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> >>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> >>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
> >>>> hsr_slave_0: entered promiscuous mode
> >>>> hsr_slave_1: entered promiscuous mode
> >>>> ------------[ cut here ]------------
> >>>> err == -EMSGSIZE
> >>>> WARNING: net/core/rtnetlink.c:4421 at rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> >>>
> >>> I'm not sure this one is caused by this series, but either way,
> >>
> >>
> >>  rtnetlink_event+0x1b7/0x270
> >>  notifier_call_chain+0x1be/0x400
> >>  netdev_change_features+0x95/0xe0
> >>  __netdev_upper_dev_link+0xb20/0xc80
> >>  netdev_upper_dev_link+0xb0/0x100
> >>
> >>
> >> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> >> to fill the new link info. Maybe the event is a bit early and macsec has
> >> data not ready?
> > 
> > But this would still mean that there's a mismatch between
> > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> > revealing it.
> > 
> > I'll send fixes for the stuff I mentioned, no idea if that's what
> > syzbot saw since we don't have a repro.
> 
> It looks like even the nipa CI is reproducing the issue, i.e.:
> 
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
> 
> more failures here:
> 
> https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
> 
> the fail in mascsec-offload looks quite deterministic, could you please
> have a look?

Ah crap, sorry Hangbin, you were right.  macsec_fill_info() returns
-EMSGSIZE when the key length is unexpected, and at this point we
haven't set it to its proper value yet.

Bandaid solution would be:

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..0f7ef7bfbdde 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb,
                csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
                break;
        default:
-               goto nla_put_failure;
+               return 0;
        }
 
        if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci,


Proper fix (so that the notification we're sending during
upper_dev_link has full linkinfo) would be to move
netdev_upper_dev_link() to after macsec_changelink_common() and fix up
the error handling. I don't see anything in macsec_add_dev or
macsec_changelink_common that needs the device to be linked. But
anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
on invalid data, so the "bandaid" should be included as well.

Should this be part of this series (either just the "bandaid" or the
"proper fix"+bandaid), since we never saw a problem before?

-- 
Sabrina
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Hangbin Liu 3 weeks, 6 days ago
On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> > >> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> > >> Which trigger a NETDEV_FEAT_CHANGE notify and calls rtmsg_ifinfo_event()
> > >> to fill the new link info. Maybe the event is a bit early and macsec has
> > >> data not ready?
> > > 
> > > But this would still mean that there's a mismatch between
> > > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> > > revealing it.
> > > 
> > > I'll send fixes for the stuff I mentioned, no idea if that's what
> > > syzbot saw since we don't have a repro.
> > 
> > It looks like even the nipa CI is reproducing the issue, i.e.:
> > 
> > https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
> > 
> > more failures here:
> > 
> > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
> > 
> > the fail in mascsec-offload looks quite deterministic, could you please
> > have a look?
> 
> Ah crap, sorry Hangbin, you were right.  macsec_fill_info() returns
> -EMSGSIZE when the key length is unexpected, and at this point we
> haven't set it to its proper value yet.
> 
> Bandaid solution would be:
> 
> diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> index f6cad0746a02..0f7ef7bfbdde 100644
> --- a/drivers/net/macsec.c
> +++ b/drivers/net/macsec.c
> @@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb,
>                 csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 : MACSEC_CIPHER_ID_GCM_AES_256;
>                 break;
>         default:
> -               goto nla_put_failure;
> +               return 0;
>         }
>  
>         if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci,
> 
> 
> Proper fix (so that the notification we're sending during
> upper_dev_link has full linkinfo) would be to move
> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> the error handling. I don't see anything in macsec_add_dev or
> macsec_changelink_common that needs the device to be linked. But

If we move the netdev_upper_dev_link() after macsec_changelink_common(),
we will not goto nla_put_failure via default, right?

> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> on invalid data, so the "bandaid" should be included as well.
> 
> Should this be part of this series (either just the "bandaid" or the
> "proper fix"+bandaid), since we never saw a problem before?

Since macsec need the "bandaid" fix either way. How about you post the
"bandaid" fix to net. And I include the "proper fix" in this series for
net-next?

BTW, here is my draft patch, would you like to review it first?

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..1f8da4c291c6 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev,
 	lockdep_set_class(&dev->addr_list_lock,
 			  &macsec_netdev_addr_lock_key);
 
-	err = netdev_upper_dev_link(real_dev, dev, extack);
-	if (err < 0)
-		goto unregister;
-
 	/* need to be already registered so that ->init has run and
 	 * the MAC addr is set
 	 */
@@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev,
 
 	if (rx_handler && sci_exists(real_dev, sci)) {
 		err = -EBUSY;
-		goto unlink;
+		goto unregister;
 	}
 
 	err = macsec_add_dev(dev, sci, icv_len);
 	if (err)
-		goto unlink;
+		goto unregister;
 
 	if (data) {
 		err = macsec_changelink_common(dev, data);
@@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev,
 			goto del_dev;
 	}
 
+	err = netdev_upper_dev_link(real_dev, dev, extack);
+	if (err < 0)
+		goto unlink;
+
 	/* If h/w offloading is available, propagate to the device */
 	if (macsec_is_offloaded(macsec)) {
 		const struct macsec_ops *ops;
@@ -4200,7 +4200,7 @@ static int macsec_newlink(struct net_device *dev,
 			ctx.secy = &macsec->secy;
 			err = macsec_offload(ops->mdo_add_secy, &ctx);
 			if (err)
-				goto del_dev;
+				goto unlink;
 
 			macsec->insert_tx_tag =
 				macsec_needs_tx_tag(macsec, ops);
@@ -4209,7 +4209,7 @@ static int macsec_newlink(struct net_device *dev,
 
 	err = register_macsec_dev(real_dev, dev);
 	if (err < 0)
-		goto del_dev;
+		goto unlink;
 
 	netdev_update_features(dev);
 	netif_stacked_transfer_operstate(real_dev, dev);
@@ -4219,10 +4219,10 @@ static int macsec_newlink(struct net_device *dev,
 
 	return 0;
 
-del_dev:
-	macsec_del_dev(macsec);
 unlink:
 	netdev_upper_dev_unlink(real_dev, dev);
+del_dev:
+	macsec_del_dev(macsec);
 unregister:
 	unregister_netdevice(dev);
 	return err;

Thanks
Hangbin
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Sabrina Dubroca 3 weeks, 6 days ago
2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> > Proper fix (so that the notification we're sending during
> > upper_dev_link has full linkinfo) would be to move
> > netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> > the error handling. I don't see anything in macsec_add_dev or
> > macsec_changelink_common that needs the device to be linked. But
> 
> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
> we will not goto nla_put_failure via default, right?

Yes.

> > anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> > on invalid data, so the "bandaid" should be included as well.
> > 
> > Should this be part of this series (either just the "bandaid" or the
> > "proper fix"+bandaid), since we never saw a problem before?
> 
> Since macsec need the "bandaid" fix either way. How about you post the
> "bandaid" fix to net. And I include the "proper fix" in this series for
> net-next?

But I don't think it's needed in net. Am I missing a codepath (before
your series) where macsec_fill_info could be called for the new device
before macsec_newlink returns? If not, it doesn't really qualify as a
fix, that's why I was asking Paolo.


> BTW, here is my draft patch, would you like to review it first?
> 
> diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> index f6cad0746a02..1f8da4c291c6 100644
> --- a/drivers/net/macsec.c
> +++ b/drivers/net/macsec.c
> @@ -4161,10 +4161,6 @@ static int macsec_newlink(struct net_device *dev,
>  	lockdep_set_class(&dev->addr_list_lock,
>  			  &macsec_netdev_addr_lock_key);
>  
> -	err = netdev_upper_dev_link(real_dev, dev, extack);
> -	if (err < 0)
> -		goto unregister;
> -
>  	/* need to be already registered so that ->init has run and
>  	 * the MAC addr is set
>  	 */
> @@ -4177,12 +4173,12 @@ static int macsec_newlink(struct net_device *dev,
>  
>  	if (rx_handler && sci_exists(real_dev, sci)) {
>  		err = -EBUSY;
> -		goto unlink;
> +		goto unregister;
>  	}
>  
>  	err = macsec_add_dev(dev, sci, icv_len);
>  	if (err)
> -		goto unlink;
> +		goto unregister;
>  
>  	if (data) {
>  		err = macsec_changelink_common(dev, data);
> @@ -4190,6 +4186,10 @@ static int macsec_newlink(struct net_device *dev,
>  			goto del_dev;
>  	}
>  
> +	err = netdev_upper_dev_link(real_dev, dev, extack);
> +	if (err < 0)
> +		goto unlink;

This one shouldn't be "goto unlink" since we haven't linked yet. I'm
not sure netdev_upper_dev_unlink can handle "not linked yet" devices.

Otherwise this looks ok.

-- 
Sabrina
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Paolo Abeni 3 weeks, 6 days ago

On 3/12/26 4:58 PM, Sabrina Dubroca wrote:
> 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
>> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
>>> Proper fix (so that the notification we're sending during
>>> upper_dev_link has full linkinfo) would be to move
>>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
>>> the error handling. I don't see anything in macsec_add_dev or
>>> macsec_changelink_common that needs the device to be linked. But
>>
>> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
>> we will not goto nla_put_failure via default, right?
> 
> Yes.
> 
>>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
>>> on invalid data, so the "bandaid" should be included as well.
>>>
>>> Should this be part of this series (either just the "bandaid" or the
>>> "proper fix"+bandaid), since we never saw a problem before?
>>
>> Since macsec need the "bandaid" fix either way. How about you post the
>> "bandaid" fix to net. And I include the "proper fix" in this series for
>> net-next?
> 
> But I don't think it's needed in net. Am I missing a codepath (before
> your series) where macsec_fill_info could be called for the new device
> before macsec_newlink returns? If not, it doesn't really qualify as a
> fix, that's why I was asking Paolo.

FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb()
before device initialization, so I would be fine targeting net-next with
the EMSGSIZE-related change.

Side note, it looks like that the WARN() in the rnnetlink code here
helped identifying a real problem and correctly returning 0 when the
key_len is not yet initialized will silence it forever, what about
preserving a warning for this kind of race? something alike:

---
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..82974d4fa3f6 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4337,7 +4337,8 @@ static int macsec_fill_info(struct sk_buff *skb,
 		csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 :
MACSEC_CIPHER_ID_GCM_AES_256;
 		break;
 	default:
-		goto nla_put_failure;
+		WARN_ON_ONCE(1);
+		return 0;
 	}
Re: [syzbot ci] Re: net: move netdev_compute_master_upper_features to ndo_set_features
Posted by Sabrina Dubroca 3 weeks, 6 days ago
2026-03-12, 17:47:45 +0100, Paolo Abeni wrote:
> 
> 
> On 3/12/26 4:58 PM, Sabrina Dubroca wrote:
> > 2026-03-12, 14:34:44 +0000, Hangbin Liu wrote:
> >> On Thu, Mar 12, 2026 at 12:13:52PM +0100, Sabrina Dubroca wrote:
> >>> Proper fix (so that the notification we're sending during
> >>> upper_dev_link has full linkinfo) would be to move
> >>> netdev_upper_dev_link() to after macsec_changelink_common() and fix up
> >>> the error handling. I don't see anything in macsec_add_dev or
> >>> macsec_changelink_common that needs the device to be linked. But
> >>
> >> If we move the netdev_upper_dev_link() after macsec_changelink_common(),
> >> we will not goto nla_put_failure via default, right?
> > 
> > Yes.
> > 
> >>> anyway it doesn't make sense for macsec_fill_info to return -EMSGSIZE
> >>> on invalid data, so the "bandaid" should be included as well.
> >>>
> >>> Should this be part of this series (either just the "bandaid" or the
> >>> "proper fix"+bandaid), since we never saw a problem before?
> >>
> >> Since macsec need the "bandaid" fix either way. How about you post the
> >> "bandaid" fix to net. And I include the "proper fix" in this series for
> >> net-next?
> > 
> > But I don't think it's needed in net. Am I missing a codepath (before
> > your series) where macsec_fill_info could be called for the new device
> > before macsec_newlink returns? If not, it doesn't really qualify as a
> > fix, that's why I was asking Paolo.
> 
> FWIW, I don't see a codepath calling into rtmsg_ifinfo_build_skb()
> before device initialization, so I would be fine targeting net-next with
> the EMSGSIZE-related change.
> 
> Side note, it looks like that the WARN() in the rnnetlink code here
> helped identifying a real problem and correctly returning 0 when the
> key_len is not yet initialized will silence it forever, what about
> preserving a warning for this kind of race? something alike:

Right, I was thinking about putting a DEBUG_NET_WARN_ON_ONCE there.

-- 
Sabrina