[PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering

Jonas Gorski posted 1 patch 9 months, 3 weeks ago
net/dsa/user.c | 9 +++++++++
1 file changed, 9 insertions(+)
[PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 3 weeks ago
When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
will add VLAN 0 when enabling the device, and remove it on disabling it
again.

But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
dsa_user_manage_vlan_filtering(), user ports that are already enabled
may end up with no VLAN 0 configured, or VLAN 0 left configured.

E.g.the following sequence would leave sw1p1 without VLAN 0 configured:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set br0 up
$ ip link set sw1p1 up (filtering is 0, so no HW filter added)
$ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)

while the following sequence would work:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set br0 up
$ ip link set sw1p1 master br0 (filtering gets set to 1)
$ ip link set sw1p1 up (filtering is 1, HW filter is added)

Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
enabled on a vlan_filtering_is_global dsa switch:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set br0 up
$ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
$ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
$ ip link set sw1p1 nomaster (filtering is reset to 0 again)
$ ip link set sw1p2 down (VLAN 0 filter is left configured)

This even causes untagged traffic to break on b53 after undoing the
bridge (though this is partially caused by b53's own doing).

Fix this by emulating 8021q's vlan_device_event() behavior when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
absence of it doesn't become a red herring.

While vlan_vid_add() has a return value, vlan_device_event() does not
check its return value, so let us do the same.

Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
 net/dsa/user.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/dsa/user.c b/net/dsa/user.c
index 804dc7dac4f2..f7d62523da93 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -1993,7 +1993,16 @@ int dsa_user_manage_vlan_filtering(struct net_device *user,
 			user->features &= ~NETIF_F_HW_VLAN_CTAG_FILTER;
 			return err;
 		}
+
+		if (user->flags & IFF_UP) {
+			pr_info("dsa: adding VLAN 0 to HW filter on device %s\n",
+				user->name);
+			vlan_vid_add(user, htons(ETH_P_8021Q), 0);
+		}
 	} else {
+		if (user->flags & IFF_UP)
+			vlan_vid_del(user, htons(ETH_P_8021Q), 0);
+
 		err = vlan_for_each(user, dsa_user_clear_vlan, user);
 		if (err)
 			return err;
-- 
2.43.0
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> will add VLAN 0 when enabling the device, and remove it on disabling it
> again.
> 
> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> 
> E.g.the following sequence would leave sw1p1 without VLAN 0 configured:
> 
> $ ip link add br0 type bridge vlan_filtering 1
> $ ip link set br0 up
> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> 
> while the following sequence would work:
> 
> $ ip link add br0 type bridge vlan_filtering 1
> $ ip link set br0 up
> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> 
> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> enabled on a vlan_filtering_is_global dsa switch:
> 
> $ ip link add br0 type bridge vlan_filtering 1
> $ ip link set br0 up
> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> 
> This even causes untagged traffic to break on b53 after undoing the
> bridge (though this is partially caused by b53's own doing).
> 
> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> absence of it doesn't become a red herring.
> 
> While vlan_vid_add() has a return value, vlan_device_event() does not
> check its return value, so let us do the same.
> 
> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
> ---

Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
disabled or be an unloaded module, how does it work in that case?
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Florian Fainelli 9 months, 2 weeks ago

On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
>> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
>> will add VLAN 0 when enabling the device, and remove it on disabling it
>> again.
>>
>> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
>> dsa_user_manage_vlan_filtering(), user ports that are already enabled
>> may end up with no VLAN 0 configured, or VLAN 0 left configured.
>>
>> E.g.the following sequence would leave sw1p1 without VLAN 0 configured:
>>
>> $ ip link add br0 type bridge vlan_filtering 1
>> $ ip link set br0 up
>> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
>> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
>>
>> while the following sequence would work:
>>
>> $ ip link add br0 type bridge vlan_filtering 1
>> $ ip link set br0 up
>> $ ip link set sw1p1 master br0 (filtering gets set to 1)
>> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
>>
>> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
>> enabled on a vlan_filtering_is_global dsa switch:
>>
>> $ ip link add br0 type bridge vlan_filtering 1
>> $ ip link set br0 up
>> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
>> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
>> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
>> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
>>
>> This even causes untagged traffic to break on b53 after undoing the
>> bridge (though this is partially caused by b53's own doing).
>>
>> Fix this by emulating 8021q's vlan_device_event() behavior when changing
>> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
>> absence of it doesn't become a red herring.
>>
>> While vlan_vid_add() has a return value, vlan_device_event() does not
>> check its return value, so let us do the same.
>>
>> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
>> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
>> ---
> 
> Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> disabled or be an unloaded module, how does it work in that case?

This is explained in this commit:

https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee

however the case of starting up with CONFIG_VLAN_8021Q and then loading 
the 8021q module was not thought about, arguably I am not sure what sort 
of notification or event we can hook onto in order to react properly to 
that module being loaded. Do you know?
-- 
Florian
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
On Thu, Apr 24, 2025 at 2:34 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> > On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> >> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> >> will add VLAN 0 when enabling the device, and remove it on disabling it
> >> again.
> >>
> >> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> >> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> >> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> >>
> >> E.g.the following sequence would leave sw1p1 without VLAN 0 configured:
> >>
> >> $ ip link add br0 type bridge vlan_filtering 1
> >> $ ip link set br0 up
> >> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> >> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> >>
> >> while the following sequence would work:
> >>
> >> $ ip link add br0 type bridge vlan_filtering 1
> >> $ ip link set br0 up
> >> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> >> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> >>
> >> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> >> enabled on a vlan_filtering_is_global dsa switch:
> >>
> >> $ ip link add br0 type bridge vlan_filtering 1
> >> $ ip link set br0 up
> >> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> >> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> >> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> >> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> >>
> >> This even causes untagged traffic to break on b53 after undoing the
> >> bridge (though this is partially caused by b53's own doing).
> >>
> >> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> >> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> >> absence of it doesn't become a red herring.
> >>
> >> While vlan_vid_add() has a return value, vlan_device_event() does not
> >> check its return value, so let us do the same.
> >>
> >> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> >> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
> >> ---
> >
> > Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> > disabled or be an unloaded module, how does it work in that case?
>
> This is explained in this commit:
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee
>
> however the case of starting up with CONFIG_VLAN_8021Q and then loading
> the 8021q module was not thought about, arguably I am not sure what sort
> of notification or event we can hook onto in order to react properly to
> that module being loaded. Do you know?

config BRIDGE_VLAN_FILTERING
        bool "VLAN filtering"
        depends on BRIDGE
        depends on VLAN_8021Q

without 8021Q there is no vlan filtering bridge, so filtering can
never be 1, so NETIF_F_HW_VLAN_CTAG_FILTER is never set, so HW filters
for VLAN 0 are never installed or removed, therefore the issue can
never happen.

The issue is only if a vlan filtering bridge was there, and now isn't
anymore, and a previously VLAN 0 HW filter is left intact. This causes
an incomplete vlan entry left programmed in the vlan table of the chip
with just this port as a member, which breaks forwarding for that
VLAN, which is incidentally also the VLAN used for untagged traffic in
the non filtering case.

There are a several issues currently in how b53 handles VLANs,
especially on non filtering bridges. E.g. switchdev.rst says all vlan
configuration should be ignored, but currently it isn't, as b53
dutifully configures any vlans to the hardware passed on from DSA. I
will attempt to fix at a later time, but first I wanted to make sure
that switchdev/dsa/vlan/etc subsystems are working correctly.

Best regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Thu, Apr 24, 2025 at 03:58:50PM +0200, Jonas Gorski wrote:
> On Thu, Apr 24, 2025 at 2:34 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> > > On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> > >> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > >> will add VLAN 0 when enabling the device, and remove it on disabling it
> > >> again.
> > >>
> > >> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > >> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > >> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> > >>
> > >> E.g.the following sequence would leave sw1p1 without VLAN 0 configured:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> > >> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> > >>
> > >> while the following sequence would work:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> > >> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> > >>
> > >> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> > >> enabled on a vlan_filtering_is_global dsa switch:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> > >> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> > >> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> > >> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> > >>
> > >> This even causes untagged traffic to break on b53 after undoing the
> > >> bridge (though this is partially caused by b53's own doing).
> > >>
> > >> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> > >> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> > >> absence of it doesn't become a red herring.
> > >>
> > >> While vlan_vid_add() has a return value, vlan_device_event() does not
> > >> check its return value, so let us do the same.
> > >>
> > >> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> > >> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
> > >> ---
> > >
> > > Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> > > disabled or be an unloaded module, how does it work in that case?
> >
> > This is explained in this commit:
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee
> >
> > however the case of starting up with CONFIG_VLAN_8021Q and then loading
> > the 8021q module was not thought about, arguably I am not sure what sort
> > of notification or event we can hook onto in order to react properly to
> > that module being loaded. Do you know?
> 
> config BRIDGE_VLAN_FILTERING
>         bool "VLAN filtering"
>         depends on BRIDGE
>         depends on VLAN_8021Q
> 
> without 8021Q there is no vlan filtering bridge, so filtering can
> never be 1, so NETIF_F_HW_VLAN_CTAG_FILTER is never set, so HW filters
> for VLAN 0 are never installed or removed, therefore the issue can
> never happen.

nitpick: except for ds->needs_standalone_vlan_filtering (which b53 does not set though).

> 
> The issue is only if a vlan filtering bridge was there, and now isn't
> anymore, and a previously VLAN 0 HW filter is left intact. This causes
> an incomplete vlan entry left programmed in the vlan table of the chip
> with just this port as a member, which breaks forwarding for that
> VLAN, which is incidentally also the VLAN used for untagged traffic in
> the non filtering case.

Ok, so let's say b53_default_pvid() is 0, and b53_setup() ->
b53_apply_config() -> b53_configure_vlan() calls b53_set_vlan_entry() on
it to set it up, independently of the 8021q layer. So far so good.

But then, the 8021q layer can modify VID 0 on the device from the way in
which it was set up by the driver for VLAN-unaware operation, namely it
can remove it, and this breaks VLAN-unaware reception.

One needs to wonder why would the b53 driver even permit changes coming
from .port_vlan_add() and .port_vlan_del() to a VID it has reserved for
special use. There's nothing to gain from reacting to these operations,
only to lose.

I'm trying to think whether switchdev drivers in general have anything
to benefit from commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated
as "no vlan tag" (802.1p packet)"). I'm tempted to say "thanks for the
well-intended hint about VID 0, but a switchdev's data path is so
complicated that we'd rather manage VID 0 ourselves". Not only do many
drivers reserve VID 0 and thus need to be independent of the 8021q layer
even for VLAN-unaware operation, but also think of this: the bridge may
have vlan_filtering 1 and vlan_default_pvid 0. What will happen if the
8021q layer decides to add VID 0 to the RX filtering table? My logic and
testing with the software data path says that VID 0 traffic should not
be forwarded. My intuition is that it will make b53 accept this kind of
traffic.

Here's a self test I posted exactly for this scenario, if you don't
mind, please let me know what happens, and we'll see where to go from
there. On ocelot, which has commit 9323ac367005 ("net: mscc: ocelot:
ignore VID 0 added by 8021q module"), it passes (but fails elsewhere,
sadly - I've sent a patch also for that).
https://lore.kernel.org/netdev/20250424223734.3096202-2-vladimir.oltean@nxp.com/T/#u

That being said, I don't think we are quite prepared to adopt my
solution (of ignoring VID 0) DSA-wide (especially not as a bug fix),
because it is driver-dependent whether VID 0 is in a conflict with a
VLAN ID reserved for private use or not. Even though adding VID 0 to the
RX filtering table possibly allows its forwarding even when it shouldn't
(and that isn't desirable), there might be some positive benefits as
well - like permitting VID 0 reception when it _should_ be received.

It's a pretty tricky situation, I guess we should first establish what
are the tests that need to pass, then assess on a per-driver basis where
we are. We don't have nearly as much coverage as we would need.
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 12:57 AM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 03:58:50PM +0200, Jonas Gorski wrote:
> > On Thu, Apr 24, 2025 at 2:34 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> > > > On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> > > >> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > > >> will add VLAN 0 when enabling the device, and remove it on disabling it
> > > >> again.
> > > >>
> > > >> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > > >> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > > >> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> > > >>
> > > >> E.g.the following sequence would leave sw1p1 without VLAN 0 configured:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> > > >> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> > > >>
> > > >> while the following sequence would work:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> > > >> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> > > >>
> > > >> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> > > >> enabled on a vlan_filtering_is_global dsa switch:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> > > >> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> > > >> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> > > >> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> > > >>
> > > >> This even causes untagged traffic to break on b53 after undoing the
> > > >> bridge (though this is partially caused by b53's own doing).
> > > >>
> > > >> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> > > >> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> > > >> absence of it doesn't become a red herring.
> > > >>
> > > >> While vlan_vid_add() has a return value, vlan_device_event() does not
> > > >> check its return value, so let us do the same.
> > > >>
> > > >> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> > > >> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
> > > >> ---
> > > >
> > > > Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> > > > disabled or be an unloaded module, how does it work in that case?
> > >
> > > This is explained in this commit:
> > >
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee
> > >
> > > however the case of starting up with CONFIG_VLAN_8021Q and then loading
> > > the 8021q module was not thought about, arguably I am not sure what sort
> > > of notification or event we can hook onto in order to react properly to
> > > that module being loaded. Do you know?
> >
> > config BRIDGE_VLAN_FILTERING
> >         bool "VLAN filtering"
> >         depends on BRIDGE
> >         depends on VLAN_8021Q
> >
> > without 8021Q there is no vlan filtering bridge, so filtering can
> > never be 1, so NETIF_F_HW_VLAN_CTAG_FILTER is never set, so HW filters
> > for VLAN 0 are never installed or removed, therefore the issue can
> > never happen.
>
> nitpick: except for ds->needs_standalone_vlan_filtering (which b53 does not set though).
>
> >
> > The issue is only if a vlan filtering bridge was there, and now isn't
> > anymore, and a previously VLAN 0 HW filter is left intact. This causes
> > an incomplete vlan entry left programmed in the vlan table of the chip
> > with just this port as a member, which breaks forwarding for that
> > VLAN, which is incidentally also the VLAN used for untagged traffic in
> > the non filtering case.
>
> Ok, so let's say b53_default_pvid() is 0, and b53_setup() ->
> b53_apply_config() -> b53_configure_vlan() calls b53_set_vlan_entry() on
> it to set it up, independently of the 8021q layer. So far so good.
>
> But then, the 8021q layer can modify VID 0 on the device from the way in
> which it was set up by the driver for VLAN-unaware operation, namely it
> can remove it, and this breaks VLAN-unaware reception.
>
> One needs to wonder why would the b53 driver even permit changes coming
> from .port_vlan_add() and .port_vlan_del() to a VID it has reserved for
> special use. There's nothing to gain from reacting to these operations,
> only to lose.
>
> I'm trying to think whether switchdev drivers in general have anything
> to benefit from commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated
> as "no vlan tag" (802.1p packet)"). I'm tempted to say "thanks for the
> well-intended hint about VID 0, but a switchdev's data path is so
> complicated that we'd rather manage VID 0 ourselves". Not only do many
> drivers reserve VID 0 and thus need to be independent of the 8021q layer
> even for VLAN-unaware operation, but also think of this: the bridge may
> have vlan_filtering 1 and vlan_default_pvid 0. What will happen if the
> 8021q layer decides to add VID 0 to the RX filtering table? My logic and
> testing with the software data path says that VID 0 traffic should not
> be forwarded. My intuition is that it will make b53 accept this kind of
> traffic.
>
> Here's a self test I posted exactly for this scenario, if you don't
> mind, please let me know what happens, and we'll see where to go from
> there. On ocelot, which has commit 9323ac367005 ("net: mscc: ocelot:
> ignore VID 0 added by 8021q module"), it passes (but fails elsewhere,
> sadly - I've sent a patch also for that).
> https://lore.kernel.org/netdev/20250424223734.3096202-2-vladimir.oltean@nxp.com/T/#u
>
> That being said, I don't think we are quite prepared to adopt my
> solution (of ignoring VID 0) DSA-wide (especially not as a bug fix),
> because it is driver-dependent whether VID 0 is in a conflict with a
> VLAN ID reserved for private use or not. Even though adding VID 0 to the
> RX filtering table possibly allows its forwarding even when it shouldn't
> (and that isn't desirable), there might be some positive benefits as
> well - like permitting VID 0 reception when it _should_ be received.
>
> It's a pretty tricky situation, I guess we should first establish what
> are the tests that need to pass, then assess on a per-driver basis where
> we are. We don't have nearly as much coverage as we would need.

I gave it a test with a vlan_filtering bridge with no PVID / egress
untagged vlan defined on a pure software bridge, and STP continued to
work fine. So in a sense, VLAN 0 is needed, as we still need to allow
untagged traffic to be received regardless of a PVID egress untagged
VLAN being defined.

But we shouldn't forward it (except to the cpu port) unless it is part
of a PVID egress untagged VLAN. This is the tricky part. If (dsa)
switch drivers ensure that untagged traffic always reaches the cpu
port, then we can ignore VLAN 0.

So I think this boils down to that dsa needs a way to pass on to
drivers whether a VLAN should be forwarded to other members or not
when adding it to a port.

Currently, from a dsa driver perspective, the following two scenarios
would be indistinguishable:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master br0
$ ip link set sw1p2 master br0
$ bridge vlan add dev sw1p1 vid 10
$ bridge vlan add dev sw2p1 vid 10

and

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master br0
$ ip link set sw1p2 master br0
$ ip link add sw1p1.10 link sw1p1 type vlan id 10
$ ip link add sw1p2.10 link sw1p2 type vlan id 10

But in the second case, swp1p1 and sw1p2 should be isolated.

This is because vlan filters and bridge vlans result in the same
port_vlan_add() call, with no way of the driver to tell from where the
call comes from.

And yes, this is something that is probably hard to configure for many
smaller embedded switch chips. E.g. b53 supported switches do not have
forward/flood/etc masks per VLAN, so some cheating/workaround is
needed here. switchdev.rst says to fall back to software forwarding if
there is no other way. I have some ideas, but I will first need to
verify that they work ... .

Regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 09:52:13AM +0200, Jonas Gorski wrote:
> I gave it a test with a vlan_filtering bridge with no PVID / egress
> untagged vlan defined on a pure software bridge, and STP continued to
> work fine.

STP is not part of the bridge data path, it is control path. The PVID
rules don't apply to it.
In software terms, br_handle_frame() returns RX_HANDLER_PASS for it, it
doesn't go through br_handle_vlan().

> So in a sense, VLAN 0 is needed, as we still need to allow
> untagged traffic to be received regardless of a PVID egress untagged
> VLAN being defined.

When we are talking about the hardware data path of a switchdev port,
that is debatable as well, since many switchdevs have built-in packet
traps which again bypass the VLAN table (a function specific to the
switching layer, like learning, STP state etc). I would argue that the
presence of VID 0 in the RX filtering table is irrelevant for STP as far
as switchdevs are concerned.

> But we shouldn't forward it (except to the cpu port) unless it is part
> of a PVID egress untagged VLAN. This is the tricky part. If (dsa)
> switch drivers ensure that untagged traffic always reaches the cpu
> port, then we can ignore VLAN 0.
> 
> So I think this boils down to that dsa needs a way to pass on to
> drivers whether a VLAN should be forwarded to other members or not
> when adding it to a port.

That can be done (add a struct dsa_db argument to port_vlan_add(),
signifying whether it is a port VLAN or a bridge VLAN), but I haven't
come across switches which can make the distinction. It would require
mapping the same VID, coming from different ports, to different hardware
FIDs.

> Currently, from a dsa driver perspective, the following two scenarios
> would be indistinguishable:
> 
> $ ip link add br0 type bridge vlan_filtering 1
> $ ip link set sw1p1 master br0
> $ ip link set sw1p2 master br0
> $ bridge vlan add dev sw1p1 vid 10
> $ bridge vlan add dev sw2p1 vid 10
> 
> and
> 
> $ ip link add br0 type bridge vlan_filtering 1
> $ ip link set sw1p1 master br0
> $ ip link set sw1p2 master br0
> $ ip link add sw1p1.10 link sw1p1 type vlan id 10
> $ ip link add sw1p2.10 link sw1p2 type vlan id 10
> 
> But in the second case, swp1p1 and sw1p2 should be isolated.
> 
> This is because vlan filters and bridge vlans result in the same
> port_vlan_add() call, with no way of the driver to tell from where the
> call comes from.
> 
> And yes, this is something that is probably hard to configure for many
> smaller embedded switch chips. E.g. b53 supported switches do not have
> forward/flood/etc masks per VLAN, so some cheating/workaround is
> needed here. switchdev.rst says to fall back to software forwarding if
> there is no other way. I have some ideas, but I will first need to
> verify that they work ... .

We have insufficient coverage in dsa_user_prechangeupper_sanity_check()
and dsa_port_can_apply_vlan_filtering(), but we should add another
restriction for this: 8021q uppers with the same VID should not be
installed to ports spanning the same VLAN-aware bridge. And there should
be a new test for it in tools/testing/selftests/net/forwarding/no_forwarding.sh.

The restriction can be selectively lifted if there ever appear drivers
which can make the distinction you are talking about, but I don't think
that any of them can, at the moment.
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 1:42 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Fri, Apr 25, 2025 at 09:52:13AM +0200, Jonas Gorski wrote:
> > I gave it a test with a vlan_filtering bridge with no PVID / egress
> > untagged vlan defined on a pure software bridge, and STP continued to
> > work fine.
>
> STP is not part of the bridge data path, it is control path. The PVID
> rules don't apply to it.
> In software terms, br_handle_frame() returns RX_HANDLER_PASS for it, it
> doesn't go through br_handle_vlan().
>
> > So in a sense, VLAN 0 is needed, as we still need to allow
> > untagged traffic to be received regardless of a PVID egress untagged
> > VLAN being defined.
>
> When we are talking about the hardware data path of a switchdev port,
> that is debatable as well, since many switchdevs have built-in packet
> traps which again bypass the VLAN table (a function specific to the
> switching layer, like learning, STP state etc). I would argue that the
> presence of VID 0 in the RX filtering table is irrelevant for STP as far
> as switchdevs are concerned.

For b53, it is not, at least in the current state. LLC packets (and
other reserved multicast) are only redirected to CPU if there is a
PVID VLAN defined. But this isn't unfixable, I just noticed we
explicitly enable that reserved multicast follows VLAN configuration.

> > But we shouldn't forward it (except to the cpu port) unless it is part
> > of a PVID egress untagged VLAN. This is the tricky part. If (dsa)
> > switch drivers ensure that untagged traffic always reaches the cpu
> > port, then we can ignore VLAN 0.
> >
> > So I think this boils down to that dsa needs a way to pass on to
> > drivers whether a VLAN should be forwarded to other members or not
> > when adding it to a port.
>
> That can be done (add a struct dsa_db argument to port_vlan_add(),
> signifying whether it is a port VLAN or a bridge VLAN), but I haven't
> come across switches which can make the distinction. It would require
> mapping the same VID, coming from different ports, to different hardware
> FIDs.

At least for b53, the workaround I see that could work is (for the
filtering case):

1. enable trap of vlan membership violations to cpu (this also
disables learning for those packets)
2. when the first port installs a vlan filter, have a vlan entry with
just the cpu port
3. when the last port removes the filter, remove the cpu port again

The theoretical result should be:

1. if the vlan does not exist on the bridge, and no standalone port
has a vlan upper with it, it will be dropped
2. if the vlan does exist on the bridge, it will be forwarded between
ports that are members of it
3. if there is a vlan upper, that ports traffic will be redirected to
the cpu port

Since all packets will be redirected to the cpu port, and learning is
disabled for these packets, there is no need for a separate FID for
that port/vlan, and it would ensure that vlan uppers are separated
between each other.

This would even work for VLANs that are already configured on the
bridge (though switchdev.rst says that isn't allowed).

The down side would be that any VLAN membership violation for
configured VLANs would be redirected to the cpu port, though it's
probably still less than e.g. switching ports to standalone and do
forwarding in software (which would be an alternative for allowing
multiple VLAN uppers with a vlan aware bridge).

> > Currently, from a dsa driver perspective, the following two scenarios
> > would be indistinguishable:
> >
> > $ ip link add br0 type bridge vlan_filtering 1
> > $ ip link set sw1p1 master br0
> > $ ip link set sw1p2 master br0
> > $ bridge vlan add dev sw1p1 vid 10
> > $ bridge vlan add dev sw2p1 vid 10
> >
> > and
> >
> > $ ip link add br0 type bridge vlan_filtering 1
> > $ ip link set sw1p1 master br0
> > $ ip link set sw1p2 master br0
> > $ ip link add sw1p1.10 link sw1p1 type vlan id 10
> > $ ip link add sw1p2.10 link sw1p2 type vlan id 10
> >
> > But in the second case, swp1p1 and sw1p2 should be isolated.
> >
> > This is because vlan filters and bridge vlans result in the same
> > port_vlan_add() call, with no way of the driver to tell from where the
> > call comes from.
> >
> > And yes, this is something that is probably hard to configure for many
> > smaller embedded switch chips. E.g. b53 supported switches do not have
> > forward/flood/etc masks per VLAN, so some cheating/workaround is
> > needed here. switchdev.rst says to fall back to software forwarding if
> > there is no other way. I have some ideas, but I will first need to
> > verify that they work ... .
>
> We have insufficient coverage in dsa_user_prechangeupper_sanity_check()
> and dsa_port_can_apply_vlan_filtering(), but we should add another
> restriction for this: 8021q uppers with the same VID should not be
> installed to ports spanning the same VLAN-aware bridge. And there should
> be a new test for it in tools/testing/selftests/net/forwarding/no_forwarding.sh.
>
> The restriction can be selectively lifted if there ever appear drivers
> which can make the distinction you are talking about, but I don't think
> that any of them can, at the moment.

AFAICT, we probably then also need to deny any vlan uppers on ports
bridged via vlan unaware bridges for now, as we currently don't
actually configure them at all.

switchdev.rst says "Frames ingressing the device with a VID that is
not programmed into the bridge/switch's VLAN table must be forwarded
and may be processed using a VLAN device (see below)" (I thought with
a vlan unaware bridge we do not program the VLAN table? anway ...)

"- with VLAN filtering turned off, the bridge will process all ingress traffic
  for the port, except for the traffic tagged with a VLAN ID destined for a
  VLAN upper."

This is currently not happening when creating a vlan upper, at least I
don't see any port_vlan_add() (?) calls for that. And even then, it
would be indistinguishable from the port_vlan_add() calls that happen
if you add a vlan to the bridge's vlan table, which you can, and DSA
passes on, but should not be acted upon by the switch until vlan
filtering is turned on.

Also in general vlan unaware bridges feel like they have a lot of
complicated and probably unsupportable setups.

Like for example:

br0 { sw1p1, sw1p2, sw1p3, sw1p4 }
br1 { sw1p1.10, sw1p4.10 }

would AFAIU mean:

- forward all traffic except vlan 10 tagged between all 4 ports
- vlan 10 tagged traffic from sw1p1 can only go to sw1p4
- vlan 10 tagged traffic from sw1p4 can only go to sw1p1
- vlan 10 tagged traffic from sw1p2 and sw1p3 can go to all other ports

I'm not confident that I could program that correctly on a Broadcom
XGS switch, and these support a *lot* of VLAN shenanigans.

This probably leaves:

"    * Treat bridge ports with VLAN upper interfaces as standalone, and let
      forwarding be handled in the software data path."

as the only viable option here (I have to admit I don't really
understand what the first option is suggesting to do).

Best regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Sat, Apr 26, 2025 at 05:44:33PM +0200, Jonas Gorski wrote:
> On Fri, Apr 25, 2025 at 1:42 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > We have insufficient coverage in dsa_user_prechangeupper_sanity_check()
> > and dsa_port_can_apply_vlan_filtering(), but we should add another
> > restriction for this: 8021q uppers with the same VID should not be
> > installed to ports spanning the same VLAN-aware bridge. And there should
> > be a new test for it in tools/testing/selftests/net/forwarding/no_forwarding.sh.
> >
> > The restriction can be selectively lifted if there ever appear drivers
> > which can make the distinction you are talking about, but I don't think
> > that any of them can, at the moment.
> 
> AFAICT, we probably then also need to deny any vlan uppers on ports
> bridged via vlan unaware bridges for now, as we currently don't
> actually configure them at all.

So it seems.

> switchdev.rst says "Frames ingressing the device with a VID that is
> not programmed into the bridge/switch's VLAN table must be forwarded
> and may be processed using a VLAN device (see below)" (I thought with
> a vlan unaware bridge we do not program the VLAN table? anway ...)

I think the sentence just intended to clarify what it means for the port
to be non-filtering, but in the process introduced the ambiguity of
whether VLANs should be committed to hardware tables or not.

To be clear, from the bridge perspective, VLAN filtering = VLAN awareness
(there doesn't exist any option to, say, have per-VLAN learning but not
filtering). I suppose that if the hardware supports mechanisms of being
VLAN-aware but not filtering, and still do VLAN-unaware learning, those
mechanisms can be used as long as the bridging layer expectations are met.

> "- with VLAN filtering turned off, the bridge will process all ingress traffic
>   for the port, except for the traffic tagged with a VLAN ID destined for a
>   VLAN upper."
> 
> This is currently not happening when creating a vlan upper, at least I
> don't see any port_vlan_add() (?) calls for that. And even then, it
> would be indistinguishable from the port_vlan_add() calls that happen
> if you add a vlan to the bridge's vlan table, which you can, and DSA
> passes on, but should not be acted upon by the switch until vlan
> filtering is turned on.
> 
> Also in general vlan unaware bridges feel like they have a lot of
> complicated and probably unsupportable setups.
> 
> Like for example:
> 
> br0 { sw1p1, sw1p2, sw1p3, sw1p4 }
> br1 { sw1p1.10, sw1p4.10 }
> 
> would AFAIU mean:
> 
> - forward all traffic except vlan 10 tagged between all 4 ports
> - vlan 10 tagged traffic from sw1p1 can only go to sw1p4
> - vlan 10 tagged traffic from sw1p4 can only go to sw1p1
> - vlan 10 tagged traffic from sw1p2 and sw1p3 can go to all other ports
> 
> I'm not confident that I could program that correctly on a Broadcom
> XGS switch, and these support a *lot* of VLAN shenanigans.

You can add restrictions for these cases for 'net', as I don't think any
current DSA driver handles this correctly. Then, if you want to take a
stab at offloading these configurations, feel free to do so in net-next.

> This probably leaves:
> 
> "    * Treat bridge ports with VLAN upper interfaces as standalone, and let
>       forwarding be handled in the software data path."
> 
> as the only viable option here

Keep in mind that we haven't experimented with dynamically unoffloading
a bridge port before (calling switchdev_bridge_port_unoffload() at
runtime). That will be necessary when toggling the bridge vlan_filtering
option. It should work, but I can't guarantee that it does. In general,
software bridging was retrofitted onto DSA drivers which also support
bridge offload, and not all of them may work correctly (they may still
have address learning enabled, things like that). I'd only consider that
path as suitable for net-next.

> (I have to admit I don't really understand what the first option is
> suggesting to do).

The first option suggests that you can implement a VLAN-unaware bridge
by mapping the entire 4K VLAN space to a single VLAN within the hardware
switch. It has to be configured specially at the driver level, to always
preserve the VLAN tag (and ID) when egressing the packet. Then exempt
certain VLAN IDs (of 8021q uppers) from this treatment, and include only
the user port and CPU port in their forwarding mask.

The above can also be reinterpreted in the key that packets with a VLAN
ID that isn't in the VLAN database aren't "filtered" but forwarded as-is,
and learned all together. It seems to be similar with your own
suggestion above.

It's just an idea, and requires hardware support to achieve that. This
circles back to the idea that the port can be configured as VLAN "aware"
as long as it's not "filtering" from the bridge perspective, and there
is no address space segregation per VLAN in the FDB. The bridge doesn't
need to be aware of this custom mode of operation, the driver may decide
to transition to it only in the presence of 8021q uppers on top of
VLAN-unaware bridge ports.
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Paolo Abeni 9 months, 2 weeks ago

On 4/22/25 8:49 PM, Jonas Gorski wrote:
> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> will add VLAN 0 when enabling the device, and remove it on disabling it
> again.
> 
> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> may end up with no VLAN 0 configured, or VLAN 0 left configured.

Why this is a problem specifically for dsa and not a generic one? others
devices allow flipping the NETIF_F_HW_VLAN_CTAG_FILTER feature at runtime.

AFAICS dsa_user_manage_vlan_filtering() is currently missing a call to
netdev_update_features(), why is that not sufficient nor necessary?

Thanks,

Paolo
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
Hi,

On Thu, Apr 24, 2025 at 11:15 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
>
> On 4/22/25 8:49 PM, Jonas Gorski wrote:
> > When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > will add VLAN 0 when enabling the device, and remove it on disabling it
> > again.
> >
> > But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > may end up with no VLAN 0 configured, or VLAN 0 left configured.
>
> Why this is a problem specifically for dsa and not a generic one? others
> devices allow flipping the NETIF_F_HW_VLAN_CTAG_FILTER feature at runtime.
>
> AFAICS dsa_user_manage_vlan_filtering() is currently missing a call to
> netdev_update_features(), why is that not sufficient nor necessary?

Good point, I missed that (looked for something like this, but
obviously didn't look hard enough). But checking the flow of it in the
kernel ...

netdev_update_features() for NETIF_F_HW_VLAN_CTAG_FILTER triggers a
NETDEV_CVLAN_FILTER_PUSH_INFO notification, which would then trigger
vlan_filter_push_vids(), which then calls vlan_add_rx_filter_info()
for all configured vlans.

This is more or less identical to what dsa does with its
vlan_for_each(user, dsa_user_restore_vlan, user); call.

And AFAICT it also has the same issue I am trying to fix here, that it
does not install a VLAN 0 filter for devices that are already up,
which it would have if the device had NETIF_F_HW_VLAN_CTAG_FILTER set
when was the device was enabled (and vice versa on on down/remove).

So I guess the course of action for a V2 is fixing this in the core
vlan code and make vlan_filter_push_vids() / vlan_filter_drop_vids()
take care of the VLAN 0 filter as well, and then make dsa use
netdev_update_features() to simplify the code as well.

Does that sound reasonable?

Best Regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
On Thu, Apr 24, 2025 at 11:50 AM Jonas Gorski <jonas.gorski@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 24, 2025 at 11:15 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> >
> >
> > On 4/22/25 8:49 PM, Jonas Gorski wrote:
> > > When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > > will add VLAN 0 when enabling the device, and remove it on disabling it
> > > again.
> > >
> > > But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > > dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > > may end up with no VLAN 0 configured, or VLAN 0 left configured.
> >
> > Why this is a problem specifically for dsa and not a generic one? others
> > devices allow flipping the NETIF_F_HW_VLAN_CTAG_FILTER feature at runtime.
> >
> > AFAICS dsa_user_manage_vlan_filtering() is currently missing a call to
> > netdev_update_features(), why is that not sufficient nor necessary?
>
> Good point, I missed that (looked for something like this, but
> obviously didn't look hard enough). But checking the flow of it in the
> kernel ...
>
> netdev_update_features() for NETIF_F_HW_VLAN_CTAG_FILTER triggers a
> NETDEV_CVLAN_FILTER_PUSH_INFO notification, which would then trigger
> vlan_filter_push_vids(), which then calls vlan_add_rx_filter_info()
> for all configured vlans.
>
> This is more or less identical to what dsa does with its
> vlan_for_each(user, dsa_user_restore_vlan, user); call.
>
> And AFAICT it also has the same issue I am trying to fix here, that it
> does not install a VLAN 0 filter for devices that are already up,
> which it would have if the device had NETIF_F_HW_VLAN_CTAG_FILTER set
> when was the device was enabled (and vice versa on on down/remove).
>
> So I guess the course of action for a V2 is fixing this in the core
> vlan code and make vlan_filter_push_vids() / vlan_filter_drop_vids()
> take care of the VLAN 0 filter as well, and then make dsa use
> netdev_update_features() to simplify the code as well.
>
> Does that sound reasonable?

After looking into it a bit more, netdev_update_features() does not
relay any success or failure, so there is no way for DSA to know if it
succeded or not. And there are places where we temporarily want to
undo all configured vlans, which makes it hard to do via
netdev_update_features().

Not sure anymore if this is a good way forward, especially if it is
just meant to fix a corner case. @Vladimir, what do you think?

I'd probably rather go forward with the current fix (+ apply it as
well for the vlan core code), and do the conversion to
netdev_update_features() at later time, since I see potential for
unexpected breakage.

Best regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 09:30:17AM +0200, Jonas Gorski wrote:
> After looking into it a bit more, netdev_update_features() does not
> relay any success or failure, so there is no way for DSA to know if it
> succeded or not. And there are places where we temporarily want to
> undo all configured vlans, which makes it hard to do via
> netdev_update_features().
> 
> Not sure anymore if this is a good way forward, especially if it is
> just meant to fix a corner case. @Vladimir, what do you think?
> 
> I'd probably rather go forward with the current fix (+ apply it as
> well for the vlan core code), and do the conversion to
> netdev_update_features() at later time, since I see potential for
> unexpected breakage.
> 
> Best regards,
> Jonas

I see the inconsistency you're trying to fix, but I'm still wondering
whether it is the fix that b53 requires, given the fact that it doesn't
seem to otherwise depend on 8021q to set up or modify VID 0. I would say
I don't yet have a fully developed opinion and I am waiting for you to
provide the result of the modified bridge_vlan_aware selftest,
specifically drop_untagged().
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Jonas Gorski 9 months, 2 weeks ago
On Fri, Apr 25, 2025 at 9:51 AM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 09:30:17AM +0200, Jonas Gorski wrote:
> > After looking into it a bit more, netdev_update_features() does not
> > relay any success or failure, so there is no way for DSA to know if it
> > succeded or not. And there are places where we temporarily want to
> > undo all configured vlans, which makes it hard to do via
> > netdev_update_features().
> >
> > Not sure anymore if this is a good way forward, especially if it is
> > just meant to fix a corner case. @Vladimir, what do you think?
> >
> > I'd probably rather go forward with the current fix (+ apply it as
> > well for the vlan core code), and do the conversion to
> > netdev_update_features() at later time, since I see potential for
> > unexpected breakage.
> >
> > Best regards,
> > Jonas
>
> I see the inconsistency you're trying to fix, but I'm still wondering
> whether it is the fix that b53 requires, given the fact that it doesn't
> seem to otherwise depend on 8021q to set up or modify VID 0. I would say
> I don't yet have a fully developed opinion and I am waiting for you to
> provide the result of the modified bridge_vlan_aware selftest,
> specifically drop_untagged().

It does need a lot more fixes on top of that. With this patch applied:

TEST: Reception of 802.1p-tagged traffic                            [ OK ]
TEST: Dropping of untagged and 802.1p-tagged traffic with no PVID   [FAIL]
        802.1p-tagged reception succeeded, but should have failed

The latter is no surprise, since b53 does not handle non filtering
bridges correctly, or toggling filtering at runtime.

I fixed most issues I found in b53 and it now succeeds in WIP code I
have (and most other tests from there).

One thing I struggled a bit is that the second test tests four
different scenarios, but only has one generic failure message, so a
failure does not tell which of the four setups failed.

The issues I fixed so far locally:

1. b53 programs the vlan table based on bridge vlans regardless if
filtering is on or not
2. b53 allows vlan 0 to be modified from
dsa_switch_ops::port_vlan_{add,remove} for bridged ports
3. b53 adds vlan 0 to a port when it leaves a bridge, but does not
remove it on join
4. b53 does not handle switching a vlan from pvid to non-pvid
5. stp (and other reserved multicast) requires a PVID vlan.

This makes especially non-filtering bridges not work as expected, or
the switch in any way after adding and then removing a filtering
bridge.

Best regards,
Jonas
Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering
Posted by Vladimir Oltean 9 months, 2 weeks ago
On Sat, Apr 26, 2025 at 05:48:26PM +0200, Jonas Gorski wrote:
> It does need a lot more fixes on top of that. With this patch applied:
> 
> TEST: Reception of 802.1p-tagged traffic                            [ OK ]
> TEST: Dropping of untagged and 802.1p-tagged traffic with no PVID   [FAIL]
>         802.1p-tagged reception succeeded, but should have failed
> 
> The latter is no surprise, since b53 does not handle non filtering
> bridges correctly, or toggling filtering at runtime.
> 
> I fixed most issues I found in b53 and it now succeeds in WIP code I
> have (and most other tests from there).
> 
> One thing I struggled a bit is that the second test tests four
> different scenarios, but only has one generic failure message, so a
> failure does not tell which of the four setups failed.
> 
> The issues I fixed so far locally:
> 
> 1. b53 programs the vlan table based on bridge vlans regardless if
> filtering is on or not
> 2. b53 allows vlan 0 to be modified from
> dsa_switch_ops::port_vlan_{add,remove} for bridged ports
> 3. b53 adds vlan 0 to a port when it leaves a bridge, but does not
> remove it on join
> 4. b53 does not handle switching a vlan from pvid to non-pvid
> 5. stp (and other reserved multicast) requires a PVID vlan.
> 
> This makes especially non-filtering bridges not work as expected, or
> the switch in any way after adding and then removing a filtering
> bridge.

I'll admit that I'm not very familiar with the b53 driver and that my
attention span has been quite short (and the passage of the weekend has
not helped). Please post patches explaining clearly where we are and
I'll try to follow along.