drivers/net/dsa/qca8k.c | 709 +++++++++++++++++++++++++++++++++--- drivers/net/dsa/qca8k.h | 46 ++- include/linux/dsa/tag_qca.h | 82 +++++ include/net/dsa.h | 17 + net/dsa/dsa2.c | 74 +++- net/dsa/dsa_priv.h | 13 + net/dsa/slave.c | 32 ++ net/dsa/switch.c | 15 + net/dsa/tag_qca.c | 83 +++-- 9 files changed, 993 insertions(+), 78 deletions(-) create mode 100644 include/linux/dsa/tag_qca.h
Hi, this is ready but require some additional test on a wider userbase.
The main reason for this is that we notice some routing problem in the
switch and it seems assisted learning is needed. Considering mdio is
quite slow due to the indirect write using this Ethernet alternative way
seems to be quicker.
The qca8k switch supports a special way to pass mdio read/write request
using specially crafted Ethernet packet.
This works by putting some defined data in the Ethernet header where the
mac source and dst should be placed. The Ethernet type header is set to qca
header and is set to a mdio read/write type.
This is used to communicate to the switch that this is a special packet
and should be parsed differently.
Currently we use Ethernet packet for
- MIB counter
- mdio read/write configuration
- phy read/write for each port
Current implementation of this use completion API to wait for the packet
to be processed by the tagger and has a timeout that fallback to the
legacy mdio way and mutex to enforce one transaction at time.
We now have connect()/disconnect() ops for the tagger. They are used to
allocate priv data in the dsa priv. The header still has to be put in
global include to make it usable by a dsa driver.
They are called when the tag is connect to the dst and the data is freed
using discconect on tagger change.
(if someone wonder why the bind function is put at in the general setup
function it's because tag is set in the cpu port where the notifier is
still not available and we require the notifier to sen the
tag_proto_connect() event.
We now have a tag_proto_connect() for the dsa driver used to put
additional data in the tagger priv (that is actually the dsa priv).
This is called using a switch event DSA_NOTIFIER_TAG_PROTO_CONNECT.
Current use for this is adding handler for the Ethernet packet to keep
the tagger code as dumb as possible.
The tagger priv implement only the handler for the special packet. All the
other stuff is placed in the qca8k_priv and the tagger has to access
it under lock.
We use the new API from Vladimir to track if the master port is
operational or not. We had to track many thing to reach a usable state.
Checking if the port is UP is not enough and tracking a NETDEV_CHANGE is
also not enough since it use also for other task. The correct way was
both track for interface UP and if a qdisc was assigned to the
interface. That tells us the port (and the tagger indirectly) is ready
to accept and process packet.
I tested this with multicpu port and with port6 set as the unique port and
it's sad.
It seems they implemented this feature in a bad way and this is only
supported with cpu port0. When cpu port6 is the unique port, the switch
doesn't send ack packet. With multicpu port, packet ack are not duplicated
and only cpu port0 sends them. This is the same for the MIB counter.
For this reason this feature is enabled only when cpu port0 is enabled and
operational.
Sorry if I missed any comments. This series is 2 month old so I think it
would be good to recheck everything. In the mean time this was tested on
and no regression are observed related to random port drop.
v7:
- Rebase on net-next changes
- Add bulk patches to speedup this even more
v6:
- Fix some error in ethtool handler caused by rebase/cleanup
v5:
- Adapt to new API fixes
- Fix a wrong logic for noop
- Add additional lock for master_state change
- Limit mdio Ethernet to cpu port0 (switch limitation)
- Add priority to these special packet
- Move mdio cache to qca8k_priv
v4:
- Remove duplicate patch sent by mistake.
v3:
- Include MIB with Ethernet packet.
- Include phy read/write with Ethernet packet.
- Reorganize code with new API.
- Introuce master tracking by Vladimir
v2:
- Address all suggestion from Vladimir.
Try to generilize this with connect/disconnect function from the
tagger and tag_proto_connect for the driver.
Ansuel Smith (14):
net: dsa: tag_qca: convert to FIELD macro
net: dsa: tag_qca: move define to include linux/dsa
net: dsa: tag_qca: enable promisc_on_master flag
net: dsa: tag_qca: add define for handling mgmt Ethernet packet
net: dsa: tag_qca: add define for handling MIB packet
net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet
packet
net: dsa: qca8k: add tracking state of master port
net: dsa: qca8k: add support for mgmt read/write in Ethernet packet
net: dsa: qca8k: add support for mib autocast in Ethernet packet
net: dsa: qca8k: add support for phy read/write with mgmt Ethernet
net: dsa: qca8k: move page cache to driver priv
net: dsa: qca8k: cache lo and hi for mdio write
net: da: qca8k: add support for larger read/write size with mgmt
Ethernet
net: dsa: qca8k: introduce qca8k_bulk_read/write function
Vladimir Oltean (2):
net: dsa: provide switch operations for tracking the master state
net: dsa: replay master state events in
dsa_tree_{setup,teardown}_master
drivers/net/dsa/qca8k.c | 709 +++++++++++++++++++++++++++++++++---
drivers/net/dsa/qca8k.h | 46 ++-
include/linux/dsa/tag_qca.h | 82 +++++
include/net/dsa.h | 17 +
net/dsa/dsa2.c | 74 +++-
net/dsa/dsa_priv.h | 13 +
net/dsa/slave.c | 32 ++
net/dsa/switch.c | 15 +
net/dsa/tag_qca.c | 83 +++--
9 files changed, 993 insertions(+), 78 deletions(-)
create mode 100644 include/linux/dsa/tag_qca.h
--
2.33.1
On Sun, Jan 23, 2022 at 02:33:21AM +0100, Ansuel Smith wrote:
> Hi, this is ready but require some additional test on a wider userbase.
>
> The main reason for this is that we notice some routing problem in the
> switch and it seems assisted learning is needed. Considering mdio is
> quite slow due to the indirect write using this Ethernet alternative way
> seems to be quicker.
>
> The qca8k switch supports a special way to pass mdio read/write request
> using specially crafted Ethernet packet.
> This works by putting some defined data in the Ethernet header where the
> mac source and dst should be placed. The Ethernet type header is set to qca
> header and is set to a mdio read/write type.
> This is used to communicate to the switch that this is a special packet
> and should be parsed differently.
>
> Currently we use Ethernet packet for
> - MIB counter
> - mdio read/write configuration
> - phy read/write for each port
>
> Current implementation of this use completion API to wait for the packet
> to be processed by the tagger and has a timeout that fallback to the
> legacy mdio way and mutex to enforce one transaction at time.
>
> We now have connect()/disconnect() ops for the tagger. They are used to
> allocate priv data in the dsa priv. The header still has to be put in
> global include to make it usable by a dsa driver.
> They are called when the tag is connect to the dst and the data is freed
> using discconect on tagger change.
>
> (if someone wonder why the bind function is put at in the general setup
> function it's because tag is set in the cpu port where the notifier is
> still not available and we require the notifier to sen the
> tag_proto_connect() event.
>
> We now have a tag_proto_connect() for the dsa driver used to put
> additional data in the tagger priv (that is actually the dsa priv).
> This is called using a switch event DSA_NOTIFIER_TAG_PROTO_CONNECT.
> Current use for this is adding handler for the Ethernet packet to keep
> the tagger code as dumb as possible.
>
> The tagger priv implement only the handler for the special packet. All the
> other stuff is placed in the qca8k_priv and the tagger has to access
> it under lock.
>
> We use the new API from Vladimir to track if the master port is
> operational or not. We had to track many thing to reach a usable state.
> Checking if the port is UP is not enough and tracking a NETDEV_CHANGE is
> also not enough since it use also for other task. The correct way was
> both track for interface UP and if a qdisc was assigned to the
> interface. That tells us the port (and the tagger indirectly) is ready
> to accept and process packet.
>
> I tested this with multicpu port and with port6 set as the unique port and
> it's sad.
> It seems they implemented this feature in a bad way and this is only
> supported with cpu port0. When cpu port6 is the unique port, the switch
> doesn't send ack packet. With multicpu port, packet ack are not duplicated
> and only cpu port0 sends them. This is the same for the MIB counter.
> For this reason this feature is enabled only when cpu port0 is enabled and
> operational.
>
>
>
>
> Sorry if I missed any comments. This series is 2 month old so I think it
> would be good to recheck everything. In the mean time this was tested on
> and no regression are observed related to random port drop.
>
> v7:
> - Rebase on net-next changes
> - Add bulk patches to speedup this even more
> v6:
> - Fix some error in ethtool handler caused by rebase/cleanup
> v5:
> - Adapt to new API fixes
> - Fix a wrong logic for noop
> - Add additional lock for master_state change
> - Limit mdio Ethernet to cpu port0 (switch limitation)
> - Add priority to these special packet
> - Move mdio cache to qca8k_priv
> v4:
> - Remove duplicate patch sent by mistake.
> v3:
> - Include MIB with Ethernet packet.
> - Include phy read/write with Ethernet packet.
> - Reorganize code with new API.
> - Introuce master tracking by Vladimir
> v2:
> - Address all suggestion from Vladimir.
> Try to generilize this with connect/disconnect function from the
> tagger and tag_proto_connect for the driver.
>
> Ansuel Smith (14):
> net: dsa: tag_qca: convert to FIELD macro
> net: dsa: tag_qca: move define to include linux/dsa
> net: dsa: tag_qca: enable promisc_on_master flag
> net: dsa: tag_qca: add define for handling mgmt Ethernet packet
> net: dsa: tag_qca: add define for handling MIB packet
> net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet
> packet
> net: dsa: qca8k: add tracking state of master port
> net: dsa: qca8k: add support for mgmt read/write in Ethernet packet
> net: dsa: qca8k: add support for mib autocast in Ethernet packet
> net: dsa: qca8k: add support for phy read/write with mgmt Ethernet
> net: dsa: qca8k: move page cache to driver priv
> net: dsa: qca8k: cache lo and hi for mdio write
> net: da: qca8k: add support for larger read/write size with mgmt
> Ethernet
> net: dsa: qca8k: introduce qca8k_bulk_read/write function
>
> Vladimir Oltean (2):
> net: dsa: provide switch operations for tracking the master state
> net: dsa: replay master state events in
> dsa_tree_{setup,teardown}_master
>
> drivers/net/dsa/qca8k.c | 709 +++++++++++++++++++++++++++++++++---
> drivers/net/dsa/qca8k.h | 46 ++-
> include/linux/dsa/tag_qca.h | 82 +++++
> include/net/dsa.h | 17 +
> net/dsa/dsa2.c | 74 +++-
> net/dsa/dsa_priv.h | 13 +
> net/dsa/slave.c | 32 ++
> net/dsa/switch.c | 15 +
> net/dsa/tag_qca.c | 83 +++--
> 9 files changed, 993 insertions(+), 78 deletions(-)
> create mode 100644 include/linux/dsa/tag_qca.h
>
> --
> 2.33.1
>
Hi,
sorry for the delay in sending v8, it's ready but I'm far from home and
I still need to check some mdio improvement with pointer handling.
Anyway I have some concern aboutall the skb alloc.
I wonder if that part can be improved at the cost of some additional
space used.
The idea Is to use the cache stuff also for the eth skb (or duplicate
it?) And use something like build_skb and recycle the skb space
everytime...
This comes from the fact that packet size is ALWAYS the same and it
seems stupid to allocate and free it everytime. Considering we also
enforce a one way transaction (we send packet and we wait for response)
this makes the allocation process even more stupid.
So I wonder if we would have some perf improvement/less load by
declaring the mgmt eth space and build an skb that always use that
preallocate space and just modify data.
I would really love some feedback considering qca8k is also used in very
low spec ath79 device where we need to reduce the load in every way
possible. Also if anyone have more ideas on how to improve this to make
it less heavy cpu side, feel free to point it out even if it would
mean that my implemenation is complete sh*t.
(The use of caching the address would permit us to reduce the write to
this preallocated space even more or ideally to send the same skb)
--
Ansuel
On 1/30/2022 5:59 AM, Ansuel Smith wrote: >> > > Hi, > sorry for the delay in sending v8, it's ready but I'm far from home and > I still need to check some mdio improvement with pointer handling. > > Anyway I have some concern aboutall the skb alloc. > I wonder if that part can be improved at the cost of some additional > space used. > > The idea Is to use the cache stuff also for the eth skb (or duplicate > it?) And use something like build_skb and recycle the skb space > everytime... > This comes from the fact that packet size is ALWAYS the same and it > seems stupid to allocate and free it everytime. Considering we also > enforce a one way transaction (we send packet and we wait for response) > this makes the allocation process even more stupid. > > So I wonder if we would have some perf improvement/less load by > declaring the mgmt eth space and build an skb that always use that > preallocate space and just modify data. > > I would really love some feedback considering qca8k is also used in very > low spec ath79 device where we need to reduce the load in every way > possible. Also if anyone have more ideas on how to improve this to make > it less heavy cpu side, feel free to point it out even if it would > mean that my implemenation is complete sh*t. > > (The use of caching the address would permit us to reduce the write to > this preallocated space even more or ideally to send the same skb) I would say first things first: get this patch series included since it is very close from being suitable for inclusion in net-next. Then you can profile the I/O accesses over the management Ethernet frames and devise a strategy to optimize them to make as little CPU cycles intensive as possible. build_skb() is not exactly a magic bullet that will solve all performance problems, you still need the non-data portion of the skb to be allocated, and also keep in mind that you need tail room at the end of the data buffer in order for struct skb_shared_info to be written. This means that the hardware is not allowed to write at the end of the data buffer, or you must reduce the maximum RX length accordingly to prevent that. Your frames are small enough here this is unlikely to be an issue. Since the MDIO layer does not really allow more than one outstanding transaction per MDIO device at a time, you might be just fine with just have a front and back skb set of buffers and alternating between these two. -- Florian
On Sun, Jan 30, 2022 at 09:07:16AM -0800, Florian Fainelli wrote: > > > On 1/30/2022 5:59 AM, Ansuel Smith wrote: > > > > > > > Hi, > > sorry for the delay in sending v8, it's ready but I'm far from home and > > I still need to check some mdio improvement with pointer handling. > > > > Anyway I have some concern aboutall the skb alloc. > > I wonder if that part can be improved at the cost of some additional > > space used. > > > > The idea Is to use the cache stuff also for the eth skb (or duplicate > > it?) And use something like build_skb and recycle the skb space > > everytime... > > This comes from the fact that packet size is ALWAYS the same and it > > seems stupid to allocate and free it everytime. Considering we also > > enforce a one way transaction (we send packet and we wait for response) > > this makes the allocation process even more stupid. > > > > So I wonder if we would have some perf improvement/less load by > > declaring the mgmt eth space and build an skb that always use that > > preallocate space and just modify data. > > > > I would really love some feedback considering qca8k is also used in very > > low spec ath79 device where we need to reduce the load in every way > > possible. Also if anyone have more ideas on how to improve this to make > > it less heavy cpu side, feel free to point it out even if it would > > mean that my implemenation is complete sh*t. > > > > (The use of caching the address would permit us to reduce the write to > > this preallocated space even more or ideally to send the same skb) > > I would say first things first: get this patch series included since it is > very close from being suitable for inclusion in net-next. Then you can > profile the I/O accesses over the management Ethernet frames and devise a > strategy to optimize them to make as little CPU cycles intensive as > possible. > Don't know if it's correct to continue this disccusion here. > build_skb() is not exactly a magic bullet that will solve all performance > problems, you still need the non-data portion of the skb to be allocated, > and also keep in mind that you need tail room at the end of the data buffer > in order for struct skb_shared_info to be written. This means that the > hardware is not allowed to write at the end of the data buffer, or you must > reduce the maximum RX length accordingly to prevent that. Your frames are > small enough here this is unlikely to be an issue. > I did some test with a build_skb() implemenation and I just discovered that It wouldn't work... Problem of build_skb() is that the driver will release the data and that's exactly what I want to skip (one allocated memory space that is reused for every skb) Wonder if it would be acceptable to allocate a skb when master became operational and use always that. When this preallocated skb has to be used, the required data is changed and the users of the skb is increased so that it's not free. In theory all the skb shared data and head should be the same as what changes of the packet is just the data and nothing else. It looks like an hack but that is the only way I found to skip the skb_free when the packet is processed. (increasing the skb users) > Since the MDIO layer does not really allow more than one outstanding > transaction per MDIO device at a time, you might be just fine with just have > a front and back skb set of buffers and alternating between these two. Another way as you suggested would be have 2 buffer and use build_skb to use build the sbk around the allocated buffer. But still my main concern is if the use of manually increasing the skb user is accepted to skip any skb free from happening. Hope I'm not too annoying with these kind of question. > -- > Florian -- Ansuel
On Thu, Feb 03, 2022 at 06:59:13PM +0100, Ansuel Smith wrote: > On Sun, Jan 30, 2022 at 09:07:16AM -0800, Florian Fainelli wrote: > > On 1/30/2022 5:59 AM, Ansuel Smith wrote: > > > Hi, > > > sorry for the delay in sending v8, it's ready but I'm far from home and > > > I still need to check some mdio improvement with pointer handling. > > > > > > Anyway I have some concern aboutall the skb alloc. > > > I wonder if that part can be improved at the cost of some additional > > > space used. > > > > > > The idea Is to use the cache stuff also for the eth skb (or duplicate > > > it?) And use something like build_skb and recycle the skb space > > > everytime... > > > This comes from the fact that packet size is ALWAYS the same and it > > > seems stupid to allocate and free it everytime. Considering we also > > > enforce a one way transaction (we send packet and we wait for response) > > > this makes the allocation process even more stupid. > > > > > > So I wonder if we would have some perf improvement/less load by > > > declaring the mgmt eth space and build an skb that always use that > > > preallocate space and just modify data. > > > > > > I would really love some feedback considering qca8k is also used in very > > > low spec ath79 device where we need to reduce the load in every way > > > possible. Also if anyone have more ideas on how to improve this to make > > > it less heavy cpu side, feel free to point it out even if it would > > > mean that my implemenation is complete sh*t. > > > > > > (The use of caching the address would permit us to reduce the write to > > > this preallocated space even more or ideally to send the same skb) > > > > I would say first things first: get this patch series included since it is > > very close from being suitable for inclusion in net-next. Then you can > > profile the I/O accesses over the management Ethernet frames and devise a > > strategy to optimize them to make as little CPU cycles intensive as > > possible. > > > > Don't know if it's correct to continue this disccusion here. > > > build_skb() is not exactly a magic bullet that will solve all performance > > problems, you still need the non-data portion of the skb to be allocated, > > and also keep in mind that you need tail room at the end of the data buffer > > in order for struct skb_shared_info to be written. This means that the > > hardware is not allowed to write at the end of the data buffer, or you must > > reduce the maximum RX length accordingly to prevent that. Your frames are > > small enough here this is unlikely to be an issue. > > > > I did some test with a build_skb() implemenation and I just discovered > that It wouldn't work... Problem of build_skb() is that the driver will > release the data and that's exactly what I want to skip (one allocated > memory space that is reused for every skb) > > Wonder if it would be acceptable to allocate a skb when master became > operational and use always that. > When this preallocated skb has to be used, the required data is changed > and the users of the skb is increased so that it's not free. In theory > all the skb shared data and head should be the same as what changes of > the packet is just the data and nothing else. > It looks like an hack but that is the only way I found to skip the > skb_free when the packet is processed. (increasing the skb users) > > > Since the MDIO layer does not really allow more than one outstanding > > transaction per MDIO device at a time, you might be just fine with just have > > a front and back skb set of buffers and alternating between these two. > > Another way as you suggested would be have 2 buffer and use build_skb to > use build the sbk around the allocated buffer. But still my main concern > is if the use of manually increasing the skb user is accepted to skip > any skb free from happening. > > Hope I'm not too annoying with these kind of question. To my knowledge, when you call dev_queue_xmit(), the skb is no longer yours, end of story, it doesn't matter whether you increase the refcount on it or not. The DSA master may choose to do whatever it wishes with that buffer after its TX completion interrupt fires: it may not call napi_consume_skb() but directly recycle that buffer in its pool of RX buffers, as part of some weird buffer recycling scheme. So you'll think that the buffer is yours, but it isn't, because the driver hasn't returned it to the allocator, and your writes for the next packet may be concurrent with some RX DMA transactions. I don't have a mainline example to give you, but I've seen the pattern, and I don't think it's illegal (although of course, I stand to be corrected if necessary).
On Thu, 3 Feb 2022 20:21:28 +0200 Vladimir Oltean wrote: > To my knowledge, when you call dev_queue_xmit(), the skb is no longer > yours, end of story, it doesn't matter whether you increase the refcount > on it or not. The DSA master may choose to do whatever it wishes with > that buffer after its TX completion interrupt fires: it may not call > napi_consume_skb() but directly recycle that buffer in its pool of RX > buffers, as part of some weird buffer recycling scheme. So you'll think > that the buffer is yours, but it isn't, because the driver hasn't > returned it to the allocator, and your writes for the next packet may be > concurrent with some RX DMA transactions. I don't have a mainline > example to give you, but I've seen the pattern, and I don't think it's > illegal (although of course, I stand to be corrected if necessary). Are we talking about holding onto the Tx skb here or also recycling the Rx one? Sorry for another out of context comment in advance.. AFAIK in theory shared skbs are supposed to be untouched or unshared explicitly by the driver on Tx. pktgen takes advantage of it. We have IFF_TX_SKB_SHARING. In practice everyone gets opted into SKB_SHARING because ether_setup() sets the flag. A lot of drivers are not aware of the requirement and will assume full ownership (and for example use skb->cb[]) :/ I don't think there is any Tx completion -> Rx pool recycling scheme inside the drivers (if that's what you described).
On Thu, Feb 03, 2022 at 12:10:27PM -0800, Jakub Kicinski wrote: > On Thu, 3 Feb 2022 20:21:28 +0200 Vladimir Oltean wrote: > > To my knowledge, when you call dev_queue_xmit(), the skb is no longer > > yours, end of story, it doesn't matter whether you increase the refcount > > on it or not. The DSA master may choose to do whatever it wishes with > > that buffer after its TX completion interrupt fires: it may not call > > napi_consume_skb() but directly recycle that buffer in its pool of RX > > buffers, as part of some weird buffer recycling scheme. So you'll think > > that the buffer is yours, but it isn't, because the driver hasn't > > returned it to the allocator, and your writes for the next packet may be > > concurrent with some RX DMA transactions. I don't have a mainline > > example to give you, but I've seen the pattern, and I don't think it's > > illegal (although of course, I stand to be corrected if necessary). > > Are we talking about holding onto the Tx skb here or also recycling > the Rx one? Sorry for another out of context comment in advance.. > Here we are talking about tx skb. (We can't really operate on the received skb handled by the tagger) What I want to improve is the fact that we alloc 3 very small skb that will have only some part of them modified and the rest equal on all 3 skb. So my idea is find a way to preallocate a space and build a skb around it. In short as we force a 1:1 comunication where we send an skb and we wait for the response to be processed, we can reuse the same skb everytime and just modify the data in it. I'm asking if anyone have some hint/direction without proposing a patch that is a big massive hack that will be NACK in 0.1 ms LOL > AFAIK in theory shared skbs are supposed to be untouched or unshared > explicitly by the driver on Tx. pktgen takes advantage of it. > We have IFF_TX_SKB_SHARING. > Will check how this flags is used. If you have any hint about this feel free to suggest it. > In practice everyone gets opted into SKB_SHARING because ether_setup() > sets the flag. A lot of drivers are not aware of the requirement and > will assume full ownership (and for example use skb->cb[]) :/ > Consider that currently this will be used by stmmac driver, brcm driver and someother atheros driver. > I don't think there is any Tx completion -> Rx pool recycling scheme > inside the drivers (if that's what you described). -- Ansuel
On Thu, Feb 03, 2022 at 12:10:27PM -0800, Jakub Kicinski wrote:
> On Thu, 3 Feb 2022 20:21:28 +0200 Vladimir Oltean wrote:
> > To my knowledge, when you call dev_queue_xmit(), the skb is no longer
> > yours, end of story, it doesn't matter whether you increase the refcount
> > on it or not. The DSA master may choose to do whatever it wishes with
> > that buffer after its TX completion interrupt fires: it may not call
> > napi_consume_skb() but directly recycle that buffer in its pool of RX
> > buffers, as part of some weird buffer recycling scheme. So you'll think
> > that the buffer is yours, but it isn't, because the driver hasn't
> > returned it to the allocator, and your writes for the next packet may be
> > concurrent with some RX DMA transactions. I don't have a mainline
> > example to give you, but I've seen the pattern, and I don't think it's
> > illegal (although of course, I stand to be corrected if necessary).
>
> Are we talking about holding onto the Tx skb here or also recycling
> the Rx one? Sorry for another out of context comment in advance..
We're talking about the possibility that the DSA master holds onto the
TX skb, for the purpose of saving a netdev_alloc_skb() call later in the
RX path.
> AFAIK in theory shared skbs are supposed to be untouched or unshared
> explicitly by the driver on Tx. pktgen takes advantage of it.
> We have IFF_TX_SKB_SHARING.
>
> In practice everyone gets opted into SKB_SHARING because ether_setup()
> sets the flag. A lot of drivers are not aware of the requirement and
> will assume full ownership (and for example use skb->cb[]) :/
>
> I don't think there is any Tx completion -> Rx pool recycling scheme
> inside the drivers (if that's what you described).
You made me go look again at commit acb600def211 ("net: remove skb
recycling"), a revert of which is still carried in some vendor kernels.
So there was a skb_is_recycleable() function with this check:
if (skb_shared(skb))
return false;
which means that yes, my argument is basically invalid, skb_get() in DSA
will protect against skb recycling.
I know Ansuel is using OpenWRT, so not a stranger to vendor kernels &
network drivers. My comment wasn't as much a hard NACK as it was a word
of caution. DSA allows pairing any switch driver to any host controller
driver. If somebody hits the jackpot (probably won't be me, won't be
Ansuel), it won't be exactly fun for them until they figure out what's
going on, with the symptoms being random data corruption during switch
register access. But after I revisited the exact situation, I don't have
an example that proves to be problematic.
© 2016 - 2026 Red Hat, Inc.