There are two skb pointers to manage tx skb's enqueued from n/w stack.
waiting_tx_skb pointer points to the tx skb which needs to be processed
and ongoing_tx_skb pointer points to the tx skb which is being processed.
SPI thread prepares the tx data chunks from the tx skb pointed by the
ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
processed, the tx skb pointed by the waiting_tx_skb is assigned to
ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
Whenever there is a new tx skb from n/w stack, it will be assigned to
waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
handled in two different threads.
Consider a scenario where the SPI thread processed an ongoing_tx_skb and
it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
without doing any NULL check. At this time, if the waiting_tx_skb pointer
is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
stack and there is a chance to overwrite the tx skb pointer with NULL in
the SPI thread. Finally one of the tx skb will be left as unhandled,
resulting packet missing and memory leak.
- Consider the below scenario where the TXC reported from the previous
transfer is 10 and ongoing_tx_skb holds an tx ethernet frame which can be
transported in 20 TXCs and waiting_tx_skb is still NULL.
tx_credits = 10; /* 21 are filled in the previous transfer */
ongoing_tx_skb = 20;
waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = 10;
waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to
transport the next remaining 10 tx chunks from ongoing_tx_skb.
tx_credits = 21;
ongoing_tx_skb = 10;
waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = NULL;
waiting_tx_skb = NULL;
- Now the below bad case might happen,
Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler)
--------------------------- -----------------------------------
- if waiting_tx_skb is NULL
- if ongoing_tx_skb is NULL
- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
- waiting_tx_skb = NULL
...
- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
To overcome the above issue, protect the moving of tx skb reference from
waiting_tx_skb pointer to ongoing_tx_skb pointer and assigning new tx skb
to waiting_tx_skb pointer, so that the other thread can't access the
waiting_tx_skb pointer until the current thread completes moving the tx
skb reference safely.
Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
---
drivers/net/ethernet/oa_tc6.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 4c8b0ca922b7..574bfd4e9356 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -113,6 +113,7 @@ struct oa_tc6 {
struct mii_bus *mdiobus;
struct spi_device *spi;
struct mutex spi_ctrl_lock; /* Protects spi control transfer */
+ struct mutex tx_skb_lock; /* Protects tx skb handling */
void *spi_ctrl_tx_buf;
void *spi_ctrl_rx_buf;
void *spi_data_tx_buf;
@@ -1004,8 +1005,10 @@ static u16 oa_tc6_prepare_spi_tx_buf_for_tx_skbs(struct oa_tc6 *tc6)
for (used_tx_credits = 0; used_tx_credits < tc6->tx_credits;
used_tx_credits++) {
if (!tc6->ongoing_tx_skb) {
+ mutex_lock(&tc6->tx_skb_lock);
tc6->ongoing_tx_skb = tc6->waiting_tx_skb;
tc6->waiting_tx_skb = NULL;
+ mutex_unlock(&tc6->tx_skb_lock);
}
if (!tc6->ongoing_tx_skb)
break;
@@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
return NETDEV_TX_OK;
}
+ mutex_lock(&tc6->tx_skb_lock);
tc6->waiting_tx_skb = skb;
+ mutex_unlock(&tc6->tx_skb_lock);
/* Wake spi kthread to perform spi transfer */
wake_up_interruptible(&tc6->spi_wq);
@@ -1240,6 +1245,7 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev)
tc6->netdev = netdev;
SET_NETDEV_DEV(netdev, &spi->dev);
mutex_init(&tc6->spi_ctrl_lock);
+ mutex_init(&tc6->tx_skb_lock);
/* Set the SPI controller to pump at realtime priority */
tc6->spi->rt = true;
--
2.34.1
On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote: > @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb) > return NETDEV_TX_OK; > } > > + mutex_lock(&tc6->tx_skb_lock); > tc6->waiting_tx_skb = skb; > + mutex_unlock(&tc6->tx_skb_lock); start_xmit runs in BH / softirq context. You can't take sleeping locks. The lock has to be a spin lock. You could possibly try to use the existing spin lock of the tx queue (__netif_tx_lock()) but that may be more challenging to do cleanly from within a library.. Please make sure you test with builds including the kernel/configs/debug.config Kconfigs. -- pw-bot: cr
Hi Jakub, On 10/12/24 5:41 am, Jakub Kicinski wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote: >> @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb) >> return NETDEV_TX_OK; >> } >> >> + mutex_lock(&tc6->tx_skb_lock); >> tc6->waiting_tx_skb = skb; >> + mutex_unlock(&tc6->tx_skb_lock); > > start_xmit runs in BH / softirq context. You can't take sleeping locks. > The lock has to be a spin lock. You could possibly try to use the > existing spin lock of the tx queue (__netif_tx_lock()) but that may be > more challenging to do cleanly from within a library.. Thanks for the input. Yes, it looks like implementing a spin lock would be a right choice. I will implement it and do the testing as you suggested below and share the feedback. Best regards, Parthiban V > > Please make sure you test with builds including the > kernel/configs/debug.config Kconfigs. > -- > pw-bot: cr
Hi Jakub, On 12/12/24 6:03 pm, Parthiban.Veerasooran@microchip.com wrote: > Hi Jakub, > > On 10/12/24 5:41 am, Jakub Kicinski wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >> >> On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote: >>> @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb) >>> return NETDEV_TX_OK; >>> } >>> >>> + mutex_lock(&tc6->tx_skb_lock); >>> tc6->waiting_tx_skb = skb; >>> + mutex_unlock(&tc6->tx_skb_lock); >> >> start_xmit runs in BH / softirq context. You can't take sleeping locks. >> The lock has to be a spin lock. You could possibly try to use the >> existing spin lock of the tx queue (__netif_tx_lock()) but that may be >> more challenging to do cleanly from within a library.. > Thanks for the input. Yes, it looks like implementing a spin lock would > be a right choice. I will implement it and do the testing as you > suggested below and share the feedback. I tried using spin_lock_bh() variants (as the softirq involved) on both start_xmit() and spi_thread() where the critical regions need to be protected and tested by enabling the Kconfigs in the kernel/configs/debug.config. Didn't notice any warnings in the dmesg log. Note: Prior to the above test, purposefully I tried with spin_lock() variants on both the sides to check/simulate for the warnings using Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg regarding deadlock which clarified the expected behavior. And then I proceeded with the above fix and it worked as expected. If you agree, I will prepare the next version with this fix and post. Best regards, Parthiban V > > Best regards, > Parthiban V >> >> Please make sure you test with builds including the >> kernel/configs/debug.config Kconfigs. >> -- >> pw-bot: cr >
On Fri, 13 Dec 2024 10:35:03 +0000 Parthiban.Veerasooran@microchip.com wrote: > >> start_xmit runs in BH / softirq context. You can't take sleeping locks. > >> The lock has to be a spin lock. You could possibly try to use the > >> existing spin lock of the tx queue (__netif_tx_lock()) but that may be > >> more challenging to do cleanly from within a library.. > > Thanks for the input. Yes, it looks like implementing a spin lock would > > be a right choice. I will implement it and do the testing as you > > suggested below and share the feedback. > I tried using spin_lock_bh() variants (as the softirq involved) on both > start_xmit() and spi_thread() where the critical regions need to be > protected and tested by enabling the Kconfigs in the > kernel/configs/debug.config. Didn't notice any warnings in the dmesg log. > > Note: Prior to the above test, purposefully I tried with spin_lock() > variants on both the sides to check/simulate for the warnings using > Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg > regarding deadlock which clarified the expected behavior. And then I > proceeded with the above fix and it worked as expected. > > If you agree, I will prepare the next version with this fix and post. Go ahead.
Hi Jakub, On 14/12/24 8:12 am, Jakub Kicinski wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On Fri, 13 Dec 2024 10:35:03 +0000 Parthiban.Veerasooran@microchip.com > wrote: >>>> start_xmit runs in BH / softirq context. You can't take sleeping locks. >>>> The lock has to be a spin lock. You could possibly try to use the >>>> existing spin lock of the tx queue (__netif_tx_lock()) but that may be >>>> more challenging to do cleanly from within a library.. >>> Thanks for the input. Yes, it looks like implementing a spin lock would >>> be a right choice. I will implement it and do the testing as you >>> suggested below and share the feedback. >> I tried using spin_lock_bh() variants (as the softirq involved) on both >> start_xmit() and spi_thread() where the critical regions need to be >> protected and tested by enabling the Kconfigs in the >> kernel/configs/debug.config. Didn't notice any warnings in the dmesg log. >> >> Note: Prior to the above test, purposefully I tried with spin_lock() >> variants on both the sides to check/simulate for the warnings using >> Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg >> regarding deadlock which clarified the expected behavior. And then I >> proceeded with the above fix and it worked as expected. >> >> If you agree, I will prepare the next version with this fix and post. > > Go ahead. Thanks for your comment. It is already posted for review. https://patchwork.kernel.org/project/netdevbpf/patch/20241213123159.439739-3-parthiban.veerasooran@microchip.com/ Best regards, Parthiban V
© 2016 - 2025 Red Hat, Inc.