[PATCH v4] can: mcp251x: Fix race condition on receive interrupt

Sebastian Würl posted 1 patch 3 years, 8 months ago
drivers/net/can/spi/mcp251x.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
[PATCH v4] can: mcp251x: Fix race condition on receive interrupt
Posted by Sebastian Würl 3 years, 8 months ago
The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
 - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
 - Setup CAN to 1 MHz
 - Spam bursts of 5 CAN-messages with increasing CAN-ids
 - Continue sending the bursts while sleeping a second between the bursts
 - Check on the RPi whether the received messages have increasing CAN-ids
 - Without this patch, every burst of messages will contain a flipped pair

Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
---
 drivers/net/can/spi/mcp251x.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 89897a2d41fa..df748b2e7421 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1068,15 +1068,12 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 	mutex_lock(&priv->mcp_lock);
 	while (!priv->force_quit) {
 		enum can_state new_state;
-		u8 intf, eflag;
+		u8 intf, intf1, eflag, eflag1;
 		u8 clear_intf = 0;
 		int can_id = 0, data1 = 0;
 
 		mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
 
-		/* mask out flags we don't care about */
-		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
-
 		/* receive buffer 0 */
 		if (intf & CANINTF_RX0IF) {
 			mcp251x_hw_rx(spi, 0);
@@ -1086,6 +1083,16 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 			if (mcp251x_is_2510(spi))
 				mcp251x_write_bits(spi, CANINTF,
 						   CANINTF_RX0IF, 0x00);
+
+			/* check if buffer 1 is already known to be full, no need to re-read */
+			if (!(intf & CANINTF_RX1IF)) {
+				/* intf needs to be read again to avoid a race condition */
+				mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+
+				/* combine flags from both operations for error handling */
+				intf |= intf1;
+				eflag |= eflag1;
+			}
 		}
 
 		/* receive buffer 1 */
@@ -1096,6 +1103,9 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
 				clear_intf |= CANINTF_RX1IF;
 		}
 
+		/* mask out flags we don't care about */
+		intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
 		/* any error or tx interrupt we need to clear? */
 		if (intf & (CANINTF_ERR | CANINTF_TX))
 			clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
-- 
2.30.2

Re: [PATCH v4] can: mcp251x: Fix race condition on receive interrupt
Posted by Andy Shevchenko 3 years, 8 months ago
On Thu, Aug 4, 2022 at 10:42 AM Sebastian Würl
<sebastian.wuerl@ororatech.com> wrote:
>
> The mcp251x driver uses both receiving mailboxes of the CAN controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled and will retrieve the
> messages accordingly.
>
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
>
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
>
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair

For future submissions...

> Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
> ---

When you send the new version of a single patch, use --annotate to put
a changelog here (after the cutter '---' line) so people will know how
v4 differes to v3 to v2 and to v1.


Also you mentioned some issues with the mail program. Use `git
send-email` for patch submission. Also you may consider my "smart"
(no) script [1] for that.

[1]: https://github.com/andy-shev/home-bin-tools/blob/master/ge2maintainer.sh


-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH v4] can: mcp251x: Fix race condition on receive interrupt
Posted by Marc Kleine-Budde 3 years, 8 months ago
On 04.08.2022 10:14:11, Sebastian Würl wrote:
> The mcp251x driver uses both receiving mailboxes of the CAN controller
> chips. For retrieving the CAN frames from the controller via SPI, it checks
> once per interrupt which mailboxes have been filled and will retrieve the
> messages accordingly.
> 
> This introduces a race condition, as another CAN frame can enter mailbox 1
> while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
> the interrupt handler is called next, mailbox 0 is emptied before
> mailbox 1, leading to out-of-order CAN frames in the network device.
> 
> This is fixed by checking the interrupt flags once again after freeing
> mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
> 
> For reproducing the bug I created the following setup:
>  - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
>  - Setup CAN to 1 MHz
>  - Spam bursts of 5 CAN-messages with increasing CAN-ids
>  - Continue sending the bursts while sleeping a second between the bursts
>  - Check on the RPi whether the received messages have increasing CAN-ids
>  - Without this patch, every burst of messages will contain a flipped pair
> 
> Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
> Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>

I've reduced the scope of intf1, eflag1 while applying the patch to
can-next/master.

Thanks,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |