[v1] can: m_can: initialize spin lock on device probe

[PATCH] can: m_can: initialize spin lock on device probe

Posted by Antonios Salios 9 months, 2 weeks ago

The spin lock tx_handling_spinlock in struct m_can_classdev is not being
initialized. This leads to bug complaints from the kernel, eg. when
trying to send CAN frames with cansend from can-utils.

This patch fixes that by initializing the spin lock in the corresponding
device probe functions.

Signed-off-by: Antonios Salios <antonios@mwa.re>
---
 drivers/net/can/m_can/m_can_pci.c      | 1 +
 drivers/net/can/m_can/m_can_platform.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
index 9ad7419f8..06243cd43 100644
--- a/drivers/net/can/m_can/m_can_pci.c
+++ b/drivers/net/can/m_can/m_can_pci.c
@@ -143,6 +143,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
 	pm_runtime_use_autosuspend(dev);
 	pm_runtime_put_noidle(dev);
 	pm_runtime_allow(dev);
+	spin_lock_init(&mcan_class->tx_handling_spinlock);
 
 	return 0;
 
diff --git a/drivers/net/can/m_can/m_can_platform.c b/drivers/net/can/m_can/m_can_platform.c
index b832566ef..c09c61d25 100644
--- a/drivers/net/can/m_can/m_can_platform.c
+++ b/drivers/net/can/m_can/m_can_platform.c
@@ -154,6 +154,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
 	ret = m_can_class_register(mcan_class);
 	if (ret)
 		goto out_runtime_disable;
+	spin_lock_init(&mcan_class->tx_handling_spinlock);
 
 	return ret;
 
-- 
2.49.0

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Markus Elfring 9 months, 2 weeks ago

…
> This patch fixes …

See also:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.15-rc3#n94

Regards,
Markus

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Marc Kleine-Budde 9 months, 2 weeks ago

Hello Antonios,

On 24.04.2025 14:52:20, Antonios Salios wrote:
> The spin lock tx_handling_spinlock in struct m_can_classdev is not being
> initialized. This leads to bug complaints from the kernel, eg. when
> trying to send CAN frames with cansend from can-utils.

Thanks for your contribution, that's a good catch!

> This patch fixes that by initializing the spin lock in the corresponding
> device probe functions.

The spin_lock is only used in m_can.c, so it's better to be initialized
there.

Please add a fixes tag:

Fixes: 1fa80e23c150 ("can: m_can: Introduce a tx_fifo_in_flight counter")

I've also added Markus on Cc, who introduced the spin_lock.

regards,
Marc


-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Vincent Mailhol 9 months, 2 weeks ago

On 24/04/2025 at 22:11, Marc Kleine-Budde wrote:
> Hello Antonios,
> 
> On 24.04.2025 14:52:20, Antonios Salios wrote:
>> The spin lock tx_handling_spinlock in struct m_can_classdev is not being
>> initialized. This leads to bug complaints from the kernel, eg. when
                              ^^^^^^^^^^^^^^
Maybe you can briefly describe what kind of bug (NULL pointer dereference).

Also, if you have the dmesg log of the error, this is something you can add at
the end of the patch description.

You will probably get a CVE for your finding.


Yours sincerely,
Vincent Mailhol

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Antonios Salios 9 months, 2 weeks ago

On Fri, 2025-04-25 at 00:23 +0900, Vincent Mailhol wrote:
> Maybe you can briefly describe what kind of bug (NULL pointer
> dereference).

It's a spinlock bad magic bug that occurs when one tries to send a CAN
frame using cansend. The frame gets transferred nonetheless.
I'm testing the driver in an virtual RISC-V 64-bit environment with a
recent mainline kernel. The M_CAN controller is io-mapped to the
system.

> Also, if you have the dmesg log of the error, this is something you
> can add at
> the end of the patch description.

Will do, I'm just waiting for more feedback on the patch before sending
a v3. In the meantime, the dmesg log looks like this:

$ cansend can0 123#deadbeef
[   10.631450] BUG: spinlock bad magic on CPU#0, cansend/95
[   10.631462]  lock: 0xff60000002ec1010, .magic: 00000000, .owner:
<none>/-1, .owner_cpu: 0
[   10.631479] CPU: 0 UID: 0 PID: 95 Comm: cansend Not tainted 6.15.0-
rc3-00032-ga79be02bba5c #5 NONE
[   10.631487] Hardware name: MachineWare SIM-V (DT)
[   10.631490] Call Trace:
[   10.631493] [<ffffffff800133e0>] dump_backtrace+0x1c/0x24
[   10.631503] [<ffffffff800022f2>] show_stack+0x28/0x34
[   10.631510] [<ffffffff8000de3e>] dump_stack_lvl+0x4a/0x68
[   10.631518] [<ffffffff8000de70>] dump_stack+0x14/0x1c
[   10.631526] [<ffffffff80003134>] spin_dump+0x62/0x6e
[   10.631534] [<ffffffff800883ba>] do_raw_spin_lock+0xd0/0x142
[   10.631542] [<ffffffff807a6fcc>] _raw_spin_lock_irqsave+0x20/0x2c
[   10.631554] [<ffffffff80536dba>] m_can_start_xmit+0x90/0x34a
[   10.631567] [<ffffffff806148b0>] dev_hard_start_xmit+0xa6/0xee
[   10.631577] [<ffffffff8065b730>] sch_direct_xmit+0x114/0x292
[   10.631586] [<ffffffff80614e2a>] __dev_queue_xmit+0x3b0/0xaa8
[   10.631596] [<ffffffff8073b8fa>] can_send+0xc6/0x242
[   10.631604] [<ffffffff8073d1c0>] raw_sendmsg+0x1a8/0x36c
[   10.631612] [<ffffffff805ebf06>] sock_write_iter+0x9a/0xee
[   10.631623] [<ffffffff801d06ea>] vfs_write+0x184/0x3a6
[   10.631633] [<ffffffff801d0a88>] ksys_write+0xa0/0xc0
[   10.631643] [<ffffffff801d0abc>] __riscv_sys_write+0x14/0x1c
[   10.631654] [<ffffffff8079ebf8>] do_trap_ecall_u+0x168/0x212
[   10.631662] [<ffffffff807a830a>] handle_exception+0x146/0x152

-- 
Antonios Salios
Software Engineer

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Geschäftsführung
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Vincent Mailhol 9 months, 2 weeks ago

On Fry. 25 Apr. 2025 at 15:34, Antonios Salios <antonios@mwa.re> wrote:
> On Fri, 2025-04-25 at 00:23 +0900, Vincent Mailhol wrote:
> > Maybe you can briefly describe what kind of bug (NULL pointer
> > dereference).
>
> It's a spinlock bad magic bug that occurs when one tries to send a CAN
> frame using cansend. The frame gets transferred nonetheless.
> I'm testing the driver in an virtual RISC-V 64-bit environment with a
> recent mainline kernel. The M_CAN controller is io-mapped to the
> system.

I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK:

  https://elixir.bootlin.com/linux/v6.14.3/source/lib/Kconfig.debug#L1437

Without it, this would have been a more severe NULL pointer dereference.

> > Also, if you have the dmesg log of the error, this is something you
> > can add at
> > the end of the patch description.
>
> Will do, I'm just waiting for more feedback on the patch before sending
> a v3. In the meantime, the dmesg log looks like this:

Great!

My comments on the patch description are just nitpicks anyway. You can add my:

Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

to the v3.

Thanks a lot!


Yours sincerely,
Vincent Mailhol

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Antonios Salios 9 months, 2 weeks ago

On Fri, 2025-04-25 at 16:18 +0900, Vincent Mailhol wrote:
> I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK:

Indeed.

> Without it, this would have been a more severe NULL pointer
> dereference.

Strangely, a NULL pointer dereference does not occur, when I try again
with CONFIG_DEBUG_SPINLOCK disabled. The kernel does not crash, at
least on rv64.

Looking through the implementations of arch_spinlock_t, it seems that
only PARISC's implementation would cause problems in this case since it
uses an array.

https://elixir.bootlin.com/linux/v6.15-rc3/source/arch/parisc/include/asm/spinlock_types.h#L11

I think I'm missing something, why do you think a NULL pointer deref
would occur in this case?

Thanks for your feedback!

-- 
Antonios Salios
Software Engineer

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Geschäftsführung
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock

Re: [PATCH] can: m_can: initialize spin lock on device probe

Posted by Vincent Mailhol 9 months, 2 weeks ago

On Fri. 25 Apr. 2025 à 18:18, Antonios Salios <antonios@mwa.re> wrote:
> On Fri, 2025-04-25 at 16:18 +0900, Vincent Mailhol wrote:
> > I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK:
>
> Indeed.
>
> > Without it, this would have been a more severe NULL pointer
> > dereference.
>
> Strangely, a NULL pointer dereference does not occur, when I try again
> with CONFIG_DEBUG_SPINLOCK disabled. The kernel does not crash, at
> least on rv64.
>
> Looking through the implementations of arch_spinlock_t, it seems that
> only PARISC's implementation would cause problems in this case since it
> uses an array.
>
> https://elixir.bootlin.com/linux/v6.15-rc3/source/arch/parisc/include/asm/spinlock_types.h#L11
>
> I think I'm missing something, why do you think a NULL pointer deref
> would occur in this case?

I see. Thanks for your test. I went a bit too quick in my analysis
when I saw things like:

  raw_spin_lock(&lock->rlock);

in

  https://elixir.bootlin.com/linux/v6.14/source/include/linux/spinlock.h#L351

I thought about the NULL pointer dereference. But indeed, you are
right. The spinlock_t is just one attribute of a structure and will be
allocated anyway even if spin_lock_init is not called, so calling

  spin_lock_irqsave(&cdev->tx_handling_spinlock, irqflags);

will still pass a valid address.

The other thing which put me off guard is that some other "spinlock
bad magic" got assigned some CVE.

https://lore.kernel.org/linux-cve-announce/2025031217-CVE-2025-21862-e8a0@gregkh/

But here as well, that does not imply a NULL pointer dereference. I
think that the bug is only that the spin_lock is not working as
intended.

Regardless, just saying that it is a spinlock bad magic bug with the
dmesg trace is enough. Thanks again for your tests!

Yours sincerely,
Vincent Mailhol