drivers/net/can/m_can/m_can_pci.c | 1 + drivers/net/can/m_can/m_can_platform.c | 1 + 2 files changed, 2 insertions(+)
The spin lock tx_handling_spinlock in struct m_can_classdev is not being
initialized. This leads to bug complaints from the kernel, eg. when
trying to send CAN frames with cansend from can-utils.
This patch fixes that by initializing the spin lock in the corresponding
device probe functions.
Signed-off-by: Antonios Salios <antonios@mwa.re>
---
drivers/net/can/m_can/m_can_pci.c | 1 +
drivers/net/can/m_can/m_can_platform.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
index 9ad7419f8..06243cd43 100644
--- a/drivers/net/can/m_can/m_can_pci.c
+++ b/drivers/net/can/m_can/m_can_pci.c
@@ -143,6 +143,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
pm_runtime_use_autosuspend(dev);
pm_runtime_put_noidle(dev);
pm_runtime_allow(dev);
+ spin_lock_init(&mcan_class->tx_handling_spinlock);
return 0;
diff --git a/drivers/net/can/m_can/m_can_platform.c b/drivers/net/can/m_can/m_can_platform.c
index b832566ef..c09c61d25 100644
--- a/drivers/net/can/m_can/m_can_platform.c
+++ b/drivers/net/can/m_can/m_can_platform.c
@@ -154,6 +154,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
ret = m_can_class_register(mcan_class);
if (ret)
goto out_runtime_disable;
+ spin_lock_init(&mcan_class->tx_handling_spinlock);
return ret;
--
2.49.0
… > This patch fixes … See also: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.15-rc3#n94 Regards, Markus
Hello Antonios,
On 24.04.2025 14:52:20, Antonios Salios wrote:
> The spin lock tx_handling_spinlock in struct m_can_classdev is not being
> initialized. This leads to bug complaints from the kernel, eg. when
> trying to send CAN frames with cansend from can-utils.
Thanks for your contribution, that's a good catch!
> This patch fixes that by initializing the spin lock in the corresponding
> device probe functions.
The spin_lock is only used in m_can.c, so it's better to be initialized
there.
Please add a fixes tag:
Fixes: 1fa80e23c150 ("can: m_can: Introduce a tx_fifo_in_flight counter")
I've also added Markus on Cc, who introduced the spin_lock.
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
On 24/04/2025 at 22:11, Marc Kleine-Budde wrote:
> Hello Antonios,
>
> On 24.04.2025 14:52:20, Antonios Salios wrote:
>> The spin lock tx_handling_spinlock in struct m_can_classdev is not being
>> initialized. This leads to bug complaints from the kernel, eg. when
^^^^^^^^^^^^^^
Maybe you can briefly describe what kind of bug (NULL pointer dereference).
Also, if you have the dmesg log of the error, this is something you can add at
the end of the patch description.
You will probably get a CVE for your finding.
Yours sincerely,
Vincent Mailhol
On Fri, 2025-04-25 at 00:23 +0900, Vincent Mailhol wrote: > Maybe you can briefly describe what kind of bug (NULL pointer > dereference). It's a spinlock bad magic bug that occurs when one tries to send a CAN frame using cansend. The frame gets transferred nonetheless. I'm testing the driver in an virtual RISC-V 64-bit environment with a recent mainline kernel. The M_CAN controller is io-mapped to the system. > Also, if you have the dmesg log of the error, this is something you > can add at > the end of the patch description. Will do, I'm just waiting for more feedback on the patch before sending a v3. In the meantime, the dmesg log looks like this: $ cansend can0 123#deadbeef [ 10.631450] BUG: spinlock bad magic on CPU#0, cansend/95 [ 10.631462] lock: 0xff60000002ec1010, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 10.631479] CPU: 0 UID: 0 PID: 95 Comm: cansend Not tainted 6.15.0- rc3-00032-ga79be02bba5c #5 NONE [ 10.631487] Hardware name: MachineWare SIM-V (DT) [ 10.631490] Call Trace: [ 10.631493] [<ffffffff800133e0>] dump_backtrace+0x1c/0x24 [ 10.631503] [<ffffffff800022f2>] show_stack+0x28/0x34 [ 10.631510] [<ffffffff8000de3e>] dump_stack_lvl+0x4a/0x68 [ 10.631518] [<ffffffff8000de70>] dump_stack+0x14/0x1c [ 10.631526] [<ffffffff80003134>] spin_dump+0x62/0x6e [ 10.631534] [<ffffffff800883ba>] do_raw_spin_lock+0xd0/0x142 [ 10.631542] [<ffffffff807a6fcc>] _raw_spin_lock_irqsave+0x20/0x2c [ 10.631554] [<ffffffff80536dba>] m_can_start_xmit+0x90/0x34a [ 10.631567] [<ffffffff806148b0>] dev_hard_start_xmit+0xa6/0xee [ 10.631577] [<ffffffff8065b730>] sch_direct_xmit+0x114/0x292 [ 10.631586] [<ffffffff80614e2a>] __dev_queue_xmit+0x3b0/0xaa8 [ 10.631596] [<ffffffff8073b8fa>] can_send+0xc6/0x242 [ 10.631604] [<ffffffff8073d1c0>] raw_sendmsg+0x1a8/0x36c [ 10.631612] [<ffffffff805ebf06>] sock_write_iter+0x9a/0xee [ 10.631623] [<ffffffff801d06ea>] vfs_write+0x184/0x3a6 [ 10.631633] [<ffffffff801d0a88>] ksys_write+0xa0/0xc0 [ 10.631643] [<ffffffff801d0abc>] __riscv_sys_write+0x14/0x1c [ 10.631654] [<ffffffff8079ebf8>] do_trap_ecall_u+0x168/0x212 [ 10.631662] [<ffffffff807a830a>] handle_exception+0x146/0x152 -- Antonios Salios Software Engineer MachineWare GmbH | www.machineware.de Hühnermarkt 19, 52062 Aachen, Germany Amtsgericht Aachen HRB25734 Geschäftsführung Lukas Jünger Dr.-Ing. Jan Henrik Weinstock
On Fry. 25 Apr. 2025 at 15:34, Antonios Salios <antonios@mwa.re> wrote: > On Fri, 2025-04-25 at 00:23 +0900, Vincent Mailhol wrote: > > Maybe you can briefly describe what kind of bug (NULL pointer > > dereference). > > It's a spinlock bad magic bug that occurs when one tries to send a CAN > frame using cansend. The frame gets transferred nonetheless. > I'm testing the driver in an virtual RISC-V 64-bit environment with a > recent mainline kernel. The M_CAN controller is io-mapped to the > system. I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK: https://elixir.bootlin.com/linux/v6.14.3/source/lib/Kconfig.debug#L1437 Without it, this would have been a more severe NULL pointer dereference. > > Also, if you have the dmesg log of the error, this is something you > > can add at > > the end of the patch description. > > Will do, I'm just waiting for more feedback on the patch before sending > a v3. In the meantime, the dmesg log looks like this: Great! My comments on the patch description are just nitpicks anyway. You can add my: Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> to the v3. Thanks a lot! Yours sincerely, Vincent Mailhol
On Fri, 2025-04-25 at 16:18 +0900, Vincent Mailhol wrote: > I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK: Indeed. > Without it, this would have been a more severe NULL pointer > dereference. Strangely, a NULL pointer dereference does not occur, when I try again with CONFIG_DEBUG_SPINLOCK disabled. The kernel does not crash, at least on rv64. Looking through the implementations of arch_spinlock_t, it seems that only PARISC's implementation would cause problems in this case since it uses an array. https://elixir.bootlin.com/linux/v6.15-rc3/source/arch/parisc/include/asm/spinlock_types.h#L11 I think I'm missing something, why do you think a NULL pointer deref would occur in this case? Thanks for your feedback! -- Antonios Salios Software Engineer MachineWare GmbH | www.machineware.de Hühnermarkt 19, 52062 Aachen, Germany Amtsgericht Aachen HRB25734 Geschäftsführung Lukas Jünger Dr.-Ing. Jan Henrik Weinstock
On Fri. 25 Apr. 2025 à 18:18, Antonios Salios <antonios@mwa.re> wrote: > On Fri, 2025-04-25 at 16:18 +0900, Vincent Mailhol wrote: > > I guess this is because your kernel has CONFIG_DEBUG_SPINLOCK: > > Indeed. > > > Without it, this would have been a more severe NULL pointer > > dereference. > > Strangely, a NULL pointer dereference does not occur, when I try again > with CONFIG_DEBUG_SPINLOCK disabled. The kernel does not crash, at > least on rv64. > > Looking through the implementations of arch_spinlock_t, it seems that > only PARISC's implementation would cause problems in this case since it > uses an array. > > https://elixir.bootlin.com/linux/v6.15-rc3/source/arch/parisc/include/asm/spinlock_types.h#L11 > > I think I'm missing something, why do you think a NULL pointer deref > would occur in this case? I see. Thanks for your test. I went a bit too quick in my analysis when I saw things like: raw_spin_lock(&lock->rlock); in https://elixir.bootlin.com/linux/v6.14/source/include/linux/spinlock.h#L351 I thought about the NULL pointer dereference. But indeed, you are right. The spinlock_t is just one attribute of a structure and will be allocated anyway even if spin_lock_init is not called, so calling spin_lock_irqsave(&cdev->tx_handling_spinlock, irqflags); will still pass a valid address. The other thing which put me off guard is that some other "spinlock bad magic" got assigned some CVE. https://lore.kernel.org/linux-cve-announce/2025031217-CVE-2025-21862-e8a0@gregkh/ But here as well, that does not imply a NULL pointer dereference. I think that the bug is only that the spin_lock is not working as intended. Regardless, just saying that it is a spinlock bad magic bug with the dmesg trace is enough. Thanks again for your tests! Yours sincerely, Vincent Mailhol
© 2016 - 2026 Red Hat, Inc.