drivers/firmware/arm_scmi/mailbox.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
There is a race between the failure of probe and rx_callback (due to a
delayed response).
scmi_probe
scmi_acquire_protocal
do_xfer
timeout
mailbox_chan_free
<--- delay response
rx_callback
mbox_free_channel
cinfo->transport_info = NULL
mailbox_clear_channel
dereference cinfo->transport_info
This patch check for it, and clear the channel at mailbox_chan_free.
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
---
drivers/firmware/arm_scmi/mailbox.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_scmi/mailbox.c b/drivers/firmware/arm_scmi/mailbox.c
index 19246ed1f01f..32c5fe0b8c29 100644
--- a/drivers/firmware/arm_scmi/mailbox.c
+++ b/drivers/firmware/arm_scmi/mailbox.c
@@ -230,6 +230,7 @@ static int mailbox_chan_free(int id, void *p, void *data)
smbox->chan = NULL;
smbox->chan_receiver = NULL;
smbox->cinfo = NULL;
+ shmem_clear_channel(smbox->shmem);
}
return 0;
@@ -284,7 +285,8 @@ static void mailbox_clear_channel(struct scmi_chan_info *cinfo)
{
struct scmi_mailbox *smbox = cinfo->transport_info;
- shmem_clear_channel(smbox->shmem);
+ if (smbox)
+ shmem_clear_channel(smbox->shmem);
}
static bool
--
2.30.2
On Wed, Aug 30, 2023 at 01:07:47AM +0800, Qiujun Huang wrote: > There is a race between the failure of probe and rx_callback (due to a > delayed response). > > scmi_probe > scmi_acquire_protocal > do_xfer > timeout > mailbox_chan_free > <--- delay response > rx_callback > mbox_free_channel > cinfo->transport_info = NULL > mailbox_clear_channel > dereference cinfo->transport_info It is always good to provide the kernel stacktrace which you get when a NULL pointer is dereference. It helps for review and also to document it. -- Regards, Sudeep
> 2023年8月30日 下午5:39,Sudeep Holla <sudeep.holla@arm.com> 写道: > > On Wed, Aug 30, 2023 at 01:07:47AM +0800, Qiujun Huang wrote: >> There is a race between the failure of probe and rx_callback (due to a >> delayed response). >> >> scmi_probe >> scmi_acquire_protocal >> do_xfer >> timeout >> mailbox_chan_free >> <--- delay response >> rx_callback >> mbox_free_channel >> cinfo->transport_info = NULL >> mailbox_clear_channel >> dereference cinfo->transport_info > > It is always good to provide the kernel stacktrace which you get when a > NULL pointer is dereference. It helps for review and also to document it. > > -- > Regards, > Sudeep Get it. Here is the splat. [ 1.942240][ C0] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048 [ 1.942241][ C0] Mem abort info: [ 1.942243][ C0] ESR = 0x96000005 [ 1.944888][ T9] spmi spmi-1: PMIC arbiter version v7 (0x70000000) [ 1.950652][ C0] EC = 0x25: DABT (current EL), IL = 32 bits [ 1.950653][ C0] SET = 0, FnV = 0 [ 1.950654][ C0] EA = 0, S1PTW = 0 [ 1.950656][ C0] Data abort info: [ 1.950657][ C0] ISV = 0, ISS = 0x00000005 [ 1.950658][ C0] CM = 0, WnR = 0 [ 1.950660][ C0] [0000000000000048] user address but active_mm is swapper [ 1.950663][ C0] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 2.338929][ C0] pc : mailbox_clear_channel+0x18/0x64 [ 2.344384][ C0] lr : scmi_handle_response+0x17c/0x4f4 [ 2.349923][ C0] sp : ffffffc010003db0 [ 2.354045][ C0] x29: ffffffc010003dc0 x28: ffffffd85263f000 [ 2.360216][ C0] x27: ffffffd851621068 x26: ffffffd84ec815c8 [ 2.366386][ C0] x25: ffffffd85263bf80 x24: ffffffd85263d230 [ 2.372556][ C0] x23: ffffff803cd70cc0 x22: 0000000000000008 [ 2.378726][ C0] x21: ffffff8036cf0df8 x20: ffffffd85161bac8 [ 2.384896][ C0] x19: ffffff8043ffa580 x18: ffffffc010005050 [ 2.391065][ C0] x17: 0000000000000000 x16: 00000000000000d8 [ 2.397234][ C0] x15: ffffffd8507965e8 x14: ffffffd84eaebdf0 [ 2.403404][ C0] x13: 00000000000001ea x12: 0000000000007ffb [ 2.409574][ C0] x11: 000000000000ffff x10: ffffffd852c5a000 [ 2.415744][ C0] x9 : d7be1a9b75f29500 x8 : 0000000000000000 [ 2.421914][ C0] x7 : 382e31202020205b x6 : ffffffd852c57e7c [ 2.428084][ C0] x5 : ffffffffffffffff x4 : 0000000000000000 [ 2.434254][ C0] x3 : ffffffd84eae6668 x2 : 0000000000000001 [ 2.440424][ C0] x1 : 0000000000000000 x0 : ffffff8043ffa580 [ 2.446594][ C0] Call trace: [ 2.449819][ C0] mailbox_clear_channel+0x18/0x64 [ 2.454916][ C0] scmi_handle_response+0x17c/0x4f4 [ 2.460100][ C0] rx_callback+0x60/0xec [ 2.464311][ C0] mbox_chan_received_data+0x44/0x94 [ 2.469584][ C0] qcom_rimps_rx_interrupt+0xc0/0x144 [qcom_rimps] [ 2.476111][ C0] __handle_irq_event_percpu+0xa0/0x414 [ 2.481652][ C0] handle_irq_event+0x84/0x1cc [ 2.486393][ C0] handle_fasteoi_irq+0x150/0x394 [ 2.491403][ C0] __handle_domain_irq+0x114/0x1e4 [ 2.496500][ C0] gic_handle_irq.33979+0x6c/0x2b8 [ 2.501597][ C0] el1_irq+0xe4/0x1c0 [ 2.505537][ C0] cpuidle_enter_state+0x3a4/0xa04 [ 2.510634][ C0] do_idle+0x308/0x574 [ 2.514661][ C0] cpu_startup_entry+0x84/0x90 [ 2.519402][ C0] kernel_init+0x0/0x310 [ 2.523611][ C0] start_kernel+0x0/0x648 [ 2.527908][ C0] start_kernel+0x52c/0x648 [ 2.532390][ C0] Code: f800865e a9017bfd 910043fd f9400808 (f9402508) [ 2.539360][ C0] ---[ end trace da7fdd5fdd7f7f09 ]--- [ 2.550088][ C0] Kernel panic - not syncing: Oops: Fatal exception in interrupt - - - thanks
On Wed, Aug 30, 2023 at 09:23:58PM +0800, Qiujun Huang wrote: > > > > 2023年8月30日 下午5:39,Sudeep Holla <sudeep.holla@arm.com> 写道: > > > > On Wed, Aug 30, 2023 at 01:07:47AM +0800, Qiujun Huang wrote: > >> There is a race between the failure of probe and rx_callback (due to a > >> delayed response). > >> > >> scmi_probe > >> scmi_acquire_protocal > >> do_xfer > >> timeout > >> mailbox_chan_free > >> <--- delay response > >> rx_callback > >> mbox_free_channel > >> cinfo->transport_info = NULL > >> mailbox_clear_channel > >> dereference cinfo->transport_info > > > > It is always good to provide the kernel stacktrace which you get when a > > NULL pointer is dereference. It helps for review and also to document it. > > > > -- > > Regards, > > Sudeep > > Get it. Here is the splat. > > [ 1.942240][ C0] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048 > [ 1.942241][ C0] Mem abort info: > [ 1.942243][ C0] ESR = 0x96000005 > [ 1.944888][ T9] spmi spmi-1: PMIC arbiter version v7 (0x70000000) > [ 1.950652][ C0] EC = 0x25: DABT (current EL), IL = 32 bits > [ 1.950653][ C0] SET = 0, FnV = 0 > [ 1.950654][ C0] EA = 0, S1PTW = 0 > [ 1.950656][ C0] Data abort info: > [ 1.950657][ C0] ISV = 0, ISS = 0x00000005 > [ 1.950658][ C0] CM = 0, WnR = 0 > [ 1.950660][ C0] [0000000000000048] user address but active_mm is swapper > [ 1.950663][ C0] Internal error: Oops: 96000005 [#1] PREEMPT SMP > [ 2.338929][ C0] pc : mailbox_clear_channel+0x18/0x64 > [ 2.344384][ C0] lr : scmi_handle_response+0x17c/0x4f4 > [ 2.349923][ C0] sp : ffffffc010003db0 > [ 2.354045][ C0] x29: ffffffc010003dc0 x28: ffffffd85263f000 > [ 2.360216][ C0] x27: ffffffd851621068 x26: ffffffd84ec815c8 > [ 2.366386][ C0] x25: ffffffd85263bf80 x24: ffffffd85263d230 > [ 2.372556][ C0] x23: ffffff803cd70cc0 x22: 0000000000000008 > [ 2.378726][ C0] x21: ffffff8036cf0df8 x20: ffffffd85161bac8 > [ 2.384896][ C0] x19: ffffff8043ffa580 x18: ffffffc010005050 > [ 2.391065][ C0] x17: 0000000000000000 x16: 00000000000000d8 > [ 2.397234][ C0] x15: ffffffd8507965e8 x14: ffffffd84eaebdf0 > [ 2.403404][ C0] x13: 00000000000001ea x12: 0000000000007ffb > [ 2.409574][ C0] x11: 000000000000ffff x10: ffffffd852c5a000 > [ 2.415744][ C0] x9 : d7be1a9b75f29500 x8 : 0000000000000000 > [ 2.421914][ C0] x7 : 382e31202020205b x6 : ffffffd852c57e7c > [ 2.428084][ C0] x5 : ffffffffffffffff x4 : 0000000000000000 > [ 2.434254][ C0] x3 : ffffffd84eae6668 x2 : 0000000000000001 > [ 2.440424][ C0] x1 : 0000000000000000 x0 : ffffff8043ffa580 > [ 2.446594][ C0] Call trace: > [ 2.449819][ C0] mailbox_clear_channel+0x18/0x64 Is this with latest kernel ? IIUC the mailbox_clear_channel should get called only for delayed response and notification in this path(scmi_handle_response). You are saying it happens as part of probe and again IIUC probe doesn't have any delayed response command. What am I missing ? Any additional changes in the tree ? My build has much smaller mailbox_clear_channel. > [ 2.454916][ C0] scmi_handle_response+0x17c/0x4f4 > [ 2.460100][ C0] rx_callback+0x60/0xec > [ 2.464311][ C0] mbox_chan_received_data+0x44/0x94 > [ 2.469584][ C0] qcom_rimps_rx_interrupt+0xc0/0x144 [qcom_rimps] This suggests out of tree driver, any pointers ? Also I vaguely remember discussing this in the past. Perhaps at [1] or somewhere else. -- Regards, Sudeep [1] https://lore.kernel.org/linux-arm-kernel/cfa26ff3-c95a-1986-58fc-b49fc9be49d5@quicinc.com/
On Thu, Aug 31, 2023 at 8:34 PM Sudeep Holla <sudeep.holla@arm.com> wrote: > > On Wed, Aug 30, 2023 at 09:23:58PM +0800, Qiujun Huang wrote: > > > > > > > 2023年8月30日 下午5:39,Sudeep Holla <sudeep.holla@arm.com> 写道: > > > > > > On Wed, Aug 30, 2023 at 01:07:47AM +0800, Qiujun Huang wrote: > > >> There is a race between the failure of probe and rx_callback (due to a > > >> delayed response). > > >> > > >> scmi_probe > > >> scmi_acquire_protocal > > >> do_xfer > > >> timeout > > >> mailbox_chan_free > > >> <--- delay response > > >> rx_callback > > >> mbox_free_channel > > >> cinfo->transport_info = NULL > > >> mailbox_clear_channel > > >> dereference cinfo->transport_info > > > > > > It is always good to provide the kernel stacktrace which you get when a > > > NULL pointer is dereference. It helps for review and also to document it. > > > > > > -- > > > Regards, > > > Sudeep > > > > Get it. Here is the splat. > > > > [ 1.942240][ C0] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048 > > [ 1.942241][ C0] Mem abort info: > > [ 1.942243][ C0] ESR = 0x96000005 > > [ 1.944888][ T9] spmi spmi-1: PMIC arbiter version v7 (0x70000000) > > [ 1.950652][ C0] EC = 0x25: DABT (current EL), IL = 32 bits > > [ 1.950653][ C0] SET = 0, FnV = 0 > > [ 1.950654][ C0] EA = 0, S1PTW = 0 > > [ 1.950656][ C0] Data abort info: > > [ 1.950657][ C0] ISV = 0, ISS = 0x00000005 > > [ 1.950658][ C0] CM = 0, WnR = 0 > > [ 1.950660][ C0] [0000000000000048] user address but active_mm is swapper > > [ 1.950663][ C0] Internal error: Oops: 96000005 [#1] PREEMPT SMP > > [ 2.338929][ C0] pc : mailbox_clear_channel+0x18/0x64 > > [ 2.344384][ C0] lr : scmi_handle_response+0x17c/0x4f4 > > [ 2.349923][ C0] sp : ffffffc010003db0 > > [ 2.354045][ C0] x29: ffffffc010003dc0 x28: ffffffd85263f000 > > [ 2.360216][ C0] x27: ffffffd851621068 x26: ffffffd84ec815c8 > > [ 2.366386][ C0] x25: ffffffd85263bf80 x24: ffffffd85263d230 > > [ 2.372556][ C0] x23: ffffff803cd70cc0 x22: 0000000000000008 > > [ 2.378726][ C0] x21: ffffff8036cf0df8 x20: ffffffd85161bac8 > > [ 2.384896][ C0] x19: ffffff8043ffa580 x18: ffffffc010005050 > > [ 2.391065][ C0] x17: 0000000000000000 x16: 00000000000000d8 > > [ 2.397234][ C0] x15: ffffffd8507965e8 x14: ffffffd84eaebdf0 > > [ 2.403404][ C0] x13: 00000000000001ea x12: 0000000000007ffb > > [ 2.409574][ C0] x11: 000000000000ffff x10: ffffffd852c5a000 > > [ 2.415744][ C0] x9 : d7be1a9b75f29500 x8 : 0000000000000000 > > [ 2.421914][ C0] x7 : 382e31202020205b x6 : ffffffd852c57e7c > > [ 2.428084][ C0] x5 : ffffffffffffffff x4 : 0000000000000000 > > [ 2.434254][ C0] x3 : ffffffd84eae6668 x2 : 0000000000000001 > > [ 2.440424][ C0] x1 : 0000000000000000 x0 : ffffff8043ffa580 > > [ 2.446594][ C0] Call trace: > > [ 2.449819][ C0] mailbox_clear_channel+0x18/0x64 > > Is this with latest kernel ? IIUC the mailbox_clear_channel should get called no, it's on 5.10-stable kernel > only for delayed response and notification in this path(scmi_handle_response). > You are saying it happens as part of probe and again IIUC probe doesn't have > any delayed response command. What am I missing ? > > Any additional changes in the tree ? My build has much smaller > mailbox_clear_channel. > > > [ 2.454916][ C0] scmi_handle_response+0x17c/0x4f4 > > [ 2.460100][ C0] rx_callback+0x60/0xec > > [ 2.464311][ C0] mbox_chan_received_data+0x44/0x94 > > [ 2.469584][ C0] qcom_rimps_rx_interrupt+0xc0/0x144 [qcom_rimps] > > This suggests out of tree driver, any pointers ? > > Also I vaguely remember discussing this in the past. Perhaps at [1] or > somewhere else. Get that. It looks painful > > -- > Regards, > Sudeep > > [1] https://lore.kernel.org/linux-arm-kernel/cfa26ff3-c95a-1986-58fc-b49fc9be49d5@quicinc.com/
© 2016 - 2025 Red Hat, Inc.